If we want to have an MSAN build, it'll help if we can build our own zlib
so that it's instrumented by MSAN.
Today we build our own zlib on Windows, but require the system to provide it
elsewhere. This just makes everyone build it (except Android framework of course).
This drops the SIMD files. They're only used to accelerate deflate
(compression), so they're not terribly interesting to us. Again, this only
really changes compression speed on Windows bots... pretty niche.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1665843002
Review URL: https://codereview.chromium.org/1665843002
Bump SK_IMAGE_VERSION to test the images in v2 in GoogleStorage, which
includes the images from v1 plus test images for SkRawCodec.
Only define skia_decodes_raw on platforms that support it, rather than
defining it always and checking additional conditions to determine
whether to support raw. Further, define it and SK_CODEC_DECODES_RAW
for all targets, so we can use the compile flag in other targets.
In DM, exclude the raw extensions if SK_CODEC_DECODES_RAW is not defined.
Blacklist raw extensions on NexusPlayer, which was running out of memory
when running them.
BUG=skia:4829
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1612113002
Review URL: https://codereview.chromium.org/1612113002
Reason for revert:
Speculative to fix windows bots
Original issue's description:
> Treat bad values passed to --images as a fatal error
>
> If an option is passed to --images that is either a non-existent path or
> a folder with no images matching the supported types, assume this is
> an error and exit, so they can supply a valid path instead.
>
> Share code between DM and nanobench in SkCommonFlags.
>
> nanobench now behaves more like DM - it will check a directory for
> images that match the supported extensions.
>
> Only consider image paths ending in RAW suffixes as images if
> SK_CODE_DECODES_RAW is defined. This prevents us from seeing failure
> to decode errors on platforms that cannot decode it.
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1611323004
>
> Committed: https://skia.googlesource.com/skia/+/7579786f3bd5a8fda84a1abc45b16213c3371f93TBR=mtklein@google.com,borenet@google.com,msarett@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
# Not skipping CQ checks because original CL landed more than 1 days ago.
Review URL: https://codereview.chromium.org/1653543002
If an option is passed to --images that is either a non-existent path or
a folder with no images matching the supported types, assume this is
an error and exit, so they can supply a valid path instead.
Share code between DM and nanobench in SkCommonFlags.
nanobench now behaves more like DM - it will check a directory for
images that match the supported extensions.
Only consider image paths ending in RAW suffixes as images if
SK_CODE_DECODES_RAW is defined. This prevents us from seeing failure
to decode errors on platforms that cannot decode it.
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1611323004
Review URL: https://codereview.chromium.org/1611323004
Local timing says this 4-byte Paeth function takes about 0.3x the time the serial libpng code does, dropping from ~10 cycles per byte to ~2.9.
bpp=4 is mainly an easy demo. This approach can work for any bpp up to 16, 1 pixel at a time, at roughly the same cost per pixel. Doing more than 1 pixel at a time is a tricky math problem I have yet to attempt to solve.
Everything here can be trivially downgraded to MMX, supporting bpp up to 8. It seems to be a little slower (~3.5 cycles per byte), but it would make the code compatible with every x86 that can still power on.
I've tried four approaches:
- this way;
- doing things naively in 16-bit;
- a 16-bit version that requires division by 3 (i.e. mulhi_epu16(..., 0x5580) );
- a mostly 8-bit version of the same.
They're all fine, but this one is consistently the fastest I've measured.
I'd be happy to settle on the naive 16-bit version too, which would have a very clear implementation that's only minorly slower than this version. The other two are way more complicated, and would require us to draw some serious ASCII diagrams to explain.
I have learned that the .skp serialization tests (serialize-8888) have a nice side effect of testing the correctness of these filters!
(Since writing the description above, I've bumped things up to {Paeth,Sub,Avg} x { 3 bpp, 4 bpp }.)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1573943002
Review URL: https://codereview.chromium.org/1573943002
Reason for revert:
Bot failures
Original issue's description:
> AVX 2 SrcOver blits: color32, blitmask.
>
> As a follow up to the SSE 4.1 CL, this should look pretty familiar.
>
> I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later.
>
> Perf changes (relative to SSE 4.1):
> Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit
> Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent
> text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit
> text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit
>
> This should be a big throughout win, and a small latency win.
> This should all be pixel-exact to the previous SSE 4.1 code.
>
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36TBR=herb@google.com,mtklein@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/1632713002
As a follow up to the SSE 4.1 CL, this should look pretty familiar.
I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later.
Perf changes (relative to SSE 4.1):
Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit
Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent
text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit
text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit
This should be a big throughout win, and a small latency win.
This should all be pixel-exact to the previous SSE 4.1 code.
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot
Review URL: https://codereview.chromium.org/1532613002