Local timing says this 4-byte Paeth function takes about 0.3x the time the serial libpng code does, dropping from ~10 cycles per byte to ~2.9.
bpp=4 is mainly an easy demo. This approach can work for any bpp up to 16, 1 pixel at a time, at roughly the same cost per pixel. Doing more than 1 pixel at a time is a tricky math problem I have yet to attempt to solve.
Everything here can be trivially downgraded to MMX, supporting bpp up to 8. It seems to be a little slower (~3.5 cycles per byte), but it would make the code compatible with every x86 that can still power on.
I've tried four approaches:
- this way;
- doing things naively in 16-bit;
- a 16-bit version that requires division by 3 (i.e. mulhi_epu16(..., 0x5580) );
- a mostly 8-bit version of the same.
They're all fine, but this one is consistently the fastest I've measured.
I'd be happy to settle on the naive 16-bit version too, which would have a very clear implementation that's only minorly slower than this version. The other two are way more complicated, and would require us to draw some serious ASCII diagrams to explain.
I have learned that the .skp serialization tests (serialize-8888) have a nice side effect of testing the correctness of these filters!
(Since writing the description above, I've bumped things up to {Paeth,Sub,Avg} x { 3 bpp, 4 bpp }.)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1573943002
Review URL: https://codereview.chromium.org/1573943002
All platforms except android are configured to use the statically linked copy
of libpng. Android uses the system provided dynamic copy for SkImageDecoder
and the static copy for SkCodec. The exception being android framework builds
that currently use the dynamic copy everywhere.
This CL also enables NEON optimizations for libpng.
Review URL: https://codereview.chromium.org/1058823002
Reason for revert:
breaking the nexus_9 and ios builds.
Original issue's description:
> Enable both static and dynamically linked libpng
>
> All platforms except android are configured to use the statically linked copy of libpng. Android uses the system provided dynamic copy for SkImageDecoder and the static copy for SkCodec. The exception being android framework builds that currently use the dynamic copy everywhere.
>
> This CL also enables NEON optimizations for libpng.
>
> Committed: https://skia.googlesource.com/skia/+/2469c999518e7b0063d35e9e2eb074a0477c21acTBR=scroggo@google.com,msarett@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/1050183002
All platforms except android are configured to use the statically linked copy of libpng. Android uses the system provided dynamic copy for SkImageDecoder and the static copy for SkCodec. The exception being android framework builds that currently use the dynamic copy everywhere.
This CL also enables NEON optimizations for libpng.
Review URL: https://codereview.chromium.org/1032253003
DM:
Add a flag to use SkCodec instead of SkImageDecoder.
SkCodec:
Base class for codecs, allowing creation from an SkStream or an SkData.
An SkCodec, on creation, knows properties of the data like its width and height. Further calls can be used to generate the image.
TODO: Add scanline iterator
SkPngCodec:
New decoder for png. Wraps libpng. The code has been repurposed from SkImageDecoder_libpng.
TODO: Handle other destination colortypes
TODO: Substitute the transpose color
TODO: Allow silencing warnings
TODO: Use RGB instead of filler?
TODO: sRGB
SkSwizzler:
Simplified version of SkScaledSampler. Unlike the sampler, this object does no sampling.
TODO: Implement other swizzles.
Requires a gclient sync to pull down libpng.
BUG=skia:3257
Committed: https://skia.googlesource.com/skia/+/ca358852b4fed656d11107b2aaf28318a4518b49
(and then reverted)
Review URL: https://codereview.chromium.org/930283002