- add back Sk4i typedef
- define SSE casts in terms of Sk4i
* uint8 <-> float becomes uint8 <-> int <-> float
* uint16 <-> float becomes uint16 <-> int <-> float
This has the nice side effect of specializing uint8 <-> int
and uint16 <-> int, which are useful in their own right.
There are many cast specializations now, some of which call each other.
I have tried to arrange them in some sort of sensible order, subject to
the constraint that those called must precede those who call.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1690633003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1690633003
These are basically inlined, 4-at-a-time versions of our existing functions,
but cut down to avoid any work that's only necessary outside [0,1].
Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.
In exchange for a little speed, f32->f16 does not round properly.
Instead it truncates, so it's never off by more than 1 bit.
Support for finite values >1 or <0 is straightforward to add back.
>1 might already work as-is.
Getting close to _u16 performance:
micros bench
261.13 xferu64_bw_1_opaque_u16
1833.51 xferu64_bw_1_alpha_u16
2762.32 ? xferu64_aa_1_opaque_u16
3334.29 xferu64_aa_1_alpha_u16
249.78 xferu64_bw_1_opaque_f16
3383.18 xferu64_bw_1_alpha_f16
4214.72 xferu64_aa_1_opaque_f16
4701.19 xferu64_aa_1_alpha_f16
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005
Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1685133005
Reason for revert:
Gotta fix Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Original issue's description:
> SkHalfToFloat_01 / SkFloatToHalf_01
>
> These are basically inlined, 4-at-a-time versions of our existing functions,
> but cut down to avoid any work that's only necessary outside [0,1].
>
> Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.
>
> In exchange for a little speed, f32->f16 does not round properly.
> Instead it truncates, so it's never off by more than 1 bit.
>
> Support for finite values >1 or <0 is straightforward to add back.
> >1 might already work as-is.
>
> Getting close to _u16 performance:
> micros bench
> 261.13 xferu64_bw_1_opaque_u16
> 1833.51 xferu64_bw_1_alpha_u16
> 2762.32 ? xferu64_aa_1_opaque_u16
> 3334.29 xferu64_aa_1_alpha_u16
> 249.78 xferu64_bw_1_opaque_f16
> 3383.18 xferu64_bw_1_alpha_f16
> 4214.72 xferu64_aa_1_opaque_f16
> 4701.19 xferu64_aa_1_alpha_f16
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005
>
> Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9TBR=jvanverth@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1693443003
These are basically inlined, 4-at-a-time versions of our existing functions,
but cut down to avoid any work that's only necessary outside [0,1].
Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.
In exchange for a little speed, f32->f16 does not round properly.
Instead it truncates, so it's never off by more than 1 bit.
Support for finite values >1 or <0 is straightforward to add back.
>1 might already work as-is.
Getting close to _u16 performance:
micros bench
261.13 xferu64_bw_1_opaque_u16
1833.51 xferu64_bw_1_alpha_u16
2762.32 ? xferu64_aa_1_opaque_u16
3334.29 xferu64_aa_1_alpha_u16
249.78 xferu64_bw_1_opaque_f16
3383.18 xferu64_bw_1_alpha_f16
4214.72 xferu64_aa_1_opaque_f16
4701.19 xferu64_aa_1_alpha_f16
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005
Review URL: https://codereview.chromium.org/1685133005
Reason for revert:
Breaks Ubuntu and Mac CMAKE
Original issue's description:
> Make SkPicture/SkImageGenerator default to SkCodec
>
> Remove reference to SkImageDecoder from SkPicture. Make the default
> InstallPixelRefProc passed to CreateFromStream use
> SkImageGenerator::NewFromEncoded instead.
>
> Make SkImageGenerator::NewFromEncoded create an SkCodecImageGenerator.
> Remove the old version that used SkImageDecoder.
>
> Remove all versions of lazy_decode_bitmap/LazyDecodeBitmap. The default
> now behaves lazily.
>
> Update all clients to use the default.
>
> Move SkImageDecoderGenerator into KtxTest.cpp, and use it directly.
>
> BUG=skia:4691
> BUG=skia:4290
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1671193002
>
> Committed: https://skia.googlesource.com/skia/+/026388a01864c74208ad57d1ba4f711602d101c6TBR=msarett@google.com,reed@google.com,scroggo@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4691
Review URL: https://codereview.chromium.org/1685963004
Remove reference to SkImageDecoder from SkPicture. Make the default
InstallPixelRefProc passed to CreateFromStream use
SkImageGenerator::NewFromEncoded instead.
Make SkImageGenerator::NewFromEncoded create an SkCodecImageGenerator.
Remove the old version that used SkImageDecoder.
Remove all versions of lazy_decode_bitmap/LazyDecodeBitmap. The default
now behaves lazily.
Update all clients to use the default.
Move SkImageDecoderGenerator into KtxTest.cpp, and use it directly.
BUG=skia:4691
BUG=skia:4290
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1671193002
Review URL: https://codereview.chromium.org/1671193002
Add a couple of utility functions to SkPaint that return the bounds
of glyphs between a pair of lines.
The common use case envisioned generates the edges of descenders
between the top and bottom bounds of an underline to allow computing
a stroke that skips those descenders.
The implementation stores a linked list in each glyph containing
the bounds of the lines parallel to the advance and the outermost
intersections within those bounds.
When the glyph cache is constructed, the glyph path is intersected
with the bounds and the extreme min and max values within the bounds
is added to an intercept.
Share the text to path iter to construct the data.
Make a half-hearted attempt to support vertical text; while the
vertical implementation is complete; surrounding code (e.g. paint
align) has short-comings with vertical.
R=fmalita@chromium.org, reed@google.com
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1654883003
Review URL: https://codereview.chromium.org/1654883003