Commit Graph

16 Commits

Author SHA1 Message Date
msarett
6bdbf4412b Improve naive SkColorXform to half floats
This should give us a good baseline to explore using SkRasterPipeline.

A particular colorxform to half float drops from 425us to 282us on my desktop.

Color Xform to Half Float (HP z620)
Original                              425us
Trans16 (not 32)                      355us
Vector Trans16                        378us
Trans16 + Keep Halfs in Vector        335us
Vector Trans16 + Keep Halfs in Vector 282us
Final                                 282us

Color Xform to Half Float (Nexus 5X)
Original                              556us
Final                                 472us

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2159993003
2016-07-19 09:07:55 -07:00
mtklein
58e389b051 Expand _01 half<->float limitation to _finite. Simplify.
It's become clear we need to sometimes deal with values <0 or >1.
    I'm not yet convinced we care about NaN or +-inf.

    We had some fairly clever tricks and optimizations here for NEON
    and SSE.  I've thrown them out in favor of a single implementation.
    If we find the specializations mattered, we can certainly figure out
    how to extend them to this new range/domain.

    This happens to add a vectorized float -> half for ARMv7, which was
    missing from the _01 version.  (The SSE strategy was not portable to
    platforms that flush denorm floats to zero.)

    I've tested the full float range for FloatToHalf on my desktop and a 5x.
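
A minimal sketch of what a finite-only half <-> float pair can look like, assuming IEEE 754 binary16/binary32 layouts. The names half_to_float_finite and float_to_half_finite are hypothetical, this is scalar rather than vectorized, and float -> half truncates here; it is not the code from this CL.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode a finite half (no inf/NaN) to float.
float half_to_float_finite(uint16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    const uint32_t mag  = h & 0x7FFFu;
    assert(mag < 0x7C00u);                       // finite inputs only
    float absf;
    if (mag < 0x0400u) {
        // Zero or half denormal: the value is mag * 2^-24, exact in float.
        absf = (float)mag * (1.0f / 16777216.0f);
    } else {
        // Normal: widen the mantissa and rebias the exponent (15 -> 127).
        const uint32_t bits = (mag << 13) + (112u << 23);
        std::memcpy(&absf, &bits, sizeof absf);
    }
    uint32_t out;
    std::memcpy(&out, &absf, sizeof out);
    out |= sign;                                 // reattach the sign bit
    float f;
    std::memcpy(&f, &out, sizeof f);
    return f;
}

// Hypothetical helper: encode a finite float with |f| <= 65504 as a half.
// Assumption: low mantissa bits are truncated rather than rounded.
uint16_t float_to_half_finite(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const uint32_t sign = bits & 0x80000000u;
    const uint32_t mag  = bits ^ sign;
    float absf;
    std::memcpy(&absf, &mag, sizeof absf);
    assert(absf <= 65504.0f);                    // also rejects inf and NaN
    uint16_t h;
    if (absf < 6.103515625e-05f) {               // below 2^-14: half denormal
        h = (uint16_t)(absf * 16777216.0f);      // scale by 2^24, truncate
    } else {
        h = (uint16_t)((mag >> 13) - (112u << 10));
    }
    return (uint16_t)((sign >> 16) | h);
}
```

Because every finite half is exactly representable as a float, a round trip through these two helpers should return the original bit pattern, which makes an exhaustive check over all 16-bit inputs cheap.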

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot

Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Review-Url: https://codereview.chromium.org/2145663003
2016-07-15 07:00:11 -07:00
mtklein
64bbad360f Revert of Expand _01 half<->float limitation to _finite. Simplify. (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ )
Reason for revert:
Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast

Original issue's description:
> Expand _01 half<->float limitation to _finite.  Simplify.
>
>     It's become clear we need to sometimes deal with values <0 or >1.
>     I'm not yet convinced we care about NaN or +-inf.
>
>     We had some fairly clever tricks and optimizations here for NEON
>     and SSE.  I've thrown them out in favor of a single implementation.
>     If we find the specializations mattered, we can certainly figure out
>     how to extend them to this new range/domain.
>
>     This happens to add a vectorized float -> half for ARMv7, which was
>     missing from the _01 version.  (The SSE strategy was not portable to
>     platforms that flush denorm floats to zero.)
>
>     I've tested the full float range for FloatToHalf on my desktop and a 5x.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90

TBR=msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2151023003
2016-07-14 12:03:04 -07:00
mtklein
3296bee70d Expand _01 half<->float limitation to _finite. Simplify.
It's become clear we need to sometimes deal with values <0 or >1.
    I'm not yet convinced we care about NaN or +-inf.

    We had some fairly clever tricks and optimizations here for NEON
    and SSE.  I've thrown them out in favor of a single implementation.
    If we find the specializations mattered, we can certainly figure out
    how to extend them to this new range/domain.

    This happens to add a vectorized float -> half for ARMv7, which was
    missing from the _01 version.  (The SSE strategy was not portable to
    platforms that flush denorm floats to zero.)

    I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2145663003
2016-07-14 11:02:09 -07:00
mtklein
05c73b7ed5 Remove bulk float <-> half routines. These are dead code.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2152583002

Review-Url: https://codereview.chromium.org/2152583002
2016-07-13 13:30:49 -07:00
brianosman
e074d1fa6a Change SkColor4f to RGBA channel order
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2093763003

Review-Url: https://codereview.chromium.org/2093763003
2016-06-24 06:31:47 -07:00
robertphillips
c5035e70cc Add SkSpecialImage::extractSubset & NewFromPixmap
This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface)

This now relies on: https://codereview.chromium.org/1813483002/ (ImagePixelLocker now manually allocates SkPixmap) to clean up the uses of SkAutoPixmapStorage in Chromium

GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002

Committed: https://skia.googlesource.com/skia/+/250581493a0859987e482810879e85e5ac2dc002

Review URL: https://codereview.chromium.org/1787883002
2016-03-17 06:58:39 -07:00
robertphillips
19dea94f1d Revert of Add SkSpecialImage::extractSubset & NewFromPixmap (patchset #5 id:80001 of https://codereview.chromium.org/1787883002/ )
Reason for revert:
Need to wean ImagePixelLocker.h off of SkAutoPixmapStorage :(

Original issue's description:
> Add SkSpecialImage::extractSubset & NewFromPixmap
>
> This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface)
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002
>
> Committed: https://skia.googlesource.com/skia/+/250581493a0859987e482810879e85e5ac2dc002

TBR=bsalomon@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Review URL: https://codereview.chromium.org/1808833002
2016-03-16 10:39:09 -07:00
robertphillips
250581493a Add SkSpecialImage::extractSubset & NewFromPixmap
This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface)

GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002

Review URL: https://codereview.chromium.org/1787883002
2016-03-16 09:47:08 -07:00
reed
dd9ffea9ce make SkPM4f private
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1713653002

Review URL: https://codereview.chromium.org/1713653002
2016-02-18 12:39:14 -08:00
mtklein
ddb64c81fb new version of SkHalfToFloat_01
This is a little faster than the previous version, and much better explained.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1688233002

Review URL: https://codereview.chromium.org/1688233002
2016-02-11 12:48:23 -08:00
mtklein
fff055cc5f SkHalfToFloat_01 / SkFloatToHalf_01
These are basically inlined, 4-at-a-time versions of our existing functions,
but cut down to avoid any work that's only necessary outside [0,1].

Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.

In exchange for a little speed, f32->f16 does not round properly.
Instead it truncates, so it's never off by more than 1 bit.

Support for finite values >1 or <0 is straightforward to add back.
>1 might already work as-is.
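
To make the truncation trade-off concrete, here is a scalar sketch of a [0,1]-only float -> half conversion that drops the low 13 mantissa bits instead of rounding. The name float_to_half_01_trunc is hypothetical; the routines in this CL work 4-at-a-time, so treat this as an illustration of the idea, not the actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar illustration: convert a float in [0,1] to half bits,
// truncating (not rounding) the mantissa.
uint16_t float_to_half_01_trunc(float f) {
    assert(f >= 0.0f && f <= 1.0f);
    if (f < 6.103515625e-05f) {
        // Below 2^-14 the result is a half denormal: scale by 2^24 and
        // let the integer conversion truncate.
        return (uint16_t)(f * 16777216.0f);
    }
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    // Drop 13 mantissa bits, then rebias the exponent from 127 to 15.
    return (uint16_t)((bits >> 13) - (112u << 10));
}
```

For example, 0.3f has bits 0x3E99999A; truncation yields 0x34CC, while round-to-nearest would give 0x34CD, a difference of one unit in the last place.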

Getting close to _u16 performance:
    micros   	bench
    261.13  	xferu64_bw_1_opaque_u16
   1833.51  	xferu64_bw_1_alpha_u16
   2762.32 ?	xferu64_aa_1_opaque_u16
   3334.29  	xferu64_aa_1_alpha_u16
    249.78  	xferu64_bw_1_opaque_f16
   3383.18  	xferu64_bw_1_alpha_f16
   4214.72  	xferu64_aa_1_opaque_f16
   4701.19  	xferu64_aa_1_alpha_f16

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005

Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9

CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1685133005
2016-02-11 06:30:03 -08:00
mtklein
cbefc5e4ca Revert of SkHalfToFloat_01 / SkFloatToHalf_01 (patchset #11 id:200001 of https://codereview.chromium.org/1685133005/ )
Reason for revert:
Gotta fix Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Original issue's description:
> SkHalfToFloat_01 / SkFloatToHalf_01
>
> These are basically inlined, 4-at-a-time versions of our existing functions,
> but cut down to avoid any work that's only necessary outside [0,1].
>
> Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.
>
> In exchange for a little speed, f32->f16 does not round properly.
> Instead it truncates, so it's never off by more than 1 bit.
>
> Support for finite values >1 or <0 is straightforward to add back.
> >1 might already work as-is.
>
> Getting close to _u16 performance:
>     micros   	bench
>     261.13  	xferu64_bw_1_opaque_u16
>    1833.51  	xferu64_bw_1_alpha_u16
>    2762.32 ?	xferu64_aa_1_opaque_u16
>    3334.29  	xferu64_aa_1_alpha_u16
>     249.78  	xferu64_bw_1_opaque_f16
>    3383.18  	xferu64_bw_1_alpha_f16
>    4214.72  	xferu64_aa_1_opaque_f16
>    4701.19  	xferu64_aa_1_alpha_f16
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005
>
> Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9

TBR=jvanverth@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1693443003
2016-02-11 06:00:49 -08:00
mtklein
9ea11a4235 SkHalfToFloat_01 / SkFloatToHalf_01
These are basically inlined, 4-at-a-time versions of our existing functions,
but cut down to avoid any work that's only necessary outside [0,1].

Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.

In exchange for a little speed, f32->f16 does not round properly.
Instead it truncates, so it's never off by more than 1 bit.

Support for finite values >1 or <0 is straightforward to add back.
>1 might already work as-is.

Getting close to _u16 performance:
    micros   	bench
    261.13  	xferu64_bw_1_opaque_u16
   1833.51  	xferu64_bw_1_alpha_u16
   2762.32 ?	xferu64_aa_1_opaque_u16
   3334.29  	xferu64_aa_1_alpha_u16
    249.78  	xferu64_bw_1_opaque_f16
   3383.18  	xferu64_bw_1_alpha_f16
   4214.72  	xferu64_aa_1_opaque_f16
   4701.19  	xferu64_aa_1_alpha_f16

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005

Review URL: https://codereview.chromium.org/1685133005
2016-02-11 05:56:08 -08:00
mtklein
a525cb151b skeleton for float <-> half optimized procs
Nothing fancy yet, just calls the serial code in a loop.

I will try to follow this up with at least some of:
   - SSE2 version of serial code
   - NEON version of serial code
   - NEON version using vcvt.f32.f16/vcvt.f16.f32
   - F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph
The last two are fastest but need runtime detection.
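
As a rough sketch of the "serial code in a loop" shape described above: a bulk entry point that simply calls a one-value-at-a-time converter. The names half_to_float and halfs_to_floats are hypothetical, and the serial routine is a generic IEEE 754 binary16 decoder written for this example, not Skia's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical serial routine: decode one IEEE 754 binary16 value.
static float half_to_float(uint16_t h) {
    const uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    const uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man        = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {                 // infinity or NaN
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {              // normal: rebias exponent 15 -> 127
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {              // signed zero
        bits = sign;
    } else {                            // half denormal: renormalize
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) { man <<= 1; --e; }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Bulk proc skeleton: nothing fancy, just the serial code in a loop.
// A SIMD or F16C version could later replace the loop body.
void halfs_to_floats(float* dst, const uint16_t* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = half_to_float(src[i]);
    }
}
```

Keeping the bulk signature separate from the serial routine means an optimized implementation can be swapped in behind halfs_to_floats without touching callers.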

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003

Review URL: https://codereview.chromium.org/1686543003
2016-02-09 08:18:10 -08:00
reed
3601f280dc add kRGBA_F16_SkColorType
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1666343002

Review URL: https://codereview.chromium.org/1666343002
2016-02-05 11:18:39 -08:00