Commit Graph

19 Commits

Author SHA1 Message Date
Ben Wagner
8a1036caf1 Include what you use with signbit.
Include cmath in a few source files which use signbit and a relying on
magic to happen to use it.

Also Fix nuttiness in SampleClip. No need to #define single character
identifiers.

Change-Id: Iae3352d0cab9aaa6c37d6424f064b3d86fa2e011
Reviewed-on: https://skia-review.googlesource.com/4626
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
Commit-Queue: Herb Derby <herb@google.com>
2016-11-09 20:29:45 +00:00
mtklein
a2d2f38600 f16<->f32 ftz is an optional thing for speed.
The ARMv8 asm path actually does it right... that should be okay.

My Nexus 5x fails `dm -m _finite_ftz` before this and passes after it.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2276533002

TBR=msarett@google.com

Review-Url: https://codereview.chromium.org/2276533002
2016-08-23 08:58:12 -07:00
mtklein
8ae991e433 Flush denorm half floats to zero.
I think we convinced ourselves that denorms, while a good chunk of half floats,
cover a rather small fraction of the representable range, which is always
close enough to zero to flush.

This makes both paths of the conversion to or from float considerably simpler.

These functions now work for zero-or-normal half floats (excluding infinite, NaN).
I'm not aware of a term for this class so I've called them "ordinary".

A handful of GMs and SKPs draw differently in --config f16, but all imperceptibly.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2256023002

Review-Url: https://codereview.chromium.org/2256023002
2016-08-22 13:20:18 -07:00
msarett
6bdbf4412b Improve naive SkColorXform to half floats
This should give us a good baseline to explore using SkRasterPipeline.

A particular colorxform to half float drops from 425us to 282us on my desktop.

Color Xform to Half Float (HP z620)
Original                              425us
Trans16 (not 32)                      355us
Vector Trans16                        378us
Trans16 + Keep Halfs in Vector        335us
Vector Trans16 + Keep Halfs in Vector 282us
Final                                 282us

Color Xform to Half Float (Nexus 5X)
Original                              556us
Final                                 472us

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2159993003
2016-07-19 09:07:55 -07:00
mtklein
58e389b051 Expand _01 half<->float limitation to _finite. Simplify.
It's become clear we need to sometimes deal with values <0 or >1.
    I'm not yet convinced we care about NaN or +-inf.

    We had some fairly clever tricks and optimizations here for NEON
    and SSE.  I've thrown them out in favor of a single implementation.
    If we find the specializations mattered, we can certainly figure out
    how to extend them to this new range/domain.

    This happens to add a vectorized float -> half for ARMv7, which was
    missing from the _01 version.  (The SSE strategy was not portable to
    platforms that flush denorm floats to zero.)

    I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot

Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Review-Url: https://codereview.chromium.org/2145663003
2016-07-15 07:00:11 -07:00
mtklein
64bbad360f Revert of Expand _01 half<->float limitation to _finite. Simplify. (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ )
Reason for revert:
Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast

Original issue's description:
> Expand _01 half<->float limitation to _finite.  Simplify.
>
>     It's become clear we need to sometimes deal with values <0 or >1.
>     I'm not yet convinced we care about NaN or +-inf.
>
>     We had some fairly clever tricks and optimizations here for NEON
>     and SSE.  I've thrown them out in favor of a single implementation.
>     If we find the specializations mattered, we can certainly figure out
>     how to extend them to this new range/domain.
>
>     This happens to add a vectorized float -> half for ARMv7, which was
>     missing from the _01 version.  (The SSE strategy was not portable to
>     platforms that flush denorm floats to zero.)
>
>     I've tested the full float range for FloatToHalf on my desktop and a 5x.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90

TBR=msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2151023003
2016-07-14 12:03:04 -07:00
mtklein
3296bee70d Expand _01 half<->float limitation to _finite. Simplify.
It's become clear we need to sometimes deal with values <0 or >1.
    I'm not yet convinced we care about NaN or +-inf.

    We had some fairly clever tricks and optimizations here for NEON
    and SSE.  I've thrown them out in favor of a single implementation.
    If we find the specializations mattered, we can certainly figure out
    how to extend them to this new range/domain.

    This happens to add a vectorized float -> half for ARMv7, which was
    missing from the _01 version.  (The SSE strategy was not portable to
    platforms that flush denorm floats to zero.)

    I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2145663003
2016-07-14 11:02:09 -07:00
mtklein
05c73b7ed5 Remove bulk float <-> half routines. These are dead code.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2152583002

Review-Url: https://codereview.chromium.org/2152583002
2016-07-13 13:30:49 -07:00
brianosman
e074d1fa6a Change SkColor4f to RGBA channel order
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2093763003

Review-Url: https://codereview.chromium.org/2093763003
2016-06-24 06:31:47 -07:00
robertphillips
c5035e70cc Add SkSpecialImage::extractSubset & NewFromPixmap
This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface)

This now relies on: https://codereview.chromium.org/1813483002/ (ImagePixelLocker now manually allocates SkPixmap) to clean up the uses of SkAutoPixmapStorage in Chromium

GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002

Committed: https://skia.googlesource.com/skia/+/250581493a0859987e482810879e85e5ac2dc002

Review URL: https://codereview.chromium.org/1787883002
2016-03-17 06:58:39 -07:00
robertphillips
19dea94f1d Revert of Add SkSpecialImage::extractSubset & NewFromPixmap (patchset #5 id:80001 of https://codereview.chromium.org/1787883002/ )
Reason for revert:
Need to wean ImagePixelLocker.h off of SkAutoPixmapStorage :(

Original issue's description:
> Add SkSpecialImage::extractSubset & NewFromPixmap
>
> This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface)
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002
>
> Committed: https://skia.googlesource.com/skia/+/250581493a0859987e482810879e85e5ac2dc002

TBR=bsalomon@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Review URL: https://codereview.chromium.org/1808833002
2016-03-16 10:39:09 -07:00
robertphillips
250581493a Add SkSpecialImage::extractSubset & NewFromPixmap
This is calved off of: https://codereview.chromium.org/1785643003/ (Switch SkBlurImageFilter over to new onFilterImage interface)

GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1787883002

Review URL: https://codereview.chromium.org/1787883002
2016-03-16 09:47:08 -07:00
reed
dd9ffea9ce make SkPM4f private
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1713653002

Review URL: https://codereview.chromium.org/1713653002
2016-02-18 12:39:14 -08:00
mtklein
ddb64c81fb new version of SkHalfToFloat_01
This is a little faster than the previous version, and much better explained.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1688233002

Review URL: https://codereview.chromium.org/1688233002
2016-02-11 12:48:23 -08:00
mtklein
fff055cc5f SkHalfToFloat_01 / SkFloatToHalf_01
These are basically inlined, 4-at-a-time versions of our existing functions,
but cut down to avoid any work that's only necessary outside [0,1].

Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.

In exchange for a little speed, f32->f16 does not round properly.
Instead it truncates, so it's never off by more than 1 bit.

Support for finite values >1 or <0 is straightforward to add back.
>1 might already work as-is.

Getting close to _u16 performance:
    micros   	bench
    261.13  	xferu64_bw_1_opaque_u16
   1833.51  	xferu64_bw_1_alpha_u16
   2762.32 ?	xferu64_aa_1_opaque_u16
   3334.29  	xferu64_aa_1_alpha_u16
    249.78  	xferu64_bw_1_opaque_f16
   3383.18  	xferu64_bw_1_alpha_f16
   4214.72  	xferu64_aa_1_opaque_f16
   4701.19  	xferu64_aa_1_alpha_f16

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005

Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9

CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1685133005
2016-02-11 06:30:03 -08:00
mtklein
cbefc5e4ca Revert of SkHalfToFloat_01 / SkFloatToHalf_01 (patchset #11 id:200001 of https://codereview.chromium.org/1685133005/ )
Reason for revert:
Gotta fix Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD

Original issue's description:
> SkHalfToFloat_01 / SkFloatToHalf_01
>
> These are basically inlined, 4-at-a-time versions of our existing functions,
> but cut down to avoid any work that's only necessary outside [0,1].
>
> Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.
>
> In exchange for a little speed, f32->f16 does not round properly.
> Instead it truncates, so it's never off by more than 1 bit.
>
> Support for finite values >1 or <0 is straightforward to add back.
> >1 might already work as-is.
>
> Getting close to _u16 performance:
>     micros   	bench
>     261.13  	xferu64_bw_1_opaque_u16
>    1833.51  	xferu64_bw_1_alpha_u16
>    2762.32 ?	xferu64_aa_1_opaque_u16
>    3334.29  	xferu64_aa_1_alpha_u16
>     249.78  	xferu64_bw_1_opaque_f16
>    3383.18  	xferu64_bw_1_alpha_f16
>    4214.72  	xferu64_aa_1_opaque_f16
>    4701.19  	xferu64_aa_1_alpha_f16
>
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005
>
> Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9

TBR=jvanverth@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1693443003
2016-02-11 06:00:49 -08:00
mtklein
9ea11a4235 SkHalfToFloat_01 / SkFloatToHalf_01
These are basically inlined, 4-at-a-time versions of our existing functions,
but cut down to avoid any work that's only necessary outside [0,1].

Both f16 and f32 denorms should work fine modulo the usual ARMv7 NEON denorm==zero caveat.

In exchange for a little speed, f32->f16 does not round properly.
Instead it truncates, so it's never off by more than 1 bit.

Support for finite values >1 or <0 is straightforward to add back.
>1 might already work as-is.

Getting close to _u16 performance:
    micros   	bench
    261.13  	xferu64_bw_1_opaque_u16
   1833.51  	xferu64_bw_1_alpha_u16
   2762.32 ?	xferu64_aa_1_opaque_u16
   3334.29  	xferu64_aa_1_alpha_u16
    249.78  	xferu64_bw_1_opaque_f16
   3383.18  	xferu64_bw_1_alpha_f16
   4214.72  	xferu64_aa_1_opaque_f16
   4701.19  	xferu64_aa_1_alpha_f16

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005

Review URL: https://codereview.chromium.org/1685133005
2016-02-11 05:56:08 -08:00
mtklein
a525cb151b skeleton for float <-> half optimized procs
Nothing fancy yet, just calls the serial code in a loop.

I will try to folow this up with at least some of:
   - SSE2 version of serial code
   - NEON version of serial code
   - NEON version using vcvt.f32.f16/vcvt.f16.f32
   - F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph
The last two are fastest but need runtime detection.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003

Review URL: https://codereview.chromium.org/1686543003
2016-02-09 08:18:10 -08:00
reed
3601f280dc add kRGBA_F16_SkColorType
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1666343002

Review URL: https://codereview.chromium.org/1666343002
2016-02-05 11:18:39 -08:00