Commit Graph

19 Commits

Author SHA1 Message Date
mtklein
e9a3e3c17a Convert SkPMFloat to [0,1] range and prune its API.
Now that Sk4px exists, there's a lot less sense in eeking out every
cycle of speed from SkPMFloat: if we need to go _really_ fast, we
should use Sk4px.  SkPMFloat's going to be used for things that are
already slow: large-range intermediates, divides, sqrts, etc.

A [0,1] range is easier to work with, and can even be faster if we
eliminate enough *255 and *1/255 steps.  This is particularly true
on ARM, where NEON can do the *255 and /255 steps for us while
converting float<->int.

We have lots of experimental SkPMFloat <-> SkPMColor APIs that
I'm now removing.  Of the existing APIs, roundClamp() is the sanest,
so I've kept only that, now called round().  The 4-at-a-time APIs
never panned out, so they're gone.

There will be small diffs on:
colormatrix coloremoji colorfilterimagefilter fadefilter imagefilters_xfermodes imagefilterscropexpand imagefiltersgraph tileimagefilter

BUG=skia:

Review URL: https://codereview.chromium.org/1201343004
2015-06-25 08:56:28 -07:00
mtklein
3d626834b4 New names for SkPMFloat methods.
BUG=skia:

Review URL: https://codereview.chromium.org/1055123002
2015-04-03 07:05:20 -07:00
mtklein
0340df5b36 back to Sk4f for SkPMColor
#floats

BUG=skia:
BUG=skia:3592

Review URL: https://codereview.chromium.org/1047823002
2015-03-31 08:17:00 -07:00
mtklein
c9adb05b64 Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T>
The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N.  Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster).  Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc.

This also makes implementing new specializations easier and more encapsulated.  We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h  and SkNx_neon.h.

This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h.

To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet.  I will happily add them back if they seem useful.

You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc.  Here's how you should feel:
  - Sk4f, Sk4s, Sk2d: feel awesome
  - Sk2f, Sk2s, Sk4d: feel pretty good

No public API changes.
TBR=reed@google.com

BUG=skia:3592

Review URL: https://codereview.chromium.org/1048593002
2015-03-30 10:50:27 -07:00
mtklein
3d4c4a5a9f SkPMFloat::trunc()
Add and test trunc(), which is what get() used to be before rounding.
Using trunc() is a ~40% speedup on our linear gradient bench.

#neon #floats
BUG=skia:3592
#n5
#n9
CQ_INCLUDE_TRYBOTS=client.skia.android:Test-Android-Nexus5-Adreno330-Arm7-Debug-Trybot;client.skia.android:Test-Android-Nexus9-TegraK1-Arm64-Release-Trybot

Review URL: https://codereview.chromium.org/1032243002
2015-03-26 12:32:29 -07:00
mtklein
15391ee4ac Update 4-at-a-time APIs.
There is no reason to require the 4 SkPMFloats (registers) to be adjacent.
The only potential win in loads and stores comes from the SkPMColors being adjacent.

Makes no difference to existing bench.

BUG=skia:

Review URL: https://codereview.chromium.org/1035583002
2015-03-25 13:43:34 -07:00
mtklein
4b65059e6e Go back to storeAligned / LoadAligned for SkPMFloat <->Sk4f.
This seems to fix the miscompilation bug on ARM64 / Release / GCC 4.9.

We switched this over originally for perf issues with NEON, but I can't see any now.  Will keep an eye out.

BUG=skia:3570

Review URL: https://codereview.chromium.org/1026403002
2015-03-24 09:47:12 -07:00
mtklein
92d04da38f Replace _mm_cvtps_epi32(x) with _mm_cvttps_epi32(_mm_add_ps(0.5f), x).
We don't have control over which way _mm_cvtps_epi32 rounds.

  - This makes the SSE SkPMFloat rounding consistent with _neon and _none.
  - Sk4f::cast<Sk4i>() is closer to (int)float's behavior.  (Correct when >=0).

Add tests that would fail at head.

BUG=skia:

Review URL: https://codereview.chromium.org/1029163002
2015-03-23 12:01:46 -07:00
mtklein
bf0c56f82b Hack around skia:3570 for now.
BUG=skia:3570

Review URL: https://codereview.chromium.org/1021353002
2015-03-20 12:21:53 -07:00
mtklein
26bf90e5d6 operator overloads for Sk4x, use them all where possible
BUG=skia:
NOTRY=true

Review URL: https://codereview.chromium.org/1024633003
2015-03-20 06:00:57 -07:00
mtklein
a27cdefae1 Make Sk4f(float) constructor explicit.
BUG=skia:

Review URL: https://codereview.chromium.org/985003003
2015-03-06 16:20:22 -08:00
mtklein
548bf38b28 4-at-a-time SkPMColor -> SkPMFloat API.
Please see if this looks usable.  It may even give a perf boost if you use it, even without custom implementations for each instruction set.

I've been trying this morning to beat this naive loop implementation, but so far no luck with either _SSE2.h or _SSSE3.h.  It's possible this is an artifact of the microbenchmark, because we're not doing anything between the conversions.  I'd like to see how this fits into real code, what assembly's generated, what the hot spots are, etc.

I've updated the tests to test these new APIs, and splintered off a pair of new benchmarks that use the new APIs.  This required some minor rejiggering in the benches.

BUG=skia:

Review URL: https://codereview.chromium.org/978213003
2015-03-05 11:31:59 -08:00
mtklein
4e644f5d50 Update SkPMFloat API a bit.
Instead of set(SkPMColor), add a constructor SkPMFloat(SkPMColor).
Replace setA(), setR(), etc. with a 4 float constructor.

And, promise to stick to SkPMColor order.

BUG=skia:

Review URL: https://codereview.chromium.org/977773002
2015-03-04 11:25:27 -08:00
mtklein
0aebf5d0d3 Test and fix SkPMFloat rounding.
SSE rounds for free (that was a happy accident: they also have a truncating version).
NEON does not, nor obviously the portable code, so they add 0.5 before truncating.

NOPRESUBMIT=true

BUG=skia:

Review URL: https://codereview.chromium.org/974643002
2015-03-03 08:57:07 -08:00
mtklein
60d2a32b2d Make SkPMFloats store floats in [0,255] instead of [0,1].
This pushes the cost of the *255 and *1/255 conversions onto only those code
paths that need it.  We're not doing it any more efficiently than can be done
with Sk4f.

In microbenchmark isolation, this is about a 15% speedup.

BUG=skia:
NOPRESUBMIT=true

Review URL: https://codereview.chromium.org/973603002
2015-03-03 07:46:15 -08:00
mtklein
870b9ea386 add auto SkPMFloat <-> Sk4f conversion
BUG=skia:

Review URL: https://codereview.chromium.org/954323002
2015-02-26 10:43:16 -08:00
mtklein
a2f4be76a9 Sketch SkPMFloat
BUG=skia:

Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon-Trybot

http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio

Review URL: https://codereview.chromium.org/936633002
2015-02-23 10:04:34 -08:00
mtklein
088302756b Revert of Sketch SkPMFloat (patchset #15 id:270001 of https://codereview.chromium.org/936633002/)
Reason for revert:
http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio

Original issue's description:
> Sketch SkPMFloat
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f

TBR=reed@google.com,msarrett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/952453004
2015-02-23 09:44:34 -08:00
mtklein
50d2b3114b Sketch SkPMFloat
BUG=skia:

Review URL: https://codereview.chromium.org/936633002
2015-02-23 09:39:27 -08:00