skia2

Author	SHA1	Message	Date
mtklein	e9a3e3c17a	Convert SkPMFloat to [0,1] range and prune its API. Now that Sk4px exists, there's a lot less sense in eking out every cycle of speed from SkPMFloat: if we need to go _really_ fast, we should use Sk4px. SkPMFloat's going to be used for things that are already slow: large-range intermediates, divides, sqrts, etc. A [0,1] range is easier to work with, and can even be faster if we eliminate enough 255 and 1/255 steps. This is particularly true on ARM, where NEON can do the *255 and /255 steps for us while converting float<->int. We have lots of experimental SkPMFloat <-> SkPMColor APIs that I'm now removing. Of the existing APIs, roundClamp() is the sanest, so I've kept only that, now called round(). The 4-at-a-time APIs never panned out, so they're gone. There will be small diffs on: colormatrix coloremoji colorfilterimagefilter fadefilter imagefilters_xfermodes imagefilterscropexpand imagefiltersgraph tileimagefilter BUG=skia: Review URL: https://codereview.chromium.org/1201343004	2015-06-25 08:56:28 -07:00
mtklein	3d626834b4	New names for SkPMFloat methods. BUG=skia: Review URL: https://codereview.chromium.org/1055123002	2015-04-03 07:05:20 -07:00
mtklein	0340df5b36	back to Sk4f for SkPMColor #floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1047823002	2015-03-31 08:17:00 -07:00
mtklein	c9adb05b64	Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T> The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc. This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h. This design leaves us room to grow up, e.g. to SkNf<8, SkScalar> == Sk8s, and to grow down too, to things like SkNi<8, uint16_t> == Sk8h. To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful. You shouldn't feel bad about using any of the typedefs Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel: - Sk4f, Sk4s, Sk2d: feel awesome - Sk2f, Sk2s, Sk4d: feel pretty good No public API changes. TBR=reed@google.com BUG=skia:3592 Review URL: https://codereview.chromium.org/1048593002	2015-03-30 10:50:27 -07:00
mtklein	3d4c4a5a9f	SkPMFloat::trunc() Add and test trunc(), which is what get() used to be before rounding. Using trunc() is a ~40% speedup on our linear gradient bench. #neon #floats BUG=skia:3592 #n5 #n9 CQ_INCLUDE_TRYBOTS=client.skia.android:Test-Android-Nexus5-Adreno330-Arm7-Debug-Trybot;client.skia.android:Test-Android-Nexus9-TegraK1-Arm64-Release-Trybot Review URL: https://codereview.chromium.org/1032243002	2015-03-26 12:32:29 -07:00
mtklein	15391ee4ac	Update 4-at-a-time APIs. There is no reason to require the 4 SkPMFloats (registers) to be adjacent. The only potential win in loads and stores comes from the SkPMColors being adjacent. Makes no difference to existing bench. BUG=skia: Review URL: https://codereview.chromium.org/1035583002	2015-03-25 13:43:34 -07:00
mtklein	4b65059e6e	Go back to storeAligned / LoadAligned for SkPMFloat <->Sk4f. This seems to fix the miscompilation bug on ARM64 / Release / GCC 4.9. We switched this over originally for perf issues with NEON, but I can't see any now. Will keep an eye out. BUG=skia:3570 Review URL: https://codereview.chromium.org/1026403002	2015-03-24 09:47:12 -07:00
mtklein	92d04da38f	Replace _mm_cvtps_epi32(x) with _mm_cvttps_epi32(_mm_add_ps(0.5f), x). We don't have control over which way _mm_cvtps_epi32 rounds. - This makes the SSE SkPMFloat rounding consistent with _neon and _none. - Sk4f::cast<Sk4i>() is closer to (int)float's behavior. (Correct when >=0). Add tests that would fail at head. BUG=skia: Review URL: https://codereview.chromium.org/1029163002	2015-03-23 12:01:46 -07:00
mtklein	bf0c56f82b	Hack around skia:3570 for now. BUG=skia:3570 Review URL: https://codereview.chromium.org/1021353002	2015-03-20 12:21:53 -07:00
mtklein	26bf90e5d6	operator overloads for Sk4x, use them all where possible BUG=skia: NOTRY=true Review URL: https://codereview.chromium.org/1024633003	2015-03-20 06:00:57 -07:00
mtklein	a27cdefae1	Make Sk4f(float) constructor explicit. BUG=skia: Review URL: https://codereview.chromium.org/985003003	2015-03-06 16:20:22 -08:00
mtklein	548bf38b28	4-at-a-time SkPMColor -> SkPMFloat API. Please see if this looks usable. It may even give a perf boost if you use it, even without custom implementations for each instruction set. I've been trying this morning to beat this naive loop implementation, but so far no luck with either _SSE2.h or _SSSE3.h. It's possible this is an artifact of the microbenchmark, because we're not doing anything between the conversions. I'd like to see how this fits into real code, what assembly's generated, what the hot spots are, etc. I've updated the tests to test these new APIs, and splintered off a pair of new benchmarks that use the new APIs. This required some minor rejiggering in the benches. BUG=skia: Review URL: https://codereview.chromium.org/978213003	2015-03-05 11:31:59 -08:00
mtklein	4e644f5d50	Update SkPMFloat API a bit. Instead of set(SkPMColor), add a constructor SkPMFloat(SkPMColor). Replace setA(), setR(), etc. with a 4 float constructor. And, promise to stick to SkPMColor order. BUG=skia: Review URL: https://codereview.chromium.org/977773002	2015-03-04 11:25:27 -08:00
mtklein	0aebf5d0d3	Test and fix SkPMFloat rounding. SSE rounds for free (that was a happy accident: they also have a truncating version). NEON does not, nor obviously the portable code, so they add 0.5 before truncating. NOPRESUBMIT=true BUG=skia: Review URL: https://codereview.chromium.org/974643002	2015-03-03 08:57:07 -08:00
mtklein	60d2a32b2d	Make SkPMFloats store floats in [0,255] instead of [0,1]. This pushes the cost of the 255 and 1/255 conversions onto only those code paths that need it. We're not doing it any more efficiently than can be done with Sk4f. In microbenchmark isolation, this is about a 15% speedup. BUG=skia: NOPRESUBMIT=true Review URL: https://codereview.chromium.org/973603002	2015-03-03 07:46:15 -08:00
mtklein	870b9ea386	add auto SkPMFloat <-> Sk4f conversion BUG=skia: Review URL: https://codereview.chromium.org/954323002	2015-02-26 10:43:16 -08:00
mtklein	a2f4be76a9	Sketch SkPMFloat BUG=skia: Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon-Trybot http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio Review URL: https://codereview.chromium.org/936633002	2015-02-23 10:04:34 -08:00
mtklein	088302756b	Revert of Sketch SkPMFloat (patchset #15 id:270001 of https://codereview.chromium.org/936633002/) Reason for revert: http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu13.10-GCC4.8-Arm7-Release-Android_Neon/builds/2120/steps/build%20most/logs/stdio Original issue's description: > Sketch SkPMFloat > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/50d2b3114b3e59dc84811881591bf25b2c1ecb9f TBR=reed@google.com,msarrett@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/952453004	2015-02-23 09:44:34 -08:00
mtklein	50d2b3114b	Sketch SkPMFloat BUG=skia: Review URL: https://codereview.chromium.org/936633002	2015-02-23 09:39:27 -08:00

19 Commits
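
Several of the messages above turn on two details: whether a color component is stored as a float in [0,1] or [0,255], and whether the float-to-byte conversion truncates or rounds by adding 0.5 before truncating. The following is a minimal scalar sketch of that arithmetic only. It is not Skia's SkPMFloat code, and the function names are hypothetical, chosen for illustration. It assumes non-negative, non-NaN inputs, since adding 0.5 and truncating only rounds correctly for values >= 0.

```cpp
#include <cstdint>

// Hypothetical helper (not a Skia API): clamp a scaled component to [0,255].
inline float clamp_to_byte_range(float v) {
    if (v < 0.0f)   return 0.0f;
    if (v > 255.0f) return 255.0f;
    return v;
}

// Byte -> float in [0,1]. This is the 1/255 step the messages refer to.
inline float byte_to_unit_float(uint8_t b) {
    return static_cast<float>(b) * (1.0f / 255.0f);
}

// Float in [0,1] -> byte, truncating toward zero (the cheaper variant).
inline uint8_t unit_float_to_byte_trunc(float x) {
    float scaled = clamp_to_byte_range(x * 255.0f);
    return static_cast<uint8_t>(static_cast<int>(scaled));
}

// Float in [0,1] -> byte, rounding by adding 0.5 and then truncating.
// Correct only for non-negative inputs; NaN is not handled.
inline uint8_t unit_float_to_byte_round(float x) {
    float scaled = clamp_to_byte_range(x * 255.0f + 0.5f);
    return static_cast<uint8_t>(static_cast<int>(scaled));
}
```

With these definitions, an input of 0.5f becomes 127 when truncated and 128 when rounded, and every byte value survives a round trip through `byte_to_unit_float` and `unit_float_to_byte_round`. The truncating variant does not guarantee that round trip, which is one reason a rounding conversion can be the safer default where exact byte values matter.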