skia2

Author	SHA1	Message	Date
mtklein	4be181e304	3-15% speedup to HardLight / Overlay xfermodes. While investigating my bug (skia:4052) I saw this TODO and figured it'd make me feel better about an otherwise unsuccessful investigation. This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls to 2 cheap comparisons and 1 expensive div255(). NEON speeds up by a more modest ~3%. BUG=skia: Review URL: https://codereview.chromium.org/1230663005	2015-07-14 10:54:19 -07:00
mtklein	e20633ed26	Add a GM that reproduces layout test failures with my new xfermode code. Inspired by https://storage.googleapis.com/chromium-layout-test-archives/linux_blink_rel/69169/layout-test-results/results.html I think the root cause is overflow. Also, adds tests for Sk16b::operator<(). It wasn't wrong, but it was suspect (used in all three of these xfermode implementations) and so it's best to have tests. BUG=skia: Review URL: https://codereview.chromium.org/1228393006	2015-07-13 12:06:33 -07:00
mtklein	059ac00446	Update some Sk4px APIs. Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones. - SkAlpha / SkPMColor constructors become static factories. - Remove div255TruncNarrow(), rename div255RoundNarrow() to div255(). In practice we always want to round, and the narrowing to 8-bit is contextually obvious. - Rename fastMulDiv255Round() approxMulDiv255() to stress it's approximate-ness over its speed. Drop Round for the same reason as above... we should always round. - Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts. - use operator*() for 8-bit x 8-bit -> 16-bit math. It's always what we want, and there's generally no 8x8->8 alternative. - MapFoo can take a const Func&. Don't think it makes a big difference, but nice to do. BUG=skia: Review URL: https://codereview.chromium.org/1202013002	2015-06-22 10:39:38 -07:00
mtklein	3747875341	Thorough tests for saturatedAdd and mulDiv255Round. BUG=skia:3951 Committed: https://skia.googlesource.com/skia/+/ce9d11189a5924b47c3629063b72bae9d466c2c7 CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot Review URL: https://codereview.chromium.org/1184113003	2015-06-15 10:58:43 -07:00
mtklein	b1bbca8049	Revert of Thorough tests for saturatedAdd and mulDiv255Round. (patchset #1 id:1 of https://codereview.chromium.org/1184113003/) Reason for revert: https://code.google.com/p/skia/issues/detail?id=3951 Original issue's description: > Thorough tests for saturatedAdd and mulDiv255Round. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/ce9d11189a5924b47c3629063b72bae9d466c2c7 TBR=fmalita@chromium.org,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1177123004	2015-06-15 10:03:41 -07:00
mtklein	ce9d11189a	Thorough tests for saturatedAdd and mulDiv255Round. BUG=skia: Review URL: https://codereview.chromium.org/1184113003	2015-06-15 08:58:22 -07:00
mtklein	f2fe0e0320	Remove overly-promiscuous SkNx syntax sugar. I haven't figured out a pithy way to have these apply to only classes originating from SkNx, so let's just remove them. There aren't too many use cases, and it's not really any less readable without them. Semantically, this is a no-op. BUG=skia: Review URL: https://codereview.chromium.org/1167153002	2015-06-10 08:57:28 -07:00
mtklein	27e517ae53	add Min to SkNi, specialized for u8 and u16 on SSE and NEON 0x8001 / 0x7fff don't seem to work, but we were close: 0x8000 does. I plan to use this to implement the Difference xfermode, and it seems generally handy. BUG=skia: Review URL: https://codereview.chromium.org/1133933004	2015-05-14 17:53:04 -07:00
mtklein	d7c014ff03	Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. BUG=skia: Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297 CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot Review URL: https://codereview.chromium.org/1109913002	2015-04-27 14:22:32 -07:00
mtklein	9a22f489e8	Revert of Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM (patchset #2 id:20001 of https://codereview.chromium.org/1109913002/) Reason for revert: arm64 typos Original issue's description: > Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM > > This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. > > BUG=skia: > > Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297 TBR=reed@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1105233003	2015-04-27 13:55:53 -07:00
mtklein	9de16283fd	Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM This is a logical no-op. Everything was using the equivalent of rsqrt1() before, and is now after. BUG=skia: Review URL: https://codereview.chromium.org/1109913002	2015-04-27 13:51:28 -07:00
mtklein	1113da72ec	Mike's radial gradient CL with better float -> int. patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) This looks quite launchable. radial_gradient3, min of 100 samples: N5: 985µs -> 946µs MBP: 395µs -> 279µs On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? BUG=skia: CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot,Build-Ubuntu-GCC-Arm7-Release-Android_NoNeon-Trybot Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec Review URL: https://codereview.chromium.org/1109643002	2015-04-27 12:08:01 -07:00
mtklein	8d3e9dff3f	Revert of Mike's radial gradient CL with better float -> int. (patchset #7 id:120001 of https://codereview.chromium.org/1109643002/) Reason for revert: compile failures. Original issue's description: > Mike's radial gradient CL with better float -> int. > > patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) > > This looks quite launchable. radial_gradient3, min of 100 samples: > N5: 985µs -> 946µs > MBP: 395µs -> 279µs > > On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? > > BUG=skia: > > CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot > > Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec TBR=reed@google.com,robertphillips@google.com,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=skia: Review URL: https://codereview.chromium.org/1109883003	2015-04-27 11:21:16 -07:00
mtklein	abf6c5cf95	Mike's radial gradient CL with better float -> int. patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001) This looks quite launchable. radial_gradient3, min of 100 samples: N5: 985µs -> 946µs MBP: 395µs -> 279µs On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time. Is that something we could do in float math rather than with a lookup table? BUG=skia: CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot Review URL: https://codereview.chromium.org/1109643002	2015-04-27 11:13:53 -07:00
mtklein	115acee938	Sk4h and Sk8h for SSE These will underly the SkPMFloat-like class for uint16_t components. Sk4h will back a single-pixel version, and Sk8h any larger number than that. BUG=skia: Review URL: https://codereview.chromium.org/1088883005	2015-04-14 14:02:52 -07:00
mtklein	a156a8ffbe	Use switch operator[](int) to kth<int>() so we can use vget_lane. #floats BUG=skia: BUG=skia:3592 Review URL: https://codereview.chromium.org/1059743002	2015-04-03 06:16:13 -07:00
mtklein	c9adb05b64	Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T> The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc. This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h. This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h. To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful. You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel: - Sk4f, Sk4s, Sk2d: feel awesome - Sk2f, Sk2s, Sk4d: feel pretty good No public API changes. TBR=reed@google.com BUG=skia:3592 Review URL: https://codereview.chromium.org/1048593002	2015-03-30 10:50:27 -07:00

17 Commits