skia2

Author	SHA1	Message	Date
Michael Ludwig	767586b330	Update Sk4px to use skvx instead of SkNx Adds a saturated_add function that was on SkNx and used in SkXfermode_opts, but hadn't been ported to skvx yet. Removes the Sk4px_opts variants and simplifies some of its functions; many were already defined in skvx. The largest change is that Sk4px, which used to extend Sk16b, does not extend skvx::byte16. Now it just holds a vector as a data member. This was necessary so that we could define operators that were typed for Sk4px and Wide without conflicting with the free operators that were defined for the base skvx types. Change-Id: I8c667ba86f662ccf07ad85aa32e78abfc0a8c7ae Reviewed-on: https://skia-review.googlesource.com/c/skia/+/542645 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Michael Ludwig <michaelludwig@google.com>	2022-05-23 17:41:53 +00:00
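For readers unfamiliar with the operation, here is a minimal scalar sketch of what a lane-wise `saturated_add` on 16 bytes computes. The names `Byte16Sketch` and `saturated_add_sketch` are hypothetical, and this is not the skvx implementation; a SIMD version would typically lower to an instruction such as SSE2's `_mm_adds_epu8` or NEON's `vqaddq_u8`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 16-lane byte vector (Sk4px packs
// 4 pixels x 4 channels = 16 bytes). Not skvx's actual type.
using Byte16Sketch = std::array<std::uint8_t, 16>;

// Lane-wise unsigned saturating add: each lane clamps at 255 instead of
// wrapping modulo 256 as a plain uint8_t addition would.
Byte16Sketch saturated_add_sketch(const Byte16Sketch& a, const Byte16Sketch& b) {
    Byte16Sketch out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        unsigned sum = unsigned(a[i]) + unsigned(b[i]);  // widen so 200 + 100 does not wrap
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
    return out;
}
```

In a blend mode this is the difference between a bright source over a bright destination saturating to white and wrapping around to a dark value.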
Michael Ludwig	8e870728db	Update filters to use skvx instead of SkNx Change-Id: I1a5490f546a3cb046c64b114a30be991d2d9f2cc Reviewed-on: https://skia-review.googlesource.com/c/skia/+/541064 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Michael Ludwig <michaelludwig@google.com>	2022-05-23 14:38:11 +00:00
Michael Ludwig	4621ef2a8a	Improve skvx::any() and all() intrinsics Removes specializations for all() on AVX2 and SSE 4.1, which gave the wrong results if the ints didn't have all bits set (inconsistent with other platforms and non-SIMD). Added a unit test that checks this case. The mirror specializations for any() on AVX2 and SSE 4.1 are actually valid, so added those, and added a 2-instruction specialization for SSE for any() and all(). This is what clang-trunk produces on -O3, but ToT clang struggles to vectorize it. Also adds specializations for NEON for any() and all(), since even clang-trunk was struggling to vectorize it automatically. In particular, this will help skgpu::graphite::Rect's implementations of intersect and contains, which use any/all to get a final boolean value. In the Instruments app, I had seen Rect's intersection as a hotspot on the Mac M1, and this vectorization helps a bit. Also takes the opportunity to replace fake C++14 constexpr with a real constexpr. Change-Id: Ib142e305ae5615056a777424e379b6da82d44f0c Reviewed-on: https://skia-review.googlesource.com/c/skia/+/542296 Commit-Queue: Michael Ludwig <michaelludwig@google.com> Reviewed-by: Herb Derby <herb@google.com>	2022-05-20 00:50:59 +00:00
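The bug described above is about what counts as a "true" lane. Here is a minimal scalar sketch of the portable semantics, assuming (consistent with the non-SIMD path the commit mentions) that a lane is true whenever it is nonzero. `any_sketch` and `all_sketch` are hypothetical names, not the skvx API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reference semantics: a lane is "true" when it is nonzero,
// whether or not every bit is set. A shortcut that only tests for all-ones
// bits (e.g. comparing against ~0) would wrongly report all({1, 1, 1, 1})
// as false.
template <std::size_t N>
bool any_sketch(const std::array<std::int32_t, N>& v) {
    for (std::int32_t lane : v) {
        if (lane != 0) { return true; }
    }
    return false;
}

template <std::size_t N>
bool all_sketch(const std::array<std::int32_t, N>& v) {
    for (std::int32_t lane : v) {
        if (lane == 0) { return false; }
    }
    return true;
}
```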
Michael Ludwig	9b59fe655c	Convert color data to skvx::float4 from Sk4f Change-Id: I511f6105537b24953de1533ad7b73d1186afd4fc Reviewed-on: https://skia-review.googlesource.com/c/skia/+/541060 Commit-Queue: Michael Ludwig <michaelludwig@google.com> Reviewed-by: Brian Osman <brianosman@google.com>	2022-05-19 19:45:23 +00:00
Michael Ludwig	11c0ca2833	[graphite] Use scalar constructor in Rect::Infinite Change-Id: I38d16569a9270ac359d3c3ba4eb1045e805f7638 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/541419 Commit-Queue: Michael Ludwig <michaelludwig@google.com> Reviewed-by: Robert Phillips <robertphillips@google.com> Reviewed-by: Jorge Betancourt <jmbetancourt@google.com>	2022-05-17 20:03:24 +00:00
Michael Ludwig	5c08e3c357	Standardize on skvx aliases, plus clean-up This adds aliases like skvx::float2, float4, etc. to SkVx.h and goes through existing usages of SkVx to standardize on those aliases, or refer to the full name directly. In particular, this lets us clean up the equivalent aliases in src/gpu/tessellate, src/gpu/graphite/VectorTypes and src/gpu/ganesh/GrVx Where possible, I switched to using skvx::Foo directly and leveraged auto to make it less redundant. Headers always used the full type except for PatchWriter.h and Rect.h because of the number of their usages. In this case, the alias is scoped to private so it can't leak. This is prep to migrate older code that is still using SkNx and its aliases like Sk4f to SkVx as well. Change-Id: I9dd104e83cf17c2b88995a047cfd2e2b0fe6fac2 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/541058 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Michael Ludwig <michaelludwig@google.com>	2022-05-17 18:04:55 +00:00
Brian Osman	503f2b7f71	Remove unnecessary semi-colons Change-Id: I7a9d2b78865a4207be3ab1c1f613e9e829414f5d Reviewed-on: https://skia-review.googlesource.com/c/skia/+/507921 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Herb Derby <herb@google.com>	2022-02-11 22:13:11 +00:00
Ben Wagner	b042508dd5	Respect SKNX_NO_SIMD fully in SkVx. SkVx.h attempted to not use vector instructions if SKNX_NO_SIMD was set. However, this was incomplete and could lead to a wasm build with SKNX_NO_SIMD still trying to use some vector instructions which are not present if __wasm_simd128__ is defined. This change requires some additional "include what you use" includes since some other files were depending on SkVx.h including the vector instruction headers. Bug: cl/421848579 Change-Id: I6a878d64b76677a925b94724926c62f3e42ddd4c Reviewed-on: https://skia-review.googlesource.com/c/skia/+/496313 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Ben Wagner <bungeman@google.com>	2022-01-19 18:33:46 +00:00
Herb Derby	73065f325f	Reland "add a scaled uint32x4_t divided by uint32_t to SkVx" This is a reland of `35a74eab5d` Added guard for SKNX_NO_SIMD. I guess they don't want speedy goodness. Original change's description: > add a scaled uint32x4_t divided by uint32_t to SkVx > > This extracts the divide used in SkImageBlurFilter.cpp, and > encapsulates it into ScaledDividerU32. It generates results that > are with in +/- 1 of the rounded answer generated by doubles. > > I have added hand coded implementations for sse and for neon to > hopefully to avoid code generation problems. > > Bug: skia:12522 > > Change-Id: Ia7372d45895c799f69f8c0fd9fdea5efac321139 > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/458216 > Reviewed-by: Brian Osman <brianosman@google.com> > Commit-Queue: Herb Derby <herb@google.com> Bug: skia:12522 Change-Id: I9833a98f159827f483147c8155f1b92b7a7130ed Reviewed-on: https://skia-review.googlesource.com/c/skia/+/458716 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Herb Derby <herb@google.com>	2021-10-12 20:02:01 +00:00
Herb Derby	e4ac6eabe8	Revert "add a scaled uint32x4_t divided by uint32_t to SkVx" This reverts commit `35a74eab5d`. Reason for revert: Breaks Google3 Original change's description: > add a scaled uint32x4_t divided by uint32_t to SkVx > > This extracts the divide used in SkImageBlurFilter.cpp, and > encapsulates it into ScaledDividerU32. It generates results that > are with in +/- 1 of the rounded answer generated by doubles. > > I have added hand coded implementations for sse and for neon to > hopefully to avoid code generation problems. > > Bug: skia:12522 > > Change-Id: Ia7372d45895c799f69f8c0fd9fdea5efac321139 > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/458216 > Reviewed-by: Brian Osman <brianosman@google.com> > Commit-Queue: Herb Derby <herb@google.com> Bug: skia:12522 Change-Id: Id5d6968c813322dfc68e549e2f3afea7da9a0e18 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/c/skia/+/458258 Auto-Submit: Herb Derby <herb@google.com> Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>	2021-10-12 18:14:18 +00:00
Herb Derby	35a74eab5d	add a scaled uint32x4_t divided by uint32_t to SkVx This extracts the divide used in SkImageBlurFilter.cpp, and encapsulates it into ScaledDividerU32. It generates results that are within +/- 1 of the rounded answer generated by doubles. I have added hand-coded implementations for SSE and NEON to hopefully avoid code generation problems. Bug: skia:12522 Change-Id: Ia7372d45895c799f69f8c0fd9fdea5efac321139 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/458216 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Herb Derby <herb@google.com>	2021-10-12 15:56:03 +00:00
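The general idea behind dividing by a fixed `uint32_t` without a divide instruction can be sketched in scalar C++. You precompute a 32.32 fixed-point reciprocal once, and then each division is a multiply, an add of one half, and a shift. This is a hypothetical illustration: `ScaledDividerSketch` is not Skia's `ScaledDividerU32`, and the exact rounding Skia uses may differ. It only shows why results can land within one of the correctly rounded quotient.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar sketch: divide by a fixed, nonzero divisor using a
// precomputed 32.32 fixed-point reciprocal instead of a hardware divide.
struct ScaledDividerSketch {
    std::uint64_t factor;  // round(2^32 / divisor); equals 2^32 when divisor == 1, so keep 64 bits

    explicit ScaledDividerSketch(std::uint32_t divisor)
        : factor(static_cast<std::uint64_t>(std::llround(4294967296.0 / divisor))) {}

    // Approximates round(numerator / divisor): multiply, add 0.5 (2^31 in
    // 32.32 fixed point), then keep the integer part. The product stays
    // below 2^64 for every uint32_t numerator.
    std::uint32_t divide(std::uint32_t numerator) const {
        std::uint64_t product = numerator * factor + (std::uint64_t{1} << 31);
        return static_cast<std::uint32_t>(product >> 32);
    }
};
```

The error comes from rounding `2^32 / divisor` to an integer. Scaled by the numerator, that error is far below one unit for typical blur-sized numerators, so the quotient can be off by at most one near rounding boundaries.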
Chris Dalton	90a66821f0	Add convenient "xyzw" accessors and swizzles to skvx (take 2) Bug: skia:12515 Change-Id: I8db3501c129d93fc1eb822c90840119a7a7f2b4b Reviewed-on: https://skia-review.googlesource.com/c/skia/+/457478 Reviewed-by: Michael Ludwig <michaelludwig@google.com> Commit-Queue: Chris Dalton <csmartdalton@google.com>	2021-10-11 18:59:47 +00:00
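As a rough illustration of what "xyzw" accessors and swizzles give a caller, here is a hypothetical sketch over a plain struct. `Vec4Sketch` and `swizzle` are made up for this example; skvx's real members and shuffle function may differ in name and signature.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane vector with named lane accessors (not skvx's Vec).
template <typename T>
struct Vec4Sketch {
    T v[4];

    T& x() { return v[0]; }
    T& y() { return v[1]; }
    T& z() { return v[2]; }
    T& w() { return v[3]; }
    const T& operator[](std::size_t i) const { return v[i]; }
};

// Hypothetical compile-time swizzle: lane k of the result is src[Ix_k], so
// swizzle<2, 1, 0>(p) yields (p.z, p.y, p.x). Output width is sizeof...(Ix).
template <int... Ix, typename T>
std::array<T, sizeof...(Ix)> swizzle(const Vec4Sketch<T>& src) {
    static_assert(sizeof...(Ix) > 0, "need at least one lane");
    return {{ src[static_cast<std::size_t>(Ix)]... }};
}
```

Because the lane indices are template arguments, the selection is fixed at compile time, which is what lets a compiler turn a swizzle into a single shuffle instruction.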
Herb Derby	206c1f3f7e	use fp_contract for better code generation in approx_acos Change-Id: I96c842490ebaaae1733ee1359c46462ae1c80420 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/457896 Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Chris Dalton <csmartdalton@google.com>	2021-10-11 17:38:36 +00:00
Chris Dalton	bfd6b09dc9	Extract a "VecStorage" base class in skvx This is step 1 of a less surgical method of specializing vectors with constructors and swizzles. Bug: skia:12515 Change-Id: I4d65ad595387b35fa74df8564c73952e0a8b681c Reviewed-on: https://skia-review.googlesource.com/c/skia/+/457477 Commit-Queue: Chris Dalton <csmartdalton@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com>	2021-10-11 16:58:29 +00:00
Chris Dalton	c63e913f57	Revert "Add convenient "xyzw" accessors and swizzles to skvx" This reverts commit `01b02956c7`. Reason for revert: Codegen regressions Original change's description: > Add convenient "xyzw" accessors and swizzles to skvx > > Change-Id: Ic300285d10679a4e34190ab7b6b08bd1f6d80330 > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/454309 > Reviewed-by: Michael Ludwig <michaelludwig@google.com> > Commit-Queue: Chris Dalton <csmartdalton@google.com> Bug: skia:12515 Change-Id: Id853e4d9e25c6d2ae622668ef064e1b2b078b824 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/c/skia/+/457476 Auto-Submit: Chris Dalton <csmartdalton@google.com> Reviewed-by: Chris Dalton <csmartdalton@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com> Commit-Queue: Michael Ludwig <michaelludwig@google.com>	2021-10-08 23:55:11 +00:00
Chris Dalton	8fa6dbffff	Move approx_acos and strided loads from GrVx to SkVx Change-Id: Icf2d589b7a748f98cfa1be77217f5a21aed0a1b2 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/457187 Commit-Queue: Chris Dalton <csmartdalton@google.com> Reviewed-by: Brian Salomon <bsalomon@google.com>	2021-10-08 17:33:06 +00:00
Chris Dalton	01b02956c7	Add convenient "xyzw" accessors and swizzles to skvx Change-Id: Ic300285d10679a4e34190ab7b6b08bd1f6d80330 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/454309 Reviewed-by: Michael Ludwig <michaelludwig@google.com> Commit-Queue: Chris Dalton <csmartdalton@google.com>	2021-10-08 00:33:24 +00:00
Mike Klein	cd74dea856	macro hygiene in SkVx.h These macros are not meant to leak out of the file. Change-Id: I7e24f65a3053785410c7fac760fd3af46c5c1f1c Reviewed-on: https://skia-review.googlesource.com/c/skia/+/337739 Auto-Submit: Mike Klein <mtklein@google.com> Commit-Queue: Chris Dalton <csmartdalton@google.com> Reviewed-by: Chris Dalton <csmartdalton@google.com>	2020-11-23 22:49:33 +00:00
Chris Dalton	81b270a659	Optimize SkChopCubicAt to chop at two points at once Adds an SkChopCubicAt overload that performs two chops at once in SIMD. Also updates SkChopCubicAt to accept T values of 0 and 1. This has been the source of bugs in the past. Bug: skia:10419 Change-Id: Ic8a482a69192fb1685f3766411cbdceed830f9b7 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/327436 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Chris Dalton <csmartdalton@google.com>	2020-10-26 21:36:25 +00:00
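The geometry behind chopping a cubic at two parameters can be sketched with plain de Casteljau subdivision. This hypothetical scalar version (`Pt`, `chop_cubic_at`, and `chop_cubic_at_two` are illustrative names, not Skia's `SkChopCubicAt`) chops at `t1` first and then rescales `t0` into the left piece. The CL itself performs the two chops together in SIMD, which this sketch does not attempt. Note the explicit guard so that `t0 == t1 == 0` does not divide by zero, one of the endpoint values the commit message mentions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical point type for this sketch.
struct Pt { float x, y; };

static Pt lerp(Pt a, Pt b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t};
}

// de Casteljau: split cubic src[0..3] at t into two cubics dst[0..3] and
// dst[3..6], sharing the split point dst[3].
void chop_cubic_at(const Pt src[4], Pt dst[7], float t) {
    Pt ab = lerp(src[0], src[1], t);
    Pt bc = lerp(src[1], src[2], t);
    Pt cd = lerp(src[2], src[3], t);
    Pt abc = lerp(ab, bc, t);
    Pt bcd = lerp(bc, cd, t);
    Pt abcd = lerp(abc, bcd, t);
    dst[0] = src[0]; dst[1] = ab;  dst[2] = abc; dst[3] = abcd;
    dst[4] = bcd;    dst[5] = cd;  dst[6] = src[3];
}

// Two chops (0 <= t0 <= t1 <= 1) -> three cubics in dst[0..3], dst[3..6], dst[6..9].
void chop_cubic_at_two(const Pt src[4], Pt dst[10], float t0, float t1) {
    Pt tmp[7];
    chop_cubic_at(src, tmp, t1);                  // tmp[0..3] covers [0, t1], tmp[3..6] covers [t1, 1]
    float local = (t1 > 0.f) ? t0 / t1 : 0.f;     // t0 expressed in the left piece's parameter
    chop_cubic_at(tmp, dst, local);               // dst[0..6] covers [0, t0] and [t0, t1]
    dst[7] = tmp[4]; dst[8] = tmp[5]; dst[9] = tmp[6];
}
```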
Mike Klein	9a6efa6be2	move scalar functions out of SkVx.h Now that we have skvx::map(), anyone can write this sort of scalar-to-vector code. There are no vector instructions for these, so they're never going to be particularly interesting for SkVx to provide. We did work out _approximate_ versions of each of these for SkVM, and that's what we use to evaluate these programs there. So if this stuff really matters we could port that logic back over to SkVx.h. But in terms of pure refactoring, I think this is where we want to sit until we decide to use those approximations. I don't really want to invest much time in the SkSLByteCode interpreter any more. Change-Id: I4e595dee5fd9e608905305e46b2aebcab986c561 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/326277 Commit-Queue: Brian Osman <brianosman@google.com> Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Brian Osman <brianosman@google.com>	2020-10-14 14:24:46 +00:00
Mike Klein	6b8b2ea6be	move cfi stifle post-refactor Cq-Include-Trybots: luci.chromium.try:linux_chromium_cfi_rel_ng Bug: chromium:1137652, chromium:1137958 Change-Id: I8575b588f9a1ba89740b95382b2462338e34bec5 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/326478 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-10-14 00:27:07 +00:00
Mike Klein	a221f1c36d	remove skvx::{rsqrt,rcp} These don't return reliable portable results, so I don't want to promote them as good ideas to use. You can get at least 5 different results from these across the four main architectures we support, and they've been the root cause of bugs uncovered only in production on undertested platforms. Luckily, unused outside of tests. Change-Id: I532731fe4cddf127253341e5ace8d9c5c9ebb0f1 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/326108 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-10-13 15:52:56 +00:00
Mike Klein	3637a44a36	update comments and rearrange SkVx.h Just a little refactor no-op. Change-Id: I1842a0190cd96c60da2fe3c7f88fa56c9f73af81 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/325681 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-10-12 20:06:13 +00:00
Mike Klein	840e8ea740	power up skvx::map Rewrite map() to allow any number of arguments, now also used for 2-argument (pow) and 3-argument (fma) operations. I left a note about fma()... I can't understand why, but calling as map(fmaf, x,y,z) ends up with scalar calls to fmaf(), but with the lambda indirection we see perfect vector codegen. I had to break map() back into two parts. I don't see any way to pass both a variadic number of arguments and play our trick with the default std::index_sequence parameter. The lane lambda similarly exists only to split up the expansion of the Rest... type pack from the I... index pack; you can't use two pack expansions in the same expression. Change-Id: Ia156a7fd846237f687d6018a7f95550c9fd4a56d Reviewed-on: https://skia-review.googlesource.com/c/skia/+/325736 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-10-12 19:43:33 +00:00
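The variadic shape described above can be sketched over `std::array` with `std::index_sequence`. This is a hypothetical illustration (`map_impl` and `map_sketch` are not skvx's names or signatures) that mirrors the two ideas in the message. A separate implementation function receives the index pack, and a per-lane lambda keeps the argument pack and the index pack in different expansions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical sketch of a variadic map() over std::array "vectors".
// Lane i of the result is fn(first[i], rest[i]...).
template <std::size_t... I, typename Fn, typename T, std::size_t N, typename... Rest>
auto map_impl(std::index_sequence<I...>, Fn& fn,
              const std::array<T, N>& first, const Rest&... rest)
        -> std::array<std::decay_t<decltype(fn(first[0], rest[0]...))>, N> {
    // Writing fn(first[I], rest[I]...)... directly would make the inner
    // "..." expand rest and I together, so a per-lane lambda expands Rest...
    // and the return statement expands I... separately.
    auto lane = [&](std::size_t i) { return fn(first[i], rest[i]...); };
    return {{ lane(I)... }};
}

template <typename Fn, typename T, std::size_t N, typename... Rest>
auto map_sketch(Fn fn, const std::array<T, N>& first, const Rest&... rest) {
    // The defaulted-sequence trick needs its own function, hence the split.
    return map_impl(std::make_index_sequence<N>{}, fn, first, rest...);
}
```

Passing lambdas rather than `std::sqrt` or `fmaf` directly also sidesteps the ambiguity of naming an overloaded function. This sketch does not guarantee that either form vectorizes.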
Mike Klein	7e129b8b89	Reland "refactor any()/all(), ptest for all()" This is a reland of `e24f7f3de7` ... with fix for ~0 constants for the pedantic MSVC. Original change's description: > refactor any()/all(), ptest for all() > > Part of this is a simple refactor, adapting any() and all() to the new > style of specialization. > > And with that refactor in place, add AVX2/SSE4.1 for all() using ptest. > This isn't terribly important, but it does help make Op::asserts run > faster in the SkVM interpreter. I like to run with asserts enabled, and > this makes passing asserts much cheaper---failing asserts are expensive > still of course, printing to SkDebugf(), etc. > > Change-Id: Iebdeee701fab7c50cce8e457674b565f7dd2ec21 > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317422 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Mike Klein <mtklein@google.com> Cq-Include-Trybots: luci.skia.skia.primary:Build-Win-MSVC-x86_64-Debug Change-Id: I93f08177ef3439e65e4383cc517dba60c0c4ef3e Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317638 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-09-17 21:19:04 +00:00
Mike Klein	952f8f17e4	Reland "update skvx scalar-fallback strategy" This is a reland of `4985db413d` ...with a better implementation of map(). I don't understand why we had to revert, but it had something with calling the function pointer in map_(), so maybe this will help. I've flattened the map_() / map() merge CL into this one, and marked the resulting map() as no_sanitize("cfi"). I don't see anything wrong, so I think it's a false positive. Original change's description: > update skvx scalar-fallback strategy > > Turns out Clang's a lot better at auto-vectorizing "obvious" scalar code > into obvious vector code when it's written out the long way, e.g. > > F32x4 x = ...; > x = { sqrtf(x[0]), sqrtf(x[1]), sqrtf(x[2]), sqrtf(x[3]) }; > > vectorizes into sqrtps a lot more reliably than our recurse-onto-scalars > strategy, and also better than the other naive approach, > > F32x4 x = ...; > for (int i = 0; i < 4; i++) { x[i] = sqrtf(x[i]); } > > So here I've added a map(V, fn) -> V' using C++14 tricks to let the > compiler handle the expansion of x = { fn(x[0]), fn(x[1]), ... > fn(x[N-1]) } for any N, and implemented most skvx scalar fallback code > using that. > > With these now vectorizing well at any N, we can remove any > specializations we'd written for particular N, really tidying up. > > Over in the SkVM interpreter, this is a big improvement for ceil and > floor, which were being done 2 floats at a time instead of 8. They're > now slimmed way down to > > shlq $6, %r13 > vroundps $K, (%r12,%r13), %ymm0 > vroundps $K, 32(%r12,%r13), %ymm1 > jmp ... > > where K is 9 or 10 depending on the op. > > I haven't found a scalar function that Clang will vectorize to vcvtps2pd > (the rounding one, not truncating vcvttps2pd), so I've kept lrint() > written the long way, updated to the style I've been using lately with > specializations inline. 
> > Change-Id: Ia97abe3c876008228bf62b1daacd6f6140408fc4 > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317375 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Mike Klein <mtklein@google.com> Cq-Include-Trybots: luci.chromium.try:linux_chromium_cfi_rel_ng Bug: chromium:1129408 Change-Id: Ia9c14074b9a14a67dd221f4925894d35a551f9d7 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317551 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-09-17 19:58:34 +00:00
Mike Klein	ed423aa542	Revert "update skvx scalar-fallback strategy" This reverts commit `4985db413d`. Reason for revert: ../../third_party/skia/include/private/SkVx.h:491:14: runtime error: control flow integrity check for type 'float (float)' failed during indirect function call (/lib/x86_64-linux-gnu/libm.so.6+0x36460): note: (unknown) defined here ../../third_party/skia/include/private/SkVx.h:491:14: note: check failed in /b/s/w/ir/out/Release/viz_unittests, destination function located in /lib/x86_64-linux-gnu/libm.so.6 #0 0x55e964d3c1f9 in skvx::Vec<4, float> skvx::map_<4, float, float, 0ul, 1ul, 2ul, 3ul>(skvx::Vec<4, float> const&, float (*)(float), std::__1::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul>) ./../../third_party/skia/include/private/SkVx.h:491 I don't understand what's wrong here, but I have a better map() coming up anyway. Original change's description: > update skvx scalar-fallback strategy > > Turns out Clang's a lot better at auto-vectorizing "obvious" scalar code > into obvious vector code when it's written out the long way, e.g. > > F32x4 x = ...; > x = { sqrtf(x[0]), sqrtf(x[1]), sqrtf(x[2]), sqrtf(x[3]) }; > > vectorizes into sqrtps a lot more reliably than our recurse-onto-scalars > strategy, and also better than the other naive approach, > > F32x4 x = ...; > for (int i = 0; i < 4; i++) { x[i] = sqrtf(x[i]); } > > So here I've added a map(V, fn) -> V' using C++14 tricks to let the > compiler handle the expansion of x = { fn(x[0]), fn(x[1]), ... > fn(x[N-1]) } for any N, and implemented most skvx scalar fallback code > using that. > > With these now vectorizing well at any N, we can remove any > specializations we'd written for particular N, really tidying up. > > Over in the SkVM interpreter, this is a big improvement for ceil and > floor, which were being done 2 floats at a time instead of 8. They're > now slimmed way down to > > shlq $6, %r13 > vroundps $K, (%r12,%r13), %ymm0 > vroundps $K, 32(%r12,%r13), %ymm1 > jmp ... 
> > where K is 9 or 10 depending on the op. > > I haven't found a scalar function that Clang will vectorize to vcvtps2pd > (the rounding one, not truncating vcvttps2pd), so I've kept lrint() > written the long way, updated to the style I've been using lately with > specializations inline. > > Change-Id: Ia97abe3c876008228bf62b1daacd6f6140408fc4 > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317375 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Mike Klein <mtklein@google.com> TBR=mtklein@google.com,herb@google.com Change-Id: I27b5eff3328bf2ddf7063ee0dee14a378ff23b89 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317546 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-09-17 14:10:10 +00:00
Mike Klein	7e8f13e751	Revert "refactor any()/all(), ptest for all()" This reverts commit `e24f7f3de7`. Reason for revert: Build-Win-MSVC-x86_64-Debug Original change's description: > refactor any()/all(), ptest for all() > > Part of this is a simple refactor, adapting any() and all() to the new > style of specialization. > > And with that refactor in place, add AVX2/SSE4.1 for all() using ptest. > This isn't terribly important, but it does help make Op::asserts run > faster in the SkVM interpreter. I like to run with asserts enabled, and > this makes passing asserts much cheaper---failing asserts are expensive > still of course, printing to SkDebugf(), etc. > > Change-Id: Iebdeee701fab7c50cce8e457674b565f7dd2ec21 > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317422 > Reviewed-by: Herb Derby <herb@google.com> > Commit-Queue: Mike Klein <mtklein@google.com> TBR=mtklein@google.com,herb@google.com Change-Id: Ib3ecbe93aa9d14b10dd87e8aa247f275c2c3eb67 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317545 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-09-17 14:08:50 +00:00
Mike Klein	e24f7f3de7	refactor any()/all(), ptest for all() Part of this is a simple refactor, adapting any() and all() to the new style of specialization. And with that refactor in place, add AVX2/SSE4.1 for all() using ptest. This isn't terribly important, but it does help make Op::asserts run faster in the SkVM interpreter. I like to run with asserts enabled, and this makes passing asserts much cheaper---failing asserts are expensive still of course, printing to SkDebugf(), etc. Change-Id: Iebdeee701fab7c50cce8e457674b565f7dd2ec21 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317422 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-09-17 13:37:58 +00:00
Mike Klein	4985db413d	update skvx scalar-fallback strategy Turns out Clang's a lot better at auto-vectorizing "obvious" scalar code into obvious vector code when it's written out the long way, e.g. F32x4 x = ...; x = { sqrtf(x[0]), sqrtf(x[1]), sqrtf(x[2]), sqrtf(x[3]) }; vectorizes into sqrtps a lot more reliably than our recurse-onto-scalars strategy, and also better than the other naive approach, F32x4 x = ...; for (int i = 0; i < 4; i++) { x[i] = sqrtf(x[i]); } So here I've added a map(V, fn) -> V' using C++14 tricks to let the compiler handle the expansion of x = { fn(x[0]), fn(x[1]), ... fn(x[N-1]) } for any N, and implemented most skvx scalar fallback code using that. With these now vectorizing well at any N, we can remove any specializations we'd written for particular N, really tidying up. Over in the SkVM interpreter, this is a big improvement for ceil and floor, which were being done 2 floats at a time instead of 8. They're now slimmed way down to shlq $6, %r13 vroundps $K, (%r12,%r13), %ymm0 vroundps $K, 32(%r12,%r13), %ymm1 jmp ... where K is 9 or 10 depending on the op. I haven't found a scalar function that Clang will vectorize to vcvtps2pd (the rounding one, not truncating vcvttps2pd), so I've kept lrint() written the long way, updated to the style I've been using lately with specializations inline. Change-Id: Ia97abe3c876008228bf62b1daacd6f6140408fc4 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317375 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-09-16 20:37:18 +00:00
Mike Klein	7b1620f366	refactor skvx min/max Implement min and max using if_then_else(y<x,...) on vectors rather than recursing to std::min/std::max applied to scalars. But actually, factor out and use naive_if_then_else(), which Clang can reason through better than it can our specialized if_then_else(). This lets every min() or max() I've looked at compile down to ideal codegen, vmaxps, vpminsw, etc, where if you use if_then_else() you'd see the literal comparison and blend as written. I've been looking at q14x2 codegen in the interpreter, and most things were already good, unexpectedly even uavg_q14x2. The biggest surprise was how bad the min/max codegen was, and looking back, even the min_f32 and max_f32 codegen is super bad. This CL fixes all that, leaving us with the ideal codegen using the specific instruction you'd want, replacing a giant mess of code that recursed down to scalars. mul_q14x2 is still bad, but an easy follow up. Change-Id: I77b5d7c9aa20a9a2f5ceb3e40f1e18ace2a1b5c1 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317310 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-09-16 17:10:18 +00:00
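The select-based formulation can be sketched lane by lane. This is a hypothetical scalar model: `min_sketch` and `max_sketch` are illustrative names, and `std::array` stands in for skvx vectors. What matters is the shape of the expression, a comparison feeding a select. Compilers can often match that pattern to single instructions such as `minps` or `pminsw`, though this is not guaranteed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: min/max as a per-lane select, the shape of
// if_then_else(y < x, y, x), rather than std::min applied to each lane.
template <typename T, std::size_t N>
std::array<T, N> min_sketch(const std::array<T, N>& x, const std::array<T, N>& y) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = (y[i] < x[i]) ? y[i] : x[i];  // take y only where it is strictly smaller
    }
    return out;
}

template <typename T, std::size_t N>
std::array<T, N> max_sketch(const std::array<T, N>& x, const std::array<T, N>& y) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = (x[i] < y[i]) ? y[i] : x[i];  // take y only where it is strictly larger
    }
    return out;
}
```

One consequence for floats: if a lane of `y` is NaN, `y < x` is false, so `min_sketch` returns that lane of `x`.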
Mike Klein	4108364efc	_mm256_blendv_epi8 needs avx2 Change-Id: Ib10215e1e5a86bf78cc34f9dca670417bb217b73 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317271 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-09-16 03:40:01 +00:00
Mike Klein	c3ad6a1e59	make skvx::if_then_else work at byte granularity The default implementation of if_then_else is logically bitwise, (cond & true_val) | (~cond & false_val). The existing skvx specializations work only for 32-bit lanes, but we can easily make them work for any type where the whole vector is the right size by reducing the granularity down to byte level. Existing code using 32-bit values and 0xffff'ffff or 0x0000'0000 masks will continue to work the same. But this now lets us use, e.g. 16-bit values with 0xffff and 0x0000 masks, or even things like 32-bit values and a mask like 0xff00ff00, selecting byte by byte. We can't go any lower without falling back on the generic bitwise implementation, so we'll have to settle for not getting to use a mask like 0x0f0f0f0f. Change-Id: I8518cb3cafc7f6e1480b4ae8af50daad2d28c5df Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317170 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-09-15 21:29:41 +00:00
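The difference between the generic bitwise fallback and a byte-granular blend instruction can be modeled on a single 32-bit value. This is a hypothetical sketch: `bitwise_select` and `byte_blend` are illustrative, not skvx functions. The two agree whenever every mask byte is 0x00 or 0xFF, which is why a mask like 0xff00ff00 now works but 0x0f0f0f0f still needs the bitwise path.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise select: each result bit comes from t where cond has a 1, else from f.
std::uint32_t bitwise_select(std::uint32_t cond, std::uint32_t t, std::uint32_t f) {
    return (cond & t) | (~cond & f);
}

// Hypothetical model of a byte-granular blend (like SSE4.1's pblendvb): each
// result byte is taken whole from t or f, based only on the top bit of the
// corresponding mask byte.
std::uint32_t byte_blend(std::uint32_t cond, std::uint32_t t, std::uint32_t f) {
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        bool take_t = ((cond >> shift) & 0x80u) != 0;
        std::uint32_t src = take_t ? t : f;
        out |= src & (0xFFu << shift);  // copy just this byte
    }
    return out;
}
```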
Mike Klein	a1711092b2	skvx spring cleaning - remove some workarounds - more SI/SIN/SIT/SINT use - rewrap a lot of code to 100 cols - etc. misc. Change-Id: I78b7ff272afcbb8658cf147aad8af85d0e2acf42 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/314676 Auto-Submit: Mike Klein <mtklein@google.com> Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-09-02 15:22:55 +00:00
Jose Dapena Paz	dc4da5ac9a	GCC: fix type passed to vcvt_f32_f16 to be a float16x4_t in SkVx.h GCC's type validation for intrinsics is stricter than Clang's, so passing a uint16x4_t to a function that expects a float16x4_t is not valid. Bug: chromium:819294 Change-Id: I6d68e5458345e78bdb05dd028481fe9cae36c5ff Reviewed-on: https://skia-review.googlesource.com/c/skia/+/307276 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-07-31 19:09:33 +00:00
Elliot Evans	6b2602ed2d	Remove remaining usages of skvx::mad This CL attempts to remove the remaining subset of skvx::mad usages. https://skia-review.googlesource.com/c/skia/+/304853 removes all usages of skvx::mad but causes small differences in rendering, so it is not suitable for landing. https://skia-review.googlesource.com/c/skia/+/306702/ removes all non-nested usages of skvx::mad Change-Id: Iab5d4cfd0feb856c38b3ebbfe3bf3ed5aad20fe6 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/306722 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com> Commit-Queue: Elliot Evans <elliotevans@google.com>	2020-07-30 20:41:09 +00:00
Mike Klein	4d680cdf07	a bunch of half-related stuff - add f32<->f16 functions to skvx - add f32<->f16 x86 instructions to skvm::Assembler - add f32<->f16 ops to skvm, using the skvx functions in the interpreter Still TODO: use the new x86 instructions in the JIT (For now like in many other ways, the aarch64 JIT continues to languish. Will pick that back up one day.) Change-Id: Ib8dc1ccdc75ecb23769ea4947d66d3ab22520f23 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/302942 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-07-15 20:47:31 +00:00
Elliot Evans	72b8aea019	Fix experimental_simd CanvasKit build. Although it appeared that the experimental_simd CanvasKit build was working, the build was not producing actual wasm SIMD operations. This CL fixes that issue by changing the build arguments. This CL also fixes an incorrect type in the SkVx wasm SIMD implementation. Bug: skia:10453 Change-Id: If26f84b09e4d84df36be589245878c821972dffc Reviewed-on: https://skia-review.googlesource.com/c/skia/+/302669 Reviewed-by: Kevin Lubick <kjlubick@google.com> Reviewed-by: Mike Klein <mtklein@google.com>	2020-07-15 20:39:42 +00:00
Mike Klein	5cb47d6a88	refactor skvx::if_then_else() First move if_then_else() specializations inline using a quasi-constexpr-if approach, letting them operate on any types of the right vector and lane size. We can't use constexpr-if per se because this header is sometimes used in C++14 contexts. Then, add AVX specialization for 8x32-bit types. SkVM's interpreter uses if_then_else() on three i32x16, and these changes allow that to vectorize ideally, as two vblendvps instructions. Change-Id: I8355c47975c736c1fbc32b1f8ceddb772978d271 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/302080 Auto-Submit: Mike Klein <mtklein@google.com> Commit-Queue: Brian Osman <brianosman@google.com> Reviewed-by: Brian Osman <brianosman@google.com>	2020-07-13 15:02:07 +00:00
Elliot Evans	fe7e74b3a7	Add an `experimental_simd` build target to CanvasKit. The `experimental_simd` build target builds CanvasKit using the Emscripten `-msimd128` flag, to build CanvasKit with SIMD instructions in the compiled WASM. This build of CanvasKit works in Chrome Canary 86.0.4186.0 with chrome://flags#enable-webassembly-simd enabled. Also add WebAssembly-specific intrinsics to SkVx.h to enable support for almost all native SIMD operations in CanvasKit WebAssembly. Also add a Skia/modules/canvaskit/wasm_tools/SIMD folder which contains build_simd_test.sh for testing whether WASM SIMD intrinsics are actually being used by skvx, and for testing correctness of WASM SIMD operations. The folder also contains simd_float_test.cpp and simd_int_test.cpp, which serve as documentation for which operations are correctly turned into WASM SIMD operations by emscripten. Bug: skia:10453 Change-Id: Icd312b4d189e8d8667d3ffe12a72bfa6febaab2f Reviewed-on: https://skia-review.googlesource.com/c/skia/+/299705 Reviewed-by: Mike Klein <mtklein@google.com>	2020-06-30 22:52:31 +00:00
Florin Malita	3facc9c886	[skottie] Non-legacy brightness effect Includes POW intrinsic plumbing. Change-Id: Ida961718e28822c8559f17f97003f67082dd44cc Reviewed-on: https://skia-review.googlesource.com/c/skia/+/287156 Commit-Queue: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@google.com>	2020-05-04 14:04:24 +00:00
Mike Reed	8520e76f05	migrate spiral demo from canvaskit - add atan, fract, dividef, subtractf Also wants mix(), but I'm still learning how to handle two-argument functions (e.g. how to support atan(y,x) as well) Change-Id: Ib9f233cd1c4266110cfea68a7d444f834f875f1f Reviewed-on: https://skia-review.googlesource.com/c/skia/+/286276 Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Mike Klein <mtklein@google.com> Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Reed <reed@google.com>	2020-04-30 16:52:22 +00:00
Mike Klein	c2160258f3	add skvx::{sin,cos,tan} This lets us get rid of VECTOR_UNARY_FN_VEC. I don't know exactly what was wrong with VECTOR_UNARY_FN_VEC, but `color.rgb = color.rgb + a*(sin(6.28*color.rgb)*0.159)` looks ok to me now when run through the interpreter. Change-Id: I700398cd55eca1b8e1b3b46858415ecae5585a32 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/286065 Reviewed-by: Florin Malita <fmalita@chromium.org> Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2020-04-29 15:39:12 +00:00
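One way to sketch lanewise sin, cos and tan is to apply the scalar <cmath> function to each lane. This only illustrates the idea under stated assumptions and is not skvx's implementation. The `Vec` struct and the `map_lanes` helper are hypothetical, and both live in a `sketch` namespace so they do not collide with the global math functions.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Hypothetical simplified vector type; skvx::Vec is more elaborate.
template <int N, typename T>
struct Vec {
    T lanes[N];
};

// Hypothetical helper: apply a scalar function to each lane independently.
template <int N, typename Fn>
Vec<N, float> map_lanes(Fn fn, const Vec<N, float>& x) {
    Vec<N, float> out;
    for (int i = 0; i < N; i++) {
        out.lanes[i] = fn(x.lanes[i]);
    }
    return out;
}

// Lanewise trig functions built on the scalar <cmath> versions.
template <int N>
Vec<N, float> sin(const Vec<N, float>& x) {
    return map_lanes([](float v) { return std::sin(v); }, x);
}

template <int N>
Vec<N, float> cos(const Vec<N, float>& x) {
    return map_lanes([](float v) { return std::cos(v); }, x);
}

template <int N>
Vec<N, float> tan(const Vec<N, float>& x) {
    return map_lanes([](float v) { return std::tan(v); }, x);
}

}  // namespace sketch
```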
Mike Klein	5caf7dee25	restore Op::round While I think trunc(mad(x, scale, 0.5)) is fine for doing our float to fixed point conversions, round(mul(x, scale)) was kind of better all around: - better rounding than +0.5 and trunc - faster when mad() is not an fma - often now no need to use the constant 0.5f or have it in a register - allows the mul() in to_unorm to use mul_f32_imm Those last two points are key... this actually frees up 2 registers in the x86 JIT when using to_unorm(). So I think maybe we can resurrect round and still guarantee our desired intra-machine stability by committing to using instructions that follow the current rounding mode, which is what [v]cvtps2dq inextricably uses. Left some notes on the ARM impl... we're rounding to nearest even there, which is probably the current mode anyway, but to be more correct we need a slightly longer impl that rounds float->float then "truncates". Unsure whether it matters in practice. Same deal in the unit test that I added back, now testing negative and 0.5 cases too. The expectations assume the current mode is nearest even. I had the idea to resurrect this when I was looking at adding _imm Ops for fma_f32. I noticed that the y and z arguments to an fma_f32 were by far most likely to be constants, and when they are, they're very likely to both be constants, e.g. 255.0f & 0.5f from to_unorm(8,...).
llvm disassembly for SkVM_round unit test looks good:
    ~ $ llc -mcpu=haswell /tmp/skvm-jit-1231521224.bc -o -
        .section __TEXT,__text,regular,pure_instructions
        .macosx_version_min 10, 15
        .globl "_skvm-jit-1231521224" ## -- Begin function skvm-jit-1231521224
        .p2align 4, 0x90
    "_skvm-jit-1231521224": ## @skvm-jit-1231521224
        .cfi_startproc
        cmpl $8, %edi
        jl LBB0_3
        .p2align 4, 0x90
    LBB0_2: ## %loopK
        ## =>This Inner Loop Header: Depth=1
        vcvtps2dq (%rsi), %ymm0
        vmovupd %ymm0, (%rdx)
        addl $-8, %edi
        addq $32, %rsi
        addq $32, %rdx
        cmpl $8, %edi
        jge LBB0_2
    LBB0_3: ## %hoist1
        xorl %eax, %eax
        testl %edi, %edi
        jle LBB0_6
        .p2align 4, 0x90
    LBB0_5: ## %loop1
        ## =>This Inner Loop Header: Depth=1
        vcvtss2si (%rsi,%rax), %ecx
        movl %ecx, (%rdx,%rax)
        decl %edi
        addq $4, %rax
        testl %edi, %edi
        jg LBB0_5
    LBB0_6: ## %leave
        vzeroupper
        retq
        .cfi_endproc
        ## -- End function
Change-Id: Ib59eb3fd8a6805397850d93226c6c6d37cc3ab84 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/276738 Auto-Submit: Mike Klein <mtklein@google.com> Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-03-12 21:10:34 +00:00
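Scalar code is enough to show the rounding difference discussed above. In this sketch `to_fixed_trunc` and `to_fixed_round` are hypothetical names. std::fma stands in for mad(). std::lrint stands in for an instruction that follows the current rounding mode, as [v]cvtps2dq does. That mode is round-to-nearest-even by default.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers contrasting the two float-to-fixed conversions.

// trunc(mad(x, scale, 0.5)): add one half, then truncate toward zero.
inline int to_fixed_trunc(float x, float scale) {
    return static_cast<int>(std::trunc(std::fma(x, scale, 0.5f)));
}

// round(mul(x, scale)): multiply, then round using the current rounding
// mode (round-to-nearest-even unless the program has changed it).
inline int to_fixed_round(float x, float scale) {
    return static_cast<int>(std::lrint(x * scale));
}
```

The two differ on ties and on negative inputs. Add-0.5-then-truncate maps 0.5 to 1 and 2.5 to 3. It also maps -1.7 to -1, because truncation moves toward zero. Round-to-nearest-even gives 0, 2 and -2 for the same inputs. Both agree on the common to_unorm case, 1.0f * 255.0f, which yields 255.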
Mike Klein	ec370976c6	move skvm interpreter to SkOpts again This is the easiest way to guarantee Op::fma_f32 actually fuses, by using platform intrinsics. While implementing this we noticed that quad-pumping was actually slower than double-pumping by about 25%, and single-pumping was between the two. Switch from quad to double pumping. Change-Id: Ib93fd175fb8f6aaf49f769a95edfa9fd6b2674f6 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/275299 Commit-Queue: Mike Klein <mtklein@google.com> Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-03-05 17:47:42 +00:00
Mike Klein	21ef0d5424	force-inline skvx methods These are inline, but still subject to the ODR, and in Debug builds they might not be inlined. This fixes one unit test failure on the x86 Debug GCC Test bot. Bug: skia:9664 Change-Id: Id3837fdfbf69bd7012339d89d16e8dedaf113de2 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/260520 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-12-17 22:25:18 +00:00
Mike Klein	7d3b27d90e	free skvx from its Skia shackles Remove the need to include SkTypes.h in SkVx.h, making SkVx entirely independent of Skia. As an experiment, switch to checking Clang/GCC-style __SSE__ / __ARM_NEON defines directly instead of the slightly more abstract SK_CPU_SSE_LEVEL / SK_ARM_HAS_NEON. Those SK_ defines only exist to help SSE detection on MSVC, which SkVx generates serial code for anyway. If this sticks I may do this same sort of change all through Skia. Change-Id: I1c51fd6ba1fa48f199ce623824d5ef20ff6be995 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219541 Reviewed-by: Brian Osman <brianosman@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-06-07 18:08:23 +00:00
Mike Klein	c0bd9f9fe5	rewrite includes to not need so much -Ifoo Current strategy: everything from the top. Things to look at first are the manual changes: - added tools/rewrite_includes.py - removed -I directives from BUILD.gn - various compile.sh simplifications - tweak tools/embed_resources.py - update gn/find_headers.py to write paths from the top - update gn/gn_to_bp.py SkUserConfig.h layout so that #include "include/config/SkUserConfig.h" always gets the header we want. No-Presubmit: true Change-Id: I73a4b181654e0e38d229bc456c0d0854bae3363e Reviewed-on: https://skia-review.googlesource.com/c/skia/+/209706 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Hal Canary <halcanary@google.com> Reviewed-by: Brian Osman <brianosman@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>	2019-04-24 16:27:11 +00:00
Mike Klein	96e4e53cf1	Reland "align skvx::Vec<N,T> to N*sizeof(T)" This is a reland of `e3b110dc6e` PS1 is the original, so best to diff against that. This is the original with compiler workarounds. Original change's description: > align skvx::Vec<N,T> to N*sizeof(T) > > This increases the alignment of these vector types. I would have liked > to keep the alignment minimal, but it's probably no big deal either way. > > In terms of code generation, it doesn't make much difference for x86 or > ARMv8, but it seems hugely important for good ARMv7 NEON code. It's a > ~10x difference for the bench I've been playing around with that spends > most of its time in that SkOpts::blit_row_color32 routine. > > Bug: chromium:952502 > Change-Id: Ib12caad6b9b3f3f6e821ed70bfb57099db37b15f > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208581 > Commit-Queue: Michael Ludwig <michaelludwig@google.com> > Reviewed-by: Michael Ludwig <michaelludwig@google.com> > Auto-Submit: Mike Klein <mtklein@google.com> Bug: chromium:952502 Cq-Include-Trybots: skia.primary:Test-Win2016-MSVC-GCE-CPU-AVX2-x86-Release-All,Build-Debian9-GCC-mips64el-Debug Change-Id: Ief99e14ab4de5a56840ed6bb326cf7669c51dc97 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208681 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-04-16 19:48:50 +00:00
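To align a vector to its full size, give Vec<N,T> an alignment of N*sizeof(T). Below is a minimal sketch using a hypothetical simplified `Vec`. skvx's real definition may differ, and so may the compiler workarounds in this reland. alignas accepts only powers of two, so the sketch assumes N*sizeof(T) is one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified vector type aligned to its full size,
// N*sizeof(T). N*sizeof(T) must be a power of two for alignas.
template <int N, typename T>
struct alignas(N * sizeof(T)) Vec {
    T lanes[N];
};

static_assert(alignof(Vec<4, float>) == 16, "4 floats align to 16 bytes");
static_assert(alignof(Vec<8, uint16_t>) == 16, "8 uint16s align to 16 bytes");
static_assert(alignof(Vec<4, uint8_t>) == 4, "4 bytes align to 4 bytes");
static_assert(sizeof(Vec<4, float>) == 16, "alignment adds no padding here");
```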
Mike Klein	9a885b27f3	pass SkVx::Vec arguments as const& Yet another surprising finding when looking at ARM code generation is that passing these values to functions by const& does make a difference, even when fully inlined. I can only guess that the compiler's somehow more sure that way that the values won't change? Anyway, convert all skvx functions that take Vec arguments to take const Vec& instead. This tweak is enough to let the natural implementation of mull() actually produce good code generation, so I've promoted that to SkVx.h and added a unit test. Notice in the NEON case we've got a base case at N=8 and two recursive cases, one down to 8 as usual when N > 8, but also one up to 8 when N < 8. This also is another big speedup for ARMv7 NEON, bringing it to nearly the same speed as ARMv8 NEON on the same device. Bug: chromium:952502 Change-Id: I0f19bab45cf02222ccc8090053ea2a4a380f1dfe Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208582 Commit-Queue: Michael Ludwig <michaelludwig@google.com> Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com>	2019-04-16 19:24:50 +00:00
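The sketch below shows the "natural implementation" of mull() with every Vec passed by const&, inside a recursive vector that splits N lanes into two N/2 halves. It makes several assumptions. mull() is treated as a widening 8-bit by 8-bit to 16-bit multiply, in the spirit of NEON's vmull_u8. The `Vec` layout here is a hypothetical simplification. The recursion runs all the way down to one lane, so the sketch leaves out the NEON base case at N=8 and the recursion up to 8 for N < 8 that the commit describes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified recursive vector: N lanes stored as two N/2
// halves, bottoming out at a single scalar lane. N must be a power of two.
template <int N, typename T>
struct Vec {
    Vec<N / 2, T> lo, hi;
};

template <typename T>
struct Vec<1, T> {
    T val;
};

// Base case: widen the 8-bit lanes to 16 bits before multiplying.
inline Vec<1, uint16_t> mull(const Vec<1, uint8_t>& x, const Vec<1, uint8_t>& y) {
    return {static_cast<uint16_t>(x.val * y.val)};
}

// Recursive case: multiply each half, passing every Vec by const&.
template <int N>
Vec<N, uint16_t> mull(const Vec<N, uint8_t>& x, const Vec<N, uint8_t>& y) {
    return {mull(x.lo, y.lo), mull(x.hi, y.hi)};
}
```

Because the products are 16-bit, 255 * 255 = 65025 is kept without overflow.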

64 Commits