skia2

Author	SHA1	Message	Date
Mike Klein	5caf7dee25	restore Op::round While I think trunc(mad(x, scale, 0.5)) is fine for doing our float to fixed point conversions, round(mul(x, scale)) was kind of better all around: - better rounding than +0.5 and trunc - faster when mad() is not an fma - often now no need to use the constant 0.5f or have it in a register - allows the mul() in to_unorm to use mul_f32_imm Those last two points are key... this actually frees up 2 registers in the x86 JIT when using to_unorm(). So I think maybe we can resurrect round and still guarantee our desired intra-machine stability by committing to using instructions that follow the current rounding mode, which is what [v]cvtps2dq inextricably uses. Left some notes on the ARM impl... we're rounding to nearest even there, which is probably the current mode anyway, but to be more correct we need a slightly longer impl that rounds float->float then "truncates". Unsure whether it matters in practice. Same deal in the unit test that I added back, now testing negative and 0.5 cases too. The expectations assume the current mode is nearest even. I had the idea to resurrect this when I was looking at adding _imm Ops for fma_f32. I noticed that the y and z arguments to an fma_f32 were by far most likely to be constants, and when they are, they're by far most likely to both be constants, e.g. 255.0f & 0.5f from to_unorm(8,...). 
llvm disassembly for SkVM_round unit test looks good:
    ~ $ llc -mcpu=haswell /tmp/skvm-jit-1231521224.bc -o -
            .section    __TEXT,__text,regular,pure_instructions
            .macosx_version_min 10, 15
            .globl  "_skvm-jit-1231521224"      ## -- Begin function skvm-jit-1231521224
            .p2align    4, 0x90
    "_skvm-jit-1231521224":                     ## @skvm-jit-1231521224
            .cfi_startproc
            cmpl    $8, %edi
            jl      LBB0_3
            .p2align    4, 0x90
    LBB0_2:                                     ## %loopK
                                                ## =>This Inner Loop Header: Depth=1
            vcvtps2dq   (%rsi), %ymm0
            vmovupd     %ymm0, (%rdx)
            addl    $-8, %edi
            addq    $32, %rsi
            addq    $32, %rdx
            cmpl    $8, %edi
            jge     LBB0_2
    LBB0_3:                                     ## %hoist1
            xorl    %eax, %eax
            testl   %edi, %edi
            jle     LBB0_6
            .p2align    4, 0x90
    LBB0_5:                                     ## %loop1
                                                ## =>This Inner Loop Header: Depth=1
            vcvtss2si   (%rsi,%rax), %ecx
            movl    %ecx, (%rdx,%rax)
            decl    %edi
            addq    $4, %rax
            testl   %edi, %edi
            jg      LBB0_5
    LBB0_6:                                     ## %leave
            vzeroupper
            retq
            .cfi_endproc
                                                ## -- End function
Change-Id: Ib59eb3fd8a6805397850d93226c6c6d37cc3ab84 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/276738 Auto-Submit: Mike Klein <mtklein@google.com> Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-03-12 21:10:34 +00:00
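The following is a scalar sketch of the two float-to-fixed conversions this commit compares. The helper names are hypothetical and are not SkVM ops. `to_fixed_mad_trunc` mimics trunc(mad(x, scale, 0.5)). `to_fixed_round` mimics round(mul(x, scale)) by calling std::lrint, which, like [v]cvtps2dq, honors the current rounding mode. Under the default round-to-nearest-even mode, the two disagree on negative inputs and on exact .5 ties.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>

// Hypothetical helper: add 0.5 and then truncate toward zero,
// i.e. what trunc(mad(x, scale, 0.5)) computes.
int32_t to_fixed_mad_trunc(float x, float scale) {
    return static_cast<int32_t>(x * scale + 0.5f);  // float->int conversion truncates
}

// Hypothetical helper: multiply, then round in the current rounding mode
// (std::lrint follows it, as cvtps2dq does).
int32_t to_fixed_round(float x, float scale) {
    return static_cast<int32_t>(std::lrint(x * scale));
}
```

For example, x = -1 gives 0 with +0.5-and-truncate but -1 with lrint, and x = 0.5 gives 1 versus 0 because ties go to even. For the to_unorm(8, 0.5f) case, 0.5 * 255 = 127.5, and both conversions give 128.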
Mike Klein	ec370976c6	move skvm interpreter to SkOpts again This is the easiest way to guarantee Op::fma_f32 actually fuses, by using platform intrinsics. While implementing this we noticed that quad-pumping was actually slower than double-pumping by about 25%, and single-pumping was between the two. Switch from quad to double pumping. Change-Id: Ib93fd175fb8f6aaf49f769a95edfa9fd6b2674f6 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/275299 Commit-Queue: Mike Klein <mtklein@google.com> Commit-Queue: Herb Derby <herb@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-03-05 17:47:42 +00:00
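The sketch below shows why it matters whether Op::fma_f32 actually fuses. It is a minimal C++ illustration, not SkVM code. std::fma rounds once, while a separate multiply and add round twice. The volatile store in the hypothetical `unfused` helper is only a trick to force the product to round to float before the add.

```cpp
#include <cassert>
#include <cmath>

// Single rounding: a*b + c is computed exactly, then rounded once.
float fused(float a, float b, float c) { return std::fma(a, b, c); }

// Two roundings: the volatile store forces a*b to be rounded to float
// before c is added, like an unfused mul followed by an add.
float unfused(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}
```

With x = 1 + 2^-12, the exact square is 1 + 2^-11 + 2^-24. Rounding the product first discards the 2^-24 term, so the two helpers return different answers for the same inputs. An implementation that sometimes fuses and sometimes doesn't could therefore produce results that are not bit-stable.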
Mike Klein	21ef0d5424	force-inline skvx methods These are inline, but still subject to the ODR, and in Debug builds they might not be inlined. This fixes one unit test failure on the x86 Debug GCC Test bot. Bug: skia:9664 Change-Id: Id3837fdfbf69bd7012339d89d16e8dedaf113de2 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/260520 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-12-17 22:25:18 +00:00
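Here is one reading of this commit. An inline function is still a single entity under the ODR. If a Debug build emits out-of-line copies from translation units compiled with different CPU flags, the linker keeps just one copy, and some callers may end up running code built for the wrong target. Forcing inlining keeps each call site compiled with its own unit's flags. Below is a sketch of a portable force-inline macro. The macro name and `mix` are hypothetical, and Skia's actual spelling may differ.

```cpp
#include <cassert>

// Hypothetical macro name. MSVC spells forced inlining __forceinline;
// GCC and Clang use the always_inline attribute, which they honor even at -O0.
#if defined(_MSC_VER)
    #define SKETCH_ALWAYS_INLINE __forceinline
#else
    #define SKETCH_ALWAYS_INLINE __attribute__((always_inline)) inline
#endif

// A small header-style helper that every caller should get inlined.
SKETCH_ALWAYS_INLINE float mix(float a, float b, float t) { return a + (b - a) * t; }
```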
Mike Klein	7d3b27d90e	free skvx from its Skia shackles Remove the need to include SkTypes.h in SkVx.h, making SkVx entirely independent of Skia. As an experiment, switch to checking Clang/GCC-style __SSE__ / __ARM_NEON defines directly instead of the slightly more abstract SK_CPU_SSE_LEVEL / SK_ARM_HAS_NEON. Those SK_ defines only exist to help SSE detection on MSVC, which SkVx generates serial code for anyway. If this sticks I may do this same sort of change all through Skia. Change-Id: I1c51fd6ba1fa48f199ce623824d5ef20ff6be995 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219541 Reviewed-by: Brian Osman <brianosman@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-06-07 18:08:23 +00:00
Mike Klein	c0bd9f9fe5	rewrite includes to not need so much -Ifoo Current strategy: everything from the top. Things to look at first are the manual changes: - added tools/rewrite_includes.py - removed -I directives from BUILD.gn - various compile.sh simplifications - tweak tools/embed_resources.py - update gn/find_headers.py to write paths from the top - update gn/gn_to_bp.py SkUserConfig.h layout so that #include "include/config/SkUserConfig.h" always gets the header we want. No-Presubmit: true Change-Id: I73a4b181654e0e38d229bc456c0d0854bae3363e Reviewed-on: https://skia-review.googlesource.com/c/skia/+/209706 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Hal Canary <halcanary@google.com> Reviewed-by: Brian Osman <brianosman@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>	2019-04-24 16:27:11 +00:00
Mike Klein	96e4e53cf1	Reland "align skvx::Vec<N,T> to N*sizeof(T)" This is a reland of `e3b110dc6e`. PS1 is the original, so best to diff against that. This is the original with compiler workarounds. Original change's description: > align skvx::Vec<N,T> to N*sizeof(T) > > This increases the alignment of these vector types. I would have liked > to keep the alignment minimal, but it's probably no big deal either way. > > In terms of code generation, it doesn't make much difference for x86 or > ARMv8, but it seems hugely important for good ARMv7 NEON code. It's a > ~10x difference for the bench I've been playing around with that spends > most of its time in that SkOpts::blit_row_color32 routine. > > Bug: chromium:952502 > Change-Id: Ib12caad6b9b3f3f6e821ed70bfb57099db37b15f > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208581 > Commit-Queue: Michael Ludwig <michaelludwig@google.com> > Reviewed-by: Michael Ludwig <michaelludwig@google.com> > Auto-Submit: Mike Klein <mtklein@google.com> Bug: chromium:952502 Cq-Include-Trybots: skia.primary:Test-Win2016-MSVC-GCE-CPU-AVX2-x86-Release-All,Build-Debian9-GCC-mips64el-Debug Change-Id: Ief99e14ab4de5a56840ed6bb326cf7669c51dc97 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208681 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-04-16 19:48:50 +00:00
Mike Klein	9a885b27f3	pass SkVx::Vec arguments as const& Yet another surprising finding when looking at ARM code generation is that passing these values to functions by const& does make a difference, even when fully inlined. I can only guess that the compiler's somehow more sure that way that the values won't change? Anyway, convert all skvx functions that take Vec arguments to take const Vec& instead. This tweak is enough to let the natural implementation of mull() actually produce good code generation, so I've promoted that to SkVx.h and added a unit test. Notice in the NEON case we've got a base case at N=8 and two recursive cases, one down to 8 as usual when N > 8, but also one up to 8 when N < 8. This also is another big speedup for ARMv7 NEON, bringing it to nearly the same speed as ARMv8 NEON on the same device. Bug: chromium:952502 Change-Id: I0f19bab45cf02222ccc8090053ea2a4a380f1dfe Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208582 Commit-Queue: Michael Ludwig <michaelludwig@google.com> Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com>	2019-04-16 19:24:50 +00:00
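The following is a scalar sketch of what mull() computes. It assumes mull() is the 8-bit by 8-bit to 16-bit widening multiply that NEON's vmull_u8 provides; that is an assumption, since the commit only names the function. Arguments are taken by const& as the commit describes, and std::array stands in for skvx::Vec.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Widening multiply: each pair of 8-bit lanes yields a 16-bit product,
// so 255*255 = 65025 fits without overflow. Illustrative scalar loop only.
template <std::size_t N>
std::array<uint16_t, N> mull(const std::array<uint8_t, N>& x,
                             const std::array<uint8_t, N>& y) {
    std::array<uint16_t, N> out{};
    for (std::size_t i = 0; i < N; i++) {
        out[i] = static_cast<uint16_t>(x[i] * y[i]);  // operands promote to int
    }
    return out;
}
```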
Mike Klein	50303b9adc	Revert "align skvx::Vec<N,T> to N*sizeof(T)" This reverts commit `e3b110dc6e`. Reason for revert: bot failures. Original change's description: > align skvx::Vec<N,T> to N*sizeof(T) > > This increases the alignment of these vector types. I would have liked > to keep the alignment minimal, but it's probably no big deal either way. > > In terms of code generation, it doesn't make much difference for x86 or > ARMv8, but it seems hugely important for good ARMv7 NEON code. It's a > ~10x difference for the bench I've been playing around with that spends > most of its time in that SkOpts::blit_row_color32 routine. > > Bug: chromium:952502 > Change-Id: Ib12caad6b9b3f3f6e821ed70bfb57099db37b15f > Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208581 > Commit-Queue: Michael Ludwig <michaelludwig@google.com> > Reviewed-by: Michael Ludwig <michaelludwig@google.com> > Auto-Submit: Mike Klein <mtklein@google.com> TBR=mtklein@google.com,michaelludwig@google.com Change-Id: I72357b9775685efcc2cd75db220711c8145b8ac4 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: chromium:952502 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208680 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-04-16 18:25:54 +00:00
Mike Klein	e3b110dc6e	align skvx::Vec<N,T> to N*sizeof(T) This increases the alignment of these vector types. I would have liked to keep the alignment minimal, but it's probably no big deal either way. In terms of code generation, it doesn't make much difference for x86 or ARMv8, but it seems hugely important for good ARMv7 NEON code. It's a ~10x difference for the bench I've been playing around with that spends most of its time in that SkOpts::blit_row_color32 routine. Bug: chromium:952502 Change-Id: Ib12caad6b9b3f3f6e821ed70bfb57099db37b15f Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208581 Commit-Queue: Michael Ludwig <michaelludwig@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com> Auto-Submit: Mike Klein <mtklein@google.com>	2019-04-16 17:59:22 +00:00
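This sketch shows what N*sizeof(T) alignment means at the type level. It uses a hypothetical, simplified Vec; the real skvx::Vec differs. The alignment must be a power of two, which holds here because N and sizeof(T) are both powers of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for skvx::Vec<N,T>: aligned to its full size
// rather than just alignof(T).
template <int N, typename T>
struct alignas(N * sizeof(T)) Vec {
    T vals[N];
};

static_assert(sizeof(Vec<4, float>) == 16, "four floats, no padding");
static_assert(alignof(Vec<4, float>) == 16, "aligned like an SSE/NEON register");
static_assert(alignof(Vec<8, float>) == 32, "aligned like an AVX register");
static_assert(alignof(Vec<4, uint8_t>) == 4, "four bytes, 4-byte aligned");
```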
Mike Klein	3bad19cfc2	use __builtin_shufflevector when available See https://clang.llvm.org/docs/LanguageExtensions.html#langext-builtin-shufflevector It's basically exactly skvx::shuffle(), but allows two input vectors. I just pass the same vector twice. Change-Id: I3920e2b156b4b85843eaf197adb540d8296c5569 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207723 Commit-Queue: Michael Ludwig <michaelludwig@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com> Auto-Submit: Mike Klein <mtklein@google.com>	2019-04-11 21:11:58 +00:00
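__builtin_shufflevector takes two vectors plus a list of constant lane indices into their concatenation. Passing the same vector twice, as the commit does, turns it into a one-input shuffle. Below is a minimal sketch using GCC/Clang vector extensions (not MSVC), with a lane-by-lane fallback where the builtin isn't available. `reverse4` and the macro name are hypothetical.

```cpp
#include <cassert>

// GCC/Clang vector extension type: four floats in 16 bytes.
typedef float float4 __attribute__((vector_size(16)));

#if defined(__has_builtin)
    #if __has_builtin(__builtin_shufflevector)
        #define SKETCH_HAS_SHUFFLEVECTOR 1
    #endif
#endif

// Reverse the lanes of v.
float4 reverse4(float4 v) {
#if defined(SKETCH_HAS_SHUFFLEVECTOR)
    // Indices 0-3 refer to the first argument, 4-7 to the second (unused here).
    return __builtin_shufflevector(v, v, 3, 2, 1, 0);
#else
    float4 r = {v[3], v[2], v[1], v[0]};
    return r;
#endif
}
```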
Mike Klein	4b44a0d01a	add SkVx helpers for working with unorm8 These replicate the base logic of Sk4px::Wide::div255() and Sk4px::approxMulDiv255(), and will come in handy replacing them. No platform specializations yet... want to remind myself what codegen they get from these vanilla versions first, and then I'll fill in the platform specific stuff as needed. The tests should cover everything pretty exhaustively. Change-Id: I5854d1bc0902a85cbb2351f669c4da7cc31a8775 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207683 Commit-Queue: Mike Klein <mtklein@google.com> Commit-Queue: Michael Ludwig <michaelludwig@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com> Auto-Submit: Mike Klein <mtklein@google.com>	2019-04-11 17:54:23 +00:00
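Rounding division by 255 is the core operation behind unorm8 helpers like these. This scalar sketch gives two ways to compute it, and the test checks them against each other over every product of two 8-bit values. These are standard integer tricks and are not necessarily the formulas SkVx uses. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// round(x / 255) via an integer divide; valid for x <= 255*255.
// 255 is odd, so x/255 never lands exactly on a .5 tie.
uint8_t div255_exact(uint16_t x) {
    return static_cast<uint8_t>((x + 127) / 255);
}

// Shift-only variant; equals round(x / 255) for every x <= 255*255.
uint8_t div255_shift(uint16_t x) {
    uint32_t y = x + 128u;
    return static_cast<uint8_t>((y + (y >> 8)) >> 8);
}
```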
Mike Klein	1c62426f96	make to_vec template parameters explicit For some reason, Clang can infer <N,T> but GCC can't. No big deal... we know exactly the ones we want anyway. Change-Id: I15ba4d4edbd3bc0f37ebe3c2b6e411726cd9fb69 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207341 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com>	2019-04-10 19:45:55 +00:00
Mike Klein	da7b053527	tweak SkVx to play nicely with others Was starting to use this and ran into a few problems with clashing symbols, namely SI and cast(). Seemed simple enough to not use SI, and to move all the free-standing types into skvx: skvx::cast, skvx::shuffle, etc. Change-Id: Ia5d8ef6d0ae5375bf80d76be88d16f0c9cde56e7 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207340 Commit-Queue: Mike Klein <mtklein@google.com> Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com>	2019-04-10 19:40:05 +00:00
Mike Klein	f4438d56e9	skvx: allow more implicit conversions Guarding the implicit constructors and scalar/vector operations with std::is_convertible ought to make SkVx types feel more like normal C types, allowing implicit conversions exactly when the scalar equivalents would. This shouldn't change the behavior of any code, or make anything new possible... just nicer to read and write. Change-Id: Iff4b89012c5b8c7f7933e6841c925b81186bc614 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/201402 Commit-Queue: Mike Klein <mtklein@google.com> Commit-Queue: Michael Ludwig <michaelludwig@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com> Auto-Submit: Mike Klein <mtklein@google.com>	2019-03-14 19:11:21 +00:00
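This is a minimal sketch of guarding an implicit broadcast constructor and a vector-scalar operator with std::is_convertible. With these guards, `Vec<4,float> v = 2;` and `v + 1` compile, just as `float f = 2;` does, while conversion from a `const char*` is rejected. `Vec` here is a hypothetical stand-in, not skvx::Vec.

```cpp
#include <cassert>
#include <type_traits>

template <int N, typename T>
struct Vec {
    T vals[N];

    Vec() = default;

    // Broadcast constructor, enabled only when U converts implicitly to T.
    template <typename U,
              typename = typename std::enable_if<std::is_convertible<U, T>::value>::type>
    Vec(U u) {
        for (int i = 0; i < N; i++) vals[i] = static_cast<T>(u);
    }
};

// Vector + scalar, again only when the scalar converts to the lane type.
template <int N, typename T, typename U,
          typename = typename std::enable_if<std::is_convertible<U, T>::value>::type>
Vec<N, T> operator+(const Vec<N, T>& x, U y) {
    Vec<N, T> r;
    for (int i = 0; i < N; i++) r.vals[i] = x.vals[i] + static_cast<T>(y);
    return r;
}
```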
Mike Klein	e9fc58663e	mark more methods inline Since these are all already static, it doesn't have any real functional impact in terms of linking or codegen. But it does suppress unused function warnings in compilation units that don't use everything. Add a new SI boilerplate macro to go along with SINT and SIT. Change-Id: If2c09951b7453338dd20a3a88e3abbee5eefcd27 Reviewed-on: https://skia-review.googlesource.com/c/195921 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com>	2019-02-27 20:27:10 +00:00
Mike Klein	41b995c4fb	specialize if_then_else(int4,float4,float4) Add SSE, SSE4.1, and NEON specializations. The if_then_else() unit tests in SkVxTest.cpp should cover this. I had to give up on my dream of not using Skia headers for now. There's really no good way of knowing whether we've got SSE4.1 support in MSVC except when we explicitly define SK_CPU_SSE_LEVEL=SK_CPU_SSE_LEVEL_SSE41. This refactor to use SK_CPU_SSE_LEVEL let MSVC point out a slight ordering problem that would cause an infinite loop calling any of the specializations like sqrt(float2). I believe moving them after the float4 specializations will fix that. Change-Id: I83639f378a182716d1b37e92b6d725472698f874 Reviewed-on: https://skia-review.googlesource.com/c/195920 Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Michael Ludwig <michaelludwig@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-02-27 20:12:20 +00:00
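This is a minimal sketch of a lane-wise select specialized along the lines the commit describes. It picks SSE4.1, SSE2, or NEON from compiler-defined macros and falls back to a portable bitwise select otherwise. Arrays stand in for int4/float4, and the function name is hypothetical. The SSE2 branch stands in for the commit's plain-SSE path, because loading and reinterpreting an integer mask uses SSE2 intrinsics. Each mask lane is assumed to be all-ones or all-zeros, as vector comparisons produce; _mm_blendv_ps only inspects the sign bit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSE4_1__)
    #include <smmintrin.h>
#elif defined(__SSE2__)
    #include <emmintrin.h>
#elif defined(__ARM_NEON)
    #include <arm_neon.h>
#endif

// out[i] = cond[i] ? t[i] : e[i], for 4 lanes.
void if_then_else4(const int32_t cond[4], const float t[4], const float e[4], float out[4]) {
#if defined(__SSE4_1__)
    __m128 c = _mm_castsi128_ps(_mm_loadu_si128(reinterpret_cast<const __m128i*>(cond)));
    _mm_storeu_ps(out, _mm_blendv_ps(_mm_loadu_ps(e), _mm_loadu_ps(t), c));  // t where c set
#elif defined(__SSE2__)
    __m128 c = _mm_castsi128_ps(_mm_loadu_si128(reinterpret_cast<const __m128i*>(cond)));
    _mm_storeu_ps(out, _mm_or_ps(_mm_and_ps(c, _mm_loadu_ps(t)),        // c & t
                                 _mm_andnot_ps(c, _mm_loadu_ps(e))));   // ~c & e
#elif defined(__ARM_NEON)
    uint32x4_t c = vreinterpretq_u32_s32(vld1q_s32(cond));
    vst1q_f32(out, vbslq_f32(c, vld1q_f32(t), vld1q_f32(e)));          // bitwise select
#else
    for (int i = 0; i < 4; i++) {
        uint32_t tb, eb, m = static_cast<uint32_t>(cond[i]);
        std::memcpy(&tb, &t[i], sizeof tb);
        std::memcpy(&eb, &e[i], sizeof eb);
        uint32_t ob = (m & tb) | (~m & eb);
        std::memcpy(&out[i], &ob, sizeof ob);
    }
#endif
}
```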
Mike Klein	cd9ef731fe	Mask -> M Just trying to get things mostly under 100 cols. Change-Id: Ifc8f4f0b78a89dfc5ba6ca2e310e969f1880e194 Reviewed-on: https://skia-review.googlesource.com/c/191001 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-02-09 19:11:18 +00:00
Mike Klein	dcfc3ef110	skvx wip - remove ALWAYS_INLINE until we find we need it - make bit_puns explicit - implement everything recursively so, e.g. sqrt(float8) picks up sqrt(float4) when not otherwise specialized. - implement SSE specializations: of the operations I tested, only sqrt, rcp, and rsqrt needed any help. The others look good as-is. Change-Id: I1b679c7bd9a99f952272b118d7ade2469b55d604 Reviewed-on: https://skia-review.googlesource.com/c/190222 Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-02-07 20:06:46 +00:00
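This minimal sketch shows the recursive structure. A generic N-lane operation splits into two N/2 halves until it reaches either a specialized size or the one-lane base case, so a single SSE overload for four lanes also speeds up eight lanes. `Vec` and `vsqrt` are hypothetical simplified stand-ins, and N is assumed to be a power of two.

```cpp
#include <cassert>
#include <cmath>
#if defined(__SSE__)
    #include <xmmintrin.h>
#endif

template <int N> struct Vec { float vals[N]; };

// Base case: one lane.
inline Vec<1> vsqrt(const Vec<1>& x) { return {{std::sqrt(x.vals[0])}}; }

#if defined(__SSE__)
// Specialized case: four lanes map to one SSE instruction.
inline Vec<4> vsqrt(const Vec<4>& x) {
    Vec<4> r;
    _mm_storeu_ps(r.vals, _mm_sqrt_ps(_mm_loadu_ps(x.vals)));
    return r;
}
#endif

// Generic case: split into halves and recurse. The non-template overloads
// above win overload resolution when the half size matches them.
template <int N>
Vec<N> vsqrt(const Vec<N>& x) {
    Vec<N / 2> lo, hi;
    for (int i = 0; i < N / 2; i++) { lo.vals[i] = x.vals[i]; hi.vals[i] = x.vals[N / 2 + i]; }
    Vec<N / 2> rlo = vsqrt(lo), rhi = vsqrt(hi);
    Vec<N> r;
    for (int i = 0; i < N / 2; i++) { r.vals[i] = rlo.vals[i]; r.vals[N / 2 + i] = rhi.vals[i]; }
    return r;
}
```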
Mike Klein	53a5298a2f	add mad() and shuffle() to SkVx Change-Id: Ie3e5b353f84e74d398a5350dc0baff5541789119 Reviewed-on: https://skia-review.googlesource.com/c/189982 Commit-Queue: Mike Klein <mtklein@google.com> Commit-Queue: Herb Derby <herb@google.com> Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com>	2019-02-06 21:12:48 +00:00
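This std::array sketch illustrates the semantics of the two helpers; the namespace and signatures are assumptions, not SkVx's. mad(x, y, z) is lane-wise x*y + z. shuffle<Ix...>(x) picks lanes by compile-time index, so it can reorder, duplicate, or drop lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

// Lane-wise x*y + z.
template <typename T, std::size_t N>
std::array<T, N> mad(const std::array<T, N>& x, const std::array<T, N>& y,
                     const std::array<T, N>& z) {
    std::array<T, N> r{};
    for (std::size_t i = 0; i < N; i++) r[i] = x[i] * y[i] + z[i];
    return r;
}

// Output lane k is x[Ix_k]; the output can have a different lane count.
template <int... Ix, typename T, std::size_t N>
std::array<T, sizeof...(Ix)> shuffle(const std::array<T, N>& x) {
    return {{ x[Ix]... }};
}

}  // namespace sketch
```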
Mike Klein	429251513f	fill in most remaining skvx operations Obviously lots of these new operations like sqrt() will want platform specialization. That'll come later. Change-Id: Ia0758425d4ec5911968a3d0ad63fa387b9b4cb39 Reviewed-on: https://skia-review.googlesource.com/c/189848 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com> Auto-Submit: Mike Klein <mtklein@google.com>	2019-02-06 20:03:24 +00:00
Mike Klein	455c74797b	sketch SkVx Change-Id: I1cb8113af243ed6327179d295835295834a752aa Reviewed-on: https://skia-review.googlesource.com/c/189581 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com> Auto-Submit: Mike Klein <mtklein@google.com>	2019-02-06 16:06:32 +00:00

21 Commits