Commit Graph

45 Commits

Author SHA1 Message Date
Mike Klein
9a6efa6be2 move scalar functions out of SkVx.h
Now that we have skvx::map(), anyone can write this sort of
scalar-to-vector code.  There are no vector instructions for these, so
they'll never going to be particularly interesting for SkVx to provide.

We did work out _approximate_ versions of each of these for SkVM, and
that's what we use to evaluate these programs there.  So if this stuff
really matters we could port that logic back over to SkVx.h.

But in terms of pure refactoring, I think this is where we want to sit
until we decide to use those approximations.  I don't really want to
invest much time in the SkSLByteCode interpreter any more.

Change-Id: I4e595dee5fd9e608905305e46b2aebcab986c561
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/326277
Commit-Queue: Brian Osman <brianosman@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
2020-10-14 14:24:46 +00:00
Mike Klein
6b8b2ea6be move cfi stifle post-refactor
Cq-Include-Trybots: luci.chromium.try:linux_chromium_cfi_rel_ng
Bug: chromium:1137652, chromium:1137958
Change-Id: I8575b588f9a1ba89740b95382b2462338e34bec5
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/326478
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-10-14 00:27:07 +00:00
Mike Klein
a221f1c36d remove skvx::{rsqrt,rcp}
These don't return reliable portable results, so I don't want to promote
them as good ideas to use.  You can get at least 5 different results
from these across the four main architectures we support, and they've
been the root cause of bugs uncovered only in production on undertested
platforms.

Luckily, unused outside of tests.

Change-Id: I532731fe4cddf127253341e5ace8d9c5c9ebb0f1
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/326108
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-10-13 15:52:56 +00:00
Mike Klein
3637a44a36 update comments and rearrange SkVx.h
Just a little refactor no-op.

Change-Id: I1842a0190cd96c60da2fe3c7f88fa56c9f73af81
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/325681
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-10-12 20:06:13 +00:00
Mike Klein
840e8ea740 power up skvx::map
Rewrite map() to allow any number of arguments,
now also used for 2-argument (pow) and 3-argument (fma) operations.

I left a note about fma()... I can't understand why, but calling as
map(fmaf, x,y,z) ends up with scalar calls to fmaf(), but with the
lambda indirection we see perfect vector codegen.

I had to break map() back into two parts. I don't see any way to pass
both a variadic number of arguments and play our trick with the default
std::index_sequence parameter.  The lane lambda similarly exists only to
split up the expansion of the Rest... type pack from the I... index
pack; you can't use two pack expansions in the same expression.

Change-Id: Ia156a7fd846237f687d6018a7f95550c9fd4a56d
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/325736
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-10-12 19:43:33 +00:00
Mike Klein
7e129b8b89 Reland "refactor any()/all(), ptest for all()"
This is a reland of e24f7f3de7
... with fix for ~0 constants for the pedantic MSVC.

Original change's description:
> refactor any()/all(), ptest for all()
>
> Part of this is a simple refactor, adapting any() and all() to the new
> style of specialization.
>
> And with that refactor in place, add AVX2/SSE4.1 for all() using ptest.
> This isn't terribly important, but it does help make Op::asserts run
> faster in the SkVM interpreter.  I like to run with asserts enabled, and
> this makes passing asserts much cheaper---failing asserts are expensive
> still of course, printing to SkDebugf(), etc.
>
> Change-Id: Iebdeee701fab7c50cce8e457674b565f7dd2ec21
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317422
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>

Cq-Include-Trybots: luci.skia.skia.primary:Build-Win-MSVC-x86_64-Debug
Change-Id: I93f08177ef3439e65e4383cc517dba60c0c4ef3e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317638
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-09-17 21:19:04 +00:00
Mike Klein
952f8f17e4 Reland "update skvx scalar-fallback strategy"
This is a reland of 4985db413d

...with a better implementation of map().  I don't understand
why we had to revert, but it had something with calling the
function pointer in map_(), so maybe this will help.

I've flattened the map_() / map() merge CL into this one,
and marked the resulting map() as no_sanitize("cfi").  I
don't see anything wrong, so I think it's a false positive.

Original change's description:
> update skvx scalar-fallback strategy
>
> Turns out Clang's a lot better at auto-vectorizing "obvious" scalar code
> into obvious vector code when it's written out the long way, e.g.
>
>      F32x4 x = ...;
>      x = { sqrtf(x[0]), sqrtf(x[1]), sqrtf(x[2]), sqrtf(x[3]) };
>
> vectorizes into sqrtps a lot more reliably than our recurse-onto-scalars
> strategy, and also better than the other naive approach,
>
>      F32x4 x = ...;
>      for (int i = 0; i < 4; i++) { x[i] = sqrtf(x[i]); }
>
> So here I've added a map(V, fn) -> V' using C++14 tricks to let the
> compiler handle the expansion of x = { fn(x[0]), fn(x[1]), ...
> fn(x[N-1]) } for any N, and implemented most skvx scalar fallback code
> using that.
>
> With these now vectorizing well at any N, we can remove any
> specializations we'd written for particular N, really tidying up.
>
> Over in the SkVM interpreter, this is a big improvement for ceil and
> floor, which were being done 2 floats at a time instead of 8.  They're
> now slimmed way down to
>
>    shlq       $6, %r13
>    vroundps   $K, (%r12,%r13), %ymm0
>    vroundps   $K, 32(%r12,%r13), %ymm1
>    jmp        ...
>
> where K is 9 or 10 depending on the op.
>
> I haven't found a scalar function that Clang will vectorize to vcvtps2pd
> (the rounding one, not truncating vcvttps2pd), so I've kept lrint()
> written the long way, updated to the style I've been using lately with
> specializations inline.
>
> Change-Id: Ia97abe3c876008228bf62b1daacd6f6140408fc4
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317375
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>

Cq-Include-Trybots: luci.chromium.try:linux_chromium_cfi_rel_ng
Bug: chromium:1129408
Change-Id: Ia9c14074b9a14a67dd221f4925894d35a551f9d7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317551
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-09-17 19:58:34 +00:00
Mike Klein
ed423aa542 Revert "update skvx scalar-fallback strategy"
This reverts commit 4985db413d.

Reason for revert:

../../third_party/skia/include/private/SkVx.h:491:14: runtime error: control flow integrity check for type 'float (float)' failed during indirect function call
(/lib/x86_64-linux-gnu/libm.so.6+0x36460): note: (unknown) defined here
../../third_party/skia/include/private/SkVx.h:491:14: note: check failed in /b/s/w/ir/out/Release/viz_unittests, destination function located in /lib/x86_64-linux-gnu/libm.so.6
    #0 0x55e964d3c1f9 in skvx::Vec<4, float> skvx::map_<4, float, float, 0ul, 1ul, 2ul, 3ul>(skvx::Vec<4, float> const&, float (*)(float), std::__1::integer_sequence<unsigned long, 0ul, 1ul, 2ul, 3ul>) ./../../third_party/skia/include/private/SkVx.h:491

I don't understand what's wrong here, but I have a better map() coming up anyway.

Original change's description:
> update skvx scalar-fallback strategy
> 
> Turns out Clang's a lot better at auto-vectorizing "obvious" scalar code
> into obvious vector code when it's written out the long way, e.g.
> 
>      F32x4 x = ...;
>      x = { sqrtf(x[0]), sqrtf(x[1]), sqrtf(x[2]), sqrtf(x[3]) };
> 
> vectorizes into sqrtps a lot more reliably than our recurse-onto-scalars
> strategy, and also better than the other naive approach,
> 
>      F32x4 x = ...;
>      for (int i = 0; i < 4; i++) { x[i] = sqrtf(x[i]); }
> 
> So here I've added a map(V, fn) -> V' using C++14 tricks to let the
> compiler handle the expansion of x = { fn(x[0]), fn(x[1]), ...
> fn(x[N-1]) } for any N, and implemented most skvx scalar fallback code
> using that.
> 
> With these now vectorizing well at any N, we can remove any
> specializations we'd written for particular N, really tidying up.
> 
> Over in the SkVM interpreter, this is a big improvement for ceil and
> floor, which were being done 2 floats at a time instead of 8.  They're
> now slimmed way down to
> 
>    shlq       $6, %r13
>    vroundps   $K, (%r12,%r13), %ymm0
>    vroundps   $K, 32(%r12,%r13), %ymm1
>    jmp        ...
> 
> where K is 9 or 10 depending on the op.
> 
> I haven't found a scalar function that Clang will vectorize to vcvtps2pd
> (the rounding one, not truncating vcvttps2pd), so I've kept lrint()
> written the long way, updated to the style I've been using lately with
> specializations inline.
> 
> Change-Id: Ia97abe3c876008228bf62b1daacd6f6140408fc4
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317375
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>

TBR=mtklein@google.com,herb@google.com

Change-Id: I27b5eff3328bf2ddf7063ee0dee14a378ff23b89
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317546
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-09-17 14:10:10 +00:00
Mike Klein
7e8f13e751 Revert "refactor any()/all(), ptest for all()"
This reverts commit e24f7f3de7.

Reason for revert: Build-Win-MSVC-x86_64-Debug


Original change's description:
> refactor any()/all(), ptest for all()
> 
> Part of this is a simple refactor, adapting any() and all() to the new
> style of specialization.
> 
> And with that refactor in place, add AVX2/SSE4.1 for all() using ptest.
> This isn't terribly important, but it does help make Op::asserts run
> faster in the SkVM interpreter.  I like to run with asserts enabled, and
> this makes passing asserts much cheaper---failing asserts are expensive
> still of course, printing to SkDebugf(), etc.
> 
> Change-Id: Iebdeee701fab7c50cce8e457674b565f7dd2ec21
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317422
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>

TBR=mtklein@google.com,herb@google.com

Change-Id: Ib3ecbe93aa9d14b10dd87e8aa247f275c2c3eb67
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317545
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-09-17 14:08:50 +00:00
Mike Klein
e24f7f3de7 refactor any()/all(), ptest for all()
Part of this is a simple refactor, adapting any() and all() to the new
style of specialization.

And with that refactor in place, add AVX2/SSE4.1 for all() using ptest.
This isn't terribly important, but it does help make Op::asserts run
faster in the SkVM interpreter.  I like to run with asserts enabled, and
this makes passing asserts much cheaper---failing asserts are expensive
still of course, printing to SkDebugf(), etc.

Change-Id: Iebdeee701fab7c50cce8e457674b565f7dd2ec21
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317422
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-09-17 13:37:58 +00:00
Mike Klein
4985db413d update skvx scalar-fallback strategy
Turns out Clang's a lot better at auto-vectorizing "obvious" scalar code
into obvious vector code when it's written out the long way, e.g.

     F32x4 x = ...;
     x = { sqrtf(x[0]), sqrtf(x[1]), sqrtf(x[2]), sqrtf(x[3]) };

vectorizes into sqrtps a lot more reliably than our recurse-onto-scalars
strategy, and also better than the other naive approach,

     F32x4 x = ...;
     for (int i = 0; i < 4; i++) { x[i] = sqrtf(x[i]); }

So here I've added a map(V, fn) -> V' using C++14 tricks to let the
compiler handle the expansion of x = { fn(x[0]), fn(x[1]), ...
fn(x[N-1]) } for any N, and implemented most skvx scalar fallback code
using that.

With these now vectorizing well at any N, we can remove any
specializations we'd written for particular N, really tidying up.

Over in the SkVM interpreter, this is a big improvement for ceil and
floor, which were being done 2 floats at a time instead of 8.  They're
now slimmed way down to

   shlq       $6, %r13
   vroundps   $K, (%r12,%r13), %ymm0
   vroundps   $K, 32(%r12,%r13), %ymm1
   jmp        ...

where K is 9 or 10 depending on the op.

I haven't found a scalar function that Clang will vectorize to vcvtps2pd
(the rounding one, not truncating vcvttps2pd), so I've kept lrint()
written the long way, updated to the style I've been using lately with
specializations inline.

Change-Id: Ia97abe3c876008228bf62b1daacd6f6140408fc4
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317375
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-09-16 20:37:18 +00:00
Mike Klein
7b1620f366 refactor skvx min/max
Implement min and max using if_then_else(y<x,...) on vectors
rather than recursing to std::min/std::max applied to scalars.

But actually, factor out and use naive_if_then_else(), which Clang can
reason through better than it can our specialized if_then_else().  This
lets every min() or max() I've looked at compile down to ideal codegen,
vmaxps, vpminsw, etc, where if you use if_then_else() you'd see the
literal comparison and blend as written.

I've been looking at q14x2 codegen in the interpreter, and most things
were already good, unexpectedly even uavg_q14x2.  The biggest surprise
was how bad the min/max codegen was, and looking back, even the min_f32
and max_f32 codegen is super bad.  This CL fixes all that, leaving us
with the ideal codegen using the specific instruction you'd want,
replacing a giant mess of code that recursed down to scalars.

mul_q14x2 is still bad, but an easy follow up.

Change-Id: I77b5d7c9aa20a9a2f5ceb3e40f1e18ace2a1b5c1
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317310
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-09-16 17:10:18 +00:00
Mike Klein
4108364efc _mm256_blendv_epi8 needs avx2
Change-Id: Ib10215e1e5a86bf78cc34f9dca670417bb217b73
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317271
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-09-16 03:40:01 +00:00
Mike Klein
c3ad6a1e59 make skvx::if_then_else work at byte granularity
The default implementation of if_then_else is logically bitwise,

   (cond & true_val) | (~cond & false_val)

The existing skvx specializations work only for 32-bit lanes, but we can
easily make them work for any type where the whole vector is the right
size by reducing the granularity down to byte level.

Existing code using 32-bit values and 0xffff'ffff or 0x0000'0000 masks
will continue to work the same.  But this now lets us use, e.g. 16-bit
values with 0xffff and 0x0000 masks, or even things like 32-bit values
and a mask like 0xff00ff00, selecting byte by byte.

We can't go any lower without falling back on the generic bitwise
implementation, so we'll have to settle for not getting to use a mask
like 0x0f0f0f0f.

Change-Id: I8518cb3cafc7f6e1480b4ae8af50daad2d28c5df
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317170
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-09-15 21:29:41 +00:00
Mike Klein
a1711092b2 skvx spring cleaning
- remove some workarounds
  - more SI/SIN/SIT/SINT use
  - rewrap a lot of code to 100 cols
  - etc. misc.

Change-Id: I78b7ff272afcbb8658cf147aad8af85d0e2acf42
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/314676
Auto-Submit: Mike Klein <mtklein@google.com>
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-09-02 15:22:55 +00:00
Jose Dapena Paz
dc4da5ac9a GCC: fix type passed to vcvt_f32_f16 to be a float16x4_t in SkVx.h
GCC intrisics type validation is stricter than the one in Clang, so
passing a uint16x4_t to a function expected to accept float16x4_t
is not valid.

Bug: chromium:819294
Change-Id: I6d68e5458345e78bdb05dd028481fe9cae36c5ff
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/307276
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-07-31 19:09:33 +00:00
Elliot Evans
6b2602ed2d Remove remaining usages of skvx::mad
This CL attempts to remove the remaining subset of skvx::mad usages.
https://skia-review.googlesource.com/c/skia/+/304853 removes all usages
of skvx::mad but causes small differences in rendering, so it is not
suitable for landing.

https://skia-review.googlesource.com/c/skia/+/306702/ removes all
non-nested usages of skvx::mad

Change-Id: Iab5d4cfd0feb856c38b3ebbfe3bf3ed5aad20fe6
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/306722
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Elliot Evans <elliotevans@google.com>
2020-07-30 20:41:09 +00:00
Mike Klein
4d680cdf07 a bunch of half-related stuff
- add f32<->f16 functions to skvx
  - add f32<->f16 x86 instructions to skvm::Assembler
  - add f32<->f16 ops to skvm,
    using the skvx functions in the interpreter

Still TODO:
    use the new x86 instructions in the JIT

(For now like in many other ways, the aarch64 JIT
continues to languish.  Will pick that back up one day.)

Change-Id: Ib8dc1ccdc75ecb23769ea4947d66d3ab22520f23
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/302942
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-07-15 20:47:31 +00:00
Elliot Evans
72b8aea019 Fix experimental_simd CanvasKit build.
Although it appeared that the experimental_simd CanvasKit build was
working, the build was not producing actual wasm SIMD operations. This
CL fixes that issue by changing the build arguments.

This issue also fixes an incorrect type issue with the SkVx wasm SIMD
implementation.

Bug: skia:10453
Change-Id: If26f84b09e4d84df36be589245878c821972dffc
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/302669
Reviewed-by: Kevin Lubick <kjlubick@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
2020-07-15 20:39:42 +00:00
Mike Klein
5cb47d6a88 refactor skvx::if_then_else()
First move if_then_else() specializations inline using a
quasi-constexpr-if approach, letting them operate on any types of the
right vector and lane size.  We can't use constexpr-if per se because
this header is sometimes used in C++14 contexts.

Then, add AVX specialization for 8x32-bit types.

SkVM's interpreter uses if_then_else() on three i32x16, and these
changes allow that to vectorize ideally, as two vblendvps instructions.

Change-Id: I8355c47975c736c1fbc32b1f8ceddb772978d271
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/302080
Auto-Submit: Mike Klein <mtklein@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
2020-07-13 15:02:07 +00:00
Elliot Evans
fe7e74b3a7 Add an experimental_simd build target to CanvasKit.
The `experimental_simd` build target builds
CanvasKit using the Emscripten `-msimd128` flag, to build CanvasKit
with SIMD instructions in the compiled WASM. This build of
CanvasKit works in Chrome Canary 86.0.4186.0
with chrome://flags#enable-webassembly-simd enabled.

Also add WebAssembly-specific intrinsics to SkVx.h to enable
support for almost all native SIMD operations in CanvasKit WebAssmebly.

Also add a Skia/modules/canvaskit/wasm_tools/SIMD folder which contains
build_simd_test.sh for testing whether WASM SIMD intrinsics operations
are actually being used by skvx, and for testing correctness of
WASM SIMD operations. Also contains simd_float_test.cpp and
simd_int_test.cpp which serve as documentation for which operations are
correctly turned into WASM SIMD operations by emscripten.

Bug: skia:10453
Change-Id: Icd312b4d189e8d8667d3ffe12a72bfa6febaab2f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/299705
Reviewed-by: Mike Klein <mtklein@google.com>
2020-06-30 22:52:31 +00:00
Florin Malita
3facc9c886 [skottie] Non-legacy brightness effect
Includes POW intrinsic plumbing.

Change-Id: Ida961718e28822c8559f17f97003f67082dd44cc
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/287156
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@google.com>
2020-05-04 14:04:24 +00:00
Mike Reed
8520e76f05 migrate spiral demo from canvaskit
- add atan, fract, dividef, subtractf

Also wants mix(), but I'm still learning how to handle 2 args
functions (e.g. how to support atan(y,x) as well)

Change-Id: Ib9f233cd1c4266110cfea68a7d444f834f875f1f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/286276
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Reed <reed@google.com>
2020-04-30 16:52:22 +00:00
Mike Klein
c2160258f3 add skvx::{sin,cos,tan}
This lets us get rid of VECTOR_UNARY_FN_VEC.

I don't know exactly what was wrong with VECTOR_UNARY_FN_VEC,
but `color.rgb = color.rgb + a*(sin(6.28*color.rgb)*0.159)` looks
ok to me now when run through the interpreter.

Change-Id: I700398cd55eca1b8e1b3b46858415ecae5585a32
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/286065
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-04-29 15:39:12 +00:00
Mike Klein
5caf7dee25 restore Op::round
While I think trunc(mad(x, scale, 0.5)) is fine for doing our float
to fixed point conversions, round(mul(x, scale)) was kind of better
all around:

   - better rounding than +0.5 and trunc
   - faster when mad() is not an fma
   - often now no need to use the constant 0.5f or have it in a register
   - allows the mul() in to_unorm to use mul_f32_imm

Those last two points are key... this actually frees up 2 registers in
the x86 JIT when using to_unorm().

So I think maybe we can resurrect round and still guarantee our desired
intra-machine stability by committing to using instructions that follow
the current rounding mode, which is what [v]cvtps2dq inextricably uses.

Left some notes on the ARM impl... we're rounding to nearest even there,
which is probably the current mode anyway, but to be more correct we
need a slightly longer impl that rounds float->float then "truncates".
Unsure whether it matters in practice.  Same deal in the unit test that
I added back, now testing negative and 0.5 cases too. The expectations
assume the current mode is nearest even.

I had the idea to resurrect this when I was looking at adding _imm Ops
for fma_f32.  I noticed that the y and z arguments to an fma_f32 were by
far most likely to be constants, and when they are, they're by far likely
to both be constants, e.g. 255.0f & 0.5f from to_unorm(8,...).

llvm disassembly for SkVM_round unit test looks good:

~ $ llc -mcpu=haswell /tmp/skvm-jit-1231521224.bc -o -
	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 15
	.globl	"_skvm-jit-1231521224"  ## -- Begin function skvm-jit-1231521224
	.p2align	4, 0x90
"_skvm-jit-1231521224":                 ## @skvm-jit-1231521224
	.cfi_startproc
	cmpl	$8, %edi
	jl	LBB0_3
	.p2align	4, 0x90
LBB0_2:                                 ## %loopK
                                        ## =>This Inner Loop Header: Depth=1
	vcvtps2dq	(%rsi), %ymm0
	vmovupd	%ymm0, (%rdx)
	addl	$-8, %edi
	addq	$32, %rsi
	addq	$32, %rdx
	cmpl	$8, %edi
	jge	LBB0_2
LBB0_3:                                 ## %hoist1
	xorl	%eax, %eax
	testl	%edi, %edi
	jle	LBB0_6
	.p2align	4, 0x90
LBB0_5:                                 ## %loop1
                                        ## =>This Inner Loop Header: Depth=1
	vcvtss2si	(%rsi,%rax), %ecx
	movl	%ecx, (%rdx,%rax)
	decl	%edi
	addq	$4, %rax
	testl	%edi, %edi
	jg	LBB0_5
LBB0_6:                                 ## %leave
	vzeroupper
	retq
	.cfi_endproc
                                        ## -- End function

Change-Id: Ib59eb3fd8a6805397850d93226c6c6d37cc3ab84
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/276738
Auto-Submit: Mike Klein <mtklein@google.com>
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-03-12 21:10:34 +00:00
Mike Klein
ec370976c6 move skvm interpreter to SkOpts again
This is the easiest way to guarantee Op::fma_f32
actually fuses, by using platform intrinsics.

While implementing this we noticed that quad-pumping
was actually slower than double-pumping by about 25%,
and single-pumping was between the two.  Switch from
quad to double pumping.

Change-Id: Ib93fd175fb8f6aaf49f769a95edfa9fd6b2674f6
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/275299
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-03-05 17:47:42 +00:00
Mike Klein
21ef0d5424 force-inline skvx methods
These are inline, but still subject to the ODR,
and in Debug builds they might not be inlined.

This fixes one unit test failure on the x86 Debug GCC Test bot.

Bug: skia:9664
Change-Id: Id3837fdfbf69bd7012339d89d16e8dedaf113de2
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/260520
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-12-17 22:25:18 +00:00
Mike Klein
7d3b27d90e free skvx from its Skia shackles
Remove the need to include SkTypes.h in SkVx.h,
making SkVx entirely independent of Skia.

As an experiment, switch to checking Clang/GCC-style __SSE__ /
__ARM_NEON defines directly instead of the slightly more abstract
SK_CPU_SSE_LEVEL / SK_ARM_HAS_NEON.

Those SK_ defines only exist to help SSE detection on MSVC, which SkVx
generates serial code for anyway.

If this sticks I may do this same sort of change all through Skia.

Change-Id: I1c51fd6ba1fa48f199ce623824d5ef20ff6be995
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219541
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-06-07 18:08:23 +00:00
Mike Klein
c0bd9f9fe5 rewrite includes to not need so much -Ifoo
Current strategy: everything from the top

Things to look at first are the manual changes:

   - added tools/rewrite_includes.py
   - removed -Idirectives from BUILD.gn
   - various compile.sh simplifications
   - tweak tools/embed_resources.py
   - update gn/find_headers.py to write paths from the top
   - update gn/gn_to_bp.py SkUserConfig.h layout
     so that #include "include/config/SkUserConfig.h" always
     gets the header we want.

No-Presubmit: true
Change-Id: I73a4b181654e0e38d229bc456c0d0854bae3363e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/209706
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Hal Canary <halcanary@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
2019-04-24 16:27:11 +00:00
Mike Klein
96e4e53cf1 Reland "align skvx::Vec<N,T> to N*sizeof(T)"
This is a reland of e3b110dc6e

PS1 is the original, so best to diff against that.
This is the original with compiler workarounds.

Original change's description:
> align skvx::Vec<N,T> to N*sizeof(T)
>
> This increases the alignment of these vector types.  I would have liked
> to keep the alignment minimal, but it's probably no big deal either way.
>
> In terms of code generation, it doesn't make much difference for x86 or
> ARMv8, but it seems hugely important for good ARMv7 NEON code.  It's a
> ~10x difference for the bench I've been playing around with that spends
> most of its time in that SkOpts::blit_row_color32 routine.
>
> Bug: chromium:952502
> Change-Id: Ib12caad6b9b3f3f6e821ed70bfb57099db37b15f
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208581
> Commit-Queue: Michael Ludwig <michaelludwig@google.com>
> Reviewed-by: Michael Ludwig <michaelludwig@google.com>
> Auto-Submit: Mike Klein <mtklein@google.com>

Bug: chromium:952502
Cq-Include-Trybots: skia.primary:Test-Win2016-MSVC-GCE-CPU-AVX2-x86-Release-All,Build-Debian9-GCC-mips64el-Debug
Change-Id: Ief99e14ab4de5a56840ed6bb326cf7669c51dc97
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208681
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-04-16 19:48:50 +00:00
Mike Klein
9a885b27f3 pass SkVx::Vec arguments as const&
Yet another surprising finding when looking at ARM code generation is
that passing these values to functions by const& does make a difference,
even when fully inlined.  I can only guess that the compiler's somehow
more sure that way that the values won't change?  Anyway, convert all
skvx functions that take Vec arguments to take const Vec& instead.

This tweak is enough to let the natural implementation of mull()
actually produce good code generation, so I've promoted that to SkVx.h
and added a unit test.  Notice in the NEON case we've got a base case at
N=8 and two recursive cases, one down to 8 as usual when N > 8, but also
one up to 8 when N < 8.

This also is another big speedup for ARMv7 NEON, bringing it to nearly
the same speed as ARMv8 NEON on the same device.

Bug: chromium:952502
Change-Id: I0f19bab45cf02222ccc8090053ea2a4a380f1dfe
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208582
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
2019-04-16 19:24:50 +00:00
Mike Klein
50303b9adc Revert "align skvx::Vec<N,T> to N*sizeof(T)"
This reverts commit e3b110dc6e.

Reason for revert: bot failures

Original change's description:
> align skvx::Vec<N,T> to N*sizeof(T)
> 
> This increases the alignment of these vector types.  I would have liked
> to keep the alignment minimal, but it's probably no big deal either way.
> 
> In terms of code generation, it doesn't make much difference for x86 or
> ARMv8, but it seems hugely important for good ARMv7 NEON code.  It's a
> ~10x difference for the bench I've been playing around with that spends
> most of its time in that SkOpts::blit_row_color32 routine.
> 
> Bug: chromium:952502
> Change-Id: Ib12caad6b9b3f3f6e821ed70bfb57099db37b15f
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208581
> Commit-Queue: Michael Ludwig <michaelludwig@google.com>
> Reviewed-by: Michael Ludwig <michaelludwig@google.com>
> Auto-Submit: Mike Klein <mtklein@google.com>

TBR=mtklein@google.com,michaelludwig@google.com

Change-Id: I72357b9775685efcc2cd75db220711c8145b8ac4
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: chromium:952502
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208680
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-04-16 18:25:54 +00:00
Mike Klein
e3b110dc6e align skvx::Vec<N,T> to N*sizeof(T)
This increases the alignment of these vector types.  I would have liked
to keep the alignment minimal, but it's probably no big deal either way.

In terms of code generation, it doesn't make much difference for x86 or
ARMv8, but it seems hugely important for good ARMv7 NEON code.  It's a
~10x difference for the bench I've been playing around with that spends
most of its time in that SkOpts::blit_row_color32 routine.

Bug: chromium:952502
Change-Id: Ib12caad6b9b3f3f6e821ed70bfb57099db37b15f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/208581
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
2019-04-16 17:59:22 +00:00
Mike Klein
3bad19cfc2 use __builtin_shufflevector when available
See https://clang.llvm.org/docs/LanguageExtensions.html#langext-builtin-shufflevector

It's basically exactly skvx::shuffle(), but allows two input vectors.
I just pass the same vector twice.

Change-Id: I3920e2b156b4b85843eaf197adb540d8296c5569
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207723
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
2019-04-11 21:11:58 +00:00
Mike Klein
4b44a0d01a add SkVx helpers for working with unorm8
These replicate the base logic of Sk4px::Wide::div255() and
Sk4px::approxMulDiv255(), and will come in handy replacing them.

No platform specializations yet... want to remind myself what
codegen they get from these vanilla versions first, and then
I'll fill in the platform specific stuff as needed.  The tests
should cover everything pretty exhaustively.

Change-Id: I5854d1bc0902a85cbb2351f669c4da7cc31a8775
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207683
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
2019-04-11 17:54:23 +00:00
Mike Klein
1c62426f96 make to_vec template parameters explicit
For some reason, Clang can infer <N,T> but GCC can't.
No big deal... we know exactly the ones we want anyway.

Change-Id: I15ba4d4edbd3bc0f37ebe3c2b6e411726cd9fb69
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207341
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
2019-04-10 19:45:55 +00:00
Mike Klein
da7b053527 tweak SkVx to play nicely with others
Was starting to use this and ran into a few problems with clashing
symbols, namely SI and cast().  Seemed simple enough to not use SI,
and to move all the free-standing types into skvx: skvx::cast,
skvx::shuffle, etc.

Change-Id: Ia5d8ef6d0ae5375bf80d76be88d16f0c9cde56e7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/207340
Commit-Queue: Mike Klein <mtklein@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
2019-04-10 19:40:05 +00:00
Mike Klein
f4438d56e9 skvx: allow more implicit conversions
Guarding the implict constructors and scalar/vector
operations with std::is_convertible ought to make SkVx
types feel more like normal C types, allowing implicit
conversions exactly when the scalar equivalents would.

This shouldn't change the behavior of any code, or make
anything new possible... just nicer to read and write.

Change-Id: Iff4b89012c5b8c7f7933e6841c925b81186bc614
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/201402
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
2019-03-14 19:11:21 +00:00
Mike Klein
e9fc58663e mark more methods inline
Since these are all already static, it doesn't have any real functional
impact in terms of linking or codegen.  But it does supress unused
function warnings in compilation units that don't use everything.

Add a new SI boilerplate macro to go along with SINT and SIT.

Change-Id: If2c09951b7453338dd20a3a88e3abbee5eefcd27
Reviewed-on: https://skia-review.googlesource.com/c/195921
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
2019-02-27 20:27:10 +00:00
Mike Klein
41b995c4fb specialize if_then_else(int4,float4,float4)
Add SSE, SSE4.1, and NEON specializations.

The if_then_else() unit tests in SkVxTest.cpp should cover this.

I had to give up on my dream of not using Skia headers for now.  There's
really no good way of knowing whether we've got SSE4.1 support in MSVC
except when we explicitly define SK_CPU_SSE_LEVEL=SK_CPU_SSE_LEVEL_SSE41.

This refactor to use SK_CPU_SSE_LEVEL let MSVC point out a slight
ordering problem that would cause an infinite loop calling any of
the specializions like sqrt(float2).  I believe moving them after
the float4 specializations will fix that.

Change-Id: I83639f378a182716d1b37e92b6d725472698f874
Reviewed-on: https://skia-review.googlesource.com/c/195920
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Michael Ludwig <michaelludwig@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-02-27 20:12:20 +00:00
Mike Klein
cd9ef731fe Mask -> M
Just trying to get things mostly under 100 cols.

Change-Id: Ifc8f4f0b78a89dfc5ba6ca2e310e969f1880e194
Reviewed-on: https://skia-review.googlesource.com/c/191001
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-02-09 19:11:18 +00:00
Mike Klein
dcfc3ef110 skvx wip
- remove ALWAYS_INLINE until we find we need it

 - make bit_puns explicit

 - implement everything recursively so, e.g.
   sqrt(float8) picks up sqrt(float4) when
   not otherwise specialized.

 - implement SSE specializations:
   of the operations I tested, only sqrt, rcp, and rsqrt
   needed any help.  The others look good as-is.

Change-Id: I1b679c7bd9a99f952272b118d7ade2469b55d604
Reviewed-on: https://skia-review.googlesource.com/c/190222
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-02-07 20:06:46 +00:00
Mike Klein
53a5298a2f add mad() and shuffle() to SkVx
Change-Id: Ie3e5b353f84e74d398a5350dc0baff5541789119
Reviewed-on: https://skia-review.googlesource.com/c/189982
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Herb Derby <herb@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2019-02-06 21:12:48 +00:00
Mike Klein
429251513f fill in most remaining skvx operations
Obviously lots of these new operations like sqrt() will want platform
specialization.  That'll come later.

Change-Id: Ia0758425d4ec5911968a3d0ad63fa387b9b4cb39
Reviewed-on: https://skia-review.googlesource.com/c/189848
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
2019-02-06 20:03:24 +00:00
Mike Klein
455c74797b sketch SkVx
Change-Id: I1cb8113af243ed6327179d295835295834a752aa
Reviewed-on: https://skia-review.googlesource.com/c/189581
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
2019-02-06 16:06:32 +00:00