4985db413d
Turns out Clang's a lot better at auto-vectorizing "obvious" scalar code into obvious vector code when it's written out the long way, e.g.

    F32x4 x = ...;
    x = { sqrtf(x[0]), sqrtf(x[1]), sqrtf(x[2]), sqrtf(x[3]) };

vectorizes into sqrtps a lot more reliably than our recurse-onto-scalars strategy, and also better than the other naive approach,

    F32x4 x = ...;
    for (int i = 0; i < 4; i++) { x[i] = sqrtf(x[i]); }

So here I've added a map(V, fn) -> V' using C++14 tricks to let the compiler handle the expansion of x = { fn(x[0]), fn(x[1]), ... fn(x[N-1]) } for any N, and implemented most skvx scalar fallback code using that.

With these now vectorizing well at any N, we can remove any specializations we'd written for particular N, really tidying up.

Over in the SkVM interpreter, this is a big improvement for ceil and floor, which were being done 2 floats at a time instead of 8. They're now slimmed way down to

    shlq      $6, %r13
    vroundps  $K, (%r12,%r13), %ymm0
    vroundps  $K, 32(%r12,%r13), %ymm1
    jmp       ...

where K is 9 or 10 depending on the op.

I haven't found a scalar function that Clang will vectorize to vcvtps2dq (the rounding one, not truncating vcvttps2dq), so I've kept lrint() written the long way, updated to the style I've been using lately with specializations inline.

Change-Id: Ia97abe3c876008228bf62b1daacd6f6140408fc4
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317375
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
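The map(V, fn) -> V' expansion described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from the change itself: the `sketch` namespace, the `Vec` struct, and the `map_impl` and `sqrt4` helpers are all hypothetical stand-ins, and the "C++14 tricks" are assumed here to mean `std::index_sequence` with a pack expansion.

```cpp
#include <cassert>   // only needed if you assert on the results
#include <cmath>
#include <cstddef>
#include <utility>

namespace sketch {

// Hypothetical stand-in for a SIMD vector type; NOT the real skvx::Vec.
template <int N, typename T>
struct Vec {
    T vals[N];
    T  operator[](int i) const { return vals[i]; }
    T& operator[](int i)       { return vals[i]; }
};

// The pack expansion fn(x[I])... becomes fn(x[0]), fn(x[1]), ..., fn(x[N-1]),
// i.e. the "written out the long way" form, generated for any N.
template <int N, typename T, typename Fn, std::size_t... I>
auto map_impl(const Vec<N,T>& x, Fn&& fn, std::index_sequence<I...>)
        -> Vec<N, decltype(fn(x[0]))> {
    return { { fn(x[I])... } };
}

// map(V, fn) -> V': the lane type of the result is whatever fn returns.
template <int N, typename T, typename Fn>
auto map(const Vec<N,T>& x, Fn&& fn) {
    return map_impl(x, fn, std::make_index_sequence<N>{});
}

// Example scalar fallback built on map(): lane-wise square root.
inline Vec<4,float> sqrt4(const Vec<4,float>& x) {
    return map(x, [](float v) { return std::sqrt(v); });
}

}  // namespace sketch
```

Calling `sketch::sqrt4` on a vector holding 1, 4, 9, 16 yields lanes 1, 2, 3, 4. Whether the compiler actually emits a single sqrtps for this depends on the compiler, flags, and target, so it is worth checking the generated assembly rather than assuming it.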
android
c
codec
config
core
docs
effects
encode
gpu
pathops
ports
private
svg
third_party
utils