342b1b2753
I'm staring at this assembly,

    vmovups (%rsi), %ymm3
    vpsrld  $24, %ymm3, %ymm4
    vpslld  $16, %ymm4, %ymm15
    vorps   %ymm4, %ymm15, %ymm4
    vpsubw  %ymm4, %ymm0, %ymm4

Just knowing that could be

    vmovups (%rsi), %ymm3
    vpshufb 0x??(%rip), %ymm3, %ymm4
    vpsubw  %ymm4, %ymm0, %ymm4

That is, instead of shifting, shifting, and bit-oring to create the 0a0a scale factor from ymm3, we could just byte shuffle directly using some pre-baked control pattern (stored at the end of the program like other constants).

pshufb lets you arbitrarily remix bytes from its argument and zero bytes, and NEON has a similar family of vtbl instructions, even including that same feature of injecting zeroes.

I think I've got this working, and the speedup is great, from 0.19 to 0.16 ns/px for I32_SWAR, and from 0.43 to 0.38 ns/px for I32.

Change-Id: Iab850275e826b4187f0efc9495a4b9eab4402c38
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220871
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>

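The equivalence the commit message relies on can be checked with a small scalar model. The sketch below is not Skia's code: `Lane`, `shuffle_bytes`, `shift_shift_or`, and the control table `kAlphaTo0a0a` are hypothetical names chosen for illustration, and the table is only one plausible control pattern, not necessarily the constant the JIT actually emits. It models a single 16-byte lane (on ymm registers, vpshufb shuffles each 128-bit lane independently) and treats each pixel as a little-endian 32-bit value, as on x86.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// One 128-bit lane, i.e. four 32-bit pixels (assumed layout for this sketch).
using Lane = std::array<uint8_t, 16>;

// Scalar model of pshufb on one lane: a control byte with its high bit set
// produces zero; otherwise its low four bits select a source byte.
Lane shuffle_bytes(const Lane& src, const Lane& control) {
    Lane dst{};
    for (std::size_t i = 0; i < dst.size(); i++) {
        dst[i] = (control[i] & 0x80) ? uint8_t{0} : src[control[i] & 0x0f];
    }
    return dst;
}

// Scalar model of the original sequence on each pixel:
// shift right by 24, shift that left by 16, then OR the two together.
Lane shift_shift_or(const Lane& src) {
    Lane dst{};
    for (std::size_t px = 0; px < 4; px++) {
        // Assemble the pixel as a little-endian 32-bit value.
        uint32_t v = uint32_t(src[4*px + 0])
                   | uint32_t(src[4*px + 1]) <<  8
                   | uint32_t(src[4*px + 2]) << 16
                   | uint32_t(src[4*px + 3]) << 24;
        uint32_t a = v >> 24;          // 0x000000aa
        uint32_t r = a | (a << 16);    // 0x00aa00aa, the "0a0a" pattern
        for (std::size_t k = 0; k < 4; k++) {
            dst[4*px + k] = static_cast<uint8_t>(r >> (8*k));
        }
    }
    return dst;
}

// Hypothetical control pattern: copy each pixel's top byte into its bytes
// 0 and 2, and inject zero (0x80) into bytes 1 and 3.
const Lane kAlphaTo0a0a = {
     3, 0x80,  3, 0x80,
     7, 0x80,  7, 0x80,
    11, 0x80, 11, 0x80,
    15, 0x80, 15, 0x80,
};
```

Under these assumptions, `shuffle_bytes(src, kAlphaTo0a0a)` and `shift_shift_or(src)` return the same lane for any `src`, which is why a single shuffle can stand in for the three-instruction sequence.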
||
---|---|---|
.. | ||
android_fonts | ||
empty_images | ||
fonts | ||
icc_profiles | ||
images | ||
invalid_images | ||
lua | ||
nima | ||
particles | ||
skottie | ||
text | ||
Cowboy.svg | ||
crbug769134.fil | ||
nov-talk-sequence.txt | ||
pdf_command_stream.txt | ||
README | ||
SkVMTest.expected |
The resources directory includes some third party content used by Skia. Licenses for that code are included in this file.

Openclipart

Openclipart uses the Creative Commons Zero 1.0 Public Domain License every time an artist uploads a piece of clipart to Openclipart to make it clear the artist is releasing the creative work for anyone to use for any reason, even commercially. This act of "sharing" is the foundation Openclipart is based upon. More details on the license can be found at https://creativecommons.org/publicdomain/zero/1.0/.

LGPL or compatible (as implied by inclusion in KDE SVN)
http://websvn.kde.org/trunk/tests/ksvgtests/custom/cowboy.svg