e0fe62adaa
store128() has been lowered into SkVM Ops strangely (two interlocking 64-bit stores) only because of SkVM's limit of three arguments per Op. With four arguments we can lower store128() in a straightforward way. Perhaps surprisingly, I've left the implementations of store128 fairly naive, with narrower stores than having all this data together in one place allows. I do want to follow up here, not so much because the speed of store128 is important, but because getting the tools in place for idiomatic store128 implementations will lead us down a path with great knock-on effects for more interesting features. We'll need four adjacent temporary registers to use the ARM-idiomatic st2.4s/st4.4s approaches for store64/store128, and the idiomatic x86 implementations need multiple temporary registers too. Once we're able to manage multiple adjacent registers as a unit, we'll be able to stretch the idea to things like load64/load128 returning 2 or 4 registers' worth of data from a single Op. The ultimate goal is Half-is-fp16 mode, where we'll be able to fill one register with 16-bit float/int/mask data and spread any 32-bit data across a register pair. Change-Id: Ieb20d8b7d00e9d806cb27fd30ebfd50ae9317da7 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/355936 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com> |
||
---|---|---|
.. | ||
Sk4px_NEON.h | ||
Sk4px_none.h | ||
Sk4px_SSE2.h | ||
SkBitmapProcState_opts.h | ||
SkBlitMask_opts.h | ||
SkBlitRow_opts.h | ||
SkChecksum_opts.h | ||
SkOpts_avx.cpp | ||
SkOpts_crc32.cpp | ||
SkOpts_hsw.cpp | ||
SkOpts_skx.cpp | ||
SkOpts_sse41.cpp | ||
SkOpts_sse42.cpp | ||
SkOpts_ssse3.cpp | ||
SkRasterPipeline_opts.h | ||
SkSwizzler_opts.h | ||
SkUtils_opts.h | ||
SkVM_opts.h | ||
SkXfermode_opts.h |
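
The commit message above contrasts two ways of lowering store128(): one operation that takes all four 32-bit values, versus two interlocking 64-bit stores. The following is a minimal sketch of that contrast in plain C++, not Skia's actual code. Every name in it (`kLanes`, `store128_direct`, `store64_half`, `store128_split`) is hypothetical, and the reading of "interlocking" as "each store fills one 8-byte half of every 16-byte slot" is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration only: none of these names come from Skia.
constexpr std::size_t kLanes = 4;  // assumed number of lanes per "register"

// Four-argument form: each lane i writes x[i], y[i], z[i], w[i] as one
// contiguous 16-byte slot.
void store128_direct(void* dst,
                     const uint32_t* x, const uint32_t* y,
                     const uint32_t* z, const uint32_t* w) {
    auto* out = static_cast<unsigned char*>(dst);
    for (std::size_t i = 0; i < kLanes; i++) {
        const uint32_t slot[4] = { x[i], y[i], z[i], w[i] };
        std::memcpy(out + 16 * i, slot, sizeof(slot));
    }
}

// Two-value form: writes one 8-byte half (half = 0 or 1) of every
// 16-byte slot, leaving the other half untouched.
void store64_half(void* dst, int half,
                  const uint32_t* lo, const uint32_t* hi) {
    auto* out = static_cast<unsigned char*>(dst);
    for (std::size_t i = 0; i < kLanes; i++) {
        const uint32_t pair[2] = { lo[i], hi[i] };
        std::memcpy(out + 16 * i + 8 * half, pair, sizeof(pair));
    }
}

// The same memory image built from two interlocking 64-bit stores
// (assumed interpretation of the older lowering).
void store128_split(void* dst,
                    const uint32_t* x, const uint32_t* y,
                    const uint32_t* z, const uint32_t* w) {
    store64_half(dst, 0, x, y);
    store64_half(dst, 1, z, w);
}
```

Under these assumptions both functions produce the same bytes. The split form exists only to fit an operation limited to fewer value arguments, which is the constraint the commit message says has been lifted.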