d3cc16c8bb
Like any other Instruction, the store*s are assigned a destination register d, which doesn't really make sense, but works perfectly as a temporary register. This means store8 doesn't need to reserve xmm/ymm15 as a temporary... it already has one naturally. As you might expect, the examples we have so far assign the consumed input x register as the d register, so things that used to look like: vpackusdw %ymm6,%ymm6,%ymm15; vpermq $0xd8,%ymm15,%ymm15; vpackuswb %ymm15,%ymm15,%ymm15; vmovq %xmm15,(%rdx) now look more like: vpackusdw %ymm6,%ymm6,%ymm6; vpermq $0xd8,%ymm6,%ymm6; vpackuswb %ymm6,%ymm6,%ymm6; vmovq %xmm6,(%rdx). Should be no perf difference, just simplified register bookkeeping. This may suggest splitting load8/store8 into finer instructions: two to do the physical loads and stores, and two for the 8->32 and 32->8 widen and narrow? On the other hand, load8 really is just one vpmovzxbd instruction, so it'd be a shame to split it. I suspect this will become clearer as I add 16-bit support. Change-Id: I7c2b4d6b1689d40b50382f65fc00c01c54529c8a Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220543 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@google.com> |
||
---|---|---|
animations | ||
bench | ||
bin | ||
dm | ||
docker | ||
docs/examples | ||
example | ||
experimental | ||
fuzz | ||
gm | ||
gn | ||
include | ||
infra | ||
modules | ||
platform_tools | ||
resources | ||
samplecode | ||
site | ||
specs | ||
src | ||
tests | ||
third_party | ||
tools | ||
.clang-format | ||
.clang-tidy | ||
.gitignore | ||
.gn | ||
AUTHORS | ||
BUILD.gn | ||
codereview.settings | ||
CONTRIBUTING | ||
CQ_COMMITTERS | ||
DEPS | ||
go.mod | ||
go.sum | ||
LICENSE | ||
OWNERS | ||
PRESUBMIT.py | ||
public.bzl | ||
README | ||
README.chromium | ||
whitespace.txt |
Skia is a complete 2D graphics library for drawing Text, Geometries, and Images. See full details and build instructions at https://skia.org.
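The register bookkeeping described in the commit message at the top of this page can be illustrated with a toy model. This is only a sketch under stated assumptions, not Skia's implementation: `Inst`, `assign_registers`, and `emit_store8` are hypothetical names, registers are plain integers, and the allocator simply reuses the lowest-numbered register that becomes free. It shows why a store that is given a destination register like every other instruction tends to receive the register of the input it consumes, so no dedicated scratch register such as ymm15 has to be reserved.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    """Hypothetical instruction record: an op name plus indices of the values it reads."""
    op: str
    args: tuple = ()


def assign_registers(program):
    """Assign every instruction (stores included) a destination register d.

    A register is released at the last instruction that reads its value,
    *before* d is chosen, so d can alias a dying input.
    """
    # last_use[v] = index of the last instruction that reads value v
    last_use = {}
    for i, inst in enumerate(program):
        for a in inst.args:
            last_use[a] = i

    free = []        # registers currently available for reuse
    next_fresh = 0   # next never-used register number
    dest = []        # dest[i] = register assigned to instruction i

    for i, inst in enumerate(program):
        # Release inputs whose value dies at this instruction.
        for a in dict.fromkeys(inst.args):
            if last_use[a] == i:
                free.append(dest[a])
        if free:
            free.sort()
            d = free.pop(0)
        else:
            d = next_fresh
            next_fresh += 1
        dest.append(d)
        # A result nobody reads (such as a store's d) is only a temporary.
        if i not in last_use:
            free.append(d)
    return dest


def emit_store8(x, d, ptr="%rdx"):
    """Return the store8 sequence from the commit message, using d as the temporary."""
    return [
        f"vpackusdw %ymm{x},%ymm{x},%ymm{d}",
        f"vpermq $0xd8,%ymm{d},%ymm{d}",
        f"vpackuswb %ymm{d},%ymm{d},%ymm{d}",
        f"vmovq %xmm{d},({ptr})",
    ]


# Two loads, an add that consumes both, and a store that consumes the sum.
program = [
    Inst("load8"),
    Inst("load8"),
    Inst("add", (0, 1)),
    Inst("store8", (2,)),
]
regs = assign_registers(program)
```

In this model the store's d ends up equal to the register holding its input, because that input dies at the store. Calling `emit_store8(6, 6)` reproduces the "after" sequence quoted in the commit message, and `emit_store8(6, 15)` reproduces the "before" sequence with its reserved temporary.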