184f601346
vpermps (added here) makes this very easy, with an index controlling what 32-bit values go where. An index of the form {0,2,4,6|?,?,?,?} will put the 4 low 32-bit halves of 4 64-bit values in lanes 0,1,2,3. We can use that twice to get all 8 low halves, then our new vperm2f128 to put them together. Conveniently vpermps can also load directly from memory:

    vpermps   (%rdi), {0,2,4,6|?,?,?,?}, lo
    vpermps 32(%rdi), {0,2,4,6|?,?,?,?}, hi
    vperm2f128 0x20, lo,hi, dst

We don't care what those top four indices are for load64_lo, so we'll use them as the indices for load64_hi. That makes the full index {0,2,4,6|1,3,5,7}, and load64_hi will just vperm2f128 the other 128 bits of lo/hi:

    vpermps   (%rdi), {?,?,?,?|1,3,5,7}, lo
    vpermps 32(%rdi), {?,?,?,?|1,3,5,7}, hi
    vperm2f128 0x31, lo,hi, dst

vpermps needs its index in a register, so we use a temporary for that. Our logical lo can alias dst, and hi can alias that index, so it's just one extra temporary register in the end.

Change-Id: Ie6a4efbf12ddada45dd09c0f580fa7350cf3019e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/305171
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
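
The data movement described in the commit message can be checked lane by lane. The following is a minimal Python sketch, not Skia's JIT code: it models each 256-bit register as a list of eight 32-bit lanes, and the helpers `vpermps`, `vperm2f128`, and `split_u64` are hypothetical stand-ins written for this illustration. Only `load64_lo`, `load64_hi`, the index {0,2,4,6|1,3,5,7}, and the immediates 0x20 and 0x31 come from the message itself. It assumes little-endian memory, so the low half of each 64-bit value sits in the even 32-bit lane.

```python
# Lane-level model of the load64_lo / load64_hi sequence (illustration only).
# A 256-bit register is modeled as a list of eight 32-bit lanes.

INDEX = [0, 2, 4, 6, 1, 3, 5, 7]  # the shared index {0,2,4,6|1,3,5,7}


def vpermps(src, idx):
    """Model of vpermps: lane i of the result is src[idx[i] & 7]."""
    return [src[i & 7] for i in idx]


def vperm2f128(imm, a, b):
    """Model of vperm2f128: each nibble of imm picks one 128-bit half."""
    def pick(sel):
        if sel & 0x8:                 # bit 3 of the nibble zeroes the half
            return [0, 0, 0, 0]
        src = b if sel & 0x2 else a   # bit 1: second or first source
        return src[4:] if sel & 0x1 else src[:4]  # bit 0: upper or lower half
    return pick(imm & 0xF) + pick((imm >> 4) & 0xF)


def split_u64(values):
    """Lay out 64-bit values as little-endian 32-bit lanes: low half first."""
    lanes = []
    for v in values:
        lanes += [v & 0xFFFFFFFF, (v >> 32) & 0xFFFFFFFF]
    return lanes


def load64_lo(values):
    """Low 32-bit halves of eight 64-bit values."""
    mem = split_u64(values)
    lo = vpermps(mem[:8], INDEX)   # models: vpermps   (%rdi), index, lo
    hi = vpermps(mem[8:], INDEX)   # models: vpermps 32(%rdi), index, hi
    return vperm2f128(0x20, lo, hi)  # lower halves of lo and hi


def load64_hi(values):
    """High 32-bit halves of eight 64-bit values."""
    mem = split_u64(values)
    lo = vpermps(mem[:8], INDEX)
    hi = vpermps(mem[8:], INDEX)
    return vperm2f128(0x31, lo, hi)  # upper halves of lo and hi
```

Because both loads use the same full index, the only difference between the two functions in this model is the `vperm2f128` immediate, which matches the register-sharing argument made in the message.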
||
---|---|---|
animations | ||
bench | ||
bin | ||
build/fuchsia | ||
build_overrides | ||
client_utils/android | ||
demos.skia.org | ||
dm | ||
docker | ||
docs/examples | ||
example | ||
experimental | ||
fuzz | ||
gm | ||
gn | ||
include | ||
infra | ||
modules | ||
platform_tools | ||
resources | ||
samplecode | ||
site | ||
specs | ||
src | ||
tests | ||
third_party | ||
tools | ||
.clang-format | ||
.clang-tidy | ||
.gitignore | ||
.gn | ||
AUTHORS | ||
BUILD.gn | ||
codereview.settings | ||
CONTRIBUTING | ||
CQ_COMMITTERS | ||
DEPS | ||
go.mod | ||
go.sum | ||
LICENSE | ||
OWNERS | ||
PRESUBMIT.py | ||
public.bzl | ||
README | ||
README.chromium | ||
RELEASE_NOTES.txt | ||
whitespace.txt |
Skia is a complete 2D graphics library for drawing text, geometries, and images. See full details and build instructions at https://skia.org.