Mike Klein fae63e477c hoist tbl masks
This adds a warmup phase that lets each instruction do any setup it
needs: adding lookup entries for splat and bytes and, on aarch64,
hoisting the tbl mask into a register when we can.

Oddly, this measures as a ~3x slowdown on the phone I'm testing, an
international Galaxy S9 with a Samsung Mongoose 3 processor.  My best
guess is that this somehow makes the processor think there's a
loop-carried dependency when there is not.  Anyway, we already know
that's a pretty crazy CPU (it reports FP16 compute but cannot do it), and
this does deliver a speedup on the Pixel 2's Kryo 280 / Cortex A73, so I
think I'll just swap back to testing with the Pixel 2 and forget about
that S9.

Here's a before/after code listing with a hoisted tbl mask.  In the
before case the mask is loaded inside the loop with `ldr q3, #152`; in
the after case it becomes `ldr q0, #168` outside the loop.  llvm-mca says
this should cut one cycle per loop, and with optimal out-of-order
execution the loop cost would drop from ~8.7 cycles to ~8.3.  In
practice, it looks like about a 15% speedup.
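
To make the hoisting concrete, here is a minimal scalar C++ sketch, not the JIT's actual code. `tbl` models the single-register AArch64 instruction; `load_alpha_mask`, the mask contents, and the `g_mask_loads` counter are hypothetical stand-ins for the literal-pool `ldr` and exist only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::array<uint8_t, 16>;  // one 128-bit register, viewed as bytes

// Scalar model of single-register `tbl`: each index byte selects a table
// byte, and an out-of-range index (>= 16) produces 0.
static Vec tbl(const Vec& table, const Vec& idx) {
    Vec out{};
    for (size_t i = 0; i < 16; i++) {
        out[i] = idx[i] < 16 ? table[idx[i]] : 0;
    }
    return out;
}

// Hypothetical stand-in for `ldr q, #offset`.  The counter models the cost
// of the load; the mask contents (alpha byte into both 16-bit lanes of
// each pixel, zero elsewhere) are an assumption.
static int g_mask_loads = 0;
static Vec load_alpha_mask() {
    g_mask_loads++;
    Vec m{};
    for (size_t px = 0; px < 4; px++) {
        const uint8_t alpha = static_cast<uint8_t>(4 * px + 3);
        m[4 * px + 0] = alpha;  m[4 * px + 1] = 0xff;  // 0xff selects 0
        m[4 * px + 2] = alpha;  m[4 * px + 3] = 0xff;
    }
    return m;
}

// "before": the mask is reloaded on every trip through the loop.
static std::vector<Vec> splat_alpha_in_loop(const std::vector<Vec>& src) {
    std::vector<Vec> out;
    out.reserve(src.size());
    for (const Vec& v : src) {
        const Vec mask = load_alpha_mask();
        out.push_back(tbl(v, mask));
    }
    return out;
}

// "after": the mask is loop-invariant, so it is loaded once up front.
static std::vector<Vec> splat_alpha_hoisted(const std::vector<Vec>& src) {
    const Vec mask = load_alpha_mask();
    std::vector<Vec> out;
    out.reserve(src.size());
    for (const Vec& v : src) {
        out.push_back(tbl(v, mask));
    }
    return out;
}

// Both variants must agree; only the number of mask loads differs.
static void check_hoisting(const std::vector<Vec>& src) {
    g_mask_loads = 0;
    const std::vector<Vec> a = splat_alpha_in_loop(src);
    const int loads_in_loop = g_mask_loads;
    g_mask_loads = 0;
    const std::vector<Vec> b = splat_alpha_hoisted(src);
    assert(a == b);
    assert(loads_in_loop == static_cast<int>(src.size()));
    assert(g_mask_loads == 1);
}
```

Calling `check_hoisting` on any non-empty input exercises both variants and checks that they produce the same bytes.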

before:
	ldr	q0, #188
	ldr	q1, #200
	cmp	x0, #4                  // =4
	b.lt	#76
	ldr	q2, [x1]
	ldr	q3, #152
	tbl	v3.16b, { v2.16b }, v3.16b
	sub	v3.8h, v0.8h, v3.8h
	ldr	q4, [x2]
	and	v5.16b, v4.16b, v1.16b
	ushr	v4.8h, v4.8h, #8
	mul	v5.8h, v5.8h, v3.8h
	ushr	v5.8h, v5.8h, #8
	mul	v3.8h, v4.8h, v3.8h
	bic	v3.16b, v3.16b, v1.16b
	orr	v3.16b, v5.16b, v3.16b
	add	v2.4s, v2.4s, v3.4s
	str	q2, [x2]
	add	x1, x1, #16             // =16
	add	x2, x2, #16             // =16
	sub	x0, x0, #4              // =4
	b.al	#-76
	cmp	x0, #1                  // =1
	b.lt	#76
	ldr	s2, [x1]
	ldr	q3, #72
	tbl	v3.16b, { v2.16b }, v3.16b
	sub	v3.8h, v0.8h, v3.8h
	ldr	s4, [x2]
	and	v5.16b, v4.16b, v1.16b
	ushr	v4.8h, v4.8h, #8
	mul	v5.8h, v5.8h, v3.8h
	ushr	v5.8h, v5.8h, #8
	mul	v3.8h, v4.8h, v3.8h
	bic	v3.16b, v3.16b, v1.16b
	orr	v3.16b, v5.16b, v3.16b
	add	v2.4s, v2.4s, v3.4s
	str	s2, [x2]
	add	x1, x1, #4              // =4
	add	x2, x2, #4              // =4
	sub	x0, x0, #1              // =1
	b.al	#-76
	ret

after:
	ldr	q0, #168
	ldr	q1, #180
	ldr	q2, #192
	cmp	x0, #4                  // =4
	b.lt	#72
	ldr	q3, [x1]
	tbl	v4.16b, { v3.16b }, v0.16b
	sub	v4.8h, v1.8h, v4.8h
	ldr	q5, [x2]
	and	v6.16b, v5.16b, v2.16b
	ushr	v5.8h, v5.8h, #8
	mul	v6.8h, v6.8h, v4.8h
	ushr	v6.8h, v6.8h, #8
	mul	v4.8h, v5.8h, v4.8h
	bic	v4.16b, v4.16b, v2.16b
	orr	v4.16b, v6.16b, v4.16b
	add	v3.4s, v3.4s, v4.4s
	str	q3, [x2]
	add	x1, x1, #16             // =16
	add	x2, x2, #16             // =16
	sub	x0, x0, #4              // =4
	b.al	#-72
	cmp	x0, #1                  // =1
	b.lt	#72
	ldr	s3, [x1]
	tbl	v4.16b, { v3.16b }, v0.16b
	sub	v4.8h, v1.8h, v4.8h
	ldr	s5, [x2]
	and	v6.16b, v5.16b, v2.16b
	ushr	v5.8h, v5.8h, #8
	mul	v6.8h, v6.8h, v4.8h
	ushr	v6.8h, v6.8h, #8
	mul	v4.8h, v5.8h, v4.8h
	bic	v4.16b, v4.16b, v2.16b
	orr	v4.16b, v6.16b, v4.16b
	add	v3.4s, v3.4s, v4.4s
	str	s3, [x2]
	add	x1, x1, #4              // =4
	add	x2, x2, #4              // =4
	sub	x0, x0, #1              // =1
	b.al	#-72
	ret
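
One reading of the loop body, as a per-pixel scalar C++ sketch. The literal-pool constants are not shown in the listing, so the values used here (256 in each 16-bit lane for the `sub`, 0x00ff00ff for the `and`/`bic` mask) are assumptions, as are premultiplied pixels with alpha in the top byte. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Multiply both 16-bit lanes of x by k, wrapping each lane to 16 bits
// (a scalar model of `mul v.8h`).
static uint32_t mul_16x2(uint32_t x, uint32_t k) {
    const uint32_t lo = ((x & 0xffff) * k) & 0xffff;
    const uint32_t hi = ((x >> 16) * k) & 0xffff;
    return lo | (hi << 16);
}

// True if every color byte is <= alpha, i.e. the pixel is premultiplied.
static bool is_premul(uint32_t p) {
    const uint32_t a = p >> 24;
    return (p & 0xff) <= a && ((p >> 8) & 0xff) <= a && ((p >> 16) & 0xff) <= a;
}

// Blend one source pixel over one destination pixel, following the
// instruction order of the listing.
static uint32_t srcover_swar(uint32_t s, uint32_t d) {
    // Assumed precondition: a premultiplied source keeps the final add
    // from carrying between bytes.
    assert(is_premul(s));

    const uint32_t kLow = 0x00ff00ff;     // assumed and/bic mask
    const uint32_t a    = s >> 24;        // tbl: alpha into both lanes
    const uint32_t inv  = 256 - a;        // sub: assumed constant 256

    uint32_t rb = d & kLow;               // and:  low byte of each lane
    uint32_t ga = (d >> 8) & kLow;        // ushr: high byte of each lane

    rb = (mul_16x2(rb, inv) >> 8) & kLow; // mul, ushr: result back in low bytes
    ga = mul_16x2(ga, inv) & ~kLow;       // mul, bic:  result stays in high bytes

    return s + (rb | ga);                 // orr, add
}
```

Under these assumptions an opaque source replaces the destination and a fully transparent source leaves it unchanged.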

Change-Id: I352a98d3ac2ad84c338330ef4cfae0292a0b32da
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/229064
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-07-23 00:16:27 +00:00
animations
bench Reland "hide drawlooper from paint" 2019-07-22 20:03:36 +00:00
bin Add bin/try-clients to trigger client tryjobs 2019-06-11 16:55:53 +00:00
build_overrides Update to Dawn ToT. 2019-07-22 16:20:36 +00:00
dm Add GrProtected parameter to all createBackendTexture variants 2019-07-01 19:46:18 +00:00
docker fix Dockerfile? 2019-06-24 16:34:59 +00:00
docs/examples Reland "hide drawlooper from paint" 2019-07-22 20:03:36 +00:00
example ModifierKey unifies sk_app::Window::ModifierKey & Sample::Click::ModifierKey 2019-07-09 16:16:41 +00:00
experimental sk_app, Sample: Unify InputState enum. 2019-07-16 14:51:03 +00:00
fuzz Make fuzzing use embedded test font 2019-06-03 16:29:21 +00:00
gm Reland "hide drawlooper from paint" 2019-07-22 20:03:36 +00:00
gn ccpr: Add an MSAA atlas mode 2019-07-19 20:52:17 +00:00
include Reland "hide drawlooper from paint" 2019-07-22 20:03:36 +00:00
infra Roll recipe dependencies (trivial). 2019-07-22 21:20:56 +00:00
modules [canvaskit] Return damage rect from ManagedAnimation::seek() 2019-07-22 18:29:05 +00:00
platform_tools tools: separate TimeUtils from AnimTimer 2019-07-12 15:05:01 +00:00
resources move hoist analysis back into Builder 2019-07-22 19:34:06 +00:00
samplecode Reland "hide drawlooper from paint" 2019-07-22 20:03:36 +00:00
site documentation/build, BUILDCONFIG: Visual Studio Build Tools 2019 2019-07-12 14:17:16 +00:00
specs [img-decode] Start on proposed new spec 2019-05-06 17:39:19 +00:00
src hoist tbl masks 2019-07-23 00:16:27 +00:00
tests let JIT code hoist when possible 2019-07-22 21:06:34 +00:00
third_party Update to Dawn ToT. 2019-07-22 16:20:36 +00:00
tools Reland "hide drawlooper from paint" 2019-07-22 20:03:36 +00:00
.clang-format restore .clang-format 2019-03-21 15:52:32 +00:00
.clang-tidy add google-build-namespaces to clang-tidy checks 2018-12-12 16:33:59 +00:00
.gitignore clean up some .gitignores 2019-05-15 19:55:45 +00:00
.gn Basic standalone GN configs. 2016-07-21 12:25:45 -07:00
AUTHORS Fix Metal includes breaking macOS local builds 2019-07-08 14:02:47 +00:00
BUILD.gn First draft of Dawn backend: clears are working. 2019-07-18 18:09:12 +00:00
codereview.settings Make uploading to Gerrit the default for Skia 2016-11-09 19:07:56 +00:00
CONTRIBUTING
CQ_COMMITTERS
DEPS Fix Lua DEPS 2019-07-22 18:14:05 +00:00
go.mod Update Go deps 2019-07-21 05:26:46 +00:00
go.sum Update Go deps 2019-07-21 05:26:46 +00:00
LICENSE BUG=skia:5602 2016-09-02 11:19:34 -07:00
OWNERS add OWNERS file 2017-12-01 19:50:19 +00:00
PRESUBMIT.py Reland "[infra] Make most builds idempotent" 2019-07-19 12:11:27 +00:00
public.bzl First draft of Dawn backend: clears are working. 2019-07-18 18:09:12 +00:00
README
README.chromium
whitespace.txt Whitespace test 2019-05-18 13:05:29 +00:00

Skia is a complete 2D graphic library for drawing Text, Geometries, and Images.

See full details, and build instructions, at https://skia.org.