Commit Graph

9 Commits

Author SHA1 Message Date
Mike Klein
45d9cc86b3 remove i16x2 ops
These are neat but mostly just a distraction for now.
I've left all the assembly in place and unit tested
to make putting these back easy when we want to.

Change-Id: Id2bd05eca363baf9c4e31125ee79e722ded54cb7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/283307
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-04-13 19:08:11 +00:00
Mike Klein
5b5dabcc56 make some SkVM benches uneven
N=15 and N=63 make for nice even looking profiles
on ARM and x86 respectively, with N=15 running 3
body loops and 3 tail loops on ARM, N=63 running
7 body loops and 7 tail loops on x86.

Change-Id: Ie7616bd99c949328bbb7d7048fc6f468ff1e3ad2
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/227220
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-07-15 02:38:26 +00:00
Mike Klein
1ae6ac81e6 add RP comparison for SkVM_Overhead
Looks like ~50ns overhead for RP vs ~14,000ns for SkVM.

Change-Id: I85ef73d3387657b14615fcfa5cfd9df5c2325343
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/223302
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-06-25 02:52:19 +00:00
Mike Klein
8b5cf82dca add SkVM_Overhead bench, simple improvements
This new bench lets us measure the overhead of program building,
optimization, and JITting.  Surprisingly, at head the optimization in
Builder::done() takes longer than the JIT.

The new bench clocks in around 40µs on my laptop at head,
  then 32µs after switching val_to_reg to be an std::vector,
  then 27µs after switching deaths to be an std::vector too,
  then 22µs after switching fIndex to be an SkTHashMap,
  then 20µs after calling program.reserve(fProgram.size()),
  then 19µs after switching JIT data maps to SkTHashMap too.

I tried swapping some std::vector for SkTDArray to no benefit, actually
a little detriment.  So I think this is roughly all the low-hanging
fruit, with time split now roughly equally between Builder::Done(),
JITting in Program::eval(), and the original calls to Builder
themselves.

Also disable perf dumps on Mac.  No real value there until I can dump a
dylib, and it's just one more thing I have to remember to disable before
running this sort of benchmark.

Change-Id: I1c6e58ed00ac94ad622c7d740712634f60787102
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222984
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-06-22 18:36:49 +00:00
Mike Klein
244ba5550b rearrange who mprotects
This moves the responsibility for allocating executable code out of
Assembler.  The pages Xbyak uses are obviously executable, so this is
redundant right now, but it'll let us switch to something simple like
std::vector<uint8_t> as we continue to cut out Xbyak.

Make how Program holds its cached JIT program slightly less of a mess.

Change-Id: I38d6f01006da1da60f4aed675e9ddf97de9aec52
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222575
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-06-21 18:40:47 +00:00
Mike Klein
397fc88fc0 first VEX ymm vector ops
- 32x8 i32 add,sub,mul
   - add I32_Naive bench/test builder to get better i32 mul coverage
   - minor refactoring all over

Change-Id: I13cc19ff37a2da0bcff289ba51baac08f456d6c5
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222485
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-06-20 18:20:00 +00:00
Mike Klein
7b7077cc36 centralize test/bench SkVM builders
Eliminate the duplicate functionality,
and better testing for the bench builders.

Change-Id: If20e52107738903f854aec431416e573d7a7d640
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218041
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-06-04 16:55:59 +00:00
Mike Klein
03ce675b5f fix SKVM_ benches
Things were running suspiciously well...

_I32 had a typo that cut out 3/4 of its multiplies...

_I32_SWAR was missing a mask operation needed to drop
the junk low byte of the high half after the multiply.

The bench times now make a bit more sense and are in line
with how much work we're actually doing: F32's the slowest,
I32 a little faster, and I32_SWAR fastest:

    curr/maxrss	loops	min	median	mean	max	stddev	samples   	config	bench
      35/36  MB	58	2.03ns	2.04ns	2.04ns	2.04ns	0%	▂▂▂▂▁▁█▁▂▁	nonrendering	SkVM_4096_I32_SWAR
      35/36  MB	42	3.44ns	3.48ns	3.49ns	3.59ns	1%	▂▆▅█▃▃▁▂▂▄	nonrendering	SkVM_4096_I32
      35/36  MB	30	4.9ns	5.21ns	5.11ns	5.33ns	3%	▆▇█▆▆▁▂▁▁▅	nonrendering	SkVM_4096_F32
      35/36  MB	203	0.696ns	0.697ns	0.705ns	0.758ns	3%	█▂▂▁▁▁▁▁▁▂	nonrendering	SkVM_4096_RP
      35/36  MB	942	0.188ns	0.188ns	0.188ns	0.189ns	0%	▂▁▂▁▃█▂▁▁▁	nonrendering	SkVM_4096_Opts

Change-Id: I2850dc3f9df1828f03499eb278b8231f48eaae63
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217982
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
2019-06-03 20:35:24 +00:00
Mike Klein
68c50d015b sketch an skvm
With all the thinking around a stack-based interpreter,
I figured I'd sketch out some ideas for a register VM too.

I kind of have the hunch that this is the direction that
will actually let us replace large amounts of Skia's CPU
backend with an efficient interpreter or JIT.

Change-Id: Ia2b5ba4a3fc27556f5b6ba95cd1ace46d3217403
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/216665
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2019-06-03 19:53:48 +00:00