skia2

Author	SHA1	Message	Date
Mike Klein	45d9cc86b3	remove i16x2 ops These are neat but mostly just a distraction for now. I've left all the assembly in place and unit tested to make putting these back easy when we want to. Change-Id: Id2bd05eca363baf9c4e31125ee79e722ded54cb7 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/283307 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com>	2020-04-13 19:08:11 +00:00
Mike Klein	5b5dabcc56	make some SkVM benches uneven N=15 and N=63 make for nice even looking profiles on ARM and x86 respectively, with N=15 running 3 body loops and 3 tail loops on ARM, N=63 running 7 body loops and 7 tail loops on x86. Change-Id: Ie7616bd99c949328bbb7d7048fc6f468ff1e3ad2 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/227220 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-07-15 02:38:26 +00:00
Mike Klein	1ae6ac81e6	add RP comparison for SkVM_Overhead Looks like ~50ns overhead for RP vs ~14,000ns for SkVM. Change-Id: I85ef73d3387657b14615fcfa5cfd9df5c2325343 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/223302 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-06-25 02:52:19 +00:00
Mike Klein	8b5cf82dca	add SkVM_Overhead bench, simple improvements This new bench lets us measure the overhead of program building, optimization, and JITting. Surprisingly, at head the optimization in Builder::done() takes longer than the JIT. The new bench clocks in around 40µs on my laptop at head, then 32µs after switching val_to_reg to be an std::vector, then 27µs after switching deaths to be an std::vector too, then 22µs after switching fIndex to be an SkTHashMap, then 20µs after calling program.reserve(fProgram.size()), then 19µs after switching JIT data maps to SkTHashMap too. I tried swapping some std::vector for SkTDArray to no benefit, actually a little detriment. So I think this is roughly all the low-hanging fruit, with time split now roughly equally between Builder::Done(), JITting in Program::eval(), and the original calls to Builder themselves. Also disable perf dumps on Mac. No real value there until I can dump a dylib, and it's just one more thing I have to remember to disable before running this sort of benchmark. Change-Id: I1c6e58ed00ac94ad622c7d740712634f60787102 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222984 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-06-22 18:36:49 +00:00
Mike Klein	244ba5550b	rearrange who mprotects This moves the responsibility for allocating executable code out of Assembler. The pages Xbyak uses are obviously executable, so this is redundant right now, but it'll let us switch to something simple like std::vector<uint8_t> as we continue to cut out Xbyak. Make how Program holds its cached JIT program slightly less of a mess. Change-Id: I38d6f01006da1da60f4aed675e9ddf97de9aec52 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222575 Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-06-21 18:40:47 +00:00
Mike Klein	397fc88fc0	first VEX ymm vector ops - 32x8 i32 add,sub,mul - add I32_Naive bench/test builder to get better i32 mul coverage - minor refactoring all over Change-Id: I13cc19ff37a2da0bcff289ba51baac08f456d6c5 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222485 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-06-20 18:20:00 +00:00
Mike Klein	7b7077cc36	centralize test/bench SkVM builders Eliminate the duplicate functionality, and better testing for the bench builders. Change-Id: If20e52107738903f854aec431416e573d7a7d640 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218041 Reviewed-by: Mike Klein <mtklein@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-06-04 16:55:59 +00:00
Mike Klein	03ce675b5f	fix SKVM_ benches Things were running suspiciously well... _I32 had a typo that cut out 3/4 of its multiplies... _I32_SWAR was missing a mask operation needed to drop the junk low byte of the high half after the multiply. The bench times now make a bit more sense and are in line with how much work we're actually doing: F32's the slowest, I32 a little faster, and I32_SWAR fastest: curr/maxrss loops min median mean max stddev samples config bench 35/36 MB 58 2.03ns 2.04ns 2.04ns 2.04ns 0% ▂▂▂▂▁▁█▁▂▁ nonrendering SkVM_4096_I32_SWAR 35/36 MB 42 3.44ns 3.48ns 3.49ns 3.59ns 1% ▂▆▅█▃▃▁▂▂▄ nonrendering SkVM_4096_I32 35/36 MB 30 4.9ns 5.21ns 5.11ns 5.33ns 3% ▆▇█▆▆▁▂▁▁▅ nonrendering SkVM_4096_F32 35/36 MB 203 0.696ns 0.697ns 0.705ns 0.758ns 3% █▂▂▁▁▁▁▁▁▂ nonrendering SkVM_4096_RP 35/36 MB 942 0.188ns 0.188ns 0.188ns 0.189ns 0% ▂▁▂▁▃█▂▁▁▁ nonrendering SkVM_4096_Opts Change-Id: I2850dc3f9df1828f03499eb278b8231f48eaae63 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217982 Commit-Queue: Mike Klein <mtklein@google.com> Commit-Queue: Brian Osman <brianosman@google.com> Auto-Submit: Mike Klein <mtklein@google.com> Reviewed-by: Brian Osman <brianosman@google.com>	2019-06-03 20:35:24 +00:00
Mike Klein	68c50d015b	sketch an skvm With all the thinking around a stack-based interpreter, I figured I'd sketch out some ideas for a register VM too. I kind of have the hunch that this is the direction that will actually let us replace large amounts of Skia's CPU backend with an efficient interpreter or JIT. Change-Id: Ia2b5ba4a3fc27556f5b6ba95cd1ace46d3217403 Reviewed-on: https://skia-review.googlesource.com/c/skia/+/216665 Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Mike Klein <mtklein@google.com>	2019-06-03 19:53:48 +00:00
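
The body/tail counts quoted in commit 5b5dabcc56 follow from simple division. The sketch below is an illustration, not Skia code: `count_loops` and `LoopCounts` are hypothetical names, and the lane counts (4 on ARM, 8 on x86) are assumptions inferred from the numbers in that message, on the premise that the tail handles one element per iteration.

```cpp
#include <cstddef>

// Hypothetical helper, not part of Skia: splits n elements into
// full-width "body" iterations and one-element "tail" iterations.
struct LoopCounts {
    std::size_t body;  // iterations that process `lanes` elements each
    std::size_t tail;  // leftover iterations that process one element each
};

constexpr LoopCounts count_loops(std::size_t n, std::size_t lanes) {
    return LoopCounts{n / lanes, n % lanes};
}

// Assumed widths: 4 x 32-bit lanes on ARM, 8 x 32-bit lanes on x86 (ymm).
static_assert(count_loops(15, 4).body == 3 && count_loops(15, 4).tail == 3,
              "N=15 on ARM: 3 body loops, 3 tail loops");
static_assert(count_loops(63, 8).body == 7 && count_loops(63, 8).tail == 7,
              "N=63 on x86: 7 body loops, 7 tail loops");
```

Choosing N one short of a multiple of the lane count maximizes the tail work, so both loop kinds show up in a profile.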

9 Commits
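
The missing mask described in commit 03ce675b5f is easier to see with a small SWAR (SIMD within a register) example. This is a sketch of the general technique and not the actual SkVM bench builder: the function names and the 0x00HH00LL packing are assumptions made for illustration.

```cpp
#include <cstdint>

// Assumed layout: two 8-bit channels packed as 0x00HH00LL in one 32-bit word.
// Multiplying by an 8-bit scale gives two independent 16-bit products,
// because 255 * 255 = 65025 fits in 16 bits and cannot carry into the
// neighbouring half.

// Without the mask: the low byte of the high product survives the shift
// and lands in bits 8..15, corrupting the result.
constexpr uint32_t scale_unmasked(uint32_t packed, uint32_t scale) {
    return (packed * scale) >> 8;
}

// With the mask: only the top byte of each 16-bit product is kept.
constexpr uint32_t scale_masked(uint32_t packed, uint32_t scale) {
    return ((packed * scale) >> 8) & 0x00FF00FFu;
}

// High channel 0xFF, low channel 0x80, scaled by 0x80 (about one half).
static_assert(scale_unmasked(0x00FF0080u, 0x80u) == 0x007F8040u,
              "junk byte 0x80 is left in bits 8..15");
static_assert(scale_masked(0x00FF0080u, 0x80u) == 0x007F0040u,
              "mask drops the junk byte");
```

The mask costs one extra operation per multiply, which is consistent with the corrected I32_SWAR bench being slower than the earlier, suspiciously fast numbers.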