skia2

AuroraMiddleware/skia2

Fork 0

Commit Graph

Author	SHA1	Message	Date
mtklein	490b61569d	Port SkXfermode opts to SkOpts.h Renames Sk4pxXfermode.h to SkXfermode_opts.h, and refactors it a tiny bit internally. This moves xfermode optimization from being "compile-time everywhere but NEON" to simply "runtime everywhere". I don't anticipate any effect on perf or correctness. BUG=skia:4117 Review URL: https://codereview.chromium.org/1264543006	2015-07-31 11:50:27 -07:00
mtklein	7eb0945af2	Port SkUtils opts to SkOpts. With this new arrangement, the benefits of inlining sk_memset16/32 have changed. On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower. On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster. On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster. We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization. So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8. BUG=skia:4117 Review URL: https://codereview.chromium.org/1270573002	2015-07-31 10:46:50 -07:00

Author

SHA1

Message

Date

mtklein

490b61569d

Port SkXfermode opts to SkOpts.h

Renames Sk4pxXfermode.h to SkXfermode_opts.h,
and refactors it a tiny bit internally.

This moves xfermode optimization from being "compile-time everywhere but NEON"
to simply "runtime everywhere".  I don't anticipate any effect on perf or
correctness.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1264543006

2015-07-31 11:50:27 -07:00

mtklein

7eb0945af2

Port SkUtils opts to SkOpts.

With this new arrangement, the benefits of inlining sk_memset16/32 have changed.

On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower.
On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster).  For small N<=10 inlining is still significantly faster.
On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster.

We were not using the NEON memset16 and memset32 procs on ARMv8.  At first blush, that seems to be an oversight, but if so it's an extremely lucky one.  The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization.

So, no need to inline any more on x86, and still inline for N<=10 on ARMv7.  Always inline for ARMv8.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1270573002

2015-07-31 10:46:50 -07:00

2 Commits