skia2/include/private
Mike Klein 8c1e0effbb sketch out structure for ops with immediates
Lots of x86 instructions can take their right hand side argument from
memory directly rather than a register.  We can use this to avoid the
need to allocate a register for many constants.
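
As a rough illustration of the idea (not Skia's actual skvm code), the sketch below shows a builder that peeks at the right-hand argument of a multiply and, when it is a splat, emits an immediate-carrying op instead. The `Builder`, `Inst`, and `arg` names and the field layout are hypothetical; only `Op::splat` and `mul_f32_imm` are names taken from this message.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical miniature IR for illustration only; this is not the skvm API.
enum class Op { arg, splat, mul_f32, mul_f32_imm };

struct Inst {
    Op      op;
    int     x, y;   // value IDs of the arguments, -1 when unused
    int32_t immy;   // immediate bits, 0 when unused
};

struct Builder {
    std::vector<Inst> program;

    int push(Op op, int x, int y, int32_t immy) {
        program.push_back({op, x, y, immy});
        return static_cast<int>(program.size()) - 1;
    }

    // A value supplied from outside the program (assumed helper).
    int arg() { return push(Op::arg, -1, -1, 0); }

    // A constant broadcast to every lane, stored as its raw bit pattern.
    int splat(float f) {
        int32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        return push(Op::splat, -1, -1, bits);
    }

    // If the right-hand side is a splat, embed its bits in the multiply so
    // the backend can read the constant from memory instead of a register.
    int mul(int x, int y) {
        if (program[y].op == Op::splat) {
            return push(Op::mul_f32_imm, x, -1, program[y].immy);
        }
        return push(Op::mul_f32, x, y, 0);
    }
};
```

In this sketch the original splat stays in the program with no users; it is assumed that a later dead-code pass would drop it, which is what frees the register.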

The strategy in this CL is one of several I've been stewing over, the
simplest of those strategies I think.  There are some trade-offs,
particularly on ARM; this naive ARM implementation means we'll load and op
every time, even though the load part of the operation can logically be
hoisted.  From here on I'm going to just briefly enumerate a few other
approaches that allow the optimization on x86 and still allow the
immediate splats to hoist on ARM.

1) don't do it on ARM
A very simple approach is to simply not perform this optimization on
ARM.  ARM has more vector registers than x86, and so register pressure
is lower there.  We're going to end up with splatted constants in
registers anyway, so maybe just let that happen the normal way instead
of some roundabout complicated hack like I'll talk about in 2).  The
only downside in my mind is that this approach would make high-level
program descriptions platform dependent, which isn't so bad, but it's
been nice to be able to compare and diff debug dumps.

2) split Op::splat up
The next, less simple approach would fix this by
splitting splats into two Ops internally, one inner Op::immediate that
guarantees at least the constant is in memory and is compatible with
immediate-aware Ops like mul_f32_imm, and an outer Op::constant that
depends on that Op::immediate and further guarantees that constant has
been broadcast into a register to be compatible with non-immediate-aware
ops like div_f32.  When building a program, immediate-aware ops would
peek for Op::constants as they do today for Op::splats, but instead of
embedding the immediate themselves, they'd replace their dependency with
the inner Op::immediate.

On x86 these new Ops would work just as advertised, with Op::immediate a
runtime no-op, Op::constant the usual vbroadcastss.  On ARM
Op::immediate needs to go all the way and splat out a register to make
the constant compatible with immediate-aware ops, and the Op::constant
becomes a no-op instead.  All this comes together to let the
Op::immediate splat hoist up out of the loop while still feeding
Op::mul_f32_imm and co.  It's a rather complicated approach to solving
this issue, but I might want to explore it just to see how bad it is.
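
A minimal sketch of approach 2, assuming a toy IR rather than the real skvm one: `splat()` emits an inner `Op::immediate` plus an outer `Op::constant`, immediate-aware ops rewire their dependency to the inner op, and other ops keep depending on the outer one. `Builder`, `Inst`, `Target`, `Lowering`, and `lower()` are hypothetical names; the per-platform behavior encoded in `lower()` is only what the two paragraphs above describe.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical miniature IR for illustration only; this is not the skvm API.
enum class Op { arg, immediate, constant, mul_f32, mul_f32_imm, div_f32 };

struct Inst {
    Op      op;
    int     x, y;   // value IDs of the arguments, -1 when unused
    int32_t bits;   // constant bit pattern, only meaningful for Op::immediate
};

struct Builder {
    std::vector<Inst> program;

    int push(Op op, int x, int y, int32_t bits) {
        program.push_back({op, x, y, bits});
        return static_cast<int>(program.size()) - 1;
    }

    int arg() { return push(Op::arg, -1, -1, 0); }

    // A splat becomes two ops: the inner immediate and the outer constant.
    int splat(int32_t bits) {
        int imm = push(Op::immediate, -1, -1, bits);
        return push(Op::constant, imm, -1, 0);
    }

    // Immediate-aware: depend on the inner Op::immediate, not the constant.
    int mul(int x, int y) {
        if (program[y].op == Op::constant) {
            return push(Op::mul_f32_imm, x, program[y].x, 0);
        }
        return push(Op::mul_f32, x, y, 0);
    }

    // Not immediate-aware: needs the constant broadcast into a register.
    int div(int x, int y) { return push(Op::div_f32, x, y, 0); }
};

enum class Target   { x86, ARM };
enum class Lowering { NoOp, BroadcastToRegister, Other };

// What each half of the split splat costs at runtime on each platform.
Lowering lower(Op op, Target t) {
    switch (op) {
        case Op::immediate:
            return t == Target::x86 ? Lowering::NoOp
                                    : Lowering::BroadcastToRegister;
        case Op::constant:
            return t == Target::x86 ? Lowering::BroadcastToRegister
                                    : Lowering::NoOp;
        default:
            return Lowering::Other;
    }
}
```

Because `Op::immediate` has no inputs in this sketch, it is loop-invariant, which is what would let the ARM register splat hoist while `mul_f32_imm` still consumes it.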

3) do it inside the x86 JIT
The conceptually best approach is to find a way to do this peepholing
only inside the JIT, and only on x86, avoiding the need for new
Op::mul_f32_imm and co.  ARM and the interpreter don't benefit from this
peephole, so the x86 JIT is the logical owner of this optimization.
Finding a clean way to do this without too much disruption is the least
baked idea I've got here, though I think the most desirable long-term.

Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Debug-All-SK_USE_SKVM_BLITTER,Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-All-SK_USE_SKVM_BLITTER
Change-Id: Ie9c6336ed08b6fbeb89acf920a48a319f74f3643
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/254217
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2019-11-12 20:17:55 +00:00
GrContext_Base.h SK_API in src/gpu cleanup 2019-08-22 02:00:07 +00:00
GrGLTypesPriv.h Rework how backend-specific formats are retrieved from GrBackendFormat. 2019-08-08 17:20:34 +00:00
GrImageContext.h SK_API in src/gpu cleanup 2019-08-22 02:00:07 +00:00
GrRecordingContext.h Add creation-time POD memory pool for GrOps 2019-10-22 16:35:41 +00:00
GrResourceKey.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
GrSharedEnums.h sksl enum support 2017-11-13 14:36:40 +00:00
GrSingleOwner.h Add thread safety annotations for SkMutex 2019-05-13 15:44:33 +00:00
GrTypesPriv.h Some improvements to backend texture creation. 2019-11-04 20:37:41 +00:00
GrVkTypesPriv.h Store protectedness on GrVkImageInfo. 2019-07-18 20:03:08 +00:00
SkBitmaskEnum.h tools/skui: put all enums in a common namespace 2019-08-29 15:39:32 +00:00
SkChecksum.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkColorData.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkDeferredDisplayList.h Add creation-time POD memory pool for GrOps 2019-10-22 16:35:41 +00:00
SkEncodedInfo.h Move skcms.h to include/third_party/skcms 2019-04-29 15:02:45 +00:00
SkFixed.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkFloatBits.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkFloatingPoint.h Move SkImageFilter functionality into private SkImageFilter_Base 2019-08-02 18:56:39 +00:00
SkHalf.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkImageInfoPriv.h add SkColorTypeIsNormalized() 2019-11-08 17:51:45 +00:00
SkMacros.h SkTypes: more into SkMacros 2018-06-12 20:24:43 +00:00
SkMalloc.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkMutex.h Remove all global mutexes 2019-06-18 00:39:15 +00:00
SkNoncopyable.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkNx_neon.h minor Sk4px cleanup 2018-12-18 20:46:25 +00:00
SkNx_sse.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkNx.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkOnce.h experimental support for go/fibers 2019-10-28 16:48:55 +00:00
SkPathRef.h reverse/restore order of verbs in path to be forward (normal) 2019-09-05 21:14:08 +00:00
SkSafe32.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkSafe_math.h Guard against buggy ucrt\math.h. 2016-11-28 15:40:23 +00:00
SkSemaphore.h experimental support for go/fibers 2019-10-28 16:48:55 +00:00
SkShadowFlags.h Remove deprecated drawShadow interfaces 2018-02-28 19:07:29 +00:00
SkSpinlock.h Add thread safety annotations. 2019-05-10 13:40:38 +00:00
SkTArray.h IWYU for SkTLogic.h 2019-05-02 21:17:37 +00:00
SkTDArray.h Skip unneeded reallocs in SkTDArray::shrinkToFit() 2019-10-23 02:06:36 +00:00
SkTemplates.h Use void(void*) instead of decltype with sk_free. 2019-08-20 22:28:42 +00:00
SkTFitsIn.h Add support for MSVC run-time checks (and control flow guard) 2019-02-04 20:55:24 +00:00
SkTHash.h sketch out structure for ops with immediates 2019-11-12 20:17:55 +00:00
SkThreadAnnotations.h experimental support for go/fibers 2019-10-28 16:48:55 +00:00
SkThreadID.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkTLogic.h build: fix compilation on macOS with 10.14 SDK 2019-10-30 20:18:58 +00:00
SkTo.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00
SkVx.h free skvx from its Skia shackles 2019-06-07 18:08:23 +00:00
SkWeakRefCnt.h rewrite includes to not need so much -Ifoo 2019-04-24 16:27:11 +00:00