Lots of x86 instructions can take their right hand side argument from
memory directly rather than a register. We can use this to avoid the
need to allocate a register for many constants.
The strategy in this CL is one of several I've been stewing over, the
simplest of those strategies I think. There are some trade offs
particularly on ARM; this naive ARM implementation means we'll load&op
every time, even though the load part of the operation can logically be
hoisted. From here on I'm going to just briefly enumerate a few other
approaches that allow the optimization on x86 and still allow the
immediate splats to hoist on ARM.
1) don't do it on ARM
A very simple approach is to simply not perform this optimization on
ARM. ARM has more vector registers than x86, and so register pressure
is lower there. We're going to end up with splatted constants in
registers anyway, so maybe just let that happen the normal way instead
of some roundabout complicated hack like I'll talk about in 2). The
only downside in my mind is that this approach would make high-level
program descriptions platform dependent, which isn't so bad, but it's
been nice to be able to compare and diff debug dumps.
2) split Op::splat up
The next less-simple approach to this problem could fix this by
splitting splats into two Ops internally, one inner Op::immediate that
guantees at least the constant is in memory and is compatible with
immediate-aware Ops like mul_f32_imm, and an outer Op::constant that
depends on that Op::immediate and further guarantees that constant has
been broadcast into a register to be compatible with non-immediate-aware
ops like div_f32. When building a program, immediate-aware ops would
peek for Op::constants as they do today for Op::splats, but instead of
embedding the immediate themselves, they'd replace their dependency with
the inner Op::immediate.
On x86 these new Ops would work just as advertised, with Op::immediate a
runtime no-op, Op::constant the usual vbroadcastss. On ARM
Op::immediate needs to go all the way and splat out a register to make
the constant compatible with immediate-aware ops, and the Op::constant
becomes a noop now instead. All this comes together to let the
Op::immediate splat hoist up out of the loop while still feeding
Op::mul_f32_imm and co. It's a rather complicated approach to solving
this issue, but I might want to explore it just to see how bad it is.
3) do it inside the x86 JIT
The conceptually best approach is to find a way to do this peepholing
only inside the JIT only on x86, avoiding the need for new
Op::mul_f32_imm and co. ARM and the interpreter don't benefit from this
peephole, so the x86 JIT is the logical owner of this optimization.
Finding a clean way to do this without too much disruption is the least
baked idea I've got here, though I think the most desirable long-term.
Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Debug-All-SK_USE_SKVM_BLITTER,Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-All-SK_USE_SKVM_BLITTER
Change-Id: Ie9c6336ed08b6fbeb89acf920a48a319f74f3643
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/254217
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
This reduces the number of map lookups that need to happen.
Change-Id: I068819f2576bf644a5c3550d48e69413e19179d3
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/237217
Reviewed-by: Khushal Sagar <khushalsagar@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
Found with a GCC compiler warning.
Seems like a sane idea to keep Slots fully initialized.
Change-Id: I7a7bd4ccc4a6800d6e262aa7c616d3cab36d74dc
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/227121
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Switching some SkVM code from std::unordered_map to SkTHashMap caused
the -MSRTC bot to barf unexpectedly, but in a way that makes sense in
retrospect. The code to hash skvm::Builder::Instructions returns size_t
to fit the unordered_map convention, and I forgot to change that to
SkTHashMap's preferred uint32_t. So we began to implicitly truncate
that size_t to uint32_t on 64-bit machines, one of the potential issues
the -MSRTC bot exists to catch.
This change simply masks any user-provided hash to 32 bits explicitly.
We could alternatively update the Instruction hash code, but I think the
mask here is so cheap (usually notional, zero-cost) that compatibility
with std::unordered_map makes this approach more desirable.
Cq-Include-Trybots: skia.primary:Test-Win2016-MSVC-GCE-CPU-AVX2-x86_64-Debug-All-MSRTC
Change-Id: I0551e7590d5039962e213c6672927bd84e1a0856
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/223136
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Current strategy: everything from the top
Things to look at first are the manual changes:
- added tools/rewrite_includes.py
- removed -Idirectives from BUILD.gn
- various compile.sh simplifications
- tweak tools/embed_resources.py
- update gn/find_headers.py to write paths from the top
- update gn/gn_to_bp.py SkUserConfig.h layout
so that #include "include/config/SkUserConfig.h" always
gets the header we want.
No-Presubmit: true
Change-Id: I73a4b181654e0e38d229bc456c0d0854bae3363e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/209706
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Hal Canary <halcanary@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
This should help avoid confusion from T**.
Change-Id: I1851baa2a55714721fa935d234b6a4a1c6d6504f
Reviewed-on: https://skia-review.googlesource.com/c/182562
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Herb Derby <herb@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Change-Id: I0944195d3087c97353994ff219f77464d94f1ba8
Reviewed-on: https://skia-review.googlesource.com/135045
Commit-Queue: Hal Canary <halcanary@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Auto-Submit: Hal Canary <halcanary@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
Change-Id: I693ddcd4ade101ba4eb4102e03adce183aa1d672
Reviewed-on: https://skia-review.googlesource.com/133829
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
include/private:
- SkAutoTArray, SkAutoTMalloc no longer use bare pointers to owned memory,
- SkTHash and SkAutoTArray are now std::move()able.
- SkAutoTArray::swap no longer neccesary.
- SkAutoTMalloc::operator=() defined.
src/pdf:
- SkPDFCanon and SkPDFObjectSerializer are now std::move()able.
- `template <class T> static void renew(T* t) { t->~T(); new (t) T; }` is gone.
Change-Id: I2f36a0780c47d1427a85da240121c898387fb4cf
Reviewed-on: https://skia-review.googlesource.com/123401
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Hal Canary <halcanary@google.com>
The element rearrange logic in SkTHashTable::remove() marks empty slots
as such, but does not reset their value.
When breaking out of the rearrange loop, we must also reset the last empty
slot value to avoid retaining unwanted copies.
Change-Id: I8ba2a25088c0aa5210277124e0917224cb295691
Reviewed-on: https://skia-review.googlesource.com/9533
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Don't know why I thought this had to be so complicated before.
BUG=skia:6053
Change-Id: Ie714fed1cb47e9add166d4227d3d31f95eba2411
Reviewed-on: https://skia-review.googlesource.com/6121
Reviewed-by: Hal Canary <halcanary@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
* Switch to linear probing - this allows delete to rearrange elements to fill in empty slots
* NULL -> nullptr
Change-Id: I741c2f3bb2734bf638d0c0a78c6cc549f563a5d9
Reviewed-on: https://skia-review.googlesource.com/5980
Commit-Queue: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
The more I look at std::unordered_map and co., the less I like them.
I think we might want to bet on SkTHash*.
As a simple first improvement, add move support.
Next comes shrinking, and then I'll start moving over SkTDynamicHash users.
BUG=skia:6053
Change-Id: Ifdb5d713aab66434ca271c7f18a0cbbb0720099c
Reviewed-on: https://skia-review.googlesource.com/5943
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Hal Canary <halcanary@google.com>
Passing &SkGoodHash to SkTHashMap and SkTHashSet doesn't guarantee that it's actually instantiated. Using a functor does.
BUG=skia:
Review URL: https://codereview.chromium.org/1405053002
include/views/SkOSWindow_Win.h includes it.
To move SkTHash.h to include/private, SkChecksum.h needs to go there too. To move SkChecksum.h to include/private, SkTLogic needs to go there too.
This adds a bunch of -Iinclude/private to tools.gyp I missed in the last CL.
No public API changes.
TBR=reed@google.com
BUG=skia:4126
Review URL: https://codereview.chromium.org/1260613006
Some of this is transitive, like SkRecords.h used by SkMiniRecorder.h
used by (public) SkPictureRecorder.h.
BUG=skia:
Review URL: https://codereview.chromium.org/1217293004