This new bench lets us measure the overhead of program building,
optimization, and JITting. Surprisingly, at head the optimization in
Builder::done() takes longer than the JIT.
The new bench clocks in around 40µs on my laptop at head,
then 32µs after switching val_to_reg to be an std::vector,
then 27µs after switching deaths to be an std::vector too,
then 22µs after switching fIndex to be an SkTHashMap,
then 20µs after calling program.reserve(fProgram.size()),
then 19µs after switching JIT data maps to SkTHashMap too.
I tried swapping some std::vector for SkTDArray to no benefit, actually
a little detriment. So I think this is roughly all the low-hanging
fruit, with time split now roughly equally between Builder::Done(),
JITting in Program::eval(), and the original calls to Builder
themselves.
Also disable perf dumps on Mac. No real value there until I can dump a
dylib, and it's just one more thing I have to remember to disable before
running this sort of benchmark.
Change-Id: I1c6e58ed00ac94ad622c7d740712634f60787102
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222984
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
For now, disable the vpmovusdb AVX-512 instruction, using the compound
AVX2 fallback instead. I need to learn how to encode EVEX prefixes
before we can use that, and it's not very important.
That's everything! We're fully in control now, and should be able to
run this on any x86-64 Linux or Mac. And we can relax some of the
defined(SKVM_JIT) guards so that, e.g., we can unit test Assembler even
on all platforms.
Stifle some warnings about ~bool by ~(int)bool.
Would like to enable when is_mac too but can't seem to get past
(bogus?) thread annotation on the bots. My local Mac is fine. :/
Change-Id: If00bdd97ebd9684ed109933e2fa70c5e6f6ea339
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222631
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Most image filters were fixed with just the changes to SkImage::makeWithFilter
and the changes to SkSpecialImage's subset handling (particularly the
raster backend that could read from a bitmap view, or ganesh impls that relied
SkSpecialImage::draw).
The gpu implementation for alpha threshold, blurs, matrix convolutions,
and displacement maps have been updated to account for the special image's
offset when it reads from the backing texture.
Change-Id: I8778aa373e60e9268961305057b2bf6da2bdb3af
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221121
Reviewed-by: Robert Phillips <robertphillips@google.com>
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Move the invariants for glyph image data into SkGlyph.
Change-Id: I1958612bb73cfffe42df19a11c8899048559013b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222876
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
Bug: chromium:977315
Change-Id: Ia5b734f5c0f0806af0f096de5add880a777c5c25
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222793
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
This shows off a little how easy backwards-only labels are.
The rip == rbp + Mod::Indirect convention isn't something
you'd be able to guess without just looking at the docs.
I'm not actually sure if you can only use rbp or also r13,
but LLVM seems to always do the equivalent of rbp... might
just be that high bit in VEX is ignored: they're registers
5 and 13, 8 apart, only distinguished by that bit.
Convenienly RIP addressing is always 32-bit, so there's
no benefit to spending time checking whether the offset
fits in a byte, though most of our offsets would.
Change-Id: I01b7fb1500667e1bf98490d5144459f92e1b375d
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222857
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
I think this makes the relatioship between mask and entry clearer?
Can't have JIT code handle >0 elements unless that JIT code itself
exists.
Change-Id: I238d54a5084c7f90bd32c83db5423840cf415b17
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222856
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
This moves the responsibility for allocating executable code out of
Assembler. The pages Xbyak uses are obviously executable, so this is
redundant right now, but it'll let us switch to something simple like
std::vector<uint8_t> as we continue to cut out Xbyak.
Make how Program holds its cached JIT program slightly less of a mess.
Change-Id: I38d6f01006da1da60f4aed675e9ddf97de9aec52
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222575
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
This makes NV12 and NV21 toggle on/off as a pair.
Bug: skia:9155
Change-Id: Ie0d3f2b3c0aba9a1777a722190bcf6aa5b5e85c3
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222460
Reviewed-by: Greg Daniel <egdaniel@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
If you launch the Mac viewer from the command line, it will sit there
until you click on the thumbnail in the dock, and only then will bring
up the window. This fixes that so it will open the window immediately.
Change-Id: I5628dc6c59833f808a61dedde457774114dd0e94
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222783
Commit-Queue: Jim Van Verth <jvanverth@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Auto-Submit: Jim Van Verth <jvanverth@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Change-Id: I4d1a102264d8c97bf9120c3891d569ef96a92922
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222782
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
By putting data first in descending alignment then code, we never need
any alignment padding.
This also makes all jumps and ip-relative data loads backward, so
they're really easy to assemble. No need for any sort of deferred
where-does-this-label-mean logic; the label can just be a simple byte
offset established before you need to use it.
Nothing new switched off of Xbyak in this CL, but the rearrangement
makes the rest a lot easier.
The one downside I've found so far is that the disassembly of the
first instruction can get confused into data or other instructions,
e.g.
63: 01 ff add %edi,%edi
65: 00 ff add %bh,%bh
67: 00 00 add %al,(%rax)
69: ff 00 incl (%rax)
6b: ff c4 inc %esp
6d: e2 7d loop ec <skvm-jit-884702985+0xac>
6f: 18 05 eb ff ff ff sbb %al,-0x15(%rip) # 60 <skvm-jit-884702985+0x20>
75: c4 e2 7d 18 0d e6 ff ff ff vbroadcastss -0x1a(%rip),%ymm1 # 64 <skvm-jit-884702985+0x24>
7e: c4 e2 7d 18 15 e1 ff ff ff vbroadcastss -0x1f(%rip),%ymm2 # 68 <skvm-jit-884702985+0x28>
There are 3 vbroadcastss instructions here, each starting with c4 e2 7d
18, but the first has been disassembled as if its c4 were part of the
last data entry (0xff00ff00) as inc %esp.
Probably not a big deal for now, particularly since those vbroadcastss
are all outside the loop and never show up on a profile. If it gets too
confusing I think we can dump the programs starting from the beginning
of the code instead of from the data; we won't be able to inspect the
data, but everything should disassemble perfectly.
Change-Id: I0cc864359fd0740fc026070eaf2b6cb130783a57
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222574
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
This centralizes the initial-lane-mask logic, and makes the return value
copying much more straightforward by just passing in the width. Lets us
shrink the arrays in the interpreter pipeline stage to the correct size.
Also normalize some formatting and structure.
Change-Id: I446598dcdd550d88ff1db1afe7507f31fa96d1d7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222510
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
Introduce textBlobToGlyphRunListWithoutRSX to convert text blob into
glyph runs. Convert the core of the code from working over text blobs
to working over glyph runs.
+ Misc cleanups
Change-Id: I33c1fc5e948dd7270031496325a96409f2cfeeb6
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222277
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Herb Derby <herb@google.com>
Implement radial wipe with a sweep gradient shader mask filter.
The implementation is slightly convoluted because edge feathering requires a real blur, which in turn requires content layer isolation.
So there are two distinct operation modes:
- no feather -> draw the content directly into the dest buffer, with the mask filter
deferred in SG context
- feather -> draw the content into a separate layer, then blend (dstOut) the composed
blur+shader mask on top
Change-Id: I253701aff42db8010ce463762252c262e2c5d92b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222596
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Change-Id: I1d133259264adfdc872b0f4aeaa9390363c46341
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222040
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Since explicit resource allocation has stuck these instantiate calls are no longer required.
Change-Id: I5a8a7fa714eb1e9550f4f645ce8fced2d5f7aa4e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222457
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
He's being set as reviewer for recipe rolls, which causes the rest of us
not to see them.
Change-Id: Idaa59e32ba3fa28d2843a263e6fd8a0d0e234657
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222776
Reviewed-by: Ravi Mistry <rmistry@google.com>
Commit-Queue: Eric Boren <borenet@google.com>
This is an automated CL created by the recipe roller. This CL rolls recipe
changes from upstream projects (e.g. depot_tools) into downstream projects
(e.g. tools/build).
Please review the expectation changes, and LGTM+CQ.
More info is at https://goo.gl/zkKdpD. Use https://goo.gl/noib3a to file a bug.
recipe_engine:
https://crrev.com/c8f87374dfa70a48f26ce431db7b264209d8c479 Implements use of RawResult in engine.py as a recipe return value. (debrian@google.com)
R=stephana@google.com
Recipe-Tryjob-Bypass-Reason: Autoroller
Bugdroid-Send-Email: False
Change-Id: I82379417a3f517a2d389a6978400614dd5be088f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221654
Commit-Queue: Eric Boren <borenet@google.com>
Reviewed-by: Robbie Iannucci <iannucci@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
8f8d2c8d54..31223069ea
Created with:
gclient setdep -r ../src@31223069ea
The AutoRoll server is located here: https://autoroll.skia.org/r/chromium-skia-autoroll
Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.
CQ_INCLUDE_TRYBOTS=skia.primary:Perf-Mac10.13-Clang-MacBookPro11.5-GPU-RadeonHD8870M-x86_64-Release-All-CommandBuffer;skia.primary:Test-Mac10.13-Clang-MacBookPro11.5-GPU-RadeonHD8870M-x86_64-Debug-All-CommandBuffer
TBR=bsalomon@google.com
Change-Id: I78bbc05b42783766cc32761c158f3a52db7332c4
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222639
Reviewed-by: skia-autoroll <skia-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: skia-autoroll <skia-autoroll@skia-public.iam.gserviceaccount.com>
c5c937e1e8..bf4cfa77c4
git log c5c937e1e8bd..bf4cfa77c4bf --date=short --no-merges --format='%ad %ae %s'
2019-06-20 ynovikov@chromium.org Switch ANGLE Win and Linux try bots to builderless.
2019-06-20 courtneygo@google.com Vulkan: Fix dirty element array buffer updates.
2019-06-20 dongja@google.com Debug: suppress INFO level messages by default
2019-06-20 spang@chromium.org Vulkan: Fix out of bounds access of pWaitDstStageMask
2019-06-20 timvp@google.com Vulkan: Implement copyBufferSubData
2019-06-20 jmadill@chromium.org Vulkan: Add more trace events.
2019-06-20 jmadill@chromium.org Vulkan: Sync image in TextureVk::syncState.
2019-06-20 jmadill@chromium.org Move event tracer back into common.
2019-06-20 lujc@google.com Fix dEQp test results path on Android
Created with:
gclient setdep -r third_party/externals/angle2@bf4cfa77c4bf
The AutoRoll server is located here: https://autoroll.skia.org/r/angle-skia-autoroll
Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.
CQ_INCLUDE_TRYBOTS=skia.primary:Build-Debian9-Clang-x86_64-Release-ANGLE;skia.primary:Perf-Win10-Clang-AlphaR2-GPU-RadeonR9M470X-x86_64-Debug-All-ANGLE;skia.primary:Perf-Win10-Clang-NUC5i7RYH-GPU-IntelIris6100-x86_64-Debug-All-ANGLE;skia.primary:Perf-Win10-Clang-NUC6i5SYK-GPU-IntelIris540-x86_64-Debug-All-ANGLE;skia.primary:Perf-Win10-Clang-NUCD34010WYKH-GPU-IntelHD4400-x86_64-Debug-All-ANGLE;skia.primary:Perf-Win10-Clang-ShuttleC-GPU-GTX960-x86_64-Debug-All-ANGLE;skia.primary:Test-Win10-Clang-AlphaR2-GPU-RadeonR9M470X-x86_64-Debug-All-ANGLE;skia.primary:Test-Win10-Clang-NUC6i5SYK-GPU-IntelIris540-x86_64-Debug-All-ANGLE;skia.primary:Test-Win10-Clang-NUCD34010WYKH-GPU-IntelHD4400-x86_64-Debug-All-ANGLE;skia.primary:Test-Win10-Clang-ShuttleC-GPU-GTX960-x86_64-Debug-All-ANGLE
TBR=bsalomon@google.com
Change-Id: Icd1b02351ac94fd3ed6c546c104376ff5b4ab642
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222638
Reviewed-by: skia-autoroll <skia-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: skia-autoroll <skia-autoroll@skia-public.iam.gserviceaccount.com>
Like the Posix code below it, the Mac semaphore wait code
can wake spuriously. Keep trying if we get KERN_ABORTED.
Pattern aped from V8. I noticed they like us don't do
anything to test if the Windows WaitForSingleObject()
call fails. Can it fail?
Bug: chromium:977341
Change-Id: I34f407fc4d6717deb6edcf7aa7bed1f8fb8b1baa
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222583
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
This wraps up everything that doesn't touch memory or a label.
Small cleanups and refactoring as I start to understand some
things better...
Change-Id: I788fa877cfcab8f87c961df28fe561b51a5c62ff
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222571
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
The encoding kind of all goes through the same paths,
as the three argument instructions, but like the nursery
rhyme when there are only two they kind of all roll over
and the op-extension hops into the bed.
vpermq is the first place we need to set the W bit
to indicate a 64-bit lane operation, so a little
minimal plumbing for that. It takes its arguments
a little differently too, passing dst where you'd
expect, the source where we'd pass y, and requiring
us to pass literal 0000 for the vvvv bits in VEX
(inverted as normal to literal 1111).
Change-Id: I91a4cd1b316eb908992631ce8b2cb3c62078e8c6
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222565
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Somewhat arbitrarily switched from vandps et al. float bitwise
ops to vpand et al. int bitwise ops. We tend to use them more
int-y generally so the disassembly maybe reads more clearly?
They're identical in behavior AFAIK.
Shuffle tmp around from being an Xbyak::Ymm to its index.
I don't think there's anything tricky here. Spookily things
all seem to work first try, as long as I don't make a typo.
Change-Id: I2b5d4ded7800915824cbd7c917edfd36e229d306
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222528
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Bug: skia:
Change-Id: I43f723f617602de5ba2bf82ff2fe1286527a8cc6
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222500
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Robert Phillips <robertphillips@google.com>
This reverts commit 6f24faa623.
Reason for revert: larger impact on GMs than expected
Original change's description:
> GPU: always use TopLeft origin for saveLayer render targets.
>
> Should have no user-visible result.
>
> Change-Id: Iae444888557347bfdf75e6353966cde907ee0a1e
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222506
> Reviewed-by: Greg Daniel <egdaniel@google.com>
> Commit-Queue: Stephen White <senorblanco@chromium.org>
TBR=egdaniel@google.com,bsalomon@google.com,senorblanco@chromium.org
Change-Id: Ic8217206adfbd7d506a70c3d9ca828f0dde58c66
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222509
Reviewed-by: Stephen White <senorblanco@chromium.org>
Commit-Queue: Stephen White <senorblanco@chromium.org>
The bufferAllocator was being unref'ed before the SkSpinlock went out
of scope, so the spinlock was then using deleted memory.
Bug: skia:8243
Change-Id: I69f090acccaaa3ba7fe2e4190103019a6f4e9359
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222503
Commit-Queue: Jim Van Verth <jvanverth@google.com>
Reviewed-by: Robert Phillips <robertphillips@google.com>
Change-Id: I6b709a2b5e8014cedb152e763ea4470c6f79de4b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222504
Commit-Queue: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Auto-Submit: Brian Osman <brianosman@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
Should have no user-visible result.
Change-Id: Iae444888557347bfdf75e6353966cde907ee0a1e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222506
Reviewed-by: Greg Daniel <egdaniel@google.com>
Commit-Queue: Stephen White <senorblanco@chromium.org>
Change-Id: Ia2e21bc984c509cd2405499f93ee5e19941cc492
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222499
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
This reverts commit 731454a085.
Reason for revert: Seeing a lot of performance regressions.
Original change's description:
> Prefer using GrOvalOpFactory over GrFillRRect for circles and
> axis-aligned circular roundrects.
>
> Bug: chromium:971936
> Change-Id: I4cd0cd9047b9b06d657826820ba5a937547f87c3
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221000
> Commit-Queue: Jim Van Verth <jvanverth@google.com>
> Reviewed-by: Khushal Sagar <khushalsagar@chromium.org>
TBR=jvanverth@google.com,csmartdalton@google.com,khushalsagar@chromium.org
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: chromium:971936
Change-Id: Iab803b1777ef5e3d754d8ac1f404b01f57d9c1a8
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222501
Reviewed-by: Jim Van Verth <jvanverth@google.com>
Commit-Queue: Jim Van Verth <jvanverth@google.com>
Just landing the initial struct before adding specific flags to the struct
and filling out the formats.
Bug: skia:6718
Change-Id: I1013845cb61482184915d181a7fc7f85800b72c5
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222498
Reviewed-by: Robert Phillips <robertphillips@google.com>
Commit-Queue: Greg Daniel <egdaniel@google.com>
- 32x8 i32 add,sub,mul
- add I32_Naive bench/test builder to get better i32 mul coverage
- minor refactoring all over
Change-Id: I13cc19ff37a2da0bcff289ba51baac08f456d6c5
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/222485
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>