This CL:
replaces GrProxyRef with sk_sp
streamlines GrIORefProxy to be more like SkRefCntBase (i.e., move the fTarget pointer to GrSurfaceProxy)
Change-Id: I17d515100bb2d9104eed64269bd3bf75c1ebbbb8
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221997
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
Uses puppeteer to bring up Chrome headless and then calls lottie-web-perf.html with endpoints for lottie.min.js and the target lottie file.
NoTry: true
Bug: skia:9187
Change-Id: Ic1f4c9fc1cd68b3f747e58bdbf1ea51387e5e139
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221717
Commit-Queue: Ravi Mistry <rmistry@google.com>
Reviewed-by: Joe Gregorio <jcgregorio@google.com>
Change-Id: I2117e41388962682a40f9db9ffc62150b30c7847
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221779
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Change-Id: I6f9ee51f7c063ca03bf48fccd413dae244edd191
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221778
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Change-Id: I4badf65abc63c592195d0ef9e15c015f7adb54b1
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221718
Auto-Submit: Brian Osman <brianosman@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
ANGLE has a collection of platform hooks for embedding. Setting these
four allows us to get trace events related to shader compile and other
ANGLE work.
Change-Id: I11c32155023c6f4bda72daddfecc2dbe48b00675
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221657
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
In GrRecordingContext I moved the auditTrail onto the heap and only there
when compiling for tests. This allowed us to move a lot of files out of
include private.
Change-Id: Ib76ac211c0c6fd10bacaccf0c5f93f21a59f35d5
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221344
Commit-Queue: Greg Daniel <egdaniel@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Auto-Submit: Greg Daniel <egdaniel@google.com>
Uses puppeteer to bring up Chrome headless and then calls
skottie-wasm-perf-html with endpoints for canvaskit.js, canvaskit.wasm,
and the target lottie file.
Notry: true
Bug: skia:9179
Change-Id: I3250f2edf92329dce6e3f0bf125fa26b70bed632
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221557
Reviewed-by: Joe Gregorio <jcgregorio@google.com>
Commit-Queue: Ravi Mistry <rmistry@google.com>
This reverts commit 90507286cc.
Reason for revert: Seems to be breaking some builds
Original change's description:
> Shuffle SkSL sources around so compiler and bytecode can be used w/o GPU
>
> Change-Id: I7236a30040ab532086e68d6e9de2898dd7acaa32
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221098
> Commit-Queue: Brian Osman <brianosman@google.com>
> Reviewed-by: Mike Reed <reed@google.com>
> Reviewed-by: Mike Klein <mtklein@google.com>
TBR=mtklein@google.com,kjlubick@google.com,brianosman@google.com,ethannicholas@google.com,reed@google.com
Change-Id: Ie230315a72ebcfae32bc9ce7bafec1f87106cff2
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221536
Reviewed-by: Robert Phillips <robertphillips@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
This new cap tells Ganesh how many samples to use when performing
internal draws with MSAA or mixed samples. The default is always 4x,
but the client can change that with
GrContextOptions::fPreferredInternalMSAASampleCount.
Also adds a command line flag to viewer to control
fPreferredInternalMSAASampleCount.
Bug: skia:
Change-Id: Iba369273e802aa1bee796b576b3c18af347b0494
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221156
Commit-Queue: Chris Dalton <csmartdalton@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
--motion_angle ... [default is 180]
--motion_samples ... [default is 1, for no motion blur]
Change-Id: Iec0f31655b3369f51e0b398efb2d5b156dcbaf2e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221416
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Reed <reed@google.com>
Auto-Submit: Mike Reed <reed@google.com>
Optimizations to JSON size (%f -> %g) changed the meaning of the digits
argument, causing these timestamps to become severely truncated. Traces
have been fairly useless as a result (too many events starting/stopping
at the same time). This adds enough digits back that things are better.
Change-Id: I3f2d2a3dd064daf8449ac34ab5440f95e339a392
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221346
Commit-Queue: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Auto-Submit: Brian Osman <brianosman@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
Change-Id: I7236a30040ab532086e68d6e9de2898dd7acaa32
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221098
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
I'm staring at this assembly,
vmovups (%rsi), %ymm3
vpsrld $24, %ymm3, %ymm4
vpslld $16, %ymm4, %ymm15
vorps %ymm4, %ymm15, %ymm4
vpsubw %ymm4, %ymm0, %ymm4
Just knowing that could be
vmovups (%rsi), %ymm3
vpshufb 0x??(%rip), %ymm3, %ymm4
vpsubw %ymm4, %ymm0, %ymm4
That is, instead of shifting, shifting, and bit-oring
to create the 0a0a scale factor from ymm3, we could just
byte shuffle directly using some pre-baked control pattern
(stored at the end of the program like other constants)
pshufb lets you arbitrarily remix bytes from its argument and
zero bytes, and NEON has a similar family of vtbl instructions,
even including that same feature of injecting zeroes.
I think I've got this working, and the speedup is great,
from 0.19 to 0.16 ns/px for I32_SWAR, and
from 0.43 to 0.38 ns/px for I32.
Change-Id: Iab850275e826b4187f0efc9495a4b9eab4402c38
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220871
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Now that we've got shr_16x2, extract(..., 8, splat(0x00ff00ff)) is
better done as shr_16x2(..., 8). This swaps a 16-bit shift in for
the 32-bit shift, a wash, but lets us drop the bit_and at the end,
saving one whole instruction.
This places I32_SWAR a tiny little bit faster than the code in Opts,
like .19 ns/px vs .20 ns/px for Opts.
Change-Id: I4160dc03ecc8b855c0773a927f1510ad5cbb4b87
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220856
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
This is the final bunny I've got in my hat, I think...
Remembering that none of the s += d*invA adds can overflow,
we can use a single 32-bit add to add them all at once.
This means we don't have to unpack the src pixel into rb/ga
halves. We need only extract the alpha for invA.
This brings I32_SWAR even with the Opts code!
curr/maxrss loops min median mean max stddev samples config bench
36/36 MB 133 0.206ns 0.211ns 0.208ns 0.211ns 1% ▁▇▁█▁▇▁▇▁▇ nonrendering SkVM_4096_I32_SWAR
37/37 MB 152 0.432ns 0.432ns 0.434ns 0.444ns 1% ▃▁▁▁▁▃▁▁█▁ nonrendering SkVM_4096_I32
37/37 MB 50 0.781ns 0.794ns 0.815ns 0.895ns 5% ▆▂█▃▅▂▂▁▂▁ nonrendering SkVM_4096_F32
37/37 MB 76 0.773ns 0.78ns 0.804ns 0.907ns 6% ▄█▅▁▁▁▁▂▁▁ nonrendering SkVM_4096_RP
37/37 MB 268 0.201ns 0.203ns 0.203ns 0.204ns 0% █▇▆▆▆▆▁▆▆▆ nonrendering SkVM_4096_Opts
Change-Id: Ibf0a9c5d90b35f1e9cf7265868bd18b7e0a76c43
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220805
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
I figure the easiest way to expose 16-bit operations
is to expose 16x2 pair operations... this means we
can continue to always work with the same size vector.
Switching from 32-bit multiplies to 16-bit multiplies
is going to deliver the most oomph... they cost roughly
half what 32-bit multiplies do on x86.
Speed now:
I32_SWAR: 0.27 ns/px
I32: 0.43 ns/px
F32: 0.76 ns/px
RP: 0.8 ns/px
Opts: 0.2 ns/px
Change-Id: I8350c71722a9bde714ba18f97b8687fe35cc749f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220709
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
I just kind of remembered that if we're doing (xy+x)/256
and x is a destination channel and y is 255-sa, then you
can get the +x for free by multiplying by 256-sa instead.
(d * (255-sa) + d)
(d * (255-sa + 1))
(d * (256-sa) )
Duh. This is a trick we play in a lot of legacy code and
I've just now realized it's exactly equivalent to the trick
I want to play here... sigh.
Folding this math in kind of makes mul/mad_unorm8 moot.
Speed's getting good:
I32_SWAR: 0.3 ns/px
I32 : 0.55 ns/px
F32 : 0.8 ns/px
RP : 0.8 ns/px
Opts : 0.2 ns/px
Change-Id: I4d10db51ea80a3258c36e97b6b334ad253804613
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220708
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
The mask-only special case for extract is wrong...
it never looked it its input!
This not only makes things correct-er, but oddly it also
makes them faster by breaking inter-loop data dependencies.
Disable tests for _I32... they're actually still broken
because of a much more systemic flaw in how I've evaluated
programs. The _F32 and _I32_SWAR JIT code and all interpreted
code is just getting lucky. o_O
While here, update the I32_SWAR code to use the same math as I32,
(x*y+x)/256 for unorm8 mul. This just helps keep me sane.
Change-Id: I1acc09adb84c426fca4b2be5ca8c2d46d9678dd8
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220577
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Change-Id: Id61b7b9d9bc7611727a27be0172fcabc2ef4345a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220522
Commit-Queue: Brian Osman <brianosman@google.com>
Commit-Queue: Ethan Nicholas <ethannicholas@google.com>
Auto-Submit: Brian Osman <brianosman@google.com>
Reviewed-by: Ethan Nicholas <ethannicholas@google.com>
This makes our usage of sk_cf_obj consistent with Chrome.
Bug: skia:8243
Change-Id: I159339577a0e8595e7cdd47ffb9ab0653269e108
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218973
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Jim Van Verth <jvanverth@google.com>
This makes GrCFResource a template class with similar semantics to sk_sp.
Change-Id: I9ae9988dac6b39477b16d65591ef6fff44903c36
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218376
Commit-Queue: Jim Van Verth <jvanverth@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Reviewed-by: Mike Reed <reed@google.com>
We want the non-color, non-pixel-data version of createBackendTexture to truly create uninitialized textures.
Change-Id: I08867508ea181b7ba3685638cc7a3ea11d527a24
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218396
Commit-Queue: Robert Phillips <robertphillips@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Reviewed-by: Greg Daniel <egdaniel@google.com>
Convert extract(x,bits,z) to be (x >> bits) & z,
now a more explicit parallel to pack().
This lets us eliminate the funky bit counting required from the old
instruction, but more saliently it makes it more likely that the masks
we AND with will be the same value.
Ultimately down at the x86 or ARM ISA level, the AND instructions don't
really benefit from having an immediate argument (while the shifts do).
We might as well treat the mask as a normal value, letting it get
commoned with identical values, loop hoisted, etc.
Change-Id: I48a38468b46f2c730574c025f412262296472447
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219597
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Just your basic periodic cleanup.
Change-Id: I04d9ca5cfff05e6e2c29dd9caef3ce12cda45247
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219340
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Previous changes fixed the zombie device and queue, assuming that
they're bridged correctly.
Bug: skia:8243
Change-Id: Id4c2d10beacbb2ac749187d8d54fc2d276cf7b3d
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219378
Auto-Submit: Jim Van Verth <jvanverth@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
At some point adding more and more complex instructions reduces
to the absurdity of SolveTheWholeProblem-The-Instruction, but
I think this one will come up often enough to still make sense.
mad() makes sense for unorm8 just about everywhere mad() makes
sense for f32.
This instruction won't matter to a JIT, but helps the interpreter.
Change-Id: Iace92296cffbb6fbc3acd1f853cb01c51792f796
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218716
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Change-Id: I5bccb29582f894c982de78e41fd5dbf3888b9be1
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218190
Commit-Queue: Ravi Mistry <rmistry@google.com>
Commit-Queue: Eric Boren <borenet@google.com>
Reviewed-by: Ravi Mistry <rmistry@google.com>
Kind of the flip side of pack.
Made slightly awkward by instructions having only one immediate...
calling _BitScanForward / __builtin_ctz() at runtime seems to work
fine, but it really could have been done at compile time.
Change-Id: Ic83fe8e0a1603fb9189598dcc26c842cc797bf45
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218241
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
This instruction can lower to some useful SSE/NEON
instructions, and even if not, is a handy way to
express the frequent paring of << and |.
I32_SWAR: 2.3 -> 1.9
I32: 2.6 -> 2.4
F32: 5.1 -> 4.7
Change-Id: Ia169ad40f0aaef32417e05d9bf91c2d2542e7b5f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218238
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Another way for an interpreter to go faster
is to provide better instructions.
mul_unorm8 is one we use all the time.
Drops _I32 bench from ~3.6ns/px to ~2.6ns/px.
Change-Id: I9d08914c114048b79075796af9ec802236b35706
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218236
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Eliminate the duplicate functionality,
and better testing for the bench builders.
Change-Id: If20e52107738903f854aec431416e573d7a7d640
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218041
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
For Skia's testing we always want an initialized backend texture. Except for the
BackendAllocationTests of course.
Change-Id: I47b5e4c9845b3f58ebad5ba052780a69d6cd6954
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/216348
Reviewed-by: Greg Daniel <egdaniel@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
Bug: skia:8243
Change-Id: I609db0bf4183b3508c002f9f2d0a8a70a276edca
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217942
Commit-Queue: Jim Van Verth <jvanverth@google.com>
Reviewed-by: Greg Daniel <egdaniel@google.com>
This is a reland of a36e089065
This is only active when Metal is enabled.
Original change's description:
> Added AutoreleasePool for managing pool memory in testing apps.
>
> This is only active on MacOS and iOS -- on other platforms it
> will do nothing as they have no need for autorelease pools.
>
> Bug: skia:8243
> Change-Id: Ib74968dab6e3455a72e726429832101d0d410076
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217126
> Commit-Queue: Jim Van Verth <jvanverth@google.com>
> Reviewed-by: Brian Osman <brianosman@google.com>
Bug: skia:8243
Change-Id: I743a3dcc93b46387a6a330e855c2e8810b482544
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217379
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Jim Van Verth <jvanverth@google.com>
The texture resource in MtlTextureInfo is a CoreFoundation object,
so we have to manage the refcounting ourselves. GrCFResource is a
wrapper class which will automatically take care of this for us on
creation, assignment, and destruction.
Bug: chromium:964498
Change-Id: I9a3768833744d4aaaec2142200283f0e7eaad165
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/216351
Commit-Queue: Jim Van Verth <jvanverth@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Reviewed-by: Christopher Cameron <ccameron@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
This reverts commit a36e089065.
Reason for revert: Primary suspect in breaking G3
Original change's description:
> Added AutoreleasePool for managing pool memory in testing apps.
>
> This is only active on MacOS and iOS -- on other platforms it
> will do nothing as they have no need for autorelease pools.
>
> Bug: skia:8243
> Change-Id: Ib74968dab6e3455a72e726429832101d0d410076
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217126
> Commit-Queue: Jim Van Verth <jvanverth@google.com>
> Reviewed-by: Brian Osman <brianosman@google.com>
TBR=egdaniel@google.com,jvanverth@google.com,brianosman@google.com
Change-Id: I64f6e0baba21a9d35682ab53bdf418180be8579b
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: skia:8243
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217377
Reviewed-by: Kevin Lubick <kjlubick@google.com>
Commit-Queue: Kevin Lubick <kjlubick@google.com>
This is only active on MacOS and iOS -- on other platforms it
will do nothing as they have no need for autorelease pools.
Bug: skia:8243
Change-Id: Ib74968dab6e3455a72e726429832101d0d410076
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217126
Commit-Queue: Jim Van Verth <jvanverth@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>