I'm staring at this assembly,
vmovups (%rsi), %ymm3
vpsrld $24, %ymm3, %ymm4
vpslld $16, %ymm4, %ymm15
vorps %ymm4, %ymm15, %ymm4
vpsubw %ymm4, %ymm0, %ymm4
Just knowing that could be
vmovups (%rsi), %ymm3
vpshufb 0x??(%rip), %ymm3, %ymm4
vpsubw %ymm4, %ymm0, %ymm4
That is, instead of shifting, shifting, and bit-oring
to create the 0a0a scale factor from ymm3, we could just
byte shuffle directly using some pre-baked control pattern
(stored at the end of the program like other constants)
pshufb lets you arbitrarily remix bytes from its argument and
zero bytes, and NEON has a similar family of vtbl instructions,
even including that same feature of injecting zeroes.
I think I've got this working, and the speedup is great,
from 0.19 to 0.16 ns/px for I32_SWAR, and
from 0.43 to 0.38 ns/px for I32.
Change-Id: Iab850275e826b4187f0efc9495a4b9eab4402c38
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220871
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Yet another way to transform a layer, disguised as a distort effect.
TBR=
Change-Id: Ic2d5479fa6ae27b460de60875924f73f77fc7f71
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/221001
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Now that we've got shr_16x2, extract(..., 8, splat(0x00ff00ff)) is
better done as shr_16x2(..., 8). This swaps a 16-bit shift in for
the 32-bit shift, a wash, but lets us drop the bit_and at the end,
saving one whole instruction.
This places I32_SWAR a tiny little bit faster than the code in Opts,
like .19 ns/px vs .20 ns/px for Opts.
Change-Id: I4160dc03ecc8b855c0773a927f1510ad5cbb4b87
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220856
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
This is the final bunny I've got in my hat, I think...
Remembering that none of the s += d*invA adds can overflow,
we can use a single 32-bit add to add them all at once.
This means we don't have to unpack the src pixel into rb/ga
halves. We need only extract the alpha for invA.
This brings I32_SWAR even with the Opts code!
curr/maxrss loops min median mean max stddev samples config bench
36/36 MB 133 0.206ns 0.211ns 0.208ns 0.211ns 1% ▁▇▁█▁▇▁▇▁▇ nonrendering SkVM_4096_I32_SWAR
37/37 MB 152 0.432ns 0.432ns 0.434ns 0.444ns 1% ▃▁▁▁▁▃▁▁█▁ nonrendering SkVM_4096_I32
37/37 MB 50 0.781ns 0.794ns 0.815ns 0.895ns 5% ▆▂█▃▅▂▂▁▂▁ nonrendering SkVM_4096_F32
37/37 MB 76 0.773ns 0.78ns 0.804ns 0.907ns 6% ▄█▅▁▁▁▁▂▁▁ nonrendering SkVM_4096_RP
37/37 MB 268 0.201ns 0.203ns 0.203ns 0.204ns 0% █▇▆▆▆▆▁▆▆▆ nonrendering SkVM_4096_Opts
Change-Id: Ibf0a9c5d90b35f1e9cf7265868bd18b7e0a76c43
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220805
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
I figure the easiest way to expose 16-bit operations
is to expose 16x2 pair operations... this means we
can continue to always work with the same size vector.
Switching from 32-bit multiplies to 16-bit multiplies
is going to deliver the most oomph... they cost roughly
half what 32-bit multiplies do on x86.
Speed now:
I32_SWAR: 0.27 ns/px
I32: 0.43 ns/px
F32: 0.76 ns/px
RP: 0.8 ns/px
Opts: 0.2 ns/px
Change-Id: I8350c71722a9bde714ba18f97b8687fe35cc749f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220709
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
I just kind of remembered that if we're doing (xy+x)/256
and x is a destination channel and y is 255-sa, then you
can get the +x for free by multiplying by 256-sa instead.
(d * (255-sa) + d)
(d * (255-sa + 1))
(d * (256-sa) )
Duh. This is a trick we play in a lot of legacy code and
I've just now realized it's exactly equivalent to the trick
I want to play here... sigh.
Folding this math in kind of makes mul/mad_unorm8 moot.
Speed's getting good:
I32_SWAR: 0.3 ns/px
I32 : 0.55 ns/px
F32 : 0.8 ns/px
RP : 0.8 ns/px
Opts : 0.2 ns/px
Change-Id: I4d10db51ea80a3258c36e97b6b334ad253804613
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220708
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
The mask-only special case for extract is wrong...
it never looked it its input!
This not only makes things correct-er, but oddly it also
makes them faster by breaking inter-loop data dependencies.
Disable tests for _I32... they're actually still broken
because of a much more systemic flaw in how I've evaluated
programs. The _F32 and _I32_SWAR JIT code and all interpreted
code is just getting lucky. o_O
While here, update the I32_SWAR code to use the same math as I32,
(x*y+x)/256 for unorm8 mul. This just helps keep me sane.
Change-Id: I1acc09adb84c426fca4b2be5ca8c2d46d9678dd8
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220577
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Add logic to adjust glyph positions based on animated tracking properties.
This adjustment is applied post-shaping (it doesn't observe the text box),
and requires line re-alignment - thus it is being processed per-line.
Change-Id: Id44a295032a48c7216f126cb02dd2d2d5cc18ae3
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220076
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Range selector's "Based On" property controls how range indices map
to glyphs: characters, characters-excluding-spaces, words, lines.
To support this feature:
- update SkottieShaper to track domain-relevant info per fragment
(fLineIndex, fIsWhitespace)
- update TextAdapter to build domain maps
(domain index -> fragment span)
- update RangeSelector to run its range indices through a domain map,
if present.
Change-Id: I80e713f6beaa2578aa0eae1d1ddae8e1e47d8d10
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219859
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Ben Wagner <bungeman@google.com>
I used to have a dump of the value program before it was
translated to registers, but it went away a while ago.
This restores it.
Change-Id: I9b8bfcb124843cad4b0dc44bdf0a03e95a0c83d8
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219757
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Auto-Submit: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Convert extract(x,bits,z) to be (x >> bits) & z,
now a more explicit parallel to pack().
This lets us eliminate the funky bit counting required from the old
instruction, but more saliently it makes it more likely that the masks
we AND with will be the same value.
Ultimately down at the x86 or ARM ISA level, the AND instructions don't
really benefit from having an immediate argument (while the shifts do).
We might as well treat the mask as a normal value, letting it get
commoned with identical values, loop hoisted, etc.
Change-Id: I48a38468b46f2c730574c025f412262296472447
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219597
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
The current implementation applies constant coverage (outside selector
range) based on computed integral edges.
But the integral range is clamped to the valid index domain and its
extremes are always assumed to have partial coverage - so we never get
to constant-blit the full buffer when the interval is outside, which
can yield incorrect coverage for the first/last fragments.
Update the constant coverage logic to operate in full domain coordinates.
Change-Id: I23902674fe5e822081fb8262167511df1cc3463e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/219206
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Introduce square/ramp/triangle/round/smooth shape generators,
and use them to seed the range selector coverage pipeline.
Change-Id: Ib7b94ceecd2ccf66820f4dd2443fdd62e2ac6a1b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218828
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Ben Wagner <bungeman@google.com>
At some point adding more and more complex instructions reduces
to the absurdity of SolveTheWholeProblem-The-Instruction, but
I think this one will come up often enough to still make sense.
mad() makes sense for unorm8 just about everywhere mad() makes
sense for f32.
This instruction won't matter to a JIT, but helps the interpreter.
Change-Id: Iace92296cffbb6fbc3acd1f853cb01c51792f796
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218716
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
I'm of two minds about this... it adds register pressure and really only
tends to hoist few instructions that are fairly cheap anway. On the
other hand, it's neat, it's easy to turn off (just set the initial
hoist value to false in Builder::push()) and it does deliver a
noticeable though slight performance improvement in the interpreter.
I think the final decision will probably come down to what we think
about maintainability?
Change-Id: Idd6346f70f03188917918406731154246a7c6fcb
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218584
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
This reverts commit 346f82c1c3.
Reason for revert: *SAN bots
Original change's description:
> print 1/K floats as fractions
>
> Change-Id: Id00cbd0950e77debb5ab5d45541dc0f8d13a3c42
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218338
> Reviewed-by: Brian Osman <brianosman@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>
TBR=mtklein@google.com,brianosman@google.com
Change-Id: Ic35cec97d2dc2c1e19dbdf8ea7b505ad75072da1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218529
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Change-Id: Id00cbd0950e77debb5ab5d45541dc0f8d13a3c42
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218338
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Each animator can have multiple range selectors, whose combined "coverage"
modulates how the animator props compose with other/initial props.
Since there can be multiple animators with different/arbitrary selectors,
we compute independent property values for each fragment.
Supported features:
- start, end, offset, amount
- units: percentage, index
- based-on: characters-only for now
- mode: add-only for now
- shape: square-only for now
Change-Id: If7fee46ffb29e1f92542822481ed699fd0b0b521
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218076
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Ben Wagner <bungeman@google.com>
Change-Id: Ib0d4f354787e413749fdda8b59ccc2f94472b0ce
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218243
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Kind of the flip side of pack.
Made slightly awkward by instructions having only one immediate...
calling _BitScanForward / __builtin_ctz() at runtime seems to work
fine, but it really could have been done at compile time.
Change-Id: Ic83fe8e0a1603fb9189598dcc26c842cc797bf45
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218241
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
This instruction can lower to some useful SSE/NEON
instructions, and even if not, is a handy way to
express the frequent paring of << and |.
I32_SWAR: 2.3 -> 1.9
I32: 2.6 -> 2.4
F32: 5.1 -> 4.7
Change-Id: Ia169ad40f0aaef32417e05d9bf91c2d2542e7b5f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218238
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Another way for an interpreter to go faster
is to provide better instructions.
mul_unorm8 is one we use all the time.
Drops _I32 bench from ~3.6ns/px to ~2.6ns/px.
Change-Id: I9d08914c114048b79075796af9ec802236b35706
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218236
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Eliminate the duplicate functionality,
and better testing for the bench builders.
Change-Id: If20e52107738903f854aec431416e573d7a7d640
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218041
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
This is mostly to test how easy rebaselining SkVMTest is.
Change-Id: I27ab6f6bb8b7e126327374009783afd86d416f55
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218039
Reviewed-by: Brian Osman <brianosman@google.com>
- keep expectations in resources/
- overwrite automatically if needed
so we can see the diff in Git
Change-Id: I2486b127ebcc7f40332fd0462e38b1af04d3e32b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/218038
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
The interesting bit here is a change in glyph positioning:
AE text animator transforms are to be applied relative to the glyph position.
To support this behavior, update Shaper to externalize glyph positioning
when in fragmented mode. I.e. instead of baking glyph positions in blobs,
apply them at the scene graph transform level (such that they compose with
animated transforms correctly).
Change-Id: I9aeb5e6f8c1ec1a2c8b5351e8fc2a73d4bdf5cad
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217556
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Limitations:
- no range selectors (applies to the whole text)
- only position, fill color and stroke color for now
Change-Id: I91e88a6107c5f66687c1c27f27a71be3914bde25
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/217386
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Ben Wagner <bungeman@google.com>
Introduce a new Shaper::Valign enum to support aligning the shaped text
visual bottom with the text box bottom.
This option corresponds to JSON prop sk_vj: 2.
kResizeToFit (used to be sk_vj: 2) is now bumped to sk_vj: 3.
Change-Id: Ib1621a21a42bfc21c99826e203c587a3fdc663dc
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/215821
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
This is implemented at backend-neutral level and so misses some
opportunities to reduce the number of passes in the GPU backend.
Filter quality is interpreted as:
none - single nearest neighbor resampling
low - chain of bilinear resamplings. 2x up/down except for one
step which may be smaller than 2x.
medium - same as low
high - when both scale factors are up then same as low but with bicubic
filtering rather than linear. Otherwise, same as low.
Bug: skia:8962
Change-Id: I4467636c14b802d6a0d9b5c363c1ad9e87a1a44b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/213831
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Salomon <bsalomon@google.com>
Update sample effects to use that (and remove the need for the
hacky workaround "random -> frame" affector I was using).
Current perf on my workstation, 6k particles updating:
native: 0.67 ms
interp: 0.97 ms
Change-Id: I3a2168c210d7431ffffe2b87ab6adade69f1dce7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/214190
Reviewed-by: Ethan Nicholas <ethannicholas@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Change-Id: Ib440570ecbd46b5bc98d346592cbbb72f58ae85a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/212500
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Ethan Nicholas <ethannicholas@google.com>
Introduce a new SkottieShaper VAlign option (kResizeToFit), to scale the text
size for the best box fit.
The basic idea is to perform a binary search on the font size, until
the shaped text fits snuggly within the specified box. The search is
focused on height, as horizontal fitting is assumed to be handled in
SkShaper.
Change-Id: I56269e02dda7a34e4ef3b79c205ea651b909f370
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/212962
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
This allows for testing falling into various buckets in the gpu
fallbacks.
Change-Id: Ia0c319a6bdd03c5cdece1ce83ab228c1a3a7c46d
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/199420
Reviewed-by: Jim Van Verth <jvanverth@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
Motivation: it would be a good idea if the API documentation examples
were checked into the skia repository, so we could make sure they
compile as part of the commit queue.
Fiddle would make/update a named fiddle each time it gets a new
commit of Skia, extracted from the code in the examples/ directory.
The docs would point at those named fiddles. Named fiddles have urls
in the form:
https://fiddle.skia.org/c/@Bitmap_000
Then we would stick a link to the example into the header documentation
like this:
/** Allocates the pixel memory for the bitmap, given its dimensions
and SkColorType. Returns true on success, where success means
either setPixels() or setPixelRef() was called.
@param bitmap SkBitmap containing SkImageInfo as input, and
SkPixelRef as output
@return true if SkPixelRef was allocated
@example https://fiddle.skia.org/c/@Bitmap_000
*/
bool allocPixelRef(SkBitmap* bitmap) override;
There are still around 200 disabled examples that need to be fixed
(these result from API changes since the author left).
Change-Id: I14a31348a9ccaaa31f65424b91e3a3533d2583a7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/198824
Commit-Queue: Hal Canary <halcanary@google.com>
Reviewed-by: Leon Scroggins <scroggo@google.com>
Reviewed-by: Joe Gregorio <jcgregorio@google.com>
Bug: skia:
Change-Id: I499262277ac1c8d92a39a66f6e846e248b102aef
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/197767
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Kevin Lubick <kjlubick@google.com>
All curves (and path affectors) are driven by an SkParticleValue. The
value can derive its value from the current defaults (age of particle
or effect), or explicitly choose the other one, a random value, or any
other particle value. Values can be range adjusted and support repeat,
clamp, and mirror tiling.
Also fixed some more issues related to resource path in the slide GUI.
Bug: skia:
Change-Id: I4755018d5b57ae2d5ec400d541055ca4fb542978
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/196760
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Bug: skia:
Change-Id: Ie6485a11bb57fecef470d727dcf3b4fe5dff0b90
Reviewed-on: https://skia-review.googlesource.com/c/195582
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Added explicit Linear segment type, merge math evaluation helpers for
scalar and color curves. Add logic to visitFields that cuts down on the
serialized size of simple curves, and makes the GUI easier to work with.
Remove the curve plot from the GUI. It was incorrect (wrong points at
cubic handle locations), not terribly helpful, and difficult to
maintain.
Bug: skia:
Change-Id: I190cb5d118b1f4b910984e4df50ee3351c8be895
Reviewed-on: https://skia-review.googlesource.com/c/195884
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
The other generator was never used (or useful). String-based serialization
of enums is quite helpful, though.
Bug: skia:
Change-Id: Ic9d58f8d20cfe7aba47722bd74f1e6f8f0f219e5
Reviewed-on: https://skia-review.googlesource.com/c/195368
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Add particle "frame" enum, to allow effects relative to local, world,
or velocity. Remove the "orient along velocity" and replace with a much
more general orientation affector (angle curve + frame). Add an angular
velocity affector to mirror the behavior of the linear velocity affector.
Bug: skia:
Change-Id: Ibbaaeb352c9547d00d81c7916d00148dd65ed2b9
Reviewed-on: https://skia-review.googlesource.com/c/195361
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Adjust reference frame for affector to be consistent (so angles are
counted clockwise from "up" in both local and world modes).
Bug: skia:
Change-Id: I643e1484bc0a58d1f1c0cfe35ac2ab37dc2ea409
Reviewed-on: https://skia-review.googlesource.com/c/194189
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>