I realized when writing the comment on https://crrev.com/1135363002/
that I'd really just sketched out the entire thing, so I couldn't help
but actually write up a working CL. How does this do for your benchmark?
BUG=chromium:487075
Review URL: https://codereview.chromium.org/1130123006
SSE runs 2-3x faster (than 4f), NEON runs 1.2-1.4x faster (than existing NEON).
Small diffs on {aarectmodes, imagefilters_xfermodes, hairmodes, mixed_xfermodes} only on AA edges due to precision drop.
BUG=skia:
Review URL: https://codereview.chromium.org/1132853005
Implemented by extracting out the non-scale/translate components
and applying that post-filter as an SkMatrixImageFilter.
BUG=skia:
Review URL: https://codereview.chromium.org/1120043002
alphas() extracts the 4 alphas from an existing Sk4px as another Sk4px.
LoadNAlphas() constructs an Sk4px from N packed alphas.
In both cases, we end up with 4x repeated alphas aligned with their pixels.
alphas()
A0 R0 G0 B0 A1 R1 G1 B1 A2 R2 G2 B2 A3 R3 G3 B3
->
A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3
Load4Alphas()
A0 A1 A2 A3
->
A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3
Load2Alphas()
A0 A1
->
A0 A0 A0 A0 A1 A1 A1 A1 0 0 0 0 0 0 0 0
This is a 5-10% speedup for AA on Intel, and wash on ARM.
AA is still mostly dominated by the final lerp.
alphas() isn't used yet, but it's similar enough to Load[24]Alphas()
that it was easier to write all at once.
BUG=skia:
Review URL: https://codereview.chromium.org/1138333003
Multiple Master and TrueType fonts support variation axes.
This implements back-end support for axes on platforms which
support it.
Review URL: https://codereview.chromium.org/1027373002
Confirm that no path ops tests are flaky, and clean up errors around
that. The test framework was incorrectly checking for >= MAX_ERRORS for
failure and <= MAX_ERRORS for success.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1140563003
The xml parsing of the Android font configuration is quite lax in what
it allows. This has lead to issues with forward compatibility and
extending the existing format. This has lead to some confusion about
what the actual format is and how a given proposed change will affect
existing files and readers.
The main issue this fixes is containment. Tags are now only recognized
at the correct levels in the correct containing tags. Tags which are
not recognized are properly skipped. Tags which accumulate character
data now only accumulate the character data in their own element as
opposed to all child elements.
Review URL: https://codereview.chromium.org/1138073002
Improve line/curve coincident detection and resolution. This fixed the remaining simple failures.
When an edge is unsortable, use the ray intersection to determine the angles' winding.
Deal with degenerate segments.
TBR=reed@google.com
BUG=skia:3588,skia:3762
Review URL: https://codereview.chromium.org/1140813002
For SSE, Sk4px is better than Sk4f is better than SkXfermodes_opts_SSE2 (where implemented).
For NEON, Sk4px is better than SkXfermodes_opts_arm_neon is better than Sk4f (where implemented).
This is a 1.6-1.9x speedup for Plus,Modulate, and Screen for NEON.
BUG=skia:
Review URL: https://codereview.chromium.org/1128053004
Xfermode_Plus runs 4-5x faster.
We expect mixed_xfermodes to have a small diff. This is because kFoldCoverageIntoSrcAlpha was incorrectly set to true.
This implementation handily beats the Sk4f impl, the portable impl, and the existing SSE2 impl. Reading the SkXfermodes_opts_SSE2.cpp file, I'm pretty confident that we'll be able to beat all SSE2 impls.
I believe this impl will beat or match the existing NEON impl too, but that may not be true for more complicated xfermodes. They can take advantage of transposing ARGBARGB... to AAAARRRR.... cheaply and I haven't figured out an abstraction for that yet that doesn't screw SSE.
Adds:
- MapDstSrc() to Sk4px
- saturatedAdd() to SkNi (only implemented as far as it's used).
- div255Narrow()
BUG=skia:
Review URL: https://codereview.chromium.org/1138893002
This is a quick Skia transcription of the Chromium tool at
src/skia/tools/filter_fuzz_stub.cc
to read and decode filters captured as .fil files.
R=joshualitt@google.com,mtklein@google.com,reed@google.com,robertphillips@google.com
BUG=487213
Review URL: https://codereview.chromium.org/1126423005
Xfermode_SrcOver:
SSE: 2.08ms -> 2.03ms (~2% faster)
NEON: my N5 is noisy, but there appears to be no perf change
BUG=skia:
Review URL: https://codereview.chromium.org/1132273004