The implementation is nearly identical to Sk2f, with these changes:
- float32x2_t -> float64x2_t
- vfoo -> vfooq
- one extra Newton's method step in sqrt().
Also, generally fix NEON detection to be defined(SK_ARM_HAS_NEON).
SK_ARM_HAS_NEON is not being set on ARM64 bots right now (nor does the compiler
seem to set __ARM_NEON__), so this CL fixes everything up.
BUG=skia:
Review URL: https://codereview.chromium.org/1020963002
Initial experiments did show that the 256 tile size fixed the hd2000 win7
nanobot failures. However it did not have any effect on other bots, so this
change is to move back to the larger tile size on all bots except for the
hd2000.
BUG=skia:
Review URL: https://codereview.chromium.org/1022083002
Due to revertapalooza, stencil buffers shared across differently sized rendertargets aren't getting cleared correctly. This CL disables the sharing until the clearing issues can be remedied. Note that stencil buffers should still be shared between identically sized render targets and should still be lazily allocated.
Review URL: https://codereview.chromium.org/1020203002
This should make it easy to compare performance of the non-SIMD Sk2x / Sk4x
code with our existing portable scalar code. I'm not adding this to SkPMFloat
only because we don't have an existing scalar baseline there to compare to.
We'll have to keep our wits about us: I just tried your new benchmarks, and
Clang's autovectorizer produced almost as good SSE as we did with intrinsics for
geo_evalquadat1 and geo_evalquadtangentat1, but not for geo_chopquadat1,
which went serial.
BUG=skia:
Review URL: https://codereview.chromium.org/1026723003
Also decreases the precision of Sk4f::rsqrt() for speed, keeping Sk4f::sqrt() the same:
instead of doing two estimation steps in rsqrt(), do one there and one more in sqrt().
Tests pass on my Nexus 7. float64x2_t is still a TODO for when I get a hold of a Nexus 9.
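The split of estimation steps between rsqrt() and sqrt() described above can be sketched in scalar C++. This is only an illustration under stated assumptions, not the Sk4f code: `rough_rsqrt_estimate`, `refine_rsqrt`, `sketch_rsqrt`, and `sketch_sqrt` are hypothetical names, and the "hardware estimate" is faked by perturbing the exact answer by 2%. It does not handle x == 0.

```cpp
#include <cmath>

// Hypothetical stand-in for a low-precision hardware reciprocal-sqrt estimate.
// We fake it by taking the exact value and making it 2% too large.
static float rough_rsqrt_estimate(float x) {
    return (1.0f / std::sqrt(x)) * 1.02f;
}

// One Newton's method step for y ~= 1/sqrt(x):  y' = y * (3 - x*y*y) / 2.
static float refine_rsqrt(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

// Faster, less precise: estimate plus a single refinement step.
static float sketch_rsqrt(float x) {
    return refine_rsqrt(x, rough_rsqrt_estimate(x));
}

// Same precision as before: take rsqrt(), refine once more, then
// multiply by x, since sqrt(x) == x * (1/sqrt(x)).
static float sketch_sqrt(float x) {
    float y = refine_rsqrt(x, sketch_rsqrt(x));
    return x * y;
}
```

Each step roughly squares the relative error, so moving one step out of rsqrt() makes it cheaper while sqrt() still gets two steps in total.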
BUG=skia:
Review URL: https://codereview.chromium.org/1018423003
The bench improves from 39 to 30, about half from porting to Sk2f, half from
x.add(x) instead of x.multiply(two).
Remove Sk4f Load2/store2 now that we have Sk2f.
BUG=skia:
Review URL: https://codereview.chromium.org/1019773004
Since we're not scaling down as far, we can increase the base
sizes a little.
Also darken debug text colors to get better value comparison
with black.
BUG=skia:3541
Review URL: https://codereview.chromium.org/1001183003
Going back to the old nanobench tile size to see if the increase in tile size is what has been
causing recent nanobench crashes. The crashes seem very nondeterministic and hard to
debug manually.
256x256 is too small a tile to give accurate GPU results, but if this fixes the crashes we can try some compromise in the middle.
BUG=skia:
Review URL: https://codereview.chromium.org/1022823003
FreeType always returns the 'hhea' font metrics for ascent and descent,
and ignores the 'OS/2'::fsSelection::UseTypoMetrics bit. (It also
ignores the VDMX table, which makes this change correct.) This change
uses the typographic font metrics when the font requests their use.
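The selection rule described above can be sketched as follows. This is a simplified illustration, not the actual change: `FontTables`, `VerticalMetrics`, and `pick_vertical_metrics` are hypothetical names, and the struct stands in for values that would really be read from the font's 'hhea' and 'OS/2' tables. The only fact taken from the OpenType specification is that USE_TYPO_METRICS is bit 7 of fsSelection.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the relevant 'hhea' and 'OS/2' fields.
struct FontTables {
    int16_t  hheaAscender;
    int16_t  hheaDescender;
    uint16_t os2FsSelection;
    int16_t  os2TypoAscender;
    int16_t  os2TypoDescender;
};

struct VerticalMetrics {
    int16_t ascent;
    int16_t descent;
};

// Bit 7 of OS/2 fsSelection is USE_TYPO_METRICS.
constexpr uint16_t kUseTypoMetrics = 1u << 7;

// Prefer the typographic metrics only when the font asks for them;
// otherwise fall back to the 'hhea' values.
static VerticalMetrics pick_vertical_metrics(const FontTables& t) {
    if (t.os2FsSelection & kUseTypoMetrics) {
        return { t.os2TypoAscender, t.os2TypoDescender };
    }
    return { t.hheaAscender, t.hheaDescender };
}
```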
Review URL: https://codereview.chromium.org/1020643002
Rather than making SkCodec an option instead of SkImageDecoder,
create a separate CodecSrc. This allows us to compare the two.
For both CodecSrc and ImageSrc, do not decode to a gpu backend.
BUG=skia:3475
Review URL: https://codereview.chromium.org/978823002
Reason for revert:
Bad glyphs in dftext GM.
Original issue's description:
> Ensure that we use different glyph entries for regular and df text.
>
> Currently if we switch between regular text and df text while using
> the same GrContext, they may use the same entry in the Ganesh font cache,
> which is incorrect. This change ensures that they will have different entries.
>
> Committed: https://skia.googlesource.com/skia/+/8dc58edd71c11f232860724dfa3b566895478034
TBR=joshualitt@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/1011403005
Currently if we switch between regular text and df text while using
the same GrContext, they may use the same entry in the Ganesh font cache,
which is incorrect. This change ensures that they will have different entries.
Review URL: https://codereview.chromium.org/1020593003
In order to implement these stencil clears we had to use a workaround where we
would bind a color buffer renderbuffer to the fbo before clearing the stencil buffer.
However this workaround seems to cause the win 7 hd2000 machines to all crash on some
memory access issue.
For now we will comment out the change and go back to the old world.
BUG=skia:
Review URL: https://codereview.chromium.org/1015223002
(This is essentially a revert of https://codereview.chromium.org/503833002/.)
This was necessary back when SkPaint was flattened even for in-process use. Now that we only flatten SkPaint for cross-process use, there's no need to serialize UniqueIDs.
Note: SkDropShadowImageFilter is being constructed with a croprect and UniqueID (of 0) in Blink. I've made the uniqueID param default to 0 temporarily, until this rolls in and Blink can be changed. (Blink can't be changed first, since unlike the other filters, there's no constructor that takes a cropRect but not a uniqueID.)
BUG=skia:
Review URL: https://codereview.chromium.org/1019493002
We now support kN32 and kRGB_565 color types.
Additionally, we support premul, unpremul, and opaque alpha types.
Unpremul is currently untested as we cannot currently draw to unpremul.
BUG=skia:
Review URL: https://codereview.chromium.org/1013743003
A store/load pair like this is a redundant no-op:
store simd_register_a, memory_address
load memory_address, simd_register_a
Everyone seems to be good at removing those when using SSE, but GCC and Clang
are pretty terrible at this for NEON. We end up issuing both redundant
commands, usually to and from the stack. That's slow. Let's not do that.
This CL unions in the native SIMD register type into SkPMFloat, so that we can
assign to and from it directly, which is generating a lot better NEON code. On
my Nexus 5, the benchmarks improve from 36ns to 23ns.
SSE is just as fast either way, but I paralleled the NEON code for consistency.
It's a little terser. And because it needed the platform headers anyway, I
moved all includes into SkPMFloat.h, again only for consistency.
I'd union in Sk4f too to make its conversion methods a little clearer,
but MSVC won't let me (it has a copy constructor... they're apparently not up
to speed with C++11 unrestricted unions).
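A rough sketch of the union idea described above, assuming hypothetical names (`NativeVec`, `native_add`, `PMFloatSketch`, `doubled`) rather than the real SkPMFloat layout. Reading a union member other than the one last written relies on compiler-supported type punning (GCC and Clang document it), not on the C++ standard.

```cpp
// Pick a native 4-float register type, with a plain struct as a fallback.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    #include <arm_neon.h>
    typedef float32x4_t NativeVec;
    static inline NativeVec native_add(NativeVec a, NativeVec b) { return vaddq_f32(a, b); }
#elif defined(__SSE__)
    #include <xmmintrin.h>
    typedef __m128 NativeVec;
    static inline NativeVec native_add(NativeVec a, NativeVec b) { return _mm_add_ps(a, b); }
#else
    struct NativeVec { float v[4]; };
    static inline NativeVec native_add(NativeVec a, NativeVec b) {
        NativeVec r;
        for (int i = 0; i < 4; i++) { r.v[i] = a.v[i] + b.v[i]; }
        return r;
    }
#endif

// The register type and the per-lane floats share the same 16 bytes, so SIMD
// code can assign to and from `vec` directly instead of going through an
// explicit store to memory followed by a load.
union PMFloatSketch {
    NativeVec vec;
    float     lanes[4];
};

// Doubles every lane by working on the register member directly.
static PMFloatSketch doubled(PMFloatSketch p) {
    PMFloatSketch out;
    out.vec = native_add(p.vec, p.vec);
    return out;
}
```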
BUG=skia:
Review URL: https://codereview.chromium.org/1015083004
Add call to SkScalarRoundToScalar(). The old code calculated the scale
from the text size, but now the text size is calculated from the scale
(which is arguably the right way to think about it). However, the old
code always rounded the final resulting text size, while the new code
does not.
In the 'no hinting' case, the text size is already rounded to an integer
(so that the rest of the matrix is minimized). In the 'hinted' case, the
entire scale has been removed from the matrix, so the scale value is the
'real' residual size. The old code rounded this size, and the new code
should as well.
BUG=464784
Review URL: https://codereview.chromium.org/1014953002