The implementation is nearly identical to Sk2f, with these changes:
- float32x2_t -> float64x2_t
- vfoo -> vfooq
- one extra Newton's method step in sqrt().
Also, generally fix NEON detection to be defined(SK_ARM_HAS_NEON).
SK_ARM_HAS_NEON is not being set on ARM64 bots right now (nor does the compiler
seem to set __ARM_NEON__), so this CL fixes everything up.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/e57b5cab261a243dcbefa74c91c896c28959bf09
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.7-Clang-Arm7-Debug-iOS-Trybot,Build-Ubuntu-GCC-Arm64-Release-Android-Trybot
Review URL: https://codereview.chromium.org/1020963002
Adds an SSE2 version of the Color32A_D565 function, to replace
the existing SSE4 version. Also does some minor cleanup.
Performance improvement in the following Skia benchmarks.
Measured on Atom Silvermont:
Xfermode_SrcOver - x3
luma_colorfilter_large - x4.6
luma_colorfilter_small - x2
tablebench - ~15%
chart_bw - ~10%
Measured on Corei7 Haswell:
luma_colorfilter_large running SSE2 - x2
luma_colorfilter_large running SSE4 - x2.3
Also improves performance in WPS Office application and 2D subtest of 0xbenchmark on Android.
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Review URL: https://codereview.chromium.org/923523002
The implementation is nearly identical to Sk2f, with these changes:
- float32x2_t -> float64x2_t
- vfoo -> vfooq
- one extra Newton's method step in sqrt().
Also, generally fix NEON detection to be defined(SK_ARM_HAS_NEON).
SK_ARM_HAS_NEON is not being set on ARM64 bots right now (nor does the compiler
seem to set __ARM_NEON__), so this CL fixes everything up.
BUG=skia:
Review URL: https://codereview.chromium.org/1020963002
Initial experiments did show that the 256 tile size fixed the hd2000 win7
nanobot failures. However it did not have any effect on other bots, so this
change is to move back to the larger tile size on all bots expect for the
hd2000.
BUG=skia:
Review URL: https://codereview.chromium.org/1022083002
Due to revertapalooza, stencil buffers shared across differently sized rendertargets aren't getting cleared correctly. This CL disables the sharing until the clearing issues can be remedied. Note that stencil buffers should still be shared between identically sized render targets and should still be lazily allocated.
Review URL: https://codereview.chromium.org/1020203002
This should make it easy to compare performance of the non-SIMD Sk2x / Sk4x
code with our existing portable scalar code. I'm not adding this to SkPMFloat
only because we don't have an existing scalar baseline there to compare to.
We'll have to keep our wits about us: I just tried your new benchmarks, and
Clang's autovectorizer produced almost as good SSE as we did with intrinsics for
geo_evalquadat1 and geo_evalquadtangentat1, but not for geo_chopquadat1,
which went serial.
BUG=skia:
Review URL: https://codereview.chromium.org/1026723003
Also decreases the precision of Sk4f::rsqrt() for speed, keeping Sk4f::sqrt() the same:
instead of doing two estimation steps in rsqrt(), do one there and one more in sqrt().
Tests pass on my Nexus 7. float64x2_t is still a TODO for when I get a hold of a Nexus 9.
BUG=skia:
Review URL: https://codereview.chromium.org/1018423003
The bench improves from 39 to 30, about half from porting to Sk2f, half from
x.add(x) instead of x.multiply(two).
Remove Sk4f Load2/store2 now that we have Sk2f.
BUG=skia:
Review URL: https://codereview.chromium.org/1019773004
Since we're not scaling down as far, we can increase the base
sizes a little.
Also darken debug text colors to get better value comparison
with black.
BUG=skia:3541
Review URL: https://codereview.chromium.org/1001183003
Going back to old nanobench tile size to see if the increase to tile is what has been
causing recent nanobench crashes. The crashes seem very nondeterministic and hard to
debug manually.
256x256 is too small of a tile to give accurate gpu results but if this fixes we can try some compromise in the middle
BUG=skia:
Review URL: https://codereview.chromium.org/1022823003
FreeType always returns the 'hhea' font metrics for ascent and descent,
and ignores the 'OS/2'::fsSelection::UseTypoMetrics bit. (It also
ignores the VDMX table, which makes this change correct.) This change
uses the typographic font metrics when the font requests their use.
Review URL: https://codereview.chromium.org/1020643002
Rather than making SkCodec an option instead of SkImageDecoder,
create a separate CodecSrc. This allows us to compare the two.
For both CodecSrc and ImageSrc, do not decode to a gpu backend.
BUG=skia:3475
Review URL: https://codereview.chromium.org/978823002
Reason for revert:
Bad glyphs in dftext GM.
Original issue's description:
> Ensure that we use different glyph entries for regular and df text.
>
> Currently if we switch between regular text and df text while using
> the same GrContext, they may use the same entry in the Ganesh font cache,
> which is incorrect. This change ensures that they will have different entries.
>
> Committed: https://skia.googlesource.com/skia/+/8dc58edd71c11f232860724dfa3b566895478034TBR=joshualitt@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/1011403005
Currently if we switch between regular text and df text while using
the same GrContext, they may use the same entry in the Ganesh font cache,
which is incorrect. This change ensures that they will have different entries.
Review URL: https://codereview.chromium.org/1020593003
In order to implement these stencil clears we had to use a work around where we
would bind a color buffer renderbuffer to the fbo before clearing the stencil buffer.
However this workaround seems to cause the win 7 hd2000 machines to all crash on some
memory access issue.
For now we will comment on the change and go back to the old world
BUG=skia:
Review URL: https://codereview.chromium.org/1015223002
(This is essentially a revert of https://codereview.chromium.org/503833002/.)
This was necessary back when SkPaint was flattened even for in-process use. Now that we only flatten SkPaint for cross-process use, there's no need to serialize UniqueIDs.
Note: SkDropShadowImageFilter is being constructed with a croprect and UniqueID (of 0) in Blink. I've made the uniqueID param default to 0 temporarily, until this rolls in and Blink can be changed. (Blink can't be changed first, since unlike the other filters, there's no constructor that takes a cropRect but not a uniqueID.)
BUG=skia:
Review URL: https://codereview.chromium.org/1019493002