The benches for N <= 10 get around 2x faster on my N7 and N9. I believe this
is because of the reduced function-call-then-function-pointer-call overhead on
the N7, and additionally because it seems autovectorization beats our NEON code
for small N on the N9.
My desktop is unchanged, though that's probably because N=10 lies well within a
region where memset's performance is essentially constant: N=100 takes only
about 2x as long as N=1 and N=10, which perform nearly identically.
BUG=skia:
Review URL: https://codereview.chromium.org/1073863002
Instead of using a full-blown SkPaint to store run font info, use a
custom structure.
This saves 96 bytes / run on 64bit platforms.
R=reed@google.com,mtklein@google.com,joshualitt@google.com
Review URL: https://codereview.chromium.org/1070943002
This CL also adds a new parameter to SkBitmapSource which gives the user control of the filter quality.
BUG=472795
Review URL: https://codereview.chromium.org/1072603002
No diffs against head for DM --config 8888 gpu 2ndpic-8888 2ndpic-gpu.
picture_overhead_draw 1.62us -> 1.6us 0.99x
picture_overhead_nodraw 792ns -> 342ns 0.43x
tiles and serialization modes will also test this a bit.
BUG=chromium:470553
Review URL: https://codereview.chromium.org/1067893002
If no one has read the picture's unique ID, there's no point invalidating it.
This is the same trick we pull with SkPixelRefs.
Before:
26M 1 1.49µs 1.6µs 1.77µs 6.25µs 42% picture_overhead_draw
13M 32 742ns 749ns 756ns 823ns 2% picture_overhead_nodraw
After:
26M 1 1.27µs 1.33µs 1.49µs 5.51µs 45% picture_overhead_draw
14M 43 677ns 680ns 681ns 701ns 1% picture_overhead_nodraw
BUG=skia:
Review URL: https://codereview.chromium.org/1061283002
- It's no longer needed to help the (2011?) transition to SkAutoTUnref.
- It prevents us from making classes that go in SkAutoTUnrefs final,
i.e. all ref-counted classes.
This had better not have been public API...
TBR=reed@google.com
BUG=skia:
Review URL: https://codereview.chromium.org/1068443002
Add a virtual method on SkStream which will do a "peek" some bytes, so
that those bytes are read, but the next call to read will be
unaffected.
Implement peek for SkMemoryStream, where the implementation is simple
and obvious.
Implement peek on SkFrontBufferedStream.
Add tests.
Motivated by decoding streams which cannot be rewound.
TBR=reed@google.com
BUG=skia:3257
Review URL: https://codereview.chromium.org/1044953002
This mirrors the behavior in onGetPixels, and allows the implementation
to share code for handling calls to rewindIfNeeded.
This also fixes a bug where getScanlineDecoder was calling
rewindIfNeeded and treating the result as a bool.
In SkPngCodec, factor out the code to call rewindIfNeeded, and call it
in both onGetPixels and onGetScanlineDecoder.
Update the test to include testing the scanline decoder. Rename "gen"
to "codec" now that it must be an SkCodec.
BUG=skia:3257
Depends on https://codereview.chromium.org/1048423003/ (DIFFERENT ISSUE).
Review URL: https://codereview.chromium.org/1050893002
The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc.
This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h.
This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h.
To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful.
You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel:
- Sk4f, Sk4s, Sk2d: feel awesome
- Sk2f, Sk2s, Sk4d: feel pretty good
No public API changes.
TBR=reed@google.com
BUG=skia:3592
Review URL: https://codereview.chromium.org/1048593002
Need to land SK_SUPPORT_LEGACY_SCALAR_MAPPOINTS in chrome to suppress Affine
version which causes slight differences (which will need to be rebaselined)
BUG=skia:
Review URL: https://codereview.chromium.org/1045493002
I'd like to add a new API to SkStream for peeking - i.e. reading some
bytes without advancing the stream. This will be implemented for the
streams where it makes sense. I think the function should look
something like the following:
size_t peek(void* buffer, size_t bytesToRead) {
return this->onPeek(buffer, bytesToRead);
}
virtual size_t onPeek(void* buffer, size_t bytesToRead) {
return 0; // unimplemented base class.
}
In order to avoid confusion, I'd like to remove SkMemoryStream::peek(),
which is not currently used internally, by Chrome, or by Android as far
as I can tell. There is also another function does the same thing:
getPosition().
BUG=skia:3257
Review URL: https://codereview.chromium.org/1039373002
Replace the implicit curve intersection with a geometric curve intersection. The implicit intersection proved mathematically unstable and took a long time to zero in on an answer.
Use pointers instead of indices to refer to parts of curves. Indices required awkward renumbering.
Unify t and point values so that small intervals can be eliminated in one pass.
Break cubics up front to eliminate loops and cusps.
Make the Simplify and Op code more regular and eliminate arbitrary differences.
Add a builder that takes an array of paths and operators.
Delete unused code.
BUG=skia:3588
R=reed@google.com
Review URL: https://codereview.chromium.org/1037573004
Introduce a paint filter proxy base class as a SkDrawFilter replacement,
and convert SkDebugCanvas to use the new approach.
BUG=skia:3587
R=reed@google.com,mtklein@google.com,robertphillips@google.com,tomhudson@google.com
Review URL: https://codereview.chromium.org/1032173002
Add an interface for decoding scanlines, and implement that interface
in the PNG decoder.
Use a separate method to determine whether an image that used a type
with alpha was actually opaque.
SkScanlineDecoder.h:
New interface for decoding scanlines.
SkCodec.h:
Add getScanlineDecoder.
Add a virtual function (with non-virtual caller) for determining
whether the image truly had alpha. The client can call this to
determine if the image was actually opaque if it reported having alpha.
Remove code to sneakily change the passed in alpha type.
SkCodec_libpng.*:
Split up code onGetPixels into helper functions that can be shared with
the scanline decoder.
Implement scanline decoding.
Implement onReallyHasAlpha.
SkSwizzler.*:
Add a new SrcConfig as a default, which is invalid.
Add a function for setting fDstRow directly.
Assert fDstRow is not NULL.
BUG=skia:3257
Review URL: https://codereview.chromium.org/1010903003
Subclasses of SkCanvas may need to override the behavior here - for
example, any proxy or deferred canvas may not know its own size and
need to delegate to another object.
We'll also work on reducing use of this function
(https://skbug.com/3569), but some of the current uses seem to be
semantically necessary.
R=reed@google.com
BUG=skia:3566
Review URL: https://codereview.chromium.org/1022423002
The implementation is nearly identical to Sk2f, with these changes:
- float32x2_t -> float64x2_t
- vfoo -> vfooq
- one extra Newton's method step in sqrt().
Also, generally fix NEON detection to be defined(SK_ARM_HAS_NEON).
SK_ARM_HAS_NEON is not being set on ARM64 bots right now (nor does the compiler
seem to set __ARM_NEON__), so this CL fixes everything up.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/e57b5cab261a243dcbefa74c91c896c28959bf09
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.7-Clang-Arm7-Debug-iOS-Trybot,Build-Ubuntu-GCC-Arm64-Release-Android-Trybot
Review URL: https://codereview.chromium.org/1020963002
Adds an SSE2 version of the Color32A_D565 function, to replace
the existing SSE4 version. Also does some minor cleanup.
Performance improvement in the following Skia benchmarks.
Measured on Atom Silvermont:
Xfermode_SrcOver - x3
luma_colorfilter_large - x4.6
luma_colorfilter_small - x2
tablebench - ~15%
chart_bw - ~10%
Measured on Corei7 Haswell:
luma_colorfilter_large running SSE2 - x2
luma_colorfilter_large running SSE4 - x2.3
Also improves performance in WPS Office application and 2D subtest of 0xbenchmark on Android.
Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
Review URL: https://codereview.chromium.org/923523002
The implementation is nearly identical to Sk2f, with these changes:
- float32x2_t -> float64x2_t
- vfoo -> vfooq
- one extra Newton's method step in sqrt().
Also, generally fix NEON detection to be defined(SK_ARM_HAS_NEON).
SK_ARM_HAS_NEON is not being set on ARM64 bots right now (nor does the compiler
seem to set __ARM_NEON__), so this CL fixes everything up.
BUG=skia:
Review URL: https://codereview.chromium.org/1020963002
(This is essentially a revert of https://codereview.chromium.org/503833002/.)
This was necessary back when SkPaint was flattened even for in-process use. Now that we only flatten SkPaint for cross-process use, there's no need to serialize UniqueIDs.
Note: SkDropShadowImageFilter is being constructed with a croprect and UniqueID (of 0) in Blink. I've made the uniqueID param default to 0 temporarily, until this rolls in and Blink can be changed. (Blink can't be changed first, since unlike the other filters, there's no constructor that takes a cropRect but not a uniqueID.)
BUG=skia:
Review URL: https://codereview.chromium.org/1019493002