Motivation: gross code simplification, also no bitset lookups at draw time.
SkPDFFont owns its glyph useage bitset.
SkPDFSubstituteMap goes away.
SkPDFObject interface is simplified.
SkPDFDocument tracks font usage (as hash set), not glyph usage.
SkPDFFont gets a simpler constructor.
SkPDFFont has first and last glyph set in constructor, not adjusted later.
SkPDFFont implementations are simplified.
SkPDFGlyphSet is replaced with simple SkBitSet.
SkPDFFont sizes its SkBitSets based on glyph count.
SkPDFGlyphSetMap goes away.
SkBitSet is now non-copyable.
SkBitSet now how utility methods to match old SkPDFGlyphSet.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2253283004
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Release-GDI-Trybot,Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Debug-GDI-Trybot
Review-Url: https://codereview.chromium.org/2253283004
Useful when:
(1) Client does not realize src and dst match (calls color
xform anyway).
(2) Client wants half floats, src and dst have matching
gamuts
(3) Client wants premul (done correctly in linear space),
src and dst have matching gamuts.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2206403003
Review-Url: https://codereview.chromium.org/2206403003
Impl Overview
(1) Keep the device clip bounds up to date. This
requires minimal additional work in a few places
throughout canvas.
(2) Keep track of if the ctm isScaleTranslate. Yes,
there's a function that does this, but it's slow
to call.
(3) Perform the src->device transform in quick reject,
then check intersection/nan.
Other Notes:
(1) NaN and intersection checks are performed
simultaneously.
(2) We no longer quick reject infinity.
(3) Affine and perspective are both handled in the slow
case.
(4) SkRasterClip::isEmpty() is handled by the intersection
check.
Performance on Nexus 6P:
93.2ms -> 59.8ms
Overall Android Jank Tests Performance Impact:
Should gain us a ms or two on some tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2225393002
Committed: https://skia.googlesource.com/skia/+/d22a817ff57986407facd16af36320fc86ce02da
Review-Url: https://codereview.chromium.org/2225393002
Impl Overview
(1) Keep the device clip bounds up to date. This
requires minimal additional work in a few places
throughout canvas.
(2) Keep track of if the ctm isScaleTranslate. Yes,
there's a function that does this, but it's slow
to call.
(3) Perform the src->device transform in quick reject,
then check intersection/nan.
Other Notes:
(1) NaN and intersection checks are performed
simultaneously.
(2) We no longer quick reject infinity.
(3) Affine and perspective are both handled in the slow
case.
(4) SkRasterClip::isEmpty() is handled by the intersection
check.
Performance on Nexus 6P:
93.2ms -> 59.8ms
Overall Android Jank Tests Performance Impact:
Should gain us a ms or two on some tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2225393002
Review-Url: https://codereview.chromium.org/2225393002
The recording bench must record some source material into some sort of
display list, and fundamentally cannot separate the timing of the two.
This CL makes it so the source material and display list are of the same type.
So instead of previous:
--nolite: SkRecord-based picture -> SkRecord-based picture
--lite: SkRecord-based picture -> threadsafe SkLiteDL
Now this times
--nolite: SkRecord-based picture -> SkRecord-based picture
--lite: SkLiteDL -> threadsafe SkLiteDL
This makes it easier to profile SkLiteDL and explore both recording and playback overhead hot spots.
The threadsafety is incidental for the source (and doesn't affect playback speed),
but I think it's handy to keep around on the destination to make a more fair comparison.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2230323002
Review-Url: https://codereview.chromium.org/2230323002
- This code is entirely private and is not being used by anything.
- In a future CL we will write a class that uses CurveMeasure to compute dash points. In order to determine whether CurveMeasure or PathMeasure should be faster, we need the dash info (the sum of the on/off intervals and how many there are)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2187083002
Review-Url: https://codereview.chromium.org/2187083002
About 9x faster than Murmur3 for long inputs.
Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot
Review-Url: https://codereview.chromium.org/2208903002
SkLiteRecorder, a new SkCanvas, fills out SkLiteDL, a new SkDrawable.
This SkDrawable is a display list similar to SkRecord and SkBigPicture / SkRecordedDrawable, but with a few new design points inspired by Android and slimming paint:
1) SkLiteDL is structured as one big contiguous array rather than the two layer structure of SkRecord. This trades away flexibility and large-op-count performance for better data locality for small to medium size pictures.
2) We keep a global freelist of SkLiteDLs, both reusing the SkLiteDL struct itself and its contiguous byte array. This keeps the expected number of mallocs per display list allocation <1 (really, ~0) for cyclical use cases.
These two together mean recording is faster. Measuring against the code we use at head, SkLiteRecorder trends about ~3x faster across various size pictures, matching speed at 0 draws and beating the special-case 1-draw pictures we have today. (I.e. we won't need those special case implementations anymore, because they're slower than this new generic code.) This new strategy records 10 drawRects() in about the same time the old strategy took for 2.
This strategy stays the winner until at least 500 drawRect()s on my laptop, where I stopped checking.
A simpler alternative to freelisting is also possible (but not implemented here), where we allow the client to manually reset() an SkLiteDL for reuse when its refcnt is 1. That's essentially what we're doing with the freelist, except tracking what's available for reuse globally instead of making the client do it.
This code is not fully capable yet, but most of the key design points are there. The internal structure of SkLiteDL is the area I expect to be most volatile (anything involving Op), but its interface and the whole of SkLiteRecorder ought to be just about done.
You can run nanobench --match picture_overhead as a demo. Everything it exercises is fully fleshed out, so what it tests is an apples-to-apples comparison as far as recording costs go. I have not yet compared playback performance.
It should be simple to wrap this into an SkPicture subclass if we want.
I won't start proposing we replace anything old with anything new quite yet until I have more ducks in a row, but this does look pretty promising (similar to the SkRecord over old SkPicture change a couple years ago) and I'd like to land, experiment, iterate, especially with an eye toward Android.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2213333002
Review-Url: https://codereview.chromium.org/2213333002
With the move from SkData::NewXXX to SkData::MakeXXX most
SkAutoTUnref<SkData> were changed to sk_sp<SkData>. However,
there are still a few SkAutoTUnref<SkData> around, so clean
them up.
Review-Url: https://codereview.chromium.org/2212493002
Most visibly this adds a macro SK_RASTER_STAGE that cuts down on the boilerplate of defining a raster pipeline stage function.
Most interestingly, SK_RASTER_STAGE doesn't define a SkRasterPipeline::Fn, but rather a new type EasyFn. This function is always static and inlined, and the details of interacting with the SkRasterPipeline::Stage are taken care of for you: ctx is just passed as a void*, and st->next() is always called. All EasyFns have to do is take care of the meat of the work: update r,g,b, etc. and read and write from their context.
The really neat new feature here is that you can either add EasyFns to a pipeline with the new append() functions, _or_ call them directly yourself. This lets you use the same set of pieces to build either a pipelined version of the function or a custom, fused version. The bench shows this off.
On my desktop, the pipeline version of the bench takes about 25% more time to run than the fused one.
The old approach to creating stages still works fine. I haven't updated SkXfermode.cpp or SkArithmeticMode.cpp because they seemed just as clear using Fn directly as they would have using EasyFn.
If this looks okay to you I will rework the comments in SkRasterPipeline to explain SK_RASTER_STAGE and EasyFn a bit as I've done here in the CL description.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2195853002
Review-Url: https://codereview.chromium.org/2195853002
Motivation:
SkPDFStream and SkPDFSharedStream now work the same.
Also:
- move SkPDFStream into SkPDFTypes (it's a fundamental PDF type).
- minor refactor of SkPDFSharedStream
- SkPDFSharedStream takes unique_ptr to represent ownership
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2190883003
Review-Url: https://codereview.chromium.org/2190883003
AtomicTest was the only use of sk_atomic_add().
AtomicInc64 bench was the only use of sk_atomic_inc(int64_t*).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2183473005
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-TSAN-Trybot,Test-Ubuntu-GCC-Golo-GPU-GT610-x86_64-Release-TSAN-Trybot
Review-Url: https://codereview.chromium.org/2183473005
This trims the SkPM4fPriv methods down to just foolproof methods.
(Anything trying to build these itself is probably wrong.)
Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore,
at least not efficiently, so this refactor is somewhat more invasive
than you might think. Generally this means things using to_4f() are
also making a misstep... that's gone too.
It also does not make sense to try to play games with linear floats
with 255 bias any more. That hack can't work with real sRGB coding.
Rather than update them, I've removed a couple of L32 xfermode fast
paths. I'd even rather drop it entirely...
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2163683002
I basically just ran a big 5-deep for-loop over the five constants here.
This is the first set of coefficients I found that round trips all bytes.
I suspect there are many such sets.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2162063003
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us
Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
SkPDFUtils now has a special function (SkPDFUtils::AppendColorComponent)
just for writing out (color/255) as a decimal with three digits of
precision.
SkPDFUnion now has a type to represent a color component. It holds a
utint_8, but calls into AppendColorComponent to serialize.
Added a unit test that tests all possible input values.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2151863003
Review-Url: https://codereview.chromium.org/2151863003
I measured relative runtimes on my laptop:
pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x
I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.
The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
If we make sure all SkOpts functions are static, we can give the namespaces any
name we like. This lets us drop the sk_ prefix and give a real indication of
the default SIMD instruction set rather than just saying sk_default.
Both of these changes help debugger, profiler, and crash report readability.
Perhaps more importantly, keeping these functions static helps prevent
accidentally linking in unused versions of functions, as you see here with
sk_avx::srcover_srgb_srgb().
This requires we update SkBlend_opts tests and benches to call SkOpts functions
through SkOpts rather than declaring the methods externally. In practice this
drops testing of the SSE2 version on machines with SSE4. If we still really
need to test/bench the compile time best SIMD level version of this method
against the runtime detected best, we can include SkBlend_opts.h into the tests
or benches directly, similar to what we do for the trivial, brute-force, or best
non-SIMD versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145833002
SkMatrix::scale and ::rotate take a point around which to scale or rotate.
Canvas lacks these helpers, so the code to rotate a canvas around a
point has been duplicated many times. Factor all of these
implementations into SkCanvas::rotate.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2142033002
Review-Url: https://codereview.chromium.org/2142033002