Most visibly this adds a macro SK_RASTER_STAGE that cuts down on the boilerplate of defining a raster pipeline stage function.
Most interestingly, SK_RASTER_STAGE doesn't define a SkRasterPipeline::Fn, but rather a new type EasyFn. This function is always static and inlined, and the details of interacting with the SkRasterPipeline::Stage are taken care of for you: ctx is just passed as a void*, and st->next() is always called. All EasyFns have to do is take care of the meat of the work: update r,g,b, etc. and read and write from their context.
The really neat new feature here is that you can either add EasyFns to a pipeline with the new append() functions, _or_ call them directly yourself. This lets you use the same set of pieces to build either a pipelined version of the function or a custom, fused version. The bench shows this off.
On my desktop, the pipeline version of the bench takes about 25% more time to run than the fused one.
The old approach to creating stages still works fine. I haven't updated SkXfermode.cpp or SkArithmeticMode.cpp because they seemed just as clear using Fn directly as they would have using EasyFn.
If this looks okay to you I will rework the comments in SkRasterPipeline to explain SK_RASTER_STAGE and EasyFn a bit as I've done here in the CL description.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2195853002
Review-Url: https://codereview.chromium.org/2195853002
Motivation:
SkPDFStream and SkPDFSharedStream now work the same.
Also:
- move SkPDFStream into SkPDFTypes (it's a fundamental PDF type).
- minor refactor of SkPDFSharedStream
- SkPDFSharedStream takes unique_ptr to represent ownership
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2190883003
Review-Url: https://codereview.chromium.org/2190883003
AtomicTest was the only use of sk_atomic_add().
AtomicInc64 bench was the only use of sk_atomic_inc(int64_t*).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2183473005
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-TSAN-Trybot,Test-Ubuntu-GCC-Golo-GPU-GT610-x86_64-Release-TSAN-Trybot
Review-Url: https://codereview.chromium.org/2183473005
This trims the SkPM4fPriv methods down to just foolproof methods.
(Anything trying to build these itself is probably wrong.)
Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore,
at least not efficiently, so this refactor is somewhat more invasive
than you might think. Generally this means things using to_4f() are
also making a misstep... that's gone too.
It also does not make sense to try to play games with linear floats
with 255 bias any more. That hack can't work with real sRGB coding.
Rather than update them, I've removed a couple of L32 xfermode fast
paths. I'd even rather drop it entirely...
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2163683002
I basically just ran a big 5-deep for-loop over the five constants here.
This is the first set of coefficients I found that round trips all bytes.
I suspect there are many such sets.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2162063003
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us
Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
SkPDFUtils now has a special function (SkPDFUtils::AppendColorComponent)
just for writing out (color/255) as a decimal with three digits of
precision.
SkPDFUnion now has a type to represent a color component. It holds a
utint_8, but calls into AppendColorComponent to serialize.
Added a unit test that tests all possible input values.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2151863003
Review-Url: https://codereview.chromium.org/2151863003
I measured relative runtimes on my laptop:
pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x
I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.
The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
If we make sure all SkOpts functions are static, we can give the namespaces any
name we like. This lets us drop the sk_ prefix and give a real indication of
the default SIMD instruction set rather than just saying sk_default.
Both of these changes help debugger, profiler, and crash report readability.
Perhaps more importantly, keeping these functions static helps prevent
accidentally linking in unused versions of functions, as you see here with
sk_avx::srcover_srgb_srgb().
This requires we update SkBlend_opts tests and benches to call SkOpts functions
through SkOpts rather than declaring the methods externally. In practice this
drops testing of the SSE2 version on machines with SSE4. If we still really
need to test/bench the compile time best SIMD level version of this method
against the runtime detected best, we can include SkBlend_opts.h into the tests
or benches directly, similar to what we do for the trivial, brute-force, or best
non-SIMD versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145833002
SkMatrix::scale and ::rotate take a point around which to scale or rotate.
Canvas lacks these helpers, so the code to rotate a canvas around a
point has been duplicated many times. Factor all of these
implementations into SkCanvas::rotate.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2142033002
Review-Url: https://codereview.chromium.org/2142033002
Adds a module that performs instanced rendering and starts using it
for a select subset of draws on Mac GL platforms. The instance
processor can currently handle rects, ovals, round rects, and double
round rects. It can generalize shapes as round rects in order to
improve batching. The instance processor also employs new drawing
algorithms, irrespective of instanced rendering, that improve GPU-side
performance (e.g. sample mask, different triangle layouts, etc.).
This change only scratches the surface of instanced rendering. The
majority of draws still only have one instance. Future work may
include:
* Passing coord transforms through the texel buffer.
* Sending FP uniforms through instanced vertex attribs.
* Using instanced rendering for more draws (stencil writes,
drawAtlas, etc.).
* Adding more shapes to the instance processor’s repertoire.
* Batching draws that have mismatched scissors (analyzing draw
bounds, inserting clip planes, etc.).
* Bindless textures.
* Uber shaders.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2066993003
Committed: https://skia.googlesource.com/skia/+/42eafa4bc00354b132ad114d22ed6b95d8849891
Review-Url: https://codereview.chromium.org/2066993003