AtomicTest was the only use of sk_atomic_add().
AtomicInc64 bench was the only use of sk_atomic_inc(int64_t*).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2183473005
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-TSAN-Trybot,Test-Ubuntu-GCC-Golo-GPU-GT610-x86_64-Release-TSAN-Trybot
Review-Url: https://codereview.chromium.org/2183473005
This trims the SkPM4fPriv methods down to just foolproof methods.
(Anything trying to build these itself is probably wrong.)
Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore,
at least not efficiently, so this refactor is somewhat more invasive
than you might think. Generally, anything using to_4f() is
also making a misstep, so that's gone too.
It also does not make sense to try to play games with linear floats
with 255 bias any more. That hack can't work with real sRGB coding.
Rather than update them, I've removed a couple of L32 xfermode fast
paths. I'd even rather drop it entirely...
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2163683002
I basically just ran a big 5-deep for-loop over the five constants here.
This is the first set of coefficients I found that round trips all bytes.
I suspect there are many such sets.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2162063003
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
  Original                                425us
  Trans16 (not 32)                        355us
  Vector Trans16                          378us
  Trans16 + Keep Halfs in Vector          335us
  Vector Trans16 + Keep Halfs in Vector   282us
  Final                                   282us
Color Xform to Half Float (Nexus 5X)
  Original                                556us
  Final                                   472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
SkPDFUtils now has a special function (SkPDFUtils::AppendColorComponent)
just for writing out (color/255) as a decimal with three digits of
precision.
SkPDFUnion now has a type to represent a color component. It holds a
uint8_t, but calls into AppendColorComponent to serialize.
Added a unit test that tests all possible input values.
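As an illustration of the idea, here is a minimal sketch of writing
(color/255) with three digits of precision. It is not the actual
SkPDFUtils::AppendColorComponent; the function name, the std::string
return type, and the exact output format (no leading zero, trailing
zeros trimmed) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a color-component serializer: returns
// color/255 rounded to the nearest thousandth, as a short decimal string.
std::string color_component_to_string(uint8_t color) {
    if (color == 0)   { return "0"; }
    if (color == 255) { return "1"; }
    // Round to the nearest thousandth with integer math. 255 is odd, so a
    // remainder of exactly one half never occurs and adding 127 suffices.
    int thousandths = (color * 1000 + 127) / 255;
    assert(thousandths > 0 && thousandths < 1000);
    std::string out = ".";
    out += static_cast<char>('0' + thousandths / 100);
    out += static_cast<char>('0' + thousandths / 10 % 10);
    out += static_cast<char>('0' + thousandths % 10);
    // Trim trailing zeros (assumed format): ".200" becomes ".2".
    while (out.back() == '0') { out.pop_back(); }
    return out;
}
```

With only 256 possible inputs, an exhaustive test like the one described
above is cheap: loop over every uint8_t value and compare against the
floating-point result.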
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2151863003
Review-Url: https://codereview.chromium.org/2151863003
I measured relative runtimes on my laptop:
pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x
I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.
The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.
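For readers unfamiliar with the trick, here is a scalar model of what the
<<16, >>16 variant does to a single 32-bit lane. It assumes the shift pair
is followed by SSE2's signed saturating pack (_mm_packs_epi32); the helper
names are hypothetical and this is not the code from the change itself.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one lane of a signed saturating pack: clamp to int16.
static int16_t saturate_to_int16(int32_t x) {
    if (x >  32767) { return  32767; }
    if (x < -32768) { return -32768; }
    return static_cast<int16_t>(x);
}

// Hypothetical one-lane model of the "<<16, >>16" pack.
// Input is assumed to already fit in 16 unsigned bits.
uint16_t pack_lane(int32_t x) {
    assert(x >= 0 && x <= 65535);
    // Shifting left then arithmetically right by 16 sign-extends the low
    // 16 bits, so values >= 32768 become negative instead of saturating.
    // (Arithmetic right shift is what mainstream compilers do, and is
    // guaranteed since C++20.)
    int32_t sign_extended =
        static_cast<int32_t>(static_cast<uint32_t>(x) << 16) >> 16;
    // The signed pack now passes the bits through unchanged.
    return static_cast<uint16_t>(saturate_to_int16(sign_extended));
}
```

Without the shifts, a value such as 40000 would clamp to 32767 in the
signed pack, which is why SSE2 needs the extra step.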
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
If we make sure all SkOpts functions are static, we can give the namespaces any
name we like. This lets us drop the sk_ prefix and give a real indication of
the default SIMD instruction set rather than just saying sk_default.
Both of these changes help debugger, profiler, and crash report readability.
Perhaps more importantly, keeping these functions static helps prevent
accidentally linking in unused versions of functions, as you see here with
sk_avx::srcover_srgb_srgb().
This requires we update SkBlend_opts tests and benches to call SkOpts functions
through SkOpts rather than declaring the methods externally. In practice this
drops testing of the SSE2 version on machines with SSE4. If we still really
need to test or bench the compile-time best SIMD version of this method
against the runtime-detected best, we can include SkBlend_opts.h in the tests
or benches directly, similar to what we do for the trivial, brute-force, or best
non-SIMD versions.
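A rough sketch of the pattern, with everything in it hypothetical: the
namespace names, the sum() function, and the init() hook are invented for
illustration and are not the real SkOpts API. The point is only that
static functions have internal linkage, so the namespace can be named
after the instruction set and callers reach the code through a pointer.

```cpp
#include <cassert>

// Hypothetical namespaces named after the instruction set they target.
// Every function is static (internal linkage), so the names cannot clash
// across translation units and unused variants are not linked in.
namespace portable {
    static int sum(const int* v, int n) {
        assert(n >= 0);
        int total = 0;
        for (int i = 0; i < n; i++) { total += v[i]; }
        return total;
    }
}

namespace sse41 {
    // Stand-in body: a real variant would use SSE4.1 intrinsics here.
    static int sum(const int* v, int n) {
        assert(n >= 0);
        int total = 0;
        for (int i = n; i-- > 0;) { total += v[i]; }
        return total;
    }
}

// Hypothetical dispatch table: callers only ever go through this pointer.
namespace opts_sketch {
    int (*sum)(const int*, int) = portable::sum;

    // Swap in the best variant once CPU features are known at runtime.
    void init(bool cpu_has_sse41) {
        if (cpu_has_sse41) { sum = sse41::sum; }
    }
}
```

Tests and benches written against opts_sketch::sum exercise whichever
variant init() selected, which is the trade-off described above.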
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145833002
SkMatrix::scale and ::rotate take a point around which to scale or rotate.
Canvas lacks these helpers, so the code to rotate a canvas around a
point has been duplicated many times. Factor all of these
implementations into SkCanvas::rotate.
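The duplicated pattern is the usual translate, rotate, translate back
sequence. Below is a self-contained sketch of it using a tiny affine
matrix type; Affine, rotate_about, and kPi are hypothetical names for the
example and do not reflect SkCanvas or SkMatrix internals.

```cpp
#include <cassert>
#include <cmath>

static const double kPi = 3.14159265358979323846;

// Minimal 2x3 affine matrix, a hypothetical stand-in for a canvas CTM:
//   x' = a*x + c*y + tx,   y' = b*x + d*y + ty
struct Affine {
    double a = 1, b = 0, c = 0, d = 1, tx = 0, ty = 0;

    static Affine Translate(double dx, double dy) {
        Affine m; m.tx = dx; m.ty = dy; return m;
    }
    static Affine Rotate(double degrees) {
        double r = degrees * kPi / 180.0;
        Affine m;
        m.a = std::cos(r); m.c = -std::sin(r);
        m.b = std::sin(r); m.d =  std::cos(r);
        return m;
    }
    // this = this * o, so o is applied to points first.
    void preConcat(const Affine& o) {
        Affine r;
        r.a  = a * o.a  + c * o.b;
        r.b  = b * o.a  + d * o.b;
        r.c  = a * o.c  + c * o.d;
        r.d  = b * o.c  + d * o.d;
        r.tx = a * o.tx + c * o.ty + tx;
        r.ty = b * o.tx + d * o.ty + ty;
        *this = r;
    }
    void map(double x, double y, double* ox, double* oy) const {
        *ox = a * x + c * y + tx;
        *oy = b * x + d * y + ty;
    }
};

// Rotate about (px, py): move the pivot to the origin, rotate, move back.
void rotate_about(Affine* m, double degrees, double px, double py) {
    assert(std::isfinite(degrees));
    m->preConcat(Affine::Translate(px, py));
    m->preConcat(Affine::Rotate(degrees));
    m->preConcat(Affine::Translate(-px, -py));
}
```

The pivot maps to itself under the resulting transform, which is a quick
sanity check for any implementation of this helper.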
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2142033002
Review-Url: https://codereview.chromium.org/2142033002
Adds a module that performs instanced rendering and starts using it
for a select subset of draws on Mac GL platforms. The instance
processor can currently handle rects, ovals, round rects, and double
round rects. It can generalize shapes as round rects in order to
improve batching. The instance processor also employs new drawing
algorithms, irrespective of instanced rendering, that improve GPU-side
performance (e.g. sample mask, different triangle layouts, etc.).
This change only scratches the surface of instanced rendering. The
majority of draws still only have one instance. Future work may
include:
* Passing coord transforms through the texel buffer.
* Sending FP uniforms through instanced vertex attribs.
* Using instanced rendering for more draws (stencil writes,
drawAtlas, etc.).
* Adding more shapes to the instance processor’s repertoire.
* Batching draws that have mismatched scissors (analyzing draw
bounds, inserting clip planes, etc.).
* Bindless textures.
* Uber shaders.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2066993003
Committed: https://skia.googlesource.com/skia/+/42eafa4bc00354b132ad114d22ed6b95d8849891
Review-Url: https://codereview.chromium.org/2066993003
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
  QCMS Xform       0.495 ms
  Skia Old Xform   0.235 ms
  Skia NEW Xform   0.423 ms
  Vs Old Code      0.56x
  Vs QCMS          1.17x
So to summarize, we are now much slower than before,
but still a bit faster than QCMS. And now we are also
far more accurate than QCMS :).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2060823003
Because we recognize commonly used gamma tables and
parameters as 2.2f, about 98% of jpegs with color profiles
will pass through this xform (assuming the dst is also
2.2f). Sample size is 10,322 jpegs.
I won't go crazy with performance numbers because this is
a work in progress, particularly in terms of correctness.
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
  Decode Time + QCMS Xform       1.28 ms
  QCMS Xform Only                0.495 ms
  Decode Time + Skia Opt Xform   1.01 ms
  Skia Opt Xform Only            0.235 ms
  Decode Time + Xform Speed-up   1.27x
  Xform Only Speed-up            2.11x
FWIW, Skia xform time before these optimizations was
41.1 ms. But we expected that code to be slow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2046013002
$ git grep -l '<windows.h>' include src
include/private/SkLeanWindows.h
$ git grep -l SkLeanWindows.h | grep '\.h$'
include/ports/SkTypeface_win.h
include/utils/win/SkHRESULT.h
include/utils/win/SkTScopedComPtr.h
include/views/SkEvent.h
src/core/SkMathPriv.h
src/ports/SkTypeface_win_dw.h
src/utils/SkThreadUtils_win.h
src/utils/win/SkWGL.h
The same goes for the `#include <intrin.h>` that was in SkMath.h.
The functions that needed it have moved to SkMathPriv.h.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2041943002
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_chromium_compile_dbg_ng,win_chromium_compile_rel_ng
Review-Url: https://codereview.chromium.org/2041943002