This is handy now, and becomes necessary with fancier backends:
- most code can't speak the type of AVX pipeline stages,
so indirection's definitely needed there;
- if the pipleine is entirely composed of stock stages,
these enum values become an abstract recipe that can be JITted.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2782
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: Iedd62e99ce39e94cf3e6ffc78c428f0ccc182342
Reviewed-on: https://skia-review.googlesource.com/2782
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX.
I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now.
That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative.
For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc
Reviewed-on: https://skia-review.googlesource.com/2752
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
In some build configurations (I think, GN, GCC 6, Debug) I get a warning that i is used unintialized. This likely has something to do with GCC correctly seeing that the SkTCast construction there is illegal aliasing, and perhaps thus "doesn't happen". Might be that if the SkTCast gets inlined, it decides its implementation is secretly kosher, and so Release builds don't see this. None of this happens with the GCCs we have on the bots... too old?
Instead use memcpy() here, which is well defined to do what we intended.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2758
Change-Id: Iaf5c75fbd852193b0b861bf5e71450502511d102
Reviewed-on: https://skia-review.googlesource.com/2758
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
We used to step at a 4-pixel stride as long as possible, then run up to 3 times, one pixel at a time. Now replace those 1-at-a-time runs with a single tail stamp if there are 1-3 remaining pixels.
This style is simply more efficient: e.g. we'll blend and lerp once for 3 pixels instead of 3 times. This should make short blits significantly more efficient. It's also more future-oriented... AVX+ on Intel and SVE on ARM support masked loads and stores, so we can do the entire tail in one direct step.
This also makes it possible to re-arrange the code a bit to encapsulate each stage better. I think generally this code reads more clearly than the old code, but YMMV. I've arranged things so you write one function, but it's compiled into two specializations, one for tail=0 (Body) and one for tail>0 (Tail). It's pretty tidy.
For now I've just burned a register to pass around tail. It's 2 bits now, maybe soon 3 with AVX, and capped at 4 for even the craziest new toys, so there are plenty of places we can pack it if we want to get clever.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2717
Change-Id: I45852a3e5d4c5b5e9315302c46601aee0d32265f
Reviewed-on: https://skia-review.googlesource.com/2717
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Today if you use the simple SK_RASTER_STAGE interface to build a pipeline, each stage you add calls into a next stage. The last stage you add calls into a special backstop stage JustReturn that, well, just returns, ending the pipeline.
This adds last(), which cuts that last stage off the pipeline. Instead, the stage you add using last() returns directly, ending the pipeline itself without jumping into JustReturn.
This reduces the overhead of using the pipelined version of SkRasterPipelineBench from ~25% to ~20% on my desktop.
Also, add docs.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2713
Change-Id: I11469378e2765c6e34db52eb3eef648d6612da3f
Reviewed-on: https://skia-review.googlesource.com/2713
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
It was not a fan of this (blatant) aliasing.
I suspect this best_non_simd_srcover_srgb_srgb() function has several
other aliasing issues that use undefined behavior, but this is all it's
complaining about for now.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2606
Change-Id: I25a8800e810bccf5068c8a10e9c8c8f565e57304
Reviewed-on: https://skia-review.googlesource.com/2606
Commit-Queue: Mike Klein <mtklein@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Reason for revert:
Hitting an assert
Original issue's description:
> Support Float32 output from SkColorSpaceXform
>
> * Adds Float32 support to SkColorSpaceXform
> * Changes API to allows clients to ask for F32, updates clients to
> new API
> * Adds Sk4f_load4 and Sk4f_store4 to SkNx
> * Make use of new xform in SkGr.cpp
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
> CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2347473007
* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2339233003
The SkFontData type is not exposed externally, so any method which uses
it can be updated to use smart pointers without affecting external
users. Updating this first will make updating the public API much
easier.
This also updates SkStreamAsset* SkStream::NewFromFile(const char*) to
std::unique_ptr<SkStreamAsset> SkStream::MakeFromFile(const char*). It
appears that no one outside Skia is currently using SkStream::NewfromFile
so this is a good time to update it as well.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339273002
Committed: https://skia.googlesource.com/skia/+/d8c2476a8b1e1e1a1771b17e8dd4db8645914f8c
Review-Url: https://codereview.chromium.org/2339273002
Reason for revert:
Killing Mac
Original issue's description:
> SkFontData to use smart pointers.
>
> The SkFontData type is not exposed externally, so any method which uses
> it can be updated to use smart pointers without affecting external
> users. Updating this first will make updating the public API much
> easier.
>
> This also updates SkStreamAsset* SkStream::NewFromFile(const char*) to
> std::unique_ptr<SkStreamAsset> SkStream::MakeFromFile(const char*). It
> appears that no one outside Skia is currently using SkStream::NewfromFile
> so this is a good time to update it as well.
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339273002
>
> Committed: https://skia.googlesource.com/skia/+/d8c2476a8b1e1e1a1771b17e8dd4db8645914f8cTBR=mtklein@chromium.org,halcanary@google.com,mtklein@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review-Url: https://codereview.chromium.org/2343933002
The SkFontData type is not exposed externally, so any method which uses
it can be updated to use smart pointers without affecting external
users. Updating this first will make updating the public API much
easier.
This also updates SkStreamAsset* SkStream::NewFromFile(const char*) to
std::unique_ptr<SkStreamAsset> SkStream::MakeFromFile(const char*). It
appears that no one outside Skia is currently using SkStream::NewfromFile
so this is a good time to update it as well.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339273002
Review-Url: https://codereview.chromium.org/2339273002
We used to build src and dst transfer fn tables
every time a new xform was created with linear
src and dst. Now we don't compute them because
we don't need them.
This will make SkColorSpaceXform a far better
option for any xforms with float or half-float
inputs or outputs, particularly on a small number
of pixels.
This CL also moves SkColorSpaceXform closer to
what I anticipate will be the eventual 'API design'.
I think apply() will want to take a SrcColorType enum
(not created yet because it's not necessary yet) and
a DstColorType enum (still using SkColorType because
there's not yet a reason not to).
Performance changes:
toSRGB 341us -> 366us
to2Dot2 404us -> 403us
toF16 318us -> 304us
There's no reason for toSRGB or to2Dot2 to change.
The refactor seems to have caused the compiler to
order the instructions a little differently...
This is something to come back to if we need to
squeeze more performance out of sRGB. For now,
let's not be held up by something we don't control.
F16 likely improves because we are no longer
(unnecessarily) building the linear tables.
Code size gets a little bigger. Measuring
SkColorSpaceXform size as a percentage of src/ size,
we go from 0.8% to 1.4%.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2335723002
Review-Url: https://codereview.chromium.org/2335723002
HWUI skips transparent rects when drawing.
When skia draws using bilerp, we will blend
transparent rects with neighboring rects and might
draw a bit of a smudge.
This CL adds the option to skip rects, allowing us
to have compatible behavior with the framework.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2305433002
Review-Url: https://codereview.chromium.org/2305433002
Verify the rules that we're converging on for surfaces:
- For 8888, we only support sRGB-like gamma, or no color space at all.
- For F16, we require a color space, with linear gamma.
- For all other formats, we do not support color spaces.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2270823002
Review-Url: https://codereview.chromium.org/2270823002
Motivation: gross code simplification, also no bitset lookups at draw time.
SkPDFFont owns its glyph useage bitset.
SkPDFSubstituteMap goes away.
SkPDFObject interface is simplified.
SkPDFDocument tracks font usage (as hash set), not glyph usage.
SkPDFFont gets a simpler constructor.
SkPDFFont has first and last glyph set in constructor, not adjusted later.
SkPDFFont implementations are simplified.
SkPDFGlyphSet is replaced with simple SkBitSet.
SkPDFFont sizes its SkBitSets based on glyph count.
SkPDFGlyphSetMap goes away.
SkBitSet is now non-copyable.
SkBitSet now how utility methods to match old SkPDFGlyphSet.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2253283004
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Release-GDI-Trybot,Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Debug-GDI-Trybot
Review-Url: https://codereview.chromium.org/2253283004
Useful when:
(1) Client does not realize src and dst match (calls color
xform anyway).
(2) Client wants half floats, src and dst have matching
gamuts
(3) Client wants premul (done correctly in linear space),
src and dst have matching gamuts.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2206403003
Review-Url: https://codereview.chromium.org/2206403003
Impl Overview
(1) Keep the device clip bounds up to date. This
requires minimal additional work in a few places
throughout canvas.
(2) Keep track of if the ctm isScaleTranslate. Yes,
there's a function that does this, but it's slow
to call.
(3) Perform the src->device transform in quick reject,
then check intersection/nan.
Other Notes:
(1) NaN and intersection checks are performed
simultaneously.
(2) We no longer quick reject infinity.
(3) Affine and perspective are both handled in the slow
case.
(4) SkRasterClip::isEmpty() is handled by the intersection
check.
Performance on Nexus 6P:
93.2ms -> 59.8ms
Overall Android Jank Tests Performance Impact:
Should gain us a ms or two on some tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2225393002
Committed: https://skia.googlesource.com/skia/+/d22a817ff57986407facd16af36320fc86ce02da
Review-Url: https://codereview.chromium.org/2225393002
Impl Overview
(1) Keep the device clip bounds up to date. This
requires minimal additional work in a few places
throughout canvas.
(2) Keep track of if the ctm isScaleTranslate. Yes,
there's a function that does this, but it's slow
to call.
(3) Perform the src->device transform in quick reject,
then check intersection/nan.
Other Notes:
(1) NaN and intersection checks are performed
simultaneously.
(2) We no longer quick reject infinity.
(3) Affine and perspective are both handled in the slow
case.
(4) SkRasterClip::isEmpty() is handled by the intersection
check.
Performance on Nexus 6P:
93.2ms -> 59.8ms
Overall Android Jank Tests Performance Impact:
Should gain us a ms or two on some tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2225393002
Review-Url: https://codereview.chromium.org/2225393002
The recording bench must record some source material into some sort of
display list, and fundamentally cannot separate the timing of the two.
This CL makes it so the source material and display list are of the same type.
So instead of previous:
--nolite: SkRecord-based picture -> SkRecord-based picture
--lite: SkRecord-based picture -> threadsafe SkLiteDL
Now this times
--nolite: SkRecord-based picture -> SkRecord-based picture
--lite: SkLiteDL -> threadsafe SkLiteDL
This makes it easier to profile SkLiteDL and explore both recording and playback overhead hot spots.
The threadsafety is incidental for the source (and doesn't affect playback speed),
but I think it's handy to keep around on the destination to make a more fair comparison.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2230323002
Review-Url: https://codereview.chromium.org/2230323002
- This code is entirely private and is not being used by anything.
- In a future CL we will write a class that uses CurveMeasure to compute dash points. In order to determine whether CurveMeasure or PathMeasure should be faster, we need the dash info (the sum of the on/off intervals and how many there are)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2187083002
Review-Url: https://codereview.chromium.org/2187083002
About 9x faster than Murmur3 for long inputs.
Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot
Review-Url: https://codereview.chromium.org/2208903002
SkLiteRecorder, a new SkCanvas, fills out SkLiteDL, a new SkDrawable.
This SkDrawable is a display list similar to SkRecord and SkBigPicture / SkRecordedDrawable, but with a few new design points inspired by Android and slimming paint:
1) SkLiteDL is structured as one big contiguous array rather than the two layer structure of SkRecord. This trades away flexibility and large-op-count performance for better data locality for small to medium size pictures.
2) We keep a global freelist of SkLiteDLs, both reusing the SkLiteDL struct itself and its contiguous byte array. This keeps the expected number of mallocs per display list allocation <1 (really, ~0) for cyclical use cases.
These two together mean recording is faster. Measuring against the code we use at head, SkLiteRecorder trends about ~3x faster across various size pictures, matching speed at 0 draws and beating the special-case 1-draw pictures we have today. (I.e. we won't need those special case implementations anymore, because they're slower than this new generic code.) This new strategy records 10 drawRects() in about the same time the old strategy took for 2.
This strategy stays the winner until at least 500 drawRect()s on my laptop, where I stopped checking.
A simpler alternative to freelisting is also possible (but not implemented here), where we allow the client to manually reset() an SkLiteDL for reuse when its refcnt is 1. That's essentially what we're doing with the freelist, except tracking what's available for reuse globally instead of making the client do it.
This code is not fully capable yet, but most of the key design points are there. The internal structure of SkLiteDL is the area I expect to be most volatile (anything involving Op), but its interface and the whole of SkLiteRecorder ought to be just about done.
You can run nanobench --match picture_overhead as a demo. Everything it exercises is fully fleshed out, so what it tests is an apples-to-apples comparison as far as recording costs go. I have not yet compared playback performance.
It should be simple to wrap this into an SkPicture subclass if we want.
I won't start proposing we replace anything old with anything new quite yet until I have more ducks in a row, but this does look pretty promising (similar to the SkRecord over old SkPicture change a couple years ago) and I'd like to land, experiment, iterate, especially with an eye toward Android.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2213333002
Review-Url: https://codereview.chromium.org/2213333002
With the move from SkData::NewXXX to SkData::MakeXXX most
SkAutoTUnref<SkData> were changed to sk_sp<SkData>. However,
there are still a few SkAutoTUnref<SkData> around, so clean
them up.
Review-Url: https://codereview.chromium.org/2212493002
Most visibly this adds a macro SK_RASTER_STAGE that cuts down on the boilerplate of defining a raster pipeline stage function.
Most interestingly, SK_RASTER_STAGE doesn't define a SkRasterPipeline::Fn, but rather a new type EasyFn. This function is always static and inlined, and the details of interacting with the SkRasterPipeline::Stage are taken care of for you: ctx is just passed as a void*, and st->next() is always called. All EasyFns have to do is take care of the meat of the work: update r,g,b, etc. and read and write from their context.
The really neat new feature here is that you can either add EasyFns to a pipeline with the new append() functions, _or_ call them directly yourself. This lets you use the same set of pieces to build either a pipelined version of the function or a custom, fused version. The bench shows this off.
On my desktop, the pipeline version of the bench takes about 25% more time to run than the fused one.
The old approach to creating stages still works fine. I haven't updated SkXfermode.cpp or SkArithmeticMode.cpp because they seemed just as clear using Fn directly as they would have using EasyFn.
If this looks okay to you I will rework the comments in SkRasterPipeline to explain SK_RASTER_STAGE and EasyFn a bit as I've done here in the CL description.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2195853002
Review-Url: https://codereview.chromium.org/2195853002
Motivation:
SkPDFStream and SkPDFSharedStream now work the same.
Also:
- move SkPDFStream into SkPDFTypes (it's a fundamental PDF type).
- minor refactor of SkPDFSharedStream
- SkPDFSharedStream takes unique_ptr to represent ownership
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2190883003
Review-Url: https://codereview.chromium.org/2190883003
AtomicTest was the only use of sk_atomic_add().
AtomicInc64 bench was the only use of sk_atomic_inc(int64_t*).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2183473005
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-TSAN-Trybot,Test-Ubuntu-GCC-Golo-GPU-GT610-x86_64-Release-TSAN-Trybot
Review-Url: https://codereview.chromium.org/2183473005
This trims the SkPM4fPriv methods down to just foolproof methods.
(Anything trying to build these itself is probably wrong.)
Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore,
at least not efficiently, so this refactor is somewhat more invasive
than you might think. Generally this means things using to_4f() are
also making a misstep... that's gone too.
It also does not make sense to try to play games with linear floats
with 255 bias any more. That hack can't work with real sRGB coding.
Rather than update them, I've removed a couple of L32 xfermode fast
paths. I'd even rather drop it entirely...
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2163683002
I basically just ran a big 5-deep for-loop over the five constants here.
This is the first set of coefficients I found that round trips all bytes.
I suspect there are many such sets.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2162063003
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us
Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
SkPDFUtils now has a special function (SkPDFUtils::AppendColorComponent)
just for writing out (color/255) as a decimal with three digits of
precision.
SkPDFUnion now has a type to represent a color component. It holds a
utint_8, but calls into AppendColorComponent to serialize.
Added a unit test that tests all possible input values.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2151863003
Review-Url: https://codereview.chromium.org/2151863003
I measured relative runtimes on my laptop:
pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x
I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.
The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
If we make sure all SkOpts functions are static, we can give the namespaces any
name we like. This lets us drop the sk_ prefix and give a real indication of
the default SIMD instruction set rather than just saying sk_default.
Both of these changes help debugger, profiler, and crash report readability.
Perhaps more importantly, keeping these functions static helps prevent
accidentally linking in unused versions of functions, as you see here with
sk_avx::srcover_srgb_srgb().
This requires we update SkBlend_opts tests and benches to call SkOpts functions
through SkOpts rather than declaring the methods externally. In practice this
drops testing of the SSE2 version on machines with SSE4. If we still really
need to test/bench the compile time best SIMD level version of this method
against the runtime detected best, we can include SkBlend_opts.h into the tests
or benches directly, similar to what we do for the trivial, brute-force, or best
non-SIMD versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145833002
SkMatrix::scale and ::rotate take a point around which to scale or rotate.
Canvas lacks these helpers, so the code to rotate a canvas around a
point has been duplicated many times. Factor all of these
implementations into SkCanvas::rotate.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2142033002
Review-Url: https://codereview.chromium.org/2142033002
Adds a module that performs instanced rendering and starts using it
for a select subset of draws on Mac GL platforms. The instance
processor can currently handle rects, ovals, round rects, and double
round rects. It can generalize shapes as round rects in order to
improve batching. The instance processor also employs new drawing
algorithms, irrespective of instanced rendering, that improve GPU-side
performance (e.g. sample mask, different triangle layouts, etc.).
This change only scratches the surface of instanced rendering. The
majority of draws still only have one instance. Future work may
include:
* Passing coord transforms through the texel buffer.
* Sending FP uniforms through instanced vertex attribs.
* Using instanced rendering for more draws (stencil writes,
drawAtlas, etc.).
* Adding more shapes to the instance processor’s repertoire.
* Batching draws that have mismatched scissors (analyzing draw
bounds, inserting clip planes, etc.).
* Bindless textures.
* Uber shaders.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2066993003
Committed: https://skia.googlesource.com/skia/+/42eafa4bc00354b132ad114d22ed6b95d8849891
Review-Url: https://codereview.chromium.org/2066993003
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
QCMS Xform 0.495 ms
Skia Old Xform 0.235 ms
Skia NEW Xform 0.423 ms
Vs Old Code 0.56x
Vs QCMS 1.17x
So to summarize, we are now much slower than before,
but still a bit faster than QCMS. And now we are also
far more accurate than QCMS :).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2060823003
Because we recognize commonly used gamma tables and
parameters as 2.2f, about 98% of jpegs with color profiles
will pass through this xform (assuming the dst is also
2.2f). Sample size is 10,322 jpegs.
I won't go crazy with performance numbers because this is
a work in progress, particularly in terms of correctness.
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
Decode Time + QCMS Xform 1.28 ms
QCMS Xform Only 0.495 ms
Decode Time + Skia Opt Xform 1.01 ms
Skia Opt Xform Only 0.235 ms
Decode Time + Xform Speed-up 1.27x
Xform Only Speed-up 2.11x
FWIW, Skia xform time before these optimizations was
41.1 ms. But we expected that code to be slow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2046013002
$ git grep -l '<windows.h>' include src
include/private/SkLeanWindows.h
$ git grep -l SkLeanWindows.h | grep '\.h$'
include/ports/SkTypeface_win.h
include/utils/win/SkHRESULT.h
include/utils/win/SkTScopedComPtr.h
include/views/SkEvent.h
src/core/SkMathPriv.h
src/ports/SkTypeface_win_dw.h
src/utils/SkThreadUtils_win.h
src/utils/win/SkWGL.h
The same for `#include <intrin.h>` that was found in SkMath.h.
Those functions that needed it are moved to SkMathPriv.h.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2041943002
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_chromium_compile_dbg_ng,win_chromium_compile_rel_ng
Review-Url: https://codereview.chromium.org/2041943002
When targetting iOS and using gyp to generate the build files, it is not
possible to select files to build depending on the architecture. Due to
that, the skia code was disabling all optimisation when SK_BUILD_FOR_IOS
was defined.
Since it is possible to select the correct optimised version when using
gn, this pessimisation is hurting the build. Introduce a new define to
disable the optimisation SK_BUILD_NO_OPTS. It will be used by Chromium
when building skia for iOS with gyp but not gn.
Define SK_BUILD_NO_OPTS along-side SK_BUILD_FOR_IOS for all files that
look like build configuration (Xcode projects, gyp configuration files,
public.bzl) in order to avoid introducing breakage on those builds.
BUG=607933
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2002423002
Review-Url: https://codereview.chromium.org/2002423002
- remove dead code
- rewrite float -> int converters
The strategy for the new converters is:
- convert input to double
- floor/ceil/round in double space
- pin that double to [SK_MinS32, SK_MaxS32]
- truncate that double to int32_t
This simpler strategy does not work:
- floor/ceil/round in float space
- pin that float to [SK_MinS32, SK_MaxS32]
- truncate that float to int32_t
SK_MinS32 and SK_MaxS32 are not representable as floats:
they round to the nearest float, ±2^31, which makes the
pin insufficient for floats near SK_MinS32 (-2^31+1) or
SK_MaxS32 (+2^31-1).
float only has 24 bits of precision, and we need 31.
double can represent all integers up to 50-something bits.
An alternative is to pin in float to ±2147483520, the last
exactly representable float before SK_MaxS32 (127 too small).
Our tests test that we round as floor(x+0.5), which can
return different numbers than round(x) for negative x.
So this CL explicitly uses floor(x+0.5).
I've updated the tests with ±inf and ±NaN, and tried to
make them a little clearer, especially using SK_MinS32
instead of -SK_MaxS32.
I have not timed anything here. I have never seen any of these
methods in a profile.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2012333003
Review-Url: https://codereview.chromium.org/2012333003
This reverts commit 554784cd85 and
1956b4ae1c
Reason for revert - ASAN failures, e.g. from https://uberchromegw.corp.google.com/i/client.skia/builders/Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug-MSAN/builds/2233/steps/perf_skia%20on%20Ubuntu/logs/stdio :
Uninitialized value was created by a heap allocation
0 0x7f69aa96f799 in operator new[](unsigned long) /b/work/skia/third_party/externals/llvm/out/../projects/compiler-rt/lib/msan/msan_new_delete.cc:37
1 0x7f69aaa315c1 in SkAutoTArray<unsigned int>::reset(int) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../include/private/../private/SkTemplates.h:137:22
2 0x7f69aaa34ee9 in LinearSrcOverBench<SrcOverVSkOptsSSE41>::LinearSrcOverBench(char const*) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:108:9
3 0x7f69aaa30cf2 in $_24::operator()(void*) const /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:167:1
4 0x7f69aaa30c87 in $_24::__invoke(void*) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:167:1
5 0x7f69aaa68856 in BenchmarkStream::rawNext() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:653:32
6 0x7f69aaa61467 in BenchmarkStream::next() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:642:25
7 0x7f69aaa5b703 in nanobench_main() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:1119:27
8 0x7f69aaa5e10d in main /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:1290:12
9 0x7f69a8c95ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
TBR=herb@google.com
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1969803002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/1969803002
Reason for revert:
for (int i = 0; i < loops; i++) {
54 canvas->drawPoints(SkCanvas::kLines_PointMode, PTS, fPts, paint); 68 canvas->drawLine(fStartPts[i].x(), fStartPts[i].y(), fEndPts[i].x(), fEndPts[i].y(), pai
nt);
This change means we index arbitrarily far into fStartPts (if loops gets big enough).
Original issue's description:
> Modify LineBench for drawing
>
> Currently we only have benchmark for lines that contains randomly generated
> points. This CL modifies the benchmark which adds cases of drawing straight
> lines. Also, this CL changes the call from drawPoints() to drawLines()
> which measures the performance of line drawing.
>
> BUG=skia:5243
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1936153002
>
> Committed: https://skia.googlesource.com/skia/+/6b27a5e7292d9a18e376f0c229ec62f7b801305aTBR=bsalomon@chromium.org,robertphillips@chromium.org,robertphillips@google.com,xidachen@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:5243
Review-Url: https://codereview.chromium.org/1952063005
Refactor GrGpuResource to contain two different pieces of state:
a) instance is budgeted or not budgeted
b) instance references wrapped backend objects or not
The "object lifecycle" was also attached to backend object
handles (ids), which made the code a bit unclear. Backend objects
would be associated with GrGpuResource::LifeCycle, even though
GrGpuResource::LifeCycle refers to the GpuResource, and individual
backend objects in one GpuResource might be governed with different
"lifecycle".
Mark the budgeted/not budgeted with SkBudgeted::kYes, SkBudgeted::kNo.
This was previously GrGpuResource::kCached_LifeCycle,
GrGpuResource::kUncached_LifeCycle.
Mark the "references wrapped object" with boolean. This was previously
GrGpuResource::kBorrowed_LifeCycle,
GrGpuResource::kAdopted_LifeCycle for GrGpuResource.
Associate the backend object ownership status with
GrBackendObjectOwnership for the backend object handles.
The resource type leaf constuctors, such has GrGLTexture or
GrGLTextureRenderTarget take "budgeted" parameter. This parameter
is passed to GrGpuResource::registerWithCache().
The resource type intermediary constructors, such as GrGLTexture
constructors for class GrGLTextureRenderTarget do not take "budgeted"
parameters, intermediary construtors do not call registerWithCache.
Removes the need for tagging GrGpuResource -derived subclass
constructors with "Derived" parameter.
Makes instances that wrap backend objects be registered with
a new function GrGpuResource::registerWithCacheWrapped().
Removes "budgeted" parameter from classes such as StencilAttahment, as
they are always cached and never wrap any external backend objects.
Removes the use of concept "external" from the member function names.
The API refers to the objects as "wrapped", so make all related
functions use the term consistently.
No change in functionality. Resources referencing wrapped objects are
always inserted to the cache with budget decision kNo.
BUG=594928
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1862043002
Review URL: https://codereview.chromium.org/1862043002
Motivation: This function is used throughout SkPDF.
Note that the compiler can usually inline the result of strlen() for literal strings.
Before:
out/Release/nanobench -m WStreamWriteText -q
Timer overhead: 24.2ns
! -> high variance, ? -> moderate variance
micros bench
6.10 WStreamWriteText nonrendering
After:
out/Release/nanobench -m WStreamWriteText -q
Timer overhead: 23.9ns
! -> high variance, ? -> moderate variance
micros bench
2.51 WStreamWriteText nonrendering
PDF runtime change: -0.8% ±0.04%.
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1844343004
Review URL: https://codereview.chromium.org/1844343004
Before this change, the PDFCanon held a map from BitmapKeys
to SkImages for de-duping bitmaps. Even if the PDFDocument
serialized images early, the Canon still held a ref to that
image inside the map. With this change, the Canon holds a
single map from BitmapKeys to PDFObjects. Now, Images are
only held by the PDFObject, which the document serializes
and drops early.
This change also:
- Moves SkBitmapKey into its own header (for possible
reuse); it now can operate with images as well as
bitmaps.
- Creates SkImageBitmap, which wraps a pointer to a bitmap
or an image and abstracts out some common tasks so that
drawBitmap and drawImage behave the same.
- Modifies SkPDFCreateBitmapObject to take and return a
sk_sp<T>, not a T*.
- Refactors SkPDFDevice::internalDrawImage to use bitmaps
or images (via a SkImageBitmap).
- Turns on pre-serialization of all images.
BUG=skia:5087
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1829693002
Review URL: https://codereview.chromium.org/1829693002