We used to step at a 4-pixel stride as long as possible, then run up to 3 times, one pixel at a time. Now replace those 1-at-a-time runs with a single tail stamp if there are 1-3 remaining pixels.
This style is simply more efficient: e.g. we'll blend and lerp once for 3 pixels instead of 3 times. This should make short blits significantly more efficient. It's also more future-oriented... AVX+ on Intel and SVE on ARM support masked loads and stores, so we can do the entire tail in one direct step.
This also makes it possible to re-arrange the code a bit to encapsulate each stage better. I think generally this code reads more clearly than the old code, but YMMV. I've arranged things so you write one function, but it's compiled into two specializations, one for tail=0 (Body) and one for tail>0 (Tail). It's pretty tidy.
For now I've just burned a register to pass around tail. It's 2 bits now, maybe soon 3 with AVX, and capped at 4 for even the craziest new toys, so there are plenty of places we can pack it if we want to get clever.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2717
Change-Id: I45852a3e5d4c5b5e9315302c46601aee0d32265f
Reviewed-on: https://skia-review.googlesource.com/2717
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Today if you use the simple SK_RASTER_STAGE interface to build a pipeline, each stage you add calls into a next stage. The last stage you add calls into a special backstop stage JustReturn that, well, just returns, ending the pipeline.
This adds last(), which cuts that last stage off the pipeline. Instead, the stage you add using last() returns directly, ending the pipeline itself without jumping into JustReturn.
This reduces the overhead of using the pipelined version of SkRasterPipelineBench from ~25% to ~20% on my desktop.
Also, add docs.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2713
Change-Id: I11469378e2765c6e34db52eb3eef648d6612da3f
Reviewed-on: https://skia-review.googlesource.com/2713
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
It was not a fan of this (blatant) aliasing.
I suspect this best_non_simd_srcover_srgb_srgb() function has several
other aliasing issues that use undefined behavior, but this is all it's
complaining about for now.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2606
Change-Id: I25a8800e810bccf5068c8a10e9c8c8f565e57304
Reviewed-on: https://skia-review.googlesource.com/2606
Commit-Queue: Mike Klein <mtklein@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Reason for revert:
Hitting an assert
Original issue's description:
> Support Float32 output from SkColorSpaceXform
>
> * Adds Float32 support to SkColorSpaceXform
> * Changes API to allows clients to ask for F32, updates clients to
> new API
> * Adds Sk4f_load4 and Sk4f_store4 to SkNx
> * Make use of new xform in SkGr.cpp
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
> CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2347473007
* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2339233003
The SkFontData type is not exposed externally, so any method which uses
it can be updated to use smart pointers without affecting external
users. Updating this first will make updating the public API much
easier.
This also updates SkStreamAsset* SkStream::NewFromFile(const char*) to
std::unique_ptr<SkStreamAsset> SkStream::MakeFromFile(const char*). It
appears that no one outside Skia is currently using SkStream::NewfromFile
so this is a good time to update it as well.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339273002
Committed: https://skia.googlesource.com/skia/+/d8c2476a8b1e1e1a1771b17e8dd4db8645914f8c
Review-Url: https://codereview.chromium.org/2339273002
Reason for revert:
Killing Mac
Original issue's description:
> SkFontData to use smart pointers.
>
> The SkFontData type is not exposed externally, so any method which uses
> it can be updated to use smart pointers without affecting external
> users. Updating this first will make updating the public API much
> easier.
>
> This also updates SkStreamAsset* SkStream::NewFromFile(const char*) to
> std::unique_ptr<SkStreamAsset> SkStream::MakeFromFile(const char*). It
> appears that no one outside Skia is currently using SkStream::NewfromFile
> so this is a good time to update it as well.
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339273002
>
> Committed: https://skia.googlesource.com/skia/+/d8c2476a8b1e1e1a1771b17e8dd4db8645914f8cTBR=mtklein@chromium.org,halcanary@google.com,mtklein@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review-Url: https://codereview.chromium.org/2343933002
The SkFontData type is not exposed externally, so any method which uses
it can be updated to use smart pointers without affecting external
users. Updating this first will make updating the public API much
easier.
This also updates SkStreamAsset* SkStream::NewFromFile(const char*) to
std::unique_ptr<SkStreamAsset> SkStream::MakeFromFile(const char*). It
appears that no one outside Skia is currently using SkStream::NewfromFile
so this is a good time to update it as well.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339273002
Review-Url: https://codereview.chromium.org/2339273002
We used to build src and dst transfer fn tables
every time a new xform was created with linear
src and dst. Now we don't compute them because
we don't need them.
This will make SkColorSpaceXform a far better
option for any xforms with float or half-float
inputs or outputs, particularly on a small number
of pixels.
This CL also moves SkColorSpaceXform closer to
what I anticipate will be the eventual 'API design'.
I think apply() will want to take a SrcColorType enum
(not created yet because it's not necessary yet) and
a DstColorType enum (still using SkColorType because
there's not yet a reason not to).
Performance changes:
toSRGB 341us -> 366us
to2Dot2 404us -> 403us
toF16 318us -> 304us
There's no reason for toSRGB or to2Dot2 to change.
The refactor seems to have caused the compiler to
order the instructions a little differently...
This is something to come back to if we need to
squeeze more performance out of sRGB. For now,
let's not be held up by something we don't control.
F16 likely improves because we are no longer
(unnecessarily) building the linear tables.
Code size gets a little bigger. Measuring
SkColorSpaceXform size as a percentage of src/ size,
we go from 0.8% to 1.4%.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2335723002
Review-Url: https://codereview.chromium.org/2335723002
HWUI skips transparent rects when drawing.
When skia draws using bilerp, we will blend
transparent rects with neighboring rects and might
draw a bit of a smudge.
This CL adds the option to skip rects, allowing us
to have compatible behavior with the framework.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2305433002
Review-Url: https://codereview.chromium.org/2305433002
Verify the rules that we're converging on for surfaces:
- For 8888, we only support sRGB-like gamma, or no color space at all.
- For F16, we require a color space, with linear gamma.
- For all other formats, we do not support color spaces.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2270823002
Review-Url: https://codereview.chromium.org/2270823002
Motivation: gross code simplification, also no bitset lookups at draw time.
SkPDFFont owns its glyph useage bitset.
SkPDFSubstituteMap goes away.
SkPDFObject interface is simplified.
SkPDFDocument tracks font usage (as hash set), not glyph usage.
SkPDFFont gets a simpler constructor.
SkPDFFont has first and last glyph set in constructor, not adjusted later.
SkPDFFont implementations are simplified.
SkPDFGlyphSet is replaced with simple SkBitSet.
SkPDFFont sizes its SkBitSets based on glyph count.
SkPDFGlyphSetMap goes away.
SkBitSet is now non-copyable.
SkBitSet now how utility methods to match old SkPDFGlyphSet.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2253283004
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Release-GDI-Trybot,Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Debug-GDI-Trybot
Review-Url: https://codereview.chromium.org/2253283004
Useful when:
(1) Client does not realize src and dst match (calls color
xform anyway).
(2) Client wants half floats, src and dst have matching
gamuts
(3) Client wants premul (done correctly in linear space),
src and dst have matching gamuts.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2206403003
Review-Url: https://codereview.chromium.org/2206403003
Impl Overview
(1) Keep the device clip bounds up to date. This
requires minimal additional work in a few places
throughout canvas.
(2) Keep track of if the ctm isScaleTranslate. Yes,
there's a function that does this, but it's slow
to call.
(3) Perform the src->device transform in quick reject,
then check intersection/nan.
Other Notes:
(1) NaN and intersection checks are performed
simultaneously.
(2) We no longer quick reject infinity.
(3) Affine and perspective are both handled in the slow
case.
(4) SkRasterClip::isEmpty() is handled by the intersection
check.
Performance on Nexus 6P:
93.2ms -> 59.8ms
Overall Android Jank Tests Performance Impact:
Should gain us a ms or two on some tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2225393002
Committed: https://skia.googlesource.com/skia/+/d22a817ff57986407facd16af36320fc86ce02da
Review-Url: https://codereview.chromium.org/2225393002
Impl Overview
(1) Keep the device clip bounds up to date. This
requires minimal additional work in a few places
throughout canvas.
(2) Keep track of if the ctm isScaleTranslate. Yes,
there's a function that does this, but it's slow
to call.
(3) Perform the src->device transform in quick reject,
then check intersection/nan.
Other Notes:
(1) NaN and intersection checks are performed
simultaneously.
(2) We no longer quick reject infinity.
(3) Affine and perspective are both handled in the slow
case.
(4) SkRasterClip::isEmpty() is handled by the intersection
check.
Performance on Nexus 6P:
93.2ms -> 59.8ms
Overall Android Jank Tests Performance Impact:
Should gain us a ms or two on some tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2225393002
Review-Url: https://codereview.chromium.org/2225393002
The recording bench must record some source material into some sort of
display list, and fundamentally cannot separate the timing of the two.
This CL makes it so the source material and display list are of the same type.
So instead of previous:
--nolite: SkRecord-based picture -> SkRecord-based picture
--lite: SkRecord-based picture -> threadsafe SkLiteDL
Now this times
--nolite: SkRecord-based picture -> SkRecord-based picture
--lite: SkLiteDL -> threadsafe SkLiteDL
This makes it easier to profile SkLiteDL and explore both recording and playback overhead hot spots.
The threadsafety is incidental for the source (and doesn't affect playback speed),
but I think it's handy to keep around on the destination to make a more fair comparison.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2230323002
Review-Url: https://codereview.chromium.org/2230323002
- This code is entirely private and is not being used by anything.
- In a future CL we will write a class that uses CurveMeasure to compute dash points. In order to determine whether CurveMeasure or PathMeasure should be faster, we need the dash info (the sum of the on/off intervals and how many there are)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2187083002
Review-Url: https://codereview.chromium.org/2187083002
About 9x faster than Murmur3 for long inputs.
Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot
Review-Url: https://codereview.chromium.org/2208903002
SkLiteRecorder, a new SkCanvas, fills out SkLiteDL, a new SkDrawable.
This SkDrawable is a display list similar to SkRecord and SkBigPicture / SkRecordedDrawable, but with a few new design points inspired by Android and slimming paint:
1) SkLiteDL is structured as one big contiguous array rather than the two layer structure of SkRecord. This trades away flexibility and large-op-count performance for better data locality for small to medium size pictures.
2) We keep a global freelist of SkLiteDLs, both reusing the SkLiteDL struct itself and its contiguous byte array. This keeps the expected number of mallocs per display list allocation <1 (really, ~0) for cyclical use cases.
These two together mean recording is faster. Measuring against the code we use at head, SkLiteRecorder trends about ~3x faster across various size pictures, matching speed at 0 draws and beating the special-case 1-draw pictures we have today. (I.e. we won't need those special case implementations anymore, because they're slower than this new generic code.) This new strategy records 10 drawRects() in about the same time the old strategy took for 2.
This strategy stays the winner until at least 500 drawRect()s on my laptop, where I stopped checking.
A simpler alternative to freelisting is also possible (but not implemented here), where we allow the client to manually reset() an SkLiteDL for reuse when its refcnt is 1. That's essentially what we're doing with the freelist, except tracking what's available for reuse globally instead of making the client do it.
This code is not fully capable yet, but most of the key design points are there. The internal structure of SkLiteDL is the area I expect to be most volatile (anything involving Op), but its interface and the whole of SkLiteRecorder ought to be just about done.
You can run nanobench --match picture_overhead as a demo. Everything it exercises is fully fleshed out, so what it tests is an apples-to-apples comparison as far as recording costs go. I have not yet compared playback performance.
It should be simple to wrap this into an SkPicture subclass if we want.
I won't start proposing we replace anything old with anything new quite yet until I have more ducks in a row, but this does look pretty promising (similar to the SkRecord over old SkPicture change a couple years ago) and I'd like to land, experiment, iterate, especially with an eye toward Android.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2213333002
Review-Url: https://codereview.chromium.org/2213333002
With the move from SkData::NewXXX to SkData::MakeXXX most
SkAutoTUnref<SkData> were changed to sk_sp<SkData>. However,
there are still a few SkAutoTUnref<SkData> around, so clean
them up.
Review-Url: https://codereview.chromium.org/2212493002