Should feel very similar to Sk4h_store4:
NEON uses its native instruction, SSE unpacks manually.
Since we'll have our F16s in 4 Sk4h by the time we're done here,
this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2184753002
This gives us a little more control over instruction order, allowing
us to pipeline the muls and get better performance. Technically,
clang should be able to do this for us anyway...
Performance on HP z620 (201295.jpg):
toSRGB: 371us -> 356us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2175413002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2175413002
* do a lot less floating-point math by converting to
an integer as early as possible [faster].
* round rather than truncate.
* use 8 significant digits rather than 9 when possible.
* remove trailing zeros in fractions.
before:
0.12 ! PDFScalar nonrendering
after:
0.07 ! PDFScalar nonrendering
Accuracy guaranteed by existing unit test.
Example diffs:
-/Shading <</Function <</C0 [.321568638 .333333343 .321568638]
+/Shading <</Function <</C0 [.32156864 .33333334 .32156864]
-/C1 [.258823543 .270588248 .258823543]
+/C1 [.25882354 .27058825 .25882354]
-1 0 0 -1 20 120.394500 Tm
+1 0 0 -1 20 120.394501 Tm
-1 0 0 -1 20 184.789001 Tm
+1 0 0 -1 20 184.789 Tm
-291.503997 0 l
+291.504 0 l
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2146103004
Review-Url: https://codereview.chromium.org/2146103004
This clamps to [0,1] premul just before every store to memory.
By making the clamp a stage itself, this design makes it easy to move the clamp
around, to replace it with a debug-only assert-we're-clamped stage for certain
formats, clamp in more places, programatically not clamp, etc. etc.
Before this change, clamping was a little haphazard: store_srgb clamped
R, G and B to [0,1], but not A, and didn't clamp the colors to A. 565
didn't clamp at all.
6 GMs draw subtly differently in sRGB, I think because we've started clamping
colors to alpha to enforce premultiplication better. No changes for 565.
My hope is that now no other stage need ever concern itself with clamping.
So we don't double-clamp, I've added a _noclamp version of sk_linear_to_srgb()
that simply asserts a clamp isn't necessary. This happens to expose the Sk4f
_needs_trunc version that might be useful for power users (*cough* Matt *cough*).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2178793002
Review-Url: https://codereview.chromium.org/2178793002
This is going to be needed in many more places as I finish connecting the
dots. Even better - I'd like to switch to a world where SkColorSpace !=
nullptr is the only signal we use for gamma-correct rendering, so I can
eliminate SkSourceGammaTreatment and SkSurfaceProps::isGammaCorrect.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2180503002
Review-Url: https://codereview.chromium.org/2180503002
These are asserts are firing from a recent change to our scissor code.
Since these asserts were added, the Vulkan spec has been updated to no
longer require the scissor is insides the bounds of the image, just that
x + width does not overflow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2171283004
Review-Url: https://codereview.chromium.org/2171283004
This is an experiment / demo to have our 565 backend fold into
SkRasterPipelineBlitter as it grows more powerful. I plan to follow up with
the same for the other 8888 format.
Blur mask filters look significantly different (better) after this change.
We keep the full 13-14-13 bits of precision for mask blits, where the old code
uses 11-11-10 bit intermediates.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2172343002
Review-Url: https://codereview.chromium.org/2172343002
This CL has several parts that are intertwined:
* move pin/wrap functionality into BilerpSampler.
* remove the nearest neighbor and bilerp tilers
* create a simplified general tiler
* remove the pipeline virtual calls bilerpEdge and bilerpSpan because everything works of sample points now.
* redo all the bilerp sampling to use the new local to methods to wrap/pin.
* introduce a new medium rate sample that handles spans with 1 < |dx| < 2.
This change improves the performance as displayed below:
Most of top 25 desktop improves or are the same. A few are worse, but close to the noise floor. In addition, this change has about 3% smaller code.
old time new time new/old
13274693 8414645 0.633886 top25desk_google_com_search_q_c.skp_1
4946466 3258018 0.658656 top25desk_wordpress.skp_1
6977187 5737584 0.822335 top25desk_youtube_com.skp_1
3770021 3296831 0.874486 top25desk_google_com__hl_en_q_b.skp_1
8890813 8600143 0.967307 top25desk_answers_yahoo_com.skp_1
3178974 3094300 0.973364 top25desk_facebook.skp_1
8871835 8711260 0.981901 top25desk_twitter.skp_1
838509 829290 0.989005 top25desk_blogger.skp_1
2821870 2801111 0.992644 top25desk_plus_google_com_11003.skp_1
511978 509530 0.995219 top25desk_techcrunch_com.skp_1
2408588 2397435 0.995369 top25desk_ebay_com.skp_1
4446919 4448004 1.00024 top25desk_espn.skp_1
2863241 2875696 1.00435 top25desk_google_com_calendar_.skp_1
7170086 7208447 1.00535 top25desk_booking_com.skp_1
7356109 7417776 1.00838 top25desk_pinterest.skp_1
5265591 5340392 1.01421 top25desk_weather_com.skp_1
5675244 5774144 1.01743 top25desk_sports_yahoo_com_.skp_1
1048531 1067663 1.01825 top25desk_games_yahoo_com.skp_1
2075501 2115131 1.01909 top25desk_amazon_com.skp_1
4262170 4370441 1.0254 top25desk_news_yahoo_com.skp_1
3789319 3897996 1.02868 top25desk_docs___1_open_documen.skp_1
919336 949979 1.03333 top25desk_wikipedia__1_tab_.skp_1
4274454 4489369 1.05028 top25desk_mail_google_com_mail_.skp_1
4149326 4376556 1.05476 top25desk_linkedin.skp_1
BUG=skia:5566
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134893002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/8602ede5fdfa721dcad4dcb11db028c1c24265f1
Review-Url: https://codereview.chromium.org/2134893002