DrawContext's isGammaCorrect is now based solely on the presence of a color space.
Next change will remove the function and flag entirely, but I wanted to
land this separately. This alters a few GMs in srgb/f16 mode, generally
those that are creating off-screen surfaces in ways that were somewhat
lossy before. No unexplained changes.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2186633002
Review-Url: https://codereview.chromium.org/2186633002
Should feel very similar to Sk4h_store4:
NEON uses its native instruction, SSE unpacks manually.
Since we'll have our F16s in 4 Sk4h by the time we're done here,
this also extracts an Sk4h->Sk4f routine from the old uint64_t->Sk4f one.
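For the load direction, the work NEON does with its native deinterleaving load (presumably vld4) can be pictured with a scalar sketch. `load4_u16` is a hypothetical name, not the Sk4h API. It splits four interleaved RGBA pixels of 16-bit channels into one 4-lane array per channel, which is what SSE has to rebuild with unpack instructions.

```cpp
#include <cstdint>

// Hypothetical scalar model of a 4-way deinterleaving load: 16 interleaved
// uint16_t values (R,G,B,A, R,G,B,A, ...) become four 4-lane arrays,
// one per channel.
static void load4_u16(const uint16_t src[16],
                      uint16_t r[4], uint16_t g[4], uint16_t b[4], uint16_t a[4]) {
    for (int i = 0; i < 4; ++i) {
        r[i] = src[4 * i + 0];
        g[i] = src[4 * i + 1];
        b[i] = src[4 * i + 2];
        a[i] = src[4 * i + 3];
    }
}
```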
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2184753002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2184753002
This gives us a little more control over instruction order, allowing
us to pipeline the muls and get better performance. Technically,
clang should be able to do this for us anyway...
Performance on HP z620 (201295.jpg):
toSRGB: 371us -> 356us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2175413002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2175413002
* do a lot less floating-point math by converting to
an integer as early as possible [faster].
* round rather than truncate.
* use 8 significant digits rather than 9 when possible.
* remove trailing zeros in fractions.
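The rules above can be sketched roughly as follows. This is a hypothetical C++ sketch (`pdf_scalar` and its helper are made-up names, not SkPDFUtils' API). It scales the value to an integer first, rounds instead of truncating, tries 8 significant digits and falls back to 9 only when 8 would not parse back to the same float, and strips trailing fraction zeros. It assumes finite values with magnitude below 1e9, and it is not guaranteed to reproduce the exact output in the diffs below.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: formats |value| with |sigDigits| significant digits.
// The value is scaled to an integer early, so emitting digits is pure
// integer/string work. Assumes 0 < |value| < 1e9.
static std::string scalar_with_sig_digits(float value, int sigDigits) {
    double v = std::fabs(static_cast<double>(value));
    int intDigits = static_cast<int>(std::floor(std::log10(v))) + 1;  // <= 0 when v < 1
    int decimals = sigDigits - intDigits;
    if (decimals < 0) decimals = 0;
    // Round (rather than truncate) to the nearest integer.
    uint64_t n = static_cast<uint64_t>(std::llround(v * std::pow(10.0, decimals)));
    std::string s = std::to_string(n);
    size_t d = static_cast<size_t>(decimals);
    if (s.size() <= d) {
        s = std::string(d + 1 - s.size(), '0') + s;  // e.g. "5" -> "005" when d == 2
    }
    if (d > 0) {
        s.insert(s.size() - d, ".");
        while (s.back() == '0') s.pop_back();  // drop trailing zeros in the fraction
        if (s.back() == '.') s.pop_back();     // "3." -> "3"
    }
    if (s.size() > 1 && s[0] == '0' && s[1] == '.') s.erase(0, 1);  // "0.5" -> ".5"
    return value < 0 ? "-" + s : s;
}

// Hypothetical entry point: prefer 8 significant digits when they round-trip.
std::string pdf_scalar(float value) {
    assert(std::isfinite(value) && std::fabs(value) < 1e9f);
    if (value == 0.0f) return "0";
    std::string s = scalar_with_sig_digits(value, 8);
    if (std::strtof(s.c_str(), nullptr) == value) return s;
    return scalar_with_sig_digits(value, 9);  // 9 significant digits suffice for any float
}
```

For example, 1000.00006103515625f needs the 9-digit fallback, because "1000.0001" parses to a neighboring float.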
before:
0.12 ! PDFScalar nonrendering
after:
0.07 ! PDFScalar nonrendering
Accuracy guaranteed by existing unit test.
Example diffs:
-/Shading <</Function <</C0 [.321568638 .333333343 .321568638]
+/Shading <</Function <</C0 [.32156864 .33333334 .32156864]
-/C1 [.258823543 .270588248 .258823543]
+/C1 [.25882354 .27058825 .25882354]
-1 0 0 -1 20 120.394500 Tm
+1 0 0 -1 20 120.394501 Tm
-1 0 0 -1 20 184.789001 Tm
+1 0 0 -1 20 184.789 Tm
-291.503997 0 l
+291.504 0 l
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2146103004
Review-Url: https://codereview.chromium.org/2146103004
This clamps to [0,1] premul just before every store to memory.
By making the clamp a stage itself, this design makes it easy to move the clamp
around, to replace it with a debug-only assert-we're-clamped stage for certain
formats, clamp in more places, programmatically not clamp, etc. etc.
Before this change, clamping was a little haphazard: store_srgb clamped
R, G and B to [0,1], but not A, and didn't clamp the colors to A. 565
didn't clamp at all.
6 GMs draw subtly differently in sRGB, I think because we've started clamping
colors to alpha to enforce premultiplication better. No changes for 565.
My hope is that now no other stage need ever concern itself with clamping.
So we don't double-clamp, I've added a _noclamp version of sk_linear_to_srgb()
that simply asserts a clamp isn't necessary. This happens to expose the Sk4f
_needs_trunc version that might be useful for power users (*cough* Matt *cough*).
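Conceptually, the clamp stage does the following per pixel. This is a minimal sketch with a hypothetical float `RGBA` struct, not SkRasterPipeline's actual stage signature. Alpha goes to [0,1] first, then each color channel goes to [0,a], which is the step the old store_srgb skipped.

```cpp
#include <algorithm>

// Hypothetical pixel type: premultiplied float channels.
struct RGBA { float r, g, b, a; };

// Clamp alpha to [0,1], then clamp each color channel to [0,a] so the
// result is a valid premultiplied color.
static RGBA clamp_01_premul(RGBA c) {
    c.a = std::min(std::max(c.a, 0.0f), 1.0f);
    c.r = std::min(std::max(c.r, 0.0f), c.a);
    c.g = std::min(std::max(c.g, 0.0f), c.a);
    c.b = std::min(std::max(c.b, 0.0f), c.a);
    return c;
}
```

Because this runs just before every store, later stages can assume their inputs are already in range.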
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2178793002
Review-Url: https://codereview.chromium.org/2178793002
This is going to be needed in many more places as I finish connecting the
dots. Even better - I'd like to switch to a world where SkColorSpace !=
nullptr is the only signal we use for gamma-correct rendering, so I can
eliminate SkSourceGammaTreatment and SkSurfaceProps::isGammaCorrect.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2180503002
Review-Url: https://codereview.chromium.org/2180503002
These asserts are firing from a recent change to our scissor code.
Since these asserts were added, the Vulkan spec has been updated to no
longer require that the scissor be inside the bounds of the image, just that
x + width does not overflow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2171283004
Review-Url: https://codereview.chromium.org/2171283004
This is an experiment / demo to have our 565 backend fold into
SkRasterPipelineBlitter as it grows more powerful. I plan to follow up with
the same for the other 8888 format.
Blur mask filters look significantly different (better) after this change.
We keep the full 13-14-13 bits of precision for mask blits, where the old code
used 11-11-10 bit intermediates.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2172343002
Review-Url: https://codereview.chromium.org/2172343002
This CL has several parts that are intertwined:
* move pin/wrap functionality into BilerpSampler.
* remove the nearest neighbor and bilerp tilers
* create a simplified general tiler
* remove the pipeline virtual calls bilerpEdge and bilerpSpan because everything works off sample points now.
* redo all the bilerp sampling to use the new local to methods to wrap/pin.
* introduce a new medium rate sample that handles spans with 1 < |dx| < 2.
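The pin/wrap split can be sketched like this. Assuming pixel centers at i + 0.5, a bilerp sample at x in [0, width) needs taps floor(x - 0.5) and that plus one. A tap therefore lands at most one texel outside the tile (at -1 or at width). Clamp then needs only a min/max, and repeat needs only to map -1 to width - 1 and width to 0. Mirror also reflects -1 to 0 and width to width - 1, which is why min/max covers it too. All names here are hypothetical, not BilerpSampler's.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helpers. Each fixes up one bilerp tap that is known to be at
// most one texel outside [0, width).
static int pin_clamp(int x, int width) {
    return std::min(std::max(x, 0), width - 1);
}
static int wrap_repeat(int x, int width) {
    return x < 0 ? width - 1 : (x >= width ? 0 : x);
}

// Computes the two horizontal taps and the weight of the right tap for a
// sample at x in [0, width), with pixel centers at i + 0.5.
static void bilerp_taps(float x, int width, int* x0, int* x1, float* t) {
    assert(x >= 0.0f && x < static_cast<float>(width));
    float centered = x - 0.5f;
    *x0 = static_cast<int>(std::floor(centered));  // in [-1, width - 1]
    *x1 = *x0 + 1;                                  // in [0, width]
    *t  = centered - static_cast<float>(*x0);
}
```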
This change improves the performance as displayed below:
Most of the top 25 desktop SKPs improve or stay the same. A few are worse, but close to the noise floor. In addition, this change makes the code about 3% smaller.
old time new time new/old
13274693 8414645 0.633886 top25desk_google_com_search_q_c.skp_1
4946466 3258018 0.658656 top25desk_wordpress.skp_1
6977187 5737584 0.822335 top25desk_youtube_com.skp_1
3770021 3296831 0.874486 top25desk_google_com__hl_en_q_b.skp_1
8890813 8600143 0.967307 top25desk_answers_yahoo_com.skp_1
3178974 3094300 0.973364 top25desk_facebook.skp_1
8871835 8711260 0.981901 top25desk_twitter.skp_1
838509 829290 0.989005 top25desk_blogger.skp_1
2821870 2801111 0.992644 top25desk_plus_google_com_11003.skp_1
511978 509530 0.995219 top25desk_techcrunch_com.skp_1
2408588 2397435 0.995369 top25desk_ebay_com.skp_1
4446919 4448004 1.00024 top25desk_espn.skp_1
2863241 2875696 1.00435 top25desk_google_com_calendar_.skp_1
7170086 7208447 1.00535 top25desk_booking_com.skp_1
7356109 7417776 1.00838 top25desk_pinterest.skp_1
5265591 5340392 1.01421 top25desk_weather_com.skp_1
5675244 5774144 1.01743 top25desk_sports_yahoo_com_.skp_1
1048531 1067663 1.01825 top25desk_games_yahoo_com.skp_1
2075501 2115131 1.01909 top25desk_amazon_com.skp_1
4262170 4370441 1.0254 top25desk_news_yahoo_com.skp_1
3789319 3897996 1.02868 top25desk_docs___1_open_documen.skp_1
919336 949979 1.03333 top25desk_wikipedia__1_tab_.skp_1
4274454 4489369 1.05028 top25desk_mail_google_com_mail_.skp_1
4149326 4376556 1.05476 top25desk_linkedin.skp_1
BUG=skia:5566
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134893002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/8602ede5fdfa721dcad4dcb11db028c1c24265f1
Review-Url: https://codereview.chromium.org/2134893002
Previously, SkClipStack would call "setEmpty" on itself when an
inverse-filled difference element made the stack empty. This was
a problem because setEmpty would forget the element had an inverse
fill, yet leave the op as "difference". This change modifies it to
manually update the clip bounds and set the gen-ID to kEmptyGenID,
rather than calling setEmpty.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2175493002
Review-Url: https://codereview.chromium.org/2175493002
We're using the linear procs for sRGB destinations
and the sRGB procs for linear destinations. Fix that.
C.f. State32::getLCDProc(), which does flags |= kDstIsSRGB_LCDFlag.
kDstIsSRGB_LCDFlag is (1<<2) == 4, so the sRGB procs must be 4-7, not 0-3.
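To see why the table order matters: if the proc table is indexed directly by the flag bits, setting the sRGB-destination bit (1 << 2 == 4) moves the lookup into the upper half of an eight-entry table. In this hypothetical sketch, only the sRGB bit's value comes from the log. The other two flag names and the table contents are placeholders, not State32's real ones.

```cpp
#include <cassert>

// Hypothetical flags; only the sRGB bit value (1 << 2) comes from the log.
enum LCDFlags : unsigned {
    kFlagA_LCDFlag     = 1 << 0,  // placeholder
    kFlagB_LCDFlag     = 1 << 1,  // placeholder
    kDstIsSRGB_LCDFlag = 1 << 2,
};

static const char* const kProcNames[8] = {
    "linear0", "linear1", "linear2", "linear3",  // indices 0-3: linear destinations
    "srgb0",   "srgb1",   "srgb2",   "srgb3",    // indices 4-7: sRGB destinations
};

static const char* lcd_proc_for(unsigned flags) {
    assert(flags < 8);
    return kProcNames[flags];
}
```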
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2177493002
Review-Url: https://codereview.chromium.org/2177493002
Functions like GrMakeInfoFromTexture encouraged incorrect code to be
written. Similarly, the ability to construct an info from any GrSurface
was never going to be correct. Luckily, the only client of that had all
of the correct parameters much higher on the stack (and dictated or
replaced most of the properties of the returned info anyway).
With this, I can finally remove the color space as an output of the
pixel config -> color type conversion, which was never going to be
correct.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2173513002
Review-Url: https://codereview.chromium.org/2173513002
Improves performance for the toSRGB and to2Dot2 xforms. It seems
better to save clamping until the end. That way we
don't stall the mul pipeline with a min/max.
toSRGB: 371us -> 346us
to2Dot2: 404us -> 387us
FWIW, it probably makes sense to clamp inside
sk_linear_to_srgb anyway. If not, we should potentially
provide two versions (one that clamps and one that
doesn't).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2173803002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2173803002
Reason for revert:
Crashing on Win with:
Caught exception 3221225477 EXCEPTION_ACCESS_VIOLATION, was running:
unit test GrShape
srgb gm shadertext2
srgb gm shallow_gradient_conical
srgb gm shallow_gradient_sweep
srgb gm shallow_gradient_linear_nodither
step returned non-zero exit code: -1073741819
https://status.skia.org/?commit_label=author&filter=search&search_value=Test-Win-MSVC-GCE-CPU-AVX2-x86-Release
Original issue's description:
> In the current code, tiling and bilerp sampling are strongly tied together. They can be separated by taking advantage of the observation that, when the bilerp stage translates a sample point into filter points, the filter points will be at most 0.5 outside the tile. This allows simplified repositioning for the various tiling modes: clamp and mirror use min and max, while repeat maps max -> 0 and 0 -> max. This lets bilerp simply handle the filter points that fall off the tile, so tiling and bilerp sampling can be totally separate.
>
> This CL has several parts that are intertwined:
> * move pin/wrap functionality into BilerpSampler.
> * remove the nearest neighbor and bilerp tilers
> * create a simplified general tiler
> * remove the pipeline virtual calls bilerpEdge and bilerpSpan because everything works off sample points now.
> * redo all the bilerp sampling to use the new local to methods to wrap/pin.
> * introduce a new medium rate sample that handles spans with 1 < |dx| < 2.
>
> This change improves the performance as displayed below:
> Most of the top 25 desktop SKPs improve or stay the same. A few are worse, but close to the noise floor. In addition, this change makes the code about 3% smaller.
>
> old time new time new/old
> 13274693 8414645 0.633886 top25desk_google_com_search_q_c.skp_1
> 4946466 3258018 0.658656 top25desk_wordpress.skp_1
> 6977187 5737584 0.822335 top25desk_youtube_com.skp_1
> 3770021 3296831 0.874486 top25desk_google_com__hl_en_q_b.skp_1
> 8890813 8600143 0.967307 top25desk_answers_yahoo_com.skp_1
> 3178974 3094300 0.973364 top25desk_facebook.skp_1
> 8871835 8711260 0.981901 top25desk_twitter.skp_1
> 838509 829290 0.989005 top25desk_blogger.skp_1
> 2821870 2801111 0.992644 top25desk_plus_google_com_11003.skp_1
> 511978 509530 0.995219 top25desk_techcrunch_com.skp_1
> 2408588 2397435 0.995369 top25desk_ebay_com.skp_1
> 4446919 4448004 1.00024 top25desk_espn.skp_1
> 2863241 2875696 1.00435 top25desk_google_com_calendar_.skp_1
> 7170086 7208447 1.00535 top25desk_booking_com.skp_1
> 7356109 7417776 1.00838 top25desk_pinterest.skp_1
> 5265591 5340392 1.01421 top25desk_weather_com.skp_1
> 5675244 5774144 1.01743 top25desk_sports_yahoo_com_.skp_1
> 1048531 1067663 1.01825 top25desk_games_yahoo_com.skp_1
> 2075501 2115131 1.01909 top25desk_amazon_com.skp_1
> 4262170 4370441 1.0254 top25desk_news_yahoo_com.skp_1
> 3789319 3897996 1.02868 top25desk_docs___1_open_documen.skp_1
> 919336 949979 1.03333 top25desk_wikipedia__1_tab_.skp_1
> 4274454 4489369 1.05028 top25desk_mail_google_com_mail_.skp_1
> 4149326 4376556 1.05476 top25desk_linkedin.skp_1
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134893002
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/8602ede5fdfa721dcad4dcb11db028c1c24265f1
TBR=mtklein@google.com,herb@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2174793002
This CL has several parts that are intertwined:
* move pin/wrap functionality into BilerpSampler.
* remove the nearest neighbor and bilerp tilers
* create a simplified general tiler
* remove the pipeline virtual calls bilerpEdge and bilerpSpan because everything works off sample points now.
* redo all the bilerp sampling to use the new local to methods to wrap/pin.
* introduce a new medium rate sample that handles spans with 1 < |dx| < 2.
This change improves the performance as displayed below:
Most of the top 25 desktop SKPs improve or stay the same. A few are worse, but close to the noise floor. In addition, this change makes the code about 3% smaller.
old time new time new/old
13274693 8414645 0.633886 top25desk_google_com_search_q_c.skp_1
4946466 3258018 0.658656 top25desk_wordpress.skp_1
6977187 5737584 0.822335 top25desk_youtube_com.skp_1
3770021 3296831 0.874486 top25desk_google_com__hl_en_q_b.skp_1
8890813 8600143 0.967307 top25desk_answers_yahoo_com.skp_1
3178974 3094300 0.973364 top25desk_facebook.skp_1
8871835 8711260 0.981901 top25desk_twitter.skp_1
838509 829290 0.989005 top25desk_blogger.skp_1
2821870 2801111 0.992644 top25desk_plus_google_com_11003.skp_1
511978 509530 0.995219 top25desk_techcrunch_com.skp_1
2408588 2397435 0.995369 top25desk_ebay_com.skp_1
4446919 4448004 1.00024 top25desk_espn.skp_1
2863241 2875696 1.00435 top25desk_google_com_calendar_.skp_1
7170086 7208447 1.00535 top25desk_booking_com.skp_1
7356109 7417776 1.00838 top25desk_pinterest.skp_1
5265591 5340392 1.01421 top25desk_weather_com.skp_1
5675244 5774144 1.01743 top25desk_sports_yahoo_com_.skp_1
1048531 1067663 1.01825 top25desk_games_yahoo_com.skp_1
2075501 2115131 1.01909 top25desk_amazon_com.skp_1
4262170 4370441 1.0254 top25desk_news_yahoo_com.skp_1
3789319 3897996 1.02868 top25desk_docs___1_open_documen.skp_1
919336 949979 1.03333 top25desk_wikipedia__1_tab_.skp_1
4274454 4489369 1.05028 top25desk_mail_google_com_mail_.skp_1
4149326 4376556 1.05476 top25desk_linkedin.skp_1
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2134893002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2134893002
If a line path is sufficiently long relative to the dash
interval, SkDashPathEffect::asPoints can produce
so many points that the count overflows an int,
or produce non-finite values, e.g. a path from (0,0) to (0,9e15)
with a dash interval of 1.
This fixes that by capping the number of points at a sane limit: in this
case 1 million, since that limit is also used in utils/SkDashPath.cpp and has
precedent.
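A minimal sketch of the capping logic, assuming a simple dash pattern whose period is the sum of its intervals. `dash_count` and `kMaxDashCount` are hypothetical names; only the one-million limit comes from this change. Dividing in double and checking the result before converting to int avoids both the non-finite case and the int overflow.

```cpp
#include <cassert>
#include <cmath>

// Limit taken from the log; the name is hypothetical.
static const double kMaxDashCount = 1000000;

// Returns how many dashes a line of |length| produces for a pattern with
// period |period|, or -1 if that count is non-finite or exceeds the cap
// (the caller treats -1 as "cannot produce points").
static int dash_count(double length, double period) {
    assert(period > 0);
    double count = std::ceil(length / period);
    if (!std::isfinite(count) || count > kMaxDashCount) {
        return -1;
    }
    return static_cast<int>(count);
}
```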
Downstream Firefox bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1287515
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2165013002
Review-Url: https://codereview.chromium.org/2165013002
Fix another fuzzer bug.
Some PathOps asserts only make sense if the incoming data is
well-behaved. Well-behaved tests set debugging state to
trigger these additional asserts.
Formalize this by creating macros similar to SkASSERT that
check to see if the assert should be skipped.
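One way such a macro could look, as a hypothetical sketch. `DebugState`, `fCheckWellBehaved` and `PATHOPS_ASSERT` are made-up names, not the macros this change adds. The predicate is evaluated only when the test has declared its input well-behaved, so fuzzer input skips these checks entirely.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical debugging state; well-behaved tests set the flag.
struct DebugState {
    bool fCheckWellBehaved = false;
};

// Checks |cond| only when |state| says the input is well-behaved.
// #cond turns the predicate into text for the failure message.
#define PATHOPS_ASSERT(state, cond)                                  \
    do {                                                             \
        if ((state).fCheckWellBehaved && !(cond)) {                  \
            std::fprintf(stderr, "%s:%d: assert(%s) failed\n",       \
                         __FILE__, __LINE__, #cond);                 \
            std::abort();                                            \
        }                                                            \
    } while (false)
```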
TBR=reed@google.com
BUG=629962
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2169863002
Review-Url: https://codereview.chromium.org/2169863002
This trims the SkPM4fPriv methods down to just foolproof methods.
(Anything trying to build these itself is probably wrong.)
Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore,
at least not efficiently, so this refactor is somewhat more invasive
than you might think. Generally this means things using to_4f() are
also making a misstep... that's gone too.
It also does not make sense to try to play games with linear floats
with 255 bias any more. That hack can't work with real sRGB coding.
Rather than update them, I've removed a couple of L32 xfermode fast
paths. I'd even rather drop it entirely...
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2163683002
On Nexus Player and occasionally Nexus 5x we get transparent boxes around
paths. This appears to be because the dFdy call is not as accurate as
dFdx, which is the opposite of Mali 400. As Mali 400 is not supported with
Vulkan, we can go back to using dFdx in this case.
BUG=skia:5523
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163213004
Review-Url: https://codereview.chromium.org/2163213004
I basically just ran a big 5-deep for-loop over the five constants here.
This is the first set of coefficients I found that round trips all bytes.
I suspect there are many such sets.
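The acceptance test inside such a search is simple: a candidate set of constants wins if every byte survives byte -> float -> byte. Here is a hypothetical checker. The function-pointer signature is an assumption, and the five constants themselves are not shown in the log. A 5-deep loop would build a conversion pair from each candidate tuple and call this.

```cpp
#include <cstdint>

// Hypothetical conversion pair under test.
using ToFloat = float (*)(uint8_t);
using ToByte  = uint8_t (*)(float);

// Returns true if every byte value survives byte -> float -> byte unchanged.
static bool round_trips_all_bytes(ToFloat to_float, ToByte to_byte) {
    for (int b = 0; b < 256; ++b) {
        if (to_byte(to_float(static_cast<uint8_t>(b))) != b) return false;
    }
    return true;
}

// Simple linear pair, used here only to exercise the checker.
static float unorm_to_float(uint8_t b) { return b * (1 / 255.0f); }
static uint8_t float_to_unorm(float f) { return static_cast<uint8_t>(f * 255.0f + 0.5f); }
```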
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2162063003
Make SkASSERTF output readable.
Ensure the assert predicate is stringified once.
Make the abort code consistent.
TBR=reed
This doesn't change any public API; most of this should be privatized.
Review-Url: https://codereview.chromium.org/2161103002
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us
Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
GrTextureAccess optionally includes an instance, computed from the src
and dst color spaces. In all common cases (no color space for either src
or dst, or same color space for both), no object is allocated.
This change is orthogonal to my attempts to get color space attached to
render targets - regardless of how we choose to do that, this will give
us the source color space at all points where we are connecting src to
dst.
There are many dangling injection points where I've been inserting
nullptr, but I have a record of all of them. Additionally, there are now
three places (the most common simple paths for bitmap/image rendering)
where things are plumbed enough that I expect to have access to the dst
color space (all marked with XFORMTODO).
In addition to getting the dst color space, I need to inject shader code
and uniform uploading for appendTextureLookup and friends.
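The allocation rule in the first paragraph boils down to a factory that returns null in the common cases. In this hypothetical sketch, `ColorSpace` and `ColorSpaceXform` are stand-ins, not Skia's classes, and color-space equality is reduced to comparing a single gamma value for illustration.

```cpp
#include <memory>

// Stand-in types for illustration only.
struct ColorSpace {
    float gamma;
    bool operator==(const ColorSpace& other) const { return gamma == other.gamma; }
};
struct ColorSpaceXform {
    float srcGamma, dstGamma;
};

// Returns nullptr (no allocation) when either side has no color space or
// both sides match; otherwise builds an xform object.
static std::unique_ptr<ColorSpaceXform> make_xform(const ColorSpace* src,
                                                   const ColorSpace* dst) {
    if (!src || !dst || *src == *dst) {
        return nullptr;
    }
    return std::unique_ptr<ColorSpaceXform>(new ColorSpaceXform{src->gamma, dst->gamma});
}
```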
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2154753003
Review-Url: https://codereview.chromium.org/2154753003
Most changes stem from working on the examples bracketed
by #if DEBUG_UNDER_DEVELOPMENT // tiger
These exposed many problems with coincident curves,
as well as errors throughout the code.
Fixing these errors also fixed a number of fuzzer-inspired
bug reports.
* Line/Curve Intersections
Check to see if the end of the line nearly intersects
the curve. This was a FIXME in the old code.
* Performance
Use a central chunk allocator.
Plumb the allocator into the global variable state
so that it can be shared. (Note that 'SkGlobalState'
is allocated on the stack and is visible to child
functions but not to other threads.)
* Refactor
Let SkOpAngle grow up from a structure to a class.
Let SkCoincidentSpans grow up from a structure to a class.
Rename enum Alias to AliasMatch.
* Coincidence Rewrite
Add more debugging to coincidence detection.
Parallel debugging routines have read-only logic to report
the current coincidence state so that steps through the
logic can expose whether things got better or worse.
More functions can error-out and cause the pathops
engine to non-destructively exit.
* Accuracy
Remove code that adjusted point locations. Instead,
offset the curve part so that sorted curves all use
the same origin.
Reduce the size (and influence) of magic numbers.
* Testing
The debug suite with verify and the full release suite
./out/Debug/pathops_unittest -v -V
./out/Release/pathops_unittest -v -V -x
expose one error. That error is captured as cubics_d3.
This error exists in the checked in code as well.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2128633003
Review-Url: https://codereview.chromium.org/2128633003
Reason for revert:
Causing roll to fail on telemetry_perf_unittests (benchmarks.system_health_smoke_test.SystemHealthBenchmarkSmokeTest.system_health.memory_desktop.load:search:taobao (and baidu)) and browser_tests (FindInPageControllerTest.FindInPageSpecialURLS).
This is due to triggering the assert in copyFTBitmap
SkASSERT(dstMask.fBounds.width() == static_cast<int>(srcFTBitmap.width));
when called from inside the block guarded by
if (bitmapTransform.isIdentity())
Original issue's description:
> Rotate bitmap strikes with FreeType.
>
> BUG=skia:3490
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2139703002
>
> Committed: https://skia.googlesource.com/skia/+/31e0c1379e6d0ce48196183e295b929af51fa74e
TBR=mtklein@google.com,reed@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:3490
Review-Url: https://codereview.chromium.org/2149253005
SkPDFUtils now has a special function (SkPDFUtils::AppendColorComponent)
just for writing out (color/255) as a decimal with three digits of
precision.
SkPDFUnion now has a type to represent a color component. It holds a
uint8_t, but calls into AppendColorComponent to serialize.
Added a unit test that tests all possible input values.
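An integer-only version of the conversion might look like this. `color_component_to_decimal` is a hypothetical name and signature, not SkPDFUtils'. Trimming trailing zeros is also an assumption on my part. The value is rounded half-up to the nearest thousandth without any floating-point math.

```cpp
#include <cstdint>
#include <string>

// Writes value/255 rounded to three decimal places, e.g. 128 -> ".502".
static std::string color_component_to_decimal(uint8_t value) {
    // Round-half-up of value * 1000 / 255 using integers only.
    int permil = (value * 1000 + 127) / 255;
    if (permil == 0) return "0";
    if (permil == 1000) return "1";
    char digits[5] = {'.',
                      static_cast<char>('0' + permil / 100),
                      static_cast<char>('0' + permil / 10 % 10),
                      static_cast<char>('0' + permil % 10),
                      '\0'};
    std::string s(digits);
    while (s.back() == '0') s.pop_back();  // ".200" -> ".2"
    return s;
}
```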
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2151863003
Review-Url: https://codereview.chromium.org/2151863003
I measured relative runtimes on my laptop:
pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x
I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.
The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.
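The winning SSE2 variant can be sketched with standard intrinsics. This assumes each 32-bit input already fits in [0, 65535]. `pack_int_to_uint16` is a hypothetical name, and a scalar loop stands in when SSE2 is unavailable. Shifting left and then arithmetic-shifting right by 16 sign-extends the low half of each lane. The signed saturation in `_mm_packs_epi32` therefore never kicks in, and the packed bits are exactly the low 16 bits. No constants or scratch registers are needed, unlike the SSSE3 shuffle-mask version. The SSE4.1 `_mm_packus_epi32` route stays out, as above.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Packs four 32-bit ints, each assumed to be in [0, 65535], into uint16_t.
static void pack_int_to_uint16(const int32_t in[4], uint16_t out[4]) {
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    v = _mm_srai_epi32(_mm_slli_epi32(v, 16), 16);  // sign-extend the low 16 bits
    __m128i packed = _mm_packs_epi32(v, v);         // no saturation after the shifts
    _mm_storel_epi64(reinterpret_cast<__m128i*>(out), packed);  // low 4 lanes
#else
    for (int i = 0; i < 4; ++i) out[i] = static_cast<uint16_t>(in[i]);
#endif
}
```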
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.
We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.
This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)
I've tested the full float range for FloatToHalf on my desktop and a 5x.
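For reference, the decode direction for finite halves fits in a few lines of portable scalar code. This sketches the IEEE 754 binary16 layout (1 sign, 5 exponent, 10 mantissa bits). It is not Skia's vectorized HalfToFloat, and the float -> half direction, with its rounding, is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decodes a finite IEEE 754 half (binary16) to float.
static float half_to_float_finite(uint16_t h) {
    uint32_t sign = h >> 15;
    int      exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    assert(exp != 0x1F && "inf/NaN are out of scope for the _finite variant");
    float magnitude = (exp == 0)
        ? std::ldexp(static_cast<float>(mant), -24)                // zero or denormal
        : std::ldexp(static_cast<float>(mant | 0x400), exp - 25);  // 1.mant * 2^(exp-15)
    return sign ? -magnitude : magnitude;
}
```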
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Review-Url: https://codereview.chromium.org/2145663003
Reason for revert:
Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast
Original issue's description:
> Expand _01 half<->float limitation to _finite. Simplify.
>
> It's become clear we need to sometimes deal with values <0 or >1.
> I'm not yet convinced we care about NaN or +-inf.
>
> We had some fairly clever tricks and optimizations here for NEON
> and SSE. I've thrown them out in favor of a single implementation.
> If we find the specializations mattered, we can certainly figure out
> how to extend them to this new range/domain.
>
> This happens to add a vectorized float -> half for ARMv7, which was
> missing from the _01 version. (The SSE strategy was not portable to
> platforms that flush denorm floats to zero.)
>
> I've tested the full float range for FloatToHalf on my desktop and a 5x.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
TBR=msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2151023003
It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.
We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.
This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)
I've tested the full float range for FloatToHalf on my desktop and a 5x.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145663003
This code appears to be accidentally using sk_calloc(), which returns NULL on failure,
instead of sk_calloc_throw(). We're using sk_malloc_throw() in the parallel
code path, so it really seems like we're not checking the result pointer.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2152753002
Review-Url: https://codereview.chromium.org/2152753002
I've also changed it so all attachment views (texture, color, and resolve) are created separately and not shared with each other. Sharing them added more complexity than it was probably saving in time.
A quick fix to make sure we don't reuse keys in resource tracking also
got merged into this change.
BUG=skia:5223
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2146103002
Review-Url: https://codereview.chromium.org/2146103002
These two new types are in support of Vulkan and the ability to send
separate texture and sampler uniforms to the shader. They don't really fit
well in the current system, since the current system ties together the idea
of intended use and how to emit shader code into the same GrSLType enum.
In Vulkan, I want the GrGLSLSampler object to be used as a Sampler2D, but
when appending its declaration it will emit a Texture2D and a sampler object.
Our query for GrSLTypeIsSamplerType refers more to the combination of texture
and sampler and not just the sampler part. The GrSLTypeIs2DTextureType query
is for a SamplerType that uses Texture2Ds. My new types don't really fit
into either of these categories, as they are just half of the whole.
In some refactoring down the road (possibly connected with SkSL), I suggest we
split apart the concept of how we intend to use a GrGLSLSampler (Sampler2D, SamplerBuffer,
etc.), from how we actually add it to the code (sampler, texture2D, sampler2D, etc.).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2143143002
Review-Url: https://codereview.chromium.org/2143143002
When we start a new MonotonePoly due to a handedness change, we don't need to
increase the vertex count, since that edge (and vertex) has already been
accounted for in the previous MonotonePoly.
This was not a correctness issue, but was causing us to allocate
extra vertices which would go unused.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2146063002
Review-Url: https://codereview.chromium.org/2146063002
On some platforms, a newly-created buffer was liable to be CPU backed.
This would break code that expected a VBO (aka instanced rendering).
This change adds an optional flag to GrResourceProvider that requires
a buffer to be created in GPU memory.
It also moves the CPU backing logic into Gr land in order to properly
cache real VBOs on platforms that prefer client-side buffers.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2143333002
Review-Url: https://codereview.chromium.org/2143333002
If we make sure all SkOpts functions are static, we can give the namespaces any
name we like. This lets us drop the sk_ prefix and give a real indication of
the default SIMD instruction set rather than just saying sk_default.
Both of these changes help debugger, profiler, and crash report readability.
Perhaps more importantly, keeping these functions static helps prevent
accidentally linking in unused versions of functions, as you see here with
sk_avx::srcover_srgb_srgb().
This requires we update SkBlend_opts tests and benches to call SkOpts functions
through SkOpts rather than declaring the methods externally. In practice this
drops testing of the SSE2 version on machines with SSE4. If we still really
need to test/bench the compile time best SIMD level version of this method
against the runtime detected best, we can include SkBlend_opts.h into the tests
or benches directly, similar to what we do for the trivial, brute-force, or best
non-SIMD versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145833002
Now that there may be multiple font managers in a process the typeface
ids must be unique across all typefaces, not just unique within a font
manager. If two typefaces have the same id there will be issues in the
glyph cache. All existing font managers were already doing this by
calling SkFontCache::NewFontID, so centralize this in SkTypeface.
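A single process-wide counter is enough to give every typeface a unique ID, regardless of which font manager created it. Here is a minimal sketch with `std::atomic`. `next_typeface_id` is a hypothetical name, and this is not SkTypeface's actual code.

```cpp
#include <atomic>
#include <cstdint>

// One counter for the whole process, shared by every font manager.
static uint32_t next_typeface_id() {
    static std::atomic<uint32_t> gNextID{1};  // 0 left unused (an assumption)
    // Relaxed ordering is enough: only uniqueness is required.
    return gNextID.fetch_add(1, std::memory_order_relaxed);
}
```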
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147733002
Review-Url: https://codereview.chromium.org/2147733002
SkMatrix::scale and ::rotate take a point around which to scale or rotate.
Canvas lacks these helpers, so the code to rotate a canvas around a
point has been duplicated many times. Factor all of these
implementations into SkCanvas::rotate.
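Rotating about a point (px, py) is translate(px, py), rotate, translate(-px, -py), the sequence the duplicated call sites spelled out by hand. Here is a standalone sketch of the point math. It is plain C++, not Skia's SkMatrix or SkCanvas, and `rotate_about` is a hypothetical name. With y pointing down, as in Skia's device space, a positive angle turns clockwise on screen.

```cpp
#include <cmath>

struct Point { double x, y; };

// Rotates |p| by |degrees| about the pivot (px, py): move the pivot to the
// origin, rotate, then move it back.
static Point rotate_about(Point p, double degrees, double px, double py) {
    const double kPi = 3.14159265358979323846;
    double rad = degrees * kPi / 180.0;
    double c = std::cos(rad), s = std::sin(rad);
    double dx = p.x - px, dy = p.y - py;
    return { px + dx * c - dy * s, py + dx * s + dy * c };
}
```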
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2142033002
Review-Url: https://codereview.chromium.org/2142033002