BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2820
Change-Id: Ifce7bce8b764d2dea02733d823396576a7da609f
Reviewed-on: https://skia-review.googlesource.com/2820
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Robert Phillips <robertphillips@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Mike Reed <reed@google.com>
Passing in a large buffer along with a source colour space that
used a CLUT would cause apply() to read freed heap memory, or
for smaller buffers read possibly re-used stack memory.
The code previously likely lucked out due to optimizations
removing most or all of the subsequent stack allocations.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2759
Change-Id: I39f357bce080c4d737a83dd019f0d1ccbc56f995
Reviewed-on: https://skia-review.googlesource.com/2759
Commit-Queue: Robert Aftias <raftias@google.com>
Reviewed-by: Matt Sarett <msarett@google.com>
This is handy now, and becomes necessary with fancier backends:
- most code can't speak the type of AVX pipeline stages,
so indirection's definitely needed there;
- if the pipeline is entirely composed of stock stages,
these enum values become an abstract recipe that can be JITted.
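A minimal sketch of the indirection described above, not the actual Skia code. The enum, the stage functions, and the lookup table are all hypothetical names chosen for illustration; the point is only that callers name stages by enum value and a single lookup decides which code runs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stock stages; these names are illustrative, not Skia's.
enum class StockStage { kClamp01, kHalve, kOneMinus };

using StageFn = float (*)(float);

static float clamp01(float v)   { return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v); }
static float halve(float v)     { return v * 0.5f; }
static float one_minus(float v) { return 1.0f - v; }

// The only place that maps an enum value to real code. A fancier backend
// could return differently compiled functions here, or skip the lookup
// and JIT the whole recipe instead.
static StageFn lookup(StockStage st) {
    switch (st) {
        case StockStage::kClamp01:  return clamp01;
        case StockStage::kHalve:    return halve;
        case StockStage::kOneMinus: return one_minus;
    }
    assert(false && "unknown stage");
    return nullptr;
}

// A pipeline is just a recipe: an ordered list of enum values.
static float run_recipe(const std::vector<StockStage>& recipe, float v) {
    for (StockStage st : recipe) {
        v = lookup(st)(v);
    }
    return v;
}
```

Callers here never mention a function type, only enum values, which is what lets the backend change underneath them.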
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2782
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: Iedd62e99ce39e94cf3e6ffc78c428f0ccc182342
Reviewed-on: https://skia-review.googlesource.com/2782
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reason for revert:
Broke bots
Original issue's description:
> Explicit control in tools of ANGLE frontend and backend
>
> Update the ANGLE test GL context, GrContextFactory, and config parsing to allow explicit control of ANGLE front/backend.
>
> This will allow us to explicitly test ES2 vs ES3 interfaces to ANGLE as well as D3D9, D3D11, and OpenGL backends.
>
> Also makes the angle api types valid in all builds (but will just fail when SK_ANGLE=1 or not on windows for the d3d backends).
>
> BUG=skia:5804
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2381033002
>
> Committed: https://skia.googlesource.com/skia/+/50094fb489543655df026be4e4f99e09e57a1f49
TBR=brianosman@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:5804
Review-Url: https://codereview.chromium.org/2384483003
Update the ANGLE test GL context, GrContextFactory, and config parsing to allow explicit control of ANGLE front/backend.
This will allow us to explicitly test ES2 vs ES3 interfaces to ANGLE as well as D3D9, D3D11, and OpenGL backends.
Also makes the angle api types valid in all builds (but will just fail when SK_ANGLE=1 or not on windows for the d3d backends).
BUG=skia:5804
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2381033002
NOTREECHECKS=true
NOTRY=true
NOPRESUBMIT=true
Review-Url: https://codereview.chromium.org/2381033002
This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX.
I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now.
That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative.
For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc
Reviewed-on: https://skia-review.googlesource.com/2752
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
It's more annoying than helpful to have GCC turn mul,add into fma.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2780
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Change-Id: I63f4615f73aed112f10f6cb516d899b820918298
Reviewed-on: https://skia-review.googlesource.com/2780
Commit-Queue: Cary Clark <caryclark@google.com>
Reviewed-by: Cary Clark <caryclark@google.com>
In some build configurations (I think, GN, GCC 6, Debug) I get a warning that i is used uninitialized. This likely has something to do with GCC correctly seeing that the SkTCast construction there is illegal aliasing, and perhaps thus "doesn't happen". Might be that if the SkTCast gets inlined, it decides its implementation is secretly kosher, and so Release builds don't see this. None of this happens with the GCCs we have on the bots... too old?
Instead use memcpy() here, which is well defined to do what we intended.
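For illustration, a minimal sketch of the memcpy() idiom, not the code touched by this change. The function name float_bits is hypothetical, and the sketch assumes float is a 32-bit type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bytes of a float as a 32-bit integer. Casting
// &f to uint32_t* and dereferencing would violate strict aliasing;
// copying the bytes is well defined, and compilers typically turn
// this into a single register move.
static uint32_t float_bits(float f) {
    static_assert(sizeof(uint32_t) == sizeof(float), "assumes 32-bit float");
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}
```

On an IEEE-754 platform, float_bits(1.0f) yields 0x3f800000.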
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2758
Change-Id: Iaf5c75fbd852193b0b861bf5e71450502511d102
Reviewed-on: https://skia-review.googlesource.com/2758
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
This adds proper target types, dependencies, and library handling. This
is enough to build and run dm on Linux and Mac.
Change-Id: I5220f67f7dd3dbada7ad03ef83fff8fd80158fad
Reviewed-on: https://skia-review.googlesource.com/2664
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
- Allow a second argument to limit the number of samples.
- If no benchmarks match, warn and exit instead of infinitely looping.
The default limit of 2147483647 10ms samples will run for 9 months, which I think is long enough to not need any special infinity logic.
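As a rough check of that estimate, assuming every sample takes exactly 10ms with no overhead in between (the function name is hypothetical):

```cpp
#include <cassert>

// Days needed to collect the default number of samples,
// assuming each sample takes exactly 10ms.
static double default_limit_days() {
    const double samples            = 2147483647.0;  // default sample limit
    const double seconds_per_sample = 0.010;         // 10ms, an assumption
    const double seconds_per_day    = 86400.0;
    return samples * seconds_per_sample / seconds_per_day;
}
```

This comes to roughly 248.5 days, so the default is effectively unbounded for any practical run.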
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2747
Change-Id: Id70cf77b624e19dc04e1d75a71385aee3c988a80
Reviewed-on: https://skia-review.googlesource.com/2747
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Added gradient shader factories that take SkColor4f + SkColorSpace.
Modified Descriptor to only store SkColor4f + SkColorSpace.
Existing factories make use of helper code to convert SkColor and
forward to the new factories.
Bumped SKP version to handle new gradient serialization format.
I was toying with using half-float when serializing SkColor4f:
despite my aggressive packing of flags, this format is significantly
bigger.
Also added GM to use 4f factories. This GM should (and does)
look identical to the existing gradients GM.
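The SkColor-to-SkColor4f conversion mentioned above can be sketched roughly as follows. This is not the helper from this change: the Color4f struct and to_color4f function are hypothetical stand-ins, and the sketch only assumes SkColor's layout of unpremultiplied 8-bit channels packed as ARGB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for SkColor4f: unpremultiplied float channels.
struct Color4f {
    float r, g, b, a;
};

// Unpack a 32-bit ARGB color (alpha in the top byte) into floats in [0, 1].
static Color4f to_color4f(uint32_t argb) {
    Color4f c;
    c.a = ((argb >> 24) & 0xFF) / 255.0f;
    c.r = ((argb >> 16) & 0xFF) / 255.0f;
    c.g = ((argb >>  8) & 0xFF) / 255.0f;
    c.b = ((argb >>  0) & 0xFF) / 255.0f;
    return c;
}
```

An existing factory taking an array of SkColor would run each entry through a conversion like this and forward the results, plus a color space, to the new 4f factory.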
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2370063002
Review-Url: https://codereview.chromium.org/2370063002
This is a less generally applicable trick than I had previously hoped. The need to thread through contexts into each stage really means you can only include one context-dependent stage in each fused batch.
We can still manually fuse these, of course, as you can see in SkRasterPipelineBench. It's just that we can't really write a generic compile-time template to do it except for context-free stages. And since we can't write a generic version, and I have only this one specific use case right now, I've kept it quite specific to that use case.
This does work pretty well for this use case, though. Here's the fused clamp-then-store-565:
+0x00 pushq %rbp
+0x01 movq %rsp, %rbp
+0x04 movq 8(%rdi), %rax
+0x08 xorps %xmm4, %xmm4
+0x0b maxps %xmm4, %xmm3
+0x0e maxps %xmm4, %xmm0
+0x11 maxps %xmm4, %xmm1
+0x14 maxps %xmm4, %xmm2
+0x17 minps 4262818(%rip), %xmm3
+0x1e minps %xmm3, %xmm0
+0x21 minps %xmm3, %xmm1
+0x24 minps %xmm3, %xmm2
+0x27 movaps 4965378(%rip), %xmm3
+0x2e mulps %xmm3, %xmm0
+0x31 cvtps2dq %xmm0, %xmm0
+0x35 pslld $11, %xmm0
+0x3a mulps 4965375(%rip), %xmm1
+0x41 cvtps2dq %xmm1, %xmm1
+0x45 pslld $5, %xmm1
+0x4a mulps %xmm3, %xmm2
+0x4d cvtps2dq %xmm2, %xmm2
+0x51 orpd %xmm0, %xmm2
+0x55 orpd %xmm1, %xmm2
+0x59 pshufb 4474510(%rip), %xmm2
+0x62 movq %xmm2, (%rax,%rsi,2)
+0x67 popq %rbp
+0x68 retq
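For readers who prefer C++ to assembly, here is a scalar, one-pixel sketch of roughly what the listing above does four pixels at a time. It is not the source that produced that listing. The function names are hypothetical, and the scale factors 31 and 63 are an assumption based on the 5-6-5 bit layout, since the listing loads its constants from memory.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

static float clamp_to(float v, float lo, float hi) {
    return std::fmin(std::fmax(v, lo), hi);
}

// Clamp a premultiplied color, then pack it as RGB 565.
static uint16_t clamp_then_store_565(float r, float g, float b, float a) {
    // maxps with zero, then minps: alpha is limited to 1, and
    // r, g, b are limited to alpha.
    a = clamp_to(a, 0.0f, 1.0f);
    r = clamp_to(r, 0.0f, a);
    g = clamp_to(g, 0.0f, a);
    b = clamp_to(b, 0.0f, a);

    // mulps + cvtps2dq: scale to the channel's bit width and round.
    const uint32_t R = static_cast<uint32_t>(std::lrint(r * 31.0f));
    const uint32_t G = static_cast<uint32_t>(std::lrint(g * 63.0f));
    const uint32_t B = static_cast<uint32_t>(std::lrint(b * 31.0f));

    // pslld $11, pslld $5, then or the channels together.
    return static_cast<uint16_t>((R << 11) | (G << 5) | B);
}
```

Opaque white packs to 0xFFFF, and an out-of-range red such as 2.0 is clamped before packing, giving 0xF800.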
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2745
Change-Id: Ia7d66aecc6cbff154158d2600d7874feed1a76f6
Reviewed-on: https://skia-review.googlesource.com/2745
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
We used to step at a 4-pixel stride as long as possible, then run up to 3 times, one pixel at a time. Now replace those 1-at-a-time runs with a single tail stamp if there are 1-3 remaining pixels.
This style is simply more efficient: e.g. we'll blend and lerp once for 3 pixels instead of 3 times. This should make short blits significantly more efficient. It's also more future-oriented... AVX+ on Intel and SVE on ARM support masked loads and stores, so we can do the entire tail in one direct step.
This also makes it possible to re-arrange the code a bit to encapsulate each stage better. I think generally this code reads more clearly than the old code, but YMMV. I've arranged things so you write one function, but it's compiled into two specializations, one for tail=0 (Body) and one for tail>0 (Tail). It's pretty tidy.
For now I've just burned a register to pass around tail. It's 2 bits now, maybe soon 3 with AVX, and capped at 4 for even the craziest new toys, so there are plenty of places we can pack it if we want to get clever.
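A minimal sketch of the "write one function, compile two specializations" arrangement, using plain scalar loops in place of SIMD. The names scale_stage and run_scale are hypothetical and none of this is the actual SkRasterPipeline code.

```cpp
#include <cassert>
#include <cstddef>

// One stage, written once. kIsTail=false is the Body specialization and
// always handles 4 pixels; kIsTail=true is the Tail specialization and
// handles `tail` (1-3) pixels in a single call.
template <bool kIsTail>
static void scale_stage(float* px, size_t tail, float k) {
    const size_t n = kIsTail ? tail : 4;
    for (size_t i = 0; i < n; i++) {
        px[i] *= k;
    }
}

static void run_scale(float* px, size_t count, float k) {
    size_t i = 0;
    // Step at a 4-pixel stride as long as possible.
    for (; i + 4 <= count; i += 4) {
        scale_stage<false>(px + i, 0, k);
    }
    // Then one tail stamp for the 1-3 remaining pixels, instead of
    // up to three 1-pixel runs.
    if (size_t tail = count - i) {
        assert(tail < 4);
        scale_stage<true>(px + i, tail, k);
    }
}
```

In the Body specialization the trip count is a compile-time constant, so the compiler is free to unroll or vectorize it without consulting tail at all.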
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2717
Change-Id: I45852a3e5d4c5b5e9315302c46601aee0d32265f
Reviewed-on: https://skia-review.googlesource.com/2717
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reason for revert:
Regression on displacement GM when sRGB support is lacking.
Original issue's description:
> Tag checkerboard bitmaps as sRGB
>
> Significantly reduces the diff between legacy and sRGB/F16 on about 25
> GMs. This is just the biggest piece of low-hanging fruit. Many GMs create
> N32 raster surfaces to procedurally generate source textures, and I'd like
> to fix all of them. It's much easier to reason about the GMs (is sRGB
> doing the right thing) when everything is tagged like this - the only
> expected differences are due to filtering and blending.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2368933003
>
> Committed: https://skia.googlesource.com/skia/+/fe843cea499ba163d53281425af210b1887d28e7
TBR=mtklein@google.com,reed@google.com,robertphillips@google.com
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=skia:
Review-Url: https://codereview.chromium.org/2375063002
Unlike nanobench, this tool has no purpose when built in Debug mode.
Just don't let it happen.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2718
Change-Id: Iaa7b8c44d46024485d4f5ce3d9c3e33d865b99d7
Reviewed-on: https://skia-review.googlesource.com/2718
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Gradients (and other shaders) are going to end up serializing this
particular color space very frequently, so we want a shorthand way of
writing it out. I think it's also helpful to have a clearer way of
creating it (vs. NewNamed(kSRGB_Named)->makeLinearGamma()).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2377763002
Review-Url: https://codereview.chromium.org/2377763002
Today if you use the simple SK_RASTER_STAGE interface to build a pipeline, each stage you add calls into a next stage. The last stage you add calls into a special backstop stage JustReturn that, well, just returns, ending the pipeline.
This adds last(), which cuts that last stage off the pipeline. Instead, the stage you add using last() returns directly, ending the pipeline itself without jumping into JustReturn.
This reduces the overhead of using the pipelined version of SkRasterPipelineBench from ~25% to ~20% on my desktop.
Also, add docs.
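To illustrate the difference, here is a toy model of the call chain, not the real SK_RASTER_STAGE machinery. The Stage struct, add_one, just_return, and the call counter are all hypothetical; only the shape of the chain follows the description above.

```cpp
#include <cassert>
#include <vector>

struct Stage;
using StageFn = void (*)(const Stage*, float*);
struct Stage { StageFn fn; };

static int gCalls = 0;  // counts stage invocations, for demonstration only

// Backstop stage: does nothing and ends the pipeline.
static void just_return(const Stage*, float*) { gCalls++; }

// A stage that adds one. When kLast is false it calls into the next
// stage; when kLast is true it returns directly, ending the pipeline.
template <bool kLast>
static void add_one(const Stage* st, float* v) {
    gCalls++;
    *v += 1.0f;
    if (!kLast) {
        st[1].fn(st + 1, v);
    }
}

static float run_pipeline(const std::vector<Stage>& stages, float v) {
    assert(!stages.empty());
    stages[0].fn(&stages[0], &v);
    return v;
}
```

A two-stage pipeline built the ordinary way is {add_one<false>, add_one<false>, just_return} and makes three calls; built with the final stage as a last-style stage it is {add_one<false>, add_one<true>} and makes two, with the same result.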
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2713
Change-Id: I11469378e2765c6e34db52eb3eef648d6612da3f
Reviewed-on: https://skia-review.googlesource.com/2713
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
I'm not seeing any problems with these locally. Perhaps the bots have something to say.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2709
Change-Id: I6f0c7045c8f270efcd71d837f22a40e9f9d3e9b7
Reviewed-on: https://skia-review.googlesource.com/2709
Commit-Queue: Ben Wagner <bungeman@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
I don't know _why_ Clang would like these .inc files to have a newline at the end of the file, but it seems a harmless way to silence the warning.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2711
Change-Id: I6c530ee5096c48c91ddf322aca916e70a0dd770b
Reviewed-on: https://skia-review.googlesource.com/2711
Reviewed-by: Ethan Nicholas <ethannicholas@google.com>