Motivation: gross code simplification, also no bitset lookups at draw time.
SkPDFFont owns its glyph useage bitset.
SkPDFSubstituteMap goes away.
SkPDFObject interface is simplified.
SkPDFDocument tracks font usage (as hash set), not glyph usage.
SkPDFFont gets a simpler constructor.
SkPDFFont has first and last glyph set in constructor, not adjusted later.
SkPDFFont implementations are simplified.
SkPDFGlyphSet is replaced with simple SkBitSet.
SkPDFFont sizes its SkBitSets based on glyph count.
SkPDFGlyphSetMap goes away.
SkBitSet is now non-copyable.
SkBitSet now how utility methods to match old SkPDFGlyphSet.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2253283004
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Release-GDI-Trybot,Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Debug-GDI-Trybot
Review-Url: https://codereview.chromium.org/2253283004
Useful when:
(1) Client does not realize src and dst match (calls color
xform anyway).
(2) Client wants half floats, src and dst have matching
gamuts
(3) Client wants premul (done correctly in linear space),
src and dst have matching gamuts.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2206403003
Review-Url: https://codereview.chromium.org/2206403003
Impl Overview
(1) Keep the device clip bounds up to date. This
requires minimal additional work in a few places
throughout canvas.
(2) Keep track of if the ctm isScaleTranslate. Yes,
there's a function that does this, but it's slow
to call.
(3) Perform the src->device transform in quick reject,
then check intersection/nan.
Other Notes:
(1) NaN and intersection checks are performed
simultaneously.
(2) We no longer quick reject infinity.
(3) Affine and perspective are both handled in the slow
case.
(4) SkRasterClip::isEmpty() is handled by the intersection
check.
Performance on Nexus 6P:
93.2ms -> 59.8ms
Overall Android Jank Tests Performance Impact:
Should gain us a ms or two on some tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2225393002
Committed: https://skia.googlesource.com/skia/+/d22a817ff57986407facd16af36320fc86ce02da
Review-Url: https://codereview.chromium.org/2225393002
Impl Overview
(1) Keep the device clip bounds up to date. This
requires minimal additional work in a few places
throughout canvas.
(2) Keep track of if the ctm isScaleTranslate. Yes,
there's a function that does this, but it's slow
to call.
(3) Perform the src->device transform in quick reject,
then check intersection/nan.
Other Notes:
(1) NaN and intersection checks are performed
simultaneously.
(2) We no longer quick reject infinity.
(3) Affine and perspective are both handled in the slow
case.
(4) SkRasterClip::isEmpty() is handled by the intersection
check.
Performance on Nexus 6P:
93.2ms -> 59.8ms
Overall Android Jank Tests Performance Impact:
Should gain us a ms or two on some tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2225393002
Review-Url: https://codereview.chromium.org/2225393002
The recording bench must record some source material into some sort of
display list, and fundamentally cannot separate the timing of the two.
This CL makes it so the source material and display list are of the same type.
So instead of previous:
--nolite: SkRecord-based picture -> SkRecord-based picture
--lite: SkRecord-based picture -> threadsafe SkLiteDL
Now this times
--nolite: SkRecord-based picture -> SkRecord-based picture
--lite: SkLiteDL -> threadsafe SkLiteDL
This makes it easier to profile SkLiteDL and explore both recording and playback overhead hot spots.
The threadsafety is incidental for the source (and doesn't affect playback speed),
but I think it's handy to keep around on the destination to make a more fair comparison.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2230323002
Review-Url: https://codereview.chromium.org/2230323002
- This code is entirely private and is not being used by anything.
- In a future CL we will write a class that uses CurveMeasure to compute dash points. In order to determine whether CurveMeasure or PathMeasure should be faster, we need the dash info (the sum of the on/off intervals and how many there are)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2187083002
Review-Url: https://codereview.chromium.org/2187083002
About 9x faster than Murmur3 for long inputs.
Most of this is a mechanical change from SkChecksum::Murmur3(...) to SkOpts::hash(...).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2208903002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac-Clang-x86_64-Release-CMake-Trybot
Review-Url: https://codereview.chromium.org/2208903002
SkLiteRecorder, a new SkCanvas, fills out SkLiteDL, a new SkDrawable.
This SkDrawable is a display list similar to SkRecord and SkBigPicture / SkRecordedDrawable, but with a few new design points inspired by Android and slimming paint:
1) SkLiteDL is structured as one big contiguous array rather than the two layer structure of SkRecord. This trades away flexibility and large-op-count performance for better data locality for small to medium size pictures.
2) We keep a global freelist of SkLiteDLs, both reusing the SkLiteDL struct itself and its contiguous byte array. This keeps the expected number of mallocs per display list allocation <1 (really, ~0) for cyclical use cases.
These two together mean recording is faster. Measuring against the code we use at head, SkLiteRecorder trends about ~3x faster across various size pictures, matching speed at 0 draws and beating the special-case 1-draw pictures we have today. (I.e. we won't need those special case implementations anymore, because they're slower than this new generic code.) This new strategy records 10 drawRects() in about the same time the old strategy took for 2.
This strategy stays the winner until at least 500 drawRect()s on my laptop, where I stopped checking.
A simpler alternative to freelisting is also possible (but not implemented here), where we allow the client to manually reset() an SkLiteDL for reuse when its refcnt is 1. That's essentially what we're doing with the freelist, except tracking what's available for reuse globally instead of making the client do it.
This code is not fully capable yet, but most of the key design points are there. The internal structure of SkLiteDL is the area I expect to be most volatile (anything involving Op), but its interface and the whole of SkLiteRecorder ought to be just about done.
You can run nanobench --match picture_overhead as a demo. Everything it exercises is fully fleshed out, so what it tests is an apples-to-apples comparison as far as recording costs go. I have not yet compared playback performance.
It should be simple to wrap this into an SkPicture subclass if we want.
I won't start proposing we replace anything old with anything new quite yet until I have more ducks in a row, but this does look pretty promising (similar to the SkRecord over old SkPicture change a couple years ago) and I'd like to land, experiment, iterate, especially with an eye toward Android.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2213333002
Review-Url: https://codereview.chromium.org/2213333002
With the move from SkData::NewXXX to SkData::MakeXXX most
SkAutoTUnref<SkData> were changed to sk_sp<SkData>. However,
there are still a few SkAutoTUnref<SkData> around, so clean
them up.
Review-Url: https://codereview.chromium.org/2212493002
Most visibly this adds a macro SK_RASTER_STAGE that cuts down on the boilerplate of defining a raster pipeline stage function.
Most interestingly, SK_RASTER_STAGE doesn't define a SkRasterPipeline::Fn, but rather a new type EasyFn. This function is always static and inlined, and the details of interacting with the SkRasterPipeline::Stage are taken care of for you: ctx is just passed as a void*, and st->next() is always called. All EasyFns have to do is take care of the meat of the work: update r,g,b, etc. and read and write from their context.
The really neat new feature here is that you can either add EasyFns to a pipeline with the new append() functions, _or_ call them directly yourself. This lets you use the same set of pieces to build either a pipelined version of the function or a custom, fused version. The bench shows this off.
On my desktop, the pipeline version of the bench takes about 25% more time to run than the fused one.
The old approach to creating stages still works fine. I haven't updated SkXfermode.cpp or SkArithmeticMode.cpp because they seemed just as clear using Fn directly as they would have using EasyFn.
If this looks okay to you I will rework the comments in SkRasterPipeline to explain SK_RASTER_STAGE and EasyFn a bit as I've done here in the CL description.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2195853002
Review-Url: https://codereview.chromium.org/2195853002
Motivation:
SkPDFStream and SkPDFSharedStream now work the same.
Also:
- move SkPDFStream into SkPDFTypes (it's a fundamental PDF type).
- minor refactor of SkPDFSharedStream
- SkPDFSharedStream takes unique_ptr to represent ownership
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2190883003
Review-Url: https://codereview.chromium.org/2190883003
AtomicTest was the only use of sk_atomic_add().
AtomicInc64 bench was the only use of sk_atomic_inc(int64_t*).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2183473005
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-TSAN-Trybot,Test-Ubuntu-GCC-Golo-GPU-GT610-x86_64-Release-TSAN-Trybot
Review-Url: https://codereview.chromium.org/2183473005
This trims the SkPM4fPriv methods down to just foolproof methods.
(Anything trying to build these itself is probably wrong.)
Things like Sk4f srgb_to_linear(Sk4f) can't really exist anymore,
at least not efficiently, so this refactor is somewhat more invasive
than you might think. Generally this means things using to_4f() are
also making a misstep... that's gone too.
It also does not make sense to try to play games with linear floats
with 255 bias any more. That hack can't work with real sRGB coding.
Rather than update them, I've removed a couple of L32 xfermode fast
paths. I'd even rather drop it entirely...
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2163683002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2163683002
I basically just ran a big 5-deep for-loop over the five constants here.
This is the first set of coefficients I found that round trips all bytes.
I suspect there are many such sets.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2162063003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2162063003
This should give us a good baseline to explore using SkRasterPipeline.
A particular colorxform to half float drops from 425us to 282us on my desktop.
Color Xform to Half Float (HP z620)
Original 425us
Trans16 (not 32) 355us
Vector Trans16 378us
Trans16 + Keep Halfs in Vector 335us
Vector Trans16 + Keep Halfs in Vector 282us
Final 282us
Color Xform to Half Float (Nexus 5X)
Original 556us
Final 472us
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2159993003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2159993003
SkPDFUtils now has a special function (SkPDFUtils::AppendColorComponent)
just for writing out (color/255) as a decimal with three digits of
precision.
SkPDFUnion now has a type to represent a color component. It holds a
utint_8, but calls into AppendColorComponent to serialize.
Added a unit test that tests all possible input values.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2151863003
Review-Url: https://codereview.chromium.org/2151863003
I measured relative runtimes on my laptop:
pack_int_uint16_t_ss…
1036 …e41 1x …se3 1.01x …e2_b 3.01x …e2_a 3.02x
I've run into Clang problems with the actual _mm_packus_epi32 instruction, I think,
so I'm going to exercise a little cowardice and leave that option disabled for now.
The ssse3 version probably looks a little faster than it will be in practice.
We'll usually need to load its mask, which here is hoisted out of the bench loop.
The two sse2 variants are close enough in speed that I'm tie breaking them on other
concerns: the <<16, >>16 version doesn't need any scratch registers or to load any
constants, so it wins.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2150343002
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Review-Url: https://codereview.chromium.org/2150343002
If we make sure all SkOpts functions are static, we can give the namespaces any
name we like. This lets us drop the sk_ prefix and give a real indication of
the default SIMD instruction set rather than just saying sk_default.
Both of these changes help debugger, profiler, and crash report readability.
Perhaps more importantly, keeping these functions static helps prevent
accidentally linking in unused versions of functions, as you see here with
sk_avx::srcover_srgb_srgb().
This requires we update SkBlend_opts tests and benches to call SkOpts functions
through SkOpts rather than declaring the methods externally. In practice this
drops testing of the SSE2 version on machines with SSE4. If we still really
need to test/bench the compile time best SIMD level version of this method
against the runtime detected best, we can include SkBlend_opts.h into the tests
or benches directly, similar to what we do for the trivial, brute-force, or best
non-SIMD versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145833002
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2145833002
SkMatrix::scale and ::rotate take a point around which to scale or rotate.
Canvas lacks these helpers, so the code to rotate a canvas around a
point has been duplicated many times. Factor all of these
implementations into SkCanvas::rotate.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2142033002
Review-Url: https://codereview.chromium.org/2142033002
Adds a module that performs instanced rendering and starts using it
for a select subset of draws on Mac GL platforms. The instance
processor can currently handle rects, ovals, round rects, and double
round rects. It can generalize shapes as round rects in order to
improve batching. The instance processor also employs new drawing
algorithms, irrespective of instanced rendering, that improve GPU-side
performance (e.g. sample mask, different triangle layouts, etc.).
This change only scratches the surface of instanced rendering. The
majority of draws still only have one instance. Future work may
include:
* Passing coord transforms through the texel buffer.
* Sending FP uniforms through instanced vertex attribs.
* Using instanced rendering for more draws (stencil writes,
drawAtlas, etc.).
* Adding more shapes to the instance processor’s repertoire.
* Batching draws that have mismatched scissors (analyzing draw
bounds, inserting clip planes, etc.).
* Bindless textures.
* Uber shaders.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2066993003
Committed: https://skia.googlesource.com/skia/+/42eafa4bc00354b132ad114d22ed6b95d8849891
Review-Url: https://codereview.chromium.org/2066993003
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
QCMS Xform 0.495 ms
Skia Old Xform 0.235 ms
Skia NEW Xform 0.423 ms
Vs Old Code 0.56x
Vs QCMS 1.17x
So to summarize, we are now much slower than before,
but still a bit faster than QCMS. And now we are also
far more accurate than QCMS :).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2060823003
Because we recognize commonly used gamma tables and
parameters as 2.2f, about 98% of jpegs with color profiles
will pass through this xform (assuming the dst is also
2.2f). Sample size is 10,322 jpegs.
I won't go crazy with performance numbers because this is
a work in progress, particularly in terms of correctness.
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
Decode Time + QCMS Xform 1.28 ms
QCMS Xform Only 0.495 ms
Decode Time + Skia Opt Xform 1.01 ms
Skia Opt Xform Only 0.235 ms
Decode Time + Xform Speed-up 1.27x
Xform Only Speed-up 2.11x
FWIW, Skia xform time before these optimizations was
41.1 ms. But we expected that code to be slow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2046013002
$ git grep -l '<windows.h>' include src
include/private/SkLeanWindows.h
$ git grep -l SkLeanWindows.h | grep '\.h$'
include/ports/SkTypeface_win.h
include/utils/win/SkHRESULT.h
include/utils/win/SkTScopedComPtr.h
include/views/SkEvent.h
src/core/SkMathPriv.h
src/ports/SkTypeface_win_dw.h
src/utils/SkThreadUtils_win.h
src/utils/win/SkWGL.h
The same for `#include <intrin.h>` that was found in SkMath.h.
Those functions that needed it are moved to SkMathPriv.h.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2041943002
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_chromium_compile_dbg_ng,win_chromium_compile_rel_ng
Review-Url: https://codereview.chromium.org/2041943002
When targetting iOS and using gyp to generate the build files, it is not
possible to select files to build depending on the architecture. Due to
that, the skia code was disabling all optimisation when SK_BUILD_FOR_IOS
was defined.
Since it is possible to select the correct optimised version when using
gn, this pessimisation is hurting the build. Introduce a new define to
disable the optimisation SK_BUILD_NO_OPTS. It will be used by Chromium
when building skia for iOS with gyp but not gn.
Define SK_BUILD_NO_OPTS along-side SK_BUILD_FOR_IOS for all files that
look like build configuration (Xcode projects, gyp configuration files,
public.bzl) in order to avoid introducing breakage on those builds.
BUG=607933
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2002423002
Review-Url: https://codereview.chromium.org/2002423002
- remove dead code
- rewrite float -> int converters
The strategy for the new converters is:
- convert input to double
- floor/ceil/round in double space
- pin that double to [SK_MinS32, SK_MaxS32]
- truncate that double to int32_t
This simpler strategy does not work:
- floor/ceil/round in float space
- pin that float to [SK_MinS32, SK_MaxS32]
- truncate that float to int32_t
SK_MinS32 and SK_MaxS32 are not representable as floats:
they round to the nearest float, ±2^31, which makes the
pin insufficient for floats near SK_MinS32 (-2^31+1) or
SK_MaxS32 (+2^31-1).
float only has 24 bits of precision, and we need 31.
double can represent all integers up to 50-something bits.
An alternative is to pin in float to ±2147483520, the last
exactly representable float before SK_MaxS32 (127 too small).
Our tests test that we round as floor(x+0.5), which can
return different numbers than round(x) for negative x.
So this CL explicitly uses floor(x+0.5).
I've updated the tests with ±inf and ±NaN, and tried to
make them a little clearer, especially using SK_MinS32
instead of -SK_MaxS32.
I have not timed anything here. I have never seen any of these
methods in a profile.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2012333003
Review-Url: https://codereview.chromium.org/2012333003
This reverts commit 554784cd85 and
1956b4ae1c
Reason for revert - ASAN failures, e.g. from https://uberchromegw.corp.google.com/i/client.skia/builders/Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug-MSAN/builds/2233/steps/perf_skia%20on%20Ubuntu/logs/stdio :
Uninitialized value was created by a heap allocation
0 0x7f69aa96f799 in operator new[](unsigned long) /b/work/skia/third_party/externals/llvm/out/../projects/compiler-rt/lib/msan/msan_new_delete.cc:37
1 0x7f69aaa315c1 in SkAutoTArray<unsigned int>::reset(int) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../include/private/../private/SkTemplates.h:137:22
2 0x7f69aaa34ee9 in LinearSrcOverBench<SrcOverVSkOptsSSE41>::LinearSrcOverBench(char const*) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:108:9
3 0x7f69aaa30cf2 in $_24::operator()(void*) const /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:167:1
4 0x7f69aaa30c87 in $_24::__invoke(void*) /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/SkBlend_optsBench.cpp:167:1
5 0x7f69aaa68856 in BenchmarkStream::rawNext() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:653:32
6 0x7f69aaa61467 in BenchmarkStream::next() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:642:25
7 0x7f69aaa5b703 in nanobench_main() /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:1119:27
8 0x7f69aaa5e10d in main /b/work/skia/out/Build-Ubuntu-GCC-x86_64-Debug-MSAN/Debug/../../../bench/nanobench.cpp:1290:12
9 0x7f69a8c95ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
TBR=herb@google.com
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1969803002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/1969803002
Reason for revert:
for (int i = 0; i < loops; i++) {
54 canvas->drawPoints(SkCanvas::kLines_PointMode, PTS, fPts, paint); 68 canvas->drawLine(fStartPts[i].x(), fStartPts[i].y(), fEndPts[i].x(), fEndPts[i].y(), pai
nt);
This change means we index arbitrarily far into fStartPts (if loops gets big enough).
Original issue's description:
> Modify LineBench for drawing
>
> Currently we only have benchmark for lines that contains randomly generated
> points. This CL modifies the benchmark which adds cases of drawing straight
> lines. Also, this CL changes the call from drawPoints() to drawLines()
> which measures the performance of line drawing.
>
> BUG=skia:5243
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1936153002
>
> Committed: https://skia.googlesource.com/skia/+/6b27a5e7292d9a18e376f0c229ec62f7b801305aTBR=bsalomon@chromium.org,robertphillips@chromium.org,robertphillips@google.com,xidachen@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:5243
Review-Url: https://codereview.chromium.org/1952063005
Refactor GrGpuResource to contain two different pieces of state:
a) instance is budgeted or not budgeted
b) instance references wrapped backend objects or not
The "object lifecycle" was also attached to backend object
handles (ids), which made the code a bit unclear. Backend objects
would be associated with GrGpuResource::LifeCycle, even though
GrGpuResource::LifeCycle refers to the GpuResource, and individual
backend objects in one GpuResource might be governed with different
"lifecycle".
Mark the budgeted/not budgeted with SkBudgeted::kYes, SkBudgeted::kNo.
This was previously GrGpuResource::kCached_LifeCycle,
GrGpuResource::kUncached_LifeCycle.
Mark the "references wrapped object" with boolean. This was previously
GrGpuResource::kBorrowed_LifeCycle,
GrGpuResource::kAdopted_LifeCycle for GrGpuResource.
Associate the backend object ownership status with
GrBackendObjectOwnership for the backend object handles.
The resource type leaf constuctors, such has GrGLTexture or
GrGLTextureRenderTarget take "budgeted" parameter. This parameter
is passed to GrGpuResource::registerWithCache().
The resource type intermediary constructors, such as GrGLTexture
constructors for class GrGLTextureRenderTarget do not take "budgeted"
parameters, intermediary construtors do not call registerWithCache.
Removes the need for tagging GrGpuResource -derived subclass
constructors with "Derived" parameter.
Makes instances that wrap backend objects be registered with
a new function GrGpuResource::registerWithCacheWrapped().
Removes "budgeted" parameter from classes such as StencilAttahment, as
they are always cached and never wrap any external backend objects.
Removes the use of concept "external" from the member function names.
The API refers to the objects as "wrapped", so make all related
functions use the term consistently.
No change in functionality. Resources referencing wrapped objects are
always inserted to the cache with budget decision kNo.
BUG=594928
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1862043002
Review URL: https://codereview.chromium.org/1862043002
Motivation: This function is used throughout SkPDF.
Note that the compiler can usually inline the result of strlen() for literal strings.
Before:
out/Release/nanobench -m WStreamWriteText -q
Timer overhead: 24.2ns
! -> high variance, ? -> moderate variance
micros bench
6.10 WStreamWriteText nonrendering
After:
out/Release/nanobench -m WStreamWriteText -q
Timer overhead: 23.9ns
! -> high variance, ? -> moderate variance
micros bench
2.51 WStreamWriteText nonrendering
PDF runtime change: -0.8% ±0.04%.
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1844343004
Review URL: https://codereview.chromium.org/1844343004
Before this change, the PDFCanon held a map from BitmapKeys
to SkImages for de-duping bitmaps. Even if the PDFDocument
serialized images early, the Canon still held a ref to that
image inside the map. With this change, the Canon holds a
single map from BitmapKeys to PDFObjects. Now, Images are
only held by the PDFObject, which the document serializes
and drops early.
This change also:
- Moves SkBitmapKey into its own header (for possible
reuse); it now can operate with images as well as
bitmaps.
- Creates SkImageBitmap, which wraps a pointer to a bitmap
or an image and abstracts out some common tasks so that
drawBitmap and drawImage behave the same.
- Modifies SkPDFCreateBitmapObject to take and return a
sk_sp<T>, not a T*.
- Refactors SkPDFDevice::internalDrawImage to use bitmaps
or images (via a SkImageBitmap).
- Turns on pre-serialization of all images.
BUG=skia:5087
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1829693002
Review URL: https://codereview.chromium.org/1829693002
Call drop() after calling emitObject() on top-level objects. In Debug
mode, assert that each object is emited exactly once by asserting that
emitObject is never called after drop(). Same for addResources().
To make sure that top level objects don't get deleted prematurely,
SkPDFObjNumMap takes a reference.
Motivation: save RAM. Allow even earlier serialization with later
changes.
Also: Switch some SkTDArrays to SkTArrays.
BUG=skia:5087
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1790023003
Review URL: https://codereview.chromium.org/1790023003
The C++ standard library uses the name "release" for the operation we call "detach".
Rewriting each "detach(" to "release(" brings us a step closer to using standard library types directly (e.g. std::unique_ptr instead of SkAutoTDelete).
This was a fairly blind transformation. There may have been unintentional conversions in here, but it's probably for the best to have everything uniformly say "release".
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1809733002
Review URL: https://codereview.chromium.org/1809733002
Reason for revert:
Breaks Ubuntu and Mac CMAKE
Original issue's description:
> Make SkPicture/SkImageGenerator default to SkCodec
>
> Remove reference to SkImageDecoder from SkPicture. Make the default
> InstallPixelRefProc passed to CreateFromStream use
> SkImageGenerator::NewFromEncoded instead.
>
> Make SkImageGenerator::NewFromEncoded create an SkCodecImageGenerator.
> Remove the old version that used SkImageDecoder.
>
> Remove all versions of lazy_decode_bitmap/LazyDecodeBitmap. The default
> now behaves lazily.
>
> Update all clients to use the default.
>
> Move SkImageDecoderGenerator into KtxTest.cpp, and use it directly.
>
> BUG=skia:4691
> BUG=skia:4290
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1671193002
>
> Committed: https://skia.googlesource.com/skia/+/026388a01864c74208ad57d1ba4f711602d101c6TBR=msarett@google.com,reed@google.com,scroggo@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4691
Review URL: https://codereview.chromium.org/1685963004
Remove reference to SkImageDecoder from SkPicture. Make the default
InstallPixelRefProc passed to CreateFromStream use
SkImageGenerator::NewFromEncoded instead.
Make SkImageGenerator::NewFromEncoded create an SkCodecImageGenerator.
Remove the old version that used SkImageDecoder.
Remove all versions of lazy_decode_bitmap/LazyDecodeBitmap. The default
now behaves lazily.
Update all clients to use the default.
Move SkImageDecoderGenerator into KtxTest.cpp, and use it directly.
BUG=skia:4691
BUG=skia:4290
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1671193002
Review URL: https://codereview.chromium.org/1671193002
Reason for revert:
Speculative to fix windows bots
Original issue's description:
> Treat bad values passed to --images as a fatal error
>
> If an option is passed to --images that is either a non-existent path or
> a folder with no images matching the supported types, assume this is
> an error and exit, so they can supply a valid path instead.
>
> Share code between DM and nanobench in SkCommonFlags.
>
> nanobench now behaves more like DM - it will check a directory for
> images that match the supported extensions.
>
> Only consider image paths ending in RAW suffixes as images if
> SK_CODE_DECODES_RAW is defined. This prevents us from seeing failure
> to decode errors on platforms that cannot decode it.
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1611323004
>
> Committed: https://skia.googlesource.com/skia/+/7579786f3bd5a8fda84a1abc45b16213c3371f93TBR=mtklein@google.com,borenet@google.com,msarett@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
# Not skipping CQ checks because original CL landed more than 1 days ago.
Review URL: https://codereview.chromium.org/1653543002
If an option is passed to --images that is either a non-existent path or
a folder with no images matching the supported types, assume this is
an error and exit, so they can supply a valid path instead.
Share code between DM and nanobench in SkCommonFlags.
nanobench now behaves more like DM - it will check a directory for
images that match the supported extensions.
Only consider image paths ending in RAW suffixes as images if
SK_CODE_DECODES_RAW is defined. This prevents us from seeing failure
to decode errors on platforms that cannot decode it.
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1611323004
Review URL: https://codereview.chromium.org/1611323004
- Plant a flag to say "pretend all the inputs are RGBA".
This is how libpng thinks.
This is the opposite of what the implementation had been doing,
so I've rearranged everything to reflect the new orientation.
- Rewrite the names to be less mysterious looking. No more Xs.
- Make the src type uniformly const void*, to allow for 888 (RGB) srcs.
This should be performance and pixel neutral. (Please revert if it's not.)
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1626463002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1626463002
This should make skipping an image much cheaper.
Before:
$ time out/Release/nanobench --images resources/ --match sdkjlfasjlfds
4.65 real 4.41 user 0.19 sys
$ time out/Release/nanobench --images resources/ --match sdkjlfasjlfds
0.05 real 0.03 user 0.01 sys
The effect should be much more dramatic when there are more images to skip (e.g. on the bots).
This cuts about 6 minutes off the Debug CQ trybot.
BUG=skia:4768
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1577873002
Review URL: https://codereview.chromium.org/1577873002
Remove refcounting from SkGLContext.
SkGLContext is expected to behave like GrContextFactory would own
it, as implied by the GrContextFactory function.
If it is refcounted, this does not hold.
Also other use sites, such as in SkOSWindow_win (command buffer gl
object), confirm the behavior. The object is explicitly owned and
destroyed, not shared.
Also fixes potential crashes from using GL context of an abandoned
context.
Also fixes potential crashes in DM/nanobench, if the GrContext lives
longer than GLContext through internal refing of GrContext.
Moves the non-trivial implementations from GrContextFactory.h to
.cpp, just for consistency sake.
Changes pathops_unittest.gyp. The pathops_unittest uses
GrContextFactory, but did not link to its implementation. The reason
they worked was that the implementation used (constructors, destructors)
happened to be in the .h file.
This works towards being able to use command buffer and NVPR from
the SampleApp.
BUG=skia:2992
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1511773005
Committed: https://skia.googlesource.com/skia/+/830e012187f951d49d7e46e196ac8d1e653a25da
Review URL: https://codereview.chromium.org/1511773005
Reason for revert:
Broke tests on Android, iOS, Mac and Windows.
Original issue's description:
> Make SkGLContext lifetime more well-defined
>
> Remove refcounting from SkGLContext.
>
> SkGLContext is expected to behave like GrContextFactory would own
> it, as implied by the GrContextFactory function.
>
> If it is refcounted, this does not hold.
>
> Also other use sites, such as in SkOSWindow_win (command buffer gl
> object), confirm the behavior. The object is explicitly owned and
> destroyed, not shared.
>
> Also fixes potential crashes from using GL context of an abandoned
> context.
>
> Also fixes potential crashes in DM/nanobench, if the GrContext lives
> longer than GLContext through internal refing of GrContext.
>
> Moves the non-trivial implementations from GrContextFactory.h to
> .cpp, just for consistency sake.
>
> Changes pathops_unittest.gyp. The pathops_unittest uses
> GrContextFactory, but did not link to its implementation. The reason
> they worked was that the implementation used (constructors, destructors)
> happened to be in the .h file.
>
> This works towards being able to use command buffer and NVPR from
> the SampleApp.
>
> BUG=skia:2992
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1511773005
>
> Committed: https://skia.googlesource.com/skia/+/830e012187f951d49d7e46e196ac8d1e653a25daTBR=bsalomon@google.com,kkinnunen@nvidia.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:2992
Review URL: https://codereview.chromium.org/1555053003
Remove refcounting from SkGLContext.
SkGLContext is expected to behave like GrContextFactory would own
it, as implied by the GrContextFactory function.
If it is refcounted, this does not hold.
Also other use sites, such as in SkOSWindow_win (command buffer gl
object), confirm the behavior. The object is explicitly owned and
destroyed, not shared.
Also fixes potential crashes from using GL context of an abandoned
context.
Also fixes potential crashes in DM/nanobench, if the GrContext lives
longer than GLContext through internal refing of GrContext.
Moves the non-trivial implementations from GrContextFactory.h to
.cpp, just for consistency sake.
Changes pathops_unittest.gyp. The pathops_unittest uses
GrContextFactory, but did not link to its implementation. The reason
they worked was that the implementation used (constructors, destructors)
happened to be in the .h file.
This works towards being able to use command buffer and NVPR from
the SampleApp.
BUG=skia:2992
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1511773005
Review URL: https://codereview.chromium.org/1511773005
Add extended config specification form that can be used to run different
gpu backend with different APIs.
The configs can be specified with the form:
gpu(api=string,dit=bool,nvpr=bool,samples=int)
This replaces and removes the --gpuAPI flag.
All existing configs should still work.
Adds following documentation:
out/Debug/dm --help config
Flags:
--config: type: string default: 565 8888 gpu nonrendering
Options: 565 8888 debug gpu gpudebug gpudft gpunull msaa16 msaa4
nonrendering null nullgpu nvprmsaa16 nvprmsaa4 pdf pdf_poppler skp svg
xps or use extended form 'backend(option=value,...)'.
Extended form: 'backend(option=value,...)'
Possible backends and options:
gpu(api=string,dit=bool,nvpr=bool,samples=int) GPU backend
api type: string default: native.
Select graphics API to use with gpu backend.
Options:
native Use platform default OpenGL or OpenGL ES backend.
gl Use OpenGL.
gles Use OpenGL ES.
debug Use debug OpenGL.
null Use null OpenGL.
dit type: bool default: false.
Use device independent text.
nvpr type: bool default: false.
Use NV_path_rendering OpenGL and OpenGL ES extension.
samples type: int default: 0.
Use multisampling with N samples.
Predefined configs:
gpu = gpu()
msaa4 = gpu(samples=4)
msaa16 = gpu(samples=16)
nvprmsaa4 = gpu(nvpr=true,samples=4)
nvprmsaa16 = gpu(nvpr=true,samples=16)
gpudft = gpu(dit=true)
gpudebug = gpu(api=debug)
gpunull = gpu(api=null)
debug = gpu(api=debug)
nullgpu = gpu(api=null)
BUG=skia:2992
Committed: https://skia.googlesource.com/skia/+/e13ca329fca4c28cf4e078561f591ab27b743d23
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1490113005
Committed: https://skia.googlesource.com/skia/+/c8b4336444e7b90382e04e33665fb3b8490b825b
Committed: https://skia.googlesource.com/skia/+/9ebc3f0ee6db215dde461dc4777d85988cf272dd
Review URL: https://codereview.chromium.org/1490113005
Add extended config specification form that can be used to run different
gpu backend with different APIs.
The configs can be specified with the form:
gpu(api=string,dit=bool,nvpr=bool,samples=int)
This replaces and removes the --gpuAPI flag.
All existing configs should still work.
Adds following documentation:
out/Debug/dm --help config
Flags:
--config: type: string default: 565 8888 gpu nonrendering
Options: 565 8888 debug gpu gpudebug gpudft gpunull msaa16 msaa4
nonrendering null nullgpu nvprmsaa16 nvprmsaa4 pdf pdf_poppler skp svg
xps or use extended form 'backend(option=value,...)'.
Extended form: 'backend(option=value,...)'
Possible backends and options:
gpu(api=string,dit=bool,nvpr=bool,samples=int) GPU backend
api type: string default: native.
Select graphics API to use with gpu backend.
Options:
native Use platform default OpenGL or OpenGL ES backend.
gl Use OpenGL.
gles Use OpenGL ES.
debug Use debug OpenGL.
null Use null OpenGL.
dit type: bool default: false.
Use device independent text.
nvpr type: bool default: false.
Use NV_path_rendering OpenGL and OpenGL ES extension.
samples type: int default: 0.
Use multisampling with N samples.
Predefined configs:
gpu = gpu()
msaa4 = gpu(samples=4)
msaa16 = gpu(samples=16)
nvprmsaa4 = gpu(nvpr=true,samples=4)
nvprmsaa16 = gpu(nvpr=true,samples=16)
gpudft = gpu(dit=true)
gpudebug = gpu(api=debug)
gpunull = gpu(api=null)
debug = gpu(api=debug)
nullgpu = gpu(api=null)
BUG=skia:2992
Committed: https://skia.googlesource.com/skia/+/e13ca329fca4c28cf4e078561f591ab27b743d23
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1490113005
Committed: https://skia.googlesource.com/skia/+/c8b4336444e7b90382e04e33665fb3b8490b825b
Review URL: https://codereview.chromium.org/1490113005
Reason for revert:
This CL changed 1200 images on gold, when I wouldn't expect any diffs from the description.
Original issue's description:
> Add config options to run different GPU APIs to dm and nanobench
>
> Add extended config specification form that can be used to run different
> gpu backend with different APIs.
>
> The configs can be specified with the form:
> gpu(api=string,dit=bool,nvpr=bool,samples=int)
>
> This replaces and removes the --gpuAPI flag.
>
> All existing configs should still work.
>
> Adds following documentation:
>
> out/Debug/dm --help config
>
> Flags:
> --config: type: string default: 565 8888 gpu nonrendering
> Options: 565 8888 debug gpu gpudebug gpudft gpunull msaa16 msaa4
> nonrendering null nullgpu nvprmsaa16 nvprmsaa4 pdf pdf_poppler skp svg
> xps or use extended form 'backend(option=value,...)'.
>
> Extended form: 'backend(option=value,...)'
>
> Possible backends and options:
>
> gpu(api=string,dit=bool,nvpr=bool,samples=int) GPU backend
> api type: string default: native.
> Select graphics API to use with gpu backend.
> Options:
> native Use platform default OpenGL or OpenGL ES backend.
> gl Use OpenGL.
> gles Use OpenGL ES.
> debug Use debug OpenGL.
> null Use null OpenGL.
> dit type: bool default: false.
> Use device independent text.
> nvpr type: bool default: false.
> Use NV_path_rendering OpenGL and OpenGL ES extension.
> samples type: int default: 0.
> Use multisampling with N samples.
>
> Predefined configs:
>
> gpu = gpu()
> msaa4 = gpu(samples=4)
> msaa16 = gpu(samples=16)
> nvprmsaa4 = gpu(nvpr=true,samples=4)
> nvprmsaa16 = gpu(nvpr=true,samples=16)
> gpudft = gpu(dit=true)
> gpudebug = gpu(api=debug)
> gpunull = gpu(api=null)
> debug = gpu(api=debug)
> nullgpu = gpu(api=null)
>
> BUG=skia:2992
>
> Committed: https://skia.googlesource.com/skia/+/e13ca329fca4c28cf4e078561f591ab27b743d23
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1490113005
>
> Committed: https://skia.googlesource.com/skia/+/c8b4336444e7b90382e04e33665fb3b8490b825bTBR=mtklein@google.com,bsalomon@google.com,scroggo@google.com,kkinnunen@nvidia.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:2992
Review URL: https://codereview.chromium.org/1536963002
Add extended config specification form that can be used to run different
gpu backend with different APIs.
The configs can be specified with the form:
gpu(api=string,dit=bool,nvpr=bool,samples=int)
This replaces and removes the --gpuAPI flag.
All existing configs should still work.
Adds following documentation:
out/Debug/dm --help config
Flags:
--config: type: string default: 565 8888 gpu nonrendering
Options: 565 8888 debug gpu gpudebug gpudft gpunull msaa16 msaa4
nonrendering null nullgpu nvprmsaa16 nvprmsaa4 pdf pdf_poppler skp svg
xps or use extended form 'backend(option=value,...)'.
Extended form: 'backend(option=value,...)'
Possible backends and options:
gpu(api=string,dit=bool,nvpr=bool,samples=int) GPU backend
api type: string default: native.
Select graphics API to use with gpu backend.
Options:
native Use platform default OpenGL or OpenGL ES backend.
gl Use OpenGL.
gles Use OpenGL ES.
debug Use debug OpenGL.
null Use null OpenGL.
dit type: bool default: false.
Use device independent text.
nvpr type: bool default: false.
Use NV_path_rendering OpenGL and OpenGL ES extension.
samples type: int default: 0.
Use multisampling with N samples.
Predefined configs:
gpu = gpu()
msaa4 = gpu(samples=4)
msaa16 = gpu(samples=16)
nvprmsaa4 = gpu(nvpr=true,samples=4)
nvprmsaa16 = gpu(nvpr=true,samples=16)
gpudft = gpu(dit=true)
gpudebug = gpu(api=debug)
gpunull = gpu(api=null)
debug = gpu(api=debug)
nullgpu = gpu(api=null)
BUG=skia:2992
Committed: https://skia.googlesource.com/skia/+/e13ca329fca4c28cf4e078561f591ab27b743d23
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1490113005
Review URL: https://codereview.chromium.org/1490113005
These were originally used to compare to the old implementation of
subset decoding in SkImageDecoder. The old implementation has been
removed, and they do not provide useful information that is not
covered by the BitmapRegionDecoderBenches.
This will greatly speed up some of our infra bots, which spend a lot of
time decoding interlaced PNGs repeatedly (thanks to
SubsetTranslateBench).
BUG=skia:4715
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1531833002
Review URL: https://codereview.chromium.org/1531833002
Reason for revert:
speculative revert to see if it unblocks the DEPS roll
https://codereview.chromium.org/1529443002
Original issue's description:
> Add config options to run different GPU APIs to dm and nanobench
>
> Add extended config specification form that can be used to run different
> gpu backend with different APIs.
>
> The configs can be specified with the form:
> gpu(api=string,dit=bool,nvpr=bool,samples=int)
>
> This replaces and removes the --gpuAPI flag.
>
> All existing configs should still work.
>
> Adds following documentation:
>
> out/Debug/dm --help config
>
> Flags:
> --config: type: string default: 565 8888 gpu nonrendering
> Options: 565 8888 debug gpu gpudebug gpudft gpunull msaa16 msaa4
> nonrendering null nullgpu nvprmsaa16 nvprmsaa4 pdf pdf_poppler skp svg
> xps or use extended form 'backend(option=value,...)'.
>
> Extended form: 'backend(option=value,...)'
>
> Possible backends and options:
>
> gpu(api=string,dit=bool,nvpr=bool,samples=int) GPU backend
> api type: string default: native.
> Select graphics API to use with gpu backend.
> Options:
> native Use platform default OpenGL or OpenGL ES backend.
> gl Use OpenGL.
> gles Use OpenGL ES.
> debug Use debug OpenGL.
> null Use null OpenGL.
> dit type: bool default: false.
> Use device independent text.
> nvpr type: bool default: false.
> Use NV_path_rendering OpenGL and OpenGL ES extension.
> samples type: int default: 0.
> Use multisampling with N samples.
>
> Predefined configs:
>
> gpu = gpu()
> msaa4 = gpu(samples=4)
> msaa16 = gpu(samples=16)
> nvprmsaa4 = gpu(nvpr=true,samples=4)
> nvprmsaa16 = gpu(nvpr=true,samples=16)
> gpudft = gpu(dit=true)
> gpudebug = gpu(api=debug)
> gpunull = gpu(api=null)
> debug = gpu(api=debug)
> nullgpu = gpu(api=null)
>
> BUG=skia:2992
>
> Committed: https://skia.googlesource.com/skia/+/e13ca329fca4c28cf4e078561f591ab27b743d23TBR=bsalomon@google.com,scroggo@google.com,joshualitt@google.com,kkinnunen@nvidia.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:2992
Review URL: https://codereview.chromium.org/1528473002
Add extended config specification form that can be used to run different
gpu backend with different APIs.
The configs can be specified with the form:
gpu(api=string,dit=bool,nvpr=bool,samples=int)
This replaces and removes the --gpuAPI flag.
All existing configs should still work.
Adds following documentation:
out/Debug/dm --help config
Flags:
--config: type: string default: 565 8888 gpu nonrendering
Options: 565 8888 debug gpu gpudebug gpudft gpunull msaa16 msaa4
nonrendering null nullgpu nvprmsaa16 nvprmsaa4 pdf pdf_poppler skp svg
xps or use extended form 'backend(option=value,...)'.
Extended form: 'backend(option=value,...)'
Possible backends and options:
gpu(api=string,dit=bool,nvpr=bool,samples=int) GPU backend
api type: string default: native.
Select graphics API to use with gpu backend.
Options:
native Use platform default OpenGL or OpenGL ES backend.
gl Use OpenGL.
gles Use OpenGL ES.
debug Use debug OpenGL.
null Use null OpenGL.
dit type: bool default: false.
Use device independent text.
nvpr type: bool default: false.
Use NV_path_rendering OpenGL and OpenGL ES extension.
samples type: int default: 0.
Use multisampling with N samples.
Predefined configs:
gpu = gpu()
msaa4 = gpu(samples=4)
msaa16 = gpu(samples=16)
nvprmsaa4 = gpu(nvpr=true,samples=4)
nvprmsaa16 = gpu(nvpr=true,samples=16)
gpudft = gpu(dit=true)
gpudebug = gpu(api=debug)
gpunull = gpu(api=null)
debug = gpu(api=debug)
nullgpu = gpu(api=null)
BUG=skia:2992
Review URL: https://codereview.chromium.org/1490113005
In the current implementation, a blur filter is always created even in the
case when sigma.fX == 0 && sigma.fY == 0. This CL makes the blur filter
return input in this case.
BUG=568393
Review URL: https://codereview.chromium.org/1518643002
Make SkAutoTMalloc's interface look more like SkAutoMalloc:
- add free(), which does what you expect
- make reset() return a pointer fPtr
No public API changes (SkAutoTMalloc is in include/private)
BUG=skia:2148
Review URL: https://codereview.chromium.org/1516833003
- Carmack rsqrt uses an int where it wants a uint32_t.
- turn off all santizers (including signed-integer-overflow) in third_party/externals/sftntly.
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug-ASAN-Trybot
BUG=skia:4635
Review URL: https://codereview.chromium.org/1511643002
This tries to cut things down to the very minimum you might want to know:
$ out/Release/nanobench --match nytimes --config 8888 gpu -q
Timer overhead: 29.6ns
! -> high variance, ? -> moderate variance
micros bench
2479.05 ! desk_nytimes.skp_1_mpd 8888
1313.92 desk_nytimes.skp_1_mpd gpu
3617.65 desk_nytimes.skp_1 8888
1158.34 desk_nytimes.skp_1 gpu
1368.99 ! keymobi_nytimes_com_.skp_1_mpd 8888
393.40 keymobi_nytimes_com_.skp_1_mpd gpu
1179.68 ! keymobi_nytimes_com_.skp_1 8888
342.74 keymobi_nytimes_com_.skp_1 gpu
All times are printed in microseconds, and high variance runs are marked.
BUG=skia:
Review URL: https://codereview.chromium.org/1493313003
Reason for revert:
BUG=skia:4609
skbug.com/4609
Seems like GrContextFactory needs to fail when the NVPR option is requested but the driver version isn't sufficiently high.
Original issue's description:
> Make NVPR a GL context option instead of a GL context
>
> Make NVPR a GL context option instead of a GL context.
> This may enable NVPR to be run with command buffer
> interface.
>
> No functionality change in DM or nanobench. NVPR can
> only be run with normal GL APIs.
>
> BUG=skia:2992
>
> Committed: https://skia.googlesource.com/skia/+/eeebdb538d476c1bfc8b63a946094ca1b505ecd1TBR=mtklein@google.com,jvanverth@google.com,kkinnunen@nvidia.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:2992
Review URL: https://codereview.chromium.org/1486153002
eeebdb538d did not update a Config that
only builds if SK_BUILD_FOR_ANDROID_FRAMEWORK. This appears to be the
necessary parameter that is missing.
BUG=skia:2992
Review URL: https://codereview.chromium.org/1491563003
Make NVPR a GL context option instead of a GL context.
This may enable NVPR to be run with command buffer
interface.
No functionality change in DM or nanobench. NVPR can
only be run with normal GL APIs.
BUG=skia:2992
Review URL: https://codereview.chromium.org/1448883002
Skia'a nanobench can hold onto an image, that causes the lifetime
of GrGLGpu to extend past that of the GL interface.
Kill reference to surface and image before killing the interface
Review URL: https://codereview.chromium.org/1472433002
Remove GrContextFactory::getGLContext, it is problematic:
It has the bug of not checking for the context type.
It also is error-prone, since the GL context is not made
current, but it is callers may assume it is current.
It is also not used very much.
Clients can use GrContextFactory::getContextInfo.
BUG=skia:
Review URL: https://codereview.chromium.org/1455093003
SkChecksum::Compute is a very, very poorly distributed hash function.
This replaces all remaining uses with Murmur3.
The only interesting stuff is in src/gpu.
BUG=skia:
Review URL: https://codereview.chromium.org/1436973003
Rename SkCodecTools.h to SkBitmapRegionDecoderPriv.h
Move BRD code to its own directory in tools. This
allows us to not need to expose the entire tools
directory in Android.
BUG=skia:
Review URL: https://codereview.chromium.org/1417393004
We no longer need to worry about namespace
conflicts SkBitmapRegionDecoder in Android (which
we are replacing).
Additionally, the static Create() function does not
need to repeat the name BitmapRegionDecoder.
BUG=skia:
Review URL: https://codereview.chromium.org/1415243007
In SkSampledCodec, allow the native codec to do its scaling first, then
sample on top of that. Since the only codec which can do native scaling
is JPEG, and we know what it can do, hard-code for JPEG. Check to see
if the sampleSize is something JPEG supports, or a multiple of
something it supports. If so, use JPEG directly or combine them.
BUG=skia:4320
Review URL: https://codereview.chromium.org/1417583009
These are benches similar to the imagefilterscropexpand GM: an
input filter is cropped to a smaller size, then the blur is re-expanded
out to a larger size.
BUG=skbug:4502
Review URL: https://codereview.chromium.org/1412373004
Recent changes to WallTimer broke --samplingTime. In particular, this idiom became nonsensical:
WallTimer timer;
timer.start();
do {
...
timer.end();
} while(timer.fWall < ...);
WallTimer started making private use of fWall between when start() and end() were called, so the second time around the loop we end up with nonsense.
If that makes no sense, don't worry. The code here using now_ms() is just as fast, just as precise, and clearer.
I took the opportunity to simplify --samplingTime <complicated string parsing> to --ms <int>, and to simplify the code that depends on it.
BUG=skia:
Review URL: https://codereview.chromium.org/1419103004
The result SkBitmap, the pixel allocator, and the alpha
preference need to be communicated from the client to
the region decoder.
BUG=skia:
Review URL: https://codereview.chromium.org/1418093006
This change updates a small subset of benchmarks to flush the GrContext
between draw loops (specifically SKP benchmarks, SampleApp, and the
warmup in visualbench). This helps improve timing accuracy by not
allowing the gpu to batch across draw boundaries in the affected
benchmarks.
BUG=skia:
Review URL: https://codereview.chromium.org/1427533002
We've migrated SkHwuiRenderer into the Android Framework as
android::uirenderer::TestWindowContext in response to an internal
bug; we now delete that class and change our build references here.
R=djsollen@google.com
Review URL: https://codereview.chromium.org/1407053009
Remove a call to canFilterImageGPU() / filterImageGPU() from
filterInputGPU(). There's no reason to do this, since
the subsequent filterImage() call will do it for us anyway.
And this call actually defeats caching (as demonstrated by
the attached bench).
BUG=skia:
Review URL: https://codereview.chromium.org/1411013004
I don't really expect this to fix the errors, but I think
it's worth it to try shaking up the valgrind bot overnight.
There's some strange behavior with regard to color type on
the valgrind bot that I can't reproduce and that we aren't
seeing on any of the other bots.
TBR=mtklein,scroggo
BUG=skia:
Review URL: https://codereview.chromium.org/1418723002
- Remove --images '' to renable image benchmarking
- Add a flag to disable testing JPEG's buildTileIndex, since it also leaks memory
- Do not run images on GPU
- Do not run large interlaced images on 32 bit bots
- When buildTileIndex is not being used in the subset benches, do not use it for BRD
BUG=skia:3418
BUG=skia:4469
BUG=skia:4471
BUG=skia:4360
Review URL: https://codereview.chromium.org/1396113002
This CL allows the SkScanlineDecoder to decode partial
scanlines.
This is a first step in efficiently implementing subsetting
in SkScaledCodec.
BUG=skia:4209
Review URL: https://codereview.chromium.org/1390213002
SubsetTranslateBench.cpp:
Unref the color table, so it gets deleted.
SkBitmapRegionDecoderInterface.cpp:
Delete the stream if it is not used.
BUG=skia:3418
Review URL: https://codereview.chromium.org/1396113003
Rather than implementing some sort of "fill" in every
SkCodec subclass for incomplete images, let's make the
parent class handle this situation.
This includes an API change to SkCodec.h
SkCodec::getScanlines() now returns the number of lines it
read successfully, rather than an SkCodec::Result enum.
getScanlines() most often fails on an incomplete input, in
which case it is useful to know how many lines were
successfully decoded - this provides more information than
kIncomplete vs kSuccess. We do lose information when the
API is used improperly, as we are no longer able to return
kInvalidParameter or kScanlineNotStarted.
Known Issues:
Does not work for incomplete fFrameIsSubset gifs.
Does not work for incomplete icos.
BUG=skia:
Review URL: https://codereview.chromium.org/1332053002
Instead of decoding one line at a time, if the ScanlineOrder is kNone,
decode all of the lines in one pass, and then copy the subset into the
output. This will allow us to more realistically test subset decodes
for interlaced png. It also makes running them not take forever.
Do *not* support other modes (besides kTopDown), since they are not
used by the big three we need to replace BitmapRegionDecoder
implementation (skbug.com/4428).
Fix a bug in SubsetTranslateBench and SubsetZoomBench:
When we decode another subset, we need to reset the scanline decode
first. This bug appears to have been present since the introduction of
these tests in crrev.com/1160953002
BUG=skia:4205
BUG=skia:3418
Review URL: https://codereview.chromium.org/1387233002
Reason for revert:
Breaks Chrome with this link error: ../../third_party/skia/include/effects/SkMorphologyImageFilter.h:75: error: undefined reference to 'SkMorphologyImageFilter::SkMorphologyImageFilter(int, int, SkImageFilter*, SkImageFilter::CropRect const*)'
../../third_party/skia/include/effects/SkMorphologyImageFilter.h:104: error: undefined reference to 'SkMorphologyImageFilter::SkMorphologyImageFilter(int, int, SkImageFilter*, SkImageFilter::CropRect const*)'
Presumably due to code in third_party/WebKit/Source/platform/graphics/filters/FEMorphology.cpp that contains:
#include "SkMorphologyImageFilter.h"
...
if (m_type == FEMORPHOLOGY_OPERATOR_DILATE)
return adoptRef(SkDilateImageFilter::Create(radiusX, radiusY, input.get(), &rect));
return adoptRef(SkErodeImageFilter::Create(radiusX, radiusY, input.get(), &rect));
Original issue's description:
> factories should return baseclass, allowing the impl to specialize
>
> waiting on https://codereview.chromium.org/1386163002/# to land
>
> BUG=skia:4424
>
> Committed: https://skia.googlesource.com/skia/+/80a6dcaa1b757826ed7414f64b035d512d9ccbf8TBR=senorblanco@google.com,robertphillips@google.com,reed@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4424
Review URL: https://codereview.chromium.org/1389063002
GLBenches do not expect gl state to change between onPerCanvasPreDraw and *PostDraw, but we do a clear and sometimes we clear as draw. This causes us to bind vertex objects / programs / etc.
This change creates two new virtual methods which are called right before and immediately after timing.
BUG=skia:
Review URL: https://codereview.chromium.org/1379853003
Benefits:
- This mimics other decoding APIs (including the ones SkCodec relies
on, e.g. a png_struct, which can be used to decode an entire image or
one line at a time).
- It allows a client to ask us to do what we can do efficiently - i.e.
start from encoded data and either decode the whole thing or scanlines.
- It removes the duplicate methods which appeared in both SkCodec and
SkScanlineDecoder (some of which, e.g. in SkJpegScanlineDecoder, just
call fCodec->sameMethod()).
- It simplifies moving more checks into the base class (e.g. the
examples in skbug.com/4284).
BUG=skia:4175
BUG=skia:4284
=====================================================================
SkScanlineDecoder.h/.cpp:
Removed.
SkCodec.h/.cpp:
Add methods, enums, and variables which were previously in
SkScanlineDecoder.
Default fCurrScanline to -1, as a sentinel that start has not been
called.
General changes:
Convert SkScanlineDecoders to SkCodecs.
General changes in SkCodec subclasses:
Merge SkScanlineDecoder implementation into SkCodec. Most (all?) owned
an SkCodec, so they now call this-> instead of fCodec->.
SkBmpCodec.h/.cpp:
Replace the unused rowOrder method with an override for
onGetScanlineOrder.
Make getDstRow const, since it is called by onGetY, which is const.
SkCodec_libpng.h/.cpp:
Make SkPngCodec an abstract class, with two subclasses which handle
scanline decoding separately (they share code for decoding the entire
image). Reimplement onReallyHasAlpha so that it can return the most
recent result (e.g. after a scanline decode which only decoded part
of the image) or a better answer (e.g. if the whole image is known to
be opaque).
Compute fNumberPasses early, so we know which subclass to instantiate.
Make SkPngInterlaceScanlineDecoder use the base class' fCurrScanline
rather than a separate variable.
CodexTest.cpp:
Add tests for the state changes in SkCodec (need to call start before
decoding scanlines; calling getPixels means that start will need to
be called again before decoding more scanlines).
Add a test which decodes in stripes, currently only used for an
interlaced PNG.
TODO: Add tests for onReallyHasAlpha.
Review URL: https://codereview.chromium.org/1365313002
To avoid breaking existing SKPs, add a deserialization stub which
unflattens SkBitmapSource records to SkImageSources.
R=reed@google.com,mtklein@google.com,robertphillips@google.com
Review URL: https://codereview.chromium.org/1363913002
SkBitmapRegionDecoderInterface provides an interface
for multiple implementations of Android's
BitmapRegionDecoder.
We already have correctness tests in DM that will enable us
to compare the quality of our various BRD implementations.
We also need these performance tests to compare the speed
of our various implementations.
BUG=skia:4357
Review URL: https://codereview.chromium.org/1344993003
The command buffer's GetShaderiv and GetProgramiv code checks
that the success value passed in is either -1 or 0.
Review URL: https://codereview.chromium.org/1318143004
This switches over SkXfermodes_opts.h and SkColorMatrixFilter to use Sk4f,
and converts the SkPMFloat benches to Sk4f benches.
No pixels should change here, and no code beyond the Sk4f_ benches should change speed.
The benches are faster than the old versions.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1324743002
Unfortunately, immintrin.h (which is also included by SkTypes)
includes xmmintrin.h which includes mm_malloc.h which includes
stdlib.h for malloc even though, from the implementation, it is
difficult to see why.
Fortunately, arm_neon.h does not seem to be involved in such
shenanigans, so building for Android will keep things sane.
TBR=reed@google.com
Doesn't change Skia API, just moves an include.
Review URL: https://codereview.chromium.org/1313203003
Prior to this CL, if a client wanted to decode scanlines, they had to
create an SkCodec in order to get an SkScanlineDecoder. This introduces
complications if input data is not easily shared between the two
objects.
Instead, add methods to SkScanlineDecoder for creating a new one from
input data, and remove the creation functions from SkCodec.
Update DM and tests.
Review URL: https://codereview.chromium.org/1267583002
(and a couple presubmit fixes)
This allows us to turn back on -Werror for LLVM coverage builds,
and more generally supports building with Clang 3.7.
No public API changes.
TBR=reed@google.com
BUG=skia:
Review URL: https://codereview.chromium.org/1232463006
Make getScanlineDecoder return a new object each time, which is
owned by the caller, and independent from any existing scanline
decoders and the SkCodec itself.
Since the SkCodec already contains the entire state machine, and it
is used by the scanline decoders, simply create a new SkCodec which
is now owned by the scanline decoder.
Move code that cleans up after using a scanline decoder into its
destructor
One side effect is that creating the first scanline decoder requires
a duplication of the stream and re-reading the header. (With some
more complexity/changes, we could pass the state machine to the
scanline decoder and make the SkCodec recreate its own state machine
instead.) The typical client of the scanline decoder (region decoder)
uses an SkMemoryStream, so the duplication is cheap, although we
should consider the extra time to reread the header/recreate the state
machine. (If/when we use the scanline decoder for other purposes,
where the stream may not be cheaply duplicated, we should consider
passing the state machine.)
One (intended) result of this change is that a client can create a
new scanline decoder in a new thread, and decode different pieces of
the image simultaneously.
In SkPngCodec::decodePalette, use fBitDepth rather than a parameter.
Review URL: https://codereview.chromium.org/1230033004
SkImageGenerator makes some assumptions that are not necessarily valid
for SkCodec. For example, SkCodec does not assume that it can always be
rewound.
We also have an ongoing question of what an SkCodec should report as
its default settings (i.e. the return from getInfo). It makes sense for
an SkCodec to report that its pixels are unpremultiplied, if that is
the case for the underlying data, but if a client of SkImageGenerator
uses the default settings (as many do), they will receive
unpremultiplied pixels which cannot (currently) be drawn with Skia. We
may ultimately decide to revisit SkCodec reporting an SkImageInfo, but
I have left it unchanged for now.
Import features of SkImageGenerator used by SkCodec into SkCodec.
I have left SkImageGenerator unchanged for now, but it no longer needs
Result or Options. This will require changes to Chromium.
Manually handle the lifetime of fScanlineDecoder, so SkScanlineDecoder.h
can include SkCodec.h (where Result is), and SkCodec.h does not need
to include it (to delete fScanlineDecoder).
In many places, make the following simple changes:
- Now include SkScanlineDecoder.h, which is no longer included by
SkCodec.h
- Use the enums in SkCodec, rather than SkImageGenerator
- Stop including SkImageGenerator.h where no longer needed
Review URL: https://codereview.chromium.org/1220733013
This makes nanobench picture recording benchmarks somewhat useful again,
as opposed to all taking about 5us to run no matter the content.
ATTN Sheriff: this will probably trigger perf.skia.org alerts.
BUG=skia:
Review URL: https://codereview.chromium.org/1219873002
All of the CodecSubset benches fail when the color type is
kIndex8. We need to pass a color table to allocPixels()
when we want to decode to kIndex8 or it will throw a failure.
BUG=skia:
Review URL: https://codereview.chromium.org/1213983003
Changes verbose mode to print both the table and the individual sample
values. No need to hold back information in verbose mode.
BUG=skia:
Review URL: https://codereview.chromium.org/1208763003
Adds a nanobench mode that takes samples for a fixed amount of time,
rather than taking a fixed amount of samples.
BUG=skia:
Review URL: https://codereview.chromium.org/1204153002
Now that Sk4px exists, there's a lot less sense in eeking out every
cycle of speed from SkPMFloat: if we need to go _really_ fast, we
should use Sk4px. SkPMFloat's going to be used for things that are
already slow: large-range intermediates, divides, sqrts, etc.
A [0,1] range is easier to work with, and can even be faster if we
eliminate enough *255 and *1/255 steps. This is particularly true
on ARM, where NEON can do the *255 and /255 steps for us while
converting float<->int.
We have lots of experimental SkPMFloat <-> SkPMColor APIs that
I'm now removing. Of the existing APIs, roundClamp() is the sanest,
so I've kept only that, now called round(). The 4-at-a-time APIs
never panned out, so they're gone.
There will be small diffs on:
colormatrix coloremoji colorfilterimagefilter fadefilter imagefilters_xfermodes imagefilterscropexpand imagefiltersgraph tileimagefilter
BUG=skia:
Review URL: https://codereview.chromium.org/1201343004
Improves the GPU measuring accuracy of nanobench by using fence syncs.
Fence syncs are very widely supported and available on almost every
platform.
NO_MERGE_BUILDS
BUG=skia:
Review URL: https://codereview.chromium.org/1194783003
I think these changes to the subset benchmarks cover what we discussed yesterday.
I removed the divisor benchmarks (2x2, 3x3) and changed the single subset benchmarks.
Also, we will no longer benchmark subset decodes on small images.
BUG=skia:
Review URL: https://codereview.chromium.org/1188223002
Let's make CPU-bound .SKP benching mimic Chrome's tiles.
Unfortunately, the CPU code also performs a lot better with those big wide tiles...
BUG=skia:
Review URL: https://codereview.chromium.org/1189863002
This makes it easier to benchmark _mpd variants in a profiler.
E.g.,
<profiler> out/Release/nanobench --images --config 8888 --loops -1 --match sp_desk_nytimes
BUG=skia:
Review URL: https://codereview.chromium.org/1184673006
I haven't figured out a pithy way to have these apply to only classes
originating from SkNx, so let's just remove them. There aren't too
many use cases, and it's not really any less readable without them.
Semantically, this is a no-op.
BUG=skia:
Review URL: https://codereview.chromium.org/1167153002
The existing bench only tests the fast path, but we're looking to speed
up the general case. It'd be nice to be able to measure that speedup.
BUG=skia:
Review URL: https://codereview.chromium.org/1146953003
Constructing the gm tests and benches causes many calls to font loads.
This is visible as profiling samples in fontconfig and freetype on Linux
for all profiling runs of nanobench. This complicates analysis of
test-cases that are suspected of being slow due to font-related issues.
Move the font loading to GM::onOnceBeforeDraw and Benchmark::onPreDraw.
This way the code is not executed if the testcase does not match the
nanobench --match filter. This way the samples in font-related code are
more easy to identify as legitimate occurances caused by the testcase.
This should not cause differences in timings, because:
* Benchmark::preDraw / onPreDraw is defined to be run outside the timer
* GM::runAsBench is not enabled for any of the modified testcases. Also
nanobench untimed warmup round should run the onOnceBeforeDraw.
(and there are other GM::runAsBench gms already doing loading in
onOnceBeforeDraw).
Changes the behavior:
In TextBench:
Before, the test would report two different gms with the same name if
the color emoji font was not loaded successfully.
After, the test always reports all tests as individual names.
Generally:
The errors from loading fonts now print inbetween each testcase, as
opposed to printing during construction phase. Sample output:
( 143/145 MB 1872) 14.7ms 8888 gm quadclosepathResource /fonts/Funkster.ttf not a valid font.
( 160/160 MB 1831) 575µs 8888 gm surfacenewResource /fonts/Funkster.ttf not a valid font.
( 163/165 MB 1816) 12.5ms 8888 gm linepathResource /fonts/Funkster.ttf not a valid font.
( 263/411 MB 1493) 118ms 8888 gm typefacestyles_kerningResource /fonts/Funkster.ttf not a valid font.
( 374/411 MB 1231) 7.16ms 565 gm getpostextpathResource /fonts/Funkster.ttf not a valid font.
( 323/411 MB 1179) 4.92ms 565 gm stringartResource /fonts/Funkster.ttf not a valid font.
( 347/493 MB 917) 191ms 565 gm patch_gridResource /fonts/Funkster.ttf not a valid font.
( 375/493 MB 857) 23.9ms gpu gm clipdrawdrawCannot render path (0)
( 393/493 MB 706) 2.91ms unit test ParsePath------ png error IEND: CRC error
( 394/493 MB 584) 166ms gpu gm hairmodesResource /fonts/Funkster.ttf not a valid font.
Resource /fonts/Funkster.ttf not a valid font.
Resource /fonts/Funkster.ttf not a valid font.
...
Review URL: https://codereview.chromium.org/1144023002
Make GrResourceCache performance less sensitive to key length change.
The memcmp in GrResourceKey is called when SkTDynamicHash jumps the
slots to find the hash by a index. Avoid most of the memcmps by
comparing the hash first.
This is important because small changes in key data length can cause
big performance regressions. The theory is that key length change causes
different hash values. These hash values might trigger memcmps that
originally weren't there, causing the regression.
Adds few specialized benches to grresourcecache_add to test different
key lengths. The tests are run only on release, because on debug the
SkTDynamicHash validation takes too long, and adding many such delays
to development test runs would be unproductive. On release the tests
are quite fast.
Effect of this patch to the added tests on amd64:
grresourcecache_find_10 738us -> 768us 1.04x
grresourcecache_find_2 472us -> 476us 1.01x
grresourcecache_find_25 841us -> 845us 1x
grresourcecache_find_4 565us -> 531us 0.94x
grresourcecache_find_54 1.18ms -> 1.1ms 0.93x
grresourcecache_find_5 834us -> 749us 0.9x
grresourcecache_find_3 620us -> 542us 0.87x
grresourcecache_add_25 2.74ms -> 2.24ms 0.82x
grresourcecache_add_56 3.23ms -> 2.56ms 0.79x
grresourcecache_add_54 3.34ms -> 2.62ms 0.78x
grresourcecache_add_5 2.68ms -> 2.1ms 0.78x
grresourcecache_add_10 2.7ms -> 2.11ms 0.78x
grresourcecache_add_2 1.85ms -> 1.41ms 0.76x
grresourcecache_add 1.84ms -> 1.4ms 0.76x
grresourcecache_add_4 1.99ms -> 1.49ms 0.75x
grresourcecache_add_3 2.11ms -> 1.55ms 0.73x
grresourcecache_add_55 39ms -> 13.9ms 0.36x
grresourcecache_find_55 23.2ms -> 6.21ms 0.27x
On arm64 the results are similar.
On arm_v7_neon, the results lack the discontinuity at 55:
grresourcecache_add 4.06ms -> 4.26ms 1.05x
grresourcecache_add_2 4.05ms -> 4.23ms 1.05x
grresourcecache_find 1.28ms -> 1.3ms 1.02x
grresourcecache_find_56 3.35ms -> 3.32ms 0.99x
grresourcecache_find_2 1.31ms -> 1.29ms 0.99x
grresourcecache_find_54 3.28ms -> 3.24ms 0.99x
grresourcecache_add_5 6.38ms -> 6.26ms 0.98x
grresourcecache_add_55 8.44ms -> 8.24ms 0.98x
grresourcecache_add_25 7.03ms -> 6.86ms 0.98x
grresourcecache_find_25 2.7ms -> 2.59ms 0.96x
grresourcecache_find_4 1.45ms -> 1.38ms 0.95x
grresourcecache_find_10 2.52ms -> 2.39ms 0.95x
grresourcecache_find_55 3.54ms -> 3.33ms 0.94x
grresourcecache_find_5 2.5ms -> 2.32ms 0.93x
grresourcecache_find_3 1.57ms -> 1.43ms 0.91x
The extremely slow case, 55, is postulated to be due to the index jump
collisions running the memcmp. This is not visible on arm_v7_neon probably due
to hash function producing different results for 32 bit architectures.
This change is needed for extending path cache key in Gr
NV_path_rendering codepath. Extending is needed in order to add dashed
paths to the path cache.
Review URL: https://codereview.chromium.org/1132723003
I'm thinking of using this in perf with something like:
ratio(fill(filter("test=foo")), fill(filter("test=control")))
Does that make sense to you?
Not sure that this is really a good control bench on all bots,
but I propose we just run it a bit and find out if it needs work.
BUG=skia:
Review URL: https://codereview.chromium.org/1129823003