This switches over SkXfermodes_opts.h and SkColorMatrixFilter to use Sk4f,
and converts the SkPMFloat benches to Sk4f benches.
No pixels should change here, and no code beyond the Sk4f_ benches should change speed.
The benches are faster than the old versions.
BUG=skia:4117
Review URL: https://codereview.chromium.org/1324743002
Unfortunately, immintrin.h (which is also included by SkTypes)
includes xmmintrin.h which includes mm_malloc.h which includes
stdlib.h for malloc even though, from the implementation, it is
difficult to see why.
Fortunately, arm_neon.h does not seem to be involved in such
shenanigans, so building for Android will keep things sane.
TBR=reed@google.com
Doesn't change Skia API, just moves an include.
Review URL: https://codereview.chromium.org/1313203003
Prior to this CL, if a client wanted to decode scanlines, they had to
create an SkCodec in order to get an SkScanlineDecoder. This introduces
complications if input data is not easily shared between the two
objects.
Instead, add methods to SkScanlineDecoder for creating a new one from
input data, and remove the creation functions from SkCodec.
Update DM and tests.
Review URL: https://codereview.chromium.org/1267583002
(and a couple presubmit fixes)
This allows us to turn back on -Werror for LLVM coverage builds,
and more generally supports building with Clang 3.7.
No public API changes.
TBR=reed@google.com
BUG=skia:
Review URL: https://codereview.chromium.org/1232463006
Make getScanlineDecoder return a new object each time, which is
owned by the caller, and independent from any existing scanline
decoders and the SkCodec itself.
Since the SkCodec already contains the entire state machine, and it
is used by the scanline decoders, simply create a new SkCodec which
is now owned by the scanline decoder.
Move code that cleans up after using a scanline decoder into its
destructor.
One side effect is that creating the first scanline decoder requires
a duplication of the stream and re-reading the header. (With some
more complexity/changes, we could pass the state machine to the
scanline decoder and make the SkCodec recreate its own state machine
instead.) The typical client of the scanline decoder (region decoder)
uses an SkMemoryStream, so the duplication is cheap, although we
should consider the extra time to reread the header/recreate the state
machine. (If/when we use the scanline decoder for other purposes,
where the stream may not be cheaply duplicated, we should consider
passing the state machine.)
One (intended) result of this change is that a client can create a
new scanline decoder in a new thread, and decode different pieces of
the image simultaneously.
In SkPngCodec::decodePalette, use fBitDepth rather than a parameter.
Review URL: https://codereview.chromium.org/1230033004
SkImageGenerator makes some assumptions that are not necessarily valid
for SkCodec. For example, SkCodec does not assume that it can always be
rewound.
We also have an ongoing question of what an SkCodec should report as
its default settings (i.e. the return from getInfo). It makes sense for
an SkCodec to report that its pixels are unpremultiplied, if that is
the case for the underlying data, but if a client of SkImageGenerator
uses the default settings (as many do), they will receive
unpremultiplied pixels which cannot (currently) be drawn with Skia. We
may ultimately decide to revisit SkCodec reporting an SkImageInfo, but
I have left it unchanged for now.
Import features of SkImageGenerator used by SkCodec into SkCodec.
I have left SkImageGenerator unchanged for now, but it no longer needs
Result or Options. This will require changes to Chromium.
Manually handle the lifetime of fScanlineDecoder, so SkScanlineDecoder.h
can include SkCodec.h (where Result is), and SkCodec.h does not need
to include it (to delete fScanlineDecoder).
In many places, make the following simple changes:
- Now include SkScanlineDecoder.h, which is no longer included by
SkCodec.h
- Use the enums in SkCodec, rather than SkImageGenerator
- Stop including SkImageGenerator.h where no longer needed
Review URL: https://codereview.chromium.org/1220733013
This makes nanobench picture recording benchmarks somewhat useful again,
as opposed to all taking about 5us to run no matter the content.
ATTN Sheriff: this will probably trigger perf.skia.org alerts.
BUG=skia:
Review URL: https://codereview.chromium.org/1219873002
All of the CodecSubset benches fail when the color type is
kIndex8. We need to pass a color table to allocPixels()
when we want to decode to kIndex8 or it will throw a failure.
BUG=skia:
Review URL: https://codereview.chromium.org/1213983003
Changes verbose mode to print both the table and the individual sample
values. No need to hold back information in verbose mode.
BUG=skia:
Review URL: https://codereview.chromium.org/1208763003
Adds a nanobench mode that takes samples for a fixed amount of time,
rather than taking a fixed number of samples.
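A minimal sketch of time-bounded sampling, assuming a steady clock and a callable workload. The name sampleForDuration and its parameters are hypothetical and only illustrate the idea; this is not nanobench's actual code or flag handling.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: call `work` repeatedly, recording one timing sample
// (in milliseconds) per call, until `budget` of wall time has elapsed.
// At least one sample is always taken.
std::vector<double> sampleForDuration(const std::function<void()>& work,
                                      std::chrono::milliseconds budget) {
    using Clock = std::chrono::steady_clock;
    assert(budget.count() >= 0);

    std::vector<double> samplesMs;
    const Clock::time_point start = Clock::now();
    do {
        const Clock::time_point t0 = Clock::now();
        work();
        const Clock::time_point t1 = Clock::now();
        samplesMs.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    } while (Clock::now() - start < budget);
    return samplesMs;
}
```

With this shape the number of samples varies with how fast the workload runs, while the total wall time stays roughly constant.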
BUG=skia:
Review URL: https://codereview.chromium.org/1204153002
Now that Sk4px exists, there's a lot less sense in eking out every
cycle of speed from SkPMFloat: if we need to go _really_ fast, we
should use Sk4px. SkPMFloat's going to be used for things that are
already slow: large-range intermediates, divides, sqrts, etc.
A [0,1] range is easier to work with, and can even be faster if we
eliminate enough *255 and *1/255 steps. This is particularly true
on ARM, where NEON can do the *255 and /255 steps for us while
converting float<->int.
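As a rough illustration of why a [0,1] range is convenient, here is a scalar sketch. The helper names are hypothetical and are not part of SkPMFloat; the real code works on four channels at a time.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers only; these names are not part of SkPMFloat.

// Map a byte channel in [0,255] to a float in [0,1].
inline float byteToUnit(uint8_t b) { return b / 255.0f; }

// Map a float in [0,1] back to a byte, rounding to nearest.
inline uint8_t unitToByte(float v) {
    assert(v >= 0.0f && v <= 1.0f);
    return static_cast<uint8_t>(v * 255.0f + 0.5f);
}

// With both operands in [0,1], modulating two channels is a single multiply;
// no extra 1/255 fixup is needed between steps.
inline float modulate(float a, float b) { return a * b; }
```

The scaling by 255 happens only at the boundaries, so a chain of intermediate operations can stay in [0,1] throughout.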
We have lots of experimental SkPMFloat <-> SkPMColor APIs that
I'm now removing. Of the existing APIs, roundClamp() is the sanest,
so I've kept only that, now called round(). The 4-at-a-time APIs
never panned out, so they're gone.
There will be small diffs on:
colormatrix coloremoji colorfilterimagefilter fadefilter imagefilters_xfermodes imagefilterscropexpand imagefiltersgraph tileimagefilter
BUG=skia:
Review URL: https://codereview.chromium.org/1201343004
Improves the GPU measuring accuracy of nanobench by using fence syncs.
Fence syncs are very widely supported and available on almost every
platform.
NO_MERGE_BUILDS
BUG=skia:
Review URL: https://codereview.chromium.org/1194783003
I think these changes to the subset benchmarks cover what we discussed yesterday.
I removed the divisor benchmarks (2x2, 3x3) and changed the single subset benchmarks.
Also, we will no longer benchmark subset decodes on small images.
BUG=skia:
Review URL: https://codereview.chromium.org/1188223002
Let's make CPU-bound .SKP benching mimic Chrome's tiles.
Unfortunately, the CPU code also performs a lot better with those big wide tiles...
BUG=skia:
Review URL: https://codereview.chromium.org/1189863002
This makes it easier to benchmark _mpd variants in a profiler.
E.g.,
<profiler> out/Release/nanobench --images --config 8888 --loops -1 --match sp_desk_nytimes
BUG=skia:
Review URL: https://codereview.chromium.org/1184673006
I haven't figured out a pithy way to have these apply to only classes
originating from SkNx, so let's just remove them. There aren't too
many use cases, and it's not really any less readable without them.
Semantically, this is a no-op.
BUG=skia:
Review URL: https://codereview.chromium.org/1167153002
The existing bench only tests the fast path, but we're looking to speed
up the general case. It'd be nice to be able to measure that speedup.
BUG=skia:
Review URL: https://codereview.chromium.org/1146953003
Constructing the gm tests and benches causes many calls to font loads.
This is visible as profiling samples in fontconfig and freetype on Linux
for all profiling runs of nanobench. This complicates analysis of
test-cases that are suspected of being slow due to font-related issues.
Move the font loading to GM::onOnceBeforeDraw and Benchmark::onPreDraw.
This way the code is not executed if the testcase does not match the
nanobench --match filter, and the samples in font-related code are
easier to identify as legitimate occurrences caused by the testcase.
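The pattern can be sketched as follows. TextBenchSketch, loadFont and gFontLoads are hypothetical stand-ins used only for illustration; they are not the Benchmark or GM classes themselves.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Counts how many times the (pretend) expensive font load has run.
static int gFontLoads = 0;

// Hypothetical stand-in for an expensive font load.
static std::string loadFont(const std::string& name) {
    ++gFontLoads;
    return name;
}

class TextBenchSketch {
public:
    // The constructor only records what to load; it does no expensive work,
    // so constructing every bench stays cheap.
    explicit TextBenchSketch(std::string fontName)
        : fFontName(std::move(fontName)) {}

    // Runs outside the timer, and only for benches that are actually selected.
    void preDraw() {
        fFont = loadFont(fFontName);
        fLoaded = true;
    }

    // The timed part; relies on preDraw() having run first.
    void draw() const { assert(fLoaded); }

private:
    std::string fFontName;
    std::string fFont;
    bool fLoaded = false;
};
```

A bench that is constructed but filtered out never calls preDraw(), so it contributes no font-loading samples to a profile.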
This should not cause differences in timings, because:
* Benchmark::preDraw / onPreDraw is defined to be run outside the timer
* GM::runAsBench is not enabled for any of the modified testcases. Also,
the nanobench untimed warmup round should run onOnceBeforeDraw.
(and there are other GM::runAsBench gms already doing loading in
onOnceBeforeDraw).
Changes the behavior:
In TextBench:
Before, the test would report two different gms with the same name if
the color emoji font was not loaded successfully.
After, the test always reports all tests as individual names.
Generally:
The errors from loading fonts now print in between each testcase, as
opposed to printing during the construction phase. Sample output:
( 143/145 MB 1872) 14.7ms 8888 gm quadclosepathResource /fonts/Funkster.ttf not a valid font.
( 160/160 MB 1831) 575µs 8888 gm surfacenewResource /fonts/Funkster.ttf not a valid font.
( 163/165 MB 1816) 12.5ms 8888 gm linepathResource /fonts/Funkster.ttf not a valid font.
( 263/411 MB 1493) 118ms 8888 gm typefacestyles_kerningResource /fonts/Funkster.ttf not a valid font.
( 374/411 MB 1231) 7.16ms 565 gm getpostextpathResource /fonts/Funkster.ttf not a valid font.
( 323/411 MB 1179) 4.92ms 565 gm stringartResource /fonts/Funkster.ttf not a valid font.
( 347/493 MB 917) 191ms 565 gm patch_gridResource /fonts/Funkster.ttf not a valid font.
( 375/493 MB 857) 23.9ms gpu gm clipdrawdrawCannot render path (0)
( 393/493 MB 706) 2.91ms unit test ParsePath------ png error IEND: CRC error
( 394/493 MB 584) 166ms gpu gm hairmodesResource /fonts/Funkster.ttf not a valid font.
Resource /fonts/Funkster.ttf not a valid font.
Resource /fonts/Funkster.ttf not a valid font.
...
Review URL: https://codereview.chromium.org/1144023002
Make GrResourceCache performance less sensitive to key length change.
The memcmp in GrResourceKey is called when SkTDynamicHash jumps the
slots to find the hash by an index. Avoid most of the memcmps by
comparing the hash first.
This is important because small changes in key data length can cause
big performance regressions. The theory is that key length change causes
different hash values. These hash values might trigger memcmps that
originally weren't there, causing the regression.
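A minimal sketch of the hash-first comparison, assuming a key that caches a hash of its payload. HashedKeySketch, hashWords and gMemcmpCalls are hypothetical names; GrResourceKey's real layout and hash function are not shown in this message.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Word-wise FNV-1a-style mix, used only to keep the sketch self-contained.
inline uint32_t hashWords(const std::vector<uint32_t>& words) {
    uint32_t h = 2166136261u;
    for (uint32_t w : words) { h = (h ^ w) * 16777619u; }
    return h;
}

// Instrumentation for the sketch: counts payload comparisons actually run.
static int gMemcmpCalls = 0;

struct HashedKeySketch {
    uint32_t hash;                // cached hash of `words`
    std::vector<uint32_t> words;  // variable-length key payload

    explicit HashedKeySketch(std::vector<uint32_t> w)
        : hash(hashWords(w)), words(std::move(w)) {}

    bool operator==(const HashedKeySketch& that) const {
        // Debug-only sanity check that the cached hash is current.
        assert(hash == hashWords(words));
        // Cheap rejects first: keys with different hashes or lengths
        // cannot be equal, so the memcmp is skipped for them.
        if (hash != that.hash || words.size() != that.words.size()) {
            return false;
        }
        if (words.empty()) { return true; }
        ++gMemcmpCalls;
        return 0 == std::memcmp(words.data(), that.words.data(),
                                words.size() * sizeof(uint32_t));
    }
};
```

When a probe lands on a slot holding a different key, the hash check usually rejects it without touching the payload, so the cost of a collision no longer scales with key length.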
Adds a few specialized benches to grresourcecache_add to test different
key lengths. The tests are run only on release, because on debug the
SkTDynamicHash validation takes too long, and adding many such delays
to development test runs would be unproductive. On release the tests
are quite fast.
Effect of this patch on the added tests on amd64:
grresourcecache_find_10 738us -> 768us 1.04x
grresourcecache_find_2 472us -> 476us 1.01x
grresourcecache_find_25 841us -> 845us 1x
grresourcecache_find_4 565us -> 531us 0.94x
grresourcecache_find_54 1.18ms -> 1.1ms 0.93x
grresourcecache_find_5 834us -> 749us 0.9x
grresourcecache_find_3 620us -> 542us 0.87x
grresourcecache_add_25 2.74ms -> 2.24ms 0.82x
grresourcecache_add_56 3.23ms -> 2.56ms 0.79x
grresourcecache_add_54 3.34ms -> 2.62ms 0.78x
grresourcecache_add_5 2.68ms -> 2.1ms 0.78x
grresourcecache_add_10 2.7ms -> 2.11ms 0.78x
grresourcecache_add_2 1.85ms -> 1.41ms 0.76x
grresourcecache_add 1.84ms -> 1.4ms 0.76x
grresourcecache_add_4 1.99ms -> 1.49ms 0.75x
grresourcecache_add_3 2.11ms -> 1.55ms 0.73x
grresourcecache_add_55 39ms -> 13.9ms 0.36x
grresourcecache_find_55 23.2ms -> 6.21ms 0.27x
On arm64 the results are similar.
On arm_v7_neon, the results lack the discontinuity at 55:
grresourcecache_add 4.06ms -> 4.26ms 1.05x
grresourcecache_add_2 4.05ms -> 4.23ms 1.05x
grresourcecache_find 1.28ms -> 1.3ms 1.02x
grresourcecache_find_56 3.35ms -> 3.32ms 0.99x
grresourcecache_find_2 1.31ms -> 1.29ms 0.99x
grresourcecache_find_54 3.28ms -> 3.24ms 0.99x
grresourcecache_add_5 6.38ms -> 6.26ms 0.98x
grresourcecache_add_55 8.44ms -> 8.24ms 0.98x
grresourcecache_add_25 7.03ms -> 6.86ms 0.98x
grresourcecache_find_25 2.7ms -> 2.59ms 0.96x
grresourcecache_find_4 1.45ms -> 1.38ms 0.95x
grresourcecache_find_10 2.52ms -> 2.39ms 0.95x
grresourcecache_find_55 3.54ms -> 3.33ms 0.94x
grresourcecache_find_5 2.5ms -> 2.32ms 0.93x
grresourcecache_find_3 1.57ms -> 1.43ms 0.91x
The extremely slow case, 55, is postulated to be due to the index jump
collisions running the memcmp. This is not visible on arm_v7_neon, probably due
to the hash function producing different results on 32-bit architectures.
This change is needed for extending path cache key in Gr
NV_path_rendering codepath. Extending is needed in order to add dashed
paths to the path cache.
Review URL: https://codereview.chromium.org/1132723003
I'm thinking of using this in perf with something like:
ratio(fill(filter("test=foo")), fill(filter("test=control")))
Does that make sense to you?
Not sure that this is really a good control bench on all bots,
but I propose we just run it a bit and find out if it needs work.
BUG=skia:
Review URL: https://codereview.chromium.org/1129823003
The benches for N <= 10 get around 2x faster on my N7 and N9. I believe this
is because of the reduced function-call-then-function-pointer-call overhead on
the N7, and additionally because it seems autovectorization beats our NEON code
for small N on the N9.
My desktop is unchanged, though that's probably because N=10 lies well within a
region where memset's performance is essentially constant: N=100 takes only
about 2x as long as N=1 and N=10, which perform nearly identically.
BUG=skia:
Review URL: https://codereview.chromium.org/1073863002
The colorfilter is applied to a single (paint's) color, so the bench does not
measure the filter at all, but simply the blit of a color.
BUG=skia:
TBR=
Review URL: https://codereview.chromium.org/1055383002
Now that all SkCodecs can rewind (assuming the stream is rewindable),
we do not need to special case it.
Pointed out by Derek in the code review that added this.
TBR=djsollen
Review URL: https://codereview.chromium.org/1058633002
CodecBench:
Add new class for timing using SkCodec.
DecodingBench:
Include creating a decoder inside the loop. This is to have a better
comparison against SkCodec. SkCodec's factory function does not
necessarily read the same amount as SkImageDecoder's, so in order to
have a meaningful comparison, read the entire stream from the
beginning. Also for comparison, create a new SkStream from the
SkData each time.
Add a debugging check to make sure we have an SkImageDecoder.
Add include guards.
nanobench.cpp:
Decode using SkCodec.
When decoding using SkImageDecoder, exclude benches where we decoded
to a different color type than requested. SkImageDecoder may decide to
decode to a different type, in which case the name is misleading.
TODOs:
Now that we ignore color types that do not match the desired
color type, we should add Index8. This also means calling the more
complex version of getPixels so CodecBench can support kIndex8.
BUG=skia:3257
Review URL: https://codereview.chromium.org/1044363002
The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc.
This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h.
This design leaves us room to grow up, e.g. to SkNf<8, SkScalar> == Sk8s, and to grow down too, to things like SkNi<8, uint16_t> == Sk8h.
To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful.
You shouldn't feel bad about using any of the typedefs Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel:
- Sk4f, Sk4s, Sk2d: feel awesome
- Sk2f, Sk2s, Sk4d: feel pretty good
No public API changes.
TBR=reed@google.com
BUG=skia:3592
Review URL: https://codereview.chromium.org/1048593002
Need to land SK_SUPPORT_LEGACY_SCALAR_MAPPOINTS in chrome to suppress Affine
version which causes slight differences (which will need to be rebaselined)
BUG=skia:
Review URL: https://codereview.chromium.org/1045493002
Duplicate code from the HWUI backends for DM and nanobench
moves into a single place, saving a hundred lines or more of
cut-and-paste.
There's some indication that this increases the incidence of
SkCanvas "Unable to find device for layer." warnings, but no
clear degradation in test results.
R=djsollen@google.com,mtklein@google.com
BUG=skia:3589
Review URL: https://codereview.chromium.org/1036303002
Add and test trunc(), which is what get() used to be before rounding.
Using trunc() is a ~40% speedup on our linear gradient bench.
#neon #floats
BUG=skia:3592
#n5
#n9
CQ_INCLUDE_TRYBOTS=client.skia.android:Test-Android-Nexus5-Adreno330-Arm7-Debug-Trybot;client.skia.android:Test-Android-Nexus9-TegraK1-Arm64-Release-Trybot
Review URL: https://codereview.chromium.org/1032243002
Am I going nuts or can we get this down to just adds and converts in the loop?
#floats #n9
BUG=skia:3592
CQ_INCLUDE_TRYBOTS=client.skia.android:Test-Android-Nexus9-TegraK1-Arm64-Release-Trybot
Review URL: https://codereview.chromium.org/1008973004
There is no reason to require the 4 SkPMFloats (registers) to be adjacent.
The only potential win in loads and stores comes from the SkPMColors being adjacent.
Makes no difference to existing bench.
BUG=skia:
Review URL: https://codereview.chromium.org/1035583002
RotatedRectBench was asking for its base layer size, which may
not be what it expects with odd canvas modes (particularly proxies).
Most benchmarks are not so sophisticated; they hard-wire their
size and just use that (expected) value.
R=mtklein@google.com,djsollen@google.com
BUG=skia:3566
Review URL: https://codereview.chromium.org/1015013004
Initial experiments did show that the 256 tile size fixed the hd2000 win7
nanobot failures. However it did not have any effect on other bots, so this
change is to move back to the larger tile size on all bots except for the
hd2000.
BUG=skia:
Review URL: https://codereview.chromium.org/1022083002
Going back to the old nanobench tile size to see if the increase in tile size is what has been
causing recent nanobench crashes. The crashes seem very nondeterministic and hard to
debug manually.
256x256 is too small a tile to give accurate GPU results, but if this fixes the crashes we can try some compromise in the middle.
BUG=skia:
Review URL: https://codereview.chromium.org/1022823003
(This is essentially a revert of https://codereview.chromium.org/503833002/.)
This was necessary back when SkPaint was flattened even for in-process use. Now that we only flatten SkPaint for cross-process use, there's no need to serialize UniqueIDs.
Note: SkDropShadowImageFilter is being constructed with a croprect and UniqueID (of 0) in Blink. I've made the uniqueID param default to 0 temporarily, until this rolls in and Blink can be changed. (Blink can't be changed first, since unlike the other filters, there's no constructor that takes a cropRect but not a uniqueID.)
BUG=skia:
Review URL: https://codereview.chromium.org/1019493002
Seems strictly more useful.
This implements Mac and Windows, which seemed easy. Don't know how to do this on Linux yet.
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.9-MacMini6.2-HD4000-x86_64-Debug-Trybot
NOTREECHECKS=true
TBR=halcanary@google.com
Review URL: https://codereview.chromium.org/990723002
Please see if this looks usable. It may even give a perf boost if you use it, even without custom implementations for each instruction set.
I've been trying this morning to beat this naive loop implementation, but so far no luck with either _SSE2.h or _SSSE3.h. It's possible this is an artifact of the microbenchmark, because we're not doing anything between the conversions. I'd like to see how this fits into real code, what assembly's generated, what the hot spots are, etc.
I've updated the tests to test these new APIs, and splintered off a pair of new benchmarks that use the new APIs. This required some minor rejiggering in the benches.
BUG=skia:
Review URL: https://codereview.chromium.org/978213003
Instead of set(SkPMColor), add a constructor SkPMFloat(SkPMColor).
Replace setA(), setR(), etc. with a 4 float constructor.
And, promise to stick to SkPMColor order.
BUG=skia:
Review URL: https://codereview.chromium.org/977773002
This bench was ~75% overhead, ~25% good bench. It is now just about the
opposite: about 30% of the runtime is loop and random number overhead, and
about 70% of the time is spent doing SkPMColor <-> SkPMFloat work.
BUG=skia:
NOPRESUBMIT=true
Review URL: https://codereview.chromium.org/968133005
BUG=skia:1366
For the added bench, the collapsing makes the bench take:
- 70% of the time for CPU rendering of 3 consecutive matrix filters
- almost no change in the GPU rendering of the matrix filters
- 50% of the time for CPU and GPU rendering of 3 consecutive table filters
Review URL: https://codereview.chromium.org/776673002
This basically takes out the Windows-only hacks and promotes them to
cross-platform behavior driven by --gpu_threading.
- When --gpu_threading is false (the default), this puts GPU tasks and tests
together in the same GPU enclave. They all run serially.
- When --gpu_threading is true, both the tests and the tasks run totally
independently, just like the thread-safe CPU-bound work.
BUG=skia:3255
Review URL: https://codereview.chromium.org/847273005
SkStream is a stateful object, so it does not make sense for it to have
multiple owners. Make SkStream inherit directly from SkNoncopyable.
Update methods which previously called SkStream::ref() (e.g.
SkImageDecoder::buildTileIndex() and SkFrontBufferedStream::Create(),
which required the existing owners to call SkStream::unref()) to take
ownership of their SkStream parameters and delete when done (including
on failure).
Switch all SkAutoTUnref<SkStream>s to SkAutoTDelete<SkStream>s. In some
cases this means heap allocating streams that were previously stack
allocated.
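The ownership rule can be sketched with std::unique_ptr standing in for SkAutoTDelete. StreamSketch, buildIndexSketch and gStreamsDeleted are hypothetical names for illustration, not Skia APIs.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Instrumentation for the sketch: counts destroyed streams.
static int gStreamsDeleted = 0;

// Hypothetical stand-in for a stateful stream. Copying is disabled because
// two owners sharing one read position would interfere with each other.
class StreamSketch {
public:
    explicit StreamSketch(int length) : fLength(length) {}
    StreamSketch(const StreamSketch&) = delete;
    StreamSketch& operator=(const StreamSketch&) = delete;
    ~StreamSketch() { ++gStreamsDeleted; }

    bool atEnd() const { return fPos >= fLength; }
    void skipByte() { ++fPos; }

private:
    int fLength;
    int fPos = 0;
};

// Takes ownership of `stream` and deletes it when done, including on failure.
inline bool buildIndexSketch(std::unique_ptr<StreamSketch> stream) {
    assert(stream != nullptr);
    if (stream->atEnd()) {
        return false;  // empty input; the stream is still freed on return
    }
    while (!stream->atEnd()) { stream->skipByte(); }
    return true;
}
```

A caller hands the stream over with std::move and must not touch it afterwards; there is no matching unref() to remember.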
Respect ownership rules of SkTypeface::CreateFromStream() and
SkImageDecoder::buildTileIndex().
Update the comments for exceptional methods which do not affect the
ownership of their SkStream parameters (e.g.
SkPicture::CreateFromStream() and SkTypeface::Deserialize()) to be
explicit about ownership.
Remove test_stream_life, which tested that buildTileIndex() behaved
correctly when SkStream was a ref counted object. The test does not
make sense now that it is not.
In SkPDFStream, remove the SkMemoryStream member. Instead of using it,
create a new SkMemoryStream to pass to fDataStream (which is now an
SkAutoTDelete).
Make other pdf rasterizers behave like SkPDFDocumentToBitmap.
SkPDFDocumentToBitmap deletes the SkStream, so do the same in the
following pdf rasterizers:
SkPopplerRasterizePDF
SkNativeRasterizePDF
SkNoRasterizePDF
Requires a change to Android, which currently treats SkStreams as ref
counted objects.
Review URL: https://codereview.chromium.org/849103004
Restructure SkGpuDevice creation:
*SkSurfaceProps are optional.
*Use SkSurfaceProps to communicate DF text rather than a flag.
*Tell SkGpuDevice::Create whether RT comes from cache or not.
Review URL: https://codereview.chromium.org/848903004
BUG=skia:3255
I think this supports everything DM used to, but has completely refactored how
it works to fit the design in the bug.
Configs like "tiles-gpu" are automatically wired up.
I wouldn't suggest looking at this as a diff. There's just a bunch of deleted
files, a few new files, and one new file that shares a name with a deleted file
(DM.cpp).
NOTREECHECKS=true
Committed: https://skia.googlesource.com/skia/+/709d2c3e5062c5b57f91273bfc11a751f5b2bb88
Review URL: https://codereview.chromium.org/788243008
Reason for revert:
plenty of data
Original issue's description:
> Sketch DM refactor.
>
> BUG=skia:3255
>
>
> I think this supports everything DM used to, but has completely refactored how
> it works to fit the design in the bug.
>
> Configs like "tiles-gpu" are automatically wired up.
>
> I wouldn't suggest looking at this as a diff. There's just a bunch of deleted
> files, a few new files, and one new file that shares a name with a deleted file
> (DM.cpp).
>
> NOTREECHECKS=true
>
> Committed: https://skia.googlesource.com/skia/+/709d2c3e5062c5b57f91273bfc11a751f5b2bb88
TBR=bsalomon@google.com,mtklein@chromium.org
NOTREECHECKS=true
NOTRY=true
BUG=skia:3255
Review URL: https://codereview.chromium.org/853883004
BUG=skia:3255
I think this supports everything DM used to, but has completely refactored how
it works to fit the design in the bug.
Configs like "tiles-gpu" are automatically wired up.
I wouldn't suggest looking at this as a diff. There's just a bunch of deleted
files, a few new files, and one new file that shares a name with a deleted file
(DM.cpp).
NOTREECHECKS=true
Review URL: https://codereview.chromium.org/788243008
This avoids the problem of a newly created uncached texture causing a purge of cached resources.
BUG=chromium:445885
Review URL: https://codereview.chromium.org/846303002
This fixes every case where virtual and SK_OVERRIDE were on the same line,
which should be the bulk of cases. We'll have to manually clean up the rest
over time unless I level up in regexes.
for f in (find . -type f); perl -p -i -e 's/virtual (.*)SK_OVERRIDE/\1SK_OVERRIDE/g' $f; end
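For experimenting with the substitution on a single line, the same pattern can be expressed with std::regex. dropVirtual is a hypothetical helper written for this illustration; it is not part of the change.

```cpp
#include <regex>
#include <string>

// Applies the same substitution as the perl one-liner to one line of text:
// drop "virtual " when SK_OVERRIDE appears later on the same line.
inline std::string dropVirtual(const std::string& line) {
    static const std::regex kPattern("virtual (.*)SK_OVERRIDE");
    return std::regex_replace(line, kPattern, "$1SK_OVERRIDE");
}
```

Lines where virtual and SK_OVERRIDE sit on different lines do not match, which is why those cases need manual cleanup.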
BUG=skia:
Review URL: https://codereview.chromium.org/806653007
It is desirable that, when layer hoisting is disabled, the MPD and non-MPD timings be
roughly the same. Unfortunately, using a separate canvas for each tile (a requirement
for MPD) introduces its own discrepancy into the timing. Using a separate canvas for
each tile doesn't seem to make a difference for 8888 (see the non-MPD 8888 column below)
but slows down GPU rendering (see the non-MPD GPU column below). Since this is how
Chromium renders I propose switching to this regimen (even though it is "slowing down"
GPU rendering).
nanobench mean times (ms) with layer hoisting disabled (for desk_amazon.skp)
                        8888
                        MPD     non-MPD
1 canvas (old-style)    0.628   1.71
separate (new-style)    0.795   1.63
                        GPU
                        MPD     non-MPD
1 canvas (old-style)    2.34    1.69
separate (new-style)    2.32    2.66
Review URL: https://codereview.chromium.org/779643002
Two issues with the SKPBench tile computation were causing the MPD path to do more work:
- The clip from the parent canvas wasn't being used to trim content off the edges of the MPD tiles.
- The non-MPD path was not taking the scale into account in its tile placement (resulting in it having fewer, larger active tiles when scaling).
Review URL: https://codereview.chromium.org/776273002
Seems okay after this small patch to skip lockPixels() / unlockPixels().
BUG=skia:3149
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu13.10-GCE-NoGPU-x86_64-Release-TSAN-Trybot
Review URL: https://codereview.chromium.org/773203003
Removes work done by the constructors of picture_nesting benches,
and moves the work to the Benchmark::onPreDraw override.
This avoids PictureNesting::sierpinsky showing up in profile traces
when profiling other benches.
Review URL: https://codereview.chromium.org/725523002
Reason for revert:
Causing breakages on Mac build.
Original issue's description:
> Make nanobench and dm be usable from Chromium build
>
> Move the app logic for each app as follows:
>
> <app>.cpp -- the file which contains main(). Embedders that compile
> their own apps, such as ios shell, upcoming Chromium dm etc, do not use this.
>
> <app>_main.cpp -- the main logic of the Skia test application. This will be
> used by Skia -compiled apps as well as embedder -compiled apps.
>
> <app>_main.h -- the API for the main logic. This will be
> used by Skia -compiled apps as well as embedder -compiled apps.
>
> This way (the upcoming) Chromium dm can setup its Chromium-specific setup
> in custom main(), and then call dm_main(), without the need of any
> SK_BUILD_FOR_XXXX defines controlling whether the tool defines main or not.
>
> BUG=skia:2992
>
> Committed: https://skia.googlesource.com/skia/+/c092d3bdab5f723576cc0346cea3ee282a9cb444
TBR=mtklein@chromium.org,mtklein@google.com,borenet@google.com,kkinnunen@nvidia.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:2992
Review URL: https://codereview.chromium.org/724073002
Move the app logic for each app as follows:
<app>.cpp -- the file which contains main(). Embedders that compile
their own apps, such as ios shell, upcoming Chromium dm etc, do not use this.
<app>_main.cpp -- the main logic of the Skia test application. This will be
used by Skia -compiled apps as well as embedder -compiled apps.
<app>_main.h -- the API for the main logic. This will be
used by Skia -compiled apps as well as embedder -compiled apps.
This way (the upcoming) Chromium dm can setup its Chromium-specific setup
in custom main(), and then call dm_main(), without the need of any
SK_BUILD_FOR_XXXX defines controlling whether the tool defines main or not.
BUG=skia:2992
Review URL: https://codereview.chromium.org/657373002
The tests path_hairline_{small,big}_AA_conic were calling the test
function with NVPR. This caused a warning in nanobench.
The hunk removed here comes from the commit referring to skia:2042 ("Enable
NVPR by default"). It is a workaround for a bug. The bug is fixed by
the commit referring to skia:2078 ("Logan bot fails NVPR assertion in
bench").
The proper fix is indeed to make sure that the path renderer chain ends up
trying the software path renderer if the path contains conics and is a
hairline.
The removed hunk also refers to skia:2033 ("Figure out what is happening
with conic path segments in NVPR"). The above solution would remain correct even
if NVPR supported conics, as NVPR would still not support hairlines.
BUG=skia:2078
Review URL: https://codereview.chromium.org/685213005
Reason for revert:
Not compiling in ANGLE build
Original issue's description:
> Get gpudft support working in dm, gm, nanobench and bench_pictures
>
> Adds a new config to test distance field text.
> Clean up some flags and #defines to read "distance field text",
> not "distance field fonts" to be consistent with Chromium
>
> NOTREECHECKS=true
>
> Committed: https://skia.googlesource.com/skia/+/06ba179838ba4fe187cf290750aeeb4a02a2960b
TBR=bsalomon@google.com,mtklein@google.com,reed@google.com
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/707723005
Adds a new config to test distance field text.
Clean up some flags and #defines to read "distance field text",
not "distance field fonts" to be consistent with Chromium
NOTREECHECKS=true
Review URL: https://codereview.chromium.org/699453005
- The expected case is now a single bulk-load insert() call instead of N;
- reserve() and flushDeferredInserts() can fold into insert() now;
- SkBBH subclasses may take ownership of the bounds
This appears to be a performance no-op on both my Mac and N5. I guess
even the simplest indirect branch predictor ("same as last time") can predict
the repeated virtual calls to SkBBH::insert() perfectly.
BUG=skia:
Review URL: https://codereview.chromium.org/670213002
Add a new enum to differentiate between a complete decode and a
partial decode (with the third value being failure). Return this
value from SkImageDecoder::onDecode (in all subclasses, plus
SkImageDecoder_empty) and ::decode.
For convenience, if the enum is treated as a boolean, success and
partial success are both considered true.
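A sketch of how such a three-valued result can still behave sensibly as a boolean, assuming failure is given the value zero. The enumerator and function names are hypothetical; this message does not list the real ones.

```cpp
// Hypothetical names for illustration only.
enum DecodeResultSketch {
    kFailure_Sketch = 0,         // the only value that converts to false
    kPartialSuccess_Sketch = 1,  // some, but not all, of the image decoded
    kSuccess_Sketch = 2          // the whole image decoded
};

// Toy decoder: classifies a decode by how many rows were produced.
inline DecodeResultSketch decodeSketch(int rowsDecoded, int rowsTotal) {
    if (rowsDecoded <= 0) {
        return kFailure_Sketch;
    }
    return rowsDecoded < rowsTotal ? kPartialSuccess_Sketch
                                   : kSuccess_Sketch;
}
```

Existing callers that write `if (decodeSketch(...))` keep working, while new callers can compare against the partial value explicitly.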
Note that the static helper functions (DecodeFile etc) still return
true and false (for one thing, this allows us to continue to use
SkImageDecoder::DecodeMemory as an SkPicture::InstallPixelRefProc in
SkPicture::CreateFromStream).
Also correctly report failure in SkASTCImageDecoder::onDecode when
SkTextureCompressor::DecompressBufferFromFormat fails.
BUG=skia:3037
BUG:b/17419670
Review URL: https://codereview.chromium.org/647023006
Add a unique-per-subclass namespace tag to make Keys from different
domains comparable.
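One way to picture a unique-per-subclass tag is to use the address of a per-type static as the namespace. This is only a sketch under that assumption; CacheKeySketch, namespaceTagFor and the two subclasses are hypothetical and may differ from how the actual keys are tagged.

```cpp
#include <cstdint>

// Returns an address that is distinct for every type T.
template <typename T>
const void* namespaceTagFor() {
    static const char tag = 0;
    return &tag;
}

// Base key: two keys are equal only if both namespace and value match.
struct CacheKeySketch {
    const void* fNamespace;
    uint32_t    fValue;

    bool operator==(const CacheKeySketch& that) const {
        return fNamespace == that.fNamespace && fValue == that.fValue;
    }
};

struct BitmapKeySketch : CacheKeySketch {
    explicit BitmapKeySketch(uint32_t v)
        : CacheKeySketch{namespaceTagFor<BitmapKeySketch>(), v} {}
};

struct PictureShaderKeySketch : CacheKeySketch {
    explicit PictureShaderKeySketch(uint32_t v)
        : CacheKeySketch{namespaceTagFor<PictureShaderKeySketch>(), v} {}
};
```

Keys from different domains can then live in one shared cache without a value from one domain accidentally matching the same value from another.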
Also drop the SkPictureShader cache and convert to using the global
resource cache instead.
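One way to picture the namespace tag, as a sketch only: the class and member names below (Key, BitmapKey, PictureShaderKey, gTag) are illustrative assumptions, not the resource cache's real types. Each subclass contributes the address of its own static byte, so keys with identical payloads from different domains never compare equal:

```cpp
#include <cstdint>

// Hypothetical base key: a namespace tag plus a domain-specific payload.
struct Key {
    Key(const void* ns, uint32_t data) : fNamespace(ns), fData(data) {}

    bool operator==(const Key& other) const {
        return fNamespace == other.fNamespace && fData == other.fData;
    }

    const void* fNamespace;  // unique per subclass
    uint32_t    fData;       // meaning depends on the subclass
};

// Each subclass uses the address of its own static byte as the tag.
struct BitmapKey : Key {
    explicit BitmapKey(uint32_t id) : Key(&gTag, id) {}
    static char gTag;
};
char BitmapKey::gTag;

struct PictureShaderKey : Key {
    explicit PictureShaderKey(uint32_t id) : Key(&gTag, id) {}
    static char gTag;
};
char PictureShaderKey::gTag;
```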
R=reed@google.com,mtklein@google.com,robertphillips@google.com
Review URL: https://codereview.chromium.org/668223002
Got to keep our precious data in event of a crash.
With --flushEvery 10 I'm not seeing this cost any wall time.
BUG=skia:
Review URL: https://codereview.chromium.org/653083003
The original bench was hitting the cache since it was using the same color filter for all loops. By creating a new color filter within the loop, at least this part of it is solved. I'm not 100% sure this is the right way, but at least the numbers are a bit more reasonable and are affected by the output resolution.
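The effect can be reproduced with a toy model. Everything here (Filter, ToyCache, the two helper functions) is a hypothetical stand-in, not the bench or cache code from this CL; it only shows why reusing one filter hides the cost after the first loop:

```cpp
#include <set>

struct Filter { int id; };  // hypothetical stand-in for a color filter

// Toy cache that remembers which filter ids it has already processed.
struct ToyCache {
    std::set<int> seen;
    int misses = 0;
    void draw(const Filter& f) {
        if (seen.insert(f.id).second) ++misses;  // first sight: real work
    }
};

// One filter shared by every loop: only the first iteration does real work.
inline int missesWithSharedFilter(int loops) {
    ToyCache cache;
    Filter f{0};
    for (int i = 0; i < loops; ++i) cache.draw(f);
    return cache.misses;
}

// A fresh filter per loop: every iteration misses the cache.
inline int missesWithFreshFilter(int loops) {
    ToyCache cache;
    for (int i = 0; i < loops; ++i) {
        Filter f{i};
        cache.draw(f);
    }
    return cache.misses;
}
```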
BUG=skia:
Review URL: https://codereview.chromium.org/648483002
Draw thick-stroked Beziers by computing the outset quadratic, measuring the error, and subdividing until the error is within a predetermined limit.
To try this CL out, change src/core/SkStroke.h:18 to
#define QUAD_STROKE_APPROXIMATION 1
or from the command line: CPPFLAGS="-D QUAD_STROKE_APPROXIMATION=1" ./gyp_skia
Here's what's in this CL:
bench/BezierBench.cpp : a microbench for examining where the time is going
gm/beziers.cpp : random Beziers with various thicknesses
gm/smallarc.cpp : a distillation of bug skia:2769
samplecode/SampleRotateCircles.cpp : controls added for error, limit, width
src/core/SkStroke.cpp : the new stroke implementation (disabled)
tests/StrokerTest.cpp : a stroke torture test that checks normal and extreme values
The new stroke algorithm has a tweakable parameter:
stroker.setError(1); (SkStrokeRec.cpp:112)
The stroke error is the allowable gap between the midpoint of the stroke quadratic and the center Bezier. As the projection from the quadratic approaches the endpoints, the error is decreased proportionally so that it is always inside the quadratic curve.
An overview of how this works:
- For a given T range of a Bezier, compute the perpendiculars and find the points outset and inset for some radius.
- Construct tangents for the quadratic stroke.
- If the tangents don't intersect between them (may happen with cubics), subdivide.
- If the quadratic stroke end points are close (again, may happen with cubics), draw a line between them.
- Compute the quadratic formed by the intersecting tangents.
- If the midpoint of the quadratic is close to the midpoint of the Bezier perpendicular, return the quadratic.
- If the end of the stroke at the Bezier midpoint doesn't intersect the quad's bounds, subdivide.
- Find where the Bezier midpoint ray intersects the quadratic.
- If the intersection is too close to the quad's endpoints, subdivide.
- If the error is large proportional to the intersection's distance to the quad's endpoints, subdivide.
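Two of the steps above, sketched under assumptions: the quadratic's control point is taken as the intersection of the two end tangents, and the quadratic's midpoint is compared against a target point within an error. The names (Pt, intersectTangents, quadMidpoint, closeEnough) and the parallel-tangent tolerance are illustrative and are not taken from SkStroke.cpp:

```cpp
#include <cmath>

struct Pt { double x, y; };

// Intersects the line through p0 with direction d0 and the line through p1
// with direction d1. Returns false when the tangents are (nearly) parallel,
// the case in which a caller would subdivide instead.
inline bool intersectTangents(Pt p0, Pt d0, Pt p1, Pt d1, Pt* out) {
    double denom = d0.x * d1.y - d0.y * d1.x;  // cross(d0, d1)
    if (std::fabs(denom) < 1e-12) return false;
    // Solve p0 + t*d0 = p1 + s*d1 for t by crossing both sides with d1.
    double t = ((p1.x - p0.x) * d1.y - (p1.y - p0.y) * d1.x) / denom;
    out->x = p0.x + t * d0.x;
    out->y = p0.y + t * d0.y;
    return true;
}

// Point at t = 0.5 on the quadratic with end points p0, p2 and control c.
inline Pt quadMidpoint(Pt p0, Pt c, Pt p2) {
    return Pt{0.25 * p0.x + 0.5 * c.x + 0.25 * p2.x,
              0.25 * p0.y + 0.5 * c.y + 0.25 * p2.y};
}

// True when a and b are no farther apart than the allowed error.
inline bool closeEnough(Pt a, Pt b, double error) {
    return std::hypot(a.x - b.x, a.y - b.y) <= error;
}
```

A caller would accept the quadratic when closeEnough() holds for the quadratic's midpoint and the outset point of the Bezier's midpoint, and subdivide the T range otherwise.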
BUG=skia:723,skia:2769
Review URL: https://codereview.chromium.org/558163005
Make the Sk GL context class, SkGLNativeContext, an abstract base class. Before,
it depended on ifdefs to implement the platform dependent polymorphism. Move
the logic to subclasses of the various platform implementations.
This is a step to enable Skia embedders to compile dm and bench_pictures. The
concrete goal is to support running these test apps with Chromium command buffer.
With this change, Chromium can implement its own version of SkGLNativeContext
that uses command buffer, and host the implementation in its own repository.
Implements the above by renaming the SkGLContextHelper to SkGLContext and
removing the unneeded SkGLNativeContext. Also removes
SkGLNativeContext::AutoRestoreContext functionality, which appeared to be unused:
no use in Skia code, and no tests.
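The shape of the change, as a much-reduced sketch: the class names and the single backendName() method below are hypothetical and do not reflect the real SkGLContext interface. The point is only that tools depend on an abstract base, and each platform or embedder supplies a subclass instead of an #ifdef branch:

```cpp
#include <string>

// Hypothetical, much-reduced interface; not the actual SkGLContext API.
class GLContext {
public:
    virtual ~GLContext() {}
    // Each platform or embedder supplies its own implementation.
    virtual std::string backendName() const = 0;
};

// Stand-in for a platform implementation that lives in the library.
class NativeContext : public GLContext {
public:
    std::string backendName() const override { return "native"; }
};

// Stand-in for an embedder-provided implementation hosted elsewhere.
class CommandBufferContext : public GLContext {
public:
    std::string backendName() const override { return "command buffer"; }
};

// Tools depend only on the base class, so no #ifdef is needed here.
inline std::string describe(const GLContext& ctx) {
    return "running on " + ctx.backendName();
}
```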
BUG=skia:2992
Committed: https://skia.googlesource.com/skia/+/a90ed4e83897b45d6331ee4c54e1edd4054de9a8
Review URL: https://codereview.chromium.org/630843002
Reason for revert:
nanobench failing on Android
Original issue's description:
> Make the Sk GL context class an abstract base class
>
> Make the Sk GL context class, SkGLNativeContext, an abstract base class. Before,
> it depended on ifdefs to implement the platform dependent polymorphism. Move
> the logic to subclasses of the various platform implementations.
>
> This is a step to enable Skia embedders to compile dm and bench_pictures. The
> concrete goal is to support running these test apps with Chromium command buffer.
>
> With this change, Chromium can implement its own version of SkGLNativeContext
> that uses command buffer, and host the implementation in its own repository.
>
> Implements the above by renaming the SkGLContextHelper to SkGLContext and
> removing the unneeded SkGLNativeContext. Also removes
> SkGLNativeContext::AutoRestoreContext functionality, it appeared to be unused:
> no use in Skia code, and no tests.
>
> BUG=skia:2992
>
> Committed: https://skia.googlesource.com/skia/+/a90ed4e83897b45d6331ee4c54e1edd4054de9a8
TBR=kkinnunen@nvidia.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:2992
Review URL: https://codereview.chromium.org/639793002
Make the Sk GL context class, SkGLNativeContext, an abstract base class. Before,
it depended on ifdefs to implement the platform dependent polymorphism. Move
the logic to subclasses of the various platform implementations.
This is a step to enable Skia embedders to compile dm and bench_pictures. The
concrete goal is to support running these test apps with Chromium command buffer.
With this change, Chromium can implement its own version of SkGLNativeContext
that uses command buffer, and host the implementation in its own repository.
Implements the above by renaming the SkGLContextHelper to SkGLContext and
removing the unneeded SkGLNativeContext. Also removes
SkGLNativeContext::AutoRestoreContext functionality, which appeared to be unused:
no use in Skia code, and no tests.
BUG=skia:2992
Review URL: https://codereview.chromium.org/630843002
Now that the old backend's not using BBHs, we can specialize them for
SkRecord's needs. The only thing we really want to store is op index, which
should always be small enough to fit into an unsigned (unsigned also helps keep
it straight from other ints floating around).
This means we'll need half (32-bit) or a quarter (64-bit) the bytes in SkTileGrid,
because we don't have to store an extra int for ordering.
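The arithmetic behind "half or a quarter", as a sketch. OldEntry is an assumed layout (an opaque pointer plus an ordering int), not the actual SkTileGrid storage. With typical alignment it occupies 8 bytes on 32-bit targets and 16 bytes on 64-bit targets, against 4 bytes for a bare unsigned:

```cpp
#include <cstddef>

// Hypothetical "before" entry: an opaque payload pointer plus an int that
// exists only to restore insertion order.
struct OldEntry {
    void* data;
    int   order;
};

// "After": the op index alone; ascending indices already give the order.
typedef unsigned NewEntry;

// Holds on common 32-bit and 64-bit ABIs (4-byte unsigned assumed).
static_assert(sizeof(OldEntry) >= 2 * sizeof(NewEntry),
              "an index-only entry should be at most half the old size");

// Bytes saved when storing `count` entries the new way.
inline std::size_t bytesSaved(std::size_t count) {
    return count * (sizeof(OldEntry) - sizeof(NewEntry));
}
```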
BUG=skia:2834
Review URL: https://codereview.chromium.org/617393004
This removes:
1) ability to record old pictures with SkPictureRecorder;
2) a couple tests specific to the old backend.
The functionality of DEPRECATED_beginRecording() now lives in
(private) SkPicture::Backport(), which is the only place we
need it now.
BUG=skia:
TBR=reed@google.com
Review URL: https://codereview.chromium.org/618303002
This makes it considerably cheaper to run SKP recording benchmarks, without
affecting their measurements and without really affecting SKP playback
benchmarks at all.
On my machine, running out/Release/nanobench --match skp --config nonrendering
drops in run time from 6.7s to 2.5s, and the peak RAM usage drops from 129M to 50M.
I'm strongly considering making this lazy decoding the default.
BUG=skia:
R=robertphillips@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/572933006
Today we measure SkPicture playback speed, but not the time it takes to record
the SkPicture. This fixes that by reading SKPs from disk and re-recording them.
On the console, recording shows up first as the nonrendering skp benches,
followed later by the usual playback benches:
maxrss loops min     median  mean    max     stddev samples    config       bench
51M    2     165µs   168µs   169µs   178µs   3%     ▆▄▃█▂▄▁▂▁▁ nonrendering tabl_slashdot.skp
57M    1     9.72ms  9.77ms  9.79ms  9.97ms  1%     █▂▂▅▃▂▁▄▂▁ nonrendering desk_pokemonwiki.skp
57M    32    2.92µs  2.96µs  3.03µs  3.46µs  6%     ▅▁▁▁▁▁▁█▂▁ nonrendering desk_yahoosports.skp
...
147M   1     3.86ms  3.87ms  3.97ms  4.81ms  7%     █▁▁▁▁▁▁▁▁▁ 8888         tabl_slashdot.skp_1
147M   1     4.54ms  4.56ms  4.55ms  4.56ms  0%     █▅▇▅█▅▂▁▅▁ 565          tabl_slashdot.skp_1
147M   2     3.08ms  3.24ms  4.17ms  8.18ms  50%    █▁▁█▁▁▁▁▁▁ gpu          tabl_slashdot.skp_1
147M   1     1.61ms  1.62ms  1.69ms  2.33ms  13%    █▁▁▁▁▁▁▁▁▁ 8888         desk_pokemonwiki.skp_1
147M   1     1.44ms  1.44ms  1.45ms  1.47ms  1%     ▅▂█▂▂▅▁▁▂▁ 565          desk_pokemonwiki.skp_1
...
On skiaperf.com, they'll also be separated out from playback benches by bench_type.
BUG=skia:
R=reed@google.com, mtklein@google.com, jcgregorio@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/559153002
Chrome's using a bounding box, so it's a good idea for our
bots to do so too.
When set, we'll create an SkTileGrid to match the
parameters of --clip, and so should always hit its fast
path.
This will impose a small overhead (querying the BBH) on all
SKPs, but make large SKPs render more quickly. E.g. on
GPU desk_pokemonwiki should show about a 30% improvement,
tabl_mozilla about 40%, and one very long page from my
personal suite, askmefast.com, gets 5x faster.
(The performance changes are not the point of the CL, but
something we should be aware of.)
BUG=
R=bsalomon@google.com, mtklein@google.com, robertphillips@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/497493003
--key describes the type of run (describes the line on the chart), --properties
describes the run itself (describes the dot on the chart).
We'll pass --properties gitHash <git hash> build_number <build number> --key
... to nanobench from the bots.
And... delete a whole lot of dead code.
Example: nanobench --properties gitHash foo build_number 1234 --key bar baz
{
    "build_number" : "1234",
    "gitHash" : "foo",
    "key" : {
        "bar" : "baz"
    },
    "results" : {
    ....
Friends with https://codereview.chromium.org/491943002
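Both --properties and --key take a flat list of alternating names and values. A sketch of folding such a list into name/value pairs follows; pairUp is a hypothetical helper written for illustration, not nanobench's actual flag handling:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Folds a flat list like {"gitHash", "foo", "build_number", "1234"} into
// name -> value pairs. A trailing name with no value is ignored.
inline std::map<std::string, std::string>
pairUp(const std::vector<std::string>& args) {
    std::map<std::string, std::string> out;
    for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
        out[args[i]] = args[i + 1];
    }
    return out;
}
```

With the example command line above, the --properties list would yield gitHash -> foo and build_number -> 1234, and the --key list would yield bar -> baz.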
BUG=skia:
R=jcgregorio@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/488213002
drawPatch now receives as parameter const SkPoint cubics[12]
Adjusted derived classes and serialization.
Adjusted GM's and benches that take into account combinations of optional
parameters, the scale of the patch and 4 different types of patches.
Planning on adding the extra functionality of SkPatch in another CL.
BUG=skia:
R=egdaniel@google.com, reed@google.com
Author: dandov@google.com
Review URL: https://codereview.chromium.org/463493002
Share command flags between dm and unit tests.
Also, allow dm's core to be included by itself and iOSShell.
Command line flags that are the same (or nearly the same) in DM
and in skia_tests have been moved to common_flags. Authors,
please check to see that the shared common flag is correct for
the tool.
For iOS, the 'tool_main' entry point has a wrapper to allow multiple
tools to be statically linked in the iOSShell.
Since SkCommandLineFlags::Parse can only be called once, these calls
are disabled in the iOS build.
Since the iOS app directory is dynamically assigned a name, use '@' to
select it. (This is the same convention chosen by the Mobile Harness
iOS file system utilities.)
Move the heart of dm.gyp into dm.gypi so that it can be included by
itself and iOSShell.gyp.
Add tools/flags/SkCommonFlags.* to define and declare common
command line flags.
Add support for dm to iOSShell.
BUG=skia:
R=scroggo@google.com, mtklein@google.com, jvanverth@google.com, bsalomon@google.com
Author: caryclark@google.com
Review URL: https://codereview.chromium.org/389653004
We're moving away from BigQuery for storing results so the output doesn't have to conform to BQ requirements, which allows simplifying the format. Also stop parsing the filename for information and pass in buildbot parameters explicitly.
Adds the following flags to nanobench:
--key
--gitHash
BUG=skia:
R=mtklein@google.com, bsalomon@google.com
Author: jcgregorio@google.com
Review URL: https://codereview.chromium.org/392393002
This seems to be ~100x higher resolution than QueryPerformanceCounter. AFAIK, all our Windows perf bots have constant_tsc, so we can use rdtsc directly: it'll always tick at the max CPU frequency.
Now, the question remains, what is the max CPU frequency to divide through by? It looks like QueryPerformanceFrequency actually gives the CPU frequency in kHz, suspiciously exactly what we need to divide through to get elapsed milliseconds. That was a freebie.
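The unit arithmetic, as a sketch: a frequency in kHz is a count of ticks per millisecond, so dividing a tick delta by it gives elapsed milliseconds. elapsedMs is a hypothetical helper; reading the counter and querying the frequency are left out so the example stays portable:

```cpp
#include <cstdint>

// Converts a tick delta to milliseconds, assuming the counter ticks at a
// constant rate of `ticksPerMs` (the frequency expressed in kHz).
inline double elapsedMs(uint64_t startTicks, uint64_t endTicks,
                        uint64_t ticksPerMs) {
    return static_cast<double>(endTicks - startTicks) /
           static_cast<double>(ticksPerMs);
}
```

For example, at 3 GHz (3,000,000 kHz), a delta of 3,000,000 ticks is 1 ms.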
I did some before/after comparison on slow benchmarks. Timings look the same. Going to land this without review tonight to see what happens on the bots; happy to review carefully tomorrow.
R=mtklein@google.com
TBR=bungeman
BUG=skia:
Review URL: https://codereview.chromium.org/394363003
Reason for revert:
possible test and gm failures
Original issue's description:
> Remove gpu shader optimization for solid white or trans black colors
>
> Running test on the added bench which draws a grid of all white paths, all blue paths, or alternating checkered white/blue paths.
>
> With optimization in (ms):
>            White    Blue     Checkered
> Linux      ~80      ~80      ~160
> N7         ~800     ~1100    ~1500
> Moto-e     ~830     ~1100    ~2500
>
> Without optimization in (ms):
>            White    Blue     Checkered
> Linux      ~80      ~80      ~80
> N7         ~1100    ~1100    ~1100
> Moto-e     ~1100    ~1100    ~1500
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/5f78d2251a440443c9eaa321dad058d7a32bfef7
R=bsalomon@google.com
TBR=bsalomon@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Author: egdaniel@google.com
Review URL: https://codereview.chromium.org/385163004
Running test on the added bench which draws a grid of all white paths, all blue paths, or alternating checkered white/blue paths.
With optimization in (ms):
           White    Blue     Checkered
Linux      ~80      ~80      ~160
N7         ~800     ~1100    ~1500
Moto-e     ~830     ~1100    ~2500
Without optimization in (ms):
           White    Blue     Checkered
Linux      ~80      ~80      ~80
N7         ~1100    ~1100    ~1100
Moto-e     ~1100    ~1100    ~1500
BUG=skia:
R=bsalomon@google.com
Author: egdaniel@google.com
Review URL: https://codereview.chromium.org/375823005
This ought to get us a little ahead on the transition. Only minor fixes
are needed. The one in MemoryBench is the most interesting: what used
to unambiguously be interpreted as concatenating two string literals is
now also ambiguously a user-defined literal; adding a space
disambiguates.
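A small illustration of the ambiguity, using a made-up macro (SIZE_SUFFIX) and function (sizeFormat) rather than the actual MemoryBench code. When a string literal is followed immediately by an identifier, a C++11 lexer reads the identifier as a user-defined literal suffix and never expands it as a macro:

```cpp
#include <string>

#define SIZE_SUFFIX " bytes"  // hypothetical macro expanding to a literal

// C++03 read "total: %d"SIZE_SUFFIX as two adjacent literals. A C++11 lexer
// instead sees one literal with the user-defined suffix SIZE_SUFFIX, so the
// macro is never expanded. The space keeps the two tokens apart.
inline std::string sizeFormat() {
    return "total: %d" SIZE_SUFFIX;
}
```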
BUG=skia:
R=bungeman@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/361723002