Do multiply (mul) and add while tracking whether the
calculation overflows. Overflow can then be checked with
ok().
The new unit test shows a couple of examples.
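The checked arithmetic described above can be sketched as follows. This is a minimal illustration assuming `size_t` operands. The class name `CheckedMath` is hypothetical, and this is not Skia's actual implementation.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical CheckedMath: do the arithmetic, remember whether anything
// overflowed, and let the caller check ok() once at the end.
class CheckedMath {
public:
    // Returns x * y; on overflow, records the failure and returns 0.
    size_t mul(size_t x, size_t y) {
        if (x != 0 && y > std::numeric_limits<size_t>::max() / x) {
            fOK = false;
            return 0;
        }
        return x * y;
    }

    // Returns x + y; on overflow, records the failure and returns 0.
    size_t add(size_t x, size_t y) {
        size_t result = x + y;  // unsigned wraparound is well defined
        if (result < x) {
            fOK = false;
            return 0;
        }
        return result;
    }

    // True only if no operation so far has overflowed.
    bool ok() const { return fOK; }

private:
    bool fOK = true;
};
```

A caller would chain the calls, e.g. `m.add(m.mul(rowBytes, height), extra)` with hypothetical operands, and test `m.ok()` once instead of after every step.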
Author: Herb Derby <herb@google.com>
Change-Id: I7e67671d2488d67f21d47d9618736a6bae8f23c3
Reviewed-on: https://skia-review.googlesource.com/33721
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
We only need to clip a temporary x to ensure that we don't blit
beyond the clip. Storing such a clipped x is problematic because it
may make our edges unsorted.
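A minimal sketch of the idea, assuming a simplified model where each pair of sorted edges bounds one span on the current row. The `Edge`, `Span`, and `BlitRow` names are hypothetical and are not Skia's scan converter types. Only local copies of x are clamped to the clip; the stored x values are never written.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical edge: fX must stay unclipped so the edge list remains sorted
// as edges advance from row to row.
struct Edge {
    int fX;
};

struct Span {
    int left, right;
};

// Emits one span per pair of edges, clamping temporary copies of x to
// [clipLeft, clipRight]. The Edge objects themselves are left untouched.
std::vector<Span> BlitRow(const std::vector<Edge>& edges, int clipLeft, int clipRight) {
    std::vector<Span> spans;
    for (size_t i = 0; i + 1 < edges.size(); i += 2) {
        int left  = std::clamp(edges[i].fX,     clipLeft, clipRight);  // temporary
        int right = std::clamp(edges[i + 1].fX, clipLeft, clipRight);  // temporary
        if (left < right) {
            spans.push_back({left, right});
        }
    }
    return spans;
}
```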
The added unit test would fail without this fix.
Bug: skia:6947
Change-Id: I6c21d7c7c097e50fef18ab151921d6c07c089318
Reviewed-on: https://skia-review.googlesource.com/33420
Commit-Queue: Yuqian Li <liyuqian@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
I'm betting big on ok bench. This is a forcing function.
Change-Id: I8c359b7d712e16f8f0cbb90591801e0014073288
Reviewed-on: https://skia-review.googlesource.com/33660
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
log2 didn't exist on Android until API 18, and it's doing double
precision math. I think this integer version is what we want?
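An integer-only replacement for `log2` might look like the sketch below. It is hypothetical: the helper name `FloorLog2` and the choice of floor rather than ceiling are assumptions, since the message does not say which rounding the call site needs.

```cpp
#include <cstdint>

// Hypothetical helper: floor(log2(x)) for x > 0 using only integer shifts,
// so it needs neither <cmath>'s log2 nor double-precision math.
// Returns -1 for x == 0, where log2 is undefined.
int FloorLog2(uint32_t x) {
    int result = -1;
    while (x != 0) {
        x >>= 1;
        ++result;
    }
    return result;
}
```

With C++20, `std::bit_width(x) - 1` from `<bit>` computes the same value for x > 0.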
Bug: skia:
Change-Id: I4909153c56a266688355349cda5d553b69f5c942
Reviewed-on: https://skia-review.googlesource.com/33680
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
This basically unrolls all loops, handling twice as many pixels in a
stride. We now pass around 4 native registers instead of just 2.
I've temporarily disabled AVX2 mask loads and stores. It shouldn't be
hard to turn them back on, but I'd want to test on AVX2 hardware first.
Change-Id: I0907070f086a0650167456c149a479c1d96b8a2d
Reviewed-on: https://skia-review.googlesource.com/33361
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
This reverts commit 880768032d.
Reason for revert: Pixel tests, and gfx tests can't handle hinting.
Original change's description:
> Remove subpixel positioning implies no bytecode hinting.
>
> SkTypeface_FreeType::onFilterRec currently assumes that if we're asked
> to do subpixel positioning, don't do bytecode hinting. The idea was that
> both could not be satisfied at the same time, so pick something. This is
> no longer true, as with the v40 interpreter it is possible to get
> subpixel positioned but bytecode hinted glyphs.
>
> BUG=skia:6931
>
> Change-Id: Ifaeff20c121d6bb4b9287f552e383547eb6d5d49
> Reviewed-on: https://skia-review.googlesource.com/32201
> Reviewed-by: Yuqian Li <liyuqian@google.com>
> Reviewed-by: Ben Wagner <bungeman@google.com>
> Commit-Queue: Ben Wagner <bungeman@google.com>
TBR=bungeman@google.com,liyuqian@google.com,reed@google.com
Change-Id: Idb1ee50d271846bdf962986914f6b75e3aa817c8
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: skia:6931
Reviewed-on: https://skia-review.googlesource.com/33586
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
This new source acts like other sources (GMs, SKPs) for benchmarks. It
times multiple samples (controlled by samples=N, default 20), and each
of those samples uses the same strategy as monobench, growing loops
exponentially until it runs for at least 10ms.
When done it prints the fastest and the two slowest samples. In
practice the 100th percentile sample is very different from the
next slowest due to caching, and the fastest is always interesting.
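The sampling strategy described above can be sketched roughly as follows. This is not the `ok` bench source; `TimeOneSample` and `TimeSamples` are hypothetical names, and per-call time is assumed to be the batch time divided by the loop count.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;

// One sample: run fn `loops` times, doubling `loops` until a single batch
// takes at least minDuration, then return the average ns per call.
double TimeOneSample(const std::function<void()>& fn,
                     std::chrono::nanoseconds minDuration) {
    for (long loops = 1; ; loops *= 2) {
        auto start = Clock::now();
        for (long i = 0; i < loops; ++i) {
            fn();
        }
        auto elapsed = Clock::now() - start;
        if (elapsed >= minDuration) {
            return std::chrono::duration<double, std::nano>(elapsed).count() / loops;
        }
    }
}

// Takes `samples` samples and sorts them fastest to slowest, so front() is
// the fastest and the last two entries are the two slowest.
std::vector<double> TimeSamples(const std::function<void()>& fn, int samples,
                                std::chrono::nanoseconds minDuration =
                                    std::chrono::milliseconds(10)) {
    std::vector<double> times;
    times.reserve(samples);
    for (int i = 0; i < samples; ++i) {
        times.push_back(TimeOneSample(fn, minDuration));
    }
    std::sort(times.begin(), times.end());
    return times;
}
```

Sorting makes the report trivial: index 0 is the fastest sample and the final two indices are the @99 and @100 percentiles when there are 100 samples.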
Because these benchmarks run in whatever execution engine ok has
selected, on non-Windows platforms you have some real control over the
interaction between benchmarks. In its default "fork" mode each
benchmark runs independently in its own process, so the 100th
percentiles really stand out. The other modes "thread" and "serial"
work as you'd expect too.
Here's an example where you can see how the different interactions work:
out/ok bench:samples=100 8888 filter:search=text_16_AA fork
[text_16_AA_WT] 2.32µs @0 6.23µs @99 24.3ms @100
[text_16_AA_FF] 2.41µs @0 5.7µs @99 23.3ms @100
[text_16_AA_88] 2.55µs @0 5.6µs @99 24.8ms @100
[text_16_AA_BK] 1.97µs @0 5.44µs @99 23.2ms @100
out/ok bench:samples=100 8888 filter:search=text_16_AA thread
[text_16_AA_FF] 2.45µs @0 23.5µs @99 24.8ms @100
[text_16_AA_WT] 2.52µs @0 17.8µs @99 24.7ms @100
[text_16_AA_88] 2.55µs @0 19.7µs @99 25.1ms @100
[text_16_AA_BK] 1.8µs @0 14.7µs @99 25.1ms @100
out/ok bench:samples=100 8888 filter:search=text_16_AA serial
[text_16_AA_88] 2.35µs @0 3.53µs @99 16.7ms @100
[text_16_AA_FF] 2.09µs @0 2.73µs @99 2.91µs @100
[text_16_AA_BK] 1.75µs @0 2.46µs @99 2.65µs @100
[text_16_AA_WT] 2.1µs @0 3.16µs @99 3.17µs @100
In the first "fork" case all runs are independent and have roughly
the same profile. "thread" looks similar except you can see them
contending at the 99th percentile. In "serial", the first bench
warms up the rest, so their 100th percentiles are all much faster.
Change-Id: I01a9f8c54b540221a9f232b271bb8ef3fda2569c
Reviewed-on: https://skia-review.googlesource.com/33585
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
SkTypeface_FreeType::onFilterRec currently assumes that if we're asked
to do subpixel positioning, don't do bytecode hinting. The idea was that
both could not be satisfied at the same time, so pick something. This is
no longer true, as with the v40 interpreter it is possible to get
subpixel positioned but bytecode hinted glyphs.
BUG=skia:6931
Change-Id: Ifaeff20c121d6bb4b9287f552e383547eb6d5d49
Reviewed-on: https://skia-review.googlesource.com/32201
Reviewed-by: Yuqian Li <liyuqian@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
Commit-Queue: Ben Wagner <bungeman@google.com>
Bug: skia:
Change-Id: I0172e9e74898fb615cbb0ac61e46cbf9012ae75b
Reviewed-on: https://skia-review.googlesource.com/33262
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Change-Id: I985e54a071338e99292a5aa2f42c92bc115b4008
Reviewed-on: https://skia-review.googlesource.com/32760
Commit-Queue: Brian Salomon <bsalomon@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Broken by https://skia-review.googlesource.com/32862. It seems that
"tiles_rt" and "pic" configs were passing at one point, but have since
bit-rotted after being renamed to "tiles_rt-8888" and "pic-8888" and
thus ignored by dm.
No-Try: true
Change-Id: I00a5e5a0cc2090566809a61fa310c8ddaafdea43
Reviewed-on: https://skia-review.googlesource.com/33581
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
Adjust the configs specified by recipes to avoid the new error.
Change-Id: I23e31355e2faaab919d92abdb37a6f70cd2da1ff
Reviewed-on: https://skia-review.googlesource.com/32862
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
As a part of serializing SkPaths, I want to be able to know (without
asserting) whether or not a path is valid so that I can discard
potentially malicious deserialized paths.
Currently, SkPath(Ref) both just have asserting validation functions
which can't be used externally. This patch adds accessors that don't
assert.
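A non-asserting validity check could take roughly this shape. The sketch assumes a simplified verb/point representation; `SimplePath`, `Verb`, and `isValid()` are hypothetical and are not the accessors this change adds to SkPath/SkPathRef.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified path representation for illustration only.
enum class Verb { kMove, kLine, kQuad, kCubic, kClose };

struct Point { float x, y; };

struct SimplePath {
    std::vector<Verb>  verbs;
    std::vector<Point> points;

    // Returns false instead of asserting, so a caller can discard a
    // malicious or corrupt deserialized path.
    bool isValid() const {
        size_t needed = 0;
        bool haveMove = false;
        for (Verb v : verbs) {
            switch (v) {
                case Verb::kMove:  needed += 1; haveMove = true; break;
                case Verb::kLine:  needed += 1; break;
                case Verb::kQuad:  needed += 2; break;
                case Verb::kCubic: needed += 3; break;
                case Verb::kClose: break;
            }
            // Every segment or close must follow a move-to.
            if (v != Verb::kMove && !haveMove) {
                return false;
            }
        }
        // The verbs must consume exactly the stored points.
        return needed == points.size();
    }
};
```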
Bug: chromium:752755 skia:6955
Change-Id: I4d0ceb31ec660b87e3fda438392ad2b60a27a0da
Reviewed-on: https://skia-review.googlesource.com/31720
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Mike Reed <reed@google.com>
These symbols were resolving, but weren't really connected to systrace
on Android. Using the macros/functions allows this to work correctly.
This change doesn't actually cause anything to happen in framework
builds - that still requires installing an instance of SkATrace, but
now doing so will route all of Skia's TRACE_EVENT macros to systrace.
Bug: skia:
Change-Id: I6d2eafac7ef2531ba92094b92f68e59e0490ef62
Reviewed-on: https://skia-review.googlesource.com/33400
Reviewed-by: Derek Sollenberger <djsollen@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
MSAN started failing after https://skia-review.googlesource.com/c/32722.
This should fix it.
No-Try: true
Change-Id: I8956c8c211507923f078fe96921fedaadefae8a8
Reviewed-on: https://skia-review.googlesource.com/32942
Reviewed-by: Ben Wagner <bungeman@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
Adding this extra field to the CommandBufferInfo may or may not have led to a memory regression.
Remove it until it is actually needed.
Change-Id: Ibdddbeb7625f91f5199584a575289f07f6e95304
Reviewed-on: https://skia-review.googlesource.com/33280
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
Also makes paint clones use cloned fragment processors.
Change-Id: I60efcfc6a46a4f8430a72f4d1ec79c7d99fbe593
Reviewed-on: https://skia-review.googlesource.com/33084
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Salomon <bsalomon@google.com>
1. Always inline (Clang previously ignored inline and got 25% slower)
2. SIMD everywhere other than x86 gcc:
non-SIMD is only faster on my desktop with gcc;
with Clang on my desktop, SIMD is 50% faster than non-SIMD.
3. Allocate 4x memory instead of 2x when running out of space:
on old Android devices with Linux kernel 3.10 (e.g., Nexus 6P, 5X),
the alloc/memcpy triggers a major bottleneck in the kernel (30% of
the running time). This bottleneck goes away (the kernel is no
longer doing stupid things during alloc/memcpy) in Linux kernel
3.18 (e.g., Pixel), and that's why DAA is much faster on Pixel than
on Nexus 6P.
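The effect of item 3 can be illustrated with a small growth-policy sketch. `GrowCapacity` and `GrowSteps` are hypothetical helpers, not DAA's allocator. The point is that a 4x factor needs about half as many alloc/memcpy rounds as 2x to reach the same size.

```cpp
#include <cstddef>

// Hypothetical growth policy: multiply capacity by `factor` (>= 2) until it
// holds `needed` elements. Overflow is ignored for brevity.
size_t GrowCapacity(size_t current, size_t needed, size_t factor = 4) {
    size_t capacity = current > 0 ? current : 1;
    while (capacity < needed) {
        capacity *= factor;
    }
    return capacity;
}

// Counts the grow steps (each one an alloc + memcpy) needed to go from
// `start` (> 0) to at least `target` with the given factor (>= 2).
int GrowSteps(size_t start, size_t target, size_t factor) {
    int steps = 0;
    for (size_t cap = start; cap < target; cap *= factor) {
        ++steps;
    }
    return steps;
}
```

For example, growing from 1 to 1024 takes 10 steps at 2x but only 5 at 4x, at the cost of possibly over-allocating more memory.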
I think maybe I should adopt SkRasterPipeline for device-specific
optimizations.
Bug: skia:
Change-Id: I0408aa7671a5f1b39aad3bec25f8fc994ff5a1bb
Reviewed-on: https://skia-review.googlesource.com/30820
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Yuqian Li <liyuqian@google.com>
This reverts commit a5a69cfb48.
Bug: skia:
Change-Id: I08475d96255b9df13e5c86e1ef9c7f4739e51459
Reviewed-on: https://skia-review.googlesource.com/33202
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Since both armv7-a-neon and 32-bit armv8-a have NEON, we can treat them
the same in Android.bp.
Bug: b/62895439
Corresponds to https://android-review.googlesource.com/c/423660/3
This change will generate the change to Android.bp described there.
Change-Id: Icae9b5b79093d6f2886da39771d4fbe901be237a
Reviewed-on: https://skia-review.googlesource.com/33000
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Leon Scroggins <scroggo@google.com>
Bug: skia:
Change-Id: I880e3d5a668743ac12fb0101baca637443e920b4
Reviewed-on: https://skia-review.googlesource.com/33082
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
This reverts commit 0f450acd76.
Bug: skia:
Change-Id: I97428fbbc6d82bf8b186ec5fdbf1a939c00e4126
Reviewed-on: https://skia-review.googlesource.com/32726
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
I tried to follow exactly the same strategy as a start.
(Though I did fix the off-by-one dimensions.)
It does rather look like we only need 3D and 4D now
that I've looked at the call sites.
Looks like about a 20% speedup.
Change-Id: I8b1af64750ad1750716ee1ab0767e64591c7206a
Reviewed-on: https://skia-review.googlesource.com/32842
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Since recipes are now versioned with the code, this is unnecessary and
could mask recipe bugs.
Change-Id: Ic5aafbd3a7e9ccd3fd529c71b282cf6b037b78df
Reviewed-on: https://skia-review.googlesource.com/32722
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
Valgrind 3.13.0 only supports up to AVX2, so I used SK_CPU_LIMIT_SSE41
to avoid failing with "Unrecognised instruction." (The existing Valgrind
tasks run on a ShuttleA machine, whose CPU only supports AVX.)
Needed to enable verbose output on all Valgrind tasks to avoid Swarming
I/O timeout. Opportunistically removed verbose output for Linux Intel
bots that are no longer failing.
Bug: skia:6881
Change-Id: I2ffa6efe901c97bd2e0bbc9b26632aafbb3cf9a6
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/31143
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
The new Delta AA scan converter does not need the edge to be updated
with monotonic Y so chopping at y extrema is not necessary. Removing
such chopping brings a ~10% performance increase to chalkboard.svg, which
has tons of small cubics (the same is true for many svgs I saw).
We didn't remove the chopping for quads because doing so does not bring
a significant speedup. Moreover, dropping those y extrema would make
our strokecircle animation look a little more wobbly (because we would
have fewer divisions for the quads at the top and bottom of the circle).
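For reference, chopping a cubic at its y extrema means splitting it at the roots of dy/dt in (0, 1). The sketch below finds those parameter values with the plain quadratic formula. `CubicYExtrema` is a hypothetical name, and it omits the numerical-robustness care a production implementation would need.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Returns the t values in (0, 1) where a cubic Bezier's y has a local
// extremum, sorted ascending. Derivation: with a = y1 - y0, b = y2 - y1,
// c = y3 - y2, dy/dt / 3 = (a - 2b + c) t^2 + 2(b - a) t + a.
std::vector<double> CubicYExtrema(double y0, double y1, double y2, double y3) {
    double a = y1 - y0, b = y2 - y1, c = y3 - y2;
    double A = a - 2 * b + c;
    double B = 2 * (b - a);
    double C = a;

    std::vector<double> roots;
    auto keep = [&](double t) {
        if (t > 0 && t < 1) roots.push_back(t);
    };
    if (A == 0) {                 // derivative is linear (or constant)
        if (B != 0) keep(-C / B);
        return roots;
    }
    double disc = B * B - 4 * A * C;
    if (disc < 0) return roots;   // y is monotonic: nothing to chop
    double s = std::sqrt(disc);
    keep((-B - s) / (2 * A));
    if (s > 0) keep((-B + s) / (2 * A));
    std::sort(roots.begin(), roots.end());
    return roots;
}
```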
Bug: skia:
Change-Id: I3984d2619f9f77269ed24e8cbfa9f1429ebca4a8
Reviewed-on: https://skia-review.googlesource.com/31940
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Yuqian Li <liyuqian@google.com>
This reverts commit 175af0d011.
Reason for revert: Chrome doesn't know about portable format specifiers. Sigh.
Original change's description:
> GrContext::dump that produces JSON formatted output
>
> Includes caps, GL strings, and extensions
>
> Bug: skia:
> Change-Id: I1e8b3dd50fb68357f9de8ca6149cf65443d027ef
> Reviewed-on: https://skia-review.googlesource.com/32340
> Commit-Queue: Brian Osman <brianosman@google.com>
> Reviewed-by: Brian Salomon <bsalomon@google.com>
TBR=bsalomon@google.com,brianosman@google.com
Change-Id: Ie280b25275725f0661da7541f54ed62897abb82f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/32861
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
This reverts commit 6a7d56fa0f.
Reason for revert: Earlier commit needs to be reverted for Chrome roll.
Original change's description:
> Support single line objects and arrays
>
> This is just a formatting nicety. The new caps dump has several large
> arrays of structs, and keeping each object on one line makes them much
> more readable. (It also limits the total length of the output, which
> helps when scanning through.)
>
> Example of the output, before and after this change:
> https://gist.github.com/brianosman/872f33be9af49031023b791e7db0b1fb
>
> Bug: skia:
> Change-Id: I0fe0c2241b0c7f451b0837500e554d0491126d5e
> Reviewed-on: https://skia-review.googlesource.com/32820
> Reviewed-by: Brian Salomon <bsalomon@google.com>
> Commit-Queue: Brian Osman <brianosman@google.com>
TBR=bsalomon@google.com,brianosman@google.com
Change-Id: I2b05cf79ca4804e5944f2eb3e17fe4be4d5af290
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/32860
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
It looks like our recursive approach is faster than interp3D(),
and we'd prefer trilinear interpolation over tetrahedral for quality.
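A recursive multilinear interpolation, which is trilinear for a 3D table, can be sketched as below. This is hypothetical, not Skia's code. It assumes a single output channel, a row-major table with dimension 0 varying slowest, and input coordinates in [0, 1]. Each recursion level blends two results from the level below, so a 3D lookup combines 8 table entries.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Interpolates `table` (grid sizes in gridPoints) at `coords` in [0, 1]^N.
double Interp(const std::vector<double>& table, const std::vector<int>& gridPoints,
              const std::vector<double>& coords, int dim = 0, size_t offset = 0) {
    if (dim == static_cast<int>(gridPoints.size())) {
        return table[offset];
    }
    // Stride of this dimension = product of the later grid sizes.
    size_t stride = 1;
    for (size_t d = dim + 1; d < gridPoints.size(); ++d) {
        stride *= gridPoints[d];
    }
    // Map coords[dim] onto [0, gridPoints[dim] - 1] and find the neighbors.
    double x = coords[dim] * (gridPoints[dim] - 1);
    int lo = static_cast<int>(std::floor(x));
    int hi = std::min(lo + 1, gridPoints[dim] - 1);
    double t = x - lo;

    // Recurse into the next dimension at both neighbors, then lerp.
    double a = Interp(table, gridPoints, coords, dim + 1, offset + lo * stride);
    double b = Interp(table, gridPoints, coords, dim + 1, offset + hi * stride);
    return a + t * (b - a);
}
```

The same function handles both the 3D and 4D cases by changing `gridPoints`, which is one appeal of the recursive form over a hand-written 3D routine.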
Change-Id: I1019254b9ecf24b2f4feff17ed8ae1b48fcc281e
Reviewed-on: https://skia-review.googlesource.com/32800
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
This reverts commit b681a0f1b0.
Reason for revert: Seems to be messing up some MacMini & Nexus7 bots
Original change's description:
> Store discard request on the opList and remove GrDiscardOp
>
> Change-Id: Ic1f76bb91c16b23df1fe71c07a4d5ad5abf1dc26
> Reviewed-on: https://skia-review.googlesource.com/32640
> Reviewed-by: Brian Salomon <bsalomon@google.com>
> Commit-Queue: Robert Phillips <robertphillips@google.com>
TBR=egdaniel@google.com,bsalomon@google.com,robertphillips@google.com
Change-Id: I8a89fae7bb11791bd023d7444a074bb34d006fd0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/32704
Reviewed-by: Robert Phillips <robertphillips@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
This is just a formatting nicety. The new caps dump has several large
arrays of structs, and keeping each object on one line makes them much
more readable. (It also limits the total length of the output, which
helps when scanning through.)
Example of the output, before and after this change:
https://gist.github.com/brianosman/872f33be9af49031023b791e7db0b1fb
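The single-line formatting option can be illustrated with a tiny writer. This is a hypothetical sketch, not the SkJSONWriter API. Each object is a list of (name, int) fields, and string escaping is deliberately omitted.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Fields = std::vector<std::pair<std::string, int>>;

// Writes an array of objects. With singleLineObjects, each object occupies
// one line; otherwise each field gets its own indented line.
std::string WriteArray(const std::vector<Fields>& objects, bool singleLineObjects) {
    const char* sep = singleLineObjects ? " " : "\n    ";
    std::string out = "[\n";
    for (size_t i = 0; i < objects.size(); ++i) {
        out += "  {";
        for (size_t f = 0; f < objects[i].size(); ++f) {
            out += sep;
            out += "\"" + objects[i][f].first + "\": " + std::to_string(objects[i][f].second);
            if (f + 1 < objects[i].size()) out += ",";
        }
        out += singleLineObjects ? " }" : "\n  }";
        if (i + 1 < objects.size()) out += ",";
        out += "\n";
    }
    out += "]";
    return out;
}
```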
Bug: skia:
Change-Id: I0fe0c2241b0c7f451b0837500e554d0491126d5e
Reviewed-on: https://skia-review.googlesource.com/32820
Reviewed-by: Brian Salomon <bsalomon@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
Includes caps, GL strings, and extensions
Bug: skia:
Change-Id: I1e8b3dd50fb68357f9de8ca6149cf65443d027ef
Reviewed-on: https://skia-review.googlesource.com/32340
Commit-Queue: Brian Osman <brianosman@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Until now we've been using 3 separate parametric stages to apply
gamma to r,g,b. That works fine, but is kind of unnecessarily
slow, and again less clear in a stack trace than seeing "gamma".
The new bench runs in about 60% of the time the old one does
on my Trashcan.
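The difference between three per-channel stages and one "gamma" stage can be sketched as a single pass that applies the same exponent to r, g, and b. The `RGBA` type and `GammaStage` name are hypothetical, and real SkRasterPipeline stages operate on SIMD registers rather than one pixel at a time.

```cpp
#include <cmath>

// Hypothetical pixel type for illustration.
struct RGBA { float r, g, b, a; };

// One stage: raise r, g and b to `gamma` in a single pass.
// Alpha is left untouched.
void GammaStage(RGBA* pixels, int count, float gamma) {
    for (int i = 0; i < count; ++i) {
        pixels[i].r = std::pow(pixels[i].r, gamma);
        pixels[i].g = std::pow(pixels[i].g, gamma);
        pixels[i].b = std::pow(pixels[i].b, gamma);
    }
}
```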
BUG=skia:6939
Change-Id: I079698d3009b081f1c23a2e27fc26e373b439610
Reviewed-on: https://skia-review.googlesource.com/32721
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>