Adds and uses fastMulDiv255Round() where possible,
which approximates x*y/255 as (x*y+x)/256. Seems like a sizeable
speedup, as seen below on Exclusion, Screen, and Modulate. The
existing NEON code uses this approximation for
{Src,Dst}x{In,Out,Over}, and without it we'd regress speed there.
This will require rebaselines whether or not we use this
approximation: the x86 bots change if we do, the ARM bots change
if we don't. None of the diffs are significant.
Desktop:
Xfermode_Screen_aa 5.82ms -> 5.54ms 0.95x
Xfermode_Modulate_aa 5.67ms -> 5.36ms 0.95x
Xfermode_Exclusion_aa 6.18ms -> 5.81ms 0.94x
Xfermode_Exclusion 5.03ms -> 4.24ms 0.84x
Xfermode_Screen 4.51ms -> 3.59ms 0.8x
Xfermode_Modulate 4.2ms -> 3.19ms 0.76x
Xfermode_DstOver 6.73ms -> 3.88ms 0.58x
Xfermode_SrcOut 6.47ms -> 3.48ms 0.54x
Xfermode_SrcIn 6.46ms -> 3.46ms 0.54x
Xfermode_DstOut 6.49ms -> 3.41ms 0.52x
Xfermode_DstIn 6.5ms -> 3.32ms 0.51x
Xfermode_Src_aa 9.53ms -> 4.75ms 0.5x
Xfermode_Clear_aa 9.65ms -> 4.8ms 0.5x
Xfermode_DstIn_aa 11.5ms -> 5.57ms 0.49x
Xfermode_DstOver_aa 11.6ms -> 5.63ms 0.49x
Xfermode_SrcOut_aa 11.6ms -> 5.5ms 0.47x
Xfermode_SrcIn_aa 11.7ms -> 5.51ms 0.47x
Xfermode_DstOut_aa 11.7ms -> 5.4ms 0.46x
N7 performance is close enough to 1x that I'm not sure whether
this is a net win, net loss, or truly neutral. I figure the bots will
show that.
I experimented with another approximation,
(x*(255-y))/255 ≈ (x*(256-y))/256. This was inconclusive, so I'm
leaving it out for now.
The remaining modes are the complicated conditional ones.
BUG=skia:
Review URL: https://codereview.chromium.org/1141953004
Cheap (one contour) paths can be evaluated and reversed as needed with a minimum of checking, but multi-contour paths invoke the regular path ops machinery to determine who is contained by whom.
More tests need to be added to verify that all corner cases are considered, but this fixes the cases in the bug thus far.
R=fmalita@chromium.orgTBR=reed@google.com
BUG=skia:3838
Review URL: https://codereview.chromium.org/1129193006
Reason for revert:
Appears to be breaking Linux ARM bots:
FAILED:
/usr/local/google/home/mosaic-role/slave/repo_clients/chromium_tot/chromium/src/../../prebuilt/toolchain/armv7a/bin/armv7a-cros-linux-gnueabi-g++
... -o obj/third_party/skia/src/ports/skia_library.SkFontHost_FreeType.o
../../third_party/skia/src/ports/SkFontHost_FreeType.cpp:37:31: fatal error:
freetype/ftmm.h: No such file or directory
#include FT_MULTIPLE_MASTERS_H
^
compilation terminated.
Original issue's description:
> Font variations.
>
> Multiple Master and TrueType fonts support variation axes.
> This implements back-end support for axes on platforms which
> support it.
>
> Committed: https://skia.googlesource.com/skia/+/05773ed30920c0214d1433c07cf6360a05476c97
>
> Committed: https://skia.googlesource.com/skia/+/3489ee0f4fa34f124f9de090d12bdc2107d52aa9TBR=reed@google.com,mtklein@google.com,djsollen@google.com,halcanary@google.com,bungeman@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/1139123008
Make the bin/c and bin/ac scripts work with sh. The scripts are run with
/bin/sh shebang, which fails atleast on ubuntu 12.04 /bin/sh. The sh in
Ubuntu 12.04 is dash.
The fixes are according to the suggestions in http://mywiki.wooledge.org/Bashism
Also run "compare" script with explicit ./bin/ path to support people
who do not have skia/bin in PATH.
Review URL: https://codereview.chromium.org/1139033005
0x8001 / 0x7fff don't seem to work, but we were close: 0x8000 does.
I plan to use this to implement the Difference xfermode,
and it seems generally handy.
BUG=skia:
Review URL: https://codereview.chromium.org/1133933004
The rewrite of path ops caused the inner contour direction to be reversed.
This exposed an existing bug in path ops builder, namely that the implicit
winding of the internal sum path could hide inner contours if they ended
up in the wrong direction.
Setting the sum path's fill type to even-odd ensures that the inner
contours aren't discarded.
R=fmalita@chromium.org
BUG=skia:3838
Review URL: https://codereview.chromium.org/1126193004
Use SkPoint* for all vertex data. (It was void* to support two different
vertex data formats, which we no longer need.)
Assert that vertex stride == sizeof(SkPoint).
Fix build when LOGGING_ENABLED=1.
BUG=skia:
Review URL: https://codereview.chromium.org/1128083004
Reason for revert:
This CL seems to be triggering the assert:
src/gpu/SkGpuDevice.cpp:319: failed assertion "!fNeedClear"
on, at least, Mac & Ubuntu.
Original issue's description:
> Defer glClear to just before draw call
>
> Remove some DO_DEFERRED_CLEAR call to avoid call glClear separately, like this:
> glBindFramebuffer(1)
> glClear
> glBindFramebuffer(2)
> glClear
> glBindFramebuffer(1)
> glDrawXXX
> glBindFramebuffer(2)
> glDrawXXX
>
> These call sequences may need read and write memory back and forth.
>
> If we call DO_DEFERRED_CLEAR just before draw call, we can bind, clear and draw in one go.
> e.g.
> glBindFramebuffer(1)
> glClear
> glDrawXXX
> glBindFramebuffer(2)
> glClear
> glDrawXXX
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/c3c06a13e69b90d4cc1d543853504072d363ae8bTBR=bsalomon@google.com,joel.liang@arm.com,wasim.abbas@arm.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1136393005
Remove some DO_DEFERRED_CLEAR call to avoid call glClear separately, like this:
glBindFramebuffer(1)
glClear
glBindFramebuffer(2)
glClear
glBindFramebuffer(1)
glDrawXXX
glBindFramebuffer(2)
glDrawXXX
These call sequences may need read and write memory back and forth.
If we call DO_DEFERRED_CLEAR just before draw call, we can bind, clear and draw in one go.
e.g.
glBindFramebuffer(1)
glClear
glDrawXXX
glBindFramebuffer(2)
glClear
glDrawXXX
BUG=skia:
Review URL: https://codereview.chromium.org/1113003005
I realized when writing the comment on https://crrev.com/1135363002/
that I'd really just sketched out the entire thing, so I couldn't help
but actually write up a working CL. How does this do for your benchmark?
BUG=chromium:487075
Review URL: https://codereview.chromium.org/1130123006
SSE runs 2-3x faster (than 4f), NEON runs 1.2-1.4x faster (than existing NEON).
Small diffs on {aarectmodes, imagefilters_xfermodes, hairmodes, mixed_xfermodes} only on AA edges due to precision drop.
BUG=skia:
Review URL: https://codereview.chromium.org/1132853005
Implemented by extracting out the non-scale/translate components
and applying that post-filter as an SkMatrixImageFilter.
BUG=skia:
Review URL: https://codereview.chromium.org/1120043002
alphas() extracts the 4 alphas from an existing Sk4px as another Sk4px.
LoadNAlphas() constructs an Sk4px from N packed alphas.
In both cases, we end up with 4x repeated alphas aligned with their pixels.
alphas()
A0 R0 G0 B0 A1 R1 G1 B1 A2 R2 G2 B2 A3 R3 G3 B3
->
A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3
Load4Alphas()
A0 A1 A2 A3
->
A0 A0 A0 A0 A1 A1 A1 A1 A2 A2 A2 A2 A3 A3 A3 A3
Load2Alphas()
A0 A1
->
A0 A0 A0 A0 A1 A1 A1 A1 0 0 0 0 0 0 0 0
This is a 5-10% speedup for AA on Intel, and wash on ARM.
AA is still mostly dominated by the final lerp.
alphas() isn't used yet, but it's similar enough to Load[24]Alphas()
that it was easier to write all at once.
BUG=skia:
Review URL: https://codereview.chromium.org/1138333003
Multiple Master and TrueType fonts support variation axes.
This implements back-end support for axes on platforms which
support it.
Review URL: https://codereview.chromium.org/1027373002
Confirm that no path ops tests are flaky, and clean up errors around
that. The test framework was incorrectly checking for >= MAX_ERRORS for
failure and <= MAX_ERRORS for success.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1140563003