What is going on here is that, after the mapPoints in fillAANestedRects, devInside was upside down so the isEmpty check was always firing. I don't see why we need to avoid having devInside sorted.
BUG=488103
Review URL: https://codereview.chromium.org/1135753004
Improve caching of dashed paths in GrStencilAndCoverPathRenderer.
Look up the (NVPR specific) GrGLPath based on GrStrokeInfo and
the original path.
Use unique keys for all GrPaths.
Dash the path with Skia dash stroker and use that path geometry for
NVPR path.
NVPR internal dashing stroke is not used, because the dashing
implementation of NVPR does not match Skia implementation.
Review URL: https://codereview.chromium.org/1116123003
Make GrResourceCache performance less sensitive to key length change.
The memcmp in GrResourceKey is called when SkTDynamicHash jumps the
slots to find the hash by a index. Avoid most of the memcmps by
comparing the hash first.
This is important because small changes in key data length can cause
big performance regressions. The theory is that key length change causes
different hash values. These hash values might trigger memcmps that
originally weren't there, causing the regression.
Adds few specialized benches to grresourcecache_add to test different
key lengths. The tests are run only on release, because on debug the
SkTDynamicHash validation takes too long, and adding many such delays
to development test runs would be unproductive. On release the tests
are quite fast.
Effect of this patch to the added tests on amd64:
grresourcecache_find_10 738us -> 768us 1.04x
grresourcecache_find_2 472us -> 476us 1.01x
grresourcecache_find_25 841us -> 845us 1x
grresourcecache_find_4 565us -> 531us 0.94x
grresourcecache_find_54 1.18ms -> 1.1ms 0.93x
grresourcecache_find_5 834us -> 749us 0.9x
grresourcecache_find_3 620us -> 542us 0.87x
grresourcecache_add_25 2.74ms -> 2.24ms 0.82x
grresourcecache_add_56 3.23ms -> 2.56ms 0.79x
grresourcecache_add_54 3.34ms -> 2.62ms 0.78x
grresourcecache_add_5 2.68ms -> 2.1ms 0.78x
grresourcecache_add_10 2.7ms -> 2.11ms 0.78x
grresourcecache_add_2 1.85ms -> 1.41ms 0.76x
grresourcecache_add 1.84ms -> 1.4ms 0.76x
grresourcecache_add_4 1.99ms -> 1.49ms 0.75x
grresourcecache_add_3 2.11ms -> 1.55ms 0.73x
grresourcecache_add_55 39ms -> 13.9ms 0.36x
grresourcecache_find_55 23.2ms -> 6.21ms 0.27x
On arm64 the results are similar.
On arm_v7_neon, the results lack the discontinuity at 55:
grresourcecache_add 4.06ms -> 4.26ms 1.05x
grresourcecache_add_2 4.05ms -> 4.23ms 1.05x
grresourcecache_find 1.28ms -> 1.3ms 1.02x
grresourcecache_find_56 3.35ms -> 3.32ms 0.99x
grresourcecache_find_2 1.31ms -> 1.29ms 0.99x
grresourcecache_find_54 3.28ms -> 3.24ms 0.99x
grresourcecache_add_5 6.38ms -> 6.26ms 0.98x
grresourcecache_add_55 8.44ms -> 8.24ms 0.98x
grresourcecache_add_25 7.03ms -> 6.86ms 0.98x
grresourcecache_find_25 2.7ms -> 2.59ms 0.96x
grresourcecache_find_4 1.45ms -> 1.38ms 0.95x
grresourcecache_find_10 2.52ms -> 2.39ms 0.95x
grresourcecache_find_55 3.54ms -> 3.33ms 0.94x
grresourcecache_find_5 2.5ms -> 2.32ms 0.93x
grresourcecache_find_3 1.57ms -> 1.43ms 0.91x
The extremely slow case, 55, is postulated to be due to the index jump
collisions running the memcmp. This is not visible on arm_v7_neon probably due
to hash function producing different results for 32 bit architectures.
This change is needed for extending path cache key in Gr
NV_path_rendering codepath. Extending is needed in order to add dashed
paths to the path cache.
Review URL: https://codereview.chromium.org/1132723003
Set the "path stroke error bound" path parameter to 0.02 for all paths.
This means that the stroked path area will be within 98% of the stroke
width in path space.
This should fix many cases where NVPR stroked paths were visibly different to
Skia stroked paths. One such path is in dashcubics gm.
This increases the amount of subdivisions the path object creation will
make for paths that need it. This in turn will increase gpu object space
requirements sligthly. Both of these effects should be unnoticeable.
GL_NV_path_rendering.txt:
"""
Every path object has a stroke approximation bound parameter
(PATH_STROKE_BOUND_NV) that is a floating-point value /sab/ clamped
between 0.0 and 1.0 and set and queried with the PATH_STROKE_BOUND_NV
path parameter. Exact determination of samples swept an orthogonal
centered line segment along cubic Bezier segments and rational
quadratic Bezier curves (so non-circular partial elliptical arcs) is
intractable for real-time rendering so an approximation is required;
/sab/ intuitively bounds the approximation error as a percentage of
the path object's stroke width. Specifically, this path parameter
requests the implementation to stencil any samples within /sweep/
object space units of the exact sweep of the path's cubic Bezier
segments or partial elliptical arcs to be sampled by the stroke where
sweep = ((1-sab)*sw)/2
where /sw/ is the path object's stroke width. The initial value
of /sab/ when a path is created is 0.2. In practical terms, this
initial value means the stencil sample positions coverage within 80%
(100%-20%) of the stroke width of cubic and rational quadratic stroke
segments should be sampled.
"""
BUG=skia:2049
Review URL: https://codereview.chromium.org/1124423007
Make the code more readable by inheriting GrStrokeInfo from SkStrokeRec.
This should avoid the long .getStrokeRec() and .getStrokeRecPtr(). These
were a bit cumbersome especially in cases where an alias variable was
created for these, and then the reader had to keep track to which
StrokeInfo member the StrokeRec alias was pointing.
Removes SkStrokeRec::SkStrokeRec(const SkStrokeRec&). It was memcpying.
Try to play it safe wrt compiler using the possible padding of
superclass for subclass members. Instead, let the compiler generate
the copy constructor. Assignment operator was already
compiler-generated, so at least in that way this is consistent.
Renames GrStrokeInfo::applyDash to applyDashToPath for consistency
with superclass applyToPath.
Review URL: https://codereview.chromium.org/1128113008
Reason for revert:
win_chromium_compile_dbg_ng
FAILED: ninja -t msvc -e environment.x86 -- E:\b\build\goma/gomacc "E:\b\depot_tools\win_toolchain\vs2013_files\VC\bin\amd64_x86\cl.exe" /nologo /showIncludes /FC @obj\third_party\skia\src\core\skia.SkBitmapHeap.obj.rsp /c ..\..\third_party\skia\src\core\SkBitmapHeap.cpp /Foobj\third_party\skia\src\core\skia.SkBitmapHeap.obj /Fdobj\skia\skia.cc.pdb
e:\b\build\slave\win\build\src\third_party\skia\include\core\skpicture.h(176) : error C2487: 'CURRENT_PICTURE_VERSION' : member of dll interface class may not be declared with dll interface
Original issue's description:
> Sketch splitting SkPicture into an interface and SkBigPicture.
>
> Adds small pictures for drawRect(), drawTextBlob(), and drawPath().
> These cover about 89% of draw calls from Blink SKPs,
> and about 25% of draw calls from our GMs.
>
> SkPicture handles:
> - serialization and deserialization
> - unique IDs
>
> Everything else is left to the subclasses:
> - playback(), cullRect()
> - hasBitmap(), hasText(), suitableForGPU(), etc.
> - LayerInfo / AccelData if applicable.
>
> The time to record a 1-op picture improves a good chunk
> (2 mallocs to 1), and the time to record a 0-op picture
> greatly improves (2 mallocs to none):
>
> picture_overhead_draw: 450ns -> 350ns
> picture_overhead_nodraw: 300ns -> 90ns
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/c92c129ff85b05a714bd1bf921c02d5e14651f8b
>
> Latest blink_linux_rel:
>
> http://build.chromium.org/p/tryserver.blink/builders/linux_blink_rel/builds/61248
>
> Committed: https://skia.googlesource.com/skia/+/15877b6eae33a9282458bdb904a6d00440eca0ecTBR=reed@google.com,robertphillips@google.com,fmalita@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1130283004
Adds small pictures for drawRect(), drawTextBlob(), and drawPath().
These cover about 89% of draw calls from Blink SKPs,
and about 25% of draw calls from our GMs.
SkPicture handles:
- serialization and deserialization
- unique IDs
Everything else is left to the subclasses:
- playback(), cullRect()
- hasBitmap(), hasText(), suitableForGPU(), etc.
- LayerInfo / AccelData if applicable.
The time to record a 1-op picture improves a good chunk
(2 mallocs to 1), and the time to record a 0-op picture
greatly improves (2 mallocs to none):
picture_overhead_draw: 450ns -> 350ns
picture_overhead_nodraw: 300ns -> 90ns
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/c92c129ff85b05a714bd1bf921c02d5e14651f8b
Latest blink_linux_rel:
http://build.chromium.org/p/tryserver.blink/builders/linux_blink_rel/builds/61248
Review URL: https://codereview.chromium.org/1112523006
Reason for revert:
break cros build
Original issue's description:
> SkPDF: Add Sfntly to DEPS, gyp
>
> Note: this can be disabled via:
> GYP_DEFINES='skia_pdf_use_sfntly=0
>
> Warning: dm is 34% slower and uses 9% more memory. This is
> okay.
>
> Motivation: We want to test this code path in DM, since it is
> always used by Chromium and Android.
>
> BUG=skia:3563
>
> Committed: https://skia.googlesource.com/skia/+/6a53b04e26749ea61f690ece408f2a1c0a5ad5bbTBR=reed@google.com,mtklein@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:3563
Review URL: https://codereview.chromium.org/1128353004
Note: this can be disabled via:
GYP_DEFINES='skia_pdf_use_sfntly=0
Warning: dm is 34% slower and uses 9% more memory. This is
okay.
Motivation: We want to test this code path in DM, since it is
always used by Chromium and Android.
BUG=skia:3563
Review URL: https://codereview.chromium.org/1134683006
Adds and uses fastMulDiv255Round() where possible,
which approximates x*y/255 as (x*y+x)/256. Seems like a sizeable
speedup, as seen below on Exclusion, Screen, and Modulate. The
existing NEON code uses this approximation for
{Src,Dst}x{In,Out,Over}, and without it we'd regress speed there.
This will require rebaselines whether or not we use this
approximation: the x86 bots change if we do, the ARM bots change
if we don't. None of the diffs are significant.
Desktop:
Xfermode_Screen_aa 5.82ms -> 5.54ms 0.95x
Xfermode_Modulate_aa 5.67ms -> 5.36ms 0.95x
Xfermode_Exclusion_aa 6.18ms -> 5.81ms 0.94x
Xfermode_Exclusion 5.03ms -> 4.24ms 0.84x
Xfermode_Screen 4.51ms -> 3.59ms 0.8x
Xfermode_Modulate 4.2ms -> 3.19ms 0.76x
Xfermode_DstOver 6.73ms -> 3.88ms 0.58x
Xfermode_SrcOut 6.47ms -> 3.48ms 0.54x
Xfermode_SrcIn 6.46ms -> 3.46ms 0.54x
Xfermode_DstOut 6.49ms -> 3.41ms 0.52x
Xfermode_DstIn 6.5ms -> 3.32ms 0.51x
Xfermode_Src_aa 9.53ms -> 4.75ms 0.5x
Xfermode_Clear_aa 9.65ms -> 4.8ms 0.5x
Xfermode_DstIn_aa 11.5ms -> 5.57ms 0.49x
Xfermode_DstOver_aa 11.6ms -> 5.63ms 0.49x
Xfermode_SrcOut_aa 11.6ms -> 5.5ms 0.47x
Xfermode_SrcIn_aa 11.7ms -> 5.51ms 0.47x
Xfermode_DstOut_aa 11.7ms -> 5.4ms 0.46x
N7 performance is close enough to 1x that I'm not sure whether
this is a net win, net loss, or truly neutral. I figure the bots will
show that.
I experimented with another approximation,
(x*(255-y))/255 ≈ (x*(256-y))/256. This was inconclusive, so I'm
leaving it out for now.
The remaining modes are the complicated conditional ones.
BUG=skia:
Review URL: https://codereview.chromium.org/1141953004
Cheap (one contour) paths can be evaluated and reversed as needed with a minimum of checking, but multi-contour paths invoke the regular path ops machinery to determine who is contained by whom.
More tests need to be added to verify that all corner cases are considered, but this fixes the cases in the bug thus far.
R=fmalita@chromium.orgTBR=reed@google.com
BUG=skia:3838
Review URL: https://codereview.chromium.org/1129193006
Reason for revert:
Appears to be breaking Linux ARM bots:
FAILED:
/usr/local/google/home/mosaic-role/slave/repo_clients/chromium_tot/chromium/src/../../prebuilt/toolchain/armv7a/bin/armv7a-cros-linux-gnueabi-g++
... -o obj/third_party/skia/src/ports/skia_library.SkFontHost_FreeType.o
../../third_party/skia/src/ports/SkFontHost_FreeType.cpp:37:31: fatal error:
freetype/ftmm.h: No such file or directory
#include FT_MULTIPLE_MASTERS_H
^
compilation terminated.
Original issue's description:
> Font variations.
>
> Multiple Master and TrueType fonts support variation axes.
> This implements back-end support for axes on platforms which
> support it.
>
> Committed: https://skia.googlesource.com/skia/+/05773ed30920c0214d1433c07cf6360a05476c97
>
> Committed: https://skia.googlesource.com/skia/+/3489ee0f4fa34f124f9de090d12bdc2107d52aa9TBR=reed@google.com,mtklein@google.com,djsollen@google.com,halcanary@google.com,bungeman@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/1139123008
Make the bin/c and bin/ac scripts work with sh. The scripts are run with
/bin/sh shebang, which fails atleast on ubuntu 12.04 /bin/sh. The sh in
Ubuntu 12.04 is dash.
The fixes are according to the suggestions in http://mywiki.wooledge.org/Bashism
Also run "compare" script with explicit ./bin/ path to support people
who do not have skia/bin in PATH.
Review URL: https://codereview.chromium.org/1139033005
0x8001 / 0x7fff don't seem to work, but we were close: 0x8000 does.
I plan to use this to implement the Difference xfermode,
and it seems generally handy.
BUG=skia:
Review URL: https://codereview.chromium.org/1133933004
The rewrite of path ops caused the inner contour direction to be reversed.
This exposed an existing bug in path ops builder, namely that the implicit
winding of the internal sum path could hide inner contours if they ended
up in the wrong direction.
Setting the sum path's fill type to even-odd ensures that the inner
contours aren't discarded.
R=fmalita@chromium.org
BUG=skia:3838
Review URL: https://codereview.chromium.org/1126193004
Use SkPoint* for all vertex data. (It was void* to support two different
vertex data formats, which we no longer need.)
Assert that vertex stride == sizeof(SkPoint).
Fix build when LOGGING_ENABLED=1.
BUG=skia:
Review URL: https://codereview.chromium.org/1128083004
Reason for revert:
This CL seems to be triggering the assert:
src/gpu/SkGpuDevice.cpp:319: failed assertion "!fNeedClear"
on, at least, Mac & Ubuntu.
Original issue's description:
> Defer glClear to just before draw call
>
> Remove some DO_DEFERRED_CLEAR call to avoid call glClear separately, like this:
> glBindFramebuffer(1)
> glClear
> glBindFramebuffer(2)
> glClear
> glBindFramebuffer(1)
> glDrawXXX
> glBindFramebuffer(2)
> glDrawXXX
>
> These call sequences may need read and write memory back and forth.
>
> If we call DO_DEFERRED_CLEAR just before draw call, we can bind, clear and draw in one go.
> e.g.
> glBindFramebuffer(1)
> glClear
> glDrawXXX
> glBindFramebuffer(2)
> glClear
> glDrawXXX
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/c3c06a13e69b90d4cc1d543853504072d363ae8bTBR=bsalomon@google.com,joel.liang@arm.com,wasim.abbas@arm.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1136393005