Adds a module that performs instanced rendering and starts using it
for a select subset of draws on Mac GL platforms. The instance
processor can currently handle rects, ovals, round rects, and double
round rects. It can generalize shapes as round rects in order to
improve batching. The instance processor also employs new drawing
algorithms, irrespective of instanced rendering, that improve GPU-side
performance (e.g. sample mask, different triangle layouts, etc.).
This change only scratches the surface of instanced rendering. The
majority of draws still only have one instance. Future work may
include:
* Passing coord transforms through the texel buffer.
* Sending FP uniforms through instanced vertex attribs.
* Using instanced rendering for more draws (stencil writes,
drawAtlas, etc.).
* Adding more shapes to the instance processor’s repertoire.
* Batching draws that have mismatched scissors (analyzing draw
bounds, inserting clip planes, etc.).
* Bindless textures.
* Uber shaders.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2066993003
Committed: https://skia.googlesource.com/skia/+/42eafa4bc00354b132ad114d22ed6b95d8849891
Review-Url: https://codereview.chromium.org/2066993003
Reason for revert:
This caused static initializer regressions in Chromium (crbug.com/625728).
Relevant build logs here:
Linux:
https://build.chromium.org/p/chromium/builders/Linux%20x64/builds/21849
Mac:
https://build.chromium.org/p/chromium/builders/Mac/builds/17350
Relevant lines from the error log:
Linux:
# InstanceProcessor.cpp GrUniqueKey::GenerateDomain()
# InstanceProcessor.cpp gr_instanced::kShapeBufferDomain
FAILED linux-release-64/sizes/nacl_helper-si/initializers: actual 8, expected 7, better lower
FAILED linux-release-64/sizes/chrome-si/initializers: actual 8, expected 7, better lower
Mac:
FAILED mac-release/sizes/chrome-si/initializers: actual 2, expected 0, better lower
Original issue's description:
> Begin instanced rendering for simple shapes
>
> Adds a module that performs instanced rendering and starts using it
> for a select subset of draws on Mac GL platforms. The instance
> processor can currently handle rects, ovals, round rects, and double
> round rects. It can generalize shapes as round rects in order to
> improve batching. The instance processor also employs new drawing
> algorithms, irrespective of instanced rendering, that improve GPU-side
> performance (e.g. sample mask, different triangle layouts, etc.).
>
> This change only scratches the surface of instanced rendering. The
> majority of draws still only have one instance. Future work may
> include:
>
> * Passing coord transforms through the texel buffer.
> * Sending FP uniforms through instanced vertex attribs.
> * Using instanced rendering for more draws (stencil writes,
> drawAtlas, etc.).
> * Adding more shapes to the instance processor’s repertoire.
> * Batching draws that have mismatched scissors (analyzing draw
> bounds, inserting clip planes, etc.).
> * Bindless textures.
> * Uber shaders.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2066993003
>
> Committed: https://skia.googlesource.com/skia/+/42eafa4bc00354b132ad114d22ed6b95d8849891
NOTREECHECKS=true
TBR=bsalomon@google.com,egdaniel@google.com,robertphillips@google.com,csmartdalton@google.com
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=skia:
Review-Url: https://codereview.chromium.org/2123693002
Adds a module that performs instanced rendering and starts using it
for a select subset of draws on Mac GL platforms. The instance
processor can currently handle rects, ovals, round rects, and double
round rects. It can generalize shapes as round rects in order to
improve batching. The instance processor also employs new drawing
algorithms, irrespective of instanced rendering, that improve GPU-side
performance (e.g. sample mask, different triangle layouts, etc.).
This change only scratches the surface of instanced rendering. The
majority of draws still only have one instance. Future work may
include:
* Passing coord transforms through the texel buffer.
* Sending FP uniforms through instanced vertex attribs.
* Using instanced rendering for more draws (stencil writes,
drawAtlas, etc.).
* Adding more shapes to the instance processor’s repertoire.
* Batching draws that have mismatched scissors (analyzing draw
bounds, inserting clip planes, etc.).
* Bindless textures.
* Uber shaders.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2066993003
Review-Url: https://codereview.chromium.org/2066993003
Ensures that ".get()" always returns null when a container is empty.
Also ensures consistent assert behavior for array counts.
There are still differences in that the malloc variants take a size_t
and the arrays take an int, and that SkAutoSTMalloc defaults to the
stack-allocated buffer wheras the other containers default to null.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2084213003
Review-Url: https://codereview.chromium.org/2084213003
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164a
Review-Url: https://codereview.chromium.org/2045633002
$ git grep -l '<windows.h>' include src
include/private/SkLeanWindows.h
$ git grep -l SkLeanWindows.h | grep '\.h$'
include/ports/SkTypeface_win.h
include/utils/win/SkHRESULT.h
include/utils/win/SkTScopedComPtr.h
include/views/SkEvent.h
src/core/SkMathPriv.h
src/ports/SkTypeface_win_dw.h
src/utils/SkThreadUtils_win.h
src/utils/win/SkWGL.h
The same for `#include <intrin.h>` that was found in SkMath.h.
Those functions that needed it are moved to SkMathPriv.h.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2041943002
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_chromium_compile_dbg_ng,win_chromium_compile_rel_ng
Review-Url: https://codereview.chromium.org/2041943002
Reason for revert:
Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways.
Original issue's description:
> Move immintrin/arm_neon includes to where they are used.
>
> On my Mac (so, immintrin), this improves compile time, both wall and cpu,
> by about 16%. To test I ran this on an SSD with files hot in their caches:
>
> $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
> ninja -C out/Release -t clean && \
> time ninja -C out/Release
>
> Before: 159 wall / 3367 cpu
> 159 wall / 3368 cpu
>
> After: 137 wall / 2860 cpu
> 136 wall / 2863 cpu
>
> I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
> That made no signficant difference, so I've kept immintrin for its simplicity.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> TBR=reed@google.com
> No public API changes.
>
> Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164aTBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2046213002
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Review-Url: https://codereview.chromium.org/2045633002
- remove dead code
- rewrite float -> int converters
The strategy for the new converters is:
- convert input to double
- floor/ceil/round in double space
- pin that double to [SK_MinS32, SK_MaxS32]
- truncate that double to int32_t
This simpler strategy does not work:
- floor/ceil/round in float space
- pin that float to [SK_MinS32, SK_MaxS32]
- truncate that float to int32_t
SK_MinS32 and SK_MaxS32 are not representable as floats:
they round to the nearest float, ±2^31, which makes the
pin insufficient for floats near SK_MinS32 (-2^31+1) or
SK_MaxS32 (+2^31-1).
float only has 24 bits of precision, and we need 31.
double can represent all integers up to 50-something bits.
An alternative is to pin in float to ±2147483520, the last
exactly representable float before SK_MaxS32 (127 too small).
Our tests test that we round as floor(x+0.5), which can
return different numbers than round(x) for negative x.
So this CL explicitly uses floor(x+0.5).
I've updated the tests with ±inf and ±NaN, and tried to
make them a little clearer, especially using SK_MinS32
instead of -SK_MaxS32.
I have not timed anything here. I have never seen any of these
methods in a profile.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2012333003
Review-Url: https://codereview.chromium.org/2012333003
It's always nice to kill off a synchronization primitive.
And while less terse, I think this new code reads more clearly.
... and, SkOncePtr's tests were the only thing now using sk_num_cores()
outside of SkTaskGroup, so I've hidden it as static inside SkTaskGroup.cpp.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1953533002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/1953533002
Still slowly working through all the SK_DECLARE_STATIC_FOO macros.
SkOncePtr is complicating things by having SkOncePtr delete its pointer
and SkBaseOncePtr not. Simplify things by removing SkOncePtr, leaving
only the leaky SkBaseOncePtr.
We replace SkOncePtr<T> instead with SkOnce and T. In most cases this
did not need to be a pointer, and in some cases here we're even saving
a few bytes by replacing SkOncePtr<T> with SkOnce and a T.
The dependency map of SK_DECLARE_STATIC_FOO is:
SkBaseMutex -> SkBaseSemaphore -> SkBaseOncePtr
They're intertwined enough that I think I've got to do all three in one
next CL.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1939503002
Review-Url: https://codereview.chromium.org/1939503002
The VC++ STL with 2015u2 now provides a complete std::is_function.
Also, Skia is no longer using skstd::is_function.
Review-Url: https://codereview.chromium.org/1929343002
Reason for revert:
bust the roll
Original issue's description:
> SkOnce: 2 bytes -> 1 byte
>
> This uses the same logic we worked out for SkOncePtr to reduce
> the memory footprint of SkOnce from a done byte and lock byte
> to a single 3-state byte:
>
> - NotStarted: no thread has tried to run fn() yet
> - Active: a thread is running fn()
> - Done: fn() is complete
>
> Threads which see Done return immediately.
> Threads which see NotStarted try to move to Active, run fn(), then move to Done.
> Threads which see Active spin until the active thread moves to Done.
>
> This additionally fixes a too-weak memory order bug in SkOncePtr,
> and adds a big note to explain.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1904483003
>
> Committed: https://skia.googlesource.com/skia/+/df02d338be8e3c1c50b48a3a9faa582703a39c07TBR=herb@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1898413004
This uses the same logic we worked out for SkOncePtr to reduce
the memory footprint of SkOnce from a done byte and lock byte
to a single 3-state byte:
- NotStarted: no thread has tried to run fn() yet
- Active: a thread is running fn()
- Done: fn() is complete
Threads which see Done return immediately.
Threads which see NotStarted try to move to Active, run fn(), then move to Done.
Threads which see Active spin until the active thread moves to Done.
This additionally fixes a too-weak memory order bug in SkOncePtr,
and adds a big note to explain.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1904483003
Review URL: https://codereview.chromium.org/1904483003
SkTArray cannot currently contain move only elements because its swap
currently requires the SkTArray to be copyable. This makes SkTArray
movable and makes its swap move instead of copy.
Review URL: https://codereview.chromium.org/1904663004
The API and implementation are very much simplified.
You may not want to bother reading the diff.
As is our trend, SkOnce now uses <atomic> directly.
Member initialization means we don't need SK_DECLARE_STATIC_ONCE.
SkSpinlock already works this same way.
All uses of the old API taking an external bool* and Lock* were pessimal,
so I have not carried this sort of API forward. It's simpler, faster,
and more space-efficient to always use this single SkOnce class interface.
SkOnce weighs 2 bytes: a done bool and an SkSpinlock, also a bool internally.
This API refactoring opens up the opportunity to fuse those into a single
three-state byte if we'd like.
No public API changes.
TBR=reed@google.com
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1894893002
Review URL: https://codereview.chromium.org/1894893002
This enables removing the more complicated atomic shims from SkAtomics.h.
TBR=reed
This doesn't actually change any API.
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-TSAN-Trybot,Test-Ubuntu-GCC-Golo-GPU-GT610-x86_64-Release-TSAN-Trybot
Review URL: https://codereview.chromium.org/1867863002