I think I cracked it.
Though, this may not technically be legal C++...
I've only got one definition of SkCpu::gCachedFeatures,
but two different declarations: non-const in SkCpu.cpp, const elsewhere.
Is this...
- legal C++?
- not C++ but probably works as I think?
- not C++ and will probably blow up?
- who knows, let's see?
I have tested that the features are cached properly, read properly, and that the generated code treats SkCpu::gCachedFeatures as a global constant outside SkCpu.cpp. So it all observably works optimally.
Expanding testing to more bots.
TBR=reed@google.com
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1905683003
Review URL: https://codereview.chromium.org/1905683003
Reason for revert:
bust the roll
Original issue's description:
> SkOnce: 2 bytes -> 1 byte
>
> This uses the same logic we worked out for SkOncePtr to reduce
> the memory footprint of SkOnce from a done byte and lock byte
> to a single 3-state byte:
>
> - NotStarted: no thread has tried to run fn() yet
> - Active: a thread is running fn()
> - Done: fn() is complete
>
> Threads which see Done return immediately.
> Threads which see NotStarted try to move to Active, run fn(), then move to Done.
> Threads which see Active spin until the active thread moves to Done.
>
> This additionally fixes a too-weak memory order bug in SkOncePtr,
> and adds a big note to explain.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1904483003
>
> Committed: https://skia.googlesource.com/skia/+/df02d338be8e3c1c50b48a3a9faa582703a39c07TBR=herb@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1898413004
This uses the same logic we worked out for SkOncePtr to reduce
the memory footprint of SkOnce from a done byte and lock byte
to a single 3-state byte:
- NotStarted: no thread has tried to run fn() yet
- Active: a thread is running fn()
- Done: fn() is complete
Threads which see Done return immediately.
Threads which see NotStarted try to move to Active, run fn(), then move to Done.
Threads which see Active spin until the active thread moves to Done.
This additionally fixes a too-weak memory order bug in SkOncePtr,
and adds a big note to explain.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1904483003
Review URL: https://codereview.chromium.org/1904483003
SkTArray cannot currently contain move only elements because its swap
currently requires the SkTArray to be copyable. This makes SkTArray
movable and makes its swap move instead of copy.
Review URL: https://codereview.chromium.org/1904663004
The main goal of this refactorization is to allow Vulkan to use separate
sampler and texture objects in the shader and descriptor sets and combine
them into a sampler2d in the shader where needed.
A large part of this is separating how we store samplers and uniforms in the
UniformHandler. We no longer need to store handles to samplers besides when
we are initially emitting code. After we emit code all we ever do is loop over
all samplers and do some processor independent work on them, so we have no need
for direct access to individual samplers.
In the GLProgram all we ever do is set the sampler uniforms in the ctor and never
touch them again, so no need to save sampler info there. The texture access on
program reuse just assume that they come in the same order as we set the texture
units for the samplers
For Vulkan, it is a similar story. We create the descriptor set layouts with the samplers,
then when we get new textures, we just assume they come in in the same order as we
set the samplers on the descriptor sets. Thus no need to save direct vulkan info.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1885863004
Committed: https://skia.googlesource.com/skia/+/45b61a1c4c0be896e7b12fd1405abfece799114f
Review URL: https://codereview.chromium.org/1885863004
Reason for revert:
breaking bots
Original issue's description:
> Refactor how we store and use samplers in Ganesh
>
> The main goal of this refactorization is to allow Vulkan to use separate
> sampler and texture objects in the shader and descriptor sets and combine
> them into a sampler2d in the shader where needed.
>
> A large part of this is separating how we store samplers and uniforms in the
> UniformHandler. We no longer need to store handles to samplers besides when
> we are initially emitting code. After we emit code all we ever do is loop over
> all samplers and do some processor independent work on them, so we have no need
> for direct access to individual samplers.
>
> In the GLProgram all we ever do is set the sampler uniforms in the ctor and never
> touch them again, so no need to save sampler info there. The texture access on
> program reuse just assume that they come in the same order as we set the texture
> units for the samplers
>
> For Vulkan, it is a similar story. We create the descriptor set layouts with the samplers,
> then when we get new textures, we just assume they come in in the same order as we
> set the samplers on the descriptor sets. Thus no need to save direct vulkan info.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1885863004
>
> Committed: https://skia.googlesource.com/skia/+/45b61a1c4c0be896e7b12fd1405abfece799114fTBR=bsalomon@google.com,jvanverth@google.com,cdalton@nvidia.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1896013003
The main goal of this refactorization is to allow Vulkan to use separate
sampler and texture objects in the shader and descriptor sets and combine
them into a sampler2d in the shader where needed.
A large part of this is separating how we store samplers and uniforms in the
UniformHandler. We no longer need to store handles to samplers besides when
we are initially emitting code. After we emit code all we ever do is loop over
all samplers and do some processor independent work on them, so we have no need
for direct access to individual samplers.
In the GLProgram all we ever do is set the sampler uniforms in the ctor and never
touch them again, so no need to save sampler info there. The texture access on
program reuse just assume that they come in the same order as we set the texture
units for the samplers
For Vulkan, it is a similar story. We create the descriptor set layouts with the samplers,
then when we get new textures, we just assume they come in in the same order as we
set the samplers on the descriptor sets. Thus no need to save direct vulkan info.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1885863004
Review URL: https://codereview.chromium.org/1885863004
- Moves CPU feature detection to its own file.
- Cleans up some redundant feature detection scattered around core/ and opts/.
- Can now detect a few new CPU features:
* F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
* FMA -> Intel FMA instructions, added at the same time as AVX2
* VFP_FP16 -> ARM f16<->f32 instructions, quite common
* NEON_FMA -> ARM FMA instructions, also quite common
* SSE and SSE3... why not?
This new internal API makes it very cheap to do fine-grained runtime CPU
feature detection. Redundant calls to SkCpu::Supports() should be eliminated
and it's hoistable out of loops. It compiles away entirely when we have the
appropriate instructions available at compile time.
This means we can call it to guard even a little snippet of 1 or 2 instructions
right where needed and let inlining hoist the check (if any at all) up to
somewhere that doesn't hurt performance. I've explained how I made this work
in the private section of the new header.
Once this lands and bakes a bit, I'll start following up with CLs to use it more
and to add a bunch of those little 1-2 instruction snippets we've been wanting,
e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
(for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
Review URL: https://codereview.chromium.org/1890483002
Static initializers run in a confusing unspecified order,
so it's best to have as few of them as possible.
Most tools and clients I can find already call SkGraphics::Init(),
(or equivalently create an SkAutoGraphics) which calls SkOpts::Init():
- Chrome
- Chrome renderer
- Android
- DM
- nanobench
- SampleApp
- VisualBench
- the old debugger
Seems like the only thing relying on this static initializer today is
the new debugger, fixed here.
TBR=reed@google.com
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1903503002
Review URL: https://codereview.chromium.org/1903503002