- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180
If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9
Review URL: https://codereview.chromium.org/1891513002
Reason for revert:
Need to change around my #if guards so that clang-cl is treated like GCC and Clang, rather than MSVC.
Original issue's description:
> skcpu: sse4.1 floor, f16c f16<->f32
>
> - floor with roundps is about 4.5x faster when available
> - f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
>
> +0x180 movups (%r15), %xmm1
> +0x184 vcvtph2ps (%rbx), %xmm2
> +0x189 movaps %xmm1, %xmm3
> +0x18c shufps $255, %xmm3, %xmm3
> +0x190 movaps %xmm0, %xmm4
> +0x193 subps %xmm3, %xmm4
> +0x196 mulps %xmm2, %xmm4
> +0x199 addps %xmm1, %xmm4
> +0x19c vcvtps2ph $0, %xmm4, (%rbx)
> +0x1a2 addq $16, %r15
> +0x1a6 addq $8, %rbx
> +0x1aa decl %r14d
> +0x1ad jne +0x180
>
> If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/cbe3c1af987d622ea67ef560d855b41bb14a0ce9TBR=fmalita@chromium.org,herb@google.com,reed@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1891993002
These are implemented generically with Sk4s and don't benefit
from anything fancier than vanilla SSE/NEON.
This means there's no need to hide this code away in another
file or behind a function pointer... it's readable and we have
compile-time support for all the instructions it needs.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1872193002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1872193002
With the GPU backend, allow F16 render targets to be created (along with
any other renderable format). We were previously just falling back to 8888.
In SampleApp, if the window configuration is F16, don't render directly
to the primary surface (which is actually sRGB 8888). Intead, make an
off-screen F16 surface, then blit it back to the framebuffer when we're done.
In DM, clamp values outside of [0,1]. These were wrapping, producing very
incorrect images. (Many filters can trigger out-of-range values due to
ringing).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890923003
Review URL: https://codereview.chromium.org/1890923003
- floor with roundps is about 4.5x faster when available
- f16 srcover_n is similar to but a little faster than the version in https://codereview.chromium.org/1884683002. This new one fuses the dst load/stores into the f16<->f32 conversions:
+0x180 movups (%r15), %xmm1
+0x184 vcvtph2ps (%rbx), %xmm2
+0x189 movaps %xmm1, %xmm3
+0x18c shufps $255, %xmm3, %xmm3
+0x190 movaps %xmm0, %xmm4
+0x193 subps %xmm3, %xmm4
+0x196 mulps %xmm2, %xmm4
+0x199 addps %xmm1, %xmm4
+0x19c vcvtps2ph $0, %xmm4, (%rbx)
+0x1a2 addq $16, %r15
+0x1a6 addq $8, %rbx
+0x1aa decl %r14d
+0x1ad jne +0x180
If we decide to land this it'd be a good idea to convert most or all users of SkFloatToHalf_01 and SkHalfToFloat_01 over to the pointer-based versions.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1891513002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1891513002
Do not call SkBufferHead::validate in SkROBuffer's destructor, which
may be called in a separate thread from SkRWBuffer::append. validate()
reads SkBufferBlock::fUsed, and append() writes to it, resulting in
a data race.
Update some comments to be more clear about how it is safe to use
these classes across threads.
Test the readers in separate threads.
In addition, make sure it is safe to create a reader even when no
data has been appended. Add tests for this case.
Mark a parameter to SkBufferHead::validate() as const, reflecting
its use.
BUG=chromium:601578
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1871953002
Review URL: https://codereview.chromium.org/1871953002
We ended up with two SkEmptyTypefaces it differnt places. Avoid this odr
violation by sticking these in anonymous namespaces.
Review URL: https://codereview.chromium.org/1887093002
- Moves CPU feature detection to its own file.
- Cleans up some redundant feature detection scattered around core/ and opts/.
- Can now detect a few new CPU features:
* F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
* FMA -> Intel FMA instructions, added at the same time as AVX2
* VFP_FP16 -> ARM f16<->f32 instructions, quite common
* NEON_FMA -> ARM FMA instructions, also quite common
* SSE and SSE3... why not?
This new internal API makes it very cheap to do fine-grained runtime CPU
feature detection. Redundant calls to SkCpu::Supports() should be eliminated
and it's hoistable out of loops. It compiles away entirely when we have the
appropriate instructions available at compile time.
This means we can call it to guard even a little snippet of 1 or 2 instructions
right where needed and let inlining hoist the check (if any at all) up to
somewhere that doesn't hurt performance. I've explained how I made this work
in the private section of the new header.
Once this lands and bakes a bit, I'll start following up with CLs to use it more
and to add a bunch of those little 1-2 instruction snippets we've been wanting,
e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
(for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review URL: https://codereview.chromium.org/1890483002
recent changes to text rendering.
Uses linear coverage falloff. Produces results that are perceptually more
similar to L32 (raster and gpu). Smoothstep + sRGB was too soft.
Plumb the gamma-correctness via DrawPathArgs, which also paves the way for
other path rendering implementations to easily make decisions about rendering
technique based on that flag.
Fix a few typos and formatting issues from my most recent change.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1889453002
Review URL: https://codereview.chromium.org/1889453002
A font consists of a set of data and a set of parameters to that data.
For example a ttc font consists of the full font data parameterized by
the index. In addition to the index, FontConfig allows specifying a
matrix and embolden flag. In the future there may also be additional
parameters of this sort, for example which color palette to use.
This does not provide a way to serialize these parameters.
Adding this here provides a nice place to experiment with doing so.
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890533002
Review URL: https://codereview.chromium.org/1890533002
This adds SK_VERY_LEGACY_CREATE_TYPEFACE which, when defined, provides
only the old interface.
Ideally, everyone would switch directly to SkFontMgr and use one of the
newer calls, but there is currently no path for current users to get
there. This updates all the internals to use SkFontStyle, after
switching these over the higher level APIs can be switched.
The Chromium follow on patch can be seen at https://crrev.com/1877673002
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1873923002
TBR=reed
This doesn't really change API, just modernizes it.
Review URL: https://codereview.chromium.org/1873923002
Add a second distance field adjust table that only applies contrast,
not fake-gamma correction. Store a flag in the batch at creation time,
using the same logic we apply elsewhere (render target format, plus
paint flags).
That gets us close, but not as good as bitmap text. The final step is
to use a linear step function (rather than smoothstep) to map distance
to coverage, when we have sRGB output. Smoothstep's nonlinear response
is actually doing some fake-gamma, so it ends up over-correcting when
the output is already gamma-correct.
Results are now very close between L32 (old table, smoothstep) and S32
(contrast-only table, linstep).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1885613002
Review URL: https://codereview.chromium.org/1885613002