This adds sk_memory_barrier(), implemented using sk_atomic_fetch_add() on an uninitialized variable. If that becomes a problem we can drop this to the porting layer, using std::atomic_thread_fence() / __atomic_thread_fence() / __sync_synchronize().
The big win is that ref() doesn't generate a memory barrier any more on ARM.
This is an instance of SkSafeRef() in SkPaint(const SkPaint&) after this CL:
4d0: 684a ldr r2, [r1, #4]
4d2: 6018 str r0, [r3, #0]
4d4: b13a cbz r2, 4e6 <_ZN7SkPaintC1ERKS_+0x2e>
4d6: 1d10 adds r0, r2, #4
4d8: e850 4f00 ldrex r4, [r0]
4dc: 3401 adds r4, #1
4de: e840 4500 strex r5, r4, [r0]
4e2: 2d00 cmp r5, #0
4e4: d1f8 bne.n 4d8 <_ZN7SkPaintC1ERKS_+0x20>
Here's the before, pretty much the same with two memory barriers surrounding the ref():
4d8: 684a ldr r2, [r1, #4]
4da: 6018 str r0, [r3, #0]
4dc: b15a cbz r2, 4f6 <_ZN7SkPaintC1ERKS_+0x3e>
4de: 1d10 adds r0, r2, #4
4e0: f3bf 8f5f dmb sy
4e4: e850 4f00 ldrex r4, [r0]
4e8: 3401 adds r4, #1
4ea: e840 4500 strex r5, r4, [r0]
4ee: 2d00 cmp r5, #0
4f0: d1f8 bne.n 4e4 <_ZN7SkPaintC1ERKS_+0x2c>
4f2: f3bf 8f5f dmb sy
The miscellaneous files in here are just fixups to explicitly include SkMutex.h,
instead of leeching it off SkRefCnt.h.
No public API changes.
TBR=reed@google.com
Build trybots seem hosed.
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/896803002
This is working towards making a simple example part of the buildbot compile step and removing SkExamples from the experimental directory.
This works on Mac, Windows, and Linux but isn't complete for Android, ChromeOS and iOS.
Review URL: https://codereview.chromium.org/886413004
This merges and refactors SkAtomics.h and SkBarriers.h into SkAtomics.h and
some ports/ implementations. The major new feature is that we can express
memory orders explicitly rather than only through comments.
The porting layer is reduced to four template functions:
- sk_atomic_load
- sk_atomic_store
- sk_atomic_fetch_add
- sk_atomic_compare_exchange
From those four we can reconstruct all our previous sk_atomic_foo.
There are three ports:
- SkAtomics_std: uses C++11 <atomic>, used with MSVC
- SkAtomics_atomic: uses newer GCC/Clang intrinsics, used on not-MSVC where possible
- SkAtomics_sync: uses older GCC/Clang intrinsics, used where SkAtomics_atomic not supported
No public API changes.
TBR=reed@google.com
BUG=skia:
Review URL: https://codereview.chromium.org/896553002
With a new style fonts.xml (version >= 21) fallback fonts are specified
as font families without a name. The current code reads these, but then
also reads in the old fallback and vendor xml files. With this change,
when a new style xml file is encoutered, no further files will be read.
This both lowers memory use and speeds font lookup. Locally, this
change reduces the number of font families loaded on my 'L' device from
148 to 80. All of the families removed are duplicates.
Review URL: https://codereview.chromium.org/888923003
Reason for revert:
Reverted the wrong CL.
Original issue's description:
> Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #16 id:300001 of https://codereview.chromium.org/874863002/)
>
> Reason for revert:
> This causes a bug on the 'hittestpath' GM on MacMini 4,1
>
> See:
>
> https://gold.skia.org/#/triage/hittestpath?head=0
>
> for details.
>
> Original issue's description:
> > SSE4 opaque blend using intrinsics instead of assembly.
> >
> > Since we had such a hard time with the assembly versions of this blit (to the
> > point that we have them completely disabled everywhere), I thought I'd take
> > a shot at writing a version of the blit using intrinsics.
> >
> > The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
> > to skip the blend when the 16 src pixels we consider each loop are all opaque
> > or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
> > all those alphas.
> >
> > It's worth looking to see if we can backport this type of logic to SSE2 using
> > _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
> >
> > My local performance testing doesn't show this to be an unambiguous win
> > (there are probably microbenchmarks and SKPs where we'd be better off just
> > powering through the blend rather than looking at alphas), but the potential
> > does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.)
> >
> > DM says it draws pixel perfect compare to the old code.
> >
> > Microbenchmarks:
> > bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x
> > bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x
> > bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x
> > bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x
> > bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x
> > bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x
> > bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x
> > bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x
> > bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x
> > bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x
> > bitmap_RGBA_8888 5.73us -> 5.58us 0.97x
> > bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x
> > bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x
> > bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x
> > bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x
> > bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x
> > bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x
> >
> > Running over our ~70 SKP web page captures, this looks like we spend 0.7x
> > the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
> > be a decent predictor of real-world impact.
> >
> > BUG=chromium:399842
> >
> > Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785
> >
> > CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot
> >
> > Committed: https://skia.googlesource.com/skia/+/6dbfb21a6c88af6d94e8c823c3ad559f1a41b493
>
> TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
> NOPRESUBMIT=true
> NOTREECHECKS=true
> NOTRY=true
> BUG=chromium:399842
>
> Committed: https://skia.googlesource.com/skia/+/4988891a1173cd405bf1c1dd3a3668c451f45e4cTBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:399842
Review URL: https://codereview.chromium.org/894083002
Reason for revert:
This causes a bug on the 'hittestpath' GM on MacMini 4,1
See:
https://gold.skia.org/#/triage/hittestpath?head=0
for details.
Original issue's description:
> SSE4 opaque blend using intrinsics instead of assembly.
>
> Since we had such a hard time with the assembly versions of this blit (to the
> point that we have them completely disabled everywhere), I thought I'd take
> a shot at writing a version of the blit using intrinsics.
>
> The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
> to skip the blend when the 16 src pixels we consider each loop are all opaque
> or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
> all those alphas.
>
> It's worth looking to see if we can backport this type of logic to SSE2 using
> _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
>
> My local performance testing doesn't show this to be an unambiguous win
> (there are probably microbenchmarks and SKPs where we'd be better off just
> powering through the blend rather than looking at alphas), but the potential
> does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.)
>
> DM says it draws pixel perfect compare to the old code.
>
> Microbenchmarks:
> bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x
> bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x
> bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x
> bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x
> bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x
> bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x
> bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x
> bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x
> bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x
> bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x
> bitmap_RGBA_8888 5.73us -> 5.58us 0.97x
> bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x
> bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x
> bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x
> bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x
> bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x
> bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x
>
> Running over our ~70 SKP web page captures, this looks like we spend 0.7x
> the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
> be a decent predictor of real-world impact.
>
> BUG=chromium:399842
>
> Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785
>
> CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/6dbfb21a6c88af6d94e8c823c3ad559f1a41b493TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:399842
Review URL: https://codereview.chromium.org/873553003
Without specifying the resource path a number of tests and gms do not
work properly. The example given in the docs should specify the
resource path so that the example run is exemplary.
Review URL: https://codereview.chromium.org/898453003
Not enabled by default, but this should get you SKPs, GMs etc for free to play with.
$ out/Debug/dm -w svgs --src gm skp --config svg
BUG=skia:
Review URL: https://codereview.chromium.org/892693002