This is a reland of 26ad8ccdec
... now with MSAN support.
Original change's description:
> add ERMS (enhanced rep mov/sto) SkOpts slice
>
> Intel's got two CPUID bits indicating the speed of rep mov/sto
> (memcpy/memset),
>
> - ERMS, Enhanced Rep Mov/Sto, older, large copies are fast?
> - FSRM, Fast Short Rep Mov, newer, small copies are fast?
>
> ERMS has been around a long time on Intel, but is relatively recent on
> Ryzen, and FSRM is new across the board. The startup cost for
> ERMS-but-not-FSRM copies really is noticeable, so we cut over to the
> previous SSE/AVX routines when N is small.
>
> I've left the memset benchmarks as I found them most useful when
> tuning the small/large cutoff in this CL.
>
> Change-Id: I3ac4e3f34796aba0ea86aabbe9dda7526919456a
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/332580
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>
Cq-Include-Trybots: luci.skia.skia.primary:Test-Debian10-Clang-GCE-CPU-AVX2-x86_64-Release-All-MSAN
Change-Id: Ia293bba90022c48c884599331ef35aa67644729b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/334343
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
This reverts commit 26ad8ccdec.
Reason for revert: gonna need to teach MSAN about this to reland.
Original change's description:
> add ERMS (enhanced rep mov/sto) SkOpts slice
>
> Intel's got two CPUID bits indicating the speed of rep mov/sto
> (memcpy/memset),
>
> - ERMS, Enhanced Rep Mov/Sto, older, large copies are fast?
> - FSRM, Fast Short Rep Mov, newer, small copies are fast?
>
> ERMS has been around a long time on Intel, but is relatively recent on
> Ryzen, and FSRM is new across the board. The startup cost for
> ERMS-but-not-FSRM copies really is noticeable, so we cut over to the
> previous SSE/AVX routines when N is small.
>
> I've left the memset benchmarks as I found them most useful when
> tuning the small/large cutoff in this CL.
>
> Change-Id: I3ac4e3f34796aba0ea86aabbe9dda7526919456a
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/332580
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>
TBR=mtklein@google.com,herb@google.com
Change-Id: I3264af132272dbbaac8fc8b62e139a6a112bbadb
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/334342
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Intel's got two CPUID bits indicating the speed of rep mov/sto
(memcpy/memset),
- ERMS, Enhanced Rep Mov/Sto, older, large copies are fast?
- FSRM, Fast Short Rep Mov, newer, small copies are fast?
ERMS has been around a long time on Intel, but is relatively recent on
Ryzen, and FSRM is new across the board. The startup cost for
ERMS-but-not-FSRM copies really is noticeable, so we cut over to the
previous SSE/AVX routines when N is small.
I've left the memset benchmarks as I found them most useful when
tuning the small/large cutoff in this CL.
Change-Id: I3ac4e3f34796aba0ea86aabbe9dda7526919456a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/332580
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Current strategy: everything from the top
Things to look at first are the manual changes:
- added tools/rewrite_includes.py
- removed -Idirectives from BUILD.gn
- various compile.sh simplifications
- tweak tools/embed_resources.py
- update gn/find_headers.py to write paths from the top
- update gn/gn_to_bp.py SkUserConfig.h layout
so that #include "include/config/SkUserConfig.h" always
gets the header we want.
No-Presubmit: true
Change-Id: I73a4b181654e0e38d229bc456c0d0854bae3363e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/209706
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Hal Canary <halcanary@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
The benches for N <= 10 get around 2x faster on my N7 and N9. I believe this
is because of the reduced function-call-then-function-pointer-call overhead on
the N7, and additionally because it seems autovectorization beats our NEON code
for small N on the N9.
My desktop is unchanged, though that's probably because N=10 lies well within a
region where memset's performance is essentially constant: N=100 takes only
about 2x as long as N=1 and N=10, which perform nearly identically.
BUG=skia:
Review URL: https://codereview.chromium.org/1073863002
This fixes every case where virtual and SK_OVERRIDE were on the same line,
which should be the bulk of cases. We'll have to manually clean up the rest
over time unless I level up in regexes.
for f in (find . -type f); perl -p -i -e 's/virtual (.*)SK_OVERRIDE/\1SK_OVERRIDE/g' $f; end
BUG=skia:
Review URL: https://codereview.chromium.org/806653007
I'm not quite sure why I wrote such a convoluted API with setLoops()/getLoops().
This replaces it with a loops argument passed to onDraw().
This CL is largely mechanical translation from the old API to the new one.
MathBench used this->getLoops() outside onDraw(), which seems incorrect. I
fixed it.
BUG=
R=djsollen@google.com
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/99893003
git-svn-id: http://skia.googlecode.com/svn/trunk@12466 2bbb7eff-a529-9590-31e7-b0007b416f81
Adds "grresourcecache_add" and "grresourcecache_find" bench tests to test
GrResourceCache::add and GrResourceCache::find. The tests work only
with GPU backends, since GrResourceCache needs an GrGpu.
Modifies bench tests to override SkBenchmark::isSuitableFor(Backend)
function that specifies what kind of backend the test is inteded
for. This replaces the previous "fIsRendering" flag that would
indicate test that did no rendering.
Adds SkCanvas::getGrContext() call to get the GrContext that the
canvas ends up drawing to. The member function solves a common
use-case that is also used in the benchmark added here.
R=mtklein@google.com, bsalomon@google.com
Author: kkinnunen@nvidia.com
Review URL: https://codereview.chromium.org/73643005
git-svn-id: http://skia.googlecode.com/svn/trunk@12334 2bbb7eff-a529-9590-31e7-b0007b416f81