Commit Graph

17 Commits

Author SHA1 Message Date
Mike Klein
7ac2be2020 Reland "add ERMS (enhanced rep mov/sto) SkOpts slice"
This is a reland of 26ad8ccdec
... now with MSAN support.

Original change's description:
> add ERMS (enhanced rep mov/sto) SkOpts slice
>
> Intel's got two CPUID bits indicating the speed of rep mov/sto
> (memcpy/memset),
>
>     - ERMS, Enhanced Rep Mov/Sto, older, large copies are fast?
>     - FSRM, Fast Short Rep Mov, newer, small copies are fast?
>
> ERMS has been around a long time on Intel, but is relatively recent on
> Ryzen, and FSRM is new across the board.  The startup cost for
> ERMS-but-not-FSRM copies really is noticeable, so we cut over to the
> previous SSE/AVX routines when N is small.
>
> I've left the memset benchmarks as I found them most useful when
> tuning the small/large cutoff in this CL.
>
> Change-Id: I3ac4e3f34796aba0ea86aabbe9dda7526919456a
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/332580
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>

Cq-Include-Trybots: luci.skia.skia.primary:Test-Debian10-Clang-GCE-CPU-AVX2-x86_64-Release-All-MSAN
Change-Id: Ia293bba90022c48c884599331ef35aa67644729b
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/334343
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2020-11-12 18:00:39 +00:00
Mike Klein
d32c57d742 Revert "add ERMS (enhanced rep mov/sto) SkOpts slice"
This reverts commit 26ad8ccdec.

Reason for revert: gonna need to teach MSAN about this to reland.

Original change's description:
> add ERMS (enhanced rep mov/sto) SkOpts slice
>
> Intel's got two CPUID bits indicating the speed of rep mov/sto
> (memcpy/memset),
>
>     - ERMS, Enhanced Rep Mov/Sto, older, large copies are fast?
>     - FSRM, Fast Short Rep Mov, newer, small copies are fast?
>
> ERMS has been around a long time on Intel, but is relatively recent on
> Ryzen, and FSRM is new across the board.  The startup cost for
> ERMS-but-not-FSRM copies really is noticeable, so we cut over to the
> previous SSE/AVX routines when N is small.
>
> I've left the memset benchmarks as I found them most useful when
> tuning the small/large cutoff in this CL.
>
> Change-Id: I3ac4e3f34796aba0ea86aabbe9dda7526919456a
> Reviewed-on: https://skia-review.googlesource.com/c/skia/+/332580
> Reviewed-by: Herb Derby <herb@google.com>
> Commit-Queue: Mike Klein <mtklein@google.com>

TBR=mtklein@google.com,herb@google.com

Change-Id: I3264af132272dbbaac8fc8b62e139a6a112bbadb
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/334342
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-11-12 15:48:38 +00:00
Mike Klein
26ad8ccdec add ERMS (enhanced rep mov/sto) SkOpts slice
Intel's got two CPUID bits indicating the speed of rep mov/sto
(memcpy/memset),

    - ERMS, Enhanced Rep Mov/Sto, older, large copies are fast?
    - FSRM, Fast Short Rep Mov, newer, small copies are fast?

ERMS has been around a long time on Intel, but is relatively recent on
Ryzen, and FSRM is new across the board.  The startup cost for
ERMS-but-not-FSRM copies really is noticeable, so we cut over to the
previous SSE/AVX routines when N is small.

I've left the memset benchmarks as I found them most useful when
tuning the small/large cutoff in this CL.

Change-Id: I3ac4e3f34796aba0ea86aabbe9dda7526919456a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/332580
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-11-12 15:05:07 +00:00
Adlai Holler
684838f1f5 Mark SkStringPrintf as SK_PRINTF_LIKE
Change-Id: I3d2ee8dca1d2e962794ce8c3c391779bff357f0c
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/288762
Commit-Queue: Adlai Holler <adlai@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Auto-Submit: Adlai Holler <adlai@google.com>
2020-05-12 15:22:14 +00:00
Mike Klein
c0bd9f9fe5 rewrite includes to not need so much -Ifoo
Current strategy: everything from the top

Things to look at first are the manual changes:

   - added tools/rewrite_includes.py
   - removed -Idirectives from BUILD.gn
   - various compile.sh simplifications
   - tweak tools/embed_resources.py
   - update gn/find_headers.py to write paths from the top
   - update gn/gn_to_bp.py SkUserConfig.h layout
     so that #include "include/config/SkUserConfig.h" always
     gets the header we want.

No-Presubmit: true
Change-Id: I73a4b181654e0e38d229bc456c0d0854bae3363e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/209706
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Hal Canary <halcanary@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Florin Malita <fmalita@chromium.org>
2019-04-24 16:27:11 +00:00
mtklein
a1ebeb25e9 Remove const from const int loops.
This drives me nuts, and prevents `while (loops --> 0)`.

BUG=skia:

Review URL: https://codereview.chromium.org/1379923005
2015-10-01 09:43:39 -07:00
mtklein
9ff378b01b Rewrite memset benches, then use results to add a small-N optimization.
The benches for N <= 10 get around 2x faster on my N7 and N9.  I believe this
is because of the reduced function-call-then-function-pointer-call overhead on
the N7, and additionally because it seems autovectorization beats our NEON code
for small N on the N9.

My desktop is unchanged, though that's probably because N=10 lies well within a
region where memset's performance is essentially constant: N=100 takes only
about 2x as long as N=1 and N=10, which perform nearly identically.

BUG=skia:

Review URL: https://codereview.chromium.org/1073863002
2015-04-09 14:05:17 -07:00
mtklein
36352bf5e3 C++11 override should now be supported by all of {bots,Chrome,Android,Mozilla}
NOPRESUBMIT=true

BUG=skia:
DOCS_PREVIEW= https://skia.org/?cl=1037793002

Review URL: https://codereview.chromium.org/1037793002
2015-03-25 18:17:32 -07:00
mtklein
72c9faab45 Fix up all the easy virtual ... SK_OVERRIDE cases.
This fixes every case where virtual and SK_OVERRIDE were on the same line,
which should be the bulk of cases.  We'll have to manually clean up the rest
over time unless I level up in regexes.

for f in (find . -type f); perl -p -i -e 's/virtual (.*)SK_OVERRIDE/\1SK_OVERRIDE/g' $f; end

BUG=skia:

Review URL: https://codereview.chromium.org/806653007
2015-01-09 10:06:40 -08:00
bsalomon
0aa5cea869 fix last warnings on w64 and turn on w.a.e.
Review URL: https://codereview.chromium.org/801413002
2014-12-15 09:13:35 -08:00
tfarina
f168b86d7f Remove Sk prefix from some bench classes.
This idea came while commenting on
https://codereview.chromium.org/343583005/

Since SkBenchmark, SkBenchLogger and SkGMBench are not part of the Skia library,
they should not have the Sk prefix.

BUG=None
TEST=make all
R=mtklein@google.com

Author: tfarina@chromium.org

Review URL: https://codereview.chromium.org/347823004
2014-06-19 12:32:29 -07:00
commit-bot@chromium.org
3361471a35 Simplify benchmark internal API.
I'm not quite sure why I wrote such a convoluted API with setLoops()/getLoops().
This replaces it with a loops argument passed to onDraw().

This CL is largely mechanical translation from the old API to the new one.
MathBench used this->getLoops() outside onDraw(), which seems incorrect.  I
fixed it.

BUG=
R=djsollen@google.com

Author: mtklein@google.com

Review URL: https://codereview.chromium.org/99893003

git-svn-id: http://skia.googlecode.com/svn/trunk@12466 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-12-03 18:17:16 +00:00
commit-bot@chromium.org
644629c1c7 Implement a benchmark for GrResourceCache
Adds "grresourcecache_add" and "grresourcecache_find" bench tests to test
GrResourceCache::add and GrResourceCache::find. The tests work only
with GPU backends, since GrResourceCache needs an GrGpu.

Modifies bench tests to override SkBenchmark::isSuitableFor(Backend)
function that specifies what kind of backend the test is inteded
for. This replaces the previous "fIsRendering" flag that would
indicate test that did no rendering.

Adds SkCanvas::getGrContext() call to get the GrContext that the
canvas ends up drawing to. The member function solves a common
use-case that is also used in the benchmark added here.

R=mtklein@google.com, bsalomon@google.com

Author: kkinnunen@nvidia.com

Review URL: https://codereview.chromium.org/73643005

git-svn-id: http://skia.googlecode.com/svn/trunk@12334 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-11-21 06:21:58 +00:00
mtklein@google.com
410e6e80f0 Refactoring: get rid of the SkBenchmark void* parameter.
While I was doing massive sed-ing, I also converted every bench to use DEF_BENCH instead of registering the ugly manual way.

BUG=
R=scroggo@google.com

Review URL: https://codereview.chromium.org/23876006

git-svn-id: http://skia.googlecode.com/svn/trunk@11263 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-09-13 19:52:27 +00:00
mtklein@google.com
c289743864 Major bench refactoring.
- Use FLAGS_.
   - Remove outer repeat loop.
   - Tune inner loop automatically.

BUG=skia:1590
R=epoger@google.com, scroggo@google.com

Review URL: https://codereview.chromium.org/23478013

git-svn-id: http://skia.googlecode.com/svn/trunk@11187 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-09-10 19:23:38 +00:00
robertphillips@google.com
59ce1377b1 Fix Win7 warning-as-error complaint
git-svn-id: http://skia.googlecode.com/svn/trunk@9411 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-06-03 17:29:58 +00:00
reed@google.com
c117cbae61 add bench for sk_memset16/32
BUG=

Review URL: https://codereview.chromium.org/16336009

git-svn-id: http://skia.googlecode.com/svn/trunk@9405 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-06-03 16:54:10 +00:00