Commit Graph

32 Commits

Author SHA1 Message Date
mtklein
3d096654b9 update memset16/32 inlining heuristics
I spent some time looking at perf.skia.org and it looks like we can do better.

It is weird, weird, weird that on x86, we see three completely different behaviors:
  - x86 Android: inlining better for small N, custom better for large N;
  - Windows:     inlining better for large N, custom better for small N;
  - other x86:   inlining generally better

BUG=skia:4316,chromium:516426

Committed: https://skia.googlesource.com/skia/+/b68fa409fc00ce2f38e2a0fd6f9dc2379b372481

Summaries: https://perf.skia.org/#4179
All traces, log scale: https://perf.skia.org/#4180

TBR=reed@google.com
No public API changes.

Review URL: https://codereview.chromium.org/1357193002
2015-09-29 10:38:59 -07:00
mtklein
b758bbd84e Revert of Combined approach. (patchset #2 id:20001 of https://codereview.chromium.org/1356133002/ )
Reason for revert:
whee

Original issue's description:
> Combined approach.
>
> This combines some ideas from these two CLs:
>     - try stosd/w
>     - update memset16/32 inlining heuristics
>
>
> BUG=skia:4316
>
> Blinking in and out for perf.skia.org.
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/46243a7c02a1d5116e55a27ff59218f9c320df97

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4316

Review URL: https://codereview.chromium.org/1353703006
2015-09-21 11:02:39 -07:00
mtklein
46243a7c02 Combined approach.
This combines some ideas from these two CLs:
    - try stosd/w
    - update memset16/32 inlining heuristics

BUG=skia:4316

Blinking in and out for perf.skia.org.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/1356133002
2015-09-21 10:50:56 -07:00
mtklein
b1cc9daa65 Revert of try simplest code: inline whenever vaguely sensible (patchset #1 id:1 of https://codereview.chromium.org/1351403005/ )
Reason for revert:
pingpong

Original issue's description:
> try simplest code: inline whenever vaguely sensible
>
> BUG=skia:4316
>
> Will land and revert.
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/527a0c8235b454f5d0475a9a3e34caa9520db3a2

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4316

Review URL: https://codereview.chromium.org/1355073002
2015-09-20 19:05:01 -07:00
mtklein
527a0c8235 try simplest code: inline whenever vaguely sensible
BUG=skia:4316

Will land and revert.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/1351403005
2015-09-20 19:04:21 -07:00
mtklein
c566fddd37 Revert of try stosd/w (patchset #2 id:20001 of https://codereview.chromium.org/1355063002/ )
Reason for revert:
boink

Original issue's description:
> try stosd/w
>
> While we're trying things and reverting them, might as well try this too.
>
> BUG=skia:4316
>
> Blinking in and out for perf.skia.org.
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/3ca0f626a07e9b534d14a2d8213eedb93c5f7534

TBR=mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4316

Review URL: https://codereview.chromium.org/1356983004
2015-09-20 18:06:03 -07:00
mtklein
3ca0f626a0 try stosd/w
While we're trying things and reverting them, might as well try this too.

BUG=skia:4316

Blinking in and out for perf.skia.org.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/1355063002
2015-09-20 18:05:23 -07:00
mtklein
b63d816683 Revert of update memset16/32 inlining heuristics (patchset #1 id:1 of https://codereview.chromium.org/1357193002/ )
Reason for revert:
Who wants to land forever?

Original issue's description:
> update memset16/32 inlining heuristics
>
> I spent some time looking at perf.skia.org and it looks like we can do better.
>
> It is weird, weird, weird that on x86, we see three completely different behaviors:
>   - x86 Android: inlining better for small N, custom better for large N;
>   - Windows:     inlining better for large N, custom better for small N;
>   - other x86:   inlining generally better
>
> BUG=skia:4316,chromium:516426
>
> (Temporary, plan to revert.)
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/b68fa409fc00ce2f38e2a0fd6f9dc2379b372481

TBR=reed@google.com,jcgregorio@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4316,chromium:516426

Review URL: https://codereview.chromium.org/1358793002
2015-09-20 15:02:54 -07:00
mtklein
b68fa409fc update memset16/32 inlining heuristics
I spent some time looking at perf.skia.org and it looks like we can do better.

It is weird, weird, weird that on x86, we see three completely different behaviors:
  - x86 Android: inlining better for small N, custom better for large N;
  - Windows:     inlining better for large N, custom better for small N;
  - other x86:   inlining generally better

BUG=skia:4316,chromium:516426

(Temporary, plan to revert.)
TBR=reed@google.com

Review URL: https://codereview.chromium.org/1357193002
2015-09-20 15:02:15 -07:00
mtklein
bdb34d0345 Move SkOpts.h back to src/core.
The Chrome opts targets (sse2, ssse3, sse41, etc) don't have include/private on
their include path.  This should unblock the roll.

TBR=reed@google.com

BUG=skia:4117

Review URL: https://codereview.chromium.org/1268853007
2015-07-31 14:02:36 -07:00
mtklein
7eb0945af2 Port SkUtils opts to SkOpts.
With this new arrangement, the benefits of inlining sk_memset16/32 have changed.

On x86, they're not significantly different, except for small N<=10 where the inlined code is significantly slower.
On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster).  For small N<=10 inlining is still significantly faster.
On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster.

We were not using the NEON memset16 and memset32 procs on ARMv8.  At first blush, that seems to be an oversight, but if so it's an extremely lucky one.  The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than just inlining the memset, using the compiler's autovectorization.

So, no need to inline any more on x86, and still inline for N<=10 on ARMv7.  Always inline for ARMv8.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1270573002
2015-07-31 10:46:50 -07:00
mtklein
58fd2c8af4 Remove sk_memcpy32
It's only implemented on x86, where the exisiting benchmark says memcpy() is
faster for all cases:

Timer overhead: 24ns
curr/maxrss    loops    min    median    mean    max    stddev    samples       config    bench
  10/10  MB    1    35.9µs    36.2µs    36.2µs    36.6µs    1%    ▁▂▄▅▅▃█▄▄▅    nonrendering    sk_memcpy32_100000
  10/10  MB    13    2.27µs    2.28µs    2.28µs    2.29µs    0%    █▄▃▅▃▁▃▅▁▄    nonrendering    sk_memcpy32_10000
  11/11  MB    677    91.6ns    95.9ns    94.5ns    99.4ns    3%    ▅▅▅▅▅█▁▁▁▁    nonrendering    sk_memcpy32_1000
  11/11  MB    1171    20ns    20.9ns    21.3ns    23.4ns    6%    ▁▁▇▃▃▃█▇▃▃    nonrendering    sk_memcpy32_100
  11/11  MB    1952    14ns    14ns    14.3ns    15.2ns    3%    ▁▁██▁▁▁▁▁▁    nonrendering    sk_memcpy32_10
  11/11  MB    5    33.6µs    33.7µs    34.1µs    35.2µs    2%    ▆▇█▁▁▁▁▁▁▁    nonrendering    memcpy32_memcpy_100000
  11/11  MB    18    2.12µs    2.22µs    2.24µs    2.39µs    5%    ▂█▄▇█▄▇▁▁▁    nonrendering    memcpy32_memcpy_10000
  11/11  MB    1112    87.3ns    87.3ns    89.1ns    93.7ns    3%    ▄██▄▁▁▁▁▁▁    nonrendering    memcpy32_memcpy_1000
  11/11  MB    2124    12.8ns    13.3ns    13.5ns    14.8ns    6%    ▁▁▁█▃▃█▇▃▃    nonrendering    memcpy32_memcpy_100
  11/11  MB    3077    9ns    9.41ns    9.52ns    10.2ns    4%    ▃█▁█▃▃▃▃▃▃    nonrendering    memcpy32_memcpy_10

(Why?  One fewer thing to port to SkOpts.)

BUG=skia:4117

Review URL: https://codereview.chromium.org/1256763003
2015-07-27 11:08:28 -07:00
mtklein
9ff378b01b Rewrite memset benches, then use results to add a small-N optimization.
The benches for N <= 10 get around 2x faster on my N7 and N9.  I believe this
is because of the reduced function-call-then-function-pointer-call overhead on
the N7, and additionally because it seems autovectorization beats our NEON code
for small N on the N9.

My desktop is unchanged, though that's probably because N=10 lies well within a
region where memset's performance is essentially constant: N=100 takes only
about 2x as long as N=1 and N=10, which perform nearly identically.

BUG=skia:

Review URL: https://codereview.chromium.org/1073863002
2015-04-09 14:05:17 -07:00
commit-bot@chromium.org
f0ea77a363 SSE2 implementation of memcpy32
With SSE2 version memcpy32, S32_Opaque_BlitRow32() in SkBlitRow_D32.cpp
has about 30% performance improvement. Here are the data on desktop
i7-3770.
before:
       bitmap_scale_filter_90_90   8888:  cmsecs =      2.01
      bitmaprect_FF_filter_trans   8888:  cmsecs =      3.61
    bitmaprect_FF_nofilter_trans   8888:  cmsecs =      3.57
   bitmaprect_FF_filter_identity   8888:  cmsecs =      3.53
 bitmaprect_FF_nofilter_identity   8888:  cmsecs =      3.53
              bitmap_4444_update   8888:  cmsecs =      4.84
     bitmap_4444_update_volatile   8888:  cmsecs =      4.81
                     bitmap_4444   8888:  cmsecs =      4.81
after:
       bitmap_scale_filter_90_90   8888:  cmsecs =      1.83
      bitmaprect_FF_filter_trans   8888:  cmsecs =      2.36
    bitmaprect_FF_nofilter_trans   8888:  cmsecs =      2.36
   bitmaprect_FF_filter_identity   8888:  cmsecs =      2.60
 bitmaprect_FF_nofilter_identity   8888:  cmsecs =      2.63
              bitmap_4444_update   8888:  cmsecs =      3.30
     bitmap_4444_update_volatile   8888:  cmsecs =      3.30
                     bitmap_4444   8888:  cmsecs =      3.29

BUG=skia:
R=mtklein@google.com, reed@google.com, bsalomon@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/285313002

git-svn-id: http://skia.googlecode.com/svn/trunk@14822 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-05-21 12:43:07 +00:00
commit-bot@chromium.org
608d63735f Choose memset procs once.
TSAN shows us racing on the function pointers.  Might as well fix it.

WARNING: ThreadSanitizer: data race (pid=19995)
  Read of size 8 at 0x7f703affb048 by thread T12 (mutexes: write M2957):
    #0 SkBitmap::internalErase(SkIRect const&, unsigned int, unsigned int, unsigned int, unsigned int) const /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/src/core/SkBitmap.cpp:886 (tests+0x0000003511ca)
    #1 SkBitmap::eraseARGB(unsigned int, unsigned int, unsigned int, unsigned int) const /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/src/core/SkBitmap.cpp:919 (tests+0x0000003534bf)
    #2 (anonymous namespace)::DecodingImageGenerator::getPixels(SkImageInfo const&, void*, unsigned long) /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/src/images/SkDecodingImageGenerator.cpp:195 (tests+0x00000051bee1)
    #3 SkDiscardablePixelRef::onNewLockPixels(SkPixelRef::LockRec*) /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/src/lazy/SkDiscardablePixelRef.cpp:63 (tests+0x00000039ad9c)
    #4 SkPixelRef::lockPixels(SkPixelRef::LockRec*) /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/src/core/SkPixelRef.cpp:179 (tests+0x0000003fec23)
    #5 SkBitmap::lockPixels() const /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/src/core/SkBitmap.cpp:414 (tests+0x00000034e41e)
    #6 SkAutoLockPixels /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/include/core/SkBitmap.h:819 (tests+0x0000002752f3)
    #7 ImageDecoderOptions(skiatest::Reporter*) /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/tests/ImageDecodingTest.cpp:565 (tests+0x000000275d03)
    #8 skiatest::Test::run() /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/tests/Test.cpp:107 (tests+0x0000002263e7)
    #9 SkTestRunnable::run() /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/tests/skia_test.cpp:108 (tests+0x0000001d8607)
    #10 SkThreadPoolPrivate::ThreadLocal<void>::run(SkTRunnable<void>*) /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/include/utils/SkThreadPool.h:108 (tests+0x0000001d817e)
    #11 thread_start(void*) /var/scratch/Release/../../../usr/local/google/home/mtklein/skia/src/utils/SkThreadUtils_pthread.cpp:66 (tests+0x000000604347)

  Previous write of size 8 at 0x7f703affb048 by thread T26:
    [failed to restore the stack]


BUG=skia:1792
R=bungeman@google.com, mtklein@google.com, reed@google.com

Author: mtklein@chromium.org

Review URL: https://codereview.chromium.org/250503003

git-svn-id: http://skia.googlecode.com/svn/trunk@14548 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-05-02 20:28:56 +00:00
robertphillips@google.com
a4662865e3 More Windows 64b compilation warning fixes
https://codereview.chromium.org/47513017/



git-svn-id: http://skia.googlecode.com/svn/trunk@12337 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-11-21 14:24:16 +00:00
robertphillips@google.com
8c99c9f4a6 Reverting r12315 (More Windows 64b compilation warning fixes) due to compilation failures
git-svn-id: http://skia.googlecode.com/svn/trunk@12316 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-11-20 15:56:14 +00:00
robertphillips@google.com
80051d38a3 More Windows 64b compilation warning fixes
https://codereview.chromium.org/47513017/



git-svn-id: http://skia.googlecode.com/svn/trunk@12315 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-11-20 15:46:10 +00:00
commit-bot@chromium.org
e61a86cfa0 Guard against most unintentionally ephemeral SkAutoFoo instantiations.
I think I applied the trick everywhere possible.  Limitations:
    - can't be used with templated classes
    - all constructors and destructors must be defined inline

A couple of the SkAutoFoo were unused in Skia, Chromium, and Android, so I
deleted them.  This change caught the same bugs Cary found in SkPath, plus one
more in SampleApp.

BUG=
R=reed@google.com, caryclark@google.com

Author: mtklein@google.com

Review URL: https://codereview.chromium.org/72603005

git-svn-id: http://skia.googlecode.com/svn/trunk@12301 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-11-18 16:03:59 +00:00
tfarina@chromium.org
48552314f6 Fix a few clang errors while trying to build tools target.
Most of the errors were like:
../../src/gpu/gl/GrGLEffectMatrix.cpp:74:9: error: variable 'varyingType' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
../../src/gpu/gl/debug/GrDebugGL.h:125:21: error: private field 'fMaxTextureUnits' is not used [-Werror,-Wunused-private-field]
../../src/core/SkBitmapSampler.cpp:312:25: error: private field 'fProcTable' is not used [-Werror,-Wunused-private-field]

R=bsalomon@google.com,scroggo@google.com

Review URL: https://codereview.chromium.org/12915007

git-svn-id: http://skia.googlecode.com/svn/trunk@8403 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-03-26 21:48:58 +00:00
mike@reedtribe.org
53f3f31e17 tweak to spacing, to trigger build
git-svn-id: http://skia.googlecode.com/svn/trunk@7607 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-02-06 03:43:57 +00:00
skia.committer@gmail.com
e16efc1882 Sanitizing source files in Skia_Periodic_House_Keeping
git-svn-id: http://skia.googlecode.com/svn/trunk@7406 2bbb7eff-a529-9590-31e7-b0007b416f81
2013-01-26 07:06:02 +00:00
djsollen@google.com
a44e6c6b53 Add ARM optimizations to the build.
Also had to fix a problem in the ARM memset code that was
causing some tests and bench to fail.
Review URL: http://codereview.appspot.com/5522052

git-svn-id: http://skia.googlecode.com/svn/trunk@2989 2bbb7eff-a529-9590-31e7-b0007b416f81
2012-01-09 14:38:25 +00:00
reed@google.com
cf4b8181c9 declare IsVariationSelector to be inline, to fix warning
git-svn-id: http://skia.googlecode.com/svn/trunk@2917 2bbb7eff-a529-9590-31e7-b0007b416f81
2011-12-21 16:31:23 +00:00
reed@google.com
419f43348a add SkUnichar_IsVariationSelector()
git-svn-id: http://skia.googlecode.com/svn/trunk@2915 2bbb7eff-a529-9590-31e7-b0007b416f81
2011-12-21 15:21:32 +00:00
djsollen@google.com
56c69773ae Update files to use SK_BUILD_FOR_ANDROID.
This CL also removes any unecessary references to
the ANDROID definition.
Review URL: http://codereview.appspot.com/5354049

git-svn-id: http://skia.googlecode.com/svn/trunk@2629 2bbb7eff-a529-9590-31e7-b0007b416f81
2011-11-08 19:00:26 +00:00
epoger@google.com
ec3ed6a5eb Automatic update of all copyright notices to reflect new license terms.
I have manually examined all of these diffs and restored a few files that
seem to require manual adjustment.

The following files still need to be modified manually, in a separate CL:

android_sample/SampleApp/AndroidManifest.xml
android_sample/SampleApp/res/layout/layout.xml
android_sample/SampleApp/res/menu/sample.xml
android_sample/SampleApp/res/values/strings.xml
android_sample/SampleApp/src/com/skia/sampleapp/SampleApp.java
android_sample/SampleApp/src/com/skia/sampleapp/SampleView.java
experimental/CiCarbonSampleMain.c
experimental/CocoaDebugger/main.m
experimental/FileReaderApp/main.m
experimental/SimpleCocoaApp/main.m
experimental/iOSSampleApp/Shared/SkAlertPrompt.h
experimental/iOSSampleApp/Shared/SkAlertPrompt.m
experimental/iOSSampleApp/SkiOSSampleApp-Base.xcconfig
experimental/iOSSampleApp/SkiOSSampleApp-Debug.xcconfig
experimental/iOSSampleApp/SkiOSSampleApp-Release.xcconfig
gpu/src/android/GrGLDefaultInterface_android.cpp
gyp/common.gypi
gyp_skia
include/ports/SkHarfBuzzFont.h
include/views/SkOSWindow_wxwidgets.h
make.bat
make.py
src/opts/memset.arm.S
src/opts/memset16_neon.S
src/opts/memset32_neon.S
src/opts/opts_check_arm.cpp
src/ports/SkDebug_brew.cpp
src/ports/SkMemory_brew.cpp
src/ports/SkOSFile_brew.cpp
src/ports/SkXMLParser_empty.cpp
src/utils/ios/SkImageDecoder_iOS.mm
src/utils/ios/SkOSFile_iOS.mm
src/utils/ios/SkStream_NSData.mm
tests/FillPathTest.cpp
Review URL: http://codereview.appspot.com/4816058

git-svn-id: http://skia.googlecode.com/svn/trunk@1982 2bbb7eff-a529-9590-31e7-b0007b416f81
2011-07-28 14:26:00 +00:00
mike@reedtribe.org
4e1d3acc16 code style
git-svn-id: http://skia.googlecode.com/svn/trunk@1095 2bbb7eff-a529-9590-31e7-b0007b416f81
2011-04-10 01:04:37 +00:00
reed@android.com
f2b98d67dc merge with changes for GPU backend
git-svn-id: http://skia.googlecode.com/svn/trunk@637 2bbb7eff-a529-9590-31e7-b0007b416f81
2010-12-20 18:26:13 +00:00
senorblanco@chromium.org
4e753558fc More SSE2-ification; fix for gcc -msse2.
Review URL:  http://codereview.appspot.com/154163



git-svn-id: http://skia.googlecode.com/svn/trunk@428 2bbb7eff-a529-9590-31e7-b0007b416f81
2009-11-16 21:09:00 +00:00
reed@android.com
ed673310e2 add initial unittest framework (tests)
move some previous unittests out of core classes and into tests



git-svn-id: http://skia.googlecode.com/svn/trunk@96 2bbb7eff-a529-9590-31e7-b0007b416f81
2009-02-27 16:24:51 +00:00
reed@android.com
8a1c16ff38 grab from latest android
git-svn-id: http://skia.googlecode.com/svn/trunk@27 2bbb7eff-a529-9590-31e7-b0007b416f81
2008-12-17 15:59:43 +00:00