Reason for revert:
Who wants to land forever?
Original issue's description:
> update memset16/32 inlining heuristics
>
> I spent some time looking at perf.skia.org and it looks like we can do better.
>
> It is weird, weird, weird that on x86, we see three completely different behaviors:
> - x86 Android: inlining better for small N, custom better for large N;
> - Windows: inlining better for large N, custom better for small N;
> - other x86: inlining generally better
>
> BUG=skia:4316,chromium:516426
>
> (Temporary, plan to revert.)
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/b68fa409fc00ce2f38e2a0fd6f9dc2379b372481
TBR=reed@google.com,jcgregorio@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4316,chromium:516426
Review URL: https://codereview.chromium.org/1358793002
I spent some time looking at perf.skia.org and it looks like we can do better.
It is weird, weird, weird that on x86, we see three completely different behaviors:
- x86 Android: inlining better for small N, custom better for large N;
- Windows: inlining better for large N, custom better for small N;
- other x86: inlining generally better
BUG=skia:4316,chromium:516426
(Temporary, plan to revert.)
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1357193002
With this new arrangement, the benefits of inlining sk_memset16/32 have changed.
On x86, inlined and custom code are not significantly different, except for small N<=10, where the inlined code is significantly slower.
On ARMv7 with NEON, our custom code is still significantly faster for N>10 (up to 2x faster). For small N<=10 inlining is still significantly faster.
On ARMv7 without NEON, our custom code is still ridiculously faster (up to 10x) than inlining for N>10, though for small N<=10 inlining is still a little faster.
We were not using the NEON memset16 and memset32 procs on ARMv8. At first blush, that seems to be an oversight, but if so it's an extremely lucky one. The ARMv8 code generation for our memset16/32 procs is total garbage, leaving those methods ~8x slower than simply inlining the memset and relying on the compiler's autovectorization.
So, no need to inline any more on x86, and still inline for N<=10 on ARMv7. Always inline for ARMv8.
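A minimal sketch of the resulting policy, for illustration only. The `Arch` enum and `should_inline_memset()` are hypothetical names, not Skia's API; a real build would pick the path at compile time rather than through a runtime enum. The threshold of 10 comes from the measurements above.

```cpp
#include <cstddef>

// Hypothetical labels for the targets measured above (not Skia's API).
enum class Arch { kX86, kARMv7_NEON, kARMv7_NoNEON, kARMv8 };

// Small-N cutoff from the benchmarks: inlining wins for N <= 10 on ARMv7.
constexpr size_t kSmallN = 10;

// Returns true if the plain inline loop should be used instead of the custom proc.
bool should_inline_memset(Arch arch, size_t n) {
    switch (arch) {
        case Arch::kX86:
            return false;            // no win on x86; inlining is slower for small N
        case Arch::kARMv7_NEON:
        case Arch::kARMv7_NoNEON:
            return n <= kSmallN;     // custom code wins for N > 10 on ARMv7
        case Arch::kARMv8:
            return true;             // autovectorized loop beats the custom procs
    }
    return false;
}
```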
BUG=skia:4117
Review URL: https://codereview.chromium.org/1270573002
The benches for N <= 10 get around 2x faster on my N7 and N9. I believe this
is because of the reduced function-call-then-function-pointer-call overhead on
the N7, and additionally because it seems autovectorization beats our NEON code
for small N on the N9.
My desktop is unchanged, though that's probably because N=10 lies well within a
region where memset's performance is essentially constant: N=100 takes only
about 2x as long as N=1 and N=10, which perform nearly identically.
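To illustrate the overhead being avoided, here is a sketch that assumes a hypothetical global proc pointer `gMemset16Proc` and made-up wrapper names (not Skia's actual code). The out-of-line path costs a direct call plus an indirect call through the pointer. The small-N path is a plain loop at the call site, which the compiler is free to autovectorize.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical proc signature and portable fallback (illustrative names only).
using Memset16Proc = void (*)(uint16_t*, uint16_t, size_t);

static void memset16_portable(uint16_t* dst, uint16_t value, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] = value;
}

// Normally chosen once at startup (e.g. NEON vs. portable); here always portable.
static Memset16Proc gMemset16Proc = memset16_portable;

// Out-of-line entry point: one direct call into this function, then a second,
// indirect call through the pointer that the compiler cannot inline.
void memset16_via_proc(uint16_t* dst, uint16_t value, size_t n) {
    gMemset16Proc(dst, value, n);
}

// Small-N fast path: no calls at all for N <= 10.
inline void memset16_small_n(uint16_t* dst, uint16_t value, size_t n) {
    if (n <= 10) {
        for (size_t i = 0; i < n; ++i) dst[i] = value;
    } else {
        memset16_via_proc(dst, value, n);
    }
}
```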
BUG=skia:
Review URL: https://codereview.chromium.org/1073863002
Most of the errors looked like this:
../../src/gpu/gl/GrGLEffectMatrix.cpp:74:9: error: variable 'varyingType' is used uninitialized whenever switch default is taken [-Werror,-Wsometimes-uninitialized]
../../src/gpu/gl/debug/GrDebugGL.h:125:21: error: private field 'fMaxTextureUnits' is not used [-Werror,-Wunused-private-field]
../../src/core/SkBitmapSampler.cpp:312:25: error: private field 'fProcTable' is not used [-Werror,-Wunused-private-field]
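For reference, a hypothetical snippet showing the usual fix for the `-Wsometimes-uninitialized` case: initialize the variable so the `default:` path no longer reads an indeterminate value. The enum and function are made up for illustration and are not the code from GrGLEffectMatrix.cpp. The `-Wunused-private-field` errors are a different case, typically fixed by deleting the unused field.

```cpp
// Hypothetical enum and function, modeled on the first error message above.
enum class Kind { kA, kB, kC };

const char* name_for(Kind k) {
    // Initialized up front so every path returns a defined value.
    const char* name = nullptr;
    switch (k) {
        case Kind::kA: name = "a"; break;
        case Kind::kB: name = "b"; break;
        default:
            // Without the initializer above, clang can report that 'name' is used
            // uninitialized whenever this default is taken.
            break;
    }
    return name;
}
```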
R=bsalomon@google.com,scroggo@google.com
Review URL: https://codereview.chromium.org/12915007
git-svn-id: http://skia.googlecode.com/svn/trunk@8403 2bbb7eff-a529-9590-31e7-b0007b416f81
I have manually examined all of these diffs and restored a few files that
seem to require manual adjustment.
The following files still need to be modified manually, in a separate CL:
android_sample/SampleApp/AndroidManifest.xml
android_sample/SampleApp/res/layout/layout.xml
android_sample/SampleApp/res/menu/sample.xml
android_sample/SampleApp/res/values/strings.xml
android_sample/SampleApp/src/com/skia/sampleapp/SampleApp.java
android_sample/SampleApp/src/com/skia/sampleapp/SampleView.java
experimental/CiCarbonSampleMain.c
experimental/CocoaDebugger/main.m
experimental/FileReaderApp/main.m
experimental/SimpleCocoaApp/main.m
experimental/iOSSampleApp/Shared/SkAlertPrompt.h
experimental/iOSSampleApp/Shared/SkAlertPrompt.m
experimental/iOSSampleApp/SkiOSSampleApp-Base.xcconfig
experimental/iOSSampleApp/SkiOSSampleApp-Debug.xcconfig
experimental/iOSSampleApp/SkiOSSampleApp-Release.xcconfig
gpu/src/android/GrGLDefaultInterface_android.cpp
gyp/common.gypi
gyp_skia
include/ports/SkHarfBuzzFont.h
include/views/SkOSWindow_wxwidgets.h
make.bat
make.py
src/opts/memset.arm.S
src/opts/memset16_neon.S
src/opts/memset32_neon.S
src/opts/opts_check_arm.cpp
src/ports/SkDebug_brew.cpp
src/ports/SkMemory_brew.cpp
src/ports/SkOSFile_brew.cpp
src/ports/SkXMLParser_empty.cpp
src/utils/ios/SkImageDecoder_iOS.mm
src/utils/ios/SkOSFile_iOS.mm
src/utils/ios/SkStream_NSData.mm
tests/FillPathTest.cpp
Review URL: http://codereview.appspot.com/4816058
git-svn-id: http://skia.googlecode.com/svn/trunk@1982 2bbb7eff-a529-9590-31e7-b0007b416f81