This patch implements the basics for Xfermode SSE optimization. Building
on these basics, it provides an SSE2 implementation of multiply_modeproc.
SSE2 implementations for other modes will come in the future. With this
patch, the performance of Xfermode_Multiply improves by about 45%. Here
is the data from a desktop i7-3770:
before:
Xfermode_Multiply 8888: cmsecs = 33.30 565: cmsecs = 45.65
after:
Xfermode_Multiply 8888: cmsecs = 17.18 565: cmsecs = 24.87
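For reference, the per-pixel math that an SSE2 multiply mode vectorizes can be sketched as below. This is not the patch's code: the pixel layout (premultiplied 0xAARRGGBB), the helper names, and the two-pixels-per-register strategy are assumptions made for illustration. The per-channel formula sc*(255-da) + dc*(255-sa) + sc*dc is the standard premultiplied multiply blend. The SSE2 part compiles only where __SSE2__ is defined and should match the scalar reference for valid premultiplied input.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Assumed layout for this sketch: premultiplied ARGB packed as 0xAARRGGBB.
static inline int getA(uint32_t c) { return (c >> 24) & 0xFF; }
static inline int getR(uint32_t c) { return (c >> 16) & 0xFF; }
static inline int getG(uint32_t c) { return (c >> 8) & 0xFF; }
static inline int getB(uint32_t c) { return c & 0xFF; }

// Rounded x / 255, exact for 0 <= x <= 255 * 255.
static inline int div255_round(int x) {
    x += 128;
    return (x + (x >> 8)) >> 8;
}

// Multiply blend for one channel: sc*(255-da) + dc*(255-sa) + sc*dc, back in [0, 255].
static inline int multiply_byte(int sc, int dc, int sa, int da) {
    int x = sc * (255 - da) + dc * (255 - sa) + sc * dc;
    return x <= 0 ? 0 : (x >= 255 * 255 ? 255 : div255_round(x));
}

// Scalar reference: alpha is src-over (sa + da - sa*da/255), colors use multiply_byte.
uint32_t multiply_scalar(uint32_t src, uint32_t dst) {
    int sa = getA(src), da = getA(dst);
    uint32_t a = sa + da - div255_round(sa * da);
    uint32_t r = multiply_byte(getR(src), getR(dst), sa, da);
    uint32_t g = multiply_byte(getG(src), getG(dst), sa, da);
    uint32_t b = multiply_byte(getB(src), getB(dst), sa, da);
    return (a << 24) | (r << 16) | (g << 8) | b;
}

#if defined(__SSE2__)
// Rounded /255 on eight unsigned 16-bit lanes (inputs up to 255 * 255 do not overflow).
static inline __m128i div255_round_epu16(__m128i x) {
    x = _mm_add_epi16(x, _mm_set1_epi16(128));
    return _mm_srli_epi16(_mm_add_epi16(x, _mm_srli_epi16(x, 8)), 8);
}

// Two pixels widened to 16-bit lanes: B,G,R,A of pixel 0 in lanes 0-3, pixel 1 in 4-7.
static inline __m128i multiply_2px(__m128i s, __m128i d) {
    const __m128i k255 = _mm_set1_epi16(255);
    // Broadcast each pixel's alpha (lanes 3 and 7) to its four lanes.
    __m128i sa = _mm_shufflehi_epi16(_mm_shufflelo_epi16(s, _MM_SHUFFLE(3, 3, 3, 3)),
                                     _MM_SHUFFLE(3, 3, 3, 3));
    __m128i da = _mm_shufflehi_epi16(_mm_shufflelo_epi16(d, _MM_SHUFFLE(3, 3, 3, 3)),
                                     _MM_SHUFFLE(3, 3, 3, 3));
    // Each product is at most 255*255; for premultiplied input the sum is too, so
    // 16-bit arithmetic never wraps. On the alpha lane the same formula reduces to
    // sa + da - sa*da/255, matching the scalar alpha.
    __m128i sum = _mm_add_epi16(_mm_mullo_epi16(s, _mm_sub_epi16(k255, da)),
                                _mm_add_epi16(_mm_mullo_epi16(d, _mm_sub_epi16(k255, sa)),
                                              _mm_mullo_epi16(s, d)));
    return div255_round_epu16(sum);
}

// Blends four pixels at once (x86 is little-endian, so byte 3 of each pixel is alpha).
void multiply_4px_sse2(const uint32_t src[4], const uint32_t dst[4], uint32_t out[4]) {
    const __m128i zero = _mm_setzero_si128();
    __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst));
    __m128i lo = multiply_2px(_mm_unpacklo_epi8(s, zero), _mm_unpacklo_epi8(d, zero));
    __m128i hi = multiply_2px(_mm_unpackhi_epi8(s, zero), _mm_unpackhi_epi8(d, zero));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), _mm_packus_epi16(lo, hi));
}
#endif
```

Widening to 16 bits keeps two pixels per register and avoids 32-bit multiplies, which SSE2 lacks for packed integers.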
BUG=
R=mtklein@google.com
Author: qiankun.miao@intel.com
Review URL: https://codereview.chromium.org/202903004
git-svn-id: http://skia.googlecode.com/svn/trunk@14006 2bbb7eff-a529-9590-31e7-b0007b416f81
Reason for revert:
GYP's failing on most (all?) bots.
Original issue's description:
> ARM Skia NEON patches - 35 - First AArch64 support
>
> Aarch64 support
>
> This change contains the necessary modifications to have Skia build and
> run properly on an ARMv8 processor in aarch64 execution state.
>
> Here's a list of the changes:
>
> - add an arm64 target to the build system + SK_CPU_ARM64 flag
>
> - MatrixTest was failing when built in Release mode. Fused MAC
> instructions were generated, which made some intermediate results
> more accurate. Because the test compares results, these more precise
> results differed from the others by more than the test tolerates.
> As I don't know whether any actual Skia code relies on results
> being comparable, I've disabled fused MAC instructions
> with -ffp-contract=off for arm64.
>
> - Modify include/core/SkOnce.h to make its barriers work.
>
> - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.
>
> - use existing Xfermode optimisations with modifications that can be
> removed in the future when toolchains are ready. Also save a few
> instructions in two Xfermodes (will apply to ARM too).
>
> - use existing SkBoxBlur and SkMorphology optimisations.
>
> - use existing SkBlitMask optimisations
>
> - use existing BitmapProcState and Convolution optimisations.
>
> Future changes will include:
>
> - Blitters (only partially merged upstream)
>
> - SkUtils (there's little value in sending asm optimisations without
> having them benchmarked on real hardware).
>
> Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
>
> BUG=skia:
>
> Committed: http://code.google.com/p/skia/source/detail?r=13980
R=djsollen@google.com, reed@google.com, halcanary@google.com, kevin.petit@arm.com
TBR=djsollen@google.com, halcanary@google.com, kevin.petit@arm.com, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/216113005
git-svn-id: http://skia.googlecode.com/svn/trunk@13983 2bbb7eff-a529-9590-31e7-b0007b416f81
Aarch64 support
This change contains the necessary modifications to have Skia build and
run properly on an ARMv8 processor in aarch64 execution state.
Here's a list of the changes:
- add an arm64 target to the build system + SK_CPU_ARM64 flag
- MatrixTest was failing when built in Release mode. Fused MAC
instructions were generated, which made some intermediate results
more accurate. Because the test compares results, these more precise
results differed from the others by more than the test tolerates.
As I don't know whether any actual Skia code relies on results
being comparable, I've disabled fused MAC instructions
with -ffp-contract=off for arm64.
- Modify include/core/SkOnce.h to make its barriers work.
- SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.
- use existing Xfermode optimisations with modifications that can be
removed in the future when toolchains are ready. Also save a few
instructions in two Xfermodes (will apply to ARM too).
- use existing SkBoxBlur and SkMorphology optimisations.
- use existing SkBlitMask optimisations
- use existing BitmapProcState and Convolution optimisations.
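The MatrixTest item comes down to rounding. A fused multiply-add rounds a*b + c once, while a separate multiply and add round the product first, so the two can disagree. The sketch below (hypothetical function names, not code from this change) makes that concrete; -ffp-contract=off stops the compiler from turning the first form into the second.

```cpp
#include <cmath>

// a*b + c with two roundings: the product is rounded to double first.
// The volatile store keeps the compiler from contracting this into an FMA.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// a*b + c with a single rounding, which is what a fused MAC instruction computes.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact result is -2^-54. The product alone rounds to 1.0, so mul_then_add returns 0.0, while fused_mul_add returns -2^-54: a small difference, but enough to break a test that compares results across code paths.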
Future changes will include:
- Blitters (only partially merged upstream)
- SkUtils (there's little value in sending asm optimisations without
having them benchmarked on real hardware).
Signed-off-by: Kevin PETIT <kevin.petit@arm.com>
BUG=skia:
R=djsollen@google.com, reed@google.com, mtklein@google.com, halcanary@google.com
Author: kevin.petit@arm.com
Review URL: https://codereview.chromium.org/143423004
git-svn-id: http://skia.googlecode.com/svn/trunk@13980 2bbb7eff-a529-9590-31e7-b0007b416f81
Split off from https://codereview.chromium.org/140503007/.
The eventual goal is to create our Android.mk from gyp. This patch
adds an option for skia_android_framework with the right settings.
The follow-up (https://codereview.chromium.org/140503007/) will
use scripts to create the final makefile.
gyp/android_deps.gyp:
Use different dependencies for the framework than for building Skia
normally.
gyp/android_framework_lib.gyp:
Like skia_lib, specifies the minimum needed for building Skia, in this
case for the framework.
gyp/common_conditions.gypi:
Add settings specific to skia_android_framework. In some cases this
means turning off flags and defines.
gyp/common.gypi:
Turn off SK_DEBUG and SK_DEVELOPER when building for the framework.
This allows the framework to create a single makefile which can be
modified to add SK_DEBUG and SK_DEVELOPER as desired.
gyp/common_variables.gypi:
Add skia_android_framework.
gyp/core.gyp:
Don't depend on cpufeatures, and add the cutils library for
skia_android_framework.
gyp/freetype.gyp:
skia_android_framework-specific options:
Don't include freetype_static as a dependency.
Include the proper folders.
Include the android library.
gyp/images.gyp:
Don't export libjpeg as a dependency for targets that include images
for the framework.
Also reorder image decoders to match the Android order, leaving our
most commonly used ones last (and therefore first in the chain for
trying them).
gyp/libwebp.gyp:
Use the system webp when building for the Android framework. Specify
the correct settings for the framework.
gyp/opts.gyp:
Specify a default set of files to compile when there are no possible
optimizations.
gyp/pdf.gyp:
Add dependencies for Android framework.
gyp/zlib.gyp:
Include the zlib folder, and undefine SK_ZLIB_INCLUDE.
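The gyp/images.gyp entry relies on decoders being tried in the reverse of their registration order. A minimal sketch of a registry with that property, assuming registration prepends to a linked list (the names here are hypothetical, not Skia's API):

```cpp
#include <string>

// Hypothetical decoder registry: each registration is pushed onto the head of
// a singly linked list, so the entry registered last is tried first.
struct DecoderEntry {
    const char* name;
    bool (*canDecode)(const std::string& header);
    DecoderEntry* next;
};

static DecoderEntry* gDecoderHead = nullptr;

void registerDecoder(DecoderEntry* entry) {
    entry->next = gDecoderHead;
    gDecoderHead = entry;
}

// Walks the chain from the head and returns the first decoder that accepts
// the header, or nullptr if none does.
const char* findDecoder(const std::string& header) {
    for (DecoderEntry* e = gDecoderHead; e != nullptr; e = e->next) {
        if (e->canDecode(header)) return e->name;
    }
    return nullptr;
}
```

Registering a catch-all decoder first and a specific one last means the specific one gets the first look at every header.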
BUG=skia:1975
R=djsollen@google.com
Committed: https://code.google.com/p/skia/source/detail?r=13298
Review URL: https://codereview.chromium.org/153093003
git-svn-id: http://skia.googlecode.com/svn/trunk@13304 2bbb7eff-a529-9590-31e7-b0007b416f81
Split off from https://codereview.chromium.org/140503007/.
The eventual goal is to create our Android.mk from gyp. This patch
adds an option for skia_android_framework with the right settings.
The follow-up (https://codereview.chromium.org/140503007/) will
use scripts to create the final makefile.
gyp/android_deps.gyp:
Use different dependencies for the framework than for building Skia
normally.
gyp/android_framework_lib.gyp:
Like skia_lib, specifies the minimum needed for building Skia, in this
case for the framework.
gyp/common_conditions.gypi:
Add settings specific to skia_android_framework. In some cases this
means turning off flags and defines.
gyp/common.gypi:
Turn off SK_DEBUG and SK_DEVELOPER when building for the framework.
This allows the framework to create a single makefile which can be
modified to add SK_DEBUG and SK_DEVELOPER as desired.
gyp/common_variables.gypi:
Add skia_android_framework.
gyp/core.gyp:
Don't depend on cpufeatures, and add the cutils library for
skia_android_framework.
gyp/freetype.gyp:
skia_android_framework-specific options:
Don't include freetype_static as a dependency.
Include the proper folders.
Include the android library.
gyp/images.gyp:
Don't export libjpeg as a dependency for targets that include images
for the framework.
Also reorder image decoders to match the Android order, leaving our
most commonly used ones last (and therefore first in the chain for
trying them).
gyp/libwebp.gyp:
Use the system webp when building for the Android framework. Specify
the correct settings for the framework.
gyp/opts.gyp:
Specify a default set of files to compile when there are no possible
optimizations.
gyp/pdf.gyp:
Add dependencies for Android framework.
gyp/zlib.gyp:
Include the zlib folder, and undefine SK_ZLIB_INCLUDE.
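The gyp/common.gypi entry works because debug-only code is gated on the preprocessor, so adding -DSK_DEBUG to the makefile later is enough to switch it on. A simplified, hypothetical illustration (the macro name is made up; Skia's own assertion macros are more elaborate):

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical debug-only check: active only when SK_DEBUG is defined
// (for example by adding -DSK_DEBUG to the generated makefile), otherwise a no-op.
#ifdef SK_DEBUG
#define DEBUG_ONLY_ASSERT(cond) \
    ((cond) ? (void)0 : (std::fprintf(stderr, "check failed: %s\n", #cond), std::abort()))
#else
#define DEBUG_ONLY_ASSERT(cond) ((void)0)
#endif

// Reports which configuration this translation unit was built with.
bool debug_checks_enabled() {
#ifdef SK_DEBUG
    return true;
#else
    return false;
#endif
}
```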
BUG=skia:1975
R=djsollen@google.com
Review URL: https://codereview.chromium.org/153093003
git-svn-id: http://skia.googlecode.com/svn/trunk@13298 2bbb7eff-a529-9590-31e7-b0007b416f81
BitmapProcState: new factorised code
This one basically factorises the clamp and repeat transformations with
some performance improvements. It has the benefit of being faster, much
easier to maintain (nearly a third as much code, doing more work
:-)), and more complete (not all perspective transformations were
optimised in the previous version).
It also introduces the use of can_truncate_to_fixed_for_decal where
useful.
The effect on benchmarks ranges from a 5% penalty to a 25% gain on a
Cortex-A9 and from a 5% penalty to a 100% gain on a Cortex-A15.
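The factorisation described here can be pictured as one span loop parameterised by the tiling function. The sketch below is hypothetical: it assumes 16.16 fixed-point coordinates, uses made-up names, and leaves out perspective, the decal fast path, and NEON entirely.

```cpp
#include <cstdint>

// 16.16 fixed-point x coordinate (an assumption for this sketch).
typedef int32_t Fixed16;

// Clamp tiling: integer part of fx pinned to [0, max].
inline int tile_clamp(Fixed16 fx, int max) {
    int x = fx >> 16;
    return x < 0 ? 0 : (x > max ? max : x);
}

// Repeat tiling: the low 16 bits are a fraction of the width (max + 1).
inline int tile_repeat(Fixed16 fx, int max) {
    uint32_t frac = static_cast<uint32_t>(fx) & 0xFFFF;
    return static_cast<int>((frac * static_cast<uint32_t>(max + 1)) >> 16);
}

// One loop shared by every tile mode: the mode is a template parameter, so
// each instantiation is a separate function the compiler can fully inline.
template <int (*Tile)(Fixed16, int)>
void fill_x_indices(int* xs, int count, Fixed16 fx, Fixed16 dx, int max) {
    for (int i = 0; i < count; ++i) {
        xs[i] = Tile(fx, max);
        fx += dx;
    }
}
```

Writing the loop once and varying only the tile function is what keeps the code small while still giving each mode a specialised inner loop.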
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
R=djsollen@google.com, mtklein@google.com, luisjoseromeroesclusa@hotmail.com, reed@google.com
Author: kevin.petit.arm@gmail.com
Review URL: https://codereview.chromium.org/23835006
git-svn-id: http://skia.googlecode.com/svn/trunk@13218 2bbb7eff-a529-9590-31e7-b0007b416f81
Using OTHER_CPLUSPLUSFLAGS instead of OTHER_CFLAGS appends -mssse3 to the
argument list rather than overwriting it, which is what the old note warns about. (So it's
actually there twice now for the files in opts_ssse3, and we can still build if
we remove -mssse3 from common_conditions.gypi.)
We could also just delete this clause entirely given that
common_conditions.gypi sets it anyway. Which do you think is best? This code
won't compile unless _someone_ has set -mssse3. Seems to me the redundancy
helps communicate that and protect against changes in common_conditions.gypi.
BUG=
R=epoger@google.com, bungeman@google.com
Author: mtklein@google.com
Review URL: https://chromiumcodereview.appspot.com/21279005
git-svn-id: http://skia.googlecode.com/svn/trunk@10573 2bbb7eff-a529-9590-31e7-b0007b416f81
$ compare-android.sh bench --match bitmap_ --repeat 30
master -> ssse3
N=30 p=0.001000 (corrected to 0.000033)
sig?  speedup  bench
n      -1.16%  bitmap_scale_filter_256_64
y      -0.72%  bitmap_8888_A_scale_bicubic
y      -0.21%  bitmap_index8_A
n      -0.00%  bitmap_565
n      -0.00%  bitmap_scale_filter_90_80
n       0.03%  bitmap_8888_A_source_transparent
y       0.06%  bitmap_index8
y       0.30%  bitmap_8888_A_source_stripes_two
n       0.34%  bitmap_scale_filter_80_90
y       0.42%  bitmap_8888_A
y       0.44%  bitmap_8888_A_source_opaque
n       0.53%  bitmap_scale_filter_90_10
y       0.71%  bitmap_8888_A_source_stripes_three
y       0.91%  bitmap_8888_A_scale_rotate_bicubic
y       1.04%  bitmap_8888_update
n       1.19%  bitmap_scale_filter_10_90
n       1.39%  bitmap_scale_filter_90_90
y       1.77%  bitmap_8888_update_volatile
y       1.89%  bitmap_8888
y       2.37%  bitmap_scale_filter_30_90
y       9.57%  bitmap_scale_filter_64_256
n      17.86%  bitmap_scale_filter_90_30
y      25.40%  bitmap_8888_A_scale_rotate_bilerp
y      27.19%  bitmap_8888_scale_rotate_bilerp
y      27.23%  bitmap_8888_update_scale_rotate_bilerp
y      27.29%  bitmap_8888_update_volatile_scale_rotate_bilerp
y      55.08%  bitmap_8888_A_scale_bilerp
y      58.75%  bitmap_8888_update_volatile_scale_bilerp
y      58.90%  bitmap_8888_scale_bilerp
y      58.92%  bitmap_8888_update_scale_bilerp
Overall speedup: 10.52%
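The corrected threshold in the header is consistent with a Bonferroni correction, 0.001 / 30 = 0.0000333 (30 is also the number of benches listed). That compare-android.sh applies exactly this rule is an assumption; the arithmetic is sketched below with hypothetical names.

```cpp
// Bonferroni correction: to keep the family-wise error rate at alpha across
// n comparisons, each individual test uses the threshold alpha / n.
double bonferroni_threshold(double alpha, int comparisons) {
    return alpha / comparisons;
}

// Assumed rule: a bench is marked significant ("y") when its p-value falls
// below the corrected threshold.
bool is_significant(double p_value, double alpha, int comparisons) {
    return p_value < bonferroni_threshold(alpha, comparisons);
}
```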
BUG=skia:1111
R=djsollen@google.com
Review URL: https://codereview.chromium.org/21203005
git-svn-id: http://skia.googlecode.com/svn/trunk@10474 2bbb7eff-a529-9590-31e7-b0007b416f81
This is a step toward targets declaring their deps in a sane fashion.
This change resolves cycles by forcing core to the root, then everything
in skia_lib pointing toward core as best possible, then everything
outside skia_lib depending on skia_lib for things in skia_lib. This
prevents double definitions where a symbol is provided by both the
skia_lib shared object and a statically linked component of skia_lib.
R=djsollen@google.com
Review URL: https://codereview.chromium.org/19823003
git-svn-id: http://skia.googlecode.com/svn/trunk@10231 2bbb7eff-a529-9590-31e7-b0007b416f81
Need to avoid linking in static archives (.a) whose code is already provided by shared objects (.so).
git-svn-id: http://skia.googlecode.com/svn/trunk@10222 2bbb7eff-a529-9590-31e7-b0007b416f81
This is a step toward targets declaring their deps in a sane fashion.
This change resolves cycles by forcing core to the root,
then opts, ports, and utils depending on core, then everything else.
We will need some other change to resolve the fact that
core, opts, ports, and utils depend on each other and other targets which
depend on them. Outside of these targets, things look ok.
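Whether a set of targets still contains a cycle can be checked with a depth-first search that flags any edge back to a target still on the stack. The graph and names below are illustrative only, not taken from the gyp files.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical dependency graph: target name -> names of the targets it depends on.
using DepGraph = std::map<std::string, std::vector<std::string>>;

enum class VisitMark { kUnvisited, kOnStack, kDone };

// Depth-first search; reaching a target that is still on the stack means a cycle.
static bool visit(const DepGraph& graph, const std::string& target,
                  std::map<std::string, VisitMark>& marks) {
    VisitMark& mark = marks[target];  // std::map references stay valid across inserts
    if (mark == VisitMark::kOnStack) return true;
    if (mark == VisitMark::kDone) return false;
    mark = VisitMark::kOnStack;
    auto it = graph.find(target);
    if (it != graph.end()) {
        for (const std::string& dep : it->second) {
            if (visit(graph, dep, marks)) return true;
        }
    }
    mark = VisitMark::kDone;
    return false;
}

bool has_cycle(const DepGraph& graph) {
    std::map<std::string, VisitMark> marks;
    for (const auto& entry : graph) {
        if (visit(graph, entry.first, marks)) return true;
    }
    return false;
}
```

With core at the root (depending on nothing) and everything else pointing toward it, the search finds no back edge; mutual dependencies among core, opts, ports and utils would show up as one.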
R=djsollen@google.com
Review URL: https://codereview.chromium.org/19823003
git-svn-id: http://skia.googlecode.com/svn/trunk@10217 2bbb7eff-a529-9590-31e7-b0007b416f81
- Add nacl_make script to build Skia targets for NaCl using gyp
- Add nacl_interface for command-line apps
- Add nacl_sample as front-end for SampleApp
- Add freetype to DEPS
- Various gyp tweaks for NaCl
TODO:
- Implement GL interface
- Implement font host
- Fix plumbing so that SampleApp works properly
Review URL: https://codereview.appspot.com/6671044
git-svn-id: http://skia.googlecode.com/svn/trunk@6245 2bbb7eff-a529-9590-31e7-b0007b416f81
- Roll GYP so that we get non-thin archives on Linux
- Add merge_static_libs.py
- Add skia_core_lib target which builds core, ports, opts*, and utils
- Replace dependencies on core/ports/opts/utils with skia_core_libs
- Rename exportable libraries with "skia_"
Review URL: https://codereview.appspot.com/6619049
git-svn-id: http://skia.googlecode.com/svn/trunk@5889 2bbb7eff-a529-9590-31e7-b0007b416f81
This patch does the following:
- Move the NEON-specific code from src/core/SkBitmapProcState_filter.h
to src/opts/SkBitmapProcState_filter_neon.h
- Implement the NEON-specific functions in the new source file
src/opts/SkBitmapProcState_opts_arm_neon.cpp, added to the "opts_neon"
static library target. All functions now use the _neon suffix, even
in full-NEON builds.
- Move most of the content of src/core/SkBitmapProcState.cpp to a
new header: src/core/SkBitmapProcState_procs.h
This header is included by two source files:
src/core/SkBitmapProcState.cpp, to define the regular functions.
src/opts/SkBitmapProcState_opts_arm_neon.cpp to define NEON ones.
This is to deal with the fact that all NEON functions now
use the _neon suffix, even in SK_ARM_NEON_IS_ALWAYS mode,
and to be able to include the same header twice in the
SK_ARM_NEON_IS_DYNAMIC case.
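The include-twice arrangement can be sketched in a single file by letting a macro stand in for SkBitmapProcState_procs.h: the same body is stamped out once with plain names and once with the _neon suffix. Function and macro names below are hypothetical, and the "NEON" primitive is ordinary C++ so the sketch builds anywhere.

```cpp
#include <cstdint>

// Stand-ins for the per-build filter primitives. In the patch the NEON one would
// come from SkBitmapProcState_filter_neon.h; here both are plain C++.
static inline uint32_t average_portable(uint32_t a, uint32_t b) { return (a + b) >> 1; }
static inline uint32_t average_neon_standin(uint32_t a, uint32_t b) { return (a + b) >> 1; }

// Plays the role of the shared procs header: the body is written once, and the
// SUFFIX and FILTER arguments play the role of the macros set before each include.
// Reads n + 1 source values to produce n outputs.
#define DEFINE_FILTER_ROW(SUFFIX, FILTER)                                  \
    void filter_row##SUFFIX(const uint32_t* src, uint32_t* dst, int n) {   \
        for (int i = 0; i < n; ++i) dst[i] = FILTER(src[i], src[i + 1]);   \
    }

DEFINE_FILTER_ROW(, average_portable)           // regular build: plain names
DEFINE_FILTER_ROW(_neon, average_neon_standin)  // NEON build: _neon suffix
```

Because the suffix is always applied to the NEON copy, both copies can coexist in one binary, which is what the dynamic case needs.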
Review URL: https://codereview.appspot.com/6449117
git-svn-id: http://skia.googlecode.com/svn/trunk@5055 2bbb7eff-a529-9590-31e7-b0007b416f81
This patch implements dynamic ARM NEON support for the functions
implemented by src/core/SkBitmapProcState_matrixProcs.cpp.
- Because the SkBitmapProcState_matrix_{clamp,repeat}.h headers
are NEON-specific, they are renamed with a _neon.h suffix, and
moved to src/opts/ (from src/core/)
- Add a new file src/opts/SkBitmapProcState_matrixProcs_neon.cpp
which implements the NEON code paths for all builds, and add
it to the 'opts_neon' static library.
- Modify SkBitmapProcState_matrixProcs.cpp to select the right
code-path depending on our build configuration. Note that in
the case where 'arm_neon == 1', we do not embed regular ARM
code paths in the final binary. Only 'arm_neon_optional == 1'
builds will contain both regular and NEON code paths at the
same time.
Note that there doesn't seem to be a simple way to move the
NEON-specific selection code that currently lives in
SkBitmapProcState_matrixProcs.cpp into src/opts/. Doing so
would require much more drastic restructuring. This is also
true of the other SkBitmapProcState source files that will
be touched in a future patch.
Review URL: https://codereview.appspot.com/6453065
git-svn-id: http://skia.googlecode.com/svn/trunk@4888 2bbb7eff-a529-9590-31e7-b0007b416f81
This patch adds minimal dynamic ARM NEON support,
i.e. the ability to probe the CPU at runtime for NEON and
provide alternate code paths when it is available.
- Add include/core/SkUtilsArm.h, which declares a few helper
macros (e.g. SK_ARM_NEON_IS_DYNAMIC), plus the handy
function 'sk_cpu_arm_has_neon()' which returns true if
the target CPU supports the ARM NEON instruction set.
Note that the header is in include/core/ because it will
have to be included from NEON-specific code under src/core/.
It would probably be more logical to put it under include/opts/
instead, but this would require moving all the NEON-specific
code under src/core/ into src/opts/, which is not trivial
due to the way the code is currently architected.
- Add src/core/SkUtilsArm.cpp which implements
'sk_cpu_arm_has_neon' for ARM-based Linux systems, only
when SK_ARM_NEON_IS_DYNAMIC is true.
(For other cases, 'sk_cpu_arm_has_neon' is an inline function
that returns a constant 'true' or 'false' value).
There is no user-accessible CPUID instruction on ARM,
so all CPU feature probing is done by parsing /proc/cpuinfo.
This is Linux-specific.
For Debug build types, the CPU probing result is printed
to the Android log (or Linux command-line) for easier
debugging.
- Create a new 'opts_neon' target (static library) which shall
contain all the NEON-specific code paths for the library.
This is necessary because -mfpu=neon also affects scalar
(non-NEON) code. Just like with -mssse3 on x86, we can't build
the rest of the library with this flag.
Note that for now, we only include memset16_neon and
memset32_neon in this library.
- Modify opts_check_arm.cpp to implement SK_ARM_NEON_IS_DYNAMIC
properly.
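Since the probe reads /proc/cpuinfo, it amounts to finding "neon" among the tokens of the Features line. A minimal sketch, assuming the usual "Features : ..." line format; the function name is hypothetical and the real SkUtilsArm.cpp may parse the file differently.

```cpp
#include <sstream>
#include <string>

// Returns true if a /proc/cpuinfo-style text has a "Features" line listing
// "neon" as a whitespace-separated token.
bool cpuinfo_has_neon(const std::string& cpuinfo) {
    std::istringstream lines(cpuinfo);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, 8, "Features") != 0) continue;
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream tokens(line.substr(colon + 1));
        std::string tok;
        while (tokens >> tok) {
            if (tok == "neon") return true;  // exact token, so "neon-board" does not match
        }
    }
    return false;
}
```

On a device, the input would be the contents of /proc/cpuinfo read into a string (e.g. with std::ifstream); matching whole tokens avoids false positives from substrings elsewhere in the file.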
Compared to a 'xoom' build, the only difference is the use of
NEON-optimized memset16/32 functions. Later patches will move
more NEON-specific code paths to 'opts_neon'.
Review URL: https://codereview.appspot.com/6247058
git-svn-id: http://skia.googlecode.com/svn/trunk@4069 2bbb7eff-a529-9590-31e7-b0007b416f81
Do this, rather than including common.gypi explicitly in all our gyp files, so that gyp files we use but do not maintain (e.g., third_party/externals/libjpeg/libjpeg.gyp) will include common.gypi too.
Review URL: https://codereview.appspot.com/5820068
git-svn-id: http://skia.googlecode.com/svn/trunk@3411 2bbb7eff-a529-9590-31e7-b0007b416f81
see http://codereview.appspot.com/4527084/
Now, to build out/Debug/SampleApp on Linux, do the following:
cd trunk/gyp
rm -rf Makefile *mk *.Makefile out
./gyp_skia -fmake --ignore-environment "--toplevel-dir=$PWD" \
-Icommon.gypi "--depth=$PWD" SampleApp.gyp
make
git-svn-id: http://skia.googlecode.com/svn/trunk@1446 2bbb7eff-a529-9590-31e7-b0007b416f81