Commit Graph

119 Commits

Author SHA1 Message Date
mtklein
20473344b2 spin off some safe parts from AVX2 CL
(reviewed here https://codereview.chromium.org/1532613002/)

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1628333003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1628333003
2016-01-25 09:26:54 -08:00
msarett
0dfffbeeec Revert of AVX 2 SrcOver blits: color32, blitmask. (patchset #24 id:450001 of https://codereview.chromium.org/1532613002/ )
Reason for revert:
Bot failures

Original issue's description:
> AVX 2 SrcOver blits: color32, blitmask.
>
> As a follow up to the SSE 4.1 CL, this should look pretty familiar.
>
> I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp.  I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later.
>
> Perf changes (relative to SSE 4.1):
> Xfermode_SrcOver:      1650 -> 1180  (0.71x)  // large opaque blit
> Xfermode_SrcOver_aa:   1794 -> 1653  (0.92x)  // large opaque + small transparent
> text_16_AA_{FF,BK,WT}: 1.72 -> 1.59  (0.92x)  // small opaque blit
> text_16_AA_88:         1.83 -> 1.77  (0.97x)  // small transparent blit
>
> This should be a big throughout win, and a small latency win.
> This should all be pixel-exact to the previous SSE 4.1 code.
>
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36

TBR=herb@google.com,mtklein@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Review URL: https://codereview.chromium.org/1632713002
2016-01-25 08:54:50 -08:00
mtklein
5d2117015e AVX 2 SrcOver blits: color32, blitmask.
As a follow up to the SSE 4.1 CL, this should look pretty familiar.

I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp.  I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later.

Perf changes (relative to SSE 4.1):
Xfermode_SrcOver:      1650 -> 1180  (0.71x)  // large opaque blit
Xfermode_SrcOver_aa:   1794 -> 1653  (0.92x)  // large opaque + small transparent
text_16_AA_{FF,BK,WT}: 1.72 -> 1.59  (0.92x)  // small opaque blit
text_16_AA_88:         1.83 -> 1.77  (0.97x)  // small transparent blit

This should be a big throughout win, and a small latency win.
This should all be pixel-exact to the previous SSE 4.1 code.

GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot

Review URL: https://codereview.chromium.org/1532613002
2016-01-25 08:37:30 -08:00
mtklein
85533968a8 Add build targets for advanced Intel instruction sets (1.5 of 3)
This is a follow-up to https://codereview.chromium.org/1290423007/,
with a couple small changes:

   - turn on AVX and AVX2 for Windows using /arch ('EnabledEnhancedInstructionSet')
   - reformat and de-conditionalize where possible / irrelevant

Picked up this while poking around in libvpx's Chrome GYPs.

And yes, AVX = 3, AVX2 = 5.  Don't even ask what 4 means...

BUG=skia:4117

Review URL: https://codereview.chromium.org/1309253002
2015-08-25 06:30:07 -07:00
mtklein
5141d90796 Add build targets for advanced Intel instruction sets (1 of 3).
CL (1 of 3) adds empty lists in our .gypi,
and builds the files in those empty lists with the appropriate flags.

CL (2 of 3) will have Chrome's GYP and GN files read these lists,
and build them with the appropriate flags.

CL (3 of 3) will add runtime detection and stub files to the lists
with empty Init_sse42(), Init_avx(), Init_avx2() methods.

After that, we should be able to use SSE 4.2, AVX, and AVX2 if desired.

Some motivation:
  - SSE 4.2 adds some sweet string-oriented methods that
    can help us write fast high quality 32-bit hashes.
  - AVX is SSE doubled, e.g. 8 floats or two SkPMFloat at a time.
  - AVX2 is SSE2 doubled, e.g. 8 pixels at a time.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1290423007
2015-08-24 10:32:02 -07:00
mtklein
b6394746ff Port SkTextureCompression opts to SkOpts
Pretty vanilla translation.  I cleaned up who calls whom a little.
Used to be utils -> opts -> utils, now it's just utils -> opts.

I may follow up with a pass over the NEON code for readability
and to clean up dead code.

This turns on NEON A8->R11EAC conversion for ARMv8.
Unit tests which now hit the NEON code still pass.
I can't find any related bench.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1273103002
2015-08-06 08:17:16 -07:00
mtklein
cd1f2daf44 add -Iinclude/private anywhere we have -Isrc/core
I'll be moving headers from src/core to include/private, so this guarantees
that anyone who was finding them via -Isrc/core can now find them via
-Iinclude/private.

This is purely mechanical, mostly to preserve my sanity, so it's likely
(harmless) overkill.

Chromium's GYP and GN builds already set -Iinclude/private for Skia builds.

BUG=skia:4126

Review URL: https://codereview.chromium.org/1265443002
2015-07-28 08:55:14 -07:00
mtklein
56b78a7a2a Revert of Lay groundwork for SkOpts. (patchset #3 id:40001 of https://codereview.chromium.org/1255193002/)
Reason for revert:
Chromium doesn't call SkGraphics::Init().  This setup won't work.

Original issue's description:
> Lay groundwork for SkOpts.
>
> This doesn't really do anything yet.  It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.
>
> BUG=skia:4117
>
> Committed: https://skia.googlesource.com/skia/+/ce2c5055cee5d5d3c9fc84c1b3eeed4b4d84a827

TBR=djsollen@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:4117

Review URL: https://codereview.chromium.org/1261743002
2015-07-27 12:03:23 -07:00
mtklein
ce2c5055ce Lay groundwork for SkOpts.
This doesn't really do anything yet.  It's just the CPU detection code, skeleton new .cpp files, and a few little .gyp tweaks.

BUG=skia:4117

Review URL: https://codereview.chromium.org/1255193002
2015-07-27 10:52:33 -07:00
mtklein
f3f9440372 Revert of Allow NEON on iOS. (patchset #3 id:40001 of https://codereview.chromium.org/1091823002/)
Reason for revert:
need to coordinate this with chrome gyps/gn

Original issue's description:
> Allow NEON on iOS.
>
> I have nanobench building and running (Color32_arm_neon) on my iPad.
>
> No public API changes.
> TBR=reed@google.com
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/e5043b7ea5170a639367b67c986568907576be4b

TBR=caryclark@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1094843005
2015-04-17 14:02:40 -07:00
mtklein
e5043b7ea5 Allow NEON on iOS.
I have nanobench building and running (Color32_arm_neon) on my iPad.

No public API changes.
TBR=reed@google.com

BUG=skia:

Review URL: https://codereview.chromium.org/1091823002
2015-04-17 06:36:53 -07:00
mtklein
a863e41d7f This section is moot: we divert ios to opts_none above.
(We never set arm_version = 7 for iOS...)

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.7-Clang-Arm7-Debug-iOS-Trybot

BUG=skia:

Review URL: https://codereview.chromium.org/1092753002
2015-04-16 12:45:38 -07:00
mtklein
a67572ff91 Remove ARM assembly memsets.
Step 1 of a zillion in the quest for NEON on iOS,
and step 1 of a different zillion in the Great Assembly Purge.

ios, arm, arm64, arm_v7, arm_v7_neon all build.

BUG=skia:

Review URL: https://codereview.chromium.org/1072063002
2015-04-09 09:16:28 -07:00
scroggo
3e5622764a Add copyright headers to remaining gyp files.
Prevents some PRESUBMIT errors.

TBR=mtklein@google.com

Review URL: https://codereview.chromium.org/1035523003
2015-03-25 10:22:41 -07:00
scroggo
df1c3373fc Don't use m32 cflag for x86_64.
When checking the skia_arch_type for "x86", instead of doing an
== compare, check if "x86" in skia_arch_type, so it will cover
both x86 and x86_64.

Except when we specifically want x86.

Set skia_arch_width based on "64" in skia_arch_type. No need to specify
in scripts.

In gyp_to_android.py, create a separate var_dict for x86_64.

BUG=skia:3419

Review URL: https://codereview.chromium.org/916113002
2015-02-12 10:48:25 -08:00
mtklein
d160192fd9 Revert of GYP groudwork for half-float opts support. (patchset #1 id:1 of https://codereview.chromium.org/915693002/)
Reason for revert:
Going to punt on 16-bit float support for now.  Can't figure out ARM 64.

Original issue's description:
> GYP groudwork for half-float opts support.
>
> This sets us up two new opts targets with the immediate goal of adding half-float (SkHalf.h) opts:
>   - opts_neon_fp16: uses hardware support on most ARM chips with NEON to do 4 conversions at a time;
>   - opts_avx: uses hardware support on Intel chips with AVX to do 8 conversions at a time.
>
> opts_avx will be a handy thing to have around later too, especially if we want to work with floats.
>
> This doesn't actually add any new source files to these libraries yet, so they're no-ops for now.
> I'll need to write a parallel change to Chrome's GN and GYPs before we can start adding sources.
>
> This also rolls GYP up to head, to get suppport for EnableEnhancedInstructionSet: '3' on Windows,
> which is how we turn on AVX there.  There's no Mac-specific flag, so we use OTHER_CPLUSPLUSFLAGS.
>
> BUG=skia:
>
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/46b80833394d7919cadf2abf2b93802141dd21c5

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/912223002
2015-02-10 18:18:18 -08:00
mtklein
46b8083339 GYP groudwork for half-float opts support.
This sets us up two new opts targets with the immediate goal of adding half-float (SkHalf.h) opts:
  - opts_neon_fp16: uses hardware support on most ARM chips with NEON to do 4 conversions at a time;
  - opts_avx: uses hardware support on Intel chips with AVX to do 8 conversions at a time.

opts_avx will be a handy thing to have around later too, especially if we want to work with floats.

This doesn't actually add any new source files to these libraries yet, so they're no-ops for now.
I'll need to write a parallel change to Chrome's GN and GYPs before we can start adding sources.

This also rolls GYP up to head, to get suppport for EnableEnhancedInstructionSet: '3' on Windows,
which is how we turn on AVX there.  There's no Mac-specific flag, so we use OTHER_CPLUSPLUSFLAGS.

BUG=skia:

TBR=reed@google.com

Review URL: https://codereview.chromium.org/915693002
2015-02-10 09:17:05 -08:00
mtklein
f7069d58fc Split src/opts source lists out of opts.gyp.
This should make it easier to keep our opts.gyp in sync with Chrome's GYP and GN.

BUG=skia:

Landing this without review as a mega-tryjob.
TBR=reed@google.com

Committed: https://skia.googlesource.com/skia/+/c98fe3aa4f8c97c462c0eb6d9106fc37e48d7f82

Review URL: https://codereview.chromium.org/870353003
2015-01-26 18:55:58 -08:00
mtklein
0933725e49 Revert of Split src/opts source lists out of opts.gyp. (patchset #1 id:1 of https://codereview.chromium.org/870353003/)
Reason for revert:
Android Makefiles broken

Original issue's description:
> Split src/opts source lists out of opts.gyp.
>
> This should make it easier to keep our opts.gyp in sync with Chrome's GYP and GN.
>
> BUG=skia:
>
> Landing this without review as a mega-tryjob.
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/c98fe3aa4f8c97c462c0eb6d9106fc37e48d7f82

TBR=mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/880783002
2015-01-26 18:15:31 -08:00
mtklein
c98fe3aa4f Split src/opts source lists out of opts.gyp.
This should make it easier to keep our opts.gyp in sync with Chrome's GYP and GN.

BUG=skia:

Landing this without review as a mega-tryjob.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/870353003
2015-01-26 18:05:37 -08:00
bungeman
2d80dd2647 Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #14 id:260001 of https://codereview.chromium.org/874863002/)
Reason for revert:
This kills Mac 10.6 bots.

FAILED: c++ -MMD -MF obj/src/opts/opts_sse4.SkBlitRow_opts_SSE4.o.d -DSK_INTERNAL -DSK_GAMMA_SRGB -DSK_GAMMA_APPLY_TO_A8 -DSK_SCALAR_TO_FLOAT_EXCLUDED -DSK_ALLOW_STATIC_GLOBAL_INITIALIZERS=1 -DSK_SUPPORT_GPU=1 -DSK_SUPPORT_OPENCL=0 -DSK_FORCE_DISTANCE_FIELD_TEXT=0 -DSK_BUILD_FOR_MAC -DSK_CRASH_HANDLER -DSK_DEVELOPER=1 -I../../src/core -I../../src/utils -I../../include/c -I../../include/config -I../../include/core -I../../include/pathops -I../../include/pipe -I../../include/utils/mac -I../../include/effects -O0 -gdwarf-2 -mmacosx-version-min=10.6 -arch x86_64 -mssse3 -Wall -Wextra -Winit-self -Wpointer-arith -Wsign-compare -Wno-unused-parameter -Wno-invalid-offsetof -msse4.1  -c ../../src/opts/SkBlitRow_opts_SSE4.cpp -o obj/src/opts/opts_sse4.SkBlitRow_opts_SSE4.o
../../src/opts/SkBlitRow_opts_SSE4.cpp:15:27: warning: x86intrin.h: No such file or directory
../../src/opts/SkBlitRow_opts_SSE4.cpp: In function 'void S32A_Opaque_BlitRow32_SSE4(SkPMColor*, const SkPMColor*, int, U8CPU)':
../../src/opts/SkBlitRow_opts_SSE4.cpp:40: error: '_mm_testz_si128' was not declared in this scope
../../src/opts/SkBlitRow_opts_SSE4.cpp:45: error: '_mm_testc_si128' was not declared in this scope

Original issue's description:
> SSE4 opaque blend using intrinsics instead of assembly.
>
> Since we had such a hard time with the assembly versions of this blit (to the
> point that we have them completely disabled everywhere), I thought I'd take
> a shot at writing a version of the blit using intrinsics.
>
> The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
> to skip the blend when the 16 src pixels we consider each loop are all opaque
> or all transparent.  _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
> all those alphas.
>
> It's worth looking to see if we can backport this type of logic to SSE2 using
> _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
>
> My local performance testing doesn't show this to be an unambiguous win
> (there are probably microbenchmarks and SKPs where we'd be better off just
> powering through the blend rather than looking at alphas), but the potential
> does seem tantalizing enough to let skiaperf vet it on the bots.  (< 1.0x is a win.)
>
> DM says it draws pixel perfect compare to the old code.
>
> Microbenchmarks:
>                bitmap_RGBA_8888_A_source_stripes_two	  14us -> 14.4us	1.03x
>              bitmap_RGBA_8888_A_source_stripes_three	14.3us -> 14.5us	1.01x
>                        bitmap_RGBA_8888_scale_bilerp	61.9us -> 62.2us	1.01x
> bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp	 102us ->  101us	0.99x
>                 bitmap_RGBA_8888_scale_rotate_bilerp	 103us ->  101us	0.99x
>                               bitmap_RGBA_8888_scale	18.4us -> 18.2us	0.99x
>              bitmap_RGBA_8888_A_scale_rotate_bicubic	  71us ->   70us	0.99x
>          bitmap_RGBA_8888_update_scale_rotate_bilerp	 103us ->  101us	0.99x
>               bitmap_RGBA_8888_A_scale_rotate_bilerp	 112us ->  109us	0.98x
>                     bitmap_RGBA_8888_update_volatile	5.72us -> 5.58us	0.98x
>                                     bitmap_RGBA_8888	5.73us -> 5.58us	0.97x
>                              bitmap_RGBA_8888_update	5.78us ->  5.6us	0.97x
>                      bitmap_RGBA_8888_A_scale_bilerp	70.7us ->   68us	0.96x
>                     bitmap_RGBA_8888_A_scale_bicubic	23.7us -> 21.8us	0.92x
>                                   bitmap_RGBA_8888_A	13.9us -> 10.9us	0.78x
>                     bitmap_RGBA_8888_A_source_opaque	  14us -> 6.29us	0.45x
>                bitmap_RGBA_8888_A_source_transparent	  14us -> 3.65us	0.26x
>
> Running over our ~70 SKP web page captures, this looks like we spend 0.7x
> the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
> be a decent predictor of real-world impact.
>
> BUG=chromium:399842
>
> Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785

TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:399842

Review URL: https://codereview.chromium.org/874033004
2015-01-26 14:32:09 -08:00
mtklein
04bc91b972 SSE4 opaque blend using intrinsics instead of assembly.
Since we had such a hard time with the assembly versions of this blit (to the
point that we have them completely disabled everywhere), I thought I'd take
a shot at writing a version of the blit using intrinsics.

The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
to skip the blend when the 16 src pixels we consider each loop are all opaque
or all transparent.  _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
all those alphas.

It's worth looking to see if we can backport this type of logic to SSE2 using
_mm_movemask_epi8, or up to 32 pixels at a time using AVX.

My local performance testing doesn't show this to be an unambiguous win
(there are probably microbenchmarks and SKPs where we'd be better off just
powering through the blend rather than looking at alphas), but the potential
does seem tantalizing enough to let skiaperf vet it on the bots.  (< 1.0x is a win.)

DM says it draws pixel perfect compare to the old code.

Microbenchmarks:
               bitmap_RGBA_8888_A_source_stripes_two	  14us -> 14.4us	1.03x
             bitmap_RGBA_8888_A_source_stripes_three	14.3us -> 14.5us	1.01x
                       bitmap_RGBA_8888_scale_bilerp	61.9us -> 62.2us	1.01x
bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp	 102us ->  101us	0.99x
                bitmap_RGBA_8888_scale_rotate_bilerp	 103us ->  101us	0.99x
                              bitmap_RGBA_8888_scale	18.4us -> 18.2us	0.99x
             bitmap_RGBA_8888_A_scale_rotate_bicubic	  71us ->   70us	0.99x
         bitmap_RGBA_8888_update_scale_rotate_bilerp	 103us ->  101us	0.99x
              bitmap_RGBA_8888_A_scale_rotate_bilerp	 112us ->  109us	0.98x
                    bitmap_RGBA_8888_update_volatile	5.72us -> 5.58us	0.98x
                                    bitmap_RGBA_8888	5.73us -> 5.58us	0.97x
                             bitmap_RGBA_8888_update	5.78us ->  5.6us	0.97x
                     bitmap_RGBA_8888_A_scale_bilerp	70.7us ->   68us	0.96x
                    bitmap_RGBA_8888_A_scale_bicubic	23.7us -> 21.8us	0.92x
                                  bitmap_RGBA_8888_A	13.9us -> 10.9us	0.78x
                    bitmap_RGBA_8888_A_source_opaque	  14us -> 6.29us	0.45x
               bitmap_RGBA_8888_A_source_transparent	  14us -> 3.65us	0.26x

Running over our ~70 SKP web page captures, this looks like we spend 0.7x
the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
be a decent predictor of real-world impact.

BUG=chromium:399842

Review URL: https://codereview.chromium.org/874863002
2015-01-26 14:06:43 -08:00
djsollen
a3a706fcd4 Cleanup android to ensure it can compile with clang
Review URL: https://codereview.chromium.org/694533002
2014-10-30 11:39:13 -07:00
djsollen
bc89329fb5 Enable the SSSE3 compile time check on all platforms (4th attempt)
BUG=skia:2746
R=bungeman@google.com, robertphillips@google.com, mtklein@google.com

Author: djsollen@google.com

Review URL: https://codereview.chromium.org/414033002
2014-07-24 13:53:56 -07:00
bungeman
2da0f85488 Revert of Enable the SSSE3 compile time check on all platforms. (https://codereview.chromium.org/403583002/)
Reason for revert:
This is blocking the roll. Chromium Windows trybots (like win_chromium_x64_rel) are crashing in the SSSE3 code (for example SkCanvasVideoRenderTest.CroppedFrame).

Original issue's description:
> Enable the SSSE3 compile time check on all platforms (3rd attempt)
>
> BUG=skia:2746
>
> Committed: https://skia.googlesource.com/skia/+/933834851f9d48fbd85b728cc92e1f0134bfaa4e

R=halcanary@google.com, mtklein@google.com, djsollen@google.com
TBR=djsollen@google.com, halcanary@google.com, mtklein@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:2746

Author: bungeman@google.com

Review URL: https://codereview.chromium.org/418523002
2014-07-23 11:28:18 -07:00
djsollen
933834851f Enable the SSSE3 compile time check on all platforms (3rd attempt)
BUG=skia:2746
R=halcanary@google.com, mtklein@google.com

Author: djsollen@google.com

Review URL: https://codereview.chromium.org/403583002
2014-07-22 07:20:18 -07:00
krajcevski
630598cbb8 Add support for NEON intrinsics to speed up texture compression. We can
now convert the time that we would have spent uploading the texture to
compressing it giving a net 50% memory savings for these things.

Committed: https://skia.googlesource.com/skia/+/bc9205be0a1094e312da098348601398c210dc5a

R=robertphillips@google.com, mtklein@google.com, kevin.petit@arm.com

Author: krajcevski@google.com

Review URL: https://codereview.chromium.org/390453002
2014-07-14 12:00:04 -07:00
krajcevski
d41dab43e6 Revert of Add support for NEON intrinsics to speed up texture compression. We can (https://codereview.chromium.org/390453002/)
Reason for revert:
Breaking chrome.

Original issue's description:
> Add support for NEON intrinsics to speed up texture compression. We can
> now convert the time that we would have spent uploading the texture to
> compressing it giving a net 50% memory savings for these things.
>
> Committed: https://skia.googlesource.com/skia/+/bc9205be0a1094e312da098348601398c210dc5a

R=robertphillips@google.com, mtklein@google.com, kevin.petit@arm.com
TBR=kevin.petit@arm.com, mtklein@google.com, robertphillips@google.com
NOTREECHECKS=true
NOTRY=true

Author: krajcevski@google.com

Review URL: https://codereview.chromium.org/384053003
2014-07-11 13:15:15 -07:00
krajcevski
bc9205be0a Add support for NEON intrinsics to speed up texture compression. We can
now convert the time that we would have spent uploading the texture to
compressing it giving a net 50% memory savings for these things.

R=robertphillips@google.com, mtklein@google.com, kevin.petit@arm.com

Author: krajcevski@google.com

Review URL: https://codereview.chromium.org/390453002
2014-07-11 12:12:27 -07:00
djordje.pesut
a26bbb95a6 MIPS: added optimizations for functions from SkBitmapProcState
gain is ~30%

following functions are optimized:
  SI8_D16_nofilter_DX
  SI8_opaque_D32_nofilter_DX

R=djsollen@google.com, teodora.petrovic@gmail.com

Author: djordje.pesut@imgtec.com

Review URL: https://codereview.chromium.org/336533003
2014-07-08 02:24:16 -07:00
henrik.smiding
5f7f9d04dc Add SSE4 version of BlurImage optimizations.
Adds an SSE4.1 version of the existing BlurImage optimizations.
Performance of blur_image_filter_* benchmarks show a 10-50%
improvement on Linux/Ubuntu Core i7.

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Committed: https://skia.googlesource.com/skia/+/2830632ce93c97ed7647b13348365ea92e4ea665

R=mtklein@google.com, reed@chromium.org

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/366593004
2014-07-07 08:05:40 -07:00
reed
82cb86f613 Revert of Add SSE4 version of BlurImage optimizations. (https://codereview.chromium.org/366593004/)
Reason for revert:
breaks linker on chrome

[04:36:09.966000] [503/5965] LIB obj\chrome\installer_util.lib
[04:36:10.466000] FAILED: C:\Users\chrome-bot\buildbot\third_party\depot_tools\python276_bin\python.exe gyp-win-tool link-with-manifests environment.x86 True skia.dll "C:\Users\chrome-bot\buildbot\third_party\depot_tools\python276_bin\python.exe gyp-win-tool link-wrapper environment.x86 False link.exe /nologo /IMPLIB:skia.dll.lib /DLL /OUT:skia.dll @skia.dll.rsp" 2 mt.exe rc.exe "obj\skia\skia.skia.dll.intermediate.manifest" obj\skia\skia.skia.dll.generated.manifest
[04:36:10.466000] skia.opts_check_x86.obj : error LNK2019: unresolved external symbol "bool __cdecl SkBoxBlurGetPlatformProcs_SSE4(void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int))" (?SkBoxBlurGetPlatformProcs_SSE4@@YA_NPAP6AXPBIHPAIHHHHH@Z222@Z) referenced in function "bool __cdecl SkBoxBlurGetPlatformProcs(void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int),void (__cdecl**)(unsigned int const *,int,unsigned int *,int,int,int,int,int))" (?SkBoxBlurGetPlatformProcs@@YA_NPAP6AXPBIHPAIHHHHH@Z222@Z)
[04:36:10.466000]
[04:36:10.466000] skia.dll : fatal error LNK1120: 1 unresolved externals

Original issue's description:
> Add SSE4 version of BlurImage optimizations.
>
> Adds an SSE4.1 version of the existing BlurImage optimizations.
> Performance of blur_image_filter_* benchmarks show a 10-50%
> improvement on Linux/Ubuntu Core i7.
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/2830632ce93c97ed7647b13348365ea92e4ea665

R=mtklein@google.com, henrik.smiding@intel.com
TBR=henrik.smiding@intel.com, mtklein@google.com
NOTREECHECKS=true
NOTRY=true

Author: reed@chromium.org

Review URL: https://codereview.chromium.org/375503003
2014-07-06 18:51:29 -07:00
henrik.smiding
2830632ce9 Add SSE4 version of BlurImage optimizations.
Adds an SSE4.1 version of the existing BlurImage optimizations.
Performance of blur_image_filter_* benchmarks show a 10-50%
improvement on Linux/Ubuntu Core i7.

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

R=mtklein@google.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/366593004
2014-07-04 04:23:17 -07:00
henrik.smiding
3bb195ef0d Add SSE4 optimization of S32A_Opaque_Blitrow
Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.

Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.

bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/289473009
2014-06-27 08:03:17 -07:00
mtklein
db6346a5b1 Revert of Add SSE4 optimization of S32A_Opaque_Blitrow (https://codereview.chromium.org/289473009/)
NOTREECHECKS=true
NOTRY=true

Reason for revert:
Valgrind bot's seeing this code use uninitialized memory, and it's somehow blocking our roll into Chrome too:

> ld: warning: could not create compact unwind for
S32A_Opaque_BlitRow32_SSE4_asm:
> stack subq instruction is too different from dwarf stack size
> [10339/10982 | 3247.792] PACKAGE FRAMEWORK "Chromium Framework.framework",
> POSTBUILDS
> FAILED: ./gyp-mac-tool package-framework "Chromium Framework.framework" A &&
> (export
> BUILT_PRODUCTS_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
> export CONFIGURATION=Release; export CONTENTS_FOLDER_PATH="Chromium
> Framework.framework/Versions/A"; export
> DYLIB_INSTALL_NAME_BASE=@executable_path/../Versions/37.0.2056.0; export
> EXECUTABLE_NAME="Chromium Framework"; export EXECUTABLE_PATH="Chromium
> Framework.framework/Versions/A/Chromium Framework"; export
> FULL_PRODUCT_NAME="Chromium Framework.framework"; export
> INFOPLIST_PATH="Chromium Framework.framework/Versions/A/Resources/Info.plist";
> export
LD_DYLIB_INSTALL_NAME="@executable_path/../Versions/37.0.2056.0/Chromium
> Framework.framework/Chromium Framework"; export MACH_O_TYPE=mh_dylib; export
> PRODUCT_NAME="Chromium Framework"; export
> PRODUCT_TYPE=com.apple.product-type.framework; export
>
SDKROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.6.sdk;
> export
>
SRCROOT=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/../../chrome;
> export SOURCE_ROOT="${SRCROOT}"; export
> TARGET_BUILD_DIR=/Volumes/data/b/build/slave/mac_gpu/build/src/out/Release;
> export TEMP_DIR="${TMPDIR}"; export
UNLOCALIZED_RESOURCES_FOLDER_PATH="Chromium
> Framework.framework/Versions/A/Resources"; export WRAPPER_NAME="Chromium
> Framework.framework"; (cd ../../chrome && ../build/mac/tweak_info_plist.py
> "--breakpad=1" "--breakpad_uploads=0" "--keystone=0" "--scm=1"
> "--branding=Chromium" && ln -fns Versions/Current/Libraries
> "${BUILT_PRODUCTS_DIR}/${WRAPPER_NAME}/Libraries" &&
> tools/build/mac/verify_order _ChromeMain
> "${BUILT_PRODUCTS_DIR}/${EXECUTABLE_PATH}"); G=$?; ((exit $G) || rm -rf
> 'Chromium Framework.framework') && exit $G) && touch "Chromium
> Framework.framework"
> tools/build/mac/verify_order: unordered symbols in
> /Volumes/data/b/build/slave/mac_gpu/build/src/out/Release/Chromium
> Framework.framework/Versions/A/Chromium Framework:
> S32A_Opaque_BlitRow32_SSE4_asm
> _S32A_Opaque_BlitRow32_SSE4_asm
> ninja: build stopped: subcommand failed.

Original issue's description:
> Add SSE4 optimization of S32A_Opaque_Blitrow
>
> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
> instruction set. Special case for when alpha is zero or opaque.
>
> Performance increase of 10%-400% compared to the existing SSE2
> optimization (measured on Silvermont architecture).
> Noticeable in ~25 different skia bench subtests, especially in
> bitmap_8888_*, repeatTile_*, and morph_*.
>
> bitmap_8888_A - 100% faster
> bitmap_8888_A_source_transparent - 250% faster
> bitmap_8888_A_source_opaque - 25% faster
> bitmap_8888_A_scale_bicubic - 75% faster
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e
>
> Committed: https://skia.googlesource.com/skia/+/b5c281e1e06af3be804309877de1dac6145686b9

R=reed@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com, mtklein@chromium.org

Author: mtklein@google.com

Review URL: https://codereview.chromium.org/336413007
2014-06-17 17:37:05 -07:00
henrik.smiding
b5c281e1e0 Add SSE4 optimization of S32A_Opaque_Blitrow
Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.

Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.

bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/289473009
2014-06-17 11:32:47 -07:00
djordje.pesut
632a4546b0 MIPS: added optimization for functions from SkBlitRow.
gain is ~40%

following function are optimized:
  S32_D565_Blend
  S32A_D565_Opaque_Dither
  S32_D565_Opaque_Dither
  S32_D565_Blend_Dither
  S32A_D565_Opaque
  S32A_D565_Blend
  S32_Blend_BlitRow32

R=djsollen@google.com, teodora.petrovic@gmail.com

Author: djordje.pesut@imgtec.com

Review URL: https://codereview.chromium.org/326913004
2014-06-11 06:56:10 -07:00
jvanverth
71804cc303 Revert of Add SSE4 optimization of S32A_Opaque_Blitrow (https://codereview.chromium.org/289473009/)
Reason for revert:
Buildbot failures on Mac 10.6 and Mac 10.7.

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com, henrik.smiding@intel.com
TBR=reed@google.com
NOTRY=True

Original issue's description:
> Add SSE4 optimization of S32A_Opaque_Blitrow
>
> Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
> instruction set. Special case for when alpha is zero or opaque.
>
> Performance increase of 10%-400% compared to the existing SSE2
> optimization (measured on Silvermont architecture).
> Noticeable in ~25 different skia bench subtests, especially in
> bitmap_8888_*, repeatTile_*, and morph_*.
>
> bitmap_8888_A - 100% faster
> bitmap_8888_A_source_transparent - 250% faster
> bitmap_8888_A_source_opaque - 25% faster
> bitmap_8888_A_scale_bicubic - 75% faster
>
> Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>
>
> Committed: https://skia.googlesource.com/skia/+/e2527b147679b0c43019fae7d59cc3777d2d097e

Author: jvanverth@google.com

Review URL: https://codereview.chromium.org/311053009
2014-06-05 08:19:23 -07:00
henrik.smiding
e2527b1476 Add SSE4 optimization of S32A_Opaque_Blitrow
Adds optimization of Skia S32A_Opaque_Blitrow blitter using SSE4.2 SIMD
instruction set. Special case for when alpha is zero or opaque.

Performance increase of 10%-400% compared to the existing SSE2
optimization (measured on Silvermont architecture).
Noticeable in ~25 different skia bench subtests, especially in
bitmap_8888_*, repeatTile_*, and morph_*.

bitmap_8888_A - 100% faster
bitmap_8888_A_source_transparent - 250% faster
bitmap_8888_A_source_opaque - 25% faster
bitmap_8888_A_scale_bicubic - 75% faster

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/289473009
2014-06-05 07:50:54 -07:00
kevin.petit
866b95d65d ARM Skia NEON patches - 38 - arm64 8888 blitters
Enable NEON on arm64 for most 8888 blitters

This patch enables NEON optimisation for the Color32, S32_Blend,
S32A_Opaque blitters on arm64.

Here are the perf improvements vs the existing code:

Color32:
========

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
|     1 |     -2.39% |     23.78% |
+-------+------------+------------+
|     2 |     -5.46% |      8.88% |
+-------+------------+------------+
|     4 |     -4.74% |      4.89% |
+-------+------------+------------+
|     8 |     67.74% |    107.12% |
+-------+------------+------------+
|    16 |     40.03% |    101.20% |
+-------+------------+------------+
|    64 |     11.09% |     98.40% |
+-------+------------+------------+
|   256 |     -2.20% |     74.81% |
+-------+------------+------------+
|  1024 |     -4.28% |     78.90% |
+-------+------------+------------+

S32_Blend:
==========

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
|     1 |      7.84% |     -6.75% |
+-------+------------+------------+
|     2 |     28.95% |     39.77% |
+-------+------------+------------+
|     4 |      5.80% |      8.26% |
+-------+------------+------------+
|     8 |      1.35% |     33.80% |
+-------+------------+------------+
|    16 |     -2.13% |     41.13% |
+-------+------------+------------+
|    64 |     -4.91% |     42.84% |
+-------+------------+------------+
|   256 |     -6.53% |     48.72% |
+-------+------------+------------+
|  1024 |     -6.65% |     46.66% |
+-------+------------+------------+

S32A_Opaque:
============

+-------+------------+------------+
| count | Cortex-A53 | Cortex-A57 |
+-------+------------+------------+
|     1 |     -7.51% |    -19.06% |
+-------+------------+------------+
|     2 |     -5.02% |    -27.70% |
+-------+------------+------------+
|     4 |     15.38% |    -21.66% |
+-------+------------+------------+
|     8 |     -0.98% |      1.05% |
+-------+------------+------------+
|    16 |     -7.35% |      3.34% |
+-------+------------+------------+
|    64 |     50.53% |     94.63% |
+-------+------------+------------+
|   256 |     71.17% |    164.10% |
+-------+------------+------------+
|  1024 |     79.58% |    197.60% |
+-------+------------+------------+

Signed-off-by: Kevin PETIT <kevin.petit@arm.com>

BUG=skia:
R=djsollen@google.com, mtklein@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/302283003
2014-06-03 10:08:07 -07:00
commit-bot@chromium.org
ab08437e39 Revert "Temporarily disable NEON on Android framework builds."
R=scroggo@google.com

Author: djsollen@google.com

Review URL: https://codereview.chromium.org/294183002

git-svn-id: http://skia.googlecode.com/svn/trunk@14844 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-05-22 13:42:34 +00:00
bungeman@google.com
daeec365ff Remove non-existent header file from Android opts.
R=djsollen@google.com

Review URL: https://codereview.chromium.org/274793004

git-svn-id: http://skia.googlecode.com/svn/trunk@14657 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-05-08 19:41:48 +00:00
djsollen@google.com
901c43c26f Temporarily disable NEON on Android framework builds.
The GCC 4.8 compiler has an AARCH64 bug that generated non-PIC output
that fails to link.

R=scroggo@google.com

Review URL: https://codereview.chromium.org/266883011

git-svn-id: http://skia.googlecode.com/svn/trunk@14597 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-05-06 19:47:07 +00:00
commit-bot@chromium.org
8c4953c6f1 Cleanup of SSE optimization files.
General cleanup of optimization files for x86/SSEx.
Renamed the opts_check_SSE2.cpp file to _x86, since it's not specific
to SSE2. Commented out the ColorRect32 optimization, since it's
disabled anyway, to make it more visible.
Also fixed a lot of indentation, inclusion guards, spelling,
copyright headers, braces, whitespace, and sorting of includes.

Author: henrik.smiding@intel.com

Signed-off-by: Henrik Smiding <henrik.smiding@intel.com>

R=reed@google.com, mtklein@google.com, tomhudson@google.com, djsollen@google.com, joakim.landberg@intel.com

Author: henrik.smiding@intel.com

Review URL: https://codereview.chromium.org/264603002

git-svn-id: http://skia.googlecode.com/svn/trunk@14464 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-04-30 14:58:46 +00:00
commit-bot@chromium.org
c524e98f1e Xfermode: SSE2 implementation of multiply_modeproc
This patch implements basics for Xfermode SSE optimization. Based on
these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
implementation for other modes will come in future. With this patch
performance of Xfermode_Multiply will improve about 45%. Here are the
data on desktop i7-3770.
before:
Xfermode_Multiply   8888:  cmsecs =     33.30   565:  cmsecs =     45.65
after:
Xfermode_Multiply   8888:  cmsecs =     17.18   565:  cmsecs =     24.87

BUG=

Committed: http://code.google.com/p/skia/source/detail?r=14006

Committed: http://code.google.com/p/skia/source/detail?r=14050

R=mtklein@google.com, robertphillips@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/202903004

git-svn-id: http://skia.googlecode.com/svn/trunk@14107 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-04-09 15:43:46 +00:00
commit-bot@chromium.org
479d1ae9b6 Fixes to Android.mk generation for arm64.
Remove warning about no optimizations for arm64 and rebaseline the
associated test.

Exclude _opts_none.cpps when building arm64, to avoid double definitions.

BUG=skia:1975
R=halcanary@google.com, djsollen@google.com

Author: scroggo@google.com

Review URL: https://codereview.chromium.org/229393002

git-svn-id: http://skia.googlecode.com/svn/trunk@14104 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-04-09 13:34:26 +00:00
commit-bot@chromium.org
77815fd74d Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/)
Reason for revert:
It looks like serialization is broken. The serialize and pipe-cross-process tests are failing and turning (at least the Ubuntu12 and Win7) bots red

Original issue's description:
> Xfermode: SSE2 implementation of multiply_modeproc
>
> This patch implements basics for Xfermode SSE optimization. Based on
> these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
> implementation for other modes will come in future. With this patch
> performance of Xfermode_Multiply will improve about 45%. Here are the
> data on desktop i7-3770.
> before:
> Xfermode_Multiply   8888:  cmsecs =     33.30   565:  cmsecs =     45.65
> after:
> Xfermode_Multiply   8888:  cmsecs =     17.18   565:  cmsecs =     24.87
>
> BUG=
>
> Committed: http://code.google.com/p/skia/source/detail?r=14006
>
> Committed: http://code.google.com/p/skia/source/detail?r=14050

R=mtklein@google.com, qiankun.miao@intel.com
TBR=mtklein@google.com, qiankun.miao@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=

Author: robertphillips@google.com

Review URL: https://codereview.chromium.org/224253003

git-svn-id: http://skia.googlecode.com/svn/trunk@14053 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-04-03 18:53:33 +00:00
commit-bot@chromium.org
c311873927 Xfermode: SSE2 implementation of multiply_modeproc
This patch implements basics for Xfermode SSE optimization. Based on
these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
implementation for other modes will come in future. With this patch
performance of Xfermode_Multiply will improve about 45%. Here are the
data on desktop i7-3770.
before:
Xfermode_Multiply   8888:  cmsecs =     33.30   565:  cmsecs =     45.65
after:
Xfermode_Multiply   8888:  cmsecs =     17.18   565:  cmsecs =     24.87

BUG=

Committed: http://code.google.com/p/skia/source/detail?r=14006

R=mtklein@google.com, robertphillips@google.com

Author: qiankun.miao@intel.com

Review URL: https://codereview.chromium.org/202903004

git-svn-id: http://skia.googlecode.com/svn/trunk@14050 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-04-03 18:26:40 +00:00
commit-bot@chromium.org
6f2d4d4679 ARM Skia NEON patches - 35 - First AArch64 support
Aarch64 support

This change contains the necessary modifications to have Skia build and
run properly on an ARMv8 processor in aarch64 execution state.

Here's a list of the changes:

 - add an arm64 target to the build system + SK_CPU_ARM64 flag

 - MatrixTest was failing when built in Release mode. Fused MAC
   instructions were generated which made some intermediate results
   more accurate. As the test relies on result comparison, the more
   precise results when compared to others led to a gap bigger than
   what was tolerated. As I don't know if some actual skia code relies
   on results being comparable, I've disabled fused MAC instruction
   with -ffp-contract=off for arm64.

 - Modify include/core/SkOnce.h to have barriers work.

 - SK_CPU_ARM64 implies SK_ARM_NEON_MODE_ALWAYS.

 - use existing Xfermode optimisations with modifications that can be
   removed in the future when toolchains are ready. Also save a few
   instructions is two Xfermodes (will apply to ARM too).

 - use existing SkBoxBlur and SkMorphology optimisations.

 - use existing SkBlitMask optimisations

 - use existing BitmapProcState and Convolution optimisations.

Future changes will include:

 - Blitters (only partialy merged upstream)

 - SkUtils (there's little value in sending asm optimisations without
   having them benchmarked on real hardware).

Signed-off-by: Kevin PETIT <kevin.petit@arm.com>

BUG=skia:

Committed: http://code.google.com/p/skia/source/detail?r=13980

R=djsollen@google.com, reed@google.com, mtklein@google.com, halcanary@google.com

Author: kevin.petit@arm.com

Review URL: https://codereview.chromium.org/143423004

git-svn-id: http://skia.googlecode.com/svn/trunk@14025 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-04-02 15:03:56 +00:00
commit-bot@chromium.org
079d298600 Revert of Xfermode: SSE2 implementation of multiply_modeproc (https://codereview.chromium.org/202903004/)
Reason for revert:
Breaking builds

Original issue's description:
> Xfermode: SSE2 implementation of multiply_modeproc
>
> This patch implements basics for Xfermode SSE optimization. Based on
> these basics, SSE2 implementation of multiply_modeproc is provided. SSE2
> implementation for other modes will come in future. With this patch
> performance of Xfermode_Multiply will improve about 45%. Here are the
> data on desktop i7-3770.
> before:
> Xfermode_Multiply   8888:  cmsecs =     33.30   565:  cmsecs =     45.65
> after:
> Xfermode_Multiply   8888:  cmsecs =     17.18   565:  cmsecs =     24.87
>
> BUG=
>
> Committed: http://code.google.com/p/skia/source/detail?r=14006

R=mtklein@google.com, qiankun.miao@intel.com
TBR=mtklein@google.com, qiankun.miao@intel.com
NOTREECHECKS=true
NOTRY=true
BUG=

Author: robertphillips@google.com

Review URL: https://codereview.chromium.org/219243009

git-svn-id: http://skia.googlecode.com/svn/trunk@14007 2bbb7eff-a529-9590-31e7-b0007b416f81
2014-04-01 14:17:44 +00:00