Commit Graph

16157 Commits

Author SHA1 Message Date
mlee
402448d681 skia: NEON version of blend32_16_row
This adds a NEON implementation of blend32_16_row
for aarch32 and aarch64.
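For reference, here is a minimal scalar sketch of what a 32-to-16 SrcOver row blend computes. It assumes premultiplied 32-bit source pixels packed as A<<24 | R<<16 | G<<8 | B and RGB565 destination pixels; Skia's real channel shifts, rounding, and the actual blend32_16_row signature are not shown here, and blend32_16_row_scalar is a hypothetical name.

```cpp
#include <cstdint>

// Exact rounding division by 255 for x in [0, 255*255].
static inline uint32_t div255(uint32_t x) {
    x += 128;
    return (x + (x >> 8)) >> 8;
}

// SrcOver of one premultiplied 32-bit src pixel onto one RGB565 dst pixel:
// out = src + dst * (255 - srcAlpha) / 255, per channel.
// Assumed (hypothetical) src layout: A<<24 | R<<16 | G<<8 | B.
static inline uint16_t blend32_16_pixel(uint32_t src, uint16_t dst) {
    uint32_t sa = src >> 24;
    uint32_t sr = (src >> 16) & 0xFF, sg = (src >> 8) & 0xFF, sb = src & 0xFF;
    // Expand 565 to 8 bits per channel by replicating the high bits.
    uint32_t r5 = dst >> 11, g6 = (dst >> 5) & 0x3F, b5 = dst & 0x1F;
    uint32_t dr = (r5 << 3) | (r5 >> 2);
    uint32_t dg = (g6 << 2) | (g6 >> 4);
    uint32_t db = (b5 << 3) | (b5 >> 2);
    uint32_t inv = 255 - sa;
    // Premultiplied src keeps each sum <= 255, so no clamping is needed.
    uint32_t r = sr + div255(dr * inv);
    uint32_t g = sg + div255(dg * inv);
    uint32_t b = sb + div255(db * inv);
    // Truncate back to 5/6/5 bits and repack.
    return static_cast<uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Hypothetical scalar row loop that a SIMD version would have to match.
void blend32_16_row_scalar(uint16_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; ++i) {
        dst[i] = blend32_16_pixel(src[i], dst[i]);
    }
}
```

A NEON version would apply the same per-channel arithmetic to several pixels per iteration; a scalar loop like this is mainly useful for checking that the vector results match.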

For performance testing,
blend32_16_row is called in the following nanobench tests:
 - Xfermode_SrcOver
 - tablebench
 - rotated_rects_bw_alternating_transparent_and_opaque_srcover
 - rotated_rects_bw_changing_transparent_srcover
 - rotated_rects_bw_same_transparent_srcover
 - luma_colorfilter_large
 - luma_colorfilter_small
 - chart_bw

I see a performance increase especially in the following two tests; the others
look similar.
I ran each test twice.

1) Xfermode_SrcOver
<org>
 - D/skia    ( 2000):    3M        57      17.3µs  17.4µs  17.4µs  17.7µs  1%
  █▃▂▃▂▂▂▁▃▂      565     Xfermode_SrcOver
 - D/skia    ( 1915):    3M        70      13.5µs  16.9µs  16.7µs  18.8µs  9%
  ▆█▄▅█▁▅▅▆▄      565     Xfermode_SrcOver

<new>
 - D/skia    ( 2000):    3M        8       11.6µs  11.8µs  12.1µs  14.4µs  7%
  ▃█▁▁▂▁▁▁▂▂      565     Xfermode_SrcOver
 - D/skia    ( 2004):    3M        62      10.3µs  12.9µs  13µs    15.2µs  11%
  █▅▅▆▁▅▅▅▇▃      565     Xfermode_SrcOver

2) luma_colorfilter_large
<org>
 - D/skia    ( 2000):  159M        8       136µs   136µs   136µs   139µs   1%
  █▃▁▂▁▁▁▁▁▁      565     luma_colorfilter_large
 - D/skia    ( 1915):  158M        2       135µs   177µs   182µs   269µs   22%
  ▆▃█▁▁▃▃▃▃▃      565     luma_colorfilter_large

<new>
 - D/skia    ( 2000):  157M        5       84.2µs  85.3µs  87.5µs  110µs   9%
  █▁▂▁▁▁▁▁▁▁      565     luma_colorfilter_large
 - D/skia    ( 2004):  159M        6       84.7µs  110µs   112µs   144µs   18%
  █▄▇▁▁▄▃▄▄▆      565     luma_colorfilter_large

Review URL: https://codereview.chromium.org/847363002
2015-01-29 06:22:41 -08:00
robertphillips
9cc2f2613a Revert of Add device space "nudge" to gpu draws (patchset #5 id:70001 of https://codereview.chromium.org/877473005/)
Reason for revert:
Chrome pixel test :(

Original issue's description:
> Add device space "nudge" to gpu draws
>
> This CL nudges all the GPU draws and clips slightly to match raster's rounding behavior for BW draws. We assume the effect will be negligible and do it for AA draws too.
>
> BUG=423834
>
> Committed: https://skia.googlesource.com/skia/+/2d55d07501c56310f97d2092d789a2bc9fa01b78

TBR=bsalomon@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=423834

Review URL: https://codereview.chromium.org/890433003
2015-01-28 17:37:33 -08:00
robertphillips
e79d7b7929 Revert of Remove 'f' from 0.05f in shader code (patchset #1 id:1 of https://codereview.chromium.org/888483002/)
Reason for revert:
Chrome pixel test

Original issue's description:
> Remove 'f' from 0.05f in shader code
>
> TBR=bsalomon@google.com
> NOTREECHECKS=true
> NOTRY=true
>
> Committed: https://skia.googlesource.com/skia/+/1726997861fac8daa8213d1a51d5c8fbe44428d1

TBR=bsalomon@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true

Review URL: https://codereview.chromium.org/890433002
2015-01-28 17:33:45 -08:00
mtklein
f4ba3219c2 Revert of DM::SKPSrc::size() reports correct size. (patchset #3 id:40001 of https://codereview.chromium.org/863243005/)
Reason for revert:
Still no good on Chrome OS bot:

http://build.chromium.org/p/client.skia/builders/Test-ChromeOS-Alex-GMA3150-x86-Release/builds/628/steps/dm/logs/stdio

Original issue's description:
> DM::SKPSrc::size() reports correct size.
>
> Also, DM::GPUSink and DM::RasterSink crop DM::Src::size() to 2048x2048.
>
> Motivation:
>   Improve PDF testing by printing the entire SKP.
>
> Source: http://crrev.com/863243004
>
> BUG=skia:3365
>
> Committed: https://skia.googlesource.com/skia/+/441b10eac09a1f44983e35da827a6b438a409e63
>
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu12-ShuttleA-GTX660-x86-Release-Trybot,Test-ChromeOS-Alex-GMA3150-x86-Release-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/d4dd58e43ca4551531ad6a9f54bfc5632ea45a80

TBR=halcanary@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:3365

Review URL: https://codereview.chromium.org/886543005
2015-01-28 15:32:24 -08:00
robertphillips
1726997861 Remove 'f' from 0.05f in shader code
TBR=bsalomon@google.com
NOTREECHECKS=true
NOTRY=true

Review URL: https://codereview.chromium.org/888483002
2015-01-28 15:19:53 -08:00
robertphillips
2d55d07501 Add device space "nudge" to gpu draws
This CL nudges all the GPU draws and clips slightly to match raster's rounding behavior for BW draws. We assume the effect will be negligible and do it for AA draws too.

BUG=423834

Review URL: https://codereview.chromium.org/877473005
2015-01-28 14:41:57 -08:00
hcm
45453c2acf authors update- add herb and reorganize
BUG=skia:

TBR=reed@google.com

NOTRY=true

Review URL: https://codereview.chromium.org/881993002
2015-01-28 14:16:43 -08:00
herb
f8dd0765c0 Make char hash dynamic when needed.
BUG=skia:

Review URL: https://codereview.chromium.org/880383002
2015-01-28 14:12:12 -08:00
mtklein
d4dd58e43c DM::SKPSrc::size() reports correct size.
Also, DM::GPUSink and DM::RasterSink crop DM::Src::size() to 2048x2048.

Motivation:
  Improve PDF testing by printing the entire SKP.

Source: http://crrev.com/863243004

BUG=skia:3365

Committed: https://skia.googlesource.com/skia/+/441b10eac09a1f44983e35da827a6b438a409e63

CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu12-ShuttleA-GTX660-x86-Release-Trybot

Review URL: https://codereview.chromium.org/863243005
2015-01-28 13:59:42 -08:00
reed
40dab98de1 Use murmur3 finisher to improve font hash efficiency.
Add dump() method to inspect glyphcache strikes.

The Murmur addition improves hash efficiency by roughly 50%.
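A sketch of the standard MurmurHash3 32-bit finalizer ("fmix32") and one way it can be used to pick a hash bucket. How Skia builds the font key before mixing is not shown here; bucket_index is a hypothetical helper assuming a power-of-two table.

```cpp
#include <cstdint>

// MurmurHash3's 32-bit finalizer. Each step (xor-shift, multiply by an odd
// constant) is invertible, so the whole function is a bijection on uint32_t,
// and every input bit influences the low output bits.
uint32_t murmur3_fmix32(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Hypothetical usage: take the low bits of the mixed hash as a bucket index.
// Without the finalizer, keys that differ only in high bits would collide.
uint32_t bucket_index(uint32_t rawHash, uint32_t tableSizePow2) {
    return murmur3_fmix32(rawHash) & (tableSizePow2 - 1);
}
```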

BUG=skia:

Review URL: https://codereview.chromium.org/877113002
2015-01-28 13:28:53 -08:00
sugoi
6af314724f Fixed clusterfuzz issue
BUG=448423

Review URL: https://codereview.chromium.org/881423002
2015-01-28 13:15:32 -08:00
jvanverth
5a3d92fca1 Use distance fields for glyphs > 256 pt, before switching to paths.
BUG=452313

Review URL: https://codereview.chromium.org/862403004
2015-01-28 13:08:41 -08:00
joshualitt
4d8da81562 GrBatchPrototype
BUG=skia:

Committed: https://skia.googlesource.com/skia/+/d15e4e45374275c045572b304c229237c4a82be4

Committed: https://skia.googlesource.com/skia/+/d5a7db4a867c7e6ccf8451a053d987b470099198

Review URL: https://codereview.chromium.org/845103005
2015-01-28 12:53:54 -08:00
mtklein
52f401ee7b Revert of DM::SKPSrc::size() reports correct size. (patchset #1 id:1 of https://codereview.chromium.org/863243005/)
Reason for revert:
OOM on 32-bit machines.

Original issue's description:
> DM::SKPSrc::size() reports correct size.
>
> Also, DM::GPUSink and DM::RasterSink crop DM::Src::size() to 2048x2048.
>
> Motivation:
>   Improve PDF testing by printing the entire SKP.
>
> Source: http://crrev.com/863243004
>
> BUG=skia:3365
>
> Committed: https://skia.googlesource.com/skia/+/441b10eac09a1f44983e35da827a6b438a409e63

TBR=halcanary@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:3365

Review URL: https://codereview.chromium.org/884743003
2015-01-28 12:04:08 -08:00
halcanary
fd4a993f8b DM::PDFSink::draw exercises multi-page PDF
BUG=skia:3365

Review URL: https://codereview.chromium.org/881343002
2015-01-28 11:45:58 -08:00
reed
9d91eb3136 add more checks for computing clamp counts, remove dead code
BUG=448299

Review URL: https://codereview.chromium.org/886473003
2015-01-28 11:44:48 -08:00
bsalomon
3bd12efdcf Fix wrapped content keys for npot textures.
Review URL: https://codereview.chromium.org/879193002
2015-01-28 11:39:48 -08:00
mtklein
9c3f17d6e8 Fold gmtoskp into DM, as --src gm --config skp.
BUG=skia:

Review URL: https://codereview.chromium.org/885733002
2015-01-28 11:35:18 -08:00
mtklein
441b10eac0 DM::SKPSrc::size() reports correct size.
Also, DM::GPUSink and DM::RasterSink crop DM::Src::size() to 2048x2048.
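A sketch of the cropping rule stated above, with each dimension clamped independently to 2048. ISize, kMaxSinkDim, and crop_for_sink are hypothetical stand-ins, not DM's actual types or functions.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for SkISize.
struct ISize { int32_t width, height; };

// Hypothetical constant for the 2048x2048 cap applied by the GPU/raster sinks.
constexpr int32_t kMaxSinkDim = 2048;

// Clamp each dimension on its own: a tall SKP keeps its full width and is
// cut off at 2048 pixels of height (cropped, not scaled).
ISize crop_for_sink(ISize src) {
    return { std::min(src.width, kMaxSinkDim), std::min(src.height, kMaxSinkDim) };
}
```

Under this reading, a PDF sink could still use the full reported size while raster and GPU sinks stay bounded.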

Motivation:
  Improve PDF testing by printing the entire SKP.

Source: http://crrev.com/863243004

BUG=skia:3365

Review URL: https://codereview.chromium.org/863243005
2015-01-28 11:12:25 -08:00
joshualitt
5ce33c17af dstread gm
TBR=
BUG=skia:

Review URL: https://codereview.chromium.org/883053002
2015-01-28 11:08:01 -08:00
senorblanco
772604c214 Add a flag to flush the canvases during SkMultiPictureDraw::draw().
This is necessary for multisampling, so that each multisampled render
target resolves before Chrome's compositor attempts to draw the
texture.

BUG=skia:

Review URL: https://codereview.chromium.org/878653004
2015-01-28 11:01:06 -08:00
fmalita
3dc40ac9f9 Conservative SkTextBlob bounds.
Compute cheaper/more conservative text blob bounds based on the typeface
maximum glyph bbox.
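A minimal sketch of conservative run bounds under stated assumptions: fontBBox is the typeface's maximum glyph box in em-normalized units (y down, so top is negative), and every glyph in the run shares that box. Rect, Point, and conservative_run_bounds are hypothetical names, not SkTextBlob's API.

```cpp
#include <vector>

struct Rect { float left, top, right, bottom; };
struct Point { float x, y; };

// Union of the max glyph box placed at every glyph origin. This needs only
// the extreme origins, never the individual glyph outlines.
Rect conservative_run_bounds(const std::vector<Point>& origins, Rect fontBBox, float textSize) {
    if (origins.empty()) return {0, 0, 0, 0};
    // Scale the em-normalized box to the text size.
    Rect glyph = { fontBBox.left * textSize, fontBBox.top * textSize,
                   fontBBox.right * textSize, fontBBox.bottom * textSize };
    float minX = origins[0].x, maxX = minX;
    float minY = origins[0].y, maxY = minY;
    for (const Point& p : origins) {
        if (p.x < minX) minX = p.x;
        if (p.x > maxX) maxX = p.x;
        if (p.y < minY) minY = p.y;
        if (p.y > maxY) maxY = p.y;
    }
    return { minX + glyph.left, minY + glyph.top, maxX + glyph.right, maxY + glyph.bottom };
}
```

The result can only be larger than the exact bounds, which is the trade-off the commit describes: cheaper to compute, less tight.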

BUG=chromium:451401
R=reed@google.com,bungeman@google.com

Review URL: https://codereview.chromium.org/886473002
2015-01-28 10:56:06 -08:00
skia.buildbots
cb30bf63b5 Update SKP version
Automatic commit by the RecreateSKPs bot.

TBR=

Review URL: https://codereview.chromium.org/879203002
2015-01-28 10:56:00 -08:00
mtklein
674cd7e05c Add a script to fetch the latest SKPs.
I keep forgetting how best to do this.

NOTRY=true
NOTREECHECKS=true

Review URL: https://codereview.chromium.org/880283002
2015-01-28 09:39:10 -08:00
mtklein
073720e897 add a paranoid assert
NOTREECHECKS=true

BUG=chromium:399842

Review URL: https://codereview.chromium.org/881253003
2015-01-28 07:20:28 -08:00
joshualitt
c2893c5e38 Revert of GrBatchPrototype (patchset #32 id:630001 of https://codereview.chromium.org/845103005/)
Reason for revert:
One last try to fix mac perf regression

Original issue's description:
> GrBatchPrototype
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/d15e4e45374275c045572b304c229237c4a82be4
>
> Committed: https://skia.googlesource.com/skia/+/d5a7db4a867c7e6ccf8451a053d987b470099198

TBR=bsalomon@google.com,kkinnunen@nvidia.com,joshualitt@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/877393002
2015-01-28 06:54:30 -08:00
joshualitt
8b0a05ae44 whitespace
NOTREECHECKS=True
NOTRY=True
TBR=
BUG=skia:

Review URL: https://codereview.chromium.org/885443003
2015-01-27 16:27:12 -08:00
mtklein
25b76119e9 Revert of patch from issue 885453002 at patchset 20001 (http://crrev.com/885453002#ps20001) (patchset #1 id:1 of https://codereview.chromium.org/881953002/)
Reason for revert:
==32435==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x621000d8cd00

Lots of info here:
http://build.chromium.org/p/client.skia/builders/Test-Ubuntu13.10-GCE-NoGPU-x86_64-Debug-ASAN/builds/1198/steps/dm/logs/stdio

Original issue's description:
> patch from issue 885453002 at patchset 20001 (http://crrev.com/885453002#ps20001)
>
> Make the char cache dynamic in SkGlyphCache
> because it is rarely used.
>
> Landing on behalf of Herb.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/95faa61d63a6f62916f6f7be58c4624da8357e3b

TBR=mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/881023003
2015-01-27 15:39:19 -08:00
joshualitt
d5a7db4a86 GrBatchPrototype
BUG=skia:

Committed: https://skia.googlesource.com/skia/+/d15e4e45374275c045572b304c229237c4a82be4

Review URL: https://codereview.chromium.org/845103005
2015-01-27 15:39:06 -08:00
mtklein
95faa61d63 patch from issue 885453002 at patchset 20001 (http://crrev.com/885453002#ps20001)
Make the char cache dynamic in SkGlyphCache
because it is rarely used.
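A sketch of a lazily allocated lookup table, assuming a fixed-size array keyed by char code. CharEntry, LazyCharCache, and kSize are hypothetical; the real SkGlyphCache layout is not shown. Holding the array in std::unique_ptr<T[]> pairs new[] with delete[], which avoids the "operator new [] vs operator delete" mismatch that AddressSanitizer reports.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical entry type.
struct CharEntry { int32_t id = -1; uint16_t glyph = 0; };

class LazyCharCache {
public:
    static constexpr int kSize = 256;  // hypothetical table size

    // The table is allocated on first lookup, so caches that never
    // use it pay only for one null pointer.
    CharEntry& lookup(int32_t charCode) {
        if (!fTable) {
            fTable.reset(new CharEntry[kSize]);  // released with delete[]
        }
        return fTable[static_cast<uint32_t>(charCode) % kSize];
    }

    bool allocated() const { return fTable != nullptr; }

private:
    std::unique_ptr<CharEntry[]> fTable;
};
```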

Landing on behalf of Herb.

BUG=skia:

Review URL: https://codereview.chromium.org/881953002
2015-01-27 15:10:17 -08:00
mtklein
62bd1a69ea add -r to DM
$ out/Debug/dm -w good
$ out/Debug/dm -r good -w bad && echo "hooray no diffs!"

BUG=skia:

Review URL: https://codereview.chromium.org/863093003
2015-01-27 14:46:26 -08:00
mtklein
6dbfb21a6c SSE4 opaque blend using intrinsics instead of assembly.
Since we had such a hard time with the assembly versions of this blit (to the
point that we have them completely disabled everywhere), I thought I'd take
a shot at writing a version of the blit using intrinsics.

The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
to skip the blend when the 16 src pixels we consider each loop are all opaque
or all transparent.  _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
all those alphas.
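A minimal sketch of the ptest idea on one 128-bit register (4 pixels), assuming alpha lives in the top byte of each 32-bit pixel. The commit considers 16 pixels per loop and uses _mm_shuffle_epi8 to gather alphas; this sketch skips that step. AlphaRun and classify4 are hypothetical names, and the target attribute assumes GCC or Clang on x86.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_testz_si128, _mm_testc_si128
#include <cstdint>

enum class AlphaRun { kAllTransparent, kAllOpaque, kMixed };

__attribute__((target("sse4.1")))
AlphaRun classify4(const uint32_t* src) {
    __m128i px   = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    __m128i mask = _mm_set1_epi32(static_cast<int>(0xFF000000u));  // alpha bytes
    // ptest ZF: (px & mask) == 0, so every alpha is 0 and dst is unchanged.
    if (_mm_testz_si128(px, mask)) return AlphaRun::kAllTransparent;
    // ptest CF: (~px & mask) == 0, so every alpha is 0xFF and src is copied.
    if (_mm_testc_si128(px, mask)) return AlphaRun::kAllOpaque;
    return AlphaRun::kMixed;  // only this case needs the full blend
}
```

A blit loop would copy src on kAllOpaque, skip on kAllTransparent, and blend only on kMixed.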

It's worth looking to see if we can backport this type of logic to SSE2 using
_mm_movemask_epi8, or up to 32 pixels at a time using AVX.

My local performance testing doesn't show this to be an unambiguous win
(there are probably microbenchmarks and SKPs where we'd be better off just
powering through the blend rather than looking at alphas), but the potential
does seem tantalizing enough to let skiaperf vet it on the bots.  (< 1.0x is a win.)

DM says it draws pixel-perfect compared to the old code.

Microbenchmarks:
               bitmap_RGBA_8888_A_source_stripes_two	  14us -> 14.4us	1.03x
             bitmap_RGBA_8888_A_source_stripes_three	14.3us -> 14.5us	1.01x
                       bitmap_RGBA_8888_scale_bilerp	61.9us -> 62.2us	1.01x
bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp	 102us ->  101us	0.99x
                bitmap_RGBA_8888_scale_rotate_bilerp	 103us ->  101us	0.99x
                              bitmap_RGBA_8888_scale	18.4us -> 18.2us	0.99x
             bitmap_RGBA_8888_A_scale_rotate_bicubic	  71us ->   70us	0.99x
         bitmap_RGBA_8888_update_scale_rotate_bilerp	 103us ->  101us	0.99x
              bitmap_RGBA_8888_A_scale_rotate_bilerp	 112us ->  109us	0.98x
                    bitmap_RGBA_8888_update_volatile	5.72us -> 5.58us	0.98x
                                    bitmap_RGBA_8888	5.73us -> 5.58us	0.97x
                             bitmap_RGBA_8888_update	5.78us ->  5.6us	0.97x
                     bitmap_RGBA_8888_A_scale_bilerp	70.7us ->   68us	0.96x
                    bitmap_RGBA_8888_A_scale_bicubic	23.7us -> 21.8us	0.92x
                                  bitmap_RGBA_8888_A	13.9us -> 10.9us	0.78x
                    bitmap_RGBA_8888_A_source_opaque	  14us -> 6.29us	0.45x
               bitmap_RGBA_8888_A_source_transparent	  14us -> 3.65us	0.26x

Running over our ~70 SKP web page captures, this looks like we spend 0.7x
the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
be a decent predictor of real-world impact.

BUG=chromium:399842

Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785

CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot

Review URL: https://codereview.chromium.org/874863002
2015-01-27 14:35:18 -08:00
bungeman
8ece6eb37b Remove unused methods from SkScalerContext.
The methods getLocalMatrixWithoutTextSize and
getSingleMatrixWithoutTextSize on SkScalerContext were added as a
temporary measure for CoreText issues. Now that the CoreText
SkScalerContext is using other means to fix these issues more completely,
remove these now unused methods.

Review URL: https://codereview.chromium.org/883833002
2015-01-27 11:01:43 -08:00
bungeman
52b64b45e9 SkFontHost_FreeType takes advantage of SkStreamAsset.
With recent changes, SkTypeface now deals in SkStreamAsset instead of
SkStream. Take advantage of this for performance with FreeType.

Review URL: https://codereview.chromium.org/882763002
2015-01-27 10:41:17 -08:00
bsalomon
e167f9660c Fix GPU resource cache related assertions.
Review URL: https://codereview.chromium.org/879963003
2015-01-27 09:56:04 -08:00
djsollen
f379ad3429 Setup Android framework builds to use the appropriate shared lib defines.
Review URL: https://codereview.chromium.org/864043005
2015-01-27 09:01:01 -08:00
halcanary
f77365f43e sk_tool_utils::draw_checkerboard uses SkXfermode::kSrc_Mode to fix valgrind error.
Review URL: https://codereview.chromium.org/877103002
2015-01-27 08:38:35 -08:00
jvanverth
fdf7ccc201 Use highp for distance field texture coordinates.
Addresses issues with wavy text (probably due to low-precision aliasing)
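A rough illustration of why precision can matter here, assuming mediump floats with only the minimum relative precision of 2^-10 that GLSL ES 1.00 guarantees (11 significant bits). The helper below simulates that rounding and is a hypothetical model, not what any particular GPU does.

```cpp
#include <cmath>

// Round x to `bits` significant bits (including the leading 1), ignoring
// exponent range limits. bits = 11 models minimum mediump precision.
double round_to_significant_bits(double x, int bits) {
    if (x == 0.0) return 0.0;
    int exp = 0;
    double m = std::frexp(x, &exp);        // x = m * 2^exp, 0.5 <= |m| < 1
    double scale = std::ldexp(1.0, bits);  // 2^bits
    return std::ldexp(std::nearbyint(m * scale) / scale, exp);
}
```

For texture coordinates in [0.5, 1), this step is 2^-11. With a hypothetical 1024-texel-wide atlas (texel width 2^-10), that leaves only two representable positions per texel, which is coarse enough to distort a distance-field edge.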

BUG=429080

Review URL: https://codereview.chromium.org/879603003
2015-01-27 08:19:33 -08:00
reed
776c0cd955 fix gm to not rely on SkColor's swizzle == SkPMColor's
BUG=skia:3361

Review URL: https://codereview.chromium.org/873983009
2015-01-27 07:26:51 -08:00
joshualitt
ca0a1799ff Revert of GrBatchPrototype (patchset #30 id:570001 of https://codereview.chromium.org/845103005/)
Reason for revert:
creates large performance regression

Original issue's description:
> GrBatchPrototype
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/d15e4e45374275c045572b304c229237c4a82be4

TBR=bsalomon@google.com,joshualitt@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/862823004
2015-01-27 06:41:33 -08:00
robertphillips
7defaa6c4a Add ClipDrawMatch SampleApp slide
This slide can be used to find and diagnose discrepancies between BW clipping and drawing.

BUG=skia:423834

Review URL: https://codereview.chromium.org/872363003
2015-01-27 06:17:22 -08:00
bungeman
5f213d9627 SkTypeface to use SkStreamAsset.
In practice, SkTypeface already requires typeface streams to support
SkStreamAsset, and all users are already supplying them.

Review URL: https://codereview.chromium.org/869763002
2015-01-27 05:39:10 -08:00
reed
dc14fa4ec7 speedup mipmap building
mipmap_build benchmark:

before: 3.36ms
after:  2.20ms

BUG=skia:

Review URL: https://codereview.chromium.org/873393002
2015-01-27 05:01:50 -08:00
kkinnunen
36c57dfb4f Make stencil buffers uncached for uncached render target textures
Make new stencil buffers of uncached render target textures not affect the
cache budgets. This is consistent with render buffer storage of uncached
render target textures.

This affects only newly created stencil buffers. An uncached render target
might still receive a cached stencil buffer if one is available in the
cache.

BUG=skia:3119
BUG=skia:3301

Review URL: https://codereview.chromium.org/859013002
2015-01-27 00:30:18 -08:00
skia.buildbots
7f7036ab3f Update SKP version
Automatic commit by the RecreateSKPs bot.

TBR=

Review URL: https://codereview.chromium.org/882613002
2015-01-26 23:08:55 -08:00
mtklein
f7069d58fc Split src/opts source lists out of opts.gyp.
This should make it easier to keep our opts.gyp in sync with Chrome's GYP and GN.

BUG=skia:

Landing this without review as a mega-tryjob.
TBR=reed@google.com

Committed: https://skia.googlesource.com/skia/+/c98fe3aa4f8c97c462c0eb6d9106fc37e48d7f82

Review URL: https://codereview.chromium.org/870353003
2015-01-26 18:55:58 -08:00
mtklein
0933725e49 Revert of Split src/opts source lists out of opts.gyp. (patchset #1 id:1 of https://codereview.chromium.org/870353003/)
Reason for revert:
Android Makefiles broken

Original issue's description:
> Split src/opts source lists out of opts.gyp.
>
> This should make it easier to keep our opts.gyp in sync with Chrome's GYP and GN.
>
> BUG=skia:
>
> Landing this without review as a mega-tryjob.
> TBR=reed@google.com
>
> Committed: https://skia.googlesource.com/skia/+/c98fe3aa4f8c97c462c0eb6d9106fc37e48d7f82

TBR=mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/880783002
2015-01-26 18:15:31 -08:00
mtklein
c98fe3aa4f Split src/opts source lists out of opts.gyp.
This should make it easier to keep our opts.gyp in sync with Chrome's GYP and GN.

BUG=skia:

Landing this without review as a mega-tryjob.
TBR=reed@google.com

Review URL: https://codereview.chromium.org/870353003
2015-01-26 18:05:37 -08:00
bungeman
2d80dd2647 Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #14 id:260001 of https://codereview.chromium.org/874863002/)
Reason for revert:
This kills Mac 10.6 bots.

FAILED: c++ -MMD -MF obj/src/opts/opts_sse4.SkBlitRow_opts_SSE4.o.d -DSK_INTERNAL -DSK_GAMMA_SRGB -DSK_GAMMA_APPLY_TO_A8 -DSK_SCALAR_TO_FLOAT_EXCLUDED -DSK_ALLOW_STATIC_GLOBAL_INITIALIZERS=1 -DSK_SUPPORT_GPU=1 -DSK_SUPPORT_OPENCL=0 -DSK_FORCE_DISTANCE_FIELD_TEXT=0 -DSK_BUILD_FOR_MAC -DSK_CRASH_HANDLER -DSK_DEVELOPER=1 -I../../src/core -I../../src/utils -I../../include/c -I../../include/config -I../../include/core -I../../include/pathops -I../../include/pipe -I../../include/utils/mac -I../../include/effects -O0 -gdwarf-2 -mmacosx-version-min=10.6 -arch x86_64 -mssse3 -Wall -Wextra -Winit-self -Wpointer-arith -Wsign-compare -Wno-unused-parameter -Wno-invalid-offsetof -msse4.1  -c ../../src/opts/SkBlitRow_opts_SSE4.cpp -o obj/src/opts/opts_sse4.SkBlitRow_opts_SSE4.o
../../src/opts/SkBlitRow_opts_SSE4.cpp:15:27: warning: x86intrin.h: No such file or directory
../../src/opts/SkBlitRow_opts_SSE4.cpp: In function 'void S32A_Opaque_BlitRow32_SSE4(SkPMColor*, const SkPMColor*, int, U8CPU)':
../../src/opts/SkBlitRow_opts_SSE4.cpp:40: error: '_mm_testz_si128' was not declared in this scope
../../src/opts/SkBlitRow_opts_SSE4.cpp:45: error: '_mm_testc_si128' was not declared in this scope

Original issue's description:
> SSE4 opaque blend using intrinsics instead of assembly.
>
> Since we had such a hard time with the assembly versions of this blit (to the
> point that we have them completely disabled everywhere), I thought I'd take
> a shot at writing a version of the blit using intrinsics.
>
> The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
> to skip the blend when the 16 src pixels we consider each loop are all opaque
> or all transparent.  _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
> all those alphas.
>
> It's worth looking to see if we can backport this type of logic to SSE2 using
> _mm_movemask_epi8, or up to 32 pixels at a time using AVX.
>
> My local performance testing doesn't show this to be an unambiguous win
> (there are probably microbenchmarks and SKPs where we'd be better off just
> powering through the blend rather than looking at alphas), but the potential
> does seem tantalizing enough to let skiaperf vet it on the bots.  (< 1.0x is a win.)
>
> DM says it draws pixel-perfect compared to the old code.
>
> Microbenchmarks:
>                bitmap_RGBA_8888_A_source_stripes_two	  14us -> 14.4us	1.03x
>              bitmap_RGBA_8888_A_source_stripes_three	14.3us -> 14.5us	1.01x
>                        bitmap_RGBA_8888_scale_bilerp	61.9us -> 62.2us	1.01x
> bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp	 102us ->  101us	0.99x
>                 bitmap_RGBA_8888_scale_rotate_bilerp	 103us ->  101us	0.99x
>                               bitmap_RGBA_8888_scale	18.4us -> 18.2us	0.99x
>              bitmap_RGBA_8888_A_scale_rotate_bicubic	  71us ->   70us	0.99x
>          bitmap_RGBA_8888_update_scale_rotate_bilerp	 103us ->  101us	0.99x
>               bitmap_RGBA_8888_A_scale_rotate_bilerp	 112us ->  109us	0.98x
>                     bitmap_RGBA_8888_update_volatile	5.72us -> 5.58us	0.98x
>                                     bitmap_RGBA_8888	5.73us -> 5.58us	0.97x
>                              bitmap_RGBA_8888_update	5.78us ->  5.6us	0.97x
>                      bitmap_RGBA_8888_A_scale_bilerp	70.7us ->   68us	0.96x
>                     bitmap_RGBA_8888_A_scale_bicubic	23.7us -> 21.8us	0.92x
>                                   bitmap_RGBA_8888_A	13.9us -> 10.9us	0.78x
>                     bitmap_RGBA_8888_A_source_opaque	  14us -> 6.29us	0.45x
>                bitmap_RGBA_8888_A_source_transparent	  14us -> 3.65us	0.26x
>
> Running over our ~70 SKP web page captures, this looks like we spend 0.7x
> the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
> be a decent predictor of real-world impact.
>
> BUG=chromium:399842
>
> Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785

TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:399842

Review URL: https://codereview.chromium.org/874033004
2015-01-26 14:32:09 -08:00
bungeman
6bdc9cd003 Add sbix font to coloremoji gm.
Review URL: https://codereview.chromium.org/797043002
2015-01-26 14:08:52 -08:00