skia2/gyp
mtklein 04bc91b972 SSE4 opaque blend using intrinsics instead of assembly.
Since we had such a hard time with the assembly versions of this blit (to the
point that we have them completely disabled everywhere), I thought I'd take
a shot at writing a version of the blit using intrinsics.

The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*)
to skip the blend when the 16 src pixels we consider each loop are all opaque
or all transparent.  _mm_shuffle_epi8 from SSSE3 also lends a hand to extract
all those alphas.
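
To make the skip logic above concrete, here is a minimal scalar sketch of the control flow only. It is not the code from this change: the real version does the all-opaque / all-transparent check with _mm_test* on SSE registers, while this model uses plain loops. The names (alpha_of, src_over, classify, blit_row), the pixel layout (premultiplied, alpha in the top byte), and the approximate divide in src_over are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout for this sketch: 32-bit premultiplied pixels, alpha in the top byte.
static inline uint32_t alpha_of(uint32_t px) { return px >> 24; }

// Premultiplied src-over for one pixel: dst = src + dst * (256 - srcA) / 256.
// The shift-based divide is an approximation chosen for this sketch.
static inline uint32_t src_over(uint32_t src, uint32_t dst) {
    uint32_t scale = 256 - alpha_of(src);
    uint32_t rb = (((dst & 0x00FF00FFu) * scale) >> 8) & 0x00FF00FFu;
    uint32_t ag = (((dst >> 8) & 0x00FF00FFu) * scale) & 0xFF00FF00u;
    return src + (rb | ag);
}

enum class Chunk { AllTransparent, AllOpaque, Mixed };

// Scalar stand-in for the ptest check: AND and OR all alphas in the chunk.
static Chunk classify(const uint32_t* src, size_t n) {
    uint32_t all_and = 0xFF, all_or = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t a = alpha_of(src[i]);
        all_and &= a;
        all_or  |= a;
    }
    if (all_or == 0)     return Chunk::AllTransparent;
    if (all_and == 0xFF) return Chunk::AllOpaque;
    return Chunk::Mixed;
}

// Blend src over dst, looking at 16 src pixels per loop iteration.
void blit_row(uint32_t* dst, const uint32_t* src, size_t count) {
    assert(count == 0 || (dst != nullptr && src != nullptr));
    const size_t kChunk = 16;
    while (count >= kChunk) {
        switch (classify(src, kChunk)) {
            case Chunk::AllTransparent:
                break;  // Nothing to draw, dst is left untouched.
            case Chunk::AllOpaque:
                std::memcpy(dst, src, kChunk * sizeof(uint32_t));  // src replaces dst.
                break;
            case Chunk::Mixed:
                for (size_t i = 0; i < kChunk; i++) dst[i] = src_over(src[i], dst[i]);
                break;
        }
        src += kChunk;
        dst += kChunk;
        count -= kChunk;
    }
    // Fewer than 16 pixels left: just blend them.
    for (size_t i = 0; i < count; i++) dst[i] = src_over(src[i], dst[i]);
}
```

With this src_over, both fast paths give the same result the full blend would: an opaque src scales dst to zero, and a transparent premultiplied src is zero and scales dst by 256/256. So the skip only saves work and does not change pixels in this sketch.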

It's worth looking to see if we can backport this type of logic to SSE2 using
_mm_movemask_epi8, or up to 32 pixels at a time using AVX.

My local performance testing doesn't show this to be an unambiguous win
(there are probably microbenchmarks and SKPs where we'd be better off just
powering through the blend rather than looking at alphas), but the potential
does seem tantalizing enough to let skiaperf vet it on the bots.  (< 1.0x is a win.)

DM says it draws pixel perfect compared to the old code.

Microbenchmarks:
               bitmap_RGBA_8888_A_source_stripes_two	  14us -> 14.4us	1.03x
             bitmap_RGBA_8888_A_source_stripes_three	14.3us -> 14.5us	1.01x
                       bitmap_RGBA_8888_scale_bilerp	61.9us -> 62.2us	1.01x
bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp	 102us ->  101us	0.99x
                bitmap_RGBA_8888_scale_rotate_bilerp	 103us ->  101us	0.99x
                              bitmap_RGBA_8888_scale	18.4us -> 18.2us	0.99x
             bitmap_RGBA_8888_A_scale_rotate_bicubic	  71us ->   70us	0.99x
         bitmap_RGBA_8888_update_scale_rotate_bilerp	 103us ->  101us	0.99x
              bitmap_RGBA_8888_A_scale_rotate_bilerp	 112us ->  109us	0.98x
                    bitmap_RGBA_8888_update_volatile	5.72us -> 5.58us	0.98x
                                    bitmap_RGBA_8888	5.73us -> 5.58us	0.97x
                             bitmap_RGBA_8888_update	5.78us ->  5.6us	0.97x
                     bitmap_RGBA_8888_A_scale_bilerp	70.7us ->   68us	0.96x
                    bitmap_RGBA_8888_A_scale_bicubic	23.7us -> 21.8us	0.92x
                                  bitmap_RGBA_8888_A	13.9us -> 10.9us	0.78x
                    bitmap_RGBA_8888_A_source_opaque	  14us -> 6.29us	0.45x
               bitmap_RGBA_8888_A_source_transparent	  14us -> 3.65us	0.26x

Running over our ~70 SKP web page captures, this looks like we spend 0.7x
the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should
be a decent predictor of real-world impact.

BUG=chromium:399842

Review URL: https://codereview.chromium.org/874863002
2015-01-26 14:06:43 -08:00
android_deps.gyp Reland "Gyp file changes for the android framework." 2014-02-05 16:35:12 +00:00
android_framework_lib.gyp Updates to gyp files for building Android.mk 2014-02-28 16:07:39 +00:00
android_output.gyp Change how SkDebugf is sent to stdout on Android. 2014-12-10 10:23:06 -08:00
android_system.gyp Update DEPS and GYP to use the copy of Android in platform_tools. 2013-05-02 12:27:21 +00:00
angle.gyp Try to roll angle again. 2014-06-17 17:28:17 -04:00
animator.gyp rename SkDrawable to SkADrawable 2014-11-26 08:50:45 -08:00
apptype_console.gypi Change how SkDebugf is sent to stdout on Android. 2014-12-10 10:23:06 -08:00
bench.gyp Revert of nanobench: lazily decode bitmaps in .skps. (patchset #1 id:1 of https://codereview.chromium.org/743613005/) 2014-11-25 14:57:26 -08:00
bench.gypi add bench for building mipmaps 2015-01-26 12:28:54 -08:00
canvas_state_lib.gyp Run CanvasState test across a library boundary. 2014-07-22 12:38:55 -07:00
chromeos_deps.gyp GYP Changes and Scripts for Compiling Skia for ChromeOS 2013-06-11 15:52:19 +00:00
common_conditions.gypi Update compiler warning flags 2015-01-23 07:01:26 -08:00
common_variables.gypi WAE on Macs too. That leaves only Android framework builds. 2014-12-15 10:38:42 -08:00
common.gypi remove dead code for scalar type 2015-01-18 11:19:33 -08:00
core.gyp Cleanup the XML directory in public includes. 2014-11-14 05:52:50 -08:00
core.gypi initial preroll api 2015-01-25 10:33:58 -08:00
debugger.gyp debugger: Make draw command image widget resize 2014-12-30 23:03:56 -08:00
dm.gyp Revert of Make nanobench and dm be usable from Chromium build (patchset #5 id:80001 of https://codereview.chromium.org/657373002/) 2014-11-13 08:06:40 -08:00
dm.gypi More natural way to serialize GPU tasks and tests. 2015-01-21 15:50:13 -08:00
effects.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
effects.gypi Revert of remove unused SkAvoidXfermode (patchset #2 id:20001 of https://codereview.chromium.org/860583002/) 2015-01-20 06:33:14 -08:00
etc1.gyp Simple PKM image decoder. 2014-05-22 18:40:29 +00:00
everything.gyp Cleanup: Delete webtry.gyp 2014-11-20 18:20:06 -08:00
experimental.gyp experimental/skp_to_pdf_md5 optionally also outputs pdf files 2015-01-24 13:04:57 -08:00
FileReaderApp.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
flags.gyp tool --help alphabetizes command line flags 2015-01-18 10:39:25 -08:00
freetype.gyp Clean up FreeType code for 2.3.8. 2014-11-21 13:18:34 -08:00
freetype.gypi Sanitizing source files in Housekeeper-Nightly 2013-08-21 07:01:29 +00:00
giflib.gyp Sanitizing source files in Housekeeper-Nightly 2014-02-25 03:05:18 +00:00
gmslides.gypi GrBatchPrototype 2015-01-26 13:30:10 -08:00
gpu.gyp Apply the layer's image filter to the hoisted image 2014-12-11 08:20:31 -08:00
gpu.gypi GrBatchPrototype 2015-01-26 13:30:10 -08:00
gputest.gyp Cleanup GrContextFactory and make it's subclasses private 2014-11-13 11:12:41 -08:00
images.gyp add ImageGenerator::NewFromData to porting layer 2015-01-07 18:04:45 -08:00
iOSShell.gyp Revert of Fix build for iOS after "Make nanobench and dm be usable from Chromium build" (patchset #1 id:1 of https://codereview.chromium.org/716413003/) 2014-11-13 07:58:01 -08:00
jsoncpp.gyp Roll jsoncpp, drop dependency on Chromium overrides. 2014-08-19 07:21:00 -07:00
ktx.gyp Pass compressed blitters to our mask drawing algorithm 2014-08-07 08:15:14 -07:00
libjpeg.gyp Build Skia for a bare-bones embedded Linux system. 2014-02-24 20:22:34 +00:00
libpng.gyp Build Skia for a bare-bones embedded Linux system. 2014-02-24 20:22:34 +00:00
libwebp.gyp Rolling libwebp broke our iOS builds. Silence warnings instead. 2014-12-15 12:59:07 -08:00
lua.gyp Build Skia for a bare-bones embedded Linux system. 2014-02-24 20:22:34 +00:00
most.gyp Revert "Revert "delete old things!"" 2015-01-20 10:23:02 -08:00
nacl.gyp Prepare skia for shared library build on android 2013-06-03 12:10:19 +00:00
nanomsg.gyp Silence warnings from libnanomsg on Mac like we do on Linux. 2014-12-15 12:24:47 -08:00
opts.gyp SSE4 opaque blend using intrinsics instead of assembly. 2015-01-26 14:06:43 -08:00
pathops_skpclip.gyp Turn SkTaskGroups back on. 2014-11-03 17:41:08 -08:00
pathops_unittest.gyp Turn SkTaskGroups back on. 2014-11-03 17:41:08 -08:00
pathops_unittest.gypi These tests stress pathops by describing the union of circle-like paths that have tiny line segments embedded and double back to create near-coincident conditions. 2014-11-13 06:58:52 -08:00
pathops.gypi path ops work in progress 2013-09-16 15:55:01 +00:00
pdf.gyp revert buildbot breaker 2015-01-07 07:36:52 -08:00
pdf.gypi SkPDFCanon 2015-01-21 09:59:14 -08:00
pdfviewer_lib.gyp move some headers out of public 2014-06-17 09:04:45 -07:00
pdfviewer.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
pixman_test.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
poppler.gyp Since we're only using it on Linux now, just require poppler as a system dependency. 2014-05-20 15:07:53 +00:00
ports.gyp Move sync code to include/, switch from using platform define to a proxy header in core/ 2015-01-21 13:13:31 -08:00
SampleApp.gyp s/sk_tools::DrawCheckerboard/sk_tool_utils::draw_checkerboard/ 2015-01-26 12:49:00 -08:00
sfnt.gyp Better rendering detection with DirectWrite. 2014-06-23 08:29:23 -07:00
shapeops_demo.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
shapeops_edge.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
shapeops_tool.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
SimpleCocoaApp.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
SimpleiOSApp.gyp get iOS building again 2014-04-05 01:13:43 +00:00
skflate.gyp Build Skia for a bare-bones embedded Linux system. 2014-02-24 20:22:34 +00:00
skia_for_android_framework_defines.gypi remove unnecessary guard flags for android (for conics) 2015-01-22 13:41:00 -08:00
skia_for_chromium_defines.gypi remove flags that are now in chrome's SkUserConfig.h 2015-01-08 10:20:05 -08:00
skia_launcher.gyp Move BenchTimer to tools as Timer 2014-06-20 11:29:21 -07:00
skia_lib.gyp Force linking as C++ library. 2014-08-04 12:51:20 -07:00
svg.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
tests.gypi Remove GrBinHashKey 2015-01-23 06:46:16 -08:00
tools.gyp s/sk_tools::DrawCheckerboard/sk_tool_utils::draw_checkerboard/ 2015-01-26 12:49:00 -08:00
utils.gyp Cleanup the XML directory in public includes. 2014-11-14 05:52:50 -08:00
utils.gypi remove (unused) GatherPixelRefs 2015-01-22 09:03:25 -08:00
v8.gyp SkV8Sample: Now with Path2D and Path2DBuilder. 2014-10-29 05:33:28 -07:00
views_animated.gyp Remove dependency of views on angle 2014-04-29 00:38:39 +00:00
views.gyp Cleanup the XML directory in public includes. 2014-11-14 05:52:50 -08:00
xml.gyp Cleanup the XML directory in public includes. 2014-11-14 05:52:50 -08:00
xps.gyp Remove the comments settings for vim tab width and expansion variables. 2013-12-02 22:23:03 +00:00
zlib.gyp Fix valgrind bot errors introduced in f84722e477. 2014-02-25 18:01:37 +00:00