skia2

Author	SHA1	Message	Date
Mike Klein	be0bd92561	more easy lowp shader stages This fills out a couple more matrix and gather stages. Deletes a not particularly important unit test that was using a scale matrix in a weird, non-lowp compatible way. This will require guards for Blink layout tests. Change-Id: I54cb228ff541f771e8f4758f07d26c5161d48af3 Reviewed-on: https://skia-review.googlesource.com/62520 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-10-24 17:31:51 +00:00
Mike Klein	85f8536ca2	Feed seed_shader() iota through a context pointer. As this array grows longer it causes troublesome code generation when we're compiling offline, but it's easy as an argument. Change-Id: I53526443f534f29d3bff17c3aec24a9e916c9b86 Reviewed-on: https://skia-review.googlesource.com/60564 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>	2017-10-18 16:17:55 +00:00
Mike Klein	f3b4e16c36	Fold clamp_{x,y} into the gathers. All three image tile modes go through exclusive_clamp() and then a gather today, so we can move the work of exclusive_clamp() into eac gather_ stage, eliminating the need for clamp_{x,y} stages. Luckily, we've got a convenient place to bottleneck this, ptr_and_ix(), which works out the pointer and vector of indices to load for gathers. This deletes SkRasterPipeline_repeat_tiling unit test, which now no longer exactly makes sense. It tests that repeat_x does that clamp, but that's now done automatically outside that stage. Change-Id: I24637ef60921bec7aa00082984c0c6a49dd86ca9 Reviewed-on: https://skia-review.googlesource.com/50260 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Reed <reed@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>	2017-09-22 22:06:08 +00:00
Mike Klein	f757ae6aa7	Retry, Bump stored lowp uniform color to 16-bit storage. This makes loading into 16-bit channels more natural in _lowp.cpp. Update a unit test to stop using out-of-range "colors". Change-Id: I494687aac87948b60a40de447aa1527cf7167b2d Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-UBSAN_float_cast_overflow Reviewed-on: https://skia-review.googlesource.com/47580 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-09-16 02:25:23 +00:00
Mike Klein	16776dfb4b	funnel all constant colors through append_constant_color() My next step is to change the uniform_color context to struct { float r,g,b,a; uint32_t rgba; }; so that it's trivial to load in both float and 8-bit pipelines. Change-Id: If9bdde353ced3bf9eb0c63204b4770ed614ad16b Reviewed-on: https://skia-review.googlesource.com/30481 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-08-03 15:37:37 +00:00
Mike Klein	249ee1f985	clamp to 0 in repeat and mirror image tilers If we were doing this math with real numbers or even just doubles, these clamps wouldn't be necessary. But we're favoring speed over accuracy here when we emulate fmod() and some of those inaccuracies end up with values outside the [0,tile) range, negative! To keep the spirit of fast over 100% accurate, I've just added a safety clamp to 0. The case in the unit test now returns 0 where it should really return something like 7 or 8, but at least we won't try to read _way_ outside the image buffer. BUG=chromium:749260 Change-Id: Ifc5cfe69798beccbb2a16547510158576e06eb3a Reviewed-on: https://skia-review.googlesource.com/29580 Reviewed-by: Florin Malita <fmalita@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-08-01 19:15:33 +00:00
Mike Klein	45c16fa82c	convert over to 2d-mode [√] convert all stages to use SkJumper_MemoryCtx / be 2d-compatible [√] convert compile to 2d also, remove 1d run/compile [√] convert all call sites [√] no diffs Change-Id: I3b806eb8fe0c3ec043359616409f7cd1211a1e43 Reviewed-on: https://skia-review.googlesource.com/24263 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Florin Malita <fmalita@chromium.org>	2017-07-20 19:50:32 +00:00
Mike Klein	968af43f72	remove gather_i8, unify memory-touching contexts gather_i8 is now unused, so we can remove it. That in turn makes the ctable field of SkJumper_GatherCtx unused. After removing ctable, SkJumper_GatherCtx and SkJumper_PtrStride look identical, so I've now fused them into SkJumper_MemoryCtx, which will eventually be used by everything loading from, gathering from, or storing to memory. Change-Id: Ia882d2dbd54c9fcf9a8250a1ce83304389dd284a Reviewed-on: https://skia-review.googlesource.com/24085 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-07-18 21:22:34 +00:00
Mike Klein	3b92b6907a	start on raster pipeline 2d mode - Add run_2d(x,y,w,h) and start_pipeline_2d(). - Add and test a 2d-compatible store_8888_2d stage. Change-Id: Ib9c225d1b8cb40471ae4333df1d06eec4d506f8a Reviewed-on: https://skia-review.googlesource.com/24401 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Florin Malita <fmalita@chromium.org>	2017-07-18 18:34:04 +00:00
Mike Klein	c2d207603e	clean up low-hanging swap_rb There are two remaining swap_rb uses that both look non-trivial to replace: - sampling out of index8 when the color table is bgra - table transforms on bgra inputs in SkColorSpaceXform I don't think it's a big deal to just leave swap_rb around, just a little sad. Change-Id: I3d30200cf867cbf37d6f86572b1574d3e22e3490 Reviewed-on: https://skia-review.googlesource.com/21040 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-06-28 17:41:56 +00:00
Mike Klein	9f2b3d1fbf	remove unused "swap" stage Change-Id: I25619f010f8ac6441529cfe8dff2d8c42d7400cf Reviewed-on: https://skia-review.googlesource.com/20988 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-06-27 21:50:53 +00:00
Mike Klein	bba02c2031	start on SkJumper lowp mode Just 3 stages implemented so far: load_8888 swap_rb store_8888 That's enough to make the shortest non-trivial pipeline that you see in the new unit test. Change-Id: Iabf90866ab452f7183d8c8dec1405ece2db695dc Reviewed-on: https://skia-review.googlesource.com/18458 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Herb Derby <herb@google.com>	2017-06-04 22:50:14 +00:00
Mike Klein	761d27c4d7	update SkRasterPipeline::run() to also take y y isn't used yet. This is just a warmup that updates the callers. Change-Id: I78f4f44e2b82f72b3a39fa8a8bdadef1d1b8a99e Reviewed-on: https://skia-review.googlesource.com/18381 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-06-01 17:05:13 +00:00
Herb Derby	84dcac3292	Add aarch64 tail code. Change-Id: I25f029604a04f5fc6c249a3817b0dd84379071be Reviewed-on: https://skia-review.googlesource.com/18149 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>	2017-05-30 22:04:31 +00:00
Herb Derby	f81c56f3c0	Add arm tail code. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Android-Clang-Nexus10-CPU-Exynos5250-arm-Release-Android Change-Id: Ia0e9f32d0324e66c9d4812dbb156a2b858d49a13 Reviewed-on: https://skia-review.googlesource.com/18127 Commit-Queue: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>	2017-05-30 19:39:44 +00:00
Herb Derby	d1f08302aa	Add tests for tail handling in SkJumper. Change-Id: Ib4ecc33dc9552c16b5530359cd3649487e70bbed Reviewed-on: https://skia-review.googlesource.com/18067 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Herb Derby <herb@google.com>	2017-05-26 20:41:04 +00:00
Mike Klein	b24704d35f	SkRasterPipeline in SkArenaAlloc Bug: skia:6673 Change-Id: Ia2bae4f6a9039a007a10b6b45bcf2f0854bf6e5c Reviewed-on: https://skia-review.googlesource.com/17794 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-05-24 14:54:15 +00:00
Mike Klein	8729e5bbf7	Simplify more: remove SkRasterPipeline::compile(). It's easier to work on SkJumper if everything funnels through run(). I don't anticipate huge benefit from compile() without JITing, but it's something we can always put back if we find a need. Change-Id: Id5256fd21495e8195cad1924dbad81856416d913 Reviewed-on: https://skia-review.googlesource.com/8468 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>	2017-02-16 12:23:06 +00:00
Mike Klein	319ba3d3a1	Move shader register setup to SkRasterPipelineBlitter. We've been seeding the initial values of our registers to x+0.5,y+0.5, 1,0, 0,0,0,0 (useful values for shaders to start with) in all pipelines. This CL changes that to do so only when blitting, and only when we have a shader. The nicest part of this change is that SkRasterPipeline itself no longer needs to have a concept of y, or what x means. It just marches x through [x,x+n), and the blitter handles y and layers the meaning of "dst x coordinate" onto x. This ought to make SkSplicer a little easier to work with too. dm --src gm --config f16 srgb 565 all draws the same. Change-Id: I69d8c1cc14a06e5dfdd6a7493364f43a18f8dec5 Reviewed-on: https://skia-review.googlesource.com/7353 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-01-23 20:00:23 +00:00
Mike Klein	7ba89a15fc	SkSplicer: test and fix loop logic The new test would fail without the the change in SkSplicer.cpp to call fSpliced(x,x+body) instead of fSpliced(x,body). The rest of the changes are cosmetic, mostly renaming n to limit. Change-Id: Iae28802d0adb91e962ed3ee60fa5a4334bd140f9 Reviewed-on: https://skia-review.googlesource.com/6837 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-01-10 19:20:22 +00:00
Mike Klein	7f71d8845c	SkXbyak basics A little JIT proof of concept for SkRasterPipeline, using xbyak, which is a header-only assembler. It's x86-only, but supports x86 very thoroughly, and it's very user friendly (at least as far as assembler libraries go...). CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ie17e562b0f3fff5914041badfb2c1fe4f86efab8 Reviewed-on: https://skia-review.googlesource.com/5730 Reviewed-by: Herb Derby <herb@google.com> Reviewed-by: Heather Miller <hcm@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-01-06 17:45:41 +00:00
Mike Klein	8c8cb5bfc5	simplify by removing _d stages CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I75e232faee6ad48f65bac5b119a461280b27bbc8 Reviewed-on: https://skia-review.googlesource.com/6661 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2017-01-06 16:14:00 +00:00
Mike Klein	c789b61167	Bring back SkRasterPipeline::run() for one-off uses. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: I308b6d75f2987a667eead9a55760a2ff6aec2984 Reviewed-on: https://skia-review.googlesource.com/5353 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>	2016-11-30 20:16:49 +00:00
Mike Klein	729b582962	Consistent naming. For stages that have {r,g,b,a} and {dr,dg,db,da} versions, name the {r,g,b,a} one "foo" and the {dr,dg,db,da} on "foo_d". The {r,g,b,a} registers are the ones most commonly used and fastest, so they get short ordinary names, and the d-registers are less commonly used and sometimes slower, so they get a suffix. Some stages naturally opearate on all 8 registers (the xfermodes, accumulate). These names for those look fine and aren't ambiguous. Also, a bit more re-arrangement in _opts.h. CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD Change-Id: Ia20029247642798a60a2566e8a26b84ed101dbd0 Reviewed-on: https://skia-review.googlesource.com/5291 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-11-29 00:29:32 +00:00
Mike Klein	af49b19582	Start each pipeline with (x,y) in (dr,dg) registers for the shader. Image shaders need to do some geometry work before sampling the image colors: 1) determine dst coordinates 2) map back to src coordinates 3) tiling Feeding (x,y) through as (dr,dg) registers makes step 1) easy, perhaps trivial, while leaving (r,g,b,a) with their usual meanings, "the color", starting with the paint color. This is easy to tweak into something like (x+0.5, y+0.5, 1) in (dr,dg,db) once this lands. Mostly I just want to get all the uninteresting boilerplate out of the way first. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=4791 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: Ia07815d942ded6672dc1df785caf80a508fc8f37 Reviewed-on: https://skia-review.googlesource.com/4791 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-11-15 14:48:57 +00:00
Mike Klein	bd3fe475b8	Convert SkRasterPipeline loads and stores to indirect. This allows us to change the underlying pointer without rebuilding the pipeline, e.g. when moving the blitter from scanline to scanline. The extra overhead when not needed is measurable but small, <2%. We can always add back direct stages later for cases where we know the context pointer will not change. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3943 Change-Id: I827d7e6e4e67d02dd2802610f898f98c5f36f8cb Reviewed-on: https://skia-review.googlesource.com/3943 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-10-26 18:15:03 +00:00
Mike Klein	e9f74b89c0	SkRasterPipeline::compile(). I'm not yet caching these in the blitter, and speed is essentially unchanged in the bench where I am now building and compiling the pipeline only once. This may not be able to stay a simple std::function after I figure out caching, but for now it's a nice fit. GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3911 Change-Id: I9545af589f73baf9f17cb4e6ace9a814c2478fe9 Reviewed-on: https://skia-review.googlesource.com/3911 Reviewed-by: Herb Derby <herb@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-10-25 18:04:16 +00:00
Mike Klein	26bea5d557	Make test lower-level, make const_cast more visible. I can only think there's something funky going on with the hidden const_cast inside SkRasterPipeline.cpp, or with the Sk4h -> Sk4f conversion. So make the const_cast visible and write the test directly in halfs instead of floats. CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release-GN-Trybot BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3003 Change-Id: I3a7e19ae165fce44f177b4968a63ec04e25c4b93 Reviewed-on: https://skia-review.googlesource.com/3003 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-10-05 16:52:58 +00:00
Mike Klein	d0ccb57ec4	looks like red and blue start wrong (more unit test debugging) BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3001 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release-GN-Trybot Change-Id: I8d26b5484a2bf67d5d5891475640970046e470d8 Reviewed-on: https://skia-review.googlesource.com/3001 Commit-Queue: Mike Klein <mtklein@chromium.org> Reviewed-by: Mike Klein <mtklein@chromium.org>	2016-10-05 13:56:40 +00:00
Mike Klein	c876e99d4c	initialize result in SkRasterPipelineTest... more debugging BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2982 Change-Id: I7c40766c3df61fb66f63e17a2da61219d1d95d1b Reviewed-on: https://skia-review.googlesource.com/2982 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-10-05 13:06:50 +00:00
Mike Klein	85a45d93ed	more debugging for this test. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2965 Change-Id: I583bcdeb199212840ff227942ab6ebbf57a481d7 Reviewed-on: https://skia-review.googlesource.com/2965 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-10-05 00:06:52 +00:00
Mike Klein	433b306a70	Revert "Debug Mac test failure." This reverts commit I0ed569b585f4962a90a0b6993acc484a74055177. Reason for revert: Mac bots look okay now. I am puzzled. Original change's description: > Debug Mac test failure. > > CQ_EXTRA_TRYBOTS=master.client.skia:Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release-GN-Trybot > TBR= > > BUG=skia: > > GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2933 > > Change-Id: I0ed569b585f4962a90a0b6993acc484a74055177 > Reviewed-on: https://skia-review.googlesource.com/2933 > Reviewed-by: Mike Klein <mtklein@chromium.org> > Commit-Queue: Mike Klein <mtklein@chromium.org> > TBR=mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true Change-Id: I4feef1f58172d1cc649bc311c9e6d3c86f915617 Reviewed-on: https://skia-review.googlesource.com/2963 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-10-04 21:46:55 +00:00
Mike Klein	67dd94a263	Debug Mac test failure. CQ_EXTRA_TRYBOTS=master.client.skia:Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release-GN-Trybot TBR= BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2933 Change-Id: I0ed569b585f4962a90a0b6993acc484a74055177 Reviewed-on: https://skia-review.googlesource.com/2933 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-10-04 19:25:10 +00:00
Mike Klein	9161ef012f	Make all SkRasterPipeline stages stock stages in SkOpts. If we want to support VEX-encoded instructions (AVX, F16C, etc.) without a ridiculous slowdown, we need to make sure we're running either all VEX-encoded instructions or all non-VEX-encoded instructions. That means we cannot mix arbitrary user-defined SkRasterPipeline::Fn (never VEX) with those living in SkOpts (maybe VEX)... it's SkOpts or bust. This ports the existing user-defined SkRasterPipeline::Fn use cases over to use stock stages from SkOpts. I rewrote the unit test to use stock stages, and moved the SkXfermode implementations to SkOpts. The code deleted for SkArithmeticMode_scalar should already be dead. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2940 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I94dbe766b2d65bfec6e544d260f71d721f0f5cb0 Reviewed-on: https://skia-review.googlesource.com/2940 Commit-Queue: Mike Klein <mtklein@google.com> Reviewed-by: Mike Reed <reed@google.com>	2016-10-04 18:33:52 +00:00
Mike Klein	baaf8ad952	Start moving SkRasterPipeline stages to SkOpts. This lets them pick up runtime CPU specializations. Here I've plugged in SSE4.1. This is still one of the N prelude CLs to full 8-at-a-time AVX. I've moved the union of the stages used by SkRasterPipelineBench and SkRasterPipelineBlitter to SkOpts... they'll all be used by the blitter eventually. Picking up SSE4.1 specialization here (even still just 4 pixels at a time) is a significant speedup, especially to store_srgb(), so much that it's no longer really interesting to compare against the fused-but-default-instruction-set version in the bench. So that's gone now. That left the SkRasterPipeline unit test as the only other user of the EasyFn simplified interface to SkRasterPipeline. So I converted that back down to the bare-metal interface, and EasyFn and its friends became SkRasterPipeline_opts.h exclusive abbreviations (now called Kernel_Sk4f). This isn't really unexpected: SkXfermode also wanted to build up its own little abstractions, and once you build your own abstraction, the value of an additional EasyFn-like layer plummets to negative. For simplicity I've left the SkXfermode stages alone, except srcover() which was always part of the blitter. No particular reason except keeping the churn down while I hack. These _can_ be in SkOpts, but don't have to be until we go 8-at-a-time. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2752 CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Change-Id: I3b476b18232a1598d8977e425be2150059ab71dc Reviewed-on: https://skia-review.googlesource.com/2752 Reviewed-by: Mike Klein <mtklein@chromium.org> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-09-29 16:20:26 +00:00
Mike Klein	c8dd6bc3e7	Rearrange SkRasterPipeline scanline tail handling. We used to step at a 4-pixel stride as long as possible, then run up to 3 times, one pixel at a time. Now replace those 1-at-a-time runs with a single tail stamp if there are 1-3 remaining pixels. This style is simply more efficient: e.g. we'll blend and lerp once for 3 pixels instead of 3 times. This should make short blits significantly more efficient. It's also more future-oriented... AVX+ on Intel and SVE on ARM support masked loads and stores, so we can do the entire tail in one direct step. This also makes it possible to re-arrange the code a bit to encapsulate each stage better. I think generally this code reads more clearly than the old code, but YMMV. I've arranged things so you write one function, but it's compiled into two specializations, one for tail=0 (Body) and one for tail>0 (Tail). It's pretty tidy. For now I've just burned a register to pass around tail. It's 2 bits now, maybe soon 3 with AVX, and capped at 4 for even the craziest new toys, so there are plenty of places we can pack it if we want to get clever. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2717 Change-Id: I45852a3e5d4c5b5e9315302c46601aee0d32265f Reviewed-on: https://skia-review.googlesource.com/2717 Reviewed-by: Mike Reed <reed@google.com> Commit-Queue: Mike Klein <mtklein@chromium.org>	2016-09-28 15:28:24 +00:00
mtklein	fe2042e60f	SkRasterPipeline: new APIs for fusion Most visibly this adds a macro SK_RASTER_STAGE that cuts down on the boilerplate of defining a raster pipeline stage function. Most interestingly, SK_RASTER_STAGE doesn't define a SkRasterPipeline::Fn, but rather a new type EasyFn. This function is always static and inlined, and the details of interacting with the SkRasterPipeline::Stage are taken care of for you: ctx is just passed as a void*, and st->next() is always called. All EasyFns have to do is take care of the meat of the work: update r,g,b, etc. and read and write from their context. The really neat new feature here is that you can either add EasyFns to a pipeline with the new append() functions, _or_ call them directly yourself. This lets you use the same set of pieces to build either a pipelined version of the function or a custom, fused version. The bench shows this off. On my desktop, the pipeline version of the bench takes about 25% more time to run than the fused one. The old approach to creating stages still works fine. I haven't updated SkXfermode.cpp or SkArithmeticMode.cpp because they seemed just as clear using Fn directly as they would have using EasyFn. If this looks okay to you I will rework the comments in SkRasterPipeline to explain SK_RASTER_STAGE and EasyFn a bit as I've done here in the CL description. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2195853002 Review-Url: https://codereview.chromium.org/2195853002	2016-07-29 14:27:41 -07:00
mtklein	0abddf7bb7	SkRasterPipeline: simplify impl and remove need to rewire stages This builds the stages correctly wired from the get-go. With a little clever setup, we can also design around the previous error cases like having no stages or pipelines that call st->next() off the end. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2149443002 Review-Url: https://codereview.chromium.org/2149443002	2016-07-13 08:22:20 -07:00
mtklein	281b33fdd9	SkRasterPipeline preliminaries Re-uploading to see if I can get a CL number < 2^31. patch from issue 2147533002 at patchset 240001 (http://crrev.com/2147533002#ps240001) Already reviewed at the other crrev link. TBR= BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2147533002 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Review-Url: https://codereview.chromium.org/2144573004	2016-07-12 15:01:26 -07:00

39 Commits