skia2/tests/SkRasterPipelineTest.cpp
Mike Klein c8dd6bc3e7 Rearrange SkRasterPipeline scanline tail handling.
We used to step at a 4-pixel stride as long as possible, then run up to 3 more times, one pixel at a time.  Now we replace those one-at-a-time runs with a single tail stamp when 1-3 pixels remain.
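
To make the difference in stamp counts concrete, here is a minimal sketch of the two driver loops. `CallCounter`, `run_old`, and `run_new` are hypothetical names invented for illustration, not SkRasterPipeline's API. Each "stamp" only records how many pixels it covered, with `tail == 0` meaning a full 4-pixel stamp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one pipeline invocation: it only records how many
// stamps were made and how many pixels they covered.  By convention here,
// tail == 0 means a full 4-pixel stamp and tail in [1, 3] means a partial one.
struct CallCounter {
    int calls = 0;
    std::size_t pixels = 0;
    void stamp(std::size_t /*x*/, std::size_t tail) {
        calls++;
        pixels += tail ? tail : 4;
    }
};

// Old scheme: 4-pixel strides, then up to 3 single-pixel stamps.
CallCounter run_old(std::size_t n) {
    CallCounter c;
    std::size_t x = 0;
    for (; n >= 4; n -= 4, x += 4) c.stamp(x, 0);
    for (; n > 0; n -= 1, x += 1) c.stamp(x, 1);
    return c;
}

// New scheme: 4-pixel strides, then one tail stamp for the 1-3 leftovers.
CallCounter run_new(std::size_t n) {
    CallCounter c;
    std::size_t x = 0;
    for (; n >= 4; n -= 4, x += 4) c.stamp(x, 0);
    assert(n < 4);  // the leftover count always fits in 2 bits
    if (n > 0) c.stamp(x, n);
    return c;
}
```

For 7 pixels the old loop makes 4 stamps (one 4-wide, three 1-wide), while the new loop makes 2 (one 4-wide, one 3-wide tail). Both cover all 7 pixels.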

This style is simply more efficient: e.g. we'll blend and lerp once for 3 pixels instead of 3 times, which should make short blits significantly faster.  It's also more future-oriented: AVX+ on Intel and SVE on ARM support masked loads and stores, so we can do the entire tail in one direct step.
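
The masked load/store idea can be sketched portably. The following is a scalar emulation of 4-lane masked semantics, not AVX or SVE code. On AVX, an intrinsic such as `_mm256_maskload_ps` would do the load in one step. `Lanes4`, `masked_load`, `masked_store`, and `lerp` are hypothetical helpers. The lerp runs once over all lanes, and only the load and store care how many lanes are live.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane vector type for this sketch.
using Lanes4 = std::array<float, 4>;

// Scalar emulation of a masked load: lanes [0, count) come from memory and
// the remaining lanes are zeroed.  tail == 0 means all 4 lanes are live.
Lanes4 masked_load(const float* src, std::size_t tail) {
    assert(tail < 4);
    const std::size_t count = tail ? tail : 4;
    Lanes4 v{};  // value-initialized: all lanes start at 0.0f
    for (std::size_t i = 0; i < count; i++) v[i] = src[i];
    return v;
}

// Matching masked store: only lanes [0, count) are written back to memory.
void masked_store(float* dst, const Lanes4& v, std::size_t tail) {
    assert(tail < 4);
    const std::size_t count = tail ? tail : 4;
    for (std::size_t i = 0; i < count; i++) dst[i] = v[i];
}

// One lerp over all 4 lanes, regardless of how many lanes are live.
Lanes4 lerp(const Lanes4& from, const Lanes4& to, float t) {
    Lanes4 out{};
    for (std::size_t i = 0; i < 4; i++) out[i] = from[i] + (to[i] - from[i]) * t;
    return out;
}
```

With 3 live pixels, the lerp runs once instead of three times, and the masked store leaves the fourth destination value untouched.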

This also makes it possible to rearrange the code a bit to encapsulate each stage better.  I think this code generally reads more clearly than the old code, but YMMV.  I've arranged things so you write one function, and it's compiled into two specializations: one for tail=0 (Body) and one for tail>0 (Tail).  It's pretty tidy.
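
One way to get "write one function, compile two specializations" in plain C++ is a template parameter fixed at compile time. This is a hypothetical sketch of the idea, not the `SK_RASTER_STAGE` macro itself. `copy_stage`, `copy_body`, and `copy_tail` are invented names, and the stage works on plain `float*` arrays rather than Skia's registers.

```cpp
#include <cassert>
#include <cstddef>

// Written once; kIsTail is a compile-time flag, so the compiler emits two
// separate functions.  In the Body specialization n is the constant 4.
template <bool kIsTail>
void copy_stage(const float* src, float* dst, std::size_t tail) {
    // The Tail specialization expects 1-3 pixels; the Body ignores tail.
    assert(!kIsTail || (tail > 0 && tail < 4));
    const std::size_t n = kIsTail ? tail : 4;
    for (std::size_t i = 0; i < n; i++) dst[i] = src[i];
}

// The two specializations, named after the Body/Tail split.
void (*const copy_body)(const float*, float*, std::size_t) = copy_stage<false>;
void (*const copy_tail)(const float*, float*, std::size_t) = copy_stage<true>;
```

Because `n` is a compile-time constant in `copy_stage<false>`, the compiler is free to fully unroll the Body loop. Only the Tail version has to handle a runtime count.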

For now I've just burned a register to pass around tail.  It's 2 bits now, maybe soon 3 with AVX, and capped at 4 bits even for the craziest new toys, so there are plenty of places we can pack it if we want to get clever.
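
As one illustration of packing tail instead of giving it its own register, here is a minimal sketch that stores x and tail in a single word, with tail in the low bits. The bit layout and the helpers `pack`, `unpack_x`, and `unpack_tail` are hypothetical. The commit does not say where tail would actually be packed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: tail lives in the low kTailBits bits, x in the rest.
// 2 bits cover tail in [0, 3] for 4-wide SIMD; wider SIMD would need 3 or 4.
constexpr unsigned kTailBits = 2;
constexpr std::size_t kTailMask = (std::size_t{1} << kTailBits) - 1;

// Assumes x is small enough that shifting it left by kTailBits loses no bits.
std::size_t pack(std::size_t x, std::size_t tail) {
    assert(tail <= kTailMask);
    return (x << kTailBits) | tail;
}

std::size_t unpack_x(std::size_t packed)    { return packed >> kTailBits; }
std::size_t unpack_tail(std::size_t packed) { return packed & kTailMask; }
```

Widening to AVX or beyond would only change `kTailBits`, at the cost of the same number of high bits of x.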

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2717

Change-Id: I45852a3e5d4c5b5e9315302c46601aee0d32265f
Reviewed-on: https://skia-review.googlesource.com/2717
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
2016-09-28 15:28:24 +00:00

/*
 * Copyright 2016 Google Inc.
 *
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 */

#include "Test.h"
#include "SkRasterPipeline.h"

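SK_RASTER_STAGE(load) {
    auto ptr = (const float*)ctx + x;
    // tail == 0 means a full 4-pixel stamp; tail in [1,3] loads only that
    // many values.  The cases intentionally fall through.
    switch (tail&3) {
        case 0: a = Sk4f{ptr[3]};  // fall through
        case 3: b = Sk4f{ptr[2]};  // fall through
        case 2: g = Sk4f{ptr[1]};  // fall through
        case 1: r = Sk4f{ptr[0]};
    }
}
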
SK_RASTER_STAGE(square) {
    r *= r;
    g *= g;
    b *= b;
    a *= a;
}

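SK_RASTER_STAGE(store) {
    auto ptr = (float*)ctx + x;
    // Mirrors load: tail == 0 stores all 4 values, tail in [1,3] stores only
    // that many.  The cases intentionally fall through.
    switch (tail&3) {
        case 0: ptr[3] = a[0];  // fall through
        case 3: ptr[2] = b[0];  // fall through
        case 2: ptr[1] = g[0];  // fall through
        case 1: ptr[0] = r[0];
    }
}
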
DEF_TEST(SkRasterPipeline, r) {
    // We'll build up and run a simple pipeline that exercises the salient
    // mechanics of SkRasterPipeline:
    //    - context pointers                           (load,store)
    //    - stages sensitive to the number of pixels   (load,store)
    //    - stages insensitive to the number of pixels (square)
    //
    // This pipeline loads up some values, squares them, then writes them back to memory.

    const float src_vals[] = { 1,2,3,4,5 };
    float       dst_vals[] = { 0,0,0,0,0 };

    SkRasterPipeline p;
    p.append<load>(src_vals);
    p.append<square>();
    p.append<store>(dst_vals);

    // 5 pixels: one 4-pixel Body stamp, then a 1-pixel Tail stamp.
    p.run(5);

    REPORTER_ASSERT(r, dst_vals[0] ==  1);
    REPORTER_ASSERT(r, dst_vals[1] ==  4);
    REPORTER_ASSERT(r, dst_vals[2] ==  9);
    REPORTER_ASSERT(r, dst_vals[3] == 16);
    REPORTER_ASSERT(r, dst_vals[4] == 25);
}

DEF_TEST(SkRasterPipeline_empty, r) {
    // No asserts... just a test that this is safe to run.
    SkRasterPipeline p;
    p.run(20);
}

DEF_TEST(SkRasterPipeline_nonsense, r) {
    // No asserts... just a test that this is safe to run and terminates.
    // square() always calls st->next(); this makes sure we've always got something there to call.
    SkRasterPipeline p;
    p.append<square>();
    p.run(20);
}
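
For readers without a Skia checkout, here is a standalone sketch that mimics what the test above exercises. It has a fall-through `switch` on `tail & 3` in load and store, a tail-insensitive square, and a driver that makes full 4-value stamps followed by one tail stamp, following the scheme in the commit message. `Regs`, `load`, `store`, `square`, and `run` are hypothetical stand-ins, not Skia's API. As in the test, each channel carries one consecutive float, but here it is a plain `float` rather than an `Sk4f`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the stage registers; r/g/b/a each hold one float.
struct Regs { float r, g, b, a; };

// tail == 0 loads 4 values; tail in [1, 3] loads only that many.
void load(const float* src, std::size_t x, std::size_t tail, Regs& v) {
    const float* ptr = src + x;
    switch (tail & 3) {
        case 0: v.a = ptr[3];  // fall through
        case 3: v.b = ptr[2];  // fall through
        case 2: v.g = ptr[1];  // fall through
        case 1: v.r = ptr[0];
    }
}

// Mirrors load, writing back only as many values as were loaded.
void store(float* dst, std::size_t x, std::size_t tail, const Regs& v) {
    float* ptr = dst + x;
    switch (tail & 3) {
        case 0: ptr[3] = v.a;  // fall through
        case 3: ptr[2] = v.b;  // fall through
        case 2: ptr[1] = v.g;  // fall through
        case 1: ptr[0] = v.r;
    }
}

// Insensitive to the number of pixels: squares every channel.
void square(Regs& v) { v.r *= v.r; v.g *= v.g; v.b *= v.b; v.a *= v.a; }

// load -> square -> store over n values: full stamps, then one tail stamp.
void run(const float* src, float* dst, std::size_t n) {
    std::size_t x = 0;
    for (; n >= 4; n -= 4, x += 4) {
        Regs v{};
        load(src, x, 0, v); square(v); store(dst, x, 0, v);
    }
    if (n > 0) {
        Regs v{};
        load(src, x, n, v); square(v); store(dst, x, n, v);
    }
}
```

With 5 inputs, `run` makes one full stamp at `x = 0` and a 1-value tail stamp at `x = 4`. The tail never reads or writes past the end of either array.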