c8dd6bc3e7
We used to step at a 4-pixel stride as long as possible, then run up to 3 times, one pixel at a time. Now replace those 1-at-a-time runs with a single tail stamp if there are 1-3 remaining pixels.

This style is simply more efficient: e.g. we'll blend and lerp once for 3 pixels instead of 3 times. This should make short blits significantly more efficient. It's also more future-oriented... AVX+ on Intel and SVE on ARM support masked loads and stores, so we can do the entire tail in one direct step.

This also makes it possible to re-arrange the code a bit to encapsulate each stage better. I think generally this code reads more clearly than the old code, but YMMV. I've arranged things so you write one function, but it's compiled into two specializations, one for tail=0 (Body) and one for tail>0 (Tail). It's pretty tidy.

For now I've just burned a register to pass around tail. It's 2 bits now, maybe soon 3 with AVX, and capped at 4 for even the craziest new toys, so there are plenty of places we can pack it if we want to get clever.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2717
Change-Id: I45852a3e5d4c5b5e9315302c46601aee0d32265f
Reviewed-on: https://skia-review.googlesource.com/2717
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
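The body/tail scheme described in the commit message can be sketched outside of Skia roughly as follows. This is a minimal illustration under stated assumptions, not the actual SkRasterPipeline implementation: `square_stamp` and `run_pipeline` are hypothetical names, plain scalar loops stand in for SIMD, and the stamp count is returned only so the behavior is easy to observe. One templated function is compiled into two specializations, a 4-wide body and a partial tail.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stage: squares pixels starting at index x.
// kIsTail == false: "body" specialization, always handles exactly 4 pixels.
// kIsTail == true:  "tail" specialization, handles 1-3 leftover pixels.
template <bool kIsTail>
static void square_stamp(const float* src, float* dst, size_t x, size_t tail) {
    // The body ignores tail; the tail expects 1, 2, or 3.
    assert(!kIsTail || (tail >= 1 && tail <= 3));
    const size_t n = kIsTail ? tail : 4;
    for (size_t i = 0; i < n; i++) {
        dst[x + i] = src[x + i] * src[x + i];
    }
}

// Hypothetical driver: steps at a 4-pixel stride while it can, then issues
// a single tail stamp for any 1-3 remaining pixels.
// Returns the number of stamps issued.
static int run_pipeline(const float* src, float* dst, size_t n) {
    int stamps = 0;
    size_t x = 0;
    while (n - x >= 4) {
        square_stamp<false>(src, dst, x, 0);   // full 4-pixel body stamp
        x += 4;
        stamps++;
    }
    if (x < n) {
        square_stamp<true>(src, dst, x, n - x); // one stamp for the whole tail
        stamps++;
    }
    return stamps;
}
```

Under this sketch, running 5 pixels issues one body stamp and one tail stamp, where a one-pixel-at-a-time tail would need a separate run per leftover pixel.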
77 lines
2.0 KiB
C++
/*
 * Copyright 2016 Google Inc.
 *
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 */

#include "Test.h"
#include "SkRasterPipeline.h"

SK_RASTER_STAGE(load) {
    auto ptr = (const float*)ctx + x;
    // tail == 0 means a full 4-pixel stamp; 1-3 means that many pixels remain.
    // The cases fall through intentionally, so each one also loads every lower lane.
    switch (tail&3) {
        case 0: a = Sk4f{ptr[3]};
        case 3: b = Sk4f{ptr[2]};
        case 2: g = Sk4f{ptr[1]};
        case 1: r = Sk4f{ptr[0]};
    }
}

SK_RASTER_STAGE(square) {
    r *= r;
    g *= g;
    b *= b;
    a *= a;
}

SK_RASTER_STAGE(store) {
    auto ptr = (float*)ctx + x;
    // Same intentional fall-through as load: store only the lanes that hold pixels.
    switch (tail&3) {
        case 0: ptr[3] = a[0];
        case 3: ptr[2] = b[0];
        case 2: ptr[1] = g[0];
        case 1: ptr[0] = r[0];
    }
}

DEF_TEST(SkRasterPipeline, r) {
    // We'll build up and run a simple pipeline that exercises the salient
    // mechanics of SkRasterPipeline:
    //    - context pointers                           (load,store)
    //    - stages sensitive to the number of pixels   (load,store)
    //    - stages insensitive to the number of pixels (square)
    //
    // This pipeline loads up some values, squares them, then writes them back to memory.

    const float src_vals[] = { 1,2,3,4,5 };
    float       dst_vals[] = { 0,0,0,0,0 };

    SkRasterPipeline p;
    p.append<load>(src_vals);
    p.append<square>();
    p.append<store>(dst_vals);

    // 5 pixels: one full 4-pixel body stamp, then a 1-pixel tail stamp.
    p.run(5);

    REPORTER_ASSERT(r, dst_vals[0] ==  1);
    REPORTER_ASSERT(r, dst_vals[1] ==  4);
    REPORTER_ASSERT(r, dst_vals[2] ==  9);
    REPORTER_ASSERT(r, dst_vals[3] == 16);
    REPORTER_ASSERT(r, dst_vals[4] == 25);
}

DEF_TEST(SkRasterPipeline_empty, r) {
    // No asserts... just a test that this is safe to run.
    SkRasterPipeline p;
    p.run(20);
}

DEF_TEST(SkRasterPipeline_nonsense, r) {
    // No asserts... just a test that this is safe to run and terminates.
    // square() always calls st->next(); this makes sure we've always got something there to call.
    SkRasterPipeline p;
    p.append<square>();
    p.run(20);
}