trim another instruction of I32_SWAR

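Now that we've got shr_16x2, extract(..., 8, splat(0x00ff00ff)) is
better done as shr_16x2(..., 8).  Shifting each 16-bit lane right by 8
fills the top byte of each lane with zeros, which is exactly what the
0x00ff00ff mask was there to clear.  So this swaps a 16-bit shift in
for the 32-bit shift, a wash, but lets us drop the bit_and at the end,
saving one whole instruction.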
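A quick way to check that the two forms agree is to model them on a single uint32_t. This sketch assumes extract(x, bits, mask) means (x >> bits) & mask and that shr_16x2 shifts each 16-bit half independently. The scalar helpers below are hypothetical stand-ins written for illustration, not skvm's API.

```cpp
#include <cassert>
#include <cstdint>

// Assumed scalar model of skvm's extract(x, bits, mask): 32-bit shift, then mask.
static uint32_t extract(uint32_t x, int bits, uint32_t mask) {
    return (x >> bits) & mask;
}

// Assumed scalar model of shr_16x2(x, bits): each 16-bit half shifts on its own,
// so zeros (not bits from the neighboring lane) fill the top of each half.
static uint32_t shr_16x2(uint32_t x, int bits) {
    uint32_t lo = (x & 0xffffu) >> bits;
    uint32_t hi = (x >> 16) >> bits;
    return (hi << 16) | lo;
}

// Spot-check the rewrite on 2^20 pseudo-random 32-bit values from a simple LCG.
static bool agree_on_samples() {
    uint32_t x = 1;
    for (int i = 0; i < (1 << 20); i++) {
        if (extract(x, 8, 0x00ff00ffu) != shr_16x2(x, 8)) return false;
        x = x * 1664525u + 1013904223u;
    }
    return true;
}
```

agree_on_samples() only spot-checks. Byte by byte, both sides keep bytes 1 and 3 of x, move them down into bytes 0 and 2, and zero everything else, which is why they agree on every input, not just the sampled ones.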

This puts I32_SWAR a tiny bit ahead of the code in Opts: roughly
0.19 ns/px vs 0.20 ns/px.

Change-Id: I4160dc03ecc8b855c0773a927f1510ad5cbb4b87
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220856
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
Mike Klein 2019-06-13 15:51:39 -05:00 committed by Skia Commit-Bot
parent 7f061fb53b
commit 4c4945a252
2 changed files with 5 additions and 5 deletions


@@ -609,8 +609,8 @@ r4 = shr r3 24
 r4 = pack r4 r4 16
 r4 = sub_i16x2 r0 r4
 r5 = load32 arg(1)
-r6 = extract r5 0 r1
-r5 = extract r5 8 r1
+r6 = bit_and r5 r1
+r5 = shr_i16x2 r5 8
 r6 = mul_i16x2 r6 r4
 r6 = shr_i16x2 r6 8
 r5 = mul_i16x2 r5 r4


@@ -148,9 +148,9 @@ SrcoverBuilder_I32_SWAR::SrcoverBuilder_I32_SWAR() {
 skvm::I32 ax2 = pack(a,a,16),
 invAx2 = sub_16x2(splat(0x01000100), ax2);
-skvm::I32 d = load32(dst),
-rb = extract(d, 0, splat(0x00ff00ff)),
-ga = extract(d, 8, splat(0x00ff00ff));
+skvm::I32 d = load32(dst),
+rb = bit_and (d, splat(0x00ff00ff)),
+ga = shr_16x2(d, 8);
 rb = shr_16x2(mul_16x2(rb, invAx2), 8); // Put the high 8 bits back in the low lane.
 ga = mul_16x2(ga, invAx2); // Keep the high 8 bits up high...
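For context, here is a scalar, one-pixel model of the SWAR srcover the builder code sets up, checked against a plain per-channel loop. It assumes premultiplied 8888 pixels with alpha in the top byte (a = s >> 24, matching the `shr r3 24` in the listing) and the (256 - a) / 256 scaling implied by splat(0x01000100). The 16x2 helpers are hypothetical scalar stand-ins, and the final mask and add are an assumption about how rb and ga get recombined, since that part is not in the hunk.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar stand-ins for the 16x2 ops: a uint32_t holds two
// independent 16-bit lanes, and each op works lane by lane.
static uint32_t sub_16x2(uint32_t x, uint32_t y) {
    uint32_t lo = ((x & 0xffffu) - (y & 0xffffu)) & 0xffffu;
    uint32_t hi = ((x >> 16) - (y >> 16)) & 0xffffu;
    return (hi << 16) | lo;
}
static uint32_t mul_16x2(uint32_t x, uint32_t y) {
    uint32_t lo = ((x & 0xffffu) * (y & 0xffffu)) & 0xffffu;
    uint32_t hi = ((x >> 16) * (y >> 16)) & 0xffffu;
    return (hi << 16) | lo;
}
static uint32_t shr_16x2(uint32_t x, int bits) {
    uint32_t lo = (x & 0xffffu) >> bits;
    uint32_t hi = (x >> 16) >> bits;
    return (hi << 16) | lo;
}

// SWAR srcover for one premultiplied pixel: src + dst * (256 - a) / 256.
static uint32_t srcover_swar(uint32_t s, uint32_t d) {
    uint32_t a      = s >> 24;
    uint32_t ax2    = a | (a << 16);               // pack(a,a,16)
    uint32_t invAx2 = sub_16x2(0x01000100u, ax2);  // 256 - a in each lane
    uint32_t rb     = d & 0x00ff00ffu;             // r and b, one per lane
    uint32_t ga     = shr_16x2(d, 8);              // g and a, one per lane
    rb = shr_16x2(mul_16x2(rb, invAx2), 8);        // product's high byte moved to the low byte
    ga = mul_16x2(ga, invAx2);                     // product's high byte left up high
    ga &= 0xff00ff00u;                             // assumed: clear the low byte of each lane
    return s + (rb | ga);                          // assumed: recombine and add src
}

// Per-channel reference using the same (256 - a) / 256 scaling.
static uint32_t srcover_ref(uint32_t s, uint32_t d) {
    uint32_t a = s >> 24, out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sc = (s >> shift) & 0xffu;
        uint32_t dc = (d >> shift) & 0xffu;
        // With premultiplied src (sc <= a), sc + dc*(256-a)/256 <= 255: no carry.
        out |= (sc + ((dc * (256u - a)) >> 8)) << shift;
    }
    return out;
}
```

Each lane product is at most 255 * 256 = 65280, so it never overflows its 16 bits. This is what lets two channels share one 32-bit multiply.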