trim another instruction off I32_SWAR
Now that we've got shr_16x2, extract(..., 8, splat(0x00ff00ff)) is better done as shr_16x2(..., 8). This swaps a 16-bit shift in for the 32-bit shift, a wash, but lets us drop the bit_and at the end, saving one whole instruction.

This places I32_SWAR a tiny little bit faster than the code in Opts, roughly 0.19 ns/px vs. 0.20 ns/px for Opts.

Change-Id: I4160dc03ecc8b855c0773a927f1510ad5cbb4b87
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/220856
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
parent 7f061fb53b
commit 4c4945a252
@@ -609,8 +609,8 @@ r4 = shr r3 24
 r4 = pack r4 r4 16
 r4 = sub_i16x2 r0 r4
 r5 = load32 arg(1)
-r6 = extract r5 0 r1
-r5 = extract r5 8 r1
+r6 = bit_and r5 r1
+r5 = shr_i16x2 r5 8
 r6 = mul_i16x2 r6 r4
 r6 = shr_i16x2 r6 8
 r5 = mul_i16x2 r5 r4
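The instruction swap in this hunk relies on a 32-bit shift-then-mask giving the same result as a per-lane 16-bit shift when the mask is 0x00ff00ff and the shift is 8. A minimal sketch of that equivalence follows. The functions are hypothetical scalar models written for illustration, assuming extract(x, bits, mask) means (x >> bits) & mask and shr_16x2 is a logical right shift applied to each 16-bit half independently. They are not Skia's implementations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model: 32-bit logical shift, then mask (assumed meaning of extract).
static uint32_t extract(uint32_t x, int bits, uint32_t mask) {
    return (x >> bits) & mask;
}

// Hypothetical scalar model: logical right shift of each 16-bit lane on its own.
static uint32_t shr_16x2(uint32_t x, int bits) {
    uint32_t lo = (x & 0xffffu) >> bits;  // low lane
    uint32_t hi = (x >> 16) >> bits;      // high lane
    return (hi << 16) | lo;
}

// Old form: two operations (shift + bit_and).
static uint32_t ga_old(uint32_t d) { return extract(d, 8, 0x00ff00ffu); }

// New form: one operation.
static uint32_t ga_new(uint32_t d) { return shr_16x2(d, 8); }

// Checks that both forms agree over a sweep of lane values.
static void self_check() {
    for (uint32_t v = 0; v <= 0xffffu; ++v) {
        uint32_t d = (v << 16) | (~v & 0xffffu);  // distinct values in each lane
        assert(ga_old(d) == ga_new(d));
    }
}
```

The per-lane shift already clears the top 8 bits of each lane, which is the only job the mask was doing in the old form.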
@@ -148,9 +148,9 @@ SrcoverBuilder_I32_SWAR::SrcoverBuilder_I32_SWAR() {
     skvm::I32 ax2 = pack(a,a,16),
               invAx2 = sub_16x2(splat(0x01000100), ax2);
 
-    skvm::I32 d = load32(dst),
-              rb = extract(d, 0, splat(0x00ff00ff)),
-              ga = extract(d, 8, splat(0x00ff00ff));
+    skvm::I32 d = load32(dst),
+              rb = bit_and (d, splat(0x00ff00ff)),
+              ga = shr_16x2(d, 8);
 
     rb = shr_16x2(mul_16x2(rb, invAx2), 8); // Put the high 8 bits back in the low lane.
     ga = mul_16x2(ga, invAx2); // Keep the high 8 bits up high...
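To make the lane arithmetic in this hunk concrete, here is a rough scalar sketch of the lines shown. It assumes pack(a,a,16) means a | (a << 16) and that the _16x2 ops act on the two 16-bit halves of a 32-bit word independently. The helper names mirror the diff, but the bodies and the Scaled struct are illustrative stand-ins, not Skia's code, and only the steps visible in the hunk are modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar models of the two-lane ops named in the diff.
static uint32_t shr_16x2(uint32_t x, int bits) {
    return (((x >> 16) >> bits) << 16) | ((x & 0xffffu) >> bits);
}
static uint32_t mul_16x2(uint32_t x, uint32_t y) {
    uint32_t lo = ((x & 0xffffu) * (y & 0xffffu)) & 0xffffu;  // keep low 16 bits
    uint32_t hi = ((x >> 16) * (y >> 16)) & 0xffffu;
    return (hi << 16) | lo;
}
static uint32_t sub_16x2(uint32_t x, uint32_t y) {
    uint32_t lo = ((x & 0xffffu) - (y & 0xffffu)) & 0xffffu;
    uint32_t hi = ((x >> 16) - (y >> 16)) & 0xffffu;
    return (hi << 16) | lo;
}

struct Scaled { uint32_t rb, ga; };  // illustrative container, not in the source

// Scales the four bytes of d by (256 - a), following the lines in the hunk.
static Scaled scale_dst(uint32_t d, uint32_t a) {
    uint32_t ax2    = a | (a << 16);                // assumed meaning of pack(a,a,16)
    uint32_t invAx2 = sub_16x2(0x01000100u, ax2);   // 256 - a in each lane
    uint32_t rb     = d & 0x00ff00ffu;              // bytes 0 and 2
    uint32_t ga     = shr_16x2(d, 8);               // bytes 1 and 3, one op
    rb = shr_16x2(mul_16x2(rb, invAx2), 8);         // result in the low byte of each lane
    ga = mul_16x2(ga, invAx2);                      // result left in the high byte of each lane
    return {rb, ga};
}

static void self_check() {
    Scaled z = scale_dst(0x80ff4020u, 0u);  // a = 0 scales by 256/256
    assert(z.rb == 0x00ff0020u);
    assert(z.ga == 0x80004000u);
}
```

In this model, a = 0 leaves each channel byte unchanged in its lane, and a = 0x80 halves each channel, since every lane is multiplied by 128 and the top 8 bits of the 16-bit product are kept.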