skia2/tests/sksl/runtime/golden/SampleWithUniformMatrix.skvm


Stop calling schedule()

The new unit test demonstrates load/store reordering is error-prone.
At head we're allowing loads from a given pointer to reorder later than
a store to that same pointer, and boy, that's just not sound.

In the scenario constructed by the test we reorder this swap,

    x = load32 X
    y = load32 Y
    store32 X y
    store32 Y x

using schedule() (following Op argument data dependencies) into

    y = load32 Y
    store32 X y
    x = load32 X
    store32 Y x

which moves `x = load32 X` illegally past `store X y`. We write `y`
twice instead of swapping `x` and `y`.

It's not impossible to implement that extra reordering constraint: I
think it's easiest to think about by adding implicit use edges in
schedule() from stores to prior loads of the same pointer. But that'd
be a little complicated to implement, and doesn't handle aliasing at
all, so I decided to ponder on other approaches that handle a wider
range of programs or would have a simpler implementation to reason
about.

I ended up walking through this rough chain of ideas:

    0) reorder using only Op argument data dependencies (HEAD)
    1) don't let load(ptr) pass store(ptr) (above)
    2) don't let any load pass any store (allows aliasing)
    3) don't reorder any Op that touches memory
    4) don't reorder any Op, period.

This CL is 4). It's certainly the easiest and cheapest implementation.
It's not clear to me that we need this scheduling, and should we find
we really want it I'll come back and work back through the list until
we find something that meets our needs.

(Hoisting of uniforms is unaffected here.)

Change-Id: I7765b1d16202e0645b11295f7e30c5e09f2b7339
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/350256
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-01-05 19:31:15 +00:00
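
The swap scenario in the commit message can be replayed with a toy interpreter. This is only an illustrative sketch, not SkVM code: the `run` helper, the tuple encoding of ops, and the dict standing in for memory are all hypothetical. It executes the four ops in program order and then in the reordered sequence quoted above.

```python
# Hypothetical toy model; nothing here is part of SkVM's API.
def run(ops, memory):
    """Execute (op, a, b) tuples against a copy of `memory` and return it."""
    mem = dict(memory)
    regs = {}
    for op, a, b in ops:
        if op == "load32":
            regs[a] = mem[b]       # a = load32 b
        elif op == "store32":
            mem[a] = regs[b]       # store32 a b
        else:
            raise ValueError(f"unknown op: {op}")
    return mem


# Program order: a correct swap of X and Y.
program_order = [
    ("load32", "x", "X"),
    ("load32", "y", "Y"),
    ("store32", "X", "y"),
    ("store32", "Y", "x"),
]

# The order quoted in the commit message: `x = load32 X` now runs
# after `store32 X y`, so it reads the value that was just stored.
scheduled_order = [
    ("load32", "y", "Y"),
    ("store32", "X", "y"),
    ("load32", "x", "X"),
    ("store32", "Y", "x"),
]

initial = {"X": 1, "Y": 2}
swapped = run(program_order, initial)
broken = run(scheduled_order, initial)
```

With `X = 1` and `Y = 2`, program order leaves `X = 2, Y = 1`, while the reordered sequence leaves both locations holding 2, which is the "write `y` twice" failure the message describes.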
18 registers, 47 instructions:
0 r0 = uniform32 arg(0) 0
1 r1 = uniform32 arg(0) C
2 r2 = uniform32 arg(0) 10
3 r3 = uniform32 arg(0) 14
4 r4 = uniform32 arg(0) 18
5 r5 = uniform32 arg(0) 1C
6 r6 = uniform32 arg(0) 20
7 r7 = uniform32 arg(0) 24
8 r8 = uniform32 arg(0) 28
9 r9 = uniform32 arg(0) 2C
10 r10 = uniform32 arg(0) 30
11 r5 = mul_f32 r5 r0
12 r6 = mul_f32 r6 r0
13 r0 = mul_f32 r7 r0
14 r7 = splat 3F800000 (1)
15 r11 = splat 1 (1.4012985e-45)
16 r12 = splat 2 (2.8025969e-45)
17 r13 = splat 3 (4.2038954e-45)
loop:
18 r14 = index
19 r15 = mul_f32 r2 r14
20 r15 = add_f32 r15 r5
21 r15 = add_f32 r15 r8
22 r16 = mul_f32 r3 r14
23 r16 = add_f32 r16 r6
24 r16 = add_f32 r16 r9
25 r14 = mul_f32 r4 r14
26 r14 = add_f32 r14 r0
27 r14 = add_f32 r14 r10
28 r14 = div_f32 r7 r14
29 r15 = mul_f32 r15 r14
30 r14 = mul_f32 r16 r14
31 r15 = trunc r15
32 r14 = trunc r14
33 r14 = mul_i32 r14 r1
34 r14 = add_i32 r15 r14
35 r14 = shl_i32 r14 2
36 r15 = gather32 arg(0) 4 r14
37 r16 = add_i32 r14 r11
38 r16 = gather32 arg(0) 4 r16
39 r17 = add_i32 r14 r12
40 r17 = gather32 arg(0) 4 r17
41 r14 = add_i32 r14 r13
42 r14 = gather32 arg(0) 4 r14
43 store32 arg(1) r15
44 store32 arg(2) r16
45 store32 arg(3) r17
46 store32 arg(4) r14
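
For readers following the listing, the loop body can be restated as scalar code for a single lane. This is a rough sketch derived only from the instructions above. The names `u0`, `stride`, `m`, and `texels` are hypothetical labels for the uniform at offset 0, the uniform at offset C, the nine uniforms at offsets 10 through 30, and the buffer reached through the pointer at offset 4. It assumes `gather32` indexes that buffer in 32-bit elements, and it uses Python floats where the listing uses f32.

```python
def sample_one(index, u0, stride, m, texels):
    """Scalar restatement of instructions 18-46 for one lane (hypothetical names)."""
    # Instructions 11-13 pre-multiply m[3], m[4], m[5] by u0 outside the loop.
    x = m[0] * index + m[3] * u0 + m[6]   # 19-21
    y = m[1] * index + m[4] * u0 + m[7]   # 22-24
    w = m[2] * index + m[5] * u0 + m[8]   # 25-27
    inv_w = 1.0 / w                       # 28 (Python raises if w == 0)
    ix = int(x * inv_w)                   # 29, 31: int() truncates toward zero
    iy = int(y * inv_w)                   # 30, 32
    base = (ix + iy * stride) << 2        # 33-35: four 32-bit values per texel
    # 36-42 gather four consecutive values; 43-46 store them to arg(1)..arg(4).
    return tuple(texels[base + k] for k in range(4))


# Usage with an identity-like matrix and a 16-element buffer.
identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
result = sample_one(index=1, u0=1.0, stride=2, m=identity, texels=list(range(16)))
```

Read this way, the program applies a 3x3 matrix to `(index, u0, 1)`, divides by the third component, truncates to integer coordinates, and fetches four consecutive 32-bit values at the resulting position.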