[wasm-simd] Support returning Simd128 on caller's stack
In Liftoff, we were missing kS128 cases to load to/from stack.
For the x64 and ARM64 instruction selector, the calculation of
reverse_slot is incorrect for 128-bit values:
- reverse_slot += 2 (size of 128-bit values, 2 pointers)
- this copies from slot -2 into register
- but the value starts at slot -1, it occupies slots -1 and -2
- we end up copying slot -2 (most significant half) of the register, and
also slot -3, which is where rsi was stored (Wasm instance addr)
- the test ends up with a different result every time
The calculation of reverse_slot is changed to follow how ia32 and ARM
do it, which is to start with
- reverse_slot = 0
- in the code-generator, add 1 to the slot
- then after emitting Peek operation, reverse_slot += 2
The fixes for x64 and ARM64 are in both instruction-selector and
code-generator.
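The slot arithmetic described above can be sketched with a toy model. This is not V8 code: the `frame` map, `peek128`, `peekOld`, and `peekNew` are hypothetical names invented for illustration, and the assumption that a 128-bit peek at slot -n reads slots -n and -(n + 1) is taken from the description in this message.

```javascript
// Toy model of the caller's frame; NOT V8 code. Slots -1 and -2 hold the two
// 64-bit halves of the returned v128. Slot -3 holds an unrelated value (the
// commit message says rsi, the Wasm instance address, was stored there).
const frame = new Map([
  [-1, "v128 half in slot -1"],
  [-2, "v128 half in slot -2"],
  [-3, "wasm instance"],
]);

// Hypothetical helper: a 128-bit peek at slot -n reads slots -n and -(n + 1).
function peek128(slot) {
  return [frame.get(-slot), frame.get(-(slot + 1))];
}

// Old x64/ARM64 scheme: reverse_slot is bumped by the value size (2 slots)
// before the peek, so the read starts one slot too far down.
function peekOld() {
  let reverseSlot = 0;
  reverseSlot += 2;
  return peek128(reverseSlot);
}

// New scheme (following ia32/ARM): start at 0, the code generator adds 1 to
// the slot, and reverse_slot only advances after the peek has been emitted.
function peekNew() {
  let reverseSlot = 0;
  const result = peek128(reverseSlot + 1);
  reverseSlot += 2;
  return result;
}

console.log("old:", peekOld().join(" + "));
console.log("new:", peekNew().join(" + "));
```

In this model the old scheme reads one half of the value plus the saved instance address, while the new scheme reads both halves of the value.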
ia32 and ARM didn't support writing kSimd128 values yet; it was only a
missing check in code-generator, so add that in.
For ARM, the codegen is more involved: vld1 does not support addressing
with an offset, so we have to do the addition into a scratch register.
Also adding a test for returning multiple v128. V128 is not exposed to
JavaScript, so we use a Wasm function call, and then an involved chain
of extract lanes, returning 6 i32 which we verify the values of. It
extracts the first and last lane of the i32x4 value in order to catch
bugs where we write or read to a wrong stack slot (off by 1).
The simd-scalar-lowering for kCall was only handling a single s128 return;
we adopt the way i64-lowering handles kCall, so that it can now handle
any kind of call with s128 in the descriptor.
Bug: v8:10794
Bug: chromium:1115230
Change-Id: I2ccdd55f6292bc5794be78053b27e14da8cce70e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2355189
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#69439}
2020-08-17 20:29:48 +00:00
// Copyright 2017 the V8 project authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
2021-11-29 20:24:30 +00:00
// Flags: --experimental-wasm-simd
2021-06-01 12:46:36 +00:00
d8.file.execute("test/mjsunit/wasm/wasm-module-builder.js");
(function MultiReturnS128Test() {
  print("MultiReturnS128Test");
  // Most backends only support 2 fp return registers, so the third v128
  // onwards here will be written to a caller stack slot.
  let builder = new WasmModuleBuilder();
  let sig_v_sssss = builder.addType(
    makeSig([], [kWasmS128, kWasmS128, kWasmS128, kWasmS128, kWasmS128]));
  let sig_iiiiiiiiii_v = builder.addType(
    makeSig([], [kWasmI32, kWasmI32, kWasmI32, kWasmI32, kWasmI32, kWasmI32,
      kWasmI32, kWasmI32, kWasmI32, kWasmI32]));

  let callee = builder.addFunction("callee", sig_v_sssss)
    .addBody([
      kExprI32Const, 0,
      kSimdPrefix, kExprI32x4Splat,
      kExprI32Const, 1,
      kSimdPrefix, kExprI32x4Splat,
      kExprI32Const, 2,
      kSimdPrefix, kExprI32x4Splat,
      kExprI32Const, 3,
      kSimdPrefix, kExprI32x4Splat,
      kExprI32Const, 4,
      kSimdPrefix, kExprI32x4Splat,
      kExprReturn]);
  // For each v128 on the stack, we return the first and last lane. This helps
  // catch bugs with reading/writing the wrong stack slots.
  builder.addFunction("main", sig_iiiiiiiiii_v)
2020-09-10 12:39:52 +00:00
    .addLocals(kWasmI32, 10).addLocals(kWasmS128, 1)
    .addBody([
      kExprCallFunction, callee.index,

      kExprLocalTee, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 0,
      kExprLocalSet, 0,
      kExprLocalGet, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 3,
      kExprLocalSet, 1,

      kExprLocalTee, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 0,
      kExprLocalSet, 2,
      kExprLocalGet, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 3,
      kExprLocalSet, 3,

      kExprLocalTee, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 0,
      kExprLocalSet, 4,
      kExprLocalGet, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 3,
      kExprLocalSet, 5,

      kExprLocalTee, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 0,
      kExprLocalSet, 6,
      kExprLocalGet, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 3,
      kExprLocalSet, 7,

      kExprLocalTee, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 0,
      kExprLocalSet, 8,
      kExprLocalGet, 10,
      kSimdPrefix, kExprI32x4ExtractLane, 3,
      kExprLocalSet, 9,

      // Return all the stored locals.
      kExprLocalGet, 0,
      kExprLocalGet, 1,
      kExprLocalGet, 2,
      kExprLocalGet, 3,
      kExprLocalGet, 4,
      kExprLocalGet, 5,
      kExprLocalGet, 6,
      kExprLocalGet, 7,
      kExprLocalGet, 8,
      kExprLocalGet, 9,
    ])
    .exportAs("main");

  let module = new WebAssembly.Module(builder.toBuffer());
  let instance = new WebAssembly.Instance(module);
  assertEquals(instance.exports.main(), [4, 4, 3, 3, 2, 2, 1, 1, 0, 0]);
})();
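The expected array in the final assertion runs from 4 down to 0 because the last v128 returned by the callee sits on top of the value stack and is consumed first. A minimal sketch of that reasoning follows. It is a toy model of the value stack, not part of the test and not how V8 executes Wasm; `simulateMain` and `scratch` are hypothetical names used only for illustration.

```javascript
// Toy model of the value stack in "main". A v128 is modelled as an array of
// four i32 lanes; `scratch` stands in for local 10, the v128 scratch local.
function simulateMain(numResults) {
  // The callee pushes i32x4.splat(0) ... i32x4.splat(numResults - 1),
  // so the last splat ends up on top of the stack.
  const stack = [];
  for (let i = 0; i < numResults; i++) {
    stack.push([i, i, i, i]);
  }

  const locals = [];
  let scratch;
  for (let i = 0; i < numResults; i++) {
    // local.tee 10 copies the top v128 into the scratch local and leaves it
    // on the stack; i32x4.extract_lane 0 then consumes it. The net effect on
    // the stack is a pop. local.set stores lane 0 in local 2 * i.
    scratch = stack.pop();
    locals[2 * i] = scratch[0];
    // local.get 10; i32x4.extract_lane 3; local.set (2 * i + 1)
    locals[2 * i + 1] = scratch[3];
  }
  return locals;
}

console.log(simulateMain(5).join(", "));  // prints "4, 4, 3, 3, 2, 2, 1, 1, 0, 0"
```

Reading lanes 0 and 3 of every vector means a load that is off by one stack slot would mix lanes from neighbouring vectors and change at least one of the paired values.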