This CL implements load_extend with 2 lanes and all load_splat
operations on IA32. The necessary assemblers together with their
corresponding disassemblers and tests are also added in this CL.
The newly added opcodes include: S8x16LoadSplat, S16x8LoadSplat,
S32x4LoadSplat, S64x2LoadSplat, I64x2Load32x2S, I64x2Load32x2U.
Bug: v8:9886
Change-Id: I0a5dae0a683985c14c433ba9d85acbd1cee6705f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1982989
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Zhiguo Zhou <zhiguo.zhou@intel.com>
Cr-Commit-Position: refs/heads/master@{#65937}
Note the tricky part in instruction-selector-x64, where we flip the
inputs given to the code generator. This is because the semantics we
want is: v128.andnot a b = a & !b, but the x64 instruction performs
andnps a b = !a & b. Therefore we flip the inputs, and combined with
g.DefineSameAsFirst, the output register will be the same as b, and we
can use andnps without any modifications in both SSE and AVX cases.
Bug: v8:10082
Change-Id: Iff98dc1dd944fbc642875f6306c6633d5d646615
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1980894
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65738}
For I16x8Splat and I8x16Splat, the arguments takes I32, which can hold a
value larger than what should be splatted. We add tests to check that
the splatted values is the truncated I32 value (top bits masked off).
See https://github.com/WebAssembly/simd/pull/151 for the updated to
proposal text.
Change-Id: Ib32770872e70c7cde2028130d2b44b416594610e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1986200
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65642}
This change includes templatization of the test helper to allow the
same function to be reused for both signed and unsigned data types.
We implement a new function RoundingAverageUnsigned in overflowing-math,
rather than in base/utils, since the addition could overflow.
SIMD scalar lowering and implementation for other backends will follow
in future patches.
Bug: v8:10039
Change-Id: I70735f7b6536f197869ef1afbccaf5649e7e8448
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1958007
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65531}
Also some cleanup reordering of instruction codes.
Bug: v8:9813
Change-Id: I35caad0b84dd5824090046cba964454eac45d5d8
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1925613
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65088}
These instructions should always treat inputs as signed, and saturate to
unsigned min/max values.
E.g. given -1, it should saturate to 0.
The spec text,
https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing,
has been updated to describe this.
The changes here include codegen changes to ia32, x64, arm, and arm64,
changes to arm simulator, assembler, and disassembler to handle the case
of treating input as signed and narrowing to unsigned. The vqmovn
instruction can handle this case, our assembler wasn't allowing callers
to specify this.
The interpreter and scalar lowering are also fixed with this change.
Bug: v8:9729
Change-Id: I6f72baa825f59037f7754485df6a2964af59fe31
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1879423
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65051}
This implements the rest of the load extend instructions:
- i32x4.load16x4_s
- i32x4.load16x4_u
- i64x2.load32x2_s
- i64x2.load32x2_u
Bug: v8:9886
Change-Id: I4649f77bae5224042a1628d9f0498c050b1e599d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1903812
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65017}
This makes sure that the {WasmGraphBuilder} properly detects the
presence of Simd128 global.get and global.set opcodes and triggers
scalar lowering on architectures without Simd128 support.
R=clemensb@chromium.org
TEST=cctest/test-run-wasm-simd/RunWasm_S128Globals
BUG=v8:9973
Change-Id: I1538bd1d3fea40cc78e82b125d4f113842faf68a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1917148
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65002}
Introduce new operator LoadTransform that holds a LoadTransformInfo param,
which describes the kind of load (normal, unaligned, protected), and a
transformation (splat or extend, signed or unsigned).
We have a new method that a full decoder needs to implement, LoadTransform,
which resuses the existing LoadType we have, but also takes a LoadTransform,
to distinguish between splats and extends at the decoder level.
This implements 4 out of the 10 suggested load splat/extend operations
(to keep the cl smaller), and is also missing interpreter support (will
be added in the future).
Change-Id: I1e65c693bfbe30e2a511c81b5a32e06aacbddc19
Bug: v8:9886
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1863863
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64819}
There are a couple of bugs here:
1. The immediate used for vinsertps is wrong when lane == 1, the first
two bits specify which element of the source is copied, and it should
always be 00, 01 to copy the first 2 lanes of source.
2. For both cases, the second insertps call should be using dst as the
src, since dst was already updated by the first insertps call, it was
incorrectly using the old value of src. This was probably working
correctly because in many cases dst and src happened to be the same
register.
3. rep cannot be same as dst, because dst is overwritten, and rep should
stay the same
I also modified the F64x2ReplaceLane to test separately for replacing
lane 0 and lane 1.
Fixed bug 3. for arm and arm64.
Bug: v8:9728
Change-Id: Iec6e48bcfbc7d27908dd86d5f113a8b5dedd499b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1877055
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64620}
This introduces 2 new machine operators that are variants of I64x2Splat
and I64x2ReplaceLane that takes two int32 operands instead of one i64
operand.
Bug: v8:9728
Change-Id: I6675f991e6c56821c84d183dacfda96961c1a708
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1841242
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64337}
The vst1 and vld1 instruction does a post-increment access. What we
intend is the usual access at (base+offset). This change adds a helper
function that is called for load and stores of s128, which emits the add
instruction to do base+offset, and then change the addressing mode of
the load/store to Operand2_R, which generates the variant of vld1/vst1
without the offset register. This is similar to how kSimd128 values are
loaded/stored in VisitUnalignedLoad and VisitUnalignedStore.
We also remove kSimd128 cases from UnalignedLoad and UnalignedStore,
since it is supported (see A3.2.1 Unaligned Data Access, ARM DDI
0406C.d)
Bug: v8:9746
Bug: v8:9748
Change-Id: I60b987ac58a5eaacd498a940625163484a3dc2db
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1834771
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64229}
This CL implements i8x16.extract_lane_u, i16x8.extract_lane_u operations by
changing the default narrow extract operations to be unsigned. The
sign-extended extracts are implemented on top of the unsigned extracts
with an additional extend compiler node.
For IA32/X64, the codegen effectively remains the same -
0x389332bc32a3 63 660f3a14c900 pextrb rcx,xmm1,0
0x389332bc32a9 69 0fbec9 movsxbl rcx,rcx
0x389332bc32a3 63 660f3a14c900 pextrb rcx,xmm1,0
0x389332bc32a9 69 0fbec9 movsxbl rcx,rcx
On ARM, this adds an additional sxt instruction for the signed extracts.
Bug: v8:8460
Change-Id: I67f14b2b860ff8cc86ffbb2f65c7ef7de32da83f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1846711
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64172}
This brings our constants back in line with the changed spec text. We
already use kExprTableGet and kExprTableSet, but for locals and globals
we still use the old wording.
This renaming is mostly mechanical.
PS1 was created using:
ag -l 'kExpr(Get|Set)Global' src test | \
xargs -L1 sed -E 's/kExpr(Get|Set)Global\b/kExprGlobal\1/g' -i
PS2 contains manual fixes.
R=mstarzinger@chromium.org
Bug: v8:9810
Change-Id: I064a6448cd95bc24d31a5931b5b4ef2464ea88b1
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1847355
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64163}