Introduce new operator LoadTransform that holds a LoadTransformInfo param,
which describes the kind of load (normal, unaligned, protected), and a
transformation (splat or extend, signed or unsigned).
We have a new method that a full decoder needs to implement, LoadTransform,
which resuses the existing LoadType we have, but also takes a LoadTransform,
to distinguish between splats and extends at the decoder level.
This implements 4 out of the 10 suggested load splat/extend operations
(to keep the cl smaller), and is also missing interpreter support (will
be added in the future).
Change-Id: I1e65c693bfbe30e2a511c81b5a32e06aacbddc19
Bug: v8:9886
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1863863
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64819}
There are a couple of bugs here:
1. The immediate used for vinsertps is wrong when lane == 1, the first
two bits specify which element of the source is copied, and it should
always be 00, 01 to copy the first 2 lanes of source.
2. For both cases, the second insertps call should be using dst as the
src, since dst was already updated by the first insertps call, it was
incorrectly using the old value of src. This was probably working
correctly because in many cases dst and src happened to be the same
register.
3. rep cannot be same as dst, because dst is overwritten, and rep should
stay the same
I also modified the F64x2ReplaceLane to test separately for replacing
lane 0 and lane 1.
Fixed bug 3. for arm and arm64.
Bug: v8:9728
Change-Id: Iec6e48bcfbc7d27908dd86d5f113a8b5dedd499b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1877055
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64620}
This introduces 2 new machine operators that are variants of I64x2Splat
and I64x2ReplaceLane that takes two int32 operands instead of one i64
operand.
Bug: v8:9728
Change-Id: I6675f991e6c56821c84d183dacfda96961c1a708
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1841242
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64337}
The vst1 and vld1 instruction does a post-increment access. What we
intend is the usual access at (base+offset). This change adds a helper
function that is called for load and stores of s128, which emits the add
instruction to do base+offset, and then change the addressing mode of
the load/store to Operand2_R, which generates the variant of vld1/vst1
without the offset register. This is similar to how kSimd128 values are
loaded/stored in VisitUnalignedLoad and VisitUnalignedStore.
We also remove kSimd128 cases from UnalignedLoad and UnalignedStore,
since it is supported (see A3.2.1 Unaligned Data Access, ARM DDI
0406C.d)
Bug: v8:9746
Bug: v8:9748
Change-Id: I60b987ac58a5eaacd498a940625163484a3dc2db
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1834771
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64229}
This CL implements i8x16.extract_lane_u, i16x8.extract_lane_u operations by
changing the default narrow extract operations to be unsigned. The
sign-extended extracts are implemented on top of the unsigned extracts
with an additional extend compiler node.
For IA32/X64, the codegen effectively remains the same -
0x389332bc32a3 63 660f3a14c900 pextrb rcx,xmm1,0
0x389332bc32a9 69 0fbec9 movsxbl rcx,rcx
0x389332bc32a3 63 660f3a14c900 pextrb rcx,xmm1,0
0x389332bc32a9 69 0fbec9 movsxbl rcx,rcx
On ARM, this adds an additional sxt instruction for the signed extracts.
Bug: v8:8460
Change-Id: I67f14b2b860ff8cc86ffbb2f65c7ef7de32da83f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1846711
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64172}
This brings our constants back in line with the changed spec text. We
already use kExprTableGet and kExprTableSet, but for locals and globals
we still use the old wording.
This renaming is mostly mechanical.
PS1 was created using:
ag -l 'kExpr(Get|Set)Global' src test | \
xargs -L1 sed -E 's/kExpr(Get|Set)Global\b/kExprGlobal\1/g' -i
PS2 contains manual fixes.
R=mstarzinger@chromium.org
Bug: v8:9810
Change-Id: I064a6448cd95bc24d31a5931b5b4ef2464ea88b1
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1847355
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64163}
FMA operations is always supported on arm64, so in the test, we expect
fused results on arm64 whenever we run on TurboFan.
Bug: v8:9415
Change-Id: Ia2016533b9b76ee14b8c8da1c0d4ff7753276714
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1819723
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63973}
Add a new test SimdLoadStoreLoadMemargOffset to test this, without this fix
this test would have failed.
Bug: v8:9753
Change-Id: I119adda8e3c6c7adb0ad4023298bbce9c0c64a01
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1811457
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63914}
Quasi Fused Multiply-Add and Quasi Fused Multiply-Subtract performs, on floats, a + b * c and a - b * c respectively.
When there is only a single rounding, it is a fused operation. Quasi in this case means that the result can either be fused or not fused (two roundings), depending on hardware support.
It is tricky to write the test because we need to calculate the expected value, and there is no easy way to express fused or unfused operation in C++, i.e.
we cannot confirm that float expected = a + b * c will perform a fused or unfused operation (unless we use intrinsics).
Thus in the test we have a list of simple checks, plus interesting values that we know will produce different results depending on whether it was fused or not.
The difference between 32x4 and 64x2 qfma/qfms is the type, and also the values of b and c that will cause an overflow, and thus the intermediate rounding will affect the final result.
The same array can be copy pasted for both types, but with a bit of templating we can avoid that duplication.
Change-Id: I0973a3d28468d25f310b593c72f21bff54d809a7
Bug: v8:9415
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1779325
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63878}
Drive by fix of type of expected value in a test
Bug: v8:9626
Change-Id: I1bb44082b873383ea75e7089828bc68c9d4e0df0
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1757503
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63727}
Implementations for other architectures will follow in subsequent
changes.
Bug: v8:8460
Change-Id: I279388ab76b1d88d65cbe179088be5573c17fc58
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1796317
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63693}
Implement I64x2 multiply using 32-bit multiplies.
This approach uses two fewer cycles (0.88x) on Cortex-A53 and three fewer cycles (0.86x)
on Cortex-A72, compared to moving to general purpose registers and doing two 64-bit multiplies.
Based on a patch by Zhi An Ng.
Bug: v8:8460
Change-Id: I9c8d3bb77f0d751eec2d85823522558b7f173628
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1781696
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63558}
This is only for turbofan and interpreter, and simd lowering for 64x2 is
not implemented yet.
Bug: v8:8460
Change-Id: I0d046cb39ff64936da772e0db9a86b88b1509ac2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1769194
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63466}
This off-by-1 error surfaces when the load/store opcodes take up 2
bytes, which is the case for v128.load and v128.store SIMD operations.
Bug: v8:9015
Change-Id: Ife17375ed3450a95399b326bc6415dbc3ed3773b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1769480
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63405}
- Move undef closer to end of usage
- Move I64x2ExtractWithF64x2 closer to Extract tests, and into ifdef
scope so it runs on arm64 builds
Change-Id: I7138c44097975d02e97f4b2b9bfcddd8eb9735c8
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1754544
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63227}
This is a reland of
https://chromium-review.googlesource.com/c/v8/v8/+/1749712 with a fix in
test-run-wasm-simd.cc to use base::Divide to work around C++ undefined
behavior when the denominator is 0.
Bug: v8:8460
Change-Id: Ia0a4ff621cccc6d9b7528717bf3fa7c79e42ba1a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1745819
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63198}