This is a reland of commit aa541f1c9c
Original change's description:
> [turbofan][arm64] Emit Lsl for Int32MulWithOverflow when possible
>
> Int32MulWithOverflow on arm64 uses a cmp to set flags rather than
> the multiply instruction itself, thus we can use a left shift when
> the multiplication is by a power of two.
>
> This provides 0.15% for Speedometer2 on a Neoverse-N1 machine,
> with React being improved by 0.45%.
>
> Change-Id: Ic8db42ecc7cb14cf1ac7bbbeab0e9d8359104351
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3829472
> Commit-Queue: George Wort <george.wort@arm.com>
> Reviewed-by: Nico Hartmann <nicohartmann@chromium.org>
> Cr-Commit-Position: refs/heads/main@{#82499}
Change-Id: Ib8f387bd41d283df551299f7ee98e72d39e2a3bd
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3865484
Commit-Queue: George Wort <george.wort@arm.com>
Reviewed-by: Nico Hartmann <nicohartmann@chromium.org>
Cr-Commit-Position: refs/heads/main@{#82909}
This reverts commit aa541f1c9c.
Reason for revert: Reverting due to large regressions for motionmark on M1.
Original change's description:
> [turbofan][arm64] Emit Lsl for Int32MulWithOverflow when possible
>
> Int32MulWithOverflow on arm64 uses a cmp to set flags rather than
> the multiply instruction itself, thus we can use a left shift when
> the multiplication is by a power of two.
>
> This provides 0.15% for Speedometer2 on a Neoverse-N1 machine,
> with React being improved by 0.45%.
>
> Change-Id: Ic8db42ecc7cb14cf1ac7bbbeab0e9d8359104351
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3829472
> Commit-Queue: George Wort <george.wort@arm.com>
> Reviewed-by: Nico Hartmann <nicohartmann@chromium.org>
> Cr-Commit-Position: refs/heads/main@{#82499}
Change-Id: I896530a53fbdf6d397922124abddda4140144448
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3854222
Reviewed-by: Nico Hartmann <nicohartmann@chromium.org>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Commit-Queue: George Wort <george.wort@arm.com>
Cr-Commit-Position: refs/heads/main@{#82696}
Int32MulWithOverflow on arm64 uses a cmp to set flags rather than
the multiply instruction itself, thus we can use a left shift when
the multiplication is by a power of two.
This provides 0.15% for Speedometer2 on a Neoverse-N1 machine,
with React being improved by 0.45%.
Change-Id: Ic8db42ecc7cb14cf1ac7bbbeab0e9d8359104351
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3829472
Commit-Queue: George Wort <george.wort@arm.com>
Reviewed-by: Nico Hartmann <nicohartmann@chromium.org>
Cr-Commit-Position: refs/heads/main@{#82499}
Use a single SBFX instruction for Word64Sar(ChangeInt32ToInt64(x), imm)
when possible.
Using PGO, this improves Speedometer2 by 0.4% on a Cortex-A55 machine,
and 0.27% on a Neoverse-N1 machine.
Change-Id: I6fea5e473f0f0869f8f6cebd9a4e61bb2fc6e9ef
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3807586
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Commit-Queue: Rodolph Perfetta <rodolph.perfetta@arm.com>
Reviewed-by: Jakob Linke <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#82277}
This prevents ambiguity errors in C++20 due to ADL when casting types in
std::, which gains std::bit_cast<>().
Bug: chromium:1284275
Change-Id: I25046d1952a9304852e481ad8b84049c6769c289
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3625838
Auto-Submit: Peter Kasting <pkasting@chromium.org>
Reviewed-by: Adam Klein <adamk@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Commit-Queue: Adam Klein <adamk@chromium.org>
Cr-Commit-Position: refs/heads/main@{#80378}
Immediate version of the Bitclear instruction can be used for logical
And with some immediates. It can also be used to implement
And(x, Not(imm)) in a single instruction. This patch gives ~0.5% runtime
improvement in one benchmark on Neoverse N1.
Change-Id: Ia926c6746f0c252f81626c6fca21c4dfb41679d9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3160667
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/main@{#80015}
This is a reland of 16df1dfa13
No changes have been made to this reland as previous commit was reverted
due to a new test revealing an existing bug. This bug has now been fixed.
Original change's description:
> [arm64][wasm-simd] Use Cm(0) for integer comparison with 0
>
> Use an immediate zero operand for integer comparison when possible. This
> gives ~1% runtime performance improvement in some benchmarks on Neoverse
> N1.
>
> Change-Id: I727a8104f8e6ca3d122d6b5b8b3d38d7bdd76c47
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3158327
> Reviewed-by: Zhi An Ng <zhin@chromium.org>
> Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
> Cr-Commit-Position: refs/heads/main@{#76847}
Change-Id: I77d6923d79407a83becbd39970c6a3f62d3a304d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3178482
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Rodolph Perfetta <rodolph.perfetta@arm.com>
Cr-Commit-Position: refs/heads/main@{#77260}
This reverts commit 16df1dfa13.
Reason for revert: Multiple failures, e.g. https://ci.chromium.org/ui/p/v8/builders/ci/V8%20Linux/43844/overview
Original change's description:
> [arm64][wasm-simd] Use Cm(0) for integer comparison with 0
>
> Use an immediate zero operand for integer comparison when possible. This
> gives ~1% runtime performance improvement in some benchmarks on Neoverse
> N1.
>
> Change-Id: I727a8104f8e6ca3d122d6b5b8b3d38d7bdd76c47
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3158327
> Reviewed-by: Zhi An Ng <zhin@chromium.org>
> Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
> Cr-Commit-Position: refs/heads/main@{#76847}
Tbr: zhin@chromium.org
Change-Id: I7039106d885c59aecad24dd8dda4d151b8e1f022
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3162053
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Auto-Submit: Clemens Backes <clemensb@chromium.org>
Owners-Override: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76851}
Use an immediate zero operand for integer comparison when possible. This
gives ~1% runtime performance improvement in some benchmarks on Neoverse
N1.
Change-Id: I727a8104f8e6ca3d122d6b5b8b3d38d7bdd76c47
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3158327
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/main@{#76847}
Use an immediate zero operand for floating point comparison nodes when
possible. This results in up to 20-25% runtime improvement in some
microbenchmarks, as well as 1-1.5% runtime improvement in some
real-use benchmarks on Cortex-A55 and Neoverse N1.
Change-Id: I39d10871a08a037dbe8c0877d789d110476e1a58
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3133143
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/main@{#76749}
This is a reland of 65515ddd3e
Fix is to use AddWithWraparound for signed additions to avoid UB.
Original change's description:
> [wasm-simd][arm64] Fuse add and extmul
>
> We can select a better instruction for add+extmul, using one of the
> multiply-long-accumulate instruction.
>
> Define a helper struct to pattern match Add(x, OP(y, z)) and
> Add(OP(x, y) z), and ensure that the matched OP is always on the
> LHS, to simplify checking for matches.
>
> Bug: v8:11548
> Change-Id: I7ab488b262aa9f749785f973549ccd9fad72f4c8
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2826725
> Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Cr-Commit-Position: refs/heads/main@{#76708}
Bug: v8:11548
Change-Id: I675ab8b78d9c6c30b82a8c96c8e7098a548c6a60
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3144379
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76712}
This reverts commit 65515ddd3e.
Reason for revert: https://ci.chromium.org/ui/p/v8/builders/ci/V8%20Linux64%20UBSan/18117/overview
Original change's description:
> [wasm-simd][arm64] Fuse add and extmul
>
> We can select a better instruction for add+extmul, using one of the
> multiply-long-accumulate instruction.
>
> Define a helper struct to pattern match Add(x, OP(y, z)) and
> Add(OP(x, y) z), and ensure that the matched OP is always on the
> LHS, to simplify checking for matches.
>
> Bug: v8:11548
> Change-Id: I7ab488b262aa9f749785f973549ccd9fad72f4c8
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2826725
> Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Cr-Commit-Position: refs/heads/main@{#76708}
Bug: v8:11548
Change-Id: Ic1560616e7ee6df917fcedbb6ad139a1a9773d68
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3144377
Auto-Submit: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Cr-Commit-Position: refs/heads/main@{#76709}
We can select a better instruction for add+extmul, using one of the
multiply-long-accumulate instruction.
Define a helper struct to pattern match Add(x, OP(y, z)) and
Add(OP(x, y) z), and ensure that the matched OP is always on the
LHS, to simplify checking for matches.
Bug: v8:11548
Change-Id: I7ab488b262aa9f749785f973549ccd9fad72f4c8
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2826725
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76708}
We are running out of encoding space for opcodes on arm64. This patch
merges some wasm simd opcodes of different simd types, encoding the lane
size in the instruction code using LaneSizeField instead. This reduces
the total number of opcodes on arm64 by 71.
Bug: v8:12093
Change-Id: Ib4d96d1db1ff9b08fafd665974f3494a507da770
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3109676
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/main@{#76434}
This is a reland of 2261e05333
This patch can now be relanded as some space was made for more opcodes:
https://bugs.chromium.org/p/v8/issues/detail?id=12093
Original change's description:
> [arm64][wasm] Use NEON S/Usra for Wasm SIMD add(shr(x, imm), y)
>
> A single AArch64 SIMD signed/unsigned Shift Right and Accumulate can be
> used to implement Wasm SIMD add(shr(x, imm), y). This gives a 1-1.5%
> improvement on some compute intensive Wasm benchmarks on Neoverse-N1.
>
> Mla and Adalp optimisations were refactored to match the style of the
> added code.
>
> Change-Id: Id5959a31ca267e02b7d60e7ff6f942adb029b41e
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3089157
> Reviewed-by: Zhi An Ng <zhin@chromium.org>
> Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
> Cr-Commit-Position: refs/heads/master@{#76280}
Change-Id: Idd166b7d3c960af33049bbce6e7276763c28f286
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3097284
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#76340}
This is identical to https://crrev.com/c/3094011, but for 16-bit values.
We introduce another instruction to differentiate between 16->32 bit
sign extensions and 16->64 bit sign extensions.
R=ahaas@chromium.org, mslekova@chromium.org
Bug: chromium:1239116
Change-Id: I2742e9d9c2b4a038fc7a0b1715faf8f25fa20b1f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3094012
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Maya Lekova <mslekova@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Cr-Commit-Position: refs/heads/master@{#76284}
InstructionSelector::ZeroExtendsWord32ToWord64 assumes that a
Load[kRepWord8|kTypeInt32] generates a zero-extended value. This
assumption makes sense, but was not fulfilled by the instruction
selector which emitted an "ldrsb" instruction which sign-extended to the
full 64-bit register.
This CL fixes that by introducing a separate "LdrsbW" instruction which
is selected if we are sign-extending an 8-bit value to 32-bit.
R=ahaas@chromium.org, mslekova@chromium.orgCC=v8-arm-ports@googlegroups.com
Bug: chromium:1239116
Change-Id: I2da1ad6062805acf5558f3e66b8db9a50e830302
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3094011
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Maya Lekova <mslekova@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Cr-Commit-Position: refs/heads/master@{#76283}
This reverts commit 2261e05333.
Reason for revert: No issues with the CL, but it is taking the
last two available opcodes on arm64 (we use 9 bits to encode it,
so we are limited to 512 opcodes). We need to land a security fix
which includes the addition of two opcodes. Before relanding this,
we need to figure out a strategy to either reduce opcodes, or use
one more bit to encode them.
Original change's description:
> [arm64][wasm] Use NEON S/Usra for Wasm SIMD add(shr(x, imm), y)
>
> A single AArch64 SIMD signed/unsigned Shift Right and Accumulate can be
> used to implement Wasm SIMD add(shr(x, imm), y). This gives a 1-1.5%
> improvement on some compute intensive Wasm benchmarks on Neoverse-N1.
>
> Mla and Adalp optimisations were refactored to match the style of the
> added code.
>
> Change-Id: Id5959a31ca267e02b7d60e7ff6f942adb029b41e
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3089157
> Reviewed-by: Zhi An Ng <zhin@chromium.org>
> Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
> Cr-Commit-Position: refs/heads/master@{#76280}
Change-Id: Ifad0625ed8a6b66e7a7a74da11ad7d60941207e5
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3094014
Auto-Submit: Clemens Backes <clemensb@chromium.org>
Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#76282}
A single AArch64 SIMD signed/unsigned Shift Right and Accumulate can be
used to implement Wasm SIMD add(shr(x, imm), y). This gives a 1-1.5%
improvement on some compute intensive Wasm benchmarks on Neoverse-N1.
Mla and Adalp optimisations were refactored to match the style of the
added code.
Change-Id: Id5959a31ca267e02b7d60e7ff6f942adb029b41e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3089157
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/master@{#76280}
The two instructions are fused into a single Uadalp instruction,
improving performance of quantized neural network operator
implementations such as XNNPACK.
Bug: v8:11546
Change-Id: Ic11b35d1e7758ee0b4ccfe8f592edc1aa798f6f2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2939997
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Daan de Graaf <daagra@google.com>
Cr-Commit-Position: refs/heads/master@{#75102}
The two instructions are fused into a single Sadalp instruction,
improving performance of quantized neural network operator
implementations such as XNNPACK.
This change also includes some formatting changes to the unit
tests that were made automatically by clang-format, which I am
happy to revert if preferred.
Bug: v8:11546
Change-Id: I2afc8940a52186617cffd276c82733ad3020b728
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2878742
Commit-Queue: Daan de Graaf <daagra@google.com>
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#74952}
cpplint rules change over time, and we change the exact rules we enable
for v8. This CL removes NOLINT annotations which are not needed
according to the currently enabled rules.
R=ahaas@chromium.org
Bug: v8:11717
Change-Id: Ica92f4ddc9c351c1c63147cbcf050086ca26cc07
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2859854
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Cr-Commit-Position: refs/heads/master@{#74297}
Modify TryAnyExtendMatch to combine Int64Add/Int64Sub(x, ChangeInt32ToInt64(y))
to use an extend register operand, removing the cast.
Change-Id: Id130f8a9614e2c208f9ed8c17b923ee738fcb916
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2857964
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/master@{#74285}
This reverts commit d2ce574457.
Reason for revert: We reverted the early canonicalization change, so we need to worry about non-canonicalized shuffles now.
Original change's description:
> [wasm-simd][arm64] Update f32x4.mul(dup) pattern matching
>
> We now canonicalize earlier in the pipeline, and don't need to worry
> about non-canonicalized shuffles.
>
> Bug: v8:11542,v8:11257
> Change-Id: If9f5c44061465be339c98e479fd8c5a437bbd74b
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2778673
> Reviewed-by: Bill Budge <bbudge@chromium.org>
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#73645}
Bug: v8:11542
Bug: v8:11257
Change-Id: Ib492b3ab7ad140193975d2641999c12c9697e27b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2850630
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#74193}
This reverts commit d16eefe0f2.
It is not correct to check for node equality during the graph
construction phase, because we can have optimizations that will combine
same nodes. So it can happen that in wasm-compiler, the inputs to
shuffle are not the same, so we canonicalize using that knowledge that
it will not be the same, and allow indices > 15. But later we can have
optimizations that combine the 2 inputs (e.g. splat of the same
constants), and the instruction selector will see that the input nodes
are the same.
Bug: v8:11542,chromium:1199662
Change-Id: I21c175f4707708038710147f64d687d1b14c6ecc
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2829986
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#74017}
Introduce two machine nodes for FABD and fold Float32/64 Abs,Sub
during instruction selection.
This gives ~1% speed improvement of the Bullet physics engine
compiled as wasm.
Change-Id: Ifd985538e6ebb280bc0eaf11b0ebfc687891cf91
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2786854
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/master@{#73765}
Add Float32Select and Float64Select as OptionalOperators and insert
these, if supported, when handling a Select expression in the wasm
graph builder. FlagsContinuation have been modified to support the
select operation and code generation support has been added for arm64.
This improves the 'Bullet' physics benchmark by ~2-3%.
Change-Id: I928c3085c9136ad8baeeb34c71c47c1c8338844c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2763871
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73657}
We now canonicalize earlier in the pipeline, and don't need to worry
about non-canonicalized shuffles.
Bug: v8:11542,v8:11257
Change-Id: If9f5c44061465be339c98e479fd8c5a437bbd74b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2778673
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73645}
We currently canonicalize shuffles in the architecture specific
instruction selector. This has the drawback that if we want to pattern
match on nodes that have a shuffle as input, they need to individually
canonicalize the shuffle. There can also be a subtle bug if we
canonicalize the same shuffle node twice (see bug for details).
This moves the canonicalization to "construction time", in
wasm-compiler, when building the graph. As such, any pattern matches in
instruction-selector will only need to deal with canonicalized shuffles.
We introduce a new kind of parameter for shuffle nodes,
ShuffleParameter, to store the 16 bytes plus a bool indicating if this
is a swizzle. A swizzle essentially: inputs to the shuffle are the same
or all indices only touch 1 input. We calculate this when
canonicalizing, so store this bit of information inside of the node's
parameter.
We update the tests in x64 to handle special cases where, even though
the node's inputs are not swapped (due to canonicalization), they need
to be swapped for the specific instruction selected (e.g. palignr). The
test data also contains canonicalized shuffles, so we have to manually
canonicalize them.
Bug: v8:11542
Change-Id: I4e78082267bd03d6caedf43d68d81ef3f5f364a8
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2762420
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73495}
This is similar to the optimization for f32x4 dup + mul in
https://crrev.com/c/2719083. Refactor the pattern-matching code into a
helper function that returns a struct with all the necessary fields to
emit the optimized fmul by element instruction.
Add similar unittests and a negative test as well.
Bug: v8:11257
Change-Id: I79ab0bc783f43397191a54bf6fa736dd4dc8d807
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2728428
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73164}
Wasm SIMD does not have an opcode to multiply a vector by a scalar. In
these cases, Wasm code uses mul(x, shuffle(y, imms)), where the shuffle
is a dup of a single lane in y. Pattern match on this to emit a fmul
(element).
We can do similar pattern match on f64x2 too, that will come in a future
patch.
Bug: v8:11257
Change-Id: I61e8c46b56719a1179c8a6032dbf8a4cc03b40a9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2719083
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73141}
This instruction is not in the final SIMD proposal.
Bug: v8:6020
Change-Id: Ifef1b3d58bf660f2d30784f587aed85f327825ec
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2716073
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73058}
Fold distinct MUL and ADD (or SUB) instructions into a single MLA (or
MLS) instruction, mirroring what is being done for general purpose
registers.
SIMD wasm only uses the vectorized ADD and MUL instructions on quad
vectors (NEON Q), so only those cases are handled.
SIMD wasm only uses MUL by vectors, not by elements so there is no need
to check for an addition and shift reduction.
Change-Id: If07191dde9fb1dc37a5de27187800c15cc4325ea
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2184239
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/master@{#67770}
This reverts commit 0c72c71900.
Reason for revert: Wasm code size increase because not all pipelines use CommonOperatorReducer
Original change's description:
> Move branch inversion on ==0 into platform-agnostic reducer
>
> This change is based on a discussion from
> https://crrev.com/c/v8/v8/+/2053769/4/src/compiler/machine-operator-reducer.cc#1696
> wherein Tobias suggested moving the folding away of ==0 operations out
> of the platform-specific instruction selectors and into the
> MachineOperatorReducer. I noticed that CommonOperatorReducer already
> handles some very similar cases, so I have tried putting the ==0 folding
> into CommonOperatorReducer instead. I'm happy to move it into
> MachineOperatorReducer if that's better; I still don't have a very good
> understanding of how roles are separated among reducers.
>
> Change-Id: Ia0285bd9fafeef29d87cc88654bd6d355d467e8f
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2076498
> Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Clemens Backes <clemensb@chromium.org>
> Reviewed-by: Georg Neis <neis@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#66688}
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: chromium:1061767
Change-Id: Id1fdfb38357eb514d92ed3be0a683f077202faa4
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2117789
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Reviewed-by: Georg Neis <neis@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66862}
This change is based on a discussion from
https://crrev.com/c/v8/v8/+/2053769/4/src/compiler/machine-operator-reducer.cc#1696
wherein Tobias suggested moving the folding away of ==0 operations out
of the platform-specific instruction selectors and into the
MachineOperatorReducer. I noticed that CommonOperatorReducer already
handles some very similar cases, so I have tried putting the ==0 folding
into CommonOperatorReducer instead. I'm happy to move it into
MachineOperatorReducer if that's better; I still don't have a very good
understanding of how roles are separated among reducers.
Change-Id: Ia0285bd9fafeef29d87cc88654bd6d355d467e8f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2076498
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66688}
Since the turbo_decompression_elimination flag is removed, there
are several methods in machine-type.h that get simplified, e.g
TypeCompressedTaggedPointer() can be replaced by just
"TaggedPointer()".
Also Removing the creation of Change to/from Compressed nodes.
Removing these Change nodes' logic is left to a follow-up CL.
Bug: v8:7703
Change-Id: Iff1f9aa8361189cf781a26317fd342b942fd5aa4
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1897537
Commit-Queue: Santiago Aboy Solanes <solanes@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64834}
Code from ARES-6 Basic:
ldur w11, [x5, #15]
asr w11, w11, #1
sxtw x11, w11
With this CL:
ldur w11, [x5, #15]
sbfx x11, x11, #1, #31
This increases performance of Ares6 Basic by ~2% on Cortex-A53.
Also reduces the snapshot by ~2000 instructions.
Change-Id: Ie9801da730f832337306422d2a9c63461d9e5690
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1849530
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Cr-Commit-Position: refs/heads/master@{#64235}
Placing these tests in anonymous namespaces, is the suggested fix
according to the GCC documentation.
The GCC documentation states: "If a type A depends on a type B with no or
internal linkage, defining it in multiple translation units would be an
ODR violation because the meaning of B is different in each translation unit.
If A only appears in a single translation unit, the best way to silence the
warning is to give it internal linkage by putting it in an anonymous namespace as well."
Change-Id: I69a1e9b5f1789e9a7a62c762cd499809a72e0ea5
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1836255
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64128}
Make TryEmitCbzOrTbz a template, so it can be used for Word64 as well as Word32.
0.09% reduction of embedded builtins size with a arm64 ptr-compr build.
Some of the unittests weren't ported to Word64 as they don't pass, this is due to
VisitWordCompare missing a loop to remove Word64Equal comparisons against 0. This
can be added in a different CL if needed.
Change-Id: I927129d934083b71abe5b77991c39286470a228d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1792908
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63751}
This removes LoadStackPointer and its last remaining use in the
interpreter assembler.
Bug: v8:9534
Change-Id: I19aafb12c5fd50248841a3d92448e64243c723ad
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1748729
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63164}
With compressed pointers, `kArchStoreWithBarrier` is a 32-bit store instead of
64-bit, and this means the index has a differerent immediate range.
Bug: v8:7703
Change-Id: If61c8544b0da87ba2779ba2c1a6963b52e3e5d9a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1710674
Commit-Queue: Sigurd Schneider <sigurds@chromium.org>
Reviewed-by: Sigurd Schneider <sigurds@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62861}
With a write barrier, stores with negative offsets would allocate a temporary
register to hold the offset when the `str` instruction is able to encode it.
For instance, when writing the object map:
```
;; This could be 'str x2, [x5, #-1]'
movn x4, #0x0
str x2, [x5, x4]
and x16, x5, #0xfffffffffffc0000
ldr x16, [x16, #8]
tbnz w16, #2, #+0xba8 ; Jump out-of-line
```
The reason behind this is that the out-of-line code uses an 'add' instruction on
the offset to compute the field address, putting pressure on the instruction
selector to make sure the immediate fits in both 'str' and 'add'.
But, this is not necessary since the macro-assembler is able to turn the 'add'
into a 'sub' or use a temporary register if needed.
Change-Id: I8838e4b81a0c0c1f90aa3d67861a9da1a6dfed06
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1708471
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Sigurd Schneider <sigurds@chromium.org>
Commit-Queue: Pierre Langlois <pierre.langlois@arm.com>
Cr-Commit-Position: refs/heads/master@{#62803}
Cpplint usually checks for non-const reference arguments. They are
forbidden in the style guide, and v8 does not explicitly make an
exception here.
This CL re-enables that warning, and fixes all current violations by
adding an explicit "NOLINT(runtime/references)" comment. In follow-up
CLs, we should aim to remove as many of them as possible.
TBR=mlippautz@chromium.org
Bug: v8:9429
Change-Id: If7054d0b366138b731972ed5d4e304b5ac8423bb
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1687891
Reviewed-by: Clemens Hammacher <clemensh@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Jakob Kummerow <jkummerow@chromium.org>
Reviewed-by: Igor Sheludko <ishell@chromium.org>
Commit-Queue: Clemens Hammacher <clemensh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62551}
On Windows, long is 32-bits, so the 'L' suffix shouldn't be used if a
64-bit value is needed. This caused a test failure in 'Int64MulWithImmediate'.
Change-Id: I93c43a1f166aa0e5bcd53aaf7a860fffd006fd0b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1627538
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#61986}
This replaces all typedefs that define types and not functions by the
equivalent "using" declaration.
This was done mostly automatically using this command:
ag -l '\btypedef\b' src test | xargs -L1 \
perl -i -p0e 's/typedef ([^*;{}]+) (\w+);/using \2 = \1;/sg'
Patchset 2 then adds some manual changes for typedefs for pointer types,
where the regular expression did not match.
R=mstarzinger@chromium.orgTBR=yangguo@chromium.org, jarin@chromium.org
Bug: v8:9183
Change-Id: I6f6ee28d1793b7ac34a58f980b94babc21874b78
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1631409
Commit-Queue: Clemens Hammacher <clemensh@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#61849}