Some shuffles take have either register or memory operand for second
input, but the codegen incorrectly assumes that it is always a register.
Bug: v8:10824
Change-Id: Ia2df233dad4ed451e52e57e35cce5c80db0905db
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2373586
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#69562}
With the new Turbofan variants (NCI and Turboprop), we need a way to
distinguish between them both during and after compilation. We
initially introduced CompilationTarget to track the variant during
compilation, but decided to reuse the code kind as the canonical spot to
store this information instead.
Why? Because it is an established mechanism, already available in most
of the necessary spots (inside the pipeline, on Code objects, in
profiling traces).
This CL removes CompilationTarget and adds a new
NATIVE_CONTEXT_INDEPENDENT kind, plus helper functions to determine
various things about a given code kind (e.g.: does this code kind
deopt?).
As a (very large) drive-by, refactor both Code::Kind and
AbstractCode::Kind into a new CodeKind enum class.
Bug: v8:8888
Change-Id: Ie858b9a53311b0731630be35cf5cd108dee95b39
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2336793
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Dominik Inführ <dinfuehr@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#69244}
This is a reland of f7f72b7b3a
This was reverted because of a test timing out on slow_path
variant (https://crrev.com/c/2237131 for details). Turns out
the test is just really slow, and was skipped on this variant
in https://crrev.com/c/2237628. Relanding without changes.
Original change's description:
> [wasm-simd] Prototype f64x2 rounding instructions
>
> Implements f64x2 ceil, floor, trunc, nearestint, for interpreter and
> x64.
>
> Bug: v8:10553
> Change-Id: I12a260a3b1d728368e5525d317d30fc9581cae04
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2213082
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#68241}
Tbr: tebbi@chromium.org
Bug: v8:10553
Change-Id: I4cdc23d0556f11310d32fa066f40b057fd49d2d7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2237350
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Adam Klein <adamk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68304}
vroundps assembly is incorrect:
- the signature was wrong, vroundps takes 2 operands and 1 immediate
- when calling vinstr, should always pass xmm0, this wasn't causing
issues because our test cases were restricted enough that it was always
xmm0 anyway
- the macro assembler should use AVX_OP_SSE4_1, since roundps requires
SSE4_1
- drive-by fix for roundss and roundsd to be AVX_OP_SSE4_1
- add disasm for roundps and vroundps, and test them
Bug: v8:10553
Change-Id: I4046eb81a9f18d5af7137bbd46bfa0478e5a9ab2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2227252
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68157}
While working on some AVX stuff, saw that these ops were missing from
the test cases.
Change-Id: Ie41be465a0715323096c6549b21aa9e994eaac3e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2137472
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67072}
For a bunch of s8x16, s16x2 and s32x4 shuffle ops (generated by
s8x16shuffle).
Bug: v8:9561
Change-Id: I0e5cd8a90edba8bc15918c0ca1dc830475db2769
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2110952
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66865}
The AVX versions of pabsb, pabsw, and pabsd have an incorrect function
signature, they should only have two operands. So, extract them into
another macro list. And separately generate the right signatures and
implementations. Also update the disasm and tests.
Bug: v8:10233
Change-Id: I95ee0bf12bb285d10324ecedcec28e941f64d2dc
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2063199
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66382}
These instructions were probably leftover from an earlier cleanup. We
can move them into respective macro lists, then delete away the
redundant declarations, definitions, disasm, and tests.
We were missing disasm tests for SSE2_INSTRUCTION_LIST_SD, so add that
in.
Change-Id: I8f27beaf57e7a338097690073910a0863f00b26a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2036833
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66123}
The assembly of sqrtpd when using Sqrtpd macro was wrong, since
Sqrtpd(xmm1, xmm1) will incorrect generated vsqrtpd(xmm1, xmm1, xmm1),
which is nonsensical, since vsqrtpd only takes two operands. The
expected instruction should be vsqrtpd(xmm1, xmm0, xmm1) in terms of the
encoding, which is vsqrtpd(xmm1, xmm1).
So, move sqrtpd and cvtps2dq out into their own macro list, because
they have two operands in their AVX form, unlike the rest of the
instructions in SSE2_INSTRUCTION_LIST.
Also updated disasm and tests to use this new list.
Fixed: v8:10170
Change-Id: Ia9343c9a3ae64596bbc876744556e1dcea2a443b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2032195
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66088}
This CL introduces the negb and negw instructions (8-bit and 16-bit
versions of neg) in the x64 assembler. These instructions are needed to
implement I32AtomicSub8U and similar WebAssembly instructions
efficiently.
The existing implementation was embedded in a generic macro, and it was
difficult to change it without introducing also the 8-bit and 16-bit
versions of many other instructions. This would have introduced a lot
of dead code. Instead this CL extracted the neg instructions from the
macro and implements them directly. This should be fine because the
assembler does not change much, and approachability of the code is
improved.
R=clemensb@chromium.org
Bug: v8:10108
Change-Id: I46099bbebd47f864311a67da3ba8ddc4fe4cd35d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2019165
Commit-Queue: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65989}
This CL introduces the xadd instruction to the x64 assembler so it can
be used to implement WebAssembly's AtomicAdd. This is done in a
separate CL though.
R=clemensb@chromium.org
Bug: v8:10108
Change-Id: I36dcb900ed4c39b23c4996328774780afd8b816a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2011105
Commit-Queue: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65879}
Add missing disasm tests for vroundss and vpalignr.
Fix disasm for vinsertps and vpinsrq.
Change-Id: I0f3907761b998d27ec00435a569084724af54ae2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1990140
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65799}
blendvpd should not be defined in the macro list, since the AVX version
has 4 operands, not 3.
Change-Id: Id020b460fa1a3510a91490f3b2286024cc6c5994
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1990139
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65771}
The AVX version should only take one argument, so these instructions have
to be split from the main list of SSE4 instructions, whose AVX version
have two arguments.
Bug: v8:9886
Change-Id: Ie37e060711babd7760547e2aa01c9c0fb0c728b5
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1986215
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65588}
This change includes splitting the existing SSE_INSTRUCTION_LIST into two:
1. sse instructions with two-operand AVX
2. sse instructions with three-operand AVX
Also a drive by fix for disasm of pblendw, the printing of imm8 doesn't
not require AND-ing with 3, since all 8 bits are significant.
Bug: v8:9561
Change-Id: I56c93a24bb9905ae6422698c793b27f3b9e66d8f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1933593
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65274}
This is a reland of 855591a54d
Fixes break in builds that verify ReadOnlyHeap by relaxing the requirement for
Code objects to be in CODE_SPACE in PagedSpaceObjectIterator::FromCurrentPage.
Original change's description:
> Reland: [builtins] Move non-JS linkage builtins code objects into RO_SPACE
>
> Reland of https://chromium-review.googlesource.com/c/v8/v8/+/1795358.
>
> [builtins] Move non-JS linkage builtins code objects into RO_SPACE
>
> Creates an allow-list of builtins that can still go in code_space
> including all TFJ builtins and a small manual list that should be pared
> down in the future.
>
> For builtins that go in RO_SPACE a Code object is created that contains an
> immediate trap instruction. Generally these Code objects are still no
> smaller than CODE_SPACE Code objects because of the Code object alignment
> requirements. This will hopefully be addressed in a follow-up CL either by
> relaxing them or removing the instruction stream completely.
>
> In the snapshot, this reduces code_space from ~152k to ~40k (-112k) and
> increases by the same amount.
>
> Change-Id: I76661c35c7ea5866c1fb16e87e87122b3e3ca0ce
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893336
> Commit-Queue: Dan Elphick <delphick@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#64700}
Change-Id: I4eeb7dab3027b42fa58c5dfb2bad9873e9fff250
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893192
Commit-Queue: Dan Elphick <delphick@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64728}
This reverts commit 855591a54d.
Reason for revert: Breaks arm64 sim tests
https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20debug/17957https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20gc%20stress/16585
Original change's description:
> Reland: [builtins] Move non-JS linkage builtins code objects into RO_SPACE
>
> Reland of https://chromium-review.googlesource.com/c/v8/v8/+/1795358.
>
> [builtins] Move non-JS linkage builtins code objects into RO_SPACE
>
> Creates an allow-list of builtins that can still go in code_space
> including all TFJ builtins and a small manual list that should be pared
> down in the future.
>
> For builtins that go in RO_SPACE a Code object is created that contains an
> immediate trap instruction. Generally these Code objects are still no
> smaller than CODE_SPACE Code objects because of the Code object alignment
> requirements. This will hopefully be addressed in a follow-up CL either by
> relaxing them or removing the instruction stream completely.
>
> In the snapshot, this reduces code_space from ~152k to ~40k (-112k) and
> increases by the same amount.
>
> Change-Id: I76661c35c7ea5866c1fb16e87e87122b3e3ca0ce
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893336
> Commit-Queue: Dan Elphick <delphick@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#64700}
TBR=ulan@chromium.org,jgruber@chromium.org,delphick@chromium.org
Change-Id: I4211c3bb7fe4741e0ba3898f92ce382dfc93c4f3
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893636
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64701}
Reland of https://chromium-review.googlesource.com/c/v8/v8/+/1795358.
[builtins] Move non-JS linkage builtins code objects into RO_SPACE
Creates an allow-list of builtins that can still go in code_space
including all TFJ builtins and a small manual list that should be pared
down in the future.
For builtins that go in RO_SPACE a Code object is created that contains an
immediate trap instruction. Generally these Code objects are still no
smaller than CODE_SPACE Code objects because of the Code object alignment
requirements. This will hopefully be addressed in a follow-up CL either by
relaxing them or removing the instruction stream completely.
In the snapshot, this reduces code_space from ~152k to ~40k (-112k) and
increases by the same amount.
Change-Id: I76661c35c7ea5866c1fb16e87e87122b3e3ca0ce
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893336
Commit-Queue: Dan Elphick <delphick@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64700}
These are SSE2 instructions that deal with scalar double precision
values, and look like the packed double precision variant of the
instructions, but with a prefix.
E.g. sqrtpd is 66 0F 51, sqrtss is F2 0F 51.
We don't put this in the same list, even though the implementation
is very similar, because SSE2_INSTRUCTION_LIST is used in other
macros which generate AVX versions of this, and that overlaps with
another macro which generates AVX versions of these X-sd instructions.
I will tease this apart and clean it up in subsequent changes.
Bug: v8:9810
Change-Id: I0db64fe0d37df5685158331ce9f48bd1c763cc59
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1874510
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64688}
These are SSE instructions that deal with scalar single precision
values, and look like the packed single precision variant of the
instructions, but with a prefix.
E.g. sqrtps is NP 0F 51, sqrtss is F3 0F 51.
Bug: v8:9810
Change-Id: I417ea6d4d85d8618ad6602a1b32d4428db0d66d2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1874509
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64658}
This class used to describe unoptimized but compiled frames. All such
frames are by now covered via the architecture-independent description
in the {StandardFrameConstants} class (or one of its subclasses).
R=clemensb@chromium.org
BUG=v8:9810
Change-Id: I294cc6eec7d4a05e88e7aa336f1ebedfa0eb6e98
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1878708
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Michael Stanton <mvstanton@chromium.org>
Commit-Queue: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64556}
This adds avx for extractps, insertps, and cvtdq2ps. These require
SSE4_1, so modified AvxHelper to take another template arg for sse4
operations, and open the proper cpu scope before calling this arg.
Bug: v8:9561
Change-Id: Iad2be7ebab41b96f7eb74f4e2bd9776002e6a76c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1874378
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64529}
This reverts commit 83f8464ffc.
Reason for revert: speculative revert for blink linux failure
https://ci.chromium.org/p/v8/builders/ci/V8%20Blink%20Linux/1272
Original change's description:
> [builtins] Move non-JS linkage builtins code objects into RO_SPACE
>
> Creates an allow-list of builtins that can still go in code_space
> including all TFJ builtins and a small manual list that should be pared
> down in the future.
>
> For builtins that go in RO_SPACE a Code object is created that contains
> no code at all (shrinking its size from 96 bytes to 64 bytes on x64),
> but is there to allow the runtime to continue to work since it expects
> a Code object.
>
> This reduces code_space from ~152k to ~40k (-112k) and increases
> read_only_space from 33k to 108k (+75k) in the snapshot.
>
> Bug: v8:7464, v8:9821, v8:9338, v8:8127
> Change-Id: Icc8bfc722bb267a2bcc17e2f1e27bef7f02f2376
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1795358
> Commit-Queue: Dan Elphick <delphick@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#64377}
TBR=mstarzinger@chromium.org,jgruber@chromium.org,delphick@chromium.org
Change-Id: I4cf38e9370280acdd2de718ca527776ebc509003
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:7464, v8:9821, v8:9338, v8:8127
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1868621
Reviewed-by: Sathya Gunasekaran <gsathya@chromium.org>
Commit-Queue: Sathya Gunasekaran <gsathya@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64383}
Creates an allow-list of builtins that can still go in code_space
including all TFJ builtins and a small manual list that should be pared
down in the future.
For builtins that go in RO_SPACE a Code object is created that contains
no code at all (shrinking its size from 96 bytes to 64 bytes on x64),
but is there to allow the runtime to continue to work since it expects
a Code object.
This reduces code_space from ~152k to ~40k (-112k) and increases
read_only_space from 33k to 108k (+75k) in the snapshot.
Bug: v8:7464, v8:9821, v8:9338, v8:8127
Change-Id: Icc8bfc722bb267a2bcc17e2f1e27bef7f02f2376
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1795358
Commit-Queue: Dan Elphick <delphick@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64377}
WebAssembly locals are specified to be zero on function entry. Liftoff
implements this by just storing the constant 0 in the virtual stack for
integer types, and using one floating point register initialized to
zero for all floating point types.
For big counts of locals this leads to problems (manifesting as huge
blocks of code being generated) once we hit a merge point: All those
constants (for int) and all duplicate register uses (for floats) need to
be fixed up, by using separate registers for the locals or spilling to
the stack if no more registers are available. All this spilling
generates a lot of code, and can even happen multiple times within a
function.
This CL optimizes for such cases by spilling all locals to the stack
initially. All merges within the function body get much smaller then.
The spilled values rarely have to be loaded anyway, because the initial
zero value is usually overwritten before the first use.
To optimize the code size for initializing big numbers of locals on the
stack, this CL also introduces the platform-specific
{FillStackSlotsWithZero} method which uses a loop for bigger local
counts.
This often saves dozens of kilobytes for very big functions, and shows
an overall code size reduction of 4-5 percent for big modules.
R=jkummerow@chromium.org
Bug: v8:9830
Change-Id: I23fa4145847827420f09e043a11e0e7b606e94cc
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1856004
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Jakob Kummerow <jkummerow@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64282}