Commit Graph

198 Commits

Author SHA1 Message Date
Ng Zhi An
9c120b753d [wasm-simd][x64] Fix encoding of vcvtdq2pd
vcvtdq2pd was incorrectly declared to take 3 operands, the use of the
macro Cvtdq2pd meant that the call was vcvtdq2pd(dst, dst, src). This
is an incorrect encoding. Our tests happen to pass because dst was xmm0,
which made it accidentally correct.

This fixes it by moving cvtdq2pd out of the macro list.

Bug: v8:11265
Change-Id: I8b1baf4dd2c670021eafa76dc1a10b442f812805
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2654003
Reviewed-by: Adam Klein <adamk@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#72382}
2021-01-27 22:48:59 +00:00
Ng Zhi An
173d660849 [wasm-simd][x64] Optimize i8x16.popcnt with aligned moves
movups is slower on older hardware (core2) than movaps, even if the
operand is aligned. (Not an issue on modern hardware).

Also move i8x16.splat(0x0F) to an external reference so we can load the
mask directly.

Bug: v8:11002
Change-Id: I0b01c27a142024d50b9faaa9e7bd6a1fe169e141
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2643242
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#72336}
2021-01-26 19:03:10 +00:00
Zhi An Ng
ffc832becf [wasm-simd][x64][avx2] Optimize f32x4.splat
When AVX2 is available, we can use vbroadcastss. On AVX, use vshufps,
since it is non-destructive. On SSE, shufps is 1 byte shorter.

FIXED=b/175364402

Change-Id: I5bd10914579d8db012192a9c04f7b0038ec1c812
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2599849
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71964}
2021-01-08 03:03:45 +00:00
Zhi An Ng
506c09797c [x64] Sort out move instructions in codegen
In AVX, it is better to use the appropriate integer or floating point
moves depending on which instructions produce/consume these moves, since
there can be a delay moving from integer to floating point domain. On
SSE systems, it is less important, and we can move movaps/movups which
is 1 byte shorter than movdqa/movdqu.

This patch cleans up a couple of places, and defines macro-assembler
functions Movdqa, Movdqu, Movapd, to call into movaps/movups when AVX is
not supported.

Change-Id: Iba6c54e218875f1a70f61792978d7b3f69edfb4b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2599843
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71884}
2020-12-29 01:27:23 +00:00
Zhi An Ng
c9560d1dbf [wasm-simd][x64][avx2] Improve codegen for load{8,16}_splat
Detect AVX2 support and use vpbroadcastb or vpbroadcastw.

No new assembler helpers required because we are only emitting the
VEX-128 versions of these instructions.

Bug: v8:11258
Change-Id: Ic50178daa6fc8fe767dfc788e61e67538066bdea
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2596582
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71866}
2020-12-23 01:56:42 +00:00
Zhi An Ng
741e5a66de [wasm-simd][ia32][x64] More optimization for f32x4.extract_lane
We can have more optimizations for this instruction, they leave some
junk in the top lanes of dst, but that doesn't matter:

- when lane is 1: we use movshdup, this is 4 bytes long
- when lane is 2: use movhlps, this is 3 bytes long
- otherwise use shufps (4 bytes) or pshufd (5 bytes)

All of which are better than insertps (6 bytes).

Change-Id: I0e524431d1832e297e8c8bb418d42382d93fa691
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591850
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71813}
2020-12-17 01:58:52 +00:00
Zhi An Ng
6cb61e63bb [wasm-simd][x64] Optimize f64x2.extract_lane
pextrq + movq crosses register files twice, which is not efficient.

Optimize this by:
- checking if lane 0, do nothing if dst == src (macro-assembler helper)
- use vmovhlps on AVX, with src as the operands to avoid false
dependency on dst
- use movhlps otherwise, this is shorter than shufpd, and faster on
older system

Change-Id: I3486d87224c048b3229c2f92359b8b8e6d5fd025
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589056
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71751}
2020-12-14 23:53:19 +00:00
Zhi An Ng
b0d7912042 [wasm-simd][x64] Prototype sign select
Prototype i8x16, i16x8, i32x4, i64x2 sign select on x64 and interpreter.

Bug: v8:10983
Change-Id: I7d6f39a2cb4c2aefe31daac782978fe8b363dd1a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486235
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70818}
2020-10-28 03:32:57 +00:00
Jakob Gruber
c7cb9beca1 Reland "Reland "[deoptimizer] Change deopt entries into builtins""
This is a reland of fbfa9bf4ec

The arm64 was missing proper codegen for CFI, thus sizes were off.

Original change's description:
> Reland "[deoptimizer] Change deopt entries into builtins"
>
> This is a reland of 7f58ced72e
>
> It fixes the different exit size emitted on x64/Atom CPUs due to
> performance tuning in TurboAssembler::Call. Additionally, add
> cctests to verify the fixed size exits.
>
> Original change's description:
> > [deoptimizer] Change deopt entries into builtins
> >
> > While the overall goal of this commit is to change deoptimization
> > entries into builtins, there are multiple related things happening:
> >
> > - Deoptimization entries, formerly stubs (i.e. Code objects generated
> >   at runtime, guaranteed to be immovable), have been converted into
> >   builtins. The major restriction is that we now need to preserve the
> >   kRootRegister, which was formerly used on most architectures to pass
> >   the deoptimization id. The solution differs based on platform.
> > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
> > - Removed heap/ support for immovable Code generation.
> > - Removed the DeserializerData class (no longer needed).
> > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
> >   in which the final jump to the deoptimization entry is generated
> >   once per Code object, and deopt exits can continue to emit a
> >   near-call.
> > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
> >   sizes by 4/8, 5, and 5 bytes, respectively.
> >
> > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
> > by using the same strategy as on arm64 (recalc deopt id from return
> > address). Before:
> >
> >  e300a002       movw r10, <id>
> >  e59fc024       ldr ip, [pc, <entry offset>]
> >  e12fff3c       blx ip
> >
> > After:
> >
> >  e59acb35       ldr ip, [r10, <entry offset>]
> >  e12fff3c       blx ip
> >
> > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
> > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
> > object (max 32 bytes added overhead per Code object). Before:
> >
> >  9401cdae       bl <entry offset>
> >
> > After:
> >
> >  # eager deoptimization entry jump.
> >  f95b1f50       ldr x16, [x26, <eager entry offset>]
> >  d61f0200       br x16
> >  # lazy deoptimization entry jump.
> >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
> >  d61f0200       br x16
> >  # the deopt exit.
> >  97fffffc       bl <eager deoptimization entry jump offset>
> >
> > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
> >
> >  bb00000000     mov ebx,<id>
> >  e825f5372b     call <entry>
> >
> > After:
> >
> >  e8ea2256ba     call <entry>
> >
> > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
> >
> >  49c7c511000000 REX.W movq r13,<id>
> >  e8ea2f0700     call <entry>
> >
> > After:
> >
> >  41ff9560360000 call [r13+<entry offset>]
> >
> > Bug: v8:8661,v8:8768
> > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
> > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
> > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#70597}
>
> Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
> Bug: v8:8661,v8:8768,chromium:1140165
> Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70655}

Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
Bug: v8:8661
Bug: v8:8768
Bug: chromium:1140165
Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70672}
2020-10-21 06:01:38 +00:00
Maya Lekova
7c7aa4fa94 Revert "Reland "[deoptimizer] Change deopt entries into builtins""
This reverts commit fbfa9bf4ec.

Reason for revert: Seems to break arm64 sim CFI build (please see DeoptExitSizeIfFixed) - https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/2808

Original change's description:
> Reland "[deoptimizer] Change deopt entries into builtins"
>
> This is a reland of 7f58ced72e
>
> It fixes the different exit size emitted on x64/Atom CPUs due to
> performance tuning in TurboAssembler::Call. Additionally, add
> cctests to verify the fixed size exits.
>
> Original change's description:
> > [deoptimizer] Change deopt entries into builtins
> >
> > While the overall goal of this commit is to change deoptimization
> > entries into builtins, there are multiple related things happening:
> >
> > - Deoptimization entries, formerly stubs (i.e. Code objects generated
> >   at runtime, guaranteed to be immovable), have been converted into
> >   builtins. The major restriction is that we now need to preserve the
> >   kRootRegister, which was formerly used on most architectures to pass
> >   the deoptimization id. The solution differs based on platform.
> > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
> > - Removed heap/ support for immovable Code generation.
> > - Removed the DeserializerData class (no longer needed).
> > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
> >   in which the final jump to the deoptimization entry is generated
> >   once per Code object, and deopt exits can continue to emit a
> >   near-call.
> > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
> >   sizes by 4/8, 5, and 5 bytes, respectively.
> >
> > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
> > by using the same strategy as on arm64 (recalc deopt id from return
> > address). Before:
> >
> >  e300a002       movw r10, <id>
> >  e59fc024       ldr ip, [pc, <entry offset>]
> >  e12fff3c       blx ip
> >
> > After:
> >
> >  e59acb35       ldr ip, [r10, <entry offset>]
> >  e12fff3c       blx ip
> >
> > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
> > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
> > object (max 32 bytes added overhead per Code object). Before:
> >
> >  9401cdae       bl <entry offset>
> >
> > After:
> >
> >  # eager deoptimization entry jump.
> >  f95b1f50       ldr x16, [x26, <eager entry offset>]
> >  d61f0200       br x16
> >  # lazy deoptimization entry jump.
> >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
> >  d61f0200       br x16
> >  # the deopt exit.
> >  97fffffc       bl <eager deoptimization entry jump offset>
> >
> > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
> >
> >  bb00000000     mov ebx,<id>
> >  e825f5372b     call <entry>
> >
> > After:
> >
> >  e8ea2256ba     call <entry>
> >
> > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
> >
> >  49c7c511000000 REX.W movq r13,<id>
> >  e8ea2f0700     call <entry>
> >
> > After:
> >
> >  41ff9560360000 call [r13+<entry offset>]
> >
> > Bug: v8:8661,v8:8768
> > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
> > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
> > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#70597}
>
> Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
> Bug: v8:8661,v8:8768,chromium:1140165
> Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70655}

TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org

Change-Id: I4739a3475bfd8ee0cfbe4b9a20382f91a6ef1bf0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:8661
Bug: v8:8768
Bug: chromium:1140165
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485223
Reviewed-by: Maya Lekova <mslekova@chromium.org>
Commit-Queue: Maya Lekova <mslekova@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70658}
2020-10-20 14:14:12 +00:00
Jakob Gruber
fbfa9bf4ec Reland "[deoptimizer] Change deopt entries into builtins"
This is a reland of 7f58ced72e

It fixes the different exit size emitted on x64/Atom CPUs due to
performance tuning in TurboAssembler::Call. Additionally, add
cctests to verify the fixed size exits.

Original change's description:
> [deoptimizer] Change deopt entries into builtins
>
> While the overall goal of this commit is to change deoptimization
> entries into builtins, there are multiple related things happening:
>
> - Deoptimization entries, formerly stubs (i.e. Code objects generated
>   at runtime, guaranteed to be immovable), have been converted into
>   builtins. The major restriction is that we now need to preserve the
>   kRootRegister, which was formerly used on most architectures to pass
>   the deoptimization id. The solution differs based on platform.
> - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
> - Removed heap/ support for immovable Code generation.
> - Removed the DeserializerData class (no longer needed).
> - arm64: to preserve 4-byte deopt exits, introduced a new optimization
>   in which the final jump to the deoptimization entry is generated
>   once per Code object, and deopt exits can continue to emit a
>   near-call.
> - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
>   sizes by 4/8, 5, and 5 bytes, respectively.
>
> On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
> by using the same strategy as on arm64 (recalc deopt id from return
> address). Before:
>
>  e300a002       movw r10, <id>
>  e59fc024       ldr ip, [pc, <entry offset>]
>  e12fff3c       blx ip
>
> After:
>
>  e59acb35       ldr ip, [r10, <entry offset>]
>  e12fff3c       blx ip
>
> On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
> with CFI). Additionally, up to 4 builtin jumps are emitted per Code
> object (max 32 bytes added overhead per Code object). Before:
>
>  9401cdae       bl <entry offset>
>
> After:
>
>  # eager deoptimization entry jump.
>  f95b1f50       ldr x16, [x26, <eager entry offset>]
>  d61f0200       br x16
>  # lazy deoptimization entry jump.
>  f95b2b50       ldr x16, [x26, <lazy entry offset>]
>  d61f0200       br x16
>  # the deopt exit.
>  97fffffc       bl <eager deoptimization entry jump offset>
>
> On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
>
>  bb00000000     mov ebx,<id>
>  e825f5372b     call <entry>
>
> After:
>
>  e8ea2256ba     call <entry>
>
> On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
>
>  49c7c511000000 REX.W movq r13,<id>
>  e8ea2f0700     call <entry>
>
> After:
>
>  41ff9560360000 call [r13+<entry offset>]
>
> Bug: v8:8661,v8:8768
> Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70597}

Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
Bug: v8:8661,v8:8768,chromium:1140165
Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70655}
2020-10-20 12:30:23 +00:00
Jakob Gruber
8bc9a7941c Revert "[deoptimizer] Change deopt entries into builtins"
This reverts commit 7f58ced72e.

Reason for revert: Segfaults on Atom_x64 https://ci.chromium.org/p/v8-internal/builders/ci/v8_linux64_atom_perf/5686?

Original change's description:
> [deoptimizer] Change deopt entries into builtins
>
> While the overall goal of this commit is to change deoptimization
> entries into builtins, there are multiple related things happening:
>
> - Deoptimization entries, formerly stubs (i.e. Code objects generated
>   at runtime, guaranteed to be immovable), have been converted into
>   builtins. The major restriction is that we now need to preserve the
>   kRootRegister, which was formerly used on most architectures to pass
>   the deoptimization id. The solution differs based on platform.
> - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
> - Removed heap/ support for immovable Code generation.
> - Removed the DeserializerData class (no longer needed).
> - arm64: to preserve 4-byte deopt exits, introduced a new optimization
>   in which the final jump to the deoptimization entry is generated
>   once per Code object, and deopt exits can continue to emit a
>   near-call.
> - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
>   sizes by 4/8, 5, and 5 bytes, respectively.
>
> On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
> by using the same strategy as on arm64 (recalc deopt id from return
> address). Before:
>
>  e300a002       movw r10, <id>
>  e59fc024       ldr ip, [pc, <entry offset>]
>  e12fff3c       blx ip
>
> After:
>
>  e59acb35       ldr ip, [r10, <entry offset>]
>  e12fff3c       blx ip
>
> On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
> with CFI). Additionally, up to 4 builtin jumps are emitted per Code
> object (max 32 bytes added overhead per Code object). Before:
>
>  9401cdae       bl <entry offset>
>
> After:
>
>  # eager deoptimization entry jump.
>  f95b1f50       ldr x16, [x26, <eager entry offset>]
>  d61f0200       br x16
>  # lazy deoptimization entry jump.
>  f95b2b50       ldr x16, [x26, <lazy entry offset>]
>  d61f0200       br x16
>  # the deopt exit.
>  97fffffc       bl <eager deoptimization entry jump offset>
>
> On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
>
>  bb00000000     mov ebx,<id>
>  e825f5372b     call <entry>
>
> After:
>
>  e8ea2256ba     call <entry>
>
> On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
>
>  49c7c511000000 REX.W movq r13,<id>
>  e8ea2f0700     call <entry>
>
> After:
>
>  41ff9560360000 call [r13+<entry offset>]
>
> Bug: v8:8661,v8:8768
> Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70597}

TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: v8:8661,v8:8768,chromium:1140165
Change-Id: I3df02ab42f6e02233d9f6fb80e8bb18f76870d91
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485504
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70649}
2020-10-20 09:43:19 +00:00
Jakob Gruber
7f58ced72e [deoptimizer] Change deopt entries into builtins
While the overall goal of this commit is to change deoptimization
entries into builtins, there are multiple related things happening:

- Deoptimization entries, formerly stubs (i.e. Code objects generated
  at runtime, guaranteed to be immovable), have been converted into
  builtins. The major restriction is that we now need to preserve the
  kRootRegister, which was formerly used on most architectures to pass
  the deoptimization id. The solution differs based on platform.
- Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
- Removed heap/ support for immovable Code generation.
- Removed the DeserializerData class (no longer needed).
- arm64: to preserve 4-byte deopt exits, introduced a new optimization
  in which the final jump to the deoptimization entry is generated
  once per Code object, and deopt exits can continue to emit a
  near-call.
- arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
  sizes by 4/8, 5, and 5 bytes, respectively.

On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
by using the same strategy as on arm64 (recalc deopt id from return
address). Before:

 e300a002       movw r10, <id>
 e59fc024       ldr ip, [pc, <entry offset>]
 e12fff3c       blx ip

After:

 e59acb35       ldr ip, [r10, <entry offset>]
 e12fff3c       blx ip

On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
with CFI). Additionally, up to 4 builtin jumps are emitted per Code
object (max 32 bytes added overhead per Code object). Before:

 9401cdae       bl <entry offset>

After:

 # eager deoptimization entry jump.
 f95b1f50       ldr x16, [x26, <eager entry offset>]
 d61f0200       br x16
 # lazy deoptimization entry jump.
 f95b2b50       ldr x16, [x26, <lazy entry offset>]
 d61f0200       br x16
 # the deopt exit.
 97fffffc       bl <eager deoptimization entry jump offset>

On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:

 bb00000000     mov ebx,<id>
 e825f5372b     call <entry>

After:

 e8ea2256ba     call <entry>

On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:

 49c7c511000000 REX.W movq r13,<id>
 e8ea2f0700     call <entry>

After:

 41ff9560360000 call [r13+<entry offset>]

Bug: v8:8661,v8:8768
Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70597}
2020-10-19 07:32:48 +00:00
Ng Zhi An
944dad59c8 [x64] Add movlps and movhps to assembler
These instructions will be used for prototyping Wasm SIMD's store lane
later on, separated the implementation for assembler and disassembler
into this patch to make things smaller.

Curiously, movhps and movlhps seems to have the same encoding, 0f 16, so
I'm not sure not sure how to differentiate them in the disassembler
besides using the mod field, since movlhps only takes xmm registers,
whereas movhps always take 1 operand.

Bug: v8:10975
Change-Id: I8be9a31b1c9a5515038f9c8c55ef30d1ba063ea7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2471977
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70520}
2020-10-15 00:37:32 +00:00
Ng Zhi An
e30c50f3bf [x64] Refactor pinsrb family of instructions
The existing macro assembler define Pinsrb, which expects 3 arguments:

- XMMRegister dst
- Register/Operand src
- uint8_t imm

which overwrites dst with src at lane specified by imm.

That means we cannot use the AVX version, which has 4 arguments, and
does not overwrite dst.

This refactoring defines the 4 argument AVX version instead, and if AVX
is not supported, fall back to the SSE version, and ensure that the
value is copied over into dst first.

For convenience, we define an overload with 3 arguments that duplicates
dst, this replicates the SSE behavior, so that not all callers have to
be updated.

Bug: v8:10975, v8:10933
Change-Id: I6f9b9d37fa08d3f5cff4f040ae7d5e1f0cf36455
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2444096
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70392}
2020-10-07 23:25:30 +00:00
Jakob Gruber
29bcdaad1d Rename legacy code kinds
CodeKind::OPTIMIZED_CODE -> TURBOFAN

Kinds are now more fine-grained and distinguish between TF, TP, NCI.

CodeKind::STUB -> DEOPT_ENTRIES_OR_FOR_TESTING

Code stubs (like builtins, but generated at runtime) were removed from
the codebase years ago, this is the last remnant. This kind is used
only for deopt entries (which should be converted into builtins) and
for tests.

Change-Id: I67beb15377cb60f395e9b051b25f3e5764982e93
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2440335
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Mythri Alle <mythria@chromium.org>
Reviewed-by: Mythri Alle <mythria@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70234}
2020-09-30 15:39:23 +00:00
Ng Zhi An
ddf30bea13 [wasm-simd][x64] Check for register when emitting shuffles
Some shuffles take have either register or memory operand for second
input, but the codegen incorrectly assumes that it is always a register.

Bug: v8:10824
Change-Id: Ia2df233dad4ed451e52e57e35cce5c80db0905db
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2373586
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#69562}
2020-08-25 17:52:16 +00:00
Jakob Gruber
c51041f454 [nci] Replace CompilationTarget with a new Code::Kind value
With the new Turbofan variants (NCI and Turboprop), we need a way to
distinguish between them both during and after compilation. We
initially introduced CompilationTarget to track the variant during
compilation, but decided to reuse the code kind as the canonical spot to
store this information instead.

Why? Because it is an established mechanism, already available in most
of the necessary spots (inside the pipeline, on Code objects, in
profiling traces).

This CL removes CompilationTarget and adds a new
NATIVE_CONTEXT_INDEPENDENT kind, plus helper functions to determine
various things about a given code kind (e.g.: does this code kind
deopt?).

As a (very large) drive-by, refactor both Code::Kind and
AbstractCode::Kind into a new CodeKind enum class.

Bug: v8:8888
Change-Id: Ie858b9a53311b0731630be35cf5cd108dee95b39
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2336793
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Dominik Inführ <dinfuehr@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#69244}
2020-08-05 12:27:22 +00:00
Ng Zhi An
667fafcec4 Reland "[wasm-simd] Prototype f64x2 rounding instructions"
This is a reland of f7f72b7b3a

This was reverted because of a test timing out on slow_path
variant (https://crrev.com/c/2237131 for details). Turns out
the test is just really slow, and was skipped on this variant
in https://crrev.com/c/2237628. Relanding without changes.


Original change's description:
> [wasm-simd] Prototype f64x2 rounding instructions
>
> Implements f64x2 ceil, floor, trunc, nearestint, for interpreter and
> x64.
>
> Bug: v8:10553
> Change-Id: I12a260a3b1d728368e5525d317d30fc9581cae04
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2213082
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#68241}

Tbr: tebbi@chromium.org
Bug: v8:10553
Change-Id: I4cdc23d0556f11310d32fa066f40b057fd49d2d7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2237350
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Adam Klein <adamk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68304}
2020-06-10 20:51:21 +00:00
Leszek Swirski
926ce88782 Revert "[wasm-simd] Prototype f64x2 rounding instructions"
This reverts commit f7f72b7b3a.

Reason for revert: Flaky timeouts of slow-path tests -- specifically, mjsunit/regress/wasm/regress-9017, which appears to have regressed from ~2 min to ~3-4 min 

https://logs.chromium.org/logs/v8/buildbucket/cr-buildbucket.appspot.com/8878016799136124416/+/steps/Check_-_slow_path__flakes_/0/logs/regress-9017/0

Original change's description:
> [wasm-simd] Prototype f64x2 rounding instructions
> 
> Implements f64x2 ceil, floor, trunc, nearestint, for interpreter and
> x64.
> 
> Bug: v8:10553
> Change-Id: I12a260a3b1d728368e5525d317d30fc9581cae04
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2213082
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#68241}

TBR=gdeepti@chromium.org,tebbi@chromium.org,zhin@chromium.org

Change-Id: I9915dd375c7f0e08b5414189efb29ed1c90cb96d
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:10553
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2237131
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68248}
2020-06-09 08:38:52 +00:00
Ng Zhi An
f7f72b7b3a [wasm-simd] Prototype f64x2 rounding instructions
Implements f64x2 ceil, floor, trunc, nearestint, for interpreter and
x64.

Bug: v8:10553
Change-Id: I12a260a3b1d728368e5525d317d30fc9581cae04
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2213082
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68241}
2020-06-08 23:43:09 +00:00
Ng Zhi An
02ee6904f4 [x64] Fix vroundps assembly, add disassembly
vroundps assembly is incorrect:
- the signature was wrong, vroundps takes 2 operands and 1 immediate
- when calling vinstr, should always pass xmm0, this wasn't causing
issues because our test cases were restricted enough that it was always
xmm0 anyway
- the macro assembler should use AVX_OP_SSE4_1, since roundps requires
SSE4_1
- drive-by fix for roundss and roundsd to be AVX_OP_SSE4_1
- add disasm for roundps and vroundps, and test them

Bug: v8:10553
Change-Id: I4046eb81a9f18d5af7137bbd46bfa0478e5a9ab2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2227252
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68157}
2020-06-03 19:16:10 +00:00
Ng Zhi An
e2f666184c [x64] Add some ops to disasm tests
While working on some AVX stuff, saw that these ops were missing from
the test cases.

Change-Id: Ie41be465a0715323096c6549b21aa9e994eaac3e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2137472
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67072}
2020-04-09 01:16:07 +00:00
Ng Zhi An
043ac205ec [wasm-simd][x64] Bitmask instructions
Implement i8x16.bitmask, i16x8.bitmask, i32x4.bitmask on x64.

Bug: v8:10308
Change-Id: Id47cb229de77d80d0a7ec91f4862a91258ff1979
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2127317
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67022}
2020-04-06 18:33:15 +00:00
Ng Zhi An
2f83184db3 [wasm-simd][x64] Add AVX codegen
For a bunch of s8x16, s16x2 and s32x4 shuffle ops (generated by
s8x16shuffle).

Bug: v8:9561
Change-Id: I0e5cd8a90edba8bc15918c0ca1dc830475db2769
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2110952
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66865}
2020-03-25 20:12:03 +00:00
Ng Zhi An
307490b012 [wasm-simd][x64] Add AVX codegen for i32x4 conversions and hadd
Bug: v8:9561
Change-Id: I4a2c6217dea540b81256dcc833412da573f54795
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2069403
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66587}
2020-03-04 19:33:11 +00:00
Ng Zhi An
63d1879d94 [wasm-simd][x64] Add AVX codegen for all true ops
Bug: v8:9561
Change-Id: Ic57b38cefbdc21045d71601c67995d3568634c27
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2069400
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66479}
2020-02-27 10:38:22 +00:00
Ng Zhi An
b31ef394b6 [x64] Extract packed absolute value instructions
The AVX versions of pabsb, pabsw, and pabsd have an incorrect function
signature, they should only have two operands. So, extract them into
another macro list. And separately generate the right signatures and
implementations. Also update the disasm and tests.

Bug: v8:10233
Change-Id: I95ee0bf12bb285d10324ecedcec28e941f64d2dc
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2063199
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66382}
2020-02-21 03:48:28 +00:00
Ng Zhi An
9fba0cb07e [cleanup] Move some instructions into macro lists
These instructions were probably leftover from an earlier cleanup. We
can move them into respective macro lists, then delete away the
redundant declarations, definitions, disasm, and tests.

We were missing disasm tests for SSE2_INSTRUCTION_LIST_SD, so add that
in.

Change-Id: I8f27beaf57e7a338097690073910a0863f00b26a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2036833
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66123}
2020-02-05 01:57:17 +00:00
Ng Zhi An
d05d335e6a Fix assembler for sqrtpd
The assembly of sqrtpd when using Sqrtpd macro was wrong, since
Sqrtpd(xmm1, xmm1) will incorrect generated vsqrtpd(xmm1, xmm1, xmm1),
which is nonsensical, since vsqrtpd only takes two operands. The
expected instruction should be vsqrtpd(xmm1, xmm0, xmm1) in terms of the
encoding, which is vsqrtpd(xmm1, xmm1).

So, move sqrtpd and cvtps2dq out into their own macro list, because
they have two operands in their AVX form, unlike the rest of the
instructions in SSE2_INSTRUCTION_LIST.

Also updated disasm and tests to use this new list.

Fixed: v8:10170
Change-Id: Ia9343c9a3ae64596bbc876744556e1dcea2a443b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2032195
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66088}
2020-02-03 18:53:19 +00:00
Andreas Haas
911f38c411 [x64] Introduce negb and negw instructions
This CL introduces the negb and negw instructions (8-bit and 16-bit
versions of neg) in the x64 assembler. These instructions are needed to
implement I32AtomicSub8U and similar WebAssembly instructions
efficiently.

The existing implementation was embedded in a generic macro, and it was
difficult to change it without introducing also the 8-bit and 16-bit
versions of many other instructions. This would have introduced a lot
of dead code. Instead this CL extracted the neg instructions from the
macro and implements them directly. This should be fine because the
assembler does not change much, and approachability of the code is
improved.

R=clemensb@chromium.org

Bug: v8:10108
Change-Id: I46099bbebd47f864311a67da3ba8ddc4fe4cd35d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2019165
Commit-Queue: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65989}
2020-01-27 09:45:55 +00:00
Andreas Haas
f506c609bd [x64] Implement xadd in the assembler
This CL introduces the xadd instruction to the x64 assembler so it can
be used to implement WebAssembly's AtomicAdd. This is done in a
separate CL though.

R=clemensb@chromium.org

Bug: v8:10108
Change-Id: I36dcb900ed4c39b23c4996328774780afd8b816a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2011105
Commit-Queue: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65879}
2020-01-21 09:54:45 +00:00
Ng Zhi An
8078d6518b Small fixes for AVX disassembly
Add missing disasm tests for vroundss and vpalignr.
Fix disasm for vinsertps and vpinsrq.

Change-Id: I0f3907761b998d27ec00435a569084724af54ae2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1990140
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65799}
2020-01-16 01:55:31 +00:00
Ng Zhi An
06fa66fec6 Fix assembler and disassembler for vblendvpd
blendvpd should not be defined in the macro list, since the AVX version
has 4 operands, not 3.

Change-Id: Id020b460fa1a3510a91490f3b2286024cc6c5994
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1990139
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65771}
2020-01-14 22:59:09 +00:00
Ng Zhi An
d60809aaf5 [wasm-simd] Add AVX for some i64x2 instructions
Also add missing disasm for SSE4_2 instruction.

Bug: v8:9561
Change-Id: Idc8d3c0e59f0e9aff57ebdcc5774bba375828597
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1986386
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65769}
2020-01-14 21:26:48 +00:00
Ng Zhi An
acc96e1f6a [wasm-simd] Add AVX for movlhps and some avx codegen
Bug: v8:9561
Change-Id: I18c832737cbea89e08af2ca166de7b01b7fe51b0
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1986256
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65674}
2020-01-09 21:57:06 +00:00
Ng Zhi An
4273416561 [wasm-simd] Add AVX for pextrq
Bug: v8:9561
Change-Id: I2259e72829c0ad688284dcecef8aaf418ad53022
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1980503
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65643}
2020-01-08 19:10:04 +00:00
Ng Zhi An
fd53519035 [wasm-simd] AVX codegen for some conversion opcodes
Bug: v8:9561
Change-Id: Ie3231038312495c2d8f77062ee5b81b2b55ab4d7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1980502
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65617}
2020-01-07 21:27:21 +00:00
Ng Zhi An
c855532af8 Move FMA opcodes into a list macro
Bug: v8:9415
Bug: v8:10021
Change-Id: I77c24b58f575b612e5422bfcb9bb7ab83986659a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1986249
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65616}
2020-01-07 19:05:37 +00:00
Ng Zhi An
7cfbcefb5c Fix assembler for packed move instructions
The AVX version should only take one argument, so these instructions have
to be split from the main list of SSE4 instructions, whose AVX version
have two arguments.

Bug: v8:9886
Change-Id: Ie37e060711babd7760547e2aa01c9c0fb0c728b5
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1986215
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65588}
2020-01-06 19:42:13 +00:00
Ng Zhi An
1effe529c2 [wasm-simd] Add AVX codegen
Mostly for f32x4 instructions.

Bug: v8:9561
Change-Id: I3a3dc06305acb9e336c494fc399cf5d21518c0e8
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1950488
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65382}
2019-12-09 10:49:07 +00:00
Ng Zhi An
277381d85e Collate packed shift data instructions into macro list
Bug: v8:10021
Change-Id: Ibececfd23b852d7cecf609f6ae1a4b01ea8b55f6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1950485
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65361}
2019-12-06 10:51:40 +00:00
Ng Zhi An
4972b2c84c Add AVX for movddup and pinsrq
Bug: v8:9561
Change-Id: I39a3148570664909eb08f1559b2cb418477a6c15
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1948717
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65322}
2019-12-04 12:28:12 +00:00
Ng Zhi An
785fa6b412 [liftoff] Change FillStackSlotsWithZero to use bytes
Bug: v8:9909
Change-Id: I997ae6f19c580f08eb9ff8ee039e0dd647091616
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1947350
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65320}
2019-12-04 10:07:02 +00:00
Ng Zhi An
83fc8559fa [wasm-simd] AVX codegen for load splat
Bug: v8:9886
Change-Id: I321e93d02971c6ba568d9d7c52d464ffc2754665
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1929837
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65277}
2019-12-02 10:07:23 +00:00
Ng Zhi An
5d80a202dd Add missing diasm and impl of AVX instr
This change includes splitting the existing SSE_INSTRUCTION_LIST into two:
1. sse instructions with two-operand AVX
2. sse instructions with three-operand AVX

Also a drive by fix for disasm of pblendw, the printing of imm8 doesn't
not require AND-ing with 3, since all 8 bits are significant.

Bug: v8:9561
Change-Id: I56c93a24bb9905ae6422698c793b27f3b9e66d8f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1933593
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65274}
2019-12-02 09:13:53 +00:00
Michael Starzinger
330445cee4 [test][x64] Test disassembly of indirect call again.
R=clemensb@chromium.org
TEST=cctest/test-disasm-x64/DisasmX64

Change-Id: I011d0d5e25c472c5a62ad73edd42165e55b34e2b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1900460
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Commit-Queue: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64809}
2019-11-06 14:38:41 +00:00
Dan Elphick
352bbb1279 Reland "Reland: [builtins] Move non-JS linkage builtins code objects into RO_SPACE"
This is a reland of 855591a54d

Fixes break in builds that verify ReadOnlyHeap by relaxing the requirement for
Code objects to be in CODE_SPACE in PagedSpaceObjectIterator::FromCurrentPage.

Original change's description:
> Reland: [builtins] Move non-JS linkage builtins code objects into RO_SPACE
>
> Reland of https://chromium-review.googlesource.com/c/v8/v8/+/1795358.
>
> [builtins] Move non-JS linkage builtins code objects into RO_SPACE
>
> Creates an allow-list of builtins that can still go in code_space
> including all TFJ builtins and a small manual list that should be pared
> down in the future.
>
> For builtins that go in RO_SPACE a Code object is created that contains an
> immediate trap instruction. Generally these Code objects are still no
> smaller than CODE_SPACE Code objects because of the Code object alignment
> requirements. This will hopefully be addressed in a follow-up CL either by
> relaxing them or removing the instruction stream completely.
>
> In the snapshot, this reduces code_space from ~152k to ~40k (-112k) and
> increases by the same amount.
>
> Change-Id: I76661c35c7ea5866c1fb16e87e87122b3e3ca0ce
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893336
> Commit-Queue: Dan Elphick <delphick@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#64700}

Change-Id: I4eeb7dab3027b42fa58c5dfb2bad9873e9fff250
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893192
Commit-Queue: Dan Elphick <delphick@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64728}
2019-11-04 10:45:10 +00:00
Bill Budge
8b104dee9c Revert "Reland: [builtins] Move non-JS linkage builtins code objects into RO_SPACE"
This reverts commit 855591a54d.

Reason for revert: Breaks arm64 sim tests
https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20debug/17957
https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20gc%20stress/16585

Original change's description:
> Reland: [builtins] Move non-JS linkage builtins code objects into RO_SPACE
> 
> Reland of https://chromium-review.googlesource.com/c/v8/v8/+/1795358.
> 
> [builtins] Move non-JS linkage builtins code objects into RO_SPACE
> 
> Creates an allow-list of builtins that can still go in code_space
> including all TFJ builtins and a small manual list that should be pared
> down in the future.
> 
> For builtins that go in RO_SPACE a Code object is created that contains an
> immediate trap instruction. Generally these Code objects are still no
> smaller than CODE_SPACE Code objects because of the Code object alignment
> requirements. This will hopefully be addressed in a follow-up CL either by
> relaxing them or removing the instruction stream completely.
> 
> In the snapshot, this reduces code_space from ~152k to ~40k (-112k) and
> increases by the same amount.
> 
> Change-Id: I76661c35c7ea5866c1fb16e87e87122b3e3ca0ce
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893336
> Commit-Queue: Dan Elphick <delphick@chromium.org>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#64700}

TBR=ulan@chromium.org,jgruber@chromium.org,delphick@chromium.org

Change-Id: I4211c3bb7fe4741e0ba3898f92ce382dfc93c4f3
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893636
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64701}
2019-10-31 20:30:07 +00:00
Dan Elphick
855591a54d Reland: [builtins] Move non-JS linkage builtins code objects into RO_SPACE
Reland of https://chromium-review.googlesource.com/c/v8/v8/+/1795358.

[builtins] Move non-JS linkage builtins code objects into RO_SPACE

Creates an allow-list of builtins that can still go in code_space
including all TFJ builtins and a small manual list that should be pared
down in the future.

For builtins that go in RO_SPACE a Code object is created that contains an
immediate trap instruction. Generally these Code objects are still no
smaller than CODE_SPACE Code objects because of the Code object alignment
requirements. This will hopefully be addressed in a follow-up CL either by
relaxing them or removing the instruction stream completely.

In the snapshot, this reduces code_space from ~152k to ~40k (-112k) and
increases by the same amount.

Change-Id: I76661c35c7ea5866c1fb16e87e87122b3e3ca0ce
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1893336
Commit-Queue: Dan Elphick <delphick@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64700}
2019-10-31 18:18:56 +00:00