Commit Graph

212 Commits

Author SHA1 Message Date
Ng Zhi An
1dfb8cd74a [x64][diasasm] Add more padding to disassembly
A mov can be up to 10 bytes, 6 for displacement, 4 for instr. Other
instructions (like pshufb) with a complex addressing mode can take 10
bytes too. So adjust the padding for disassembly of hex accordingly.
This requires fixing up all the test cases too.

Bug: v8:12207
Change-Id: I372d67a818a5dbfe6f49f67047493d7f67b59bcd
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3180375
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77241}
2021-10-06 00:08:45 +00:00
Ng Zhi An
f80eed4729 [x64] Verify disassembly of SSE3 and SSSE3 instructions
Bug: v8:12207
Change-Id: I6d8a62bb69c6011e6e7f6da2663f9db297b76f7f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3180374
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77226}
2021-10-04 17:38:52 +00:00
Ng Zhi An
eb5656ef23 [x64] Verify disassembly of cmov instructions
Bug: v8:12207
Change-Id: Ic59dbbce330221c917f20c7d20ac7ddb421932ee
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3180373
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77222}
2021-10-04 16:27:52 +00:00
Yolanda Chen
ed7e3de95a [x64] Implement 256-bit assembly for vhaddps
Bug: v8:12228
Change-Id: Ie1f569c450f84a862c754b844e36349b1533872d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3194633
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Commit-Queue: Yolanda Chen <yolanda.chen@intel.com>
Cr-Commit-Position: refs/heads/main@{#77202}
2021-10-02 04:24:22 +00:00
Ng Zhi An
7537e36efa [x64] Verify disassembly of SSE2 instructions
Bug: v8:12207
Change-Id: Ia553891986f0ef3fe6fb1c4350c3accc0e7bfc84
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3180243
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77027}
2021-09-24 01:37:03 +00:00
Ng Zhi An
d90c9c1f65 [x64] Verify disassembly of SSE instructions
- create a helper class to set up Disassembler for testing
- add a helper macro to only compare disassembled instruction (ignore
the hex bytes), this is useful for comparing SSE instructions, whose
opcodes are defined in sse-instr.h, and use uppercase letters, but the
disassembly always uses lowercase
- emit and compare SSE instructions using macro list

Bug: v8:12207
Change-Id: I3580f5d756736cada4f7260efc4d90e2c894f43c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3173906
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77021}
2021-09-23 18:51:03 +00:00
Ng Zhi An
565e83ab2f [x64] Check expected disassembly output fpu instructions
We move some instructions from the test that just disassembles them, to
the test that checks for expected output.

Bug: v8:12207
Change-Id: Ide8954e36c6ad016150bfe45abc1717bed55eb19
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3171972
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76970}
2021-09-21 17:18:18 +00:00
Ng Zhi An
dd06c11ee0 [x64] Check expected disassembly output for some instructions
We move some instructions from the test that just disassembles them, to
the test that checks for expected output.

Bug: v8:12207
Change-Id: I913237427d795ed44539c7294ebbe69330c41dfa
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3163278
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76944}
2021-09-20 18:03:57 +00:00
Ng Zhi An
f67ee467aa [disasm][x64] Remove unnecessary initialization code
These tests don't depend on initializing VM (for Context) or even an
isolate, so we can remove the setup code, and use UNINITIALIZED_TEST
(will not even set up an isolate).

Bug: v8:12207
Change-Id: I4b509b95cc8272db22892c32b53464678403dc7d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3160748
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76854}
2021-09-15 17:38:00 +00:00
Ng Zhi An
ca817b0bb6 [x64] Add new disassembly tests that verifies output
Currently the main test for disassembly just checks that there is
disassembly support for a assembler function, it doesn't verify the
output is as expected.

Add a new test case that checks the disassembly output against an
expected string.

Right now we only check a single instruction, subsequent patches will
move more instructions into this test case.

Bug: v8:12207
Change-Id: Id183bb2fd625713d82239363ebce3f4c77155acd
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3150145
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76828}
2021-09-14 20:41:29 +00:00
Andrew Brown
cea787e280 [x64] Add disassembly tests for 256-bit instructions
A previous change (see ref) added a subset of 256-bit instructions to
the x64 assembler--this change adds a disassembly test for the added
instructions.

ref: https://chromium-review.googlesource.com/c/v8/v8/+/3123648
Change-Id: Ia56be7a7df636b8bf6c04f044912e914d949d19f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3133956
Auto-Submit: Andrew Brown <andrew.brown@intel.com>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76711}
2021-09-08 00:26:44 +00:00
Ng Zhi An
684f3cee1f [wasm-simd] Optimize i32x4.trunc_sat_f32x4_s
Bug: v8:12094
Change-Id: Ibefce881cbfcd4445485197a4a2615bdf0599ada
Fixed: v8:12094
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3123638
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76706}
2021-09-07 20:11:26 +00:00
Ng Zhi An
bb12c48ac3 [wasm-simd] Share i8x16.splat implementation
The optimal implementation is in TurboFan x64 codegen, move it into
shared-macro-assembler, and have TurboFan ia32 and Liftoff use it. The
optimal implementation accounts for AVX2 support.

We add a couple of AVX2 instruction to ia32 in sse-instr.h, not all of
them are used, but follow-up patches will use them, so we add support
(including diassembly and test) in this change.

Drive-by clean up to test-disasm-x64.cc to merge 2 AVX2 test sections.

Bug: v8:11589
Change-Id: I1c8d7deb0f8bb70b29e7a680e5dbcfb09ca5505b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3092555
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76352}
2021-08-17 21:05:00 +00:00
Ng Zhi An
add293e80e [x64][ia32] Move more AVX_OP into SharedTurboAssembler
Bug: v8:11589
Change-Id: I30dbdbc6266d703ce697352780da1d543afbb457
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2826711
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#73965}
2021-04-14 23:46:56 +00:00
Ng Zhi An
9c120b753d [wasm-simd][x64] Fix encoding of vcvtdq2pd
vcvtdq2pd was incorrectly declared to take 3 operands, the use of the
macro Cvtdq2pd meant that the call was vcvtdq2pd(dst, dst, src). This
is an incorrect encoding. Our tests happen to pass because dst was xmm0,
which made it accidentally correct.

This fixes it by moving cvtdq2pd out of the macro list.

Bug: v8:11265
Change-Id: I8b1baf4dd2c670021eafa76dc1a10b442f812805
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2654003
Reviewed-by: Adam Klein <adamk@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#72382}
2021-01-27 22:48:59 +00:00
Ng Zhi An
173d660849 [wasm-simd][x64] Optimize i8x16.popcnt with aligned moves
movups is slower on older hardware (core2) than movaps, even if the
operand is aligned. (Not an issue on modern hardware).

Also move i8x16.splat(0x0F) to an external reference so we can load the
mask directly.

Bug: v8:11002
Change-Id: I0b01c27a142024d50b9faaa9e7bd6a1fe169e141
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2643242
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#72336}
2021-01-26 19:03:10 +00:00
Zhi An Ng
ffc832becf [wasm-simd][x64][avx2] Optimize f32x4.splat
When AVX2 is available, we can use vbroadcastss. On AVX, use vshufps,
since it is non-destructive. On SSE, shufps is 1 byte shorter.

FIXED=b/175364402

Change-Id: I5bd10914579d8db012192a9c04f7b0038ec1c812
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2599849
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71964}
2021-01-08 03:03:45 +00:00
Zhi An Ng
506c09797c [x64] Sort out move instructions in codegen
In AVX, it is better to use the appropriate integer or floating point
moves depending on which instructions produce/consume these moves, since
there can be a delay moving from integer to floating point domain. On
SSE systems, it is less important, and we can move movaps/movups which
is 1 byte shorter than movdqa/movdqu.

This patch cleans up a couple of places, and defines macro-assembler
functions Movdqa, Movdqu, Movapd, to call into movaps/movups when AVX is
not supported.

Change-Id: Iba6c54e218875f1a70f61792978d7b3f69edfb4b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2599843
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71884}
2020-12-29 01:27:23 +00:00
Zhi An Ng
c9560d1dbf [wasm-simd][x64][avx2] Improve codegen for load{8,16}_splat
Detect AVX2 support and use vpbroadcastb or vpbroadcastw.

No new assembler helpers required because we are only emitting the
VEX-128 versions of these instructions.

Bug: v8:11258
Change-Id: Ic50178daa6fc8fe767dfc788e61e67538066bdea
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2596582
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71866}
2020-12-23 01:56:42 +00:00
Zhi An Ng
741e5a66de [wasm-simd][ia32][x64] More optimization for f32x4.extract_lane
We can have more optimizations for this instruction, they leave some
junk in the top lanes of dst, but that doesn't matter:

- when lane is 1: we use movshdup, this is 4 bytes long
- when lane is 2: use movhlps, this is 3 bytes long
- otherwise use shufps (4 bytes) or pshufd (5 bytes)

All of which are better than insertps (6 bytes).

Change-Id: I0e524431d1832e297e8c8bb418d42382d93fa691
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2591850
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71813}
2020-12-17 01:58:52 +00:00
Zhi An Ng
6cb61e63bb [wasm-simd][x64] Optimize f64x2.extract_lane
pextrq + movq crosses register files twice, which is not efficient.

Optimize this by:
- checking if lane 0, do nothing if dst == src (macro-assembler helper)
- use vmovhlps on AVX, with src as the operands to avoid false
dependency on dst
- use movhlps otherwise, this is shorter than shufpd, and faster on
older system

Change-Id: I3486d87224c048b3229c2f92359b8b8e6d5fd025
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2589056
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#71751}
2020-12-14 23:53:19 +00:00
Zhi An Ng
b0d7912042 [wasm-simd][x64] Prototype sign select
Prototype i8x16, i16x8, i32x4, i64x2 sign select on x64 and interpreter.

Bug: v8:10983
Change-Id: I7d6f39a2cb4c2aefe31daac782978fe8b363dd1a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2486235
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70818}
2020-10-28 03:32:57 +00:00
Jakob Gruber
c7cb9beca1 Reland "Reland "[deoptimizer] Change deopt entries into builtins""
This is a reland of fbfa9bf4ec

The arm64 was missing proper codegen for CFI, thus sizes were off.

Original change's description:
> Reland "[deoptimizer] Change deopt entries into builtins"
>
> This is a reland of 7f58ced72e
>
> It fixes the different exit size emitted on x64/Atom CPUs due to
> performance tuning in TurboAssembler::Call. Additionally, add
> cctests to verify the fixed size exits.
>
> Original change's description:
> > [deoptimizer] Change deopt entries into builtins
> >
> > While the overall goal of this commit is to change deoptimization
> > entries into builtins, there are multiple related things happening:
> >
> > - Deoptimization entries, formerly stubs (i.e. Code objects generated
> >   at runtime, guaranteed to be immovable), have been converted into
> >   builtins. The major restriction is that we now need to preserve the
> >   kRootRegister, which was formerly used on most architectures to pass
> >   the deoptimization id. The solution differs based on platform.
> > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
> > - Removed heap/ support for immovable Code generation.
> > - Removed the DeserializerData class (no longer needed).
> > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
> >   in which the final jump to the deoptimization entry is generated
> >   once per Code object, and deopt exits can continue to emit a
> >   near-call.
> > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
> >   sizes by 4/8, 5, and 5 bytes, respectively.
> >
> > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
> > by using the same strategy as on arm64 (recalc deopt id from return
> > address). Before:
> >
> >  e300a002       movw r10, <id>
> >  e59fc024       ldr ip, [pc, <entry offset>]
> >  e12fff3c       blx ip
> >
> > After:
> >
> >  e59acb35       ldr ip, [r10, <entry offset>]
> >  e12fff3c       blx ip
> >
> > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
> > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
> > object (max 32 bytes added overhead per Code object). Before:
> >
> >  9401cdae       bl <entry offset>
> >
> > After:
> >
> >  # eager deoptimization entry jump.
> >  f95b1f50       ldr x16, [x26, <eager entry offset>]
> >  d61f0200       br x16
> >  # lazy deoptimization entry jump.
> >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
> >  d61f0200       br x16
> >  # the deopt exit.
> >  97fffffc       bl <eager deoptimization entry jump offset>
> >
> > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
> >
> >  bb00000000     mov ebx,<id>
> >  e825f5372b     call <entry>
> >
> > After:
> >
> >  e8ea2256ba     call <entry>
> >
> > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
> >
> >  49c7c511000000 REX.W movq r13,<id>
> >  e8ea2f0700     call <entry>
> >
> > After:
> >
> >  41ff9560360000 call [r13+<entry offset>]
> >
> > Bug: v8:8661,v8:8768
> > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
> > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
> > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#70597}
>
> Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
> Bug: v8:8661,v8:8768,chromium:1140165
> Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70655}

Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
Bug: v8:8661
Bug: v8:8768
Bug: chromium:1140165
Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70672}
2020-10-21 06:01:38 +00:00
Maya Lekova
7c7aa4fa94 Revert "Reland "[deoptimizer] Change deopt entries into builtins""
This reverts commit fbfa9bf4ec.

Reason for revert: Seems to break arm64 sim CFI build (please see DeoptExitSizeIfFixed) - https://ci.chromium.org/p/v8/builders/ci/V8%20Linux%20-%20arm64%20-%20sim%20-%20CFI/2808

Original change's description:
> Reland "[deoptimizer] Change deopt entries into builtins"
>
> This is a reland of 7f58ced72e
>
> It fixes the different exit size emitted on x64/Atom CPUs due to
> performance tuning in TurboAssembler::Call. Additionally, add
> cctests to verify the fixed size exits.
>
> Original change's description:
> > [deoptimizer] Change deopt entries into builtins
> >
> > While the overall goal of this commit is to change deoptimization
> > entries into builtins, there are multiple related things happening:
> >
> > - Deoptimization entries, formerly stubs (i.e. Code objects generated
> >   at runtime, guaranteed to be immovable), have been converted into
> >   builtins. The major restriction is that we now need to preserve the
> >   kRootRegister, which was formerly used on most architectures to pass
> >   the deoptimization id. The solution differs based on platform.
> > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
> > - Removed heap/ support for immovable Code generation.
> > - Removed the DeserializerData class (no longer needed).
> > - arm64: to preserve 4-byte deopt exits, introduced a new optimization
> >   in which the final jump to the deoptimization entry is generated
> >   once per Code object, and deopt exits can continue to emit a
> >   near-call.
> > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
> >   sizes by 4/8, 5, and 5 bytes, respectively.
> >
> > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
> > by using the same strategy as on arm64 (recalc deopt id from return
> > address). Before:
> >
> >  e300a002       movw r10, <id>
> >  e59fc024       ldr ip, [pc, <entry offset>]
> >  e12fff3c       blx ip
> >
> > After:
> >
> >  e59acb35       ldr ip, [r10, <entry offset>]
> >  e12fff3c       blx ip
> >
> > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
> > with CFI). Additionally, up to 4 builtin jumps are emitted per Code
> > object (max 32 bytes added overhead per Code object). Before:
> >
> >  9401cdae       bl <entry offset>
> >
> > After:
> >
> >  # eager deoptimization entry jump.
> >  f95b1f50       ldr x16, [x26, <eager entry offset>]
> >  d61f0200       br x16
> >  # lazy deoptimization entry jump.
> >  f95b2b50       ldr x16, [x26, <lazy entry offset>]
> >  d61f0200       br x16
> >  # the deopt exit.
> >  97fffffc       bl <eager deoptimization entry jump offset>
> >
> > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
> >
> >  bb00000000     mov ebx,<id>
> >  e825f5372b     call <entry>
> >
> > After:
> >
> >  e8ea2256ba     call <entry>
> >
> > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
> >
> >  49c7c511000000 REX.W movq r13,<id>
> >  e8ea2f0700     call <entry>
> >
> > After:
> >
> >  41ff9560360000 call [r13+<entry offset>]
> >
> > Bug: v8:8661,v8:8768
> > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
> > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
> > Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> > Cr-Commit-Position: refs/heads/master@{#70597}
>
> Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
> Bug: v8:8661,v8:8768,chromium:1140165
> Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70655}

TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org

Change-Id: I4739a3475bfd8ee0cfbe4b9a20382f91a6ef1bf0
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:8661
Bug: v8:8768
Bug: chromium:1140165
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485223
Reviewed-by: Maya Lekova <mslekova@chromium.org>
Commit-Queue: Maya Lekova <mslekova@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70658}
2020-10-20 14:14:12 +00:00
Jakob Gruber
fbfa9bf4ec Reland "[deoptimizer] Change deopt entries into builtins"
This is a reland of 7f58ced72e

It fixes the different exit size emitted on x64/Atom CPUs due to
performance tuning in TurboAssembler::Call. Additionally, add
cctests to verify the fixed size exits.

Original change's description:
> [deoptimizer] Change deopt entries into builtins
>
> While the overall goal of this commit is to change deoptimization
> entries into builtins, there are multiple related things happening:
>
> - Deoptimization entries, formerly stubs (i.e. Code objects generated
>   at runtime, guaranteed to be immovable), have been converted into
>   builtins. The major restriction is that we now need to preserve the
>   kRootRegister, which was formerly used on most architectures to pass
>   the deoptimization id. The solution differs based on platform.
> - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
> - Removed heap/ support for immovable Code generation.
> - Removed the DeserializerData class (no longer needed).
> - arm64: to preserve 4-byte deopt exits, introduced a new optimization
>   in which the final jump to the deoptimization entry is generated
>   once per Code object, and deopt exits can continue to emit a
>   near-call.
> - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
>   sizes by 4/8, 5, and 5 bytes, respectively.
>
> On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
> by using the same strategy as on arm64 (recalc deopt id from return
> address). Before:
>
>  e300a002       movw r10, <id>
>  e59fc024       ldr ip, [pc, <entry offset>]
>  e12fff3c       blx ip
>
> After:
>
>  e59acb35       ldr ip, [r10, <entry offset>]
>  e12fff3c       blx ip
>
> On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
> with CFI). Additionally, up to 4 builtin jumps are emitted per Code
> object (max 32 bytes added overhead per Code object). Before:
>
>  9401cdae       bl <entry offset>
>
> After:
>
>  # eager deoptimization entry jump.
>  f95b1f50       ldr x16, [x26, <eager entry offset>]
>  d61f0200       br x16
>  # lazy deoptimization entry jump.
>  f95b2b50       ldr x16, [x26, <lazy entry offset>]
>  d61f0200       br x16
>  # the deopt exit.
>  97fffffc       bl <eager deoptimization entry jump offset>
>
> On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
>
>  bb00000000     mov ebx,<id>
>  e825f5372b     call <entry>
>
> After:
>
>  e8ea2256ba     call <entry>
>
> On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
>
>  49c7c511000000 REX.W movq r13,<id>
>  e8ea2f0700     call <entry>
>
> After:
>
>  41ff9560360000 call [r13+<entry offset>]
>
> Bug: v8:8661,v8:8768
> Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70597}

Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org
Bug: v8:8661,v8:8768,chromium:1140165
Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70655}
2020-10-20 12:30:23 +00:00
Jakob Gruber
8bc9a7941c Revert "[deoptimizer] Change deopt entries into builtins"
This reverts commit 7f58ced72e.

Reason for revert: Segfaults on Atom_x64 https://ci.chromium.org/p/v8-internal/builders/ci/v8_linux64_atom_perf/5686?

Original change's description:
> [deoptimizer] Change deopt entries into builtins
>
> While the overall goal of this commit is to change deoptimization
> entries into builtins, there are multiple related things happening:
>
> - Deoptimization entries, formerly stubs (i.e. Code objects generated
>   at runtime, guaranteed to be immovable), have been converted into
>   builtins. The major restriction is that we now need to preserve the
>   kRootRegister, which was formerly used on most architectures to pass
>   the deoptimization id. The solution differs based on platform.
> - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
> - Removed heap/ support for immovable Code generation.
> - Removed the DeserializerData class (no longer needed).
> - arm64: to preserve 4-byte deopt exits, introduced a new optimization
>   in which the final jump to the deoptimization entry is generated
>   once per Code object, and deopt exits can continue to emit a
>   near-call.
> - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
>   sizes by 4/8, 5, and 5 bytes, respectively.
>
> On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
> by using the same strategy as on arm64 (recalc deopt id from return
> address). Before:
>
>  e300a002       movw r10, <id>
>  e59fc024       ldr ip, [pc, <entry offset>]
>  e12fff3c       blx ip
>
> After:
>
>  e59acb35       ldr ip, [r10, <entry offset>]
>  e12fff3c       blx ip
>
> On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
> with CFI). Additionally, up to 4 builtin jumps are emitted per Code
> object (max 32 bytes added overhead per Code object). Before:
>
>  9401cdae       bl <entry offset>
>
> After:
>
>  # eager deoptimization entry jump.
>  f95b1f50       ldr x16, [x26, <eager entry offset>]
>  d61f0200       br x16
>  # lazy deoptimization entry jump.
>  f95b2b50       ldr x16, [x26, <lazy entry offset>]
>  d61f0200       br x16
>  # the deopt exit.
>  97fffffc       bl <eager deoptimization entry jump offset>
>
> On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:
>
>  bb00000000     mov ebx,<id>
>  e825f5372b     call <entry>
>
> After:
>
>  e8ea2256ba     call <entry>
>
> On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:
>
>  49c7c511000000 REX.W movq r13,<id>
>  e8ea2f0700     call <entry>
>
> After:
>
>  41ff9560360000 call [r13+<entry offset>]
>
> Bug: v8:8661,v8:8768
> Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#70597}

TBR=ulan@chromium.org,rmcilroy@chromium.org,jgruber@chromium.org,tebbi@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: v8:8661,v8:8768,chromium:1140165
Change-Id: I3df02ab42f6e02233d9f6fb80e8bb18f76870d91
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485504
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70649}
2020-10-20 09:43:19 +00:00
Jakob Gruber
7f58ced72e [deoptimizer] Change deopt entries into builtins
While the overall goal of this commit is to change deoptimization
entries into builtins, there are multiple related things happening:

- Deoptimization entries, formerly stubs (i.e. Code objects generated
  at runtime, guaranteed to be immovable), have been converted into
  builtins. The major restriction is that we now need to preserve the
  kRootRegister, which was formerly used on most architectures to pass
  the deoptimization id. The solution differs based on platform.
- Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING.
- Removed heap/ support for immovable Code generation.
- Removed the DeserializerData class (no longer needed).
- arm64: to preserve 4-byte deopt exits, introduced a new optimization
  in which the final jump to the deoptimization entry is generated
  once per Code object, and deopt exits can continue to emit a
  near-call.
- arm,ia32,x64: change to fixed-size deopt exits. This reduces exit
  sizes by 4/8, 5, and 5 bytes, respectively.

On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes
by using the same strategy as on arm64 (recalc deopt id from return
address). Before:

 e300a002       movw r10, <id>
 e59fc024       ldr ip, [pc, <entry offset>]
 e12fff3c       blx ip

After:

 e59acb35       ldr ip, [r10, <entry offset>]
 e12fff3c       blx ip

On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases
with CFI). Additionally, up to 4 builtin jumps are emitted per Code
object (max 32 bytes added overhead per Code object). Before:

 9401cdae       bl <entry offset>

After:

 # eager deoptimization entry jump.
 f95b1f50       ldr x16, [x26, <eager entry offset>]
 d61f0200       br x16
 # lazy deoptimization entry jump.
 f95b2b50       ldr x16, [x26, <lazy entry offset>]
 d61f0200       br x16
 # the deopt exit.
 97fffffc       bl <eager deoptimization entry jump offset>

On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before:

 bb00000000     mov ebx,<id>
 e825f5372b     call <entry>

After:

 e8ea2256ba     call <entry>

On x64 the deopt exit size is reduced from 12 to 7 bytes. Before:

 49c7c511000000 REX.W movq r13,<id>
 e8ea2f0700     call <entry>

After:

 41ff9560360000 call [r13+<entry offset>]

Bug: v8:8661,v8:8768
Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70597}
2020-10-19 07:32:48 +00:00
Ng Zhi An
944dad59c8 [x64] Add movlps and movhps to assembler
These instructions will be used for prototyping Wasm SIMD's store lane
later on, separated the implementation for assembler and disassembler
into this patch to make things smaller.

Curiously, movhps and movlhps seems to have the same encoding, 0f 16, so
I'm not sure not sure how to differentiate them in the disassembler
besides using the mod field, since movlhps only takes xmm registers,
whereas movhps always take 1 operand.

Bug: v8:10975
Change-Id: I8be9a31b1c9a5515038f9c8c55ef30d1ba063ea7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2471977
Reviewed-by: Bill Budge <bbudge@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70520}
2020-10-15 00:37:32 +00:00
Ng Zhi An
e30c50f3bf [x64] Refactor pinsrb family of instructions
The existing macro assembler define Pinsrb, which expects 3 arguments:

- XMMRegister dst
- Register/Operand src
- uint8_t imm

which overwrites dst with src at lane specified by imm.

That means we cannot use the AVX version, which has 4 arguments, and
does not overwrite dst.

This refactoring defines the 4 argument AVX version instead, and if AVX
is not supported, fall back to the SSE version, and ensure that the
value is copied over into dst first.

For convenience, we define an overload with 3 arguments that duplicates
dst, this replicates the SSE behavior, so that not all callers have to
be updated.

Bug: v8:10975, v8:10933
Change-Id: I6f9b9d37fa08d3f5cff4f040ae7d5e1f0cf36455
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2444096
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70392}
2020-10-07 23:25:30 +00:00
Jakob Gruber
29bcdaad1d Rename legacy code kinds
CodeKind::OPTIMIZED_CODE -> TURBOFAN

Kinds are now more fine-grained and distinguish between TF, TP, NCI.

CodeKind::STUB -> DEOPT_ENTRIES_OR_FOR_TESTING

Code stubs (like builtins, but generated at runtime) were removed from
the codebase years ago, this is the last remnant. This kind is used
only for deopt entries (which should be converted into builtins) and
for tests.

Change-Id: I67beb15377cb60f395e9b051b25f3e5764982e93
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2440335
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Mythri Alle <mythria@chromium.org>
Reviewed-by: Mythri Alle <mythria@chromium.org>
Cr-Commit-Position: refs/heads/master@{#70234}
2020-09-30 15:39:23 +00:00
Ng Zhi An
ddf30bea13 [wasm-simd][x64] Check for register when emitting shuffles
Some shuffles take have either register or memory operand for second
input, but the codegen incorrectly assumes that it is always a register.

Bug: v8:10824
Change-Id: Ia2df233dad4ed451e52e57e35cce5c80db0905db
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2373586
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#69562}
2020-08-25 17:52:16 +00:00
Jakob Gruber
c51041f454 [nci] Replace CompilationTarget with a new Code::Kind value
With the new Turbofan variants (NCI and Turboprop), we need a way to
distinguish between them both during and after compilation. We
initially introduced CompilationTarget to track the variant during
compilation, but decided to reuse the code kind as the canonical spot to
store this information instead.

Why? Because it is an established mechanism, already available in most
of the necessary spots (inside the pipeline, on Code objects, in
profiling traces).

This CL removes CompilationTarget and adds a new
NATIVE_CONTEXT_INDEPENDENT kind, plus helper functions to determine
various things about a given code kind (e.g.: does this code kind
deopt?).

As a (very large) drive-by, refactor both Code::Kind and
AbstractCode::Kind into a new CodeKind enum class.

Bug: v8:8888
Change-Id: Ie858b9a53311b0731630be35cf5cd108dee95b39
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2336793
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Dominik Inführ <dinfuehr@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#69244}
2020-08-05 12:27:22 +00:00
Ng Zhi An
667fafcec4 Reland "[wasm-simd] Prototype f64x2 rounding instructions"
This is a reland of f7f72b7b3a

This was reverted because of a test timing out on slow_path
variant (https://crrev.com/c/2237131 for details). Turns out
the test is just really slow, and was skipped on this variant
in https://crrev.com/c/2237628. Relanding without changes.


Original change's description:
> [wasm-simd] Prototype f64x2 rounding instructions
>
> Implements f64x2 ceil, floor, trunc, nearestint, for interpreter and
> x64.
>
> Bug: v8:10553
> Change-Id: I12a260a3b1d728368e5525d317d30fc9581cae04
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2213082
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#68241}

Tbr: tebbi@chromium.org
Bug: v8:10553
Change-Id: I4cdc23d0556f11310d32fa066f40b057fd49d2d7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2237350
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Reviewed-by: Adam Klein <adamk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68304}
2020-06-10 20:51:21 +00:00
Leszek Swirski
926ce88782 Revert "[wasm-simd] Prototype f64x2 rounding instructions"
This reverts commit f7f72b7b3a.

Reason for revert: Flaky timeouts of slow-path tests -- specifically, mjsunit/regress/wasm/regress-9017, which appears to have regressed from ~2 min to ~3-4 min 

https://logs.chromium.org/logs/v8/buildbucket/cr-buildbucket.appspot.com/8878016799136124416/+/steps/Check_-_slow_path__flakes_/0/logs/regress-9017/0

Original change's description:
> [wasm-simd] Prototype f64x2 rounding instructions
> 
> Implements f64x2 ceil, floor, trunc, nearestint, for interpreter and
> x64.
> 
> Bug: v8:10553
> Change-Id: I12a260a3b1d728368e5525d317d30fc9581cae04
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2213082
> Commit-Queue: Zhi An Ng <zhin@chromium.org>
> Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
> Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#68241}

TBR=gdeepti@chromium.org,tebbi@chromium.org,zhin@chromium.org

Change-Id: I9915dd375c7f0e08b5414189efb29ed1c90cb96d
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:10553
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2237131
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68248}
2020-06-09 08:38:52 +00:00
Ng Zhi An
f7f72b7b3a [wasm-simd] Prototype f64x2 rounding instructions
Implements f64x2 ceil, floor, trunc, nearestint, for interpreter and
x64.

Bug: v8:10553
Change-Id: I12a260a3b1d728368e5525d317d30fc9581cae04
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2213082
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Tobias Tebbi <tebbi@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68241}
2020-06-08 23:43:09 +00:00
Ng Zhi An
02ee6904f4 [x64] Fix vroundps assembly, add disassembly
vroundps assembly is incorrect:
- the signature was wrong, vroundps takes 2 operands and 1 immediate
- when calling vinstr, should always pass xmm0, this wasn't causing
issues because our test cases were restricted enough that it was always
xmm0 anyway
- the macro assembler should use AVX_OP_SSE4_1, since roundps requires
SSE4_1
- drive-by fix for roundss and roundsd to be AVX_OP_SSE4_1
- add disasm for roundps and vroundps, and test them

Bug: v8:10553
Change-Id: I4046eb81a9f18d5af7137bbd46bfa0478e5a9ab2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2227252
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#68157}
2020-06-03 19:16:10 +00:00
Ng Zhi An
e2f666184c [x64] Add some ops to disasm tests
While working on some AVX stuff, saw that these ops were missing from
the test cases.

Change-Id: Ie41be465a0715323096c6549b21aa9e994eaac3e
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2137472
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67072}
2020-04-09 01:16:07 +00:00
Ng Zhi An
043ac205ec [wasm-simd][x64] Bitmask instructions
Implement i8x16.bitmask, i16x8.bitmask, i32x4.bitmask on x64.

Bug: v8:10308
Change-Id: Id47cb229de77d80d0a7ec91f4862a91258ff1979
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2127317
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#67022}
2020-04-06 18:33:15 +00:00
Ng Zhi An
2f83184db3 [wasm-simd][x64] Add AVX codegen
For a bunch of s8x16, s16x2 and s32x4 shuffle ops (generated by
s8x16shuffle).

Bug: v8:9561
Change-Id: I0e5cd8a90edba8bc15918c0ca1dc830475db2769
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2110952
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66865}
2020-03-25 20:12:03 +00:00
Ng Zhi An
307490b012 [wasm-simd][x64] Add AVX codegen for i32x4 conversions and hadd
Bug: v8:9561
Change-Id: I4a2c6217dea540b81256dcc833412da573f54795
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2069403
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66587}
2020-03-04 19:33:11 +00:00
Ng Zhi An
63d1879d94 [wasm-simd][x64] Add AVX codegen for all true ops
Bug: v8:9561
Change-Id: Ic57b38cefbdc21045d71601c67995d3568634c27
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2069400
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66479}
2020-02-27 10:38:22 +00:00
Ng Zhi An
b31ef394b6 [x64] Extract packed absolute value instructions
The AVX versions of pabsb, pabsw, and pabsd have an incorrect function
signature, they should only have two operands. So, extract them into
another macro list. And separately generate the right signatures and
implementations. Also update the disasm and tests.

Bug: v8:10233
Change-Id: I95ee0bf12bb285d10324ecedcec28e941f64d2dc
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2063199
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66382}
2020-02-21 03:48:28 +00:00
Ng Zhi An
9fba0cb07e [cleanup] Move some instructions into macro lists
These instructions were probably leftover from an earlier cleanup. We
can move them into respective macro lists, then delete away the
redundant declarations, definitions, disasm, and tests.

We were missing disasm tests for SSE2_INSTRUCTION_LIST_SD, so add that
in.

Change-Id: I8f27beaf57e7a338097690073910a0863f00b26a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2036833
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66123}
2020-02-05 01:57:17 +00:00
Ng Zhi An
d05d335e6a Fix assembler for sqrtpd
The assembly of sqrtpd when using Sqrtpd macro was wrong, since
Sqrtpd(xmm1, xmm1) will incorrect generated vsqrtpd(xmm1, xmm1, xmm1),
which is nonsensical, since vsqrtpd only takes two operands. The
expected instruction should be vsqrtpd(xmm1, xmm0, xmm1) in terms of the
encoding, which is vsqrtpd(xmm1, xmm1).

So, move sqrtpd and cvtps2dq out into their own macro list, because
they have two operands in their AVX form, unlike the rest of the
instructions in SSE2_INSTRUCTION_LIST.

Also updated disasm and tests to use this new list.

Fixed: v8:10170
Change-Id: Ia9343c9a3ae64596bbc876744556e1dcea2a443b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2032195
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66088}
2020-02-03 18:53:19 +00:00
Andreas Haas
911f38c411 [x64] Introduce negb and negw instructions
This CL introduces the negb and negw instructions (8-bit and 16-bit
versions of neg) in the x64 assembler. These instructions are needed to
implement I32AtomicSub8U and similar WebAssembly instructions
efficiently.

The existing implementation was embedded in a generic macro, and it was
difficult to change it without introducing also the 8-bit and 16-bit
versions of many other instructions. This would have introduced a lot
of dead code. Instead this CL extracted the neg instructions from the
macro and implements them directly. This should be fine because the
assembler does not change much, and approachability of the code is
improved.

R=clemensb@chromium.org

Bug: v8:10108
Change-Id: I46099bbebd47f864311a67da3ba8ddc4fe4cd35d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2019165
Commit-Queue: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65989}
2020-01-27 09:45:55 +00:00
Andreas Haas
f506c609bd [x64] Implement xadd in the assembler
This CL introduces the xadd instruction to the x64 assembler so it can
be used to implement WebAssembly's AtomicAdd. This is done in a
separate CL though.

R=clemensb@chromium.org

Bug: v8:10108
Change-Id: I36dcb900ed4c39b23c4996328774780afd8b816a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2011105
Commit-Queue: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65879}
2020-01-21 09:54:45 +00:00
Ng Zhi An
8078d6518b Small fixes for AVX disassembly
Add missing disasm tests for vroundss and vpalignr.
Fix disasm for vinsertps and vpinsrq.

Change-Id: I0f3907761b998d27ec00435a569084724af54ae2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1990140
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65799}
2020-01-16 01:55:31 +00:00
Ng Zhi An
06fa66fec6 Fix assembler and disassembler for vblendvpd
blendvpd should not be defined in the macro list, since the AVX version
has 4 operands, not 3.

Change-Id: Id020b460fa1a3510a91490f3b2286024cc6c5994
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1990139
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65771}
2020-01-14 22:59:09 +00:00
Ng Zhi An
d60809aaf5 [wasm-simd] Add AVX for some i64x2 instructions
Also add missing disasm for SSE4_2 instruction.

Bug: v8:9561
Change-Id: Idc8d3c0e59f0e9aff57ebdcc5774bba375828597
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1986386
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65769}
2020-01-14 21:26:48 +00:00
Ng Zhi An
acc96e1f6a [wasm-simd] Add AVX for movlhps and some avx codegen
Bug: v8:9561
Change-Id: I18c832737cbea89e08af2ca166de7b01b7fe51b0
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1986256
Reviewed-by: Deepti Gandluri <gdeepti@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65674}
2020-01-09 21:57:06 +00:00