v8/test/cctest/test-macro-assembler-arm.cc

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

351 lines
11 KiB
C++
Raw Normal View History

// Copyright 2013 the V8 project authors. All rights reserved.
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following
// disclaimer in the documentation and/or other materials provided
// with the distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived
// from this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <stdlib.h>
#include "src/codegen/assembler-inl.h"
#include "src/codegen/macro-assembler.h"
Reland "Reland "[deoptimizer] Change deopt entries into builtins"" This is a reland of fbfa9bf4ec72b1b73a96b70ccb68cd98c321511b The arm64 was missing proper codegen for CFI, thus sizes were off. Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced72eb65b6b5530ccabaf2eaebe45bf9d33 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70655} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682 Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70672}
2020-10-21 05:12:25 +00:00
#include "src/deoptimizer/deoptimizer.h"
#include "src/execution/simulator.h"
#include "src/init/v8.h"
#include "src/objects/objects-inl.h"
#include "src/utils/ostreams.h"
#include "test/cctest/cctest.h"
Reland "[turboassembler] Introduce hard-abort mode" This is a reland of a462a7854a081f4f34bb4c112ee33f3d69efa309 Original change's description: > [turboassembler] Introduce hard-abort mode > > For checks and assertions (mostly for debug code, like stack alignment > or zero extension), we had two modes: Emit a call to the {Abort} > runtime function (the default), and emit a debug break (used for > testing, enabled via --trap-on-abort). > In wasm, where we cannot just call a runtime function because code must > be isolate independent, we always used the trap-on-abort behaviour. > This causes problems for our fuzzers, which do not catch SIGTRAP, and > hence do not detect debug code failures. > > This CL introduces a third mode ("hard abort"), which calls a C > function via {ExternalReference}. The C function still outputs the > abort reason, but does not print the stack trace. It then aborts via > "OS::Abort", just like the runtime function. > This will allow fuzzers to detect the crash and even find a nice error > message. > > Even though this looks like a lot of code churn, it is actually not. > Most added lines are new tests, and other changes are minimal. > > R=mstarzinger@chromium.org > > Bug: chromium:863799 > Change-Id: I77c58ff72db552d49014614436259ccfb49ba87b > Reviewed-on: https://chromium-review.googlesource.com/1142163 > Commit-Queue: Clemens Hammacher <clemensh@chromium.org> > Reviewed-by: Michael Starzinger <mstarzinger@chromium.org> > Cr-Commit-Position: refs/heads/master@{#54592} Bug: chromium:863799 Change-Id: I7729a47b4823a982a8e201df36520aa2b6ef5326 Reviewed-on: https://chromium-review.googlesource.com/1146100 Reviewed-by: Michael Starzinger <mstarzinger@chromium.org> Commit-Queue: Clemens Hammacher <clemensh@chromium.org> Cr-Commit-Position: refs/heads/master@{#54656}
2018-07-24 14:12:47 +00:00
#include "test/common/assembler-tester.h"
namespace v8 {
namespace internal {
namespace test_macro_assembler_arm {
using F = void*(int x, int y, int p2, int p3, int p4);
#define __ masm->
using F3 = void*(void* p0, int p1, int p2, int p3, int p4);
using F5 = int(void*, void*, void*, void*, void*);
TEST(ExtractLane) {
if (!CpuFeatures::IsSupported(NEON)) return;
Isolate* isolate = CcTest::i_isolate();
HandleScope handles(isolate);
Reland "[Memory] Use OS::Allocate for all OS memory allocations." This is a reland of 4899bcb66db1c8c8272b02fb1cedf9886e010f82 This is a reland of b73ee3344ac96f92b4418c3c07779eb95a8541cc Original change's description: > [Memory] Use OS::Allocate for all OS memory allocations. > > - Eliminates OS::ReserveRegion and OS::ReserveAlignedRegion. > - Changes OS::Allocate to take alignment parameter, reorders parameters > to match page_allocator. > - Since the size of memory allocation can be deduced, don't return the > amount of memory allocated. > - Changes reservation of aligned address space. Before we would reserve > (size + alignment) rounded up to page size. This is too much, because > maximum misalignment is (alignment - page_size). > - On Windows and Cygwin, we release an oversize allocation and > immediately retry at the aligned address in the allocation. If we > lose the address due to a race, we just retry. > - Clean up all the calls to OS::Allocate in codegen and tests by adding > helper AllocateSystemPage function (allocation.h) and > AllocateAssemblerBuffer (cctest.h). > - Changes 'assm' to 'masm' in some targets for consistency when using > a macro-assembler. > > - Eliminates OS::ReleaseRegion, replacing with calls to OS::Free. > - Adds bool return value to OS::Free. > - Cleans up types of flags, protection on Windows and Cygwin. > Bug: chromium:756050 > Cq-Include-Trybots: master.tryserver.chromium.linux:linux_chromium_rel_ng > Change-Id: I306dbe042cc867670fdc935abca29db074b0da71 Cq-Include-Trybots: master.tryserver.chromium.linux:linux_chromium_rel_ng Change-Id: Iad3c025334e8f8d7d647be99a36a11ee449c9087 Reviewed-on: https://chromium-review.googlesource.com/767014 Commit-Queue: Bill Budge <bbudge@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/master@{#49363}
2017-11-14 15:55:09 +00:00
auto buffer = AllocateAssemblerBuffer();
MacroAssembler assembler(isolate, v8::internal::CodeObjectRequired::kYes,
buffer->CreateView());
MacroAssembler* masm = &assembler; // Create a pointer for the __ macro.
struct T {
int32_t i32x4_low[4];
int32_t i32x4_high[4];
int32_t i16x8_low[8];
int32_t i16x8_high[8];
int32_t i8x16_low[16];
int32_t i8x16_high[16];
int32_t f32x4_low[4];
int32_t f32x4_high[4];
int32_t i8x16_low_d[16];
int32_t i8x16_high_d[16];
};
T t;
__ stm(db_w, sp, r4.bit() | r5.bit() | lr.bit());
for (int i = 0; i < 4; i++) {
__ mov(r4, Operand(i));
__ vdup(Neon32, q1, r4);
__ ExtractLane(r5, q1, NeonS32, i);
__ str(r5, MemOperand(r0, offsetof(T, i32x4_low) + 4 * i));
SwVfpRegister si = SwVfpRegister::from_code(i);
__ ExtractLane(si, q1, i);
__ vstr(si, r0, offsetof(T, f32x4_low) + 4 * i);
}
for (int i = 0; i < 8; i++) {
__ mov(r4, Operand(i));
__ vdup(Neon16, q1, r4);
__ ExtractLane(r5, q1, NeonS16, i);
__ str(r5, MemOperand(r0, offsetof(T, i16x8_low) + 4 * i));
}
for (int i = 0; i < 16; i++) {
__ mov(r4, Operand(i));
__ vdup(Neon8, q1, r4);
__ ExtractLane(r5, q1, NeonS8, i);
__ str(r5, MemOperand(r0, offsetof(T, i8x16_low) + 4 * i));
}
for (int i = 0; i < 8; i++) {
__ mov(r4, Operand(i));
__ vdup(Neon8, q1, r4); // q1 = d2,d3
__ ExtractLane(r5, d2, NeonS8, i);
__ str(r5, MemOperand(r0, offsetof(T, i8x16_low_d) + 4 * i));
__ ExtractLane(r5, d3, NeonS8, i);
__ str(r5, MemOperand(r0, offsetof(T, i8x16_low_d) + 4 * (i + 8)));
}
if (CpuFeatures::IsSupported(VFP32DREGS)) {
for (int i = 0; i < 4; i++) {
__ mov(r4, Operand(-i));
__ vdup(Neon32, q15, r4);
__ ExtractLane(r5, q15, NeonS32, i);
__ str(r5, MemOperand(r0, offsetof(T, i32x4_high) + 4 * i));
SwVfpRegister si = SwVfpRegister::from_code(i);
__ ExtractLane(si, q15, i);
__ vstr(si, r0, offsetof(T, f32x4_high) + 4 * i);
}
for (int i = 0; i < 8; i++) {
__ mov(r4, Operand(-i));
__ vdup(Neon16, q15, r4);
__ ExtractLane(r5, q15, NeonS16, i);
__ str(r5, MemOperand(r0, offsetof(T, i16x8_high) + 4 * i));
}
for (int i = 0; i < 16; i++) {
__ mov(r4, Operand(-i));
__ vdup(Neon8, q15, r4);
__ ExtractLane(r5, q15, NeonS8, i);
__ str(r5, MemOperand(r0, offsetof(T, i8x16_high) + 4 * i));
}
for (int i = 0; i < 8; i++) {
__ mov(r4, Operand(-i));
__ vdup(Neon8, q15, r4); // q1 = d30,d31
__ ExtractLane(r5, d30, NeonS8, i);
__ str(r5, MemOperand(r0, offsetof(T, i8x16_high_d) + 4 * i));
__ ExtractLane(r5, d31, NeonS8, i);
__ str(r5, MemOperand(r0, offsetof(T, i8x16_high_d) + 4 * (i + 8)));
}
}
__ ldm(ia_w, sp, r4.bit() | r5.bit() | pc.bit());
CodeDesc desc;
masm->GetCode(isolate, &desc);
Reland "Reland "[deoptimizer] Change deopt entries into builtins"" This is a reland of fbfa9bf4ec72b1b73a96b70ccb68cd98c321511b The arm64 was missing proper codegen for CFI, thus sizes were off. Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced72eb65b6b5530ccabaf2eaebe45bf9d33 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70655} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682 Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70672}
2020-10-21 05:12:25 +00:00
Handle<Code> code =
Factory::CodeBuilder(isolate, desc, CodeKind::FOR_TESTING).Build();
#ifdef DEBUG
StdoutStream os;
code->Print(os);
#endif
auto f = GeneratedCode<F3>::FromCode(*code);
f.Call(&t, 0, 0, 0, 0);
for (int i = 0; i < 4; i++) {
CHECK_EQ(i, t.i32x4_low[i]);
CHECK_EQ(i, t.f32x4_low[i]);
}
for (int i = 0; i < 8; i++) {
CHECK_EQ(i, t.i16x8_low[i]);
}
for (int i = 0; i < 16; i++) {
CHECK_EQ(i, t.i8x16_low[i]);
}
for (int i = 0; i < 8; i++) {
CHECK_EQ(i, t.i8x16_low_d[i]);
CHECK_EQ(i, t.i8x16_low_d[i + 8]);
}
if (CpuFeatures::IsSupported(VFP32DREGS)) {
for (int i = 0; i < 4; i++) {
CHECK_EQ(-i, t.i32x4_high[i]);
CHECK_EQ(-i, t.f32x4_high[i]);
}
for (int i = 0; i < 8; i++) {
CHECK_EQ(-i, t.i16x8_high[i]);
}
for (int i = 0; i < 16; i++) {
CHECK_EQ(-i, t.i8x16_high[i]);
}
for (int i = 0; i < 8; i++) {
CHECK_EQ(-i, t.i8x16_high_d[i]);
CHECK_EQ(-i, t.i8x16_high_d[i + 8]);
}
}
}
TEST(ReplaceLane) {
if (!CpuFeatures::IsSupported(NEON)) return;
Isolate* isolate = CcTest::i_isolate();
HandleScope handles(isolate);
Reland "[Memory] Use OS::Allocate for all OS memory allocations." This is a reland of 4899bcb66db1c8c8272b02fb1cedf9886e010f82 This is a reland of b73ee3344ac96f92b4418c3c07779eb95a8541cc Original change's description: > [Memory] Use OS::Allocate for all OS memory allocations. > > - Eliminates OS::ReserveRegion and OS::ReserveAlignedRegion. > - Changes OS::Allocate to take alignment parameter, reorders parameters > to match page_allocator. > - Since the size of memory allocation can be deduced, don't return the > amount of memory allocated. > - Changes reservation of aligned address space. Before we would reserve > (size + alignment) rounded up to page size. This is too much, because > maximum misalignment is (alignment - page_size). > - On Windows and Cygwin, we release an oversize allocation and > immediately retry at the aligned address in the allocation. If we > lose the address due to a race, we just retry. > - Clean up all the calls to OS::Allocate in codegen and tests by adding > helper AllocateSystemPage function (allocation.h) and > AllocateAssemblerBuffer (cctest.h). > - Changes 'assm' to 'masm' in some targets for consistency when using > a macro-assembler. > > - Eliminates OS::ReleaseRegion, replacing with calls to OS::Free. > - Adds bool return value to OS::Free. > - Cleans up types of flags, protection on Windows and Cygwin. > Bug: chromium:756050 > Cq-Include-Trybots: master.tryserver.chromium.linux:linux_chromium_rel_ng > Change-Id: I306dbe042cc867670fdc935abca29db074b0da71 Cq-Include-Trybots: master.tryserver.chromium.linux:linux_chromium_rel_ng Change-Id: Iad3c025334e8f8d7d647be99a36a11ee449c9087 Reviewed-on: https://chromium-review.googlesource.com/767014 Commit-Queue: Bill Budge <bbudge@chromium.org> Reviewed-by: Michael Lippautz <mlippautz@chromium.org> Cr-Commit-Position: refs/heads/master@{#49363}
2017-11-14 15:55:09 +00:00
auto buffer = AllocateAssemblerBuffer();
MacroAssembler assembler(isolate, v8::internal::CodeObjectRequired::kYes,
buffer->CreateView());
MacroAssembler* masm = &assembler; // Create a pointer for the __ macro.
struct T {
int32_t i32x4_low[4];
int32_t i32x4_high[4];
int16_t i16x8_low[8];
int16_t i16x8_high[8];
int8_t i8x16_low[16];
int8_t i8x16_high[16];
int32_t f32x4_low[4];
int32_t f32x4_high[4];
};
T t;
__ stm(db_w, sp, r4.bit() | r5.bit() | r6.bit() | r7.bit() | lr.bit());
__ veor(q0, q0, q0); // Zero
__ veor(q1, q1, q1); // Zero
for (int i = 0; i < 4; i++) {
__ mov(r4, Operand(i));
__ ReplaceLane(q0, q0, r4, NeonS32, i);
SwVfpRegister si = SwVfpRegister::from_code(i);
__ vmov(si, r4);
__ ReplaceLane(q1, q1, si, i);
}
__ add(r4, r0, Operand(static_cast<int32_t>(offsetof(T, i32x4_low))));
__ vst1(Neon8, NeonListOperand(q0), NeonMemOperand(r4));
__ add(r4, r0, Operand(static_cast<int32_t>(offsetof(T, f32x4_low))));
__ vst1(Neon8, NeonListOperand(q1), NeonMemOperand(r4));
__ veor(q0, q0, q0); // Zero
for (int i = 0; i < 8; i++) {
__ mov(r4, Operand(i));
__ ReplaceLane(q0, q0, r4, NeonS16, i);
}
__ add(r4, r0, Operand(static_cast<int32_t>(offsetof(T, i16x8_low))));
__ vst1(Neon8, NeonListOperand(q0), NeonMemOperand(r4));
__ veor(q0, q0, q0); // Zero
for (int i = 0; i < 16; i++) {
__ mov(r4, Operand(i));
__ ReplaceLane(q0, q0, r4, NeonS8, i);
}
__ add(r4, r0, Operand(static_cast<int32_t>(offsetof(T, i8x16_low))));
__ vst1(Neon8, NeonListOperand(q0), NeonMemOperand(r4));
if (CpuFeatures::IsSupported(VFP32DREGS)) {
__ veor(q14, q14, q14); // Zero
__ veor(q15, q15, q15); // Zero
for (int i = 0; i < 4; i++) {
__ mov(r4, Operand(-i));
__ ReplaceLane(q14, q14, r4, NeonS32, i);
SwVfpRegister si = SwVfpRegister::from_code(i);
__ vmov(si, r4);
__ ReplaceLane(q15, q15, si, i);
}
__ add(r4, r0, Operand(static_cast<int32_t>(offsetof(T, i32x4_high))));
__ vst1(Neon8, NeonListOperand(q14), NeonMemOperand(r4));
__ add(r4, r0, Operand(static_cast<int32_t>(offsetof(T, f32x4_high))));
__ vst1(Neon8, NeonListOperand(q15), NeonMemOperand(r4));
__ veor(q14, q14, q14); // Zero
for (int i = 0; i < 8; i++) {
__ mov(r4, Operand(-i));
__ ReplaceLane(q14, q14, r4, NeonS16, i);
}
__ add(r4, r0, Operand(static_cast<int32_t>(offsetof(T, i16x8_high))));
__ vst1(Neon8, NeonListOperand(q14), NeonMemOperand(r4));
__ veor(q14, q14, q14); // Zero
for (int i = 0; i < 16; i++) {
__ mov(r4, Operand(-i));
__ ReplaceLane(q14, q14, r4, NeonS8, i);
}
__ add(r4, r0, Operand(static_cast<int32_t>(offsetof(T, i8x16_high))));
__ vst1(Neon8, NeonListOperand(q14), NeonMemOperand(r4));
}
__ ldm(ia_w, sp, r4.bit() | r5.bit() | r6.bit() | r7.bit() | pc.bit());
CodeDesc desc;
masm->GetCode(isolate, &desc);
Reland "Reland "[deoptimizer] Change deopt entries into builtins"" This is a reland of fbfa9bf4ec72b1b73a96b70ccb68cd98c321511b The arm64 was missing proper codegen for CFI, thus sizes were off. Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced72eb65b6b5530ccabaf2eaebe45bf9d33 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70655} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682 Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70672}
2020-10-21 05:12:25 +00:00
Handle<Code> code =
Factory::CodeBuilder(isolate, desc, CodeKind::FOR_TESTING).Build();
#ifdef DEBUG
StdoutStream os;
code->Print(os);
#endif
auto f = GeneratedCode<F3>::FromCode(*code);
f.Call(&t, 0, 0, 0, 0);
for (int i = 0; i < 4; i++) {
CHECK_EQ(i, t.i32x4_low[i]);
CHECK_EQ(i, t.f32x4_low[i]);
}
for (int i = 0; i < 8; i++) {
CHECK_EQ(i, t.i16x8_low[i]);
}
for (int i = 0; i < 16; i++) {
CHECK_EQ(i, t.i8x16_low[i]);
}
if (CpuFeatures::IsSupported(VFP32DREGS)) {
for (int i = 0; i < 4; i++) {
CHECK_EQ(-i, t.i32x4_high[i]);
CHECK_EQ(-i, t.f32x4_high[i]);
}
for (int i = 0; i < 8; i++) {
CHECK_EQ(-i, t.i16x8_high[i]);
}
for (int i = 0; i < 16; i++) {
CHECK_EQ(-i, t.i8x16_high[i]);
}
}
}
Reland "Reland "[deoptimizer] Change deopt entries into builtins"" This is a reland of fbfa9bf4ec72b1b73a96b70ccb68cd98c321511b The arm64 was missing proper codegen for CFI, thus sizes were off. Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced72eb65b6b5530ccabaf2eaebe45bf9d33 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70655} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682 Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70672}
2020-10-21 05:12:25 +00:00
TEST(DeoptExitSizeIsFixed) {
CHECK(Deoptimizer::kSupportsFixedDeoptExitSizes);
Isolate* isolate = CcTest::i_isolate();
HandleScope handles(isolate);
auto buffer = AllocateAssemblerBuffer();
MacroAssembler masm(isolate, v8::internal::CodeObjectRequired::kYes,
buffer->CreateView());
STATIC_ASSERT(static_cast<int>(kFirstDeoptimizeKind) == 0);
for (int i = 0; i < kDeoptimizeKindCount; i++) {
DeoptimizeKind kind = static_cast<DeoptimizeKind>(i);
Label before_exit;
masm.bind(&before_exit);
[Turboprop] Move deoptimizations for dynamic map checks into builtin. In order to reduce the codegen size of dynamic map checks, add the ability to have an eager with resume deopt point, which can call a given builitin to perform a more detailed check than can be done in codegen, and then either deoptimizes itself (as if the calling code had performed an eager deopt) or resumes execution in the calling code after the check. In addition, support for adding extra arguments to a deoptimization continuation is added to enable us to pass the necessary arguments to the DynamicMapChecks builtin. Finally, a trampoline is added to the DynamicMapChecks which saves the registers that might be clobbered by that builtin, to avoid having to save them in the generated code. This trampoline also performs the deoptimization based on the result of the DynamicMapChecks builtin. In order to ensure both the trampoline and DynamicMapChecks builtin have the same call interface, and to limit the number of registers that need saving in the trampoline, the DynamicMapChecks builtin is moved to be a CSA builtin with a custom CallInterfaceDescriptor, that calls an exported Torque macro that implements the actual functionality. All told, this changes the codegen for a monomorphic dynamic map check from: movl rbx,<expected_map> cmpl [<object>-0x1],rbx jnz <deferred_call> resume_point: ... deferred_call: <spill registers> movl rax,<slot> movq rbx,<object> movq rcx,<handler> movq r10,<DynamicMapChecks> call r10 cmpq rax,0x0 jz <restore_regs> cmpq rax,0x1 jz <deopt_point_1> cmpq rax,0x2 jz <deopt_point_2> int3l restore_regs: <restore_regs> jmp <resume_point> ... deopt_point_1: call Deoptimization_Eager deopt_point_2: call Deoptimization_Bailout To: movl rax,<slot> movl rcx,<expected_map> movq rdx,<handler> cmpl [<object>-0x1],rcx jnz <deopt_point> resume_point: ... deopt_point: call DynamicMapChecksTrampoline jmp <resume_point> BUG=v8:10582 Change-Id: Ica4927b9acc963b9b73dc62d9379a7815335650f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2560197 Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Reviewed-by: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#71545}
2020-12-01 17:45:40 +00:00
if (kind == DeoptimizeKind::kEagerWithResume) {
Builtin target = Deoptimizer::GetDeoptWithResumeBuiltin(
DeoptimizeReason::kDynamicCheckMaps);
[Turboprop] Move deoptimizations for dynamic map checks into builtin. In order to reduce the codegen size of dynamic map checks, add the ability to have an eager with resume deopt point, which can call a given builitin to perform a more detailed check than can be done in codegen, and then either deoptimizes itself (as if the calling code had performed an eager deopt) or resumes execution in the calling code after the check. In addition, support for adding extra arguments to a deoptimization continuation is added to enable us to pass the necessary arguments to the DynamicMapChecks builtin. Finally, a trampoline is added to the DynamicMapChecks which saves the registers that might be clobbered by that builtin, to avoid having to save them in the generated code. This trampoline also performs the deoptimization based on the result of the DynamicMapChecks builtin. In order to ensure both the trampoline and DynamicMapChecks builtin have the same call interface, and to limit the number of registers that need saving in the trampoline, the DynamicMapChecks builtin is moved to be a CSA builtin with a custom CallInterfaceDescriptor, that calls an exported Torque macro that implements the actual functionality. All told, this changes the codegen for a monomorphic dynamic map check from: movl rbx,<expected_map> cmpl [<object>-0x1],rbx jnz <deferred_call> resume_point: ... deferred_call: <spill registers> movl rax,<slot> movq rbx,<object> movq rcx,<handler> movq r10,<DynamicMapChecks> call r10 cmpq rax,0x0 jz <restore_regs> cmpq rax,0x1 jz <deopt_point_1> cmpq rax,0x2 jz <deopt_point_2> int3l restore_regs: <restore_regs> jmp <resume_point> ... deopt_point_1: call Deoptimization_Eager deopt_point_2: call Deoptimization_Bailout To: movl rax,<slot> movl rcx,<expected_map> movq rdx,<handler> cmpl [<object>-0x1],rcx jnz <deopt_point> resume_point: ... deopt_point: call DynamicMapChecksTrampoline jmp <resume_point> BUG=v8:10582 Change-Id: Ica4927b9acc963b9b73dc62d9379a7815335650f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2560197 Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Reviewed-by: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#71545}
2020-12-01 17:45:40 +00:00
masm.CallForDeoptimization(target, 42, &before_exit, kind, &before_exit,
nullptr);
CHECK_EQ(masm.SizeOfCodeGeneratedSince(&before_exit),
Deoptimizer::kEagerWithResumeBeforeArgsSize);
[Turboprop] Move deoptimizations for dynamic map checks into builtin. In order to reduce the codegen size of dynamic map checks, add the ability to have an eager with resume deopt point, which can call a given builitin to perform a more detailed check than can be done in codegen, and then either deoptimizes itself (as if the calling code had performed an eager deopt) or resumes execution in the calling code after the check. In addition, support for adding extra arguments to a deoptimization continuation is added to enable us to pass the necessary arguments to the DynamicMapChecks builtin. Finally, a trampoline is added to the DynamicMapChecks which saves the registers that might be clobbered by that builtin, to avoid having to save them in the generated code. This trampoline also performs the deoptimization based on the result of the DynamicMapChecks builtin. In order to ensure both the trampoline and DynamicMapChecks builtin have the same call interface, and to limit the number of registers that need saving in the trampoline, the DynamicMapChecks builtin is moved to be a CSA builtin with a custom CallInterfaceDescriptor, that calls an exported Torque macro that implements the actual functionality. All told, this changes the codegen for a monomorphic dynamic map check from: movl rbx,<expected_map> cmpl [<object>-0x1],rbx jnz <deferred_call> resume_point: ... deferred_call: <spill registers> movl rax,<slot> movq rbx,<object> movq rcx,<handler> movq r10,<DynamicMapChecks> call r10 cmpq rax,0x0 jz <restore_regs> cmpq rax,0x1 jz <deopt_point_1> cmpq rax,0x2 jz <deopt_point_2> int3l restore_regs: <restore_regs> jmp <resume_point> ... deopt_point_1: call Deoptimization_Eager deopt_point_2: call Deoptimization_Bailout To: movl rax,<slot> movl rcx,<expected_map> movq rdx,<handler> cmpl [<object>-0x1],rcx jnz <deopt_point> resume_point: ... deopt_point: call DynamicMapChecksTrampoline jmp <resume_point> BUG=v8:10582 Change-Id: Ica4927b9acc963b9b73dc62d9379a7815335650f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2560197 Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Reviewed-by: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#71545}
2020-12-01 17:45:40 +00:00
} else {
Builtin target = Deoptimizer::GetDeoptimizationEntry(kind);
[Turboprop] Move deoptimizations for dynamic map checks into builtin. In order to reduce the codegen size of dynamic map checks, add the ability to have an eager with resume deopt point, which can call a given builitin to perform a more detailed check than can be done in codegen, and then either deoptimizes itself (as if the calling code had performed an eager deopt) or resumes execution in the calling code after the check. In addition, support for adding extra arguments to a deoptimization continuation is added to enable us to pass the necessary arguments to the DynamicMapChecks builtin. Finally, a trampoline is added to the DynamicMapChecks which saves the registers that might be clobbered by that builtin, to avoid having to save them in the generated code. This trampoline also performs the deoptimization based on the result of the DynamicMapChecks builtin. In order to ensure both the trampoline and DynamicMapChecks builtin have the same call interface, and to limit the number of registers that need saving in the trampoline, the DynamicMapChecks builtin is moved to be a CSA builtin with a custom CallInterfaceDescriptor, that calls an exported Torque macro that implements the actual functionality. All told, this changes the codegen for a monomorphic dynamic map check from: movl rbx,<expected_map> cmpl [<object>-0x1],rbx jnz <deferred_call> resume_point: ... deferred_call: <spill registers> movl rax,<slot> movq rbx,<object> movq rcx,<handler> movq r10,<DynamicMapChecks> call r10 cmpq rax,0x0 jz <restore_regs> cmpq rax,0x1 jz <deopt_point_1> cmpq rax,0x2 jz <deopt_point_2> int3l restore_regs: <restore_regs> jmp <resume_point> ... deopt_point_1: call Deoptimization_Eager deopt_point_2: call Deoptimization_Bailout To: movl rax,<slot> movl rcx,<expected_map> movq rdx,<handler> cmpl [<object>-0x1],rcx jnz <deopt_point> resume_point: ... deopt_point: call DynamicMapChecksTrampoline jmp <resume_point> BUG=v8:10582 Change-Id: Ica4927b9acc963b9b73dc62d9379a7815335650f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2560197 Commit-Queue: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Georg Neis <neis@chromium.org> Reviewed-by: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#71545}
2020-12-01 17:45:40 +00:00
masm.CallForDeoptimization(target, 42, &before_exit, kind, &before_exit,
nullptr);
CHECK_EQ(masm.SizeOfCodeGeneratedSince(&before_exit),
kind == DeoptimizeKind::kLazy
? Deoptimizer::kLazyDeoptExitSize
: Deoptimizer::kNonLazyDeoptExitSize);
}
Reland "Reland "[deoptimizer] Change deopt entries into builtins"" This is a reland of fbfa9bf4ec72b1b73a96b70ccb68cd98c321511b The arm64 was missing proper codegen for CFI, thus sizes were off. Original change's description: > Reland "[deoptimizer] Change deopt entries into builtins" > > This is a reland of 7f58ced72eb65b6b5530ccabaf2eaebe45bf9d33 > > It fixes the different exit size emitted on x64/Atom CPUs due to > performance tuning in TurboAssembler::Call. Additionally, add > cctests to verify the fixed size exits. > > Original change's description: > > [deoptimizer] Change deopt entries into builtins > > > > While the overall goal of this commit is to change deoptimization > > entries into builtins, there are multiple related things happening: > > > > - Deoptimization entries, formerly stubs (i.e. Code objects generated > > at runtime, guaranteed to be immovable), have been converted into > > builtins. The major restriction is that we now need to preserve the > > kRootRegister, which was formerly used on most architectures to pass > > the deoptimization id. The solution differs based on platform. > > - Renamed DEOPT_ENTRIES_OR_FOR_TESTING code kind to FOR_TESTING. > > - Removed heap/ support for immovable Code generation. > > - Removed the DeserializerData class (no longer needed). > > - arm64: to preserve 4-byte deopt exits, introduced a new optimization > > in which the final jump to the deoptimization entry is generated > > once per Code object, and deopt exits can continue to emit a > > near-call. > > - arm,ia32,x64: change to fixed-size deopt exits. This reduces exit > > sizes by 4/8, 5, and 5 bytes, respectively. > > > > On arm the deopt exit size is reduced from 12 (or 16) bytes to 8 bytes > > by using the same strategy as on arm64 (recalc deopt id from return > > address). Before: > > > > e300a002 movw r10, <id> > > e59fc024 ldr ip, [pc, <entry offset>] > > e12fff3c blx ip > > > > After: > > > > e59acb35 ldr ip, [r10, <entry offset>] > > e12fff3c blx ip > > > > On arm64 the deopt exit size remains 4 bytes (or 8 bytes in same cases > > with CFI). Additionally, up to 4 builtin jumps are emitted per Code > > object (max 32 bytes added overhead per Code object). Before: > > > > 9401cdae bl <entry offset> > > > > After: > > > > # eager deoptimization entry jump. > > f95b1f50 ldr x16, [x26, <eager entry offset>] > > d61f0200 br x16 > > # lazy deoptimization entry jump. > > f95b2b50 ldr x16, [x26, <lazy entry offset>] > > d61f0200 br x16 > > # the deopt exit. > > 97fffffc bl <eager deoptimization entry jump offset> > > > > On ia32 the deopt exit size is reduced from 10 to 5 bytes. Before: > > > > bb00000000 mov ebx,<id> > > e825f5372b call <entry> > > > > After: > > > > e8ea2256ba call <entry> > > > > On x64 the deopt exit size is reduced from 12 to 7 bytes. Before: > > > > 49c7c511000000 REX.W movq r13,<id> > > e8ea2f0700 call <entry> > > > > After: > > > > 41ff9560360000 call [r13+<entry offset>] > > > > Bug: v8:8661,v8:8768 > > Change-Id: I13e30aedc360474dc818fecc528ce87c3bfeed42 > > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2465834 > > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > > Reviewed-by: Ross McIlroy <rmcilroy@chromium.org> > > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > > Cr-Commit-Position: refs/heads/master@{#70597} > > Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org > Bug: v8:8661,v8:8768,chromium:1140165 > Change-Id: Ibcd5c39c58a70bf2b2ac221aa375fc68d495e144 > Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2485506 > Reviewed-by: Jakob Gruber <jgruber@chromium.org> > Reviewed-by: Tobias Tebbi <tebbi@chromium.org> > Commit-Queue: Jakob Gruber <jgruber@chromium.org> > Cr-Commit-Position: refs/heads/master@{#70655} Tbr: ulan@chromium.org, tebbi@chromium.org, rmcilroy@chromium.org Bug: v8:8661 Bug: v8:8768 Bug: chromium:1140165 Change-Id: I471cc94fc085e527dc9bfb5a84b96bd907c2333f Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2488682 Reviewed-by: Jakob Gruber <jgruber@chromium.org> Commit-Queue: Jakob Gruber <jgruber@chromium.org> Cr-Commit-Position: refs/heads/master@{#70672}
2020-10-21 05:12:25 +00:00
}
}
#undef __
} // namespace test_macro_assembler_arm
} // namespace internal
} // namespace v8