There is no good reason to have the meat of most objects' initialization
logic in heap.cc, all wrapped by the CALL_HEAP_FUNCTION macro. Instead,
this CL changes the protocol between Heap and Factory to be AllocateRaw,
and all object initialization work after (possibly retried) successful
raw allocation happens in the Factory.
This saves about 20KB of binary size on x64.
Original review: https://chromium-review.googlesource.com/c/v8/v8/+/959533
Originally landed as r52416 / f9a2e24bbc
Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
Change-Id: Id072cbe6b3ed30afd339c7e502844b99ca12a647
Reviewed-on: https://chromium-review.googlesource.com/1000540
Commit-Queue: Jakob Kummerow <jkummerow@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#52492}
This reverts commit f9a2e24bbc.
Reason for revert: gc stress failures not all fixed by follow up.
Original change's description:
> [cleanup] Refactor the Factory
>
> There is no good reason to have the meat of most objects' initialization
> logic in heap.cc, all wrapped by the CALL_HEAP_FUNCTION macro. Instead,
> this CL changes the protocol between Heap and Factory to be AllocateRaw,
> and all object initialization work after (possibly retried) successful
> raw allocation happens in the Factory.
>
> This saves about 20KB of binary size on x64.
>
> Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
> Change-Id: Icbfdc4266d7be8b48d2fe085f03411743dc6a0ca
> Reviewed-on: https://chromium-review.googlesource.com/959533
> Commit-Queue: Jakob Kummerow <jkummerow@chromium.org>
> Reviewed-by: Hannes Payer <hpayer@chromium.org>
> Reviewed-by: Yang Guo <yangguo@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#52416}
TBR=jkummerow@chromium.org,yangguo@chromium.org,mstarzinger@chromium.org,hpayer@chromium.org
Change-Id: Idbbc53478742f3e9525eee83342afc6aedae122f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
Reviewed-on: https://chromium-review.googlesource.com/999414
Reviewed-by: Michael Achenbach <machenbach@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>
Cr-Commit-Position: refs/heads/master@{#52420}
There is no good reason to have the meat of most objects' initialization
logic in heap.cc, all wrapped by the CALL_HEAP_FUNCTION macro. Instead,
this CL changes the protocol between Heap and Factory to be AllocateRaw,
and all object initialization work after (possibly retried) successful
raw allocation happens in the Factory.
This saves about 20KB of binary size on x64.
Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
Change-Id: Icbfdc4266d7be8b48d2fe085f03411743dc6a0ca
Reviewed-on: https://chromium-review.googlesource.com/959533
Commit-Queue: Jakob Kummerow <jkummerow@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#52416}
This changes the encoding of the {HandlerTable} from an array of Smi
values to a byte array. It allows embedding of said array into the
instruction stream of {Code} objects (similar to how safepoint tables
work). For interpreted bytecode the table is attached as a {ByteArray}
to the bytecode.
The advantage of this approach is a more compact encoding and also the
ability to move such tables easily off the GC'ed heap if needed (as is
done for WebAssembly code for example).
R=jarin@chromium.org
Change-Id: I3320415dff69b3d1053825bda0d667a28232bf6d
Reviewed-on: https://chromium-review.googlesource.com/934642
Commit-Queue: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Jaroslav Sevcik <jarin@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Clemens Hammacher <clemensh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#51589}
assembler-arm64.h and assembler-arm64-inl.h have a B() function
which conflicts with the B macro in bytecode-utils.h.
Headers that leak macros can be annoying to deal with, in this case
we can't simply undef B at the end of source files that include
bytecode-utils.h because the second source file that includes
bytecode-utils.h won't see the B macro. Let's just move this macro
into the two unittest files that include this header.
Bug: chromium:746958
Change-Id: I588b73fe81615f882a0e010c92ba187d3bc2bf25
Reviewed-on: https://chromium-review.googlesource.com/758779
Commit-Queue: Mostyn Bramley-Moore <mostynb@vewd.com>
Reviewed-by: Clemens Hammacher <clemensh@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49258}
This CL adds support to optimize for..in in fast enum-cache mode to the
same degree that it was optimized in Crankshaft, without adding the same
deoptimization loop that Crankshaft had with missing enum cache indices.
That means code like
for (var k in o) {
var v = o[k];
// ...
}
and code like
for (var k in o) {
if (Object.prototype.hasOwnProperty.call(o, k)) {
var v = o[k];
// ...
}
}
which follows the https://eslint.org/docs/rules/guard-for-in linter
rule, can now utilize the enum cache indices if o has only fast
properties on the receiver, which speeds up the access o[k]
significantly and reduces the pollution of the global megamorphic
stub cache.
For example the micro-benchmark in the tracking bug v8:6702 now runs
faster than ever before:
forIn: 1516 ms.
forInHasOwnProperty: 1674 ms.
forInHasOwnPropertySafe: 1595 ms.
forInSum: 2051 ms.
forInSumSafe: 2215 ms.
Compared to numbers from V8 5.8 which is the last version running with
Crankshaft
forIn: 1641 ms.
forInHasOwnProperty: 1719 ms.
forInHasOwnPropertySafe: 1802 ms.
forInSum: 2226 ms.
forInSumSafe: 2409 ms.
and V8 6.0 which is the current stable version with TurboFan:
forIn: 1713 ms.
forInHasOwnProperty: 5417 ms.
forInHasOwnPropertySafe: 5324 ms.
forInSum: 7556 ms.
forInSumSafe: 11067 ms.
It also improves the throughput on the string-fasta benchmark by
around 7-10%, and there seems to be a ~5% improvement on the
Speedometer/React benchmark locally.
For this to work, the ForInPrepare bytecode was split into
ForInEnumerate and ForInPrepare, which is very similar to how it was
handled in Fullcodegen initially. In TurboFan we introduce a new
operator LoadFieldByIndex that does the dynamic property load.
This also removes the CheckMapValue operator again in favor of
just using LoadField, ReferenceEqual and CheckIf, which work
automatically with the EscapeAnalysis and the
BranchConditionElimination.
Bug: v8:6702
Change-Id: I91235413eea478ba77ace7bd14bb2f62e155dd9a
Reviewed-on: https://chromium-review.googlesource.com/645949
Commit-Queue: Benedikt Meurer <bmeurer@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Reviewed-by: Jaroslav Sevcik <jarin@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47768}
Move bytecode array writing logic into the array builder, allowing us to
remove the bytecode array writer and bytecode node, and convert runtime
operand writing to compile-time bytecode operand writing using the
information statically known at compile time.
Bug: v8:6474
Change-Id: I210cd9897fd41293745614e4a253c7c251dfffc9
Reviewed-on: https://chromium-review.googlesource.com/533055
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#46183}
The BytecodePipeline is no longer used by any optimizers, so remove it and
connect the BytecodeArrayBuilder directly to the BytecodeWriter.
Also remove some functions from BytecodeNode which are no longer used.
BUG=v8:6194
Change-Id: Id2ec94ff1d4db41b108a778100459283fbb2256c
Reviewed-on: https://chromium-review.googlesource.com/471528
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#44619}
Move dead bytecode elimination from a seperate bytecode pipeline optimizer
into the BytecodeArrayWriter. This removes the last bytecode pipeline
optimizer, which means we can remove the Bytecode pipeline which,
which should increase compile speed.
BUG=v8:6194
Change-Id: I47fb3c3463b2b8a92e02cf7a6b608683fcfa5261
Reviewed-on: https://chromium-review.googlesource.com/471407
Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#44568}
Moves the logic for eliding non-effectful accumulator load elision from the
peephole optimizer to the BytecodeArrayWriter.
BUG=v8:6194
Change-Id: I05fbe4ee8ac340e5c355285d0b47e4a9d52fd0a8
Reviewed-on: https://chromium-review.googlesource.com/469828
Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#44560}
Removes handles from bytecode generation, instead storing
un-internalized AstValues (and other, similar values such as Scopes and
AstRawStrings) in the constant array builder.
This will allow us in the future to generate the bytecode before
internalizing the AST.
BUG=v8:5832
Change-Id: I3b8be8f7329a484eb1e5d12808b001d3475239da
Reviewed-on: https://chromium-review.googlesource.com/439326
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#43115}
Since JumpLoop is always backwards, and other jumps are always forwards,
we can store the jump offset as an always positive integer and decide on
the jump direction based on the bytecode. This will save a small amount
of space for large-ish for loops (>128 bytecodes).
Review-Url: https://codereview.chromium.org/2641443002
Cr-Commit-Position: refs/heads/master@{#42638}
Downside: this adds all kinds of weird includes in the .cc files.
(See design doc linked in the bug.)
BUG=v8:5402
Review-Url: https://codereview.chromium.org/2622503002
Cr-Commit-Position: refs/heads/master@{#42140}
SourcePosition::InliningId() refers to a the new table DeoptimizationInputData::InliningPositions(), which provides the following data for every inlining id:
- The inlined SharedFunctionInfo as an offset into DeoptimizationInfo::LiteralArray
- The SourcePosition of the inlining. Recursively, this yields the full inlining stack.
Before the Code object is created, the same information can be found in CompilationInfo::inlined_functions().
If SourcePosition::InliningId() is SourcePosition::kNotInlined, it refers to the outer (non-inlined) function.
So every SourcePosition has full information about its inlining stack, as long as the corresponding Code object is known. The internal represenation of a source position is a positive 64bit integer.
All compilers create now appropriate source positions for inlined functions. In the case of Turbofan, this required using AstGraphBuilderWithPositions for inlined functions too. So this class is now moved to a header file.
At the moment, the additional information in source positions is only used in --trace-deopt and --code-comments. The profiler needs to be updated, at the moment it gets the correct script offsets from the deopt info, but the wrong script id from the reconstructed deopt stack, which can lead to wrong outputs. This should be resolved by making the profiler use the new inlining information for deopts.
I activated the inlined deoptimization tests in test-cpu-profiler.cc for Turbofan, changing them to a case where the deopt stack and the inlining position agree. It is currently still broken for other cases.
The following additional changes were necessary:
- The source position table (internal::SourcePositionTableBuilder etc.) supports now 64bit source positions. Encoding source positions in a single 64bit int together with the difference encoding in the source position table results in very little overhead for the inlining id, since only 12% of the source positions in Octane have a changed inlining id.
- The class HPositionInfo was effectively dead code and is now removed.
- SourcePosition has new printing and information facilities, including computing a full inlining stack.
- I had to rename compiler/source-position.{h,cc} to compiler/compiler-source-position-table.{h,cc} to avoid clashes with the new src/source-position.cc file.
- I wrote the new wrapper PodArray for ByteArray. It is a template working with any POD-type. This is used in DeoptimizationInputData::InliningPositions().
- I removed HInlinedFunctionInfo and HGraph::inlined_function_infos, because they were only used for the now obsolete Crankshaft inlining ids.
- Crankshaft managed a list of inlined functions in Lithium: LChunk::inlined_functions. This is an analog structure to CompilationInfo::inlined_functions. So I removed LChunk::inlined_functions and made Crankshaft use CompilationInfo::inlined_functions instead, because this was necessary to register the offsets into the literal array in a uniform way. This is a safe change because LChunk::inlined_functions has no other uses and the functions in CompilationInfo::inlined_functions have a strictly longer lifespan, being created earlier (in Hydrogen already).
BUG=v8:5432
Review-Url: https://codereview.chromium.org/2451853002
Cr-Commit-Position: refs/heads/master@{#40975}
Modify the Bytecode Register Optimizer to be an independent component
rather than part of the BytecodePipeline. This means the BytecodeArrayBuilder
can explicitly call it with register operands when outputting a bytecode
and the Bytecode Register Optimizer doesn't need to work out which operands
are register operands. This also means we don't need to build BytecodeNodes
for Ldar / Star / Mov bytecodes unless they are actually emitted by the
optimizer.
This change also modifies the way the BytecodeArrayBuilder converts
operands to make use of the OperandTypes specified in bytecodes.h.
This avoids having to individually convert operands to their raw output
value before calling Output(...).
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2393683004
Cr-Commit-Position: refs/heads/master@{#40543}
This CL optimizes the code in BytecodeArrayBuilder and
BytecodeArrayWriter by making the following main changes:
- Move operand scale calculation out of BytecodeArrayWriter to the
BytecodeNode constructor, where the decision on which operands are
scalable can generally be statically decided by the compiler.
- Move the maximum register calculation out of BytecodeArrayWriter
and into BytecodeRegisterOptimizer (which is the only place outside
BytecodeGenerator which updates which registers are used). This
avoids the BytecodeArrayWriter needing to know the operand types
of a node as it writes it.
- Modify EmitBytecodes to use individual push_backs rather than
building a buffer and calling insert, since this turns out to be faster.
- Initialize BytecodeArrayWriter's bytecode vector by reserving 512
bytes,
- Make common functions in Bytecodes constexpr so that they
can be statically calculated by the compiler.
- Move common functions and constructors in Bytecodes and
BytecodeNode to the header so that they can be inlined.
- Change large static switch statements in Bytecodes to const array
lookups, and move to the header to allow inlining.
I also took the opportunity to remove a number of unused helper
functions, and rework some others for consistency.
This reduces the percentage of time spent in making BytecodeArrays
in CodeLoad from ~15% to ~11% according to perf. The
CoadLoad score increase by around 2%.
BUG=v8:4280
Committed: https://crrev.com/b11a8b4d41bf09d6b3d6cf214fe3fb61faf01a64
Review-Url: https://codereview.chromium.org/2351763002
Cr-Original-Commit-Position: refs/heads/master@{#39599}
Cr-Commit-Position: refs/heads/master@{#39637}
Reason for revert:
Prime suspect for roll blocker: https://codereview.chromium.org/2362503002/
Original issue's description:
> [Interpreter] Optimize BytecodeArrayBuilder and BytecodeArrayWriter.
>
> This CL optimizes the code in BytecodeArrayBuilder and
> BytecodeArrayWriter by making the following main changes:
>
> - Move operand scale calculation out of BytecodeArrayWriter to the
> BytecodeNode constructor, where the decision on which operands are
> scalable can generally be statically decided by the compiler.
> - Move the maximum register calculation out of BytecodeArrayWriter
> and into BytecodeRegisterOptimizer (which is the only place outside
> BytecodeGenerator which updates which registers are used). This
> avoids the BytecodeArrayWriter needing to know the operand types
> of a node as it writes it.
> - Modify EmitBytecodes to use individual push_backs rather than
> building a buffer and calling insert, since this turns out to be faster.
> - Initialize BytecodeArrayWriter's bytecode vector by reserving 512
> bytes,
> - Make common functions in Bytecodes constexpr so that they
> can be statically calculated by the compiler.
> - Move common functions and constructors in Bytecodes and
> BytecodeNode to the header so that they can be inlined.
> - Change large static switch statements in Bytecodes to const array
> lookups, and move to the header to allow inlining.
>
> I also took the opportunity to remove a number of unused helper
> functions, and rework some others for consistency.
>
> This reduces the percentage of time spent in making BytecodeArrays
> in CodeLoad from ~15% to ~11% according to perf. The
> CoadLoad score increase by around 2%.
>
> BUG=v8:4280
>
> Committed: https://crrev.com/b11a8b4d41bf09d6b3d6cf214fe3fb61faf01a64
> Cr-Commit-Position: refs/heads/master@{#39599}
TBR=mythria@chromium.org,leszeks@chromium.org,rmcilroy@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2360193003
Cr-Commit-Position: refs/heads/master@{#39612}
This CL optimizes the code in BytecodeArrayBuilder and
BytecodeArrayWriter by making the following main changes:
- Move operand scale calculation out of BytecodeArrayWriter to the
BytecodeNode constructor, where the decision on which operands are
scalable can generally be statically decided by the compiler.
- Move the maximum register calculation out of BytecodeArrayWriter
and into BytecodeRegisterOptimizer (which is the only place outside
BytecodeGenerator which updates which registers are used). This
avoids the BytecodeArrayWriter needing to know the operand types
of a node as it writes it.
- Modify EmitBytecodes to use individual push_backs rather than
building a buffer and calling insert, since this turns out to be faster.
- Initialize BytecodeArrayWriter's bytecode vector by reserving 512
bytes,
- Make common functions in Bytecodes constexpr so that they
can be statically calculated by the compiler.
- Move common functions and constructors in Bytecodes and
BytecodeNode to the header so that they can be inlined.
- Change large static switch statements in Bytecodes to const array
lookups, and move to the header to allow inlining.
I also took the opportunity to remove a number of unused helper
functions, and rework some others for consistency.
This reduces the percentage of time spent in making BytecodeArrays
in CodeLoad from ~15% to ~11% according to perf. The
CoadLoad score increase by around 2%.
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2351763002
Cr-Commit-Position: refs/heads/master@{#39599}
This introduces a new {JumpLoop} bytecode to combine the OSR polling
mechanism modeled by {OsrPoll} with the actual {Jump} performing the
backwards branch. This reduces the overall size and also avoids one
additional dispatch. It also makes sure that OSR polling is only done
within real loops.
R=rmcilroy@chromium.org
BUG=v8:4764
Review-Url: https://codereview.chromium.org/2331033002
Cr-Commit-Position: refs/heads/master@{#39384}
These JavaScript operators were special hacks to ensure that we always
operate on Smis for the magic for-in index variable, but this never
really worked in the OSR case, because the OsrValue for the index
variable didn't have the proper information (that we have for the
JSForInPrepare in the non-OSR case).
Now that we have loop induction variable analysis and binary operation
hints, we can just use JSLessThan and JSAdd instead with appropriate
Smi hints, which handle the OSR case by inserting Smi checks (that are
always true). Thanks to OSR deconstruction and loop peeling these Smi
checks will be hoisted so they don't hurt the OSR case too much.
Drive-by-change: Rename the ForInDone bytecode to ForInContinue, since
we have to lower it to JSLessThan to get the loop induction variable
goodness.
R=epertoso@chromium.org
BUG=v8:5267
Review-Url: https://codereview.chromium.org/2289613002
Cr-Commit-Position: refs/heads/master@{#38968}
Removes all accesses to the Isolate during bytecode generation and the
bytecode pipeline. Adds an DisallowIsolateAccessScope which is used to
enforce this invariant within the BytecodeGenerator.
BUG=v8:5203
Review-Url: https://codereview.chromium.org/2242193002
Cr-Commit-Position: refs/heads/master@{#38716}
Now that all backends use the source position builder to record source
positions, simplify the code line logging events to take a source
position table on code creation. This means that the source position
table builder no longer needs to access the isolate until the table is
generated. This is required for off-thread bytecode generation.
BUG=v8:5203
Review-Url: https://codereview.chromium.org/2248673002
Cr-Commit-Position: refs/heads/master@{#38676}
This gets rid of the Star bytecodes that were always dispatched to from
ToObject.
ToObject now outputs to register instead of to the accumulator and
ForInPrepare gets the receiver object from an input register.
BUG=v8:4820
LOG=n
Review-Url: https://codereview.chromium.org/2189463006
Cr-Commit-Position: refs/heads/master@{#38177}
Add explicit state in BytecodeSourceInfo to simplify checks for
validity and whether a statement or expression position.
Remove BytecodeSourceInfo::Update which inherited rules for updating
source position information during bytecode building.
BUG=v8:4280
LOG=N
Review-Url: https://codereview.chromium.org/2048203002
Cr-Commit-Position: refs/heads/master@{#37136}
This moves processing of jumps out of bytecode array builder and into
bytecode array writer. This simplifies the pipeline by avoiding having
to flush for offset and patch up offsets in bytecode array builder based
on what was emitted by the bytecode array writer.
This also enables future refactorings to add dead code elimination back
into the pipeline, and move processing of scalable operand sizes to the
end of the pipeline (in the bytecode array writer) rather than having to
deal with scalable operand types throughout pipeline.
BUG=v8:4280,chromium:616064
Review-Url: https://codereview.chromium.org/2035813002
Cr-Commit-Position: refs/heads/master@{#36716}
This change introduces a pipeline for the final stages of
bytecode generation.
The peephole optimizer is made distinct from the BytecodeArrayBuilder.
A new BytecodeArrayWriter is responsible for writing bytecode. It
also keeps track of the maximum register seen and offers a potentially
smaller frame size.
R=rmcilroy@chromium.org
LOG=N
BUG=v8:4280
Review-Url: https://codereview.chromium.org/1947403002
Cr-Commit-Position: refs/heads/master@{#36220}