Reason for revert:
Speculative revert because of blocked roll: https://codereview.chromium.org/2596013002/
Original issue's description:
> [TypeFeedbackVector] Root literal arrays in function literals slots
>
> Literal arrays and feedback vectors for a function can be garbage
> collected if we don't have a rooted closure for the function, which
> happens often. It's expensive to come back from this (recreating
> boilerplates and gathering feedback again), and the cost is
> disproportionate if the function was inlined into optimized code.
>
> To guard against losing these arrays when we need them, we'll now
> create literal arrays when creating the feedback vector for the outer
> closure, and root them strongly in that vector.
>
> BUG=v8:5456
>
> Review-Url: https://codereview.chromium.org/2504153002
> Cr-Commit-Position: refs/heads/master@{#41893}
> Committed: 93df094081TBR=bmeurer@chromium.org,mlippautz@chromium.org,mvstanton@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=v8:5456
Review-Url: https://codereview.chromium.org/2597163002
Cr-Commit-Position: refs/heads/master@{#41917}
Literal arrays and feedback vectors for a function can be garbage
collected if we don't have a rooted closure for the function, which
happens often. It's expensive to come back from this (recreating
boilerplates and gathering feedback again), and the cost is
disproportionate if the function was inlined into optimized code.
To guard against losing these arrays when we need them, we'll now
create literal arrays when creating the feedback vector for the outer
closure, and root them strongly in that vector.
BUG=v8:5456
Review-Url: https://codereview.chromium.org/2504153002
Cr-Commit-Position: refs/heads/master@{#41893}
eval() may introduce a scope which needs to be represented as a context at
runtime, e.g.,
eval('var x; let y; ()=>y')
introduces a variable y which needs to have a context allocated for it. However,
when traversing upwards to find the declaration context for a variable which leaks,
as the declaration of x does above, this context has to be understood to not be
a declaration context in sloppy mode.
This patch makes that distinction by introducing a different map for eval-introduced
contexts. A dynamic search for the appropriate context will continue past an eval
context to find the appropriate context. Marking contexts as eval contexts rather
than function contexts required updates in each compiler backend.
BUG=v8:5295, chromium:648719
Review-Url: https://codereview.chromium.org/2435023002
Cr-Commit-Position: refs/heads/master@{#41869}
Encode the PropertyAttribute and whether the function
names must be set as a flag instead of setting two registers.
BUG=v8:5624
Review-Url: https://codereview.chromium.org/2586463002
Cr-Commit-Position: refs/heads/master@{#41812}
This is so that a NotSuperConstructor error is thrown before evaluating the
arguments to the super constructor. Besides updating the runtime function, a
new bytecode GetSuperConstructor is introduced.
BUG=v8:5336
Review-Url: https://codereview.chromium.org/2504553003
Cr-Commit-Position: refs/heads/master@{#41788}
Transform LdaNull/LdaUndefined followed by StrictEquality to TestNull/TestUndefined.
This would avoid a call to the compare IC. In the bytecode-graph builder these are
mapped to StrictEqual javascript operator. When reducing this operator, we already
optimize the cases for null/undefined.
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2554723004
Cr-Commit-Position: refs/heads/master@{#41768}
This introduces an explicit struct for the communication channel between
the {ArrayLiteral} AST node and the corresponding runtime methods. Those
methods take a pair of {ElementsKind} as well as an array (can either be
a FixedArray or a FixedDoubleArray) of constant values.
For bonus points it also reduces the size of the involved heap object by
one word (i.e. length field of FixedArray not needed anymore).
R=mvstanton@chromium.org
Review-Url: https://codereview.chromium.org/2581683003
Cr-Commit-Position: refs/heads/master@{#41752}
Allocate the registers used as arguments to a call on-demand after visiting the
argument (or reciever). This means that the visited expression can use registers
that would otherwise have been allocated for arguments which haven't been
visited yet.
The reason for doing this is to avoid keeping things live in registers
unecessarily for chained function calls, which avoids a memory leak for
functions which chain a large number of calls with large temporary arguments /
recievers.
BUG=chromium:672027
Review-Url: https://codereview.chromium.org/2557173004
Cr-Commit-Position: refs/heads/master@{#41714}
Templatizes the AccumulatorUsage and OperandType for BytecodeNode creation and
BytecodeRegisterOptimizer::PrepareForBytecode. This allows the compiler to
statically know whether the bytecode being created accesses the accumulator
and what operand types need scaling, avoiding runtime checks in the code.
Also removes BytecodeNode::set_bytecode methods.
Review-Url: https://codereview.chromium.org/2542903003
Cr-Commit-Position: refs/heads/master@{#41706}
Introduces:
- a new AST node representing the GetIterator() algorithm in the specification, to be used by ForOfStatement, YieldExpression (in the case of delegating yield*), and the future `for-await-of` loop proposed in http://tc39.github.io/proposal-async-iteration/#sec-async-iterator-value-unwrap-functions.
- a new opcode (JumpIfJSReceiver), which is useful for `if Type(object) is not Object` checks which are common throughout the specification. This node is easily eliminated by TurboFan.
The AST node is desugared specially in bytecode, rather than manually when building the AST. The benefit of this is that desugaring in the BytecodeGenerator is much simpler and easier to understand than desugaring the AST.
This also reduces parse time very slightly, and allows us to use LoadIC rather than KeyedLoadIC, which seems to have better baseline performance. This results in a ~20% improvement in test/js-perf-test/Iterators micro-benchmarks, which I believe owes to the use of the slightly faster LoadIC as opposed to the KeyedLoadIC in the baseline case. Both produce identical optimized code via TurboFan when the type check can be eliminated, and the load can be replaced with a constant value.
BUG=v8:4280
R=bmeurer@chromium.org, rmcilroy@chromium.org, adamk@chromium.org, neis@chromium.org, jarin@chromium.orgTBR=rossberg@chromium.org
Review-Url: https://codereview.chromium.org/2557593004
Cr-Commit-Position: refs/heads/master@{#41555}
This just calls into a runtime function for implementation currently.
Intermediate step in speeding up constructor calls containing a spread.
The NewWithSpread bytecode will probably end up having different arguments with future CLs - the constructor and the new.target should have their own regs. For now we are calling into the runtime function, so we need the regs together.
BUG=v8:5659
Review-Url: https://codereview.chromium.org/2541113004
Cr-Commit-Position: refs/heads/master@{#41542}
Equality with null/undefined is equivalent to a check on the undetectable bit
on the map of the object. This would be more efficient than performing the entire
comparison operation.
This cl introduces:
1. A new bytecode called TestUndetectable that checks if the object is null/undefined.
2. Updates peeophole optimizer to emit TestUndetectable when a LdaNull/Undefined
precedes equality check.
4. TestUndetectable is transformed to ObjectIsUndetectable operator when building
turbofan graph.
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2547043002
Cr-Commit-Position: refs/heads/master@{#41514}
Reorders the jump bytecodes so that the majority of jump checks can be
implemented as range checks (rather than a list of comparisons that get
compiled to a bunch of jumps).
Review-Url: https://codereview.chromium.org/2537123002
Cr-Commit-Position: refs/heads/master@{#41498}
This allows us to optimise the bytecode liveness analysis to jump
directly to previously seen indices. The analysis is optimised to store
a stack of loop ends (JumpLoop bytecode indices), and iterate through
these indices directly rather than looping through the bytecode array to
find them.
Review-Url: https://codereview.chromium.org/2536653003
Cr-Commit-Position: refs/heads/master@{#41485}
This pre-calculates and stores a vector of bytecode offsets, and then allows
one to iterate over it backwards. This could probably be adapted to a
bidirectional/random access iterator if we wanted to, but for now reverse
is all we need.
Review-Url: https://codereview.chromium.org/2518003002
Cr-Commit-Position: refs/heads/master@{#41153}
Add bytecode for defining data properties, which initially just calls the runtime function.
BUG=v8:5624
Review-Url: https://codereview.chromium.org/2510743002
Cr-Commit-Position: refs/heads/master@{#41101}
The reasons are:
1) The names dictionaries in the feedback metadata seems to consume a lot of memory
and the idea didn't payoff.
2) The absence of a name parameter blocks data handlers support in LoadGlobalIC.
This CL reverts a part of r37278 (https://codereview.chromium.org/2096653003/).
BUG=chromium:576312, v8:5561
Review-Url: https://codereview.chromium.org/2510653002
Cr-Commit-Position: refs/heads/master@{#41046}
This is in preparation for introducing more specialized
CodeStubAssembler subclasses. The state object can be handed
around, while the Assembler instances are temporary-scoped.
BUG=v8:5628
Original review: https://codereview.chromium.org/2498073002/
Review-Url: https://codereview.chromium.org/2502293002
Cr-Commit-Position: refs/heads/master@{#41028}
Adds a bytecode to set and retrieve the pending message. This avoids a
runtime call in finally blocks, and also ensures that TurboFan builds a
graph using the SetMessage / LoadMessage nodes instead of inserting a
runtime call.
BUG=chromium:662334
Review-Url: https://codereview.chromium.org/2501503005
Cr-Commit-Position: refs/heads/master@{#41023}
Reason for revert:
https://build.chromium.org/p/client.v8/builders/V8%20Linux%20-%20shared doesn't want to compile. Missing export annotation?
Original issue's description:
> [refactoring] Split CodeAssemblerState out of CodeAssembler
>
> This is in preparation for introducing more specialized
> CodeStubAssembler subclasses. The state object can be handed
> around, while the Assembler instances are temporary-scoped.
>
> BUG=v8:5628
TBR=ishell@chromium.org,mstarzinger@chromium.org,jkummerow@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=v8:5628
Review-Url: https://codereview.chromium.org/2504913002
Cr-Commit-Position: refs/heads/master@{#41018}
This is in preparation for introducing more specialized
CodeStubAssembler subclasses. The state object can be handed
around, while the Assembler instances are temporary-scoped.
BUG=v8:5628
Review-Url: https://codereview.chromium.org/2498073002
Cr-Commit-Position: refs/heads/master@{#41015}
SourcePosition::InliningId() refers to a the new table DeoptimizationInputData::InliningPositions(), which provides the following data for every inlining id:
- The inlined SharedFunctionInfo as an offset into DeoptimizationInfo::LiteralArray
- The SourcePosition of the inlining. Recursively, this yields the full inlining stack.
Before the Code object is created, the same information can be found in CompilationInfo::inlined_functions().
If SourcePosition::InliningId() is SourcePosition::kNotInlined, it refers to the outer (non-inlined) function.
So every SourcePosition has full information about its inlining stack, as long as the corresponding Code object is known. The internal represenation of a source position is a positive 64bit integer.
All compilers create now appropriate source positions for inlined functions. In the case of Turbofan, this required using AstGraphBuilderWithPositions for inlined functions too. So this class is now moved to a header file.
At the moment, the additional information in source positions is only used in --trace-deopt and --code-comments. The profiler needs to be updated, at the moment it gets the correct script offsets from the deopt info, but the wrong script id from the reconstructed deopt stack, which can lead to wrong outputs. This should be resolved by making the profiler use the new inlining information for deopts.
I activated the inlined deoptimization tests in test-cpu-profiler.cc for Turbofan, changing them to a case where the deopt stack and the inlining position agree. It is currently still broken for other cases.
The following additional changes were necessary:
- The source position table (internal::SourcePositionTableBuilder etc.) supports now 64bit source positions. Encoding source positions in a single 64bit int together with the difference encoding in the source position table results in very little overhead for the inlining id, since only 12% of the source positions in Octane have a changed inlining id.
- The class HPositionInfo was effectively dead code and is now removed.
- SourcePosition has new printing and information facilities, including computing a full inlining stack.
- I had to rename compiler/source-position.{h,cc} to compiler/compiler-source-position-table.{h,cc} to avoid clashes with the new src/source-position.cc file.
- I wrote the new wrapper PodArray for ByteArray. It is a template working with any POD-type. This is used in DeoptimizationInputData::InliningPositions().
- I removed HInlinedFunctionInfo and HGraph::inlined_function_infos, because they were only used for the now obsolete Crankshaft inlining ids.
- Crankshaft managed a list of inlined functions in Lithium: LChunk::inlined_functions. This is an analog structure to CompilationInfo::inlined_functions. So I removed LChunk::inlined_functions and made Crankshaft use CompilationInfo::inlined_functions instead, because this was necessary to register the offsets into the literal array in a uniform way. This is a safe change because LChunk::inlined_functions has no other uses and the functions in CompilationInfo::inlined_functions have a strictly longer lifespan, being created earlier (in Hydrogen already).
BUG=v8:5432
Review-Url: https://codereview.chromium.org/2451853002
Cr-Commit-Position: refs/heads/master@{#40975}
We seem to get some small wins from avoiding the Ldr bytecodes, probably due
to reduced icache pressure since there are less bytecode handlers. Replace
the Ldr bytecodes with Star lookahead inlined into the Lda versions.
Also fixes IsAccumulatorLoadWithoutEffects to include LdaContextSlot and
LdaCurrentContextSlot
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2489513005
Cr-Commit-Position: refs/heads/master@{#40883}
The Ldr[Named/Keyed]Property bytecodes are problematic for the deoptimizer when
inlining accessors in TurboFan. Remove them and replace with a Star lookahead
in the bytecode handlers for Lda[Named/Keyed]Property.
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2485383002
Cr-Commit-Position: refs/heads/master@{#40860}
This introduces two new bytecodes LdaModuleVariable and StaModuleVariable,
replacing the corresponding runtime calls.
Support in the bytecode graph builder exists only in the form of runtime calls.
BUG=v8:1569
Review-Url: https://codereview.chromium.org/2471033004
Cr-Commit-Position: refs/heads/master@{#40825}
The existing Load/StoreContextElement operations take the index as an int. This
CL adds versions that take the index as a Node. These already existed in the
interpreter-assembler, from which they are now removed.
R=mstarzinger@chromium.org, rmcilroy@chromium.org
BUG=
Review-Url: https://codereview.chromium.org/2473003004
Cr-Commit-Position: refs/heads/master@{#40810}
The majority of context slot accesses are to the local context (current context
register and depth 0), so this adds bytecodes to optimise for that case.
This cuts down bytecode size by roughly 1% (measured on Octane and Top25).
Review-Url: https://codereview.chromium.org/2459513002
Cr-Commit-Position: refs/heads/master@{#40641}
This is a new bytecode which behaves (for now) exactly like Call,
except that in turbofan graph building we can set the
ConvertReceiverMode to NotNullOrUndefined.
I observe a 1% improvement on Box2D, I'd expect a similar improvement on
other OOP heavy code.
Review-Url: https://codereview.chromium.org/2450243002
Cr-Commit-Position: refs/heads/master@{#40610}
Modify the Bytecode Register Optimizer to be an independent component
rather than part of the BytecodePipeline. This means the BytecodeArrayBuilder
can explicitly call it with register operands when outputting a bytecode
and the Bytecode Register Optimizer doesn't need to work out which operands
are register operands. This also means we don't need to build BytecodeNodes
for Ldar / Star / Mov bytecodes unless they are actually emitted by the
optimizer.
This change also modifies the way the BytecodeArrayBuilder converts
operands to make use of the OperandTypes specified in bytecodes.h.
This avoids having to individually convert operands to their raw output
value before calling Output(...).
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2393683004
Cr-Commit-Position: refs/heads/master@{#40543}
This allows people writing code stubs to just verify the graph of the stub they're working on, at least until we fix all of the issues we have and enable the verification by default.
Also fixes representations in CodeStubAssembler::SmiOr and InterpreterAssembler::StarDispatchLookahead.
R=bmeurer@chromium.org
BUG=
Review-Url: https://codereview.chromium.org/2413653006
Cr-Commit-Position: refs/heads/master@{#40320}
This reverts commit 7db0ecdec3.
Manual revert since automatic revert is too large for the web interface.
BUG=
TBR=bmeurer@chromium.org,mstarzinger@chromium.org,yangguo@chromium.org,ahaas@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
Review-Url: https://codereview.chromium.org/2396353002
Cr-Commit-Position: refs/heads/master@{#40082}
There are only a few occasions where we allocate a register in an outer
expression allocation scope, which makes the costly free-list approach
of the BytecodeRegisterAllocator unecessary. This CL replaces all
occurrences with moves to the accumulator and stores to a register
allocated in the correct scope. By doing this, we can simplify the
BytecodeRegisterAllocator to be a simple bump-pointer allocator
with registers released in the same order as allocated.
The following changes are also made:
- Make BytecodeRegisterOptimizer able to use registers which have been
unallocated, but not yet reused
- Remove RegisterExpressionResultScope and rename
AccumulatorExpressionResultScope to ValueExpressionResultScope
- Introduce RegisterList to represent consecutive register
allocations, and use this for operands to call bytecodes.
By avoiding the free-list handling, this gives another couple of
percent on CodeLoad.
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2369873002
Cr-Commit-Position: refs/heads/master@{#39905}
They are nops, but will be used when verifying the machine graph.
BUG=
Review-Url: https://codereview.chromium.org/2367413002
Cr-Commit-Position: refs/heads/master@{#39758}
This CL optimizes the code in BytecodeArrayBuilder and
BytecodeArrayWriter by making the following main changes:
- Move operand scale calculation out of BytecodeArrayWriter to the
BytecodeNode constructor, where the decision on which operands are
scalable can generally be statically decided by the compiler.
- Move the maximum register calculation out of BytecodeArrayWriter
and into BytecodeRegisterOptimizer (which is the only place outside
BytecodeGenerator which updates which registers are used). This
avoids the BytecodeArrayWriter needing to know the operand types
of a node as it writes it.
- Modify EmitBytecodes to use individual push_backs rather than
building a buffer and calling insert, since this turns out to be faster.
- Initialize BytecodeArrayWriter's bytecode vector by reserving 512
bytes,
- Make common functions in Bytecodes constexpr so that they
can be statically calculated by the compiler.
- Move common functions and constructors in Bytecodes and
BytecodeNode to the header so that they can be inlined.
- Change large static switch statements in Bytecodes to const array
lookups, and move to the header to allow inlining.
I also took the opportunity to remove a number of unused helper
functions, and rework some others for consistency.
This reduces the percentage of time spent in making BytecodeArrays
in CodeLoad from ~15% to ~11% according to perf. The
CoadLoad score increase by around 2%.
BUG=v8:4280
Committed: https://crrev.com/b11a8b4d41bf09d6b3d6cf214fe3fb61faf01a64
Review-Url: https://codereview.chromium.org/2351763002
Cr-Original-Commit-Position: refs/heads/master@{#39599}
Cr-Commit-Position: refs/heads/master@{#39637}
Reason for revert:
Prime suspect for roll blocker: https://codereview.chromium.org/2362503002/
Original issue's description:
> [Interpreter] Optimize BytecodeArrayBuilder and BytecodeArrayWriter.
>
> This CL optimizes the code in BytecodeArrayBuilder and
> BytecodeArrayWriter by making the following main changes:
>
> - Move operand scale calculation out of BytecodeArrayWriter to the
> BytecodeNode constructor, where the decision on which operands are
> scalable can generally be statically decided by the compiler.
> - Move the maximum register calculation out of BytecodeArrayWriter
> and into BytecodeRegisterOptimizer (which is the only place outside
> BytecodeGenerator which updates which registers are used). This
> avoids the BytecodeArrayWriter needing to know the operand types
> of a node as it writes it.
> - Modify EmitBytecodes to use individual push_backs rather than
> building a buffer and calling insert, since this turns out to be faster.
> - Initialize BytecodeArrayWriter's bytecode vector by reserving 512
> bytes,
> - Make common functions in Bytecodes constexpr so that they
> can be statically calculated by the compiler.
> - Move common functions and constructors in Bytecodes and
> BytecodeNode to the header so that they can be inlined.
> - Change large static switch statements in Bytecodes to const array
> lookups, and move to the header to allow inlining.
>
> I also took the opportunity to remove a number of unused helper
> functions, and rework some others for consistency.
>
> This reduces the percentage of time spent in making BytecodeArrays
> in CodeLoad from ~15% to ~11% according to perf. The
> CoadLoad score increase by around 2%.
>
> BUG=v8:4280
>
> Committed: https://crrev.com/b11a8b4d41bf09d6b3d6cf214fe3fb61faf01a64
> Cr-Commit-Position: refs/heads/master@{#39599}
TBR=mythria@chromium.org,leszeks@chromium.org,rmcilroy@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2360193003
Cr-Commit-Position: refs/heads/master@{#39612}
This CL optimizes the code in BytecodeArrayBuilder and
BytecodeArrayWriter by making the following main changes:
- Move operand scale calculation out of BytecodeArrayWriter to the
BytecodeNode constructor, where the decision on which operands are
scalable can generally be statically decided by the compiler.
- Move the maximum register calculation out of BytecodeArrayWriter
and into BytecodeRegisterOptimizer (which is the only place outside
BytecodeGenerator which updates which registers are used). This
avoids the BytecodeArrayWriter needing to know the operand types
of a node as it writes it.
- Modify EmitBytecodes to use individual push_backs rather than
building a buffer and calling insert, since this turns out to be faster.
- Initialize BytecodeArrayWriter's bytecode vector by reserving 512
bytes,
- Make common functions in Bytecodes constexpr so that they
can be statically calculated by the compiler.
- Move common functions and constructors in Bytecodes and
BytecodeNode to the header so that they can be inlined.
- Change large static switch statements in Bytecodes to const array
lookups, and move to the header to allow inlining.
I also took the opportunity to remove a number of unused helper
functions, and rework some others for consistency.
This reduces the percentage of time spent in making BytecodeArrays
in CodeLoad from ~15% to ~11% according to perf. The
CoadLoad score increase by around 2%.
BUG=v8:4280
Review-Url: https://codereview.chromium.org/2351763002
Cr-Commit-Position: refs/heads/master@{#39599}
Adds a fast path for loading DYNAMIC_GLOBAL variables, which are lookup
variables that can be globally loaded, without calling the runtime, as long as
there was no context extension by a sloppy eval along their context chain.
BUG=v8:5263
Review-Url: https://codereview.chromium.org/2347143002
Cr-Commit-Position: refs/heads/master@{#39537}
Adds a fast path for loading DYNAMIC_LOCAL variables, which are lookup
variables that can be context loaded, without calling the runtime, as
long as there was no context extension by a sloppy eval along their
context chain.
BUG=v8:5263
Review-Url: https://codereview.chromium.org/2343633002
Cr-Commit-Position: refs/heads/master@{#39473}
This introduces a new {JumpLoop} bytecode to combine the OSR polling
mechanism modeled by {OsrPoll} with the actual {Jump} performing the
backwards branch. This reduces the overall size and also avoids one
additional dispatch. It also makes sure that OSR polling is only done
within real loops.
R=rmcilroy@chromium.org
BUG=v8:4764
Review-Url: https://codereview.chromium.org/2331033002
Cr-Commit-Position: refs/heads/master@{#39384}
Moves the context chain search loop out of generated bytecode, and into
the (Lda|Ldr|Sda)ContextSlot handler, by passing the context depth in as
an additional operand. This should decrease the bytecode size and
increase performance for deep context chain searches, at the cost of
slightly increasing bytecode size for shallow context access.
Review-Url: https://codereview.chromium.org/2336643002
Cr-Commit-Position: refs/heads/master@{#39378}