This change encompasses what is necessary to enable stack checks in loops without suffering large regressions.
Primarily, it consists of a new mechanism for dealing with deferred blocks by "splintering", rather than splitting, inside deferred blocks.
My initial change was splitting along deferred block boundaries, but the regression introduced by stackchecks wasn't resolved conclusively. After investigation, it appears that just splitting ranges along cold block boundaries leads to a greater opportunity for moves on the hot path, hence the suboptimal outcome.
The alternative "splinters" ranges rather than splitting them. While splitting creates 2 ranges and links them (parent-child), in contrast, splintering creates a new independent range with no parent-child relation to the original. The original range appears as if it has a liveness hole in the place of the splintered one. All thus obtained ranges are then register allocated with no change to the register allocator.
The splinters (cold blocks) do not conflict with the hot path ranges, by construction. The hot path ones have less pressure to split, because we remove a source of conflicts. After allocation, we merge the splinters back to their original ranges and continue the pipeline. We leverage the previous changes made for deferred blocks (determining where to spill, for example).
Review URL: https://codereview.chromium.org/1305393003
Cr-Commit-Position: refs/heads/master@{#30357}
Adding an AstExpressionVisitor to touch each expression node in
an AST.
Adding TypingReseter to clear the slate after a failed asm.js
validation that has set partial typing information.
Adding a ExpressionTypeCollector to walk the expressions
in an AST and emit them as a string for testing.
Adding tests of the above.
LOG=N
BUG= https://code.google.com/p/v8/issues/detail?id=4203
TEST=test-typing-reset,test-ast-expression-visitor
R=rossberg@chromium.org,titzer@chromium.org
Review URL: https://codereview.chromium.org/1288773007
Cr-Commit-Position: refs/heads/master@{#30336}
This make inclusion of unicode-inl.h in object.h absolete. Now most
compilation units don't require that header. It also breaks a cycle
within declarations of the scanner.h header.
This tries to remove includes of "-inl.h" headers from normal ".h"
headers, thereby reducing the chance of any cyclic dependencies and
decreasing the average size of our compilation units.
Note that this change still leaves 3 violations of that rule in the
code, checked with the "tools/check-inline-includes.sh" tool.
R=yangguo@chromium.org
Review URL: https://codereview.chromium.org/1287893006
Cr-Commit-Position: refs/heads/master@{#30268}
Embedders would use these for features which must be able to be turned
off at runtime, despite being compiled into V8. They can be turned on
and off by the embedder using the --experimental_extras flag, e.g. via
v8::SetFlagsFromString.
R=yangguo@chromium.org, mlippautz@chromium.org, hpayer@chromium.org
BUG=chromium:507137
LOG=Y
Review URL: https://codereview.chromium.org/1284413002
Cr-Commit-Position: refs/heads/master@{#30260}
Previously, it was not possible to specify StackSlotOperands for all
slots in both the caller and callee stacks. Specifically, the region
of the callee's stack including the saved return address, frame
pointer, function pointer and context pointer could not be addressed
by the register allocator/gap resolver.
In preparation for better tail call support, which will use the gap
resolver to reconcile outgoing parameters, this change makes it
possible to address all slots on the stack, because slots in the
previously inaccessible dead zone may become parameter slots for
outgoing tail calls. All caller stack slots are accessible as they
were before, with slot -1 corresponding to the last stack
parameter. Stack slot indices >= 0 access the callee stack, with slot
0 corresponding to the callee's saved return address, 1 corresponding
to the saved frame pointer, 2 corresponding to the current function
context, 3 corresponding to the frame marker/JSFunction, and slots 4
and above corresponding to spill slots.
The following changes were specifically needed:
* Frame has been changed to explicitly manage three areas of the
callee frame, the fixed header, the spill slot area, and the
callee-saved register area.
* Conversions from stack slot indices to fp offsets all now go through
a common bottleneck: OptimizedFrame::StackSlotOffsetRelativeToFp
* The generation of deoptimization translation tables has been changed
to support the new stack slot indexing scheme. Crankshaft, which
doesn't support the new slot numbering in its register allocator,
must adapt the indexes when creating translation tables.
* Callee-saved parameters are now kept below spill slots, not above,
to support saving only the optimal set of used registers, which is
only known after register allocation is finished and spill slots
have been allocated.
Review URL: https://codereview.chromium.org/1261923007
Cr-Commit-Position: refs/heads/master@{#30224}
Bytecode generator for local assignment and basic binary operations.
Command-line flag for printing bytecodes.
BUG=v8:4280
LOG=N
Review URL: https://codereview.chromium.org/1294543002
Cr-Commit-Position: refs/heads/master@{#30221}
- Make the API look like v8::V8::InitializeICU.
(That is: A static method call, not an object to be created on the stack.)
- Fix path separator on Windows, by calling base::OS::isPathSeparator.
- Move into API, so that it can be called by hello-world & friends.
- Actually call it from hello-world and friends.
R=jochen@chromium.org
BUG=
Review URL: https://codereview.chromium.org/1292053002
Cr-Commit-Position: refs/heads/master@{#30174}
This is only an estimate since it counts objects that could be shared,
for example strings, cow arrays, heap numbers, etc.
It however ignores objects that could be shared, but may only be used
by the context to be measured, for example shared function infos,
script objects, scope infos, etc.
R=jochen@chromium.org
Review URL: https://codereview.chromium.org/1268333004
Cr-Commit-Position: refs/heads/master@{#30029}
To avoid tanking context startup performance, only the actual installation of the
JS-exposed API is flag-guarded. The remainder of the implementation still
resides in the snapshot.
Review URL: https://codereview.chromium.org/1257063003
Cr-Commit-Position: refs/heads/master@{#30017}
Added a separate flag for this, since we intend to enable it for the linear allocator as well. Currently, the option is "on" for greedy, as a point in time to enable its testing (through the greedy allocator bots).
BUG=
Review URL: https://codereview.chromium.org/1256313003
Cr-Commit-Position: refs/heads/master@{#30005}
Adds basic support for generation of interpreter bytecode handler code
snippets. The InterpreterAssembler class exposes a set of low level,
interpreter specific operations which can be used to build a Turbofan
graph. The Interpreter class generates a bytecode handler snippet for
each bytecode by assembling operations using an InterpreterAssembler.
Currently only two simple bytecodes are supported: LoadLiteral0 and Return.
BUG=v8:4280
LOG=N
Review URL: https://codereview.chromium.org/1239793002
Cr-Commit-Position: refs/heads/master@{#29814}
Using the GraphBuilder base class forces each node creation to go
through a virtual function dispatch just for the sake of saving the
duplication of the NewNode helper methods. In total that added up to
saving minus (sic!) six lines of code.
R=titzer@chromium.org
Review URL: https://codereview.chromium.org/1252093002
Cr-Commit-Position: refs/heads/master@{#29799}
In many cases, the context that TurboFan's ASTGraphBuilder or subsequent
reduction operations attaches to nodes does not need to be that exact
context, but rather only needs to be one with the same native context,
because it is used internally only to fetch the native context, e.g. for
creating and throwing exceptions.
This reducer recognizes common cases where the context that is specified
for a node can be relaxed to a canonical, less specific one. This
relaxed context can either be the enclosing function's context or a specific
Module or Script context that is explicitly created within the function.
This optimization is especially important for TurboFan-generated code stubs
which use context specialization and inlining to generate optimal code.
Without context relaxation, many extraneous moves are generated to pass
exactly the right context to internal functions like ToNumber and
AllocateHeapNumber, which only need the native context. By turning context
relaxation on, these moves disappear because all these common internal
context uses are unified to the context passed into the stub function, which
is typically already in the correct context register and remains there for
short stubs. It also eliminates the explicit use of a specialized context
constant in the code stub in these cases, which could cause memory leaks.
Review URL: https://codereview.chromium.org/1244583003
Cr-Commit-Position: refs/heads/master@{#29763}
This CL exposes the constructor function, defines type related
information, and implements value type semantics.
It also refactors test/mjsunit/samevalue.js to test SameValue and SameValueZero.
TEST=test/mjsunit/harmony/simd.js, test/cctest/test-simd.cc
LOG=Y
BUG=v8:4124
Committed: https://crrev.com/e5ed3bee99807c502fa7d7a367ec401e16d3f773
Cr-Commit-Position: refs/heads/master@{#29689}
Review URL: https://codereview.chromium.org/1219943002
Cr-Commit-Position: refs/heads/master@{#29712}
This CL exposes the constructor function, defines type related
information, and implements value type semantics.
It also refactors test/mjsunit/samevalue.js to test SameValue and SameValueZero.
TEST=test/mjsunit/harmony/simd.js, test/cctest/test-simd.cc
LOG=Y
BUG=v8:4124
Review URL: https://codereview.chromium.org/1219943002
Cr-Commit-Position: refs/heads/master@{#29689}
Until now, TF-generated code stubs piggy-backed off of the builtin
context. Since generation of code stubs is lazy, stubs generated at
different times in different native contexts would contain embedded
pointers different builtin contexts, leading to cross-context references
and memory leaks.
After this CL, all TF-generated code stubs are generated inside a
internal thinned-out, native context that lives solely for the
purpose of hosting generated code stubs.
Review URL: https://codereview.chromium.org/1213203007
Cr-Commit-Position: refs/heads/master@{#29593}
The breakage to Chrome seems to be based on @@isConcatSpreadable
and turning that part off with this patch fixes the Maps Tips & Tricks
test case.
BUG=chromium:507553
LOG=Y
R=adamk
Review URL: https://codereview.chromium.org/1226063002
Cr-Commit-Position: refs/heads/master@{#29545}
Optimize string "length" property access based on static type
information if possible, but also optimistically optimize the access
based on type feedback from the LoadIC.
R=jarin@chromium.org
Review URL: https://codereview.chromium.org/1216593003
Cr-Commit-Position: refs/heads/master@{#29543}
The RawMachineAssembler will be used to build the interpreter, so it needs
to move back to src/compiler.
This reverts commit b5b00cc031.
BUG=v8:4280
LOG=N
Review URL: https://codereview.chromium.org/1221303014
Cr-Commit-Position: refs/heads/master@{#29519}
We have to reland these two commits at once, because the first breaks
some asm.js benchmarks without the second. The change was reverted
because of bogus checks in the verifier, which will not work in the
presence of OSR (and where hidden because of the type back propagation
hack in OSR so far). Original messages are below:
[turbofan] Add new JSFrameSpecialization reducer.
The JSFrameSpecialization specializes an OSR graph to the current
unoptimized frame on which we will perform the on-stack replacement.
This is used for asm.js functions, where we cannot reuse the OSR
code object anyway because of context specialization, and so we could as
well specialize to the max instead.
It works by replacing all OsrValues in the graph with their values
in the JavaScriptFrame.
The idea is that using this trick we get better performance without
doing the unsound backpropagation of types to OsrValues later. This
is the first step towards fixing OSR for TurboFan.
[turbofan] Perform OSR deconstruction early and remove type propagation.
This way we don't have to deal with dead pre-OSR code in the graph
and risk optimizing the wrong code, especially we don't make
optimistic assumptions in the dead code that leaks into the OSR code
(i.e. deopt guards are in dead code, but the types propagate to OSR
code via the OsrValue type back propagation).
BUG=v8:4273
LOG=n
R=jarin@chromium.org
Review URL: https://codereview.chromium.org/1226673005
Cr-Commit-Position: refs/heads/master@{#29486}
The JSFrameSpecialization specializes an OSR graph to the current
unoptimized frame on which we will perform the on-stack replacement.
This is used for asm.js functions, where we cannot reuse the OSR code
object anyway because of context specialization, and so we could as well
specialize to the max instead.
It works by replacing all OsrValues in the graph with their values in
the JavaScriptFrame.
The idea is that using this trick we get better performance without
doing the unsound backpropagation of types to OsrValues later. This is
the first step towards fixing OSR for TurboFan.
R=jarin@chromium.org
BUG=v8:4273
LOG=n
Review URL: https://codereview.chromium.org/1225683004
Cr-Commit-Position: refs/heads/master@{#29476}
Conditionally including Array and TypedArray methods seems to cause
a slowdown in V8 context creation, possibly due to the new code added.
BUG=chromium:504629
R=adamk@chromium.org
LOG=Y
Review URL: https://codereview.chromium.org/1215863003
Cr-Commit-Position: refs/heads/master@{#29430}
Separated core greedy allocator concepts, exposing the APIs we would want to continue working with. In particular, this change completely reworks CoalescedLiveRanges to reflect the fact that we expect more than one possible conflict, scrapping the initial design of the structure. Since this is a critical part of the design, this change may be thought of as a full rewrite of the algorithm.
Reduced all heuristics to just 2 essential ones: split "somewhere", which we'll still need when all other heuristics fail; and spill.
Introduced a simple primitive for splitting - at GapPosition::START. The goal is to use such primitives to quickly and reliably author heuristics.
I expected this primitive to "just work" for any arbitrary instruction index within a live range - e.g. its middle. That's not the case, it seems to upset execution in certain scenarios. Restricting to either before/after use positions seems to work. I'm still investigating what the source of failures is in the case of "arbitrary instruction in the range" case.
I intended to document the rationale and prove the soundness of always using START for splits, but I will postpone to after this last remaining issue is resolved.
Review URL: https://codereview.chromium.org/1205173002
Cr-Commit-Position: refs/heads/master@{#29352}