Previously we tried to share some code on by a slightly confusing re-use
of LDivI for a (general) flooring division. Now we cleanly separate
concerns, just like for the rest of the division-like operations. Note
that ARM64 already did it this way.
If we really want to save some code, we can introduce some macro
assembler instructions and/or helper functions in the code generator in
a future CL, but we should really try to avoid being "clever" to save
just a few lines of trivial code. Effort != complexity. :-)
Renamed some related Lithium operands on the way for more consistency.
R=mstarzinger@chromium.org
Review URL: https://codereview.chromium.org/212703002
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@20395 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
Lithium currently supports 3 division-like operations on integral operands: "Normal" division (rounding towards zero), flooring division (rounding towards -Infinity) and modulus calculation (the counterpart for the "normal" division). For divisors which are a power of 2, one can efficiently use some bit fiddling to avoid the actual division for such operations. This CL cleanly splits off these operations into separate Lithium instructions, making the code much more maintainable and more consistent across platforms.
There are 2 basic variations of these bit fiddling algorithms: One involving branches and a seemingly more clever one without branches. Choosing between the two is not as easy as it seems: Benchmarks (and probably real-world) programs seem to favor positive dividends, registers and shifting units are sometimes scarce resources, and branch prediction is quite good in modern processors. Therefore only the "normal" division by a power of 2 is implemented in a branch-free manner, this seems to be the best approach in practice. If this turns out to be wrong, we can easily and locally change this.
R=bmeurer@chromium.org
Review URL: https://codereview.chromium.org/175143002
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@19715 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
Recent changes in IC logic meant that CallStubs no longer use the Contextual bit. IsUndeclaredGlobal() needed to adjust for that.
In fact, now the CL has morphed to remove the notion of storing contextual state in the IC at all, it just becomes some extra ic state of the load ic. This took some adjustment in harmony code to use the global receiver for certain stores.
Now it's clearer that only LoadICs actually record any information about contextual or not.
R=verwaest@chromium.org
Review URL: https://codereview.chromium.org/140943002
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@18660 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
call machinery. The change replaces CallNamed, CallKeyed,
CallConstantFunction and CallKnownGlobal hydrogen instructions with two
new instructions with a more lower level semantics:
1. CallJSFunction for direct calls of JSFunction objects (no
argument adaptation)
2. CallWithDescriptor for calls of a given Code object according to
the supplied calling convention.
Details:
CallJSFunction should be straightforward, the main difference from the
existing InvokeFunction instruction is the absence of argument adaptor
handling. (As a next step, we will replace InvokeFunction with an
equivalent hydrogen code.)
For CallWithDescriptor, the calling conventions are represented by a
tweaked version of CallStubInterfaceDescriptor. In addition to the
parameter-register mapping, we also define parameter-representation
mapping there. The CallWithDescriptor instruction has variable number of
parameters now - this required some simple tweaks in Lithium, which
assumed fixed number of arguments in some places.
The calling conventions used in the calls are initialized in the
CallDescriptors class (code-stubs.h, <arch>/code-stubs-<arch>.cc), and
they live in a new table in the Isolate class. I should say I am not
quite sure about Representation::Integer32() representation for some of
the params of ArgumentAdaptorCall - it is not clear to me wether the
params could not end up on the stack and thus confuse the GC.
The change also includes an earlier small change to argument adaptor
(https://codereview.chromium.org/98463007) that avoids passing a naked
pointer to the code entry as a parameter. I am sorry for packaging that
with an already biggish change.
Performance implications:
Locally, I see a small regression (.2% or so). It is hard to say where
exactly it comes from, but I do see inefficient call sequences to the
adaptor trampoline. For example:
;;; <@78,#24> constant-t
bf85aa515a mov edi,0x5a51aa85 ;; debug: position 29
;;; <@72,#53> load-named-field
8b7717 mov esi,[edi+0x17] ;; debug: position 195
;;; <@80,#51> constant-s
b902000000 mov ecx,0x2 ;; debug: position 195
;;; <@81,#51> gap
894df0 mov [ebp+0xf0],ecx
;;; <@82,#103> constant-i
bb01000000 mov ebx,0x1
;;; <@84,#102> constant-i
b902000000 mov ecx,0x2
;;; <@85,#102> gap
89d8 mov eax,ebx
89cb mov ebx,ecx
8b4df0 mov ecx,[ebp+0xf0]
;;; <@86,#58> call-with-descriptor
e8ef57fcff call ArgumentsAdaptorTrampoline (0x2d80e6e0) ;; code: BUILTIN
Note the silly handling of ecx; the hydrogen for this code is:
0 4 s27 Constant 1 range:1_1 <|@
0 3 t30 Constant 0x5bc1aa85 <JS Function xyz (SharedFunctionInfo 0x5bc1a919)> type:object <|@
0 1 t36 LoadNamedField t30.[in-object]@24 <|@
0 1 t38 Constant 0x2300e6a1 <Code> <|@
0 1 i102 Constant 2 range:2_2 <|@
0 1 i103 Constant 1 range:1_1 <|@
0 2 t41 CallWithDescriptor t38 t30 t36 s27 i103 i102 #2 changes[*] <|@
BUG=
R=verwaest@chromium.org, danno@chromium.org
Review URL: https://codereview.chromium.org/104663004
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@18626 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
by escape analysis). Added several tests that expose the bug.
Summary:
LCodegen::AddToTranslation assumes that Lithium environments are
generated by depth-first traversal, but LChunkBuilder::CreateEnvironment
was generating them in breadth-first fashion. This fixes the
CreateEnvironment to traverse the captured objects depth-first.
Note:
It might be worth considering representing LEnvironment by a list
with the same order as the serialized translation representation
rather than having two lists with a subtle relationship between
them (and then serialize in a slightly different order again).
R=titzer@chromium.org, mstarzinger@chromium.org
LOG=N
BUG=
Review URL: https://codereview.chromium.org/93803003
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@18470 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
It was only used for Math.log, and even then only in full code and in %_MathLog. For crankshafted code, Intel already used the FP operations directly, while the ARM/MIPS ports were a bit lazy and simply called the stub. The latter directly call the C library now without any cache. It would be possible to directly generate machine code if somebody has the time, from what I've seen out in the wild it should be only about a dozen instructions.
LOG=y
R=yangguo@chromium.org
Review URL: https://codereview.chromium.org/113343003
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@18344 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
This removes tons of architecture-specific code and makes it easy to
experiment with other pseudo-RNG algorithms. The crankshafted code is
extremely good, keeping all things unboxed and doing only minimal
checks, so it is basically equivalent to the handwritten code.
When benchmarks are run without parallel recompilation, we get a few
percent regression on SunSpider's string-validate-input and
string-base64, but these benchmarks run so fast that the overall
SunSpider score is hardly affected and within the usual jitter. Note
that these benchmarks actually run even faster when we don't
crankshaft at all on the main thread (the regression is not caused by
bad code, it is caused by Crankshaft needing a few hundred microsecond
for compilation of a trivial function). Luckily, when parallel
recompilation is enabled, i.e. in the browser, we see no regression at
all!
R=mstarzinger@chromium.org
Review URL: https://codereview.chromium.org/68723002
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@17955 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
The %_OneByteSeqStringSetChar intrinsic expects its arguments to be checked before being called for efficiency reasons, but the fuzzer provided no such checks. Now the intrinsic is robust to bad input if FLAG_debug_code is set.
R=yangguo@chromium.org
TEST=test/mjsunit/regress/regress-320948.js
BUG=chromium:320948
LOG=Y
Review URL: https://codereview.chromium.org/72813004
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@17886 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
Lithium uses indexes after the maximium value ID in the HGraph as indexes
of virtual registers and assumes that the maximum value ID does not change.
The IsStandardConstant and GetConstantXX functions could add constants to
HGraph, which aliased virtual registers with real values. This could confuse
the register allocator to think that a value in a virtual register is tagged
and to incorrectly set it in the pointer map.
BUG=298269
TEST=mjsunit/regress/regress-298269.js
R=verwaest@chromium.org
Review URL: https://chromiumcodereview.appspot.com/66693002
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@17599 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
This improves the generated code for HSeqStringSetChar across
all platforms, taking advantage of constant operands whenever
possible. It also drops the unused DefineSameAsFirst constraint
for the register allocator on x64 and ia32, where it caused
unnecessary spills when the string operand was live across the
HSeqStringSetChar instruction.
A new GVN flag StringChars is introduced to express dependencies
between HSeqStringSetChar, HStringCharCodeAt and the upcoming
HSeqStringGetChar (the GVNFlags type is now 64bit in size).
Also improves the test case.
TEST=mjsunit/string-natives
R=mstarzinger@chromium.org, yangguo@chromium.org
Review URL: https://codereview.chromium.org/57383004
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@17521 ce2b1a6d-e550-0410-aec6-3dcde31c8c00