v8/test/benchmarks
Steinar H. Gunderson 6da6e45099 Microoptimizations in FastDtoa.
Optimize FastDtoa, in particular Grisu3. In addition to making
a microbenchmark, there are a number of smaller and larger
changes here:

 - Replace divisions by power-of-ten with multiplications by
   their inverses, using an algorithm very similar to the one
   in libdivide.
 - For DiyFp::Times(), use 128-bit hardware multiplication
   if available (which it generally is on 64-bit platforms).
 - Where possible, send around a pointer to the end of the string,
   instead of a pointer and a length, reducing register pressure
   (especially for Intel). Where not (easily) possible, add
   a local variable to make the compiler understand that length
   and decimal_point cannot alias.
 - Change some ints to unsigneds where it helps us avoid sign
   extensions.
 - Some minor changes to reduce instruction dependency chains.
 - Inline BiggestPowerTen().

Actual performance gain is wildly different between platforms.
On my 3990X workstation (Zen 2), gains are about 21%. On a M1
Mac Mini, they are about 17%. But on my i7-10610U laptop
(Comet Lake, so Skylake microarchitecture), the function is
78% faster. This is probably because large divisions
(divisor over 255) seem to hurt a lot on Skylake, but I haven't
gone through it in detail.

Change-Id: I5b67c257d788a3f7d1be7065d055456852451d68
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4110741
Commit-Queue: Steinar H Gunderson <sesse@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Cr-Commit-Position: refs/heads/main@{#84906}
2022-12-16 15:03:39 +00:00
..
cpp Microoptimizations in FastDtoa. 2022-12-16 15:03:39 +00:00
csuite [infra] Change all Python shebangs to Python3 2022-08-05 14:55:00 +00:00
benchmarks.status Re-enable octane/typescript for deopt_fuzzer 2022-09-09 08:34:45 +00:00
BUILD.gn [build] Add data deps for d8 test suites 2018-03-26 13:44:58 +00:00
testcfg.py [test] Refactor testrunner (4) 2022-07-18 09:52:24 +00:00