v8/benchmarks at 5472313c96a96a7ef2c2f74b6a9b631bba1a107a - v8

Steinar H. Gunderson 6da6e45099 Microoptimizations in FastDtoa.		2022-12-16 15:03:39 +00:00

Optimize FastDtoa, in particular Grisu3. In addition to making a microbenchmark, there are a number of smaller and larger changes here:

- Replace divisions by power-of-ten with multiplications by their inverses, using an algorithm very similar to the one in libdivide.
- For DiyFp::Times(), use 128-bit hardware multiplication if available (which it generally is on 64-bit platforms).
- Where possible, send around a pointer to the end of the string, instead of a pointer and a length, reducing register pressure (especially for Intel). Where not (easily) possible, add a local variable to make the compiler understand that length and decimal_point cannot alias.
- Change some ints to unsigneds where it helps us avoid sign extensions.
- Some minor changes to reduce instruction dependency chains.
- Inline BiggestPowerTen().

Actual performance gain is wildly different between platforms. On my 3990X workstation (Zen 2), gains are about 21%. On a M1 Mac Mini, they are about 17%. But on my i7-10610U laptop (Comet Lake, so Skylake microarchitecture), the function is 78% faster. This is probably because large divisions (divisor over 255) seem to hurt a lot on Skylake, but I haven't gone through it in detail.

Change-Id: I5b67c257d788a3f7d1be7065d055456852451d68
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4110741
Commit-Queue: Steinar H Gunderson <sesse@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Cr-Commit-Position: refs/heads/main@{#84906}
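
The first two changes listed in the commit message can be illustrated with a minimal sketch. This is not the code from the commit: the helper names `DivideBy10` and `MulHighRounded` are hypothetical, the divisor 10 is only one example of a power of ten, and the rounding step in `MulHighRounded` is an assumption about what a `DiyFp`-style multiplication wants. The `__int128` path relies on a GCC/Clang extension; a portable fallback is included for other compilers.

```cpp
#include <cstdint>

// Hypothetical helper (not V8's code): computes n / 10 for any 32-bit n
// without a division instruction. 0xCCCCCCCD is ceil(2^35 / 10), so shifting
// the 64-bit product right by 35 yields the exact quotient.
inline uint32_t DivideBy10(uint32_t n) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(n) * 0xCCCCCCCDull) >> 35);
}

// Hypothetical helper: upper 64 bits of the 128-bit product a * b, rounded
// to nearest by adding 2^63 before the shift (assumed rounding behaviour).
inline uint64_t MulHighRounded(uint64_t a, uint64_t b) {
#if defined(__SIZEOF_INT128__)
  // One widening hardware multiply on typical 64-bit targets.
  unsigned __int128 product = static_cast<unsigned __int128>(a) * b;
  product += static_cast<unsigned __int128>(1) << 63;
  return static_cast<uint64_t>(product >> 64);
#else
  // Portable fallback: four 32x32 -> 64-bit partial products.
  const uint64_t kMask32 = 0xFFFFFFFFu;
  const uint64_t a_hi = a >> 32, a_lo = a & kMask32;
  const uint64_t b_hi = b >> 32, b_lo = b & kMask32;
  const uint64_t hi_hi = a_hi * b_hi;
  const uint64_t lo_hi = a_lo * b_hi;
  const uint64_t hi_lo = a_hi * b_lo;
  const uint64_t lo_lo = a_lo * b_lo;
  // Sum the middle column, add 2^31 to round, and carry into the top half.
  uint64_t middle = (lo_lo >> 32) + (hi_lo & kMask32) + (lo_hi & kMask32);
  middle += 1u << 31;
  return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (middle >> 32);
#endif
}
```

Both branches of `MulHighRounded` return the same value; the difference is only in how many multiply and add instructions the compiler has to emit.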
cpp	Microoptimizations in FastDtoa.	2022-12-16 15:03:39 +00:00
csuite	[infra] Change all Python shebangs to Python3	2022-08-05 14:55:00 +00:00
benchmarks.status	Re-enable octane/typescript for deopt_fuzzer	2022-09-09 08:34:45 +00:00
BUILD.gn	[build] Add data deps for d8 test suites	2018-03-26 13:44:58 +00:00
testcfg.py	[test] Refactor testrunner (4)	2022-07-18 09:52:24 +00:00