v8/cpp at fa73edeb1ef41c9f9430d16bb20991614c5405b5 - v8

Steinar H. Gunderson 6da6e45099 Microoptimizations in FastDtoa.

Optimize FastDtoa, in particular Grisu3. In addition to adding a microbenchmark, this makes a number of smaller and larger changes:

- Replace divisions by powers of ten with multiplications by their inverses, using an algorithm very similar to the one in libdivide.
- For DiyFp::Times(), use 128-bit hardware multiplication if available (which it generally is on 64-bit platforms).
- Where possible, pass around a pointer to the end of the string instead of a pointer and a length, reducing register pressure (especially on Intel). Where that is not (easily) possible, add a local variable so the compiler knows that length and decimal_point cannot alias.
- Change some ints to unsigneds where it helps avoid sign extensions.
- Make some minor changes to reduce instruction dependency chains.
- Inline BiggestPowerTen().

The actual performance gain varies widely between platforms. On my 3990X workstation (Zen 2), gains are about 21%. On an M1 Mac Mini, they are about 17%. But on my i7-10610U laptop (Comet Lake, i.e. Skylake microarchitecture), the function is 78% faster. This is probably because large divisions (divisor over 255) seem to hurt a lot on Skylake, but I haven't gone through it in detail.

Change-Id: I5b67c257d788a3f7d1be7065d055456852451d68
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/4110741
Commit-Queue: Steinar H Gunderson <sesse@chromium.org>
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Cr-Commit-Position: refs/heads/main@{#84906}

2022-12-16 15:03:39 +00:00
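The commit message names two arithmetic tricks: replacing division by a power of ten with multiplication by a precomputed inverse, and using 128-bit hardware multiplication in DiyFp::Times(). The following is a minimal C++ sketch of both ideas, not V8's code. It assumes a GCC/Clang-style compiler that provides the `unsigned __int128` extension. The names `MulHighRoundedPortable`, `MulHighRounded128` and `Reciprocal32` are hypothetical, and the reciprocal uses the "fastdiv" construction from Lemire et al. rather than libdivide's exact algorithm.

```cpp
#include <cassert>
#include <cstdint>

using std::uint32_t;
using std::uint64_t;

// Hypothetical helpers for illustration only; not V8's FastDtoa code.
// Requires the GCC/Clang extension type unsigned __int128.

// High 64 bits of x * y, rounded half up, built from four 32x32->64
// partial products (the portable scheme used by DiyFp-style multiplies).
uint64_t MulHighRoundedPortable(uint64_t x, uint64_t y) {
  const uint64_t kM32 = 0xFFFFFFFFu;
  const uint64_t a = x >> 32, b = x & kM32;
  const uint64_t c = y >> 32, d = y & kM32;
  const uint64_t ac = a * c, bc = b * c, ad = a * d, bd = b * d;
  // Middle 32-bit column plus the carry in from the low column.
  uint64_t tmp = (bd >> 32) + (ad & kM32) + (bc & kM32);
  tmp += uint64_t{1} << 31;  // Adds 2^63 to the full product: round half up.
  return ac + (ad >> 32) + (bc >> 32) + (tmp >> 32);
}

// Same result with a single hardware 64x64->128 multiplication.
uint64_t MulHighRounded128(uint64_t x, uint64_t y) {
  unsigned __int128 p = static_cast<unsigned __int128>(x) * y;
  p += static_cast<unsigned __int128>(1) << 63;  // Round half up.
  return static_cast<uint64_t>(p >> 64);
}

// Division of any 32-bit n by a fixed divisor d via a precomputed inverse.
// m = floor((2^64 - 1) / d) + 1 equals ceil(2^64 / d) for d >= 2, and
// (n * m) >> 64 == n / d holds for every 32-bit n (Lemire et al., "fastdiv").
class Reciprocal32 {
 public:
  explicit Reciprocal32(uint32_t d) : d_(d) {
    assert(d >= 2);  // For d == 1 the multiplier would wrap to 0.
    m_ = UINT64_MAX / d + 1;
  }
  // One multiply and one shift instead of a hardware divide.
  uint32_t Divide(uint32_t n) const {
    return static_cast<uint32_t>((static_cast<unsigned __int128>(m_) * n) >> 64);
  }
  // Quotient * d never exceeds n, so this cannot wrap.
  uint32_t Remainder(uint32_t n) const { return n - Divide(n) * d_; }

 private:
  uint32_t d_;
  uint64_t m_;
};
```

For example, `Reciprocal32 r(10);` followed by `r.Divide(n)` and `r.Remainder(n)` peels off the last decimal digit of `n` without a `div` instruction. Because the divisors in digit generation are a small fixed set of powers of ten, the multipliers can be computed once up front; the commit attributes its largest gain to Skylake, where it suspects large divisions are especially slow.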
cppgc	cppgc: Add binary trees benchmark	2021-11-18 16:42:24 +00:00
BUILD.gn	Microoptimizations in FastDtoa.	2022-12-16 15:03:39 +00:00
DEPS	cppgc: Fix empty benchmark on Windows	2021-04-27 07:52:52 +00:00
dtoa.cc	Microoptimizations in FastDtoa.	2022-12-16 15:03:39 +00:00
empty.cc	cppgc: Fix empty benchmark on Windows	2021-04-27 07:52:52 +00:00