The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance to the generic tanf.
The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128 bit routine for ABIs that do not support it natively.
Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):
latency master patched improvement
x86_64 82.3961 54.8052 33.49%
x86_64v2 82.3415 54.8052 33.44%
x86_64v3 69.3661 50.4864 27.22%
i686 219.271 45.5396 79.23%
aarch64 29.2127 19.1951 34.29%
power10 19.5060 16.2760 16.56%
reciprocal-throughput master patched improvement
x86_64 28.3976 19.7334 30.51%
x86_64v2 28.4568 19.7334 30.65%
x86_64v3 21.1815 16.1811 23.61%
i686 105.016 15.1426 85.58%
aarch64 18.1573 10.7681 40.70%
power10 8.7207 8.7097 0.13%
Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>