Commit Graph

26 Commits

Author SHA1 Message Date
Adhemerval Zanella
0e0be3ed80 math: Use tanhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic tanhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      51.5273        41.0951        20.25%
x86_64v2                    47.7021        39.1526        17.92%
x86_64v3                    45.0373        34.2737        23.90%
i686                       133.9970        83.8596        37.42%
aarch64 (Neoverse)          21.5439        14.7961        31.32%
power10                     13.3301         8.4406        36.68%

reciprocal-throughput        master        patched   improvement
x86_64                      24.9493        12.8547        48.48%
x86_64v2                    20.7051        12.7761        38.29%
x86_64v3                    19.2492        11.0851        42.41%
i686                        78.6498        29.8211        62.08%
aarch64 (Neoverse)          11.6026        7.11487        38.68%
power10                      6.3328         2.8746        54.61%
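
The improvement column in these tables is consistent with the relative
reduction (master - patched) / master. The following is a minimal Python
sketch of that arithmetic; the function name is illustrative and is not
part of glibc's benchtests.

```python
def improvement_pct(master: float, patched: float) -> float:
    """Relative time reduction in percent; negative values are slowdowns."""
    return (master - patched) / master * 100.0


if __name__ == "__main__":
    # x86_64 latency row from the tanhf table: master 51.5273, patched 41.0951.
    print(f"{improvement_pct(51.5273, 41.0951):.2f}%")  # prints 20.25%
```

A negative result means the patched code is slower than master.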

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
1751c0519a math: Use sinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic sinhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.6819        49.1489         6.71%
x86_64v2                    49.1162        42.9447        12.57%
x86_64v3                    46.9732        39.9157        15.02%
i686                       141.1470       129.6410         8.15%
aarch64 (Neoverse)          20.8539        17.1288        17.86%
power10                     14.5258         9.1906        36.73%

reciprocal-throughput        master        patched   improvement
x86_64                      27.5553        23.9395        13.12%
x86_64v2                    21.6423        20.3219         6.10%
x86_64v3                    21.4842        16.0224        25.42%
i686                        87.9709        86.1626         2.06%
aarch64 (Neoverse)          15.1919        12.2744        19.20%
power10                      7.2188         5.2611        27.12%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
9583836785 math: Use coshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode),
although it shows worse performance than the current one.  The current
implementation's performance comes mainly from the internal usage of
the optimized expf implementation, and it shows a maximum error of 2 ULPs
for FE_TONEAREST and 3 ULPs for other rounding modes.
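
As an illustration of what an error in ULPs means for a binary32 result,
here is a simplified Python sketch. It assumes a binary64 value as the
reference and handles the normal range only; the helper names are
hypothetical, and this is not how glibc's libm-test-ulps files are
generated.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ulp_f32(x: float) -> float:
    """Spacing of binary32 values at magnitude |x| (normal range only)."""
    _, e = math.frexp(abs(x))  # abs(x) = m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, e - 24)  # binary32 has a 24-bit significand


def ulp_error(computed: float, reference: float) -> float:
    """Distance between computed and reference, in binary32 ULPs."""
    return abs(computed - reference) / ulp_f32(reference)


if __name__ == "__main__":
    reference = math.cosh(1.0)
    # Rounding the reference to nearest stays within 0.5 ULP of it.
    print(ulp_error(to_f32(reference), reference) <= 0.5)
    # A result two binary32 steps away from 1.0 has an error of 2 ULPs.
    print(ulp_error(1.0 + 2 * 2.0 ** -23, 1.0))
```

By this measure, a result correctly rounded to nearest is at most 0.5 ULP
from the reference.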

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      40.6995        49.0737       -20.58%
x86_64v2                    40.5841        44.3604        -9.30%
x86_64v3                    39.3879        39.7502        -0.92%
i686                       112.3380       129.8570       -15.59%
aarch64 (Neoverse)          18.6914        17.0946         8.54%
power10                     11.1343         9.3245        16.25%

reciprocal-throughput        master        patched   improvement
x86_64                      18.6471        24.1077       -29.28%
x86_64v2                    17.7501        20.2946       -14.34%
x86_64v3                    17.8262        17.1877         3.58%
i686                        64.1454        86.5645       -34.95%
aarch64 (Neoverse)          9.77226        12.2314       -25.16%
power10                      4.0200         5.3316       -32.63%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
7cfd8b5698 math: Use atanhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      59.4930        45.8568        22.92%
x86_64v2                    59.5705        45.5804        23.48%
x86_64v3                    53.1838        37.7155        29.08%
i686                        169.354       133.5940        21.12%
aarch64 (Neoverse)          26.0781        16.9829        34.88%
power10                     15.6591        10.7623        31.27%

reciprocal-throughput        master        patched   improvement
x86_64                      23.5903        18.5766        21.25%
x86_64v2                    22.6489        18.2683        19.34%
x86_64v3                    19.0401        13.9474        26.75%
i686                        97.6034       107.3260        -9.96%
aarch64 (Neoverse)          15.3664        9.57846        37.67%
power10                      6.8877         4.6242        32.86%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
6f9bacf36b math: Use atan2f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atan2f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      68.1175        69.2014        -1.59%
x86_64v2                    66.9884        66.0081         1.46%
x86_64v3                    57.7034        61.6407        -6.82%
i686                       189.8690       152.7560        19.55%
aarch64 (Neoverse)          32.6151        24.5382        24.76%
power10                     21.7282        17.1896        20.89%

reciprocal-throughput        master        patched   improvement
x86_64                      34.5202        31.6155         8.41%
x86_64v2                    32.6379        30.3372         7.05%
x86_64v3                    34.3677        23.6455        31.20%
i686                       157.7290        75.8308        51.92%
aarch64 (Neoverse)          27.7788        16.2671        41.44%
power10                     15.5715         8.1588        47.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
a357d6273f math: Use atanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic atanf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      56.8265        53.6842         5.53%
x86_64v2                    54.8177        53.6842         2.07%
x86_64v3                    46.2915        48.7034        -5.21%
i686                       158.3760       108.9560        31.20%
aarch64 (Neoverse)           21.687        20.5893         5.06%
power10                     13.1903        13.5012        -2.36%

reciprocal-throughput        master        patched   improvement
x86_64                      16.6787        16.7601        -0.49%
x86_64v2                    16.6983        16.7601        -0.37%
x86_64v3                    16.2268        12.1391        25.19%
i686                       138.6840        36.0640        74.00%
aarch64 (Neoverse)          11.8012        10.3565        12.24%
power10                      5.3212         4.2894        19.39%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
ed608a40e2 math: Use asinhf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinhf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      64.5128        56.9717        11.69%
x86_64v2                    63.3065        57.2666         9.54%
x86_64v3                    62.8719        51.4170        18.22%
i686                       189.1630        137.635        27.24%
aarch64 (Neoverse)          25.3551        20.5757        18.85%
power10                     17.9712        13.3302        25.82%

reciprocal-throughput        master        patched   improvement
x86_64                      20.0844        15.4731        22.96%
x86_64v2                    19.2919        15.4000        20.17%
x86_64v3                    18.7226        11.9009        36.44%
i686                       103.7670        80.2681        22.65%
aarch64 (Neoverse)          12.5005        8.68969        30.49%
power10                      7.2220        5.03617        30.27%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
5fb4b566ef math: Use asinf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic asinf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      42.8237        35.2460        17.70%
x86_64v2                    43.3711        35.9406        17.13%
x86_64v3                    35.0335        30.5744        12.73%
i686                       213.8780       104.4710        51.15%
aarch64 (Neoverse)          17.2937        13.6025        21.34%
power10                     12.0227         7.4241        38.25%

reciprocal-throughput        master        patched   improvement
x86_64                      13.6770        15.5231       -13.50%
x86_64v2                    13.8722        16.0446       -15.66%
x86_64v3                    13.6211        13.2753         2.54%
i686                       186.7670        45.4388        75.67%
aarch64 (Neoverse)          9.96089        9.39285         5.70%
power10                      4.9862         3.7819        24.15%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
673e6fe110 math: Use acoshf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acoshf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      61.2471        58.7742         4.04%
x86_64-v2                   62.6519        59.0523         5.75%
x86_64-v3                   58.7408        50.1393        14.64%
aarch64                     24.8580        21.3317        14.19%
power10                     17.0469        13.1345        22.95%

reciprocal-throughput        master        patched   improvement
x86_64                      16.1618        15.1864         6.04%
x86_64-v2                   15.7729        14.7563         6.45%
x86_64-v3                   14.1669        11.9568        15.60%
aarch64                      10.911         9.5486        12.49%
power10                     6.38196        5.06734        20.60%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Adhemerval Zanella
66fa7ad437 math: Use acosf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic acosf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      52.5098        36.6312        30.24%
x86_64v2                    53.0217        37.3091        29.63%
x86_64v3                    42.8501        32.3977        24.39%
i686                       207.3960       109.4000        47.25%
aarch64                     21.3694        13.7871        35.48%
power10                     14.5542         7.2891        49.92%

reciprocal-throughput        master        patched   improvement
x86_64                      14.1487        15.9508       -12.74%
x86_64v2                    14.3293        16.1899       -12.98%
x86_64v3                    13.6563        12.6161         7.62%
i686                       158.4060        45.7354        71.13%
aarch64                     12.5515        9.19233        26.76%
power10                      5.7868         3.3487        42.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-12-18 17:24:43 -03:00
Stafford Horne
afac8b1311 or1k: Update libm-test-ulps
Regen to add new functions acospi, asinpi, atan2pi and atanpi.
2024-12-15 00:42:27 +00:00
Stafford Horne
e4e49583d9 or1k: Update libm-test-ulps
Pick up new functions cospi, "Imaginary part of csin", exp10m1, exp2m1,
log10p1, log2p1, sinpi and tanpi.
2024-12-13 07:20:32 +00:00
Adhemerval Zanella
bccb0648ea math: Use tanf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic tanf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, and to use a generic
128-bit routine for ABIs that do not support it natively.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       82.3961       54.8052       33.49%
x86_64v2                     82.3415       54.8052       33.44%
x86_64v3                     69.3661       50.4864       27.22%
i686                         219.271       45.5396       79.23%
aarch64                      29.2127       19.1951       34.29%
power10                      19.5060       16.2760       16.56%

reciprocal-throughput         master       patched  improvement
x86_64                       28.3976       19.7334       30.51%
x86_64v2                     28.4568       19.7334       30.65%
x86_64v3                     21.1815       16.1811       23.61%
i686                         105.016       15.1426       85.58%
aarch64                      18.1573       10.7681       40.70%
power10                       8.7207        8.7097        0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella
d846f4c12d math: Use lgammaf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic lgammaf.

The code was adapted to glibc style, to use the definition of
math_config.h, to remove errno handling, to use math_narrow_eval
on overflow, and to make it reentrant.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       86.5609       70.3278       18.75%
x86_64v2                     78.3030       69.9709       10.64%
x86_64v3                     74.7470       59.8457       19.94%
i686                         387.355       229.761       40.68%
aarch64                      40.8341       33.7563       17.33%
power10                      26.5520       16.1672       39.11%
powerpc                      28.3145       17.0625       39.74%

reciprocal-throughput         master       patched  improvement
x86_64                       68.0461       48.3098       29.00%
x86_64v2                     55.3256       47.2476       14.60%
x86_64v3                     52.3015       38.9028       25.62%
i686                         340.848       195.707       42.58%
aarch64                      36.8000       30.5234       17.06%
power10                      20.4043       12.6268       38.12%
powerpc                      22.6588       13.8866       38.71%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella
baa495f231 math: Use erfcf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic erfcf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       98.8796       66.2142       33.04%
x86_64v2                     98.9617       67.4221       31.87%
x86_64v3                     87.4161       53.1754       39.17%
aarch64                      33.8336       22.0781       34.75%
power10                      21.1750       13.5864       35.84%
powerpc                      21.4694       13.8149       35.65%

reciprocal-throughput         master       patched  improvement
x86_64                       48.5620       27.6731       43.01%
x86_64v2                     47.9497       28.3804       40.81%
x86_64v3                     42.0255       18.1355       56.85%
aarch64                      24.3938       13.4041       45.05%
power10                      10.4919        6.1881       41.02%
powerpc                       11.763       6.76468       42.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella
994fec2397 math: Use erff from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic erff.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master       patched  improvement
x86_64                       85.7363       45.1372       47.35%
x86_64v2                     86.6337       38.5816       55.47%
x86_64v3                     71.3810       34.0843       52.25%
i686                         190.143       97.5014       48.72%
aarch64                      34.9091       14.9320       57.23%
power10                      38.6160        8.5188       77.94%
powerpc                      39.7446       8.45781       78.72%

reciprocal-throughput         master       patched  improvement
x86_64                       35.1739       14.7603       58.04%
x86_64v2                     34.5976       11.2283       67.55%
x86_64v3                     27.3260        9.8550       63.94%
i686                         91.0282       30.8840       66.07%
aarch64                      22.5831        6.9615       69.17%
power10                      18.0386        3.0918       82.86%
powerpc                      20.7277       3.63396       82.47%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-22 10:52:27 -03:00
Adhemerval Zanella
c5d241f06b math: Use cbrtf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic cbrtf.

The code was adapted to glibc style and to use the definition of
math_config.h.

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

latency                       master        patched       improvement
x86_64                       68.6348        36.8908            46.25%
x86_64v2                     67.3418        36.6968            45.51%
x86_64v3                     63.4981        32.7859            48.37%
aarch64                      29.3172        12.1496            58.56%
power10                      18.0845         8.8893            50.85%
powerpc                      18.0859        8.79527            51.37%

reciprocal-throughput         master        patched       improvement
x86_64                       36.4369        13.3565            63.34%
x86_64v2                     37.3611        13.1149            64.90%
x86_64v3                     31.6024        11.2102            64.53%
aarch64                      18.6866         7.3474            60.68%
power10                       9.4758         3.6329            61.66%
powerpc                      9.58896        3.90439            59.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-11-22 10:01:03 -03:00
Adhemerval Zanella
8ae9e51376 math: Use log1pf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slightly better performance than the generic log1pf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      71.8142        38.9668        45.74%
x86_64v2                    71.9094        39.1321        45.58%
x86_64v3                    60.1000        32.4016        46.09%
i686                        147.105        104.258        29.13%
aarch64                     26.4439        14.0050        47.04%
power10                     19.4874         9.4146        51.69%
powerpc                     17.6145        8.00736        54.54%

reciprocal-throughput        master        patched   improvement
x86_64                      19.7604        12.7254        35.60%
x86_64v2                    19.0039        11.9455        37.14%
x86_64v3                    16.8559        11.9317        29.21%
i686                        82.3426        73.9718        10.17%
aarch64                     14.4665         7.9614        44.97%
power10                     11.9974         8.4117        29.89%
powerpc                     7.15222         6.0914        14.83%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:39 -03:00
Adhemerval Zanella
bbd578b38d math: Use expm1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic expm1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      96.7402        36.4026        62.37%
x86_64v2                    97.5391        33.4625        65.69%
x86_64v3                    82.1778        30.8668        62.44%
i686                         120.58        94.8302        21.35%
aarch64                     32.3558        12.8881        60.17%
power10                     23.5087         9.8574        58.07%
powerpc                     23.4776        9.06325        61.40%

reciprocal-throughput        master        patched   improvement
x86_64                      27.8224        15.9255        42.76%
x86_64v2                    27.8364         9.6438        65.36%
x86_64v3                    20.3227         9.6146        52.69%
i686                        63.5629        59.4718         6.44%
aarch64                     17.4838         7.1082        59.34%
power10                     12.4644         8.7829        29.54%
powerpc                     14.2152        5.94765        58.16%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:35 -03:00
Paul Zimmermann
392b3f0971 replace tgammaf by the CORE-MATH implementation
The CORE-MATH implementation is correctly rounded (for any rounding mode).
This can be checked by exhaustive tests in a few minutes, since there are
fewer than 2^32 values to check against, for example, GNU MPFR.
This patch also adds some bench values for tgammaf.
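
The exhaustive check is feasible because every binary32 input can be
enumerated by its 32-bit pattern. The following Python sketch shows only
that enumeration. As a stand-in for GNU MPFR it uses math.gamma, which is
a binary64 function and not a correctly rounded reference. The function
names and the stride parameter (used to subsample so the demo finishes
quickly) are hypothetical.

```python
import math
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def count_mismatches(candidate, reference, stride: int = 1) -> int:
    """Compare candidate(x) with the rounded reference over binary32 inputs."""
    mismatches = 0
    for bits in range(0, 1 << 32, stride):
        x = f32_from_bits(bits)
        if math.isnan(x) or math.isinf(x):
            continue
        try:
            expected = round_to_f32(reference(x))
            got = candidate(x)
        except (ValueError, OverflowError):
            continue  # poles and overflow are skipped in this sketch
        if got != expected:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    # Stand-in candidate: the reference itself, rounded to binary32.
    candidate = lambda x: round_to_f32(math.gamma(x))
    print(count_mismatches(candidate, math.gamma, stride=1 << 20))
```

With stride=1 the loop visits all 2^32 patterns. A real check would call
the tgammaf implementation under test and compare it with a correctly
rounded reference in each rounding mode.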

Tested on x86_64 and x86 (cfarm26).

With the initial GNU libc code, the benchmark gave on an Intel(R) Core(TM) i7-8700:

      "tgammaf": {
       "": {
        "duration": 3.50188e+09,
        "iterations": 2e+07,
        "max": 602.891,
        "min": 65.1415,
        "mean": 175.094
       }
      }

With the new code:

      "tgammaf": {
       "": {
        "duration": 3.30825e+09,
        "iterations": 5e+07,
        "max": 211.592,
        "min": 32.0325,
        "mean": 66.1649
       }
      }

With the initial GNU libc code, the benchmark gave on cfarm26 (i686):

  "tgammaf": {
   "": {
    "duration": 3.70505e+09,
    "iterations": 6e+06,
    "max": 2420.23,
    "min": 243.154,
    "mean": 617.509
   }
  }

With the new code:

  "tgammaf": {
   "": {
    "duration": 3.24497e+09,
    "iterations": 1.8e+07,
    "max": 1238.15,
    "min": 101.155,
    "mean": 180.276
   }
  }
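
In the records above, the "mean" field appears to equal "duration"
divided by "iterations". The Python sketch below checks this for the
i7-8700 records and derives the resulting speedup; the helper names are
illustrative, and the records are trimmed to the fields used.

```python
import json

# i7-8700 records copied from the benchmark output above (fields trimmed).
OLD = json.loads('{"duration": 3.50188e+09, "iterations": 2e+07, "mean": 175.094}')
NEW = json.loads('{"duration": 3.30825e+09, "iterations": 5e+07, "mean": 66.1649}')


def mean_from_totals(record: dict) -> float:
    """Recompute the mean time per call from the totals."""
    return record["duration"] / record["iterations"]


def speedup(old: dict, new: dict) -> float:
    """Ratio of the old mean to the new mean; above 1 means faster."""
    return old["mean"] / new["mean"]


if __name__ == "__main__":
    print(round(mean_from_totals(OLD), 3), round(mean_from_totals(NEW), 3))
    print(round(speedup(OLD, NEW), 2))
```

The same calculation applies to the cfarm26 (i686) records.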

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

Changes in v2:
    - include <math.h> (fix the linknamespace failures)
    - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file
    - restored original wrapper code (math/w_tgammaf_compat.c),
      except for the dealing with the sign
    - removed the tgammaf/float entries in all libm-test-ulps files
    - address other comments from Joseph Myers
      (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html)

Changes in v3:
    - pass NULL argument for signgam from w_tgammaf_compat.c
    - use of math_narrow_eval
    - added more comments

Changes in v4:
    - initialize local_signgam to 0 in math/w_tgamma_template.c
    - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file

Changes in v5:
    - do not mention local_signgam any more in math/w_tgammaf_compat.c
    - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c
      and added comment

Changes in v6:
    - pass NULL as 2nd argument of __ieee754_gammaf_r in
      w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c

Changes in v7:
    - added Signed-off-by line for Alexei Sibidanov (author of the code)

Changes in v8:
    - added Signed-off-by line for Paul Zimmermann (submitter of the patch)

Changes in v9:
    - address comments from review by Adhemerval Zanella
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-10-11 11:12:32 +02:00
Joseph Myers
bb014f50c4 Implement C23 logp1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p).  As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.

Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.

The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either).  It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately.  For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:47:09 +00:00
Stafford Horne
b57adfa49b or1k: Add hard float libm-test-ulps
This patch adds the ulps test file to prepare for the upcoming
hard float patch.  This is separated out to make the hard float patch
smaller.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-03 18:28:18 +01:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights
2024-01-01 10:53:40 -08:00
Joseph Myers
6d7e8eda9b Update copyright dates with scripts/update-copyrights
2023-01-06 21:14:39 +00:00
Stafford Horne
0c3c62ca7d or1k: Build Infrastructure
Here we define the minimum Linux kernel version as 5.4.0, as that is the
long-term support version where 32-bit architectures start to support
64-bit time APIs.  The OpenRISC kernel had some bugs up until version 5.8
that caused issues with glibc fork/clone; the fixes have been backported
to 5.4 but not to previous versions.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-01-05 06:40:06 +09:00
Stafford Horne
9a47b9660b or1k: math soft float support
OpenRISC supports hard float, but I would like to submit that after glibc
soft float goes upstream.  The hard float support depends on adding user
access to the FPCSR, which is not supported by the kernel yet.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-01-05 06:40:06 +09:00