Commit Graph

206 Commits

Author SHA1 Message Date
Sunil K Pandey
a3ed5cf2ab x86_64: Fix svml_d_asin4_core_avx2.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
80766b1407 x86_64: Fix svml_d_asin2_core_sse4.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
3bc1831523 x86_64: Fix svml_s_asinf8_core_avx2.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
e597cf3975 x86_64: Fix svml_s_asinf4_core_sse4.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
62871830a4 x86_64: Fix svml_s_asinf16_core_avx512.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
7a5806ce1c x86_64: Fix svml_d_acosh8_core_avx512.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
e71f7abba6 x86_64: Fix svml_d_acosh4_core_avx2.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
92127a8f41 x86_64: Fix svml_d_acosh2_core_sse4.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
dfa127e854 x86_64: Fix svml_s_acoshf8_core_avx2.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
c90f386276 x86_64: Fix svml_s_acoshf4_core_sse4.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
29f1d36687 x86_64: Fix svml_s_acoshf16_core_avx512.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
67031a3934 x86_64: Fix svml_d_acos8_core_avx512.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
656ff2e94e x86_64: Fix svml_d_acos4_core_avx2.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
97f8492788 x86_64: Fix svml_d_acos2_core_sse4.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
35668c8d94 x86_64: Fix svml_s_acosf8_core_avx2.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
c321692af7 x86_64: Fix svml_s_acosf4_core_sse4.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
5f7fb3ea48 x86_64: Fix svml_s_acosf16_core_avx512.S code formatting (supplemental)
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:14:09 -08:00
Sunil K Pandey
f42415c736 x86_64: Fix svml_s_acosf16_core_avx512.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-04 22:50:35 -08:00
Sunil K Pandey
49e2bf58d5 x86_64: Fix SSE4.2 libmvec atan2 function accuracy [BZ #28765]
This patch fixes SSE4.2 libmvec atan2 function accuracy for following
inputs to less than 4 ulps.

{0x1.bcab29da0e947p-54,0x1.bc41f4d2294b8p-54}   4.19888 ulps
{0x1.b836ed678be29p-588,0x1.b7be6f5a03a8cp-588} 4.09889 ulps

This fixes BZ #28765.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-01-12 13:23:22 -08:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Sunil K Pandey
c21c7bc24e x86-64: Add vector tan/tanf implementation to libmvec
Implement vectorized tan/tanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector tan/tanf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-30 10:19:13 -08:00
Sunil K Pandey
8881cca8fb x86-64: Add vector erfc/erfcf implementation to libmvec
Implement vectorized erfc/erfcf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector erfc/erfcf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-30 10:19:03 -08:00
Sunil K Pandey
e682d01578 x86-64: Add vector asinh/asinhf implementation to libmvec
Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector asinh/asinhf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:56 -08:00
Sunil K Pandey
c0f36fc303 x86-64: Add vector tanh/tanhf implementation to libmvec
Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector tanh/tanhf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:50 -08:00
Sunil K Pandey
f9ce13fdac x86-64: Add vector erf/erff implementation to libmvec
Implement vectorized erf/erff containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector erf/erff with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:44 -08:00
Sunil K Pandey
0625489ccc x86-64: Add vector acosh/acoshf implementation to libmvec
Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector acosh/acoshf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:39 -08:00
Sunil K Pandey
6dea4dd3da x86-64: Add vector atanh/atanhf implementation to libmvec
Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector atanh/atanhf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:34 -08:00
Sunil K Pandey
74265c16ab x86-64: Add vector log1p/log1pf implementation to libmvec
Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector log1p/log1pf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:27 -08:00
Sunil K Pandey
7e1722fec8 x86-64: Add vector log2/log2f implementation to libmvec
Implement vectorized log2/log2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector log2/log2f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:21 -08:00
Sunil K Pandey
8f8566026d x86-64: Add vector log10/log10f implementation to libmvec
Implement vectorized log10/log10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector log10/log10f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:15 -08:00
Sunil K Pandey
2941a24f8c x86-64: Add vector atan2/atan2f implementation to libmvec
Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector atan2/atan2f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:09 -08:00
Sunil K Pandey
2bf02c5843 x86-64: Add vector cbrt/cbrtf implementation to libmvec
Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:38:02 -08:00
Sunil K Pandey
aa1809a1df x86-64: Add vector sinh/sinhf implementation to libmvec
Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector sinh/sinhf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:55 -08:00
Sunil K Pandey
76ddc74e86 x86-64: Add vector expm1/expm1f implementation to libmvec
Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector expm1/expm1f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:49 -08:00
Sunil K Pandey
ef7ea9c132 x86-64: Add vector cosh/coshf implementation to libmvec
Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector cosh/coshf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:42 -08:00
Sunil K Pandey
8b726453d5 x86-64: Add vector exp10/exp10f implementation to libmvec
Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector exp10/exp10f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:35 -08:00
Sunil K Pandey
3fc9ccc20b x86-64: Add vector exp2/exp2f implementation to libmvec
Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:29 -08:00
Sunil K Pandey
37475ba883 x86-64: Add vector hypot/hypotf implementation to libmvec
Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector hypot/hypotf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:21 -08:00
Sunil K Pandey
11c01de14c x86-64: Add vector asin/asinf implementation to libmvec
Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector asin/asinf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:03 -08:00
Sunil K Pandey
146310177a x86-64: Add vector atan/atanf implementation to libmvec
Implement vectorized atan/atanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector atan/atanf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:36:46 -08:00
Sunil K Pandey
f20f980c71 x86-64: Add vector acos/acosf implementation to libmvec
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-22 13:03:14 -08:00
H.J. Lu
d962cce139 x86-64: Add sysdeps/x86_64/fpu/Makeconfig
1. Add sysdeps/x86_64/fpu/Makeconfig to auto-generate libmvec.mk, which
contains libmvec ABI test dependencies and CFLAGS, in the build directory.
2. Include libmvec.mk for libmvec ABI test dependencies and CFLAGS.

Tested on SSE4, AVX, AVX2 and AVX512 machines.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-10-20 11:53:45 -07:00
Joseph Myers
b3f27d8150 Add narrowing fma functions
This patch adds the narrowing fused multiply-add functions from TS
18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal,
f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x,
f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128,
f64xfmaf128 for configurations with _Float64x and _Float128;
__f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case
(for calls to ffmal and dfmal when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.

The changes are mostly similar to those for the other narrowing
functions previously added, especially that for sqrt, so the
description of those generally applies to this patch as well.  As with
sqrt, I reused the same test inputs in auto-libm-test-in as for
non-narrowing fma rather than adding extra or separate inputs for
narrowing fma.  The tests in libm-test-narrow-fma.inc also follow
those for non-narrowing fma.

The non-narrowing fma has a known bug (bug 6801) that it does not set
errno on errors (overflow, underflow, Inf * 0, Inf - Inf).  Rather
than fixing this or having narrowing fma check for errors when
non-narrowing does not (complicating the cases when narrowing fma can
otherwise be an alias for a non-narrowing function), this patch does
not attempt to check for errors from narrowing fma and set errno; the
CHECK_NARROW_FMA macro is still present, but as a placeholder that
does nothing, and this missing errno setting is considered to be
covered by the existing bug rather than needing a separate open bug.
missing-errno annotations are duly added to many of the
auto-libm-test-in test inputs for fma.

This completes adding all the new functions from TS 18661-1 to glibc,
so will be followed by corresponding stdc-predef.h changes to define
__STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support
for TS 18661-1 will be at a similar level to that for C standard
floating-point facilities up to C11 (pragmas not implemented, but
library functions done).  (There are still further changes to be done
to implement changes to the types of fromfp functions from N2548.)

Tested as followed: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float).  The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
2021-09-22 21:25:31 +00:00
Joseph Myers
4b6574a6f6 Redirect fma calls to __fma in libm
include/math.h has a mechanism to redirect internal calls to various
libm functions, that can often be inlined by the compiler, to call
non-exported __* names for those functions in the case when the calls
aren't inlined, with the redirection being disabled when
NO_MATH_REDIRECT.  Add fma to the functions to which this mechanism is
applied.

At present, libm-internal fma calls (generally to __builtin_fma*
functions) are only done when it's known the call will be inlined,
with alternative code not relying on an fma operation being used in
the caller otherwise.  This patch is in preparation for adding the TS
18661 / C2X narrowing fma functions to glibc; it will be natural for
the narrowing function implementations to call the underlying fma
functions unconditionally, with this either being inlined or resulting
in an __fma* call.  (Using two levels of round-to-odd computation like
that, in the case where there isn't an fma hardware instruction, isn't
optimal but is certainly a lot simpler for the initial implementation
than writing different narrowing fma implementations for all the
various pairs of formats.)

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by the patch (using
<https://sourceware.org/pipermail/libc-alpha/2021-September/130991.html>
to fix installed library stripping in build-many-glibcs.py).  Also
tested for x86_64.
2021-09-15 22:57:35 +00:00
Siddhesh Poyarekar
30891f35fa Remove "Contributed by" lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date.  Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.

Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions.  These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.

The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively.  These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:

https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-09-03 22:06:44 +05:30
H.J. Lu
528f9ff6bf x86-64: Remove assembler AVX512DQ check
The minimum GNU binutils requirement is 2.25 which supports AVX512DQ.
Remove assembler AVX512DQ check.
2021-08-24 07:05:35 -07:00
H.J. Lu
78c9ec9000 x86-64: Optimize load of all bits set into ZMM register [BZ #28252]
Optimize loads of all bits set into ZMM register in AVX512 SVML codes
by replacing

	vpbroadcastq .L_2il0floatpacket.16(%rip), %zmmX

and

	vmovups   .L_2il0floatpacket.13(%rip), %zmmX

with
	vpternlogd $0xff, %zmmX, %zmmX, %zmmX

This fixes BZ #28252.
2021-08-22 06:23:37 -07:00
Shen-Ta Hsieh
1683249d17 x86_64: roundeven with sse4.1 support
This patch adds support for the sse4.1 hardware floating point
roundeven.

Here is some benchmark results on my systems:

=AMD Ryzen 9 3900X 12-Core Processor=

* benchmark result before this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  3.75587e+09 |  3.75114e+09 |
| iterations |  3.93053e+08 |  4.35402e+08 |
| max        | 52.592       | 58.71        |
| min        |  7.98        |  7.22        |
| mean       |  9.55563     |  8.61535     |

* benchmark result after this commit
|            |     roundeven |   roundevenf |
|------------|---------------|--------------|
| duration   |   3.73815e+09 |  3.73738e+09 |
| iterations |   5.82692e+08 |  5.91498e+08 |
| max        |  56.468       | 51.642       |
| min        |   6.27        |  6.156       |
| mean       |   6.41532     |  6.3185      |

=Intel(R) Pentium(R) CPU D1508 @ 2.20GHz=

* benchmark result before this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  2.18208e+09 |  2.18258e+09 |
| iterations |  2.39932e+08 |  2.46924e+08 |
| max        | 96.378       | 98.035       |
| min        |  6.776       |  5.94        |
| mean       |  9.09456     |  8.83907     |

* benchmark result after this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  2.17415e+09 |  2.17005e+09 |
| iterations |  3.56193e+08 |  4.09824e+08 |
| max        | 51.693       | 97.192       |
| min        |  5.926       |  5.093       |
| mean       |  6.10385     |  5.29507     |

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00
Wilco Dijkstra
47ad14d789 math: Remove mpa files [BZ #15267]
Finally remove all mpa related files, headers, declarations, probes, unused
tables and update makefiles.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Anssi Hannula
69a7ca7705 ieee754: Remove unused __sin32 and __cos32
The __sin32 and __cos32 functions were only used in the now removed slow
path of asin and acos.
2020-12-18 12:10:31 +05:30
Ondřej Hošek
23af890b3f x86-64: Fix FMA4 detection in ifunc [BZ #26534]
A typo in commit 107e6a3c22 causes the
FMA4 code path to be taken on systems that support FMA, even if they do
not support FMA4. Fix this to detect FMA4.
2020-09-02 05:07:37 -07:00
H.J. Lu
107e6a3c22 x86: Support usable check for all CPU features
Support usable check for all CPU features with the following changes:

1. Change struct cpu_features to

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};

so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits.  EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.

The results are

1. If the feature is known and doesn't requre OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to supported by OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
2020-07-13 06:05:16 -07:00
Wilco Dijkstra
220622dde5 Add libm_alias_finite for _finite symbols
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol.  It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).

The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.

Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).

Passes buildmanyglibc.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2020-01-03 10:02:04 -03:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Stefan Liebler
1c94bf0f0a Always use wordsize-64 version of s_trunc.c.
This patch replaces s_trunc.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:14 +01:00
Stefan Liebler
9f234eafe8 Always use wordsize-64 version of s_ceil.c.
This patch replaces s_ceil.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:13 +01:00
Stefan Liebler
95b0c2c431 Always use wordsize-64 version of s_floor.c.
This patch replaces s_floor.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:12 +01:00
Stefan Liebler
ab48bdd098 Always use wordsize-64 version of s_rint.c.
This patch replaces s_rint.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 file.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:12 +01:00
Stefan Liebler
af123aa950 Always use wordsize-64 version of s_nearbyint.c.
This patch replaces s_nearbyint.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 file.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:11 +01:00
Wilco Dijkstra
d0007dc53c Remove x64 _finite tests and references
Remove _finite tests and references from x86_64.  Rather than calling
__exp_finite, use exp directly (since it's the same entry point).

x86_64 builds and passes testsuite.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-10-21 14:29:12 -03:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
H.J. Lu
ba4b8fab20 x86-64: Remove s_sincosf-sse2.S
The current s_sincosf.c is faster than s_sincosf-sse2.S.  On Broadwell
with FMA disabled, bench-sincosf shows:

       Before         After      Improvement
max    154.032        114.517        34%
min    6.25           5.609          11%
mean   14.8728        12.8589        15%

	* sysdeps/x86_64/fpu/s_sincosf.S: Removed.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.c: New file.
2018-12-26 06:58:31 -08:00
H.J. Lu
8700a7851b x86-64: Vectorize sincosf_poly and update s_sincosf-fma.c
Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized
sincosf_poly.  Add x86 sincosf_poly.h to vectorize sincosf_poly.  On
Broadwell, bench-sincosf shows:

       Before         After      Improvement
max    160.273        114.198        40%
min    6.25           5.625          11%
mean   13.0325        10.6462        22%

Vectorized sincosf_poly shows

       Before         After      Improvement
max    138.653        114.198        21%
min    5.004          5.625          -11%
mean   11.5934        10.6462        9%

Tested on x86-64 and i686 as well as with build-many-glibcs.py.

	* sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>.
	(sincos_t, sincosf_poly, sinf_poly): Moved to ...
	* sysdeps/ieee754/flt-32/sincosf_poly.h: Here.  New file.
	* sysdeps/x86/fpu/s_sincosf_data.c: New file.
	* sysdeps/x86/fpu/sincosf_poly.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include
	<sysdeps/ieee754/flt-32/s_sincosf.c>.
2018-12-26 06:56:13 -08:00
Szabolcs Nagy
a502c5294b Remove the error handling wrapper from pow
Introduce new pow symbol version that doesn't do SVID compatible error
handling.  The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.

The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_pow.c and enabled for targets with their own pow implementation or
ifunc dispatch on __ieee754_pow by including math/w_pow.c.

The compatibility symbol version still uses the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously powl was an alias of pow, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling.  This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.

The __pow_finite symbol is now an alias of pow.  Both __pow_finite and
pow set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that
may affect that header.

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add pow.
	* math/w_pow_compat.c (__pow_compat): Change to versioned compat
	symbol.
	* math/w_pow.c: New file.
	* sysdeps/i386/fpu/w_pow.c: New file.
	* sysdeps/ia64/fpu/e_pow.S: Add versioned symbols.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Rename to __pow
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_pow.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_pow.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__ieee754_pow): Rename to
	__pow.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__ieee754_pow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_pow.c (__ieee754_pow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/w_pow.c: New file.
2018-11-21 09:58:36 +00:00
Szabolcs Nagy
f29b7c492d Remove the error handling wrapper from log
Introduce new log symbol version that doesn't do SVID compatible error
handling.  The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.

The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_log.c and enabled for targets with their own log implementation by
including math/w_log.c.

The compatibility symbol version still uses the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously logl was an alias of log, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling.  This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.

The __log_finite symbol is now an alias of log.  Both __log_finite and
log set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that may
affect that header.

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add log.
	* math/w_log_compat.c (__log_compat): Change to versioned compat
	symbol.
	* math/w_log.c: New file.
	* sysdeps/i386/fpu/w_log.c: New file.
	* sysdeps/ia64/fpu/e_log.S: Update.
	* sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Rename to __log
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_log.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_log.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_log-avx.c (__ieee754_log): Rename to
	__log.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/w_log.c: New file.
2018-11-21 09:56:27 +00:00
Szabolcs Nagy
c20a10561a Remove the error handling wrapper from exp and exp2
Introduce new exp and exp2 symbol version that don't do SVID compatible
error handling.  The standard errno and fp exception based error handling
is inline in the new code and does not have significant overhead.

The double precision wrappers are disabled for sysdeps/ieee754/dbl-64
by using empty w_exp.c and w_exp2.c files, the math/w_exp.c and
math/w_exp2.c files use the wrapper template and can be included by
targets that have their own exp and exp2 implementations or use ifunc
on the glibc internal __ieee754_exp symbol.

The compatibility symbol versions still use the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously expl and exp2l were aliases of exp and exp2,
now they point to the compatibility symbols with the wrapper, because
they still need the SVID compatible error handling.  This affects
NO_LONG_DOUBLE (e.g arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets
as well.

The _finite symbols are now aliases of the standard symbols (they have
no performance advantage anymore).  Both the standard symbols and
_finite symbols set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that may
affect that header (the new macro name is __exp instead of __ieee754_exp
which breaks some math.h macros).

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add exp and exp2.
	* math/w_exp2_compat.c (__exp2_compat): Change to versioned compat
	symbol, handle NO_LONG_DOUBLE and LONG_DOUBLE_COMPAT explicitly.
	* math/w_exp_compat.c (__exp_compat): Likewise.
	* math/w_exp.c: New file.
	* math/w_exp2.c: New file.
	* sysdeps/i386/fpu/w_exp.c: New file.
	* sysdeps/i386/fpu/w_exp2.c: New file.
	* sysdeps/ia64/fpu/e_exp.S: Add versioned symbols.
	* sysdeps/ia64/fpu/e_exp2.S: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Rename to __exp
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Rename to __exp2
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_exp.c: New file.
	* sysdeps/ieee754/dbl-64/w_exp2.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_exp.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_exp2.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp.c (__ieee754_exp): Rename to
	__exp.
	* sysdeps/x86_64/fpu/multiarch/w_exp.c: New file.
2018-11-21 09:55:02 +00:00
Joseph Myers
7abf97bed9 Use trunc functions not __trunc functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __trunc functions to call the
corresponding trunc names instead, with asm redirection to __trunc
when the calls are not inlined.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_truncf.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise.
	* sysdeps/ieee754/float128/s_truncf128.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_trunc.c: Likewise.
	* sysdeps/ieee754/flt-32/s_truncf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
	* sysdeps/riscv/rvf/s_truncf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise.
	(ceil): Redirect to __ceil.
	(floor): Redirect to __floor.
	(trunc): Redirect to __trunc.
	(__truncl): Call trunc instead of __trunc.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc):
	Remove macro.
	[_ARCH_PWR5X] (__truncf): Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use
	trunc functions instead of __trunc variants.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
2018-09-20 21:11:10 +00:00
Szabolcs Nagy
424c4f60ed Add new pow implementation
The algorithm is exp(y * log(x)), where log(x) is computed with about
1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result
in two doubles, and the exp part uses the same algorithm (and lookup
tables) as exp, but takes the input as two doubles and a sign (to handle
negative bases with odd integer exponent).  The __exp1 internal symbol
is no longer necessary.

There is separate code path when fma is not available but the worst case
error is about 0.54 ULP in both cases.  The lookup table and consts for
log are 4168 bytes.  The .rodata+.text is decreased by 37908 bytes on
aarch64.  The non-nearest rounding error is less than 1 ULP.

Improvements on Cortex-A72 compared to current glibc master:
pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1]
pow latency: 1.84x in [0.01 11.1]x[0.01 11.1]

Tested on
aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and
arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets.

	* NEWS: Mention pow improvements.
	* math/Makefile (type-double-routines): Add e_pow_log_data.
	* sysdeps/generic/math_private.h (__exp1): Remove.
	* sysdeps/i386/fpu/e_pow_log_data.c: New file.
	* sysdeps/ia64/fpu/e_pow_log_data.c: New file.
	* sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma
	contraction.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove.
	(exp_inline): Remove.
	(__ieee754_exp): Only single double input is handled.
	* sysdeps/ieee754/dbl-64/e_pow.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file.
	* sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define.
	(__pow_log_data): Define.
	* sysdeps/ieee754/dbl-64/upow.h: Remove.
	* sysdeps/ieee754/dbl-64/upow.tbl: Remove.
	* sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file.
	* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma
	contraction.
	(CFLAGS-e_pow-fma4.c): Likewise.
2018-09-19 10:04:51 +01:00
Joseph Myers
71223ef909 Use ceil functions not __ceil functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __ceil functions to call the
corresponding ceil names instead, with asm redirection to __ceil when
the calls are not inlined.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_ceilf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_ceil.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise.
	* sysdeps/ieee754/float128/s_ceilf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_ceilf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
	* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil):
	Remove macro.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil
	functions instead of __ceil variants.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-17 20:42:06 +00:00
Joseph Myers
f29b6f17e4 Use rint functions not __rint functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __rint functions to call the
corresponding rint names instead, with asm redirection to __rint when
the calls are not inlined.  The x86_64 math_private.h is removed as no
longer useful after this patch.

This patch is relative to a tree with my floor patch
<https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied,
and much the same considerations arise regarding possibly replacing an
IFUNC call with a direct inline expansion.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_rintf.c: Likewise.
	* sysdeps/alpha/fpu/s_rint.c: Likewise.
	* sysdeps/alpha/fpu/s_rintf.c: Likewise.
	* sysdeps/i386/fpu/s_rintl.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_rint.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise.
	* sysdeps/ieee754/float128/s_rintf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_rintf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
	* sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise.
	* sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise.
	* sysdeps/powerpc/fpu/s_rint.c: Likewise.
	* sysdeps/powerpc/fpu/s_rintf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_rint.c: Likewise.
	* sysdeps/riscv/rvf/s_rintf.c: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/x86_64/fpu/math_private.h: Remove file.
	* math/e_scalb.c (invalid_fn): Use rint functions instead of
	__rint variants.
	* math/e_scalbf.c (invalid_fn): Likewise.
	* math/e_scalbl.c (invalid_fn): Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
	Likewise.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
	Likewise.
	* sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise.
	* sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
2018-09-14 13:10:39 +00:00
Joseph Myers
e44acb2063 Use floor functions not __floor functions in glibc libm.
Similar to the changes that were made to call sqrt functions directly
in glibc, instead of __ieee754_sqrt variants, so that the compiler
could inline them automatically without needing special inline
definitions in lots of math_private.h headers, this patch makes libm
code call floor functions directly instead of __floor variants,
removing the inlines / macros for x86_64 (SSE4.1) and powerpc
(POWER5).

The redirection used to ensure that __ieee754_sqrt does still get
called when the compiler doesn't inline a built-in function expansion
is refactored so it can be applied to other functions; the refactoring
is arranged so it's not limited to unary functions either (it would be
reasonable to use this mechanism for copysign - removing the inline in
math_private_calls.h but also eliminating unnecessary local PLT entry
use in the cases (powerpc soft-float and e500v1, for IBM long double)
where copysign calls don't get inlined).

The point of this change is that more architectures can get floor
calls inlined where they weren't previously (AArch64, for example),
without needing special inline definitions in their math_private.h,
and existing such definitions in math_private.h headers can be
removed.

Note that it's possible that in some cases an inline may be used where
an IFUNC call was previously used - this is the case on x86_64, for
example.  I think the direct calls to floor are still appropriate; if
there's any significant performance cost from inline SSE2 floor
instead of an IFUNC call ending up with SSE4.1 floor, that indicates
that either the function should be doing something else that's faster
than using floor at all, or it should itself have IFUNC variants, or
that the compiler choice of inlining for generic tuning should change
to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
might be called at runtime - but not that glibc should avoid calling
floor internally.  (After all, all the same considerations would apply
to any user program calling floor, where it might either be inlined or
left as an out-of-line call allowing for a possible IFUNC.)

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT):
	New macro.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (floor): Likewise.
	* sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_floorf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_floor.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise.
	* sysdeps/ieee754/float128/s_floorf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_floorf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
	* sysdeps/riscv/rvf/s_floorf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor):
	Remove macro.
	[_ARCH_PWR5X] (__floorf): Likewise.
	* sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove
	inline function.
	[__SSE4_1__] (__floorf): Likewise.
	* math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions
	instead of __floor variants.
	* math/w_lgamma_r_compat.c (__lgamma_r): Likewise.
	* math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise.
	* math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise.
	* math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise.
	* math/w_lgammal_r_compat.c (__lgammal_r): Likewise.
	* math/w_tgamma_compat.c (__tgamma): Likewise.
	* math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise.
	* math/w_tgammaf_compat.c (__tgammaf): Likewise.
	* math/w_tgammal_compat.c (__tgammal): Likewise.
	* sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise.
	* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
	Likewise.
	* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
	* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise.
	* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
	* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
	* sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-14 13:09:01 +00:00
Wilco Dijkstra
599cf39766 Improve performance of sinf and cosf
The second patch improves performance of sinf and cosf using the same
algorithms and polynomials.  The returned values are identical to sincosf
for the same input.  ULP definitions for AArch64 and x64 are updated.

sinf/cosf througput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.2x
* |x| < M_PI_4  : 1.8x
* |x| < 2 * M_PI: 1.7x
* |x| < 120.0   : 2.3x
* |x| < Inf     : 3.0x

	* NEWS: Mention sinf, cosf, sincosf.
	* sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
	constants rather than including generic sincosf.h.
	* sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
	* sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
	* sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
	(reduced_cos): Remove.
	(sinf_poly): New function.
	* sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
2018-08-14 10:45:59 +01:00
H.J. Lu
430388d5dc x86: Don't include <init-arch.h> in assembly codes
There is no need to include <init-arch.h> in assembly codes since all
x86 IFUNC selector functions are written in C.  Tested on i686 and
x86-64.  There is no code change in libc.so, ld.so and libmvec.so.

	* sysdeps/i386/i686/multiarch/bzero-ia32.S: Don't include
	<init-arch.h>.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
2018-08-03 08:05:00 -07:00
Wilco Dijkstra
610ee1fc93 Remove mplog and mpexp
Remove the now unused mplog and mpexp files.

	* math/Makefile: Remove mpexp.c and mplog.c
	* sysdeps/i386/fpu/mpexp.c: Delete file.
	* sysdeps/i386/fpu/mplog.c: Likewise.
	* sysdeps/ia64/fpu/mpexp.c: Likewise.
	* sysdeps/ia64/fpu/mplog.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c: Remove mention of mpexp and mplog.
	* sysdeps/ieee754/dbl-64/mpa.h (__pow_mp): Remove unused function.
	* sysdeps/ieee754/dbl-64/mpexp.c: Delete file.
	* sysdeps/ieee754/dbl-64/mplog.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/mpexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/mplog.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove mpexp* and mplog*.
	* sysdeps/x86_64/fpu/multiarch/e_log-avx.c: Remove unused defines.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-avx.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma4.c: Likewise.
2018-02-15 12:41:05 +00:00
Szabolcs Nagy
de800d8305 Remove slow paths from exp
Remove the __slowexp code, so exp is no longer correctly rounded.  The
result is computed to about 70 bits precision so the worst case ulp
error is about 0.500007 in nearest rounding mode.

	* manual/probes.texi: Remove slowexp probes.
	* math/Makefile: Remove slowexp.
	* sysdeps/generic/math_private.h (__slowexp): Remove.
	* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and
	document error bounds.
	* sysdeps/i386/fpu/slowexp.c: Remove.
	* sysdeps/ia64/fpu/slowexp.c: Remove.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove.
	* sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Remove.
	* sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
2018-02-12 11:33:33 +00:00
Wilco Dijkstra
c3d466cba1 Remove slow paths from pow
Remove the slow paths from pow.  Like several other double precision math
functions, pow is exactly rounded.  This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP.  Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.

All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP.  A simple test over a few hundred million
values shows pow is 10% faster on average.  This fixes BZ #13932.

	[BZ #13932]
	* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
	* benchtests/pow-inputs: Update comment for slow path cases.
	* manual/probes.texi (slowpow_p10): Delete removed probe.
	(slowpow_p10): Likewise.
	* math/Makefile: Remove halfulp.c and slowpow.c.
	* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/generic/math_private.h (__exp1): Remove error argument.
	(__halfulp): Remove.
	(__slowpow): Remove.
	* sysdeps/i386/fpu/halfulp.c: Delete file.
	* sysdeps/i386/fpu/slowpow.c: Likewise.
	* sysdeps/ia64/fpu/halfulp.c: Likewise.
	* sysdeps/ia64/fpu/slowpow.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
	improve comments and add error analysis.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
	(power1): Remove function:
	(log1): Remove error argument, add error analysis.
	(my_log2): Remove function.
	* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
	* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
	* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
	* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
	slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-12 10:47:09 +00:00
H.J. Lu
c70e4e9c9e x86-64: Add sincosf with vector FMA
Since the x86-64 assembly version of sincosf is higly optimized with
vector instructions, there isn't much room for improvement.  However
s_sincosf.c written in C with vector math and intrinsics can be
optimized by GCC with FMA.

On Skylake, bench-sincosf reports performance improvement:

           Assembly       FMA         improvement
max        104.042       101.008         3%
min        9.426         8.586           10%
mean       20.6209       18.2238         13%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_sincosf-sse2 and s_sincosf-fma.
	(CFLAGS-s_sincosf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise.
	* sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if
	__sincosf is defined.
2018-01-08 08:04:40 -08:00
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Joseph Myers
f1e005022e Revert exp reimplementation (causes test failures).
Revert:

	2017-12-19  Joseph Myers  <joseph@codesourcery.com>

	* sysdeps/x86_64/fpu/libm-test-ulps: Update.

	2017-12-19  Patrick McGehearty  <patrick.mcgehearty@oracle.com>

	* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
	<errno.h>.  Include "eexp.tbl".
	(half): New constant.
	(one): Likewise.
	(__ieee754_exp): Rewrite.
	(__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
	* sysdeps/i386/fpu/slowexp.c: Likewise.
	* sysdeps/ia64/fpu/slowexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
	* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
	comment.
	* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
	(CPPFLAGS-slowexp.c): Remove variable.
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
	(CFLAGS-slowexp-fma.c): Remove variable.
	(CFLAGS-slowexp-fma4.c): Likewise.
	(CFLAGS-slowexp-avx.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
	define as macro.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
	* math/Makefile (type-double-routines): Remove slowexp.
	* manual/probes.texi (slowexp_p6): Remove.
	(slowexp_p32): Likewise.
2017-12-19 18:11:37 +00:00
Patrick McGehearty
6fd0a3c6a8 Improve __ieee754_exp() performance by greater than 5x on sparc/x86.
These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Typical performance gains is typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc,
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from an value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.The gamma function is known to be difficult to compute
accurately.

	* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
	<errno.h>.  Include "eexp.tbl".
	(half): New constant.
	(one): Likewise.
	(__ieee754_exp): Rewrite.
	(__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
	* sysdeps/i386/fpu/slowexp.c: Likewise.
	* sysdeps/ia64/fpu/slowexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
	* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
	comment.
	* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
	(CPPFLAGS-slowexp.c): Remove variable.
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
	(CFLAGS-slowexp-fma.c): Remove variable.
	(CFLAGS-slowexp-fma4.c): Likewise.
	(CFLAGS-slowexp-avx.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
	define as macro.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
	* math/Makefile (type-double-routines): Remove slowexp.
	* manual/probes.texi (slowexp_p6): Remove.
	(slowexp_p32): Likewise.
2017-12-19 17:27:31 +00:00
H.J. Lu
ac817e083b x86-64: Add cosf with FMA
On Skylake, bench-cosf reports performance improvement:

            Before        After         Improvement
max        135.362       94.552            43%
min        8.532         7.688             11%
mean       17.1446       11.8128           45%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_cosf-sse2 and s_cosf-fma.
	(CFLAGS-s_cosf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_cosf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_cosf-sse2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_cosf.c: Likewise.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-12-12 15:32:58 -08:00
H.J. Lu
9d0ffa60ad x86-64: Add sinf with FMA
On Skylake, bench-sinf reports performance improvement:

            Before        After         Improvement
max        153.996       100.094           54%
min        8.546         6.852             25%
mean       18.1223       11.802            54%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_sinf-sse2 and s_sinf-fma.
	(CFLAGS-s_sinf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_sinf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_sinf-sse2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sinf.c: Likewise.
2017-12-07 10:11:16 -08:00
Joseph Myers
34bb10aabf Use libm_alias_float for x86_64.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes x86_64 libm function implementations use
libm_alias_float to define function aliases, or libm_alias_float_other
where the main name is defined with versioned_symbol.

Tested with the glibc testsuite for x86_64, and tested with
build-many-glibcs.py for all its x86_64 configurations that installed
stripped shared libraries are unchanged by the patch.

	* sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/e_expf.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/e_log2f.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/e_logf.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/e_powf.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Include
	<libm-alias-float.h>.
	(ceilf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Include
	<libm-alias-float.h>.
	(floorf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Include
	<libm-alias-float.h>.
	(fmaf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c: Include
	<libm-alias-float.h>.
	(nearbyintf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Include
	<libm-alias-float.h>.
	(rintf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Include
	<libm-alias-float.h>.
	(truncf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_copysignf.S: Include <libm-alias-float.h>.
	(copysignf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_cosf.S: Include <libm-alias-float.h>.
	(cosf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_fabsf.c: Include <libm-alias-float.h>.
	(fabsf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_fmaxf.S: Include <libm-alias-float.h>.
	(fmaxf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_fminf.S: Include <libm-alias-float.h>.
	(fminf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_llrintf.S: Include <libm-alias-float.h>.
	(llrintf): Define using libm_alias_float.
	[!__ILP32__] (lrintf): Likewise.
	* sysdeps/x86_64/fpu/s_sincosf.S: Include <libm-alias-float.h>.
	(sincosf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_sinf.S: Include <libm-alias-float.h>.
	(sinf): Define using libm_alias_float.
	* sysdeps/x86_64/x32/fpu/s_lrintf.S: Include <libm-alias-float.h>.
	(lrintf): Define using libm_alias_float.
2017-11-29 21:25:41 +00:00
Joseph Myers
011fba7e48 Use libm_alias_double for x86_64.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes x86_64 libm function implementations use
libm_alias_double to define function aliases.

Tested with the glibc testsuite for x86_64, and tested with
build-many-glibcs.py for all its x86_64 configurations that installed
stripped shared libraries are unchanged by the patch.

	* sysdeps/x86_64/fpu/multiarch/s_atan.c: Include
	<libm-alias-double.h>.
	(atan): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_ceil.c: Include
	<libm-alias-double.h>.
	(ceil): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_floor.c: Include
	<libm-alias-double.h>.
	(floor): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_fma.c: Include
	<libm-alias-double.h>.
	(fma): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_nearbyint.c: Include
	<libm-alias-double.h>.
	(nearbyint): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_rint.c: Include
	<libm-alias-double.h>.
	(rint): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_sin.c: Include
	<libm-alias-double.h>.
	(sin): Define using libm_alias_double.
	(cos): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan.c: Include
	<libm-alias-double.h>.
	(tan): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Include
	<libm-alias-double.h>.
	(trunc): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_copysign.S: Include <libm-alias-double.h>.
	(copysign): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_fabs.c: Include <libm-alias-double.h>.
	(fabs): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_fmax.S: Include <libm-alias-double.h>.
	(fmax): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_fmin.S: Include <libm-alias-double.h>.
	(fmin): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_llrint.S: Include <libm-alias-double.h>.
	(llrint): Define using libm_alias_double.
	[!__ILP32__] (lrint): Likewise.
	* sysdeps/x86_64/x32/fpu/s_lrint.S: Include <libm-alias-double.h>.
	(lrint): Define using libm_alias_double.
2017-11-29 19:01:21 +00:00
H.J. Lu
a122dbfb2e Replace "if if " with "if " in comments
* include/alloc_buffer.h: Replace "if if " with "if " in
	comments.
	* sysdeps/mips/memcpy.S: Likkewise.
	* sysdeps/mips/memset.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S:
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S:
	Likewise.
2017-10-25 08:05:51 -07:00
H.J. Lu
80bb593563 x86-64: Add powf with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      35.4713          27.3842       29%
latency                    82.4537          66.3175       24%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_powf-fma.
	(CFLAGS-e_powf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_powf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_powf.c: Likewise.
2017-10-22 08:08:00 -07:00
H.J. Lu
5c7adbd8ed x86-64: Add log2f with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      16.5937          14.0789       17%
latency                    41.7755          35.3586       18%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_log2f-fma.
	(CFLAGS-e_log2f-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_log2f-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_log2f.c: Likewise.
2017-10-22 08:06:58 -07:00
H.J. Lu
0ccc7153cc x86-64: Add logf with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      16.1534          13.8874       16%
latency                    41.9642          34.3072       22%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_logf-fma.
	(CFLAGS-e_logf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_logf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_logf.c: Likewise.
2017-10-22 08:05:15 -07:00
H.J. Lu
5d15c96975 x86-64: Add exp2f with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      13.0291          11.2225       16%
latency                    44.5154          37.5766       18%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_exp2f-fma.
	(CFLAGS-e_exp2f-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_exp2f-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Likewise.
2017-10-22 07:57:50 -07:00
H.J. Lu
e1f59bebd8 x86-64: Replace assembly versions of e_expf with generic e_expf.c
This patch replaces x86-64 assembly versions of e_expf with generic
e_expf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      36.039           20.7749       73%
latency                    58.8096          40.8715       43%

On Skylake, it improves

                           Before            After     Improvement
reciprocal-throughput      18.4436          11.1693       65%
latency                    47.5162          37.5411       26%

	* sysdeps/x86_64/fpu/e_expf.S: Removed.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: Likewise.
	* sysdeps/x86_64/fpu/w_expf.c: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Updated for generic
	e_expf.c.
	* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_expf-fma.c):
	New.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
	Renamed to ...
	(__redirect_expf): This.
	(SYMBOL_NAME): Changed to expf.
	(__ieee754_expf): Renamed to ...
	(__expf): This.
	(__GI___expf): This.
	(__ieee754_expf): Add strong_alias.
	(__expf_finite): Likewise.
	(__expf): New.
	Include <sysdeps/ieee754/flt-32/e_expf.c>.
2017-10-22 07:49:55 -07:00
Joseph Myers
527cd19c3d Make dbl-64 atan and tan into weak aliases.
This patch converts the dbl-64 implementations of atan and tan into
weak aliases of __atan and __tan, in preparation for making them use
libm_alias_double.  Consequent changes are made to the x86_64
multiarch versions wrapping round them (with the dbl-64 functions,
like other such functions, being made not to define their aliases at
all if __atan or __tan are defined as macros by an including file).

Tested for x86_64, and with build-many-glibcs.py.

	* sysdeps/ieee754/dbl-64/s_atan.c (atan): Rename to __atan and
	define as weak alias of __atan.  Do not define any aliases if
	[__atan].
	[NO_LONG_DOUBLE] (__atanl): Define as strong alias of __atan.
	[NO_LONG_DOUBLE] (atanl): Define as weak alias of __atanl.
	* sysdeps/ieee754/dbl-64/s_tan.c (tan): Rename to __tan and define
	as weak alias of __tan.  Do not define any aliases if [__tan].
	[NO_LONG_DOUBLE] (__tanl): Define as strong alias of __tan.
	[NO_LONG_DOUBLE] (tanl): Define as weak alias of __tanl.
	* sysdeps/x86_64/fpu/multiarch/s_atan-avx.c (atan): Rename to
	__atan.
	* sysdeps/x86_64/fpu/multiarch/s_atan-fma.c (atan): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan-fma4.c (atan): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan.c (atan): Rename to __atan
	and define as weak alias of __atan.
	* sysdeps/x86_64/fpu/multiarch/s_tan-avx.c (tan): Rename to
	__atan.
	* sysdeps/x86_64/fpu/multiarch/s_tan-fma.c (tan): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan-fma4.c (tan): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan.c (tan): Rename to __tan and
	define as weak alias of __tan.
2017-10-02 20:20:52 +00:00
Joseph Myers
ae8372d7e4 Add SSE4.1 trunc, truncf (bug 20142).
This patch adds SSE4.1 versions of trunc and truncf, using the roundsd
/ roundss instructions, similar to the versions of ceil, floor, rint
and nearbyint functions we already have.  In my testing with the glibc
benchtests these are about 30% faster than the C versions for double,
20% faster for float.

Tested for x86_64.

	[BZ #20142]
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1.
	* sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
2017-09-20 16:54:05 +00:00
H.J. Lu
ef8adeb041 x86: Add MathVec_Prefer_No_AVX512 to cpu-features [BZ #21967]
AVX512 functions in mathvec are used on machines with AVX512.  An AVX2
wrapper is also provided and it can be used when the AVX512 version
isn't profitable.  MathVec_Prefer_No_AVX512 is addded to cpu-features.
If glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 is set in GLIBC_TUNABLES
environment variable, the AVX2 wrapper will be used.

Tested on x86-64 machines with and without AVX512.  Also verified
glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 on AVX512 machine.

	[BZ #21967]
	* sysdeps/x86/cpu-features.h (bit_arch_MathVec_Prefer_No_AVX512):
	New.
	(index_arch_MathVec_Prefer_No_AVX512): Likewise.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Handle MathVec_Prefer_No_AVX512.
	* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h
	(IFUNC_SELECTOR): Return AVX2 version if MathVec_Prefer_No_AVX512
	is set.
2017-09-12 07:54:47 -07:00
H.J. Lu
5e2bc4ff33 x86_64 __redirect_ieee754_expf: Change double to float
__redirect_ieee754_expf has type float, not double.

	* sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
	Change double to float.
2017-08-28 08:45:53 -07:00
H.J. Lu
b9eaca8fa0 x86_64: Replace AVX512F .byte sequences with instructions
Since binutils 2.25 or later is required to build glibc, we can replace
AVX512F .byte sequences with AVX512F instructions.

Tested on x86-64 and x32.  There are no code differences in libmvec.so
and libmvec.a.

	* sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Replace AVX512F
	.byte sequences with AVX512F instructions.
	* sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Likewise.
	* sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
	* sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S:
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
	Likewise.
2017-08-23 06:26:44 -07:00
H.J. Lu
098b9dd468 x86-64: Check FMA_Usable in ifunc-mathvec-avx2.h [BZ #21966]
Since the AVX2 version of mathvec functions uses FMA, it can only be
used when FMA is usable.

	[BZ #21966]
	* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h
	(IFUNC_SELECTOR): Don't use the AVX2 version if FMA isn't
	usable.
2017-08-18 06:19:07 -07:00
H.J. Lu
24a2e6588d x86-64: Optimize e_expf with FMA [BZ #21912]
FMA optimized e_expf improves performance by more than 50% on Skylake.

	[BZ #21912]
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_expf-fma.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: New file.
	* sysdeps/x86_64/fpu/multiarch/e_expf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/ifunc-fma.h: Likewise.
2017-08-16 08:43:48 -07:00
H.J. Lu
57a72fa350 x86-64: Add FMA multiarch functions to libm
This patch adds multiarch functions optimized with -mfma -mavx2 to libm.
e_pow-fma.c is compiled with $(config-cflags-nofma) due to PR 19003.

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_exp-fma, e_log-fma, e_pow-fma, s_atan-fma, e_asin-fma,
	e_atan2-fma, s_sin-fma, s_tan-fma, mplog-fma, mpa-fma,
	slowexp-fma, slowpow-fma, sincos32-fma, doasin-fma, dosincos-fma,
	halfulp-fma, mpexp-fma, mpatan2-fma, mpatan-fma, mpsqrt-fma,
	and mptan-fma.
	(CFLAGS-doasin-fma.c): New.
	(CFLAGS-dosincos-fma.c): Likewise.
	(CFLAGS-e_asin-fma.c): Likewise.
	(CFLAGS-e_atan2-fma.c): Likewise.
	(CFLAGS-e_exp-fma.c): Likewise.
	(CFLAGS-e_log-fma.c): Likewise.
	(CFLAGS-e_pow-fma.c): Likewise.
	(CFLAGS-halfulp-fma.c): Likewise.
	(CFLAGS-mpa-fma.c): Likewise.
	(CFLAGS-mpatan-fma.c): Likewise.
	(CFLAGS-mpatan2-fma.c): Likewise.
	(CFLAGS-mpexp-fma.c): Likewise.
	(CFLAGS-mplog-fma.c): Likewise.
	(CFLAGS-mpsqrt-fma.c): Likewise.
	(CFLAGS-mptan-fma.c): Likewise.
	(CFLAGS-s_atan-fma.c): Likewise.
	(CFLAGS-sincos32-fma.c): Likewise.
	(CFLAGS-slowexp-fma.c): Likewise.
	(CFLAGS-slowpow-fma.c): Likewise.
	(CFLAGS-s_sin-fma.c): Likewise.
	(CFLAGS-s_tan-fma.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/doasin-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/dosincos-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_asin-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_atan2-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/ifunc-avx-fma4.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/ifunc-fma4.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpa-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpatan-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpatan2-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpsqrt-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mptan-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sin-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/sincos32-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_asin.c: Rewrite.
	* sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
2017-08-07 08:20:56 -07:00