Commit Graph

739 Commits

Author SHA1 Message Date
Joseph Myers
7ec903e028 Implement C23 exp2m1, exp10m1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the exp2m1 and exp10m1 functions (exp2(x)-1 and
exp10(x)-1, like expm1).

As with other such functions, these use type-generic templates that
could be replaced with faster and more accurate type-specific
implementations in future.  Test inputs are copied from those for
expm1, plus some additions close to the overflow threshold (copied
from exp2 and exp10) and also some near the underflow threshold.

exp2m1 has the unusual property of having an input (M_MAX_EXP) where
whether the function overflows (under IEEE semantics) depends on the
rounding mode.  Although these could reasonably be XFAILed in the
testsuite (as we do in some cases for arguments very close to a
function's overflow threshold when an error of a few ulps in the
implementation can result in the implementation not agreeing with an
ideal one on whether overflow takes place - the testsuite isn't smart
enough to handle this automatically), since these functions aren't
required to be correctly rounding, I made the implementation check for
and handle this case specially.

The Makefile ordering expected by lint-makefiles for the new functions
is a bit peculiar, but I implemented it in this patch so that the test
passes; I don't know why log2 also needed moving in one Makefile
variable setting when it didn't in my previous patches, but the
failure showed a different place was expected for that function as
well.

The powerpc64le IFUNC setup seems not to be as self-contained as one
might hope; it shouldn't be necessary to add IFUNCs for new functions
such as these simply to get them building, but without setting up
IFUNCs for the new functions, there were undefined references to
__GI___expm1f128 (that IFUNC machinery results in no such function
being defined, but doesn't stop include/math.h from doing the
redirection resulting in the exp2m1f128 and exp10m1f128
implementations expecting to call it).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 16:31:49 +00:00
Joseph Myers
55eb99e9a9 Implement C23 log10p1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the log10p1 functions (log10(1+x): like log1p, but for
base-10 logarithms).

This is directly analogous to the log2p1 implementation (except that
whereas log2p1 has a smaller underflow range than log1p, log10p1 has a
larger underflow range).  The test inputs are copied from those for
log1p and log2p1, plus a few more inputs in that wider underflow
range.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:48:13 +00:00
Joseph Myers
bb014f50c4 Implement C23 logp1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p).  As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.

Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.

The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either).  It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately.  For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:47:09 +00:00
Joseph Myers
79c52daf47 Implement C23 log2p1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the log2p1 functions (log2(1+x): like log1p, but for
base-2 logarithms).

This illustrates the intended structure of implementations of all
these function families: define them initially with a type-generic
template implementation.  If someone wishes to add type-specific
implementations, it is likely such implementations can be both faster
and more accurate than the type-generic one and can then override it
for types for which they are implemented (adding benchmarks would be
desirable in such cases to demonstrate that a new implementation is
indeed faster).

The test inputs are copied from those for log1p.  Note that these
changes make gen-auto-libm-tests depend on MPFR 4.2 (or later).

The bulk of the changes are fairly generic for any such new function.
(sysdeps/powerpc/nofpu/Makefile only needs changing for those
type-generic templates that use fabs.)

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-05-20 13:41:39 +00:00
H.J. Lu
9e1f4aef86 x86-64: Exclude FMA4 IFUNC functions for -mapxf
When -mapxf is used to build glibc, the resulting glibc will never run
on FMA4 machines.  Exclude FMA4 IFUNC functions when -mapxf is used.
This requires GCC which defines __APX_F__ for -mapxf with commit:

1df56719bd8 x86: Define __APX_F__ for -mapxf

Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-04-06 05:03:55 -07:00
Adhemerval Zanella
44ccc2465c math: x86 trunc traps when FE_INEXACT is enabled (BZ 31603)
The implementations of trunc functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Adhemerval Zanella
932544efa4 math: x86 floor traps when FE_INEXACT is enabled (BZ 31601)
The implementations of floor functions using x87 floating point (i386 and
86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Adhemerval Zanella
637bfc392f math: x86 ceill traps when FE_INEXACT is enabled (BZ 31600)
The implementations of ceil functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Sunil K Pandey
9f78a7c1d0 x86_64: Exclude SSE, AVX and FMA4 variants in libm multiarch
When glibc is built with ISA level 3 or higher by default, the resulting
glibc binaries won't run on SSE or FMA4 processors.  Exclude SSE, AVX and
FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by
default.

When glibc is built with ISA level 2 enabled by default, only keep SSE4.1
variant.

Fixes BZ 31335.

NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind
doesn't support AVX512 instructions:

https://bugs.kde.org/show_bug.cgi?id=383010

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-02-25 13:20:51 -08:00
H.J. Lu
ef7f4b1fef Apply the Makefile sorting fix
Apply the Makefile sorting fix generated by sort-makefile-lines.py.
2024-02-15 11:19:56 -08:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
Bruno Haible
787282dede x86: Do not raises floating-point exception traps on fesetexceptflag (BZ 30990)
According to ISO C23 (7.6.4.4), fesetexcept is supposed to set
floating-point exception flags without raising a trap (unlike
feraiseexcept, which is supposed to raise a trap if feenableexcept
was called with the appropriate argument).

The flags can be set in the 387 unit or in the SSE unit.  When we need
to clear a flag, we need to do so in both units, due to the way
fetestexcept is implemented.

When we need to set a flag, it is sufficient to do it in the SSE unit,
because that is guaranteed to not trap.  However, on i386 CPUs that have
only a 387 unit, set the flags in the 387, as long as this cannot trap.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-12-19 15:12:38 -03:00
H.J. Lu
a8ecb126d4 x86_64: Add log1p with FMA
On Skylake, it changes log1p bench performance by:

        Before       After     Improvement
max     63.349       58.347       8%
min     4.448        5.651        -30%
mean    12.0674      10.336       14%

The minimum code path is

 if (hx < 0x3FDA827A)                          /* x < 0.41422  */
    {
      if (__glibc_unlikely (ax >= 0x3ff00000))           /* x <= -1.0 */
        {
	   ...
        }
      if (__glibc_unlikely (ax < 0x3e200000))           /* |x| < 2**-29 */
        {
          math_force_eval (two54 + x);          /* raise inexact */
          if (ax < 0x3c900000)                  /* |x| < 2**-54 */
            {
	      ...
            }
          else
            return x - x * x * 0.5;

FMA and non-FMA code sequences look similar.  Non-FMA version is slightly
faster.  Since log1p is called by asinh and atanh, it improves asinh
performance by:

        Before       After     Improvement
max     75.645       63.135       16%
min     10.074       10.071       0%
mean    15.9483      14.9089      6%

and improves atanh performance by:

        Before       After     Improvement
max     91.768       75.081       18%
min     15.548       13.883       10%
mean    18.3713      16.8011      8%
2023-08-21 10:44:26 -07:00
H.J. Lu
1b214630ce x86_64: Add expm1 with FMA
On Skylake, it improves expm1 bench performance by:

        Before       After     Improvement
max     70.204       68.054       3%
min     20.709       16.2         22%
mean    22.1221      16.7367      24%

NB: Add

extern long double __expm1l (long double);
extern long double __expm1f128 (long double);

for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
2023-08-14 08:14:19 -07:00
H.J. Lu
f6b10ed8e9 x86_64: Add log2 with FMA
On Skylake, it improves log2 bench performance by:

        Before       After     Improvement
max     208.779      63.827       69%
min     9.977        6.55         34%
mean    10.366       6.8191       34%
2023-08-11 07:49:45 -07:00
H.J. Lu
881546979d x86_64: Sort fpu/multiarch/Makefile
Sort Makefile variables using scripts/sort-makefile-lines.py.

No code generation changes observed in libm.  No regressions on x86_64.
2023-08-10 11:23:25 -07:00
Andreas K. Hüttel
6d457ff36a
Update x86_64 libm-test-ulps (x32 ABI)
Based on feedback by Mike Gilbert <floppym@gentoo.org>
Linux-6.1.38-dist x86_64 AMD Phenom-tm- II X6 1055T Processor
-march=amdfam10
failures occur for x32 ABI

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2023-07-19 16:56:54 +02:00
Paul Pluzhnikov
1e9d5987fd Fix misspellings in sysdeps/x86_64 -- BZ 25337.
Applying this commit results in bit-identical rebuild of libc.so.6
math/libm.so.6 elf/ld-linux-x86-64.so.2 mathvec/libmvec.so.1

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-05-23 10:25:11 +00:00
Paul Pluzhnikov
1d2971b525 Fix misspellings in sysdeps/x86_64/fpu/multiarch -- BZ 25337.
Applying this commit results in a bit-identical rebuild of
mathvec/libmvec.so.1 (which is the only binary that gets rebuilt).

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-05-23 03:28:58 +00:00
Joe Ramsay
cd94326a13 Enable libmvec support for AArch64
This patch enables libmvec on AArch64. The proposed change is mainly
implementing build infrastructure to add the new routines to ABI,
tests and benchmarks. I have demonstrated how this all fits together
by adding implementations for vector cos, in both single and double
precision, targeting both Advanced SIMD and SVE.

The implementations of the routines themselves are just loops over the
scalar routine from libm for now, as we are more concerned with
getting the plumbing right at this point. We plan to contribute vector
routines from the Arm Optimized Routines repo that are compliant with
requirements described in the libmvec wiki.

Building libmvec requires minimum GCC 10 for SVE ACLE. To avoid raising
the minimum GCC by such a big jump, we allow users to disable libmvec
if their compiler is too old.

Note that at this point users have to manually call the vector math
functions. This seems to be acceptable to some downstream users.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-05-03 12:09:49 +01:00
Florian Weimer
5d1ccdda7b x86_64: Fix asm constraints in feraiseexcept (bug 30305)
The divss instruction clobbers its first argument, and the constraints
need to reflect that.  Fortunately, with GCC 12, generated code does
not actually change, so there is no externally visible bug.

Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-03 18:40:52 +02:00
Joe Ramsay
e4d336f1ac benchtests: Move libmvec benchtest inputs to benchtests directory
This allows other targets to use the same inputs for their own libmvec
microbenchmarks without having to duplicate them in their own
subdirectory.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-03-27 17:04:03 +01:00
H.J. Lu
04a558e669 x86_64: Update libm test ulps
Update libm test ulps for

commit 3efbf11fdf
Author: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Date:   Tue Feb 14 11:24:59 2023 +0100

    update auto-libm-test-out-hypot

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-02-27 08:39:32 -08:00
Joseph Myers
6d7e8eda9b Update copyright dates with scripts/update-copyrights 2023-01-06 21:14:39 +00:00
Florian Weimer
e88b9f0e5c stdio-common: Convert vfprintf and related functions to buffers
vfprintf is entangled with vfwprintf (of course), __printf_fp,
__printf_fphex, __vstrfmon_l_internal, and the strfrom family of
functions.  The latter use the internal snprintf functionality,
so vsnprintf is converted as well.

The simples conversion is __printf_fphex, followed by
__vstrfmon_l_internal and __printf_fp, and finally
__vfprintf_internal and __vfwprintf_internal.  __vsnprintf_internal
and strfrom* are mostly consuming the new interfaces, so they
are comparatively simple.

__printf_fp is a public symbol, so the FILE *-based interface
had to preserved.

The __printf_fp rewrite does not change the actual binary-to-decimal
conversion algorithm, and digits are still not emitted directly to
the target buffer.  However, the staging buffer now uses bytes
instead of wide characters, and one buffer copy is eliminated.

The changes are at least performance-neutral in my testing.
Floating point printing and snprintf improved measurably, so that
this Lua script

  for i=1,5000000 do
      print(i, i * math.pi)
  end

runs about 5% faster for me.  To preserve fprintf performance for
a simple "%d" format, this commit has some logic changes under
LABEL (unsigned_number) to avoid additional function calls.  There
are certainly some very easy performance improvements here: binary,
octal and hexadecimal formatting can easily avoid the temporary work
buffer (the number of digits can be computed ahead-of-time using one
of the __builtin_clz* built-ins). Decimal formatting can use a
specialized version of _itoa_word for base 10.

The existing (inconsistent) width handling between strfmon and printf
is preserved here.  __print_fp_buffer_1 would have to use
__translated_number_width to achieve ISO conformance for printf.

Test expectations in libio/tst-vtables-common.c are adjusted because
the internal staging buffer merges all virtual function calls into
one.

In general, stack buffer usage is greatly reduced, particularly for
unbuffered input streams.  __printf_fp can still use a large buffer
in binary128 mode for %g, though.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-12-19 18:56:54 +01:00
Noah Goldstein
f704192911 x86/fpu: Factor out shared avx2/avx512 code in svml_{s|d}_wrapper_impl.h
Code is exactly the same for the two so better to only maintain one
version.

All math and mathvec tests pass on x86.
2022-11-27 20:22:49 -08:00
Noah Goldstein
72f6a5a0ed x86/fpu: Cleanup code in svml_{s|d}_wrapper_impl.h
1. Remove unnecessary spills.
2. Fix some small nit missed optimizations.

All math and mathvec tests pass on x86.
2022-11-27 20:22:49 -08:00
Noah Goldstein
d371be4b11 x86/fpu: Reformat svml_{s|d}_wrapper_impl.h
Just reformat with the style convention used in other x86 assembler
files.  This doesn't change libm.so or libmvec.so.
2022-11-27 20:22:49 -08:00
Noah Goldstein
95177b78ff x86/fpu: Fix misspelled evex512 section in variety of svml files
```
.section .text.evex512, "ax", @progbits
```

With misspelled as:

```
.section .text.exex512, "ax", @progbits
```
2022-11-27 20:22:49 -08:00
Noah Goldstein
e1d082d9de x86/fpu: Add missing ISA sections to variety of svml files
Many sse4/avx2/avx512 files where just in .text.
2022-11-27 20:22:49 -08:00
Adhemerval Zanella
114e299ca6 x86: Remove .tfloat usage
Some compiler does not support it (such as clang integrated assembler)
neither gcc emits it.
2022-10-03 14:03:21 -03:00
Noah Goldstein
3079f652d7 x86: Replace all sse instructions with vex equivilent in avx+ files
Most of these don't really matter as there was no dirty upper state
but we should generally avoid stray sse when its not needed.

The one case that really matters is in svml_d_tanh4_core_avx2.S:

blendvps %xmm0, %xmm8, %xmm7

When there was a dirty upper state.

Tested on x86_64-linux
2022-06-22 19:42:17 -07:00
Noah Goldstein
cffb9414c5 x86: Optimize svml_s_tanhf4_core_sse4.S
Optimizations are:
    1. Reduce code size (-112 bytes).
    2. Remove redundant move instructions.
    3. Slightly improve instruction selection/scheduling where
       possible.
    4. Prefer registers which get short instruction encoding.
    5. Reduce rodata size (-4k+ rodata is shared with avx2).

Result is roughly a 15-16% speedup:

       Function, New Time, Old Time, New / Old
 _ZGVbN4v_tanhf,    3.158,    3.749,     0.842
2022-06-09 12:51:25 -07:00
Noah Goldstein
bcc41f66a4 x86: Optimize svml_s_tanhf8_core_avx2.S
Optimizations are:
    1. Reduce code size (-81 bytes).
    2. Remove redundant move instructions.
    3. Slightly improve instruction selection/scheduling where
       possible.
    4. Prefer registers which get short instruction encoding.
    5. Reduce rodata size (-32 bytes).

Result is roughly a 17-18% speedup:

       Function, New Time, Old Time, New / Old
_ZGVdN8v_tanhf,     1.977,    2.402,     0.823
2022-06-09 12:51:22 -07:00
Noah Goldstein
3a49ce8799 x86: Add data file that can be shared by tanhf-avx2 and tanhf-sse4
tanhf-avx2 and tanhf-sse4 use the same data tables so we can save
over 4kb using a shared datatable. This does increase the memory
footprint of the sse4 version (as now all the targets are 32 bytes
instead of 16), generally it seems worth the code size save.

NB: This patch doesn't do anything itself, it is setup for future
patches.
2022-06-09 12:51:15 -07:00
Noah Goldstein
e560b3c2d2 x86: Optimize svml_s_tanhf16_core_avx512.S
Optimizations are:
    1. Reduce code size (-67 bytes).
    2. Remove redundant move instructions.
    3. Slightly improve instruction selection/scheduling where
       possible.
    4. Reduce rodata usage (-448 bytes).

Result is roughly a 14% speedup:

       Function, New Time, Old Time, New / Old
_ZGVeN16v_tanhf,    0.649,    0.752,     0.863
2022-06-09 12:51:12 -07:00
Noah Goldstein
fe1915d4f6 x86: Improve svml_s_atanhf4_core_sse4.S
Improvements are:
    1. Reduce code size (-62 bytes).
    2. Remove redundant move instructions.
    3. Slightly improve instruction selection/scheduling where
       possible.
    4. Prefer registers which get short instruction encoding.
    5. Reduce rodata usage (-16 bytes).

The throughput improvement is not significant as the port 0 bottleneck
is unavoidable.

       Function, New Time, Old Time, New / Old
_ZGVbN4v_atanhf,    8.821,    8.903,     0.991
2022-06-09 12:51:09 -07:00
Noah Goldstein
65897e9916 x86: Improve svml_s_atanhf8_core_avx2.S
Improvements are:
    1. Reduce code size (-60 bytes).
    2. Remove redundant move instructions.
    3. Slightly improve instruction selection/scheduling where
       possible.
    4. Prefer registers which get short instruction encoding.
    5. Shrink rodata usage (-32 bytes).

The throughput improvement is not that significant (3-5%) as the
port 0 bottleneck is unavoidable.

       Function, New Time, Old Time, New / Old
_ZGVdN8v_atanhf,    2.799,    2.923,     0.958
2022-06-09 12:51:04 -07:00
Noah Goldstein
73bae395cf x86: Improve svml_s_atanhf16_core_avx512.S
Improvements are:
    1. Reduce code size (-64 bytes).
    2. Remove redundant move instructions.
    3. Slightly improve instruction selection/scheduling where
       possible.
    4. Reduce rodata size ([-128, -188] bytes).

The throughput improvement is not significant as the port 0 bottleneck
is unavoidable.

        Function, New Time, Old Time, New / Old
_ZGVeN16v_atanhf,     1.39,    1.408,     0.987
2022-06-09 12:50:58 -07:00
Andreas Schwab
dc1e5eeb25 x86_64: Optimize sincos where sin/cos is optimized (bug 29193)
The compiler may substitute calls to sin or cos with calls to sincos, thus
we should have the same optimized implementations for sincos.  The
optimized implementations may produce results that differ, that also makes
sure that the sincos call aggrees with the sin and cos calls.
2022-06-01 10:29:52 +02:00
Adhemerval Zanella
efeb2bd1ab math: Add math-use-builtins-fabs (BZ#29027)
Both float, double, and _Float128 are assumed to be supported
(float and double already only uses builtins).  Only long double
is parametrized due GCC bug 29253 which prevents its usage on
powerpc.

It allows to remove i686, ia64, x86_64, powerpc, and sparc arch
specific implementation.

On ia64 it also fixes the sNAN handling:

  math/test-float64x-fabs
  math/test-ldouble-fabs

Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
2022-05-23 17:49:18 -03:00
Siddhesh Poyarekar
5b5b1012d5 benchtests: Better libmvec integration
Improve libmvec benchmark integration so that in future other
architectures may be able to run their libmvec benchmarks as well.  This
now allows libmvec benchmarks to be run with `make BENCHSET=bench-math`.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-04-29 11:48:18 +05:30
Siddhesh Poyarekar
944afe6d95 benchtests: Add UNSUPPORTED benchmark status
The libmvec benchmarks print a message indicating that a certain CPU
feature is unsupported and exit prematurelyi, which breaks the JSON in
bench.out.

Handle this more elegantly in the bench makefile target by adding
support for an UNSUPPORTED exit status (77) so that bench.out continues
to have output for valid tests.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-04-29 11:48:16 +05:30
Adhemerval Zanella
13d45cf9a7 x86: Remove fcopysign{f} implementation
The builtin used by generic code generates similar code.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2022-04-07 12:17:15 -03:00
Adhemerval Zanella
cbc2c56bab benchtests: Only build libmvec benchmarks iff $(build-mathvec) is set
Checked on x86_64-linux-gnu.
2022-04-05 12:01:10 -03:00
Adhemerval Zanella
7eed708edf x86: Remove fabs{f} implementation
For x86_64 is the same as the generic implementation, while for i686
the builtin generates the same code.
2022-04-04 16:23:11 -03:00
Sunil K Pandey
6de743a4e3 x86_64: Fix svml_d_tanh8_core_avx512.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:44:09 -08:00
Sunil K Pandey
28ba5ee77f x86_64: Fix svml_d_tanh4_core_avx2.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:44:09 -08:00
Sunil K Pandey
06c7208f27 x86_64: Fix svml_d_tanh2_core_sse4.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:44:09 -08:00
Sunil K Pandey
2c632117bf x86_64: Fix svml_s_tanhf8_core_avx2.S code formatting
This commit contains following formatting changes

1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
   between it and the first operand.
3. Instruction greater than 7 characters in length have a
   space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-03-07 21:44:09 -08:00