Commit Graph

55 Commits

Author SHA1 Message Date
Anton Blanchard
01d7806282 powerpc64le: Fix typo in configure
The configure script checks for -mlong-double-128 but mentions -mlongdouble
when it fails.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-08 21:59:28 -03:00
Pedro Franco de Carvalho
813c6ec808 powerpc: optimize strcpy/stpcpy for POWER9/10
This patch modifies the current POWER9 implementation of strcpy and
stpcpy to optimize it for POWER9/10.

Since no new POWER10 instructions are used, the original POWER9 strcpy is
modified instead of creating a new implementation for POWER10.  This
implementation is based on both the original POWER9 implementation of
strcpy and the preamble of the new POWER10 implementation of strlen.

The changes also affect stpcpy, which uses the same implementation with
some additional code before returning.

On POWER9, averaging improvements across the benchmark
inputs (length/source alignment/destination alignment), for an
experiment that ran the benchmark five times, bench-strcpy showed an
improvement of 5.23%, and bench-stpcpy showed an improvement of 6.59%.

On POWER10, bench-strcpy showed 13.16%, and bench-stpcpy showed 13.59%.

The changes are:

1. Removed the null string optimization.

   Although this results in a few extra cycles for the null string, in
   combination with the second change, this resulted in improvements for
   for other cases.

2. Adapted the preamble from strlen for POWER10.

   This is the part of the function that handles up to the first 16 bytes
   of the string.

3. Increased number of unrolled iterations in the main loop to 6.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Tested-by: Matheus Castanho <msc@linux.ibm.com>
2021-07-01 17:58:53 -03:00
Lucas A. M. Magalhaes
a55e2da270 powerpc: Optimized memcmp for power10
This patch was based on the __memcmp_power8 and the recent
__strlen_power10.

Improvements from __memcmp_power8:

1. Don't need alignment code.

   On POWER10 lxvp and lxvl do not generate alignment interrupts, so
they are safe for use on caching-inhibited memory.  Notice that the
comparison on the main loop will wait for both VSR to be ready.
Therefore aligning one of the input address does not improve
performance.  In order to align both registers a vperm is necessary
which add too much overhead.

2. Uses new POWER10 instructions

   This code uses lxvp to decrease contention on load by loading 32 bytes
per instruction.
   The vextractbm is used to have a smaller tail code for calculating the
return value.

3. Performance improvement

   This version has around 35% better performance on average. I saw no
performance regressions for any length or alignment.

Thanks Matheus for helping me out with some details.

Co-authored-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-05-31 18:00:20 -03:00
Florian Weimer
d337345ce1 powerpc64le: Check HWCAP bits against compiler build flags
When built with GCC 11.1 and -mcpu=power9, ld.so prints this error
message when running on POWER8:

Fatal glibc error: CPU lacks ISA 3.00 support (POWER9 or later required)
2021-05-19 11:09:57 +02:00
Matheus Castanho
1a594aa986 powerpc: Add optimized rawmemchr for POWER10
Reuse code for optimized strlen to implement a faster version of rawmemchr.
This takes advantage of the same benefits provided by the strlen implementation,
but needs some extra steps. __strlen_power10 code should be unchanged after this
change.

rawmemchr returns a pointer to the char found, while strlen returns only the
length, so we have to take that into account when preparing the return value.

To quickly check 64B, the loop on __strlen_power10 merges the whole block into
16B by using unsigned minimum vector operations (vminub) and checks if there are
any \0 on the resulting vector. The same code is used by rawmemchr if the char c
is 0. However, this approach does not work when c != 0.  We first need to
subtract each byte by c, so that the value we are looking for is converted to a
0, then taking the minimum and checking for nulls works again.

The new code branches after it has compared ~256 bytes and chooses which of the
two strategies above will be used in the main loop, based on the char c. This
extra branch adds some overhead (~5%) for length ~256, but is quickly amortized
by the faster loop for larger sizes.

Compared to __rawmemchr_power9, this version is ~20% faster for length < 256.
Because of the optimized main loop, the improvement becomes ~35% for c != 0
and ~50% for c = 0 for strings longer than 256.

Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-05-17 10:30:35 -03:00
Raoni Fassina Firmino
23fdf8178c powerpc64le: Optimize memset for POWER10
This implementation is based on __memset_power8 and integrates a lot
of suggestions from Anton Blanchard.

The biggest difference is that it makes extensive use of stxvl to
alignment and tail code to avoid branches and small stores.  It has
three main execution paths:

a) "Short lengths" for lengths up to 64 bytes, avoiding as many
   branches as possible.

b) "General case" for larger lengths, it has an alignment section
   using stxvl to avoid branches, a 128 bytes loop and then a tail
   code, again using stxvl with few branches.

c) "Zeroing cache blocks" for lengths from 256 bytes upwards and set
   value being zero.  It is mostly the __memset_power8 code but the
   alignment phase was simplified because, at this point, address is
   already 16-bytes aligned and also changed to use vector stores.
   The tail code was also simplified to reuse the general case tail.

All unaligned stores use stxvl instructions that do not generate
alignment interrupts on POWER10, making it safe to use on
caching-inhibited memory.

On average, this implementation provides something around 30%
improvement when compared to __memset_power8.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-04-30 18:12:08 -03:00
Tulio Magno Quites Machado Filho
e941e0ae80 powerpc64le: Optimize memcpy for POWER10
This implementation is based on __memcpy_power8_cached and integrates
suggestions from Anton Blanchard.
It benefits from loads and stores with length for short lengths and for
tail code, simplifying the code.

All unaligned memory accesses use instructions that do not generate
alignment interrupts on POWER10, making it safe to use on
caching-inhibited memory.

The main loop has also been modified in order to increase instruction
throughput by reducing the dependency on updates from previous iterations.

On average, this implementation provides around 30% improvement when
compared to __memcpy_power7 and 10% improvement in comparison to
__memcpy_power8_cached.
2021-04-30 18:12:08 -03:00
Lucas A. M. Magalhaes
dd59655e93 powerpc64le: Optimized memmove for POWER10
This patch was initially based on the __memmove_power7 with some ideas
from strncpy implementation for Power 9.

Improvements from __memmove_power7:

1. Use lxvl/stxvl for alignment code.

   The code for Power 7 uses branches when the input is not naturally
   aligned to the width of a vector. The new implementation uses
   lxvl/stxvl instead which reduces pressure on GPRs. It also allows
   the removal of branch instructions, implicitly removing branch stalls
   and mispredictions.

2. Use of lxv/stxv and lxvl/stxvl pair is safe to use on Cache Inhibited
   memory.

   On Power 10 vector load and stores are safe to use on CI memory for
   addresses unaligned to 16B. This code takes advantage of this to
   do unaligned loads.

   The unaligned loads don't have a significant performance impact by
   themselves. However doing so decreases register pressure on GPRs
   and interdependence stalls on load/store pairs. This also improved
   readability as there are now less code paths for different alignments.
   Finally this reduces the overall code size.

3. Improved performance.

   This version runs on average about 30% better than memmove_power7
   for lengths  larger than 8KB. For input lengths shorter than 8KB
   the improvement is smaller, it has on average about 17% better
   performance.

   This version has a degradation of about 50% for input lengths
   in the 0 to 31 bytes range when dest is unaligned.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-04-30 18:12:08 -03:00
Raphael Moreira Zinsly
25cb72820a powerpc: Add log IFUNC multiarch support for POWER10
Checked on ppc64le built without --with-cpu, with --with-cpu=power9
and with --disable-multi-arch.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
2021-04-26 10:10:29 -03:00
Matheus Castanho
10624a97e8 powerpc: Add optimized strlen for POWER10
Improvements compared to POWER9 version:

1. Take into account first 16B comparison for aligned strings

   The previous version compares the first 16B and increments r4 by the number
   of bytes until the address is 16B-aligned, then starts doing aligned loads at
   that address. For aligned strings, this causes the first 16B to be compared
   twice, because the increment is 0. Here we calculate the next 16B-aligned
   address differently, which avoids that issue.

2. Use simple comparisons for the first ~192 bytes

   The main loop is good for big strings, but comparing 16B each time is better
   for smaller strings.  So after aligning the address to 16 Bytes, we check
   more 176B in 16B chunks.  There may be some overlaps with the main loop for
   unaligned strings, but we avoid using the more aggressive strategy too soon,
   and also allow the loop to start at a 64B-aligned address.  This greatly
   benefits smaller strings and avoids overlapping checks if the string is
   already aligned at a 64B boundary.

3. Reduce dependencies between load blocks caused by address calculation on loop

   Doing a precise time tracing on the code showed many loads in the loop were
   stalled waiting for updates to r4 from previous code blocks.  This
   implementation avoids that as much as possible by using 2 registers (r4 and
   r5) to hold addresses to be used by different parts of the code.

   Also, the previous code aligned the address to 16B, then to 64B by doing a
   few 48B loops (if needed) until the address was aligned. The main loop could
   not start until that 48B loop had finished and r4 was updated with the
   current address. Here we calculate the address used by the loop very early,
   so it can start sooner.

   The main loop now uses 2 pointers 128B apart to make pointer updates less
   frequent, and also unrolls 1 iteration to guarantee there is enough time
   between iterations to update the pointers, reducing stalled cycles.

4. Use new P10 instructions

   lxvp is used to load 32B with a single instruction, reducing contention in
   the load queue.

   vextractbm allows simplifying the tail code for the loop, replacing
   vbpermq and avoiding having to generate a permute control vector.

Reviewed-by: Paul E Murphy <murphyp@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com>
2021-04-22 16:18:06 -03:00
Andreas Schwab
5ccea9a011 powerpc64le: Use ifunc for _Float128 functions also in libc
This fixes missing definition of math functions in libc in a static link
that are no longer built for libm after commit 4898d9712b ("Avoid adding
duplicated symbols into static libraries").
2021-04-01 10:55:42 +02:00
Raphael Moreira Zinsly
a7d88506c2 powerpc: Add optimized llogb* for POWER9
The POWER9 builtins used to improve the ilogb* functions can be
used in the llogb* functions as well.
2021-03-16 12:19:09 -03:00
Raphael Moreira Zinsly
56c81132cc powerpc: Add optimized ilogb* for POWER9
The instructions xsxexpdp and xsxexpqp introduced on POWER9 extract
the exponent from a double-precision and quad-precision floating-point
respectively, thus they can be used to improve ilogb, ilogbf and ilogbf128.
2021-03-16 12:19:09 -03:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Florian Weimer
4c38c1a229 powerpc64le: Add glibc-hwcaps support
The "power10" and "power9" subdirectories are selected in a way
that matches the -mcpu=power10 and -mcpu=power9 options of GCC.
2020-12-04 14:50:49 +01:00
Paul E. Murphy
33fc34521d powerpc64le: ifunc select *f128 routines in multiarch mode
Programatically generate simple wrappers for interesting libm *f128
objects.  Selected functions are transcendental functions or
those with trivial compiler builtins.  This can result in a 2-3x
speedup (e.g logf128 and expf128).

A second set of implementation files are generated which include
the first implementation encountered along the search path.  This
usually works, except when a wrapper is overriden and makefile
search order slightly diverges from include order.  Likewise,
wrapper object files are created for each generated file.  These
hold the ifunc selection routines which export ABI.

Next, several shared headers are intercepted to control renaming of
asm function redirects are used first, and sometimes macro renames
if the former is impractical.

Notably, if the request machine supports hardware IEEE128 (i.e POWER9
and newer) this ifunc machinery is disabled.  Likewise existing
ifunc support for float128 is consolidated into this (e.g sqrtf128
and fmaf128).

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-11-30 09:56:14 -06:00
Raphael M Zinsly
7beee7b39a powerpc: Add optimized stpncpy for POWER9
Add stpncpy support into the POWER9 strncpy.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-11-12 13:16:36 -03:00
Raphael M Zinsly
b9d83bf3eb powerpc: Add optimized strncpy for POWER9
Similar to the strcpy P9 optimization, this version uses VSX to improve
performance.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-11-12 13:12:24 -03:00
Paul E. Murphy
c79607a474 powerpc64le: guarantee a .gnu.attributes section [BZ #26220]
Upstream GCC 11 development is now building the ibm128 runtime
support (in libgcc) without a .gnu.attributes section on ppc64le.
Ensure we have one to replace by building one ibm128 file in
libc and libm with attributes.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-07-21 09:03:01 -05:00
Tulio Magno Quites Machado Filho
d2ba3677da powerpc: Add support for POWER10
1. Add the directories to hold POWER10 files.

2. Add support to select POWER10 libraries based on AT_PLATFORM.

3. Let submachine=power10 be set automatically.
2020-06-29 10:08:38 -03:00
Paul E. Murphy
b637306d3e powerpc64le: refactor e_sqrtf128.c
Combine both implementations into a single file to allow
building twice with appropriate multiarch support when possible.
2020-06-16 13:50:44 -05:00
Paul E. Murphy
a23bd00f9d powerpc64le: add optimized strlen for P9
This started as a trivial change to Anton's rawmemchr.  I got
carried away.  This is a hybrid between P8's asympotically
faster 64B checks with extremely efficient small string checks
e.g <64B (and sometimes a little bit more depending on alignment).

The second trick is to align to 64B by running a 48B checking loop
16B at a time until we naturally align to 64B (i.e checking 48/96/144
bytes/iteration based on the alignment after the first 5 comparisons).
This allieviates the need to check page boundaries.

Finally, explicly use the P7 strlen with the runtime loader when building
P9.  We need to be cautious about vector/vsx extensions here on P9 only
builds.
2020-06-05 15:30:00 -05:00
Paul E. Murphy
6ef4227509 powerpc64le: use common fmaf128 implementation
This defines the macro such that it should behave best on all
supported powerpc targets.  Likewise, this allows us to remove the
ppc64le specific s_fmaf128.c.

I have verified powerpc64le multiarch and powerpc64le power9
no-multiarch builds continue to generate optimize fmaf128.
2020-06-05 15:29:44 -05:00
Anton Blanchard
765de945ef powerpc: Optimized rawmemchr for POWER9
This version uses vector instructions and is up to 60% faster on medium
matches and up to 90% faster on long matches, compared to the POWER7
version. A few examples:

                            __rawmemchr_power9  __rawmemchr_power7
Length   32, alignment  0:   2.27566             3.77765
Length   64, alignment  2:   2.46231             3.51064
Length 1024, alignment  0:  17.3059             32.6678
2020-05-18 17:08:54 -05:00
Anton Blanchard via Libc-alpha
aa70d05632 powerpc: Optimized stpcpy for POWER9
Add stpcpy support to the POWER9 strcpy. This is up to 40% faster on
small strings and up to 90% faster on long relatively unaligned strings,
compared to the POWER8 version. A few examples:

                                        __stpcpy_power9  __stpcpy_power8
Length   20, alignments in bytes  4/ 4:  2.58246          4.8788
Length 1024, alignments in bytes  1/ 6: 24.8186          47.8528
2020-05-18 08:26:22 -05:00
Anton Blanchard via Libc-alpha
3903704850 powerpc: Optimized strcpy for POWER9
This version uses VSX store vector with length instructions and is
significantly faster on small strings and relatively unaligned large
strings, compared to the POWER8 version. A few examples:

                                        __strcpy_power9  __strcpy_power8
Length   16, alignments in bytes  0/ 0: 2.52454          4.62695
Length  412, alignments in bytes  4/ 0: 11.6             22.9185
2020-05-18 08:26:22 -05:00
Paul E. Murphy
4a4db1de2f powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
strcmp is used while resolving PLT references.  Vector registers
should not be used during this.  The P9 strcmp makes heavy use of
vector registers, so it should be avoided in rtld.

This prevents quiet vector register corruption when glibc is configured
with --disable-multi-arch and --with-cpu=power9.  This can be seen with
test-float64x-compat_totalordermag during the first call into
totalordermagf64x@GLIBC_2.27.

Add a guard to fallback to the power8 implementation when building
power9 strcmp for libraries other than libc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-05-04 13:27:31 -05:00
Gabriel F. T. Gomes
051be01f6b powerpc64le: Enable support for IEEE long double
On platforms where long double may have two different formats, i.e.: the
same format as double (64-bits) or something else (128-bits), building
with -mlong-double-128 is the default and function calls in the user
program match the name of the function in Glibc.  When building with
-mlong-double-64, Glibc installed headers redirect such calls to the
appropriate function.

Likewise, the internals of glibc are now built against IEEE long double.
However, the only (minimally) notable usage of long double is difftime.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-04-30 08:52:08 -05:00
Paul E. Murphy
5c7ccc2983 powerpc64le: blacklist broken GCC compilers (e.g GCC 7.5.0)
GCC 7.5.0 (PR94200) will refuse to compile if both -mabi=% and
-mlong-double-128 are passed on the command line.  Surprisingly,
it will work happily if the latter is not.  For the sake of
maintaining status quo, test for and blacklist such compilers.

Tested with a GCC 8.3.1 and GCC 7.5.0 compiler for ppc64le.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-04-30 08:52:08 -05:00
Paul E. Murphy
3a0acbdcc5 powerpc64le: bump binutils version requirement to >= 2.26
This is a small step up from 2.25 which brings in support for
rewriting the .gnu.attributes section of libc/libm.so.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-04-30 08:52:08 -05:00
Paul E. Murphy
50545f5aa0 powerpc64le: raise GCC requirement to 7.4 for long double transition
Add compiler feature tests to ensure we can build ieee128 long double.
These test for -mabi=ieeelongdouble, -mno-gnu-attribute, and -Wno-psabi.

Likewise, verify some compiler bugs have been addressed.  These aren't
helpful for building glibc, but may cause test failures when testing
the new long double.  See notes below from Raji.

On powerpc64le, some older compiler versions give error for the function
signbit() for 128-bit floating point types.  This is fixed by PR83862
in gcc 8.0 and backported to gcc6 and gcc7.  This patch adds a test
to check compiler version to avoid compiler errors during make check.

Likewise, test for -mno-gnu-attribute support which was

On powerpc64le, a few files are built on IEEE long double mode
(-mabi=ieeelongdouble), whereas most are built on IBM long double mode
(-mabi=ibmlongdouble, the default for -mlong-double-128).  Since binutils
2.31, linking object files with different long double modes causes
errors similar to:

  ld: libc_pic.a(s_isinfl.os) uses IBM long double,
      libc_pic.a(ieee128-qefgcvt.os) uses IEEE long double.
  collect2: error: ld returned 1 exit status
  make[2]: *** [../Makerules:649: libc_pic.os] Error 1

The warnings are fair and correct, but in order for glibc to have
support for both long double modes on powerpc64le, they have to be
ignored.  This can be accomplished with the use of -mno-gnu-attribute
option when building the few files that require IEEE long double mode.

However, -mno-gnu-attribute is not available in GCC 6, the minimum
version required to build glibc, so this patch adds a test for this
feature in powerpc64le builds, and fails early if it's not available.

Co-Authored-By: Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>
Co-Authored-By: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-04-30 08:52:08 -05:00
Paul E. Murphy
4531ba8ebf powerpc64le: enforce non-specific long double in .gnu.attributes section
We turn off this feature to avoid polluting our shared libary with
a specific value.  However, static libgcc is not under our control,
and has enabled this for ibm128 routines.  This pollutes the
resulting shared libraries with it.

Attach a post-linking hook to replace this section with one crafted
as hard-float + indeterminate ldbl.  This allows IEEE ldbl users to
avoid having to disable the gnu attributes feature which should
protect them from linking ibm ldbl libraries using the gnu attributes
feature.

Currently, this only replaces libc and libm which support both ldbl
formats and rely on application code to explicitly determine which
is to be used.

Strictly speaking, the section could be deleted with minimal lost value.
However correctly set attributes could prove useful for some future change,
and similarly missing attributes.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-04-06 10:23:58 -05:00
Paul E. Murphy
8e72163b16 powerpc64le: workaround ieee long double / _Float128 stdc++ bug
-mabi=ieeelongdouble triggers the stdc++ libraries _Float128
support, which then breaks if algorithm is included.  For now,
explicitly disable _Float128 for such tests.

I have opened up GCC BZ 94080 to track this.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-04-06 10:23:58 -05:00
Paul E. Murphy
6f82d05034 powerpc64le: Enforce -mabi=ibmlongdouble when -mfloat128 used
I have observed a bug on 7.4.0 whereby __mulkc3 calls are
swapped with __multc3 depending on ABI selection.  For the
sake of being overly cautious, build all _Float128 files
with ibm128 to workaround these compilers.  This has been
noted in GCC BZ 84914, and will not be fixed for GCC 7.

Likewise, non-math files built with _Float128 are assumed
to have ibm long double.  Explicilty preserve this
assumption.

Finally, add some bootstrapping code to avoid applying
these options until IEEE long double is enabled as they
require GCC 7 and above.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-04-06 10:23:58 -05:00
Paul E. Murphy
25ee3931f0 powerpc64le/multiarch: don't generate strong aliases for fmaf128-ppc64
This prevents generating a second alias for __fmaieee128 when
compiling with ldouble == ieee128 redirects.
2020-04-06 10:23:58 -05:00
Raphael Moreira Zinsly
66807aebad powerpc: Add support for fmaf128() in hardware
Adds a POWER9 version of fmaf128 that uses the xsmaddqp
instruction.

Co-authored-by: Tulio Magno Quites Machado Filho  <tuliom@linux.ibm.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-03-30 18:04:27 -03:00
Paul E. Murphy
57651ee4c8 powerpc64: apply -mabi=ibmlongdouble to special files
Some of these files depend on the avoidance of using the various
register sets of POWER.  When enabling the IEEE 128 long double,
we must be sure to disable this ABI as some compilers will
refuse to compile if -mno-vsx and -mabi=ieeelongdouble are both
present.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-03-25 14:34:23 -05:00
Paul E. Murphy
39517c008f powerpc64le: add -mno-gnu-attribute to *f128 objects and difftime
In practice, this flag should be applied globally, but it makes a good
sanity check to ensure ibm128 and ieee128 long double files are not
getting mismatched.  _Float128 files use no long double, thus are
always safe to use this option.

Similarly, when investigating the linker complaints, difftime
makes trivial, self contained, usage of long double, so thus it
is also explicitly marked as such.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-03-25 14:34:23 -05:00
Paul E. Murphy
3618e5fece Makeconfig: sandwich gnulib-tests between libc/ld linking of tests
This better resembles the default linking process with the gnulibs,
and also resolves the increasingly difficult to maintain
f128-loader-link usage on powerpc64le as some libgcc symbols are
dependent on those found in the loader (ld).
2020-03-25 14:34:23 -05:00
Gabriel F. T. Gomes
076d06e849 powerpc64le: Ensure correct ldouble compiler flags are used
Ensure the correct ldouble abi flags are applied to ibm128 files and
nldbl files.  Remove the IEEE options if used, and apply the flags
used to build ldouble files which are ibm128 abi.

nldbl tests are a little tricky.  To use the support, we must remove
all ldouble abi flags, and ensure -mlong-double-64 is used.

Co-authored-by: Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>
Co-authored-by: Tulio Magno Quites Machado Filho  <tuliom@linux.vnet.ibm.com>
Co-authored-by: Paul E. Murphy  <murphyp@linux.vnet.ibm.com>
2020-03-25 14:34:23 -05:00
Adhemerval Zanella
1c15464ca0 math: Remove inline math tests
With mathinline removal there is no need to keep building and testing
inline math tests.

The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries.  The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2020-03-19 11:45:44 -03:00
Wilco Dijkstra
220622dde5 Add libm_alias_finite for _finite symbols
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol.  It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).

The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.

Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).

Passes buildmanyglibc.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2020-01-03 10:02:04 -03:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Florian Weimer
8b196ac4b8 Expand $(as-needed) and $(no-as-needed) throughout the build system
Since commit a3cc4f48e9 ("Remove
--as-needed configure test."), --as-needed support is no longer
optional.

The macros are not much shorter and do not provide documentary
value, either, so this commit removes them.
2019-12-03 21:37:50 +01:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Paul A. Clarke
10cce66930 [powerpc] Use __builtin_{mffs,mtfsf}
Replace inline asm uses of the "mffs" and "mtfsf" instructions with
the analogous GCC builtins.

__builtin_mffs and __builtin_mtfsf are both available in GCC 5 and above.
Given the minimum GCC level for GLibC is now GCC 6.2, it is safe to use
these builtins without restriction.

2019-03-29  Paul A. Clarke  <pc@us.ibm.com>

	* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_register): Replace inline
	asm with builtin.
	* sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (FP_INIT_ROUNDMODE):
	Likewise.
	* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (_GET_DI_FPSCR): Likewise.
	(_GET_SI_FPSCR): Likewise.
	(_SET_SI_FPSCR): Likewise.
2019-03-29 19:16:34 -05:00
Joseph Myers
462e83a4a0 Add more spaces before '('.
This patch fixes more places where a space should have been present
before '(' in accordance with the GNU Coding Standards (as with the
previous patch, mainly for calls to sizeof).

Tested with build-many-glibcs.py.

	* sysdeps/powerpc/powerpc32/dl-machine.c
	(__elf_machine_fixup_plt): Use space before '('.
	(__process_machine_rela): Likewise.
	* sysdeps/powerpc/powerpc32/register-dump.h (register_dump):
	Likewise.
	* sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (TI_BITS):
	Likewise.
	* sysdeps/powerpc/powerpc64/register-dump.h (register_dump):
	Likewise.
	* sysdeps/powerpc/test-arith.c (union_t): Likewise.
	(pattern): Likewise.
	(delta): Likewise.
	(check_result): Likewise.
	(check_excepts): Likewise.
	(check_op): Likewise.
	(fail_xr): Likewise.
	* sysdeps/unix/alpha/sysdep.h (syscall_promote): Likewise.
	* sysdeps/unix/sysv/linux/alpha/a.out.h (AOUTHSZ): Likewise.
	(SCNHSZ): Likewise.
	* sysdeps/unix/sysv/linux/hppa/makecontext.c (FRAME_SIZE_BYTES):
	Likewise.
	(ARGS): Likewise.
	(__makecontext): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (ucontext_t):
	Likewise.
2019-02-28 15:02:09 +00:00
Gabriel F. T. Gomes
4a2dd41cb5 powerpc64le: Remove test for GCC 6.2
The configure fragment for powerpc64le contains a test for the presence
of several compiler builtins and of the __float128 type, which are
provided by GCC 6.2 for powerpc64le.  Since this configure test was
added, the compiler version required to build glibc for powerpc64le was
different than that required for the other architectures.

Now that glibc requires GCC 6.2 globally (since commit ID 4dcbbc3b28),
this patch removes the powerpc64le-specific test.

Even tough the configure test checks for compiler features rather than
compiler version, the intent of the test was to stop build attempts at
early stages, if they had been configured with a too old compiler.  It
was not the intention of the test to detect compiler breakage (such as
the removal of the required compiler features in future GCC versions),
and glibc is not the place to test for compiler regressions, anyway.

Tested for powerpc64le with GCC 6.2 (built with build-many-glibcs.py).

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2019-02-20 11:06:51 -03:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
Rajalakshmi Srinivasaraghavan
7793ad7a2c powerpc: Rearrange little endian specific files
This patch moves little endian specific POWER9 optimization files to
sysdeps/powerpc/powerpc64/le and creates POWER9 ifunc functions
only for little endian.
2018-08-16 12:12:02 +05:30