Commit Graph

21 Commits

Author SHA1 Message Date
Matheus Castanho
0218463dd8 powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197]
__strncpy_power9 initializes VR 18 with zeroes to be used throughout the
code, including when zero-padding the destination string. However, the
v18 reference was mistakenly being used for stxv and stxvl, which take a
VSX vector as operand. The code ended up using the uninitialized VSR 18
register by mistake.

Both occurrences have been changed to use the proper VSX number for VR 18
(i.e. VSR 50).

Tested on powerpc, powerpc64 and powerpc64le.

Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>
2022-06-07 15:07:25 -03:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Pedro Franco de Carvalho
813c6ec808 powerpc: optimize strcpy/stpcpy for POWER9/10
This patch modifies the current POWER9 implementation of strcpy and
stpcpy to optimize it for POWER9/10.

Since no new POWER10 instructions are used, the original POWER9 strcpy is
modified instead of creating a new implementation for POWER10.  This
implementation is based on both the original POWER9 implementation of
strcpy and the preamble of the new POWER10 implementation of strlen.

The changes also affect stpcpy, which uses the same implementation with
some additional code before returning.

On POWER9, averaging improvements across the benchmark
inputs (length/source alignment/destination alignment), for an
experiment that ran the benchmark five times, bench-strcpy showed an
improvement of 5.23%, and bench-stpcpy showed an improvement of 6.59%.

On POWER10, bench-strcpy showed 13.16%, and bench-stpcpy showed 13.59%.

The changes are:

1. Removed the null string optimization.

   Although this results in a few extra cycles for the null string, in
   combination with the second change, this resulted in improvements for
   for other cases.

2. Adapted the preamble from strlen for POWER10.

   This is the part of the function that handles up to the first 16 bytes
   of the string.

3. Increased number of unrolled iterations in the main loop to 6.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Tested-by: Matheus Castanho <msc@linux.ibm.com>
2021-07-01 17:58:53 -03:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Paul E. Murphy
33fc34521d powerpc64le: ifunc select *f128 routines in multiarch mode
Programatically generate simple wrappers for interesting libm *f128
objects.  Selected functions are transcendental functions or
those with trivial compiler builtins.  This can result in a 2-3x
speedup (e.g logf128 and expf128).

A second set of implementation files are generated which include
the first implementation encountered along the search path.  This
usually works, except when a wrapper is overriden and makefile
search order slightly diverges from include order.  Likewise,
wrapper object files are created for each generated file.  These
hold the ifunc selection routines which export ABI.

Next, several shared headers are intercepted to control renaming of
asm function redirects are used first, and sometimes macro renames
if the former is impractical.

Notably, if the request machine supports hardware IEEE128 (i.e POWER9
and newer) this ifunc machinery is disabled.  Likewise existing
ifunc support for float128 is consolidated into this (e.g sqrtf128
and fmaf128).

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-11-30 09:56:14 -06:00
Raphael M Zinsly
7beee7b39a powerpc: Add optimized stpncpy for POWER9
Add stpncpy support into the POWER9 strncpy.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-11-12 13:16:36 -03:00
Raphael M Zinsly
b9d83bf3eb powerpc: Add optimized strncpy for POWER9
Similar to the strcpy P9 optimization, this version uses VSX to improve
performance.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2020-11-12 13:12:24 -03:00
Paul E. Murphy
b637306d3e powerpc64le: refactor e_sqrtf128.c
Combine both implementations into a single file to allow
building twice with appropriate multiarch support when possible.
2020-06-16 13:50:44 -05:00
Paul E. Murphy
a23bd00f9d powerpc64le: add optimized strlen for P9
This started as a trivial change to Anton's rawmemchr.  I got
carried away.  This is a hybrid between P8's asympotically
faster 64B checks with extremely efficient small string checks
e.g <64B (and sometimes a little bit more depending on alignment).

The second trick is to align to 64B by running a 48B checking loop
16B at a time until we naturally align to 64B (i.e checking 48/96/144
bytes/iteration based on the alignment after the first 5 comparisons).
This allieviates the need to check page boundaries.

Finally, explicly use the P7 strlen with the runtime loader when building
P9.  We need to be cautious about vector/vsx extensions here on P9 only
builds.
2020-06-05 15:30:00 -05:00
Paul E. Murphy
6ef4227509 powerpc64le: use common fmaf128 implementation
This defines the macro such that it should behave best on all
supported powerpc targets.  Likewise, this allows us to remove the
ppc64le specific s_fmaf128.c.

I have verified powerpc64le multiarch and powerpc64le power9
no-multiarch builds continue to generate optimize fmaf128.
2020-06-05 15:29:44 -05:00
Anton Blanchard
765de945ef powerpc: Optimized rawmemchr for POWER9
This version uses vector instructions and is up to 60% faster on medium
matches and up to 90% faster on long matches, compared to the POWER7
version. A few examples:

                            __rawmemchr_power9  __rawmemchr_power7
Length   32, alignment  0:   2.27566             3.77765
Length   64, alignment  2:   2.46231             3.51064
Length 1024, alignment  0:  17.3059             32.6678
2020-05-18 17:08:54 -05:00
Anton Blanchard via Libc-alpha
aa70d05632 powerpc: Optimized stpcpy for POWER9
Add stpcpy support to the POWER9 strcpy. This is up to 40% faster on
small strings and up to 90% faster on long relatively unaligned strings,
compared to the POWER8 version. A few examples:

                                        __stpcpy_power9  __stpcpy_power8
Length   20, alignments in bytes  4/ 4:  2.58246          4.8788
Length 1024, alignments in bytes  1/ 6: 24.8186          47.8528
2020-05-18 08:26:22 -05:00
Anton Blanchard via Libc-alpha
3903704850 powerpc: Optimized strcpy for POWER9
This version uses VSX store vector with length instructions and is
significantly faster on small strings and relatively unaligned large
strings, compared to the POWER8 version. A few examples:

                                        __strcpy_power9  __strcpy_power8
Length   16, alignments in bytes  0/ 0: 2.52454          4.62695
Length  412, alignments in bytes  4/ 0: 11.6             22.9185
2020-05-18 08:26:22 -05:00
Paul E. Murphy
4a4db1de2f powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
strcmp is used while resolving PLT references.  Vector registers
should not be used during this.  The P9 strcmp makes heavy use of
vector registers, so it should be avoided in rtld.

This prevents quiet vector register corruption when glibc is configured
with --disable-multi-arch and --with-cpu=power9.  This can be seen with
test-float64x-compat_totalordermag during the first call into
totalordermagf64x@GLIBC_2.27.

Add a guard to fallback to the power8 implementation when building
power9 strcmp for libraries other than libc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-05-04 13:27:31 -05:00
Raphael Moreira Zinsly
66807aebad powerpc: Add support for fmaf128() in hardware
Adds a POWER9 version of fmaf128 that uses the xsmaddqp
instruction.

Co-authored-by: Tulio Magno Quites Machado Filho  <tuliom@linux.ibm.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-03-30 18:04:27 -03:00
Wilco Dijkstra
220622dde5 Add libm_alias_finite for _finite symbols
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol.  It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).

The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.

Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).

Passes buildmanyglibc.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2020-01-03 10:02:04 -03:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
Rajalakshmi Srinivasaraghavan
7793ad7a2c powerpc: Rearrange little endian specific files
This patch moves little endian specific POWER9 optimization files to
sysdeps/powerpc/powerpc64/le and creates POWER9 ifunc functions
only for little endian.
2018-08-16 12:12:02 +05:30
Gabriel F. T. Gomes
3a33b06969 powerpc64*: fix the order of implied sysdeps directories
The creation of the divergent sysdeps directory for powerpc64le

commit 2f7f3cd8cd
Author: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
Date:   Fri Jul 15 18:04:40 2016 -0500

    powerpc64le: Create divergent sysdep directory for powerpc64le.

allowed float128 to be enabled for powerpc64le (little-endian) and not
for powerpc64 (big-endian).  Since the only intended difference between
them was the presence or absence of the float128 interface, the sysdeps
directory for powerpc64le explicitly reused the files from powerpc64
(through the use of Implies files).

Although this works, it also means that files under the powerpc64
directory might be preferred over files under powerpc64le.  For
instance, on a build for powerpc64le with target set to power9, a file
from powerpc64/power5 might get built, even though a file with the same
name exists in powerpc64le/power8.  That happens because the processor
hierarchy was only defined in the sysdeps directory for powerpc64 (and
borrowed by powerpc64le).

This patch fixes this behavior, by creating new subdirectories under
powerpc64 (i.e.: powerpc64/be and powerpc64/le) and creating new Implies
files to provide the hierarchy of processors for powerpc64 and
powerpc64le separately.  These changes have no effect on installed,
stripped binaries (which remain unchanged).

Tested that installed stripped binaries are unchanged and that there are
no regressions on powerpc64 and powerpc64le.
2018-04-27 16:32:01 -03:00