Commit Graph

1338 Commits

Author SHA1 Message Date
Paul A. Clarke
e3d85df50b [powerpc] fenv_private.h clean up
fenv_private.h includes unused functions, magic macro constants, and
some replicated common code fragments.

Remove unused functions, replace magic constants with constants from
fenv_libc.h, and refactor replicated code.

Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
2019-09-27 08:48:56 -05:00
Paul A. Clarke
f1c56cdff0 [powerpc] SET_RESTORE_ROUND optimizations and bug fix
SET_RESTORE_ROUND brackets a block of code, temporarily setting and
restoring the rounding mode and letting everything else, including
exceptions generated within the block, pass through.

On powerpc, the current code clears the exception enables, which will hide
exceptions generated within the block.  This issue was introduced by me
in commit e905212627.

Fix this by not clearing exception enable bits in the prologue.

Also, since we are no longer changing the enable bits in either the
prologue or the epilogue, there is no need to test for entering/exiting
non-stop mode.

Also, optimize the prologue get/save/set rounding mode operations for
POWER9 and later by using 'mffscrn' when possible.

Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Fixes: e905212627

2019-09-19  Paul A. Clarke  <pc@us.ibm.com>

	* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_and_set_rn): New.
	(__fe_mffscrn): New.
	* sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
	Do not clear enable bits, remove obsolete code, use
	fegetenv_and_set_rn.
	(libc_feresetround_ppc): Remove obsolete code, use
	fegetenv_and_set_rn.
2019-09-19 13:02:30 -05:00
Adhemerval Zanella
b8a7c7da4e Refactor vDSO initialization code
Linux vDSO initialization code the internal function pointers require a
lot of duplicated boilerplate over different architectures.  This patch
aims to simplify not only the code but the required definition to enable
a vDSO symbol.

The changes are:

  1. Consolidate all init-first.c on only one implementation and enable
     the symbol based on HAVE_*_VSYSCALL existence.

  2. Set the HAVE_*_VSYSCALL to the architecture expected names string.

  3. Add a new internal implementation, get_vdso_mangle_symbol, which
     returns a mangled function pointer.

Currently the clock_gettime, clock_getres, gettimeofday, getcpu, and time
are handled in an arch-independent way, powerpc still uses some
arch-specific vDSO symbol handled in a specific init-first implementation.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf, i386-linux-gnu,
mips64-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu,
sparc64-linux-gnu, and x86_64-linux-gnu.

	* sysdeps/powerpc/powerpc32/backtrace.c (is_sigtramp_address,
	is_sigtramp_address_rt): Use HAVE_SIGTRAMP_{RT}32 instead of SHARED.
	* sysdeps/powerpc/powerpc64/backtrace.c (is_sigtramp_address):
	Likewise.
	* sysdeps/unix/sysv/linux/aarch64/init-first.c: Remove file.
	* sysdeps/unix/sysv/linux/aarch64/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/arm/init-first.c: Likewise.
	* sysdeps/unix/sysv/linux/arm/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/mips/init-first.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/i386/init-first.c: Likewise.
	* sysdeps/unix/sysv/linux/riscv/init-first.c: Likewise.
	* sysdeps/unix/sysv/linux/riscv/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/s390/init-first.c: Likewise.
	* sysdeps/unix/sysv/linux/s390/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/sparc/init-first.c: Likewise.
	* sysdeps/unix/sysv/linux/sparc/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/x86/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/init-first.c: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/sysdep.h
	(HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL,
	HAVE_GETTIMEOFDAY_VSYSCALL): Define value based on kernel exported
	name.
	* sysdeps/unix/sysv/linux/arm/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
	HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
	* sysdeps/unix/sysv/linux/i386/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
	HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
	* sysdeps/unix/sysv/linux/mips/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
	HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/sysdep.h
	(HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL,
	HAVE_GETCPU_VSYSCALL, HAVE_TIME_VSYSCALL, HAVE_GET_TBFREQ,
	HAVE_SIGTRAMP_RT64, HAVE_SIGTRAMP_32, HAVE_SIGTRAMP_RT32i,
	HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
	* sysdeps/unix/sysv/linux/riscv/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL,
	HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
	HAVE_GETCPU_VSYSCALL): Likewise.
	* sysdeps/unix/sysv/linux/s390/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL,
	HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
	HAVE_GETCPU_VSYSCALL): Likewise.
	* sysdeps/unix/sysv/linux/sparc/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
	HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
	* sysdeps/unix/sysv/linux/x86_64/sysdep.h
	(HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
	HAVE_GETCPU_VSYSCALL): Likewise.
	* sysdeps/unix/sysv/linux/dl-vdso.h (VDSO_NAME, VDSO_HASH): Define to
	invalid names if architecture does not define them.
	(get_vdso_mangle_symbol): New symbol.
	* sysdeps/unix/sysv/linux/init-first.c: New file.
	* sysdeps/unix/sysv/linux/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/init-first.c (gettimeofday,
	clock_gettime, clock_getres, getcpu, time): Remove declaration.
	(__libc_vdso_platform_setup_arch): Likewise and use
	get_vdso_mangle_symbol to setup vDSO symbols.
	(sigtramp_rt64, sigtramp32, sigtramp_rt32, get_tbfreq): Add
	attribute_hidden.
	* sysdeps/unix/sysv/linux/powerpc/libc-vdso.h: Likewise.
	* sysdeps/unix/sysv/linux/sysdep-vdso.h (VDSO_SYMBOL): Remove
	definition.
2019-09-17 17:09:24 -03:00
Paul Eggert
5cb226d7e4 Fix three GNU license URLs, along with trailing-newline issues. 2019-09-07 03:13:16 -07:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Paul A. Clarke
0b3c9e57a4 [powerpc] fegetenv_status: simplify instruction generation
fegetenv_status() wants to use the lighter weight instruction 'mffsl'
for reading the Floating-Point Status and Control Register (FPSCR).
It currently will use it directly if compiled '-mcpu=power9', and will
perform a runtime check (cpu_supports("arch_3_00")) otherwise.

Nicely, it turns out that the 'mffsl' instruction will decode to
'mffs' on architectures older than "arch_3_00" because the additional
bits set for 'mffsl' are "don't care" for 'mffs'.  'mffs' is a superset
of 'mffsl'.

So, just generate 'mffsl'.
2019-08-28 13:53:09 -05:00
Paul A. Clarke
fec2bd2c2d [powerpc] fesetenv: optimize FPSCR access
fesetenv() reads the current value of the Floating-Point Status and Control
Register (FPSCR) to determine the difference between the current state of
exception enables and the newly requested state.  All of these bits are also
returned by the lighter weight 'mffsl' instruction used by fegetenv_status().
Use that instead.

Also, remove a local macro _FPU_MASK_ALL in favor of a common macro,
FPU_ENABLES_MASK from fenv_libc.h.

Finally, use a local variable ('new') in favor of a pointer dereference
('*envp').
2019-08-28 13:52:17 -05:00
Paul A. Clarke
e905212627 [powerpc] SET_RESTORE_ROUND improvements
SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
libc_feresetround_ppc_ctx to bracket a block of code where the floating point
rounding mode must be set to a certain value.

For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
1. Read/save FPSCR.
2. Create new value for FPSCR with new rounding mode and enables cleared.
3. If new value is different than current value,
   a. If transitioning from a state where some exceptions enabled,
      enter "ignore exceptions / non-stop" mode.
   b. Write new value to FPSCR.
   c. Put a mark on the wall indicating the FPSCR was changed.

(1) uses the 'mffs' instruction.  On POWER9, the lighter weight 'mffsl'
instruction can be used, but it doesn't return all of the bits in the FPSCR.
fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
used instead of fegetenv_register.
(3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
instead use 'mtfsf 0b00000011' to write just the enables and the mode,
because some of the rest of the bits are not valid if 'mffsl' was used.
fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
otherwise.

For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
parameters such that it performs:
1. Retreive saved value of FPSCR, saved in prologue above.
2. Read FPSCR.
3. Create new value of FPSCR where:
   - Summary bits and exception indicators = current OR saved.
   - Rounding mode and enables = saved.
   - Status bits = current.
4. If transitioning from some exceptions enabled to none,
   enter "ignore exceptions / non-stop" mode.
5. If transitioning from no exceptions enabled to some,
   enter "catch exceptions" mode.
6. Write new value to FPSCR.

The summary bits are hardwired to the exception indicators, so there is no
need to restore any saved summary bits.
The exception indicator bits, which are sticky and remain set unless
explicitly cleared, would only need to be restored if the code block
might explicitly clear any of them.  This is certainly not expected.

So, the only bits that need to be restored are the enables and the mode.
If it is the case that only those bits are to be restored, there is no need to
read the FPSCR.  Steps (2) and (3) are unnecessary, and step (6) only needs to
write the bits being restored.

We know we are transitioning out of "ignore exceptions" mode, so step (4) is
unnecessary, and in step (6), we only need to check the state we are
entering.
2019-08-28 13:51:10 -05:00
Paul A. Clarke
3c1766ea10 [powerpc] fe{en,dis}ableexcept, fesetmode: optimize FPSCR accesses
Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
"mode" (exception enable and rounding mode) bits of the Floating Point Status
Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
used to write just those bits back to the FPSCR.  The net is better performance.

In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
they determine that it doesn't need to be written because it is not changing.
In either case, the local variable holds the current values of the enable
bits in the FPSCR.  This local variable can be used instead of again reading
the FPSCR.

Also, that value of the FPSCR which is read the second time is validated
against the requested enables.  Since the write can't fail, this validation
step is unnecessary, and can be removed.  Instead, the exceptions to be
enabled (or disabled) are transformed into available bits in the FPSCR,
then validated after being transformed back, to ensure that all requested
bits are actually being set.  For example, FE_INVALID_SQRT can be
requested, but cannot actually be set.  This bit is not mapped during the
transformations, so a test for that bit being set before and after
transformations will show the bit would not be set, and the function will
return -1 for failure.

Finally, convert the local macros in fesetmode.c to more generally useful
macros in fenv_libc.h.
2019-08-28 13:50:06 -05:00
Paul A. Clarke
cd7ce12a02 [powerpc] fe{en,dis}ableexcept optimize bit translations
The exceptions passed to fe{en,dis}ableexcept() are defined in the ABI
as a bitmask, a combination of FE_INVALID, FE_OVERFLOW, etc.
Within the functions, these bits must be translated to/from the corresponding
enable bits in the Floating Point Status Control Register (FPSCR).
This translation is currently done bit-by-bit.  The compiler generates
a series of conditional bit operations.  Nicely, the "FE" exception
bits are all a uniform offset from the FPSCR enable bits, so the bit-by-bit
operation can instead be performed by a shift with appropriate masking.
2019-08-28 13:49:19 -05:00
Joseph Myers
0175c9e9be Declare most TS 18661-1 interfaces for C2X.
C2X adds the interfaces from TS 18661-1, and all except a handful in
Annex F are unconditionally visible in C2X rather than only visible
when __STDC_WANT_IEC_60559_BFP_EXT__ is defined.  This patch updates
glibc headers accordingly: most uses of __GLIBC_USE
(IEC_60559_BFP_EXT) are changed to a new __GLIBC_USE
(IEC_60559_BFP_EXT_C2X).  (Regarding totalorder and totalordermag, the
type-generic macros in tgmath.h will go away when the functions are
changed to take pointer arguments.)

	* bits/libc-header-start.h (__GLIBC_USE_IEC_60559_BFP_EXT): Update
	comment.
	(__GLIBC_USE_IEC_60559_BFP_EXT_C2X): New macro.
	* bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Change to
	[__GLIBC_USE (IEC_60559_BFP_EXT_C2X)].
	* include/limits.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
	* math/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
	* math/math.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
	* stdlib/bits/stdlib-ldbl.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* stdlib/stdint.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
	* stdlib/stdlib.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise.
	* sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/csky/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/m68k/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/microblaze/bits/fenv.h [__GLIBC_USE
	(IEC_60559_BFP_EXT)]: Likewise.
	* sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/riscv/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise.
	* math/bits/mathcalls.h [__GLIBC_USE (IEC_60559_BFP_EXT)]:
	Likewise, except for totalorder, totalordermag, getpayload,
	setpayload and setpayloadsig.
	* math/tgmath.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise,
	except for totalorder and totalordermag.
2019-08-13 11:28:51 +00:00
Raoni Fassina Firmino
066020c5e8 powerpc: Cleanup: use actual power8 assembly mnemonics
Some implementations in sysdeps/powerpc/powerpc64/power8/*.S still had
pre power8 compatible binutils hardcoded macros and were not using
.machine power8.

This patch should not have semantic changes, in fact it should have the
same exact code generated.

Tested that generated stripped shared objects are identical when
using "strip --remove-section=.note.gnu.build-id".

Checked on:
- powerpc64le, power9, build-many-glibcs.py, gcc 6.4.1 20180104, binutils 2.26.2.20160726
- powerpc64le, power8, debian 9, gcc 6.3.0 20170516, binutils 2.28
- powerpc64le, power9, ubuntu 19.04, gcc 8.3.0, binutils 2.32
- powerpc64le, power9, opensuse tumbleweed, gcc 9.1.1 20190527, binutils 2.32
- powerpc64, power9, debian 10, gcc 8.3.0, binutils 2.31.1

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-08-01 15:57:50 -03:00
Paul A. Clarke
b5232c9f9e [powerpc] fenv_libc.h: protect use of __builtin_cpu_supports
Using __builtin_cpu_supports() requires support in GCC and Glibc.
My recent patch to fenv_libc.h added an unprotected use of
__builtin_cpu_supports().  Compilation of Glibc itself will fail
with a sufficiently new GCC and sufficiently old Glibc:

../sysdeps/powerpc/fpu/fegetexcept.c: In function ‘__fegetexcept’:
../sysdeps/powerpc/fpu/fenv_libc.h:52:20: error: builtin ‘__builtin_cpu_supports’ needs GLIBC (2.23 and newer) that exports hardware capability bits [-Werror]

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Fixes 3db85a9814.
2019-07-09 13:09:35 -05:00
Adhemerval Zanella
6ea21bfe43 powerpc: refactor logb{f,l}
The power7 logb implementation does not show a performance gain on
ISA 2.07+ chips with faster floating-point to GRP instructions
(currently POWER8 and POWER9).

This patch moves the POWER7 implementation to generic one and enables
it for POWER7.  It also add some cleanup to use inline floating-point
number instead of define them using static const.

The performance difference is for POWER9:

  - Without patch:
  "logb": {
   "subnormal": {
    "duration": 4.99202e+09,
    "iterations": 8.83662e+08,
    "max": 75.194,
    "min": 5.501,
    "mean": 5.64925
   },
   "normal": {
    "duration": 4.97063e+09,
    "iterations": 9.97094e+08,
    "max": 46.489,
    "min": 4.956,
    "mean": 4.98512
   }
  }

  - With patch:
  "logb": {
   "subnormal": {
    "duration": 4.97226e+09,
    "iterations": 9.92036e+08,
    "max": 77.209,
    "min": 4.892,
    "mean": 5.01218
   },
   "normal": {
    "duration": 4.96192e+09,
    "iterations": 1.07545e+09,
    "max": 12.361,
    "min": 4.593,
    "mean": 4.61382
   }
  }

The ifunc implementation is also enabled only for powerpc64.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/power7/fpu/s_logb.c: Move to ...
	* sysdeps/powerpc/fpu/s_logb.c: ... here.  Use inline FP constants.
	* sysdeps/powerpc/power7/fpu/s_logbf.c: Move to ...
	* sysdeps/powerpc/fpu/s_logbf.c: ... here.  Use inline FP constants.
	* sysdeps/powerpc/power7/fpu/s_logbl.c: Move to ...
	* sysdeps/powerpc/fpu/s_logbl.c: ... here.  Use inline FP constants.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb-power7.c:
	Adjust implementation path.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbf-power7.c:
	Adjust implementation path.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbl-power7.c:
	Adjust implementation path.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
	(libm-sysdep_routines): Add s_log* objects.
	(CFLAGS-s_logbf-power7.c, CFLAGS-s_logbl-power7.c,
	CFLAGS-s_logb-power7.c): New fule.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-power7.c: Move
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-power7.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-ppc64.c: Move
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-ppc64.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-power7.c: Move
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-power7.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-ppc64.c: Move
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-ppc64.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-power7.c: Move
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-power7.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-ppc64.c: Move
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-ppc64.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: Remove file.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_logb.c: Remove file.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_logbf.c: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_logbl.c: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-07-08 17:22:22 -03:00
Adhemerval Zanella
931c616eed powerpc: Refactor modf{f}
The modf{f} optimization is not an optimization for ISA 2.07+.  This
patch move the IFUNC for powerpc64 only, move the power5+ to generic
location, and include the generic implementation for ISA 2.07+.

The performance changes are based on modf benchtests:

  * POWER9 - ppc64
  "modf": {
   "": {
    "duration": 4.97057e+09,
    "iterations": 1.00688e+09,
    "max": 28.76,
    "min": 4.912,
    "mean": 4.9366
   }
  }
  * POWER9 - power5+
  "modf": {
   "": {
    "duration": 4.98291e+09,
    "iterations": 9.32818e+08,
    "max": 15.058,
    "min": 5.107,
    "mean": 5.34178
   }
  }

  * POWER8 - ppc64
   "modf": {
   "": {
    "duration": 5.05329e+09,
    "iterations": 8.38814e+08,
    "max": 518.051,
    "min": 5.79,
    "mean": 6.02433
   }
  }
  * POWER8 - power5+
  "modf": {
   "": {
    "duration": 5.05573e+09,
    "iterations": 8.35254e+08,
    "max": 63.141,
    "min": 5.873,
    "mean": 6.05293
   }
  }

  * POWER7 - ppc64
  "modf": {
   "": {
    "duration": 4.89818e+09,
    "iterations": 1.08408e+09,
    "max": 57.556,
    "min": 3.953,
    "mean": 4.51827
   }
  }
  * POWER7 - power5+
  "modf": {
   "": {
    "duration": 4.83789e+09,
    "iterations": 1.33409e+09,
    "max": 46.608,
    "min": 2.224,
    "mean": 3.62636
   }
  }

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/power5+/fpu/s_modf.c: Move to ...
	* sysdeps/powerpc/fpu/s_modf.c: ... here.  Add ISA 2.07 optimization.
	* sysdeps/powerpc/power5+/fpu/s_modff.c: Move to ...
	* sysdeps/powerpc/fpu/s_modff.c: ... here.  Add ISA 2.07 optimization.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modf-power5+.c:
	Adjust include.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modff-power5+.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile (sysdep_calls,
	sysdep_routines): Add s_modf* objects.
	(CFLAGS-s_modf-power5+.c, CFLAGS-s_modff-power5+.c,
	CFLAGS-s_modf-ppc64.c, CFLAGS-s_modff-ppc64.c): New rule.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Move
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf-power5+.c: Movo
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf-power5+.c: Move
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modf.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-power5+.c: Move
	to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-power5+.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff-ppc64.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff-ppc64.c:
	... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_modff.c: ... here.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-07-08 17:22:22 -03:00
Adhemerval Zanella
69461d9896 powerpc: hypot refactor and optimization
The powerpc hypot is slight optimized by:

  - Commit 8df4e219e4, both isnan and isinf are always inlined and thus
    the check TEST_INF_NAN does not make sense anymore.  The generic
    check for POWER7 should be faster on all powerpc configuration.

  - The redundant check 'y > two60factor && (x / y) > two60' is removed.

Both changes leads to unrequired ifunc especialization for power7 and
thus they are removed.  Finally The code is also cleanup a bit by inlining
the constants floating points.

The performance changes using the hypot benchtests are:

  - POWER9 without patch:
    "hypot": {
     "overflow": {
      "duration": 4.98585e+09,
      "iterations": 4.84932e+08,
      "max": 46.551,
      "min": 10.229,
      "mean": 10.2815
     },
     "higher_two500": {
      "duration": 5.00192e+09,
      "iterations": 4.24843e+08,
      "max": 33.319,
      "min": 11.606,
      "mean": 11.7736
     },
     "subnormal": {
      "duration": 5.0075e+09,
      "iterations": 4.06792e+08,
      "max": 22.178,
      "min": 12.15,
      "mean": 12.3097
     },
     "less_two500": {
      "duration": 5.00685e+09,
      "iterations": 4.08772e+08,
      "max": 22.784,
      "min": 12.052,
      "mean": 12.2485
     },
     "default": {
      "duration": 5.06002e+09,
      "iterations": 4.09894e+08,
      "max": 20.648,
      "min": 11.874,
      "mean": 12.3447
     }
    }

  - POWER9 with patch:
    "hypot": {
     "overflow": {
      "duration": 4.91848e+09,
      "iterations": 7.28039e+08,
      "max": 47.958,
      "min": 6.436,
      "mean": 6.75579
     },
     "higher_two500": {
      "duration": 4.9359e+09,
      "iterations": 6.63376e+08,
      "max": 20.783,
      "min": 7.321,
      "mean": 7.44057
     },
     "subnormal": {
      "duration": 4.9479e+09,
      "iterations": 6.19772e+08,
      "max": 18.856,
      "min": 7.817,
      "mean": 7.98341
     },
     "less_two500": {
      "duration": 4.94275e+09,
      "iterations": 6.3889e+08,
      "max": 17.452,
      "min": 7.597,
      "mean": 7.73647
     },
     "default": {
      "duration": 5.03645e+09,
      "iterations": 5.70718e+08,
      "max": 18.904,
      "min": 8.55,
      "mean": 8.82476
     }
    }

  - POWER7 without patch
    "hypot": {
     "overflow": {
      "duration": 4.86637e+09,
      "iterations": 6.43196e+08,
      "max": 53.958,
      "min": 7.328,
      "mean": 7.56592
     },
     "higher_two500": {
      "duration": 4.99842e+09,
      "iterations": 3.11012e+08,
      "max": 78.227,
      "min": 15.696,
      "mean": 16.0715
     },
     "subnormal": {
      "duration": 4.99841e+09,
      "iterations": 3.08935e+08,
      "max": 51.392,
      "min": 15.983,
      "mean": 16.1795
     },
     "less_two500": {
      "duration": 5.00108e+09,
      "iterations": 2.99464e+08,
      "max": 73.247,
      "min": 16.416,
      "mean": 16.7001
     },
     "default": {
      "duration": 5.04645e+09,
      "iterations": 3.52608e+08,
      "max": 70.073,
      "min": 13.38,
      "mean": 14.3118
     }
    }

  - POWER7 with patch
    "hypot": {
     "overflow": {
      "duration": 4.80785e+09,
      "iterations": 8.00001e+08,
      "max": 66.262,
      "min": 5.888,
      "mean": 6.00981
     },
     "higher_two500": {
      "duration": 4.9859e+09,
      "iterations": 3.39449e+08,
      "max": 5148.44,
      "min": 14.539,
      "mean": 14.6882
     },
     "subnormal": {
      "duration": 4.9905e+09,
      "iterations": 3.28874e+08,
      "max": 64.905,
      "min": 14.971,
      "mean": 15.1745
     },
     "less_two500": {
      "duration": 4.99494e+09,
      "iterations": 3.19755e+08,
      "max": 103.696,
      "min": 14.972,
      "mean": 15.6211
     },
     "default": {
      "duration": 5.03951e+09,
      "iterations": 4.02502e+08,
      "max": 61.008,
      "min": 12.368,
      "mean": 12.5205
     }
    }

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/e_hypot.c (two60, two500, two600, two1022,
	twoM500, twoM600, two60factor, pdnum): Remove.
	(TEST_INFO_NAN, GET_TW0_HIGH_WORD): Remove macro.
	(__ieee754_hypot): Replace static variables with inline definition,
	remove ununsed branches.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
	(libm-sysdep_routines): Remove e_hypot-* objects.
	(CFLAGS-e_hypot-power7.c, CFLAGS-e_hypotf-power7.c): Remove rule.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-power7.c: Remove
	file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-power7.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf.c: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-07-08 17:21:15 -03:00
Paul A. Clarke
3db85a9814 powerpc: Use faster means to access FPSCR when possible in some cases
Using 'mffs' instruction to read the Floating Point Status Control Register
(FPSCR) can force a processor flush in some cases, with undesirable
performance impact.  If the values of the bits in the FPSCR which force the
flush are not needed, an instruction that is new to POWER9 (ISA version 3.0),
'mffsl' can be used instead.

Cases included:  get_rounding_mode, fegetround, fegetmode, fegetexcept.

	* sysdeps/powerpc/bits/fenvinline.h (__fegetround): Use
	__fegetround_ISA300() or __fegetround_ISA2() as appropriate.
	(__fegetround_ISA300) New.
	(__fegetround_ISA2) New.
	* sysdeps/powerpc/fpu_control.h (IS_ISA300): New.
	(_FPU_MFFS): Move implementation...
	(_FPU_GETCW): Here.
	(_FPU_MFFSL): Move implementation....
	(_FPU_GET_RC_ISA300): Here. New.
	(_FPU_GET_RC): Use _FPU_GET_RC_ISA300() or _FPU_GETCW() as appropriate.
	* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status_ISA300): New.
	(fegetenv_status): New.
	* sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Use fegetenv_status()
	instead of fegetenv_register().
	* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Likewise.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2019-06-30 08:40:44 -03:00
Adhemerval Zanella
aa32f5bf0c powerpc: Use generic e_expf
Generic implementation is faster on both power8 and power9:

POWER9:
- sysdeps/ieee754/flt-32/e_expf.c
  "expf": {
   "workload-spec2017.wrf": {
    "duration": 5.1236e+09,
    "iterations": 7.53344e+08,
    "reciprocal-throughput": 5.9436,
    "latency": 7.65869,
    "max-throughput": 1.68248e+08,
    "min-throughput": 1.30571e+08
   }
  }

- sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S
  "expf": {
   "workload-spec2017.wrf": {
    "duration": 5.14429e+09,
    "iterations": 5.29248e+08,
    "reciprocal-throughput": 8.05372,
    "latency": 11.3863,
    "max-throughput": 1.24166e+08,
    "min-throughput": 8.78249e+07
   }
  }

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
	(libm-sysdep_routines): Remove e_expf-power8 and expf-ppc64.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-power8.S: Remove
	file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/e_expf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
2019-06-26 14:33:58 -03:00
Adhemerval Zanella
9d5d214e86 powerpc: Refactor powerpc32 lround/lroundf/llround/llroundf
This patches consolidates all the powerpc llround{f} implementations on
the generic sysdeps/powerpc/powerpc32/fpu/s_llround{f}.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/powerpc32/fpu/Makefile
	[$(subdir) == math] (CFLAGS-s_lround.c): New rule.
	* sysdeps/powerpc/powerpc32/fpu/s_llround.c (__llround): Add power5+
	and fctidz optimization.
	* sysdeps/powerpc/powerpc32/fpu/s_lround.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_lround.c: New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
	(CFLAGS-s_llround-power6.c, CFLAGS-s_llround-power5+.c,
	CFLAGS-s_llround-ppc32.c, CFLAGS-s_lround-ppc32.c,
	CFLAGS-s_lround-power5+.c): New rule.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power5+.c:
	New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-power5+.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power5+.S:
	Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-power6.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-power5+.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_lround.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_llroundf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
2019-06-26 14:32:45 -03:00
Paul A. Clarke
49bc41b642 [powerpc] add 'volatile' to asm
Add 'volatile' keyword to a few asm statements, to force the compiler
to generate the instructions therein.

Some instances were implicitly volatile, but adding keyword for consistency.

2019-06-19  Paul A. Clarke  <pc@us.ibm.com>

	* sysdeps/powerpc/fpu/fenv_libc.h (relax_fenv_state): Add 'volatile'.
	* sysdeps/powerpc/fpu/fpu_control.h (__FPU_MFFS): Likewise.
	(__FPU_MFFSL): Likewise.
	(_FPU_SETCW): Likewise.
2019-06-19 20:20:02 -05:00
Adhemerval Zanella
dee07df1a4 powerpc: Refactor powerpc64 lround/lroundf/llround/llroundf
This patches consolidates all the powerpc {l}lround{f} implementations
on the generic sysdeps/powerpc/fpu/s_{l}lround{f}.c.

The IFUNC support is also moved only to powerpc64 only, since for
powerpc64le generic implementation resulting in optimized code.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
	(libm-sysdep_routines): Add s_llround-power8, s_llround-power6x,
	s_llround-power5+, s_llround-ppc64, and s_llroundf-ppc64.
	(CFLAGS-s_llround-power8.c, CFLAGS-s_llround-power6x.c,
	CFLAGS-s_llround-power5+.c): New rule.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power5+.c:
	New file.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power6x.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-power8.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround-ppc64.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf-ppc64.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llround.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llroundf.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_lround.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lround.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/Makefile
	[$(subdir) == math] (CFLAGS-s_llround.c): New rule.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
	(libm-sysdep_routines): Remove s_llround-* objects.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power5+.S: Remove
	file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power6x.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-power8.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround-ppc64.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf-ppc64.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lround.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lroundf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llround.c: New file.
	* sysdeps/powerpc/powerpc64/fpu/s_llroundf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lround.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lroundf.c: Likewise.
	* sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc64/power5+/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6x/fpu/s_llroundf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_llroundf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-17 09:28:21 -03:00
Adhemerval Zanella
2166283fcc powerpc: Refactor powerpc32 lrint/lrintf/llrint/llrintf
This patches consolidates all the powerpc llrint{f} implementations on
the generic sysdeps/powerpc/powerpc32/fpu/s_llrint{f}.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/s_lrintf.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Move to ...
	* sysdeps/powerpc/fpu/s_lrintf.c: ... here.
	* sysdeps/powerpc/powerpc32/fpu/Makefile
	[$(subdir) == math] (CFLAGS-s_lrint.c): New rule.
	* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Add power4
	optimization.
	* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_lrint.c: New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
	(CFLAGS-s_llrintf-power6.c, CFLAGS-s_llrintf-ppc32.c,
	CFLAGS-s_llrint-power6.c, CFLAGS-s_llrint-ppc32.c,
	CFLAGS-s_lrint-ppc32.c): New rule.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.S:
	Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-power6.c:
	New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-power6.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint-ppc32.c:
	Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-17 09:27:02 -03:00
Adhemerval Zanella
78049de0a9 powerpc: refactor powerpc64 lrint/lrintf/llrint/llrintf
This patches consolidates all the powerpc llrint{f} implementations on
the generic sysdeps/powerpc/fpu/s_llrint{f}.

The IFUNC support is also moved only to powerpc64 only, since for
powerpc64le generic implementation resulting in optimized code.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
	(libm-sysdep_routines): Add s_llrint-power8, s_llrint-power6x, and
	s_llrint-ppc64.
	(CFLAGS-s_llrint-power8.c, CFLAGS-s_llrint-power6x.c): New rule.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power6x.c: New
	file.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power8.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-ppc64.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_lrint.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrintf.c: ... here.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: New file.
	* sysdeps/powerpc/powerpc64/fpu/Makefile: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
	(libm-sysdep_routines): Remove s_llrint-* objects.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Remove
	file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llrint.c: New file.
	* sysdeps/powerpc/powerpc64/fpu/s_llrintf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lrint.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_lrint.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-17 09:26:21 -03:00
Adhemerval Zanella
1192696069 powerpc: Remove optimized finite
The powerpc finite optimization do not show much gain:

  - GCC will call libm iff -fsignaling-nans is used. This usage pattern
    is usually not performance oriented and for such calls PLT overhead
    should dominate execution time.

  - The power7 uses ftdiv to optimize for some input patterns, but at
    cost of others.  Comparing against generic C implementation built
    for powerpc64-linux-gnu-power7 (--with-cpu=power7):

    - Generic sysdeps/ieee754 implementation:
       "isfinite": {
        "": {
         "duration": 5.0082e+09,
         "iterations": 2.45299e+09,
         "max": 43.824,
         "min": 2.008,
         "mean": 2.04167
        },
        "INF": {
         "duration": 4.66554e+09,
         "iterations": 2.28288e+09,
         "max": 35.73,
         "min": 2.008,
         "mean": 2.04371
        },
        "NAN": {
         "duration": 4.66274e+09,
         "iterations": 2.28716e+09,
         "max": 34.161,
         "min": 2.009,
         "mean": 2.03866
        }
       }

    - power7 optimized one:
       "isfinite": {
        "": {
         "duration": 4.99111e+09,
         "iterations": 2.65566e+09,
         "max": 25.015,
         "min": 1.716,
         "mean": 1.87942
        },
        "INF": {
         "duration": 4.6783e+09,
         "iterations": 2.0999e+09,
         "max": 35.264,
         "min": 1.868,
         "mean": 2.22787
        },
        "NAN": {
         "duration": 4.67915e+09,
         "iterations": 2.08678e+09,
         "max": 38.099,
         "min": 1.869,
         "mean": 2.24228
        }
       }

     So it basically optimizes marginally for normal numbers while
     increasing the latency for other kind of FP.

  - The power8 implementation is just the generic implementation using
    ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
    So generic implementation is the best option for powerpc64le.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
	(sysdeps_routines, libm-sysdep_routines): Remove s_finite*
	objects.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-power7.S:
	Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise.
	* sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise.
	* sysdeps/powerpc/powerpc32/power7/fpu/s_finitef.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call):
	Remove s_finite* objects.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Remove file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_finitef.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_finitef.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-12 14:32:39 -03:00
Adhemerval Zanella
6427a6ac8c powerpc: Remove optimized isinf
The powerpc isinf optimizations onyl adds complexity:

  - GCC will call libm iff -fsignaling-nans is used. This usage pattern
    is usually not performance oriented and for such calls PLT overhead
    should dominate execution time.

  - The power7 uses ftdiv to optimize for some input pattern and branch
    implementation for INF and denormal that does:

    return (ix & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000)

    Although it does show slight better latency than generic algorithm
    (as below), it is only for power7 and requires it to override it
    for power8.

  - The power8 implementation is just the generic implementation using
    ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
    So generic implementation is the best option for powerpc64le.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
	(sysdeps_routines, libm-sysdep_routines): Remove s_isinf* and s_isinf*
	objects.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-power7.S:
	Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise.
	* sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power7/fpu/s_isinff.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call):
	Remove s_isinf* and s_isinf* objects.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power7.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power8.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_isinff.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_isinff.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-12 14:32:39 -03:00
Adhemerval Zanella
2666f96390 powerpc: Remove optimized isnan
The powerpc isnan optimizations are not really a gain:

  - GCC will call libm iff -fsignaling-nans is used. This usage pattern
    is usually not performance oriented and for such calls PLT overhead
    should dominate execution time.

  - The power5, power6, and power6x are just micro-optimization to
    improve the Load-Hit-Store hazards from floating-point to general
    register transfer, and current GCC already has support to minimize
    it by inserting either extra nops or group dispatch instructions.

  - The power7 uses ftdiv to optimize for some input patterns, but at
    cost of others.  Comparing against generic C implementation built
    for powerpc-linux-gnu-power4 (which uses the hp-timing support on
    benchtests):

    - Generic sysdeps/ieee754 implementation:
      "isnan": {
       "": {
        "duration": 4.98415e+09,
        "iterations": 2.34516e+09,
        "max": 45.925,
        "min": 2.052,
        "mean": 2.12529
       },
       "INF": {
        "duration": 4.74057e+09,
        "iterations": 1.69761e+09,
        "max": 91.01,
        "min": 2.052,
        "mean": 2.79249
       },
       "NAN": {
        "duration": 4.74071e+09,
        "iterations": 1.68768e+09,
        "max": 282.343,
        "min": 2.052,
        "mean": 2.809
       }
      }

    - power7 optimized one:
    $ ./testrun.sh benchtests/bench-isnan
      "isnan": {
       "": {
        "duration": 4.96842e+09,
        "iterations": 2.56297e+09,
        "max": 50.048,
        "min": 1.872,
        "mean": 1.93854
       },
       "INF": {
        "duration": 4.76648e+09,
        "iterations": 1.54213e+09,
        "max": 373.408,
        "min": 2.661,
        "mean": 3.09084
       },
       "NAN": {
        "duration": 4.76845e+09,
        "iterations": 1.54515e+09,
        "max": 51.016,
        "min": 2.736,
        "mean": 3.08607
       }
      }

    So it basically optimizes marginally for normal numbers while
    increasing the latency for other kind of FP.

  - The generic implementation requires getting the floating point
    status, disable the invalid operation bit, and restore the
    floating-point status.  Each operation is costly and requires
    flushing the FP pipeline.

    Using the same scenarion for the previous analysis:

      "isnan": {
       "": {
        "duration": 5.08284e+09,
        "iterations": 6.2898e+08,
        "max": 41.844,
        "min": 8.057,
        "mean": 8.08108
       },
       "INF": {
        "duration": 4.97904e+09,
        "iterations": 6.16176e+08,
        "max": 39.661,
        "min": 8.057,
        "mean": 8.08055
       },
       "NAN": {
        "duration": 4.98695e+09,
        "iterations": 5.95866e+08,
        "max": 29.728,
        "min": 8.345,
        "mean": 8.36925
       }
      }

  - The power8 implementation is just the generic implementation using
    ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
    So generic implementation is the best option for powerpc64le.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/s_isnan.c: Remove file.
	* sysdeps/powerpc/fpu/s_isnanf.S: Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
	(sysdeps_routines, libm-sysdep_routines): Remove s_isnan-* and
	s_isnanf-* objects.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power5.S:
	Remove file
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power6.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power7.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power5.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power6.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c: Likewise.
	* sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc32/power7/fpu/s_isnanf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_calls):
	Remove s_isnan-* and s_isnanf-* objects.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc64/power7/fpu/s_isnanf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise.
	* sysdeps/powerpc/powerpc64/power8/fpu/s_isnanf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-12 14:32:36 -03:00
Adhemerval Zanella
e41d66e41a powerpc: copysign cleanup
GCC always expand copysign{f} for all possible cpus, so calling the libm
is only done if user explicitly states to disable the builtin (which is
done usually not for performance reason).  So to provide ifunc variant
for copysign is just unrequired complexity, since libm will be called
on non-performance critical code.

This patch removes both powerpc32 and powerpc64 ifunc variants and
consolidates the powerpc implementation on
sysdeps/powerpc/fpu/s_copysign{f}.c using compiler builtins.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/s_copysign.c: New file.
	* sysdeps/powerpc/fpu/s_copysignf.c: Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_copysignf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
	(sysdep_routines, libm-sysdep_routines): Remove s_copysign-power6 and
	s_copysign-ppc32.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-power6.S:
	Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise.
	* sysdeps/powerpc/powerpc32/power6/fpu/s_copysignf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdeps_calls):
	Remove s_copysign-power6 s_copysign-ppc64.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S:
	Remove file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_copysignf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise.
	* sysdeps/powerpc/powerpc64/power6/fpu/s_copysignf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-12 11:46:26 -03:00
Adhemerval Zanella
21bd039bb4 powerpc: consolidate rint
This patches consolidates all the powerpc rint{f} implementations on
the generic sysdeps/powerpc/fpu/s_rint{f}.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode,
	round_to_integer_float, round_mode): Add RINT handling.
	(reset_fenv_mode): New symbol.
	* sysdeps/powerpc/fpu/s_rint.c (__rint): Use generic implementation.
	* sysdeps/powerpc/fpu/s_rintf.c (__rintf): Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_rint.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-12 11:46:22 -03:00
Paul A. Clarke
de751ebc9e [powerpc] get_rounding_mode: utilize faster method to get rounding mode
Add support to use 'mffsl' instruction if compiled for POWER9 (or later).

Also, mask the result to avoid bleeding unrelated bits into the result of
_FPU_GET_RC().

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2019-06-06 14:11:56 -05:00
Paul A. Clarke
0158473d8f [powerpc] fegetexcept: utilize function instead of duplicating code
fegetexcept() included code which exactly duplicates the code in
fenv_reg_to_exceptions().  Replace with a call to that function.

2019-06-05  Paul A. Clarke  <pc@us.ibm.com>

	* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Replace code
	with call to equivalent function.
2019-06-05 19:38:10 -05:00
Gabriel F. T. Gomes
9250e6610f powerpc: Fix build failures with current GCC
Since GCC commit 271500 (svn), also known as the following commit on the
git mirror:

commit 61edec870f9fdfb5df3fa4e40f28cbaede28a5b1
Author: amodra <amodra@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed May 22 04:34:26 2019 +0000

    [RS6000] Don't pass -many to the assembler

glibc builds are failing when an assembly implementation does not
declare the correct '.machine' directive, or when no such directive is
declared at all.  For example, when a POWER6 instruction is used, but
'.machine power6' is not declared, the assembler will fail with an error
similar to the following:

    ../sysdeps/powerpc/powerpc64/power8/strcmp.S: Assembler messages:
     24 ../sysdeps/powerpc/powerpc64/power8/strcmp.S:55: Error: unrecognized opcode: `cmpb'

This patch adds '.machine powerN' directives where none existed, as well
as it updates '.machine power7' directives on POWER8 files, because the
minimum binutils version required to build glibc (binutils 2.25) now
provides this machine version.  It also adds '-many' to the assembler
command used to build tst-set_ppr.c.

Tested for powerpc, powerpc64, and powerpc64le, as well as with
build-many-glibcs.py for powerpc targets.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2019-05-30 11:10:48 -03:00
Adhemerval Zanella
e47308c98d powerpc: generic nearbyint/nearbyintf
This patches consolidates all the powerpc nearbyint{f} implementations
on the generic sysdeps/powerpc/fpu/s_nearbyint{f}.

	* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add
	NEARBYINT handling.
	* sysdeps/powerpc/fpu/s_nearbyint.c: New file.
	* sysdeps/powerpc/fpu/s_nearbyintf.c: Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_nearbyint.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_nearbyintf.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S: Likewise.
2019-05-28 18:16:48 -03:00
Zack Weinberg
a053e87849
Remove support for PowerPC SPE extension (powerpc*-*-*gnuspe*).
GCC 9 dropped support for the SPE extensions to PowerPC, which means
powerpc*-*-*gnuspe* configurations are no longer buildable with that
compiler.  This ISA extension was peculiar to the “e500” line of
embedded PowerPC chips, which, as far as I can tell, are no longer
being manufactured, so I think we should follow suit.

This patch was developed by grepping for “e500”, “__SPE__”, and
“__NO_FPRS__”, and may not eliminate every vestige of SPE support.
Most uses of __NO_FPRS__ are left alone, as they are relevant to
normal embedded PowerPC with soft-float.

        * sysdeps/powerpc/preconfigure: Error out on powerpc-*-*gnuspe*
        host type.
        * scripts/build-many-glibcs.py: Remove powerpc-*-linux-gnuspe
        and powerpc-*-linux-gnuspe-e500v1 from list of build configurations.

        * sysdeps/powerpc/powerpc32/e500: Recursively delete.
        * sysdeps/unix/sysv/linux/powerpc/powerpc32/e500: Recursively delete.
        * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/context-e500.h:
        Delete.

        * sysdeps/powerpc/fpu_control.h: Remove SPE variant.
        Issue an #error if used with a compiler in SPE-float mode.
        * sysdeps/powerpc/powerpc32/__longjmp_common.S
        * sysdeps/powerpc/powerpc32/setjmp_common.S
        * sysdeps/unix/sysv/linux/powerpc/powerpc32/getcontext-common.S
        * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/getcontext.S
        * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/setcontext.S
        * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/swapcontext.S
        * sysdeps/unix/sysv/linux/powerpc/powerpc32/setcontext-common.S
        * sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S:
        Remove code to preserve SPE register state.

        * sysdeps/unix/sysv/linux/powerpc/elision-lock.c
        * sysdeps/unix/sysv/linux/powerpc/elision-trylock.c
        * sysdeps/unix/sysv/linux/powerpc/elision-unlock.c
        Remove __SPE__ ifndefs.
2019-05-22 10:05:40 -04:00
Adhemerval Zanella
ae45cf84af powerpc: trunc/truncf refactor
This patches consolidates all the powerpc trunc{f} implementations on
the generic sysdeps/powerpc/fpu/s_trunc{f}.  The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.

The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/trunc_to_integer.h (set_fenv_mode): Add
	 TRUNC handling.
	(round_mode): Add definition for TRUNC.
	* sysdeps/powerpc/fpu/s_trunc.c: New file.
	* sysdeps/powerpc/fpu/s_truncf.c: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.S:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.S:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.S:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.S:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.c: New
	file.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.c:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.c:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.c:
	Likewise.
	* sysdep/powerpc/powerpc32/power5+/fpu/s_trunc.S: Remove file.
	* sysdep/powerpc/powerpc32/power5+/fpu/s_truncf.S: Likewise.
	* sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile
	(libm-sysdep_routines): Add s_trunc-power5+, s_trunc-ppc64,
	s_truncf-power5+, and s_truncf-ppc64.
	(CFLAGS-s_trunc-power5+.c, CFLAGS-s_truncf-power5+.c): New rule.
	* sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-power5+.c: New
	file.
	* sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc.c: ... here.
	* sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-power5+.c: New
	file.
	* sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-ppc64.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
	(libm-sysdep_routines): Remove s_trunc-power5+, s_trunc-ppc64,
	s_truncf-power5+, and s_truncf-ppc64.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-power5+.S: Remove
	file.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-ppc64.S: Likewise.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-power5+.S:
	Likewise.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise.
	* sysdep/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise.
	* sysdep/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
2019-05-09 09:39:28 -03:00
Adhemerval Zanella
a1cb1888b7 powerpc: round/roundf refactor
This patches consolidates all the powerpc round{f} implementations on
the generic sysdeps/powerpc/fpu/s_round{f}.  The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.

The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add
	ROUND handling.
	(round_mode): Add definition for ROUND.
	(round_to_integer_float): Likewise.
	* sysdeps/powerpc/fpu/s_round.c: New file.
	* sysdeps/powerpc/fpu/s_roundf.c: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_round.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.S:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.S:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.S:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.S:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.c: New
	file.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.c:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.c:
	Likewise.
	* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.c:
	Likewise.
	* sysdep/powerpc/powerpc32/power5+/fpu/s_round.S: Remove file.
	* sysdep/powerpc/powerpc32/power5+/fpu/s_roundf.S: Likewise.
	* sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile
	(libm-sysdep_routines): Add s_round-power5+, s_round-ppc64,
	s_roundf-power5+, and s_roundf-ppc64.
	(CFLAGS-s_round-power5+.c, CFLAGS-s_roundf-power5+.c): New rule.
	* sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-power5+.c: New
	file.
	* sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round.c: ... here.
	* sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-power5+.c: New
	file.
	* sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-ppc64.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
	(libm-sysdep_routines): Remove s_round-power5+, s_round-ppc64,
	s_roundf-power5+, and s_roundf-ppc64.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_round-power5+.S: Remove
	file.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_round-ppc64.S: Likewise.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-power5+.S:
	Likewise.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise.
	* sysdep/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise.
	* sysdep/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
2019-05-09 09:39:07 -03:00
Adhemerval Zanella
252296c625 powerpc: floor/floorf refactor
This patches consolidates all the powerpc floor{f} implementations on
the generic sysdeps/powerpc/fpu/s_floor{f}.  The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.

The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode):
	Add FLOOR option.
	(round_mode): Add definition for FLOOR.
	* sysdeps/powerpc/fpu/s_floor.c: New file.
	* sysdeps/powerpc/fpu/s_floorf.c: Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_floor.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.S:
	Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.S:
	Likewise
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.c:
	New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: Remove file.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: Remove file.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
	(libm-sysdep_routines): Add s_floor-power5+, s_floor-ppc64,
	s_floorf-power5+, and s_floorf-ppc64.
	(CFLAGS-s_floor-power5+.c, CFLAGS-s_floorf-power5+.c): New rule.
	* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-power5+.c: New
	file.
	* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor.c: ... here.
	* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-power5+.c: New
	file.
	* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-ppc64.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf.c: ... here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
	(libm-sysdep_routines): Remove s_floor-power5+, s_floor-ppc64,
	s_floorf-power5+, and s_floorf-ppc64.
	* sysdep/powerpc/powerpc64/fpu/multiarch/s_floor-power5+.S: Remove
	file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-ppc64.S: Remove
	file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-power5+.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-ppc64.S:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise.
	* sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
2019-05-09 09:38:40 -03:00
Adhemerval Zanella
6cac323c8d powerpc: ceil/ceilf refactor
This patches consolidates all the powerpc ceil{f} implementations on
the generic sysdeps/powerpc/fpu/s_ceil{f}.  The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the frip
instruction) or a generic implementation which uses FP only operations.

It adds a generic implementation (round_to_integer.h) which is shared
with other rounding to integer routines.  The resulting code should be
similar in term os performance to previous assembly one.

The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline_nocheck): New
	function.
	* sysdeps/powerpc/fpu/round_to_integer.h: New file.
	* sysdeps/powerpc/fpu/s_ceil.c: Likewise.
	* sysdeps/powerpc/fpu/s_ceilf.c: Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Remove file.
	* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
	(CFLAGS-s_ceil-power5+.c, CFLAGS-s_ceilf-power5+.c): New rule.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.S:
	Remove file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.S:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.c:
	New file.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: Remove file.
	* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile: New file.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-power5+.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-ppc64.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil.c: ... here.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-power5+.c: New
	file.
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-ppc64.c:
	Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Move to ...
	* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf.c: ...
	* here.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
	(libm-sysdep_routines): Remove s_ceil-power5+, s_ceil-ppc64,
	s_ceilf-power5+, and s_ceilf-ppc64.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-power5+.S: Remove
	file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-power5+.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-ppc64.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise.
	* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise.
	* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
2019-04-29 08:43:37 -03:00
Adhemerval Zanella
c4c0848bbb powerpc: Remove power4 mpa optimization
This patch removes the POWER4 optimized mpa optimization used currently
on all powerpc targets.  In fact for newer chips, GCC generates *worse*
code than generic implementation as below.  One possibilty would to
add ifunc variants for the mpa routines (as x86_64), but it will add
complexity only for older chips (and one would need to check if
power5, power5+, and power6 do benefict from this optimization),
and only for specific implementation (since most used one such
as sin, cos, exp, pow where optimized to avoid calling the slow
multiprecision path).

* POWER9 patched
$ ./testrun.sh benchtests/bench-atan
  "atan": {
   "": {
    "duration": 5.12565e+09,
    "iterations": 1.552e+08,
    "max": 100.552,
    "min": 7.799,
    "mean": 33.0261
   },
   "144bits": {
    "duration": 5.12745e+09,
    "iterations": 825000,
    "max": 7517.17,
    "min": 6186.3,
    "mean": 6215.09
   }
  }
$ ./testrun.sh benchtests/bench-acos
  "acos": {
   "": {
    "duration": 5.21741e+09,
    "iterations": 1.269e+08,
    "max": 191.738,
    "min": 7.931,
    "mean": 41.1144
   },
   "slow": {
    "duration": 5.25999e+09,
    "iterations": 198000,
    "max": 26681.7,
    "min": 26463.6,
    "mean": 26565.6
   }
  }

* POWER9 master
$ ./testrun.sh benchtests/bench-atan
  "atan": {
   "": {
    "duration": 5.12815e+09,
    "iterations": 1.552e+08,
    "max": 134.788,
    "min": 7.803,
    "mean": 33.0422
   },
   "144bits": {
    "duration": 5.1209e+09,
    "iterations": 447000,
    "max": 11615.8,
    "min": 11301.8,
    "mean": 11456.2
   }
  }
$ ./testrun.sh benchtests/bench-acos
  "acos": {
   "": {
    "duration": 5.22272e+09,
    "iterations": 1.269e+08,
    "max": 115.981,
    "min": 7.931,
    "mean": 41.1562
   },
   "slow": {
    "duration": 5.28723e+09,
    "iterations": 96000,
    "max": 55434.1,
    "min": 54820.6,
    "mean": 55075.3
   }
  }

* POWER8 patched
$ taskset -c 16 ./testrun.sh benchtests/bench-acos
  "acos": {
   "": {
    "duration": 5.16398e+09,
    "iterations": 9.99e+07,
    "max": 174.408,
    "min": 8.645,
    "mean": 51.6915
   },
   "slow": {
    "duration": 5.16982e+09,
    "iterations": 96000,
    "max": 54830.5,
    "min": 53703.8,
    "mean": 53852.3
   }
  }
* POWER8 master
$ taskset -c 16 ./testrun.sh benchtests/bench-acos
  "acos": {
   "": {
    "duration": 5.17019e+09,
    "iterations": 9.99e+07,
    "max": 186.127,
    "min": 8.633,
    "mean": 51.7537
   },
   "slow": {
    "duration": 5.34225e+09,
    "iterations": 90000,
    "max": 60353.2,
    "min": 59155.3,
    "mean": 59358.4
   }
  }

* POWER7 patched
$ taskset -c 16 benchtests/bench-asin
  "asin": {
   "": {
    "duration": 5.15559e+09,
    "iterations": 6.5e+07,
    "max": 193.335,
    "min": 12.227,
    "mean": 79.3168
   },
   "slow": {
    "duration": 5.20538e+09,
    "iterations": 80000,
    "max": 65705.2,
    "min": 64299.4,
    "mean": 65067.3
   }
  }
* POWER7 master
$ taskset -c 16 benchtests/bench-asin
  "asin": {
   "": {
    "duration": 5.15446e+09,
    "iterations": 6.5e+07,
    "max": 184.575,
    "min": 12.226,
    "mean": 79.2994
   },
   "slow": {
    "duration": 5.20616e+09,
    "iterations": 80000,
    "max": 65705.1,
    "min": 64336.6,
    "mean": 65076.9
   }
  }

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/power4/fpu/Makefile: Remove file.
	* sysdeps/powerpc/power4/fpu/mpa-arch.h: Likewise.
	* sysdeps/powerpc/power4/fpu/mpa.c: Likewise.

Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
2019-04-29 08:43:03 -03:00
Adhemerval Zanella
52faba65f8 powerpc: Fix format issue from 3a16dd780e
* sysdeps/powerpc/fpu/s_fma.c: Fix format.
	* sysdeps/powerpc/fpu/s_fmaf.c: Likewise.
2019-04-17 18:32:01 -03:00
Adhemerval Zanella
3a16dd780e powerpc: fma using builtins
This patch just refactor the assembly implementation to use compiler
builtins instead.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/s_fma.S: Remove file.
	* sysdeps/powerpc/fpu/s_fmaf.S: Likewise.
	* sysdeps/powerpc/fpu/s_fma.c: New file.
	* sysdeps/powerpc/fpu/s_fmaf.c: Likewise.
2019-04-17 15:14:53 -03:00
Adhemerval Zanella
1dac8bd6f2 powerpc: Use generic fabs{f} implementations
Since be2e25bbd7 the generic ieee754 implementation uses
compiler builtin which generates fabs{f} for all supported targets.

Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).

	* sysdeps/powerpc/fpu/s_fabs.S: Remove file.
	* sysdeps/powerpc/fpu/s_fabsf.S: Likewise.
2019-04-17 15:14:47 -03:00
Adhemerval Zanella
f82ed45d7f powerpc: Use generic wcsrchr optimization
This patch removes the power6 wcsrchr optimization and uses generic
implementation instead.  Currently, both power6 and power7 IFUNC variant
resulting binary are essentially the same and the generic implementation
with unrolling loop set to 8 also results in similar performance.

Checked on powerpc64-linux-gnu.

	* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcsrchr.c):
	New rule.
	* sysdeps/powerpc/power6/wcsrchr.c: Remove file.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power6.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-power7.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcsrchr.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power6.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-power7.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcsrchr-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcsrchr.c: Likewise.
	* sysdeps/powerpc/powerpc64/power6/wcsrchr.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
	[$(subdir) == wcsmbs] (sysdeps_routines): Remove wcsrchr-power6 and
	wcsrchr-power7.
	(CFLAGS-wcsrchr-power7.c, CFLAGS-wcsrchr-power6.c): Remove rule.
	* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c:
	Remove wcsrchr optimizations.
	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
2019-04-04 16:01:14 +07:00
Adhemerval Zanella
421e3005ca powerpc: Use generic wcschr optimization
This patch removes the power6 wcschr optimization and uses generic
implementation instead.  Currently, both power6 and power7 IFUNC variant
resulting binary are essentially the same and the generic implementation
with unrolling loop set to 8 also results in similar performance.

Checked on powerpc64-linux-gnu.

	* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcschr.c):
	New rule.
	* sysdeps/powerpc/power6/wcschr.c: Remove file.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power6.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-power7.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcschr-power6.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcschr-power7.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcschr-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise.
	* sysdeps/powerpc/powerpc64/power6/wcschr.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
	[$(subdir) == wcsmbs] (sysdeps_routines): Remove wcschr-power6 and
	wcschr-power7.
	(CFLAGS-wcschr-power7.c, CFLAGS-wcschr-power6.c): Remove rule.
	* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c:
	Remove wcschr optimizations.
	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
2019-04-04 16:01:14 +07:00
Adhemerval Zanella
447a1306c3 powerpc: Use generic wcscpy optimization
This patch removes the power6 wcscpy optimization and uses generic
implementation instead.  Currently, both power6 and power7 IFUNC variant
resulting binary are essentially the same and the generic implementation
with unrolling loop set to 8 also results in similar performance.

Checked on powerpc64-linux-gnu.

	* sysdeps/powerpc/Makefile [$(subdir) == wcsmbs] (CFLAGS-wcscpy.c):
	New rule.
	* sysdeps/powerpc/power6/wcscpy.c: Remove file.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power6.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-power7.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c:
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy-power6.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy-power7.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy-ppc64.c: Likewise.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c: Likewise.
	* sysdeps/powerpc/powerpc64/power6/wcscpy.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/Makefile
	[$(subdir) == wcsmbs] (sysdeps_routines): Remove wcscpy-power6 and
	wcscpy-power7.
	(CFLAGS-wcscpy-power7.c, CFLAGS-wcscpy-power6.c): Remove rule.
	* sysdeps/powerpc/powerpc64/multiarch/Makefile: Likewise.
	* sysdeps/powerpc/powerpc32/power4/multiarch/ifunc-impl-list.c:
	Remove wcscpy optimizations.
	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
2019-04-04 16:01:14 +07:00
Paul A. Clarke
10cce66930 [powerpc] Use __builtin_{mffs,mtfsf}
Replace inline asm uses of the "mffs" and "mtfsf" instructions with
the analogous GCC builtins.

__builtin_mffs and __builtin_mtfsf are both available in GCC 5 and above.
Given the minimum GCC level for GLibC is now GCC 6.2, it is safe to use
these builtins without restriction.

2019-03-29  Paul A. Clarke  <pc@us.ibm.com>

	* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_register): Replace inline
	asm with builtin.
	* sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (FP_INIT_ROUNDMODE):
	Likewise.
	* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (_GET_DI_FPSCR): Likewise.
	(_GET_SI_FPSCR): Likewise.
	(_SET_SI_FPSCR): Likewise.
2019-03-29 19:16:34 -05:00
Adhemerval Zanella
019638910e powerpc: Remove ununsed s_float_bitwise.h
This file is not used anywhere since removal of {k,e}_rem_pio2f.c
(commit ca3aac57ef).

Checked with a build for powerpc-linux-gnu (with --with-cpu=power4
and --with-cpu=power7), powerpc64-linux-gnu (with --with-cpu=power4
and --with-cpu=power7), and powerpc64le-linux (with --with-cpu=power8).

	* sysdeps/powerpc/fpu/s_float_bitwise.h: Remove file.
2019-03-25 15:34:50 -03:00
Adhemerval Zanella
1e372ded4f Refactor hp-timing rtld usage
This patch refactor how hp-timing is used on loader code for statistics
report.  The HP_TIMING_AVAIL and HP_SMALL_TIMING_AVAIL are removed and
HP_TIMING_INLINE is used instead to check for hp-timing avaliability.
For alpha, which only defines HP_SMALL_TIMING_AVAIL, the HP_TIMING_INLINE
is set iff for IS_IN(rtld).

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. I also
checked the builds for all afected ABIs.

	* benchtests/bench-timing.h: Replace HP_TIMING_AVAIL with
	HP_TIMING_INLINE.
	* nptl/descr.h: Likewise.
	* elf/rtld.c (RLTD_TIMING_DECLARE, RTLD_TIMING_NOW, RTLD_TIMING_DIFF,
	RTLD_TIMING_ACCUM_NT, RTLD_TIMING_SET): Define.
	(dl_start_final_info, _dl_start_final, dl_main, print_statistics):
	Abstract hp-timing usage with RTLD_* macros.
	* sysdeps/alpha/hp-timing.h (HP_TIMING_INLINE): Define iff IS_IN(rtld).
	(HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Remove.
	* sysdeps/generic/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL,
	HP_TIMING_NONAVAIL): Likewise.
	* sysdeps/ia64/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL):
	Likewise.
	* sysdeps/powerpc/powerpc32/power4/hp-timing.h (HP_TIMING_AVAIL,
	HP_SMALL_TIMING_AVAIL): Likewise.
	* sysdeps/powerpc/powerpc64/hp-timing.h (HP_TIMING_AVAIL,
	HP_SMALL_TIMING_AVAIL): Likewise.
	* sysdeps/sparc/sparc32/sparcv9/hp-timing.h (HP_TIMING_AVAIL,
	HP_SMALL_TIMING_AVAIL): Likewise.
	* sysdeps/sparc/sparc64/hp-timing.h (HP_TIMING_AVAIL,
	HP_SMALL_TIMING_AVAIL): Likewise.
	* sysdeps/x86/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL):
	Likewise.
	* sysdeps/generic/hp-timing-common.h: Update comment with
	HP_TIMING_AVAIL removal.
2019-03-22 17:30:44 -03:00
Stefan Liebler
7c6513082b Fix output of LD_SHOW_AUXV=1.
Starting with commit 1616d034b6
the output was corrupted on some platforms as _dl_procinfo
was called for every auxv entry and on some architectures like s390
all entries were represented as "AT_HWCAP".

This patch is removing the condition and let _dl_procinfo decide if
an entry is printed in a platform specific or generic way.
This patch also adjusts all _dl_procinfo implementations which assumed
that they are only called for AT_HWCAP or AT_HWCAP2. They are now just
returning a non-zero-value for entries which are not handled platform
specifc.

ChangeLog:

	* elf/dl-sysdep.c (_dl_show_auxv): Remove condition and always
	call _dl_procinfo.
	* sysdeps/unix/sysv/linux/s390/dl-procinfo.h (_dl_procinfo):
	Ignore types other than AT_HWCAP.
	* sysdeps/sparc/dl-procinfo.h (_dl_procinfo): Likewise.
	* sysdeps/unix/sysv/linux/i386/dl-procinfo.h (_dl_procinfo):
	Likewise.
	* sysdeps/powerpc/dl-procinfo.h (_dl_procinfo): Adjust comment
	in the case of falling back to generic output mechanism.
	* sysdeps/unix/sysv/linux/arm/dl-procinfo.h (_dl_procinfo):
	Likewise.
2019-03-13 10:45:35 +01:00
Joseph Myers
c5f65462a2 Break lines before not after operators, batch 4.
This patch fixes further coding style issues where code should have
broken lines before operators in accordance with the GNU Coding
Standards but instead was breaking lines after them.

Tested for x86_64, and with build-many-glibcs.py.

	* stdio-common/vfscanf-internal.c (ARG): Break lines before rather
	than after operators.
	* sysdeps/mach/hurd/setitimer.c (timer_thread): Likewise.
	(setitimer_locked): Likewise.
	* sysdeps/mach/hurd/sigaction.c (__sigaction): Likewise.
	* sysdeps/mach/hurd/sigaltstack.c (__sigaltstack): Likewise.
	* sysdeps/mach/pagecopy.h (PAGE_COPY_FWD): Likewise.
	* sysdeps/mach/thread_state.h (machine_get_basic_state): Likewise.
	* sysdeps/powerpc/powerpc64/tst-ucontext-ppc64-vscr.c
	(PPC_CPU_SUPPORTED): Likewise.
	* sysdeps/unix/sysv/linux/alpha/a.out.h (N_TXTOFF): Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/overflow.h
	(stat_overflow): Likewise.
	(statfs_overflow): Likewise.
	* sysdeps/unix/sysv/linux/tst-personality.c (do_test): Likewise.
	* sysdeps/unix/sysv/linux/tst-ttyname.c (eq_ttyname): Likewise.
	(eq_ttyname_r): Likewise.
	(run_chroot_tests): Likewise.
2019-03-07 20:20:25 +00:00
Gabriel F. T. Gomes
590675c079 powerpc: Fix build of wcscpy with --disable-multi-arch
Since the commit

commit 81a1443941
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Feb 5 17:35:12 2019 -0200

    wcsmbs: optimize wcscat

powerpc64 and powerpc64le builds fail when configured with
--disable-multi-arch and --with-cpu=power6 (or newer), due to an
undefined reference to __GI___wcscpy.  This patch fixes this on
sysdeps/powerpc/powerpc64/power6/wcscpy.c, which is only used when
multi-arch is disabled.

This patch does nothing for the failures on 32-bits powerpc builds,
because the file is under the powerpc64 subdirectory, however, powerpc
builds were already failing with --disable-multi-arch, with multiple
error messages, even before the aforementioned commit.

Tested for powerpc, powerpc64, and powerpc64le with multi-arch enabled
(all pass) and disabled (powerpc still fails as explained above).
2019-03-05 11:33:19 -03:00