We turn off this feature to avoid polluting our shared libary with
a specific value. However, static libgcc is not under our control,
and has enabled this for ibm128 routines. This pollutes the
resulting shared libraries with it.
Attach a post-linking hook to replace this section with one crafted
as hard-float + indeterminate ldbl. This allows IEEE ldbl users to
avoid having to disable the gnu attributes feature which should
protect them from linking ibm ldbl libraries using the gnu attributes
feature.
Currently, this only replaces libc and libm which support both ldbl
formats and rely on application code to explicitly determine which
is to be used.
Strictly speaking, the section could be deleted with minimal lost value.
However correctly set attributes could prove useful for some future change,
and similarly missing attributes.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
-mabi=ieeelongdouble triggers the stdc++ libraries _Float128
support, which then breaks if algorithm is included. For now,
explicitly disable _Float128 for such tests.
I have opened up GCC BZ 94080 to track this.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
I have observed a bug on 7.4.0 whereby __mulkc3 calls are
swapped with __multc3 depending on ABI selection. For the
sake of being overly cautious, build all _Float128 files
with ibm128 to workaround these compilers. This has been
noted in GCC BZ 84914, and will not be fixed for GCC 7.
Likewise, non-math files built with _Float128 are assumed
to have ibm long double. Explicilty preserve this
assumption.
Finally, add some bootstrapping code to avoid applying
these options until IEEE long double is enabled as they
require GCC 7 and above.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Adds a POWER9 version of fmaf128 that uses the xsmaddqp
instruction.
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Similar to string2.h (18b10de7ce) and string3.h (09a596cc2c) this
patch removes the fenvinline.h on all architectures. Currently
only powerpc implements some optimizations. This kind of optimization
is better implemented by the compiler (which handles the architecture
ISA transparently).
Also, for the specific optimized powerpc implementation the code is
becoming convoluted and these micro-optimization are hardly wildly
used, even more being a possible hotspot in realword cases
(non-default rounding are used only on specific cases and exception
handling are done most likely only on errors path). Only x86
implements similar optimization (on fenv.h) also indicates that
these should no be on libc.
The math/test-fenv already covers all math/test-fenvinline tests,
so it is safe to remove it.
The powerpc fegetround optimization is moved to internal
fenv_libc.h.
The BZ#94193 [1] the corresponding GCC bug for adding replacements
for these on powerpc.
Checked on x86_64-linux-gnu and powerpc64le-linux-gnu.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94193
Some of these files depend on the avoidance of using the various
register sets of POWER. When enabling the IEEE 128 long double,
we must be sure to disable this ABI as some compilers will
refuse to compile if -mno-vsx and -mabi=ieeelongdouble are both
present.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
In practice, this flag should be applied globally, but it makes a good
sanity check to ensure ibm128 and ieee128 long double files are not
getting mismatched. _Float128 files use no long double, thus are
always safe to use this option.
Similarly, when investigating the linker complaints, difftime
makes trivial, self contained, usage of long double, so thus it
is also explicitly marked as such.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This better resembles the default linking process with the gnulibs,
and also resolves the increasingly difficult to maintain
f128-loader-link usage on powerpc64le as some libgcc symbols are
dependent on those found in the loader (ld).
Ensure the correct ldouble abi flags are applied to ibm128 files and
nldbl files. Remove the IEEE options if used, and apply the flags
used to build ldouble files which are ibm128 abi.
nldbl tests are a little tricky. To use the support, we must remove
all ldouble abi flags, and ensure -mlong-double-64 is used.
Co-authored-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Co-authored-by: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
With mathinline removal there is no need to keep building and testing
inline math tests.
The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries. The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.
Checked on x86_64-linux-gnu and i686-linux-gnu.
A recent change to fenvinline.h modified the check if __e is a
a power of 2 inside feraiseexcept and feclearexcept macros. It
introduced the use of the powerof2 macro but also removed the
if statement checking whether __e != 0 before issuing an mtfsb*
instruction. This is problematic because powerof2 (0) evaluates
to 1 and without the removed if __e is allowed to be 0 when
__builtin_clz is called. In that case the value 32 is passed
to __MTFSB*, which is invalid.
This commit uses __builtin_popcount instead of powerof2 to fix this
issue and avoid the extra check for __e != 0. This was the approach
used by the initial versions of that previous patch.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This supersedes the init_array sysdeps directory. It allows us to
check for ELF_INITFINI in both C and assembler code, and skip DT_INIT
and DT_FINI processing completely on newer architectures.
A new header file is needed because <dl-machine.h> is incompatible
with assembler code. <sysdep.h> is compatible with assembler code,
but it cannot be included in all assembler files because on some
architectures, it redefines register names, and some assembler files
conflict with that.
<elf-initfini.h> is replicated for legacy architectures which need
DT_INIT/DT_FINI support. New architectures follow the generic default
and disable it.
With all Linux ABIs using the expected Linux kABI to indicate
syscalls errors, the INTERNAL_SYSCALL_DECL is an empty declaration
on all ports.
This patch removes the 'err' argument on INTERNAL_SYSCALL* macro
and remove the INTERNAL_SYSCALL_DECL usage.
Checked with a build against all affected ABIs.
When unwinding through a signal frame the backtrace function on PowerPC
didn't check array bounds when storing the frame address. Fixes commit
d400dcac5e ("PowerPC: fix backtrace to handle signal trampolines").
GCC 10.0 enabled -fno-common by default and this started to point that
__cache_line_size had been implemented in 2 different places: loader and
libc.
In order to avoid this duplication, the libc variable has been removed
and the loader variable is moved to rtld_global_ro.
File sysdeps/unix/sysv/linux/powerpc/dl-auxv.h has been added in order
to reuse code for both static and dynamic linking scenarios.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch moves the vDSO setup from libc to loader code, just after
the vDSO link_map setup. For static case the initialization
is moved to _dl_non_dynamic_init instead.
Instead of using the mangled pointer, the vDSO data is set as
attribute_relro (on _rtld_global_ro for shared or _dl_vdso_* for
static). It is read-only even with partial relro.
It fixes BZ#24967 now that the vDSO pointer is setup earlier than
malloc interposition is called.
Also, vDSO calls should not be a problem for static dlopen as
indicated by BZ#20802. The vDSO pointer would be zero-initialized
and the syscall will be issued instead.
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
arm-linux-gnueabihf, powerpc64le-linux-gnu, powerpc64-linux-gnu,
powerpc-linux-gnu, s390x-linux-gnu, sparc64-linux-gnu, and
sparcv9-linux-gnu. I also run some tests on mips.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol. It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).
The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.
Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).
Passes buildmanyglibc.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This patch adds the missing bits for powerpc and fixes both
tst-ifunc-fault-lazy and tst-ifunc-fault-bindnow failures on
powerpc-linux-gnu.
Checked on powerpc-linux-gnu and powerpc-linux-gnu-power4.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Since commit a3cc4f48e9 ("Remove
--as-needed configure test."), --as-needed support is no longer
optional.
The macros are not much shorter and do not provide documentary
value, either, so this commit removes them.
This patch adds a default pthreadtypes-arch.h, the idea is to simpify
new ports inclusion and an override is required only if the architecture
adds some arch-specific extensions or requirement.
The default values on the new generic header are based on current
architecture define value and they are not optimal compared to current
code requirements as below.
- On 64 bits __SIZEOF_PTHREAD_BARRIER_T is defined as 32 while is
sizeof (struct pthread_barrier) is 20 bytes.
- On 32 bits __SIZEOF_PTHREAD_ATTR_T is defined as 36 while
sizeof (struct pthread_attr) is 32.
The default values are not changed so the generic header could be
used by some architectures.
Checked with a build on affected abis.
Change-Id: Ie0cd586258a2650f715c1af0c9fe4e7063b0409a
This patch adds a new generic __pthread_rwlock_arch_t definition meant
to be used by new ports. Its layout mimics the current usage on some
64 bits ports and it allows some ports to use the generic definition.
The arch __pthread_rwlock_arch_t definition is moved from
pthreadtypes-arch.h to another arch-specific header (struct_rwlock.h).
Also the static intialization macro for pthread_rwlock_t is set to use
an arch defined on (__PTHREAD_RWLOCK_INITIALIZER) which simplifies its
implementation.
The default pthread_rwlock_t layout differs from current ports with:
1. Internal layout is the same for 32 bits and 64 bits.
2. Internal flag is an unsigned short so it should not required
additional padding to align for word boundary (if it is the case
for the ABI).
Checked with a build on affected abis.
Change-Id: I776a6a986c23199929d28a3dcd30272db21cd1d0
The current way of defining the common mutex definition for POSIX and
C11 on pthreadtypes-arch.h (added by commit 06be6368da) is
not really the best options for newer ports. It requires define some
misleading flags that should be always defined as 0
(__PTHREAD_COMPAT_PADDING_MID and __PTHREAD_COMPAT_PADDING_END), it
exposes options used solely for linuxthreads compat mode
(__PTHREAD_MUTEX_USE_UNION and __PTHREAD_MUTEX_NUSERS_AFTER_KIND), and
requires newer ports to explicit define them (adding more boilerplate
code).
This patch adds a new default __pthread_mutex_s definition meant to
be used by newer ports. Its layout mimics the current usage on both
32 and 64 bits ports and it allows most ports to use the generic
definition. Only ports that use some arch-specific definition (such
as hardware lock-elision or linuxthreads compat) requires specific
headers.
For 32 bit, the generic definitions mimic the other 32-bit ports
of using an union to define the fields uses on adaptive and robust
mutexes (thus not allowing both usage at same time) and by using a
single linked-list for robust mutexes. Both decisions seemed to
follow what recent ports have done and make the resulting
pthread_mutex_t/mtx_t object smaller.
Also the static intialization macro for pthread_mutex_t is set to use
a macro __PTHREAD_MUTEX_INITIALIZER where the architecture can redefine
in its struct_mutex.h if it requires additional fields to be
initialized.
Checked with a build on affected abis.
Change-Id: I30a22c3e3497805fd6e52994c5925897cffcfe13
The new rwlock implementation added by cc25c8b4c1 (2.25) removed
support for lock-elision. This patch removes remaining the
arch-specific unused definitions.
Checked with a build against all affected ABIs.
Change-Id: I5dec8af50e3cd56d7351c52ceff4aa3771b53cd6
This patch new build tests to check for internal fields offsets for
internal pthread_rwlock_t definition. Althoug the '__data.__flags'
field layout should be preserved due static initializators, the patch
also adds tests for the futexes that may be used in a shared memory
(although using different libc version in such scenario is not really
supported).
Checked with a build against all affected ABIs.
Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148
The offsets of pthread_mutex_t __data.__nusers, __data.__spins,
__data.elision, __data.list are not required to be constant over
the releases. Only the __data.__kind is used for static
initializers.
This patch also adds an additional size check for __data.__kind.
Checked with a build against affected ABIs.
Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f
Since at least POWER8, there is no performance advantage to entering
"Ignore Exceptions Mode", and doing so conditionally requires
- the conditional logic, and
- a system call.
Make it a no-op for uses within glibc.
With only two exceptions (sys/types.h and sys/param.h, both of which
historically might have defined BYTE_ORDER) the public headers that
include <endian.h> only want to be able to test __BYTE_ORDER against
__*_ENDIAN.
This patch creates a new bits/endian.h that can be included by any
header that wants to be able to test __BYTE_ORDER and/or
__FLOAT_WORD_ORDER against the __*_ENDIAN constants, or needs
__LONG_LONG_PAIR. It only defines macros in the implementation
namespace.
The existing bits/endian.h (which could not be included independently
of endian.h, and only defines __BYTE_ORDER and maybe __FLOAT_WORD_ORDER)
is renamed to bits/endianness.h. I also took the opportunity to
canonicalize the form of this header, which we are stuck with having
one copy of per architecture. Since they are so short, this means git
doesn’t understand that they were renamed from existing headers, sigh.
endian.h itself is a nonstandard header and its only remaining use
from a standard header is guarded by __USE_MISC, so I dropped the
__USE_MISC conditionals from around all of the public-namespace things
it defines. (This means, an application that requests strict library
conformance but includes endian.h will still see the definition of
BYTE_ORDER.)
A few changes to specific bits/endian(ness).h variants deserve
mention:
- sysdeps/unix/sysv/linux/ia64/bits/endian.h is moved to
sysdeps/ia64/bits/endianness.h. If I remember correctly, ia64 did
have selectable endianness, but we have assembly code in
sysdeps/ia64 that assumes it’s little-endian, so there is no reason
to treat the ia64 endianness.h as linux-specific.
- The C-SKY port does not fully support big-endian mode, the compile
will error out if __CSKYBE__ is defined.
- The PowerPC port had extra logic in its bits/endian.h to detect a
broken compiler, which strikes me as unnecessary, so I removed it.
- The only files that defined __FLOAT_WORD_ORDER always defined it to
the same value as __BYTE_ORDER, so I removed those definitions.
The SH bits/endian(ness).h had comments inconsistent with the
actual setting of __FLOAT_WORD_ORDER, which I also removed.
- I *removed* copyright boilerplate from the few bits/endian(ness).h
headers that had it; these files record a single fact in a fashion
dictated by an external spec, so I do not think they are copyrightable.
As long as I was changing every copy of ieee754.h in the tree, I
noticed that only the MIPS variant includes float.h, because it uses
LDBL_MANT_DIG to decide among three different versions of
ieee854_long_double. This patch makes it not include float.h when
GCC’s intrinsic __LDBL_MANT_DIG__ is available.
* string/endian.h: Unconditionally define LITTLE_ENDIAN,
BIG_ENDIAN, PDP_ENDIAN, and BYTE_ORDER. Condition byteswapping
macros only on !__ASSEMBLER__. Move the definitions of
__BIG_ENDIAN, __LITTLE_ENDIAN, __PDP_ENDIAN, __FLOAT_WORD_ORDER,
and __LONG_LONG_PAIR to...
* string/bits/endian.h: ...this new file, which includes
the renamed header bits/endianness.h for the definition of
__BYTE_ORDER and possibly __FLOAT_WORD_ORDER.
* string/Makefile: Install bits/endianness.h.
* include/bits/endian.h: New wrapper.
* bits/endian.h: Rename to bits/endianness.h.
Add multiple-include guard. Rewrite the comment explaining what
the machine-specific variants of this file should do.
* sysdeps/unix/sysv/linux/ia64/bits/endian.h:
Move to sysdeps/ia64.
* sysdeps/aarch64/bits/endian.h
* sysdeps/alpha/bits/endian.h
* sysdeps/arm/bits/endian.h
* sysdeps/csky/bits/endian.h
* sysdeps/hppa/bits/endian.h
* sysdeps/ia64/bits/endian.h
* sysdeps/m68k/bits/endian.h
* sysdeps/microblaze/bits/endian.h
* sysdeps/mips/bits/endian.h
* sysdeps/nios2/bits/endian.h
* sysdeps/powerpc/bits/endian.h
* sysdeps/riscv/bits/endian.h
* sysdeps/s390/bits/endian.h
* sysdeps/sh/bits/endian.h
* sysdeps/sparc/bits/endian.h
* sysdeps/x86/bits/endian.h:
Rename to endianness.h; canonicalize form of file; remove
redundant definitions of __FLOAT_WORD_ORDER.
* sysdeps/powerpc/bits/endianness.h: Remove logic to check for
broken compilers.
* ctype/ctype.h
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
* sysdeps/csky/nptl/bits/pthreadtypes-arch.h
* sysdeps/ia64/ieee754.h
* sysdeps/ieee754/ieee754.h
* sysdeps/ieee754/ldbl-128/ieee754.h
* sysdeps/ieee754/ldbl-128ibm/ieee754.h
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
* sysdeps/mips/ieee754/ieee754.h
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
* sysdeps/nptl/pthread.h
* sysdeps/riscv/nptl/bits/pthreadtypes-arch.h
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
* sysdeps/sparc/sparc32/ieee754.h
* sysdeps/unix/sysv/linux/generic/bits/stat.h
* sysdeps/unix/sysv/linux/generic/bits/statfs.h
* sysdeps/unix/sysv/linux/sys/acct.h
* wctype/bits/wctype-wchar.h:
Include bits/endian.h, not endian.h.
* sysdeps/unix/sysv/linux/hppa/pthread.h: Don’t include endian.h.
* sysdeps/mips/ieee754/ieee754.h: Use __LDBL_MANT_DIG__
in ifdefs, instead of LDBL_MANT_DIG. Only include float.h
when __LDBL_MANT_DIG__ is not predefined, in which case
define __LDBL_MANT_DIG__ to equal LDBL_MANT_DIG.
fesetenv_mode is used variously to write the FPSCR exception enable
bits and rounding mode bits. These are referred to as the control
bits in the POWER ISA. Change the name to be reflective of its
current and expected use, and match up well with fegetenv_control.
libc_feholdsetround_noex_ppc_ctx currently performs:
1. Read FPSCR, save to context.
2. Create new FPSCR value: clear enables and set new rounding mode.
3. Write new value to FPSCR.
Since other bits just pass through, there is no need to write them.
Instead, write just the changed values (enables and rounding mode),
which can be a bit more efficient.
fegetenv_status is used variously to retrieve the FPSCR exception enable
bits, rounding mode bits, or both. These are referred to as the control
bits in the POWER ISA. FPSCR status bits are also returned by the
'mffs' and 'mffsl' instructions, but they are uniformly ignored by all
uses of fegetenv_status. Change the name to be reflective of its
current and expected use.
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
On POWER9, use more efficient means to update the 2-bit rounding mode
via the 'mffscrn' instruction (instead of two 'mtfsb0/1' instructions
or one 'mtfsfi' instruction that modifies 4 bits).
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
ROUND_TO_ODD and a couple of other places use libc_feupdateenv_test to
restore the rounding mode and exception enables, preserve exception flags,
and test whether given exception(s) were generated.
If the exception flags haven't changed, then it is sufficient and a bit
more efficient to just restore the rounding mode and enables, rather than
writing the full Floating-Point Status and Control Register (FPSCR).
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
fenv_private.h includes unused functions, magic macro constants, and
some replicated common code fragments.
Remove unused functions, replace magic constants with constants from
fenv_libc.h, and refactor replicated code.
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-By: Paul E Murphy <murphyp@linux.ibm.com>
SET_RESTORE_ROUND brackets a block of code, temporarily setting and
restoring the rounding mode and letting everything else, including
exceptions generated within the block, pass through.
On powerpc, the current code clears the exception enables, which will hide
exceptions generated within the block. This issue was introduced by me
in commit e905212627.
Fix this by not clearing exception enable bits in the prologue.
Also, since we are no longer changing the enable bits in either the
prologue or the epilogue, there is no need to test for entering/exiting
non-stop mode.
Also, optimize the prologue get/save/set rounding mode operations for
POWER9 and later by using 'mffscrn' when possible.
Suggested-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Fixes: e905212627
2019-09-19 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_and_set_rn): New.
(__fe_mffscrn): New.
* sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc_ctx):
Do not clear enable bits, remove obsolete code, use
fegetenv_and_set_rn.
(libc_feresetround_ppc): Remove obsolete code, use
fegetenv_and_set_rn.
Linux vDSO initialization code the internal function pointers require a
lot of duplicated boilerplate over different architectures. This patch
aims to simplify not only the code but the required definition to enable
a vDSO symbol.
The changes are:
1. Consolidate all init-first.c on only one implementation and enable
the symbol based on HAVE_*_VSYSCALL existence.
2. Set the HAVE_*_VSYSCALL to the architecture expected names string.
3. Add a new internal implementation, get_vdso_mangle_symbol, which
returns a mangled function pointer.
Currently the clock_gettime, clock_getres, gettimeofday, getcpu, and time
are handled in an arch-independent way, powerpc still uses some
arch-specific vDSO symbol handled in a specific init-first implementation.
Checked on aarch64-linux-gnu, arm-linux-gnueabihf, i386-linux-gnu,
mips64-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu,
sparc64-linux-gnu, and x86_64-linux-gnu.
* sysdeps/powerpc/powerpc32/backtrace.c (is_sigtramp_address,
is_sigtramp_address_rt): Use HAVE_SIGTRAMP_{RT}32 instead of SHARED.
* sysdeps/powerpc/powerpc64/backtrace.c (is_sigtramp_address):
Likewise.
* sysdeps/unix/sysv/linux/aarch64/init-first.c: Remove file.
* sysdeps/unix/sysv/linux/aarch64/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/arm/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/arm/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/mips/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/mips/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/i386/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/riscv/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/riscv/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/s390/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/s390/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/sparc/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/x86/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/init-first.c: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sysdep.h
(HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Define value based on kernel exported
name.
* sysdeps/unix/sysv/linux/arm/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/mips/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/powerpc/sysdep.h
(HAVE_CLOCK_GETRES_VSYSCALL, HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETCPU_VSYSCALL, HAVE_TIME_VSYSCALL, HAVE_GET_TBFREQ,
HAVE_SIGTRAMP_RT64, HAVE_SIGTRAMP_32, HAVE_SIGTRAMP_RT32i,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/riscv/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL,
HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
HAVE_GETCPU_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/s390/sysdep.h (HAVE_CLOCK_GETRES_VSYSCALL,
HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
HAVE_GETCPU_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/sparc/sysdep.h (HAVE_CLOCK_GETTIME_VSYSCALL,
HAVE_GETTIMEOFDAY_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep.h
(HAVE_CLOCK_GETTIME_VSYSCALL, HAVE_GETTIMEOFDAY_VSYSCALL,
HAVE_GETCPU_VSYSCALL): Likewise.
* sysdeps/unix/sysv/linux/dl-vdso.h (VDSO_NAME, VDSO_HASH): Define to
invalid names if architecture does not define them.
(get_vdso_mangle_symbol): New symbol.
* sysdeps/unix/sysv/linux/init-first.c: New file.
* sysdeps/unix/sysv/linux/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/init-first.c (gettimeofday,
clock_gettime, clock_getres, getcpu, time): Remove declaration.
(__libc_vdso_platform_setup_arch): Likewise and use
get_vdso_mangle_symbol to setup vDSO symbols.
(sigtramp_rt64, sigtramp32, sigtramp_rt32, get_tbfreq): Add
attribute_hidden.
* sysdeps/unix/sysv/linux/powerpc/libc-vdso.h: Likewise.
* sysdeps/unix/sysv/linux/sysdep-vdso.h (VDSO_SYMBOL): Remove
definition.
fegetenv_status() wants to use the lighter weight instruction 'mffsl'
for reading the Floating-Point Status and Control Register (FPSCR).
It currently will use it directly if compiled '-mcpu=power9', and will
perform a runtime check (cpu_supports("arch_3_00")) otherwise.
Nicely, it turns out that the 'mffsl' instruction will decode to
'mffs' on architectures older than "arch_3_00" because the additional
bits set for 'mffsl' are "don't care" for 'mffs'. 'mffs' is a superset
of 'mffsl'.
So, just generate 'mffsl'.
fesetenv() reads the current value of the Floating-Point Status and Control
Register (FPSCR) to determine the difference between the current state of
exception enables and the newly requested state. All of these bits are also
returned by the lighter weight 'mffsl' instruction used by fegetenv_status().
Use that instead.
Also, remove a local macro _FPU_MASK_ALL in favor of a common macro,
FPU_ENABLES_MASK from fenv_libc.h.
Finally, use a local variable ('new') in favor of a pointer dereference
('*envp').
SET_RESTORE_ROUND uses libc_feholdsetround_ppc_ctx and
libc_feresetround_ppc_ctx to bracket a block of code where the floating point
rounding mode must be set to a certain value.
For the *prologue*, libc_feholdsetround_ppc_ctx is used and performs:
1. Read/save FPSCR.
2. Create new value for FPSCR with new rounding mode and enables cleared.
3. If new value is different than current value,
a. If transitioning from a state where some exceptions enabled,
enter "ignore exceptions / non-stop" mode.
b. Write new value to FPSCR.
c. Put a mark on the wall indicating the FPSCR was changed.
(1) uses the 'mffs' instruction. On POWER9, the lighter weight 'mffsl'
instruction can be used, but it doesn't return all of the bits in the FPSCR.
fegetenv_status uses 'mffsl' on POWER9, 'mffs' otherwise, and can thus be
used instead of fegetenv_register.
(3b) uses 'mtfsf 0b11111111' to write the entire FPSCR, so it must
instead use 'mtfsf 0b00000011' to write just the enables and the mode,
because some of the rest of the bits are not valid if 'mffsl' was used.
fesetenv_mode uses 'mtfsf 0b00000011' on POWER9, 'mtfsf 0b11111111'
otherwise.
For the *epilogue*, libc_feresetround_ppc_ctx checks the mark on the wall, then
calls libc_feresetround_ppc, which just calls __libc_femergeenv_ppc with
parameters such that it performs:
1. Retreive saved value of FPSCR, saved in prologue above.
2. Read FPSCR.
3. Create new value of FPSCR where:
- Summary bits and exception indicators = current OR saved.
- Rounding mode and enables = saved.
- Status bits = current.
4. If transitioning from some exceptions enabled to none,
enter "ignore exceptions / non-stop" mode.
5. If transitioning from no exceptions enabled to some,
enter "catch exceptions" mode.
6. Write new value to FPSCR.
The summary bits are hardwired to the exception indicators, so there is no
need to restore any saved summary bits.
The exception indicator bits, which are sticky and remain set unless
explicitly cleared, would only need to be restored if the code block
might explicitly clear any of them. This is certainly not expected.
So, the only bits that need to be restored are the enables and the mode.
If it is the case that only those bits are to be restored, there is no need to
read the FPSCR. Steps (2) and (3) are unnecessary, and step (6) only needs to
write the bits being restored.
We know we are transitioning out of "ignore exceptions" mode, so step (4) is
unnecessary, and in step (6), we only need to check the state we are
entering.
Since fe{en,dis}ableexcept() and fesetmode() read-modify-write just the
"mode" (exception enable and rounding mode) bits of the Floating Point Status
Control Register (FPSCR), the lighter weight 'mffsl' instruction can be used
to read the FPSCR (enables and rounding mode), and 'mtfsf 0b00000011' can be
used to write just those bits back to the FPSCR. The net is better performance.
In addition, fe{en,dis}ableexcept() read the FPSCR again after writing it, or
they determine that it doesn't need to be written because it is not changing.
In either case, the local variable holds the current values of the enable
bits in the FPSCR. This local variable can be used instead of again reading
the FPSCR.
Also, that value of the FPSCR which is read the second time is validated
against the requested enables. Since the write can't fail, this validation
step is unnecessary, and can be removed. Instead, the exceptions to be
enabled (or disabled) are transformed into available bits in the FPSCR,
then validated after being transformed back, to ensure that all requested
bits are actually being set. For example, FE_INVALID_SQRT can be
requested, but cannot actually be set. This bit is not mapped during the
transformations, so a test for that bit being set before and after
transformations will show the bit would not be set, and the function will
return -1 for failure.
Finally, convert the local macros in fesetmode.c to more generally useful
macros in fenv_libc.h.
The exceptions passed to fe{en,dis}ableexcept() are defined in the ABI
as a bitmask, a combination of FE_INVALID, FE_OVERFLOW, etc.
Within the functions, these bits must be translated to/from the corresponding
enable bits in the Floating Point Status Control Register (FPSCR).
This translation is currently done bit-by-bit. The compiler generates
a series of conditional bit operations. Nicely, the "FE" exception
bits are all a uniform offset from the FPSCR enable bits, so the bit-by-bit
operation can instead be performed by a shift with appropriate masking.
Some implementations in sysdeps/powerpc/powerpc64/power8/*.S still had
pre power8 compatible binutils hardcoded macros and were not using
.machine power8.
This patch should not have semantic changes, in fact it should have the
same exact code generated.
Tested that generated stripped shared objects are identical when
using "strip --remove-section=.note.gnu.build-id".
Checked on:
- powerpc64le, power9, build-many-glibcs.py, gcc 6.4.1 20180104, binutils 2.26.2.20160726
- powerpc64le, power8, debian 9, gcc 6.3.0 20170516, binutils 2.28
- powerpc64le, power9, ubuntu 19.04, gcc 8.3.0, binutils 2.32
- powerpc64le, power9, opensuse tumbleweed, gcc 9.1.1 20190527, binutils 2.32
- powerpc64, power9, debian 10, gcc 8.3.0, binutils 2.31.1
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
Using __builtin_cpu_supports() requires support in GCC and Glibc.
My recent patch to fenv_libc.h added an unprotected use of
__builtin_cpu_supports(). Compilation of Glibc itself will fail
with a sufficiently new GCC and sufficiently old Glibc:
../sysdeps/powerpc/fpu/fegetexcept.c: In function ‘__fegetexcept’:
../sysdeps/powerpc/fpu/fenv_libc.h:52:20: error: builtin ‘__builtin_cpu_supports’ needs GLIBC (2.23 and newer) that exports hardware capability bits [-Werror]
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Fixes 3db85a9814.
The power7 logb implementation does not show a performance gain on
ISA 2.07+ chips with faster floating-point to GRP instructions
(currently POWER8 and POWER9).
This patch moves the POWER7 implementation to generic one and enables
it for POWER7. It also add some cleanup to use inline floating-point
number instead of define them using static const.
The performance difference is for POWER9:
- Without patch:
"logb": {
"subnormal": {
"duration": 4.99202e+09,
"iterations": 8.83662e+08,
"max": 75.194,
"min": 5.501,
"mean": 5.64925
},
"normal": {
"duration": 4.97063e+09,
"iterations": 9.97094e+08,
"max": 46.489,
"min": 4.956,
"mean": 4.98512
}
}
- With patch:
"logb": {
"subnormal": {
"duration": 4.97226e+09,
"iterations": 9.92036e+08,
"max": 77.209,
"min": 4.892,
"mean": 5.01218
},
"normal": {
"duration": 4.96192e+09,
"iterations": 1.07545e+09,
"max": 12.361,
"min": 4.593,
"mean": 4.61382
}
}
The ifunc implementation is also enabled only for powerpc64.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/power7/fpu/s_logb.c: Move to ...
* sysdeps/powerpc/fpu/s_logb.c: ... here. Use inline FP constants.
* sysdeps/powerpc/power7/fpu/s_logbf.c: Move to ...
* sysdeps/powerpc/fpu/s_logbf.c: ... here. Use inline FP constants.
* sysdeps/powerpc/power7/fpu/s_logbl.c: Move to ...
* sysdeps/powerpc/fpu/s_logbl.c: ... here. Use inline FP constants.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbf-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbl-power7.c:
Adjust implementation path.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_log* objects.
(CFLAGS-s_logbf-power7.c, CFLAGS-s_logbl-power7.c,
CFLAGS-s_logb-power7.c): New fule.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logb.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-power7.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-power7.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl-ppc64.c: Move
to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl-ppc64.c:
... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbl.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_logbl.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile: Remove file.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logb.c: Remove file.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logbf.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_logbl.c: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
Using 'mffs' instruction to read the Floating Point Status Control Register
(FPSCR) can force a processor flush in some cases, with undesirable
performance impact. If the values of the bits in the FPSCR which force the
flush are not needed, an instruction that is new to POWER9 (ISA version 3.0),
'mffsl' can be used instead.
Cases included: get_rounding_mode, fegetround, fegetmode, fegetexcept.
* sysdeps/powerpc/bits/fenvinline.h (__fegetround): Use
__fegetround_ISA300() or __fegetround_ISA2() as appropriate.
(__fegetround_ISA300) New.
(__fegetround_ISA2) New.
* sysdeps/powerpc/fpu_control.h (IS_ISA300): New.
(_FPU_MFFS): Move implementation...
(_FPU_GETCW): Here.
(_FPU_MFFSL): Move implementation....
(_FPU_GET_RC_ISA300): Here. New.
(_FPU_GET_RC): Use _FPU_GET_RC_ISA300() or _FPU_GETCW() as appropriate.
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_status_ISA300): New.
(fegetenv_status): New.
* sysdeps/powerpc/fpu/fegetmode.c (fegetmode): Use fegetenv_status()
instead of fegetenv_register().
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Likewise.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Add 'volatile' keyword to a few asm statements, to force the compiler
to generate the instructions therein.
Some instances were implicitly volatile, but adding keyword for consistency.
2019-06-19 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (relax_fenv_state): Add 'volatile'.
* sysdeps/powerpc/fpu/fpu_control.h (__FPU_MFFS): Likewise.
(__FPU_MFFSL): Likewise.
(_FPU_SETCW): Likewise.
This patches consolidates all the powerpc llrint{f} implementations on
the generic sysdeps/powerpc/fpu/s_llrint{f}.
The IFUNC support is also moved only to powerpc64 only, since for
powerpc64le generic implementation resulting in optimized code.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_llrint-power8, s_llrint-power6x, and
s_llrint-ppc64.
(CFLAGS-s_llrint-power8.c, CFLAGS-s_llrint-power6x.c): New rule.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power6x.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power8.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_lrint.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrintf.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: New file.
* sysdeps/powerpc/powerpc64/fpu/Makefile: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_llrint-* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.c: New file.
* sysdeps/powerpc/powerpc64/fpu/s_llrintf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrint.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Remove file.
* sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrint.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
The powerpc finite optimization do not show much gain:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power7 uses ftdiv to optimize for some input patterns, but at
cost of others. Comparing against generic C implementation built
for powerpc64-linux-gnu-power7 (--with-cpu=power7):
- Generic sysdeps/ieee754 implementation:
"isfinite": {
"": {
"duration": 5.0082e+09,
"iterations": 2.45299e+09,
"max": 43.824,
"min": 2.008,
"mean": 2.04167
},
"INF": {
"duration": 4.66554e+09,
"iterations": 2.28288e+09,
"max": 35.73,
"min": 2.008,
"mean": 2.04371
},
"NAN": {
"duration": 4.66274e+09,
"iterations": 2.28716e+09,
"max": 34.161,
"min": 2.009,
"mean": 2.03866
}
}
- power7 optimized one:
"isfinite": {
"": {
"duration": 4.99111e+09,
"iterations": 2.65566e+09,
"max": 25.015,
"min": 1.716,
"mean": 1.87942
},
"INF": {
"duration": 4.6783e+09,
"iterations": 2.0999e+09,
"max": 35.264,
"min": 1.868,
"mean": 2.22787
},
"NAN": {
"duration": 4.67915e+09,
"iterations": 2.08678e+09,
"max": 38.099,
"min": 1.869,
"mean": 2.24228
}
}
So it basically optimizes marginally for normal numbers while
increasing the latency for other kind of FP.
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_finite*
objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-power7.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_finitef.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call):
Remove s_finite* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Remove file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_finitef.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_finitef.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
The powerpc isinf optimizations onyl adds complexity:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power7 uses ftdiv to optimize for some input pattern and branch
implementation for INF and denormal that does:
return (ix & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000)
Although it does show slight better latency than generic algorithm
(as below), it is only for power7 and requires it to override it
for power8.
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_isinf* and s_isinf*
objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-power7.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isinff.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call):
Remove s_isinf* and s_isinf* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isinff.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isinff.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
The powerpc isnan optimizations are not really a gain:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power5, power6, and power6x are just micro-optimization to
improve the Load-Hit-Store hazards from floating-point to general
register transfer, and current GCC already has support to minimize
it by inserting either extra nops or group dispatch instructions.
- The power7 uses ftdiv to optimize for some input patterns, but at
cost of others. Comparing against generic C implementation built
for powerpc-linux-gnu-power4 (which uses the hp-timing support on
benchtests):
- Generic sysdeps/ieee754 implementation:
"isnan": {
"": {
"duration": 4.98415e+09,
"iterations": 2.34516e+09,
"max": 45.925,
"min": 2.052,
"mean": 2.12529
},
"INF": {
"duration": 4.74057e+09,
"iterations": 1.69761e+09,
"max": 91.01,
"min": 2.052,
"mean": 2.79249
},
"NAN": {
"duration": 4.74071e+09,
"iterations": 1.68768e+09,
"max": 282.343,
"min": 2.052,
"mean": 2.809
}
}
- power7 optimized one:
$ ./testrun.sh benchtests/bench-isnan
"isnan": {
"": {
"duration": 4.96842e+09,
"iterations": 2.56297e+09,
"max": 50.048,
"min": 1.872,
"mean": 1.93854
},
"INF": {
"duration": 4.76648e+09,
"iterations": 1.54213e+09,
"max": 373.408,
"min": 2.661,
"mean": 3.09084
},
"NAN": {
"duration": 4.76845e+09,
"iterations": 1.54515e+09,
"max": 51.016,
"min": 2.736,
"mean": 3.08607
}
}
So it basically optimizes marginally for normal numbers while
increasing the latency for other kind of FP.
- The generic implementation requires getting the floating point
status, disable the invalid operation bit, and restore the
floating-point status. Each operation is costly and requires
flushing the FP pipeline.
Using the same scenarion for the previous analysis:
"isnan": {
"": {
"duration": 5.08284e+09,
"iterations": 6.2898e+08,
"max": 41.844,
"min": 8.057,
"mean": 8.08108
},
"INF": {
"duration": 4.97904e+09,
"iterations": 6.16176e+08,
"max": 39.661,
"min": 8.057,
"mean": 8.08055
},
"NAN": {
"duration": 4.98695e+09,
"iterations": 5.95866e+08,
"max": 29.728,
"min": 8.345,
"mean": 8.36925
}
}
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_isnan.c: Remove file.
* sysdeps/powerpc/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_isnan-* and
s_isnanf-* objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power5.S:
Remove file
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power7.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power5.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_calls):
Remove s_isnan-* and s_isnanf-* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isnanf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
GCC always expand copysign{f} for all possible cpus, so calling the libm
is only done if user explicitly states to disable the builtin (which is
done usually not for performance reason). So to provide ifunc variant
for copysign is just unrequired complexity, since libm will be called
on non-performance critical code.
This patch removes both powerpc32 and powerpc64 ifunc variants and
consolidates the powerpc implementation on
sysdeps/powerpc/fpu/s_copysign{f}.c using compiler builtins.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_copysign.c: New file.
* sysdeps/powerpc/fpu/s_copysignf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdep_routines, libm-sysdep_routines): Remove s_copysign-power6 and
s_copysign-ppc32.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-power6.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c:
Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdeps_calls):
Remove s_copysign-power6 s_copysign-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S:
Remove file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysignf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
This patches consolidates all the powerpc rint{f} implementations on
the generic sysdeps/powerpc/fpu/s_rint{f}.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode,
round_to_integer_float, round_mode): Add RINT handling.
(reset_fenv_mode): New symbol.
* sysdeps/powerpc/fpu/s_rint.c (__rint): Use generic implementation.
* sysdeps/powerpc/fpu/s_rintf.c (__rintf): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_rint.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
Add support to use 'mffsl' instruction if compiled for POWER9 (or later).
Also, mask the result to avoid bleeding unrelated bits into the result of
_FPU_GET_RC().
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
fegetexcept() included code which exactly duplicates the code in
fenv_reg_to_exceptions(). Replace with a call to that function.
2019-06-05 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Replace code
with call to equivalent function.
Since GCC commit 271500 (svn), also known as the following commit on the
git mirror:
commit 61edec870f9fdfb5df3fa4e40f28cbaede28a5b1
Author: amodra <amodra@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Wed May 22 04:34:26 2019 +0000
[RS6000] Don't pass -many to the assembler
glibc builds are failing when an assembly implementation does not
declare the correct '.machine' directive, or when no such directive is
declared at all. For example, when a POWER6 instruction is used, but
'.machine power6' is not declared, the assembler will fail with an error
similar to the following:
../sysdeps/powerpc/powerpc64/power8/strcmp.S: Assembler messages:
24 ../sysdeps/powerpc/powerpc64/power8/strcmp.S:55: Error: unrecognized opcode: `cmpb'
This patch adds '.machine powerN' directives where none existed, as well
as it updates '.machine power7' directives on POWER8 files, because the
minimum binutils version required to build glibc (binutils 2.25) now
provides this machine version. It also adds '-many' to the assembler
command used to build tst-set_ppr.c.
Tested for powerpc, powerpc64, and powerpc64le, as well as with
build-many-glibcs.py for powerpc targets.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
GCC 9 dropped support for the SPE extensions to PowerPC, which means
powerpc*-*-*gnuspe* configurations are no longer buildable with that
compiler. This ISA extension was peculiar to the “e500” line of
embedded PowerPC chips, which, as far as I can tell, are no longer
being manufactured, so I think we should follow suit.
This patch was developed by grepping for “e500”, “__SPE__”, and
“__NO_FPRS__”, and may not eliminate every vestige of SPE support.
Most uses of __NO_FPRS__ are left alone, as they are relevant to
normal embedded PowerPC with soft-float.
* sysdeps/powerpc/preconfigure: Error out on powerpc-*-*gnuspe*
host type.
* scripts/build-many-glibcs.py: Remove powerpc-*-linux-gnuspe
and powerpc-*-linux-gnuspe-e500v1 from list of build configurations.
* sysdeps/powerpc/powerpc32/e500: Recursively delete.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/e500: Recursively delete.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/context-e500.h:
Delete.
* sysdeps/powerpc/fpu_control.h: Remove SPE variant.
Issue an #error if used with a compiler in SPE-float mode.
* sysdeps/powerpc/powerpc32/__longjmp_common.S
* sysdeps/powerpc/powerpc32/setjmp_common.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/getcontext-common.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/getcontext.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/setcontext.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/swapcontext.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/setcontext-common.S
* sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S:
Remove code to preserve SPE register state.
* sysdeps/unix/sysv/linux/powerpc/elision-lock.c
* sysdeps/unix/sysv/linux/powerpc/elision-trylock.c
* sysdeps/unix/sysv/linux/powerpc/elision-unlock.c
Remove __SPE__ ifndefs.
This patches consolidates all the powerpc trunc{f} implementations on
the generic sysdeps/powerpc/fpu/s_trunc{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/trunc_to_integer.h (set_fenv_mode): Add
TRUNC handling.
(round_mode): Add definition for TRUNC.
* sysdeps/powerpc/fpu/s_trunc.c: New file.
* sysdeps/powerpc/fpu/s_truncf.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-power5+.c: New
file.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_trunc-ppc32.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-power5+.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_truncf-ppc32.c:
Likewise.
* sysdep/powerpc/powerpc32/power5+/fpu/s_trunc.S: Remove file.
* sysdep/powerpc/powerpc32/power5+/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_trunc-power5+, s_trunc-ppc64,
s_truncf-power5+, and s_truncf-ppc64.
(CFLAGS-s_trunc-power5+.c, CFLAGS-s_truncf-power5+.c): New rule.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_trunc-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc.c: ... here.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_truncf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_trunc-power5+, s_trunc-ppc64,
s_truncf-power5+, and s_truncf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-power5+.S: Remove
file.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_trunc-ppc64.S: Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-power5+.S:
Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_truncf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
This patches consolidates all the powerpc round{f} implementations on
the generic sysdeps/powerpc/fpu/s_round{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add
ROUND handling.
(round_mode): Add definition for ROUND.
(round_to_integer_float): Likewise.
* sysdeps/powerpc/fpu/s_round.c: New file.
* sysdeps/powerpc/fpu/s_roundf.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_round.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.S:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-power5+.c: New
file.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_round-ppc32.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-power5+.c:
Likewise.
* sysdep/powerpc/powepc32/power4/fpu/multiarch/s_roundf-ppc32.c:
Likewise.
* sysdep/powerpc/powerpc32/power5+/fpu/s_round.S: Remove file.
* sysdep/powerpc/powerpc32/power5+/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_round-power5+, s_round-ppc64,
s_roundf-power5+, and s_roundf-ppc64.
(CFLAGS-s_round-power5+.c, CFLAGS-s_roundf-power5+.c): New rule.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_round-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round.c: ... here.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-power5+.c: New
file.
* sysdep/powerpc/powercp64/be/fpu/multiarch/s_roundf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_round-power5+, s_round-ppc64,
s_roundf-power5+, and s_roundf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_round-power5+.S: Remove
file.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_round-ppc64.S: Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-power5+.S:
Likewise.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_roundf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise.
* sysdep/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
This patches consolidates all the powerpc floor{f} implementations on
the generic sysdeps/powerpc/fpu/s_floor{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the
frim instruction) or a generic implementation which uses FP only
operations.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode):
Add FLOOR option.
(round_mode): Add definition for FLOOR.
* sysdeps/powerpc/fpu/s_floor.c: New file.
* sysdeps/powerpc/fpu/s_floorf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_floor.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.S:
Likewise
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: Remove file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_floor-power5+, s_floor-ppc64,
s_floorf-power5+, and s_floorf-ppc64.
(CFLAGS-s_floor-power5+.c, CFLAGS-s_floorf-power5+.c): New rule.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-power5+.c: New
file.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floor-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor.c: ... here.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-power5+.c: New
file.
* sysdep/powerpc/powerpc64/be/fpu/multiarch/s_floorf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_floor-power5+, s_floor-ppc64,
s_floorf-power5+, and s_floorf-ppc64.
* sysdep/powerpc/powerpc64/fpu/multiarch/s_floor-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-ppc64.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
This patches consolidates all the powerpc ceil{f} implementations on
the generic sysdeps/powerpc/fpu/s_ceil{f}. The generic implementation
uses either the compiler builts for ISA 2.03+ (which generates the frip
instruction) or a generic implementation which uses FP only operations.
It adds a generic implementation (round_to_integer.h) which is shared
with other rounding to integer routines. The resulting code should be
similar in term os performance to previous assembly one.
The IFUNC organization for powerpc64 is also change to be enabled only
for powerpc64 and not for powerpc64le (since minium ISA of 2.08 does not
require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline_nocheck): New
function.
* sysdeps/powerpc/fpu/round_to_integer.h: New file.
* sysdeps/powerpc/fpu/s_ceil.c: Likewise.
* sysdeps/powerpc/fpu/s_ceilf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_ceil-power5+.c, CFLAGS-s_ceilf-power5+.c): New rule.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile: New file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf.c: ...
* here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_ceil-power5+, s_ceil-ppc64,
s_ceilf-power5+, and s_ceilf-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-power5+.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
This patch just refactor the assembly implementation to use compiler
builtins instead.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_fma.S: Remove file.
* sysdeps/powerpc/fpu/s_fmaf.S: Likewise.
* sysdeps/powerpc/fpu/s_fma.c: New file.
* sysdeps/powerpc/fpu/s_fmaf.c: Likewise.
Since be2e25bbd7 the generic ieee754 implementation uses
compiler builtin which generates fabs{f} for all supported targets.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cp and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_fabs.S: Remove file.
* sysdeps/powerpc/fpu/s_fabsf.S: Likewise.
Replace inline asm uses of the "mffs" and "mtfsf" instructions with
the analogous GCC builtins.
__builtin_mffs and __builtin_mtfsf are both available in GCC 5 and above.
Given the minimum GCC level for GLibC is now GCC 6.2, it is safe to use
these builtins without restriction.
2019-03-29 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fenv_libc.h (fegetenv_register): Replace inline
asm with builtin.
* sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (FP_INIT_ROUNDMODE):
Likewise.
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (_GET_DI_FPSCR): Likewise.
(_GET_SI_FPSCR): Likewise.
(_SET_SI_FPSCR): Likewise.
This file is not used anywhere since removal of {k,e}_rem_pio2f.c
(commit ca3aac57ef).
Checked with a build for powerpc-linux-gnu (with --with-cpu=power4
and --with-cpu=power7), powerpc64-linux-gnu (with --with-cpu=power4
and --with-cpu=power7), and powerpc64le-linux (with --with-cpu=power8).
* sysdeps/powerpc/fpu/s_float_bitwise.h: Remove file.
This patch refactor how hp-timing is used on loader code for statistics
report. The HP_TIMING_AVAIL and HP_SMALL_TIMING_AVAIL are removed and
HP_TIMING_INLINE is used instead to check for hp-timing avaliability.
For alpha, which only defines HP_SMALL_TIMING_AVAIL, the HP_TIMING_INLINE
is set iff for IS_IN(rtld).
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. I also
checked the builds for all afected ABIs.
* benchtests/bench-timing.h: Replace HP_TIMING_AVAIL with
HP_TIMING_INLINE.
* nptl/descr.h: Likewise.
* elf/rtld.c (RLTD_TIMING_DECLARE, RTLD_TIMING_NOW, RTLD_TIMING_DIFF,
RTLD_TIMING_ACCUM_NT, RTLD_TIMING_SET): Define.
(dl_start_final_info, _dl_start_final, dl_main, print_statistics):
Abstract hp-timing usage with RTLD_* macros.
* sysdeps/alpha/hp-timing.h (HP_TIMING_INLINE): Define iff IS_IN(rtld).
(HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Remove.
* sysdeps/generic/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL,
HP_TIMING_NONAVAIL): Likewise.
* sysdeps/ia64/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL):
Likewise.
* sysdeps/powerpc/powerpc32/power4/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/powerpc/powerpc64/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/sparc/sparc32/sparcv9/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/sparc/sparc64/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/x86/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL):
Likewise.
* sysdeps/generic/hp-timing-common.h: Update comment with
HP_TIMING_AVAIL removal.
Starting with commit 1616d034b6
the output was corrupted on some platforms as _dl_procinfo
was called for every auxv entry and on some architectures like s390
all entries were represented as "AT_HWCAP".
This patch is removing the condition and let _dl_procinfo decide if
an entry is printed in a platform specific or generic way.
This patch also adjusts all _dl_procinfo implementations which assumed
that they are only called for AT_HWCAP or AT_HWCAP2. They are now just
returning a non-zero-value for entries which are not handled platform
specifc.
ChangeLog:
* elf/dl-sysdep.c (_dl_show_auxv): Remove condition and always
call _dl_procinfo.
* sysdeps/unix/sysv/linux/s390/dl-procinfo.h (_dl_procinfo):
Ignore types other than AT_HWCAP.
* sysdeps/sparc/dl-procinfo.h (_dl_procinfo): Likewise.
* sysdeps/unix/sysv/linux/i386/dl-procinfo.h (_dl_procinfo):
Likewise.
* sysdeps/powerpc/dl-procinfo.h (_dl_procinfo): Adjust comment
in the case of falling back to generic output mechanism.
* sysdeps/unix/sysv/linux/arm/dl-procinfo.h (_dl_procinfo):
Likewise.
This patch fixes further coding style issues where code should have
broken lines before operators in accordance with the GNU Coding
Standards but instead was breaking lines after them.
Tested for x86_64, and with build-many-glibcs.py.
* stdio-common/vfscanf-internal.c (ARG): Break lines before rather
than after operators.
* sysdeps/mach/hurd/setitimer.c (timer_thread): Likewise.
(setitimer_locked): Likewise.
* sysdeps/mach/hurd/sigaction.c (__sigaction): Likewise.
* sysdeps/mach/hurd/sigaltstack.c (__sigaltstack): Likewise.
* sysdeps/mach/pagecopy.h (PAGE_COPY_FWD): Likewise.
* sysdeps/mach/thread_state.h (machine_get_basic_state): Likewise.
* sysdeps/powerpc/powerpc64/tst-ucontext-ppc64-vscr.c
(PPC_CPU_SUPPORTED): Likewise.
* sysdeps/unix/sysv/linux/alpha/a.out.h (N_TXTOFF): Likewise.
* sysdeps/unix/sysv/linux/generic/wordsize-32/overflow.h
(stat_overflow): Likewise.
(statfs_overflow): Likewise.
* sysdeps/unix/sysv/linux/tst-personality.c (do_test): Likewise.
* sysdeps/unix/sysv/linux/tst-ttyname.c (eq_ttyname): Likewise.
(eq_ttyname_r): Likewise.
(run_chroot_tests): Likewise.
Since the commit
commit 81a1443941
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Tue Feb 5 17:35:12 2019 -0200
wcsmbs: optimize wcscat
powerpc64 and powerpc64le builds fail when configured with
--disable-multi-arch and --with-cpu=power6 (or newer), due to an
undefined reference to __GI___wcscpy. This patch fixes this on
sysdeps/powerpc/powerpc64/power6/wcscpy.c, which is only used when
multi-arch is disabled.
This patch does nothing for the failures on 32-bits powerpc builds,
because the file is under the powerpc64 subdirectory, however, powerpc
builds were already failing with --disable-multi-arch, with multiple
error messages, even before the aforementioned commit.
Tested for powerpc, powerpc64, and powerpc64le with multi-arch enabled
(all pass) and disabled (powerpc still fails as explained above).
This patch fixes more places where a space should have been present
before '(' in accordance with the GNU Coding Standards (as with the
previous patch, mainly for calls to sizeof).
Tested with build-many-glibcs.py.
* sysdeps/powerpc/powerpc32/dl-machine.c
(__elf_machine_fixup_plt): Use space before '('.
(__process_machine_rela): Likewise.
* sysdeps/powerpc/powerpc32/register-dump.h (register_dump):
Likewise.
* sysdeps/powerpc/powerpc64/le/fpu/sfp-machine.h (TI_BITS):
Likewise.
* sysdeps/powerpc/powerpc64/register-dump.h (register_dump):
Likewise.
* sysdeps/powerpc/test-arith.c (union_t): Likewise.
(pattern): Likewise.
(delta): Likewise.
(check_result): Likewise.
(check_excepts): Likewise.
(check_op): Likewise.
(fail_xr): Likewise.
* sysdeps/unix/alpha/sysdep.h (syscall_promote): Likewise.
* sysdeps/unix/sysv/linux/alpha/a.out.h (AOUTHSZ): Likewise.
(SCNHSZ): Likewise.
* sysdeps/unix/sysv/linux/hppa/makecontext.c (FRAME_SIZE_BYTES):
Likewise.
(ARGS): Likewise.
(__makecontext): Likewise.
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (ucontext_t):
Likewise.
This patch rewrites wcscat using wcslen and wcscpy. This is similar to
the optimization done on strcat by 6e46de42fe.
The strcpy changes are mainly to add the internal alias to avoid PLT
calls.
Checked on x86_64-linux-gnu and a build against the affected
architectures.
* include/wchar.h (__wcscpy): New prototype.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c
(__wcscpy): Route internal symbol to generic implementation.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy):
Add internal __wcscpy alias.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise.
* sysdeps/s390/wcscpy.c (wcscpy): Likewise.
* sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise.
* wcsmbs/wcscpy.c (wcscpy): Add
* sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to
use generic implementation.
* wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.
This patch fixes -Wimplicit-fallthrough warnings in system-specific
code that show up building glibc with -Wextra, by adding fall-through
comments, or moving existing such comments to the place required for
them to work (immediately before the case label being fallen through).
Tested with build-many-glibcs.py.
* sysdeps/i386/dl-machine.h (elf_machine_rela): Add fall-through
comments.
* sysdeps/m68k/m680x0/fpu/s_cexp_template.c (s(__cexp)): Likewise.
* sysdeps/m68k/memcopy.h (WORD_COPY_FWD): Likewise.
(WORD_COPY_BWD): Likewise.
* sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise.
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela):
Likewise.
* sysdeps/s390/iso-8859-1_cp037_z900.c (TR_LOOP): Likewise.
* sysdeps/mips/dl-machine.h (elf_machine_reloc): Move fall-through
comment.
* sysdeps/mips/dl-trampoline.c (__dl_runtime_resolve): Likewise.
The configure fragment for powerpc64le contains a test for the presence
of several compiler builtins and of the __float128 type, which are
provided by GCC 6.2 for powerpc64le. Since this configure test was
added, the compiler version required to build glibc for powerpc64le was
different than that required for the other architectures.
Now that glibc requires GCC 6.2 globally (since commit ID 4dcbbc3b28),
this patch removes the powerpc64le-specific test.
Even tough the configure test checks for compiler features rather than
compiler version, the intent of the test was to stop build attempts at
early stages, if they had been configured with a too old compiler. It
was not the intention of the test to detect compiler breakage (such as
the removal of the required compiler features in future GCC versions),
and glibc is not the place to test for compiler regressions, anyway.
Tested for powerpc64le with GCC 6.2 (built with build-many-glibcs.py).
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
The type used within e_sqrt.c(__slow_ieee754_sqrtf) was, unnecessarily and
likely inadvertently, double. float is not only appropriate, but also
more efficient, avoiding the need for the compiler to emit a
round-to-single-precision instruction.
This is the difference in compiled code:
0000000000000000 <__ieee754_sqrtf>:
0: 2c 08 20 ec fsqrts f1,f1
- 4: 18 08 20 fc frsp f1,f1
- 8: 20 00 80 4e blr
+ 4: 20 00 80 4e blr
(Found by Anton Blanchard.)
Continuing the process of moving away from having bits/mathinline.h
headers in glibc, leaving the compiler to inline functions as
appropriate depending on the options passed to it, this patch removes
the header for powerpc.
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88558> is the
corresponding GCC bug for adding replacements for these powerpc
(32-bit-only) lrint / lrintf inlines.
Tested with build-many-glibcs.py for its powerpc configurations.
* sysdeps/powerpc/bits/mathinline.h: Remove.
A single underscore was omitted in
sysdeps/powerpc/powerpc64/multiarch/strncmp.c, resulting in use of
power8 version of strncmp instead of power9 version, with significant
performance degradation.
* sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Fix #ifdef.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
An error "impossible register constraint in 'asm'" was raised on POWER
5 and due to __vector __int128_t being used as operands without passing the
option -msvx to gcc.
This patch replaces "__vector __int128_t" with "__vector unsigned int"
which requires only -maltivec, available since POWER ISA 2.03, and which
is already passed to the compiler.
* sysdeps/powerpc/powerpc64/tst-ucontext-ppc64-vscr.c:
(do_test): Changed __vector __int128_t to __vector unsigned int.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This patch fix VSCR position on ucontext. VSCR was read in the wrong
position on ucontext structure because it was ignoring the machine
endianess.
[BZ #24088]
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (vscr_t): Added
ifdef to fix read of VSCR.
* sysdeps/powerpc/powerpc64/Makefile [$subdir == stdlib]: Add
tst-ucontext-ppc64-vscr.c to test list.
* sysdeps/powerpc/powerpc64/tst-ucontext-ppc64-vscr.c: New test file.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Add support for AT_L1I_CACHESIZE, AT_L1I_CACHEGEOMETRY,
AT_L1D_CACHESIZE, AT_L1D_CACHEGEOMETRY, AT_L2_CACHESIZE,
AT_L2_CACHEGEOMETRY, AT_L3_CACHESIZE and AT_L3_CACHEGEOMETRY when
LD_SHOW_AUXV=1.
AT_L*_CACHESIZE is printed as decimal and represent the number of
bytes of the cache.
AT_L*_CACHEGEOMETRY is treated in order to specify the cache line size
and its associativity.
Example output from a POWER8:
AT_L1I_CACHESIZE: 32768
AT_L1I_CACHEGEOMETRY: 128B line size, 8-way set associative
AT_L1D_CACHESIZE: 65536
AT_L1D_CACHEGEOMETRY: 128B line size, 8-way set associative
AT_L2_CACHESIZE: 524288
AT_L2_CACHEGEOMETRY: 128B line size, 8-way set associative
AT_L3_CACHESIZE: 8388608
AT_L3_CACHEGEOMETRY: 128B line size, 8-way set associative
Some of the new types are longer than the previous ones, requiring to
increase the indentation in order to keep the values aligned.
* elf/dl-sysdep.c (auxvars): Add AT_L1I_CACHESIZE,
AT_L1I_CACHEGEOMETRY, AT_L1D_CACHESIZE, AT_L1D_CACHEGEOMETRY,
AT_L2_CACHESIZE, AT_L2_CACHEGEOMETRY, AT_L3_CACHESIZE and
AT_L3_CACHEGEOMETRY. Fix indentation when printing the other
fields.
(_dl_show_auxv): Give a special treatment to
AT_L1I_CACHEGEOMETRY, AT_L1D_CACHEGEOMETRY, AT_L2_CACHEGEOMETRY
and AT_L3_CACHEGEOMETRY.
* sysdeps/powerpc/dl-procinfo.h (cache_geometry): New function.
(_dl_procinfo): Fix indentation when printing AT_HWCAP and
AT_HWCAP2. Add support for AT_L1I_CACHEGEOMETRY,
AT_L1D_CACHEGEOMETRY, AT_L2_CACHEGEOMETRY and AT_L3_CACHEGEOMETRY.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Add CFI information about the offset of registers stored in the stack
frame.
[BZ #23614]
* sysdeps/powerpc/powerpc64/addmul_1.S (FUNC): Add CFI offset for
registers saved in the stack frame.
* sysdeps/powerpc/powerpc64/lshift.S (__mpn_lshift): Likewise.
* sysdeps/powerpc/powerpc64/mul_1.S (__mpn_mul_1): Likewise.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
Soft-float powerpc fails to build with current GCC mainline because of
use of libc_hidden_data_def for TLS variables, resulting in a non-TLS
alias being defined, to which the tls_model attribute is now copied,
resulting in a warning about it being ignored.
The problem here appears to be the non-TLS alias. This patch adds a
hidden_tls_def macro family, corresponding to the hidden_tls_proto
macros, to define TLS aliases properly in such a case, and uses it for
those powerpc soft-float variables.
Tested with build-many-glibcs.py compilers build for powerpc-linux-gnu
soft-float. Also tested for x86_64.
* include/libc-symbols.h [SHARED && !NO_HIDDEN && !__ASSEMBLER__]
(__hidden_ver2): New macro. Use old definition of __hidden_ver1
with additional parameter thread.
[SHARED && !NO_HIDDEN && !__ASSEMBLER__] (__hidden_ver1): Define
in terms of __hidden_ver2.
(hidden_tls_def): New macro.
(libc_hidden_tls_def): Likewise.
(rtld_hidden_tls_def): Likewise.
(libm_hidden_tls_def): Likewise.
(libmvec_hidden_tls_def): Likewise.
(libresolv_hidden_tls_def): Likewise.
(librt_hidden_tls_def): Likewise.
(libdl_hidden_tls_def): Likewise.
(libnss_files_hidden_tls_def): Likewise.
(libnsl_hidden_tls_def): Likewise.
(libnss_nisplus_hidden_tls_def): Likewise.
(libutil_hidden_tls_def): Likewise.
(libutil_hidden_tls_def): Likweise.
* sysdeps/powerpc/nofpu/sim-full.c (__sim_exceptions_thread): Use
libc_hidden_tls_def.
(__sim_disabled_exceptions_thread): Likewise.
(__sim_round_mode_thread): Likewise.
After my changes to move various macros, inlines and other content
from math_private.h to more specific headers, many files including
math_private.h no longer need to do so. Furthermore, since the
optimized inlines of various functions have been moved to
include/fenv.h or replaced by use of function names GCC inlines
automatically, a missing math_private.h include where one is
appropriate will reliably cause a build failure rather than possibly
causing code to be less well optimized while still building
successfully. Thus, this patch removes includes of math_private.h
that are now unnecessary. In the case of two RISC-V files, the
include is replaced by one of stdbool.h because the files in question
were relying on math_private.h to get a definition of bool.
Tested for x86_64 and x86, and with build-many-glibcs.py.
* math/fromfp.h: Do not include <math_private.h>.
* math/s_cacosh_template.c: Likewise.
* math/s_casin_template.c: Likewise.
* math/s_casinh_template.c: Likewise.
* math/s_ccos_template.c: Likewise.
* math/s_cproj_template.c: Likewise.
* math/s_fdim_template.c: Likewise.
* math/s_fmaxmag_template.c: Likewise.
* math/s_fminmag_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/s_ldexp_template.c: Likewise.
* math/s_nextdown_template.c: Likewise.
* math/w_log1p_template.c: Likewise.
* math/w_scalbln_template.c: Likewise.
* sysdeps/aarch64/fpu/feholdexcpt.c: Likewise.
* sysdeps/aarch64/fpu/fesetround.c: Likewise.
* sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise.
* sysdeps/aarch64/fpu/ftestexcept.c: Likewise.
* sysdeps/aarch64/fpu/s_llrint.c: Likewise.
* sysdeps/aarch64/fpu/s_llrintf.c: Likewise.
* sysdeps/aarch64/fpu/s_lrint.c: Likewise.
* sysdeps/aarch64/fpu/s_lrintf.c: Likewise.
* sysdeps/i386/fpu/s_atanl.c: Likewise.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/i386/fpu/s_f32xsubf64.c: Likewise.
* sysdeps/i386/fpu/s_fdim.c: Likewise.
* sysdeps/i386/fpu/s_logbl.c: Likewise.
* sysdeps/i386/fpu/s_rintl.c: Likewise.
* sysdeps/i386/fpu/s_significandl.c: Likewise.
* sysdeps/ia64/fpu/s_matherrf.c: Likewise.
* sysdeps/ia64/fpu/s_matherrl.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_cbrt.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/flt-32/s_cbrtf.c: Likewise.
* sysdeps/ieee754/k_standardf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_finitel.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_fpclassifyl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_isinfl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_isnanl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_signbitl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_cbrtl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/s_signgam.c: Likewise.
* sysdeps/powerpc/power5+/fpu/s_modf.c: Likewise.
* sysdeps/powerpc/power5+/fpu/s_modff.c: Likewise.
* sysdeps/powerpc/power7/fpu/s_logbf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvd/s_finite.c: Likewise.
* sysdeps/riscv/rvd/s_fmax.c: Likewise.
* sysdeps/riscv/rvd/s_fmin.c: Likewise.
* sysdeps/riscv/rvd/s_fpclassify.c: Likewise.
* sysdeps/riscv/rvd/s_isinf.c: Likewise.
* sysdeps/riscv/rvd/s_isnan.c: Likewise.
* sysdeps/riscv/rvd/s_issignaling.c: Likewise.
* sysdeps/riscv/rvf/fegetround.c: Likewise.
* sysdeps/riscv/rvf/feholdexcpt.c: Likewise.
* sysdeps/riscv/rvf/fesetenv.c: Likewise.
* sysdeps/riscv/rvf/fesetround.c: Likewise.
* sysdeps/riscv/rvf/feupdateenv.c: Likewise.
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise.
* sysdeps/riscv/rvf/ftestexcept.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/riscv/rvf/s_finitef.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/riscv/rvf/s_fmaxf.c: Likewise.
* sysdeps/riscv/rvf/s_fminf.c: Likewise.
* sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise.
* sysdeps/riscv/rvf/s_isinff.c: Likewise.
* sysdeps/riscv/rvf/s_isnanf.c: Likewise.
* sysdeps/riscv/rvf/s_issignalingf.c: Likewise.
* sysdeps/riscv/rvf/s_nearbyintf.c: Likewise.
* sysdeps/riscv/rvf/s_roundevenf.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_rint.c: Include <stdbool.h> instead of
<math_private.h>.
* sysdeps/riscv/rvf/s_rintf.c: Likewise.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __round functions to call the
corresponding round names instead, with asm redirection to __round
when the calls are not inlined.
An additional complication arises in
sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the
result converted to int, gets converted by the compiler to call
lroundl in the case of 32-bit long, so resulting in localplt test
failures. It's logically correct to let the compiler make such an
optimization; an appropriate asm redirection of lroundl to __lroundl
is thus added to that file (it's not needed anywhere else).
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_roundf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_round.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise.
* sysdeps/ieee754/float128/s_roundf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_roundf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise.
(round): Redirect to __round.
(__roundl): Call round instead of __round.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round):
Remove macro.
[_ARCH_PWR5X] (__roundf): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round
functions instead of __round variants.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to
__lroundl.
(__ieee754_expl): Call roundl instead of __roundl.
Linux from 3.9 through 4.2 does not abort HTM transaction on syscalls,
instead it suspend and resume it when leaving the kernel. The
side-effects of the syscall will always remain visible, even if the
transaction is aborted. This is an issue when transaction is used along
with futex syscall, on pthread_cond_wait for instance, where the futex
call might succeed but the transaction is rolled back leading the
pthread_cond object in an inconsistent state.
Glibc used to prevent it by always aborting a transaction before issuing
a syscall. Linux 4.2 also decided to abort active transaction in
syscalls which makes the glibc workaround superfluous. Worse, glibc
transaction abortion leads to a performance issue on recent kernels
where the HTM state is saved/restore lazily (v4.9). By aborting a
transaction on every syscalls, regardless whether a transaction has being
initiated before, GLIBS makes the kernel always save/restore HTM state
(it can not even lazily disable it after a certain number of syscall
iterations).
Because of this shortcoming, Transactional Lock Elision is just enabled
when it has been explicitly set (either by tunables of by a configure
switch) and if kernel aborts HTM transactions on syscalls
(PPC_FEATURE2_HTM_NOSC). It is reported that using simple benchmark [1],
the context-switch is about 5% faster by not issuing a tabort in every
syscall in newer kernels.
Checked on powerpc64le-linux-gnu with 4.4.0 kernel (Ubuntu 16.04).
* NEWS: Add note about new TLE support on powerpc64le.
* sysdeps/powerpc/nptl/tcb-offsets.sym (TM_CAPABLE): Remove.
* sysdeps/powerpc/nptl/tls.h (tcbhead_t): Rename tm_capable to
__ununsed1.
(TLS_INIT_TP, TLS_DEFINE_INIT_TP): Remove tm_capable setup.
(THREAD_GET_TM_CAPABLE, THREAD_SET_TM_CAPABLE): Remove macros.
* sysdeps/powerpc/powerpc32/sysdep.h,
sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION_IMPL,
ABORT_TRANSACTION): Remove macros.
* sysdeps/powerpc/sysdep.h (ABORT_TRANSACTION): Likewise.
* sysdeps/unix/sysv/linux/powerpc/elision-conf.c (elision_init): Set
__pthread_force_elision iff PPC_FEATURE2_HTM_NOSC is set.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h,
sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h
sysdeps/unix/sysv/linux/powerpc/syscall.S (ABORT_TRANSACTION): Remove
usage.
* sysdeps/unix/sysv/linux/powerpc/not-errno.h: Remove file.
Reported-by: Breno Leitão <leitao@debian.org>
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __rint functions to call the
corresponding rint names instead, with asm redirection to __rint when
the calls are not inlined. The x86_64 math_private.h is removed as no
longer useful after this patch.
This patch is relative to a tree with my floor patch
<https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied,
and much the same considerations arise regarding possibly replacing an
IFUNC call with a direct inline expansion.
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_rintf.c: Likewise.
* sysdeps/alpha/fpu/s_rint.c: Likewise.
* sysdeps/alpha/fpu/s_rintf.c: Likewise.
* sysdeps/i386/fpu/s_rintl.c: Likewise.
* sysdeps/ieee754/dbl-64/s_rint.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise.
* sysdeps/ieee754/float128/s_rintf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_rintf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise.
* sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise.
* sysdeps/powerpc/fpu/s_rint.c: Likewise.
* sysdeps/powerpc/fpu/s_rintf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_rint.c: Likewise.
* sysdeps/riscv/rvf/s_rintf.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/x86_64/fpu/math_private.h: Remove file.
* math/e_scalb.c (invalid_fn): Use rint functions instead of
__rint variants.
* math/e_scalbf.c (invalid_fn): Likewise.
* math/e_scalbl.c (invalid_fn): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
Likewise.
* sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise.
* sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
Similar to the changes that were made to call sqrt functions directly
in glibc, instead of __ieee754_sqrt variants, so that the compiler
could inline them automatically without needing special inline
definitions in lots of math_private.h headers, this patch makes libm
code call floor functions directly instead of __floor variants,
removing the inlines / macros for x86_64 (SSE4.1) and powerpc
(POWER5).
The redirection used to ensure that __ieee754_sqrt does still get
called when the compiler doesn't inline a built-in function expansion
is refactored so it can be applied to other functions; the refactoring
is arranged so it's not limited to unary functions either (it would be
reasonable to use this mechanism for copysign - removing the inline in
math_private_calls.h but also eliminating unnecessary local PLT entry
use in the cases (powerpc soft-float and e500v1, for IBM long double)
where copysign calls don't get inlined).
The point of this change is that more architectures can get floor
calls inlined where they weren't previously (AArch64, for example),
without needing special inline definitions in their math_private.h,
and existing such definitions in math_private.h headers can be
removed.
Note that it's possible that in some cases an inline may be used where
an IFUNC call was previously used - this is the case on x86_64, for
example. I think the direct calls to floor are still appropriate; if
there's any significant performance cost from inline SSE2 floor
instead of an IFUNC call ending up with SSE4.1 floor, that indicates
that either the function should be doing something else that's faster
than using floor at all, or it should itself have IFUNC variants, or
that the compiler choice of inlining for generic tuning should change
to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
might be called at runtime - but not that glibc should avoid calling
floor internally. (After all, all the same considerations would apply
to any user program calling floor, where it might either be inlined or
left as an out-of-line call allowing for a possible IFUNC.)
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT):
New macro.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (floor): Likewise.
* sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_floorf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_floor.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise.
* sysdeps/ieee754/float128/s_floorf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_floorf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor):
Remove macro.
[_ARCH_PWR5X] (__floorf): Likewise.
* sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove
inline function.
[__SSE4_1__] (__floorf): Likewise.
* math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions
instead of __floor variants.
* math/w_lgamma_r_compat.c (__lgamma_r): Likewise.
* math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise.
* math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise.
* math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise.
* math/w_lgammal_r_compat.c (__lgammal_r): Likewise.
* math/w_tgamma_compat.c (__tgamma): Likewise.
* math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise.
* math/w_tgammaf_compat.c (__tgammaf): Likewise.
* math/w_tgammal_compat.c (__tgammal): Likewise.
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl):
Likewise.
* sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
Optimized exp and exp2 implementations using a lookup table for
fractional powers of 2. There are several variants, see e_exp_data.c,
they can be selected by modifying math_config.h allowing different
tradeoffs.
The default selection should be acceptable as generic libm code.
Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on
aarch64 the rodata size is 2160 bytes, shared between exp and exp2.
On aarch64 .text + .rodata size decreased by 24912 bytes.
The non-nearest rounding error is less than 1 ULP even on targets
without efficient round implementation (although the error rate is
higher in that case). Targets with single instruction, rounding mode
independent, to nearest integer rounding and conversion can use them
by setting TOINT_INTRINSICS and adding the necessary code to their
math_private.h.
The __exp1 code uses the same algorithm, so the error bound of pow
increased a bit.
New double precision error handling code was added following the
style of the single precision error handling code.
Improvements on Cortex-A72 compared to current glibc master:
exp thruput: 1.61x in [-9.9 9.9]
exp latency: 1.53x in [-9.9 9.9]
exp thruput: 1.13x in [0.5 1]
exp latency: 1.30x in [0.5 1]
exp2 thruput: 2.03x in [-9.9 9.9]
exp2 latency: 1.64x in [-9.9 9.9]
For small (< 1) inputs the current exp code uses a separate algorithm
so the speed up there is less.
Was tested on
aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and
x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and
powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets,
only non-nearest rounding ulp errors increase and they are within
acceptable bounds (ulp updates are in separate patches).
* NEWS: Mention exp and exp2 improvements.
* math/Makefile (libm-support): Remove t_exp.
(type-double-routines): Add math_err and e_exp_data.
* sysdeps/aarch64/libm-test-ulps: Update.
* sysdeps/arm/libm-test-ulps: Update.
* sysdeps/i386/fpu/e_exp_data.c: New file.
* sysdeps/i386/fpu/math_err.c: New file.
* sysdeps/i386/fpu/t_exp.c: Remove.
* sysdeps/ia64/fpu/e_exp_data.c: New file.
* sysdeps/ia64/fpu/math_err.c: New file.
* sysdeps/ia64/fpu/t_exp.c: Remove.
* sysdeps/ieee754/dbl-64/e_exp.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_exp_data.c: New file.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound.
* sysdeps/ieee754/dbl-64/eexp.tbl: Remove.
* sysdeps/ieee754/dbl-64/math_config.h: New file.
* sysdeps/ieee754/dbl-64/math_err.c: New file.
* sysdeps/ieee754/dbl-64/t_exp.c: Remove.
* sysdeps/ieee754/dbl-64/t_exp2.h: Remove.
* sysdeps/ieee754/dbl-64/uexp.h: Remove.
* sysdeps/ieee754/dbl-64/uexp.tbl: Remove.
* sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file.
* sysdeps/m68k/m680x0/fpu/math_err.c: New file.
* sysdeps/m68k/m680x0/fpu/t_exp.c: Remove.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
<fenv_private.h> has inline versions of various <fenv.h> functions,
and their __fe* variants, for systems (generally soft-float) without
support for floating-point exceptions, rounding modes or both.
Having these inlines in a separate header introduces a risk of a
source file including <fenv.h> and compiling OK on x86_64, but failing
to compile (because the feraiseexcept inline is actually a macro that
discards its argument, to avoid the need for #ifdef FE_INVALID
conditionals), or not being properly optimized, on systems without the
exceptions and rounding modes support (when these inlines were in
math_private.h, we had a few cases where this broke the build because
there was no obvious reason for a file to need math_private.h and it
didn't need that header on x86_64). By moving those inlines to
include/fenv.h, this risk can be avoided, and fenv_private.h becomes
more clearly defined as specifically the header for the internal
libc_fe* and SET_RESTORE_ROUND* interfaces.
This patch makes that move, removing fenv_private.h includes that are
no longer needed (or replacing them by fenv.h includes in a few cases
that didn't already have such an include).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/generic/fenv_private.h [FE_ALL_EXCEPT == 0]: Move this
code ....
[!FE_HAVE_ROUNDING_MODES]: And this code ....
* include/fenv.h [!_ISOMAC]: ... to here.
* math/fraiseexcpt.c (__feraiseexcept): Undefine as macro.
(feraiseexcept): Likewise.
* math/fromfp.h: Do not include <fenv_private.h>.
* math/s_cexp_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/w_acos_compat.c: Likewise.
* math/w_acosf_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise.
* math/w_ilogb_template.c: Include <fenv.h> instead of
<fenv_private.h>.
* math/w_llogb_template.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrtf.c: Likewise.
Continuing the clean-up related to the catch-all math_private.h
header, this patch stops math_private.h from including fenv_private.h.
Instead, fenv_private.h is included directly from those users of
math_private.h that also used interfaces from fenv_private.h. No
attempt is made to remove unused includes of math_private.h, but that
is a natural followup.
(However, since math_private.h sometimes defines optimized versions of
math.h interfaces or __* variants thereof, as well as defining its own
interfaces, I think it might make sense to get all those optimized
versions included from include/math.h, not requiring a separate header
at all, before eliminating unused math_private.h includes - that
avoids a file quietly becoming less-optimized if someone adds a call
to one of those interfaces without restoring a math_private.h include
to that file.)
There is still a pitfall that if code uses plain fe* and __fe*
interfaces, but only includes fenv.h and not fenv_private.h or (before
this patch) math_private.h, it will compile on platforms with
exceptions and rounding modes but not get the optimized versions (and
possibly not compile) on platforms without exception and rounding mode
support, so making it easy to break the build for such platforms
accidentally.
I think it would be most natural to move the inlines / macros for fe*
and __fe* in the case of no exceptions and rounding modes into
include/fenv.h, so that all code including fenv.h with _ISOMAC not
defined automatically gets them. Then fenv_private.h would be purely
the header for the libc_fe*, SET_RESTORE_ROUND etc. internal
interfaces and the risk of breaking the build on other platforms than
the one you tested on because of a missing fenv_private.h include
would be much reduced (and there would be some unused fenv_private.h
includes to remove along with unused math_private.h includes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math_private.h: Do not include <fenv_private.h>.
* math/fromfp.h: Include <fenv_private.h>.
* math/math-narrow.h: Likewise.
* math/s_cexp_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/w_acos_compat.c: Likewise.
* math/w_acosf_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_ilogb_template.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_llogb_template.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* sysdeps/aarch64/fpu/feholdexcpt.c: Likewise.
* sysdeps/aarch64/fpu/fesetround.c: Likewise.
* sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise.
* sysdeps/aarch64/fpu/ftestexcept.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_remainder.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/gamma_product.c: Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/x2y2m1.c: Likewise.
* sysdeps/ieee754/float128/float128_private.h: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-96/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrtf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvd/s_finite.c: Likewise.
* sysdeps/riscv/rvd/s_fmax.c: Likewise.
* sysdeps/riscv/rvd/s_fmin.c: Likewise.
* sysdeps/riscv/rvd/s_fpclassify.c: Likewise.
* sysdeps/riscv/rvd/s_isinf.c: Likewise.
* sysdeps/riscv/rvd/s_isnan.c: Likewise.
* sysdeps/riscv/rvd/s_issignaling.c: Likewise.
* sysdeps/riscv/rvf/fegetround.c: Likewise.
* sysdeps/riscv/rvf/feholdexcpt.c: Likewise.
* sysdeps/riscv/rvf/fesetenv.c: Likewise.
* sysdeps/riscv/rvf/fesetround.c: Likewise.
* sysdeps/riscv/rvf/feupdateenv.c: Likewise.
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise.
* sysdeps/riscv/rvf/ftestexcept.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/riscv/rvf/s_finitef.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/riscv/rvf/s_fmaxf.c: Likewise.
* sysdeps/riscv/rvf/s_fminf.c: Likewise.
* sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise.
* sysdeps/riscv/rvf/s_isinff.c: Likewise.
* sysdeps/riscv/rvf/s_isnanf.c: Likewise.
* sysdeps/riscv/rvf/s_issignalingf.c: Likewise.
* sysdeps/riscv/rvf/s_nearbyintf.c: Likewise.
* sysdeps/riscv/rvf/s_roundevenf.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
On some architectures, the parts of math_private.h relating to the
floating-point environment are in a separate file fenv_private.h
included from math_private.h. As this is purely an
architecture-specific convention used by several architectures,
however, all such architectures still need their own math_private.h,
even if it has nothing to do beyond #include <fenv_private.h> and
peculiarity of including the i386 file directly instead of having a
shared file in sysdeps/x86.
This patch makes the fenv_private.h name an architecture-independent
convention in glibc. The include of fenv_private.h from
math_private.h becomes architecture-independent (until callers are
updated to include fenv_private.h directly so the include from
math_private.h is no longer needed). Some architecture math_private.h
headers are removed if no longer needed, or renamed to fenv_private.h
if all they define belongs in that header; architecture fenv_private.h
headers now do require #include_next <fenv_private.h>. The i386
fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is
actually shared with x86_64. The generic math_private.h gets a new
include of <stdbool.h>, as needed for bool in some prototypes in that
header (previously that was indirectly included via include/fenv.h,
which now only gets included too late in math_private.h, after those
prototypes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/aarch64/fpu/fenv_private.h: New file. Based on ....
* sysdeps/aarch64/fpu/math_private.h: ... this file. All contents
moved to fenv_private.h except for ...
(TOINT_INTRINSICS): Kept in math_private.h.
(roundtoint): Likewise.
(converttoint): Likewise.
* sysdeps/arm/fenv_private.h: Change multiple-include guard to
[ARM_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/arm/math_private.h: Remove.
* sysdeps/generic/fenv_private.h: New file. Contents moved from
....
* sysdeps/generic/math_private.h: ... this file. Include
<stdbool.h>. Do not include <fenv.h> or <get-rounding-mode.h>.
Include <fenv_private.h>. Remove functions and macros moved to
fenv_private.h.
* sysdeps/i386/fpu/math_private.h: Remove.
* sysdeps/mips/math_private.h: Move to ....
* sysdeps/mips/fpu/fenv_private.h: ... here. Change
multiple-include guard to [MIPS_FENV_PRIVATE_H]. Remove
[__mips_hard_float] conditional. Include next <fenv_private.h>.
* sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include
guard to [POWERPC_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/powerpc/fpu/math_private.h: Do not include
<fenv_private.h>.
* sysdeps/riscv/rvf/math_private.h: Move to ....
* sysdeps/riscv/rvf/fenv_private.h: ... here. Change
multiple-include guard to [RISCV_FENV_PRIVATE_H]. Include next
<fenv_private.h>.
* sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard
to [SPARC_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/sparc/fpu/math_private.h: Remove.
* sysdeps/i386/fpu/fenv_private.h: Move to ....
* sysdeps/x86/fpu/fenv_private.h: ... here. Change
multiple-include guard to [X86_FENV_PRIVATE_H]. Include next
<fenv_private.h>.
* sysdeps/x86_64/fpu/math_private.h: Do not include
<sysdeps/i386/fpu/fenv_private.h>.
Completing the move of macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the EXCEPTION_SET_FORCES_TRAP macro out to its own
math-tests-trap-force.h header.
Tested with build-many-glibcs.py.
* sysdeps/generic/math-tests-trap-force.h: New file.
* sysdeps/generic/math-tests.h: Include <math-tests-trap-force.h>.
(EXCEPTION_SET_FORCES_TRAP): Do not define here.
* sysdeps/powerpc/math-tests.h: Remove file.
* sysdeps/powerpc/fpu/math-tests-trap-force.h: New file.
New generic optimization of sinf and cosf introduced by commit
599cf39766 shows improvement
compared to powerpc specific assembly version. Hence removing
the powerpc assembly versions to make use of generic code.
This patch moves little endian specific POWER9 optimization files to
sysdeps/powerpc/powerpc64/le and creates POWER9 ifunc functions
only for little endian.
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to. Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses. Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.
* NEWS: Mention the change.
* elf/dl-tunables.list: Rename tune namespace to cpu.
* sysdeps/powerpc/dl-tunables.list: Likewise.
* sysdeps/x86/dl-tunables.list: Likewise.
* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
cpu.name.
* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
* manual/README.tunables: Likewise.
* manual/tunables.texi: Likewise.
* sysdeps/powerpc/cpu-features.c: Likewise.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
(init_cpu_features): Likewise.
* sysdeps/x86/cpu-features.c: Likewise.
* sysdeps/x86/cpu-features.h: Likewise.
* sysdeps/x86/cpu-tunables.c: Likewise.
* sysdeps/x86_64/Makefile: Likewise.
* sysdeps/x86/dl-cet.c: Likewise.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The math-tests.h header has many different macros and groups of
macros, defined using #ifndef in the generic version which is included
by architecture versions with #include_next after possibly defining
non-default versions of some of those macros.
This use of #ifndef is contrary to our normal typo-proof conventions
for macro definitions. This patch moves one of the macros,
SNAN_TESTS_TYPE_CAST, out to its own sysdeps header, to follow those
typo-proof conventions more closely.
Tested with build-many-glibcs.py.
2018-08-01 Joseph Myers <joseph@codesourcery.com>
* sysdeps/generic/math-tests-snan-cast.h: New file.
* sysdeps/powerpc/math-tests-snan-cast.h: Likewise.
* sysdeps/generic/math-tests.h: Include <math-tests-snan-cast.h>.
(SNAN_TESTS_TYPE_CAST): Do not define macro here.
* sysdeps/powerpc/math-tests.h (SNAN_TESTS_TYPE_CAST): Likewise.
This patch changes longjmp to always restore the TOC pointer (r2 register)
to the caller frame on powerpc64 and powerpc64le. This is related to bug
21895 that reports a situation where you have a static longjmp to a
shared object file.
[BZ #21895]
* sysdeps/powerpc/powerpc64/__longjmp-common.S: Remove condition code for
restoring r2 in longjmp.
* sysdeps/powerpc/powerpc64/Makefile: Added tst-setjmp-bug21895-static to
test list.
Added rules to build test tst-setjmp-bug21895-static.
Added module setjmp-bug21895 and rules to build a shared object from it.
* sysdeps/powerpc/powerpc64/setjmp-bug21895.c: New test file.
* sysdeps/powerpc/powerpc64/tst-setjmp-bug21895-static.c: New test file.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
My recent nan-sign tests fail to build for powerpc64le with GCC 8
because of the special compile / link options needed there for any
test using _Float128. This patch arranges for these tests to be
handled on powerpc64le similarly to other such tests.
Tested with build-many-glibcs.py for powerpc64le.
[BZ #23303]
* sysdeps/powerpc/powerpc64/le/Makefile
(CFLAGS-tst-strtod-nan-sign.c): Add -mfloat128.
(CFLAGS-tst-wcstod-nan-sign.c): Likewise.
(gnulib-tests): Also add $(f128-loader-link) for
tst-strtod-nan-sign abd tst-wcstod-nan-sign.
_init and _fini are special functions provided by glibc for linker to
define DT_INIT and DT_FINI in executable and shared library. They
should never be put in dynamic symbol table. This patch marks them as
hidden to remove them from dynamic symbol table.
Tested with build-many-glibcs.py.
[BZ #23145]
* elf/Makefile (tests-special): Add $(objpfx)check-initfini.out.
($(all-built-dso:=.dynsym): New target.
(common-generated): Add $(all-built-dso:$(common-objpfx)%=%.dynsym).
($(objpfx)check-initfini.out): New target.
(generated): Add check-initfini.out.
* scripts/check-initfini.awk: New file.
* sysdeps/aarch64/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/alpha/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/arm/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/hppa/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/i386/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/ia64/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/m68k/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/microblaze/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/mips/mips32/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/mips/mips64/n32/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/mips/mips64/n64/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/nios2/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/powerpc/powerpc32/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/powerpc/powerpc64/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/s390/s390-32/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/s390/s390-64/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/sh/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/sparc/crti.S (_init): Mark as hidden.
(_fini): Likewise.
* sysdeps/x86_64/crti.S (_init): Mark as hidden.
(_fini): Likewise.
When building with -mlong-double-128 or -mabi=ibmlongdouble, TFtype
represents the IBM 128-bit extended floating point type, while KFtype
represents the IEEE 128-bit floating point type.
The soft float implementation of e_sqrtf128 had to redefine TFtype and
TF in order to workaround this issue. However, this behavior changes
when -mabi=ieeelongdouble is used and the macros are not necessary.
* sysdeps/powerpc/powerpc64/le/fpu/e_sqrtf128.c
[__HAVE_FLOAT128_UNLIKE_LDBL] (TFtype, TF): Restrict TFtype
and TF redirection to KFtype and KF only when the default
long double type is not the IEEE 128-bit floating point type.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
powerpc-nofpu libc exports __sqrtsf2 and __sqrtdf2 symbols. The
export of these soft-fp symbols is a mistake; they aren't part of the
libgcc interface and GCC will never generate code that calls them.
This patch makes them into compat symbols (no code built for static
libc), moving their sources from the generic soft-fp sources to
sysdeps/powerpc/nofpu (the underlying soft-fp FP_SQRT functionality
remains of use to implement actual sqrt public interfaces, such as
sqrtl / sqrtf128 for which it is used on various platforms, but
__sqrt[sdt]f2 are not such interfaces).
Tested with build-many-glibcs.py for relevant platforms.
[BZ #18473]
* soft-fp/sqrttf2.c: Remove file.
* soft-fp/sqrtdf2.c: Move to ....
* sysdeps/powerpc/nofpu/sqrtdf2.c: ... here. Include
<shlib-compat.h>.
(__sqrtdf2): Make conditional on
[SHLIB_COMPAT (libc, GLIBC_2_3_2, GLIBC_2_28)]. Define as compat
symbol.
* soft-fp/sqrtsf2.c: Move to ....
* sysdeps/powerpc/nofpu/sqrtsf2.c: ... here. Include
<shlib-compat.h>.
(__sqrtsf2): Make conditional on
[SHLIB_COMPAT (libc, GLIBC_2_3_2, GLIBC_2_28)]. Define as compat
symbol.
* soft-fp/Makefile (gcc-single-routines): Remove sqrtsf2.
(gcc-double-routines): Remove sqrtdf2.
(gcc-quad-routines): Remove sqrttf2.
* sysdeps/nios2/Makefile [$(subdir) = soft-fp] (sysdep_routines):
Do not filter out sqrtsf2 and sqrtdf2.
* sysdeps/powerpc/nofpu/Makefile [$(subdir) = soft-fp]
(sysdep_routines): Add sqrtsf2 and sqrtdf2.
This patch creates ifunc for sqrtf128() to make use of new xssqrtqp
instruction for POWER9 when --enable-multi-arch and --with-cpu=power8
options are used on power9 system. This is achieved by explicitly
adding -mcpu=power9 flag for sqrtf128-power9.
Currently, powerpc, powerpc64, and powerpc64le imply the same set of
subdirectories from sysdeps/ieee754: flt-32, dbl-64, ldbl-128ibm, and
ldbl-opt. In preparation for the transition of the long double format -
from IBM Extended Precision to IEEE 754 128-bits floating-point - on
powerpc64le, this patch splits the shared Implies file into three
separate files (one for each of the powerpc architectures), without
changing their contents. Future patches will modify powerpc64le.
* sysdeps/powerpc/Implies: Removed. Previous contents copied to...
* sysdeps/powerpc/powerpc32/Implies-after: ... here.
* sysdeps/powerpc/powerpc64/be/Implies-after: ... here.
* sysdeps/powerpc/powerpc64/le/Implies-before: ... and here.
As per <https://sourceware.org/ml/libc-alpha/2014-10/msg00369.html>,
there should not be separate sysdeps/<arch>/soft-fp directories when
those are used by all configurations that use sysdeps/<arch>, and,
more generally, should not be sysdeps/foo/Implies files pointing to a
subdirectory foo/bar.
sysdeps/powerpc/soft-fp isn't quite such a case, as the Implies files
pointing to it are
sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/Implies and
sysdeps/unix/sysv/linux/powerpc/powerpc32/e500/nofpu/Implies (and
indeed there is a different sfp-machine.h used for powerpc64le).
However, the same principle applies: there is no need for this
directory because sfp-machine.h, the only file in it, can most
naturally go in sysdeps/powerpc/nofpu, which is used by exactly the
same configurations (and there is a close dependence between the files
there and the sfp-machine.h implementation). This patch eliminates
the sysdeps/powerpc/soft-fp directory accordingly.
Tested with build-many-glibcs.py that installed stripped shared
libraries for powerpc configurations are unchanged by this patch.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/Implies: Remove
powerpc/soft-fp.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/e500/nofpu/Implies:
Likewise.
* sysdeps/powerpc/soft-fp/sfp-machine.h: Move to ....
* sysdeps/powerpc/nofpu/sfp-machine.h: ... here.
When compiling C++ code with -mabi=ieeelongdouble, KCtype is
unavailable and the long double type should be used instead.
This is also providing macro __HAVE_FLOAT128_UNLIKE_LDBL in order to
identify the kind of long double type is being used in the current
compilation unit.
Notice that bits/floatn.h cannot benefit from the new macro due to order
of header inclusion.
* bits/floatn-common.h: Define __HAVE_FLOAT128_UNLIKE_LDBL.
* math/math.h: Restrict the prototype definition for the functions
issignaling(_Float128) and iszero(_Float128); and template
__iseqsig_type<_Float128>, from __HAVE_DISTINCT_FLOAT128 to
__HAVE_FLOAT128_UNLIKE_LDBL.
* sysdeps/powerpc/bits/floatn.h [__HAVE_FLOAT128
&& (!__GNUC_PREREQ (7, 0) || defined __cplusplus)
&& __LDBL_MANT_DIG__ == 113]: Use long double suffix for
__f128() constants; define the type _Float128 as long double;
and reuse long double in __CFLOAT128.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This patch continues cleaning up math_private.h by moving the
math_opt_barrier and math_force_eval macros to a separate header
math-barriers.h.
At present, those macros are inside a "#ifndef math_opt_barrier" in
math_private.h to allow architectures to override them and then use
a separate math-barriers.h header, no such #ifndef or #include_next is
needed; architectures just have their own alternative version of
math-barriers.h when providing their own optimized versions that avoid
going through memory unnecessarily. The generic math-barriers.h has a
comment added to document these two macros.
In this patch, math_private.h is made to #include <math-barriers.h>,
so files using these macros do not need updating yet. That is because
of uses of math_force_eval in math_check_force_underflow and
math_check_force_underflow_nonneg, which are still defined in
math_private.h. Once those are moved out to a separate header, that
separate header can be made to include <math-barriers.h>, as can the
other files directly using these barrier macros, and then the include
of <math-barriers.h> from math_private.h can be removed.
Tested for x86_64 and x86. Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math-barriers.h: New file.
* sysdeps/generic/math_private.h [!math_opt_barrier]
(math_opt_barrier): Move to math-barriers.h.
[!math_opt_barrier] (math_force_eval): Likewise.
* sysdeps/aarch64/fpu/math-barriers.h: New file.
* sysdeps/aarch64/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/alpha/fpu/math-barriers.h: New file.
* sysdeps/alpha/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/x86/fpu/math-barriers.h: New file.
* sysdeps/i386/fpu/fenv_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/m68k/m680x0/fpu/math_private.h: Move to....
* sysdeps/m68k/m680x0/fpu/math-barriers.h: ... here. Adjust
multiple-include guard for rename.
* sysdeps/powerpc/fpu/math-barriers.h: New file.
* sysdeps/powerpc/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
The creation of the divergent sysdeps directory for powerpc64le
commit 2f7f3cd8cd
Author: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
Date: Fri Jul 15 18:04:40 2016 -0500
powerpc64le: Create divergent sysdep directory for powerpc64le.
allowed float128 to be enabled for powerpc64le (little-endian) and not
for powerpc64 (big-endian). Since the only intended difference between
them was the presence or absence of the float128 interface, the sysdeps
directory for powerpc64le explicitly reused the files from powerpc64
(through the use of Implies files).
Although this works, it also means that files under the powerpc64
directory might be preferred over files under powerpc64le. For
instance, on a build for powerpc64le with target set to power9, a file
from powerpc64/power5 might get built, even though a file with the same
name exists in powerpc64le/power8. That happens because the processor
hierarchy was only defined in the sysdeps directory for powerpc64 (and
borrowed by powerpc64le).
This patch fixes this behavior, by creating new subdirectories under
powerpc64 (i.e.: powerpc64/be and powerpc64/le) and creating new Implies
files to provide the hierarchy of processors for powerpc64 and
powerpc64le separately. These changes have no effect on installed,
stripped binaries (which remain unchanged).
Tested that installed stripped binaries are unchanged and that there are
no regressions on powerpc64 and powerpc64le.
* sysdeps/powerpc/fpu/libm-test-ulps: Increase double-precision
sin, cos and sincos to 1 ULP.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Wrap symbol address run-time calculation into a macro and use it
throughout, replacing inline calculations.
There are a couple of variants, most of them different in a functionally
insignificant way. Most calculations are right following RESOLVE_MAP,
at which point either the map or the symbol returned can be checked for
validity as the macro sets either both or neither. In some places both
the symbol and the map has to be checked however.
My initial implementation therefore always checked both, however that
resulted in code larger by as much as 0.3%, as many places know from
elsewhere that no check is needed. I have decided the size growth was
unacceptable.
Having looked closer I realized that it's the map that is the culprit.
Therefore I have modified LOOKUP_VALUE_ADDRESS to accept an additional
boolean argument telling it to access the map without checking it for
validity. This in turn has brought quite nice results, with new code
actually being smaller for i686, and MIPS o32, n32 and little-endian n64
targets, unchanged in size for x86-64 and, unusually, marginally larger
for big-endian MIPS n64, as follows:
i686:
text data bss dec hex filename
152255 4052 192 156499 26353 ld-2.27.9000-base.so
152159 4052 192 156403 262f3 ld-2.27.9000-elf-symbol-value.so
MIPS/o32/el:
text data bss dec hex filename
142906 4396 260 147562 2406a ld-2.27.9000-base.so
142890 4396 260 147546 2405a ld-2.27.9000-elf-symbol-value.so
MIPS/n32/el:
text data bss dec hex filename
142267 4404 260 146931 23df3 ld-2.27.9000-base.so
142171 4404 260 146835 23d93 ld-2.27.9000-elf-symbol-value.so
MIPS/n64/el:
text data bss dec hex filename
149835 7376 408 157619 267b3 ld-2.27.9000-base.so
149787 7376 408 157571 26783 ld-2.27.9000-elf-symbol-value.so
MIPS/o32/eb:
text data bss dec hex filename
142870 4396 260 147526 24046 ld-2.27.9000-base.so
142854 4396 260 147510 24036 ld-2.27.9000-elf-symbol-value.so
MIPS/n32/eb:
text data bss dec hex filename
142019 4404 260 146683 23cfb ld-2.27.9000-base.so
141923 4404 260 146587 23c9b ld-2.27.9000-elf-symbol-value.so
MIPS/n64/eb:
text data bss dec hex filename
149763 7376 408 157547 2676b ld-2.27.9000-base.so
149779 7376 408 157563 2677b ld-2.27.9000-elf-symbol-value.so
x86-64:
text data bss dec hex filename
148462 6452 400 155314 25eb2 ld-2.27.9000-base.so
148462 6452 400 155314 25eb2 ld-2.27.9000-elf-symbol-value.so
[BZ #19818]
* sysdeps/generic/ldsodefs.h (LOOKUP_VALUE_ADDRESS): Add `set'
parameter.
(SYMBOL_ADDRESS): New macro.
[!ELF_FUNCTION_PTR_IS_SPECIAL] (DL_SYMBOL_ADDRESS): Use
SYMBOL_ADDRESS for symbol address calculation.
* elf/dl-runtime.c (_dl_fixup): Likewise.
(_dl_profile_fixup): Likewise.
* elf/dl-symaddr.c (_dl_symbol_address): Likewise.
* elf/rtld.c (dl_main): Likewise.
* sysdeps/aarch64/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/alpha/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/arm/dl-machine.h (elf_machine_rel): Likewise.
(elf_machine_rela): Likewise.
* sysdeps/hppa/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/hppa/dl-symaddr.c (_dl_symbol_address): Likewise.
* sysdeps/i386/dl-machine.h (elf_machine_rel): Likewise.
(elf_machine_rela): Likewise.
* sysdeps/ia64/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/m68k/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/microblaze/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/mips/dl-machine.h (ELF_MACHINE_BEFORE_RTLD_RELOC):
Likewise.
(elf_machine_reloc): Likewise.
(elf_machine_got_rel): Likewise.
* sysdeps/mips/dl-trampoline.c (__dl_runtime_resolve): Likewise.
* sysdeps/nios2/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela):
Likewise.
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela):
Likewise.
* sysdeps/riscv/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/s390/s390-32/dl-machine.h (elf_machine_rela):
Likewise.
* sysdeps/s390/s390-64/dl-machine.h (elf_machine_rela):
Likewise.
* sysdeps/sh/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_rela):
Likewise.
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela):
Likewise.
* sysdeps/tile/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The powerpc and sparc bits/mathinline.h include inlines of fdim and
fdimf. These are not restricted to -fno-math-errno, but do not set
errno, and wrongly use ordered <= comparisons instead of the required
islessequal comparisons (this latter issue is latent on powerpc
because GCC wrongly uses unordered comparison instructions for
operations that should use ordered comparison instructions).
Since we wish to avoid such header inlines anyway, leaving it to the
compiler to inline such standard functions under appropriate
conditions, this patch fixes those issues by removing the inlines in
question (and thus removing the sparc bits/mathinline.h header which
had no other inlines left in it). I've filed
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85003> for adding
correct fdim inlines to GCC, since the function is simple enough that
a correct inline is a perfectly reasonable architecture-independent
optimization with -fno-math-errno and in the absence of implicit
excess precision.
Tested with build-many-glibcs.py for all its powerpc and sparc
configurations.
[BZ #22987]
* sysdeps/powerpc/bits/mathinline.h (fdim): Remove inline
function.
(fdimf): Likewise.
* sysdeps/sparc/fpu/bits/mathinline.h: Remove file.
Remove the now unused target specific__ieee754_sqrt(f/l) inlines.
Also remove inlines of sqrt which are for really old GCC versions.
Removing these is desirable, under the general principle of leaving
such inlining to the compiler rather than trying to do it in installed
headers, especially when only very old compilers are affected.
Note that removing inlines for __ieee754_sqrt disables inlining in the
sqrt wrapper functions. Given the sqrt function will typically only be
called for negative arguments, it doesn't matter whether the inlining
happens or not.
* sysdeps/aarch64/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/generic/math-type-macros.h (M_SQRT): Use sqrt.
* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
* sysdeps/powerpc/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/s390/fpu/bits/mathinline.h: Remove file.
* sysdeps/sparc/fpu/bits/mathinline.h (sqrt) Remove.
(sqrtf): Remove.
(sqrtl): Remove.
(__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
(__ieee754_sqrtl): Remove.
* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
* sysdeps/x86/fpu/math_private.h (__ieee754_sqrt): Remove.
* sysdeps/x86_64/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
(__ieee754_sqrtl): Remove.
The sysdeps/ieee754/ldbl-opt version of math_ldbl_opt.h includes
math.h and math_private.h, despite not having any need for those
headers itself; the sysdeps/generic version doesn't. About 20 files
are relying on math_ldbl_opt.h to include math.h and/or math_private.h
for them, even though none of them necessarily used on a platform that
needs ldbl-opt support.
* sysdeps/ieee754/ldbl-opt/math_ldbl_opt.h: Don't include
math.h or math_private.h.
* sysdeps/alpha/fpu/s_isnan.c
* sysdeps/ieee754/ldbl-128ibm/s_ceill.c
* sysdeps/ieee754/ldbl-128ibm/s_floorl.c
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c
* sysdeps/ieee754/ldbl-128ibm/s_roundl.c
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c:
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf.c
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot.c
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf.c:
Include math_private.h.
* sysdeps/ieee754/ldbl-64-128/s_finitel.c
* sysdeps/ieee754/ldbl-64-128/s_fpclassifyl.c
* sysdeps/ieee754/ldbl-64-128/s_isinfl.c
* sysdeps/ieee754/ldbl-64-128/s_isnanl.c
* sysdeps/ieee754/ldbl-64-128/s_signbitl.c
* sysdeps/powerpc/power7/fpu/s_logb.c:
Include math.h and math_private.h.
Some SPE opcodes clashes with some recent PowerISA opcodes and
until recently gas did not complain about it. However binutils
recently changed it and now VLE configured gas does not support to
assembler some instruction that might class with VLE (HTM for
instance). It also does not help that glibc build hardware lock
elision support as default (regardless of assembler support).
Although runtime will not actually enables TLE on SPE hardware
(since kernel will not advertise it), I see little advantage on
adding HTM support on SPE built glibc. SPE uses an incompatible
ABI which does not allow share the same build with default
powerpc and HTM code slows down SPE without any benefict.
This patch fixes it by only building HTM when SPE configuration
is not used.
Checked with a powerpc-linux-gnuspe build. I also did some sniff
tests on a e500 hardware without any issue.
[BZ #22926]
* sysdeps/powerpc/powerpc32/sysdep.h (ABORT_TRANSACTION_IMPL): Define
empty for __SPE__.
* sysdeps/powerpc/sysdep.h (ABORT_TRANSACTION): Likewise.
* sysdeps/unix/sysv/linux/powerpc/elision-lock.c (__lll_lock_elision):
Do not build hardware transactional code for __SPE__.
* sysdeps/unix/sysv/linux/powerpc/elision-trylock.c
(__lll_trylock_elision): Likewise.
* sysdeps/unix/sysv/linux/powerpc/elision-unlock.c
(__lll_unlock_elision): Likewise.
Compiling the testsuite for powerpc (multi-arch configurations) with
-Os with GCC 7 fails with:
In file included from ifuncmod1.c:7:0,
from ifuncdep1.c:3:
../sysdeps/powerpc/ifunc-sel.h: In function 'ifunc_sel':
../sysdeps/powerpc/ifunc-sel.h:12:3: error: asm operand 2 probably doesn't match constraints [-Werror]
__asm__ ("mflr 12\n\t"
^~~~~~~
../sysdeps/powerpc/ifunc-sel.h:12:3: error: asm operand 3 probably doesn't match constraints [-Werror]
../sysdeps/powerpc/ifunc-sel.h:12:3: error: asm operand 4 probably doesn't match constraints [-Werror]
../sysdeps/powerpc/ifunc-sel.h:12:3: error: impossible constraint in 'asm'
The "i" constraints on function pointers require the function call to
be inlined so the compiler can see the constant function pointer
arguments passed to the asm. This patch marks the relevant functions
as always_inline accordingly.
Tested that this fixes the -Os testsuite build for
powerpc-linux-gnu-power4, powerpc64-linux-gnu, powerpc64le-linux-gnu
with build-many-glibcs.py.
* sysdeps/powerpc/ifunc-sel.h (ifunc_sel): Make always_inline.
(ifunc_one): Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps (pow): Increase double and
idouble to 1 ULP.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Remove the slow paths from pow. Like several other double precision math
functions, pow is exactly rounded. This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.
All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP. A simple test over a few hundred million
values shows pow is 10% faster on average. This fixes BZ #13932.
[BZ #13932]
* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
* benchtests/pow-inputs: Update comment for slow path cases.
* manual/probes.texi (slowpow_p10): Delete removed probe.
(slowpow_p10): Likewise.
* math/Makefile: Remove halfulp.c and slowpow.c.
* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/generic/math_private.h (__exp1): Remove error argument.
(__halfulp): Remove.
(__slowpow): Remove.
* sysdeps/i386/fpu/halfulp.c: Delete file.
* sysdeps/i386/fpu/slowpow.c: Likewise.
* sysdeps/ia64/fpu/halfulp.c: Likewise.
* sysdeps/ia64/fpu/slowpow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
improve comments and add error analysis.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
(power1): Remove function:
(log1): Remove error argument, add error analysis.
(my_log2): Remove function.
* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
This patch adds the narrowing add functions from TS 18661-1 to glibc's
libm: fadd, faddl, daddl, f32addf64, f32addf32x, f32xaddf64 for all
configurations; f32addf64x, f32addf128, f64addf64x, f64addf128,
f32xaddf64x, f32xaddf128, f64xaddf128 for configurations with
_Float64x and _Float128; __nldbl_daddl for ldbl-opt. As discussed for
the build infrastructure patch, tgmath.h support is deliberately
deferred, and FP_FAST_* macros are not applicable without optimized
function implementations.
Function implementations are added for all relevant pairs of formats
(including certain cases of a format and itself where more than one
type has that format). The main implementations use round-to-odd, or
a trivial computation in the case where both formats are the same or
where the wider format is IBM long double (in which case we don't
attempt to be correctly rounding). The sysdeps/ieee754/soft-fp
implementations use soft-fp, and are used automatically for
configurations without exceptions and rounding modes by virtue of
existing Implies files. As previously discussed, optimized versions
for particular architectures are possible, but not included.
i386 gets a special version of f32xaddf64 to avoid problems with
double rounding (similar to the existing fdim version), since this
function must round just once without an intermediate rounding to long
double. (No such special version is needed for any other function,
because the nontrivial functions use round-to-odd, which does the
intermediate computation with the rounding mode set to round-to-zero,
and double rounding is OK except in round-to-nearest mode, so is OK
for that intermediate round-to-zero computation.) mul and div will
need slightly different special versions for i386 (using round-to-odd
on long double instead of precision control) because of the
possibility of inexact intermediate results in the subnormal range for
double.
To reduce duplication among the different function implementations,
math-narrow.h gets macros CHECK_NARROW_ADD, NARROW_ADD_ROUND_TO_ODD
and NARROW_ADD_TRIVIAL.
In the trivial cases and for any architecture-specific optimized
implementations, the overhead of the errno setting might be
significant, but I think that's best handled through compiler built-in
functions rather than providing separate no-errno versions in glibc
(and likewise there are no __*_finite entry points for these function
provided, __*_finite effectively being no-errno versions at present in
most cases).
Tested for x86_64 and x86, with both GCC 6 and GCC 7. Tested for
mips64 (all three ABIs, both hard and soft float) and powerpc with GCC
7. Tested with build-many-glibcs.py with both GCC 6 and GCC 7.
* math/Makefile (libm-narrow-fns): Add add.
(libm-test-funcs-narrow): Likewise.
* math/Versions (GLIBC_2.28): Add narrowing add functions.
* math/bits/mathcalls-narrow.h (add): Use __MATHCALL_NARROW .
* math/gen-auto-libm-tests.c (test_functions): Add add.
* math/math-narrow.h (CHECK_NARROW_ADD): New macro.
(NARROW_ADD_ROUND_TO_ODD): Likewise.
(NARROW_ADD_TRIVIAL): Likewise.
* sysdeps/ieee754/float128/float128_private.h (__faddl): New
macro.
(__daddl): Likewise.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Add fadd and
dadd.
(CFLAGS-nldbl-dadd.c): New variable.
(CFLAGS-nldbl-fadd.c): Likewise.
* sysdeps/ieee754/ldbl-opt/Versions (GLIBC_2.28): Add
__nldbl_daddl.
* sysdeps/ieee754/ldbl-opt/nldbl-compat.h (__nldbl_daddl): New
prototype.
* manual/arith.texi (Misc FP Arithmetic): Document fadd, faddl,
daddl, fMaddfN, fMaddfNx, fMxaddfN and fMxaddfNx.
* math/auto-libm-test-in: Add tests of add.
* math/auto-libm-test-out-narrow-add: New generated file.
* math/libm-test-narrow-add.inc: New file.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_f32xaddf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fadd.c: Likewise.
* sysdeps/ieee754/float128/s_f32addf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64addf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64xaddf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_f64xaddf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-dadd.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-fadd.c: Likewise.
* sysdeps/ieee754/soft-fp/s_daddl.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fadd.c: Likewise.
* sysdeps/ieee754/soft-fp/s_faddl.c: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/mach/hurd/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
Testing narrowing functions with build-many-glibcs.py showed up a
further testsuite fix needed to enable building such functions for
powerpc64le: tests test-<narrower-type>-float128-<function> (and
likewise for float64x) needed the same special handling for
powerpc64le as test-float128-* and test-float64x-*. This patch adds
that special handling.
Tested with build-many-glibcs.py for powerpc64le in conjunction with
the main patch adding narrowing add functions.
* sysdeps/powerpc/powerpc64le/Makefile [$(subdir) = math]
(f128-pairs): New variable.
[$(subdir) = math] ($(foreach suf,$(all-object-suffixes),$(foreach
pair,$(f128-pairs),$(objpfx)test-$(pair)%$(suf)))): Add -mfloat128
to CFLAGS.
[$(subdir) = math] ($(foreach pair,$(f128-pairs),test-$(pair)%)):
Also make tests add $(f128-loader-link) to gnulib-tests.
Bug 19668 reports an unchecked malloc call in the test
sysdeps/powerpc/fpu/tst-setcontext-fpscr.c. This patch makes that
test use xmalloc. It does not otherwise move this test to the
support/ infrastructure or support/test-driver.c; the test has various
uses of exit and _exit on error cases, and uses atexit, and while I
think those things would all still work in the context of
test-driver.c, it's not an immediately obvious conversion the way it
would be for many tests that don't use test-driver.c.
Tested for powerpc.
[BZ #19668]
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c: Include
<support/support.h>. Do not include <malloc.h>.
(query_auxv): Use xmalloc instead of malloc.
The tunables framework needs to execute syscall early in process
initialization, before the TCB is available for consumption. This
behavior conflicts with powerpc{|64|64le}'s lock elision code, that
checks the TCB before trying to abort transactions immediately before
executing a syscall.
This patch adds a powerpc-specific implementation of __access_noerrno
that does not abort transactions before the executing syscall.
Tested on powerpc{|64|64le}.
[BZ #22685]
* sysdeps/powerpc/powerpc32/sysdep.h (ABORT_TRANSACTION_IMPL): Renamed
from ABORT_TRANSACTION.
(ABORT_TRANSACTION): Redirect to ABORT_TRANSACTION_IMPL.
* sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION,
ABORT_TRANSACTION_IMPL): Likewise.
* sysdeps/unix/sysv/linux/powerpc/not-errno.h: New file. Reuse
Linux code, but remove the code that aborts transactions.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
This issue is similar to BZ #19235, where spurious exceptions are
created from adding 0.5 then converting to an integer.
The solution is based on Joseph's fix for BZ #19235.
[BZ #22697]
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S (__llround):
Do not add 0.5 to integer or out-of-range arguments.
For soft-float powerpc, fmaxmagl and fminmagl generate spurious
"invalid" exceptions for quiet NaN arguments. This is another case of
the problems with fabsl inline expansion via comparisons, and so is
fixed by building those functions with -fno-builtin-fabsl.
Tested for powerpc (soft-float).
[BZ #22691]
* sysdeps/powerpc/nofpu/Makefile [$(subdir) = math]
(CFLAGS-s_fmaxmagl.c): New variable.
[$(subdir) = math] (CFLAGS-s_fminmagl.c: Likewise.
For soft-float powerpc, the remainderl function produces zero results
with the wrong sign for various inputs. This is another instance of
the problem with incorrect built-in fabsl expansion, so is fixed by
this patch using -fno-builtin-fabsl for this function.
Tested for powerpc (soft-float).
[BZ #22688]
* sysdeps/powerpc/nofpu/Makefile [$(subdir) = math]
(CFLAGS-e_remainderl.c): New variable.
For soft-float powerpc, various _Complex long double functions
generate spurious "invalid" exceptions, even with a compiler with GCC
bug 64811 fixed.
The problem is GCC's built-in fabsl expansion. Various files are
already built with -fno-builtin-fabsl because in this case (IBM long
double, for soft-float or e500v1) a fallback fabsl expansion based on
comparisons is used, which can produce the wrong sign of a zero
result. Those comparisons can also produce spurious exceptions for
NaN arguments. Furthermore, __builtin_fpclassify implemently uses
__builtin_fabsl, and is unaffected by -fno-builtin-fabsl, and the
fpclassify macro uses __builtin_fpclassify in the absence of
-fsignaling-nans. Thus, this patch arranges for the problem files
using fpclassify to be built with -fsignaling-nans in this case, to
avoid spurious exceptions from fpclassify.
Tested for powerpc (soft-float).
[BZ #22687]
* sysdeps/powerpc/nofpu/Makefile (CFLAGS-s_cacosl.c): New
variable.
(CFLAGS-s_cacoshl.c): Likewise.
(CFLAGS-s_casinhl.c): Likewise.
(CFLAGS-s_catanl.c): Likewise.
(CFLAGS-s_catanhl.c): Likewise.
(CFLAGS-s_cexpl.c): Likewise.
(CFLAGS-s_ccoshl.c): Add -fsignaling-nans.
(CFLAGS-s_csinhl.c): Likewise.
(CFLAGS-s_clogl.c): Likewise.
(CFLAGS-s_clog10l.c): Likewise.
(CFLAGS-s_csinl.c): Likewise.
(CFLAGS-s_csqrtl.c): Likewise.
The function _itoa_word() writes characters from the higher address to
the lower address, requiring the destination string to reserve that size
before calling it.
* sysdeps/powerpc/powerpc64/dl-machine.c (_dl_reloc_overflow):
Reserve 16 chars to reloc_addr before calling _itoa_word.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
In C++ mode, __MATH_TG cannot be used for defining iseqsig, because
__MATH_TG relies on __builtin_types_compatible_p, which is a C-only
builtin. This is true when float128 is provided as an ABI-distinct type
from long double.
Moreover, the comparison macros from ISO C take two floating-point
arguments, which need not have the same type. Choosing what underlying
function to call requires evaluating the formats of the arguments, then
selecting which is wider. The macro __MATH_EVAL_FMT2 provides this
information, however, only the type of the macro expansion is relevant
(actually evaluating the expression would be incorrect).
This patch provides a C++ version of iseqsig, in which only the type of
__MATH_EVAL_FMT2 (__typeof or decltype) is used as a template parameter
for __iseqsig_type. This function calls the appropriate underlying
function.
Tested for powerpc64le and x86_64.
[BZ #22377]
* math/Makefile [C++] (tests): Add test for iseqsig.
* math/math.h [C++] (iseqsig): New implementation, which does
not rely on __MATH_TG/__builtin_types_compatible_p.
* math/test-math-iseqsig.cc: New file.
* sysdeps/powerpc/powerpc64le/Makefile
(CFLAGS-test-math-iseqsig.cc): New variable.
These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.
Typical performance gains is typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).
Using the glibc perf tests on sparc,
sparc (nsec) x86 (nsec)
old new old new
max 17629 395 5173 144
min 399 54 15 13
mean 5317 200 1349 23
The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from an value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.
Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0) (-1.46113312244415283203125)
new code: 2.31973271630014299393707e-01 0x1.db14cd799387ap-3
old code: 2.31973271630014271638132e-01 0x1.db14cd7993879p-3
exp = 2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes. For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures. I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.The gamma function is known to be difficult to compute
accurately.
* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
<errno.h>. Include "eexp.tbl".
(half): New constant.
(one): Likewise.
(__ieee754_exp): Rewrite.
(__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
* sysdeps/i386/fpu/slowexp.c: Likewise.
* sysdeps/ia64/fpu/slowexp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
comment.
* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
(CPPFLAGS-slowexp.c): Remove variable.
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
(CFLAGS-slowexp-fma.c): Remove variable.
(CFLAGS-slowexp-fma4.c): Likewise.
(CFLAGS-slowexp-avx.c): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
define as macro.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
* math/Makefile (type-double-routines): Remove slowexp.
* manual/probes.texi (slowexp_p6): Remove.
(slowexp_p32): Likewise.
This patch makes use of vectors for aligned inputs. Improvements
upto 30% seen for larger aligned inputs.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
There is a configure option --without-fp that specifies that nofpu
sysdeps directories should be used instead of fpu directories.
For most glibc configurations, this option is of no use: either there
is no valid nofpu variant of that configuration, or there are no fpu
or nofpu sysdeps directories for that processor and so the option does
nothing. For a few configurations, if you are using a soft-float
compiler this option is required, and failing to use it generally
results in compilation errors from inline asm using unavailable
floating-point instructions.
We're moving away from --with-cpu to configuring glibc based on how
the compiler generates code, and it is natural to do so for
--without-fp as well; in most cases the soft-float and hard-float ABIs
are incompatible so you have no hope of building a working glibc with
an inappropriately configured compiler or libgcc.
This patch eliminates --without-fp, replacing it entirely by automatic
configuration based on the compiler. Configurations for which this is
relevant (coldfire / mips / powerpc32 / sh) define a variable
with_fp_cond in their preconfigure fragments (under the same
conditions under which those fragments do anything); this is a
preprocessor conditional which the toplevel configure script then uses
in a test to determine which sysdeps directories to use.
The config.make with-fp variable remains. It's used only by powerpc
(sysdeps/powerpc/powerpc32/Makefile) to add -mhard-float to various
flags variables. For powerpc, -mcpu= options can imply use of
soft-float. That could be an issue if you want to build for
e.g. 476fp, but are using --with-cpu=476 because there isn't a 476fp
sysdeps directory. If in future we eliminate --with-cpu and replace
it entirely by testing the compiler, it would be natural at that point
to eliminate that code as well (as the user should then just use a
compiler defaulting to 476fp and the 476 sysdeps directory would be
used automatically).
Tested for x86_64, and tested with build-many-glibcs.py that installed
shared libraries are unchanged by this patch.
* configure.ac (--with-fp): Remove configure option.
(with_fp_cond): New variable.
(libc_cv_with_fp): New configure test. Use this variable instead
of with_fp.
* configure: Regenerated.
* config.make.in (with-fp): Use @libc_cv_with_fp@.
* manual/install.texi (Configuring and compiling): Remove
--without-fp.
* INSTALL: Regenerated.
* sysdeps/m68k/preconfigure (with_fp_cond): Define for ColdFire.
* sysdeps/mips/preconfigure (with_fp_cond): Define.
* sysdeps/powerpc/preconfigure (with_fp_cond): Define for 32-bit.
* sysdeps/sh/preconfigure (with_fp_cond): Define.
* scripts/build-many-glibcs.py (Context.add_all_configs): Do not
use --without-fp to configure glibc.
On POWER8, unaligned memory accesses to cached memory has little impact
on performance as opposed to its ancestors.
It is disabled by default and will only be available when the tunable
glibc.tune.cached_memopt is set to 1.
__memcpy_power8_cached __memcpy_power7
============================================================
max-size=4096: 33325.70 ( 12.65%) 38153.00
max-size=8192: 32878.20 ( 11.17%) 37012.30
max-size=16384: 33782.20 ( 11.61%) 38219.20
max-size=32768: 33296.20 ( 11.30%) 37538.30
max-size=65536: 33765.60 ( 10.53%) 37738.40
* manual/tunables.texi (Hardware Capability Tunables): Document
glibc.tune.cached_memopt.
* sysdeps/powerpc/cpu-features.c: New file.
* sysdeps/powerpc/cpu-features.h: New file.
* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
_dl_powerpc_cpu_features.
* sysdeps/powerpc/dl-tunables.list: New file.
* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
(INIT_ARCH): Initialize use_aligned_memopt.
* sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED &&
IS_IN(rtld))]: Restrict dl_platform_init availability and
initialize CPU features used by tunables.
* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines):
Add memcpy-power8-cached.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add
__memcpy_power8_cached.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S:
New file.
Reviewed-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
This patch adds several new tunables to control the behavior of
elision on supported platforms[1]. Since elision now depends
on tunables, we should always *compile* with elision enabled,
and leave the code disabled, but available for runtime
selection. This gives us *much* better compile-time testing of
the existing code to avoid bit-rot[2].
Tested on ppc, ppc64, ppc64le, s390x and x86_64.
[1] This part of the patch was initially proposed by
Paul Murphy but was "staled" because the framework have changed
since the patch was originally proposed:
https://patchwork.sourceware.org/patch/10342/
[2] This part of the patch was inititally proposed as a RFC by
Carlos O'Donnell. Make sense to me integrate this on the patch:
https://sourceware.org/ml/libc-alpha/2017-05/msg00335.html
* elf/dl-tunables.list: Add elision parameters.
* manual/tunables.texi: Add entries about elision tunable.
* sysdeps/unix/sysv/linux/powerpc/elision-conf.c:
Add callback functions to dynamically enable/disable elision.
Add multiple callbacks functions to set elision parameters.
Deleted __libc_enable_secure check.
* sysdeps/unix/sysv/linux/s390/elision-conf.c: Likewise.
* sysdeps/unix/sysv/linux/x86/elision-conf.c: Likewise.
* configure: Regenerated.
* configure.ac: Option enable_lock_elision was deleted.
* config.h.in: ENABLE_LOCK_ELISION flag was deleted.
* config.make.in: Remove references to enable_lock_elision.
* manual/install.texi: Elision configure option was removed.
* INSTALL: Regenerated to remove enable_lock_elision.
* nptl/Makefile:
Disable elision so it can verify error case for destroying a mutex.
* sysdeps/powerpc/nptl/elide.h:
Cleanup ENABLE_LOCK_ELISION check.
Deleted macros for the case when ENABLE_LOCK_ELISION was not defined.
* sysdeps/s390/configure: Regenerated.
* sysdeps/s390/configure.ac: Remove references to enable_lock_elision..
* nptl/tst-mutex8.c:
Deleted all #ifndef ENABLE_LOCK_ELISION from the test.
* sysdeps/powerpc/powerpc32/sysdep.h:
Deleted all ENABLE_LOCK_ELISION checks.
* sysdeps/powerpc/powerpc64/sysdep.h: Likewise.
* sysdeps/powerpc/sysdep.h: Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/force-elision.h: Likewise.
* sysdeps/unix/sysv/linux/s390/elision-conf.h: Likewise.
* sysdeps/unix/sysv/linux/s390/force-elision.h: Likewise.
* sysdeps/unix/sysv/linux/s390/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/s390/Makefile: Remove references to
enable-lock-elision.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes powerpc libm function implementations use
libm_alias_float to define function aliases.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged for all its hard-float powerpc configurations.
* sysdeps/powerpc/fpu/s_cosf.c: Include <libm-alias-float.h>.
(cosf): Define using libm_alias_float.
* sysdeps/powerpc/fpu/s_fabs.S: Include <libm-alias-float.h>.
(fabsf): Define using libm_alias_float.
* sysdeps/powerpc/fpu/s_fmaf.S: Include <libm-alias-float.h>.
(fmaf): Define using libm_alias_float.
* sysdeps/powerpc/fpu/s_rintf.c: Include <libm-alias-float.h>.
(rintf): Define using libm_alias_float.
* sysdeps/powerpc/fpu/s_sinf.c: Include <libm-alias-float.h>.
(sinf): Define using libm_alias_float.
* sysdeps/powerpc/power5+/fpu/s_modff.c: Include
<libm-alias-float.h>.
(modff): Define using libm_alias_float.
* sysdeps/powerpc/power7/fpu/s_logbf.c: Include
<libm-alias-float.h>.
(logbf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Include
<libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Include
<libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Include
<libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c: Include
<libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_llroundf.c: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Include
<libm-alias-float.h>.
(lrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_lround.S: Include
<libm-alias-float.h>.
(lroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_nearbyintf.S: Include
<libm-alias-float.h>.
(nearbyintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Include
<libm-alias-float.h>.
(rintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Include
<libm-alias-float.h>.
(roundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Include
<libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c:
Include <libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c:
Include <libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c:
Include <libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrintf.c:
Include <libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llroundf.c:
Include <libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logbf.c:
Include <libm-alias-float.h>.
(logbf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrintf.c:
Include <libm-alias-float.h>.
(lrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lroundf.c:
Include <libm-alias-float.h>.
(lroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modff.c:
Include <libm-alias-float.h>.
(modff): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c:
Include <libm-alias-float.h>.
(roundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c:
Include <libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: Include
<libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: Include
<libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: Include
<libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_llround.S: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_lround.S: Include
<libm-alias-float.h>.
(lroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_roundf.S: Include
<libm-alias-float.h>.
(roundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_truncf.S: Include
<libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Include
<libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: Include
<libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llround.S: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power6x/fpu/s_lrint.S: Include
<libm-alias-float.h>.
(lrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc32/power6x/fpu/s_lround.S: Include
<libm-alias-float.h>.
(lroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Include
<libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Include
<libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_cosf.c: Include
<libm-alias-float.h>.
(cosf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Include
<libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Include
<libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llroundf.c: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logbf.c: Include
<libm-alias-float.h>.
(logbf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modff.c: Include
<libm-alias-float.h>.
(modff): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Include
<libm-alias-float.h>.
(roundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_sinf.c: Include
<libm-alias-float.h>.
(sinf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Include
<libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Include
<libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Include
<libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Include
<libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Include
<libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_llroundf.S: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S: Include
<libm-alias-float.h>.
(nearbyintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Include
<libm-alias-float.h>.
(rintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Include
<libm-alias-float.h>.
(roundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Include
<libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Include
<libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Include
<libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_roundf.S: Include
<libm-alias-float.h>.
(roundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_truncf.S: Include
<libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Include
<libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Include
<libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power8/fpu/s_cosf.S: Include
<libm-alias-float.h>.
(cosf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Include
<libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Include
<libm-alias-float.h>.
(llroundf): Define using libm_alias_float.
* sysdeps/powerpc/powerpc64/power8/fpu/s_sinf.S: Include
<libm-alias-float.h>.
(sinf): Define using libm_alias_float.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes the remaining double powerpc functions use
libm_alias_double to define function aliases (with consequent removal
of the need for local compat symbol handling). Previous cleanups
avoid this patch changing installed stripped shared libraries for any
build-many-glibcs.py configuration (there are still some functions in
this patch for which the order of double and float aliases changes
within an individual source file, but in this case this doesn't result
in changes to the final library).
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged for all its hard-float powerpc configurations.
* sysdeps/powerpc/power7/fpu/s_logb.c: Include
<libm-alias-double.h>.
(logb): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Include
<libm-alias-double.h>.
(copysign): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_llrint.c: Include
<libm-alias-double.h>.
(llrint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_llround.c: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Include
<libm-alias-double.h>.
(lrint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_lround.S: Include
<libm-alias-double.h>.
(lround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c:
Include <libm-alias-double.h>.
(copysign): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llrint.c:
Include <libm-alias-double.h>.
(llrint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_llround.c:
Include <libm-alias-double.h>.
(llround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb.c: Include
<libm-alias-double.h>.
(logb): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lrint.c:
Include <libm-alias-double.h>.
(lrint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_lround.c:
Include <libm-alias-double.h>.
(lround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: Include
<libm-alias-double.h>.
(llrint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_llround.S: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_lround.S: Include
<libm-alias-double.h>.
(lround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Include
<libm-alias-double.h>.
(copysign): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: Include
<libm-alias-double.h>.
(llrint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power6/fpu/s_llround.S: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power6x/fpu/s_lrint.S: Include
<libm-alias-double.h>.
(lrint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power6x/fpu/s_lround.S: Include
<libm-alias-double.h>.
(lround): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Include
<libm-alias-double.h>.
(copysign): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Include
<libm-alias-double.h>.
(llrint): Define using libm_alias_double.
(lrint): Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
(lround): Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c: Include
<libm-alias-double.h>.
(logb): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Include
<libm-alias-double.h>.
(copysign): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Include
<libm-alias-double.h>.
(llrint): Define using libm_alias_double.
(lrint): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llround.S: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
(lround): Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
(lround): Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Include
<libm-alias-double.h>.
(copysign): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Include
<libm-alias-double.h>.
(llrint): Define using libm_alias_double.
(lrint): Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
(lround): Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Include
<libm-alias-double.h>.
(llrint): Define using libm_alias_double.
(lrint): Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llround.S: Include
<libm-alias-double.h>.
(llround): Define using libm_alias_double.
(lround): Likewise.
sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c defines the
lroundl compat symbol, version GLIBC_2_1, twice, once based on llround
and once based on __lround. Those are aliases for each other (llround
weak, __lround strong), but defining it twice does not make sense.
This patch changes it to define the compat symbol once only, matching
how libm_alias_double defines it.
Tested with build-many-glibcs.py for its powerpc64 configurations.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llround.c
[LONG_DOUBLE_COMPAT(libm, GLIBC_2_1)] (lroundl): Do not define
compat symbol based on llround.
Some powerpc logb implementations define a compat symbol for logbl
based on logb, whereas libm_alias_double defines such a compat symbol
based on __logb instead. This difference (logb is weak, __logb isn't)
is enough to result in different installed stripped shared libraries.
The difference in the installed libraries isn't significant, but first
changing the compat_symbol calls helps make it possible to validate a
subsequent change to use libm_alias_double by comparison of libraries,
so this patch does such a preliminary change.
Tested with build-many-glibcs.py for all its hard-float powerpc
configurations.
* sysdeps/powerpc/power7/fpu/s_logb.c
[LONG_DOUBLE_COMPAT (libm, GLIBC_2_0)] (logbl): Define as compat
symbol based on __logb, not on logb.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_logb.c
[LONG_DOUBLE_COMPAT (libm, GLIBC_2_0)] (logbl): Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_logb.c
[LONG_DOUBLE_COMPAT (libm, GLIBC_2_0)] (logbl): Likewise.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch various powerpc functions use libm_alias_double to
define function aliases (with consequent removal of the need for local
compat symbol handling). (The present patch excludes the changes to
some functions where such changes could result in differences in
installed stripped shared libraries because of changes to the exact
ordering or properties of symbols in individual .os files.)
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged for all its hard-float powerpc configurations.
* sysdeps/powerpc/fpu/s_rint.c: Include <libm-alias-double.h>.
(rint): Define using libm_alias_double.
* sysdeps/powerpc/power5+/fpu/s_modf.c: Include
<libm-alias-double.h>.
(modf): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Include
<libm-alias-double.h>.
(ceil): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_floor.S: Include
<libm-alias-double.h>.
(floor): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_nearbyint.S: Include
<libm-alias-double.h>.
(nearbyint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_rint.S: Include
<libm-alias-double.h>.
(rint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_round.S: Include
<libm-alias-double.h>.
(round): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Include
<libm-alias-double.h>.
(trunc): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Include
<libm-alias-double.h>.
(ceil): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c:
Include <libm-alias-double.h>.
(floor): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modf.c: Include
<libm-alias-double.h>.
(modf): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c:
Include <libm-alias-double.h>.
(round): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c:
Include <libm-alias-double.h>.
(trunc): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: Include
<libm-alias-double.h>.
(ceil): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: Include
<libm-alias-double.h>.
(floor): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_round.S: Include
<libm-alias-double.h>.
(round): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_trunc.S: Include
<libm-alias-double.h>.
(trunc): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Include
<libm-alias-double.h>.
(ceil): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Include
<libm-alias-double.h>.
(floor): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf.c: Include
<libm-alias-double.h>.
(modf): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Include
<libm-alias-double.h>.
(round): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Include
<libm-alias-double.h>.
(trunc): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Include
<libm-alias-double.h>.
(ceil): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/s_floor.S: Include
<libm-alias-double.h>.
(floor): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S: Include
<libm-alias-double.h>.
(nearbyint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/s_rint.S: Include
<libm-alias-double.h>.
(rint): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/s_round.S: Include
<libm-alias-double.h>.
(round): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Include
<libm-alias-double.h>.
(trunc): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Include
<libm-alias-double.h>.
(ceil): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Include
<libm-alias-double.h>.
(floor): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_round.S: Include
<libm-alias-double.h>.
(round): Define using libm_alias_double.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_trunc.S: Include
<libm-alias-double.h>.
(trunc): Define using libm_alias_double.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes powerpc fabs and fma use libm_alias_double
to define function aliases. This brings in automatic symbol
versioning compat handling, so the powerpc32 and powerpc64 wrappers
that added such handling to the generic sysdeps/powerpc/fpu versions
are removed as no longer required (there are no sysdeps directory
ordering issues that would necessitate keeping trivial wrappers
there).
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged for all its hard-float powerpc configurations.
* sysdeps/powerpc/fpu/s_fabs.S: Include <libm-alias-double.h>.
(fabs): Define using libm_alias_double.
* sysdeps/powerpc/fpu/s_fma.S: Include <libm-alias-double.h>.
(fma): Define using libm_alias_double.
* sysdeps/powerpc/powerpc32/fpu/s_fabs.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_fma.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_fabs.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_fma.S: Likewise.
On POWER9, cbrtf128 fails by 1 ULP.
* sysdeps/powerpc/fpu/libm-test-ulps: Regenerate.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes an e500 libm function implementation use
libm_alias_float to define function aliases.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged for its e500 configurations.
* sysdeps/powerpc/powerpc32/e500/nofpu/s_fabsf.S: Include
<libm-alias-float.h>.
(fabsf): Define using libm_alias_float.
This patch continues filling out TS 18661-3 support by adding *f64x
function aliases on platforms with _Float64x support. (It so happens
the set of such platforms is exactly the same as the set of platforms
with _Float128 support, although on x86_64, x86 and ia32 the _Float64x
format is Intel extended rather than binary128.) The API provided
corresponds exactly to that provided for _Float128, mostly coming from
TS 18661-3. As these functions always alias those for another type
(long double, _Float128 or both), __* function names are not provided,
as in other cases of alias types.
Given the preparation done in previous patches, this one just enables
the feature via Makeconfig and bits/floatn.h, adds symbol versions,
and updates documentation and ABI baselines. The symbol versions are
present unconditionally as GLIBC_2.27 in the relevant Versions files,
as it's OK for those to specify versions for functions that may not be
present in some configurations; no additional complexity is needed
unless in future some configuration gains support for this type that
didn't have such support in 2.27. The Makeconfig additions for ia64
and x86 aren't strictly needed, as those configurations also get
float64x-alias-fcts definitions from
sysdeps/ieee754/float128/Makeconfig, but still seem appropriate given
that _Float64x is not _Float128 for those configurations.
A libm-test-ulps update for x86 is included. This is because
bits/mathinline.h does not have _Float64x support added and for two
functions the use of out-of-line functions results in increased ulps
(ifloat64x shares ulps with ildouble / ifloat128 as appropriate).
Given that we'd like generally to eliminate bits/mathinline.h
optimizations, preferring to have such optimizations in GCC instead,
it seems reasonable not to add such support there for new types. GCC
support for _FloatN / _FloatNx built-in functions is limited, but has
been improved in GCC 8, and at some point I hope the full set of libm
built-in functions in GCC, and other optimizations with
per-floating-type aspects, will be enabled for all _FloatN / _FloatNx
types.
Tested for x86_64 and x86, and with build-many-glibcs.py, with both
GCC 6 and GCC 7.
* sysdeps/ia64/Makeconfig (float64x-alias-fcts): New variable.
* sysdeps/ieee754/float128/Makeconfig (float64x-alias-fcts):
Likewise.
* sysdeps/ieee754/ldbl-128/Makeconfig (float64x-alias-fcts):
Likewise.
* sysdeps/x86/Makeconfig: New file.
* bits/floatn-common.h (__HAVE_FLOAT64X): Remove macro.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* bits/floatn.h (__HAVE_FLOAT64X): New macro.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/ia64/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h (__HAVE_FLOAT64X):
Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/mips/ieee754/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/powerpc/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/x86/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* manual/math.texi (Mathematics): Document support for _Float64x.
* math/Versions (GLIBC_2.27): Add _Float64x functions.
* stdlib/Versions (GLIBC_2.27): Likewise.
* wcsmbs/Versions (GLIBC_2.27): Likewise.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
Supporting _Float64x on powerpc64le means that tests of that type need
to use -mfloat128 just like tests of _Float128. This patch adds the
necessary uses of that option.
Tested (compilation only) for powerpc64le with build-many-glibcs.py,
in conjunction with _Float64x support patches.
* sysdeps/powerpc/powerpc64le/Makefile ($(foreach
suf,$(all-object-suffixes),$(objpfx)test-float64x%$(suf))): Add
-mfloat128 to CFLAGS.
($(foreach
suf,$(all-object-suffixes),$(objpfx)test-ifloat64x%$(suf))):
Likewise.
(CFLAGS-libm-test-support-float64x.c): New variable.
($(objpfx)test-float64x% $(objpfx)test-ifloat64x%): Add
$(f128-loader-link) to gnulib-tests.
Linux commit ID cba6ac4869e45cc93ac5497024d1d49576e82666 reserved a new
bit for a scenario where transactional memory is available, but the
suspended state is disabled.
* sysdeps/powerpc/bits/hwcap.h (PPC_FEATURE2_HTM_NO_SUSPEND): New
macro.
* sysdeps/powerpc/dl-procinfo.c (_dl_powerpc_cap_flags): Add
htm-no-suspend.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Further _FloatN / _FloatNx type alias support will involve making
architecture-specific .S files use the common macros for libm function
aliases. Making them use those macros will also serve to simplify
existing code for aliases / symbol versions in various cases, similar
to such simplifications for ldbl-opt code.
The libm-alias-*.h files sometimes need to include <bits/floatn.h> to
determine which aliases they should define. At present, this does not
work for inclusion from .S files because <bits/floatn.h> can define
typedefs for old compilers. This patch changes all the
<bits/floatn.h> and <bits/floatn-common.h> headers to include
__ASSEMBLER__ conditionals. Those conditionals disable everything
related to C syntax in the __ASSEMBLER__ case, not just the problem
typedefs, as that seemed cleanest. The __HAVE_* definitions remain in
the __ASSEMBLER__ case, as those provide information that is required
to define the correct set of aliases.
Tested with build-many-glibcs.py for a representative set of
configurations (x86_64-linux-gnu i686-linux-gnu ia64-linux-gnu
powerpc64le-linux-gnu mips64-linux-gnu-n64 sparc64-linux-gnu) with GCC
6. Also tested with GCC 6 for i686-linux-gnu in conjunction with
changes to use alias macros in .S files.
* bits/floatn-common.h [!__ASSEMBLER]: Disable everything related
to C syntax instead of availability and properties of types.
* bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/ia64/bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/mips/ieee754/bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/powerpc/bits/floatn.h [!__ASSEMBLER]: Likewise.
* sysdeps/x86/bits/floatn.h [!__ASSEMBLER]: Likewise.
This patch adds two new internal defines to set the internal
pthread_mutex_t layout required by the supported ABIS:
1. __PTHREAD_MUTEX_NUSERS_AFTER_KIND which control whether to define
__nusers fields before or after __kind. The preferred value for
is 0 for new ports and it sets __nusers before __kind.
2. __PTHREAD_MUTEX_USE_UNION which control whether internal __spins and
__list members will be place inside an union for linuxthreads
compatibility. The preferred value is 0 for ports and it sets
to not use an union to define both fields.
It fixes the wrong offsets value for __kind value on x86_64-linux-gnu-x32.
Checked with a make check run-built-tests=no on all afected ABIs.
[BZ #22298]
* nptl/allocatestack.c (allocate_stack): Check if
__PTHREAD_MUTEX_HAVE_PREV is non-zero, instead if
__PTHREAD_MUTEX_HAVE_PREV is defined.
* nptl/descr.h (pthread): Likewise.
* nptl/nptl-init.c (__pthread_initialize_minimal_internal):
Likewise.
* nptl/pthread_create.c (START_THREAD_DEFN): Likewise.
* sysdeps/nptl/fork.c (__libc_fork): Likewise.
* sysdeps/nptl/pthread.h (PTHREAD_MUTEX_INITIALIZER): Likewise.
* sysdeps/nptl/bits/thread-shared-types.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
defines.
(__pthread_internal_list): Check __PTHREAD_MUTEX_USE_UNION instead
of __WORDSIZE for internal layout.
(__pthread_mutex_s): Check __PTHREAD_MUTEX_NUSERS_AFTER_KIND instead
of __WORDSIZE for internal __nusers layout and __PTHREAD_MUTEX_USE_UNION
instead of __WORDSIZE whether to use an union for __spins and __list
fields.
(__PTHREAD_MUTEX_HAVE_PREV): Define also for __PTHREAD_MUTEX_USE_UNION
case.
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
defines.
* sysdeps/alpha/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/hppa/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/ia64/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/powerpc/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/sparc/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/tile/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/x86/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch adds a new build test to check for internal fields
offsets for user visible internal field. Although currently
the only field which is statically initialized to a non zero value
is pthread_mutex_t.__data.__kind value, the tests also check the
offset of __kind, __spins, __elision (if supported), and __list
internal member. A internal header (pthread-offset.h) is added
to each major ABI with the reference value.
Checked on x86_64-linux-gnu and with a build check for all affected
ABIs (aarch64-linux-gnu, alpha-linux-gnu, arm-linux-gnueabihf,
hppa-linux-gnu, i686-linux-gnu, ia64-linux-gnu, m68k-linux-gnu,
microblaze-linux-gnu, mips64-linux-gnu, mips64-n32-linux-gnu,
mips-linux-gnu, powerpc64le-linux-gnu, powerpc-linux-gnu,
s390-linux-gnu, s390x-linux-gnu, sh4-linux-gnu, sparc64-linux-gnu,
sparcv9-linux-gnu, tilegx-linux-gnu, tilegx-linux-gnu-x32,
tilepro-linux-gnu, x86_64-linux-gnu, and x86_64-linux-x32).
* nptl/pthreadP.h (ASSERT_PTHREAD_STRING,
ASSERT_PTHREAD_INTERNAL_OFFSET): New macro.
* nptl/pthread_mutex_init.c (__pthread_mutex_init): Add build time
checks for internal pthread_mutex_t offsets.
* sysdeps/aarch64/nptl/pthread-offsets.h
(__PTHREAD_MUTEX_NUSERS_OFFSET, __PTHREAD_MUTEX_KIND_OFFSET,
__PTHREAD_MUTEX_SPINS_OFFSET, __PTHREAD_MUTEX_ELISION_OFFSET,
__PTHREAD_MUTEX_LIST_OFFSET): New macro.
* sysdeps/alpha/nptl/pthread-offsets.h: Likewise.
* sysdeps/arm/nptl/pthread-offsets.h: Likewise.
* sysdeps/hppa/nptl/pthread-offsets.h: Likewise.
* sysdeps/i386/nptl/pthread-offsets.h: Likewise.
* sysdeps/ia64/nptl/pthread-offsets.h: Likewise.
* sysdeps/m68k/nptl/pthread-offsets.h: Likewise.
* sysdeps/microblaze/nptl/pthread-offsets.h: Likewise.
* sysdeps/mips/nptl/pthread-offsets.h: Likewise.
* sysdeps/nios2/nptl/pthread-offsets.h: Likewise.
* sysdeps/powerpc/nptl/pthread-offsets.h: Likewise.
* sysdeps/s390/nptl/pthread-offsets.h: Likewise.
* sysdeps/sh/nptl/pthread-offsets.h: Likewise.
* sysdeps/sparc/nptl/pthread-offsets.h: Likewise.
* sysdeps/tile/nptl/pthread-offsets.h: Likewise.
* sysdeps/x86_64/nptl/pthread-offsets.h: Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Update strcasestr-power8 to use power8 version of strnlen for
calculating length.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
The powerpc bits/floatn.h declares _Float128 support to be present
when the compiler supports it for powerpc64le. However, in the case
where -mlong-double-64 is used, __MATH_TG does not actually support
_Float128; it only supports _Float128 in the distinct-long-double
case.
This shows up as a build failure when building glibc mainline with GCC
mainline, given the recently added sanity check in math.h for
configurations supported by __MATH_TG, as the compat code for
-mlong-double-64 fails to build. However, the bug was logically
present before that change (including in 2.26), just less visible.
This patch fixes the build failure by declaring _Float128 to be
unsupported in that case. (Of course this can't actually stop users
calling the type-generic macros with _Float128 arguments with
-mlong-double-64, just as they could be called with other unsupported
types on other platforms, but perhaps makes it less likely by making
all the type-specific _Float128 interfaces invisible in that case.)
Tested compilation for powerpc64le with build-many-glibcs.py.
[BZ #22402]
* sysdeps/powerpc/bits/floatn.h: Include <bits/long-double.h>.
[__NO_LONG_DOUBLE_MATH] (__HAVE_FLOAT128): Define to 0.
This is another one where we'll be wanting the base symbols for
powerpc64le rather than just a power7 variant.
* sysdeps/powerpc/powerpc64/multiarch/strncase_l-power7.c: Include
string/strncase_l.c, not string/strncase.c.
(USE_IN_EXTENDED_LOCALE_MODEL): Don't define.
(libc_hidden_def): Redefine.
The routine being assembled here is strcasecmp_l, so ask for that via
__STRCMP and STRCMP defines. That change means tweaking the power7
override. Needed for later powerpc64le changes where we want the base
symbols, not just a power7 variant.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l-power7.S:
(__STRCMP, STRCMP, __strcasecmp_l): Define.
(__strcasecmp): Don't define.
These functions aren't used in ld.so at the moment since we don't have
strcmp or strncmp ifuncs for them there. Remove the ld.so bloat.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power8.S: Wrap in
IS_IN (libc).
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power9.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power9.S: Likewise.
USE_AS_STPNCPY is defined by sysdeps/powerpc/powerpc64/power8/stpncpy.S,
included by this file.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Don't define
USE_AS_STPNCPY.
It seems to me that libc.a should not contain any of the __GI_
symbols, and certainly --enable-multi-arch ought to not add to the
list. At the end of this patch series we have the following in both
--enable-multi-arch and --disable-multi-arch libc.a:
0000000000000000 T __GI___readdir64
0000000000000000 T __GI___fxstatat64
0000000000000000 T __GI_getrlimit
0000000000000000 T __GI___getrlimit
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S (hidden_def):
Redefine only when SHARED.
POWER9 DD2.1 and earlier has an issue where some cache inhibited
vector load traps to the kernel, causing a performance degradation. To
handle this in memcpy and memmove, lvx/stvx is used for aligned
addresses instead of lxvd2x/stxvd2x.
Reference: https://patchwork.ozlabs.org/patch/814059/
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Replace
lxvd2x/stxvd2x with lvx/stvx.
* sysdeps/powerpc/powerpc64/power7/memmove.S: Likewise.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
cfi info for stack adjust needs to be on the insn doing the adjust.
cfi describing register saves can be anywhere after the save insn but
before the reg is altered. Fewer locations with cfi result in smaller
cfi programs and possibly slightly faster exception handling. Thus
the LR cfi_offset move.
The idea behind ajusting sp after restoring regs is to break a
register dependency chain, in this case not be using r1 immediately
after it is modified.
The missing LR cfi_restore meant that code after the blr,
unaligned_lt_16 and other labels, would have cfi that said LR was at
cfa+16, but that code is reached without LR being saved.
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Move LR cfi.
Adjust stack after restoring regs. Add missing LR cfi_restore.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch moves the frame setup and teardown to immediately around
the single memset call, as has been done for power8. I've also
decreased FRAMESIZE to that needed to save the two callee-saved
registers used. Plus added cfi.
* sysdeps/powerpc/powerpc64/power7/strncpy.S: Decrease FRAMESIZE.
Move LR save and frame setup/teardown and LR restore to
immediately around memset call. Provide cfi.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
The bits/floatn.h header currently only has defines relating to
_Float128. This patch adds defines relating to other _FloatN /
_FloatNx types.
The approach taken is to add defines for all _FloatN / _FloatNx types
known to GCC, and to put them in a common bits/floatn-common.h header
included at the end of all the individual bits/floatn.h headers. If
in future some defines become different for different glibc
configurations, they will move out into the separate bits/floatn.h
headers.
Some defines are expected always to be the same across glibc ports.
Corresponding defines are nevertheless put in this header. The intent
is that where there are conditionals (in headers or in non-installed
files) that can just repeat the same or nearly the same logic for each
floating-point type, they should do so, even if in fact the cases for
some types could be unconditionally present or absent because the same
conditionals are true or false for all glibc configurations. This
should make the glibc code with such conditionals easier to read,
because the reader can just see that the same conditionals are
repeated for each type, rather than seeing different conditionals for
different types and needing to reason, at each location with such
differences, why those differences are indeed correct there. (Cases
involving per-format rather than per-type logic are more likely still
to need differences in how they handle different types.)
Having such defines and conditionals also helps in incremental
preparation for adding _Float32 / _Float64 / _Float32x / _Float64x
function aliases. I intend subsequent patches to add such
conditionals corresponding to those already present for _Float128, as
well as making more architecture-specific function implementations use
common macros to define aliases in preparation for adding such _FloatN
/ _FloatNx aliases.
Tested for x86_64.
* bits/floatn-common.h: New file.
* math/Makefile (headers): Add bits/floatn-common.h.
* bits/floatn.h: Include <bits/floatn-common.h>.
* sysdeps/ia64/bits/floatn.h: Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
* sysdeps/mips/ieee754/bits/floatn.h: Likewise.
* sysdeps/powerpc/bits/floatn.h: Likewise.
* sysdeps/x86/bits/floatn.h: Likewise.
A performance regression was introduced by commit
84d74e427a "powerpc: Cleanup fenv_private.h".
In the powerpc implementation of SET_RESTORE_ROUND, there is the
following code in the "SET" function (slightly simplified):
--
old.fenv = fegetenv_register ();
new.l = (old.l & _FPU_MASK_TRAPS_RN) | r; (1)
if (new.l != old.l) (2)
{
if ((old.l & _FPU_ALL_TRAPS) != 0)
(void) __fe_mask_env ();
fesetenv_register (new.fenv); (3)
--
Line (1) sets the value of "new" to the current value of FPSCR,
but masks off summary bits, exceptions, non-IEEE mode, and
rounding mode, then ORs in the new rounding mode.
Line (2) compares this new value to the current value in order to
avoid setting a new value in the FPSCR (line (3)) unless something
significant has changed (exception enables or rounding mode).
The summary bits are not germane to the comparison, but are cleared
in "new" and preserved in "old", resulting in false negative
comparisons, and unnecessarily setting the FPSCR in those cases
with associated negative performance impacts.
The solution is to treat the summaries identically for "new" and "old":
- save them in SET
- leave them alone otherwise
- restore the saved values in RESTORE
Also minor changes:
- expand _FPU_MASK_RN to 64bit hex, to match other MASKs
- treat bit 52 (left-to-right) as reserved (since it is)
* sysdeps/powerpc/fpu/fenv_private.h (_FPU_MASK_TRAPS_RN):
(_FPU_MASK_FRAC_INEX_RET_CC): Fix masks to more properly handle
summary bits.
(_FPU_MASK_RN): Expand _FPU_MASK_RN to 64bit hex.
(_FPU_MASK_NOT_RN_NI): Treat bit 52 (left-to-right) as reserved.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Normally, TLS relocations against local symbols are optimised by the linker
to be absolute. However, gold does not do this, and so it is possible to
end up with, for example, R_SPARC_TLS_DTPMOD64 referring to a local symbol.
Since sym_map is left as null in elf_machine_rela for the special local
symbol case, the relocation handling thinks it has nothing to do, and so
the module gets left as 0. Havoc then ensues when the variable in question
is accessed.
Before this fix, the main_local_gold program would receive a SIGBUS on
sparc64, and SIGSEGV on powerpc32. With this fix applied, that test now
passes like the rest of them.
* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela):
Assign sym_map to be map for local symbols, as TLS relocations
use sym_map to determine whether the symbol is defined and to
extract the TLS information.
* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela): Likewise.