This patch optimizes the generic spinlock code.
The type pthread_spinlock_t is a typedef to volatile int on all archs.
Passing a volatile pointer to the atomic macros which are not mapped to the
C11 atomic builtins can lead to extra stores and loads to stack if such
a macro creates a temporary variable by using "__typeof (*(mem)) tmp;".
Thus, those macros which are used by spinlock code - atomic_exchange_acquire,
atomic_load_relaxed, atomic_compare_exchange_weak - have to be adjusted.
According to the comment from Szabolcs Nagy, the type of a cast expression is
unqualified (see http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_423.htm):
__typeof ((__typeof (*(mem)) *(mem)) tmp;
Thus from spinlock perspective the variable tmp is of type int instead of
type volatile int. This patch adjusts those macros in include/atomic.h.
With this construct GCC >= 5 omits the extra stores and loads.
The atomic macros are replaced by the C11 like atomic macros and thus
the code is aligned to it. The pthread_spin_unlock implementation is now
using release memory order instead of sequentially consistent memory order.
The issue with passed volatile int pointers applies to the C11 like atomic
macros as well as the ones used before.
I've added a glibc_likely hint to the first atomic exchange in
pthread_spin_lock in order to return immediately to the caller if the lock is
free. Without the hint, there is an additional jump if the lock is free.
I've added the atomic_spin_nop macro within the loop of plain reads.
The plain reads are also realized by C11 like atomic_load_relaxed macro.
The new define ATOMIC_EXCHANGE_USES_CAS determines if the first try to acquire
the spinlock in pthread_spin_lock or pthread_spin_trylock is an exchange
or a CAS. This is defined in atomic-machine.h for all architectures.
The define SPIN_LOCK_READS_BETWEEN_CMPXCHG is now removed.
There is no technical reason for throwing in a CAS every now and then,
and so far we have no evidence that it can improve performance.
If that would be the case, we have to adjust other spin-waiting loops
elsewhere, too! Using a CAS loop without plain reads is not a good idea
on many targets and wasn't used by one. Thus there is now no option to
do so.
Architectures are now using the generic spinlock automatically if they
do not provide an own implementation. Thus the pthread_spin_lock.c files
in sysdeps folder are deleted.
ChangeLog:
* NEWS: Mention new spinlock implementation.
* include/atomic.h:
(__atomic_val_bysize): Cast type to omit volatile qualifier.
(atomic_exchange_acq): Likewise.
(atomic_load_relaxed): Likewise.
(ATOMIC_EXCHANGE_USES_CAS): Check definition.
* nptl/pthread_spin_init.c (pthread_spin_init):
Use atomic_store_relaxed.
* nptl/pthread_spin_lock.c (pthread_spin_lock):
Use C11-like atomic macros.
* nptl/pthread_spin_trylock.c (pthread_spin_trylock):
Likewise.
* nptl/pthread_spin_unlock.c (pthread_spin_unlock):
Use atomic_store_release.
* sysdeps/aarch64/nptl/pthread_spin_lock.c: Delete File.
* sysdeps/arm/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/hppa/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/m68k/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/microblaze/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/mips/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/nios2/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/aarch64/atomic-machine.h (ATOMIC_EXCHANGE_USES_CAS): Define.
* sysdeps/alpha/atomic-machine.h: Likewise.
* sysdeps/arm/atomic-machine.h: Likewise.
* sysdeps/i386/atomic-machine.h: Likewise.
* sysdeps/ia64/atomic-machine.h: Likewise.
* sysdeps/m68k/coldfire/atomic-machine.h: Likewise.
* sysdeps/m68k/m680x0/m68020/atomic-machine.h: Likewise.
* sysdeps/microblaze/atomic-machine.h: Likewise.
* sysdeps/mips/atomic-machine.h: Likewise.
* sysdeps/powerpc/powerpc32/atomic-machine.h: Likewise.
* sysdeps/powerpc/powerpc64/atomic-machine.h: Likewise.
* sysdeps/s390/atomic-machine.h: Likewise.
* sysdeps/sparc/sparc32/atomic-machine.h: Likewise.
* sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: Likewise.
* sysdeps/sparc/sparc64/atomic-machine.h: Likewise.
* sysdeps/tile/tilegx/atomic-machine.h: Likewise.
* sysdeps/tile/tilepro/atomic-machine.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/atomic-machine.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: Likewise.
* sysdeps/unix/sysv/linux/nios2/atomic-machine.h: Likewise.
* sysdeps/unix/sysv/linux/sh/atomic-machine.h: Likewise.
* sysdeps/x86_64/atomic-machine.h: Likewise.
With the removal of divdi3 object from sparcv9-linux-gnu build, its
definition came from libgcc and its functions internall calls .udiv.
Since glibc also exports these symbols for compatibility reasons, it
will end up creating PLT calls internally in libc.so.
To avoid it, this patch uses the linker option --wrap to replace all
the internal libc.so .udiv calls to the wrapper __wrap_.udiv. Along
with strong alias in the udiv implementations, it makes linker do
local calls.
Checked on sparcv9-linux-gnu.
* sysdeps/sparc/sparc32/Makefile (libc.so-gnulib): New rule.
* sysdeps/sparc/sparc32/sparcv8/udiv.S (.udiv): Make a strong_alias
to __wrap_.udiv.
* sysdeps/sparc/sparc32/sparcv9/udiv.S (.udiv): Likewise.
* sysdeps/sparc/sparc32/udiv.S (.udiv): Likewise.
This commit moves one step towards the deprecation of wrappers that
use _LIB_VERSION / matherr / __kernel_standard functionality, by
adding the suffix '_compat' to their filenames and adjusting Makefiles
and #includes accordingly.
New template wrappers that do not use such functionality will be added
by future patches and will be first used by the float128 wrappers.
This was used by --enable-omitfp, and the bulk of it was removed in this
commit:
commit bdeba1354b
Author: Ulrich Drepper <drepper@gmail.com>
Date: Sat Jan 7 11:29:31 2012 -0500
Remove --enable-omitfp support
Current sparc32 sem_init and default one only differ on sem.newsem.pad
initialization. This patch removes sparc32 and sparc32v9 sem_init arch
specific implementation and set sparc32 to use nptl default one.
The default implementation sets the required sem.newsem.pad to 0 (which
is ununsed in other architectures).
I checked on i686 and a sparc32v9 build.
* nptl/sem_init.c (sem_init): Init pad value to 0.
* sysdeps/sparc/sparc32/sem_init.c: Remove file.
* sysdeps/sparc/sparc32/sparcv9/sem_init.c: Likewise.
This patch removes the sparc32 sem_wait.c implementation since it is
identical to default nptl one. The sparcv9 is no longer required with
the removal.
Checked with a sparcv9 build.
* sysdeps/sparc/sparc32/sem_wait.c: Remove file.
* sysdeps/sparc/sparc32/sparcv9/sem_wait.c: Likewise.
Current sparc32 sem_open and default one only differ on:
1. Default one contains a 'futex_supports_pshared' check.
2. sem.newsem.pad is initialized to zero.
This patch removes sparc32 and sparc32v9 sem_open arch specific
implementation and instead set sparc32 to use nptl default one.
Using 1. is fine since it should always evaluate 0 for Linux
(an optimized away by the compiler). Adding 2. to default
implementation should be ok since 'pad' field is used mainly
on sparc32 code.
I checked on i686 and checked a sparc32v9 build.
* nptl/sem_open.c (sem_open): Init pad value to 0.
* sysdeps/sparc/sparc32/sem_open.c: Remove file.
* sysdeps/sparc/sparc32/sparcv9/sem_open.c: Likewise.
The only difference is the usage of math_narrow_eval when
building s_fdiml.c. This should be harmless for long double,
but I did observe some code generation changes on m68k, but
lack the resources to test it.
Likewise, to more easily support overriding symbol generation,
the aliasing macros are always conditionally defined on their
absence to reduce boilerplate.
I also ran builds for i486, ppc64, sparcv9, aarch64,
s390x and observed no changes to s_fdim* objects.
sparc32 passes floating point values in the integer registers. VIS3
instructions gives access to the movwtos instruction to directly
transfer a value from an integer register to a floating point register.
Therefore it makes sense to provide a VIS3 version consisting in the
generic version compiled with -mvis3.
Changelog:
* math/s_fdim.c: Avoid alias renamed.
* math/s_fdimf.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile
[$(subdir) = math && $(have-as-vis3) = yes] (libm-sysdep_routines):
Add s_fdimf-vis3, s_fdim-vis3.
(CFLAGS-s_fdimf-vis3.c): New. Set to -Wa,-Av9d -mvis3.
(CFLAGS-s_fdim-vis3.c): Likewise.
sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdim-vis3.c: New file.
sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdim.c: Likewise.
The fdim and fdimf functions on sparc do not fully follow the standard
and do not set errno to ERANGE when the result overflows. Since glibc
2.24 this causes the two following tests to fail:
Failure: fdim (max_value, -max_value): errno set to 0, expected 34 (ERANGE)
Failure: fdim_upward (max_value, -max_value): errno set to 0, expected 34 (ERANGE)
It happens that using GCC with the generic C code generates very similar
code to the sparc specific implementations. Therefore this patches
remove them. Note it might still worth adding a vis3 specific version of
fdim on sparc32/sparcv9, this is done in a following patch to ease
backporting.
Changelog:
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile
[$(subdir) = math && $(have-as-vis3) = yes] (libm-sysdep_routines):
Remove s_fdimf-vis3, s_fdim-vis3.
* sysdeps/sparc/sparc32/fpu/s_fdim.S: Delete file.
* sysdeps/sparc/sparc32/fpu/s_fdimf.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdim-vis3.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdim.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdimf-vis3.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdimf.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_fdim.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_fdimf.S: Likewise.
* sysdeps/sparc/sparc64/fpu/s_fdim.S: Likewise.
* sysdeps/sparc/sparc64/fpu/s_fdimf.S: Likewise.
When building for sparc32/sparcv9 or sparc64, we assume that VIS
instructions are available and use them in the sparc specific assembly
code. However we do not tell GCC to use such instructions, resulting in
slightly suboptimal code.
Fix that by passing -Wa,-Av9a -mvis to GCC.
Changelog:
* sysdeps/sparc/sparc32/sparcv9/Makefile (sysdep-CFLAGS): Add -mvis.
* sysdeps/sparc/sparc64/Makefile (sysdep-CFLAGS): New. Define to
-Wa,-Av9a -mvis.
The ceil, floor and trunc functions on sparc do not fully follow the
standard and trigger an inexact exception when presented a value which
is not an integer. Since glibc 2.24 this causes a few tests to fail,
for instance:
testing double (without inline functions)
Failure: ceil (lit_pi): Exception "Inexact" set
Failure: ceil (-lit_pi): Exception "Inexact" set
Failure: ceil (min_subnorm_value): Exception "Inexact" set
Failure: ceil (min_value): Exception "Inexact" set
Failure: ceil (0.1): Exception "Inexact" set
Failure: ceil (0.25): Exception "Inexact" set
Failure: ceil (0.625): Exception "Inexact" set
Failure: ceil (-min_subnorm_value): Exception "Inexact" set
Failure: ceil (-min_value): Exception "Inexact" set
Failure: ceil (-0.1): Exception "Inexact" set
Failure: ceil (-0.25): Exception "Inexact" set
Failure: ceil (-0.625): Exception "Inexact" set
I tried to fix that by using the same strategy than used on other
architectures, that is by saving the FSR register at the beginning
and restoring it at the end of the function. When doing so I noticed
a comment that this operation might be very costly, so I decided to
do some benchmarks.
The benchmarks below represent the time required to run each of the
function 60 millions of times with different input value. I have done
that in the basic V9 code, the VIS2 code, and using the default C
implementation of the libc, for both sparc32 and sparc64, on a Niagara
T1 based machine and an UltraSparc IIIi. Given I don't have access to a
more recent machine), I haven't been able to test the VIS3 version. Also
it should be noted that it doesn't make sense to do this benchmark for
V8 or earlier as in that case we use the default C implementation. The
results are available in the table below, the "+ fix" version correspond
to the one saving and restoring the FSR.
Niagara T1 / sparc32
--------------------
ceilf ceil floorf floor truncf trunc
V9 19.10 22.48 19.10 22.48 16.59 19.27
V9 + fix 19.77 23.34 19.77 23.33 17.27 20.12
VIS2 16.87 19.62 16.87 19.62
VIS2 + fix 17.55 20.47 17.55 20.47
C impl 11.39 13.80 11.40 13.80 10.88 10.84
Niagara T1 / sparc64
--------------------
ceilf ceil floorf floor truncf trunc
V9 18.14 22.23 18.14 22.23 15.64 19.02
V9 + fix 18.82 23.08 18.82 23.08 16.32 19.87
VIS2 15.92 19.37 15.92 19.37
VIS2 + fix 16.59 20.22 16.59 20.22
C impl 11.39 13.60 11.39 15.36 10.88 12.65
UltraSparc IIIi / sparc32
-------------------------
ceilf ceil floorf floor truncf trunc
V9 4.81 7.09 6.61 11.64 4.91 7.05
V9 + fix 7.20 10.42 7.14 10.54 6.76 9.47
VIS2 4.81 7.03 4.76 7.13
VIS2 + fix 6.76 9.51 6.71 9.63
C impl 3.88 8.62 3.90 9.45 3.57 6.62
UltraSparc IIIi / sparc64
-------------------------
ceilf ceil floorf floor truncf trunc
V9 3.48 4.39 3.48 4.41 3.01 3.85
V9 + fix 4.76 5.90 4.76 5.90 4.86 6.26
VIS2 2.95 3.61 2.95 3.61
VIS2 + fix 4.24 5.37 4.30 7.97
C impl 3.63 4.89 3.62 6.38 3.33 4.03
The first thing that should be noted is that the C implementation is
always faster on the Niagara T1 based machine. On the UltraSparc IIIi
the float version on sparc32 is also faster.
Coming back about the fix saving and restoring the FSR, it appears
it has a big impact as expected. In that case the C implementation is
always faster than the fixed implementations.
This patch therefore removes the sparc specific implementations in
favor of the generic ones.
Changelog:
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile
[$(subdir) = math] (libm-sysdep_routines): Remove.
[$(subdir) = math && $(have-as-vis3) = yes] (libm-sysdep_routines):
Remove s_ceilf-vis3, s_ceil-vis3, s_floorf-vis3, s_floor-vis3,
s_truncf-vis3, s_trunc-vis3.
* sysdeps/sparc/sparc64/fpu/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceil-vis2.S: Delete
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceil-vis3.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceil.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceilf-vis2.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceilf-vis3.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_ceilf.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floor-vis2.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floor-vis3.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floor.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floorf-vis2.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floorf-vis3.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_floorf.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_trunc-vis3.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_trunc.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_truncf-vis3.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_truncf.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_ceil.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_ceilf.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_floor.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_floorf.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_trunc.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_truncf.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_ceil-vis2.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_ceil-vis3.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf-vis2.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf-vis3.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floor-vis2.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floor-vis3.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floor.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf-vis2.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf-vis3.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc-vis3.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf-vis3.S: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.S: Likewise.
* sysdeps/sparc/sparc64/fpu/s_ceil.S: Likewise.
* sysdeps/sparc/sparc64/fpu/s_ceilf.S: Likewise.
* sysdeps/sparc/sparc64/fpu/s_floor.S: Likewise.
* sysdeps/sparc/sparc64/fpu/s_floorf.S: Likewise.
* sysdeps/sparc/sparc64/fpu/s_trunc.S: Likewise.
* sysdeps/sparc/sparc64/fpu/s_truncf.S: Likewise.
nearbyint and nearbyintf should not trigger inexact exceptions, but
should still trigger an invalid exception for a sNaN input.
The SPARC specific implementations of these functions save the FSR at
the beginning of the function and restore it at the end to not trigger
an inexact exception. This however doesn't work for an sNaN input which
need to trigger an invalid exception. Fix that by adding a fcmp
instruction using the input value before saving FSR, so that an invalid
exception is triggered for a sNaN input.
This fixes the math/test-nearbyint-except test on SPARC.
Changelog:
* sparc/sparc32/sparcv9/fpu/s_nearbyint.S (__nearbyint): Trigger an
invalid exception for a sNaN input.
* sparc/sparc32/sparcv9/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyint-vis3.S
(__nearbyint_vis3): Likewise
* sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyintf-vis3.S
(__nearbyintf_vis3): Likewise
* sparc/sparc64/fpu/s_nearbyint.S (__nearbyint): Likewise.
* sparc/sparc64/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sparc/sparc64/fpu/multiarch/s_nearbyint-vis3.S (__nearbyint_vis3):
Likewise.
* sparc/sparc64/fpu/multiarch/s_nearbyintf-vis3.S (__nearbyintf_vis3):
Likewise.
The previous barrier implementation did not fulfill the POSIX requirements
for when a barrier can be destroyed. Specifically, it was possible that
threads that haven't noticed yet that their round is complete still access
the barrier's memory, and that those accesses can happen after the barrier
has been legally destroyed.
The new algorithm does not have this issue, and it avoids using a lock
internally.
It was noted in
<https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the
bits/*.h naming scheme should only be used for installed headers.
This patch renames bits/atomic.h to atomic-machine.h to follow that
convention.
This is the only change in this series that needs to change the
filename rather than simply removing a directory level (because both
atomic.h and bits/atomic.h exist at present).
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch).
[BZ #14912]
* sysdeps/aarch64/bits/atomic.h: Move to ...
* sysdeps/aarch64/atomic-machine.h: ...here.
(_AARCH64_BITS_ATOMIC_H): Rename macro to
_AARCH64_ATOMIC_MACHINE_H.
* sysdeps/alpha/bits/atomic.h: Move to ...
* sysdeps/alpha/atomic-machine.h: ...here.
* sysdeps/arm/bits/atomic.h: Move to ...
* sysdeps/arm/atomic-machine.h: ...here. Update comments.
* bits/atomic.h: Move to ...
* sysdeps/generic/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/i386/bits/atomic.h: Move to ...
* sysdeps/i386/atomic-machine.h: ...here.
* sysdeps/ia64/bits/atomic.h: Move to ...
* sysdeps/ia64/atomic-machine.h: ...here.
* sysdeps/m68k/coldfire/bits/atomic.h: Move to ...
* sysdeps/m68k/coldfire/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/m68k/m680x0/m68020/bits/atomic.h: Move to ...
* sysdeps/m68k/m680x0/m68020/atomic-machine.h: ...here.
* sysdeps/microblaze/bits/atomic.h: Move to ...
* sysdeps/microblaze/atomic-machine.h: ...here.
* sysdeps/mips/bits/atomic.h: Move to ...
* sysdeps/mips/atomic-machine.h: ...here.
(_MIPS_BITS_ATOMIC_H): Rename macro to _MIPS_ATOMIC_MACHINE_H.
* sysdeps/powerpc/bits/atomic.h: Move to ...
* sysdeps/powerpc/atomic-machine.h: ...here. Update comments.
* sysdeps/powerpc/powerpc32/bits/atomic.h: Move to ...
* sysdeps/powerpc/powerpc32/atomic-machine.h: ...here. Update
comments. Include <atomic-machine.h> instead of <bits/atomic.h>.
* sysdeps/powerpc/powerpc64/bits/atomic.h: Move to ...
* sysdeps/powerpc/powerpc64/atomic-machine.h: ...here. Include
<atomic-machine.h> instead of <bits/atomic.h>.
* sysdeps/s390/bits/atomic.h: Move to ...
* sysdeps/s390/atomic-machine.h: ...here.
* sysdeps/sparc/sparc32/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc32/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/sparc/sparc32/sparcv9/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: ...here.
* sysdeps/sparc/sparc64/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc64/atomic-machine.h: ...here.
* sysdeps/tile/bits/atomic.h: Move to ...
* sysdeps/tile/atomic-machine.h: ...here.
* sysdeps/tile/tilegx/bits/atomic.h: Move to ...
* sysdeps/tile/tilegx/atomic-machine.h: ...here. Include
<sysdeps/tile/atomic-machine.h> instead of
<sysdeps/tile/bits/atomic.h>.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/tile/tilepro/bits/atomic.h: Move to ...
* sysdeps/tile/tilepro/atomic-machine.h: ...here. Include
<sysdeps/tile/atomic-machine.h> instead of
<sysdeps/tile/bits/atomic.h>.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/arm/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/arm/atomic-machine.h: ...here. Include
<sysdeps/arm/atomic-machine.h> instead of
<sysdeps/arm/bits/atomic.h>.
* sysdeps/unix/sysv/linux/hppa/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/hppa/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/m68k/coldfire/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/nios2/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/nios2/atomic-machine.h: ...here.
(_NIOS2_BITS_ATOMIC_H): Rename macro to _NIOS2_ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/sh/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/sh/atomic-machine.h: ...here.
* sysdeps/x86_64/bits/atomic.h: Move to ...
* sysdeps/x86_64/atomic-machine.h: ...here.
* include/atomic.h: Include <atomic-machine.h> instead of
<bits/atomic.h>.
This patch combines BUSY_WAIT_NOP and atomic_delay into a new
atomic_spin_nop function and adjusts all clients. The new function is
put into atomic.h because what is best done in a spin loop is
architecture-specific, and atomics must be used for spinning. The
function name is meant to tell users that this has no effect on
synchronization semantics but is a performance aid for spinning.
This sets __HAVE_64B_ATOMICS if provided. It also sets
USE_ATOMIC_COMPILER_BUILTINS to true if the existing atomic ops use the
__atomic* builtins (aarch64, mips partially) or if this has been
tested (x86_64); otherwise, this is set to false so that C11 atomics will
be based on the existing atomic operations.
* sysdeps/sparc/sparc32/sparcv9/bits/atomic.h
(__arch_compare_and_exchange_val_32_acq): Use %g0 as second
argument of CAS if possible.
* sysdeps/sparc/sparc64/bits/atomic.h
(__arch_compare_and_exchange_val_32_acq): Likewise.
(__arch_compare_and_exchange_val_64_acq): Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile: Add vis3
trunc{,f} to libm-sysdep_routes.
* sysdeps/sparc/sparc64/fpu/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_trunc-vis3.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_trunc.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_truncf-vis3.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_truncf.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_trunc.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_truncf.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc-vis3.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf-vis3.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.S: New file.
* sysdeps/sparc/sparc64/fpu/s_trunc.S: New file.
* sysdeps/sparc/sparc64/fpu/s_truncf.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile: Add vis3
nearbyint{,f} to libm-sysdep_routes.
* sysdeps/sparc/sparc64/fpu/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyint-vis3.S:
New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyint.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyintf-vis3.S:
New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyintf.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_nearbyint.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_nearbyintf.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_nearbyint-vis3.S: New
file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_nearbyint.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_nearbyintf-vis3.S: New
file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_nearbyintf.S: New file.
* sysdeps/sparc/sparc64/fpu/s_nearbyint.S: New file.
* sysdeps/sparc/sparc64/fpu/s_nearbyintf.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile: Add vis3
fdim/fdimf to libm-sysdep_routines.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdim-vis3.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdim.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdimf-vis3.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdimf.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_fdim.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_fdimf.S: New file.
* sysdeps/sparc/sparc32/fpu/s_fdim.S: New file.
* sysdeps/sparc/sparc32/fpu/s_fdimf.S: New file.
* sysdeps/sparc/sparc64/fpu/s_fdim.S: New file.
* sysdeps/sparc/sparc64/fpu/s_fdimf.S: New file.
* sysdeps/sparc/sparc32/sparcv9/mul_1.S: Properly optimize for 32-bit
sparc V9 rather than using V8 code.
* sysdeps/sparc/sparc32/sparcv9/addmul_1.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/submul_1.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/mul_1.S: Properly optimize for 32-bit
sparc V9 rather than using V8 code.
* sysdeps/sparc/sparc32/sparcv9/addmul_1.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/submul_1.S: Likewise.
* crypt/Makefile: Move test targets after toplevel Rules
inclusion. Grab any necessary sysdep routines when linking.
* crypt/md5.c (md5_process_block): Remove define, we will always
name it __md5_process_block.
(md5_finish_ctx): Update md5_process_block call.
(md5_stream): Likewise.
(md5_process_bytes): Likewise.
(md5_process_block): Rename to __md5_process_block and move to ...
* crypt/md5-block.c: ... here.
* crypt/sha256.c (sha256_process_block): Move to ...
* crypt/sha256-block.c: ... here.
* crypt/sha512.c (sha512_process_block): Move to ...
* crypt/sha512-block.c: ... here.
* locale/Makefile (CFLAGS-md5.c): Define to add crypt/ to include
path.
* sysdeps/sparc/sparc-ifunc.c (sparc_libc_ifunc): Define.
* sysdeps/sparc/sparc64/multiarch/Makefile
(libcrypt-sysdep_routines): Add crypto assembler sysdeps when in
crypt subdir.
(localedef-aux): Add md5 crypto assembler when in locale subdir.
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile: Mirror sparc64
multiarch changes.
* sysdeps/sparc/sparc64/multiarch/md5-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/md5-crop.S: New file.
* sysdeps/sparc/sparc64/multiarch/sha256-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/sha256-crop.S: New file.
* sysdeps/sparc/sparc64/multiarch/sha512-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/sha512-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/md5-block.c: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/md5-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha256-block.c: New
file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha256-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha512-block.c: New
file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha512-crop.S: New file.