cfi info for stack adjust needs to be on the insn doing the adjust.
cfi describing register saves can be anywhere after the save insn but
before the reg is altered. Fewer locations with cfi result in smaller
cfi programs and possibly slightly faster exception handling. Thus
the LR cfi_offset move.
The idea behind ajusting sp after restoring regs is to break a
register dependency chain, in this case not be using r1 immediately
after it is modified.
The missing LR cfi_restore meant that code after the blr,
unaligned_lt_16 and other labels, would have cfi that said LR was at
cfa+16, but that code is reached without LR being saved.
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Move LR cfi.
Adjust stack after restoring regs. Add missing LR cfi_restore.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch moves the frame setup and teardown to immediately around
the single memset call, as has been done for power8. I've also
decreased FRAMESIZE to that needed to save the two callee-saved
registers used. Plus added cfi.
* sysdeps/powerpc/powerpc64/power7/strncpy.S: Decrease FRAMESIZE.
Move LR save and frame setup/teardown and LR restore to
immediately around memset call. Provide cfi.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch replaces i386 assembly versions of e_exp2f with generic
e_exp2f.c. For workload-spec2017.wrf, on Nehalem, it improves
performance by:
Before After Improvement
reciprocal-throughput 112.996 40.0454 182%
latency 126.581 54.4479 132%
On Skylake, it improves performance by:
Before After Improvement
reciprocal-throughput 113.14 39.447 186%
latency 136.068 55.684 144%
On IvyBridge with --disable-multi-arch, it improves performance by:
Before After Improvement
reciprocal-throughput 132.521 40.3759 228%
latency 145.791 58.4587 149%
* sysdeps/i386/fpu/e_exp2f.S: Removed.
* sysdeps/i386/fpu/w_exp2f.c: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_exp2f.c.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
Add e_exp2f-sse2.
(CFLAGS-e_exp2f-sse2.c): New.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f-sse2.c: New file.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f.c: Likewise.
The bits/floatn.h header currently only has defines relating to
_Float128. This patch adds defines relating to other _FloatN /
_FloatNx types.
The approach taken is to add defines for all _FloatN / _FloatNx types
known to GCC, and to put them in a common bits/floatn-common.h header
included at the end of all the individual bits/floatn.h headers. If
in future some defines become different for different glibc
configurations, they will move out into the separate bits/floatn.h
headers.
Some defines are expected always to be the same across glibc ports.
Corresponding defines are nevertheless put in this header. The intent
is that where there are conditionals (in headers or in non-installed
files) that can just repeat the same or nearly the same logic for each
floating-point type, they should do so, even if in fact the cases for
some types could be unconditionally present or absent because the same
conditionals are true or false for all glibc configurations. This
should make the glibc code with such conditionals easier to read,
because the reader can just see that the same conditionals are
repeated for each type, rather than seeing different conditionals for
different types and needing to reason, at each location with such
differences, why those differences are indeed correct there. (Cases
involving per-format rather than per-type logic are more likely still
to need differences in how they handle different types.)
Having such defines and conditionals also helps in incremental
preparation for adding _Float32 / _Float64 / _Float32x / _Float64x
function aliases. I intend subsequent patches to add such
conditionals corresponding to those already present for _Float128, as
well as making more architecture-specific function implementations use
common macros to define aliases in preparation for adding such _FloatN
/ _FloatNx aliases.
Tested for x86_64.
* bits/floatn-common.h: New file.
* math/Makefile (headers): Add bits/floatn-common.h.
* bits/floatn.h: Include <bits/floatn-common.h>.
* sysdeps/ia64/bits/floatn.h: Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
* sysdeps/mips/ieee754/bits/floatn.h: Likewise.
* sysdeps/powerpc/bits/floatn.h: Likewise.
* sysdeps/x86/bits/floatn.h: Likewise.
As noted by Florian Weimer, current Linux posix_spawn implementation
can trigger an assert if the auxiliary process is terminated before
actually setting the err member:
340 /* Child must set args.err to something non-negative - we rely on
341 the parent and child sharing VM. */
342 args.err = -1;
[...]
362 new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
363 CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
364
365 if (new_pid > 0)
366 {
367 ec = args.err;
368 assert (ec >= 0);
Another possible issue is killing the child between setting the err and
actually calling execve. In this case the process will not ran, but
posix_spawn also will not report any error:
269
270 args->err = 0;
271 args->exec (args->file, args->argv, args->envp);
As suggested by Andreas Schwab, this patch removes the faulty assert
and also handles any signal that happens before fork and execve as the
spawn was successful (and thus relaying the handling to the caller to
figure this out). Different than Florian, I can not see why using
atomics to set err would help here, essentially the code runs
sequentially (due CLONE_VFORK) and I think it would not be legal the
compiler evaluate ec without checking for new_pid result (thus there
is no need to compiler barrier).
Summarizing the possible scenarios on posix_spawn execution, we
have:
1. For default case with a success execution, args.err will be 0, pid
will not be collected and it will be reported to caller.
2. For default failure case, args.err will be positive and the it will
be collected by the waitpid. An error will be reported to the
caller.
3. For the unlikely case where the process was terminated and not
collected by a caller signal handler, it will be reported as succeful
execution and not be collected by posix_spawn (since args.err will
be 0). The caller will need to actually handle this case.
4. For the unlikely case where the process was terminated and collected
by caller we have 3 other possible scenarios:
4.1. The auxiliary process was terminated with args.err equal to 0:
it will handled as 1. (so it does not matter if we hit the pid
reuse race since we won't possible collect an unexpected
process).
4.2. The auxiliary process was terminated after execve (due a failure
in calling it) and before setting args.err to -1: it will also
be handle as 1. but with the issue of not be able to report the
caller a possible execve failures.
4.3. The auxiliary process was terminated after args.err is set to -1:
this is the case where it will be possible to hit the pid reuse
case where we will need to collected the auxiliary pid but we
can not be sure if it will be expected one. I think for this
case we need to actually change waitpid to use WNOHANG to avoid
hanging indefinitely on the call and report an error to caller
since we can't differentiate between a default failure as 2.
and a possible pid reuse race issue.
Checked on x86_64-linux-gnu.
* sysdeps/unix/sysv/linux/spawni.c (__spawnix): Handle the case where
the auxiliary process is terminated by a signal before calling _exit
or execve.
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers. It simplifies _dl_runtime_resolve and supports
different calling conventions. ld.so code size is reduced by more than
1 KB. However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.
Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:
Before After Change
Westmere (SSE)/fxsave 345 866 151%
IvyBridge (AVX)/xsave 420 643 53%
Haswell (AVX)/xsave 713 1252 75%
Skylake (AVX+MPX)/xsavec 559 719 28%
Skylake (AVX512+MPX)/xsavec 145 272 87%
Ryzen (AVX)/xsavec 280 553 97%
This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases. With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.
On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises. On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.
[BZ #21265]
* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
New.
* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
(get_common_indeces): Set xsave_state_size, xsave_state_full_size
and bit_arch_XSAVEC_Usable if needed.
(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
and bit_arch_Use_dl_runtime_resolve_opt.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
Removed.
(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
(bit_arch_Prefer_No_AVX512): Updated.
(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
(bit_arch_XSAVEC_Usable): New.
(STATE_SAVE_OFFSET): Likewise.
(STATE_SAVE_MASK): Likewise.
[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
(cpu_features): Add xsave_state_size and xsave_state_full_size.
(index_arch_Use_dl_runtime_resolve_opt): Removed.
(index_arch_Use_dl_runtime_resolve_slow): Likewise.
(index_arch_XSAVEC_Usable): New.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow.
* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
is enabled.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
_dl_runtime_resolve_xsavec.
* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
Removed.
(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
instead of VEC_SIZE.
(REGISTER_SAVE_BND0): Removed.
(REGISTER_SAVE_BND1): Likewise.
(REGISTER_SAVE_BND3): Likewise.
(REGISTER_SAVE_RAX): Always defined to 0.
(VMOV): Removed.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_slow): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_avx512): Likewise.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(USE_FXSAVE): New.
(_dl_runtime_resolve_fxsave): Likewise.
(USE_XSAVE): Likewise.
(_dl_runtime_resolve_xsave): Likewise.
(USE_XSAVEC): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
Removed.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(_dl_runtime_resolve_fxsave): New.
(_dl_runtime_resolve_xsave): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
When --enable-static-pie is used to configure glibc, we need to use
_dl_relocate_static_pie to compute load address in static PIE.
* sysdeps/m68k/dl-machine.h (elf_machine_load_address): Use
_dl_relocate_static_pie instead of _dl_start to compute load
address in static PIE.
After commit 37f802f864 (Remove
__need_IOV_MAX and __need_FOPEN_MAX), UIO_MAXIOV is no longer supplied
(indirectly) through <bits/stdio_lim.h>, so sysdeps/posix/sysconf.c no
longer sees the definition.
This patch adds a MIPS-specific bits/floatn.h header. This header is
identical to the ldbl-128 version except for the comment at the top;
the purpose is to ensure that a 32-bit MIPS build installs a header
that is the same as in a 64-bit MIPS build and so properly shows
_Float128 support to be available for 64-bit compilations, on the
general principle of an installation for one multilib providing
headers also suitable for other multilibs.
Tested with build-many-glibcs.py.
* sysdeps/mips/ieee754/bits/floatn.h: New file.
Similar to bug 21987 for SPARC, MIPS64 wrongly installs the ldbl-128
version of bits/long-double.h, meaning incorrect results when using
headers installed from a 64-bit installation for a 32-bit build. (I
haven't actually seen this cause build failures before its interaction
with bits/floatn.h did so - installed headers wrongly expecting
_Float128 to be available in a 32-bit configuration.)
This patch fixes the bug by moving the MIPS header to
sysdeps/mips/ieee754, which comes before sysdeps/ieee754/ldbl-128 in
the sysdeps directory ordering. (bits/floatn.h will need a similar
fix - duplicating the ldbl-128 version for MIPS will suffice - for
headers from a 32-bit installation to be correct for 64-bit builds.)
Tested with build-many-glibcs.py (compilers build for
mips64-linux-gnu, where there was previously a libstdc++ build failure
as at
<https://sourceware.org/ml/libc-testresults/2017-q4/msg00130.html>).
[BZ #22322]
* sysdeps/mips/bits/long-double.h: Move to ....
* sysdeps/mips/ieee754/bits/long-double.h: ... here.
This patch adds support for *f128 function aliases on platforms where
long double has the binary128 format (and thus GCC 7 provides the
_Float128 type with the same ABI as long double but as a distinct type
in terms of C type compatibility). This is the same API as provided
in glibc 2.26 for powerpc64le / x86_64 / x86 / ia64 where _Float128
has a different format from long double, with the bulk of the API
coming from TS 18661-3. All the functions alias the corresponding
long double functions, and __* function names are not provided since
those are only needed once for each floating-point format, not more
than once for different types with the same format (so for example,
-ffinite-math-only maps foof128 to __fool_finite, while type-generic
macros end up calling e.g. __issignalingl for _Float128 arguments on
such platforms).
The preparation for this feature was done in previous patches, so this
one just needs to add the relevant makefile and header definitions,
and update macro definitions of libm_alias_ldouble_other_r, to turn on
the feature, and update documentation and ABI baselines.
Tested (a) for x86_64, (b) for aarch64, (c) with build-many-glibcs.py
with both GCC 6 and GCC 7.
* sysdeps/ieee754/ldbl-128/Makeconfig: New file.
* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
* sysdeps/ieee754/ldbl-128/float128-abi.h: Likewise.
* sysdeps/generic/libm-alias-ldouble.h: Include <bits/floatn.h>.
[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128]
(libm_alias_ldouble_other_r): Also create _Float128 alias.
* sysdeps/ieee754/ldbl-opt/libm-alias-ldouble.h: Include
<bits/floatn.h>.
[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128]
(libm_alias_ldouble_other_r): Also create _Float128 alias.
* manual/math.texi (Mathematics): Document additional architecture
support for _Float128.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
This patch rewrites aarch64 elf_machine_load_address to use special _DYNAMIC
symbol instead of _dl_start.
The static address of _DYNAMIC symbol is stored in the first GOT entry.
Here is the change which makes this solution work (part of binutils 2.24):
https://sourceware.org/ml/binutils/2013-06/msg00248.html
i386, x86_64 targets use the same method to do this as well.
The original implementation relies on a trick that R_AARCH64_ABS32 relocation
being resolved at link time and the static address fits in the 32bits.
However, in LP64, normally, the address is defined to be 64 bit.
Here is the C version one which should be portable in all cases.
* sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
_DYNAMIC symbol to calculate load address.
A performance regression was introduced by commit
84d74e427a "powerpc: Cleanup fenv_private.h".
In the powerpc implementation of SET_RESTORE_ROUND, there is the
following code in the "SET" function (slightly simplified):
--
old.fenv = fegetenv_register ();
new.l = (old.l & _FPU_MASK_TRAPS_RN) | r; (1)
if (new.l != old.l) (2)
{
if ((old.l & _FPU_ALL_TRAPS) != 0)
(void) __fe_mask_env ();
fesetenv_register (new.fenv); (3)
--
Line (1) sets the value of "new" to the current value of FPSCR,
but masks off summary bits, exceptions, non-IEEE mode, and
rounding mode, then ORs in the new rounding mode.
Line (2) compares this new value to the current value in order to
avoid setting a new value in the FPSCR (line (3)) unless something
significant has changed (exception enables or rounding mode).
The summary bits are not germane to the comparison, but are cleared
in "new" and preserved in "old", resulting in false negative
comparisons, and unnecessarily setting the FPSCR in those cases
with associated negative performance impacts.
The solution is to treat the summaries identically for "new" and "old":
- save them in SET
- leave them alone otherwise
- restore the saved values in RESTORE
Also minor changes:
- expand _FPU_MASK_RN to 64bit hex, to match other MASKs
- treat bit 52 (left-to-right) as reserved (since it is)
* sysdeps/powerpc/fpu/fenv_private.h (_FPU_MASK_TRAPS_RN):
(_FPU_MASK_FRAC_INEX_RET_CC): Fix masks to more properly handle
summary bits.
(_FPU_MASK_RN): Expand _FPU_MASK_RN to 64bit hex.
(_FPU_MASK_NOT_RN_NI): Treat bit 52 (left-to-right) as reserved.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch moves the generic definition from x86_64 init-arch
to a common header ifunc-init.h. No functional changes is expected.
Checked on a x86_64-linux-gnu build.
* sysdeps/generic/ifunc-init.h: New file.
* sysdeps/x86/init-arch.h: Use generic ifunc-init.h.
With support for _Float128 functions on platforms where that type has
the same ABI as long double, as well as on platforms where it is
ABI-distinct, those functions will need to be exported from glibc's
shared libraries at appropriate symbol versions in each case.
This patch avoids duplication of lists of symbols to export by moving
the symbols other than __* to math/Versions and stdlib/Versions.
There, they are conditional on <float128-abi.h> defining
FLOAT128_VERSION and a default version of that header is added that
does not define that macro. Enabling the float128 function aliases
will then include adding a sysdeps/ieee754/ldbl-128/float128-abi.h
that defines FLOAT128_VERSION to GLIBC_2.27. Symbols __* remain in
sysdeps/ieee754/float128/Versions; those symbols should be present
only once per floating-point format, not once per type.
Note that if any platforms currently lacking support for a type with
binary128 format get glibc support for such a type in future (whether
only as _Float128, or also as a new long double format), and new libm
functions (present for all types) have been added by then, additional
macros will be needed to allow such functions to get a version of the
form "GLIBC_2.28 if the platform had _Float128 support by then, or the
later version at which that platform had _Float128 support added".
This is not however a preexisting condition, but would have applied
equally to the existing support for _Float128 as an ABI-distinct
type. New all-type libm functions should just be added to the
appropriate symbol version (currently GLIBC_2.27) for all types, with
such special-case handling for _Float128 versions (and _Float64x as
well in future) waiting until someone actually wants to add support
for _Float128 to an existing platform after a release in which that
platform and a post-2.26 libm function had support but that platform
lacked _Float128 support.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch. Also tested in conjunction
with the remaining changes to enable float128 aliases.
* sysdeps/generic/float128-abi.h: New file.
* sysdeps/ieee754/float128/Versions (FLOAT128_VERSION): Move
non-__prefixed symbols to ....
* math/Versions: ... here. Include <float128-abi.h>.
* stdlib/Versions ... and here. Include <float128-abi.h>
This patch adds support for building strtof128, wcstof128, strtof128_l
and wcstof128_l as aliases, in the case of __HAVE_FLOAT128 &&
!__HAVE_DISTINCT_FLOAT128.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch. Also tested together with
changes to enable float128 aliases.
* stdlib/strtold.c: Include <bits/floatn.h>
[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (strtof128): Define
and later undefine as macro. Define as weak alias if
[!USE_WIDE_CHAR].
[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (wcstof128): Define
and later undefine as macro. Define as weak alias if
[USE_WIDE_CHAR].
* sysdeps/ieee754/ldbl-128/strtold_l.c [__HAVE_FLOAT128 &&
!__HAVE_DISTINCT_FLOAT128] (strtof128_l): Define and later
undefine as macro. Define as weak alias if [!USE_WIDE_CHAR].
[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (wcstof128_l):
Define and later undefine as macro. Define as weak alias if
[USE_WIDE_CHAR].
* sysdeps/ieee754/ldbl-64-128/strtold_l.c: Include
<bits/floatn.h>.
[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (strtof128_l):
Define and later undefine as macro. Define as weak alias if
[!USE_WIDE_CHAR].
[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (wcstof128_l):
Define and later undefine as macro. Define as weak alias if
[USE_WIDE_CHAR].
This patch makes ldbl-64-128/s_nextafterl.c restore the default
weak_alias definition and use libm_alias_ldouble_other (having
undefined and redefined weak_alias for the include of
ldbl-128/s_nextafterl.c, so the libm_alias_ldouble use in the latter
file is ineffective).
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch. Also tested together with
changes to enable float128 aliases.
* sysdeps/ieee754/ldbl-64-128/s_nextafterl.c (weak_alias):
Undefine and restore default definition. Use
libm_alias_ldouble_other.
Normally, TLS relocations against local symbols are optimised by the linker
to be absolute. However, gold does not do this, and so it is possible to
end up with, for example, R_SPARC_TLS_DTPMOD64 referring to a local symbol.
Since sym_map is left as null in elf_machine_rela for the special local
symbol case, the relocation handling thinks it has nothing to do, and so
the module gets left as 0. Havoc then ensues when the variable in question
is accessed.
Before this fix, the main_local_gold program would receive a SIGBUS on
sparc64, and SIGSEGV on powerpc32. With this fix applied, that test now
passes like the rest of them.
* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela):
Assign sym_map to be map for local symbols, as TLS relocations
use sym_map to determine whether the symbol is defined and to
extract the TLS information.
* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela): Likewise.
Fix the ifdef clause that was being used in the opposite way, setting
a wrong value of the carry bit.
This is also correcting 2 memory accesses that were mistakenly referring
to r0 while they were supposed to mean the immediate value 0.
[BZ #22142]
* stdio-common/tst-printf.c (fp_test): Add tests for DBL_MAX and
-DBL_MAX.
(do_test): Likewise.
* stdio-common/tst-printf.sh: Likewise.
* sysdeps/powerpc/powerpc64/power7/add_n.S: Invert the initial
ifdef clause in order to set the carry bit right. Replace r0 by
0 without changing the behavior.
This patch makes SPARC fabsl implementation use libm_alias_ldouble, to
prepare them for also defining _Float128 function aliases.
Tested with build-many-glibcs.py that installed stripped shared
libraries (sparc64-linux-gnu and sparcv9-linux-gnu) are unchanged by
the patch.
* sysdeps/sparc/sparc32/fpu/s_fabsl.c: Include
<libm-alias-ldouble.h>.
(fabsl): Define using libm_alias_ldouble.
* sysdeps/sparc/sparc64/fpu/s_fabsl.c: Include
<libm-alias-ldouble.h>.
(fabsl): Define using libm_alias_ldouble.
Testing with changes to enable _Float128 function aliases shows that
the libm_alias_ldouble_other usage in ldbl-opt/w_lgamma_compatl.c does
not in fact work. Furthermore, it is unnecessary; the relevant
aliases get created through w_lgammal_compat2.c. This patch removes
the problem code.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by the patch. Also tested in conjunction with
patches to enable _Float128 function aliases.
* sysdeps/ieee754/ldbl-opt/w_lgamma_compatl.c [BUILD_LGAMMA]:
Remove conditional code.
Testing with changes to enable _Float128 function aliases shows that
the libm_alias_ldouble_other usage in ldbl-opt/s_clog10l.c does not in
fact work, because __clog10l is defined with long_double_symbol rather
than as a normal C alias. This patch fixes this by renaming the
__clog10l__internal alias (not strictly necessary, but avoids a hack
with "__clog10l_interna" / "__clog10l__interna" as first argument to
libm_alias_ldouble_other) and using the renamed alias when calling
libm_alias_ldouble_other.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanges by the patch. Also tested in conjunction with
patches to enable _Float128 function aliases.
* sysdeps/ieee754/ldbl-opt/s_clog10l.c (__clog10l__internal):
Rename to __clog10_internal_l.
(__clog10_internal_l): Define aliases using
libm_alias_ldouble_other instead of using libm_alias_ldouble_other
with __clog10.
Current GLIBC has two ways to implement the single thread optimization
on syscalls to avoid calling the cancellation path: either by using
global variables (__{libc,pthread}_multiple_thread) or by accessing
the TCB field (defined by TLS_MULTIPLE_THREADS_IN_TCB). Both the
variables and the macros to acces its value are defined in the
architecture sysdep-cancel.h header.
This patch consolidates its definition on only one header,
sysdeps/unix/sysv/linux/sysdep-cancel.h, and adds a new define
(SINGLE_THREAD_BY_GLOBAL) which the architecture defines if it prefer
to use the global variables instead of the TCB field. This is an
optimization, so if the architecture does not define it, the TCB
method will be used as default.
Checked on x86_64-linux-gnu and on a build with major touched
ABIs (aarch64-linux-gnu, alpha-linux-gnu, arm-linux-gnueabihf,
hppa-linux-gnu, i686-linux-gnu, m68k-linux-gnu, microblaze-linux-gnu,
mips-linux-gnu, mips64-linux-gnu, powerpc-linux-gnu,
powerpc64le-linux-gnu, s390-linux-gnu, s390x-linux-gnu, sh4-linux-gnu,
sparcv9-linux-gnu, sparc64-linux-gnu, tilegx-linux-gnu).
* sysdeps/unix/sysv/linux/aarch64/sysdep-cancel.h: Remove file.
* sysdeps/unix/sysv/linux/alpha/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/mips/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/nios2/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sh/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/tile/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep-cancel.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h
(SINGLE_THREAD_BY_GLOBAL): Define.
* sysdeps/unix/sysv/linux/aarch64/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
Likewise.
* sysdeps/unix/sysv/linux/alpha/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
Likewise.
* sysdeps/unix/sysv/linux/arm/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
Likewise.
* sysdeps/unix/sysv/linux/hppa/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
Likewise.
* sysdeps/unix/sysv/linux/microblaze/sysdep.h
(SINGLE_THREAD_BY_GLOBAL): Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
Likewise.
This patch fixes ldbl-opt code to use generic libm alias macros in
preparation for getting _FloatN / _FloatNx aliases where appropriate.
Four functions are affected, that undefine and redefine alias macros
before including the implementations they wrap in such a way that
_FloatN / _FloatNx aliases would not appear. s_clog10l.c undefines
and redefined declare_mgen_alias, so just needs a
libm_alias_ldouble_other call added. w_exp10l_compat.c undefines and
redefines weak_alias, but in fact does not need to do so, since
math/w_exp10l_compat.c uses libm_alias_ldouble and does not use
weak_alias other than through that, so the undefines and redefines of
weak_alias are removed. w_lgamma_compatl.c and w_remainderl_compat.c
are made to use libm_alias_ldouble_other in conjunction with restoring
the original definition of weak_alias so this is effective.
Tested with build-many-glibcs.py. Installed stripped shared libraries
are unchanged by this patch.
* sysdeps/ieee754/ldbl-opt/s_clog10l.c: Use
libm_alias_ldouble_other.
* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (weak_alias): Do not
undefine and redefine.
[LIBM_SVID_COMPAT && !LONG_DOUBLE_COMPAT (libm, GLIBC_2_1)]
(exp10l): Do not define here.
* sysdeps/ieee754/ldbl-opt/w_lgamma_compatl.c [BUILD_LGAMMA]
(weak_alias): Undefine and redefine.
[BUILD_LGAMMA]: Use libm_alias_ldouble_other.
* sysdeps/ieee754/ldbl-opt/w_remainderl_compat.c
[LIBM_SVID_COMPAT] (weak_alias): Undefine and redefine here.
[LIBM_SVID_COMPAT]: Use libm_alias_ldouble_other.
Some libm functions are unable to use the generic alias macros such as
libm_alias_double because they have special symbol versioning
requirements for the main float, double or long double public names.
To facilitate adding _FloatN / _FloatNx function aliases in future,
it's still desirable to have generic macros those functions can use as
far as possible. This patch adds macros such as
libm_alias_double_other, which only define names for _FloatN /
_FloatNx aliases, not for float / double / long double. As present,
all these new macros do nothing, but they are called in the
appropriate places in macros such as libm_alias_double. This patch
also arranges for lgamma implementations, and the recently added
optimized float function implementations, to use the new macros to
make them ready for addition of _FloatN / _FloatNx aliases.
Tested for x86_64, and tested with build-many-glibcs.py that installed
stripped shared libraries are unchanged by this patch.
* sysdeps/generic/libm-alias-double.h (libm_alias_double_other_r):
New macro.
(libm_alias_double_other): Likewise.
(libm_alias_double_r): Use libm_alias_double_other_r.
* sysdeps/generic/libm-alias-float.h (libm_alias_float_other_r):
New macro.
(libm_alias_float_other): Likewise.
(libm_alias_float_r): Use libm_alias_float_other_r.
* sysdeps/generic/libm-alias-float128.h
(libm_alias_float128_other_r): New macro.
(libm_alias_float128_other): Likewise.
(libm_alias_float128_r): Use libm_alias_float128_other_r.
* sysdeps/generic/libm-alias-ldouble.h
(libm_alias_ldouble_other_r): New macro.
(libm_alias_ldouble_other): Likewise.
(libm_alias_ldouble_r): Use libm_alias_ldouble_other_r.
* sysdeps/ieee754/ldbl-opt/libm-alias-double.h
(libm_alias_double_other_r): New macro.
(libm_alias_double_other): Likewise.
(libm_alias_double_r): Use libm_alias_double_other_r.
* sysdeps/ieee754/ldbl-opt/libm-alias-ldouble.h
(libm_alias_ldouble_other_r): New macro.
(libm_alias_ldouble_other): Likewise.
(libm_alias_ldouble_r): Use libm_alias_ldouble_other_r.
* math/w_lgamma_main.c: Include <libm-alias-double.h>.
[!USE_AS_COMPAT]: Use libm_alias_double_other.
* math/w_lgammaf_main.c: Include <libm-alias-float.h>.
[!USE_AS_COMPAT]: Use libm_alias_float_other.
* math/w_lgammal_main.c: Include <libm-alias-ldouble.h>.
[!USE_AS_COMPAT]: Use libm_alias_ldouble_other.
* math/w_exp2f.c: Use libm_alias_float_other.
* math/w_expf.c: Likewise.
* math/w_log2f.c: Likewise.
* math/w_logf.c: Likewise.
* math/w_powf.c: Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c: Include <libm-alias-float.h>.
[!__exp2f]: Use libm_alias_float_other.
* sysdeps/ieee754/flt-32/e_expf.c: Include <libm-alias-float.h>.
[!__expf]: Use libm_alias_float_other.
* sysdeps/ieee754/flt-32/e_log2f.c: Include <libm-alias-float.h>.
[!__log2f]: Use libm_alias_float_other.
* sysdeps/ieee754/flt-32/e_logf.c: Include <libm-alias-float.h>.
[!__logf]: Use libm_alias_float_other.
* sysdeps/ieee754/flt-32/e_powf.c: Include <libm-alias-float.h>.
[!__powf]: Use libm_alias_float_other.
Continuing the use of generic macros for defining libm function
aliases, in preparation for adding more _FloatN / _FloatNx function
names, this patch makes the lgamma_r functions use such macros.
declare_mgen_alias_r becomes a standard macro in math-type-macros.h
instead of being locally defined in w_lgamma_r_templace.c. This in
turn must be defined by each math-type-macros-<type>.h. Rather than
providing an unused default in math-type-macros.h, that header is made
to give an error if math-type-macros-<type>.h failed to define
declare_mgen_alias or declare_mgen_alias_r. The compat lgamma_r
wrappers are updated similarly. The ldbl-opt versions are removed as
no longer needed.
Tested for x86_64, and with build-many-glibcs.py. Installed stripped
shared libraries are unchanged except for powerpc64le (where the usual
issue applies that an ldbl-opt long double function previously used
long_double_symbol unconditionally and now the symbol versions on
powerpc64le mean weak_alias is used instead, resulting in the same
symbol versions in the final shared library but still enough
difference in the input objects for that library not to be
byte-identical).
* sysdeps/generic/math-type-macros.h [!declare_mgen_alias]: Give
error. Remove default definition of declare_mgen_alias.
[!declare_mgen_alias_r]: Likewise.
* sysdeps/generic/math-type-macros-double.h
[!declare_mgen_alias_r] (declare_mgen_alias_r): New macro.
* sysdeps/generic/math-type-macros-float.h [!declare_mgen_alias_r]
(declare_mgen_alias_r): Likewise.
* sysdeps/generic/math-type-macros-float128.h
[!declare_mgen_alias_r] (declare_mgen_alias_r): Likewise.
* sysdeps/generic/math-type-macros-ldouble.h
[!declare_mgen_alias_r] (declare_mgen_alias_r): Likewise.
* math/w_lgamma_r_template.c (declare_mgen_alias_r_x): Remove
macro.
(declare_mgen_alias_r_s): Likewise.
(declare_mgen_alias_r): Likewise.
* math/w_lgamma_r_compat.c: Include <libm-alias-double.h>.
(lgamma_r): Define using libm_alias_double_r.
* math/w_lgammaf_r_compat.c: Include <libm-alias-float.h>.
(lgammaf_r): Define using libm_alias_float_r.
* math/w_lgammal_r_compat.c: Include <libm-alias-ldouble.h>.
(lgammal_r): Define using libm_alias_ldouble_r.
* sysdeps/ieee754/ldbl-opt/w_lgamma_r_compat.c: Remove file.
* sysdeps/ieee754/ldbl-opt/w_lgammal_r_compat.c: Likewise.
The ldbl-opt version of w_scalbln.c is not in fact needed; it handles
compat symbol versions for libc, but this file isn't built for libc,
only for libm. This patch removes this file.
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.
* sysdeps/ieee754/ldbl-opt/w_scalbln.c: Remove file.
This patch makes the ldbl-128 and ldbl-96 implementations of fma use
libm_alias_double.
Tested for x86_64, and tested with build-many-glibcs.py that installed
stripped shared libraries are unchanged by the patch.
* sysdeps/ieee754/ldbl-128/s_fma.c: Include <libm-alias-double.h>.
[!__fma] (fma): Define using libm_alias_double.
* sysdeps/ieee754/ldbl-96/s_fma.c: Include <libm-alias-double.h>.
[!__fma] (fma): Define using libm_alias_double.
This patch makes ldbl-128 functions use libm_alias_ldouble to define
function aliases. float128_private.h is updated accordingly. Most of
the ldbl-64-128 wrappers are removed as no longer needed with this
change (leaving those that involve versioning for functions in libc or
that shouldn't be exported from libm for _Float128 / _Float64x types
with the same format as long double).
Tested for x86_64, and tested with build-many-glibcs.py that installed
stripped shared libraries are unchanged by this patch.
* sysdeps/ieee754/float128/float128_private.h: Include
<libm-alias-ldouble.h> and <libm-alias-float128.h>.
(libm_alias_ldouble_r): Undefine and redefine.
* sysdeps/ieee754/ldbl-128/s_asinhl.c: Include
<libm-alias-ldouble.h>.
(asinhl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_atanl.c: Include
<libm-alias-ldouble.h>.
(atanl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_cbrtl.c: Include
<libm-alias-ldouble.h>.
(cbrtl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_ceill.c: Include
<libm-alias-ldouble.h>.
(ceill): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_copysignl.c: Include
<libm-alias-ldouble.h>.
(copysignl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_cosl.c: Include
<libm-alias-ldouble.h>.
(cosl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_erfl.c: Include
<libm-alias-ldouble.h>.
(erfl): Define using libm_alias_ldouble.
(erfcl): Likewise.
* sysdeps/ieee754/ldbl-128/s_expm1l.c: Include
<libm-alias-ldouble.h>.
(expm1l): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_fabsl.c: Include
<libm-alias-ldouble.h>.
(fabsl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_floorl.c: Include
<libm-alias-ldouble.h>.
(floorl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Include
<libm-alias-ldouble.h>.
(fmal): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_frexpl.c: Include
<libm-alias-ldouble.h>.
(frexpl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_fromfpl.c (fromfpl): Define using
libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_fromfpl_main.c: Include
<libm-alias-ldouble.h>.
* sysdeps/ieee754/ldbl-128/s_fromfpxl.c (fromfpxl): Define using
libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_getpayloadl.c: Include
<libm-alias-ldouble.h>.
(getpayloadl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Include
<libm-alias-ldouble.h>.
(llrintl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Include
<libm-alias-ldouble.h>.
(llroundl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_logbl.c: Include
<libm-alias-ldouble.h>.
(logbl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Include
<libm-alias-ldouble.h>.
(lrintl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Include
<libm-alias-ldouble.h>.
(lroundl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_modfl.c: Include
<libm-alias-ldouble.h>.
(modfl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Include
<libm-alias-ldouble.h>.
(nearbyintl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_nextafterl.c: Include
<libm-alias-ldouble.h>.
(nextafterl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_nextupl.c: Include
<libm-alias-ldouble.h>.
(nextupl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_remquol.c: Include
<libm-alias-ldouble.h>.
(remquol): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_rintl.c: Include
<libm-alias-ldouble.h>.
(rintl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_roundevenl.c: Include
<libm-alias-ldouble.h>.
(roundevenl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_roundl.c: Include
<libm-alias-ldouble.h>.
(roundl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_setpayloadl.c (setpayloadl): Define
using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_setpayloadl_main.c: Include
<libm-alias-ldouble.h>.
* sysdeps/ieee754/ldbl-128/s_setpayloadsigl.c (setpayloadsigl):
Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_sincosl.c: Include
<libm-alias-ldouble.h>.
(sincosl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_sinl.c: Include
<libm-alias-ldouble.h>.
(sinl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_tanhl.c: Include
<libm-alias-ldouble.h>.
(tanhl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_tanl.c: Include
<libm-alias-ldouble.h>.
(tanl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_totalorderl.c: Include
<libm-alias-ldouble.h>.
(totalorderl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_totalordermagl.c: Include
<libm-alias-ldouble.h>.
(totalordermagl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_truncl.c: Include
<libm-alias-ldouble.h>.
(truncl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_ufromfpl.c (ufromfpl): Define using
libm_alias_ldouble.
* sysdeps/ieee754/ldbl-128/s_ufromfpxl.c (ufromfpxl): Define using
libm_alias_ldouble.
* sysdeps/ieee754/ldbl-64-128/s_copysignl.c: Include
<libm-alias-ldouble.h>.
(weak_alias): Do not undefine and redefine.
[IS_IN (libc)] (libm_alias_ldouble): Undefine and redefine.
(copysignl): Define with long_double_symbol only if [IS_IN
(libc)].
* sysdeps/ieee754/ldbl-64-128/s_frexpl.c: Include
<libm-alias-ldouble.h>.
(weak_alias): Do not undefine and redefine.
[IS_IN (libc)] (libm_alias_ldouble): Undefine and redefine.
(frexpl): Define with long_double_symbol only if [IS_IN (libc)].
* sysdeps/ieee754/ldbl-64-128/s_modfl.c: Include
<libm-alias-ldouble.h>.
(weak_alias): Do not undefine and redefine.
[IS_IN (libc)] (libm_alias_ldouble): Undefine and redefine.
(modfl): Define with long_double_symbol only if [IS_IN (libc)].
* sysdeps/ieee754/ldbl-64-128/s_asinhl.c: Remove file.
* sysdeps/ieee754/ldbl-64-128/s_atanl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_cbrtl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_ceill.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_cosl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_expm1l.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_fabsl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_floorl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_logbl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_remquol.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_roundl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_sincosl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_tanhl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_truncl.c: Likewise.
Various source files in ldbl-64-128 are redundant, because they wrap
files that no longer provide public symbols that need special
versioning (those symbols having moved to separate errno-setting
wrappers), or, in the case of w_scalblnl.c, because the type-generic
template now does everything required (it deals with symbol versioning
for use in libm, and this file is never built for libc anyway - the
compat scalbln* symbols in libc, as opposed to scalbn*, are only for
i386 and m68k and are aliases to the corresponding scalbn* symbols).
This patch removes those redundant files.
Tested with build-many-glibcs.py (for all ldbl-64-128 configurations)
that installed stripped shared libraries are unchanged by this patch.
* sysdeps/ieee754/ldbl-64-128/e_ilogbl.c: Remove file.
* sysdeps/ieee754/ldbl-64-128/s_log1pl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_scalblnl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_scalbnl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/w_scalblnl.c: Likewise.
Recent commit 59ba2d2b54 missed to add __memrchr_power8 in
ifunc list. Also handled discarding unwanted bytes for
unaligned inputs in power8 optimization.
2017-10-05 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
* sysdeps/powerpc/powerpc64/multiarch/memrchr-ppc64.c: Revert
back to powerpc32 file.
* sysdeps/powerpc/powerpc64/multiarch/memrchr.c
(memrchr): Add __memrchr_power8 to ifunc list.
* sysdeps/powerpc/powerpc64/power8/memrchr.S: Mask
extra bytes for unaligned inputs.
This patch makes ldbl-96 functions use libm_alias_ldouble to define
function aliases.
Tested for x86_64, and tested with build-many-glibcs.py that installed
stripped shared libraries are unchanged by the patch.
* sysdeps/ieee754/ldbl-96/s_asinhl.c: Include
<libm-alias-ldouble.h>.
(asinhl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_cbrtl.c: Include
<libm-alias-ldouble.h>.
(cbrtl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_copysignl.c: Include
<libm-alias-ldouble.h>.
(copysignl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_cosl.c: Include
<libm-alias-ldouble.h>.
(cosl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_erfl.c: Include
<libm-alias-ldouble.h>.
(erfl): Define using libm_alias_ldouble.
(erfcl): Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Include
<libm-alias-ldouble.h>.
(fmal): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_frexpl.c: Include
<libm-alias-ldouble.h>.
(frexpl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_fromfpl.c (fromfpl): Define using
libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_fromfpl_main.c: Include
<libm-alias-ldouble.h>.
* sysdeps/ieee754/ldbl-96/s_fromfpxl.c (fromfpxl): Define using
libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_getpayloadl.c: Include
<libm-alias-ldouble.h>.
(getpayloadl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Include
<libm-alias-ldouble.h>.
(llrintl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Include
<libm-alias-ldouble.h>.
(llroundl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Include
<libm-alias-ldouble.h>.
(lrintl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Include
<libm-alias-ldouble.h>.
(lroundl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_modfl.c: Include
<libm-alias-ldouble.h>.
(modfl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_nextupl.c: Include
<libm-alias-ldouble.h>.
(nextupl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_remquol.c: Include
<libm-alias-ldouble.h>.
(remquol): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_roundevenl.c: Include
<libm-alias-ldouble.h>.
(roundevenl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_roundl.c: Include
<libm-alias-ldouble.h>.
(roundl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_setpayloadl.c (setpayloadl): Define
using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_setpayloadl_main.c: Include
<libm-alias-ldouble.h>.
* sysdeps/ieee754/ldbl-96/s_setpayloadsigl.c: Include
<libm-alias-ldouble.h>.
(setpayloadsigl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_sincosl.c: Include
<libm-alias-ldouble.h>.
(sincosl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_sinl.c: Include
<libm-alias-ldouble.h>.
(sinl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_tanhl.c: Include
<libm-alias-ldouble.h>.
(tanhl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_tanl.c: Include
<libm-alias-ldouble.h>.
(tanl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_totalorderl.c: Include
<libm-alias-ldouble.h>.
(totalorderl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_totalordermagl.c: Include
<libm-alias-ldouble.h>.
(totalordermagl): Define using libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_ufromfpl.c (ufromfpl): Define using
libm_alias_ldouble.
* sysdeps/ieee754/ldbl-96/s_ufromfpxl.c (ufromfpxl): Define using
libm_alias_ldouble.
This is an optimized memmove implementation for the Qualcomm Falkor
processor core. Due to the way the falkor memcpy needs to be written,
code cannot be easily shared between memmove and memcpy like in case
of other aarch64 memcpy implementations due to which this routine is
separate. The underlying principle is the same as that of memcpy
where it tries to use registers with the same lower 4 bits for
fetching the same stream, thus optimizing hardware prefetcher
performance.
The memcpy copy loop copies 64 bytes at a time using the same register
pair since that's the way to train the hardware prefetcher on the
falkor core. memmove cannot quite do that since it needs to avoid
overlaps, so it does the next best thing, i.e. has a 32 byte loop with
a 32 byte end (prefetch a loop ahead to account for overlapping
locations) with register pairs that alias so that they hit the same
prefetcher. Due to this difference in loop size, they have to
currently be separate implementations but efforts are on to try and
get memmove to fall back into memcpy whenever it can without simply
duplicating all of the code.
Performance:
The routine fares around 20-25% better than the generic memmove for
most medium to large sizes (i.e. > 128 bytes) for the new walking
memmove benchmark (memmove-walk) with an unexplained regression
between 1K and 2K. The minor regression is something worth looking
into for us, but the remaining gains are significant enough that we
would like this included upstream as we looking into the cause for the
regression. Here is a snippet of the numbers as generated from the
microbenchmark by the compare_strings script. Comparisons are against
__memmove_generic:
Function: memmove
Variant: walk
__memmove_thunderx __memmove_falkor __memmove_generic
========================================================================================================================
<snip>
length=16384: 12508800.00 ( 6.09%) 11486800.00 ( 13.76%) 13319600.00
length=16400: 13614200.00 ( -0.67%) 11585000.00 ( 14.33%) 13523600.00
length=16385: 13448400.00 ( 0.10%) 11732700.00 ( 12.84%) 13461200.00
length=16399: 13594100.00 ( -0.22%) 11859600.00 ( 12.57%) 13564400.00
length=16386: 13211600.00 ( 1.13%) 11503800.00 ( 13.91%) 13362400.00
length=16398: 13218600.00 ( 2.12%) 11573200.00 ( 14.30%) 13504700.00
length=16387: 13510900.00 ( -0.37%) 11744200.00 ( 12.76%) 13461300.00
length=16397: 13603700.00 ( -0.15%) 11878200.00 ( 12.55%) 13583200.00
length=16388: 13461700.00 ( -0.13%) 11558000.00 ( 14.03%) 13444100.00
length=16396: 13517500.00 ( -0.03%) 11561300.00 ( 14.45%) 13513900.00
length=16389: 13534100.00 ( 0.17%) 11756800.00 ( 13.28%) 13556900.00
length=16395: 13585600.00 ( 0.11%) 11791800.00 ( 13.30%) 13601200.00
length=16390: 13480100.00 ( -0.13%) 11685500.00 ( 13.20%) 13462100.00
length=16394: 13529900.00 ( -0.23%) 11549800.00 ( 14.43%) 13498200.00
length=16391: 13595400.00 ( -0.26%) 11768200.00 ( 13.22%) 13560600.00
length=16393: 13567000.00 ( 0.20%) 11779700.00 ( 13.35%) 13594700.00
length=32768: 71308800.00 ( -6.53%) 50220800.00 ( 24.98%) 66939200.00
length=32784: 72100800.00 (-11.55%) 50114100.00 ( 22.47%) 64636300.00
length=32769: 71767000.00 ( -7.10%) 51238400.00 ( 23.54%) 67010000.00
length=32783: 70113700.00 (-40.95%) 51129000.00 ( -2.78%) 49744400.00
length=32770: 71367600.00 ( -6.52%) 50244700.00 ( 25.01%) 67000900.00
length=32782: 64366700.00 ( 4.71%) 50101400.00 ( 25.83%) 67545600.00
length=32771: 71440100.00 ( -6.51%) 51263900.00 ( 23.57%) 67074900.00
length=32781: 66993000.00 ( 0.34%) 51108300.00 ( 23.97%) 67220300.00
length=32772: 71443900.00 (-60.50%) 50062100.00 (-12.47%) 44512600.00
length=32780: 71759100.00 ( -6.58%) 50263200.00 ( 25.35%) 67328600.00
length=32773: 71714900.00 (-33.21%) 51076600.00 ( 5.12%) 53835400.00
length=32779: 71756900.00 ( -6.56%) 51290800.00 ( 23.83%) 67337800.00
length=32774: 59689300.00 (-34.55%) 50068400.00 (-12.86%) 44363300.00
length=32778: 71847500.00 (-18.20%) 50084100.00 ( 17.61%) 60786500.00
length=32775: 71599300.00 ( -6.54%) 51278200.00 ( 23.70%) 67204800.00
length=32777: 71862900.00 (-60.85%) 51094000.00 (-14.36%) 44677900.00
length=65536: 282848000.00 ( -6.60%) 199187000.00 ( 24.93%) 265325000.00
length=65552: 243285000.00 (-41.61%) 198512000.00 (-15.54%) 171805000.00
length=65537: 255415000.00 (-23.47%) 202499000.00 ( 2.11%) 206858000.00
length=65551: 280122000.00 (-62.95%) 203349000.00 (-18.29%) 171911000.00
length=65538: 283676000.00 (-14.46%) 198368000.00 ( 19.96%) 247848000.00
length=65550: 275566000.00 (-51.76%) 198494000.00 ( -9.31%) 181581000.00
length=65539: 283699000.00 ( -6.58%) 203453000.00 ( 23.57%) 266195000.00
length=65549: 286572000.00 ( -6.65%) 202607000.00 ( 24.60%) 268712000.00
length=65540: 283710000.00 ( -6.59%) 199161000.00 ( 25.17%) 266160000.00
length=65548: 237573000.00 ( 11.48%) 198462000.00 ( 26.06%) 268395000.00
length=65541: 284150000.00 ( -6.58%) 203273000.00 ( 23.75%) 266600000.00
length=65547: 286250000.00 ( -6.70%) 202594000.00 ( 24.48%) 268263000.00
length=65542: 284167000.00 ( -6.60%) 199122000.00 ( 25.31%) 266584000.00
length=65546: 285656000.00 ( -6.59%) 198443000.00 ( 25.95%) 268002000.00
length=65543: 284600000.00 ( -6.58%) 203247000.00 ( 23.89%) 267030000.00
length=65545: 285665000.00 ( -6.40%) 202575000.00 ( 24.55%) 268472000.00
<snip>
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
memmove_falkor.
* sysdeps/aarch64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Likewise.
* sysdeps/aarch64/multiarch/memmove.c: Likewise.
* sysdeps/aarch64/multiarch/memmove_falkor.S: New file.
glibc has an add-ons mechanism to allow additional software to be
integrated into the glibc build. Such add-ons may be within the glibc
source tree, or outside it at a path passed to the --enable-add-ons
configure option.
localedata and crypt were once add-ons, distributed in separate
release tarballs, but long since stopped using that mechanism.
Linuxthreads was always an add-on. Ports spent some time as an add-on
with separate release tarballs, then was first moved into the glibc
source tree, then had its sysdeps files moved into the main sysdeps
hierarchy so the add-ons mechanism was no longer used. NPTL spent
some time as an add-on in the main glibc tree before stopping using
the add-on mechanism. libidn used to have separate release tarballs
but no longer does so, but still uses the add-ons mechanism within the
glibc source tree. Various other software has supported building with
the add-ons mechanism at times in the past, but I don't think any is
still widely used.
Add-ons involve significant, little-used complexity in the glibc build
system, and make it hard to understand what the space of possible
glibc configurations is. This patch removes the add-ons mechanism.
libidn is now built via the Subdirs mechanism to cause any
configuration using sysdeps/unix/inet to build libidn; HAVE_LIBIDN
(which effectively means shared libraries are available) is now
defined via sysdeps/unix/inet/configure. Various references to
add-ons around the source tree are removed (in the case of maint.texi,
the example list of sysdeps directories is still very out of date).
Externally maintained ports should now put their files in the normal
sysdeps directory structure rather than being arranged as add-ons;
they probably need to change e.g. elf.h anyway, rather than actually
being able to work just as a drop-in subtree. Hurd libpthread should
be arranged similarly to NPTL, so some files might go in a
hurd-pthreads (or similar) top-level directory in glibc, while sysdeps
files should go in the normal sysdeps directory structure (possibly in
hurd or hurd-pthreads subdirectories, just as there are nptl
subdirectories in the sysdeps tree).
Tested for x86_64, and with build-many-glibcs.py.
* configure.ac (--enable-add-ons): Remove option.
(machine): Do not mention add-ons in comment.
(LIBC_PRECONFIGURE): Likewise.
(add_ons): Remove variable and sanity checks and logic to locate
add-ons.
(add_ons_automatic): Remove variable.
(configured_add_ons): Likewise.
(add_ons_sfx): Likewise.
(add_ons_pfx): Likewise.
(add_on_subdirs): Likewise.
(sysnames_add_ons): Likewise. Remove loop over add-ons and
consideration of add-ons in Implies handling.
(sysdeps_add_ons): Likewise.
* configure: Regenerated.
* libidn/configure.ac: Remove.
* libidn/configure: Likewise.
* sysdeps/unix/inet/configure.ac: New file.
* sysdeps/unix/inet/configure: New generated file.
* sysdeps/unix/inet/Subdirs: Add libidn.
* Makeconfig (sysdeps-srcdirs): Remove variable.
(+sysdep_dirs): Do not include $(sysdeps-srcdirs).
($(common-objpfx)config.status): Do not depend on add-on files.
($(common-objpfx)shlib-versions.v.i): Do not mention add-ons in
comment.
(all-subdirs): Do not include $(add-on-subdirs).
* Makefile (dist-prepare): Do not use $(sysdeps-add-ons).
* config.make.in (add-ons): Remove variable.
(add-on-subdirs): Likewise.
(sysdeps-add-ons): Likewise.
* manual/Makefile (add-chapters): Remove.
($(objpfx)texis): Do not depend on $(add-chapters).
(nonexamples): Do not handle $(add-chapters).
(examples): Do not handle $(add-ons).
(chapters.% top-menu.%): Do not pass '$(add-chapters)' to
libc-texinfo.sh.
* manual/install.texi (Installation): Do not mention add-ons.
(--enable-add-ons): Do not document configure option.
* INSTALL: Regenerated.
* manual/libc-texinfo.sh: Do not handle $2 add-ons argument.
* manual/maint.texi (Hierarchy Conventions): Do not mention
add-ons.
* scripts/build-many-glibcs.py (Glibc.build_glibc): Do not use
--enable-add-ons.
* scripts/gen-sorted.awk: Do not handle Subdirs files from
add-ons.
* scripts/test-installation.pl: Do not handle glibc-compat add-on.
* sysdeps/nptl/Makeconfig: Do not mention add-ons in comment.