The ldbl-128ibm implementation of roundl is only correct in
round-to-nearest mode (in other modes, there are incorrect results and
overflow exceptions in some cases). This patch reimplements it along
the lines used for floorl, ceill and truncl, using __round on the high
part, and on the low part if the high part is an integer, and then
adjusting in the cases where this is incorrect.
Tested for powerpc.
[BZ #19594]
* sysdeps/ieee754/ldbl-128ibm/s_roundl.c (__roundl): Use __round
on high and low parts then adjust result and use
ldbl_canonicalize_int if needed.
The ldbl-128ibm implementation of truncl is only correct in
round-to-nearest mode (in other modes, there are incorrect results and
overflow exceptions in some cases). It is also unnecessarily
complicated, rounding both high and low parts to the nearest integer
and then adjusting for the semantics of trunc, when it seems more
natural to take the truncation of the high part (__trunc optimized
inline versions can be used), and the floor or ceiling of the low part
(depending on the sign of the high part) if the high part is an
integer, as was done for floorl and ceill. This patch makes it use
that simpler approach.
Tested for powerpc.
[BZ #19593]
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Use __trunc
on high part and __floor or __ceil on low part then use
ldbl_canonicalize_int if needed.
The ldbl-128ibm implementation of ceill is only correct in
round-to-nearest mode (in other modes, there are incorrect results and
overflow exceptions in some cases). It is also unnecessarily
complicated, rounding both high and low parts to the nearest integer
and then adjusting for the semantics of ceil, when it seems more
natural to take the ceiling of the high part (__ceil optimized inline
versions can be used), and that of the low part if the high part is an
integer, as was done for floorl. This patch makes it use that simpler
approach.
Tested for powerpc.
[BZ #19592]
* sysdeps/ieee754/ldbl-128ibm/s_ceill.c (__ceill): Use __ceil on
high and low parts then use ldbl_canonicalize_int if needed.
The ldbl-128ibm implementation of floorl is only correct in
round-to-nearest mode (in other modes, there are incorrect results and
overflow exceptions in some cases going beyond the incorrect signs of
zero results noted in bug 17899). It is also unnecessarily
complicated, rounding both high and low parts to the nearest integer
and then adjusting for the semantics of floor, when it seems more
natural to take the floor of the high part (__floor optimized inline
versions can be used), and that of the low part if the high part is an
integer. This patch makes it use that simpler approach, with a
canonicalization that works in all rounding modes (given that the only
way the result can be noncanonical is if taking the floor of a
negative noninteger low part increased its exponent).
Tested for powerpc, where over a thousand failures are removed from
test-ldouble.out (floorl problems affect many powl tests).
[BZ #17899]
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h (ldbl_canonicalize_int):
New function.
* sysdeps/ieee754/ldbl-128ibm/s_floorl.c (__floorl): Use __floor
on high and low parts then use ldbl_canonicalize_int if needed.
As discussed in
https://sourceware.org/ml/libc-alpha/2015-10/msg00403.html
the setting of _STRING_ARCH_unaligned currently controls the external
GLIBC ABI as well as selecting the use of unaligned accesses withing
GLIBC.
Since _STRING_ARCH_unaligned was recently changed for AArch64, this
would potentially break the ABI in GLIBC 2.23, so split the uses and add
_STRING_INLINE_unaligned to select the string ABI. This setting must be
fixed for each target, while _STRING_ARCH_unaligned may be changed from
release to release. _STRING_ARCH_unaligned is used unconditionally in
glibc. But <bits/string.h>, which defines _STRING_ARCH_unaligned, isn't
included with -Os. Since _STRING_ARCH_unaligned is internal to glibc and
may change between glibc releases, it should be made private to glibc.
_STRING_ARCH_unaligned should defined in the new string_private.h heade
file which is included unconditionally from internal <string.h> for glibc
build.
[BZ #19462]
* bits/string.h (_STRING_ARCH_unaligned): Renamed to ...
(_STRING_INLINE_unaligned): This.
* include/string.h: Include <string_private.h>.
* string/bits/string2.h: Replace _STRING_ARCH_unaligned with
_STRING_INLINE_unaligned.
* sysdeps/aarch64/bits/string.h (_STRING_ARCH_unaligned): Removed.
(_STRING_INLINE_unaligned): New.
* sysdeps/aarch64/string_private.h: New file.
* sysdeps/generic/string_private.h: Likewise.
* sysdeps/m68k/m680x0/m68020/string_private.h: Likewise.
* sysdeps/s390/string_private.h: Likewise.
* sysdeps/x86/string_private.h: Likewise.
* sysdeps/m68k/m680x0/m68020/bits/string.h
(_STRING_ARCH_unaligned): Renamed to ...
(_STRING_INLINE_unaligned): This.
* sysdeps/s390/bits/string.h (_STRING_ARCH_unaligned): Renamed
to ...
(_STRING_INLINE_unaligned): This.
* sysdeps/sparc/bits/string.h (_STRING_ARCH_unaligned): Renamed
to ...
(_STRING_INLINE_unaligned): This.
* sysdeps/x86/bits/string.h (_STRING_ARCH_unaligned): Renamed
to ...
(_STRING_INLINE_unaligned): This.
Since libmvec_nonshared.a may be linked into shared objects, ALIAS_IMPL
should use PIC relocation.
[BZ #19590]
* sysdeps/x86_64/fpu/svml_finite_alias.S (ALIAS_IMPL): Use PIC
relocation.
The handling of negative offsets in MIPS mmap is inconsistent with
other architectures, as shown by failure of the test
posix/tst-mmap-offset for o32 and n32. The MIPS mmap syscall uses a
signed argument and does a signed arithmetic shift on it, whereas the
glibc semantics expected by that test are for the offset to be
considered as a large positive offset. This patch makes MIPS
consistent with other architectures as far as possible by using the
mmap2 syscall on o32 (#including the generic implementation), and
making mmap not an alias for mmap64 for n32, with a custom
implementation for n32 that zero-extends the offset argument to 64-bit
before calling the mmap syscall.
Tested for MIPS64 (o32, n32, n64).
[BZ #19550]
* sysdeps/unix/sysv/linux/mips/mips32/mmap.c: New file.
* sysdeps/unix/sysv/linux/mips/mips64/mmap64.c: Move to ....
* sysdeps/unix/sysv/linux/mips/mips64/n64/mmap64.c: ... here.
* sysdeps/unix/sysv/linux/mips/mips64/n32/mmap.c: New file.
* sysdeps/unix/sysv/linux/mips/mips64/n32/syscalls.list (mmap64):
New syscall entry.
* sysdeps/unix/sysv/linux/mips/mips64/n64/syscalls.list (mmap):
New syscall entry.
* sysdeps/unix/sysv/linux/mips/mips64/syscalls.list (mmap): Remove
syscall entry.
The MIPS memcpy optimizations at
<https://sourceware.org/ml/libc-alpha/2015-10/msg00597.html>
introduced a bug causing many string function tests to fail with
segfaults for n32 and n64:
FAIL: string/stratcliff
FAIL: string/test-bcopy
FAIL: string/test-memccpy
FAIL: string/test-memcmp
FAIL: string/test-memcpy
FAIL: string/test-memmove
FAIL: string/test-mempcpy
FAIL: string/test-stpncpy
FAIL: string/test-strncmp
FAIL: string/test-strncpy
(Some failures in other directories could also be caused by this bug.)
The problem is that after the check for whether a word of input is
left that can be copied as a word before moving to byte copies, a load
can occur in the branch delay slot, resulting in a segfault if we are
at the end of a page and the following page is unmapped. I don't see
how this would have passed the tests as reported in the original patch
posting (different kernel configurations affecting the code setting up
unmapped pages, maybe?), since the tests in question don't appear to
have changed recently.
This patch moves a later instruction into the delay slot, as suggested
at <https://sourceware.org/ml/libc-alpha/2016-01/msg00584.html>.
Tested for n32 and n64.
2016-01-28 Steve Ellcey <sellcey@imgtec.com>
Joseph Myers <joseph@codesourcery.com>
* sysdeps/mips/memcpy.S (MEMCPY_NAME) [USE_DOUBLE]: Avoid word
load in branch delay slot when less than a word of input left.
Complement the addition of the required kernel support, present upstream
as from commit 2b5e869ecfcb3112f7e1267cb0328f3ff6d49b18 ("MIPS: ELF:
Interpret the NAN2008 file header flag") and released with Linux 4.5-rc1
on Jan 24th, 2016.
* sysdeps/unix/sysv/linux/mips/configure.ac: Set
`arch_minimum_kernel' to 4.5.0 if 2008 NaN encoding is used.
* sysdeps/unix/sysv/linux/mips/configure: Regenerate.
The changes to restrict implementation-namespace symbol aliases such
as __finitel to compat symbols used code for __finitel in libm
analogous to that for __finitel in libc. However, the versions for
the two symbols are actually different, GLIBC_2.0 in libc and
GLIBC_2.1 in libm. This patch fixes the handling of the libm compat
symbol.
Tested for mips (o32), where it fixes an ABI test failure.
* sysdeps/ieee754/dbl-64/s_finite.c
[NO_LONG_DOUBLE && LDBL_CLASSIFY_COMPAT] (__finitel): Define
compat symbol at version GLIBC_2_1 and use GLIBC_2_1 in
SHLIB_COMPAT condition for libm, not GLIBC_2_0.
* sysdeps/ieee754/dbl-64/wordsize-64/s_finite.c
[NO_LONG_DOUBLE && LDBL_CLASSIFY_COMPAT] (__finitel): Likewise.
Testing for powerpc-nofpu showed that localplt.data was out of date.
Two new soft-fp functions showed up in the list: __gtsf2 and
__unordsf2; this patch adds these as optional. __signbit and
__signbitl no longer appear as local PLT entries; given the move to
__builtin_signbit* for all GCC versions supported for building glibc
(and given the use of the type-generic signbit macro within glibc),
those can safely be removed from the list, which this patch does.
Tested for powerpc-nofpu.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/localplt.data
(__gtsf2): Add as optional for libc.so.
(__unordsf2): Likewise.
(__signbit): Remove for libc.so.
(__signbitl): Likewise.
On running tests after from-scratch ulps regeneration, I found that
some libm tests failed with ulps in excess of those recorded in the
from-scratch regeneration, which should never happen unless those ulps
exceed the limit on ulps that can go in libm-test-ulps files.
Failure: Test: atan2_upward (inf, -inf)
Result:
is: 2.35619498e+00 0x1.2d97ccp+1
should be: 2.35619450e+00 0x1.2d97c8p+1
difference: 4.76837159e-07 0x1.000000p-21
ulp : 2.0000
max.ulp : 1.0000
Maximal error of `atan2_upward'
is : 2 ulp
accepted: 1 ulp
Failure: Test: carg_upward (-inf + inf i)
Result:
is: 2.35619498e+00 0x1.2d97ccp+1
should be: 2.35619450e+00 0x1.2d97c8p+1
difference: 4.76837159e-07 0x1.000000p-21
ulp : 2.0000
max.ulp : 1.0000
Maximal error of `carg_upward'
is : 2 ulp
accepted: 1 ulp
The problem comes from the addition of tests for the finite-math-only
versions of libm functions. Those tests share ulps with the default
function variants. make regen-ulps runs the default tests before the
finite-math-only tests, concatenating the resulting ulps before
feeding them to gen-libm-test.pl to generate a new libm-test-ulps
file. But gen-libm-test.pl always takes the last ulps value given for
any (function, type) pair. So, if the largest ulps for a function
come from non-finite inputs, a from-scratch regeneration loses those
ulps.
This patch fixes gen-libm-test.pl, in the case where there are
multiple ulps values for a (function, type) pair - which can only
happen as part of a regeneration - to take the largest ulps value
rather than the last one.
Tested for ARM / MIPS / powerpc-nofpu.
* math/gen-libm-test.pl (parse_ulps): Do not reduce
already-recorded ulps.
* sysdeps/arm/libm-test-ulps: Regenerated.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
I've regenerated ulps from scratch for s390/s390x.
All math testcases are passing afterwards.
ChangeLog:
* sysdeps/s390/fpu/libm-test-ulps: Regenerated.
I get some math test-failures on s390 for float/double/ldouble for
various lrint/lround functions like:
lrint (0x1p64): Exception "Inexact" set
lrint (-0x1p64): Exception "Inexact" set
lround (0x1p64): Exception "Inexact" set
lround (-0x1p64): Exception "Inexact" set
...
GCC emits "convert to fixed" instructions for casting floating point
values to integer values. These instructions raise invalid and inexact
exceptions if the floating point value exceeds the integer type ranges.
This patch enables the various FIX_DBL_LONG_CONVERT_OVERFLOW macros in
order to avoid a cast from floating point to integer type and raise the
invalid exception with feraiseexcept.
The ldbl-128 rint/round functions are now using the same logic.
ChangeLog:
[BZ #19486]
* sysdeps/s390/fix-fp-int-convert-overflow.h: New File.
* sysdeps/generic/fix-fp-int-convert-overflow.h
(FIX_LDBL_LONG_CONVERT_OVERFLOW,
FIX_LDBL_LLONG_CONVERT_OVERFLOW): New define.
* sysdeps/arm/fix-fp-int-convert-overflow.h: Likewise.
* sysdeps/mips/mips32/fpu/fix-fp-int-convert-overflow.h:
Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c (__lrintl):
Avoid conversions to long int where inexact exceptions
could be raised.
* sysdeps/ieee754/ldbl-128/s_lroundl.c (__lroundl):
Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c (__llrintl):
Avoid conversions to long long int where inexact exceptions
could be raised.
* sysdeps/ieee754/ldbl-128/s_llroundl.c (__llroundl):
Likewise.
The previous barrier implementation did not fulfill the POSIX requirements
for when a barrier can be destroyed. Specifically, it was possible that
threads that haven't noticed yet that their round is complete still access
the barrier's memory, and that those accesses can happen after the barrier
has been legally destroyed.
The new algorithm does not have this issue, and it avoids using a lock
internally.
GLIBC benchtest testcases shows SSE2_Unaligned based implementations
are performing faster compare to SSE2 based implementations for
routines: strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy
and strstr. Flag index_Fast_Unaligned_Load is set for Excavator family
0x15h CPU's. This makes SSE2_Unaligned based implementations as
default for these routines.
[BZ #19467]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
index_Fast_Unaligned_Load flag for Excavator family CPUs.
Preparation for gcc -fsplit-stack support (gcc bug #68191). The new
field is basically identical to the one on x86. Its TCB offset needs
to be constant, as it'll be hardcoded in gcc.
ChangeLog:
* sysdeps/s390/nptl/tls.h (struct tcbhead_t): Add __private_ss field.
This patch adds some new header definitions from Linux 4.4:
* MCL_ONFAULT is added to bits/mman.h / bits/mman-linux.h (this was
already done for hppa).
* PTRACE_SECCOMP_GET_FILTER is added to sys/ptrace.h. Along with it,
the older PTRACE_GETSIGMASK and PTRACE_SETSIGMASK, added in Linux
3.11 but missed at the time, are also added.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).
* bits/mman-linux.h [!MCL_CURRENT] (MCL_ONFAULT): New macro.
* sysdeps/unix/sysv/linux/alpha/bits/mman.h (MCL_ONFAULT):
Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/mman.h (MCL_ONFAULT):
Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/mman.h (MCL_ONFAULT):
Likewise.
* sysdeps/unix/sysv/linux/sys/ptrace.h (PTRACE_GETSIGMASK): New
enum constant and macro.
(PTRACE_SETSIGMASK): Likewise.
(PTRACE_SECCOMP_GET_FILTER): Likewise.
* sysdeps/unix/sysv/linux/aarch64/sys/ptrace.h
(PTRACE_GETSIGMASK): Likewise.
(PTRACE_SETSIGMASK): Likewise.
(PTRACE_SECCOMP_GET_FILTER): Likewise.
* sysdeps/unix/sysv/linux/ia64/sys/ptrace.h (PTRACE_GETSIGMASK):
Likewise.
(PTRACE_SETSIGMASK): Likewise.
(PTRACE_SECCOMP_GET_FILTER): Likewise.
* sysdeps/unix/sysv/linux/powerpc/sys/ptrace.h
(PTRACE_GETSIGMASK): Likewise.
(PTRACE_SETSIGMASK): Likewise.
(PTRACE_SECCOMP_GET_FILTER): Likewise.
* sysdeps/unix/sysv/linux/s390/sys/ptrace.h (PTRACE_GETSIGMASK):
Likewise.
(PTRACE_SETSIGMASK): Likewise.
(PTRACE_SECCOMP_GET_FILTER): Likewise.
* sysdeps/unix/sysv/linux/sparc/sys/ptrace.h (PTRACE_GETSIGMASK):
Likewise.
(PTRACE_SETSIGMASK): Likewise.
(PTRACE_SECCOMP_GET_FILTER): Likewise.
* sysdeps/unix/sysv/linux/tile/sys/ptrace.h (PTRACE_GETSIGMASK):
Likewise.
(PTRACE_SETSIGMASK): Likewise.
(PTRACE_SECCOMP_GET_FILTER): Likewise.
Work around a GCC behavior with hardware transactional memory built-ins.
GCC doesn't treat the PowerPC transactional built-ins as compiler
barriers, moving instructions past the transaction boundaries and
altering their atomicity.
The attached patch fixes dladdr on hppa.
Instead of using the generic version of _dl_lookup_address, we use an
implementation more or less modeled after __canonicalize_funcptr_for_compare()
in gcc. The function pointer is analyzed and if it points to the
trampoline used to call _dl_runtime_resolve just before the global
offset table, then we call _dl_fixup to resolve the function pointer.
Then, we return the instruction pointer from the first word of the
descriptor.
The change fixes the testcase provided in [BZ #19415] and the Debian
nss package now builds successfully.
We define __ASSUME_ST_INO_64_BIT by default for Linux targets, and then
undef it for alpha/sh targets. But the code that uses it looks at its
value (as 0/1) rather than whether it's defined (like all other assume
knobs). Change the code to see if it's defined to fix build Wundef build
errors for alpha/sh.
Since internal unistd functions are only used internally in ld.so and
libc.so, they can be made hidden. __close, __getcwd, __getpid,
__libc_read and __libc_write can't be hidden in ld.so on Hurd since they
will be preempted by the ones in libc.so after bootstrap.
[BZ #19122]
* include/unistd.h [IS_IN (rtld)]: Include <dl-unistd.h>.
* sysdeps/generic/dl-unistd.h: New file.
* sysdeps/mach/hurd/dl-unistd.h: Likewise.
Since ld.so internal mmap functions are only used internally in ld.so,
they can be made hidden. Don't hide __mmap on Hurd, since __mmap in
ld.so will be preempted by the one in libc.so after bootstrap.
[BZ #19122]
* include/sys/mman.h [IS_IN (rtld)]: Include <dl-mman.h>.
* sysdeps/generic/dl-mman.h: New file.
* sysdeps/mach/hurd/dl-mman.h: Likewise.
When looking at the code generated for pow() on ppc64 I noticed quite
a few sign extensions. Making the array indices unsigned reduces the
number of sign extensions from 24 to 7.
Tested for powerpc64le and x86_64.