This patch adds some randomly-generated tests of atanh that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of atanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of atan that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of atan.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of cbrt that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of cbrt.
* math/auto-libm-test-out: Regenerated.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
This patch adds some randomly-generated tests of cabs that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of cabs.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The dbl-64 implementation of atan2 does computations that expect to
run in round-to-nearest mode, and in other modes the errors can
accumulate to more than the maximum accepted 9ulp. This patch makes
it use FE_TONEAREST internally, similar to other functions with such
issues. Tests that previously produced large errors are added for
atan2 and the closely related carg, clog and clog10 functions.
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #18210]
[BZ #18211]
* sysdeps/ieee754/dbl-64/e_atan2.c: Include <fenv.h>.
(__ieee754_atan2): Set FE_TONEAREST mode for internal
computations.
* math/auto-libm-test-in: Add more tests of atan2, carg, clog and
clog10.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The dbl-64 implementation of atan does computations that expect to run
in round-to-nearest mode, and in other modes the errors can accumulate
to more than the maximum accepted 9ulp. This patch makes it use
FE_TONEAREST internally, similar to other functions with such issues.
Tested for x86_64 and x86; no ulps updates needed.
[BZ #18197]
* sysdeps/ieee754/dbl-64/s_atan.c: Include <fenv.h>.
(atan): Set FE_TONEAREST mode for internal computations.
* math/auto-libm-test-in: Add more tests of atan.
* math/auto-libm-test-out: Regenerated.
On Alpha and IA-64, fexcept_t is unsigned long. But all the values
fit within an int, so the cast is ok for printing. All other hosts
use unsigned int or unsigned short already.
Trimming heaps is a balance between saving memory and the system overhead
required to update page tables and discard allocated pages. The malloc
option M_TRIM_THRESHOLD is a tunable that users are meant to use to decide
where this balance point is but it is only applied to the main arena.
For scalability reasons, glibc malloc has per-thread heaps but these are
shrunk with madvise() if there is one page free at the top of the heap.
In some circumstances this can lead to high system overhead if a thread
has a control flow like
while (data_to_process) {
buf = malloc(large_size);
do_stuff();
free(buf);
}
For a large size, the free() will call madvise (pagetable teardown, page
free and TLB flush) every time followed immediately by a malloc (fault,
kernel page alloc, zeroing and charge accounting). The kernel overhead
can dominate such a workload.
This patch allows the user to tune when madvise gets called by applying
the trim threshold to the per-thread heaps and using similar logic to the
main arena when deciding whether to shrink. Alternatively if the dynamic
brk/mmap threshold gets adjusted then the new values will be obeyed by
the per-thread heaps.
Bug 17195 was a test case motivated by a problem encountered in scientific
applications written in python that performance badly due to high page fault
overhead. The basic operation of such a program was posted by Julian Taylor
https://sourceware.org/ml/libc-alpha/2015-02/msg00373.html
With this patch applied, the overhead is eliminated. All numbers in this
report are in seconds and were recorded by running Julian's program 30
times.
pyarray
glibc madvise
2.21 v2
System min 1.81 ( 0.00%) 0.00 (100.00%)
System mean 1.93 ( 0.00%) 0.02 ( 99.20%)
System stddev 0.06 ( 0.00%) 0.01 ( 88.99%)
System max 2.06 ( 0.00%) 0.03 ( 98.54%)
Elapsed min 3.26 ( 0.00%) 2.37 ( 27.30%)
Elapsed mean 3.39 ( 0.00%) 2.41 ( 28.84%)
Elapsed stddev 0.14 ( 0.00%) 0.02 ( 82.73%)
Elapsed max 4.05 ( 0.00%) 2.47 ( 39.01%)
glibc madvise
2.21 v2
User 141.86 142.28
System 57.94 0.60
Elapsed 102.02 72.66
Note that almost a minutes worth of system time is eliminted and the
program completes 28% faster on average.
To illustrate the problem without python this is a basic test-case for
the worst case scenario where every free is a madvise followed by a an alloc
/* gcc bench-free.c -lpthread -o bench-free */
static int num = 1024;
void __attribute__((noinline,noclone)) dostuff (void *p)
{
}
void *worker (void *data)
{
int i;
for (i = num; i--;)
{
void *m = malloc (48*4096);
dostuff (m);
free (m);
}
return NULL;
}
int main()
{
int i;
pthread_t t;
void *ret;
if (pthread_create (&t, NULL, worker, NULL))
exit (2);
if (pthread_join (t, &ret))
exit (3);
return 0;
}
Before the patch, this resulted in 1024 calls to madvise. With the patch applied,
madvise is called twice because the default trim threshold is high enough to avoid
this.
This a more complex case where there is a mix of frees. It's simply a different worker
function for the test case above
void *worker (void *data)
{
int i;
int j = 0;
void *free_index[num];
for (i = num; i--;)
{
void *m = malloc ((i % 58) *4096);
dostuff (m);
if (i % 2 == 0) {
free (m);
} else {
free_index[j++] = m;
}
}
for (; j >= 0; j--)
{
free(free_index[j]);
}
return NULL;
}
glibc 2.21 calls malloc 90305 times but with the patch applied, it's
called 13438. Increasing the trim threshold will decrease the number of
times it's called with the option of eliminating the overhead.
ebizzy is meant to generate a workload resembling common web application
server workloads. It is threaded with a large working set that at its core
has an allocation, do_stuff, free loop that also hits this case. The primary
metric of the benchmark is records processed per second. This is running on
my desktop which is a single socket machine with an I7-4770 and 8 cores.
Each thread count was run for 30 seconds. It was only run once as the
performance difference is so high that the variation is insignificant.
glibc 2.21 patch
threads 1 10230 44114
threads 2 19153 84925
threads 4 34295 134569
threads 8 51007 183387
Note that the saving happens to be a concidence as the size allocated
by ebizzy was less than the default threshold. If a different number of
chunks were specified then it may also be necessary to tune the threshold
to compensate
This is roughly quadrupling the performance of this benchmark. The difference in
system CPU usage illustrates why.
ebizzy running 1 thread with glibc 2.21
10230 records/s 306904
real 30.00 s
user 7.47 s
sys 22.49 s
22.49 seconds was spent in the kernel for a workload runinng 30 seconds. With the
patch applied
ebizzy running 1 thread with patch applied
44126 records/s 1323792
real 30.00 s
user 29.97 s
sys 0.00 s
system CPU usage was zero with the patch applied. strace shows that glibc
running this workload calls madvise approximately 9000 times a second. With
the patch applied madvise was called twice during the workload (or 0.06
times per second).
2015-02-10 Mel Gorman <mgorman@suse.de>
[BZ #17195]
* malloc/arena.c (free): Apply trim threshold to per-thread heaps
as well as the main arena.
Silvermont and Knights Landing have a modular system design with two cores
sharing an L2 cache. If more than 2 cores are detected to shared L2 cache,
it should be adjusted for Silvermont and Knights Landing.
[BZ #18185]
* sysdeps/x86_64/cacheinfo.c (init_cacheinfo): Limit threads
sharing L2 cache to 2 for Silvermont/Knights Landing.
Linkers in some versions of binutils 2.25 and 2.26 don't support protected
data symbol with error messsage like:
/usr/bin/ld: copy reloc against protected `bar' is invalid
/usr/bin/ld: failed to set dynamic section sizes: Bad value
We check if linker supports copy reloc against protected data symbol to
avoid running the test if linker is broken.
[BZ #17711]
* config.make.in (have-protected-data): New.
* configure.ac: Check linker support for protected data symbol.
* configure: Regenerated.
* elf/Makefile (modules-names): Add tst-protected1moda and
tst-protected1modb if $(have-protected-data) is yes.
(tests): Add tst-protected1a and tst-protected1b if
$(have-protected-data) is yes.
($(objpfx)tst-protected1a): New.
($(objpfx)tst-protected1b): Likewise.
(tst-protected1modb.so-no-z-defs): Likewise.
* elf/tst-protected1a.c: New file.
* elf/tst-protected1b.c: Likewise.
* elf/tst-protected1mod.h: Likewise.
* elf/tst-protected1moda.c: Likewise.
* elf/tst-protected1modb.c: Likewise.
With copy relocation, address of protected data defined in the shared
library may be external. When there is a relocation against the
protected data symbol within the shared library, we need to check if we
should skip the definition in the executable copied from the protected
data. This patch adds ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA and defines
it for x86. If ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA isn't 0, do_lookup_x
will skip the data definition in the executable from copy reloc.
[BZ #17711]
* elf/dl-lookup.c (do_lookup_x): When UNDEF_MAP is NULL, which
indicates it is called from do_lookup_x on relocation against
protected data, skip the data definion in the executable from
copy reloc.
(_dl_lookup_symbol_x): Pass ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA,
instead of ELF_RTYPE_CLASS_PLT, to do_lookup_x for
EXTERN_PROTECTED_DATA relocation against STT_OBJECT symbol.
* sysdeps/generic/ldsodefs.h * (ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA):
New. Defined to 4 if DL_EXTERN_PROTECTED_DATA is defined,
otherwise to 0.
* sysdeps/i386/dl-lookupcfg.h (DL_EXTERN_PROTECTED_DATA): New.
* sysdeps/i386/dl-machine.h (elf_machine_type_class): Set class
to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA for R_386_GLOB_DAT.
* sysdeps/x86_64/dl-lookupcfg.h (DL_EXTERN_PROTECTED_DATA): New.
* sysdeps/x86_64/dl-machine.h (elf_machine_type_class): Set class
to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA for R_X86_64_GLOB_DAT.
IFUNC is difficult to correctly implement on any target needing a GOT
to support position independent code, due to the dependency on order
of dynamic relocations. ld.so should be changed to apply IFUNC
relocations last, globally, because without that it is actually
impossible to write an IFUNC resolver in C that works in all
situations. Case in point, vfork in libpthread.so is an IFUNC with
the resolver returning &__libc_vfork. (system and fork are similar.)
If another shared library, libA say, uses vfork then it is quite
possible that libpthread.so hasn't been dynamically relocated before
the unfortunate libA is dynamically relocated. In that case the GOT
entry for &__libc_vfork is still zero, so the IFUNC resolver returns
NULL. LD_BIND_NOW=1 results in libA PLT dynamic relocations being
applied using this NULL value and ld.so segfaults.
This patch hardens ld.so to not segfault on a NULL from an IFUNC
resolver. It also fixes a problem with undefined weak. If you leave
the plt entry as-is for undefined weak then if the entry is ever
called it will loop in ld.so rather than segfaulting.
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_fixup_plt):
Don't segfault if ifunc resolver returns a NULL. Do set plt to
zero for undefined weak.
(elf_machine_plt_conflict): Similarly.
This patch adds some randomly-generated tests of acosh, asinh and
atanh that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of acosh, asinh and
atanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds a randomly-generated test of asin that is observed to
increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add another test of asin.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
In the course of the work on six-argument syscalls I noticed that the
i386 lowlevellock.h contained some unused macro definitions (already
unused before my patch). This patch removes them.
Tested for x86 that installed stripped shared libraries are unchanged
by this patch.
* sysdeps/unix/sysv/linux/i386/lowlevellock.h (LLL_EBX_LOAD):
Remove macro.
(LLL_EBX_REG): Likewise.
(LLL_ENTER_KERNEL): Likewise.
This patch adds some randomly-generated tests of asin that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of asin.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch follows the approach outlined in
<https://sourceware.org/ml/libc-alpha/2015-03/msg00656.html> to
support six-argument syscalls from INTERNAL_SYSCALL for 32-bit x86,
making them call a function __libc_do_syscall that takes the syscall
number and three syscall arguments in the registers in which the
kernel expects them, along with a pointer to a structure containing
the other three arguments.
In turn, this allows the generic lowlevellock-futex.h to be used on
32-bit x86, so supporting lll_futex_timed_wait_bitset (and so allowing
FUTEX_CLOCK_REALTIME to be used in various cases, so fixing bug 18138
for 32-bit x86 and leaving hppa as the only architecture missing
lll_futex_timed_wait_bitset). The change to lowlevellock.h's
definition of SYS_futex is because the generic lowlevelloc-futex.h
ends up bringing in bits/syscall.h which defines SYS_futex to
__NR_futex, so resulting in redefinition errors. The revised
definition in lowlevellock.h is in line with what the x86_64 version
does.
__libc_do_syscall is only needed in libpthread at present (meaning
nothing special needs to be done to make it shared-only in most
libraries containing it, static in libc only, as on ARM).
Tested for 32-bit x86, with the glibc testsuite and with the test in
bug 18138. The failures seen
FAIL: nptl/tst-cleanupx4
FAIL: rt/tst-cpuclock2
are pre-existing.
[BZ #18138]
* sysdeps/unix/sysv/linux/i386/sysdep.h (struct
libc_do_syscall_args): New structure.
(INTERNAL_SYSCALL_MAIN_0): New macro.
(INTERNAL_SYSCALL_MAIN_1): Likewise.
(INTERNAL_SYSCALL_MAIN_2): Likewise.
(INTERNAL_SYSCALL_MAIN_3): Likewise.
(INTERNAL_SYSCALL_MAIN_4): Likewise.
(INTERNAL_SYSCALL_MAIN_5): Likewise.
(INTERNAL_SYSCALL_MAIN_6): Likewise. Call __libc_do_syscall.
(INTERNAL_SYSCALL): Define to use INTERNAL_SYSCALL_MAIN_##nr.
Replace conditional definitions by conditional definitions of ....
(INTERNAL_SYSCALL_MAIN_INLINE): ... this. New macro.
* sysdeps/unix/sysv/linux/i386/libc-do-syscall.S: New file.
* sysdeps/unix/sysv/linux/i386/Makefile [$(subdir) = nptl]
(libpthread-sysdep_routines): Add libc-do-syscall.
* sysdeps/unix/sysv/linux/i386/lowlevellock-futex.h: Remove file.
* sysdeps/unix/sysv/linux/i386/lowlevellock.h (SYS_futex): Define
to __NR_futex not 240.
This patch is glibc support for a PowerPC TLS optimization, inspired
by Alexandre Oliva's TLS optimization for other processors,
http://www.lsd.ic.unicamp.br/~oliva/writeups/TLS/RFC-TLSDESC-x86.txt
In essence, this optimization uses a zero module id in the tls_index
GOT entry to indicate that a TLS variable is allocated space in the
static TLS area. A special plt call linker stub for __tls_get_addr
checks for such a tls_index and if found, returns the offset
immediately. The linker communicates the fact that the special
__tls_get_addr stub is used by setting a bit in the dynamic tag
DT_PPC64_OPT/DT_PPC_OPT. glibc communicates to the linker that this
optimization is available by the presence of __tls_get_addr_opt.
tst-tlsmod2.so is built with -Wl,--no-tls-get-addr-optimize for
tst-tls-dlinfo, which otherwise would fail since it tests that no
static tls is allocated. The ld option --no-tls-get-addr-optimize has
been available since binutils-2.20 so doesn't need a configure test.
* NEWS: Advertise TLS optimization.
* elf/elf.h (R_PPC_TLSGD, R_PPC_TLSLD, DT_PPC_OPT, PPC_OPT_TLS): Define.
(DT_PPC_NUM): Increment.
* elf/dynamic-link.h (HAVE_STATIC_TLS): Define.
(CHECK_STATIC_TLS): Use here.
* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela): Optimize
TLS descriptors.
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise.
* sysdeps/powerpc/dl-tls.c: New file.
* sysdeps/powerpc/Versions: Add __tls_get_addr_opt.
* sysdeps/powerpc/tst-tlsopt-powerpc.c: New tls test.
* sysdeps/unix/sysv/linux/powerpc/Makefile: Add new test.
Build tst-tlsmod2.so with --no-tls-get-addr-optimize.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist: Update.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/ld.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/ld-le.abilist: Likewise.
This feature doesn't depend on the linker, as can be seen from the
actual test. It's a compiler feature.
* sysdeps/powerpc/powerpc64/configure.ac: Correct "linker support
for overlapping .opd entries" to "support...".
* sysdeps/powerpc/powerpc64/configure: Regenerate
This patch adds some randomly-generated tests of acos that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of acos.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of expm1 that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of expm1.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of cosh and sinh that
are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of cosh and sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The x86_64 and x86 libm-test-ulps files hadn't been regenerated from
scratch for some time, as evidenced by the presence of entries for
*_tonearest functions (those tests duplicated the
default-rounding-mode tests, and such duplicates are no longer run).
The aarch64, alpha, hppa, ia64, m68k, microblaze, powerpc, s390, sh,
sparc, tile files similarly could do with from-scratch regeneration as
evidenced by the presence of such entries. (Truncate the existing
file then run "make regen-ulps" and move the resulting file into
place.)
This patch regenerates the x86_64 and x86 files from scratch. It's
likely some of the reduced / removed ulps will need restoring because
they appear on processors or compiler versions other than the one I
tested on, but in such cases I'd like to first see if I can generate
new tests that show such ulps on the Intel processor I'm testing on,
to reduce the effects from different people using different processors
and compilers to regenerate the ulps.
* sysdeps/i386/fpu/libm-test-ulps: Regenerated.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
In testing for x86_64 on an AMD processor, I observed libm test
failures of the form:
testing long double (without inline functions)
Failure: Test: log2_downward (0x2.b7e151628aed4p+0)
Result:
is: 1.44269504088896356633e+00 0xb.8aa3b295c17f67600000p-3
should be: 1.44269504088896356622e+00 0xb.8aa3b295c17f67500000p-3
difference: 1.08420217248550443400e-19 0x8.00000000000000000000p-66
ulp : 1.0000
max.ulp : 0.0000
Maximal error of `log2_downward'
is : 1 ulp
accepted: 0 ulp
These issues arise because the maximum ulps when regenerating on one
processor are not the same as on another processor, so regeneration on
several processors may be needed when updating libm-test-ulps to avoid
failures for some users testing glibc - but such regeneration on
multiple processors is inconvenient. Causes can be: on x86 and, for
x86_64, for long double, variation in results of x87 instructions for
transcendental operations between processors; on x86, variation in
compiler excess precision between compiler versions and
configurations; on any processor where the compiler may contract
expressions using fused multiply-add, variation in what contraction
occurs.
Although it's hard to be sure libm-test-ulps covers all ulps that may
be seen in any configuration for the given architecture, in practice
it helps simply to add wider test coverage to make it more likely
that, when testing on one processor, the ulps seen are the biggest
that can be seen for that function on that processor, and hopefully
they are also the biggest that can be seen for that function in other
configurations for that architecture. Thus, this patch adds some
tests of log2 that increase the ulps I see on x86_64 on an Intel
processor, so that hopefully future from-scratch regenerations on that
processor will produce ulps big enough not to have errors from testing
on AMD processors. These tests were found by randomly generating
inputs and seeing what produced ulps larger than those currently in
libm-test-ulps. Of course such increases also improve the accuracy of
the empirical table of known ulps generated from libm-test-ulps files
that goes in the manual.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of log2.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
extend_alloca was used to emulate VLA deallocation. The new version
also handles the res == 0 corner case more explicitly, by returning 0
instead of the (potentially undefined, but usually zero) system call
error.
In bc0cdc498 the configure check for HAVE_ASM_PPC_REL16 was removed
on the grounds that the minimum binutils supports rel16 relocs. This
is true, but not all references to HAVE_ASM_PPC_REL16 in the sources
were removed.
* config.h.in: Remove HAVE_ASM_PPC_REL16.
* sysdeps/powerpc/powerpc32/tls-macros.h: Remove HAVE_ASM_PPC_REL16
and false branch of conditional.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S:
Likewise.
* sysdeps/mach/hurd/Makefile ($(common-objpfx)errnos.d): Depend on
libc-modules.h
* sysdeps/mach/hurd/i386/trampoline.c (_hurd_setup_sighandler): Remove
unused declaration of _hurd_intr_rpc_msg_in_trap.
* mach/mach_init.c (__mach_init): Test whether HAVE_HOST_PAGE_SIZE is
defined instead of whether it is non-zero.
* sysdeps/mach/hurd/i386/intr-msg.h (INTR_MSG_TRAP): Use "+m"
input constraint instead of both input and output constraint. Use ecx
clobber instead of %ecx.
* sysdeps/mach/hurd/malloc-machine.h (mutex_init, mutex_lock,
mutex_unlock): Use a statement expression instead of an expression list.
* sysdeps/mach/hurd/setitimer.c (_hurd_itimer_thread_stack_size): Set
type to vm_size_t instead of vm_address_t.
* sysdeps/mach/hurd/fork.c (__fork): Test whether STACK_GROWTH_UP is
defined instead of whether it is non-zero.
* hurd/hurd/ioctl.h (_hurd_locked_install_cttyid): New declaration.
* sysdeps/mach/hurd/setsid.c: Include <hurd/ioctl.h>.
* sysdeps/mach/hurd/mmap.c (__mmap): Use 0 instead of NULL for
comparisons with mapaddr.
* nscd/nscd-client.h: Include <time.h>.
* sysdeps/mach/hurd/dl-sysdep.c (fmh): Pass vm_offset_t dummy
9th parameter to __vm_region instead of int.
sem_timedwait converts absolute timeouts to relative to pass them to
the futex syscall. (Before the recent reimplementation, on x86_64 it
used FUTEX_CLOCK_REALTIME, but not on other architectures.)
Correctly implementing POSIX requirements, however, requires use of
FUTEX_CLOCK_REALTIME; passing a relative timeout to the kernel does
not conform to POSIX. The POSIX specification for sem_timedwait says
"The timeout shall be based on the CLOCK_REALTIME clock.". The POSIX
specification for clock_settime says "If the value of the
CLOCK_REALTIME clock is set via clock_settime(), the new value of the
clock shall be used to determine the time of expiration for absolute
time services based upon the CLOCK_REALTIME clock. This applies to the
time at which armed absolute timers expire. If the absolute time
requested at the invocation of such a time service is before the new
value of the clock, the time service shall expire immediately as if
the clock had reached the requested time normally.". If a relative
timeout is passed to the kernel, it is interpreted according to the
CLOCK_MONOTONIC clock, and so fails to meet that POSIX requirement in
the event of clock changes.
This patch makes sem_timedwait use lll_futex_timed_wait_bitset with
FUTEX_CLOCK_REALTIME when possible, as done in some other places in
NPTL. FUTEX_CLOCK_REALTIME is always available for supported Linux
kernel versions; unavailability of lll_futex_timed_wait_bitset is only
an issue for hppa (an issue noted in
<https://sourceware.org/glibc/wiki/PortStatus>, and fixed by the
unreviewed
<https://sourceware.org/ml/libc-alpha/2014-12/msg00655.html> that
removes the hppa lowlevellock.h completely).
In the FUTEX_CLOCK_REALTIME case, the glibc code still needs to check
for negative tv_sec and handle that as timeout, because the Linux
kernel returns EINVAL not ETIMEDOUT for that case, so resulting in
failures of nptl/tst-abstime and nptl/tst-sem13 in the absence of that
check. If we're trying to distinguish between Linux-specific and
generic-futex NPTL code, I suppose having this in an nptl/ file isn't
ideal, but there doesn't seem to be any better place at present.
It's not possible to add a testcase for this issue to the testsuite
because of the requirement to change the system clock as part of a
test (this is a case where testing would require some form of
container, with root in that container, and one whose CLOCK_REALTIME
is isolated from that of the host; I'm not sure what forms of
containers, short of a full virtual machine, provide that clock
isolation).
Tested for x86_64. Also tested for powerpc with the testcase included
in the bug.
[BZ #18138]
* nptl/sem_waitcommon.c: Include <kernel-features.h>.
(futex_abstimed_wait)
[__ASSUME_FUTEX_CLOCK_REALTIME && lll_futex_timed_wait_bitset]:
Use lll_futex_timed_wait_bitset with FUTEX_CLOCK_REALTIME instead
of lll_futex_timed_wait.
If xports is NULL in xprt_register we malloc it but if sock >
_rpc_dtablesize() that memory does not get initialised and may in theory
contain any value. Later we make a conditional jump in svc_getreq_common
based on the uninitialised memory and this caused a general protection
fault in rpc.statd on an older version of glibc but this code has not
changed since that version.
Following is the valgrind warning.
==26802== Conditional jump or move depends on uninitialised value(s)
==26802== at 0x5343A25: svc_getreq_common (in /lib64/libc-2.5.so)
==26802== by 0x534357B: svc_getreqset (in /lib64/libc-2.5.so)
==26802== by 0x10DE1F: ??? (in /sbin/rpc.statd)
==26802== by 0x10D0EF: main (in /sbin/rpc.statd)
==26802== Uninitialised value was created by a heap allocation
==26802== at 0x4C2210C: malloc (vg_replace_malloc.c:195)
==26802== by 0x53438BE: xprt_register (in /lib64/libc-2.5.so)
==26802== by 0x53450DF: svcudp_bufcreate (in /lib64/libc-2.5.so)
==26802== by 0x10FE32: ??? (in /sbin/rpc.statd)
==26802== by 0x10D13E: main (in /sbin/rpc.statd)
for ChangeLog
[BZ #17090]
[BZ #17620]
[BZ #17621]
[BZ #17628]
* NEWS: Update.
* elf/dl-tls.c (_dl_update_slotinfo): Clean up outdated DTV
entries with Static TLS too. Skip entries past the end of the
allocated DTV, from Alan Modra.
(tls_get_addr_tail): Update to glibc_likely/unlikely. Move
Static TLS DTV entry set up from...
(_dl_allocate_tls_init): ... here (fix modid assertion), ...
* elf/dl-reloc.c (_dl_nothread_init_static_tls): ... here...
* nptl/allocatestack.c (init_one_static_tls): ... and here...
* elf/dlopen.c (dl_open_worker): Drop l_tls_modid upper bound
for Static TLS.
* elf/tlsdeschtab.h (map_generation): Return size_t. Check
that the slot we find is associated with the given map before
using its generation count.
* nptl_db/db_info.c: Include ldsodefs.h.
(rtld_global, dtv_slotinfo_list, dtv_slotinfo): New typedefs.
* nptl_db/structs.def (DB_RTLD_VARIABLE): New macro.
(DB_MAIN_VARIABLE, DB_RTLD_GLOBAL_FIELD): Likewise.
(link_map::l_tls_offset): New struct field.
(dtv_t::counter): Likewise.
(rtld_global): New struct.
(_rtld_global): New rtld variable.
(dl_tls_dtv_slotinfo_list): New rtld global field.
(dtv_slotinfo_list): New struct.
(dtv_slotinfo): Likewise.
* nptl_db/td_symbol_list.c: Drop gnu/lib-names.h include.
(td_lookup): Rename to...
(td_mod_lookup): ... this. Use new mod parameter instead of
LIBPTHREAD_SO.
* nptl_db/td_thr_tlsbase.c: Include link.h.
(dtv_slotinfo_list, dtv_slotinfo): New functions.
(td_thr_tlsbase): Check DTV generation. Compute Static TLS
addresses even if the DTV is out of date or missing them.
* nptl_db/fetch-value.c (_td_locate_field): Do not refuse to
index zero-length arrays.
* nptl_db/thread_dbP.h: Include gnu/lib-names.h.
(td_lookup): Make it a macro implemented in terms of...
(td_mod_lookup): ... this declaration.
* nptl_db/db-symbols.awk (DB_RTLD_VARIABLE): Override.
(DB_MAIN_VARIABLE): Likewise.
We need to add a BND prefix before indirect branch at the end of
_dl_runtime_resolve to preserve bound registers.
[BZ #18134]
* sysdeps/x86_64/dl-trampoline.S (PRESERVE_BND_REGS_PREFIX): New.
(_dl_runtime_resolve): Add a BND prefix before indirect branch.
In bug 14906 the user complains that the inotify support in nscd
is not sufficient when it comes to detecting changes in the
configurationfiles that should be watched for the various databases.
The current nscd implementation uses inotify to watch for changes in
the configuration files, but adds watches only for IN_DELETE_SELF and
IN_MODIFY. These watches are insufficient to cover even the most basic
uses by a system administrator. For example using emacs or vim to edit
a configuration file should trigger a reload but it might not if
the editors use move to atomically update the file. This atomic update
changes the inode and thus removes the notification on the file (as
inotify is based on inodes). Thus the inotify support in nscd for
configuration files is insufficient to account for the average use
cases of system administrators and users.
The inotify support is significantly enhanced and described here:
https://www.sourceware.org/ml/libc-alpha/2015-02/msg00504.html
Tested on x86_64 with and without inotify support.
This patch makes soft-fp use static assertions in place of conditional
calls to abort, in places where there are checks for conditions (on
the types for which a macro is used) that the code is not prepared to
handle. The fallback definition of _FP_STATIC_ASSERT (for kernel use
only, as only relevant to compilers not supported for building glibc)
is as in misc/sys/cdefs.h.
This means that soft-fp only ever calls abort for _FP_UNREACHABLE
calls in builds with GCC versions before 4.5. Thus, there is no need
for an abort declaration or <stdlib.h> include, since the kernel code
handles defining abort as a macro itself - and so this avoids any need
for an __KERNEL__ condition on the abort declaration to avoid it
breaking with the kernel's macro definition. That is, this patch is
intended to make glibc's soft-fp code suitable for kernel use with no
kernel-local changes to the soft-fp code needed at all.
Tested for powerpc-nofpu that installed stripped shared libraries are
unchanged by the patch. One explicit <stdlib.h> include had to be
added to a file that was relying on the include from soft-fp.h.
* soft-fp/soft-fp.h (_FP_STATIC_ASSERT): New macro.
[_LIBC]: Do not include <stdlib.h>.
[!_LIBC] (abort): Remove declaration.
* soft-fp/op-2.h (_FP_MUL_MEAT_2_120_240_double): Use
_FP_STATIC_ASSERT instead of conditionally calling abort.
* soft-fp/op-common.h (_FP_FROM_INT): Likewise.
(_FP_EXTEND_CNAN): Likewise.
(FP_TRUNC): Likewise.
(__FP_CLZ): Likewise.
* sysdeps/powerpc/nofpu/flt-rounds.c: Include <stdlib.h>.
ldconfig is using an aux-cache to speed up the ld.so.cache update. It
is read by mmaping the file to a structure which contains data offsets
used as pointers. As they are not checked, it is not hard to get
ldconfig to segfault with a corrupted file. This happens for instance if
the file is truncated, which is common following a filesystem check
following a system crash.
This can be reproduced for example by truncating the file to roughly
half of it's size.
There is already some code in elf/cache.c (load_aux_cache) to check
for a corrupted aux cache, but it happens to be broken and not enough.
The test (aux_cache->nlibs >= aux_cache_size) compares the number of
libs entry with the cache size. It's a non sense, as it basically
assumes that each library entry is a 1 byte... Instead this commit
computes the theoretical cache size using the headers and compares it
to the real size.
With AIX port deprecated there is no need to check/define
HAVE_ASM_GLOBAL_DOT_NAME anymore since the current minimum binutils
supported (2.22) does not emit global symbol with dot.
This patch removes all the HAVE_ASM_GLOBAL_DOT_NAME definition and
checks for powerpc64 port.
The function feupdateenv has been fixed to correctly handle FE_DFL_ENV
and FE_NOMASK_ENV.
The fesetexceptflag function has been fixed to correctly handle setting
the new flags instead of just OR-ing the existing flags.
This fixes the test-fenv-return and test-fenvinline failures on hppa.
The constraints in the inline assembly in feholdexcept and fesetenv
are incorrect. The assembly modifies the buffer pointer, but doesn't
express that in the constraints. The simple fix is to remove the
modification of the buffer pointer which is no longer required by
the existing code, and adjust the one constraint that did express
the modification of bufptr.
The change fixes test-fenv when glibc is compiled with recent gcc.
This patch makes soft-fp use a new macro _FP_UNREACHABLE in place of
calling abort in unreachable default cases of switch statements.
_FP_UNREACHABLE expands to call __builtin_unreachable for GCC 4.5 and
later; the fallback to abort is thus only for kernel use.
Tested for powerpc-nofpu that installed stripped shared libraries are
unchanged by this patch. Also tested with the math/ tests for mips64
(in the case of fma there *was* previously an abort call generated,
unlike for the other operations - one switch only deals with a subset
of classes for one operand based on what could have been generated in
the earlier part of fma, whereas the other switches deal with all
combinations of two classes - and this is apparently too complicated
for the default case to have been optimized away).
* soft-fp/soft-fp.h (_FP_UNREACHABLE): New macro.
* soft-fp/op-common.h (_FP_MUL): Use _FP_UNREACHABLE instead of
abort.
(_FP_FMA): Likewise.
(_FP_DIV): Likewise.
This patch makes soft-fp headers consistently use multiple-include
guards, something previously done mainly only in the Linux kernel
version. The guard macros aren't the same as those used in the Linux
kernel, but there seems to be enough variation in such guards in Linux
kernel code that hopefully this version will be acceptable there.
Tested for powerpc-nofpu that installed stripped shared libraries are
unchanged by this patch.
* soft-fp/double.h [SOFT_FP_DOUBLE_H]: New multiple-include guard.
* soft-fp/extended.h [SOFT_FP_EXTENDED_H]: Likewise.
* soft-fp/op-1.h [SOFT_FP_OP_1_H]: Likewise.
* soft-fp/op-2.h [SOFT_FP_OP_2_H]: Likewise.
* soft-fp/op-4.h [SOFT_FP_OP_4_H]: Likewise.
* soft-fp/op-8.h [SOFT_FP_OP_8_H]: Likewise.
* soft-fp/op-common.h [SOFT_FP_OP_COMMON_H]: Likewise.
* soft-fp/quad.h [SOFT_FP_QUAD_H]: Likewise.
* soft-fp/single.h [SOFT_FP_SINGLE_H]: Likewise.
* soft-fp/soft-fp.h (SOFT_FP_H): Define to 1 rather than empty.
Add comment on closing #endif.
In the Linux kernel, some architectures have a single function that
uses different kinds of unpacking and packing depending on the
instruction being emulated, meaning it is not readily visible to the
compiler that variables from _FP_DECL and _FP_FRAC_DECL_* macros are
only used in cases where they were initialized. The existing copy of
soft-fp in the Linux kernel uses zero-initialization to avoid warnings
in this case, so while frowned upon as a warning suppression mechanism
in code built for glibc it seems appropriate to have such
zero-initialization conditional on __KERNEL__. This patch duly adds
it, via a macro _FP_ZERO_INIT that expands to empty for non-kernel
compilations.
Tested for powerpc-nofpu that installed stripped shared libraries are
unchanged by this patch.
* soft-fp/soft-fp.h (_FP_ZERO_INIT): New macro. Define depending
on [__KERNEL__].
* soft-fp/op-1.h (_FP_FRAC_DECL_1): Use _FP_ZERO_INIT.
* soft-fp/op-2.h (_FP_FRAC_DECL_2): Likewise.
* soft-fp/op-common.h (_FP_DECL): Likewise.
With copy relocation, address of protected data defined in the shared
library may be external. Compiler shouldn't asssume protected data will
be local. But due to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248
__attribute__((visibility("protected"))) doesn't work correctly, we need
to use asm (".protected xxx") instead.
* elf/ifuncdep2.c (global): Replace
__attribute__((visibility("protected"))) with
asm (".protected global").
* elf/ifuncmod1.c (global): Likewise.
* elf/ifuncmod5.c (global): Likewise.
My Linux kernel patch to update the kernel to current glibc soft-fp
<https://sourceware.org/ml/libc-alpha/2015-02/msg00107.html> still
leaves a few small differences between the two copies of soft-fp.
I think it's desirable to avoid such differences completely if
possible by having one set of sources suitable for use in both places.
To that end, this patch introduces a conditional on __KERNEL__ for the
path by which sfp-machine.h is included.
Tested for powerpc-nofpu that installed stripped shared libraries are
unchanged by this patch.
* soft-fp/soft-fp.h [!_LIBC && __KERNEL__]: Include
<asm/sfp-machine.h> instead of <sfp-machine.h>.
The manual gives "an example showing how to handle failure to open a
file correctly." The example function, open_sesame, uses the
newly-introduced strerror function and errno and
program_invocation_short_name variables. It fails to specify GNU
extensions, however, so attempts to use it in the following way:
int main (void) {open_sesame ("badname");}
fail during compilation with "error: ‘program_invocation_short_name’
undeclared", indicating the example is incomplete. The presence of
"#include"s suggest everything neccesary for the function to work should
be present. For completeness, the example is lacking the following line:
#define _GNU_SOURCE
as the declarations of program_invocation_*name in errno.h are wrapped
in an "#ifdef __USE_GNU" conditional.
The documentation of the variables is also expanded, adding that their
definition lies in errno.h and noting specifically they are GNU
extensions.
This patch fixes the inline feraiseexcept and feclearexcept macros for
powerpc by casting the input argument to integer before operation on it.
It fixes BZ#17776.
Since 2014-11-24 binutils git commit bb4d2ac2, readelf has appended
the symbol version to symbols shown in reloc dumps.
[BZ #16512]
* scripts/localplt.awk: Strip off symbol version.
* NEWS: Mention bug fix.
__ASSUME_PRLIMIT64 is defined in kernel-features.h for kernels 2.6.36
and later, but hppa, microblaze and sh did not add the prlimit64
syscall until 2.6.37. This patch adds corresponding undefines of
__ASSUME_PRLIMIT64 to those architectures' kernel-features.h files.
(This concludes the kernel-features.h fixes arising out of the review
- limited to macros defined in the architecture-independent
kernel-features.h file - I did in connection with the move to 2.6.32
minimum kernel version. For that subset of macros - I didn't check
any purely architecture-specific macros - I think they are now defined
for the correct kernel versions on each architecture after this
patch.)
[BZ #17779]
* sysdeps/unix/sysv/linux/hppa/kernel-features.h
[__LINUX_KERNEL_VERSION < 0x020625] (__ASSUME_PRLIMIT64):
Undefine.
* sysdeps/unix/sysv/linux/microblaze/kernel-features.h
[__LINUX_KERNEL_VERSION < 0x020625] (__ASSUME_PRLIMIT64):
Likewise.
* sysdeps/unix/sysv/linux/sh/kernel-features.h
[__LINUX_KERNEL_VERSION < 0x020625] (__ASSUME_PRLIMIT64):
Likewise.
Protocted symbol in shared library can only be accessed from PIE
or shared library. Linker in binutils 2.26 enforces it. We must
compile vismain with -fPIE and link it with -pie.
[BZ #17711]
* elf/Makefile (tests): Add vismain only if PIE is enabled.
(tests-pie): Add vismain.
(CFLAGS-vismain.c): New.
* elf/vismain.c: Add comments for PIE requirement.
The threshold in ldbl-96 atanhl for when to return the argument,
0x1p-28, is a bit too big, and that in ldbl-128ibm atanhl is much too
big (the relevant condition being x^3/3 being < 0.5ulp of x),
resulting in errors a bit above the limits of those considered
acceptable in glibc in the ldbl-96 case, and in large errors in the
ldbl-128ibm case. This patch changes those implementations to use
more appropriate thresholds and adds tests around the thresholds for
various formats.
Tested for x86_64, x86 and powerpc. x86_64 and x86 ulps updated
accordingly.
[BZ #18046]
[BZ #18047]
* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c (__ieee754_atanhl): Use
0x1p-56L as threshold for just returning the argument.
* sysdeps/ieee754/ldbl-96/e_atanhl.c (__ieee754_atanhl): Use
0x1p-32L as threshold for just returning the argument.
* math/auto-libm-test-in: Add more tests of atanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulp: Likewise.
We want to avoid -Wno- options in makefiles as far as possible, by
cleaning up the underlying issues if possible or failing that by using
diagnostic pragmas. This patch eliminates the use of
-Wno-write-strings for sysdeps/ieee754/k_standard.c by using casts in
the source file to cast away const; those casts are encapsulated in a
macro that also deals with the choice of strings for float / double /
long double functions (for which the logic was previously replicated
many times).
Tested for x86_64; the only change to disassembly of installed
stripped shared libraries was a line number in an assertion.
* sysdeps/ieee754/k_standard.c (CSTR): New macro.
(__kernel_standard): Use CSTR macro when setting exc.name.
* sysdeps/ieee754/Makefile [$(subdir) = math]
(CFLAGS-k_standard.c): Remove variable.
math/Makefile currently has:
# The fdlibm code generates a lot of these warnings but is otherwise clean.
override CFLAGS += -Wno-uninitialized
This is of course undesirable; warnings should be disabled as narrowly
as possible. To remove this override, we need to fix files that
generate such warnings, or put warning-disabling pragmas in them.
This patch does so for Bessel function implementations, one of the
cases that have the warnings if the override is removed. The warnings
arise because functions set pointer variables p and q only for certain
values of the function argument, then use them unconditionally. As
the static functions in question only get called for arguments that
satisfy the last condition in the if/else chain, the natural fix is to
change the last "else if" to just "else", which this patch does. (The
ldbl-128 / ldbl-128ibm implementation of these functions is
substantially different and looks like it already does use "else" in
the last case in the nearest corresponding code.)
Tested for x86_64 and x86.
* sysdeps/ieee754/dbl-64/e_j0.c (pzero): Change last case for
setting p and q from "else if" to "else".
(qzero): Likewise.
* sysdeps/ieee754/dbl-64/e_j1.c (pone): Likewise.
(qone): Likewise.
* sysdeps/ieee754/flt-32/e_j0f.c (pzerof): Likewise.
(qzerof): Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c (ponef): Likewise.
(qonef): Likewise.
* sysdeps/ieee754/ldbl-96/e_j0l.c (pzero): Likewise.
(qzero): Likewise.
* sysdeps/ieee754/ldbl-96/e_j1l.c (pone): Likewise.
(qone): Likewise.
The ldbl-128 and ldbl-128ibm implementations of acosl have similar
bugs, using a threshold of 0x1p-57L to determine when they just return
pi/2. Since the result pi/2 - asinl (x) is roughly pi/2 - x for small
x, the relevant cut-off is actually x being < 0.5ulp of 1. This patch
fixes the implementations to use that cut-off and adds tests of small
acos arguments.
Tested for powerpc and mips64. Also tested for x86_64 and x86; no
ulps updates needed.
[BZ #18038]
[BZ #18039]
* sysdeps/ieee754/ldbl-128/e_acosl.c (__ieee754_acosl): Only
return pi/2 for arguments below 0x1p-113L.
* sysdeps/ieee754/ldbl-128ibm/e_acosl.c (__ieee754_acosl): Only
return pi/2 for arguments below 0x1p-106L.
* math/auto-libm-test-in: Add more tests of acos.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, some asin implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
Tested for x86_64, x86, powerpc and mips64.
[BZ #16351]
* sysdeps/i386/fpu/e_asin.S (dbl_min): New object.
(MO): New macro.
(__ieee754_asin): Force underflow exception for results with small
absolute value.
* sysdeps/i386/fpu/e_asinf.S (flt_min): New object.
(MO): New macro.
(__ieee754_asinf): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/dbl-64/e_asin.c: Include <float.h> and <math.h>.
(__ieee754_asin): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/flt-32/e_asinf.c: Include <float.h>.
(__ieee754_asinf): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-128/e_asinl.c: Include <float.h>.
(__ieee754_asinl): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Include <float.h>.
(__ieee754_asinl): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-96/e_asinl.c: Include <float.h>.
(__ieee754_asinl): Force underflow exception for results with
small absolute value.
* sysdeps/x86_64/fpu/multiarch/e_asin.c [HAVE_FMA4_SUPPORT]:
Include <math.h>.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 16351.
* math/auto-libm-test-out: Regenerated.
The ldbl-128ibm implementation of logbl produces incorrect results
when the high part of the argument is a power of 2 and the low part a
nonzero number with the opposite sign (and so the returned exponent
should be 1 less than that of the high part). For example, logbl
(0x1.ffffffffffffffp1L) returns 2 but should return 1. (This is
similar to (fixed) bug 16740 for frexpl, and (fixed) bug 18029 for
ilogbl.) This patch adds checks for that case.
Tested for powerpc.
[BZ #18030]
* sysdeps/ieee754/ldbl-128ibm/s_logbl.c (__logbl): Adjust exponent
of power of 2 down when low part has opposite sign.
* math/libm-test.inc (logb_test_data): Add more tests.
The ldbl-128ibm implementation of ilogbl produces incorrect results
when the high part of the argument is a power of 2 and the low part a
nonzero number with the opposite sign (and so the returned exponent
should be 1 less than that of the high part). For example, ilogbl
(0x1.ffffffffffffffp1L) returns 2 but should return 1. (This is
similar to (fixed) bug 16740 for frexpl, and bug 18030 for logbl.)
This patch adds checks for that case.
Tested for powerpc.
[BZ #18029]
* sysdeps/ieee754/ldbl-128ibm/e_ilogbl.c (__ieee754_ilogbl):
Adjust exponent of power of 2 down when low part has opposite
sign.
* math/libm-test.inc (ilogb_test_data): Add more tests.
If a locale alias is defined in locale.alias but not in an archive,
and the referenced locale is only present in the archive, setlocale
will fail if given the alias name. This is unintuitive. This patch
fixes it, arranging for the locale archive to be searched again after
alias expansion.
for ChangeLog
[BZ #15969]
* locale/findlocale.c (_nl_find_locale): Retry archive search
after alias expansion.
This patch fixes the missing "__memcpy_ppc" symbol for memmove-ppc64
object in static builds. Since memcpy ifunc is not enabled in static
mode, the specialized symbols are not provided. The patch changed the
it to just "__memcpy" instead.
The ldbl-128ibm implementation of asinhl uses cut-offs of 0x1p28 and
0x1p-29 to determine when to use simpler formulas that avoid possible
overflow / underflow. Both those cut-offs are inappropriate for this
format, resulting in large errors. This patch changes the code to use
more appropriate cut-offs of 0x1p56 and 0x1p-56, adding tests around
the cut-offs for various floating-point formats.
Tested for powerpc. Also tested for x86_64 and x86 and updated ulps.
[BZ #18020]
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c (__asinhl): Use 2**56 and
2**-56 not 2**28 and 2**-29 as thresholds for simpler formulas.
* math/auto-libm-test-in: Add more tests of asinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Similarly to what we did for in6_addr, we need a macro
to guard in6_pktinfo and ip6_mtuinfo too.
Cc: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
This patch is an "easy win" partial fix for BZ #16145, which notes
the heavy contention on tzset_lock when multiple threads are converting
times with localtime_r().
In __tz_convert(), the lock does not need to be held after
__tzfile_compute() / __tz_compute() have been called, so we can move the
unlock up. At this point there is still significant work to be done in
__offtime(), so we see some improvement (in my testing with 8 cores
banging on localtime_r(), ~20% improvement in throughput).
The ldbl-128ibm implementation of acoshl uses a cut-off of 0x1p28 to
determine when to use log(x) + log(2) as a formula. That cut-off is
too small for this format, resulting in large errors. This patch
changes it to a more appropriate cut-off of 0x1p56, adding tests
around the cut-offs for various floating-point formats.
Tested for powerpc. Also tested for x86_64 and x86 and updated ulps.
[BZ #18019]
* sysdeps/ieee754/ldbl-128ibm/e_acoshl.c (__ieee754_acoshl): Use
2**56 not 2**28 as threshold for log (2x) formula.
* math/auto-libm-test-in: Add more tests of acosh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The stack-grows-down case is missing paren around the buf cast.
The stack-grows-up case is missing a cast with the buf assignment.
This leads to build failures due to -Werror:
vfprintf.c: In function '_IO_vfprintf_internal':
vfprintf.c:1738:16: error: initialization from incompatible pointer type [-Werror]
Various x86 / x86_64 versions of scalb / scalbf / scalbl produce
spurious "invalid" exceptions for (qNaN, -Inf) arguments, because this
is wrongly handled like (+/-Inf, -Inf) which *should* raise such an
exception. (In fact the NaN case of the code determining whether to
quietly return a zero or a NaN for second argument -Inf was
accidentally dead since the code had been made to return a NaN with
exception.) This patch fixes the code to do the proper test for an
infinity as distinct from a NaN.
(Since the existing code does nothing to distinguish qNaNs and sNaNs
here, this patch doesn't either. If in future we systematically
implement proper sNaN semantics following TS 18661-1:2014, there will
be lots of bugs to address - Thomas found lots of issues with his
patch <https://sourceware.org/ml/libc-ports/2013-04/msg00008.html> to
add SNaN tests (which never went in and would now require significant
reworking).)
Tested for x86_64 and x86. Committed.
[BZ #16783]
* sysdeps/i386/fpu/e_scalb.S (__ieee754_scalb): Do not handle
arguments (NaN, -Inf) the same as (+/-Inf, -Inf).
* sysdeps/i386/fpu/e_scalbf.S (__ieee754_scalbf): Likewise.
* sysdeps/i386/fpu/e_scalbl.S (__ieee754_scalbl): Likewise.
* sysdeps/x86_64/fpu/e_scalbl.S (__ieee754_scalbl): Likewise.
* math/libm-test.inc (scalb_test_data): Add more tests.
Both open and openat load their last argument 'mode' lazily, using
va_arg() only if O_CREAT is found in oflag. This is wrong, mode is also
necessary if O_TMPFILE is in oflag.
By chance on x86_64, the problem wasn't evident when using O_TMPFILE
with open, as the 3rd argument of open, even when not loaded with
va_arg, is left untouched in RDX, where the syscall expects it.
However, openat was not so lucky, and O_TMPFILE couldn't be used: mode
is the 4th argument, in RCX, but the syscall expects its 4th argument in
a different register than the glibc wrapper, in R10.
Introduce a macro __OPEN_NEEDS_MODE (oflag) to test if either O_CREAT or
O_TMPFILE is set in oflag.
Tested on Linux x86_64.
[BZ #17523]
* io/fcntl.h (__OPEN_NEEDS_MODE): New macro.
* io/bits/fcntl2.h (open): Use it.
(openat): Likewise.
* io/open.c (__libc_open): Likewise.
* io/open64.c (__libc_open64): Likewise.
* io/open64_2.c (__open64_2): Likewise.
* io/open_2.c (__open_2): Likewise.
* io/openat.c (__openat): Likewise.
* io/openat64.c (__openat64): Likewise.
* io/openat64_2.c (__openat64_2): Likewise.
* io/openat_2.c (__openat_2): Likewise.
* sysdeps/mach/hurd/open.c (__libc_open): Likewise.
* sysdeps/mach/hurd/openat.c (__openat): Likewise.
* sysdeps/posix/open64.c (__libc_open64): Likewise.
* sysdeps/unix/sysv/linux/dl-openat64.c (openat64): Likewise.
* sysdeps/unix/sysv/linux/generic/open.c (__libc_open): Likewise.
(__open_nocancel): Likewise.
* sysdeps/unix/sysv/linux/generic/open64.c (__libc_open64): Likewise.
* sysdeps/unix/sysv/linux/open64.c (__libc_open64): Likewise.
* sysdeps/unix/sysv/linux/openat.c (__OPENAT): Likewise.
DNSSEC defines a number of response types that one me expect when the
DO bit is set. We don't process any of them, but since we do allow
setting the DO bit, skip them without logging an error since it is
only a nuisance.
Tested on x86_64.
[BZ #14841]
* resolv/gethnamaddr.c (getanswer): Skip logging if
RES_USE_DNSSEC is set.
* resolv/nss_dns/dns-host.c (getanswer_r): Likewise.
for ChangeLog
* include/stdc-predef.h (__STDC_ISO_10646__): Update to
201304L, for Unicode 7.
for localedata/ChangeLog
* unicode-gen/ctype_compatibility.py: Use date ranges in
copyright notice.
* unicode-gen/ctype_compatibility_test_cases.py: Likewise.
* unicode-gen/gen_unicode_ctype.py: Likewise.
* unicode-gen/utf8_compatibility.py: Likewise.
* unicode-gen/utf8_gen.py: Likewise. Use upper case for
global variables, use tuples for global constant arrays. From
Mike FABIAN. Suggested by Mike Frysinger <vapier@gentoo.org>.
We compile gcrt1.o with -fPIC to support both "gcc -pg" and "gcc -pie -pg".
[BZ #17836]
* csu/Makefile (extra-objs): Add gmon-start.o if not builing
shared library. Add gmon-start.os otherwise.
($(objpfx)g$(start-installed-name)): Use $(objpfx)S%
$(objpfx)gmon-start.os if builing shared library.
($(objpfx)g$(static-start-installed-name)): Likewise.
soft-fp calls abort in various cases that the code doesn't handle, all
cases that should never actually occur for any supported choice of
types.
Calling an abort function is not appropriate for kernel use, so the
Linux kernel redefines abort as a macro in various ways in the ports
using this code, typically to "return 0" or similar.
One use of abort in soft-fp is inside a comma expression and doesn't
work with such a macro. This patch changes it to use a statement
expression.
Tested for powerpc-nofpu that installed shared libraries are unchanged
by this patch.
(There are two classes of aborts: those to make control flow visible
to the compiler, in default cases of switches over _FP_CLS_COMBINE,
which could reasonably change to __builtin_unreachable for glibc but
would still need to handle pre-4.5 compilers for kernel use, and those
intended to detect any use of soft-fp for combinations of types the
code doesn't know how to handle, which could reasonably become link
failures if the calls should always be optimized away. But those are
separate possible future enhancements.)
* soft-fp/op-common.h (_FP_FROM_INT): Wrap call to abort in
expression inside statement expression.
* sysdeps/unix/sysv/linux/s390/lowlevellock.h: Include
<sysdeps/nptl/lowlevellock.h> and remove macros and
functions that are now defined there.
(SYS_futex): Remove.
(lll_compare_and_swap): Remove.
* sysdeps/s390/bits/atomic.h (atomic_exchange_acq): Define.
The POSIX function scandir calls scandirat, which is not a POSIX
function. This patch fixes this by making it use __scandirat and
making scandirat a weak alias. There are no changes for scandir64 /
scandirat64 because those are both _GNU_SOURCE-only functions so no
namespace issue arises for them.
Tested for x86_64 that the disassembly of installed shared libraries
is unchanged by this patch.
[BZ #17999]
* dirent/scandir.c [!SCANDIR] (SCANDIRAT): Define to __scandirat
instead of scandirat.
* dirent/scandirat.c [!SCANDIRAT] (SCANDIRAT): Likewise.
[!SCANDIRAT] (SCANDIRAT_WEAK_ALIAS): Define.
[SCANDIRAT_WEAK_ALIAS] (scandirat): Define as weak alias of
__scandirat.
* include/dirent.h (scandirat): Do not use libc_hidden_proto.
(__scandirat): Declare. Use libc_hidden_proto.
* conform/Makefile (test-xfail-POSIX2008/dirent.h/linknamespace):
Remove variable.
(test-xfail-XOPEN2K8/dirent.h/linknamespace): Likewise.
This patch fixes bug 15319, missing underflows from atan / atan2 when
the result of atan is very close to its small argument (or that of
atan2 is very close to the ratio of its arguments, which may be an
exact division).
The usual approach of doing an underflowing computation if the
computed result is subnormal is followed. For 32-bit x86, there are
extra complications: the inline __ieee754_atan2 in bits/mathinline.h
needs to be disabled for float and double because other libm functions
using it generally rely on getting proper underflow exceptions from
it, while the out-of-line functions have to remove excess range and
precision from the underflowing result so as to return an exact 0 in
the case where errno should be set for underflow to 0. (The failures
I saw without that are similar to those Carlos reported for other
functions, where I haven't seen a response to
<https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>
confirming if my diagnosis is correct. Arguably all libm functions
with float and double returns should remove excess range and
precision, but that's a separate matter.)
The x86_64 long double case reported in a comment in bug 15319 is not
a bug (it's an argument of LDBL_MIN, and x86_64 is an after-rounding
architecture so the correct IEEE result is not to raise underflow in
the given rounding mode, in addition to treating the result as an
exact LDBL_MIN being within the newly clarified documentation of
accuracy goals). I'm presuming that the fpatan instruction can be
trusted to raise appropriate exceptions when the (long double) result
underflows (after rounding) and so no changes are needed for x86 /
x86_64 long double functions here; empirically this is the case for
the cases covered in the testsuite, on my system.
Tested for x86_64, x86, powerpc and mips64. Only 32-bit x86 needs
ulps updates (for the changes to inlines meaning some functions no
longer get excess precision from their __ieee754_atan2* calls).
[BZ #15319]
* sysdeps/i386/fpu/e_atan2.S (dbl_min): New object.
(MO): New macro.
(__ieee754_atan2): For results with small absolute value, force
underflow exception and remove excess range and precision from
return value.
* sysdeps/i386/fpu/e_atan2f.S (flt_min): New object.
(MO): New macro.
(__ieee754_atan2f): For results with small absolute value, force
underflow exception and remove excess range and precision from
return value.
* sysdeps/i386/fpu/s_atan.S (dbl_min): New object.
(MO): New macro.
(__atan): For results with small absolute value, force underflow
exception and remove excess range and precision from return value.
* sysdeps/i386/fpu/s_atanf.S (flt_min): New object.
(MO): New macro.
(__atanf): For results with small absolute value, force underflow
exception and remove excess range and precision from return value.
* sysdeps/ieee754/dbl-64/e_atan2.c: Include <float.h> and
<math.h>.
(__ieee754_atan2): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/dbl-64/s_atan.c: Include <float.h> and
<math_private.h>.
(atan): Force underflow exception for results with small absolute
value.
* sysdeps/ieee754/flt-32/s_atanf.c: Include <float.h>.
(__atanf): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_atanl.c: Include <float.h> and
<math.h>.
(__atanl): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: Include <float.h>.
(__atanl): Force underflow exception for results with small
absolute value.
* sysdeps/x86/fpu/bits/mathinline.h
[!__SSE2_MATH__ && !__x86_64__ && __LIBC_INTERNAL_MATH_INLINES]
(__ieee754_atan2): Only define inline for long double.
* sysdeps/x86_64/fpu/multiarch/e_atan2.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Include <math.h>.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 15319. Add more tests of atan2.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (casin_test_data): Do not mark underflow
exceptions as possibly missing for bug 15319.
(casinh_test_data): Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Update.
The implementation of the (XSI POSIX) functions hsearch / hcreate /
hdestroy uses hsearch_r / hcreate_r / hdestroy_r, which are not POSIX
functions. This patch makes those into weak aliases for __*_r and
uses those names for the calls within libc.
Tested for x86_64 that the disassembly of installed shared libraries
is unchanged by this patch.
[BZ #17996]
* include/search.h (hcreate_r): Don't use libc_hidden_proto.
(hdestroy_r): Likewise.
(hsearch_r): Likewise.
(__hcreate_r): Declare and use libc_hidden_proto.
(__hdestroy_r): Likewise.
(__hsearch_r): Likewise.
* misc/hsearch.c (hsearch): Call __hsearch_r instead of hsearch_r.
(hcreate): Call __hcreate_r instead of hcreate_r.
(__hdestroy): Call __hdestroy_r instead of hdestroy_r.
* misc/hsearch_r.c (hcreate_r): Rename to __hcreate_r and define
as weak alias of __hcreate_r.
(hdestroy_r): Rename to __hdestroy_r and define as weak alias of
__hdestroy_r.
(hsearch_r): Rename to __hsearch_r and define as weak alias of
__hsearch_r.
* conform/Makefile (test-xfail-XPG3/search.h/linknamespace):
Remove variable.
(test-xfail-XPG4/search.h/linknamespace): Likewise.
(test-xfail-UNIX98/search.h/linknamespace): Likewise.
(test-xfail-XOPEN2K/search.h/linknamespace): Likewise.
(test-xfail-XOPEN2K8/search.h/linknamespace): Likewise.
This seems to have been left behind as an artifact of some old changes
and can now be merged. Verified that the only generated code change
on x86_64 is that of line numbers in asserts, like so:
@@ -27253,7 +27253,7 @@ Disassembly of section .text:
416f09: 48 89 42 20 mov %rax,0x20(%rdx)
416f0d: e9 7e f6 ff ff jmpq 416590 <_int_free+0x230>
416f12: b9 3f 9f 4a 00 mov $0x4a9f3f,%ecx
- 416f17: ba d5 0f 00 00 mov $0xfd5,%edx
+ 416f17: ba d6 0f 00 00 mov $0xfd6,%edx
416f1c: be a8 9b 4a 00 mov $0x4a9ba8,%esi
416f21: bf 6a 9c 4a 00 mov $0x4a9c6a,%edi
416f26: e8 45 e8 ff ff callq 415770 <__malloc_assert>
We are replacing all of the bespoke alignment code with
ALIGN_UP, ALIGN_DOWN, PTR_ALIGN_UP, and PTR_ALIGN_DOWN.
This cleans up malloc/malloc.c, malloc/arena.c, and
elf/dl-reloc.c. It also makes all the code consistently
use pagesize, and powerof2 as required.
Code size is reduced with the removal of precomputed
pagemask, and use of pagesize instead. No measurable
difference in performance.
No regressions on x86_64.
posix_spawn (a standard POSIX function) brings in a use of getrlimit64
(not a standard POSIX function). This patch fixes this by using
__getrlimit64 and making getrlimit64 a weak alias.
This is more complicated than some such changes because of files that
define getrlimit64 in their own way using symbol versioning after
including the main sysdeps/unix/sysv/linux/getrlimit64.c with a
getrlimit macro defined. There are various existing patterns for such
cases in glibc; the one I've used here is that a getrlimit64 macro
disables the weak_alias / libc_hidden_weak calls, leaving it to the
including file to define the getrlimit64 name in whatever way is
appropriate.
Tested for x86_64 and x86 that installed stripped shared libraries are
unchanged by this patch.
[BZ #17991]
* include/sys/resource.h (__getrlimit64): Declare. Use
libc_hidden_proto.
* resource/getrlimit64.c (getrlimit64): Rename to __getrlimit64
and define as weak alias of __getrlimit64. Use libc_hidden_weak.
* sysdeps/posix/spawni.c (__spawni): Call __getrlimit64 instead of
getrlimit64.
* sysdeps/unix/sysv/linux/getrlimit64.c (getrlimit64): Rename to
__getrlimit64.
[!getrlimit64] (getrlimit64): Define as weak alias of
__getrlimit64. Use libc_hidden_weak.
* sysdeps/unix/sysv/linux/i386/getrlimit64.c (getrlimit64): Define
using __getrlimit64 not __new_getrlimit64.
(__GI_getrlimit64): Likewise.
* sysdeps/unix/sysv/linux/mips/getrlimit64.c (getrlimit64):
Likewise.
(__GI_getrlimit64): Likewise.
(__old_getrlimit64): Use __getrlimit64 not __new_getrlimit64.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/syscalls.list
(getrlimit): Add __getrlimit64 alias.
* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (getrlimit):
Likewise.
* conform/Makefile (test-xfail-XOPEN2K/spawn.h/linknamespace):
Remove variable.
(test-xfail-POSIX2008/spawn.h/linknamespace): Likewise.
(test-xfail-XOPEN2K8/spawn.h/linknamespace): Likewise.
This patch refines the math.texi documentation of the goals for when
libm function raise the inexact and underflow exceptions. The
previous text was problematic in some cases around the underflow
threshold.
* Strictly, it would have meant that if the mathematical result of pow
was very slightly below DBL_MIN, for example, it was required to
raise the underflow exception; although normally a few ulps error
would be OK, if that error meant the computed value was slightly
above DBL_MIN it would fail the previously described underflow
exception goal.
* Similarly, strict IEEE semantics would imply that sin (DBL_MIN), in
round-to-nearest mode, underflows on before-rounding but not
after-rounding architectures, while returning DBL_MIN; the previous
wording would have required an underflow exception, so preventing
checks for a result with absolute value below DBL_MIN from being
sufficient checks to determine whether the exception is required.
(Under the previous wording, checks for a result with absolute value
<= DBL_MIN wouldn't have been sufficient either, because in
FE_TOWARDZERO mode a result of DBL_MIN definitely does not result
from an underflowing infinite-precision result.)
* The previous wording about rounding infinite-precision values could
be taken to mean all exceptions including "inexact" must be
consistent with some such value. That would mean that a result of
DBL_MIN in FE_UPWARD mode with "inexact" raised must also have
"underflow" raised on before-rounding architectures. Again, that
would cause problems for computing a result (possibly with spurious
"inexact" exceptions) and then using a rounding-mode-independent
test for results with absolute value below DBL_MIN to determine
whether an underflow exception must be forced in case the underflows
from intermediate computations happened to be exact.
By refining the documentation, this patch avoids stating goals for
accuracy close to the underflow threshold that were stricter than
applied anywhere else, and allows the implementation strategy of:
compute a result within a few ulps, taking care to avoid underflows in
intermediate computations, then force an underflow exception if that
result was subnormal. Only fully-defined functions such as fma need
to take greater care about the exact underflow threshold (including
its dependence on whether the architecture is before-rounding or
after-rounding, and on the rounding mode on after-rounding
architectures).
(If the rounding mode is changed as part of the computation, it's
still necessary to ensure that not just intermediate computations, but
the final computation of the result to be returned, do not raise
underflow if that result is the least normal value and underflow would
be inconsistent with the original rounding mode. Since such code can
readily discard exceptions as part of saving and restoring the
rounding mode - SET_RESTORE_ROUND_NOEX etc. - I don't think that
should be a problem in practice.)
* manual/math.texi (Errors in Math Functions): Clarify goals
regarding inexact and underflow exceptions.
ia64 seems to use the same implementation of low-level locks as the
generic Linux lowlevellock.h. The futex syscalls are somewhat
different, but Roland thought it shouldn't matter. Note that the futex
calls are on the slow path always (except for PI mutexes).
Removing the custom low-level lock implementation will make further
refactoring easier, for example adding proper error checking to futex
operations.
Various remquo implementations produce a zero remainder with the wrong
sign (a zero remainder should always have the sign of the first
argument, as specified in IEEE 754) in round-downward mode, resulting
from the sign of 0 - 0. This patch checks for zero results and fixes
their sign accordingly.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #17987]
* sysdeps/ieee754/dbl-64/s_remquo.c (__remquo): Ensure sign of
zero result does not depend on the sign resulting from
subtraction.
* sysdeps/ieee754/dbl-64/wordsize-64/s_remquo.c (__remquo):
Likewise.
* sysdeps/ieee754/flt-32/s_remquof.c (__remquof): Likewise.
* sysdeps/ieee754/ldbl-128/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-96/s_remquol.c (__remquol): Likewise.
* math/libm-test.inc (remquo_test_data): Add more tests.
Various remquo implementations, when computing the last three bits of
the quotient, have spurious overflows when 4 times the second argument
to remquo overflows. These overflows can in turn cause bad results in
rounding modes where that overflow results in a finite value. This
patch adds tests to avoid the problem multiplications in cases where
they would overflow, similar to those that control an earlier
multiplication by 8.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #17978]
* sysdeps/ieee754/dbl-64/s_remquo.c (__remquo): Do not form
products 4 * y and 2 * y where those would overflow.
* sysdeps/ieee754/dbl-64/wordsize-64/s_remquo.c (__remquo):
Likewise.
* sysdeps/ieee754/flt-32/s_remquof.c (__remquof): Likewise.
* sysdeps/ieee754/ldbl-128/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-96/s_remquol.c (__remquol): Likewise.
* math/libm-test.inc (remquo_test_data): Add more tests.
I see an error
../sysdeps/mips/memcpy.S:209:68: error: "_ABIO64" is not defined [-Werror=undef]
#if defined(_MIPS_SIM) && ((_MIPS_SIM == _ABIO32) || (_MIPS_SIM == _ABIO64))
^
cc1: some warnings being treated as errors
in MIPS builds. This patch arranges for _ABIO64 to be defined with
the same value as GCC uses when building for O64 (the ABI itself isn't
supported by glibc, but defining the macro seems the simplest way of
avoiding the error in code that may be shared with other C libraries).
* sysdeps/mips/sgidefs.h [!_ABIO64] (_ABIO64): New macro.
I see an error
../sysdeps/mips/strcmp.S:25:7: error: "_COMPILING_NEWLIB" is not defined [-Werror=undef]
#elif _COMPILING_NEWLIB
^
cc1: some warnings being treated as errors
in MIPS builds. (This is with GCC 4.9; it's possible that the DR#412
change in GCC 5 - see
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60570> - means that
-Wundef diagnostics no longer occur for #elif conditions where a
previous group's condition was true, just as with other errors there.)
This patch duly adjusts the conditionals to test whether
_COMPILING_NEWLIB is defined.
* sysdeps/mips/memcpy.S [_COMPILING_NEWLIB]: Change condition to
[defined _COMPILING_NEWLIB].
* sysdeps/mips/memset.S [_COMPILING_NEWLIB]: Likewise.
* sysdeps/mips/strcmp.S [_COMPILING_NEWLIB]: Likewise.
I see an error
In file included from ../sysdeps/mips/include/sys/asm.h:20:0,
from ../sysdeps/mips/start.S:39:
../sysdeps/mips/sys/asm.h:421:5: error: "__mips_isa_rev" is not defined [-Werror=undef]
#if __mips_isa_rev < 6
^
cc1: some warnings being treated as errors
in MIPS builds. As sys/asm.h is an installed header, it seems better
to test for !defined __mips_isa_rev here, instead of defining it to 0
as done in sysdeps/unix/mips/sysdep.h, to avoid perturbing any code
outside glibc that tests whether __mips_isa_rev is defined; this patch
does so.
* sysdeps/mips/sys/asm.h [__mips_isa_rev < 6]: Change condition to
[!defined __mips_isa_rev || __mips_isa_rev < 6].
Remove IA64 PAGE_SIZE related macros as PAGE_SIZE is not defined.
Also remove macros that are only used for BFD's trad-core support
which is not relavant for IA64 according to the thread starting
here:
https://sourceware.org/ml/libc-ports/2013-11/msg00028.html
This patch is neither built nor tested but is equivalent to a MIPS
patch for the same fix.
The dbl-64/wordsize-64 remquo implementation follows similar logic to
various other implementations, but where that logic computes some
absolute values, it wrongly uses a previously computed bit-pattern for
the absolute value of the first argument, where actually it needs the
absolute value of the first argument mod 8 times the second. This
patch fixes it to compute the correct absolute value.
The integer quotient result of remquo is only specified mod 8
(including its sign); architecture-specific versions may well vary in
what results they give for higher bits of that result (and indeed bug
17569 gives an example correct result from __builtin_remquo giving 9
for that result, where the particular glibc implementation used in
that bug report would give 1 after this fix). Thus, this patch adapts
the tests of remquo to test that result only mod 8, to allow for such
variation when tests with higher quotient are included.
Tested for x86_64 and x86.
[BZ #17569]
* sysdeps/ieee754/dbl-64/wordsize-64/s_remquo.c (__remquo):
Compute absolute value of x as modified by fmod, not original
value of x.
* math/libm-test.inc (RUN_TEST_ffI_f1): Rename to
RUN_TEST_ffI_f1_mod8. Check extra return value mod 8.
(RUN_TEST_LOOP_ffI_f1): Rename to RUN_TEST_LOOP_ffI_f1_mod8. Call
RUN_TEST_ffI_f1_mod8.
(remquo_test_data): Add more tests.
Similarly to sqrt in
<https://sourceware.org/ml/libc-alpha/2015-02/msg00353.html>, the
powerpc sqrtf implementation for when _ARCH_PPCSQ is not defined also
relies on a * b + c being contracted into a fused multiply-add.
Although this contraction is not explicitly disabled for e_sqrtf.c, it
still seems appropriate to make the file explicit about its
requirements by using __builtin_fmaf; this patch does so.
Furthermore, it turns out that doing so fixes the observed inaccuracy
and missing exceptions (that is, that without explicit __builtin_fmaf
usage, it was not being compiled as intended).
Tested for powerpc32 (hard float).
[BZ #17967]
* sysdeps/powerpc/fpu/e_sqrtf.c (__slow_ieee754_sqrtf): Use
__builtin_fmaf instead of relying on contraction of a * b + c.
As Adhemerval noted in
<https://sourceware.org/ml/libc-alpha/2015-01/msg00451.html>, the
powerpc sqrt implementation for when _ARCH_PPCSQ is not defined is
inaccurate in some cases.
The problem is that this code relies on fused multiply-add, and relies
on the compiler contracting a * b + c to get a fused operation. But
sysdeps/ieee754/dbl-64/Makefile disables contraction for e_sqrt.c,
because the implementation in that directory relies on *not* having
contracted operations.
While it would be possible to arrange makefiles so that an earlier
sysdeps directory can disable the setting in
sysdeps/ieee754/dbl-64/Makefile, it seems a lot cleaner to make the
dependence on fused operations explicit in the .c file. GCC 4.6
introduced support for __builtin_fma on powerpc and other
architectures with such instructions, so we can rely on that; this
patch duly makes the code use __builtin_fma for all such fused
operations.
Tested for powerpc32 (hard float).
2015-02-12 Joseph Myers <joseph@codesourcery.com>
[BZ #17964]
* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrt): Use
__builtin_fma instead of relying on contraction of a * b + c.
The tv_sec is of type time_t in both struct timeval and struct timespec.
This matches the implementation and also the relevant standard (checked
C11 for timespec and opengroup for timeval).
This patch fixes the remaining part of bug 16560, spurious underflows
from exp2 of arguments close to 0 (when the result is close to 1, so
should not underflow), by just using 1+x instead of a more complicated
calculation when the argument is sufficiently small.
Tested for x86_64, x86 and mips64.
[BZ #16560]
* math/e_exp2l.c [LDBL_MANT_DIG == 106] (LDBL_EPSILON): Undefine
and redefine.
(__ieee754_exp2l): Do not multiply small fractional parts by
M_LN2l.
* sysdeps/i386/fpu/e_exp2l.S (__ieee754_exp2l): Just add 1 to
small argument.
* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Likewise.
* sysdeps/x86_64/fpu/e_exp2l.S (__ieee754_exp2l): Likewise.
* math/auto-libm-test-in: Add more tests of exp2.
* math/auto-libm-test-out: Regenerated.
This patch optimizes strncpy for power7 for unaligned source or
destination address. The source or destination address is aligned
to doubleword and data is shifted based on the alignment and
added with the previous loaded data to be written as a doubleword.
For each load, cmpb instruction is used for faster null check.
The new optimization shows 10 to 70% of performance improvement
for longer string though it does not show big difference on string
size less than 16 due to additional checks.Hence this new algorithm
is restricted to string greater than 16.
pthread_mutexattr_settype adds PTHREAD_MUTEX_NO_ELISION_NP to kind,
which is an internal flag that pthread_mutexattr_gettype shouldn't
expose, since pthread_mutexattr_settype wouldn't accept it.
and revert the corresponding part of ba90e05 which was making the fix
necessary.
* abi-tags: Revert ae20c9a: rename back gnu into gnu-gnu.
* configure.ac, configure: Revert ba90e05: modify gnu-* host_os back
into gnu-gnu, and update comment to refer to abi-tags.
This patch makes sincos set errno to EDOM when passed an infinity,
similarly to sin and cos.
Tested for x86_64, x86, powerpc and mips64. I don't know if the
architecture-specific implementations for ia64 and m68k might need
corresponding fixes.
2015-02-11 Joseph Myers <joseph@codesourcery.com>
[BZ #15467]
* sysdeps/ieee754/dbl-64/s_sincos.c: Include <errno.h>.
(__sincos): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/flt-32/s_sincosf.c: Include <errno.h>.
(SINCOSF_FUNC): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-128/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-128ibm/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-96/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* math/libm-test.inc (sincos_test_data): Test errno setting.
As noted in
<https://sourceware.org/ml/libc-alpha/2014-10/msg00369.html>, soft-fp
sysdeps subdirectories (and more generally, subdirectories where
sysdeps/foo/Implies contains foo/bar) are unnecessary and should be
eliminated. This patch does so for MIPS.
Tested for MIPS64 (all three ABIs, soft-float) that installed stripped
shared libraries are unchanged by this patch.
* sysdeps/mips/soft-fp/sfp-machine.h: Move to ....
* sysdeps/mips/mips32/sfp-machine.h: ... here.
* sysdeps/mips/mips64/soft-fp/Makefile: Move to ....
* sysdeps/mips/mips64/Makefile: ... here.
* sysdeps/mips/mips64/soft-fp/e_sqrtl.c: Move to ....
* sysdeps/mips/mips64/e_sqrtl.c: ... here.
* sysdeps/mips/mips64/soft-fp/sfp-machine.h: Move to ....
* sysdeps/mips/mips64/sfp-machine.h: ... here.
* sysdeps/mips/mips32/Implies: Remove mips/soft-fp.
* sysdeps/mips/mips64/n32/Implies: Remove mips/mips64/soft-fp.
* sysdeps/mips/mips64/n64/Implies: Likewise.
Current minimum binutils supported (2.22) has ".machine altivec" support
as default, so there is no need to add a configure check for such
functionality. This patches removes the configure checks for it.
This patch cleanup some multiarch code related to memmmove
optimization. Initial IFUNC support added specialized wordcopy
symbols which turned in local IFUNC calls used by memmove default
implementation. The patch removes the internal IFUNC for wordcopy
symbols and uses local branches in the memmmove optimization instead.
This patch cleanup some multiarch code related to memmmove
optimization. Initial IFUNC support added specialized wordcopy
symbols which turned in local IFUNC calls used by memmove default
implementation.
This change by removing then and used the optimized memmove instead
for supported chips.
This patch simplify the default bcopy symbol for powerpc64 by just using
memmove instead of implementing using the default bcopy. Since the
symbol is deprecated, it trades speed by code size.
This removes code which actually never happens, and is already taken
care of in the function.
This is in the second part of select, when the __mach_msg() function
over the portset has returned something else than MACH_MSG_SUCCESS. I
guess in the past the value returned by __mach_msg() was stored in err,
so this code was necessary to set back err to 0, but now it is stored in
msgerr, so err is already still 0 by default. It can thus never contain
MACH_RCV_TIMED_OUT, i.e. the code is dead. The first case mentioned in
the comment is already handled: on time out with no message, err is
already still the default 0. On time out due to poll, err would still be
0, unless some of the io_select RPCs has returned EINTR, in which case
it contains EINTR. If any other io_select RPCs had returned a proper
answer, got!=0, and thus err is set to 0 just below. The code is thus
indeed not useful any more.
It looks like _hurd_thread_sigstate used to return with the sigstate
lock held long ago, but since that's no longer the case, don't unlock
something that isn't locked.
Note that it's unlikely this change fixes anything in practice since
its current implementation (on i386) makes this call a nop.
soft-fp's _FP_FMA fails to set the result's exponent for cases where
the result of the multiplication is 0, yielding incorrect (arbitrary,
depending on uninitialized values) results for those cases. This
affects libm for architectures using soft-fp to implement fma. This
patch adds the exponent setting and tests for this case.
Tested for ARM soft-float (which uses soft-fp fma), x86_64 and x86 (to
verify not introducing new libm test failures there).
(This bug showed up in testing my patch to move the Linux kernel to
current soft-fp. math/Makefile has "override CFLAGS +=
-Wno-uninitialized" which would have stopped compiler warnings from
showing up this problem, although I wouldn't be surprised if removing
that shows spurious warnings from this code, if the compiler fails to
follow that various cases where the exponent is uninitialized don't
need it initialized because the class is set to a value meaning the
uninitialized exponent isn't used.)
[BZ #17932]
* soft-fp/op-common.h (_FP_FMA): Set exponent of result in case
where multiplication results in zero and third argument is finite
and nonzero.
* math/auto-libm-test-in: Add more tests of fma.
* math/auto-libm-test-out: Regenerated.
In <https://sourceware.org/ml/libc-alpha/2014-09/msg00488.html>, I
noted that comparisons in soft-fp did not set FP_EX_DENORM unless
denormal operands were flushed to zero.
This patch fixes soft-fp to check for denormal operands for
comparisons and set that exception whenever FP_EX_DENORM is not zero.
In particular, for the one architecture for which the Linux kernel
defines FP_EX_DENORM (alpha), this corresponds to the existing logic
for comparisons and so allows that logic to be replaced by a simple
call to FP_CMP_D when soft-fp is updated in the kernel.
Tested for powerpc (e500) that installed stripped shared libraries are
unchanged by this patch.
* soft-fp/op-common.h (_FP_CMP_CHECK_DENORM): New macro.
(_FP_CMP_CHECK_FLUSH_ZERO): Likewise.
(_FP_CMP): Use_FP_CMP_CHECK_DENORM and _FP_CMP_CHECK_FLUSH_ZERO.
(_FP_CMP_EQ): Likewise.
(_FP_CMP_UNORD): Use _FP_CMP_CHECK_DENORM.
One special case needed in soft-fp to replace the old version in the
Linux kernel is extending from a narrower floating-point format to a
wider one without quieting signaling NaNs. (This is for
arch/powerpc/math-emu/lfs.c, where previously it used the old FP_CONV
which didn't do anything special for NaNs, then handled packing
specially for NaNs to avoid quieting at packing time, and discarded
the exceptions from unpacking.)
This patch accordingly refactors FP_EXTEND, creating a separate
_FP_EXTEND_CNAN that offers a choice of how NaNs are handled, with
FP_EXTEND reimplemented as a wrapper that provides the common case of
the IEEE operation that does quiet signaling NaNs and raise exceptions
for them.
Tested for powerpc (e500) that installed stripped shared libraries are
unchanged by this patch.
* soft-fp/op-common.h (FP_EXTEND): Rename to _FP_EXTEND_CNAN with
extra argument CHECK_NAN. Redefine as wrapper around
_FP_EXTEND_CNAN.