soft-fp's _FP_FMA fails to set the result's exponent for cases where
the result of the multiplication is 0, yielding incorrect (arbitrary,
depending on uninitialized values) results for those cases. This
affects libm for architectures using soft-fp to implement fma. This
patch adds the exponent setting and tests for this case.
Tested for ARM soft-float (which uses soft-fp fma), x86_64 and x86 (to
verify not introducing new libm test failures there).
(This bug showed up in testing my patch to move the Linux kernel to
current soft-fp. math/Makefile has "override CFLAGS +=
-Wno-uninitialized" which would have stopped compiler warnings from
showing up this problem, although I wouldn't be surprised if removing
that shows spurious warnings from this code, if the compiler fails to
follow that various cases where the exponent is uninitialized don't
need it initialized because the class is set to a value meaning the
uninitialized exponent isn't used.)
[BZ #17932]
* soft-fp/op-common.h (_FP_FMA): Set exponent of result in case
where multiplication results in zero and third argument is finite
and nonzero.
* math/auto-libm-test-in: Add more tests of fma.
* math/auto-libm-test-out: Regenerated.
BZ #16618
Under certain conditions wscanf can allocate too little memory for the
to-be-scanned arguments and overflow the allocated buffer. The
implementation now correctly computes the required buffer size when
using malloc.
A regression test was added to tst-sscanf.
memcpy with unaligned 256-bit AVX register loads/stores are slow on older
processorsl like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load
and sets it only when AVX2 is available.
[BZ #17801]
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load):
New.
(index_AVX_Fast_Unaligned_Load): Likewise.
(HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
* sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
The padding bytes in the statsdata struct are not initialized, due to
which valgrind throws a warning:
==11384== Memcheck, a memory error detector
==11384== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==11384== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==11384== Command: nscd -d
==11384==
Fri 25 Apr 2014 10:34:53 AM CEST - 11384: handle_request: request received (Version = 2) from PID 11396
Fri 25 Apr 2014 10:34:53 AM CEST - 11384: GETSTAT
==11384== Thread 6:
==11384== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==11384== at 0x4E4ACDC: send (in /lib64/libpthread-2.12.so)
==11384== by 0x11AF6B: send_stats (in /usr/sbin/nscd)
==11384== by 0x112F75: nscd_run_worker (in /usr/sbin/nscd)
==11384== by 0x4E439D0: start_thread (in /lib64/libpthread-2.12.so)
==11384== by 0x599AB6C: clone (in /lib64/libc-2.12.so)
==11384== Address 0x15708395 is on thread 6's stack
Fix the warning by initializing the structure.
This patch fixes a bug introduced by 18f2945ae9, where it optimizes
the FPSCR set by just issuing a mtfs instruction if new flag is different
from older one. The issue is a typo, where the new flag should the the
new value, instead of the old one.
It fixes BZ#17885.
Some powerpc64 processors (e5500 core for instance) does not provide the
fsqrt instruction, however current check to use in math_private.h is
__WORDSIZE and _ARCH_PWR4 (ISA 2.02). This is patch change it to use
the compiler flag _ARCH_PPCSQ (which is the same condition GCC uses to
decide whether to generate fsqrt instruction).
It fixes BZ#16576.
GLIBC memset optimization for POWER8 uses the '.machine power8'
directive, which is only supported officially on binutils 2.24+. This
causes a build failure on older binutils.
Since the requirement of .machine power8 is to correctly assembly the
'mtvsrd' instruction and it is already handled by the MTVSRD_V1_R4
macro, there is no really needed of using it.
The patch replaces the power8 with power7 for .machine directive.
It fixes BZ#17869.
This patch fix the elf/ifuncmain6pie failure when building with GCC
4.9+. For some reason, the compiler removes the branch taken code at
resolve_ifunc (sysdeps/powerpc/powerpc64/dl-machine.h) as dead-code
and thus the testcase fails because the ifunc resolves branches to an
invalid memory location. It fixes by explicit adding a dependency of
value based on odp variable to avoid compiler optimization.
It fixes BZ#17868.
This patch replaces unsigned long int and 1UL with uint64_t and
(uint64_t) 1 to support ILP32 targets like x32.
[BZ #17870]
* nptl/sem_post.c (__new_sem_post): Replace unsigned long int
with uint64_t.
* nptl/sem_waitcommon.c (__sem_wait_cleanup): Replace 1UL with
(uint64_t) 1.
(__new_sem_wait_slow): Replace unsigned long int with uint64_t.
Replace 1UL with (uint64_t) 1.
* sysdeps/nptl/internaltypes.h (new_sem): Replace unsigned long
int with uint64_t.
This patch fix powerpc __get_clockfreq racy and cancel-safe issues by
dropping internal static cache and by using nocancel file operations.
The vDSO failure check is also removed, since kernel code does not
return an error (it cleans cr0.so bit on function return) and the static
code (to read value /proc) now uses non-cancellable calls.
The ability to recursively call dlopen is useful for malloc
implementations that wish to load other dynamic modules that
implement reentrant/AS-safe functions to use in their own
implementation.
Given that a user malloc implementation may be called by an
ongoing dlopen to allocate memory the user malloc
implementation interrupts dlopen and if it calls dlopen again
that's a reentrant call.
This patch fixes the issues with the ld.so.cache mapping
and the _r_debug assertion which prevent this from working
as expected.
See:
https://sourceware.org/ml/libc-alpha/2014-12/msg00446.html
This commit fixes semaphore destruction by either using 64b atomic
operations (where available), or by using two separate fields when only
32b atomic operations are available. In the latter case, we keep a
conservative estimate of whether there are any waiting threads in one
bit of the field that counts the number of available tokens, thus
allowing sem_post to atomically both add a token and determine whether
it needs to call futex_wake.
See:
https://sourceware.org/ml/libc-alpha/2014-12/msg00155.html
This patch adds an optimized POWER8 strncmp. The implementation focus
on speeding up unaligned cases follwing the ideas of power8 strcmp.
The algorithm first check the initial 16 bytes, then align the first
function source and uses unaligned loads on second argument only.
Aditional checks for page boundaries are done for unaligned cases
(where sources alignment are different).
This patch adds an optimized POWER8 strcmp using unaligned accesses.
The algorithm first check the initial 16 bytes, then align the first
function source and uses unaligned loads on second argument only.
Aditional checks for page boundaries are done for unaligned cases
This patch adds an optimized POWER8 st{r,p}ncpy using unaligned accesses.
It shows 10%-80% improvement over the optimized POWER7 one that uses
only aligned accesses, specially on unaligned inputs.
The algorithm first read and check 16 bytes (if inputs do not cross a 4K
page size). The it realign source to 16-bytes and issue a 16 bytes read
and compare loop to speedup null byte checks for large strings. Also,
different from POWER7 optimization, the null pad is done inline in the
implementation using possible unaligned accesses, instead of realying on
a memset call. Special case is added for page cross reads.
This patch adds an optimized POWER8 strcpy using unaligned accesses.
For strings up to 16 bytes the implementation first calculate the
string size, like strlen, and issues a memcpy. For larger strings,
source is first aligned to 16 bytes and then tested over a loop that
reads 16 bytes am combine the cmpb results for speedup. Special case is
added for page cross reads.
It shows 30%-60% improvement over the optimized POWER7 one that uses
only aligned accesses.
[Modified from the original email by Siddhesh Poyarekar]
This patch solves bug #16009 by implementing an additional path in
strxfrm that does not depend on caching the weight and rule indices.
In detail the following changed:
* The old main loop was factored out of strxfrm_l into the function
do_xfrm_cached to be able to alternativly use the non-caching version
do_xfrm.
* strxfrm_l allocates a a fixed size array on the stack. If this is not
sufficiant to store the weight and rule indices, the non-caching path is
taken. As the cache size is not dependent on the input there can be no
problems with integer overflows or stack allocations greater than
__MAX_ALLOCA_CUTOFF. Note that malloc-ing is not possible because the
definition of strxfrm does not allow an oom errorhandling.
* The uncached path determines the weight and rule index for every char
and for every pass again.
* Passing all the locale data array by array resulted in very long
parameter lists, so I introduced a structure that holds them.
* Checking for zero src string has been moved a bit upwards, it is
before the locale data initialization now.
* To verify that the non-caching path works correct I added a test run
to localedata/sort-test.sh & localedata/xfrm-test.c where all strings
are patched up with spaces so that they are too large for the caching path.
The ldbl-96 implementation of scalblnl (used for x86_64 and ia64) uses
a condition k <= -63 to determine when a standard underflowing result
tiny*__copysignl(tiny,x) should be returned. However, that condition
corresponds to values with exponent -16446 or less, and in the case of
-16446, the correct result for round-to-nearest depends on whether the
value is exactly 0x1p-16446 (half the least subnormal) or more than
that. This patch fixes the bug by changing the condition to k <= -64
and accordingly adjusting the exponent by 64 not 63 when converting to
a normal value.
Tested for x86_64.
[BZ #17803]
* sysdeps/ieee754/ldbl-96/s_scalblnl.c (twom63): Rename to
twom64. Adjust value to 0x1p-64L.
(__scalblnl): Only return standard underflowing result for K <=
-64 not K <= -63; adjust exponent for underflowing result by 64
not 63.
* math/libm-test.inc (scalbn_test_data): Add more tests.
(scalbln_test_data): Likewise.
The ldbl-96 implementation of scalblnl (used for x86_64 and ia64) is
incorrect for subnormal arguments (this is a separate bug from bug
17803, which is about underflowing results). There are two problems
with the adjustments of subnormal arguments: the "two63" variable
multiplied by is actually 0x1p52L not 0x1p63L, so is insufficient to
make values normal, and then GET_LDOUBLE_EXP(es,x), used to extract
the new exponent, extracts it into a variable that isn't used, while
the value taken to by the new exponent is wrongly taken from the high
part of the mantissa before the adjustment (hx). This patch fixes
both those problems and adds appropriate tests.
Tested for x86_64.
[BZ #17834]
* sysdeps/ieee754/ldbl-96/s_scalblnl.c (two63): Change value to
0x1p63L.
(__scalblnl): Get new exponent of adjusted subnormal value from ES
not HX.
* math/libm-test.inc (scalbn_test_data): Add more tests.
(scalbln_test_data): Likewise.
This patch adds support for lock elision using ISA 2.07 hardware
transactional memory instructions for pthread_mutex primitives.
Similar to s390 version, the for elision logic defined in
'force-elision.h' is only enabled if ENABLE_LOCK_ELISION is defined.
Also, the lock elision code should be able to be built even with
a compiler that does not provide HTM support with builtins.
However I have noted the performance is sub-optimal due scheduling
pressures.
Microblaze apparently has a variable page size (see thread below) and
should not hard-code any page-size related macros.
Also remove macros that are only used for BFD's trad-core support
which is not relavant for microblaze also according to the thread
starting here:
https://sourceware.org/ml/libc-ports/2013-11/msg00028.html
This patch is neither built nor tested but mirrors a MIPS patch that
fixes the same issue.
Thanks,
Matthew
* sysdepsysdeps/unix/sysv/linux/microblaze/sys/user.h
(PAGE_SHIFT, PAGE_SIZE, PAGE_MASK, NBPG, UPAGES): Remove.
(HOST_TEXT_START_ADDR, HOST_STACK_END_ADDR): Remove.
Signed-off-by: David Holsgrove <david.holsgrove@xilinx.com>
Concluding the fixes for C90 libm functions calling C99 fe* functions,
this patch fixes the case of feupdateenv by making it a weak alias for
__feupdateenv and making the affected code call __feupdateenv.
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch). Also tested for ARM
(soft-float) that the math.h linknamespace tests now pass.
[BZ #17748]
* include/fenv.h (__feupdateenv): Use libm_hidden_proto.
* math/feupdateenv.c (__feupdateenv): Use libm_hidden_def.
* sysdeps/aarch64/fpu/feupdateenv.c (feupdateenv): Rename to
__feupdateenv and define as weak alias of __feupdateenv. Use
libm_hidden_weak.
* sysdeps/alpha/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/arm/feupdateenv.c (feupdateenv): Rename to __feupdateenv
and define as weak alias of __feupdateenv. Use libm_hidden_weak.
* sysdeps/hppa/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/i386/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/ia64/fpu/feupdateenv.c (feupdateenv): Rename to
__feupdateenv and define as weak alias of __feupdateenv. Use
libm_hidden_weak.
* sysdeps/m68k/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/mips/fpu/feupdateenv.c (feupdateenv): Rename to
__feupdateenv and define as weak alias of __feupdateenv. Use
libm_hidden_weak.
* sysdeps/powerpc/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/powerpc/nofpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/feupdateenv.c
(__feupdateenv): Likewise.
* sysdeps/s390/fpu/feupdateenv.c (feupdateenv): Rename to
__feupdateenv and define as weak alias of __feupdateenv. Use
libm_hidden_weak.
* sysdeps/sh/sh4/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sparc/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/tile/math_private.h (__feupdateenv): New inline
function.
* sysdeps/x86_64/fpu/feupdateenv.c (__feupdateenv): Use
libm_hidden_def.
* sysdeps/generic/math_private.h (default_libc_feupdateenv): Call
__feupdateenv instead of feupdateenv.
(default_libc_feupdateenv_test): Likewise.
(libc_feresetround_ctx): Likewise.
glibc maintains a binary tree of environment strings it malloc()ed
itself. However, it's possible for it to malloc() a string, then find
that an identical string is already in the tree. In this case, the
memory is leaked and is not freed if the application later calls
__libc_freeres(). Fix this by freeing 'new_value' when it's unneeded.
Test case:
#include <stdlib.h>
#include <string.h>
int main()
{
char *p = calloc(100000, 1);
memset(p, 'A', 99999);
setenv("TESTVAR", p, 1);
setenv("TESTVAR", p, 1);
free(p);
}
Leak that was reported by valgrind:
100,008 bytes in 1 blocks are definitely lost in loss record 1 of 1
at 0x4C29F90: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x4E6B3D4: __add_to_environ (setenv.c:176)
by 0x4C31B8F: setenv (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x400642: main (in /mnt/tmpfs/a.out)
When mount entry contains only four fields and have more then one space or
tab at the and, mp.mnt_freq and mp.mnt_passno will be set to some specific
values as side effect from parsing of previus mount entry. It is because
sscanf(""," %d %d ", &a, &b) returns -1, but this case is unprocessed.
Values of mp.mnt_freq and mp.mnt_passno stays unchanged. This patch is
attempt to fix described issue by removing trailing tabs and spaces.
C99 specifies that CLOCKS_PER_SEC is an expression with the type clock_t.
This patch adds a generic <bits/time2.h> to define CLOCKS_PER_SEC and
provides the Linux/x86-64 version of <bits/time2.h> to support x32.
[BZ #17797]
* bits/time.h (CLOCKS_PER_SEC): Changed to ((clock_t) 1000000).
* sysdeps/unix/sysv/linux/bits/time.h (CLOCKS_PER_SEC): Likewise.
* sysdeps/unix/sysv/linux/clock.c (clock): _Static_assert
CLOCKS_PER_SEC == 1000000.
* time/clocktest.c (main): Replace %ld with %jd and cast to
intmax_t.
sysdeps/unix/sysv/linux/mips/mips64/n64/posix_fadvise.c defines
posix_fadvise64 as a strong alias for posix_fadvise (for
!SHLIB_COMPAT(libc, GLIBC_2_2, GLIBC_2_3_3) - i.e., for static
linking, which is the case when this matters), but it should be a weak
alias. This patch makes it a weak alias.
Tested for MIPS that this fixes the observed linknamespace test
failures.
[BZ #17796]
* sysdeps/unix/sysv/linux/mips/mips64/n64/posix_fadvise.c
[!SHLIB_COMPAT(libc, GLIBC_2_2, GLIBC_2_3_3)] (posix_fadvise64):
Define as weak alias not strong alias.
ARM posix_fadvise calls __posix_fadvise64_l64, to which
posix_fadvise64 is a strong alias, but posix_fadvise is a POSIX
function and posix_fadvise64 isn't. This patch changes it into a weak
alias.
Tested for ARM that this fixes the corresponding linknamespace test
failures.
[BZ #17793]
* sysdeps/unix/sysv/linux/arm/posix_fadvise64.c (posix_fadvise64):
Define as weak alias not strong alias.
Use of isblank brings in isascii and toascii, but isblank is a C99
function and the other two aren't; similarly, isascii and toascii are
UNIX98 functions and bring in isblank, which isn't. (Not a
conformance issue because of the is* and to* reservation, but still
contrary to glibc practice.) This patch fixes this by splitting
isblank out of ctype-extn.c to a separate ctype-c99.c. isblank_l is
also moved to a separate file, ctype-c99_l.c (non-XSI POSIX.1-2008 has
isblank_l, but isascii / toascii are marked OB XSI). (In principle
all these functions could go in separate files - that's optimal for
static linking - but they are also all very small, and splitting them
all out is not needed to fix the present bug.)
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch - the ordering in which new and
existing sources are listed in ctype/Makefile is arranged so functions
go in the same order so that this comparison works).
[BZ #17635]
* ctype/ctype-c99.c: New file. isblank implementation moved from
...
* ctype/ctype-extn.c: ... here.
(__isblank_l): Move to ...
* ctype/ctype-c99_l.c: ... here. New file.
* ctype/Makefile (routines): Add ctype-c99 and ctype-c99_l.
* conform/Makefile (test-xfail-ISO99/ctype.h/linknamespace):
Remove variable.
(test-xfail-ISO11/ctype.h/linknamespace): Likewise.
(test-xfail-XPG3/ctype.h/linknamespace): Likewise.
(test-xfail-XPG4/ctype.h/linknamespace): Likewise.
(test-xfail-UNIX98/ctype.h/linknamespace): Likewise.
(test-xfail-POSIX2008/ctype.h/linknamespace): Likewise.
On systems using sysdeps/unix/sysv/linux/wordsize-64, posix_fadvise64
and posix_fallocate64 (non-POSIX) are strong aliases for posix_fadvise
and posix_fallocate (POSIX), meaning references to the latter wrongly
bring in definitions of the former. They should be weak aliases; this
patch makes them so.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch).
[BZ #17777]
* sysdeps/unix/sysv/linux/wordsize-64/posix_fadvise.c
(posix_fadvise64): Define as weak alias not strong alias.
* sysdeps/unix/sysv/linux/wordsize-64/posix_fallocate.c
(posix_fallocate64): Likewise.
* conform/Makefile (test-xfail-XOPEN2K/fcntl.h/linknamespace):
Remove variable.
(test-xfail-XOPEN2K/mqueue.h/linknamespace): Likewise.
(test-xfail-POSIX2008/fcntl.h/linknamespace): Likewise.
(test-xfail-POSIX2008/mqueue.h/linknamespace): Likewise.
(test-xfail-XOPEN2K8/fcntl.h/linknamespace): Likewise.
(test-xfail-XOPEN2K8/mqueue.h/linknamespace): Likewise.
MIPS supports a variable page size but glibc defines a constant.
This causes at least two glibc tests to fail when the page size
does not match the hard-coded size:
inet/test-ifaddrs
inet/test_ifindex
[BZ #16191]
* NEWS: Mention bug fix.
* sysdeps/unix/sysv/linux/mips/sys/user.h (PAGE_SHIFT): Remove.
(PAGE_SIZE, PAGE_MASK, NBPG, UPAGES): Likewise.
(HOST_TEXT_START_ADDR, HOST_DATA_START_ADDR): Likewise.
(HOST_STACK_END_ADDR): Likewise.
sysdeps/unix/sysv/linux/mips/bits/termios.h defines TIOCSER_TEMT
unconditionally, but it's in the user's namespace. This patch
conditions it on __USE_MISC, as on powerpc. I've filed bug 17783 for
the residual inconsistency in conditions on this macro (sparc defines
it for __USE_GNU only).
[BZ #17782]
* sysdeps/unix/sysv/linux/mips/bits/termios.h (TIOCSER_TEMT):
Condition macro definition on [__USE_MISC].
sysdeps/unix/sysv/linux/mips/bits/sigaction.h gives sa_flags type
unsigned int, but POSIX says it should be signed int. This patch
gives it the correct type (the layout is unchanged, so there are no
ABI issues involved).
[BZ #17781]
* sysdeps/unix/sysv/linux/mips/bits/sigaction.h
(struct sigaction): Change type of sa_flags field to int.
sysdeps/unix/sysv/linux/mips/bits/fcntl.h has a structure field called
pad, which is in the user's namespace. This patch changes it to
__glibc_reserved0.
[BZ #17780]
* sysdeps/unix/sysv/linux/mips/bits/fcntl.h (struct flock)
[!__USE_FILE_OFFSET64 && _MIPS_SIM != _ABI64]: Rename pad field to
__glibc_reserved0.
PI_STATIC_AND_HIDDEN is always defined for i386. There is no need to
check PI_STATIC_AND_HIDDEN in sysdeps/i386/dl-machine.h.
[BZ #17775]
* sysdeps/i386/dl-machine.h (PI_STATIC_AND_HIDDEN): Removed.
(elf_machine_dynamic) [!PI_STATIC_AND_HIDDEN]: Likewise.
(elf_machine_load_address) [!PI_STATIC_AND_HIDDEN]: Likewise.