This patch removes all the replicated pthread definitions across the
architectures and consolidates them in shared headers. The new
organization is as follows:
* Architecture specific definitions (such as pthread type sizes) are
placed in the new pthreadtypes-arch.h header in the arch specific path.
* All shared structure definitions are moved to a common NPTL header
at sysdeps/nptl/bits/pthreadtypes.h (which now includes the arch
specific one for internal definitions).
* Also, for future C11 thread support, both mutex and condition
definitions are placed in a common header at
sysdeps/nptl/bits/thread-shared-types.h.
It is a refactoring patch with no expected functional changes.
Checked with a build for all major ABI (aarch64-linux-gnu, alpha-linux-gnu,
arm-linux-gnueabi, i386-linux-gnu, ia64-linux-gnu,
m68k-linux-gnu, microblaze-linux-gnu, mips{64}-linux-gnu, nios2-linux-gnu,
powerpc{64le}-linux-gnu, s390{x}-linux-gnu, sparc{64}-linux-gnu,
tile{pro,gx}-linux-gnu, and x86_64-linux-gnu).
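The header layering described above can be sketched in a single C file. This is only an illustration under stated assumptions: every sketch_/SKETCH_ name and the size value 40 are hypothetical, and the real headers define the actual pthread types.

```c
#include <stddef.h>

/* Stand-in for an arch-specific pthreadtypes-arch.h: only sizes live
   here.  The name and the value 40 are illustrative assumptions.  */
#define SKETCH_SIZEOF_PTHREAD_MUTEX_T 40

/* Stand-in for the shared header: one structure layout for every
   architecture (hypothetical fields).  */
struct sketch_mutex_s
{
  int lock;
  unsigned int count;
  int owner;
  int kind;
};

/* The public type is padded to the size the arch header provides.  */
typedef union
{
  struct sketch_mutex_s data;
  char size[SKETCH_SIZEOF_PTHREAD_MUTEX_T];
  long int align;
} sketch_mutex_t;

/* The shared layout must fit in the size the arch header promises.  */
_Static_assert (sizeof (struct sketch_mutex_s)
                <= SKETCH_SIZEOF_PTHREAD_MUTEX_T,
                "shared layout exceeds arch-specific size");

/* Report the padded size so callers can check it.  */
size_t
sketch_mutex_size (void)
{
  return sizeof (sketch_mutex_t);
}
```

With this split, adding an architecture in the sketch would only mean supplying a different size macro; the structure layout stays in one place.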
* posix/Makefile (headers): Add pthreadtypes-arch.h and
thread-shared-types.h.
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h: New file: arch
specific thread definition.
* sysdeps/alpha/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/hppa/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/ia64/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/powerpc/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/sparc/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/tile/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/x86/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/nptl/bits/thread-shared-types.h: New file: shared
thread definition between POSIX and C11.
* sysdeps/aarch64/nptl/bits/pthreadtypes.h: Remove file.
* sysdeps/alpha/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/arm/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/hppa/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/m68k/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/microblaze/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/mips/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/nios2/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/ia64/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/powerpc/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/sh/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/sparc/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/tile/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/x86/nptl/bits/pthreadtypes.h: Likewise.
* sysdeps/nptl/bits/pthreadtypes.h: New file: common thread
definitions shared across all architectures.
1. Fix the results for negative subnormals by ignoring the sign when
normalizing the value.
2. Fix the output when the high part is a power of 2 and the low part
is a nonzero number with opposite sign. This fix is based on commit
380bd0fd24.
After applying this patch, logbl() tests pass cleanly on POWER >= 7.
Tested on powerpc, powerpc64 and powerpc64le
[BZ #21280]
* sysdeps/powerpc/power7/fpu/s_logbl.c (__logbl): Ignore the
sign of subnormals and adjust the exponent of a power of 2 down
when the low part has the opposite sign.
float128 on powerpc64le requires the addition of the ieee754/float128
sysdep, whereas powerpc64 doesn't. This requires creating a bunch of
submachine and cpu directories and Implies files which just point
towards their powerpc64 equivalent.
Tested on P7, P8, and generic powerpc64le targets with and without
multiarch.
* sysdeps/powerpc/powerpc64le/Implies: New file.
* sysdeps/powerpc/powerpc64le/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64le/fpu/multiarch/Implies: New file.
* sysdeps/powerpc/powerpc64le/multiarch/Implies: New file.
* sysdeps/powerpc/powerpc64le/power7/Implies: New file.
* sysdeps/powerpc/powerpc64le/power7/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64le/power7/fpu/multiarch/Implies: New file.
* sysdeps/powerpc/powerpc64le/power7/multiarch/Implies: New file.
* sysdeps/powerpc/powerpc64le/power8/Implies: New file.
* sysdeps/powerpc/powerpc64le/power8/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64le/power8/fpu/multiarch/Implies: New file.
* sysdeps/powerpc/powerpc64le/power8/multiarch/Implies: New file.
* sysdeps/powerpc/powerpc64le/power9/Implies: New file.
* sysdeps/powerpc/powerpc64le/power9/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64le/power9/fpu/multiarch/Implies: New file.
* sysdeps/powerpc/powerpc64le/power9/multiarch/Implies: New file.
* sysdeps/powerpc/preconfigure: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64le/Implies: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64le/fpu/Implies: New file.
The P7 code is used for strings <= 32B, and vectorized loops are used for
strings > 32B. This shows an average 25% improvement, depending on the
position of the search character. The performance is the same for shorter
strings.
Tested on ppc64 and ppc64le.
With the new optimized strnlen for POWER8 [1], this patch adds
strncat for POWER8 to make use of the optimized strlen and strnlen.
This is faster than the current POWER7 implementation for larger strings.
Tested on powerpc64 and powerpc64le.
[1] https://sourceware.org/ml/libc-alpha/2017-03/msg00491.html
* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines): Add
strncat-power8.
* sysdeps/powerpc/powerpc64/multiarch/strncat.c (strncat): Add
__strncat_power8 to ifunc list.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
(strncat): Add __strncat_power8 to list of strncat functions.
* sysdeps/powerpc/powerpc64/multiarch/strncat-power8.c: New file.
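As a rough illustration of how strncat can be composed from strlen and strnlen, here is a minimal C sketch. The name sketch_strncat is hypothetical and this is not claimed to be the contents of strncat-power8.c; it only shows why faster strlen and strnlen routines would translate into a faster strncat.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L   /* Makes strnlen visible.  */
#endif
#include <string.h>

/* Append at most N bytes of SRC to DST and always terminate DST.
   All the scanning work is delegated to strlen and strnlen.  */
char *
sketch_strncat (char *dst, const char *src, size_t n)
{
  char *end = dst + strlen (dst);   /* Find the end of DST.  */
  size_t len = strnlen (src, n);    /* Bytes to copy, capped at N.  */
  memcpy (end, src, len);
  end[len] = '\0';
  return dst;
}
```

The caller is assumed to provide a destination buffer large enough for the result, as with the standard function.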
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/memcmp-power4.S: Define the
implementation-specific function name and remove unneeded
macros definition.
* sysdeps/powerpc/powerpc64/multiarch/memcmp-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcmp.S: Set a default function
name if not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/power7/memcmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memmove.S: Likewise.
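The "set a default function name if not defined" pattern used throughout these cleanups can be sketched in C. The real files are assembly and pass the name to entry/exit macros; the SKETCH_STRLEN macro and the function names below are hypothetical stand-ins.

```c
#include <stddef.h>

/* A multiarch wrapper would define SKETCH_STRLEN (for example to
   sketch_strlen_power7) before including this file.  When nothing is
   defined, the plain name is used.  */
#ifndef SKETCH_STRLEN
# define SKETCH_STRLEN sketch_strlen
#endif

/* The implementation is written once against the macro name.  */
size_t
SKETCH_STRLEN (const char *s)
{
  const char *p = s;
  while (*p != '\0')
    ++p;
  return (size_t) (p - s);
}
```

Under this arrangement the wrapper only needs one define plus an include, instead of redefining several entry, exit and alias macros.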
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-a2.S: Define the
implementation-specific function name and remove unneeded
macros definition.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-cell.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power4.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power6.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/a2/memcpy.S: Set a default function
name if not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/cell/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/memchr-power7.S: Define the
implementation-specific function name and remove unneeded macros
definition.
* sysdeps/powerpc/powerpc64/multiarch/memrchr-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/rawmemchr-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memchr.S: Set a default
function name if not defined and pass as parameter to macros
accordingly.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/memset-power4.S: Define the
implementation-specific function name and remove unneeded macros
definition.
* sysdeps/powerpc/powerpc64/multiarch/memset-power6.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/memset.S: Set a default function name if
not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/memset.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/strcasestr-power8.S: Define the
strcasestr implementation name and remove unneeded macros definition.
* sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: Define
strstr implementation name and remove unneeded macros definition.
* sysdeps/powerpc/powerpc64/power7/strstr.S: Set a default function
name if not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/power8/strcasestr.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/strchr-power7.S: Define the
implementation-specific function name and remove unneeded macros
definition.
* sysdeps/powerpc/powerpc64/multiarch/strchr-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchr-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchrnul-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchrnul-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strchr.S: Set a default
function name if not defined and pass as parameter to macros
accordingly.
* sysdeps/powerpc/powerpc64/power7/strchrnul.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strrchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/strchr.S: Likewise.
* sysdeps/powerpc/powerpc64/strchr.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/strlen-power7.S: Define
the strlen implementation name and remove unneeded macros definition.
* sysdeps/powerpc/powerpc64/multiarch/strlen-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strlen-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strnlen-power7.S: Define
the strnlen implementation name and remove unneeded macros definition.
* sysdeps/powerpc/powerpc64/power7/strlen.S: Set a default function
name if not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/power7/strnlen.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/strlen.S: Likewise.
* sysdeps/powerpc/powerpc64/strlen.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l-power7.S: Define
the implementation-specific function name and remove unneeded
macros definition.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power9.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power4.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power9.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/strncmp.S: Set a default function
name if not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/power7/strcmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/strcmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power9/strcmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power9/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/strcmp.S: Likewise.
* sysdeps/powerpc/powerpc64/strncmp.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy-power8.S: Define the
implementation-specific function name and remove unneeded macros
definition.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strncpy.S: Set a default
function name if not defined.
* sysdeps/powerpc/powerpc64/power8/strcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Likewise.
This patch moves all arch specific pthreadtypes.h to a similar path
for all architectures (sysdeps/<arch>/nptl/bits). No functional
or build change is expected. The idea is mainly to organize the
header placement for all architectures.
Checked with a build for all major ABI (aarch64-linux-gnu, alpha-linux-gnu,
arm-linux-gnueabi, i386-linux-gnu, ia64-linux-gnu,
m68k-linux-gnu, microblaze-linux-gnu [1], mips{64}-linux-gnu, nios2-linux-gnu,
powerpc{64le}-linux-gnu, s390{x}-linux-gnu, sparc{64}-linux-gnu,
tile{pro,gx}-linux-gnu, and x86_64-linux-gnu).
* sysdeps/unix/sysv/linux/x86/Implies: New file.
* sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h: Move to ...
* sysdeps/alpha/nptl/bits/pthreadtypes.h: ... here.
* sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h: Move to ...
* sysdeps/powerpc/nptl/bits/pthreadtypes.h: ... here.
* sysdeps/x86/bits/pthreadtypes.h: Move to ...
* sysdeps/x86/nptl/bits/pthreadtypes.h: ... here.
As noted in [1], the divdi3 object is only exported in a handful of ABIs
(i386, m68k, powerpc32, s390-32, and ia64), however it is built
for all current architectures regardless.
This patch refactors the make rules for this object so that only the
aforementioned architectures that actually require it build it.
Also, to avoid internal PLT calls to the exported symbol from the
module, glibc uses an internal header (symbol-hacks.h) which is
not required (and in fact breaks the build for architectures that
intend to get symbol definitions from libgcc.a). The patch also
changes it to create its own header (divdi3-symbol-hacks.h) and
adjusts the architectures that require it accordingly.
I checked the build/check (with run-built-tests=no) on the
following architectures (which I think must cover all supported
ABI/builds) using GCC 6.3:
aarch64-linux-gnu
alpha-linux-gnu
arm-linux-gnueabihf
hppa-linux-gnu
ia64-linux-gnu
m68k-linux-gnu
microblaze-linux-gnu
mips64-n32-linux-gnu
mips-linux-gnu
mips64-linux-gnu
nios2-linux-gnu
powerpc-linux-gnu
powerpc-linux-gnu-power4
powerpc64-linux-gnu
powerpc64le-linux-gnu
s390x-linux-gnu
s390-linux-gnu
sh4-linux-gnu
sh4-linux-gnu-soft
sparc64-linux-gnu
sparcv9-linux-gnu
tilegx-linux-gnu
tilegx-linux-gnu-32
tilepro-linux-gnu
x86_64-linux-gnu
x86_64-linux-gnu-x32
i686-linux-gnu
I only saw one regression on sparcv9-linux-gnu (an extra PLT call to
.udiv), which I address in the next patch in the set. It also correctly
builds SH with GCC 7.0.1 (without any regression from c89721e25d).
[1] https://sourceware.org/ml/libc-alpha/2017-03/msg00243.html
* sysdeps/i386/symbol-hacks.h: New file.
* sysdeps/m68k/symbol-hacks.h: New file.
* sysdeps/powerpc/powerpc32/symbol-hacks.h: New file.
* sysdeps/s390/s390-32/symbol-hacks.h: New file.
* sysdeps/unix/sysv/linux/i386/Makefile
[$(subdir) = csu] (sysdep_routines): New rule: divdi3 object.
[$(subdir) = csu] (sysdep-only-routines): Likewise.
[$(subdir) = csu] (CFLAGS-divdi3.c): Likewise.
* sysdeps/unix/sysv/linux/m68k/Makefile
[$(subdir) = csu] (sysdep_routines): Likewise.
[$(subdir) = csu] (sysdep-only-routines): Likewise.
[$(subdir) = csu] (CFLAGS-divdi3.c): Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/Makefile
[$(subdir) = csu] (sysdep_routines): Likewise.
[$(subdir) = csu] (sysdep-only-routines): Likewise.
[$(subdir) = csu] (CFLAGS-divdi3.c): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/Makefile
[$(subdir) = csu] (sysdep_routines): Likewise.
[$(subdir) = csu] (sysdep-only-routines): Likewise.
[$(subdir) = csu] (CFLAGS-divdi3.c): Likewise.
* sysdeps/wordsize-32/Makefile: Remove file.
* sysdeps/wordsize-32/symbol-hacks.h: Definitions move to ...
* sysdeps/wordsize-32/divdi3-symbol-hacks.h: ... here.
Add a POWER8 strnlen optimized for long strings. It delivers the
same performance as the POWER7 implementation for short strings.
This takes advantage of reasonably performing unaligned loads
and bit permutes to check the first 1-16 bytes until
quadword aligned, then checks in 64-byte strides until unsafe,
then 16 bytes, truncating the count if need be.
Likewise, the POWER7 code is recycled for strings shorter than 32 bytes.
Tested on ppc64 and ppc64le.
* sysdeps/powerpc/powerpc64/multiarch/Makefile
(sysdep_routines): Add strnlen-power8.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
(strnlen): Add __strnlen_power8 to list of strnlen functions.
* sysdeps/powerpc/powerpc64/multiarch/strnlen-power8.S:
New file.
* sysdeps/powerpc/powerpc64/multiarch/strnlen.c
(__strnlen): Add __strnlen_power8 to ifunc list.
* sysdeps/powerpc/powerpc64/power8/strnlen.S: New file.
These are a grab bag of changes where the testsuite was using internal
symbols of some variety, but this was straightforward to fix, and the
fixed code should work with or without the change to compile the
testsuite under _ISOMAC.
Four of these are just more #include adjustments, but I want to highlight
sysdeps/powerpc/fpu/tst-setcontext-fpscr.c, which appears to have been
written before the advent of sys/auxv.h. I think a big chunk of this file
could be replaced by a simple call to getauxval, but I'll let someone who
actually has a powerpc machine to test on do that.
dlfcn/tst-dladdr.c was including ldsodefs.h just so it could use
DL_LOOKUP_ADDRESS to print an additional diagnostic; as requested by Carlos,
I have removed this.
math/test-misc.c was using #ifndef NO_LONG_DOUBLE, which is an internal
configuration macro, to decide whether to do certain tests involving
'long double'. I changed the test to #if LDBL_MANT_DIG > DBL_MANT_DIG
instead, which uses only public float.h macros and is equivalent on
all supported platforms. (Note that NO_LONG_DOUBLE doesn't mean 'the
compiler doesn't support long double', it means 'long double is the
same as double'.)
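A minimal sketch of the public-macro check described above, using only float.h. The function name is hypothetical; the preprocessor condition is the one quoted in the text.

```c
#include <float.h>

/* Return 1 when 'long double' carries more mantissa digits than
   'double', 0 when the two types have the same precision.  */
int
sketch_has_wider_long_double (void)
{
#if LDBL_MANT_DIG > DBL_MANT_DIG
  return 1;
#else
  return 0;
#endif
}
```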
tst-writev.c has a configuration macro 'ARTIFICIAL_LIMIT' that the
Makefiles are expected to define, and sysdeps/unix/sysv/linux/Makefile
was using the internal __getpagesize in the definition; changed to
sysconf(_SC_PAGESIZE) which is the POSIX equivalent.
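For reference, the POSIX call mentioned above can be used as follows. The wrapper name is hypothetical, and this is not the Makefile definition of ARTIFICIAL_LIMIT, only the underlying query.

```c
#include <unistd.h>

/* Query the system page size through the public POSIX interface
   rather than through an internal symbol.  */
long
sketch_page_size (void)
{
  return sysconf (_SC_PAGESIZE);
}
```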
ia64-linux doesn't supply 'clone', only '__clone2', which is not
defined in the public headers(!) All the other clone tests have local
extern declarations of __clone2, but tst-clone.c doesn't; it was
getting away with this because include/sched.h does declare __clone2.
* nss/tst-cancel-getpwuid_r.c: Include nss.h.
* string/strcasestr.c: No need to include config.h.
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c: Include
sys/auxv.h. Don't include sysdep.h.
* sysdeps/powerpc/tst-set_ppr.c: Don't include dl-procinfo.h.
* dlfcn/tst-dladdr.c: Don't include ldsodefs.h. Don't use
DL_LOOKUP_ADDRESS.
* math/test-misc.c: Instead of testing NO_LONG_DOUBLE, test whether
LDBL_MANT_DIG is greater than DBL_MANT_DIG.
* sysdeps/unix/sysv/linux/Makefile (CFLAGS-tst-writev.c): Use
sysconf (_SC_PAGESIZE) instead of __getpagesize in definition
of ARTIFICIAL_LIMIT.
* sysdeps/unix/sysv/linux/tst-clone.c [__ia64__]: Add extern
declaration of __clone2.
A few 'long double'-related tests include math_private.h just for
their variety of math_ldbl.h, which contains macros for assembling and
disassembling the binary representation of 'long double'. math_ldbl.h
insists on being included from math_private.h, but if we relax this
restriction (and fix some portability sloppiness) we can use it
directly and not have to expose all of math_private.h to the testsuite.
* sysdeps/generic/math_private.h: Use __BIG_ENDIAN and
__LITTLE_ENDIAN, not BIG_ENDIAN and LITTLE_ENDIAN.
* sysdeps/generic/math_ldbl.h
* sysdeps/ia64/fpu/math_ldbl.h
* sysdeps/ieee754/ldbl-128/math_ldbl.h
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h
* sysdeps/ieee754/ldbl-96/math_ldbl.h
* sysdeps/powerpc/fpu/math_ldbl.h
* sysdeps/x86_64/fpu/math_ldbl.h:
Allow direct inclusion. Use uintNN_t instead of u_intNN_t.
Use __BIG_ENDIAN and __LITTLE_ENDIAN, not BIG_ENDIAN and
LITTLE_ENDIAN. Include endian.h and/or stdint.h if necessary.
Add copyright notices.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h (ldbl_canonicalize_int):
Don't use EXTRACT_WORDS64.
* sysdeps/ieee754/ldbl-96/test-canonical-ldbl-96.c
* sysdeps/ieee754/ldbl-96/test-totalorderl-ldbl-96.c
* sysdeps/ieee754/ldbl-128ibm/test-canonical-ldbl-128ibm.c
* sysdeps/ieee754/ldbl-128ibm/test-totalorderl-ldbl-128ibm.c:
Include math_ldbl.h, not math_private.h.
The sys/platform/ppc.h header defines a class of __ppc_set_ppr functions
used to set the Program Priority Register (PPR) in PowerPC.
This patch implements test cases for these functions.
Tested on ppc64le, ppc64, and ppc.
* sysdeps/powerpc/tst-set_ppr.c: New file.
Implement test cases for __ppc_set_ppr_* functions.
* sysdeps/powerpc/Makefile ($(subdir),misc): Add tst-set_ppr
in the list of tests.
Change the powerpc tests to use <support/test-driver.c>.
Also replace some pthread calls with their xpthread equivalents.
Tested on ppc64le.
* sysdeps/powerpc/test-get_hwcap.c: Use <support/test-driver.c>
instead of test-skeleton.c.
(do_test): Replaced pthread_create and pthread_join with
xpthread_create and xpthread_join. Use TEST_VERIFY_EXIT macro.
Removed unneeded status variable.
* sysdeps/powerpc/test-gettimebase.c: Use <support/test-driver.c>
instead of test-skeleton.c.
* sysdeps/powerpc/tst-tlsopt-powerpc.c: Likewise.
For strings >16B and <32B the existing algorithm takes more time than the
default implementation when strings are placed close to the end of a page.
This is due to byte-by-byte access for handling page crossing. This is
improved by following the >32B code path, where the address is adjusted to
aligned memory before doing a load doubleword operation instead of loading
bytes.
Tested on powerpc64 and powerpc64le.
Based on comments on the previous attempt to address BZ#16640 [1],
the idea is not to support invalid use of strtok (the original
bug report proposal). This led to a new optimized
strtok implementation [2].
The idea of this patch is to fix BZ#16640 by aligning all the
implementations to the same contract. However, with the newer strtok
code it is better to remove the old assembly ones instead of
fixing them.
For x86 it is a gain in all cases since the new implementation can
potentially use the sse2/sse42 implementations of strspn and strcspn.
This shows better performance on both i686 and x86_64 using
the string benchtests.
On powerpc64 the gains are mixed, where only for larger inputs
or keys are some gains shown (based on the benchtests it seems that
it shows some gains for keys larger than 10 and inputs larger
than 32). I would prefer to remove the optimized implementation,
first for code simplicity and second because more gain
could be obtained using a better optimized strcspn/strspn
code (as for x86). However if powerpc arch maintainers prefer I
can send a v2 with the assembly code adjusted instead.
Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu.
[BZ #16640]
* sysdeps/i386/i686/strtok.S: Remove file.
* sysdeps/i386/i686/strtok_r.S: Likewise.
* sysdeps/i386/strtok.S: Likewise.
* sysdeps/i386/strtok_r.S: Likewise.
* sysdeps/powerpc/powerpc64/strtok.S: Likewise.
* sysdeps/powerpc/powerpc64/strtok_r.S: Likewise.
* sysdeps/x86_64/strtok.S: Likewise.
* sysdeps/x86_64/strtok_r.S: Likewise.
[1] https://sourceware.org/ml/libc-alpha/2016-10/msg00411.html
[2] https://sourceware.org/ml/libc-alpha/2016-12/msg00461.html
After this update, math/test-ildouble, math/test-ldouble and
math/test-ldouble-finite pass on hard float, POWER < 7 builds.
Tested on powerpc, powerpc64 and powerpc64le.
This commit moves one step towards the deprecation of wrappers that
use _LIB_VERSION / matherr / __kernel_standard functionality, by
adding the suffix '_compat' to their filenames and adjusting Makefiles
and #includes accordingly.
New template wrappers that do not use such functionality will be added
by future patches and will be first used by the float128 wrappers.
Since commit 6e46de42fe the default strcat implementation is essentially
the same as the specialized ia64 and powerpc ones. This patch removes the
redundant implementations and adjusts the powerpc64 ifunc code to use the
default one.
Checked on powerpc32-linux-gnu (default and power4) and ia64-linux build
and on powerpc64le-linux-gnu.
* sysdeps/ia64/strcat.c: Remove file.
* sysdeps/powerpc/strcat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat-power7.c: Use default
C implementation.
* sysdeps/powerpc/powerpc64/multiarch/strcat-power8.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat-ppc64.c: Likewise.
The same error fixed in commit b224637928
happens in the 32-bit implementation of memchr for power7.
This patch adopts the same solution, with a minimal change: it
implements a saturated addition where overflow sets the maximum pointer
value to UINTPTR_MAX.
The P7 code is used for strings <= 32B, and vectorized loops are used for
strings > 32B. This shows an average 25% improvement, depending on the
position of the search character. The performance is the same for shorter
strings.
Tested on ppc64 and ppc64le.
When dynamically linking, ifunc resolvers are called before TLS is
initialized, so they cannot be safely stack-protected.
We avoid disabling stack-protection on large numbers of files by
using __attribute__ ((__optimize__ ("-fno-stack-protector")))
to turn it off just for the resolvers themselves. (We provide
the attribute even when statically linking, because we will later
use it elsewhere too.)
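A minimal sketch of attaching that attribute to a single function. The macro name sketch_inhibit_stack_protector, the resolver and the two implementations are hypothetical; the attribute spelling is the one quoted above, and it is only applied when compiling with GCC.

```c
/* Apply the attribute only where the compiler is known to accept it;
   elsewhere the macro expands to nothing.  */
#if defined __GNUC__ && !defined __clang__
# define sketch_inhibit_stack_protector \
  __attribute__ ((__optimize__ ("-fno-stack-protector")))
#else
# define sketch_inhibit_stack_protector
#endif

typedef int (*sketch_impl_fn) (int);

/* Two interchangeable implementations (hypothetical).  */
static int sketch_impl_generic (int x) { return x + 1; }
static int sketch_impl_fast (int x) { return x + 1; }

/* Only the resolver itself is compiled without stack protection; the
   rest of the file keeps whatever protection the build enables.  */
sketch_inhibit_stack_protector
sketch_impl_fn
sketch_resolver (int has_fast)
{
  return has_fast ? sketch_impl_fast : sketch_impl_generic;
}
```

Scoping the attribute to the resolver is what avoids turning stack protection off for whole files.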
The current optimized powerpc64/power7 memchr uses a strategy to check
p versus align(p+n) (where 'p' is the input char pointer and 'n' the
maximum size to check for the byte) without taking care of possible
overflow in the pointer addition in case of large 'n'.
It was triggered by 3038145ca2, where the default rawmemchr (used to
create the ppc64 rawmemchr in ifunc selection) now uses memchr (p, c, (size_t)-1)
in its implementation.
This patch fixes it by implementing a saturated addition where overflow
sets the maximum pointer value to UINTPTR_MAX.
Checked on powerpc64le-linux-gnu.
[BZ #20971]
* sysdeps/powerpc/powerpc64/power7/memchr.S (__memchr): Avoid
overflow in pointer addition.
* string/test-memchr.c (do_test): Add an argument to pass as
the size on memchr.
(test_main): Add check for SIZE_MAX.
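The saturated addition described for this fix can be sketched in C. The actual change is in assembly; the function name below is hypothetical and the sketch only shows the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Compute the end address P + N, clamping to UINTPTR_MAX when the
   unsigned addition wraps around.  */
uintptr_t
sketch_saturating_end (const void *p, size_t n)
{
  uintptr_t start = (uintptr_t) p;
  uintptr_t end = start + n;    /* Unsigned arithmetic wraps on overflow.  */
  if (end < start)              /* A wrapped sum is smaller than START.  */
    end = UINTPTR_MAX;
  return end;
}
```

A call such as memchr (p, c, (size_t)-1) would then yield an end address of UINTPTR_MAX in this sketch rather than a small wrapped value.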
Various fmax and fmin function implementations mishandle sNaN
arguments:
(a) When both arguments are NaNs, the return value should be a qNaN,
but sometimes it is an sNaN if at least one argument is an sNaN.
(b) Under TS 18661-1 semantics, if either argument is an sNaN then the
result should be a qNaN (whereas if one argument is a qNaN and the
other is not a NaN, the result should be the non-NaN argument).
Various implementations treat sNaNs like qNaNs here.
This patch fixes the powerpc versions of these functions (shared by
float and double, 32-bit and 64-bit). The structure of those versions
is that all ordered cases are already handled before anything dealing
with the case where the arguments are unordered; thus, this patch
causes no change to the code executed in the common case (neither
argument a NaN).
Tested for powerpc (32-bit and 64-bit), together with tests to be
added along with the x86_64 / x86 fixes.
[BZ #20947]
* sysdeps/powerpc/fpu/s_fmax.S (__fmax): Add the arguments when
either is a signaling NaN.
* sysdeps/powerpc/fpu/s_fmin.S (__fmin): Likewise.
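The semantics in (a) and (b) can be sketched in C. This is not the powerpc assembly; the function names are hypothetical, and the sNaN test assumes IEEE 754 binary64 with the quiet bit as the most significant fraction bit, which does not hold on every architecture.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if X is a signaling NaN, under the encoding assumption
   stated above.  */
int
sketch_is_snan (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t fraction = bits & ((UINT64_C (1) << 52) - 1);
  return exponent == 0x7ff && fraction != 0 && ((bits >> 51) & 1) == 0;
}

double
sketch_fmax (double x, double y)
{
  /* Ordered cases are handled first, so the common path is unchanged.  */
  if (isgreaterequal (x, y))
    return x;
  if (isless (x, y))
    return y;
  /* Unordered: at least one argument is a NaN.  Adding the arguments
     turns any sNaN into a qNaN.  */
  if (sketch_is_snan (x) || sketch_is_snan (y))
    return x + y;
  /* Only quiet NaNs are involved: return the non-NaN argument if there
     is one.  */
  return isnan (y) ? x : y;
}
```

As in the description above, the NaN handling sits entirely after the ordered comparisons.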
Information about whether the ABI of long double is the same as that
of double is split between bits/mathdef.h and bits/wordsize.h.
When the ABIs are the same, bits/mathdef.h defines
__NO_LONG_DOUBLE_MATH. In addition, in the case where the same glibc
binary supports both -mlong-double-64 and -mlong-double-128,
bits/wordsize.h defines __LONG_DOUBLE_MATH_OPTIONAL, along with
__NO_LONG_DOUBLE_MATH if this particular compilation is with
-mlong-double-64.
As part of the refactoring I proposed in
<https://sourceware.org/ml/libc-alpha/2016-11/msg00745.html>, this
patch puts all that information in a single header,
bits/long-double.h. It is included from sys/cdefs.h alongside the
include of bits/wordsize.h, so other headers generally do not need to
include bits/long-double.h directly.
Previously, various bits/mathdef.h headers and bits/wordsize.h headers
had this long double information (including implicitly in some
bits/mathdef.h headers through not having the defines present in the
default version). After the patch, it's all in six bits/long-double.h
headers. Furthermore, most of those new headers are not
architecture-specific. Architectures with optional long double all
use the ldbl-opt sysdeps directory, either in the order (ldbl-64-128,
ldbl-opt, ldbl-128) or (ldbl-128ibm, ldbl-opt). Thus a generic header
for the case where long double = double, and headers in ldbl-128,
ldbl-96 and ldbl-opt, suffices to cover every architecture except for
cases where long double properties vary between different ABIs sharing
a set of installed headers; fortunately all the ldbl-opt cases share a
single compiler-predefined macro __LONG_DOUBLE_128__ that can be used
to tell whether this compilation is -mlong-double-64 or
-mlong-double-128.
The two cases where a set of headers is shared between ABIs with
different long double properties, MIPS (o32 has long double = double,
other ABIs use ldbl-128) and SPARC (32-bit has optional long double,
64-bit has required long double), need their own bits/long-double.h
headers.
As with bits/wordsize.h, multiple-include protection for this header
is generally implicit through the include guards on sys/cdefs.h, and
multiple inclusion is harmless in any case. There is one subtlety:
the header must not define __LONG_DOUBLE_MATH_OPTIONAL if
__NO_LONG_DOUBLE_MATH was defined before its inclusion, because doing
so breaks how sysdeps/ieee754/ldbl-opt/nldbl-compat.h defines
__NO_LONG_DOUBLE_MATH itself before including system headers. Subject
to keeping that working, it would be reasonable to move these macros
from defined/undefined #ifdef to always-defined 1/0 #if semantics, but
this patch does not attempt to do so, just rearranges where the macros
are defined.
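The subtlety about not defining the "optional" macro when the "no long double" macro is already set can be sketched with the preprocessor. The SKETCH_ macro names are hypothetical stand-ins for the real ones, and the header body is only an assumption about how an ldbl-opt style header could be arranged.

```c
/* Simulate a file that, like nldbl-compat.h, defines the "no long
   double" macro itself before including any system header.  */
#define SKETCH_NO_LONG_DOUBLE_MATH 1

/* Sketch of the header body: leave everything alone when the macro
   was defined before inclusion.  */
#ifndef SKETCH_NO_LONG_DOUBLE_MATH
# define SKETCH_LONG_DOUBLE_MATH_OPTIONAL 1
# ifndef __LONG_DOUBLE_128__
#  define SKETCH_NO_LONG_DOUBLE_MATH 1
# endif
#endif

/* Return 1 if the "optional" macro ended up defined, 0 otherwise.  */
int
sketch_optional_defined (void)
{
#ifdef SKETCH_LONG_DOUBLE_MATH_OPTIONAL
  return 1;
#else
  return 0;
#endif
}
```

Because the first define precedes the header body, the "optional" macro stays undefined in this sketch, which is the behaviour the paragraph above asks for.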
After this patch, the only use of bits/mathdef.h is the alpha one for
modifying complex function ABIs for old GCC. Thus, all versions of
the header other than the default and alpha versions are removed, as
is the include from math.h.
Tested for x86_64 and x86. Also did compilation-only testing with
build-many-glibcs.py.
* bits/long-double.h: New file.
* sysdeps/ieee754/ldbl-128/bits/long-double.h: Likewise.
* sysdeps/ieee754/ldbl-96/bits/long-double.h: Likewise.
* sysdeps/ieee754/ldbl-opt/bits/long-double.h: Likewise.
* sysdeps/mips/bits/long-double.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/long-double.h: Likewise.
* math/Makefile (headers): Add bits/long-double.h.
* misc/sys/cdefs.h: Include <bits/long-double.h>.
* stdlib/strtold.c: Include <bits/long-double.h> instead of
<bits/wordsize.h>.
* bits/mathdef.h [!_COMPLEX_H]: Do not allow inclusion.
[!__NO_LONG_DOUBLE_MATH]: Remove conditional code.
* math/math.h: Do not include <bits/mathdef.h>.
* sysdeps/aarch64/bits/mathdef.h: Remove file.
* sysdeps/alpha/bits/mathdef.h [!_COMPLEX_H]: Do not allow
inclusion.
* sysdeps/ia64/bits/mathdef.h: Remove file.
* sysdeps/m68k/m680x0/bits/mathdef.h: Likewise.
* sysdeps/mips/bits/mathdef.h: Likewise.
* sysdeps/powerpc/bits/mathdef.h: Likewise.
* sysdeps/s390/bits/mathdef.h: Likewise.
* sysdeps/sparc/bits/mathdef.h: Likewise.
* sysdeps/x86/bits/mathdef.h: Likewise.
* sysdeps/s390/s390-32/bits/wordsize.h
[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]: Remove
conditional code.
* sysdeps/s390/s390-64/bits/wordsize.h
[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]:
Likewise.
* sysdeps/unix/sysv/linux/alpha/bits/wordsize.h
[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h
[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]:
Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/wordsize.h
[!__NO_LONG_DOUBLE_MATH && !__LONG_DOUBLE_MATH_OPTIONAL]:
Likewise.
TS 18661-1 generally defines libm functions taking sNaN arguments to
return qNaN and raise "invalid", even for the cases where a
corresponding qNaN argument would not result in a qNaN return. This
includes hypot with one argument being an infinity and the other being
an sNaN. This patch duly fixes hypot implementations in glibc
(generic and powerpc) to ensure qNaN, computed by arithmetic on the
arguments, is returned in that case.
Various implementations do their checks for infinities and NaNs inline
by manipulating the representations of the arguments. For simplicity,
this patch just uses issignaling to check for sNaN arguments. This
could be inlined like the existing code (with due care about reversed
quiet NaN conventions, for implementations where that is relevant),
but given that all these checks are in cases where it's already known
at least one argument is not finite, which should be the uncommon
case, that doesn't seem worthwhile unless performance issues are
observed in practice.
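The non-finite handling described above can be sketched as follows.
This is an illustration only, not the code from any of the changed
files: demo_is_snan is a hypothetical bit-level stand-in for
issignaling, and it assumes IEEE binary64 with the common convention
that a set top mantissa bit marks a quiet NaN (so it does not cover
targets with the reversed convention).

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for issignaling on IEEE binary64, assuming the usual
   convention that a set top mantissa bit marks a quiet NaN.  */
static int
demo_is_snan (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t mantissa = bits & UINT64_C (0xfffffffffffff);
  uint64_t quiet_bit = (bits >> 51) & 1;
  return exponent == 0x7ff && mantissa != 0 && quiet_bit == 0;
}

/* Result for hypot when at least one argument is known to be
   non-finite.  Callers must not pass two finite values.  */
static double
demo_hypot_nonfinite (double x, double y)
{
  if (demo_is_snan (x) || demo_is_snan (y))
    return x + y;	/* Arithmetic yields qNaN and raises "invalid".  */
  if (isinf (x) || isinf (y))
    return INFINITY;	/* An infinity wins over a quiet NaN.  */
  return x + y;		/* Propagates the quiet NaN.  */
}
```

The sNaN check comes first so that an infinity paired with an sNaN no
longer short-circuits to Inf.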
Tested for x86_64, x86, mips64 and powerpc.
[BZ #20940]
* sysdeps/ieee754/dbl-64/e_hypot.c (__ieee754_hypot): Do not
return Inf for arguments Inf and sNaN.
* sysdeps/ieee754/flt-32/e_hypotf.c (__ieee754_hypotf): Likewise.
* sysdeps/ieee754/ldbl-128/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c (__ieee754_hypotl): Likewise.
* sysdeps/powerpc/fpu/e_hypot.c (TEST_INF_NAN): Do not return Inf
for arguments Inf and sNaN. When returning a NaN, compute it by
arithmetic on the arguments.
* sysdeps/powerpc/fpu/e_hypotf.c (TEST_INF_NAN): Likewise.
* math/libm-test.inc (hypot_test_data): Add tests of sNaN arguments.
Commit c7debbdfac redirected the internal strrchr to the default powerpc64
implementation by redefining the weak_alias at
sysdeps/powerpc/powerpc64/multiarch/strrchr-ppc64.c:
#undef weak_alias
#define weak_alias(name, aliasname) \
extern __typeof (__strrchr_ppc) aliasname \
__attribute__ ((weak, alias ("__strrchr_ppc")));
This creates a __GI_strrchr alias that clashes with the IFUNC symbol in
strrchr.os. There is no need to define the default version for the internal
version, since ifunc should work internally for powerpc64. This patch
removes the weak_alias indirection.
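For readers unfamiliar with the idiom, a minimal standalone sketch of
what such a weak_alias redefinition does is given here. It assumes GCC
(or a compatible compiler) on an ELF target; demo_strrchr_ppc,
demo_ignored_name and demo_strrchr_alias are hypothetical names, not
glibc symbols.

```c
#include <stddef.h>

/* Hypothetical stand-in for the baseline __strrchr_ppc.  */
char *
demo_strrchr_ppc (const char *s, int c)
{
  const char *last = NULL;
  do
    {
      if (*s == (char) c)
	last = s;
    }
  while (*s++ != '\0');
  return (char *) last;
}

/* Same shape as the redefinition quoted above: the NAME argument is
   ignored and every alias is bound to the baseline function.  */
#define weak_alias(name, aliasname) \
  extern __typeof (demo_strrchr_ppc) aliasname \
    __attribute__ ((weak, alias ("demo_strrchr_ppc")));

weak_alias (demo_ignored_name, demo_strrchr_alias)
```

Because the macro ignores its first argument, any alias emitted through
it resolves to the baseline function rather than to the IFUNC-selected
one.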
Checked on powerpc64le.
* sysdeps/powerpc/powerpc64/multiarch/strrchr-ppc64.c (weak_alias):
Remove redirection to __strrchr_ppc.
Continuing the refactoring of bits/mathdef.h, this patch stops it
defining FP_ILOGB0 and FP_ILOGBNAN, moving the required information to
a new header bits/fp-logb.h.
There are only two possible values of each of those macros permitted
by ISO C. TS 18661-1 adds corresponding macros for llogb, and their
values are required to correspond to those of the ilogb macros in the
obvious way. Thus two boolean values - for which the same choices are
correct for most architectures - suffice to determine the value of all
these macros, and by defining macros for those boolean values in
bits/fp-logb.h we can then define the public FP_* macros in math.h and
avoid the present duplication of the associated feature test macro
logic.
This patch duly moves to bits/fp-logb.h defining __FP_LOGB0_IS_MIN and
__FP_LOGBNAN_IS_MIN. Default definitions of those to 0 are correct
for most architectures, while ia64, m68k and x86 get their own
versions of bits/fp-logb.h to reflect their use of values different
from the defaults.
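A sketch of how two boolean macros can determine all four values is
shown here. The DEMO_* names are hypothetical stand-ins for the
__FP_LOGB*_IS_MIN macros and the public FP_* macros; the value pairs
are the ones permitted by ISO C for ilogb and by TS 18661-1 for llogb.
This illustrates the scheme and is not the text of math.h.

```c
#include <limits.h>

/* Hypothetical stand-ins for the bits/fp-logb.h defaults.  */
#define DEMO_FP_LOGB0_IS_MIN 0
#define DEMO_FP_LOGBNAN_IS_MIN 0

/* Result of ilogb/llogb for a zero argument.  */
#if DEMO_FP_LOGB0_IS_MIN
# define DEMO_FP_ILOGB0 INT_MIN
# define DEMO_FP_LLOGB0 LONG_MIN
#else
# define DEMO_FP_ILOGB0 (-INT_MAX)
# define DEMO_FP_LLOGB0 (-LONG_MAX)
#endif

/* Result of ilogb/llogb for a NaN argument.  */
#if DEMO_FP_LOGBNAN_IS_MIN
# define DEMO_FP_ILOGBNAN INT_MIN
# define DEMO_FP_LLOGBNAN LONG_MIN
#else
# define DEMO_FP_ILOGBNAN INT_MAX
# define DEMO_FP_LLOGBNAN LONG_MAX
#endif
```

An architecture header only has to override the two boolean macros; the
feature test macro logic for the public names then lives in one place.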
The patch renders many copies of bits/mathdef.h trivial (needed only
to avoid the default __NO_LONG_DOUBLE_MATH). I'll revise
<https://sourceware.org/ml/libc-alpha/2016-11/msg00865.html>
accordingly so that it removes all bits/mathdef.h headers except the
default one and the alpha one, and arranges for the header to be
included only by complex.h as the only remaining use at that point
will be for the alpha ABI issues there.
Tested for x86_64 and x86. Also did compile-only testing with
build-many-glibcs.py (using glibc sources from before the commit that
introduced many build failures with undefined __GI___sigsetjmp).
* bits/fp-logb.h: New file.
* sysdeps/ia64/bits/fp-logb.h: Likewise.
* sysdeps/m68k/m680x0/bits/fp-logb.h: Likewise.
* sysdeps/x86/bits/fp-logb.h: Likewise.
* math/Makefile (headers): Add bits/fp-logb.h.
* math/math.h: Include <bits/fp-logb.h>.
[__USE_ISOC99] (FP_ILOGB0): Define based on __FP_LOGB0_IS_MIN.
[__USE_ISOC99] (FP_ILOGBNAN): Define based on __FP_LOGBNAN_IS_MIN.
* bits/mathdef.h (FP_ILOGB0): Remove.
(FP_ILOGBNAN): Likewise.
* sysdeps/aarch64/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
* sysdeps/alpha/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
* sysdeps/ia64/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
* sysdeps/m68k/m680x0/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
* sysdeps/mips/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
* sysdeps/powerpc/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
* sysdeps/s390/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
* sysdeps/sparc/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
* sysdeps/x86/bits/mathdef.h (FP_ILOGB0): Likewise.
(FP_ILOGBNAN): Likewise.
Commit 142e0a9953 redirected the internal stpcpy to default powerpc64
implementation by redefining the weak_alias at
sysdeps/powerpc/powerpc64/multiarch/stpcpy-ppc64.c:
#undef weak_alias
#define weak_alias(name, aliasname) \
extern __typeof (__stpcpy_ppc) aliasname \
__attribute__ ((weak, alias ("__stpcpy_ppc")));
This creates a __GI_stpcpy alias that clashes with the IFUNC symbol in
stpcpy.os. There is no need to define the default version for internal
version, since ifunc should work internally for powerpc64. This patch
removes the weak_alias indirection.
Checked on powerpc64le.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy-ppc64.c (weak_alias):
Remove redirection to __stpcpy_ppc.
The __longjmp symbol was left in accidentally. It is not exported
through a Versions file, but through a .symver assembler directive.
The corresponding exported symbol was removed from the non-fpu
powerpc64 targets in commit 9b9ef82358.
Continuing the refactoring of bits/mathdef.h, this patch moves the
FP_FAST_* definitions into a new bits/fp-fast.h header. Currently
this is only for FP_FAST_FMA*, but in future it would be the
appropriate place for the FP_FAST_* macros from TS 18661-1 as well.
The generic bits/mathdef.h header defines these macros based on
whether the compiler defines __FP_FAST_*. Most architecture-specific
headers, however, fail to do so, meaning that if the architecture (or
some particular processors) does in fact have fused operations, and
GCC knows to use them inline, the FP_FAST_* macros will still not be
defined.
By refactoring, this patch causes the generic version (based on
__FP_FAST_*) to be used in more cases, and so the macro definitions to
be more accurate. Architectures that already defined some or all of
these macros other than based on the predefines have their own
versions of fp-fast.h, which are arranged so they define FP_FAST_* if
either the architecture-specific conditions are true or __FP_FAST_*
are defined.
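The difference between the generic rule and an architecture-specific
rule can be sketched as follows. DEMO_ARCH_HAS_FMA is a hypothetical
placeholder for whatever condition an architecture header tested
before, and the DEMO_* result macros stand in for FP_FAST_FMA; only
__FP_FAST_FMA is the real compiler predefine.

```c
/* Generic rule: trust the compiler predefine only.  */
#ifdef __FP_FAST_FMA
# define DEMO_GENERIC_FP_FAST_FMA 1
#endif

/* Architecture-specific rule.  DEMO_ARCH_HAS_FMA is a hypothetical
   condition standing in for the architecture's own test; the macro is
   defined if either that condition or the predefine holds.  */
#define DEMO_ARCH_HAS_FMA 1
#if DEMO_ARCH_HAS_FMA || defined __FP_FAST_FMA
# define DEMO_ARCH_FP_FAST_FMA 1
#endif

/* Return 1 if the generic rule would define FP_FAST_FMA.  */
static int
demo_generic_fast_fma (void)
{
#ifdef DEMO_GENERIC_FP_FAST_FMA
  return 1;
#else
  return 0;
#endif
}

/* Return 1 if the architecture-specific rule would define it.  */
static int
demo_arch_fast_fma (void)
{
#ifdef DEMO_ARCH_FP_FAST_FMA
  return 1;
#else
  return 0;
#endif
}
```

With this arrangement the architecture version can never be less
accurate than the generic one.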
After this refactoring, various bits/mathdef.h headers for
architectures with long double = double are semantically identical to
the generic version. The patch removes those headers that are
redundant. (In fact two of the four removed were already redundant
before this patch because they did use __FP_FAST_*.)
Tested for x86_64 and x86, and compilation-only with
build-many-glibcs.py.
* bits/fp-fast.h: New file.
* sysdeps/aarch64/bits/fp-fast.h: Likewise.
* sysdeps/powerpc/bits/fp-fast.h: Likewise.
* math/Makefile (headers): Add bits/fp-fast.h.
* math/math.h: Include <bits/fp-fast.h>.
* bits/mathdef.h (FP_FAST_FMA): Remove.
(FP_FAST_FMAF): Likewise.
(FP_FAST_FMAL): Likewise.
* sysdeps/aarch64/bits/mathdef.h (FP_FAST_FMA): Likewise.
(FP_FAST_FMAF): Likewise.
* sysdeps/powerpc/bits/mathdef.h (FP_FAST_FMA): Likewise.
(FP_FAST_FMAF): Likewise.
* sysdeps/x86/bits/mathdef.h (FP_FAST_FMA): Likewise.
(FP_FAST_FMAF): Likewise.
(FP_FAST_FMAL): Likewise.
* sysdeps/arm/bits/mathdef.h: Remove file.
* sysdeps/hppa/fpu/bits/mathdef.h: Likewise.
* sysdeps/sh/sh4/bits/mathdef.h: Likewise.
* sysdeps/tile/bits/mathdef.h: Likewise.
This patch removes the PID cache and its usage from current GLIBC code. The
cache is mainly a performance optimization to avoid the getpid syscall;
however, it causes some issues:
- The exposed clone syscall will try to set pid/tid to make the new
thread somewhat compatible with current GLIBC assumptions. This causes
a set of issues with new workloads and use cases (such as BZ#17214 and
[1]) as well as for new internal uses of clone to optimize other algorithms
(such as clone plus CLONE_VM for posix_spawn, BZ#19957).
- The caching complexity also caused some bugs in the past [2] [3] and
requires more effort from each port to handle such requirements (for
both the clone and vfork implementations).
- The caching performance gain is mainly in getpid and some specific
code paths. The getpid performance gain is questionable [4],
both in the idea of getpid being a hotspot and in the getpid
implementation itself (if it is indeed a justifiable hotspot, a
vDSO symbol could lead to a much simpler solution).
Other usage is mainly in unusual code paths, such as pthread
cancellation signal and handling.
For thread creation (on stack allocation) the code simplification in fact
adds some performance gain, since there is no longer any need to traverse
the stack cache and invalidate each element's pid.
Other thread usages will require a direct getpid syscall, such as
cancellation/setxid signal, thread cancellation, thread fail path (at
create_thread), and thread signal (pthread_kill and pthread_sigqueue).
However these are hardly usual hotspots and I think adding a syscall is
justifiable.
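As a user-space illustration of what "a direct getpid syscall" means on
these paths, consider the following sketch. It is Linux-specific, uses
the public syscall wrapper rather than glibc's internal syscall macros,
and the demo_* function names are hypothetical.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the process ID instead of reading a cached copy.  */
static pid_t
demo_getpid_uncached (void)
{
  return (pid_t) syscall (SYS_getpid);
}

/* Send SIG to the calling thread, looking up the pid at call time.
   Returns 0 on success, -1 with errno set on failure.  Passing 0 as
   SIG only performs the existence and permission checks.  */
static int
demo_signal_self (int sig)
{
  pid_t pid = demo_getpid_uncached ();
  pid_t tid = (pid_t) syscall (SYS_gettid);
  return (int) syscall (SYS_tgkill, pid, tid, sig);
}
```

Each signal delivery costs one extra syscall compared with reading a
cached field, which is the trade-off accepted above.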
It also simplifies both the clone and vfork arch-specific implementations.
Reviewing each fork implementation showed some discrepancies that
this patch also solves:
- microblaze clone/vfork does not set/reset the pid/tid field
- hppa uses the default vfork implementation that falls back to fork.
Since vfork is deprecated I do not think we should bother with it.
The patch also removes the TID caching in clone. My understanding is that
this semantic tries to provide some pthread usage after a user program
issues clone directly (as done by thread creation with CLONE_PARENT_SETTID
and the pthread tid member). However, as stated before in multiple discussion
threads, GLIBC provides the clone syscall without further supporting all these
semantics.
I ran a full make check on x86_64, x32, i686, armhf, aarch64, and powerpc64le.
For sparc32, sparc64, and mips I ran the basic fork and vfork tests from
posix/ folder (on a qemu system). So it would require further testing
on alpha, hppa, ia64, m68k, nios2, s390, sh, and tile (I excluded microblaze
because it already implements the patch semantics regarding clone/vfork).
[1] https://codereview.chromium.org/800183004/
[2] https://sourceware.org/ml/libc-alpha/2006-07/msg00123.html
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15368
[4] http://yarchive.net/comp/linux/getpid_caching.html
* sysdeps/nptl/fork.c (__libc_fork): Remove pid cache setting.
* nptl/allocatestack.c (allocate_stack): Likewise.
(__reclaim_stacks): Likewise.
(setxid_signal_thread): Obtain pid through syscall.
* nptl/nptl-init.c (sigcancel_handler): Likewise.
(sighandle_setxid): Likewise.
* nptl/pthread_cancel.c (pthread_cancel): Likewise.
* sysdeps/unix/sysv/linux/pthread_kill.c (__pthread_kill): Likewise.
* sysdeps/unix/sysv/linux/pthread_sigqueue.c (pthread_sigqueue):
Likewise.
* sysdeps/unix/sysv/linux/createthread.c (create_thread): Likewise.
* sysdeps/unix/sysv/linux/getpid.c: Remove file.
* nptl/descr.h (struct pthread): Change comment about pid value.
* nptl/pthread_getattr_np.c (pthread_getattr_np): Remove thread
pid assert.
* sysdeps/unix/sysv/linux/pthread-pids.h (__pthread_initialize_pids):
Do not set pid value.
* nptl_db/td_ta_thr_iter.c (iterate_thread_list): Remove thread
pid cache check.
* nptl_db/td_thr_validate.c (td_thr_validate): Likewise.
* sysdeps/aarch64/nptl/tcb-offsets.sym: Remove pid offset.
* sysdeps/alpha/nptl/tcb-offsets.sym: Likewise.
* sysdeps/arm/nptl/tcb-offsets.sym: Likewise.
* sysdeps/hppa/nptl/tcb-offsets.sym: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym: Likewise.
* sysdeps/ia64/nptl/tcb-offsets.sym: Likewise.
* sysdeps/m68k/nptl/tcb-offsets.sym: Likewise.
* sysdeps/microblaze/nptl/tcb-offsets.sym: Likewise.
* sysdeps/mips/nptl/tcb-offsets.sym: Likewise.
* sysdeps/nios2/nptl/tcb-offsets.sym: Likewise.
* sysdeps/powerpc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/s390/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sh/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sparc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/tile/nptl/tcb-offsets.sym: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym: Likewise.
* sysdeps/unix/sysv/linux/aarch64/clone.S: Remove pid and tid caching.
* sysdeps/unix/sysv/linux/alpha/clone.S: Likewise.
* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
* sysdeps/unix/sysv/linux/hppa/clone.S: Likewise.
* sysdeps/unix/sysv/linux/i386/clone.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/clone2.S: Likewise.
* sysdeps/unix/sysv/linux/mips/clone.S: Likewise.
* sysdeps/unix/sysv/linux/nios2/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sh/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/tile/clone.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/aarch64/vfork.S: Remove pid set and reset.
* sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/i386/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/m68k/clone.S: Likewise.
* sysdeps/unix/sysv/linux/m68k/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/mips/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/nios2/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sh/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/tile/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/tst-clone2.c (f): Remove direct pthread
struct access.
(clone_test): Remove function.
(do_test): Rewrite to take into consideration that pid is not cached anymore.