To support Intel Control-flow Enforcement Technology (CET) run-time
control:
1. An architecture specific field in the writable ld.so namespace is
needed to indicate if CET features are enabled at run-time.
2. An architecture specific field in struct link_map is needed if
CET features are enabled in an ELF module.
This patch adds dl-procruntime.c to the writable ld.so namespace and
link_map.h to struct link_map.
Tested with build-many-glibcs.py.
* elf/dl-support.c: Include <dl-procruntime.c>.
* include/link.h: Include <link_map.h>.
* sysdeps/generic/dl-procruntime.c: New file.
* sysdeps/generic/link_map.h: Likewise.
* sysdeps/generic/ldsodefs.h: Include <dl-procruntime.c> in
the writable ld.so namespace.
Verify that sizes, alignments and field offsets of jmp_buf as well as
sigjmp_buf are unchanged regardless how struct __jmp_buf_tag is defined.
Since jmp_buf is target specific, jmp_buf-macros.h is added for each
Linux target. A new target must provides its own jmp_buf-macros.h.
TODO: Hurd needs to provide a jmp_buf-macros.h.
Tested with build-many-glibcs.py.
* include/setjmp.h [!_ISOMAC]: Include <stddef.h> and
<jmp_buf-macros.h>.
[!_ISOMAC] (STR_HELPER): New.
[!_ISOMAC] (STR): Likewise.
[!_ISOMAC] (TEST_SIZE): Likewise.
[!_ISOMAC] (TEST_ALIGN): Likewise.
[!_ISOMAC] (TEST_OFFSET): Likewise.
[!_ISOMAC] Add _Static_assert to check sizes, alignments and
field offsets of jmp_buf as well as sigjmp_buf.
* sysdeps/unix/sysv/linux/aarch64/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/arm/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/i386/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/microblaze/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/jmp_buf-macros.h:
Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/jmp_buf-macros.h:
Likewise.
* sysdeps/unix/sysv/linux/nios2/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/jmp_buf-macros.h:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/jmp_buf-macros.h:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/sh/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/jmp_buf-macros.h:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/jmp_buf-macros.h:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilepro/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/jmp_buf-macros.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/jmp_buf-macros.h: Likewise.
This patch adds two new internal defines to set the internal
pthread_mutex_t layout required by the supported ABIS:
1. __PTHREAD_MUTEX_NUSERS_AFTER_KIND which control whether to define
__nusers fields before or after __kind. The preferred value for
is 0 for new ports and it sets __nusers before __kind.
2. __PTHREAD_MUTEX_USE_UNION which control whether internal __spins and
__list members will be place inside an union for linuxthreads
compatibility. The preferred value is 0 for ports and it sets
to not use an union to define both fields.
It fixes the wrong offsets value for __kind value on x86_64-linux-gnu-x32.
Checked with a make check run-built-tests=no on all afected ABIs.
[BZ #22298]
* nptl/allocatestack.c (allocate_stack): Check if
__PTHREAD_MUTEX_HAVE_PREV is non-zero, instead if
__PTHREAD_MUTEX_HAVE_PREV is defined.
* nptl/descr.h (pthread): Likewise.
* nptl/nptl-init.c (__pthread_initialize_minimal_internal):
Likewise.
* nptl/pthread_create.c (START_THREAD_DEFN): Likewise.
* sysdeps/nptl/fork.c (__libc_fork): Likewise.
* sysdeps/nptl/pthread.h (PTHREAD_MUTEX_INITIALIZER): Likewise.
* sysdeps/nptl/bits/thread-shared-types.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
defines.
(__pthread_internal_list): Check __PTHREAD_MUTEX_USE_UNION instead
of __WORDSIZE for internal layout.
(__pthread_mutex_s): Check __PTHREAD_MUTEX_NUSERS_AFTER_KIND instead
of __WORDSIZE for internal __nusers layout and __PTHREAD_MUTEX_USE_UNION
instead of __WORDSIZE whether to use an union for __spins and __list
fields.
(__PTHREAD_MUTEX_HAVE_PREV): Define also for __PTHREAD_MUTEX_USE_UNION
case.
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
defines.
* sysdeps/alpha/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/hppa/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/ia64/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/powerpc/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/sparc/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/tile/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/x86/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch adds a new build test to check for internal fields
offsets for user visible internal field. Although currently
the only field which is statically initialized to a non zero value
is pthread_mutex_t.__data.__kind value, the tests also check the
offset of __kind, __spins, __elision (if supported), and __list
internal member. A internal header (pthread-offset.h) is added
to each major ABI with the reference value.
Checked on x86_64-linux-gnu and with a build check for all affected
ABIs (aarch64-linux-gnu, alpha-linux-gnu, arm-linux-gnueabihf,
hppa-linux-gnu, i686-linux-gnu, ia64-linux-gnu, m68k-linux-gnu,
microblaze-linux-gnu, mips64-linux-gnu, mips64-n32-linux-gnu,
mips-linux-gnu, powerpc64le-linux-gnu, powerpc-linux-gnu,
s390-linux-gnu, s390x-linux-gnu, sh4-linux-gnu, sparc64-linux-gnu,
sparcv9-linux-gnu, tilegx-linux-gnu, tilegx-linux-gnu-x32,
tilepro-linux-gnu, x86_64-linux-gnu, and x86_64-linux-x32).
* nptl/pthreadP.h (ASSERT_PTHREAD_STRING,
ASSERT_PTHREAD_INTERNAL_OFFSET): New macro.
* nptl/pthread_mutex_init.c (__pthread_mutex_init): Add build time
checks for internal pthread_mutex_t offsets.
* sysdeps/aarch64/nptl/pthread-offsets.h
(__PTHREAD_MUTEX_NUSERS_OFFSET, __PTHREAD_MUTEX_KIND_OFFSET,
__PTHREAD_MUTEX_SPINS_OFFSET, __PTHREAD_MUTEX_ELISION_OFFSET,
__PTHREAD_MUTEX_LIST_OFFSET): New macro.
* sysdeps/alpha/nptl/pthread-offsets.h: Likewise.
* sysdeps/arm/nptl/pthread-offsets.h: Likewise.
* sysdeps/hppa/nptl/pthread-offsets.h: Likewise.
* sysdeps/i386/nptl/pthread-offsets.h: Likewise.
* sysdeps/ia64/nptl/pthread-offsets.h: Likewise.
* sysdeps/m68k/nptl/pthread-offsets.h: Likewise.
* sysdeps/microblaze/nptl/pthread-offsets.h: Likewise.
* sysdeps/mips/nptl/pthread-offsets.h: Likewise.
* sysdeps/nios2/nptl/pthread-offsets.h: Likewise.
* sysdeps/powerpc/nptl/pthread-offsets.h: Likewise.
* sysdeps/s390/nptl/pthread-offsets.h: Likewise.
* sysdeps/sh/nptl/pthread-offsets.h: Likewise.
* sysdeps/sparc/nptl/pthread-offsets.h: Likewise.
* sysdeps/tile/nptl/pthread-offsets.h: Likewise.
* sysdeps/x86_64/nptl/pthread-offsets.h: Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Update strcasestr-power8 to use power8 version of strnlen for
calculating length.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch simplify Linux sigqueue implementation by assuming
__NR_rt_sigqueueinfo existence due minimum kernel requirement
(it pre-dates Linux git inclusion for Linux 2.6.12).
Checked on x86_64-linux-gnu.
* sysdeps/unix/sysv/linux/sigqueue.c (__sigqueue): Asssume
__NR_rt_sigqueueinfo.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Zack Weinberg <zackw@panix.com>
This patch simplifies sig{timed}wait{info} by:
- Assuming __NR_rt_sigtimedwait existence on all architectures due minimum
kernel version requirement (it pre-dates Linux git inclusion for Linux
2.6.12).
- Call __sigtimedwait on both sigwait and sigwaitinfo.
- Now that sigwait is based on an internal sigtimedwait call and it is
present of both libc.so and libpthread.so we need to add an external
private definition of __sigtimedwait for libpthread.so call.
Checked on x86_64-linux-gnu.
* sysdeps/unix/sysv/linux/Versions (libc) [GLIBC_PRIVATE]: Add
__sigtimedwait.
* sysdeps/unix/sysv/linux/sigtimedwait.c: Simplify includes and
assume __NR_rt_sigtimedwait.
* sysdeps/unix/sysv/linux/sigwait.c (__sigwait): Call __sigtimedwait
and add LIBC_CANCEL_HANDLED for cancellation marking.
* sysdeps/unix/sysv/linux/sigwaitinfo.c (__sigwaitinfo): Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Zack Weinberg <zackw@panix.com>
This patch refactor ARM memchr ifunc selector to a C implementation.
No functional change is expected, including ifunc resolution rules.
It also reorganize the ifunc options code:
1. The memchr_impl.S is renamed to memchr_neon.S and multiple
compilation options (which route to armv6t2/memchr one) is
removed. The code to build if __ARM_NEON__ is defined is
also simplified.
2. A memchr_noneon is added (which as build along previous ifunc
resolution) and includes the armv6t2 direct.
3. Same as 2. for loader object.
Alongside the aforementioned changes, it also some cleanus:
- Internal memchr definition (__GI_memcpy) is now a hidden
symbol.
- No need to create hidden definition for the ifunc variants.
Checked on armv7-linux-gnueabihf and with a build for arm-linux-gnueabi,
arm-linux-gnueabihf with and without multiarch support and with both
GCC 7.1 and GCC mainline.
* sysdeps/arm/armv7/multiarch/Makefile [$(subdir) = string]
(sysdeps_routines): Add memchr_noneon.
* sysdeps/arm/armv7/multiarch/ifunc-memchr.h: New file.
* sysdeps/arm/armv7/multiarch/memchr_noneon.S: Likewise.
* sysdeps/arm/armv7/multiarch/rtld-memchr.S: Likewise.
* sysdeps/arm/armv7/multiarch/memchr.S: Remove file.
* sysdeps/arm/armv7/multiarch/memchr.c: New file.
* sysdeps/arm/armv7/multiarch/memchr_impl.S: Move to ...
* sysdeps/arm/armv7/multiarch/memchr_neon.S: ... here.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch refactor ARM memcpy ifunc selector to a C implementation.
No functional change is expected, including ifunc resolution rules.
It also adds some cleanup:
- Internal memcpy hidden definition (__GI_memcpy) is now a hidden
symbol.
- No need to create hidden definition for the ifunc variants.
Checked on armv7-linux-gnueabihf and with a build for arm-linux-gnueabi,
arm-linux-gnueabihf with and without multiarch support and with both
GCC 7.1 and GCC mainline. I also checked with the some possible
multiarch different configurations that trigger different memcpy
buids (__ARM_NEON__ && !__SOFT_FP__, !__ARM_NEON__ && !__SOFT_FP__, and
!__ARM_NEON__ && __SOFT_FP__).
* sysdeps/arm/arm-ifunc.h: New file.
* sysdeps/arm/armv7/multiarch/ifunc-memcpy.h: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy.c: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy_arm.S: Likewise.
* sysdeps/arm/armv7/multiarch/rtld-memcpy.S: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy_neon.S [!__ARM_NEON__]
(__memcpy_neon): Avoid create hidden alias.
* sysdeps/arm/armv7/multiarch/memcpy_vfp.S [!__ARM_NEON_]
(__memcpy_vfp): Likewise.
* sysdeps/arm/armv7/multiarch/Makefile [$(subdir) = string]
(sysdep_routines): Add memcpy_arm.
* sysdeps/arm/armv7/multiarch/memcpy.S: Remove file.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The powerpc bits/floatn.h declares _Float128 support to be present
when the compiler supports it for powerpc64le. However, in the case
where -mlong-double-64 is used, __MATH_TG does not actually support
_Float128; it only supports _Float128 in the distinct-long-double
case.
This shows up as a build failure when building glibc mainline with GCC
mainline, given the recently added sanity check in math.h for
configurations supported by __MATH_TG, as the compat code for
-mlong-double-64 fails to build. However, the bug was logically
present before that change (including in 2.26), just less visible.
This patch fixes the build failure by declaring _Float128 to be
unsupported in that case. (Of course this can't actually stop users
calling the type-generic macros with _Float128 arguments with
-mlong-double-64, just as they could be called with other unsupported
types on other platforms, but perhaps makes it less likely by making
all the type-specific _Float128 interfaces invisible in that case.)
Tested compilation for powerpc64le with build-many-glibcs.py.
[BZ #22402]
* sysdeps/powerpc/bits/floatn.h: Include <bits/long-double.h>.
[__NO_LONG_DOUBLE_MATH] (__HAVE_FLOAT128): Define to 0.
Using the cache hierarchy linesize minimum in CTR_EL0.
See the comment within the code for rationale.
* sysdeps/unix/sysv/linux/aarch64/sysconf.c: New file.
Remove some load/store instructions from the dynamic tlsdesc resolver
fast path. This gives around 20% faster tls access in dlopened shared
libraries (assuming glibc ran out of static tls space).
* sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Optimize.
Lazy tlsdesc initialization is no longer used in the dynamic linker
so all related code can be removed.
* sysdeps/arm/dl-machine.h (elf_machine_runtime_setup): Remove
DT_TLSDESC_GOT initialization.
* sysdeps/arm/dl-tlsdesc.S (_dl_tlsdesc_lazy_resolver): Remove.
(_dl_tlsdesc_resolve_hold): Likewise.
* sysdeps/aarch64/dl-tlsdesc.h (_dl_tlsdesc_lazy_resolver): Remove.
(_dl_tlsdesc_resolve_hold): Likewise.
* sysdeps/aarch64/tlsdesc.c (_dl_tlsdesc_lazy_resolver_fixup): Remove.
(_dl_tlsdesc_resolve_hold_fixup): Likewise.
Follow up to
https://sourceware.org/ml/libc-alpha/2015-11/msg00272.html
Always do tls descriptor initialization at load time during relocation
processing (as if DF_BIND_NOW were set for the binary) to avoid barriers
at every tls access. This patch mimics bind-now semantics in the lazy
relocation code of the arm target (elf_machine_lazy_rel).
Ideally the static linker should be updated too to not emit tlsdesc
relocs in DT_REL*, so elf_machine_lazy_rel is not called on them at all.
[BZ #18572]
* sysdeps/arm/dl-machine.h (elf_machine_lazy_rel): Do symbol binding
non-lazily for R_ARM_TLS_DESC.
This patch reverts
commit 9c82da17b5
Author: Maciej W. Rozycki <macro@codesourcery.com>
Date: 2014-07-17 19:22:05 +0100
[BZ #17078] ARM: R_ARM_TLS_DESC prelinker support
This only implemented support for the lazy binding case (and thus
closed the bugzilla ticket prematurely), however tlsdesc on arm is
not correct with lazy binding because there is a data race between
the lazy initialization code and tlsdesc resolver functions.
Lazy initialization of tlsdesc entries will be removed from arm to
fix the data races and thus this half-finished prelinker support
is no longer useful.
[BZ #17078]
* sysdeps/arm/dl-machine.h (elf_machine_rela): Remove the
R_ARM_TLS_DESC case.
(elf_machine_lazy_rel): Remove the prelink check.
Always do TLS descriptor initialization at load time during relocation
processing to avoid barriers at every TLS access. In non-dlopened shared
libraries the overhead of tls access vs static global access is > 3x
bigger when lazy initialization is used (_dl_tlsdesc_return_lazy)
compared to bind-now (_dl_tlsdesc_return) so the barriers dominate tls
access performance.
TLSDESC relocs are in DT_JMPREL which are processed at load time using
elf_machine_lazy_rel which is only supposed to do lightweight
initialization using the DT_TLSDESC_PLT trampoline (the trampoline code
jumps to the entry point in DT_TLSDESC_GOT which does the lazy tlsdesc
initialization at runtime). This patch changes elf_machine_lazy_rel
in aarch64 to do the symbol binding and initialization as if DF_BIND_NOW
was set, so the non-lazy code path of elf/do-rel.h was replicated.
The static linker could be changed to emit TLSDESC relocs in DT_REL*,
which are processed non-lazily, but the goal of this patch is to always
guarantee bind-now semantics, even if the binary was produced with an
old linker, so the barriers can be dropped in tls descriptor functions.
After this change the synchronizing ldar instructions can be dropped
as well as the lazy initialization machinery including the DT_TLSDESC_GOT
setup.
I believe this should be done on all targets, including ones where no
barrier is needed for lazy initialization. There is very little gain in
optimizing for large number of symbolic tlsdesc relocations which is an
extremely uncommon case. And currently the tlsdesc entries are only
readonly protected with -z now and some hardennings against writable
JUMPSLOT relocs don't work for TLSDESC so they are a security hazard.
(But to fix that the static linker has to be changed.)
* sysdeps/aarch64/dl-machine.h (elf_machine_lazy_rel): Do symbol
binding and initialization non-lazily for R_AARCH64_TLSDESC.
Add a new header file, sysdeps/x86/sysdep.h, for common assembly code
macros between i386 and x86-64. Tested on i686 and x86-64. There are
no differences in outputs of "readelf -a" and "objdump -dw" on all glibc
shared objects before and after the patch.
* sysdeps/i386/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
* sysdeps/x86/sysdep.h: New file.
* sysdeps/x86_64/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
sigprocmask.c, sigtimedwait.c, sigwait.c and sigwaitinfo.c files from
sysdeps/unix/sysv/linux include nptl-signals.h via nptl/pthreadP.h,
and so SIGCANCEL and SIGSETXID become defined unconditionally. But
later in the code, there are some checks weither symbols defined,
which is useless. This patch removes useless checks.
Checked on x86_64-linux-gnu.
* sysdeps/unix/sysv/linux/sigprocmask.c: Remove useless #ifdefs.
* sysdeps/unix/sysv/linux/sigtimedwait.c: Likewise.
* sysdeps/unix/sysv/linux/sigwait.c: Likewise.
* sysdeps/unix/sysv/linux/sigwaitinfo.c: Likewise.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
ia64, s390-64, sparc64 and x86_64 host their own implementation of
sigpending() in corresponding files, but they are identical to generic
linux file despite few comments. This patch removes that files, so the
implementation of sigpending() is taken from sysdeps/unix/sysv/linux
for all ports.
Build-tested on x86_64.
* sysdeps/unix/sysv/linux/ia64/sigpending.c: Remove file.
* sysdeps/unix/sysv/linux/s390/s390-64/sigpending.c: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/sigpending.c: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sigpending.c: Likewise.
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This is another one where we'll be wanting the base symbols for
powerpc64le rather than just a power7 variant.
* sysdeps/powerpc/powerpc64/multiarch/strncase_l-power7.c: Include
string/strncase_l.c, not string/strncase.c.
(USE_IN_EXTENDED_LOCALE_MODEL): Don't define.
(libc_hidden_def): Redefine.
The routine being assembled here is strcasecmp_l, so ask for that via
__STRCMP and STRCMP defines. That change means tweaking the power7
override. Needed for later powerpc64le changes where we want the base
symbols, not just a power7 variant.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l-power7.S:
(__STRCMP, STRCMP, __strcasecmp_l): Define.
(__strcasecmp): Don't define.
These functions aren't used in ld.so at the moment since we don't have
strcmp or strncmp ifuncs for them there. Remove the ld.so bloat.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power8.S: Wrap in
IS_IN (libc).
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power9.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power9.S: Likewise.
USE_AS_STPNCPY is defined by sysdeps/powerpc/powerpc64/power8/stpncpy.S,
included by this file.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Don't define
USE_AS_STPNCPY.
It seems to me that libc.a should not contain any of the __GI_
symbols, and certainly --enable-multi-arch ought to not add to the
list. At the end of this patch series we have the following in both
--enable-multi-arch and --disable-multi-arch libc.a:
0000000000000000 T __GI___readdir64
0000000000000000 T __GI___fxstatat64
0000000000000000 T __GI_getrlimit
0000000000000000 T __GI___getrlimit
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S (hidden_def):
Redefine only when SHARED.
i586 strcpy.S used a clever trick with LEA to implement jump table:
/* ECX has the last 2 bits of the address of source - 1. */
andl $3, %ecx
call 2f
2: popl %edx
/* 0xb is the distance between 2: and 1:. */
leal 0xb(%edx,%ecx,8), %ecx
jmp *%ecx
.align 8
1: /* ECX == 0 */
orb (%esi), %al
jz L(end)
stosb
xorl %eax, %eax
incl %esi
/* ECX == 1 */
orb (%esi), %al
jz L(end)
stosb
xorl %eax, %eax
incl %esi
/* ECX == 2 */
orb (%esi), %al
jz L(end)
stosb
xorl %eax, %eax
incl %esi
/* ECX == 3 */
L(1): movl (%esi), %ecx
leal 4(%esi),%esi
This fails if there are instruction length changes before L(1):. This
patch replaces it with conditional branches:
cmpb $2, %cl
je L(Src2)
ja L(Src3)
cmpb $1, %cl
je L(Src1)
L(Src0):
which have similar performance and work with any instruction lengths.
Tested on i586 and i686 with and without --disable-multi-arch.
[BZ #22353]
* sysdeps/i386/i586/strcpy.S (STRCPY): Use conditional branches.
(1): Renamed to ...
(L(Src0)): This.
(L(Src1)): New.
(L(Src2)): Likewise.
(L(1)): Renamed to ...
(L(Src3)): This.
POWER9 DD2.1 and earlier has an issue where some cache inhibited
vector load traps to the kernel, causing a performance degradation. To
handle this in memcpy and memmove, lvx/stvx is used for aligned
addresses instead of lxvd2x/stxvd2x.
Reference: https://patchwork.ozlabs.org/patch/814059/
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Replace
lxvd2x/stxvd2x with lvx/stvx.
* sysdeps/powerpc/powerpc64/power7/memmove.S: Likewise.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The glibc implementation of iseqsig relies on ordered comparison
operators raising the "invalid" exception for quiet NaN operands, with
a workaround on platforms where a GCC bug means that exception is not
raised. For x86, that bug has now been fixed for GCC 8, so this patch
disables the workaround in that case. If and when the corresponding
bugs for powerpc and s390 are fixed, the headers for those platforms
should of course be updated similarly.
Tested for x86_64 and x86, including with GCC mainline. Note that
other failures appear with GCC mainline because of spurious use of
ordered comparison instructions for unordered operations
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82692>.
* sysdeps/x86/fpu/fix-fp-int-compare-invalid.h
(FIX_COMPARE_INVALID): Define to 0 if [__GNUC_PREREQ (8, 0)].
As shown in some buildbot issues on aarch64 and powerpc, calling
clone (VFORK) and waitpid (WNOHANG) does not guarantee the child
is ready to be collected. This patch changes the call back to 0
as before fe05e1cb6d fix.
This change can lead to the scenario 4.3 described in the commit,
where the waitpid call can hang undefinitely on the call. However
this is also a very unlikely and also undefinied situation where
both the caller is trying to terminate a pid before posix_spawn
returns and the race pid reuse is triggered. I don't see how to
correct handle this specific situation within posix_spawn.
Checked on x86_64-linux-gnu, aarch64-linux-gnu and
powerpc64-linux-gnu.
* sysdeps/unix/sysv/linux/spawni.c (__spawnix): Use 0 instead of
WNOHANG in waitpid call.
cfi info for stack adjust needs to be on the insn doing the adjust.
cfi describing register saves can be anywhere after the save insn but
before the reg is altered. Fewer locations with cfi result in smaller
cfi programs and possibly slightly faster exception handling. Thus
the LR cfi_offset move.
The idea behind ajusting sp after restoring regs is to break a
register dependency chain, in this case not be using r1 immediately
after it is modified.
The missing LR cfi_restore meant that code after the blr,
unaligned_lt_16 and other labels, would have cfi that said LR was at
cfa+16, but that code is reached without LR being saved.
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Move LR cfi.
Adjust stack after restoring regs. Add missing LR cfi_restore.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch moves the frame setup and teardown to immediately around
the single memset call, as has been done for power8. I've also
decreased FRAMESIZE to that needed to save the two callee-saved
registers used. Plus added cfi.
* sysdeps/powerpc/powerpc64/power7/strncpy.S: Decrease FRAMESIZE.
Move LR save and frame setup/teardown and LR restore to
immediately around memset call. Provide cfi.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This patch replaces i386 assembly versions of e_exp2f with generic
e_exp2f.c. For workload-spec2017.wrf, on Nehalem, it improves
performance by:
Before After Improvement
reciprocal-throughput 112.996 40.0454 182%
latency 126.581 54.4479 132%
On Skylake, it improves performance by:
Before After Improvement
reciprocal-throughput 113.14 39.447 186%
latency 136.068 55.684 144%
On IvyBridge with --disable-multi-arch, it improves performance by:
Before After Improvement
reciprocal-throughput 132.521 40.3759 228%
latency 145.791 58.4587 149%
* sysdeps/i386/fpu/e_exp2f.S: Removed.
* sysdeps/i386/fpu/w_exp2f.c: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_exp2f.c.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
Add e_exp2f-sse2.
(CFLAGS-e_exp2f-sse2.c): New.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f-sse2.c: New file.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f.c: Likewise.