glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-03 10:21:05 +00:00

Author	SHA1	Message	Date
Samuel Thibault	35cf8a85ed	hurd: Bump BRK_START to 0x20000000 By nowadays uses, 256MiB is not that large for the program+libraries. Let's push the heap further to leave room for e.g. clang.	2021-12-31 18:25:49 +01:00
Samuel Thibault	8c0727af63	hurd: Avoid overzealous shared objects constraints `407765e9f2` ("hurd: Fix ELF_MACHINE_USER_ADDRESS_MASK value") switched ELF_MACHINE_USER_ADDRESS_MASK from 0xf8000000UL to 0xf0000000UL to let libraries etc. get loaded at 0x08000000. But ELF_MACHINE_USER_ADDRESS_MASK is actually only meaningful for the main program anyway, so keep it at 0xf8000000UL to prevent the program loader from putting ld.so beyond 0x08000000. And conversely, drop the use of ELF_MACHINE_USER_ADDRESS_MASK for shared objects, which don't need any constraints since the program will have already be loaded by then.	2021-12-31 18:22:46 +01:00
Adhemerval Zanella	1f17da01e6	time: Refactor timesize.h for some ABIs Commit `a4b4131355` changed default __TIMESIZE to 64, however it added sub-architecture timesize.h for powerpc, s390, and sparc. Also simplify mips by removing _MIPS_SIM usage (which would require to add sgidefs inclusion.	2021-12-31 10:58:13 -03:00
Samuel Thibault	33e8e95cbd	hurd: Make getrandom a stub inside the random translator glibc uses /dev/urandom for getrandom(), and from version 2.34 malloc initialization uses it. We have to detect when we are running the random translator itself, in which case we can't read ourself.	2021-12-31 08:54:41 +01:00
Stafford Horne	4dfa8f4870	open64: Force O_LARGEFILE on all architectures When running tests on OpenRISC which has 32-bit wordsize but 64-bit timesize it was found that O_LARGEFILE is not being set when calling open64. For 64-bit architectures the O_LARGEFILE flag is generally implied by the kernel according to force_o_largefile. However, for 32-bit architectures this is not done. For this patch we unconditionally now set the O_LARGEFILE flag for open64 class syscalls as there is no harm in doing so. Tested on the OpenRISC the build works and timezone/tst-tzset passes which was failing before. I would expect this also would fix arc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-31 07:10:19 +09:00
Sunil K Pandey	c21c7bc24e	x86-64: Add vector tan/tanf implementation to libmvec Implement vectorized tan/tanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector tan/tanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-30 10:19:13 -08:00
Sunil K Pandey	8881cca8fb	x86-64: Add vector erfc/erfcf implementation to libmvec Implement vectorized erfc/erfcf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector erfc/erfcf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-30 10:19:03 -08:00
Sunil K Pandey	e682d01578	x86-64: Add vector asinh/asinhf implementation to libmvec Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asinh/asinhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:56 -08:00
Sunil K Pandey	c0f36fc303	x86-64: Add vector tanh/tanhf implementation to libmvec Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector tanh/tanhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:50 -08:00
Sunil K Pandey	f9ce13fdac	x86-64: Add vector erf/erff implementation to libmvec Implement vectorized erf/erff containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector erf/erff with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:44 -08:00
Sunil K Pandey	0625489ccc	x86-64: Add vector acosh/acoshf implementation to libmvec Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acosh/acoshf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:39 -08:00
Sunil K Pandey	6dea4dd3da	x86-64: Add vector atanh/atanhf implementation to libmvec Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atanh/atanhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:34 -08:00
Sunil K Pandey	74265c16ab	x86-64: Add vector log1p/log1pf implementation to libmvec Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log1p/log1pf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:27 -08:00
Sunil K Pandey	7e1722fec8	x86-64: Add vector log2/log2f implementation to libmvec Implement vectorized log2/log2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log2/log2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:21 -08:00
Sunil K Pandey	8f8566026d	x86-64: Add vector log10/log10f implementation to libmvec Implement vectorized log10/log10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log10/log10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:15 -08:00
Sunil K Pandey	2941a24f8c	x86-64: Add vector atan2/atan2f implementation to libmvec Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan2/atan2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:09 -08:00
Sunil K Pandey	2bf02c5843	x86-64: Add vector cbrt/cbrtf implementation to libmvec Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:02 -08:00
Sunil K Pandey	aa1809a1df	x86-64: Add vector sinh/sinhf implementation to libmvec Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector sinh/sinhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:55 -08:00
Sunil K Pandey	76ddc74e86	x86-64: Add vector expm1/expm1f implementation to libmvec Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector expm1/expm1f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:49 -08:00
Sunil K Pandey	ef7ea9c132	x86-64: Add vector cosh/coshf implementation to libmvec Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector cosh/coshf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:42 -08:00
Sunil K Pandey	8b726453d5	x86-64: Add vector exp10/exp10f implementation to libmvec Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp10/exp10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:35 -08:00
Sunil K Pandey	3fc9ccc20b	x86-64: Add vector exp2/exp2f implementation to libmvec Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp2/exp2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:29 -08:00
Sunil K Pandey	37475ba883	x86-64: Add vector hypot/hypotf implementation to libmvec Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector hypot/hypotf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:21 -08:00
Sunil K Pandey	11c01de14c	x86-64: Add vector asin/asinf implementation to libmvec Implement vectorized asin/asinf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asin/asinf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:03 -08:00
Sunil K Pandey	146310177a	x86-64: Add vector atan/atanf implementation to libmvec Implement vectorized atan/atanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan/atanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:36:46 -08:00
Florian Weimer	5d28a8962d	elf: Add _dl_find_object function It can be used to speed up the libgcc unwinder, and the internal _dl_find_dso_for_object function (which is used for caller identification in dlopen and related functions, and in dladdr). _dl_find_object is in the internal namespace due to bug 28503. If libgcc switches to _dl_find_object, this namespace issue will be fixed. It is located in libc for two reasons: it is necessary to forward the call to the static libc after static dlopen, and there is a link ordering issue with -static-libgcc and libgcc_eh.a because libc.so is not a linker script that includes ld.so in the glibc build tree (so that GCC's internal -lc after libgcc_eh.a does not pick up ld.so). It is necessary to do the i386 customization in the sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because otherwise, multilib installations are broken. The implementation uses software transactional memory, as suggested by Torvald Riegel. Two copies of the supporting data structures are used, also achieving full async-signal-safety. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-28 22:52:56 +01:00
Adhemerval Zanella	83b8d5027d	malloc: Remove memusage.h And use machine-sp.h instead. The Linux implementation is based on already provided CURRENT_STACK_FRAME (used on nptl code) and STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	a75b1e35c5	malloc: Use hp-timing on libmemusage Instead of reimplemeting on GETTIME macro.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	92ff345137	Remove atomic-machine.h atomic typedefs Now that memusage.c uses generic types we can remove them.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	5a5f7a160d	malloc: Remove atomic_* usage These typedef are used solely on memusage and can be replaced with generic types.	2021-12-28 14:57:57 -03:00
Thomas Petazzoni	c75aa9246a	microblaze: Add missing implementation when !__ASSUME_TIME64_SYSCALLS In commit `a92f4e6299` ("linux: Add time64 pselect support"), a Microblaze specific implementation of __pselect32() was added to cover the case of kernels < 3.15 which lack the pselect6 system call. This new file sysdeps/unix/sysv/linux/microblaze/pselect32.c takes precedence over the default implementation sysdeps/unix/sysv/linux/pselect32.c. However sysdeps/unix/sysv/linux/pselect32.c provides an implementation of __pselect32() which is needed when __ASSUME_TIME64_SYSCALLS is not defined. On Microblaze, which is a 32-bit architecture, __ASSUME_TIME64_SYSCALLS is only true for kernels >= 5.1. Due to sysdeps/unix/sysv/linux/microblaze/pselect32.c taking precedence over sysdeps/unix/sysv/linux/pselect32.c, it means that when we are with a kernel >= 3.15 but < 5.1, we need a __pselect32() implementation, but sysdeps/unix/sysv/linux/microblaze/pselect32.c doesn't provide it, and sysdeps/unix/sysv/linux/pselect32.c which would provide it is not compiled in. This causes the following build failure on Microblaze with for example Linux kernel headers 4.9: [...]/build/libc_pic.os: in function `__pselect64': (.text+0x120b44): undefined reference to `__pselect32' collect2: error: ld returned 1 exit status Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-28 09:09:49 -03:00
Adhemerval Zanella	8c0664e2b8	elf: Add _dl_audit_pltexit It consolidates the code required to call la_pltexit audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	eff687e846	elf: Add _dl_audit_pltenter It consolidates the code required to call la_pltenter audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	0b98a87487	elf: Add _dl_audit_preinit It consolidates the code required to call la_preinit audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	cda4f265c6	elf: Add _dl_audit_symbind_alt and _dl_audit_symbind It consolidates the code required to call la_symbind{32,64} audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	311c9ee54e	elf: Add _dl_audit_objclose It consolidates the code required to call la_objclose audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	c91008d349	elf: Add _dl_audit_objsearch It consolidates the code required to call la_objsearch audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	3dac3959a5	elf: Add _dl_audit_activity_map and _dl_audit_activity_nsid It consolidates the code required to call la_activity audit callback. Also for a new Lmid_t the namespace link_map list are empty, so it requires to check if before using it. This can happen for when audit module is used along with dlmopen. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	aee6e90f93	elf: Add _dl_audit_objopen It consolidates the code required to call la_objopen audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Samuel Thibault	ae49f218da	hurd: Fix static-PIE startup hurd initialization stages use RUN_HOOK to run various initialization functions. That is however using absolute addresses which need to be relocated, which is done later by csu. We can however easily make the linker compute relative addresses which thus don't need a relocation. The new SET_RELHOOK and RUN_RELHOOK macros implement this.	2021-12-28 10:28:22 +01:00
Samuel Thibault	2ce0481d26	hurd: let csu initialize tls Since `9cec82de71` ("htl: Initialize later"), we let csu initialize pthreads. We can thus let it initialize tls later too, to better align with the generic order. Initialization however accesses ports which links/unlinks into the sigstate for unwinding. We can however easily skip that during initialization.	2021-12-28 10:15:52 +01:00
Samuel Thibault	7b358de1af	hurd: Fix XFAIL-ing mallocfork2 tests They are using setpshared but are outside the htl directory.	2021-12-27 22:21:08 +01:00
Samuel Thibault	1c6e6e52e5	hurd: XFAIL more tests that require setpshared support	2021-12-27 22:15:43 +01:00
Noah Goldstein	cca457f9c5	x86: Optimize L(less_vec) case in memcmpeq-evex.S No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change controlflow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to the the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-27 03:18:58 -06:00
Noah Goldstein	abddd61de0	x86: Optimize L(less_vec) case in memcmp-evex-movbe.S No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change controlflow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to the the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-27 03:17:59 -06:00
Adhemerval Zanella	a4b4131355	Set default __TIMESIZE default to 64 This is expected size for newer ABIs.	2021-12-23 11:41:08 -03:00
Sunil K Pandey	f20f980c71	x86-64: Add vector acos/acosf implementation to libmvec Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-22 13:03:14 -08:00
H.J. Lu	d3e4f5a101	s_sincosf.h: Change pio4 type to float [BZ #28713 ] s_cosf.c and s_sinf.c have if (abstop12 (y) < abstop12 (pio4)) where abstop12 takes a float argument, but pio4 is static const double. pio4 is used only in calls to abstop12 and never in arithmetic. Apply -static const double pio4 = 0x1.921FB54442D18p-1; +static const float pio4 = 0x1.921FB6p-1f; to fix: FAIL: math/test-float-cos FAIL: math/test-float-sin FAIL: math/test-float-sincos FAIL: math/test-float32-cos FAIL: math/test-float32-sin FAIL: math/test-float32-sincos when compiling with GCC 12. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-12-21 08:56:12 -08:00
maminjie	e0fc721ce6	Linux: Fix 32-bit vDSO for clock_gettime on powerpc32 When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID, on the 5.10 kernel powerpc 32-bit, the 32-bit vDSO is executed successfully ( because the __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S does not support these two IDs, the 32-bit time_t syscall will be used), but tp32.tv_sec is equal to 0, causing the 64-bit time_t syscall to continue to be used, resulting in two system calls. Fix commit `72e84d1db2`. Signed-off-by: maminjie <maminjie2@huawei.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-21 09:47:16 -03:00
H.J. Lu	de8a0897e3	Regenerate ulps on x86_64 with GCC 12 Fix FAIL: math/test-float-clog10 FAIL: math/test-float32-clog10 on Intel Core i7-1165G7 with GCC 12.	2021-12-20 15:25:00 -08:00
Joseph Myers	a94d9659cd	Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol numbers for PF_CAN") but apparently missed for glibc at the time. Tested for x86_64.	2021-12-20 15:38:32 +00:00
Adhemerval Zanella	691d9ae9e6	Remove ununsed tcb-offset Some architectures do not use the auto-generated tcb-offsets.h.	2021-12-17 17:47:29 -03:00
Aurelien Jarno	225da459ce	riscv: align stack before calling _dl_init [BZ #28703 ] Align the stack pointer to 128 bits during the call to _dl_init() as specified by the RISC-V ABI [1]. This fixes the elf/tst-align2 test. Fixes bug 28703. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc	2021-12-17 20:29:34 +01:00
Aurelien Jarno	d2e594d715	riscv: align stack in clone [BZ #28702 ] The RISC-V ABI [1] mandates that "the stack pointer shall be aligned to a 128-bit boundary upon procedure entry". This as not the case in clone. This fixes the misc/tst-misalign-clone-internal and misc/tst-misalign-clone tests. Fixes bug 28702. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc	2021-12-17 20:29:32 +01:00
Aurelien Jarno	94058f6cde	elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704 ] On KVM guests running on some AMD systems, the IBRS feature is reported as a synthetic feature using the Intel feature, while the cpuinfo entry keeps the same. Handle that by first checking the presence of the Intel feature on AMD systems. Fixes bug 28704.	2021-12-17 20:20:15 +01:00
Matheus Castanho	ae91d3df24	powerpc64[le]: Allocate extra stack frame on syscall.S The syscall function does not allocate the extra stack frame for scv like other assembly syscalls using DO_CALL_SCV. So after commit `d120fb9941` changed the offset that is used to save LR, syscall ended up using an invalid offset, causing regressions on powerpc64. So make sure the extra stack frame is allocated in syscall.S as well to make it consistent with other uses of DO_CALL_SCV and avoid similar issues in the future. Tested on powerpc, powerpc64, and powerpc64le (with and without scv) Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>	2021-12-17 15:40:53 -03:00
Florian Weimer	ce1e5b1122	arm: Guard ucontext _rtld_global_ro access by SHARED, not PIC macro Due to PIE-by-default, PIC is now defined in more cases. libc.a does not have _rtld_global_ro, and statically linking setcontext fails. SHARED is the right condition to use, so that libc.a references _dl_hwcap instead of _rtld_global_ro. For static PIE support, the !SHARED case would still have to be made PIC. This patch does not achieve that. Fixes commit `23645707f1` ("Replace --enable-static-pie with --disable-default-pie"). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-17 11:48:44 +01:00
Adhemerval Zanella	98d5fcb8d0	malloc: Add Huge Page support for mmap With the morecore hook removed, there is not easy way to provide huge pages support on with glibc allocator without resorting to transparent huge pages. And some users and programs do prefer to use the huge pages directly instead of THP for multiple reasons: no splitting, re-merging by the VM, no TLB shootdowns for running processes, fast allocation from the reserve pool, no competition with the rest of the processes unlike THP, no swapping all, etc. This patch extends the 'glibc.malloc.hugetlb' tunable: the value '2' means to use huge pages directly with the system default size, while a positive value means and specific page size that is matched against the supported ones by the system. Currently only memory allocated on sysmalloc() is handled, the arenas still uses the default system page size. To test is a new rule is added tests-malloc-hugetlb2, which run the addes tests with the required GLIBC_TUNABLE setting. On systems without a reserved huge pages pool, is just stress the mmap(MAP_HUGETLB) allocation failure. To improve test coverage it is required to create a pool with some allocated pages. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:38 -03:00
Adhemerval Zanella	5f6d8d97c6	malloc: Add madvise support for Transparent Huge Pages Linux Transparent Huge Pages (THP) current supports three different states: 'never', 'madvise', and 'always'. The 'never' is self-explanatory and 'always' will enable THP for all anonymous pages. However, 'madvise' is still the default for some system and for such case THP will be only used if the memory range is explicity advertise by the program through a madvise(MADV_HUGEPAGE) call. To enable it a new tunable is provided, 'glibc.malloc.hugetlb', where setting to a value diffent than 0 enables the madvise call. This patch issues the madvise(MADV_HUGEPAGE) call after a successful mmap() call at sysmalloc() with sizes larger than the default huge page size. The madvise() call is disable is system does not support THP or if it has the mode set to "never" and on Linux only support one page size for THP, even if the architecture supports multiple sizes. To test is a new rule is added tests-malloc-hugetlb1, which run the addes tests with the required GLIBC_TUNABLE setting. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:14 -03:00
Florian Weimer	cb976fba4c	powerpc: Use global register variable in <thread_pointer.h> A local register variable is merely a compiler hint, and so not appropriate in this context. Move the global register variable into <thread_pointer.h> and include it from <tls.h>, as there can only be one global definition for one particular register. Fixes commit `8dbeb0561e` ("nptl: Add <thread_pointer.h> for defining __thread_pointer"). Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>	2021-12-15 16:06:25 +01:00
H.J. Lu	4435c29892	Support target specific ALIGN for variable alignment test [BZ #28676 ] Add <tst-file-align.h> to support target specific ALIGN for variable alignment test: 1. Alpha: Use 0x10000. 2. MicroBlaze and Nios II: Use 0x8000. 3. All others: Use 0x200000. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-14 14:50:33 -08:00
Samuel Thibault	ec06717856	hurd: Do not set PIE_UNSUPPORTED This is now supported.	2021-12-14 08:38:05 +01:00
Akila Welihinda	3b1402b3fc	sysdeps: Simplify sin Taylor Series calculation The macro TAYLOR_SIN adds the term `-0.5daa^2 + da` in hopes of regaining some precision as a function of da. However the comment says we add the term `-0.5daa^2 + 0.5*da` which is different. This fix updates the comment to reflect the code and also simplifies the calculation by replacing `a` with `x` because they always have the same value. Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-12-13 15:31:05 +01:00
Adhemerval Zanella	104d2005d5	math: Remove the error handling wrapper from hypot and hypotf The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Wilco Dijkstra	2f44eef584	math: Use fmin/fmax on hypot It optimizes for architectures that provides fast builtins. Checked on aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Adhemerval Zanella	ecb94e9587	aarch64: Add math-use-builtins-f{max,min}.h It allows to remove the arch-specific implementations.	2021-12-13 10:08:46 -03:00
Adhemerval Zanella	583c4d424e	math: Add math-use-builtinds-fmin.h It allows the architecture to use the builtin instead of generic implementation.	2021-12-13 10:08:43 -03:00
Adhemerval Zanella	72ab1eaec7	math: Add math-use-builtinds-fmax.h It allows the architecture to use the builtin instead of generic implementation.	2021-12-13 09:08:07 -03:00
Adhemerval Zanella	2eb1cd2f47	math: Remove powerpc e_hypot The generic implementation is shows only slight worse performance: POWER10 reciprocal-throughput latency master 8.28478 13.7253 new hypot 7.21945 13.1933 POWER9 reciprocal-throughput latency master 13.4024 14.0967 new hypot 14.8479 15.8061 POWER8 reciprocal-throughput latency master 15.5767 16.8885 new hypot 16.5371 18.4057 One way to improve might to make gcc generate xsmaxdp/xsmindp for fmax/fmin (it onl does for -ffast-math, clang does for default options). Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu (power9).	2021-12-13 09:08:07 -03:00
Adhemerval Zanella	a1d3c9b642	i386: Move hypot implementation to C The generic hypotf is slight slower, mostly due the tricks the assembly does to optimize the isinf/isnan/issignaling. The generic hypot is way slower, since the optimized implementation uses the i386 default excessive precision to issue the operation directly. A similar implementation is provided instead of using the generic implementation: Checked on i686-linux-gnu.	2021-12-13 09:08:02 -03:00
Adhemerval Zanella	c212d6397e	math: Use an improved algorithm for hypotl (ldbl-128) This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With a random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.05% results with an error of 1 ulp (453266 results) while the new implementation only shows 0.0001% of total (1280). Checked on aarch64-linux-gnu and x86_64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Adhemerval Zanella	aa9c28cde3	math: Use an improved algorithm for hypotl (ldbl-96) This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With a random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc current implementation shows around 0.02% results with an error of 1 ulp (23158 results) while the new implementation only shows 0.0001% of total (111). [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Wilco Dijkstra	ccfa865a82	math: Improve hypot performance with FMA Improve hypot performance significantly by using fma when available. The fma version has twice the throughput of the previous version and 70% of the latency. The non-fma version has 30% higher throughput and 10% higher latency. Max ULP error is 0.949 with fma and 0.792 without fma. Passes GLIBC testsuite.	2021-12-13 09:02:34 -03:00
Wilco Dijkstra	6c848d7038	math: Use an improved algorithm for hypot (dbl-64) This implementation is based on the 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for denormal results. The main advantage of the new algorithm is its precision: with a random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc current implementation shows around 0.34% results with an error of 1 ulp (3424869 results) while the new implementation only shows 0.002% of total (18851). The performance result are also only slight worse than current implementation. On x86_64 (Ryzen 5900X) with gcc 12: Before: "hypot": { "workload-random": { "duration": 3.73319e+09, "iterations": 1.12e+08, "reciprocal-throughput": 22.8737, "latency": 43.7904, "max-throughput": 4.37184e+07, "min-throughput": 2.28361e+07 } } After: "hypot": { "workload-random": { "duration": 3.7597e+09, "iterations": 9.8e+07, "reciprocal-throughput": 23.7547, "latency": 52.9739, "max-throughput": 4.2097e+07, "min-throughput": 1.88772e+07 } } Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Checked on x86_64-linux-gnu and aarch64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Adhemerval Zanella	7fe0ace3e2	math: Simplify hypotf implementation Use a more optimized comparison for check for NaN and infinite and add an inlined issignaling implementation for float. With gcc it results in 2 FP comparisons. The file Copyright is also changed to use GPL, the implementation was completely changed by `7c10fd3515` to use double precision instead of scaling and this change removes all the GET_FLOAT_WORD usage. Checked on x86_64-linux-gnu.	2021-12-13 09:02:30 -03:00
Siddhesh Poyarekar	5afe4c0d69	Cleanup encoding in comments Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8 equivalents so that files don't end up with mixed encodings. With this, all files (except tests that actually test different encodings) have a single encoding. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-13 10:01:45 +05:30
Siddhesh Poyarekar	23645707f1	Replace --enable-static-pie with --disable-default-pie Build glibc programs and tests as PIE by default and enable static-pie automatically if the architecture and toolchain supports it. Also add a new configuration option --disable-default-pie to prevent building programs as PIE. Only the following architectures now have PIE disabled by default because they do not work at the moment. hppa, ia64, alpha and csky don't work because the linker is unable to handle a pcrel relocation generated from PIE objects. The microblaze compiler is currently failing with an ICE. GNU hurd tries to enable static-pie, which does not work and hence fails. All these targets have default PIE disabled at the moment and I have left it to the target maintainers to enable PIE on their targets. build-many-glibcs runs clean for all targets. I also tested x86_64 on Fedora and Ubuntu, to verify that the default build as well as --disable-default-pie work as expected with both system toolchains. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-13 08:08:59 +05:30
Samuel Thibault	556a6126f8	hurd: Add rules for static PIE build This fixes [BZ #28671].	2021-12-12 00:42:13 +01:00
Samuel Thibault	26803075e4	hurd: Fix gmon-static We need to use crt0 for gmon-static too.	2021-12-12 00:42:12 +01:00
H.J. Lu	ea5814467a	x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656 ] Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since the first PT_LOAD segment is no longer executable due to defaulting to -z separate-code. This fixes [BZ #28656]. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-10 14:01:34 -08:00
Florian Weimer	5cc3385654	nptl: Add one more barrier to nptl/tst-create1 Without the bar_ctor_finish barrier, it was possible that thread2 re-locked user_lock before ctor had a chance to lock it. ctor then blocked in its locking operation, xdlopen from the main thread did not return, and thread2 was stuck waiting in bar_dtor: thread 1: started. thread 2: started. thread 2: locked user_lock. constructor started: 0. thread 1: in ctor: started. thread 3: started. thread 3: done. thread 2: unlocked user_lock. thread 2: locked user_lock. Fixes the test in commit `83b5323261` ("elf: Avoid deadlock between pthread_create and ctors [BZ #28357]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-10 11:51:25 +01:00
Florian Weimer	627f5ede70	Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely introduced to support a configuration where the thread pointer has not the same alignment as THREAD_SELF. Only ia64 seems to use that, but for the stack/pointer guard, not for storing tcbhead_t. Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift the thread pointer, potentially landing in a different residue class modulo the alignment, but the changes should not impact that. In general, given that TLS variables have their own alignment requirements, having different alignment for the (unshifted) thread pointer and struct pthread would potentially result in dynamic offsets, leading to more complexity. hppa had different values before: __alignof__ (tcbhead_t), which seems to be 4, and __alignof__ (struct pthread), which was 8 (old default) and is now 32. However, it defines THREAD_SELF as: /* Return the thread descriptor for the current thread. / # define THREAD_SELF \ ({ struct pthread __self; \ __self = __get_cr27(); \ __self - 1; \ }) So the thread pointer points after struct pthread (hence __self - 1), and they have to have the same alignment on hppa as well. Similarly, on ia64, the definitions were different. We have: # define TLS_PRE_TCB_SIZE \ (sizeof (struct pthread) \ + (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t) \ ? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1) \ & ~(__alignof__ (struct pthread) - 1)) \ : 0)) # define THREAD_SELF \ ((struct pthread ) ((char ) __thread_self - TLS_PRE_TCB_SIZE)) And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment (confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c). On m68k, we have a larger gap between tcbhead_t and struct pthread. But as far as I can tell, the port is fine with that. The definition of TCB_OFFSET is sufficient to handle the shifted TCB scenario. This fixes commit `23c77f6018` ("nptl: Increase default TCB alignment to 32"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-09 23:47:49 +01:00
Florian Weimer	c901c3e764	nptl: Add public rseq symbols and <sys/rseq.h> The relationship between the thread pointer and the rseq area is made explicit. The constant offset can be used by JIT compilers to optimize rseq access (e.g., for really fast sched_getcpu). Extensibility is provided through __rseq_size and __rseq_flags. (In the future, the kernel could request a different rseq size via the auxiliary vector.) Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	e3e589829d	nptl: Add glibc.pthread.rseq tunable to control rseq registration This tunable allows applications to register the rseq area instead of glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-09 09:49:32 +01:00
Florian Weimer	1d350aa060	Linux: Use rseq to accelerate sched_getcpu Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	95e114a091	nptl: Add rseq registration The rseq area is placed directly into struct pthread. rseq registration failure is not treated as an error, so it is possible that threads run with inconsistent registration status. <sys/rseq.h> is not yet installed as a public header. Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-09 09:49:32 +01:00
Florian Weimer	8d1927d8dc	nptl: Introduce THREAD_GETMEM_VOLATILE This will be needed for rseq TCB access. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	ce2248ab91	nptl: Introduce <tcb-access.h> for THREAD_* accessors These are common between most architectures. Only the x86 targets are outliers. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	8dbeb0561e	nptl: Add <thread_pointer.h> for defining __thread_pointer <tls.h> already contains a definition that is quite similar, but it is not consistent across architectures. Only architectures for which rseq support is added are covered. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
H.J. Lu	ceeffe968c	x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used.	2021-12-06 07:14:12 -08:00
Adhemerval Zanella	a329f68f2e	linux: Add generic ioctl implementation The powerpc is refactor to use the default implementation.	2021-12-06 08:03:18 -03:00
Adhemerval Zanella	00baddbb93	linux: Add generic syscall implementation It allows also to remove hppa specific implementation and simplify riscv implementation a bit.	2021-12-06 08:03:11 -03:00
Florian Weimer	4fb4e7e821	csu: Always use __executable_start in gmon-start.c Current binutils defines __executable_start as the lowest text address, so using the entry point address as a fallback is no longer necessary. As a result, overriding <entry.h> is only necessary if the entry point is not called _start. The previous approach to define __ASSEMBLY__ to suppress the declaration breaks if headers included by <entry.h> are not compatible with __ASSEMBLY__. This happens with rseq integration because it is necessary to include kernel headers in more places. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-05 13:50:21 +01:00
Florian Weimer	c1cb2deeca	elf: execve statically linked programs instead of crashing [BZ #28648 ] Programs without dynamic dependencies and without a program interpreter are now run via execve. Previously, the dynamic linker either crashed while attempting to read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT data), or the self-relocated in the static PIE executable crashed because the outer dynamic linker had already applied RELRO protection. <dl-execve.h> is needed because execve is not available in the dynamic loader on Hurd. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-05 11:28:34 +01:00
Noah Goldstein	4df1fa6ddc	x86-64: Use notl in EVEX strcmp [BZ #28646 ] Must use notl %edi here as lower bits are for CHAR comparisons potentially out of range thus can be 0 without indicating mismatch. This fixes BZ #28646. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>	2021-12-03 21:14:11 -08:00
Florian Weimer	23c77f6018	nptl: Increase default TCB alignment to 32 rseq support will use a 32-byte aligned field in struct pthread, so the whole struct needs to have at least that alignment. nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h> to obtain the fallback definition. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-03 20:43:31 +01:00
Wilco Dijkstra	b31bd11454	AArch64: Improve A64FX memcpy v2 is a complete rewrite of the A64FX memcpy. Performance is improved by streamlining the code, aligning all large copies and using a single unrolled loop for all sizes. The code size for memcpy and memmove goes down from 1796 bytes to 868 bytes. Performance is better in all cases: bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33% faster for large sizes, bench-memcpy-walk is 25% faster for small sizes and 20% for the largest sizes. The geomean of all tests in bench-memcpy is 5.1% faster, and total time is reduced by 4%. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-02 18:36:03 +00:00
Wilco Dijkstra	b51eb35c57	AArch64: Optimize memcmp Rewrite memcmp to improve performance. On small and medium inputs performance is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per iteration, which is 30-50% faster depending on the size. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-02 18:35:53 +00:00
Matheus Castanho	d120fb9941	powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532 ] Syscalls based on the assembly templates are missing CFI for r31, which gets clobbered when scv is used, and info for LR is inaccurate, placed in the wrong LOC and not using the proper offset. LR was also being saved to the callee's frame, while the ABI mandates it to be saved to the caller's frame. These are fixed by this commit. After this change: $ readelf -wF libc.so.6 \| grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6 00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c LOC CFA r31 ra 000000000004b9d4 r1+0 u u 000000000004b9e4 r1+48 u u 000000000004b9e8 r1+48 c-16 u 000000000004b9fc r1+48 c-16 c+16 000000000004ba08 r1+48 c-16 000000000004ba18 r1+48 u 000000000004ba1c r1+0 u libc.so.6: file format elf64-powerpcle Disassembly of section .text: 000000000004b9d4 <kill>: 4b9d4: 1f 00 4c 3c addis r2,r12,31 4b9d8: 2c c3 42 38 addi r2,r2,-15572 4b9dc: 25 00 00 38 li r0,37 4b9e0: d1 ff 21 f8 stdu r1,-48(r1) 4b9e4: 20 00 e1 fb std r31,32(r1) 4b9e8: 98 8f ed eb ld r31,-28776(r13) 4b9ec: 10 00 ff 77 andis. r31,r31,16 4b9f0: 1c 00 82 41 beq 4ba0c <kill+0x38> 4b9f4: a6 02 28 7d mflr r9 4b9f8: 40 00 21 f9 std r9,64(r1) 4b9fc: 01 00 00 44 scv 0 4ba00: 40 00 21 e9 ld r9,64(r1) 4ba04: a6 03 28 7d mtlr r9 4ba08: 08 00 00 48 b 4ba10 <kill+0x3c> 4ba0c: 02 00 00 44 sc 4ba10: 00 00 bf 2e cmpdi cr5,r31,0 4ba14: 20 00 e1 eb ld r31,32(r1) 4ba18: 30 00 21 38 addi r1,r1,48 4ba1c: 18 00 96 41 beq cr5,4ba34 <kill+0x60> 4ba20: 01 f0 20 39 li r9,-4095 4ba24: 40 48 23 7c cmpld r3,r9 4ba28: 20 00 e0 4d bltlr+ 4ba2c: d0 00 63 7c neg r3,r3 4ba30: 08 00 00 48 b 4ba38 <kill+0x64> 4ba34: 20 00 e3 4c bnslr+ 4ba38: c8 32 fe 4b b 2ed00 <__syscall_error> ... 4ba44: 40 20 0c 00 .long 0xc2040 4ba48: 68 00 00 00 .long 0x68 4ba4c: 06 00 5f 5f rlwnm r31,r26,r0,0,3 4ba50: 6b 69 6c 6c xoris r12,r3,26987	2021-11-30 15:18:52 -03:00
Adhemerval Zanella	efc6b2dbc4	linux: Implement pipe in terms of __NR_pipe2 The syscall pipe2 was added in linux 2.6.27 and glibc requires linux 3.2.0. The patch removes the arch-specific implementation for alpha, ia64, mips, sh, and sparc which requires a different kernel ABI than the usual one. Checked on x86_64-linux-gnu and with a build for the affected ABIs.	2021-11-30 13:13:03 -03:00
Adhemerval Zanella	5b3e31e312	linux: Implement mremap in C Variadic function calls in syscalls.list does not work for all ABIs (for instance where the argument are passed on the stack instead of registers) and might have underlying issues depending of the variadic type (for instance if a 64-bit argument is used). Checked on x86_64-linux-gnu.	2021-11-30 13:13:03 -03:00
Adhemerval Zanella	83008fa495	linux: Add prlimit64 C implementation The LFS prlimit64 requires a arch-specific implementation in syscalls.list. Instead add a generic one that handles the required symbol alias for __RLIM_T_MATCHES_RLIM64_T. HPPA is the only outlier which requires a different default symbol. Checked on x86_64-linux-gnu and with build for the affected ABIs.	2021-11-30 13:13:03 -03:00
Adhemerval Zanella	137ed5ac44	linux: Use /proc/stat fallback for __get_nprocs_conf (BZ #28624 ) The /proc/statm fallback was removed by `f13fb81ad3` if sysfs is not available, reinstate it. Checked on x86_64-linux-gnu.	2021-11-25 11:00:42 -03:00
Adhemerval Zanella	d150181d73	linux: Add fanotify_mark C implementation Passing 64-bit arguments on syscalls.list is tricky: it requires to reimplement the expected kernel abi in each architecture. This is way to better to represent in C code where we already have macros for this (SYSCALL_LL64). Checked on x86_64-linux-gnu.	2021-11-25 09:56:57 -03:00
Adhemerval Zanella	c3b023a782	linux: Only build fstatat fallback if required For 32-bit architecture with __ASSUME_STATX there is no need to build fstatat64_time64_stat. Checked on i686-linux-gnu.	2021-11-25 09:28:27 -03:00
Sunil K Pandey	c58d3b7d00	x86-64: Add vector sin/sinf to libmvec microbenchmark Add vector sin/sinf and input files to libmvec microbenchmark. libmvec-sin-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-sinf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:50:23 -08:00
Sunil K Pandey	6a556bac81	x86-64: Add vector pow/powf to libmvec microbenchmark Add vector pow/powf and input files to libmvec microbenchmark. libmvec-pow-inputs: arg1: 90% Normal random distribution range: (0.0, 256.0) mean: 0.0 sigma: 32.0 10% uniform random distribution in range (0.0, 256.0) arg2: 90% Normal random distribution range: (-127.0, 127.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-127.0, 127.0) libmvec-powf-inputs: arg1: 90% Normal random distribution range: (0.0f, 100.0f) mean: 0.0f sigma: 16.0f 10% uniform random distribution in range (0.0f, 100.0f) arg2: 90% Normal random distribution range: (-10.0f, 10.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-10.0f, 10.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:49:14 -08:00
Sunil K Pandey	8ab8afb336	x86-64: Add vector log/logf to libmvec microbenchmark Add vector log/logf and input files to libmvec microbenchmark. libmvec-log-inputs: 70% Normal random distribution range: (0.0, DBL_MAX) mean: 1.0 sigma: 50.0 30% uniform random distribution in range (0.0, DBL_MAX) libmvec-logf-inputs: 70% Normal random distribution range: (0.0f, FLT_MAX) mean: 1.0f sigma: 50.0f 30% uniform random distribution in range (0.0f, FLT_MAX) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:48:14 -08:00
Sunil K Pandey	37df38bd5f	x86-64: Add vector exp/expf to libmvec microbenchmark Add vector exp/expf and input files to libmvec microbenchmark. libmvec-exp-inputs: 90% Normal random distribution range: (-708.0, 709.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-500.0, 500.0) libmvec-expf-inputs: 90% Normal random distribution range: (-87.0f, 88.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-50.0f, 50.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:46:59 -08:00
Sunil K Pandey	4443695598	x86-64: Add vector cos/cosf to libmvec microbenchmark Add vector cos/cosf and input files to libmvec microbenchmark. libmvec-cos-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-cosf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:45:20 -08:00
Adhemerval Zanella	456b3c08b6	io: Refactor close_range and closefrom Now that Hurd implementis both close_range and closefrom (`f2c996597d`), we can make close_range() a base ABI, and make the default closefrom() implementation on top of close_range(). The generic closefrom() implementation based on __getdtablesize() is moved to generic close_range(). On Linux it will be overriden by the auto-generation syscall while on Hurd it will be a system specific implementation. The closefrom() now calls close_range() and __closefrom_fallback(). Since on Hurd close_range() does not fail, __closefrom_fallback() is an empty static inline function set by__ASSUME_CLOSE_RANGE. The __ASSUME_CLOSE_RANGE also allows optimize Linux __closefrom_fallback() implementation when --enable-kernel=5.9 or higher is used. Finally the Linux specific tst-close_range.c is moved to io and enabled as default. The Linuxism and CLOSE_RANGE_UNSHARE are guarded so it can be built for Hurd (I have not actually test it). Checked on x86_64-linux-gnu, i686-linux-gnu, and with a i686-gnu build.	2021-11-24 09:09:37 -03:00
Florian Weimer	e186fc5a31	nptl: Do not set signal mask on second setjmp return [BZ #28607 ] __libc_signal_restore_set was in the wrong place: It also ran when setjmp returned the second time (after pthread_exit or pthread_cancel). This is observable with blocked pending signals during thread exit. Fixes commit `b3cae39dcb` ("nptl: Start new threads with all signals blocked [BZ #25098]"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-24 08:59:54 +01:00
Adhemerval Zanella	aac54dcd37	powerpc: Define USE_PPC64_NOTOC iff compiler supports it The @notoc usage only yields an advantage on ISA 3.1+ machine (power10) and for ld.bfd also when it sees pcrel relocations used on the code (generated if compiler targets ISA 3.1+). On bfd case ISA 3.1+ instruction on stubs are used iff linker also sees the new pc-relative relocations (for instance R_PPC64_D34), otherwise it generates default stubs (ppc64_elf_check_relocs:4700). This patch also help on linkers that do not implement this optimization, since building for older ISA (such as 3.0 / power9) will also trigger power10 stubs generation in the assembly code uses the NOTOC imacro. Checked on powerpc64le-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>	2021-11-22 14:49:11 -03:00
Adhemerval Zanella	bc801b3a40	setjmp: Replace jmp_buf-macros.h with jmp_buf-macros.sym It requires less boilerplate code for newer ports. The _Static_assert checks from internal setjmp are moved to its own internal test since setjmp.h is included early by multiple headers (to generate rtld-sizes.sym). The riscv jmp_buf-macros.h check is also redundant, it is already done by riscv configure.ac. Checked with a build for the affected architectures.	2021-11-22 13:43:22 -03:00
Joseph Myers	5c3ece451d	Update kernel version to 5.15 in tst-mman-consts.py This patch updates the kernel version in the test tst-mman-consts.py to 5.15. (There are no new MAP_* constants covered by this test in 5.15 that need any other header changes.) Tested with build-many-glibcs.py.	2021-11-22 15:30:12 +00:00
Joseph Myers	bdeb7a8fa9	Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add these constants to bits/socket.h. Tested for x86_64.	2021-11-17 14:25:16 +00:00
Florian Weimer	f1d333b5bf	elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres This will be used to deallocate memory allocated using the non-minimal malloc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-17 12:20:29 +01:00
Florian Weimer	8bd336a00a	nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c And make it an installed header. This addresses a few aliasing violations (which do not seem to result in miscompilation due to the use of atomics), and also enables use of wide counters in other parts of the library. The debug output in nptl/tst-cond22 has been adjusted to print the 32-bit values instead because it avoids a big-endian/little-endian difference. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-17 12:20:13 +01:00
Sunil K Pandey	a43c0b5483	x86-64: Create microbenchmark infrastructure for libmvec Add python script to generate libmvec microbenchmark from the input values for each libmvec function using skeleton benchmark template. Creates double and float benchmarks with vector length 1, 2, 4, 8, and 16 for each libmvec function. Vector length 1 corresponds to scalar version of function and is included for vector function perf comparison. Co-authored-by: Haochen Jiang <haochen.jiang@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-16 11:37:39 -08:00
Noah Goldstein	2f9062d717	x86: Shrink memcmp-sse4.S code size No bug. This implementation refactors memcmp-sse4.S primarily with minimizing code size in mind. It does this by removing the lookup table logic and removing the unrolled check from (256, 512] bytes. memcmp-sse4 code size reduction : -3487 bytes wmemcmp-sse4 code size reduction: -1472 bytes The current memcmp-sse4.S implementation has a large code size cost. This has serious adverse affects on the ICache / ITLB. While in micro-benchmarks the implementations appears fast, traces of real-world code have shown that the speed in micro benchmarks does not translate when the ICache/ITLB are not primed, and that the cost of the code size has measurable negative affects on overall application performance. See https://research.google/pubs/pub48320/ for more details. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-10 20:12:10 -06:00
Joseph Myers	3387c40a8b	Update syscall lists for Linux 5.15 Linux 5.15 has one new syscall, process_mrelease (and also enables the clone3 syscall for RV32). It also has a macro __NR_SYSCALL_MASK for Arm, which is not a syscall but matches the pattern used for syscall macro names. Add __NR_SYSCALL_MASK to the names filtered out in the code dealing with syscall lists, update syscall-names.list for the new syscall and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py.	2021-11-10 15:21:19 +00:00
Florian Weimer	98966749f2	s390: Use long branches across object boundaries (jgh instead of jh) Depending on the layout chosen by the linker, the 16-bit displacement of the jh instruction is insufficient to reach the target label. Analysis of the linker failure was carried out by Nick Clifton. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Stefan Liebler <stli@linux.ibm.com>	2021-11-10 15:21:37 +01:00
H.J. Lu	0bd356df1a	Remove the unused +mkdep/+make-deps/s-proto.S/s-proto-cancel.S Since commit `d73f5331ce` Author: Roland McGrath <roland@gnu.org> Date: Fri May 2 02:20:45 2003 +0000 2003-05-01 Roland McGrath <roland@redhat.com> dependency is generated by passing -MD -MF to compiler. Remove the unused +mkdep, +make-deps, s-proto.S and s-proto-cancel.S. This fixes BZ #28554.	2021-11-10 04:54:18 -08:00
Adhemerval Zanella	824dd3ec49	Fix build a chec failures after `b05fae4d8e` The include cleanup on dl-minimal.c removed too much for some targets. Also for Hurd, __sbrk is removed from localplt.data now that tunables allocated memory through mmap. Checked with a build for all affected architectures.	2021-11-09 23:21:22 -03:00
Adhemerval Zanella	b05fae4d8e	elf: Use the minimal malloc on tunables_strdup The rtld_malloc functions are moved to its own file so it can be used on csu code. Also, the functiosn are renamed to __minimal_* (since there are now used not only on loader code). Using the __minimal_malloc on tunables_strdup() avoids potential issues with sbrk() calls while processing the tunables (I see sporadic elf/tst-dso-ordering9 on powerpc64le with different tests failing due ASLR). Also, using __minimal_malloc over plain mmap optimizes the memory allocation on both static and dynamic case (since it will any unused space in either the last page of data segments, avoiding mmap() call, or from the previous mmap() call). Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-11-09 14:11:25 -03:00
Samuel Thibault	d41985b71e	hurd: Remove unused __libc_close_range That was just cargo-culted.	2021-11-07 16:23:51 +01:00
Sergey Bugaev	f2c996597d	hurd: Implement close_range and closefrom The close_range () function implements the same API as the Linux and FreeBSD syscalls. It operates atomically and reliably. The specified upper bound is clamped to the actual size of the file descriptor table; it is expected that the most common use case is with last = UINT_MAX. Like in the Linux syscall, it is also possible to pass the CLOSE_RANGE_CLOEXEC flag to mark the file descriptors in the range cloexec instead of acually closing them. Also, add a Hurd version of the closefrom () function. Since unlike on Linux, close_range () cannot fail due to being unuspported by the running kernel, a fallback implementation is never necessary. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20211106153524.82700-1-bugaevc@gmail.com>	2021-11-07 16:16:11 +01:00
Noah Goldstein	475b63702e	x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h No bug. This patch doubles the rep_movsb_threshold when using ERMS. Based on benchmarks the vector copy loop, especially now that it handles 4k aliasing, is better for these medium ranged. On Skylake with ERMS: Size, Align1, Align2, dst>src,(rep movsb) / (vec copy) 4096, 0, 0, 0, 0.975 4096, 0, 0, 1, 0.953 4096, 12, 0, 0, 0.969 4096, 12, 0, 1, 0.872 4096, 44, 0, 0, 0.979 4096, 44, 0, 1, 0.83 4096, 0, 12, 0, 1.006 4096, 0, 12, 1, 0.989 4096, 0, 44, 0, 0.739 4096, 0, 44, 1, 0.942 4096, 12, 12, 0, 1.009 4096, 12, 12, 1, 0.973 4096, 44, 44, 0, 0.791 4096, 44, 44, 1, 0.961 4096, 2048, 0, 0, 0.978 4096, 2048, 0, 1, 0.951 4096, 2060, 0, 0, 0.986 4096, 2060, 0, 1, 0.963 4096, 2048, 12, 0, 0.971 4096, 2048, 12, 1, 0.941 4096, 2060, 12, 0, 0.977 4096, 2060, 12, 1, 0.949 8192, 0, 0, 0, 0.85 8192, 0, 0, 1, 0.845 8192, 13, 0, 0, 0.937 8192, 13, 0, 1, 0.939 8192, 45, 0, 0, 0.932 8192, 45, 0, 1, 0.927 8192, 0, 13, 0, 0.621 8192, 0, 13, 1, 0.62 8192, 0, 45, 0, 0.53 8192, 0, 45, 1, 0.516 8192, 13, 13, 0, 0.664 8192, 13, 13, 1, 0.659 8192, 45, 45, 0, 0.593 8192, 45, 45, 1, 0.575 8192, 2048, 0, 0, 0.854 8192, 2048, 0, 1, 0.834 8192, 2061, 0, 0, 0.863 8192, 2061, 0, 1, 0.857 8192, 2048, 13, 0, 0.63 8192, 2048, 13, 1, 0.629 8192, 2061, 13, 0, 0.627 8192, 2061, 13, 1, 0.62 Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:18:08 -05:00
Noah Goldstein	a6b7502ec0	x86: Optimize memmove-vec-unaligned-erms.S No bug. The optimizations are as follows: 1) Always align entry to 64 bytes. This makes behavior more predictable and makes other frontend optimizations easier. 2) Make the L(more_8x_vec) cases 4k aliasing aware. This can have significant benefits in the case that: 0 < (dst - src) < [256, 512] 3) Align before `rep movsb`. For ERMS this is roughly a [0, 30%] improvement and for FSRM [-10%, 25%]. In addition to these primary changes there is general cleanup throughout to optimize the aligning routines and control flow logic. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:18:03 -05:00
Paul A. Clarke	9fea0f1a2a	[powerpc] Tighten contraints for asm constant parameters There are a few places where only known numeric values are acceptable for `asm` parameters, yet the constraint "i" is used. "i" can include "symbolic constants whose values will be known only at assembly time or later." Use "n" instead of "i" where known numeric values are required. Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>	2021-11-03 09:17:28 -05:00
Adhemerval Zanella	09f214528c	riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGN It allows build both glibc and tests with lld (Since lld does not support R_RISCV_ALIGN linker relaxation). Checked with a build for riscv32-linux-gnu-rv32imafdc-ilp32d and riscv64-linux-gnu-rv64imafdc-lp64d. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Fangrui Song <maskray@google.com>	2021-11-03 09:25:06 -03:00
Fangrui Song	6720d36b66	x86-64: Replace movzx with movzbl Clang cannot assemble movzx in the AT&T dialect mode. ../sysdeps/x86_64/strcmp.S:2232:16: error: invalid operand for instruction movzx (%rsi), %ecx ^~~~ Change movzx to movzbl, which follows the AT&T dialect and is used elsewhere in the file. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-02 20:59:52 -07:00
Florian Weimer	cca75bd8b5	i386: Explain why __HAVE_64B_ATOMICS has to be 0	2021-11-02 10:26:23 +01:00
Adhemerval Zanella	613cb5c7b1	arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors support The lld linker does not support TLSDESC for arm. The have-arm-tls-desc is a leftover of `56583289b1` to support NaCL. Reviewed-by: Fangrui Song <maskray@google.com>	2021-11-01 16:23:15 -03:00
Adhemerval Zanella	d6dea8c847	arm: Use internal symbol for _dl_argv on _dl_start_user The lld does not support R_ARM_GOTOFF32 to preemptible symbol (_dl_argv has default visibility). Use the internal alias instead (one option would to use HIDDEN_JUMPTARGET, bu the macro is not defined for !__ASSEMBLER__ and I made this patch arm-specific to avoid require to check extensivelly on other architecture it this might break something). Checked on arm-linux-gnueabihf. Reviewed-by: Fangrui Song <maskray@google.com>	2021-11-01 16:21:53 -03:00
H.J. Lu	14dbbf46a0	x86-64: Remove Prefer_AVX2_STRCMP Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. When comparing 2 32-byte strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL. EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake.	2021-11-01 07:53:04 -07:00
H.J. Lu	c46e9afb2d	x86-64: Improve EVEX strcmp with masked load In strcmp-evex.S, to compare 2 32-byte strings, replace VMOVU (%rdi, %rdx), %YMM0 VMOVU (%rsi, %rdx), %YMM1 /* Each bit in K0 represents a mismatch in YMM0 and YMM1. / VPCMP $4, %YMM0, %YMM1, %k0 VPCMP $0, %YMMZERO, %YMM0, %k1 VPCMP $0, %YMMZERO, %YMM1, %k2 / Each bit in K1 represents a NULL in YMM0 or YMM1. / kord %k1, %k2, %k1 / Each bit in K1 represents a NULL or a mismatch. / kord %k0, %k1, %k1 kmovd %k1, %ecx testl %ecx, %ecx jne L(last_vector) with VMOVU (%rdi, %rdx), %YMM0 VPTESTM %YMM0, %YMM0, %k2 / Each bit cleared in K1 represents a mismatch or a null CHAR in YMM0 and 32 bytes at (%rsi, %rdx). */ VPCMP $0, (%rsi, %rdx), %YMM0, %k1{%k2} kmovd %k1, %ecx incl %ecx jne L(last_vector) It makes EVEX strcmp faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake. Co-Authored-By: Noah Goldstein <goldstein.w.n@gmail.com>	2021-11-01 07:52:56 -07:00
Stafford Horne	6446c725d4	Fix compiler issue with mmap_internal Compiling mmap_internal fails to compile when we use -1 for MMAP2_PAGE_UNIT on 32 bit architectures. The error is as follows: ../sysdeps/unix/sysv/linux/mmap_internal.h:30:8: error: unknown type name 'uint64_t' \| 30 \| static uint64_t page_unit; \| \| ^~~~~~~~ Fix by adding including stdint.h.	2021-10-29 09:21:37 -03:00
Noah Goldstein	1d56fd3bae	x86_64: Add memcmpeq.S to fix disable-multi-arch build The following commit: commit `cf4fd28ea4` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Tue Oct 26 19:43:18 2021 -0500 Broke --disable-multi-arch build for x86_64 because x86_64/memcmpeq.S was not defined outside of multiarch and the alias for __memcmpeq in x86_64/memcmp.S was removed. This commit fixes that issue by adding x86_64/memcmpeq.S. make xcheck passes on x86_64 with and without --disable-multi-arch	2021-10-28 16:35:50 -05:00
Fangrui Song	6838920383	riscv: Fix incorrect jal with HIDDEN_JUMPTARGET A non-local STV_DEFAULT defined symbol is by default preemptible in a shared object. j/jal cannot target a preemptible symbol. On other architectures, such a jump instruction either causes PLT [BZ #18822], or if short-ranged, sometimes rejected by the linker (but not by GNU ld's riscv port [ld PR/28509]). Use HIDDEN_JUMPTARGET to target a non-preemptible symbol instead. With this patch, ld.so and libc.so can be linked with LLD if source files are compiled/assembled with -mno-relax/-Wa,-mno-relax. Acked-by: Palmer Dabbelt <palmer@dabbelt.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-28 11:39:49 -07:00
Noah Goldstein	9b7cfab180	x86_64: Add evex optimized __memcmpeq in memcmpeq-evex.S No bug. This commit adds new optimized __memcmpeq implementation for evex. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.	2021-10-27 13:03:46 -05:00
Noah Goldstein	b4ed69ba16	x86_64: Add avx2 optimized __memcmpeq in memcmpeq-avx2.S No bug. This commit adds new optimized __memcmpeq implementation for avx2. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.	2021-10-27 13:03:46 -05:00
Noah Goldstein	fa7f63d8d6	x86_64: Add sse2 optimized __memcmpeq in memcmp-sse2.S No bug. This commit does not modify any of the memcmp implementation. It just adds __memcmpeq ifdefs to skip obvious cases where computing the proper 1/-1 required by memcmp is not needed.	2021-10-27 13:03:46 -05:00
Noah Goldstein	cf4fd28ea4	x86_64: Add support for __memcmpeq using sse2, avx2, and evex No bug. This commit adds support for __memcmpeq to be implemented seperately from memcmp. Support is added for versions optimized with sse2, avx2, and evex.	2021-10-27 13:03:46 -05:00
Noah Goldstein	9894127d20	String: Add hidden defs for __memcmpeq() to enable internal usage No bug. This commit adds hidden defs for all declarations of __memcmpeq. This enables usage of __memcmpeq without the PLT for usage internal to GLIBC.	2021-10-26 16:51:29 -05:00
Noah Goldstein	44829b3ddb	String: Add support for __memcmpeq() ABI on all targets No bug. This commit adds support for __memcmpeq() as a new ABI for all targets. In this commit __memcmpeq() is implemented only as an alias to the corresponding targets memcmp() implementation. __memcmpeq() is added as a new symbol starting with GLIBC_2.35 and defined in string.h with comments explaining its behavior. Basic tests that it is callable and works where added in string/tester.c As discussed in the proposal "Add new ABI '__memcmpeq()' to libc" __memcmpeq() is essentially a reserved namespace for bcmp(). The means is shares the same specifications as memcmp() except the return value for non-equal byte sequences is any non-zero value. This is less strict than memcmp()'s return value specification and can be better optimized when a boolean return is all that is needed. __memcmpeq() is meant to only be called by compilers if they can prove that the return value of a memcmp() call is only used for its boolean value. All tests in string/tester.c passed. As well build succeeds on x86_64-linux-gnu target.	2021-10-26 16:51:29 -05:00
Fangrui Song	8438135d34	configure: Don't check LD -v --help for LIBC_LINKER_FEATURE When LIBC_LINKER_FEATURE is used to check a linker option with the equal sign, it will likely fail because the LD -v --help output may look like `-z lam-report=[none\|warning\|error]` while the needle is something like `-z lam-report=warning`. The LD -v --help filter doesn't save much time, so just remove it.	2021-10-25 13:17:44 -07:00
Noah Goldstein	bad852b61b	x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'. it could potentially be dangerous to use SSE2 if this function is ever called without using 'vzeroupper' beforehand. While compilers appear to use 'vzeroupper' before function calls if AVX2 has been used, using SSE2 here is more brittle. Since it is not absolutely necessary it should be avoided. It costs 2-extra bytes but the extra bytes should only eat into alignment padding. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-23 13:02:42 -05:00
Sunil K Pandey	4f690aad9e	x86_64: Add missing libmvec ABI tests Add vector ABI tests for cos, exp, log, pow and sin functions. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-22 06:46:49 -07:00
Adhemerval Zanella	927246e188	elf: Fix `e6fd79f379` build with --enable-tunables=no The _dl_sort_maps_init() is not defined when tunables is not enabled. Checked on x86_64-linux-gnu.	2021-10-21 17:26:32 -03:00
Chung-Lin Tang	15a0c5730d	elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645 ) This second patch contains the actual implementation of a new sorting algorithm for shared objects in the dynamic loader, which solves the slow behavior that the current "old" algorithm falls into when the DSO set contains circular dependencies. The new algorithm implemented here is simply depth-first search (DFS) to obtain the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1 bitfield is added to struct link_map to more elegantly facilitate such a search. The DFS algorithm is applied to the input maps[nmap-1] backwards towards maps[0]. This has the effect of a more "shallow" recursion depth in general since the input is in BFS. Also, when combined with the natural order of processing l_initfini[] at each node, this creates a resulting output sorting closer to the intuitive "left-to-right" order in most cases. Another notable implementation adjustment related to this _dl_sort_maps change is the removing of two char arrays 'used' and 'done' in _dl_close_worker to represent two per-map attributes. This has been changed to simply use two new bit-fields l_map_used:1, l_map_done:1 added to struct link_map. This also allows discarding the clunky 'used' array sorting that _dl_sort_maps had to sometimes do along the way. Tunable support for switching between different sorting algorithms at runtime is also added. A new tunable 'glibc.rtld.dynamic_sort' with current valid values 1 (old algorithm) and 2 (new DFS algorithm) has been added. At time of commit of this patch, the default setting is 1 (old algorithm). Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-21 11:23:53 -03:00
Fangrui Song	aa783f9a7b	linux: Fix a possibly non-constant expression in _Static_assert According to C11 6.6p6, `const int` as an operand may not make up a constant expression. GCC -O0 errors: ../sysdeps/unix/sysv/linux/opendir.c:107:19: error: static_assert expression is not an integral constant expression _Static_assert (allocation_size >= sizeof (struct dirent64), -O2 -Wpedantic has a similar warning. See https://gcc.gnu.org/PR102502 for GCC's inconsistency. Use enum which is guaranteed to be a constant expression. This also makes the file compilable with Clang. Fixes: `4b962c9e85` ("linux: Simplify opendir buffer allocation")	2021-10-20 14:22:43 -07:00
H.J. Lu	d962cce139	x86-64: Add sysdeps/x86_64/fpu/Makeconfig 1. Add sysdeps/x86_64/fpu/Makeconfig to auto-generate libmvec.mk, which contains libmvec ABI test dependencies and CFLAGS, in the build directory. 2. Include libmvec.mk for libmvec ABI test dependencies and CFLAGS. Tested on SSE4, AVX, AVX2 and AVX512 machines. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-10-20 11:53:45 -07:00
Adhemerval Zanella	82fd7314c7	powerpc: Remove backtrace implementation The powerpc optimization to provide a fast stacktrace requires some ad-hoc code to handle Linux signal frames and the change is fragile once the kernel decides to slight change its execution sequence [1]. The generic implementation work as-is and it should be future proof since the kernel provides the expected CFI directives in vDSO shared page. Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and powerpc64-linux-gnu. [1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html	2021-10-20 10:40:53 -03:00
H.J. Lu	2ec99d8c42	ld.so: Initialize bootstrap_map.l_ld_readonly [BZ #28340 ] 1. Define DL_RO_DYN_SECTION to initalize bootstrap_map.l_ld_readonly before calling elf_get_dynamic_info to get dynamic info in bootstrap_map, 2. Define a single static inline bool dl_relocate_ld (const struct link_map l) { / Don't relocate dynamic section if it is readonly */ return !(l->l_ld_readonly \|\| DL_RO_DYN_SECTION); } This updates BZ #28340 fix.	2021-10-19 06:40:38 -07:00
Stafford Horne	1d550265a7	timex: Use 64-bit fields on 32-bit TIMESIZE=64 systems (BZ #28469 ) This was found when testing the OpenRISC port I am working on. These two tests fail with SIGSEGV: FAIL: misc/tst-ntp_gettime FAIL: misc/tst-ntp_gettimex This was found to be due to the kernel overwriting the stack space allocated by the timex structure. The reason for the overwrite being that the kernel timex has 64-bit fields and user space code only allocates enough stack space for timex with 32-bit fields. On 32-bit systems with TIMESIZE=64 __USE_TIME_BITS64 is not defined. This causes the timex structure to use 32-bit fields with type __syscall_slong_t. This patch adjusts the ifdef condition to allow 32-bit systems with TIMESIZE=64 to use the 64-bit long long timex definition. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-18 17:17:20 -03:00
Samuel Thibault	1d3decee99	hurd if_index: Explicitly use AF_INET for if index discovery `5bf07e1b3a` ("Linux: Simplify __opensock and fix race condition [BZ #28353]") made __opensock try NETLINK then UNIX then INET. On the Hurd, only INET knows about network interfaces, so better actually specify that in if_index.	2021-10-18 01:39:02 +02:00
Samuel Thibault	1d20f33ff4	hurd: Fix intr-msg parameter/stack kludge INTR_MSG_TRAP was tinkering with esp to make it point to _hurd_intr_rpc_mach_msg's parameters, and notably use (&msg)[-1] which is meaningless in C. Instead, just push the parameters on the stack, which also avoids leaving local variables of _hurd_intr_rpc_mach_msg below esp. We now also properly express that OPTION and TIMEOUT may be updated during the trap call.	2021-10-18 00:50:41 +02:00
H.J. Lu	9d3c9a046a	x86-64: Add test-vector-abi.h/test-vector-abi-sincos.h Add templates for vector ABI test and use them for vector sincos/sincosf ABI tests.	2021-10-14 11:59:12 -07:00
Adhemerval Zanella	d6d89608ac	elf: Fix dynamic-link.h usage on rtld.c The `4af6982e4c` fix does not fully handle RTLD_BOOTSTRAP usage on rtld.c due two issues: 1. RTLD_BOOTSTRAP is also used on dl-machine.h on various architectures and it changes the semantics of various machine relocation functions. 2. The elf_get_dynamic_info() change was done sideways, previously to `490e6c62aa` get-dynamic-info.h was included by the first dynamic-link.h include without RTLD_BOOTSTRAP being defined. It means that the code within elf_get_dynamic_info() that uses RTLD_BOOTSTRAP is in fact unused. To fix 1. this patch now includes dynamic-link.h only once with RTLD_BOOTSTRAP defined. The ELF_DYNAMIC_RELOCATE call will now have the relocation fnctions with the expected semantics for the loader. And to fix 2. part of `4af6982e4c` is reverted (the check argument elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP pieces are removed. To reorganize the includes the static TLS definition is moved to its own header to avoid a circular dependency (it is defined on dynamic-link.h and dl-machine.h requires it at same time other dynamic-link.h definition requires dl-machine.h defitions). Also ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL are moved to its own header. Only ancient ABIs need special values (arm, i386, and mips), so a generic one is used as default. The powerpc Elf64_FuncDesc is also moved to its own header, since csu code required its definition (which would require either include elf/ folder or add a full path with elf/). Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32, and powerpc64le. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-10-14 14:52:07 -03:00
Noah Goldstein	e59ced2384	x86: Optimize memset-vec-unaligned-erms.S No bug. Optimization are 1. change control flow for L(more_2x_vec) to fall through to loop and jump for L(less_4x_vec) and L(less_8x_vec). This uses less code size and saves jumps for length > 4x VEC_SIZE. 2. For EVEX/AVX512 move L(less_vec) closer to entry. 3. Avoid complex address mode for length > 2x VEC_SIZE 4. Slightly better aligning code for the loop from the perspective of code size and uops. 5. Align targets so they make full use of their fetch block and if possible cache line. 6. Try and reduce total number of icache lines that will need to be pulled in for a given length. 7. Include "local" version of stosb target. For AVX2/EVEX/AVX512 jumping to the stosb target in the sse2 code section will almost certainly be to a new page. The new version does increase code size marginally by duplicating the target but should get better iTLB behavior as a result. test-memset, test-wmemset, and test-bzero are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-12 13:38:02 -05:00
Noah Goldstein	1bd8b8d58f	x86: Optimize memcmp-evex-movbe.S for frontend behavior and size No bug. The frontend optimizations are to: 1. Reorganize logically connected basic blocks so they are either in the same cache line or adjacent cache lines. 2. Avoid cases when basic blocks unnecissarily cross cache lines. 3. Try and 32 byte align any basic blocks possible without sacrificing code size. Smaller / Less hot basic blocks are used for this. Overall code size shrunk by 168 bytes. This should make up for any extra costs due to aligning to 64 bytes. In general performance before deviated a great deal dependending on whether entry alignment % 64 was 0, 16, 32, or 48. These changes essentially make it so that the current implementation is at least equal to the best alignment of the original for any arguments. The only additional optimization is in the page cross case. Branch on equals case was removed from the size == [4, 7] case. As well the [4, 7] and [2, 3] case where swapped as [4, 7] is likely a more hot argument size. test-memcmp and test-wmemcmp are both passing.	2021-10-12 12:02:12 -05:00
Adhemerval Zanella	4af6982e4c	elf: Fix elf_get_dynamic_info definition Before to `490e6c62aa` ('elf: Avoid nested functions in the loader [BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on the first dynamic-link.h include and later within _dl_start(). The former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used on setup_vdso() (since it is a global definition), while the former does define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation. With the commit change, the function is now included and defined once instead of defined as a nested function. So rtld.c defines without defining RTLD_BOOTSTRAP and it brokes at least powerpc32. This patch fixes by moving the get-dynamic-info.h include out of dynamic-link.h, which then the caller can corirectly set the expected semantic by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or RESOLVE_MAP. It also required to enable some asserts only for the loader bootstrap to avoid issues when called from setup_vdso(). As a side note, this is another issues with nested functions: it is not clear from pre-processed output (-E -dD) how the function will be build and its semantic (since nested function will be local and extra C defines may change it). I checked on x86_64-linux-gnu (w/o --enable-static-pie), i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4, aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and s390x-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com>	2021-10-12 13:25:43 -03:00
Joseph Myers	4912c738fc	Fix nios2 localplt failure Building for nios2-linux-gnu has recently started showing a localplt test failure, arising from a reference to __floatunsidf from getloadavg after commit `b5c8a3aa82` ("Linux: implement getloadavg(3) using sysinfo(2)") (this is an architecture with soft-fp in libc). Add this as a permitted local PLT reference in localplt.data. Tested with build-many-glibcs.py for nios2-linux-gnu.	2021-10-11 21:47:32 +00:00
Fangrui Song	bf433b849a	elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) Intel MPX failed to gain wide adoption and has been deprecated for a while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in 2019. This patch removes the support code from the dynamic loader. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-11 11:14:02 -07:00
Noah Goldstein	fc5bd179ef	x86: Modify ENTRY in sysdep.h so that p2align can be specified No bug. This change adds a new macro ENTRY_P2ALIGN which takes a second argument, log2 of the desired function alignment. The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this doesn't affect any existing functionality. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-10-08 11:30:52 -05:00
Cristian Rodríguez	b5c8a3aa82	Linux: implement getloadavg(3) using sysinfo(2) Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-08 09:52:19 -03:00
Fangrui Song	490e6c62aa	elf: Avoid nested functions in the loader [BZ #27220 ] dynamic-link.h is included more than once in some elf/ files (rtld.c, dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested functions. This harms readability and the nested functions usage is the biggest obstacle prevents Clang build (Clang doesn't support GCC nested functions). The key idea for unnesting is to add extra parameters (struct link_map and struct r_scope_elm []) to RESOLVE_MAP, ELF_MACHINE_BEFORE_RTLD_RELOC, ELF_DYNAMIC_RELOCATE, elf_machine_rel[a], elf_machine_lazy_rel, and elf_machine_runtime_setup. (This is inspired by Stan Shebs' ppc64/x86-64 implementation in the google/grte/v5-2.27/master which uses mixed extra parameters and static variables.) Future simplification: * If mips elf_machine_runtime_setup no longer needs RESOLVE_GOTSYM, elf_machine_runtime_setup can drop the `scope` parameter. * If TLSDESC no longer need to be in elf_machine_lazy_rel, elf_machine_lazy_rel can drop the `scope` parameter. Tested on aarch64, i386, x86-64, powerpc64le, powerpc64, powerpc32, sparc64, sparcv9, s390x, s390, hppa, ia64, armhf, alpha, and mips64. In addition, tested build-many-glibcs.py with {arc,csky,microblaze,nios2}-linux-gnu and riscv64-linux-gnu-rv64imafdc-lp64d. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-07 11:55:02 -07:00
H.J. Lu	349b0441da	Add run-time check for indirect external access When performing symbol lookup for references in executable without indirect external access: 1. Disallow copy relocations in executable against protected data symbols in a shared object with indirect external access. 2. Disallow non-zero symbol values of undefined function symbols in executable, which are used as the function pointer, against protected function symbols in a shared object with indirect external access. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-07 10:26:48 -07:00
H.J. Lu	1bd888d0b7	Initial support for GNU_PROPERTY_1_NEEDED 1. Add GNU_PROPERTY_1_NEEDED: #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO to indicate the needed properties by the object file. 2. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) to indicate that the object file requires canonical function pointers and cannot be used with copy relocation. 3. Scan GNU_PROPERTY_1_NEEDED property and store it in l_1_needed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-07 10:26:08 -07:00
Stefan Liebler	f2e06656d0	S390: Add PCI_MIO and SIE HWCAPs Both new HWCAPs were introduced in these kernel commits: - 7e8403ecaf884f307b627f3c371475913dd29292 "s390: add HWCAP_S390_PCI_MIO to ELF hwcaps" - 7e82523f2583e9813e4109df3656707162541297 "s390/hwcaps: make sie capability regular hwcap" Also note that the kernel commit 511ad531afd4090625def4d9aba1f5227bd44b8e "s390/hwcaps: shorten HWCAP defines" has shortened the prefix of the macros from "HWCAP_S390_" to "HWCAP_". For compatibility reasons, we do not change the prefix in public glibc header file.	2021-10-07 06:49:39 +02:00
Stefan Liebler	47252e4336	S390: update libm test ulps Update after commit `6bbf729832`. Fixed inaccuracy of j0f (BZ #28185) See also e.g. commit `c75b106145` aarch64: update libm test ulps	2021-10-06 16:34:40 +02:00
Adhemerval Zanella	260d3032ad	powerpc: update libm test ulps Update after commit `6bbf729832` (Fixed inaccuracy of j0f (BZ #28185)).	2021-10-06 10:50:33 -03:00
Adhemerval Zanella	d2b1254db2	y2038: Use a common definition for stat for sparc32 The sparc32 misses support for support done by `4e8521333b`. Checked on sparcv9-linux-gnu.	2021-10-06 08:10:13 -03:00
Szabolcs Nagy	c75b106145	aarch64: update libm test ulps Update after commit `6bbf729832`. Fixed inaccuracy of j0f (BZ #28185)	2021-10-05 13:44:27 +01:00
Paul Zimmermann	6bbf729832	Fixed inaccuracy of j0f (BZ #28185 ) The largest errors over the full binary32 range are after this patch (on x86_64): RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6 RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7 Inputs that were yielding huge errors have been added to "make check". Reviewed-by: Adhemeral Zanella <adhemerval.zanella@linaro.org>	2021-10-05 13:45:37 +02:00
Szabolcs Nagy	83b5323261	elf: Avoid deadlock between pthread_create and ctors [BZ #28357 ] The fix for bug 19329 caused a regression such that pthread_create can deadlock when concurrent ctors from dlopen are waiting for it to finish. Use a new GL(dl_load_tls_lock) in pthread_create that is not taken around ctors in dlopen. The new lock is also used in __tls_get_addr instead of GL(dl_load_lock). The new lock is held in _dl_open_worker and _dl_close_worker around most of the logic before/after the init/fini routines. When init/fini routines are running then TLS is in a consistent, usable state. In _dl_open_worker the new lock requires catching and reraising dlopen failures that happen in the critical section. The new lock is reinitialized in a fork child, to keep the existing behaviour and it is kept recursive in case malloc interposition or TLS access from signal handlers can retake it. It is not obvious if this is necessary or helps, but avoids changing the preexisting behaviour. The new lock may be more appropriate for dl_iterate_phdr too than GL(dl_load_write_lock), since TLS state of an incompletely loaded module may be accessed. If the new lock can replace the old one, that can be a separate change. Fixes bug 28357. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-04 15:07:05 +01:00
Florian Weimer	eae81d7057	nptl: pthread_kill must send signals to a specific thread [BZ #28407 ] The choice between the kill vs tgkill system calls is not just about the TID reuse race, but also about whether the signal is sent to the whole process (and any thread in it) or to a specific thread. This was caught by the openposix test suite: LTP: openposix test suite - FAIL: SIGUSR1 is member of new thread pendingset. <https://gitlab.com/cki-project/kernel-tests/-/issues/764> Fixes commit `526c3cf11e` ("nptl: Fix race between pthread_kill and thread exit (bug 12889)"). Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>	2021-10-01 18:16:41 +02:00
Adhemerval Zanella	2313ab153d	nptl: Add CLOCK_MONOTONIC support for PI mutexes Linux added FUTEX_LOCK_PI2 to support clock selection (commit bf22a6976897977b0a3f1aeba6823c959fc4fdae). With the new flag we can now proper support CLOCK_MONOTONIC for pthread_mutex_clocklock with Priority Inheritance. If kernel does not support, EINVAL is returned instead. The difference is the futex operation will be issued and the kernel will advertise the missing support (instead of hard-code error return). Checked on x86_64-linux-gnu and i686-linux-gnu on Linux 5.14, 5.11, and 4.15.	2021-10-01 10:11:11 -03:00
Adhemerval Zanella	8352b6df37	nptl: Use FUTEX_LOCK_PI2 when available This patch uses the new futex PI operation provided by Linux v5.14 when it is required. The futex_lock_pi64() is moved to futex-internal.c (since it used on two different places and its code size might be large depending of the kernel configuration) and clockid is added as an argument. Co-authored-by: Kurt Kanzenbach <kurt@linutronix.de>	2021-10-01 08:09:13 -03:00
Kurt Kanzenbach	dd5adb515c	Linux: Add FUTEX_LOCK_PI2 Linux v5.14.0 introduced a new futex operation called FUTEX_LOCK_PI2. This kernel feature can be used to implement pthread_mutex_clocklock(MONOTONIC)/PI. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-01 08:09:13 -03:00
Adhemerval Zanella	beca615c5e	Update alpha libm-test-ulps	2021-09-30 09:06:21 -03:00
Paul A. Clarke	ee874f44fd	powerpc: Fix unrecognized instruction errors with recent binutils Recent versions of binutils (with commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a) stopped preserving "sticky" options across a base `.machine` directive, nullifying the use of passing "-many" through GCC to the assembler. As a result, some instructions which were recognized even under older, more stringent `.machine` directives become unrecognized instructions in that context. In `sysdeps/powerpc/tst-set_ppr.c`, the use of the `mfppr32` extended mnemonic became unrecognized, as the default compilation with GCC for 32bit powerpc adds a `.machine ppc` in the resulting assembly, so the command line option `-Wa,-many` is essentially ignored, and the ISA 2.06 instructions and mnemonics, like `mfppr32`, are unrecognized. The compilation of `sysdeps/powerpc/tst-set_ppr.c` fails with: Error: unrecognized opcode: `mfppr32' Add appropriate `.machine` directives in the assembly to bracket the `mfppr32` instruction. Part of a 2019 fix (commit `9250e6610f`) to the above test's Makefile to add `-many` to the compilation when GCC itself stopped passing `-many` to the assember no longer has any effect, so remove that. Reported-by: Joseph Myers <joseph@codesourcery.com>	2021-09-29 14:42:20 -05:00
Joseph Myers	90f0ac10a7	Add fmaximum, fminimum functions C2X adds new <math.h> functions for floating-point maximum and minimum, corresponding to the new operations that were added in IEEE 754-2019 because of concerns about the old operations not being associative in the presence of signaling NaNs. fmaximum and fminimum handle NaNs like most <math.h> functions (any NaN argument means the result is a quiet NaN). fmaximum_num and fminimum_num handle both quiet and signaling NaNs the way fmax and fmin handle quiet NaNs (if one argument is a number and the other is a NaN, return the number), but still raise "invalid" for a signaling NaN argument, making them exceptions to the normal rule that a function with a floating-point result raising "invalid" also returns a quiet NaN. fmaximum_mag, fminimum_mag, fmaximum_mag_num and fminimum_mag_num are corresponding functions returning the argument with greatest or least absolute value. All these functions also treat +0 as greater than -0. There are also corresponding <tgmath.h> type-generic macros. Add these functions to glibc. The implementations use type-generic templates based on those for fmax, fmin, fmaxmag and fminmag, and test inputs are based on those for those functions with appropriate adjustments to the expected results. The RISC-V maintainers might wish to add optimized versions of fmaximum_num and fminimum_num (for float and double), since RISC-V (F extension version 2.2 and later) provides instructions corresponding to those functions - though it might be at least as useful to add architecture-independent built-in functions to GCC and teach the RISC-V back end to expand those functions inline, which is what you generally want for functions that can be implemented with a single instruction. Tested for x86_64 and x86, and with build-many-glibcs.py.	2021-09-28 23:31:35 +00:00
Florian Weimer	5bf07e1b3a	Linux: Simplify __opensock and fix race condition [BZ #28353 ] AF_NETLINK support is not quite optional on modern Linux systems anymore, so it is likely that the first attempt will always succeed. Consequently, there is no need to cache the result. Keep AF_UNIX and the Internet address families as a fallback, for the rare case that AF_NETLINK is missing. The other address families previously probed are totally obsolete be now, so remove them. Use this simplified version as the generic implementation, disabling Netlink support as needed.	2021-09-28 18:55:49 +02:00
Stafford Horne	9874ca536b	pthread/tst-cancel28: Fix barrier re-init race condition When running this test on the OpenRISC port I am working on this test fails with a timeout. The test passes when being straced or debugged. Looking at the code there seems to be a race condition in that: 1 main thread: calls xpthread_cancel 2 sub thread : receives cancel signal 3 sub thread : cleanup routine waits on barrier 4 main thread: re-inits barrier 5 main thread: waits on barrier After getting to 5 the main thread and sub thread wait forever as the 2 barriers are no longer the same. Removing the barrier re-init seems to fix this issue. Also, the barrier does not need to be reinitialized as that is done by default. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-09-28 10:47:08 -03:00
Fangrui Song	8e2557a2b8	powerpc: Delete unneeded ELF_MACHINE_BEFORE_RTLD_RELOC Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>	2021-09-27 10:12:50 -07:00
Adhemerval Zanella	8f42a98654	posix: Remove spawni.c Although it provide an alternate implementation that communicates using pipe() instead of shared memory, no port uses and it adds extra burden for posix_spawn() extensions. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-09-27 12:44:25 -03:00
H.J. Lu	b0a33dc967	Disable symbol hack in libc_nonshared.a Don't reference __GI_memmove, __GI_memset, __GI_memcpy, __divdi3_internal, __udivdi3_internal and __moddi3_internal in libc_nonshared.a.	2021-09-27 07:46:25 -07:00
Adhemerval Zanella	342298278e	linux: Revert the use of sched_getaffinity on get_nproc (BZ #28310 ) The use of sched_getaffinity on get_nproc and sysconf (_SC_NPROCESSORS_ONLN) done in `903bc7dcc2` (BZ #27645) breaks the top command in common hypervisor configurations and also other monitoring tools. The main issue using sched_getaffinity changed the symbols semantic from system-wide scope of online CPUs to per-process one (which can be changed with kernel cpusets or book parameters in VM). This patch reverts mostly of the `903bc7dcc2`, with the exceptions: * No more cached values and atomic updates, since they are inherent racy. * No /proc/cpuinfo fallback, since /proc/stat is already used and it would require to revert more arch-specific code. * The alloca is replace with a static buffer of 1024 bytes. So the implementation first consult the sysfs, and fallbacks to procfs. Checked on x86_64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-09-27 09:18:43 -03:00
Adhemerval Zanella	33099d72e4	linux: Simplify get_nprocs This patch simplifies the memory allocation code and uses the sched routines instead of reimplement it. This still uses a stack allocation buffer, so it can be used on malloc initialization code. Linux currently supports at maximum of 4096 cpus for most architectures: $ find -iname Kconfig \| xargs git grep -A10 -w NR_CPUS \| grep -w range arch/alpha/Kconfig- range 2 32 arch/arc/Kconfig- range 2 4096 arch/arm/Kconfig- range 2 16 if DEBUG_KMAP_LOCAL arch/arm/Kconfig- range 2 32 if !DEBUG_KMAP_LOCAL arch/arm64/Kconfig- range 2 4096 arch/csky/Kconfig- range 2 32 arch/hexagon/Kconfig- range 2 6 if SMP arch/ia64/Kconfig- range 2 4096 arch/mips/Kconfig- range 2 256 arch/openrisc/Kconfig- range 2 32 arch/parisc/Kconfig- range 2 32 arch/riscv/Kconfig- range 2 32 arch/s390/Kconfig- range 2 512 arch/sh/Kconfig- range 2 32 arch/sparc/Kconfig- range 2 32 if SPARC32 arch/sparc/Kconfig- range 2 4096 if SPARC64 arch/um/Kconfig- range 1 1 arch/x86/Kconfig-# [NR_CPUS_RANGE_BEGIN ... NR_CPUS_RANGE_END] range. arch/x86/Kconfig- range NR_CPUS_RANGE_BEGIN NR_CPUS_RANGE_END arch/xtensa/Kconfig- range 2 32 With x86 supporting 8192: arch/x86/Kconfig 976 config NR_CPUS_RANGE_END 977 int 978 depends on X86_64 979 default 8192 if SMP && CPUMASK_OFFSTACK 980 default 512 if SMP && !CPUMASK_OFFSTACK 981 default 1 if !SMP So using a maximum of 32k cpu should cover all cases (and I would expect once we start to have many more CPUs that Linux would provide a more straightforward way to query for such information). A test is added to check if sched_getaffinity can successfully return with large buffers. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-09-27 09:18:12 -03:00
Adhemerval Zanella	11a02b035b	misc: Add __get_nprocs_sched This is an internal function meant to return the number of avaliable processor where the process can scheduled, different than the __get_nprocs which returns a the system available online CPU. The Linux implementation currently only calls __get_nprocs(), which in tuns calls sched_getaffinity. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-09-27 09:13:06 -03:00
Samuel Thibault	1cc205c510	htl: make pthread_sigstate read/write set/oset outside sigstate section so that if a segfault occurs, the handler can run fine.	2021-09-26 01:04:13 +02:00
Joseph Myers	b26901b26e	Fix sysdeps/x86/fpu/s_ffma.c for 32-bit FMA processor case It turns out the __SSE2_MATH__ conditional in sysdeps/x86/fpu/s_ffma.c does not cover all cases where the x86 fenv_private.h macros might manipulate one of the SSE and 387 floating-point state, while the actual fma implementation uses the other. Specifically, in the 32-bit case, with a compiler not defaulting to -mfpmath=sse, but testing on a processor with hardware FMA support, the multiarch fma function implementations will end up using SSE, while the fenv_private.h macros will use the 387 state for double. Change the conditional to use the default macros rather than the optimized ones in all cases except when the compiler inlines an fma instruction (in which case, since all those instructions are SSE instructions and -mfpmath=sse must be in effect for them to be inlined, the optimized macros will only use the SSE state and it's OK for them to only use the SSE state). Tested for x86_64 and x86. H.J. reports in <https://sourceware.org/pipermail/libc-alpha/2021-September/131367.html> that it fixes the problems he observed.	2021-09-24 17:59:22 +00:00
Florian Weimer	5ad9d62c3b	Linux: Avoid closing -1 on failure in __closefrom_fallback Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-09-24 19:51:52 +02:00
Fangrui Song	91e92272ca	i386: Port elf_machine_{load_address,dynamic} from x86-64 This drops reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time address of _DYNAMIC. The code sequence length does not change. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-09-24 09:36:32 -07:00
Naohiro Tamura	381b29616a	aarch64: Disable A64FX memcpy/memmove BTI unconditionally This patch disables A64FX memcpy/memmove BTI instruction insertion unconditionally such as A64FX memset patch [1] for performance. [1] commit `07b427296b` Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-09-24 13:26:59 +01:00
Tulio Magno Quites Machado Filho	54ff4f1e39	powerpc64le: Avoid conflicting types for f64xfmaf128 when IFUNC is not used Avoid defining f64xfmaf128 twice when building s_fmaf128.c. This can be reproduced on powerpc64le whenever f128 functions do not have IFUNC enabled, e.g. using "--with-cpu=power8 --disable-multi-arch", or when using "-with-cpu=power9". Fixes: `b3f27d8150` ("Add narrowing fma functions")	2021-09-23 19:29:54 -03:00
Joseph Myers	4ed7a383f9	Fix ffma use of round-to-odd on x86 On 32-bit x86 with -mfpmath=sse, and on x86_64 with --disable-multi-arch, the tests of ffma and its aliases (fma narrowing from binary64 to binary32) fail. This is probably the issue reported by H.J. in <https://sourceware.org/pipermail/libc-alpha/2021-September/131277.html>. The problem is the use of fenv_private.h macros in the round-to-odd implementation. Those macros are set up to manipulate only one of the SSE and 387 floating-point state, whichever is relevant for the type indicated by the suffix on the macro name. But x86 configurations sometimes use the ldbl-96 implementation of binary64 fma (that's where --disable-multi-arch is relevant for x86_64: it causes the ldbl-96 implementation to be used, instead of an IFUNC implementation that falls back to the dbl-64 version), contrary to the expectations of those macros for functions operating on double when __SSE2_MATH__ is defined. This can be addressed by using the default versions of those macros (giving x86 its own version of s_ffma.c), as is done for the *f128 macro variants where it depends on the details of how GCC was configured when building libgcc which floating-point state is affected by _Float128 arithmetic. The issue only applies when __SSE2_MATH__ is defined, and doesn't apply when __FP_FAST_FMA is defined (because in that case, fma will be inlined by the compiler, meaning it's definitely an SSE operation; for the same reason, this is not an issue for narrowing sqrt, as hardware sqrt is always inlined in that implementation for x86), but in other cases it's safest to use the default versions of the fenv_private.h macros to ensure things work whichever fma implementation is used. Tested for x86_64 (with and without --disable-multi-arch) and x86 (with and without -mfpmath=sse).	2021-09-23 21:18:31 +00:00
Florian Weimer	2849e2f533	nptl: Avoid setxid deadlock with blocked signals in thread exit [BZ #28361 ] As part of the fix for bug 12889, signals are blocked during thread exit, so that application code cannot run on the thread that is about to exit. This would cause problems if the application expected signals to be delivered after the signal handler revealed the thread to still exist, despite pthread_kill can no longer be used to send signals to it. However, glibc internally uses the SIGSETXID signal in a way that is incompatible with signal blocking, due to the way the setxid handshake delays thread exit until the setxid operation has completed. With a blocked SIGSETXID, the handshake can never complete, causing a deadlock. As a band-aid, restore the previous handshake protocol by not blocking SIGSETXID during thread exit. The new test sysdeps/pthread/tst-pthread-setuid-loop.c is based on a downstream test by Martin Osvald. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>	2021-09-23 09:56:07 +02:00

... 2 3 4 5 6 ...

14743 Commits