glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-29 16:21:07 +00:00

Author	SHA1	Message	Date
Sunil K Pandey	3fc9ccc20b	x86-64: Add vector exp2/exp2f implementation to libmvec Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp2/exp2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:29 -08:00
Sunil K Pandey	37475ba883	x86-64: Add vector hypot/hypotf implementation to libmvec Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector hypot/hypotf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:21 -08:00
Sunil K Pandey	11c01de14c	x86-64: Add vector asin/asinf implementation to libmvec Implement vectorized asin/asinf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asin/asinf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:03 -08:00
Sunil K Pandey	146310177a	x86-64: Add vector atan/atanf implementation to libmvec Implement vectorized atan/atanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan/atanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:36:46 -08:00
Florian Weimer	5d28a8962d	elf: Add _dl_find_object function It can be used to speed up the libgcc unwinder, and the internal _dl_find_dso_for_object function (which is used for caller identification in dlopen and related functions, and in dladdr). _dl_find_object is in the internal namespace due to bug 28503. If libgcc switches to _dl_find_object, this namespace issue will be fixed. It is located in libc for two reasons: it is necessary to forward the call to the static libc after static dlopen, and there is a link ordering issue with -static-libgcc and libgcc_eh.a because libc.so is not a linker script that includes ld.so in the glibc build tree (so that GCC's internal -lc after libgcc_eh.a does not pick up ld.so). It is necessary to do the i386 customization in the sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because otherwise, multilib installations are broken. The implementation uses software transactional memory, as suggested by Torvald Riegel. Two copies of the supporting data structures are used, also achieving full async-signal-safety. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-28 22:52:56 +01:00
Adhemerval Zanella	83b8d5027d	malloc: Remove memusage.h And use machine-sp.h instead. The Linux implementation is based on already provided CURRENT_STACK_FRAME (used on nptl code) and STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	a75b1e35c5	malloc: Use hp-timing on libmemusage Instead of reimplementing it in the GETTIME macro.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	92ff345137	Remove atomic-machine.h atomic typedefs Now that memusage.c uses generic types we can remove them.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	5a5f7a160d	malloc: Remove atomic_* usage These typedef are used solely on memusage and can be replaced with generic types.	2021-12-28 14:57:57 -03:00
Thomas Petazzoni	c75aa9246a	microblaze: Add missing implementation when !__ASSUME_TIME64_SYSCALLS In commit `a92f4e6299` ("linux: Add time64 pselect support"), a Microblaze specific implementation of __pselect32() was added to cover the case of kernels < 3.15 which lack the pselect6 system call. This new file sysdeps/unix/sysv/linux/microblaze/pselect32.c takes precedence over the default implementation sysdeps/unix/sysv/linux/pselect32.c. However sysdeps/unix/sysv/linux/pselect32.c provides an implementation of __pselect32() which is needed when __ASSUME_TIME64_SYSCALLS is not defined. On Microblaze, which is a 32-bit architecture, __ASSUME_TIME64_SYSCALLS is only true for kernels >= 5.1. Due to sysdeps/unix/sysv/linux/microblaze/pselect32.c taking precedence over sysdeps/unix/sysv/linux/pselect32.c, it means that when we are with a kernel >= 3.15 but < 5.1, we need a __pselect32() implementation, but sysdeps/unix/sysv/linux/microblaze/pselect32.c doesn't provide it, and sysdeps/unix/sysv/linux/pselect32.c which would provide it is not compiled in. This causes the following build failure on Microblaze with for example Linux kernel headers 4.9: [...]/build/libc_pic.os: in function `__pselect64': (.text+0x120b44): undefined reference to `__pselect32' collect2: error: ld returned 1 exit status Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-28 09:09:49 -03:00
Adhemerval Zanella	8c0664e2b8	elf: Add _dl_audit_pltexit It consolidates the code required to call la_pltexit audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	eff687e846	elf: Add _dl_audit_pltenter It consolidates the code required to call la_pltenter audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	0b98a87487	elf: Add _dl_audit_preinit It consolidates the code required to call la_preinit audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	cda4f265c6	elf: Add _dl_audit_symbind_alt and _dl_audit_symbind It consolidates the code required to call la_symbind{32,64} audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	311c9ee54e	elf: Add _dl_audit_objclose It consolidates the code required to call la_objclose audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	c91008d349	elf: Add _dl_audit_objsearch It consolidates the code required to call la_objsearch audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	3dac3959a5	elf: Add _dl_audit_activity_map and _dl_audit_activity_nsid It consolidates the code required to call la_activity audit callback. Also, for a new Lmid_t the namespace link_map list is empty, so it must be checked before use. This can happen when an audit module is used along with dlmopen. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	aee6e90f93	elf: Add _dl_audit_objopen It consolidates the code required to call la_objopen audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Samuel Thibault	ae49f218da	hurd: Fix static-PIE startup hurd initialization stages use RUN_HOOK to run various initialization functions. That is however using absolute addresses which need to be relocated, which is done later by csu. We can however easily make the linker compute relative addresses which thus don't need a relocation. The new SET_RELHOOK and RUN_RELHOOK macros implement this.	2021-12-28 10:28:22 +01:00
Samuel Thibault	2ce0481d26	hurd: let csu initialize tls Since `9cec82de71` ("htl: Initialize later"), we let csu initialize pthreads. We can thus let it initialize tls later too, to better align with the generic order. Initialization however accesses ports which links/unlinks into the sigstate for unwinding. We can however easily skip that during initialization.	2021-12-28 10:15:52 +01:00
Samuel Thibault	7b358de1af	hurd: Fix XFAIL-ing mallocfork2 tests They are using setpshared but are outside the htl directory.	2021-12-27 22:21:08 +01:00
Samuel Thibault	1c6e6e52e5	hurd: XFAIL more tests that require setpshared support	2021-12-27 22:15:43 +01:00
Noah Goldstein	cca457f9c5	x86: Optimize L(less_vec) case in memcmpeq-evex.S No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change control flow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to be the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-27 03:18:58 -06:00
Noah Goldstein	abddd61de0	x86: Optimize L(less_vec) case in memcmp-evex-movbe.S No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change control flow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to be the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-27 03:17:59 -06:00
Adhemerval Zanella	a4b4131355	Set default __TIMESIZE default to 64 This is expected size for newer ABIs.	2021-12-23 11:41:08 -03:00
Sunil K Pandey	f20f980c71	x86-64: Add vector acos/acosf implementation to libmvec Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-22 13:03:14 -08:00
H.J. Lu	d3e4f5a101	s_sincosf.h: Change pio4 type to float [BZ #28713 ] s_cosf.c and s_sinf.c have if (abstop12 (y) < abstop12 (pio4)) where abstop12 takes a float argument, but pio4 is static const double. pio4 is used only in calls to abstop12 and never in arithmetic. Apply -static const double pio4 = 0x1.921FB54442D18p-1; +static const float pio4 = 0x1.921FB6p-1f; to fix: FAIL: math/test-float-cos FAIL: math/test-float-sin FAIL: math/test-float-sincos FAIL: math/test-float32-cos FAIL: math/test-float32-sin FAIL: math/test-float32-sincos when compiling with GCC 12. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-12-21 08:56:12 -08:00
maminjie	e0fc721ce6	Linux: Fix 32-bit vDSO for clock_gettime on powerpc32 When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID, on the 5.10 kernel powerpc 32-bit, the 32-bit vDSO is executed successfully ( because the __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S does not support these two IDs, the 32-bit time_t syscall will be used), but tp32.tv_sec is equal to 0, causing the 64-bit time_t syscall to continue to be used, resulting in two system calls. Fix commit `72e84d1db2`. Signed-off-by: maminjie <maminjie2@huawei.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-21 09:47:16 -03:00
H.J. Lu	de8a0897e3	Regenerate ulps on x86_64 with GCC 12 Fix FAIL: math/test-float-clog10 FAIL: math/test-float32-clog10 on Intel Core i7-1165G7 with GCC 12.	2021-12-20 15:25:00 -08:00
Joseph Myers	a94d9659cd	Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol numbers for PF_CAN") but apparently missed for glibc at the time. Tested for x86_64.	2021-12-20 15:38:32 +00:00
Adhemerval Zanella	691d9ae9e6	Remove ununsed tcb-offset Some architectures do not use the auto-generated tcb-offsets.h.	2021-12-17 17:47:29 -03:00
Aurelien Jarno	225da459ce	riscv: align stack before calling _dl_init [BZ #28703 ] Align the stack pointer to 128 bits during the call to _dl_init() as specified by the RISC-V ABI [1]. This fixes the elf/tst-align2 test. Fixes bug 28703. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc	2021-12-17 20:29:34 +01:00
Aurelien Jarno	d2e594d715	riscv: align stack in clone [BZ #28702 ] The RISC-V ABI [1] mandates that "the stack pointer shall be aligned to a 128-bit boundary upon procedure entry". This was not the case in clone. This fixes the misc/tst-misalign-clone-internal and misc/tst-misalign-clone tests. Fixes bug 28702. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc	2021-12-17 20:29:32 +01:00
Aurelien Jarno	94058f6cde	elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704 ] On KVM guests running on some AMD systems, the IBRS feature is reported as a synthetic feature using the Intel feature, while the cpuinfo entry keeps the same. Handle that by first checking the presence of the Intel feature on AMD systems. Fixes bug 28704.	2021-12-17 20:20:15 +01:00
Matheus Castanho	ae91d3df24	powerpc64[le]: Allocate extra stack frame on syscall.S The syscall function does not allocate the extra stack frame for scv like other assembly syscalls using DO_CALL_SCV. So after commit `d120fb9941` changed the offset that is used to save LR, syscall ended up using an invalid offset, causing regressions on powerpc64. So make sure the extra stack frame is allocated in syscall.S as well to make it consistent with other uses of DO_CALL_SCV and avoid similar issues in the future. Tested on powerpc, powerpc64, and powerpc64le (with and without scv) Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>	2021-12-17 15:40:53 -03:00
Florian Weimer	ce1e5b1122	arm: Guard ucontext _rtld_global_ro access by SHARED, not PIC macro Due to PIE-by-default, PIC is now defined in more cases. libc.a does not have _rtld_global_ro, and statically linking setcontext fails. SHARED is the right condition to use, so that libc.a references _dl_hwcap instead of _rtld_global_ro. For static PIE support, the !SHARED case would still have to be made PIC. This patch does not achieve that. Fixes commit `23645707f1` ("Replace --enable-static-pie with --disable-default-pie"). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-17 11:48:44 +01:00
Adhemerval Zanella	98d5fcb8d0	malloc: Add Huge Page support for mmap With the morecore hook removed, there is no easy way to provide huge page support with the glibc allocator without resorting to transparent huge pages. And some users and programs do prefer to use huge pages directly instead of THP for multiple reasons: no splitting or re-merging by the VM, no TLB shootdowns for running processes, fast allocation from the reserve pool, no competition with the rest of the processes unlike THP, no swapping at all, etc. This patch extends the 'glibc.malloc.hugetlb' tunable: the value '2' means to use huge pages directly with the system default size, while a positive value means a specific page size that is matched against the ones supported by the system. Currently only memory allocated in sysmalloc() is handled; the arenas still use the default system page size. To test it, a new rule, tests-malloc-hugetlb2, is added, which runs the added tests with the required GLIBC_TUNABLES setting. On systems without a reserved huge pages pool, it just stresses the mmap(MAP_HUGETLB) allocation failure. To improve test coverage it is required to create a pool with some allocated pages. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:38 -03:00
Adhemerval Zanella	5f6d8d97c6	malloc: Add madvise support for Transparent Huge Pages Linux Transparent Huge Pages (THP) currently supports three different states: 'never', 'madvise', and 'always'. The 'never' is self-explanatory and 'always' will enable THP for all anonymous pages. However, 'madvise' is still the default for some systems, and in such cases THP will only be used if the memory range is explicitly advertised by the program through a madvise(MADV_HUGEPAGE) call. To enable it a new tunable is provided, 'glibc.malloc.hugetlb', where setting it to a value different than 0 enables the madvise call. This patch issues the madvise(MADV_HUGEPAGE) call after a successful mmap() call at sysmalloc() with sizes larger than the default huge page size. The madvise() call is disabled if the system does not support THP or if it has the mode set to "never", and on Linux only one page size is supported for THP, even if the architecture supports multiple sizes. To test it, a new rule, tests-malloc-hugetlb1, is added, which runs the added tests with the required GLIBC_TUNABLES setting. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:14 -03:00
Florian Weimer	cb976fba4c	powerpc: Use global register variable in <thread_pointer.h> A local register variable is merely a compiler hint, and so not appropriate in this context. Move the global register variable into <thread_pointer.h> and include it from <tls.h>, as there can only be one global definition for one particular register. Fixes commit `8dbeb0561e` ("nptl: Add <thread_pointer.h> for defining __thread_pointer"). Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>	2021-12-15 16:06:25 +01:00
H.J. Lu	4435c29892	Support target specific ALIGN for variable alignment test [BZ #28676 ] Add <tst-file-align.h> to support target specific ALIGN for variable alignment test: 1. Alpha: Use 0x10000. 2. MicroBlaze and Nios II: Use 0x8000. 3. All others: Use 0x200000. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-14 14:50:33 -08:00
Samuel Thibault	ec06717856	hurd: Do not set PIE_UNSUPPORTED This is now supported.	2021-12-14 08:38:05 +01:00
Akila Welihinda	3b1402b3fc	sysdeps: Simplify sin Taylor Series calculation The macro TAYLOR_SIN adds the term `-0.5daa^2 + da` in hopes of regaining some precision as a function of da. However the comment says we add the term `-0.5daa^2 + 0.5*da` which is different. This fix updates the comment to reflect the code and also simplifies the calculation by replacing `a` with `x` because they always have the same value. Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-12-13 15:31:05 +01:00
Adhemerval Zanella	104d2005d5	math: Remove the error handling wrapper from hypot and hypotf The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Wilco Dijkstra	2f44eef584	math: Use fmin/fmax on hypot It optimizes for architectures that provides fast builtins. Checked on aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Adhemerval Zanella	ecb94e9587	aarch64: Add math-use-builtins-f{max,min}.h It allows to remove the arch-specific implementations.	2021-12-13 10:08:46 -03:00
Adhemerval Zanella	583c4d424e	math: Add math-use-builtinds-fmin.h It allows the architecture to use the builtin instead of generic implementation.	2021-12-13 10:08:43 -03:00
Adhemerval Zanella	72ab1eaec7	math: Add math-use-builtinds-fmax.h It allows the architecture to use the builtin instead of generic implementation.	2021-12-13 09:08:07 -03:00
Adhemerval Zanella	2eb1cd2f47	math: Remove powerpc e_hypot The generic implementation shows only slightly worse performance (reciprocal-throughput, latency): POWER10: master 8.28478, 13.7253; new hypot 7.21945, 13.1933. POWER9: master 13.4024, 14.0967; new hypot 14.8479, 15.8061. POWER8: master 15.5767, 16.8885; new hypot 16.5371, 18.4057. One way to improve might be to make gcc generate xsmaxdp/xsmindp for fmax/fmin (it only does for -ffast-math, clang does for default options). Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu (power9).	2021-12-13 09:08:07 -03:00
Adhemerval Zanella	a1d3c9b642	i386: Move hypot implementation to C The generic hypotf is slightly slower, mostly due to the tricks the assembly does to optimize the isinf/isnan/issignaling. The generic hypot is way slower, since the optimized implementation uses the i386 default excessive precision to issue the operation directly. A similar implementation is provided instead of using the generic implementation: Checked on i686-linux-gnu.	2021-12-13 09:08:02 -03:00
Adhemerval Zanella	c212d6397e	math: Use an improved algorithm for hypotl (ldbl-128) This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due to the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With 1e9 random input pairs in the range of [LDBL_MIN, LDBL_MAX], the current glibc implementation shows around 0.05% results with an error of 1 ulp (453266 results) while the new implementation only shows 0.0001% of total (1280). Checked on aarch64-linux-gnu and x86_64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
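
As an aside to commit d3e4f5a101 in the log above: its message says `abstop12` takes a float argument while `pio4` was declared `static const double`, and that `pio4` is used only in calls to `abstop12`. The sketch below illustrates that kind of comparison in isolation. It is not the glibc source: the helper body (top 12 bits of the float's bit pattern with the sign cleared) is an assumption, and the names `abstop12_sketch` and `below_pio4_sketch` are hypothetical. Only the `pio4` constant is copied from the commit message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the abstop12 helper named in the commit message.
   Assumption: it yields the top 12 bits of the float's bit pattern with the
   sign bit cleared, i.e. the 8 exponent bits plus the top 3 mantissa bits. */
static inline uint32_t abstop12_sketch(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined way to read the bits */
    return (bits >> 20) & 0x7ff;
}

/* The float constant exactly as given in the commit message. */
static const float pio4 = 0x1.921FB6p-1f;

/* Coarse magnitude test in the style quoted in the commit message:
   compares only the exponent and the top three mantissa bits, so values
   just below pi/4 (such as 0.75f) are not classified as "below". */
static inline int below_pio4_sketch(float y)
{
    return abstop12_sketch(y) < abstop12_sketch(pio4);
}
```

Because the helper's parameter is a float, declaring the constant as float means the value passed is the value stored, with no implicit double-to-float conversion at the call site. Why the double constant led to the listed test failures under GCC 12 is not stated in the log and is not inferred here.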


14572 Commits