glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-03 10:21:05 +00:00

Author	SHA1	Message	Date
Siddhesh Poyarekar	23e0e8f5f1	getcwd: Set errno to ERANGE for size == 1 (CVE-2021-3999) No valid path returned by getcwd would fit into 1 byte, so reject the size early and return NULL with errno set to ERANGE. This change is prompted by CVE-2021-3999, which describes a single byte buffer underflow and overflow when all of the following conditions are met: - The buffer size (i.e. the second argument of getcwd) is 1 byte - The current working directory is too long - '/' is also mounted on the current working directory Sequence of events: - In sysdeps/unix/sysv/linux/getcwd.c, the syscall returns ENAMETOOLONG because the linux kernel checks for name length before it checks buffer size - The code falls back to the generic getcwd in sysdeps/posix - In the generic func, the buf[0] is set to '\0' on line 250 - this while loop on line 262 is bypassed: while (!(thisdev == rootdev && thisino == rootino)) since the rootfs (/) is bind mounted onto the directory and the flow goes on to line 449, where it puts a '/' in the byte before the buffer. - Finally on line 458, it moves 2 bytes (the underflowed byte and the '\0') to the buf[0] and buf[1], resulting in a 1 byte buffer overflow. - buf is returned on line 469 and errno is not set. This resolves BZ #28769. Reviewed-by: Andreas Schwab <schwab@linux-m68k.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Qualys Security Advisory <qsa@qualys.com> Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2022-01-24 11:00:17 +05:30
Samuel Thibault	8c86ba4463	htl: Fix cleaning the reply port If any RPC fails, the reply port will already be deallocated. __pthread_thread_terminate thus has to defer taking its name until the very last __thread_terminate_release which doesn't reply a message. But then we have to read from the pthread structure. This introduces __pthread_dealloc_finish() which does the recording of the thread termination, so the slot can be reused really only just before the __thread_terminate_release call. Only the real thread can set it, so let's decouple this from the pthread_state by just removing the PTHREAD_TERMINATED state and add a terminated field.	2022-01-22 02:17:19 +01:00
Florian Weimer	f44820821a	mips: Move DT_MIPS into <ldsodefs.h> ELF_MACHINE_XHASH_SETUP in that file needs it. Fixes commit `c90363403b` ("elf: Move _dl_setup_hash to its own file").	2022-01-19 20:11:55 +01:00
H.J. Lu	1e000d3d33	x86: Black list more Intel CPUs for TSX [BZ #27398 ] Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in: https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html This fixes BZ #27398. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-01-18 14:20:09 -08:00
Samuel Thibault	f8b765bec4	htl: Fix build error in annexc We were getting ../scripts/evaluate-test.sh posix/annexc $? true false > /usr/src/glibc-upstream/build/posix/annexc.test-result In file included from ../include/pthread.h:1, from <stdin>:1: ../sysdeps/htl/include/pthread.h:7:62: error: missing binary operator before token "(" 7 \| # if defined __USE_EXTERN_INLINES && defined _LIBC && !IS_IN (libsupport) \| ^	2022-01-17 23:18:27 +00:00
Aurelien Jarno	c242fcce06	x86: use default cache size if it cannot be determined [BZ #28784 ] In some cases (e.g. QEMU, non-Intel/AMD CPU) the cache information cannot be retrieved and the corresponding values are set to 0. Commit `2d651eb926` ("x86: Move x86 processor cache info to cpu_features") changed the behaviour in such cases by defining the __x86_shared_cache_size and __x86_data_cache_size variables to 0 instead of using the default values. This causes an issue with the i686 SSE2 optimized bzero/routine which assumes that the cache size is at least 128 bytes, and otherwise tries to zero/set the whole address space minus 128 bytes. Fix that by restoring the original code to only update __x86_shared_cache_size and __x86_data_cache_size variables if the corresponding cache sizes are not zero. Fixes bug 28784 Fixes commit `2d651eb926` Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-01-17 19:42:46 +01:00
Adhemerval Zanella	5f3a7ebc35	Linux: Add epoll_pwait2 (BZ #27359 ) It is similar to epoll_wait, with the difference that the timeout has nanosecond resolution by using struct timespec instead of int. Although the Linux interface only provides 64 bit time_t support, the old 32 bit interface is also provided (to keep in sync with current practice and to not force opt-in on 64 bit time_t). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2022-01-17 14:34:54 -03:00
Adhemerval Zanella	9fe6f63638	elf: Fix 64 time_t support for installed statically binaries The usage of internal static symbol for statically linked binaries does not work correctly for objects built with -D_TIME_BITS=64, since the internal definition does not provide the expected aliases. This patch makes it use the default stat functions instead (which use the default 64 time_t alias and types). Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>	2022-01-17 10:57:09 -03:00
Adhemerval Zanella	cedd498dbc	Revert "elf: Fix 64 time_t support for installed statically binaries" This reverts commit `0b8e83eb14`.	2022-01-17 10:56:58 -03:00
Samuel Thibault	41a11a5e83	hurd: optimize exec cleanup When ports are null we do not need to request their deallocation. It is also useless to look for them in portnames.	2022-01-16 00:02:16 +01:00
Samuel Thibault	54dda2cdba	hurd: Add __rtld_execve It trivially execs with the same dtable, portarray and intarray, and only has to take care of deallocating / destroying ports (file, notably).	2022-01-15 23:42:35 +01:00
Samuel Thibault	1bd7a06a95	htl: Hide __pthread_attr's __schedparam type [BZ #23088 ] The content of the structure is only used internally, so we can make __pthread_attr_getschedparam and __pthread_attr_setschedparam convert between the public sched_param type and an internal __sched_param. This avoids spuriously exposing the sched_param type. This fixes BZ #23088.	2022-01-15 21:31:08 +01:00
Samuel Thibault	c1105e34ac	htl: Clear kernel_thread field before releasing the thread structure Otherwise this is a use-after-free.	2022-01-15 21:31:08 +01:00
Samuel Thibault	67ca1c5560	hurd: Fix timer/clock_getres crash on NULL res parameter POSIX allows res to be NULL.	2022-01-15 15:37:03 +01:00
Samuel Thibault	2c040d0b90	hurd: Fix pthread_kill on exiting/ted thread We have to drop the kernel_thread port from the thread structure, to avoid pthread_kill's call to _hurd_thread_sigstate trying to reference it and fail.	2022-01-15 15:11:54 +01:00
Samuel Thibault	dfb204d87f	[hurd] Drop spurious #ifdef SHARED The whole file is already #ifdef SHARED	2022-01-15 14:23:37 +01:00
Samuel Thibault	f05faf5f22	[hurd] Call _dl_sort_maps_init in _dl_sysdep_start This follows `15a0c5730d` ("elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)").	2022-01-15 14:21:53 +01:00
Florian Weimer	f01d482f03	s390x: Use <gcc-macros.h> in early HWCAP check This is required so that the checks still work if $(early-cflags) selects a different ISA level. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>	2022-01-14 20:17:58 +01:00
Florian Weimer	990c953bce	x86: Add x86-64-vN check to early startup This ISA level covers the glibc build itself. <dl-hwcap-check.h> cannot be used because this check (by design) happens before DL_PLATFORM_INIT and the x86 CPU flags initialization. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-01-14 20:17:49 +01:00
Florian Weimer	5501164866	powerpc64le: Use <gcc-macros.h> in early HWCAP check This is required so that the checks still work if $(early-cflags) selects a different ISA level. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>	2022-01-14 20:17:40 +01:00
Florian Weimer	5732a881aa	x86: HAVE_X86_LAHF_SAHF, HAVE_X86_MOVBE and -march=x86-64-vN (bug 28782) HAVE_X86_LAHF_SAHF is implied by x86-64-v2, and HAVE_X86_MOVBE by x86-64-v3. The individual flag does not appear in -fverbose-asm flag output even if the ISA level implies it. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-01-14 16:09:20 +01:00
Sunil K Pandey	047512374a	math: Add more inputs to atan2 accuracy tests [BZ #28765 ] This patch adds following inputs: 0x1.bcab29da0e947p-54 0x1.bc41f4d2294b8p-54 0x1.a11891ec004d4p-348 0x1.814830510be26p-348 0x1.b836ed678be29p-588 0x1.b7be6f5a03a8cp-588 0x1.a83f842ef3f73p-633 0x1.a799d8a6677ep-633 to atan2 tests and updates x86_64 double atan2 ulps. This fixes BZ #28765. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2022-01-14 06:00:06 -08:00
Joseph Myers	4997a533ae	Update syscall lists for Linux 5.16 Linux 5.16 has one new syscall, futex_waitv. Update syscall-names.list and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py.	2022-01-13 22:18:13 +00:00
Florian Weimer	a78e6a10d0	i386: Remove broken CAN_USE_REGISTER_ASM_EBP (bug 28771) The configure check for CAN_USE_REGISTER_ASM_EBP tried to compile a simple function that uses %ebp as an inline assembly operand. If compilation failed, CAN_USE_REGISTER_ASM_EBP was set 0, which eventually had these consequences: (1) %ebx was avoided as an inline assembly operand, with an assembler macro hack to avoid unnecessary register moves. (2) %ebp was avoided as an inline assembly operand, using an out-of-line syscall function for 6-argument system calls. (1) is no longer needed for any GCC version that is supported for building glibc. %ebx can be used directly as a register operand. Therefore, this commit removes the %ebx avoidance completely. This avoids the assembler macro hack, which turns out to be incompatible with the current Systemtap probe macros (which switch to .altmacro unconditionally). (2) is still needed in many build configurations. The existing configure check cannot really capture that because the simple function succeeds to compile, while the full glibc build still fails. Therefore, this commit removes the check, the CAN_USE_REGISTER_ASM_EBP macro, and uses the out-of-line syscall function for 6-argument system calls unconditionally. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-01-13 14:59:44 +01:00
Sunil K Pandey	49e2bf58d5	x86_64: Fix SSE4.2 libmvec atan2 function accuracy [BZ #28765 ] This patch fixes SSE4.2 libmvec atan2 function accuracy for following inputs to less than 4 ulps. {0x1.bcab29da0e947p-54,0x1.bc41f4d2294b8p-54} 4.19888 ulps {0x1.b836ed678be29p-588,0x1.b7be6f5a03a8cp-588} 4.09889 ulps This fixes BZ #28765. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-01-12 13:23:22 -08:00
Adhemerval Zanella	572e0c8554	Revert "linux: Fix ancillary 64-bit time timestamp conversion (BZ #28349 , BZ #28350 )" This reverts commit `21e0f45c7d`.	2022-01-12 10:35:06 -03:00
Adhemerval Zanella	21e0f45c7d	linux: Fix ancillary 64-bit time timestamp conversion (BZ #28349 , BZ #28350 ) The __convert_scm_timestamps() only updates the control message last pointer for SOL_SOCKET type, so if the message control buffer contains multiple ancillary message types the converted timestamp one might overwrite a valid message. The test checks if the extra ancillary space is correctly handled by recvmsg/recvmmsg, where if there is no extra space for the 64-bit time_t converted message the control buffer should be marked with MSG_TRUNC. It also checks if recvmsg/recvmmsg correctly handle multiple ancillary data. Checked on x86_64-linux and on i686-linux-gnu on both 5.11 and 4.15 kernel. Co-authored-by: Fabian Vogt <fvogt@suse.de>	2022-01-12 10:30:10 -03:00
Adhemerval Zanella	0b8e83eb14	elf: Fix 64 time_t support for installed statically binaries The usage of internal static symbol for statically linked binaries does not work correctly for objects built with -D_TIME_BITS=64, since the internal definition does not provide the expected aliases. This patch makes it use the default stat functions instead (which use the default 64 time_t alias and types). Checked on i686-linux-gnu.	2022-01-12 10:30:10 -03:00
Szabolcs Nagy	5a1be8ebdf	aarch64: Add HWCAP2_ECV from Linux 5.16 Indicates the availability of enhanced counter virtualization extension of armv8.6-a with self-synchronized virtual counter CNTVCTSS_EL0 usable in userspace.	2022-01-11 16:05:16 +00:00
Noah Goldstein	7e08db3359	x86: Fix __wcsncmp_evex in strcmp-evex.S [BZ# 28755] Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to __wcscmp_evex. For x86_64 this covers the entire address range so any length larger could not possibly be used to bound `s1` or `s2`. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-01-10 20:31:57 -06:00
Noah Goldstein	ddf0992cf5	x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to __wcscmp_avx2. For x86_64 this covers the entire address range so any length larger could not possibly be used to bound `s1` or `s2`. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-01-10 20:31:46 -06:00
Szabolcs Nagy	347a5b592c	math: Fix float conversion regressions with gcc-12 [BZ #28713 ] Converting double precision constants to float is now affected by the runtime dynamic rounding mode instead of being evaluated at compile time with default rounding mode (except static object initializers). This can change the computed result and cause performance regression. The known correctness issues (increased ulp errors) are already fixed, this patch fixes remaining cases of unnecessary runtime conversions. Add float M_* macros to math.h as new GNU extension API. To avoid conversions the new M_* macros are used and instead of casting double literals to float, use float literals (only required if the conversion is inexact). The patch was tested on aarch64 where the following symbols had new spurious conversion instructions that got fixed: __clog10f __gammaf_r_finite@GLIBC_2.17 __j0f_finite@GLIBC_2.17 __j1f_finite@GLIBC_2.17 __jnf_finite@GLIBC_2.17 __kernel_casinhf __lgamma_negf __log1pf __y0f_finite@GLIBC_2.17 __y1f_finite@GLIBC_2.17 cacosf cacoshf casinhf catanf catanhf clogf gammaf_positive Fixes bug 28713. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2022-01-10 14:27:17 +00:00
Florian Weimer	6b0978c14a	Restore ENTRY_POINT definition on hppa, ia64 (bug 28749) ENTRY_POINT is still needed for elf/rtld.c. Fixes commit `4fb4e7e821` ("csu: Always use __executable_start in gmon-start.c").	2022-01-07 14:47:31 +01:00
Samuel Thibault	d5b0046e3d	ttydefaults.h: Fix CSTATUS to control-t 4.4BSD actually defaults CSTATUS to control-t, so our generic header should as well.	2022-01-07 00:23:05 +01:00
Wilco Dijkstra	e5fa62b8db	AArch64: Check for SVE in ifuncs [BZ #28744 ] Add a check for SVE in the A64FX ifuncs for memcpy, memset and memmove. This fixes BZ #28744.	2022-01-06 14:36:28 +00:00
Adhemerval Zanella	65ccd641ba	debug: Remove catchsegv and libSegfault (BZ #14913 ) Trapping SIGSEGV within the process is error-prone, adds security issues, and modern analysis design tends to happen out of the process (either by attaching a debugger or by post-mortem analysis). The libSegfault also has some design problems: it uses a non-async-signal-safe function (backtrace) in its signal handler. There are multiple alternatives if users do want to use similar functionality, such as the sigsegv gnulib module or libsegfault.	2022-01-06 07:59:49 -03:00
Stafford Horne	0c3c62ca7d	or1k: Build Infrastructure Here we define the minimum linux kernel version at 5.4.0, as that is the long term support version where 32-bit architectures start to support 64-bit time API's. The OpenRISC kernel had some bugs up until version 5.8 which caused issues with glibc fork/clone, they have been backported to 5.4 but not previous versions. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:06 +09:00
Stafford Horne	d147259b5c	or1k: ABI lists Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:06 +09:00
Stafford Horne	7d334b1831	or1k: Linux ABI Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:06 +09:00
Stafford Horne	1871c95f2b	or1k: Linux Syscall Interface Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:06 +09:00
Stafford Horne	9a47b9660b	or1k: math soft float support OpenRISC supports hard float, but I would like to submit that after glibc soft float goes upstream. The hard float support depends on adding user access to the FPCSR, which is not supported by the kernel yet. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:06 +09:00
Stafford Horne	9f3653b1fa	or1k: Atomics and Locking primitives Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:06 +09:00
Stafford Horne	96882a00ce	or1k: Thread Local Storage support OpenRISC includes 3 TLS addressing models. Local Dynamic optimizations are not done in the linker and therefore use the same code sequences as Global Dynamic. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:06 +09:00
Stafford Horne	de5c0edc80	or1k: startup and dynamic linking code Code for C runtime startup and dynamic loading including PLT layout. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:06 +09:00
Stafford Horne	6e5964311d	or1k: ABI Implementation This code deals with the OpenRISC ABI. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:05 +09:00
Stafford Horne	9dde3a24f1	linux/syscalls: Add or1k_atomic syscall for OpenRISC Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-01-05 06:40:05 +09:00
H.J. Lu	9288c92d00	elf: Add <dl-debug.h> Add <dl-debug.h> to setup debugging entry in PT_DYNAMIC segment to support DT_DEBUG, DT_MIPS_RLD_MAP_REL and DT_MIPS_RLD_MAP. Tested on x86-64, x32 and i686 as well as with build-many-glibcs.py.	2022-01-03 05:16:03 -08:00
Paul Eggert	634b5ebac6	Update copyright dates not handled by scripts/update-copyrights. I've updated copyright dates in glibc for 2022. This is the patch for the changes not generated by scripts/update-copyrights and subsequent build / regeneration of generated files. As well as the usual annual updates, mainly dates in --version output (minus csu/version.c which previously had to be handled manually but is now successfully updated by update-copyrights), there is a small change to the copyright notice in NEWS which should let NEWS get updated automatically next year. Please remember to include 2022 in the dates for any new files added in future (which means updating any existing uncommitted patches you have that add new files to use the new copyright dates in them).	2022-01-01 11:42:26 -08:00
Paul Eggert	581c785bf3	Update copyright dates with scripts/update-copyrights I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted of lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: * 912-#endif remote: * 913: remote: * 914- remote: * error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines	2022-01-01 11:40:24 -08:00
Samuel Thibault	edb5ab841a	hurd: Use __trivfs_server_name instead of trivfs_server_name The latter violates namespace constraints	2022-01-01 17:51:18 +01:00
Samuel Thibault	35cf8a85ed	hurd: Bump BRK_START to 0x20000000 By nowadays uses, 256MiB is not that large for the program+libraries. Let's push the heap further to leave room for e.g. clang.	2021-12-31 18:25:49 +01:00
Samuel Thibault	8c0727af63	hurd: Avoid overzealous shared objects constraints `407765e9f2` ("hurd: Fix ELF_MACHINE_USER_ADDRESS_MASK value") switched ELF_MACHINE_USER_ADDRESS_MASK from 0xf8000000UL to 0xf0000000UL to let libraries etc. get loaded at 0x08000000. But ELF_MACHINE_USER_ADDRESS_MASK is actually only meaningful for the main program anyway, so keep it at 0xf8000000UL to prevent the program loader from putting ld.so beyond 0x08000000. And conversely, drop the use of ELF_MACHINE_USER_ADDRESS_MASK for shared objects, which don't need any constraints since the program will have already been loaded by then.	2021-12-31 18:22:46 +01:00
Adhemerval Zanella	1f17da01e6	time: Refactor timesize.h for some ABIs Commit `a4b4131355` changed default __TIMESIZE to 64, however it added sub-architecture timesize.h for powerpc, s390, and sparc. Also simplify mips by removing _MIPS_SIM usage (which would require adding an sgidefs inclusion).	2021-12-31 10:58:13 -03:00
Samuel Thibault	33e8e95cbd	hurd: Make getrandom a stub inside the random translator glibc uses /dev/urandom for getrandom(), and from version 2.34 malloc initialization uses it. We have to detect when we are running the random translator itself, in which case we can't read ourself.	2021-12-31 08:54:41 +01:00
Stafford Horne	4dfa8f4870	open64: Force O_LARGEFILE on all architectures When running tests on OpenRISC which has 32-bit wordsize but 64-bit timesize it was found that O_LARGEFILE is not being set when calling open64. For 64-bit architectures the O_LARGEFILE flag is generally implied by the kernel according to force_o_largefile. However, for 32-bit architectures this is not done. For this patch we now unconditionally set the O_LARGEFILE flag for open64 class syscalls as there is no harm in doing so. Tested on OpenRISC: the build works and timezone/tst-tzset, which was failing before, now passes. I would expect this would also fix arc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-31 07:10:19 +09:00
Sunil K Pandey	c21c7bc24e	x86-64: Add vector tan/tanf implementation to libmvec Implement vectorized tan/tanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector tan/tanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-30 10:19:13 -08:00
Sunil K Pandey	8881cca8fb	x86-64: Add vector erfc/erfcf implementation to libmvec Implement vectorized erfc/erfcf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector erfc/erfcf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-30 10:19:03 -08:00
Sunil K Pandey	e682d01578	x86-64: Add vector asinh/asinhf implementation to libmvec Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asinh/asinhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:56 -08:00
Sunil K Pandey	c0f36fc303	x86-64: Add vector tanh/tanhf implementation to libmvec Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector tanh/tanhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:50 -08:00
Sunil K Pandey	f9ce13fdac	x86-64: Add vector erf/erff implementation to libmvec Implement vectorized erf/erff containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector erf/erff with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:44 -08:00
Sunil K Pandey	0625489ccc	x86-64: Add vector acosh/acoshf implementation to libmvec Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acosh/acoshf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:39 -08:00
Sunil K Pandey	6dea4dd3da	x86-64: Add vector atanh/atanhf implementation to libmvec Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atanh/atanhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:34 -08:00
Sunil K Pandey	74265c16ab	x86-64: Add vector log1p/log1pf implementation to libmvec Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log1p/log1pf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:27 -08:00
Sunil K Pandey	7e1722fec8	x86-64: Add vector log2/log2f implementation to libmvec Implement vectorized log2/log2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log2/log2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:21 -08:00
Sunil K Pandey	8f8566026d	x86-64: Add vector log10/log10f implementation to libmvec Implement vectorized log10/log10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log10/log10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:15 -08:00
Sunil K Pandey	2941a24f8c	x86-64: Add vector atan2/atan2f implementation to libmvec Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan2/atan2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:09 -08:00
Sunil K Pandey	2bf02c5843	x86-64: Add vector cbrt/cbrtf implementation to libmvec Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:02 -08:00
Sunil K Pandey	aa1809a1df	x86-64: Add vector sinh/sinhf implementation to libmvec Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector sinh/sinhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:55 -08:00
Sunil K Pandey	76ddc74e86	x86-64: Add vector expm1/expm1f implementation to libmvec Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector expm1/expm1f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:49 -08:00
Sunil K Pandey	ef7ea9c132	x86-64: Add vector cosh/coshf implementation to libmvec Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector cosh/coshf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:42 -08:00
Sunil K Pandey	8b726453d5	x86-64: Add vector exp10/exp10f implementation to libmvec Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp10/exp10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:35 -08:00
Sunil K Pandey	3fc9ccc20b	x86-64: Add vector exp2/exp2f implementation to libmvec Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp2/exp2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:29 -08:00
Sunil K Pandey	37475ba883	x86-64: Add vector hypot/hypotf implementation to libmvec Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector hypot/hypotf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:21 -08:00
Sunil K Pandey	11c01de14c	x86-64: Add vector asin/asinf implementation to libmvec Implement vectorized asin/asinf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asin/asinf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:03 -08:00
Sunil K Pandey	146310177a	x86-64: Add vector atan/atanf implementation to libmvec Implement vectorized atan/atanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan/atanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:36:46 -08:00
Florian Weimer	5d28a8962d	elf: Add _dl_find_object function It can be used to speed up the libgcc unwinder, and the internal _dl_find_dso_for_object function (which is used for caller identification in dlopen and related functions, and in dladdr). _dl_find_object is in the internal namespace due to bug 28503. If libgcc switches to _dl_find_object, this namespace issue will be fixed. It is located in libc for two reasons: it is necessary to forward the call to the static libc after static dlopen, and there is a link ordering issue with -static-libgcc and libgcc_eh.a because libc.so is not a linker script that includes ld.so in the glibc build tree (so that GCC's internal -lc after libgcc_eh.a does not pick up ld.so). It is necessary to do the i386 customization in the sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because otherwise, multilib installations are broken. The implementation uses software transactional memory, as suggested by Torvald Riegel. Two copies of the supporting data structures are used, also achieving full async-signal-safety. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-28 22:52:56 +01:00
Adhemerval Zanella	83b8d5027d	malloc: Remove memusage.h And use machine-sp.h instead. The Linux implementation is based on already provided CURRENT_STACK_FRAME (used on nptl code) and STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	a75b1e35c5	malloc: Use hp-timing on libmemusage Instead of reimplementing on GETTIME macro.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	92ff345137	Remove atomic-machine.h atomic typedefs Now that memusage.c uses generic types we can remove them.	2021-12-28 14:57:57 -03:00
Adhemerval Zanella	5a5f7a160d	malloc: Remove atomic_* usage These typedefs are used solely on memusage and can be replaced with generic types.	2021-12-28 14:57:57 -03:00
Thomas Petazzoni	c75aa9246a	microblaze: Add missing implementation when !__ASSUME_TIME64_SYSCALLS In commit `a92f4e6299` ("linux: Add time64 pselect support"), a Microblaze specific implementation of __pselect32() was added to cover the case of kernels < 3.15 which lack the pselect6 system call. This new file sysdeps/unix/sysv/linux/microblaze/pselect32.c takes precedence over the default implementation sysdeps/unix/sysv/linux/pselect32.c. However sysdeps/unix/sysv/linux/pselect32.c provides an implementation of __pselect32() which is needed when __ASSUME_TIME64_SYSCALLS is not defined. On Microblaze, which is a 32-bit architecture, __ASSUME_TIME64_SYSCALLS is only true for kernels >= 5.1. Due to sysdeps/unix/sysv/linux/microblaze/pselect32.c taking precedence over sysdeps/unix/sysv/linux/pselect32.c, it means that when we are with a kernel >= 3.15 but < 5.1, we need a __pselect32() implementation, but sysdeps/unix/sysv/linux/microblaze/pselect32.c doesn't provide it, and sysdeps/unix/sysv/linux/pselect32.c which would provide it is not compiled in. This causes the following build failure on Microblaze with for example Linux kernel headers 4.9: [...]/build/libc_pic.os: in function `__pselect64': (.text+0x120b44): undefined reference to `__pselect32' collect2: error: ld returned 1 exit status Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-28 09:09:49 -03:00
Adhemerval Zanella	8c0664e2b8	elf: Add _dl_audit_pltexit It consolidates the code required to call la_pltexit audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	eff687e846	elf: Add _dl_audit_pltenter It consolidates the code required to call la_pltenter audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	0b98a87487	elf: Add _dl_audit_preinit It consolidates the code required to call la_preinit audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	cda4f265c6	elf: Add _dl_audit_symbind_alt and _dl_audit_symbind It consolidates the code required to call la_symbind{32,64} audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	311c9ee54e	elf: Add _dl_audit_objclose It consolidates the code required to call la_objclose audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	c91008d349	elf: Add _dl_audit_objsearch It consolidates the code required to call la_objsearch audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	3dac3959a5	elf: Add _dl_audit_activity_map and _dl_audit_activity_nsid It consolidates the code required to call la_activity audit callback. Also, for a new Lmid_t the namespace link_map list is empty, so it must be checked before it is used. This can happen when an audit module is used along with dlmopen. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Adhemerval Zanella	aee6e90f93	elf: Add _dl_audit_objopen It consolidates the code required to call la_objopen audit callback. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-28 08:40:38 -03:00
Samuel Thibault	ae49f218da	hurd: Fix static-PIE startup hurd initialization stages use RUN_HOOK to run various initialization functions. That is however using absolute addresses which need to be relocated, which is done later by csu. We can however easily make the linker compute relative addresses which thus don't need a relocation. The new SET_RELHOOK and RUN_RELHOOK macros implement this.	2021-12-28 10:28:22 +01:00
Samuel Thibault	2ce0481d26	hurd: let csu initialize tls Since `9cec82de71` ("htl: Initialize later"), we let csu initialize pthreads. We can thus let it initialize tls later too, to better align with the generic order. Initialization however accesses ports which links/unlinks into the sigstate for unwinding. We can however easily skip that during initialization.	2021-12-28 10:15:52 +01:00
Samuel Thibault	7b358de1af	hurd: Fix XFAIL-ing mallocfork2 tests They are using setpshared but are outside the htl directory.	2021-12-27 22:21:08 +01:00
Samuel Thibault	1c6e6e52e5	hurd: XFAIL more tests that require setpshared support	2021-12-27 22:15:43 +01:00
Noah Goldstein	cca457f9c5	x86: Optimize L(less_vec) case in memcmpeq-evex.S No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change control flow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to be the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-27 03:18:58 -06:00
Noah Goldstein	abddd61de0	x86: Optimize L(less_vec) case in memcmp-evex-movbe.S No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change control flow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to be the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-27 03:17:59 -06:00
Adhemerval Zanella	a4b4131355	Set default __TIMESIZE default to 64 This is expected size for newer ABIs.	2021-12-23 11:41:08 -03:00
Sunil K Pandey	f20f980c71	x86-64: Add vector acos/acosf implementation to libmvec Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-22 13:03:14 -08:00
H.J. Lu	d3e4f5a101	s_sincosf.h: Change pio4 type to float [BZ #28713 ] s_cosf.c and s_sinf.c have if (abstop12 (y) < abstop12 (pio4)) where abstop12 takes a float argument, but pio4 is static const double. pio4 is used only in calls to abstop12 and never in arithmetic. Apply -static const double pio4 = 0x1.921FB54442D18p-1; +static const float pio4 = 0x1.921FB6p-1f; to fix: FAIL: math/test-float-cos FAIL: math/test-float-sin FAIL: math/test-float-sincos FAIL: math/test-float32-cos FAIL: math/test-float32-sin FAIL: math/test-float32-sincos when compiling with GCC 12. Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-12-21 08:56:12 -08:00
maminjie	e0fc721ce6	Linux: Fix 32-bit vDSO for clock_gettime on powerpc32 When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID, on the 5.10 kernel powerpc 32-bit, the 32-bit vDSO is executed successfully ( because the __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S does not support these two IDs, the 32-bit time_t syscall will be used), but tp32.tv_sec is equal to 0, causing the 64-bit time_t syscall to continue to be used, resulting in two system calls. Fix commit `72e84d1db2`. Signed-off-by: maminjie <maminjie2@huawei.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-21 09:47:16 -03:00
H.J. Lu	de8a0897e3	Regenerate ulps on x86_64 with GCC 12 Fix FAIL: math/test-float-clog10 FAIL: math/test-float32-clog10 on Intel Core i7-1165G7 with GCC 12.	2021-12-20 15:25:00 -08:00
Joseph Myers	a94d9659cd	Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol numbers for PF_CAN") but apparently missed for glibc at the time. Tested for x86_64.	2021-12-20 15:38:32 +00:00
Adhemerval Zanella	691d9ae9e6	Remove ununsed tcb-offset Some architectures do not use the auto-generated tcb-offsets.h.	2021-12-17 17:47:29 -03:00
Aurelien Jarno	225da459ce	riscv: align stack before calling _dl_init [BZ #28703 ] Align the stack pointer to 128 bits during the call to _dl_init() as specified by the RISC-V ABI [1]. This fixes the elf/tst-align2 test. Fixes bug 28703. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc	2021-12-17 20:29:34 +01:00
Aurelien Jarno	d2e594d715	riscv: align stack in clone [BZ #28702 ] The RISC-V ABI [1] mandates that "the stack pointer shall be aligned to a 128-bit boundary upon procedure entry". This was not the case in clone. This fixes the misc/tst-misalign-clone-internal and misc/tst-misalign-clone tests. Fixes bug 28702. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc	2021-12-17 20:29:32 +01:00
Aurelien Jarno	94058f6cde	elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704 ] On KVM guests running on some AMD systems, the IBRS feature is reported as a synthetic feature using the Intel feature, while the cpuinfo entry keeps the same. Handle that by first checking the presence of the Intel feature on AMD systems. Fixes bug 28704.	2021-12-17 20:20:15 +01:00
Matheus Castanho	ae91d3df24	powerpc64[le]: Allocate extra stack frame on syscall.S The syscall function does not allocate the extra stack frame for scv like other assembly syscalls using DO_CALL_SCV. So after commit `d120fb9941` changed the offset that is used to save LR, syscall ended up using an invalid offset, causing regressions on powerpc64. So make sure the extra stack frame is allocated in syscall.S as well to make it consistent with other uses of DO_CALL_SCV and avoid similar issues in the future. Tested on powerpc, powerpc64, and powerpc64le (with and without scv) Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>	2021-12-17 15:40:53 -03:00
Florian Weimer	ce1e5b1122	arm: Guard ucontext _rtld_global_ro access by SHARED, not PIC macro Due to PIE-by-default, PIC is now defined in more cases. libc.a does not have _rtld_global_ro, and statically linking setcontext fails. SHARED is the right condition to use, so that libc.a references _dl_hwcap instead of _rtld_global_ro. For static PIE support, the !SHARED case would still have to be made PIC. This patch does not achieve that. Fixes commit `23645707f1` ("Replace --enable-static-pie with --disable-default-pie"). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-17 11:48:44 +01:00
Adhemerval Zanella	98d5fcb8d0	malloc: Add Huge Page support for mmap With the morecore hook removed, there is no easy way to provide huge page support with the glibc allocator without resorting to transparent huge pages. And some users and programs do prefer to use huge pages directly instead of THP for multiple reasons: no splitting, re-merging by the VM, no TLB shootdowns for running processes, fast allocation from the reserve pool, no competition with the rest of the processes unlike THP, no swapping at all, etc. This patch extends the 'glibc.malloc.hugetlb' tunable: the value '2' means to use huge pages directly with the system default size, while a positive value means a specific page size that is matched against the ones supported by the system. Currently only memory allocated in sysmalloc() is handled; the arenas still use the default system page size. To test it, a new rule tests-malloc-hugetlb2 is added, which runs the added tests with the required GLIBC_TUNABLES setting. On systems without a reserved huge pages pool, it just stresses the mmap(MAP_HUGETLB) allocation failure. To improve test coverage it is required to create a pool with some allocated pages. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:38 -03:00
Adhemerval Zanella	5f6d8d97c6	malloc: Add madvise support for Transparent Huge Pages Linux Transparent Huge Pages (THP) currently supports three different states: 'never', 'madvise', and 'always'. The 'never' state is self-explanatory and 'always' will enable THP for all anonymous pages. However, 'madvise' is still the default for some systems, and in that case THP will only be used if the memory range is explicitly advised by the program through a madvise(MADV_HUGEPAGE) call. To enable it a new tunable is provided, 'glibc.malloc.hugetlb', where setting it to a value different from 0 enables the madvise call. This patch issues the madvise(MADV_HUGEPAGE) call after a successful mmap() call at sysmalloc() with sizes larger than the default huge page size. The madvise() call is disabled if the system does not support THP or if it has the mode set to "never". Linux only supports one page size for THP, even if the architecture supports multiple sizes. To test it, a new rule tests-malloc-hugetlb1 is added, which runs the added tests with the required GLIBC_TUNABLES setting. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:14 -03:00
Florian Weimer	cb976fba4c	powerpc: Use global register variable in <thread_pointer.h> A local register variable is merely a compiler hint, and so not appropriate in this context. Move the global register variable into <thread_pointer.h> and include it from <tls.h>, as there can only be one global definition for one particular register. Fixes commit `8dbeb0561e` ("nptl: Add <thread_pointer.h> for defining __thread_pointer"). Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>	2021-12-15 16:06:25 +01:00
H.J. Lu	4435c29892	Support target specific ALIGN for variable alignment test [BZ #28676 ] Add <tst-file-align.h> to support target specific ALIGN for variable alignment test: 1. Alpha: Use 0x10000. 2. MicroBlaze and Nios II: Use 0x8000. 3. All others: Use 0x200000. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-14 14:50:33 -08:00
Samuel Thibault	ec06717856	hurd: Do not set PIE_UNSUPPORTED This is now supported.	2021-12-14 08:38:05 +01:00
Akila Welihinda	3b1402b3fc	sysdeps: Simplify sin Taylor Series calculation The macro TAYLOR_SIN adds the term `-0.5daa^2 + da` in hopes of regaining some precision as a function of da. However the comment says we add the term `-0.5daa^2 + 0.5*da` which is different. This fix updates the comment to reflect the code and also simplifies the calculation by replacing `a` with `x` because they always have the same value. Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-12-13 15:31:05 +01:00
Adhemerval Zanella	104d2005d5	math: Remove the error handling wrapper from hypot and hypotf The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Wilco Dijkstra	2f44eef584	math: Use fmin/fmax on hypot It optimizes for architectures that provide fast builtins. Checked on aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Adhemerval Zanella	ecb94e9587	aarch64: Add math-use-builtins-f{max,min}.h It allows removing the arch-specific implementations.	2021-12-13 10:08:46 -03:00
Adhemerval Zanella	583c4d424e	math: Add math-use-builtinds-fmin.h It allows the architecture to use the builtin instead of generic implementation.	2021-12-13 10:08:43 -03:00
Adhemerval Zanella	72ab1eaec7	math: Add math-use-builtinds-fmax.h It allows the architecture to use the builtin instead of generic implementation.	2021-12-13 09:08:07 -03:00
Adhemerval Zanella	2eb1cd2f47	math: Remove powerpc e_hypot The generic implementation shows only slightly worse performance: POWER10 reciprocal-throughput latency master 8.28478 13.7253 new hypot 7.21945 13.1933 POWER9 reciprocal-throughput latency master 13.4024 14.0967 new hypot 14.8479 15.8061 POWER8 reciprocal-throughput latency master 15.5767 16.8885 new hypot 16.5371 18.4057 One way to improve might be to make gcc generate xsmaxdp/xsmindp for fmax/fmin (it only does so for -ffast-math; clang does for default options). Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu (power9).	2021-12-13 09:08:07 -03:00
Adhemerval Zanella	a1d3c9b642	i386: Move hypot implementation to C The generic hypotf is slightly slower, mostly due to the tricks the assembly does to optimize the isinf/isnan/issignaling. The generic hypot is way slower, since the optimized implementation uses the i386 default excessive precision to issue the operation directly. A similar implementation is provided instead of using the generic implementation: Checked on i686-linux-gnu.	2021-12-13 09:08:02 -03:00
Adhemerval Zanella	c212d6397e	math: Use an improved algorithm for hypotl (ldbl-128) This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due to the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With 1e9 random input pairs in the range of [LDBL_MIN, LDBL_MAX], the current glibc implementation shows around 0.05% results with an error of 1 ulp (453266 results) while the new implementation only shows 0.0001% of total (1280). Checked on aarch64-linux-gnu and x86_64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Adhemerval Zanella	aa9c28cde3	math: Use an improved algorithm for hypotl (ldbl-96) This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due to the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With 1e8 random input pairs in the range of [LDBL_MIN, LDBL_MAX], the current glibc implementation shows around 0.02% results with an error of 1 ulp (23158 results) while the new implementation only shows 0.0001% of total (111). [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Wilco Dijkstra	ccfa865a82	math: Improve hypot performance with FMA Improve hypot performance significantly by using fma when available. The fma version has twice the throughput of the previous version and 70% of the latency. The non-fma version has 30% higher throughput and 10% higher latency. Max ULP error is 0.949 with fma and 0.792 without fma. Passes GLIBC testsuite.	2021-12-13 09:02:34 -03:00
Wilco Dijkstra	6c848d7038	math: Use an improved algorithm for hypot (dbl-64) This implementation is based on the 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due to the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for denormal results. The main advantage of the new algorithm is its precision: with 1e9 random input pairs in the range of [DBL_MIN, DBL_MAX], the current glibc implementation shows around 0.34% results with an error of 1 ulp (3424869 results) while the new implementation only shows 0.002% of total (18851). The performance results are also only slightly worse than the current implementation. On x86_64 (Ryzen 5900X) with gcc 12: Before: "hypot": { "workload-random": { "duration": 3.73319e+09, "iterations": 1.12e+08, "reciprocal-throughput": 22.8737, "latency": 43.7904, "max-throughput": 4.37184e+07, "min-throughput": 2.28361e+07 } } After: "hypot": { "workload-random": { "duration": 3.7597e+09, "iterations": 9.8e+07, "reciprocal-throughput": 23.7547, "latency": 52.9739, "max-throughput": 4.2097e+07, "min-throughput": 1.88772e+07 } } Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Checked on x86_64-linux-gnu and aarch64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Adhemerval Zanella	7fe0ace3e2	math: Simplify hypotf implementation Use a more optimized comparison to check for NaN and infinity and add an inlined issignaling implementation for float. With gcc it results in 2 FP comparisons. The file Copyright is also changed to use GPL, the implementation was completely changed by `7c10fd3515` to use double precision instead of scaling and this change removes all the GET_FLOAT_WORD usage. Checked on x86_64-linux-gnu.	2021-12-13 09:02:30 -03:00
Siddhesh Poyarekar	5afe4c0d69	Cleanup encoding in comments Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8 equivalents so that files don't end up with mixed encodings. With this, all files (except tests that actually test different encodings) have a single encoding. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-13 10:01:45 +05:30
Siddhesh Poyarekar	23645707f1	Replace --enable-static-pie with --disable-default-pie Build glibc programs and tests as PIE by default and enable static-pie automatically if the architecture and toolchain support it. Also add a new configuration option --disable-default-pie to prevent building programs as PIE. Only the following architectures now have PIE disabled by default because they do not work at the moment. hppa, ia64, alpha and csky don't work because the linker is unable to handle a pcrel relocation generated from PIE objects. The microblaze compiler is currently failing with an ICE. GNU hurd tries to enable static-pie, which does not work and hence fails. All these targets have default PIE disabled at the moment and I have left it to the target maintainers to enable PIE on their targets. build-many-glibcs runs clean for all targets. I also tested x86_64 on Fedora and Ubuntu, to verify that the default build as well as --disable-default-pie work as expected with both system toolchains. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-13 08:08:59 +05:30
Samuel Thibault	556a6126f8	hurd: Add rules for static PIE build This fixes [BZ #28671].	2021-12-12 00:42:13 +01:00
Samuel Thibault	26803075e4	hurd: Fix gmon-static We need to use crt0 for gmon-static too.	2021-12-12 00:42:12 +01:00
H.J. Lu	ea5814467a	x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656 ] Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since the first PT_LOAD segment is no longer executable due to defaulting to -z separate-code. This fixes [BZ #28656]. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-10 14:01:34 -08:00
Florian Weimer	5cc3385654	nptl: Add one more barrier to nptl/tst-create1 Without the bar_ctor_finish barrier, it was possible that thread2 re-locked user_lock before ctor had a chance to lock it. ctor then blocked in its locking operation, xdlopen from the main thread did not return, and thread2 was stuck waiting in bar_dtor: thread 1: started. thread 2: started. thread 2: locked user_lock. constructor started: 0. thread 1: in ctor: started. thread 3: started. thread 3: done. thread 2: unlocked user_lock. thread 2: locked user_lock. Fixes the test in commit `83b5323261` ("elf: Avoid deadlock between pthread_create and ctors [BZ #28357]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-10 11:51:25 +01:00
Florian Weimer	627f5ede70	Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely introduced to support a configuration where the thread pointer does not have the same alignment as THREAD_SELF. Only ia64 seems to use that, but for the stack/pointer guard, not for storing tcbhead_t. Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift the thread pointer, potentially landing in a different residue class modulo the alignment, but the changes should not impact that. In general, given that TLS variables have their own alignment requirements, having different alignment for the (unshifted) thread pointer and struct pthread would potentially result in dynamic offsets, leading to more complexity. hppa had different values before: __alignof__ (tcbhead_t), which seems to be 4, and __alignof__ (struct pthread), which was 8 (old default) and is now 32. However, it defines THREAD_SELF as: /* Return the thread descriptor for the current thread. */ # define THREAD_SELF \ ({ struct pthread *__self; \ __self = __get_cr27(); \ __self - 1; \ }) So the thread pointer points after struct pthread (hence __self - 1), and they have to have the same alignment on hppa as well. Similarly, on ia64, the definitions were different. We have: # define TLS_PRE_TCB_SIZE \ (sizeof (struct pthread) \ + (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t) \ ? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1) \ & ~(__alignof__ (struct pthread) - 1)) \ : 0)) # define THREAD_SELF \ ((struct pthread *) ((char *) __thread_self - TLS_PRE_TCB_SIZE)) And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment (confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c). On m68k, we have a larger gap between tcbhead_t and struct pthread. But as far as I can tell, the port is fine with that. The definition of TCB_OFFSET is sufficient to handle the shifted TCB scenario. This fixes commit `23c77f6018` ("nptl: Increase default TCB alignment to 32"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-09 23:47:49 +01:00
Florian Weimer	c901c3e764	nptl: Add public rseq symbols and <sys/rseq.h> The relationship between the thread pointer and the rseq area is made explicit. The constant offset can be used by JIT compilers to optimize rseq access (e.g., for really fast sched_getcpu). Extensibility is provided through __rseq_size and __rseq_flags. (In the future, the kernel could request a different rseq size via the auxiliary vector.) Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	e3e589829d	nptl: Add glibc.pthread.rseq tunable to control rseq registration This tunable allows applications to register the rseq area instead of glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-09 09:49:32 +01:00
Florian Weimer	1d350aa060	Linux: Use rseq to accelerate sched_getcpu Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	95e114a091	nptl: Add rseq registration The rseq area is placed directly into struct pthread. rseq registration failure is not treated as an error, so it is possible that threads run with inconsistent registration status. <sys/rseq.h> is not yet installed as a public header. Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-09 09:49:32 +01:00
Florian Weimer	8d1927d8dc	nptl: Introduce THREAD_GETMEM_VOLATILE This will be needed for rseq TCB access. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	ce2248ab91	nptl: Introduce <tcb-access.h> for THREAD_* accessors These are common between most architectures. Only the x86 targets are outliers. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	8dbeb0561e	nptl: Add <thread_pointer.h> for defining __thread_pointer <tls.h> already contains a definition that is quite similar, but it is not consistent across architectures. Only architectures for which rseq support is added are covered. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
H.J. Lu	ceeffe968c	x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used.	2021-12-06 07:14:12 -08:00
Adhemerval Zanella	a329f68f2e	linux: Add generic ioctl implementation The powerpc implementation is refactored to use the default implementation.	2021-12-06 08:03:18 -03:00
Adhemerval Zanella	00baddbb93	linux: Add generic syscall implementation It also allows removing the hppa-specific implementation and simplifying the riscv implementation a bit.	2021-12-06 08:03:11 -03:00
Florian Weimer	4fb4e7e821	csu: Always use __executable_start in gmon-start.c Current binutils defines __executable_start as the lowest text address, so using the entry point address as a fallback is no longer necessary. As a result, overriding <entry.h> is only necessary if the entry point is not called _start. The previous approach to define __ASSEMBLY__ to suppress the declaration breaks if headers included by <entry.h> are not compatible with __ASSEMBLY__. This happens with rseq integration because it is necessary to include kernel headers in more places. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-05 13:50:21 +01:00
Florian Weimer	c1cb2deeca	elf: execve statically linked programs instead of crashing [BZ #28648 ] Programs without dynamic dependencies and without a program interpreter are now run via execve. Previously, the dynamic linker either crashed while attempting to read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT data), or the self-relocation in the static PIE executable crashed because the outer dynamic linker had already applied RELRO protection. <dl-execve.h> is needed because execve is not available in the dynamic loader on Hurd. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-05 11:28:34 +01:00
Noah Goldstein	4df1fa6ddc	x86-64: Use notl in EVEX strcmp [BZ #28646 ] Must use notl %edi here as lower bits are for CHAR comparisons potentially out of range thus can be 0 without indicating mismatch. This fixes BZ #28646. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>	2021-12-03 21:14:11 -08:00
Florian Weimer	23c77f6018	nptl: Increase default TCB alignment to 32 rseq support will use a 32-byte aligned field in struct pthread, so the whole struct needs to have at least that alignment. nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h> to obtain the fallback definition. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-03 20:43:31 +01:00
Wilco Dijkstra	b31bd11454	AArch64: Improve A64FX memcpy v2 is a complete rewrite of the A64FX memcpy. Performance is improved by streamlining the code, aligning all large copies and using a single unrolled loop for all sizes. The code size for memcpy and memmove goes down from 1796 bytes to 868 bytes. Performance is better in all cases: bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33% faster for large sizes, bench-memcpy-walk is 25% faster for small sizes and 20% for the largest sizes. The geomean of all tests in bench-memcpy is 5.1% faster, and total time is reduced by 4%. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-02 18:36:03 +00:00
Wilco Dijkstra	b51eb35c57	AArch64: Optimize memcmp Rewrite memcmp to improve performance. On small and medium inputs performance is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per iteration, which is 30-50% faster depending on the size. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-02 18:35:53 +00:00
Matheus Castanho	d120fb9941	powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532 ] Syscalls based on the assembly templates are missing CFI for r31, which gets clobbered when scv is used, and info for LR is inaccurate, placed in the wrong LOC and not using the proper offset. LR was also being saved to the callee's frame, while the ABI mandates it to be saved to the caller's frame. These are fixed by this commit. After this change: $ readelf -wF libc.so.6 \| grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6 00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c LOC CFA r31 ra 000000000004b9d4 r1+0 u u 000000000004b9e4 r1+48 u u 000000000004b9e8 r1+48 c-16 u 000000000004b9fc r1+48 c-16 c+16 000000000004ba08 r1+48 c-16 000000000004ba18 r1+48 u 000000000004ba1c r1+0 u libc.so.6: file format elf64-powerpcle Disassembly of section .text: 000000000004b9d4 <kill>: 4b9d4: 1f 00 4c 3c addis r2,r12,31 4b9d8: 2c c3 42 38 addi r2,r2,-15572 4b9dc: 25 00 00 38 li r0,37 4b9e0: d1 ff 21 f8 stdu r1,-48(r1) 4b9e4: 20 00 e1 fb std r31,32(r1) 4b9e8: 98 8f ed eb ld r31,-28776(r13) 4b9ec: 10 00 ff 77 andis. r31,r31,16 4b9f0: 1c 00 82 41 beq 4ba0c <kill+0x38> 4b9f4: a6 02 28 7d mflr r9 4b9f8: 40 00 21 f9 std r9,64(r1) 4b9fc: 01 00 00 44 scv 0 4ba00: 40 00 21 e9 ld r9,64(r1) 4ba04: a6 03 28 7d mtlr r9 4ba08: 08 00 00 48 b 4ba10 <kill+0x3c> 4ba0c: 02 00 00 44 sc 4ba10: 00 00 bf 2e cmpdi cr5,r31,0 4ba14: 20 00 e1 eb ld r31,32(r1) 4ba18: 30 00 21 38 addi r1,r1,48 4ba1c: 18 00 96 41 beq cr5,4ba34 <kill+0x60> 4ba20: 01 f0 20 39 li r9,-4095 4ba24: 40 48 23 7c cmpld r3,r9 4ba28: 20 00 e0 4d bltlr+ 4ba2c: d0 00 63 7c neg r3,r3 4ba30: 08 00 00 48 b 4ba38 <kill+0x64> 4ba34: 20 00 e3 4c bnslr+ 4ba38: c8 32 fe 4b b 2ed00 <__syscall_error> ... 4ba44: 40 20 0c 00 .long 0xc2040 4ba48: 68 00 00 00 .long 0x68 4ba4c: 06 00 5f 5f rlwnm r31,r26,r0,0,3 4ba50: 6b 69 6c 6c xoris r12,r3,26987	2021-11-30 15:18:52 -03:00
Adhemerval Zanella	efc6b2dbc4	linux: Implement pipe in terms of __NR_pipe2 The syscall pipe2 was added in linux 2.6.27 and glibc requires linux 3.2.0. The patch removes the arch-specific implementation for alpha, ia64, mips, sh, and sparc which requires a different kernel ABI than the usual one. Checked on x86_64-linux-gnu and with a build for the affected ABIs.	2021-11-30 13:13:03 -03:00
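
The last entry above (`efc6b2dbc4`) describes implementing pipe in terms of the pipe2 syscall. The following is a minimal user-space sketch of that idea, not the glibc source: `pipe_via_pipe2` and `pipe_roundtrip` are hypothetical names introduced here for illustration, and the sketch assumes a Linux system where the `pipe2` wrapper is available under `_GNU_SOURCE`.

```c
#define _GNU_SOURCE /* pipe2 is a GNU/Linux extension */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for pipe(): same interface, forwarded to pipe2
   with no flags.  This illustrates the idea only; it is not glibc code.  */
static int
pipe_via_pipe2 (int fds[2])
{
  return pipe2 (fds, 0);
}

/* Send MSG (including its terminating NUL) through a fresh pipe and read
   it back into OUT.  Returns the number of bytes read, or -1 on error.  */
static ssize_t
pipe_roundtrip (const char *msg, char *out, size_t outlen)
{
  assert (msg != NULL && out != NULL);

  size_t len = strlen (msg) + 1;
  if (len > outlen)
    return -1;

  int fds[2];
  if (pipe_via_pipe2 (fds) != 0)
    return -1;

  ssize_t n = -1;
  if (write (fds[1], msg, len) == (ssize_t) len)
    n = read (fds[0], out, outlen);

  close (fds[0]);
  close (fds[1]);
  return n;
}
```

Passing `0` as the flags argument gives the same behavior as a plain pipe call, which is why a single pipe2-based implementation can replace per-architecture variants; flags such as `O_CLOEXEC` remain available to callers that use pipe2 directly.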

14743 Commits