glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-26 06:50:07 +00:00

Author	SHA1	Message	Date
Sergey Bugaev	d08ae9c3fb	hurd, htl: Add some x86_64-specific code Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230212111044.610942-12-bugaevc@gmail.com>	2023-02-12 16:35:03 +01:00
Samuel Thibault	8420b3e832	Fix typos in comments	2023-02-12 16:34:28 +01:00
Samuel Thibault	bfb583e791	htl: Generalize i386 pt-machdep.h to x86	2023-02-12 16:33:39 +01:00
Sergey Bugaev	be6d002ca2	hurd: Set up the basic tree for x86_64-gnu And move pt-setup.c to the generic x86 tree. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230212111044.610942-11-bugaevc@gmail.com>	2023-02-12 16:12:06 +01:00
Sergey Bugaev	4fedebc911	mach: Look for mach_i386.defs on x86_64 too Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230212111044.610942-10-bugaevc@gmail.com>	2023-02-12 16:04:50 +01:00
Sergey Bugaev	3d008a92a8	htl: Fix semaphore reference 'sem' is the opaque 'sem_t', 'isem' is the actual 'struct new_sem'. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230212111044.610942-6-bugaevc@gmail.com>	2023-02-12 15:57:32 +01:00
Sergey Bugaev	f4315054b4	hurd: Use mach_msg_type_number_t where appropriate It has been decided that on x86_64, mach_msg_type_number_t stays 32-bit. Therefore, it's not possible to use mach_msg_type_number_t interchangeably with size_t, in particular this breaks when a pointer to a variable is passed to a MIG routine. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230212111044.610942-3-bugaevc@gmail.com>	2023-02-12 15:52:07 +01:00
Sergey Bugaev	8a86e7b6a6	hurd: Refactor readlinkat() Make the code flow more linear using early returns where possible. This makes it so much easier to reason about what runs on error / successful code paths. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230212111044.610942-2-bugaevc@gmail.com>	2023-02-12 15:50:40 +01:00
Samuel Thibault	63550530d9	hurd: Fix unwinding over INTR_MSG_TRAP We used to use .cfi_adjust_cfa_offset around %esp manipulation asm instructions to fix unwinding, but when building glibc with -fno-omit-frame-pointer this is bogus since in that case %ebp is the CFA and does not move. Instead, let's force -fno-omit-frame-pointer when building intr-msg.c so that %ebp can always be used and no .cfi_adjust_cfa_offset is needed.	2023-02-09 19:58:43 +01:00
Adhemerval Zanella Netto	16e424a325	powerpc64: Add the clone3 wrapper It follows the internal signature: extern int clone3 (struct clone_args *__cl_args, size_t __size, int (*__func) (void *__arg), void *__arg); The powerpc64 ABI requires an initial stackframe so the child can store/restore the TOC. It is created prior to calling clone3 by adjusting the stack size (since the kernel will compute the stack as stack plus size). Checked on powerpc64-linux-gnu (power8, kernel 6.0) and powerpc64le-linux-gnu (power9, kernel 4.18). Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>	2023-02-09 07:49:25 -03:00
Adhemerval Zanella	22999b2f0f	string: Add libc_hidden_proto for memrchr Although static linker can optimize it to local call, it follows the internal scheme to provide hidden proto and definitions. Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>	2023-02-08 17:13:58 -03:00
Adhemerval Zanella	7ea510127e	string: Add libc_hidden_proto for strchrnul Although static linker can optimize it to local call, it follows the internal scheme to provide hidden proto and definitions. Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>	2023-02-08 17:13:56 -03:00
quxm	ec6d2b83f2	C-SKY: Strip hard float abi from hard float feature. The hard float abi and hard float are different, Hard float abi: Use float register to pass float type arguments. Hard float: Enable the hard float ISA feature. So the with_fp_cond cannot represent these two features. When -mfloat-abi=softfp, the float abi is soft and hard float is enabled. So add 'with_hard_float_abi' in preconfigure and define 'CSKY_HARD_FLOAT_ABI' if float abi is hard, and use 'CSKY_HARD_FLOAT_ABI' to determine dynamic linker because it is what determines compatibility. And with_fp_cond is still needed to tell glibc whether to enable hard floating feature. In addition, use AC_TRY_COMMAND to test gcc to ensure compatibility between different versions of gcc. The original way has a problem that __CSKY_HARD_FLOAT_FPU_SF__ means the target only has single hard float-points ISA, so it's not defined in CPUs like ck810f. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-02-07 16:34:24 +08:00
Stefan Liebler	41f67ccbe9	S390: Influence hwcaps/stfle via GLIBC_TUNABLES. This patch enables the option to influence hwcaps and stfle bits used by the s390 specific ifunc-resolvers. The currently x86-specific tunable glibc.cpu.hwcaps is also used on s390x to achieve the task. In addition the user can also set a CPU arch-level like z13 instead of single HWCAP and STFLE features. Note that the tunable only handles the features which are really used in the IFUNC-resolvers. All others are ignored as the values are only used inside glibc. Thus we can influence: - HWCAP_S390_VXRS (z13) - HWCAP_S390_VXRS_EXT (z14) - HWCAP_S390_VXRS_EXT2 (z15) - STFLE_MIE3 (z15) The influenced hwcap/stfle-bits are stored in the s390-specific cpu_features struct which also contains reserved fields for future usage. The ifunc-resolvers and users of stfle bits are adjusted to use the information from cpu_features struct. On 31bit, the ELF_MACHINE_IRELATIVE macro is now also defined. Otherwise the new ifunc-resolvers segfault as they depend on the not yet processed _rtld_global_ro@GLIBC_PRIVATE relocation.	2023-02-07 09:19:27 +01:00
Adhemerval Zanella	25788431c0	riscv: Add string-fza.h and string-fzi.h It uses the bitmanip extension to optimize index_first and index_last with clz/ctz (using generic implementation that routes to compiler builtin) and orc.b to check null bytes. Checked the string test on riscv64 user mode. Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2023-02-06 16:19:35 -03:00
Adhemerval Zanella	c505eb828e	sh: Add string-fzb.h Use the SH cmp/str on has_{zero,eq,zero_eq}. Checked on sh4-linux-gnu.	2023-02-06 16:19:35 -03:00
Richard Henderson	080685c90f	powerpc: Add string-fza.h While ppc has the more important string functions in assembly, there are still a few generic routines used. Use the Power 6 CMPB insn for testing of zeros. Checked on powerpc64le-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-02-06 16:19:35 -03:00
Richard Henderson	885306b2f6	arm: Add string-fza.h While arm has the more important string functions in assembly, there are still a few generic routines used. Use the UQSUB8 insn for testing of zeros. Checked on armv7-linux-gnueabihf Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-02-06 16:19:35 -03:00
Richard Henderson	120ad6ed1a	alpha: Add string-fza.h, string-fzb.h, string-fzi.h, and string-shift.h While alpha has the more important string functions in assembly, there are still a few for which the generic routines are used. Use the CMPBGE insn, via the builtin, for testing of zeros. Use a simplified expansion of __builtin_ctz when the insn isn't available. Checked on alpha-linux-gnu. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-02-06 16:19:35 -03:00
Richard Henderson	c62b1c29c2	hppa: Add string-fza.h, string-fzc.h, and string-fzi.h Use UXOR,SBZ to test for a zero byte within a word. While we can get semi-decent code out of asm-goto, we would do slightly better with a compiler builtin. For index_zero et al, sequential testing of bytes is less expensive than any tricks that involve a count-leading-zeros insn that we don't have. Checked on hppa-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-02-06 16:19:35 -03:00
Richard Henderson	be836d9153	hppa: Add memcopy.h GCC's combine pass cannot merge (x >> c | y << (32 - c)) into a double-word shift unless (1) the subtract is in the same basic block and (2) the result of the subtract is used exactly once. Neither condition is true for any use of MERGE. By forcing the use of a double-word shift, we not only reduce contention on SAR, but also allow the setting of SAR to be hoisted outside of a loop. Checked on hppa-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-02-06 16:19:35 -03:00
Adhemerval Zanella	0f4254311e	string: Improve generic strnlen with memchr It also cleans up the multiple inclusion by leaving the ifunc implementation to undef the weak_alias and libc_hidden_def. Co-authored-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-02-06 16:19:35 -03:00
Adhemerval Zanella	2a8867a17f	string: Improve generic memchr The new algorithm reads the first aligned address and masks off the unwanted bytes (this strategy is similar to arch-specific implementations used on powerpc, sparc, and sh). The loop now reads word-aligned addresses and checks using the has_eq macro. Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, and powerpc64-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Co-authored-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-02-06 16:19:35 -03:00
Adhemerval Zanella	506f7dbbab	string: Improve generic strchr New algorithm now calls strchrnul. Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, and powerpc64-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-02-06 16:19:35 -03:00
Adhemerval Zanella	685e844a97	string: Improve generic strchrnul The new algorithm reads the first aligned address and masks off the unwanted bytes (this strategy is similar to arch-specific implementations used on powerpc, sparc, and sh). The loop now reads word-aligned addresses and checks using the has_zero_eq function. Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu, and powerpc-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Co-authored-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-02-06 16:19:35 -03:00
Adhemerval Zanella	350d8d1366	string: Improve generic strlen The new algorithm reads the first aligned address and masks off the unwanted bytes (this strategy is similar to arch-specific implementations used on powerpc, sparc, and sh). The loop now reads word-aligned addresses and checks using the has_zero macro. Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu, and powerpc64-linux-gnu by removing the arch-specific assembly implementation and disabling multi-arch (it covers both LE and BE for 64 and 32 bits). Co-authored-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-02-06 16:19:35 -03:00
Adhemerval Zanella	00cb84dde7	Add string vectorized find and detection functions This patch adds generic string find and detection meant to be used in generic vectorized string implementation. The idea is to decompose the basic string operation so each architecture can reimplement if it provides any specialized hardware instruction. The 'string-misc.h' provides miscellaneous functions: - extractbyte: extracts the byte from a specific index. - repeat_bytes: sets up a word by replicating the argument on each byte. The 'string-fza.h' provides zero byte detection functions: - find_zero_low, find_zero_all, find_eq_low, find_eq_all, find_zero_eq_low, find_zero_eq_all, and find_zero_ne_all The 'string-fzb.h' provides boolean zero byte detection functions: - has_zero: determine if any byte within a word is zero. - has_eq: determine byte equality between two words. - has_zero_eq: determine if any byte within a word is zero along with byte equality between two words. The 'string-fzi.h' provides positions for string-fza.h results: - index_first: return index of first zero byte within a word. - index_last: return index of first byte different between two words. The 'string-fzc.h' provides a combined version of fza and fzi: - index_first_zero_eq: return index of first zero byte within a word or first byte different between two words. - index_first_zero_ne: return index of first zero byte within a word or first byte equal between two words. - index_last_zero: return index of last zero byte within a word. - index_last_eq: return index of last byte different between two words. The 'string-shift.h' provides a way to mask off parts of a word based on some alignment (to handle unaligned arguments): - shift_find, shift_find_last. Co-authored-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-02-06 16:19:35 -03:00
Richard Henderson	d45890b28c	Parameterize OP_T_THRES from memcopy.h It moves OP_T_THRES out of memcopy.h to its own header and adjust each architecture that redefines it. Checked with a build and check with run-built-tests=no for all major Linux ABIs. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2023-02-06 16:19:35 -03:00
Adhemerval Zanella	d1a9b6d8e7	Parameterize op_t from memcopy.h It moves the op_t definition out to a specific header, adds the attribute 'may-alias', and cleans up its duplicated definitions. Checked with a build and check with run-built-tests=no for all major Linux ABIs. Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2023-02-06 16:19:35 -03:00
Wilco Dijkstra	d2d3f3720c	AArch64: Improve SVE memcpy and memmove Improve SVE memcpy by copying 2 vectors if the size is small enough. This improves performance of random memcpy by ~9% on Neoverse V1, and 33-64 byte copies are ~16% faster. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-02-06 16:15:34 +00:00
Flavio Cruz	a1dcc64c9b	Move RETURN_TO to x86/sysdep.h and implement x86_64 version. Message-Id: <Y99nfeBrTubZL9oi@jupiter.tail36e24.ts.net>	2023-02-05 12:36:38 +01:00
Flavio Cruz	5130cd77b0	Remove sysdeps/mach/i386/machine-sp.h This file is not used today since we end up using sysdeps/i386/htl/machine-sp.h. Getting the stack pointer does not need to be hurd specific and can go into sysdeps/<arch>. Message-Id: <Y9tpWs2WOgE/Duiq@jupiter.tail36e24.ts.net>	2023-02-02 19:47:47 +01:00
Samuel Thibault	e0dc827bf6	hurd: Move some i386 bits to x86 As they will actually be usable on x86_64 too.	2023-02-02 00:27:26 +01:00
Sergey Bugaev	a979b72747	hurd: Implement SHM_ANON This adds a special SHM_ANON value that can be passed into shm_open () in place of a name. When called in this way, shm_open () will create a new anonymous shared memory file. The file will be created in the same way that other shared memory files are created (i.e., under /dev/shm/), except that it is not given a name and therefore cannot be reached from the file system, nor by other calls to shm_open (). This is accomplished by utilizing O_TMPFILE. This is intended to be compatible with FreeBSD's API of the same name. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230130125216.6254-4-bugaevc@gmail.com>	2023-02-01 23:36:11 +01:00
Sergey Bugaev	65392c8478	hurd: Implement O_TMPFILE This is a flag that causes open () to create a new, unnamed file in the same filesystem as the given directory. The file descriptor can be simply used in the creating process as a temporary file, or shared with children processes via fork (), or sent over a Unix socket. The file can be left anonymous, in which case it will be deleted from the backing file system once all copies of the file descriptor are closed, or given a permanent name with a linkat () call, such as the following: int fd = open ("/tmp", O_TMPFILE | O_RDWR, 0700); /* Do something with the file... */ linkat (fd, "", AT_FDCWD, "/tmp/filename", AT_EMPTY_PATH); In between creating the file and linking it to the file system, it is possible to set the file content, mode, ownership, author, and other attributes, so that the file visibly appears in the file system (perhaps replacing another file) atomically, with all of its attributes already set up. The Hurd support for O_TMPFILE directly exposes the dir_mkfile RPC to user programs. Previously, dir_mkfile was used by glibc internally, in particular for implementing tmpfile (), but not exposed to user programs through a Unix-level API. O_TMPFILE was initially introduced by Linux. This implementation is intended to be compatible with the Linux implementation, except that the O_EXCL flag is not given the special meaning when used together with O_TMPFILE, unlike on Linux. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230130125216.6254-3-bugaevc@gmail.com>	2023-02-01 23:32:21 +01:00
Adhemerval Zanella Netto	98f9435f33	Linux: optimize clone3 internal usage Add an optimization to avoid calling clone3 when glibc detects that there is no kernel support. It also adds __ASSUME_CLONE3, which allows skipping this optimization and issuing the clone3 syscall directly. It does not handle the small window between 5.3 and 5.5 for posix_spawn (CLONE_CLEAR_SIGHAND was added in 5.5). Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-02-01 08:42:11 -03:00
Adhemerval Zanella Netto	1e442efd57	aarch64: Add the clone3 wrapper It follows the internal signature: extern int clone3 (struct clone_args *__cl_args, size_t __size, int (*__func) (void *__arg), void *__arg); Checked on aarch64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-02-01 08:42:11 -03:00
Adhemerval Zanella Netto	2053c11331	linux: Add clone3 CLONE_CLEAR_SIGHAND optimization to posix_spawn The clone3 flag resets all signal handlers of the child not set to SIG_IGN to SIG_DFL. It allows skipping most of the sigaction calls to set up child signal handling, where previously a posix_spawn had to issue 2 times NSIG sigaction calls (one to obtain the current disposition and another to set either SIG_DFL or SIG_IGN). With POSIX_SPAWN_SETSIGDEF the child will set up the signal for the case where the disposition is SIG_IGN. The code must handle the fallback where clone3 is not available. This is done by splitting __clone_internal_fallback from __clone_internal. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-02-01 08:42:11 -03:00
Adhemerval Zanella Netto	2290cf73cc	Linux: Do not align the stack for __clone3 All internal callers of __clone3 should provide an already aligned stack. Removing the stack alignment in __clone3 is a net gain: it simplifies the internal function contract (mask/unmask signals) along with the arch-specific code. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-02-01 08:42:11 -03:00
Adhemerval Zanella Netto	2fe58919a0	linux: Extend internal clone3 documentation Unlike the kernel, clone3 returns EINVAL for a NULL struct clone_args or function pointer. This is similar to the clone interface, which returns EINVAL for a NULL function argument. It also cleans up the Linux clone3.h interface, since it is not currently exported. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-02-01 08:42:11 -03:00
Adhemerval Zanella Netto	ff9ffc805f	linux: Do not reset signal handler in posix_spawn if it is already SIG_DFL There is no need to issue another sigaction if the disposition is already SIG_DFL. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-02-01 08:42:11 -03:00
Noah Goldstein	b2c474f8de	x86: Fix strncat-avx2.S reading past length [BZ #30065] Occurs when `src` has no null-term. Two cases: 1) Zero-length check is doing: `test %rdx, %rdx; jl L(zero_len)` which doesn't actually check zero (was at some point `decq` and the flag never got updated). The fix is just to make the flag `jle` i.e: `test %rdx, %rdx; jle L(zero_len)` 2) Length check in page-cross case checking if we should continue is doing: `cmpq %r8, %rdx; jb L(page_cross_small)` which means we will continue searching for null-term if length ends at the end of a page and there was no null-term in `src`. The fix is to make the flag: `cmpq %r8, %rdx; jbe L(page_cross_small)`	2023-01-31 19:13:46 -06:00
Carlos O'Donell	b01f976900	Regenerate configure. Run using vanilla upstream autoconf 2.69. Minor whitespace change to sysdeps/loongarch/configure and sysdeps/mach/configure, and nothing else.	2023-01-31 17:51:40 -05:00
Andreas K. Hüttel	33f0f58b59	sparc (64bit): Regenerate ulps Linux catbus 5.15.69-gentoo #1 SMP Sat Sep 24 07:56:24 PDT 2022 sparc64 sun4v UltraSparc T5 (Niagara5) GNU/Linux gcc (Gentoo 11.3.1_p20221209 p3) 11.3.1 20221209 GNU ld (Gentoo 2.38 p4) 2.38 Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-01-24 11:21:50 -05:00
Andreas K. Hüttel	0bac959d75	ia64: Regenerate ulps Linux guppy 5.13.0-00002-gdecb01746d6c #368 SMP Sat Aug 14 20:10:13 UTC 2021 ia64 Dual-Core Intel(R) Itanium(R) Processor 9040 GenuineIntel GNU/Linux gcc (Gentoo 12.2.1_p20221231 p8) 12.2.1 20221231 GNU ld (Gentoo 2.40 p1) 2.40 Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-01-24 11:21:38 -05:00
Sajan Karumanchi	103a469dc7	x86: Cache computation for AMD architecture. All AMD architectures cache details will be computed based on __cpuid__ `0x8000_001D` and the reference to __cpuid__ `0x8000_0006` will be zeroed out for future architectures. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>	2023-01-18 19:28:54 +01:00
Wilco Dijkstra	55599d4804	AArch64: Improve strrchr Use shrn for narrowing the mask which simplifies code and speeds up small strings. Unroll the first search loop to improve performance on large strings. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
Wilco Dijkstra	ad098893ba	AArch64: Optimize strnlen Optimize strnlen using the shrn instruction and improve the main loop. Small strings are around 10% faster, large strings are 40% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
Wilco Dijkstra	03c8ce5000	AArch64: Optimize strlen Optimize strlen by unrolling the main loop. Large strings are 64% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
Wilco Dijkstra	349e48c01e	AArch64: Optimize strcpy Unroll the main loop. Large strings are around 20% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
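
The word-at-a-time helpers named in commit 00cb84dde7 (repeat_bytes, has_zero, has_eq) can be illustrated with the well-known subtract-and-mask test. The following is only a sketch under stated assumptions, not the glibc implementation: `word_t` is a hypothetical stand-in for glibc's `op_t`, and `find_first_zero` is an invented helper that stays inside a caller-supplied buffer rather than relying on aligned over-reads as the real string routines do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for glibc's op_t (an assumption of this sketch). */
typedef uint64_t word_t;

/* Replicate the byte c into every byte of a word. */
static inline word_t repeat_bytes(unsigned char c)
{
    return ((word_t)-1 / 0xff) * c;
}

/* Return nonzero if any byte of x is zero.  Subtracting 0x01 from each
   byte borrows into the high bit only for zero bytes (or bytes above
   0x80, which the ~x term filters out).  */
static inline int has_zero(word_t x)
{
    word_t lsb = repeat_bytes(0x01);
    word_t msb = repeat_bytes(0x80);
    return ((x - lsb) & ~x & msb) != 0;
}

/* Return nonzero if any byte of x equals the byte at the same position
   in y: equal bytes become zero bytes after the XOR.  */
static inline int has_eq(word_t x, word_t y)
{
    return has_zero(x ^ y);
}

/* Invented helper: index of the first zero byte in buf[0..len), or len
   if there is none.  Whole words are scanned with has_zero; the word
   containing the match and any tail are finished byte by byte.  */
size_t find_first_zero(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (len - i >= sizeof(word_t)) {
        word_t w;
        memcpy(&w, buf + i, sizeof w);  /* alignment-safe word load */
        if (has_zero(w))
            break;
        i += sizeof w;
    }
    while (i < len && buf[i] != 0)
        i++;
    return i;
}
```

For example, `find_first_zero((const unsigned char *) "abc", 4)` scans the three letters plus the terminator and returns 3. The architecture-specific headers in the commits above replace tests of this kind with single instructions where the hardware provides them.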

15477 Commits