glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-24 05:50:14 +00:00

Author	SHA1	Message	Date
Wilco Dijkstra	8ecb477ea1	AArch64: Remove memset-reg.h Remove memset-reg.h by moving register definitions into the memset implementations. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-09-10 14:18:03 +01:00
Wilco Dijkstra	cec3aef324	AArch64: Optimize memset Improve small memsets by avoiding branches and use overlapping stores. Use DC ZVA for copies over 128 bytes. Remove unnecessary code for ZVA sizes other than 64 and 128. Performance of random memset benchmark improves by 24% on Neoverse N1. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-09-09 15:30:00 +01:00
Joe Ramsay	8b09af572b	aarch64: Avoid redundant MOVs in AdvSIMD F32 logs Since the last operation is destructive, the first argument to the FMA also has to be the first argument to the special-case in order to avoid unnecessary MOVs. Reorder arguments and adjust special-case bounds to facilitate this. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2024-09-09 13:03:49 +01:00
Adhemerval Zanella	e2f88d8524	aarch64: Regenerate ULPs From new tests added by `0797283910`.	2024-08-07 11:02:03 -03:00
Wilco Dijkstra	3dc426b642	AArch64: Improve generic strlen Improve performance by handling another 16 bytes before entering the loop. Use ADDHN in the loop to avoid SHRN+FMOV when it terminates. Change final size computation to avoid increasing latency. On Neoverse V1 performance of the random strlen benchmark improves by 4.6%. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-08-07 14:58:46 +01:00
Adhemerval Zanella	cfc9b07346	aarch64: Regenerate ULPs From new tests added by `4dc22baa84`.	2024-07-25 10:41:30 -03:00
Andrew Pinski	2f1f7a5f8a	Aarch64: Add new memset for Qualcomm's oryon-1 core Qualcomm's new core, oryon-1, has different characteristics for memset than the current versions of memset. For non-zero, larger sizes, using GPRs rather than the SIMD stores is ~30% faster. For even larger sizes, using the nontemporal stores is needed to avoid polluting the L1/L2 caches. For zero values, `dc zva` should be used. Since we know the size will always be 64 bytes, we don't need to figure out the size there. I started with the emag memset and added back the `dc zva` code. Changes since v1: * v3: Fix comment formatting Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-30 13:47:17 +02:00
Andrew Pinski	4dc83cac78	Aarch64: Add memcpy for qualcomm's oryon-1 core Qualcomm's new core (oryon-1) has a different performance characteristic than other cores. For memcpy, it is faster to use the GPRs to do the copy for large sizes (2x faster). For even larger sizes, it is better to use the nontemporal load/store instructions so we don't pollute the L1/L2 caches. For smaller sizes, the characteristics are very similar to other cores. I used the thunderx memcpy as a starting point and expanded from there. Changes since v1: * v2: Fix ordering in Makefile. * v3: Fix comment grammar about the ldnp/stnp instructions. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-30 13:46:33 +02:00
Adhemerval Zanella	45f5f51b85	aarch64: Update ulps For the exp10m1, exp2m1, and log10p1 implementations.	2024-06-18 17:31:10 -03:00
Andreas K. Hüttel	98ffc1bfeb	Convert to autoconf 2.72 (vanilla release, no distribution patches) As discussed at the patch review meeting Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org> Reviewed-by: Simon Chopin <simon.chopin@canonical.com>	2024-06-17 21:15:28 +02:00
Joseph Myers	bb014f50c4	Implement C23 logp1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the logp1 functions (aliases for log1p functions - the name is intended to be more consistent with the new log2p1 and log10p1, where clearly it would have been very confusing to name those functions log21p and log101p). As aliases rather than new functions, the content of this patch is somewhat different from those actually adding new functions. Tests are shared with log1p, so this patch does mechanically update all affected libm-test-ulps files to expect the same errors for both functions. The vector versions of log1p on aarch64 and x86_64 are not updated to have logp1 aliases (and thus there are no corresponding header, tests, abilist or ulps changes for vector functions either). It would be reasonable for such vector aliases and corresponding changes to other files to be made separately. For now, the log1p tests instead avoid testing logp1 in the vector case (a Makefile change is needed to avoid problems with grep, used in generating the .c files for vector function tests, matching more than one ALL_RM_TEST line in a file testing multiple functions with the same inputs, when it assumes that the .inc file only has a single such line). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-06-17 13:47:09 +00:00
Adhemerval Zanella	ef9596352b	aarch64: Remove duplicate memchr/strlen in libc.a (BZ 31777) The generic version provides weak definitions of memchr/strlen, which are already provided by the ifunc resolvers. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
Joe Ramsay	0fed0b250f	aarch64/fpu: Add vector variants of pow Plus a small amount of moving includes around in order to be able to remove duplicate definition of asuint64. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-05-21 14:38:49 +01:00
Adhemerval Zanella	241338bd6f	aarch64: Update ulps For the log2p1 implementation.	2024-05-20 13:12:23 -03:00
Joe Ramsay	75207bde68	aarch64/fpu: Add vector variants of cbrt Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-05-16 14:35:06 +01:00
Joe Ramsay	157f89fa3d	aarch64/fpu: Add vector variants of hypot Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-05-16 14:34:43 +01:00
Joe Ramsay	90a6ca8b28	aarch64: Fix AdvSIMD libmvec routines for big-endian Previously many routines used * to load from vector types stored in the data table. This is emitted as ldr, which byte-swaps the entire vector register, and causes bugs for big-endian when not all lanes contain the same value. When a vector is to be used this way, it has been replaced with an array and the load with an explicit ld1 intrinsic, which byte-swaps only within lanes. As well, many routines previously used non-standard GCC syntax for vector operations such as indexing into vector types with [] and assembling vectors using {}. This syntax should not be mixed with ACLE, as the former does not respect endianness whereas the latter does. Such examples have been replaced with, for instance, vcombine_* and vgetq_lane* intrinsics. Helpers which only use the GCC syntax, such as the v_call helpers, do not need changing as they do not use intrinsics. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-05-14 13:10:33 +01:00
Adhemerval Zanella	bcae44ea85	elf: Only process multiple tunable once (BZ 31686) The `680c597e9c` commit made the loader reject ill-formatted strings by first tracking all set tunables and then applying them. However, it does not take into consideration whether the same tunable is set multiple times, since parse_tunables_string appends the found tunable without checking if it was already in the list. This leads to a stack-based buffer overflow if a tunable is specified more times than the total number of tunables. For instance: GLIBC_TUNABLES=glibc.malloc.check=2:... (repeated more times than the total number of supported tunables). Instead, use the index of the tunable list to get the expected tunable entry. Since the initial list is now zero-initialized, the compiler might emit an extra memset, and this requires some minor adjustment on some ports. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reported-by: Yuto Maeda <maeda@cyberdefense.jp> Reported-by: Yutaro Shimizu <shimizu@cyberdefense.jp> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2024-05-07 12:16:36 -03:00
Wilco Dijkstra	6dae61567f	AArch64: Remove unused defines of CPU names Remove unused defines of CPU names in cpu-features.h. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-04-30 13:32:29 +01:00
Florian Weimer	f8d8b1b1e6	aarch64: Enhanced CPU diagnostics for ld.so This prints some information from struct cpu_features, and the midr_el1 and dczid_el0 system register contents on every CPU. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-08 16:48:55 +02:00
Adhemerval Zanella	50c2be2390	aarch64: Remove ld.so __tls_get_addr plt usage Use the hidden alias instead. Checked on aarch64-linux-gnu.	2024-04-04 17:02:32 -03:00
Joe Ramsay	87cb1dfcd6	aarch64/fpu: Add vector variants of erfc Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:24 +01:00
Joe Ramsay	3d3a4fb8e4	aarch64/fpu: Add vector variants of tanh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:20 +01:00
Joe Ramsay	eedbbca0bf	aarch64/fpu: Add vector variants of sinh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:16 +01:00
Joe Ramsay	8b67920528	aarch64/fpu: Add vector variants of atanh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:12 +01:00
Joe Ramsay	81406ea3c5	aarch64/fpu: Add vector variants of asinh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:02 +01:00
Joe Ramsay	b09fee1d21	aarch64/fpu: Add vector variants of acosh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:32:58 +01:00
Joe Ramsay	bdb5705b7b	aarch64/fpu: Add vector variants of cosh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:32:52 +01:00
Joe Ramsay	cb5d84f1f8	aarch64/fpu: Add vector variants of erf Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:32:48 +01:00
Wilco Dijkstra	2e94e2f5d2	AArch64: Check kernel version for SVE ifuncs Old Linux kernels disable SVE after every system call. Calling the SVE-optimized memcpy afterwards will then cause a trap to reenable SVE. As a result, applications with a high use of syscalls may run slower with the SVE memcpy. This is true for kernels between 4.15.0 and before 6.2.0, except for 5.14.0 which was patched. Avoid this by checking the kernel version and selecting the SVE ifunc on modern kernels. Parse the kernel version reported by uname() into a 24-bit kernel.major.minor value without calling any library functions. If uname() is not supported or if the version format is not recognized, assume the kernel is modern. Tested-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-03-21 16:50:51 +00:00
Adhemerval Zanella	3d53d18fc7	elf: Enable TLS descriptor tests on aarch64 The aarch64 uses 'trad' for traditional tls and 'desc' for tls descriptors, but unlike other targets it defaults to 'desc'. The gnutls2 configure check does not set aarch64 as an ABI that uses TLS descriptors, which then disables some tests. Also rename the internal machinery from gnu2 to tls descriptors. Checked on aarch64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-03-19 14:53:30 -03:00
Szabolcs Nagy	73c26018ed	aarch64: fix check for SVE support in assembler Due to GCC bug 110901 -mcpu can override -march setting when compiling asm code and thus a compiler targeting a specific cpu can fail the configure check even when binutils gas supports SVE. The workaround is that an explicit .arch directive overrides both -mcpu and -march, and since that's what the actual SVE memcpy uses the configure check should use that too even if the GCC issue is fixed independently. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-03-14 14:27:56 +00:00
Joe Ramsay	e302e10213	aarch64/fpu: Sync libmvec routines from 2.39 and before with AOR This includes a fix for big-endian in AdvSIMD log, some cosmetic changes, and numerous small optimisations mainly around inlining and using indexed variants of MLA intrinsics. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-02-26 09:45:50 -03:00
Adhemerval Zanella Netto	ae4b8d6a0e	string: Use builtins for ffs and ffsll This allows removing a lot of arch-specific implementations. Checked on x86_64, aarch64, powerpc64. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-02-01 09:31:33 -03:00
Joseph Myers	42cc619dfb	Refer to C23 in place of C2X in glibc WG14 decided to use the name C23 as the informal name of the next revision of the C standard (notwithstanding the publication date in 2024). Update references to C2X in glibc to use the C23 name. This is intended to update everything except where it involves renaming files (the changes involving renaming tests are intended to be done separately). In the case of the _ISOC2X_SOURCE feature test macro - the only user-visible interface involved - support for that macro is kept for backwards compatibility, while adding _ISOC23_SOURCE. Tested for x86_64.	2024-02-01 11:02:01 +00:00
Sergey Bugaev	520b1df08d	aarch64: Make cpu-features definitions not Linux-specific These describe generic AArch64 CPU features, and are not tied to a kernel-specific way of determining them. We can share them between the Linux and Hurd AArch64 ports. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20240103171502.1358371-13-bugaevc@gmail.com>	2024-01-04 23:48:54 +01:00
Szabolcs Nagy	0c12c8c0cb	aarch64: Add longjmp test for SME Includes test for setcontext too. The test directly checks after longjmp if ZA got disabled and the ZA contents got saved following the lazy saving scheme. It does not use ACLE code to verify that gcc can interoperate with glibc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-02 16:54:21 +00:00
Szabolcs Nagy	a7373e457f	aarch64: Add longjmp support for SME For the ZA lazy saving scheme to work, longjmp has to call __libc_arm_za_disable. In ld.so we assume ZA is not used so longjmp does not need special support there. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-02 15:43:30 +00:00
Szabolcs Nagy	d3c32ae207	aarch64: Add SME runtime support The runtime support routines for the call ABI of the Scalable Matrix Extension (SME) are mostly in libgcc. Since libc.so cannot depend on libgcc_s.so have an implementation of __arm_za_disable in libc for libc internal use in longjmp and similar APIs. __libc_arm_za_disable follows the same PCS rules as __arm_za_disable, but it's a hidden symbol so it does not need variant PCS marking. Using __libc_fatal instead of abort because it can print a message and works in ld.so too. But for now we don't need SME routines in ld.so. To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in _rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2 object is accessed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-02 15:43:30 +00:00
Paul Eggert	dff8da6b3e	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
Joe Ramsay	667f277c78	aarch64: Add SIMD attributes to math functions with vector versions Added annotations for autovec by GCC and GFortran - this enables GCC >= 9 to autovectorise math calls at -Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-12-20 08:41:25 +00:00
Joe Ramsay	cc0d77ba94	aarch64: Add half-width versions of AdvSIMD f32 libmvec routines Compilers may emit calls to 'half-width' routines (two-lane single-precision variants). These have been added in the form of wrappers around the full-width versions, where the low half of the vector is simply duplicated. This will perform poorly when one lane triggers the special-case handler, as there will be a redundant call to the scalar version; however, this is expected to be rare at -Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-12-20 08:41:25 +00:00
Andreas Schwab	3f79842788	aarch64: correct CFI in rawmemchr (bug 31113) The .cfi_return_column directive changes the return column for the whole FDE range. But the actual intent is to tell the unwinder that the value in x30 (lr) now resides in x15 after the move, and that is expressed by the .cfi_register directive.	2023-12-05 12:49:37 +01:00
Szabolcs Nagy	8e755f5bc8	aarch64: fix tested ifunc variants Don't test a64fx string functions when BTI is enabled since they are not BTI compatible.	2023-12-04 14:41:26 +00:00
Joe Ramsay	7b12776584	aarch64: Improve special-case handling in AdvSIMD double-precision libmvec routines Avoids emitting many saves/restores of vector registers, reduces the amount of code generated around the scalar fallback.	2023-11-29 15:03:36 +00:00
Joe Ramsay	bd70d3bacf	aarch64: Fix libmvec benchmarks These were broken by the new atan2 functions, as they were only set up for univariate functions. Arity is now detected from the input file - this revealed a mistake that the double-precision inputs were being used for both single- and double-precision routines, which is now remedied.	2023-11-22 09:10:43 +00:00
Adhemerval Zanella	55f41ef8de	elf: Remove LD_PROFILE for static binaries The _dl_non_dynamic_init does not parse LD_PROFILE, which does not enable profile for dlopen objects. Since dlopen is deprecated for static objects, it is better to remove the support. It also allows trimming profile support out of libc.a. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-11-21 16:15:42 -03:00
Joe Ramsay	a8830c9285	aarch64: Add vector implementations of expm1 routines May discard sign of 0 - auto tests for -0 and -0x1p-10000 updated accordingly.	2023-11-20 17:53:14 +00:00
Wilco Dijkstra	2f5524cc53	AArch64: Remove Falkor memcpy The latest implementations of memcpy are actually faster than the Falkor implementations [1], so remove the falkor/phecda ifuncs for memcpy and the now unused IS_FALKOR/IS_PHECDA defines. [1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-11-13 16:52:50 +00:00
Wilco Dijkstra	3d7090f14b	AArch64: Add memset_zva64 Add a specialized memset for the common ZVA size of 64 to avoid the overhead of reading the ZVA size. Since the code is identical to __memset_falkor, remove the latter. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-11-13 16:50:44 +00:00

470 Commits