glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-22 13:00:06 +00:00

Author	SHA1	Message	Date
Joe Ramsay	cc0d77ba94	aarch64: Add half-width versions of AdvSIMD f32 libmvec routines Compilers may emit calls to 'half-width' routines (two-lane single-precision variants). These have been added in the form of wrappers around the full-width versions, where the low half of the vector is simply duplicated. This will perform poorly when one lane triggers the special-case handler, as there will be a redundant call to the scalar version, however this is expected to be rare at Ofast. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-12-20 08:41:25 +00:00
Andreas Schwab	3f79842788	aarch64: correct CFI in rawmemchr (bug 31113) The .cfi_return_column directive changes the return column for the whole FDE range. But the actual intent is to tell the unwinder that the value in x30 (lr) now resides in x15 after the move, and that is expressed by the .cfi_register directive.	2023-12-05 12:49:37 +01:00
Szabolcs Nagy	8e755f5bc8	aarch64: fix tested ifunc variants Don't test a64fx string functions when BTI is enabled since they are not BTI compatible.	2023-12-04 14:41:26 +00:00
Joe Ramsay	7b12776584	aarch64: Improve special-case handling in AdvSIMD double-precision libmvec routines Avoids emitting many saves/restores of vector registers, reduces the amount of code generated around the scalar fallback.	2023-11-29 15:03:36 +00:00
Joe Ramsay	bd70d3bacf	aarch64: Fix libmvec benchmarks These were broken by the new atan2 functions, as they were only set up for univariate functions. Arity is now detected from the input file - this revealed a mistake that the double-precision inputs were being used for both single- and double-precision routines, which is now remedied.	2023-11-22 09:10:43 +00:00
Adhemerval Zanella	55f41ef8de	elf: Remove LD_PROFILE for static binaries _dl_non_dynamic_init does not parse LD_PROFILE, so profiling is not enabled for dlopen'ed objects. Since dlopen is deprecated for static objects, it is better to remove the support. This also allows profile support to be trimmed from libc.a. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-11-21 16:15:42 -03:00
Joe Ramsay	a8830c9285	aarch64: Add vector implementations of expm1 routines May discard sign of 0 - auto tests for -0 and -0x1p-10000 updated accordingly.	2023-11-20 17:53:14 +00:00
Wilco Dijkstra	2f5524cc53	AArch64: Remove Falkor memcpy The latest implementations of memcpy are actually faster than the Falkor implementations [1], so remove the falkor/phecda ifuncs for memcpy and the now unused IS_FALKOR/IS_PHECDA defines. [1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-11-13 16:52:50 +00:00
Wilco Dijkstra	3d7090f14b	AArch64: Add memset_zva64 Add a specialized memset for the common ZVA size of 64 to avoid the overhead of reading the ZVA size. Since the code is identical to __memset_falkor, remove the latter. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-11-13 16:50:44 +00:00
Wilco Dijkstra	9627ab99b5	AArch64: Cleanup emag memset Cleanup emag memset - merge the memset_base64.S file, remove the unused ZVA code (since it is disabled on emag). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-11-13 16:45:47 +00:00
Joe Ramsay	3548a4f087	aarch64: Add vector implementations of log1p routines May discard sign of zero.	2023-11-10 17:07:43 +00:00
Joe Ramsay	b07038c5d3	aarch64: Add vector implementations of atan2 routines	2023-11-10 17:07:43 +00:00
Joe Ramsay	d30c39f80d	aarch64: Add vector implementations of atan routines	2023-11-10 17:07:42 +00:00
Joe Ramsay	b5d23367a8	aarch64: Add vector implementations of acos routines	2023-11-10 17:07:42 +00:00
Joe Ramsay	9bed498418	aarch64: Add vector implementations of asin routines	2023-11-10 17:07:42 +00:00
Wilco Dijkstra	9fd3409842	AArch64: Cleanup ifuncs Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename strlen_mte to strlen_generic. Remove rtld-memset. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-11-01 13:41:59 +00:00
Wilco Dijkstra	2bd0017988	AArch64: Add support for MOPS memcpy/memmove/memset Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for memcpy, memmove and memset (use .inst for now so it works with all binutils versions without needing complex configure and conditional compilation). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-10-24 13:37:48 +01:00
Joe Ramsay	31aaf6fed9	aarch64: Add vector implementations of exp10 routines Double-precision routines either reuse the exp table (AdvSIMD) or use the SVE FEXPA instruction.	2023-10-23 15:00:45 +01:00
Joe Ramsay	067a34156c	aarch64: Add vector implementations of log10 routines A table is also added, which is shared between AdvSIMD and SVE log10.	2023-10-23 15:00:45 +01:00
Joe Ramsay	a8e3ab3074	aarch64: Add vector implementations of log2 routines A table is also added, which is shared between AdvSIMD and SVE log2.	2023-10-23 15:00:45 +01:00
Joe Ramsay	b39e9db5e3	aarch64: Add vector implementations of exp2 routines Some routines reuse table from v_exp_data.c	2023-10-23 15:00:45 +01:00
Joe Ramsay	f554334c05	aarch64: Add vector implementations of tan routines This includes some utility headers for evaluating polynomials using various schemes.	2023-10-23 15:00:44 +01:00
Joe Ramsay	5a4b6f8e4b	aarch64: Optimise vecmath logs * Transpose table layout for improved memory access * Use half-vector special comparisons for AdvSIMD * Improve register use near special-case branches - Due to the presence of a function call, return value would get mov-d out of x0 in order to facilitate PCS. By moving the final computation after the branch this can be avoided Also change SVE routines to use overloaded intrinsics for readability.	2023-10-05 16:54:16 +01:00
Joe Ramsay	480a0dfe1a	aarch64: Cosmetic change in SVE exp routines Use overloaded intrinsics for readability. Codegen does not change, however while we're bringing the routines up-to-date with recent improvements to other routines in AOR it is worth copying this change over as well.	2023-10-05 16:54:00 +01:00
Joe Ramsay	9180160e08	aarch64: Optimize SVE cos & cosf Saves a mov by ensuring return value does not need to be moved out of the way before special-case branch. Also change to use overloaded intrinsics.	2023-10-05 16:53:38 +01:00
Joe Ramsay	8014d1e832	aarch64: Improve vecmath sin routines * Update ULP comment reflecting a new observed max in [-pi/2, pi/2] * Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions * Improve register use near special-case branch Also use overloaded intrinsics for SVE.	2023-10-05 16:53:06 +01:00
Wilco Dijkstra	6b695e5c62	AArch64: Remove -0.0 check from vector sin Remove the unnecessary extra checks for sin (-0.0) from vector sin/sinf, improving performance. Passes regress. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-09-26 13:40:07 +01:00
Siddhesh Poyarekar	c6cb8783b5	configure: Use autoconf 2.71 Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-07-17 10:08:10 -04:00
Joe Ramsay	4a9392ffc2	aarch64: Add vector implementations of exp routines Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. As previously, data tables are used via a barrier to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-06-30 09:04:26 +01:00
Joe Ramsay	78c01a5cbe	aarch64: Add vector implementations of log routines Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. Log lookup table added as HIDDEN symbol to allow it to be shared between AdvSIMD and SVE variants. As previously, data tables are used via a barrier to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-06-30 09:04:22 +01:00
Joe Ramsay	3bb1af2051	aarch64: Add vector implementations of sin routines Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. As previously, data tables are used via a barrier to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-06-30 09:04:16 +01:00
Joe Ramsay	aed39a3aa3	aarch64: Add vector implementations of cos routines Replace the loop-over-scalar placeholder routines with optimised implementations from Arm Optimized Routines (AOR). Also add some headers containing utilities for aarch64 libmvec routines, and update libm-test-ulps. Data tables for new routines are used via a pointer with a barrier on it, in order to prevent overly aggressive constant inlining in GCC. This allows a single adrp, combined with offset loads, to be used for every constant in the table. Special-case handlers are marked NOINLINE in order to confine the save/restore overhead of switching from vector to normal calling standard. This way we only incur the extra memory access in the exceptional cases. NOINLINE definitions have been moved to math_private.h in order to reduce duplication. AOR exposes a config option, WANT_SIMD_EXCEPT, to enable selective masking (and later fixing up) of invalid lanes, in order to trigger fp exceptions correctly (AdvSIMD only). This is tested and maintained in AOR, however it is configured off at source level here for performance reasons. We keep the WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify the upstreaming process from AOR to glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-06-30 09:04:10 +01:00
Paul Pluzhnikov	2cbeda847b	Fix a few more typos I missed in previous round -- BZ 25337	2023-06-02 23:46:32 +00:00
Paul Pluzhnikov	65cc53fe7c	Fix misspellings in sysdeps/ -- BZ 25337	2023-05-30 23:02:29 +00:00
Szabolcs Nagy	642f1b9b3d	aarch64: More configure checks for libmvec Check assembler and linker support too, not just SVE ACLE in the compiler, since variant PCS requires at least binutils 2.32.1.	2023-05-05 11:34:44 +01:00
Szabolcs Nagy	ee68e9cba4	aarch64: SVE ACLE configure test cleanups Use more idiomatic configure test for better autoconf cache and logs.	2023-05-05 10:28:29 +01:00
Szabolcs Nagy	1a62d7e5c3	aarch64: fix SVE ACLE check for bootstrap glibc builds arm_sve.h depends on stdint.h but that relies on libc headers unless compiled in freestanding mode. Without this change a bootstrap glibc build (that uses a compiler without installed libc headers) failed with checking for availability of SVE ACLE... In file included from [...]/arm_sve.h:28, from conftest.c:1: [...]/stdint.h:9:16: fatal error: stdint.h: No such file or directory 9 | # include_next <stdint.h> | ^~~~~~~~~~ compilation terminated. configure: error: mathvec is enabled but compiler does not have SVE ACLE. [...]	2023-05-04 10:19:11 +01:00
Joe Ramsay	cd94326a13	Enable libmvec support for AArch64 This patch enables libmvec on AArch64. The proposed change is mainly implementing build infrastructure to add the new routines to ABI, tests and benchmarks. I have demonstrated how this all fits together by adding implementations for vector cos, in both single and double precision, targeting both Advanced SIMD and SVE. The implementations of the routines themselves are just loops over the scalar routine from libm for now, as we are more concerned with getting the plumbing right at this point. We plan to contribute vector routines from the Arm Optimized Routines repo that are compliant with requirements described in the libmvec wiki. Building libmvec requires minimum GCC 10 for SVE ACLE. To avoid raising the minimum GCC by such a big jump, we allow users to disable libmvec if their compiler is too old. Note that at this point users have to manually call the vector math functions. This seems to be acceptable to some downstream users. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-05-03 12:09:49 +01:00
Szabolcs Nagy	2ce48fbd5a	aarch64: update libm test ulps	2023-02-24 10:55:38 +00:00
Jun Tang	311a7e0256	AArch64: Fix HP_TIMING_DIFF computation [BZ# 29329] Fix the computation to allow for cntfrq_el0 being larger than 1GHz. Assume cntfrq_el0 is a multiple of 1MHz to increase the maximum interval (1024 seconds at 1GHz). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-02-22 16:45:59 +00:00
Adhemerval Zanella	a9b3b770f5	string: Remove string_private.h Now that _STRING_ARCH_unaligned is not used anymore. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-02-17 15:56:54 -03:00
Adhemerval Zanella	22999b2f0f	string: Add libc_hidden_proto for memrchr Although static linker can optimize it to local call, it follows the internal scheme to provide hidden proto and definitions. Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>	2023-02-08 17:13:58 -03:00
Adhemerval Zanella	7ea510127e	string: Add libc_hidden_proto for strchrnul Although static linker can optimize it to local call, it follows the internal scheme to provide hidden proto and definitions. Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>	2023-02-08 17:13:56 -03:00
Wilco Dijkstra	d2d3f3720c	AArch64: Improve SVE memcpy and memmove Improve SVE memcpy by copying 2 vectors if the size is small enough. This improves performance of random memcpy by ~9% on Neoverse V1, and 33-64 byte copies are ~16% faster. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-02-06 16:15:34 +00:00
Wilco Dijkstra	55599d4804	AArch64: Improve strrchr Use shrn for narrowing the mask which simplifies code and speeds up small strings. Unroll the first search loop to improve performance on large strings. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
Wilco Dijkstra	ad098893ba	AArch64: Optimize strnlen Optimize strnlen using the shrn instruction and improve the main loop. Small strings are around 10% faster, large strings are 40% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
Wilco Dijkstra	03c8ce5000	AArch64: Optimize strlen Optimize strlen by unrolling the main loop. Large strings are 64% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
Wilco Dijkstra	349e48c01e	AArch64: Optimize strcpy Unroll the main loop. Large strings are around 20% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
Wilco Dijkstra	09ebd8549b	AArch64: Improve strchrnul Unroll the main loop, which improves performance slightly. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
Wilco Dijkstra	51541a2297	AArch64: Optimize strchr Simplify calculation of the mask using shrn. Unroll the main loop. Small strings are 20% faster on modern CPUs. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-01-17 15:09:18 +00:00
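
The HP_TIMING_DIFF entry above (311a7e0256) describes an integer-arithmetic fix for counter frequencies above 1GHz. The sketch below is one way to illustrate that idea in portable C. It is not the glibc macro itself: the function names are hypothetical, the frequency is passed as a parameter instead of being read from cntfrq_el0, and the shift by 6 is an assumption that follows from 1MHz being a multiple of 64.

```c
#include <assert.h>
#include <stdint.h>

/* Naive conversion (hypothetical): precompute whole nanoseconds per tick.
   The integer division yields 0 once freq exceeds 1 GHz, so every
   interval collapses to 0.  */
static uint64_t
ticks_to_ns_naive (uint64_t ticks, uint64_t freq)
{
  return ticks * (UINT64_C (1000000000) / freq);
}

/* Scaled conversion (hypothetical).  Assumes freq is a nonzero multiple
   of 1 MHz.  Since 1 MHz = 64 * 15625, both 10^9 and freq can be shifted
   right by 6 without losing precision.  The smaller multiplier
   (15625000 < 2^24) keeps the product within 64 bits for at least
   2^40 ticks.  */
static uint64_t
ticks_to_ns_scaled (uint64_t ticks, uint64_t freq)
{
  assert (freq >= 1000000 && freq % 1000000 == 0);
  return ticks * (UINT64_C (1000000000) >> 6) / (freq >> 6);
}
```

With a 2GHz counter, ticks_to_ns_naive returns 0 for any input, while ticks_to_ns_scaled (2000, 2000000000) gives 1000 ns. At 1GHz, 2^40 ticks is about 1100 seconds, which is on the order of the 1024 second maximum interval quoted in the commit message.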

429 Commits