glibc

mirror of https://sourceware.org/git/glibc.git synced 2025-01-07 10:00:07 +00:00

Author	SHA1	Message	Date
Richard Earnshaw	d27f0e5d88	aarch64: Add aarch64-specific files for memory tagging support This final patch provides the architecture-specific implementation of the memory-tagging support hooks for aarch64.	2020-12-21 15:25:25 +00:00
Szabolcs Nagy	4033f21eb2	aarch64: remove the strlen_asimd symbol This symbol is not in the implementation-reserved namespace for static linking and it was never used: it seems it was mistakenly added in the original strlen_asimd commit `436e4d5b96`	2020-12-15 14:42:45 +00:00
Guillaume Gardet	d4136903a2	aarch64: fix static PIE start code for BTI [BZ #27068] A bti c was missing from rcrt1.o which made all -static-pie binaries fail at program startup on BTI enabled systems. Fixes bug 27068.	2020-12-15 13:48:45 +00:00
Szabolcs Nagy	cd543b5eb3	aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831] Re-mmap executable segments if possible instead of using mprotect to add PROT_BTI. This allows using BTI protection with security policies that prevent mprotect with PROT_EXEC. If the fd of the ELF module is not available because it was kernel mapped, then mprotect is used and failures are ignored. To protect the main executable even when mprotect is filtered, the Linux kernel will have to be changed to add PROT_BTI to it. The delayed failure reporting is mainly needed because currently _dl_process_gnu_properties does not propagate failures such that the required cleanups happen. Using the link_map_machine struct for error propagation is not ideal, but this seemed to be the least intrusive solution. Fixes bug 26831. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-11 15:46:02 +00:00
Szabolcs Nagy	c00452d775	elf: Pass the fd to note processing To handle GNU property notes on aarch64 some segments need to be mmapped again, so the fd of the loaded ELF module is needed. When the fd is not available (kernel loaded modules), then -1 is passed. The fd is passed to both _dl_process_pt_gnu_property and _dl_process_pt_note for consistency. Target specific note processing functions are updated accordingly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-11 15:45:37 +00:00
Szabolcs Nagy	8b8f616e6a	aarch64: align address for BTI protection [BZ #26988] Handle unaligned executable load segments (the bfd linker is not expected to produce such binaries, but other linkers may). Computing the mapping bounds follows _dl_map_object_from_fd more closely now. Fixes bug 26988. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-11 15:04:39 +00:00
Szabolcs Nagy	72739c79f6	aarch64: Fix missing BTI protection from dependencies [BZ #26926] The _dl_open_check and _rtld_main_check hooks are not called on the dependencies of a loaded module, so BTI protection was missed on every module other than the main executable and directly dlopened libraries. The fix just iterates over dependencies to enable BTI. Fixes bug 26926.	2020-12-11 14:52:13 +00:00
Florian Weimer	1daccf403b	nptl: Move stack list variables into _rtld_global Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT, formerly __wait_lookup_done) can be implemented directly in ld.so, eliminating the unprotected GL (dl_wait_lookup_done) function pointer. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-11-16 19:33:30 +01:00
Florian Weimer	5edf3d9fd6	aarch64: Add unwind information to _start (bug 26853) This adds CFI directives which communicate that the stack ends with this function. Fixes bug 26853.	2020-11-09 11:31:04 +01:00
Szabolcs Nagy	e156dabc76	aarch64: Add variant PCS lazy binding test [BZ #26798] This test fails without bug 26798 fixed because some integer registers likely get clobbered by lazy binding and variant PCS only allows x16 and x17 to be clobbered at call time. The test requires binutils 2.32.1 or newer for handling variant PCS symbols. SVE registers are not covered by this test, to avoid the complexity of handling multiple compile- and runtime feature support cases.	2020-11-02 09:39:24 +00:00
Szabolcs Nagy	558251bd87	aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798] The variant PCS support was ineffective because in the common case linkmap->l_mach.plt == 0 but then the symbol table flags were ignored and normal lazy binding was used instead of resolving the relocs early. (This was a misunderstanding about how GOT[1] is set up by the linker.) In practice this mainly affects SVE calls when the vector length is more than 128 bits: the top bits of the argument registers then get clobbered during lazy binding. Fixes bug 26798.	2020-11-02 09:39:24 +00:00
Wilco Dijkstra	e11ed9d2b4	AArch64: Use __memcpy_simd on Neoverse N2/V1 Add CPU detection of Neoverse N2 and Neoverse V1, and select __memcpy_simd as the memcpy/memmove ifunc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-10-14 14:27:50 +01:00
Szabolcs Nagy	238032ead6	aarch64: enforce >=64K guard size [BZ #26691] There are several compiler implementations that allow large stack allocations to jump over the guard page at the end of the stack and corrupt memory beyond that. See CVE-2017-1000364. Compilers can emit code to probe the stack such that the guard page cannot be skipped, but on aarch64 the probe interval is 64K by default instead of the minimum supported page size (4K). This patch enforces a guard of at least 64K on aarch64 unless the guard is disabled by setting its size to 0. For backward compatibility reasons the increased guard is not reported, so it is only observable by exhausting the address space or parsing /proc/self/maps on Linux. On other targets the patch has no effect. If the stack probe interval is larger than the page size on a target, then ARCH_MIN_GUARD_SIZE can be defined to get a large enough stack guard on libc-allocated stacks. The patch does not affect threads with user-allocated stacks. Fixes bug 26691.	2020-10-02 09:57:44 +01:00
Wilco Dijkstra	bd394d131c	AArch64: Improve backwards memmove performance On some microarchitectures performance of the backwards memmove improves if the stores use STR with decreasing addresses. So change the memmove loop in memcpy_advsimd.S to use 2x STR rather than STP. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-08-28 17:51:40 +01:00
Szabolcs Nagy	12b2fd0ef9	aarch64: update ulps. For new j0 test.	2020-08-13 13:02:35 +01:00
Szabolcs Nagy	2dc33b928b	aarch64: Use future HWCAP2_MTE in ifunc resolver Make glibc MTE-safe on systems where MTE is available. This allows using heap tagging with an LD_PRELOADed malloc implementation that enables MTE. We don't document this as a guaranteed contract yet, so glibc may not be MTE-safe when HWCAP2_MTE is set (older glibcs certainly aren't). This is mainly for testing and debugging. The HWCAP flag is not exposed in public headers until Linux adds it to its uapi. The HWCAP value reservation will be in Linux 5.9.	2020-07-27 12:54:22 +01:00
Szabolcs Nagy	7ebd114211	aarch64: Respect p_flags when protecting code with PROT_BTI Use PROT_READ and PROT_WRITE according to the load segment p_flags when adding PROT_BTI. This is before processing relocations which may drop PROT_BTI in case of textrels. Executable stacks are not protected via PROT_BTI either. PROT_BTI is hardening in case memory corruption happens; its value is reduced if writable and executable memory is available, so missing it on such memory is fine, but we should respect the p_flags and should not drop PROT_WRITE.	2020-07-24 08:52:22 +01:00
Wilco Dijkstra	f46ef33ad1	AArch64: Improve strlen_asimd performance (bug 25824) Optimize strlen using a mix of scalar and SIMD code. On modern microarchitectures large strings are 2.6 times faster than existing strlen_asimd and 35% faster than the new MTE version of strlen. On a random strlen benchmark using small sizes the speedup is 7% vs strlen_asimd and 40% vs the MTE strlen. This fixes the main strlen regressions on Cortex-A53 and other cores with a simple Neon unit. Rename __strlen_generic to __strlen_mte, and select strlen_asimd when MTE is not enabled (this is waiting on support for a HWCAP_MTE bit). This fixes big-endian bug 25824. Passes GLIBC regression tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2020-07-17 15:07:23 +01:00
Wilco Dijkstra	0f6278a879	AArch64: Rename IS_ARES to IS_NEOVERSE_N1 Rename IS_ARES to IS_NEOVERSE_N1 since that is a bit clearer. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-15 16:58:07 +01:00
Wilco Dijkstra	4a733bf375	AArch64: Add optimized Q-register memcpy Add a new memcpy using 128-bit Q registers - this is faster on modern cores and reduces codesize. Similar to the generic memcpy, small cases include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Large copies align the source rather than the destination. bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1, so make this memcpy the default on N1 (on Centriq it is 15% faster than memcpy_falkor). Passes GLIBC regression tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2020-07-15 16:55:07 +01:00
Wilco Dijkstra	34f0d01d5e	AArch64: Align ENTRY to a cacheline Given almost all uses of ENTRY are for string/memory functions, align ENTRY to a cacheline to simplify things. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-15 16:50:02 +01:00
Szabolcs Nagy	d174ec248d	aarch64: redefine RETURN_ADDRESS to strip PAC RETURN_ADDRESS is used at several places in glibc to mean a valid code address of the call site, but with pac-ret it may contain a pointer authentication code (PAC), so its definition is adjusted. This is gcc PR target/94891: __builtin_return_address should not expose signed pointers to user code where it can cause ABI issues. In glibc RETURN_ADDRESS is only changed if it is built with pac-ret. There is no detection for the specific gcc issue because it is hard to test and the additional xpac does not cause problems. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Szabolcs Nagy	c94767712b	aarch64: fix pac-ret support in _mcount Currently gcc -pg -mbranch-protection=pac-ret passes a signed return address to _mcount, so _mcount now has to always strip the PAC from frompc, since that comes from user code that may be built with pac-ret. This is gcc PR target/94791: signed pointers should not escape and get passed across extern call boundaries, since that's an ABI break, but because existing gcc has this issue we work around it in glibc until that is resolved. This is compatible with a fixed gcc and it is a nop on systems without PAuth support. The bug was introduced in gcc-7 with -msign-return-address=non-leaf|all support which in gcc-9 got renamed to -mbranch-protection=pac-ret|pac-ret+leaf|standard. strip_pac uses inline asm instead of __builtin_aarch64_xpaclri since that is not a documented API and not available in all supported gccs. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Szabolcs Nagy	1be3d6eb82	aarch64: Add pac-ret support to assembly files Use return address signing in assembly files for functions that save LR when pac-ret is enabled in the compiler. The GNU property note for PAC-RET is not meaningful to the dynamic linker so it is not strictly required, but it may be used to track the security property of binaries. (The PAC-RET property is only set if BTI is set too because BTI implies working GNU property support.) Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Szabolcs Nagy	9e1751e6d6	aarch64: configure check for pac-ret code generation Return address signing requires unwinder support, which is present in libgcc since >=gcc-7, however due to bugs the support may be broken in <gcc-10 (and similarly there may be issues in custom unwinders), so pac-ret is not always safe to use. So in assembly code glibc should only use pac-ret if the compiler uses it too. Unfortunately there is no predefined feature macro for it set by the compiler so pac-ret is inferred from the code generation. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Szabolcs Nagy	de9301c02e	aarch64: ensure objects are BTI compatible When glibc is built with branch protection (i.e. with a gcc configured with --enable-standard-branch-protection), all glibc binaries should be BTI compatible and marked as such. It is easy to link BTI incompatible objects by accident and this is silent currently which is usually not the expectation, so this is changed into a link error. (There is no linker flag for failing on BTI incompatible inputs so all warnings are turned into fatal errors outside the test system when building glibc with branch protection.) Unfortunately, outlined atomic functions are not BTI compatible in libgcc (PR libgcc/96001), so to build glibc with current gcc use 'CC=gcc -mno-outline-atomics', this should be fixed in libgcc soon and then glibc can be built and tested without such workarounds. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Sudakshina Das	605338745b	aarch64: enable BTI at runtime Binaries can opt in to using BTI via an ELF object file marking. The dynamic linker then has to mprotect the executable segments with PROT_BTI. In the case of statically linked executables or of the dynamic linker itself, PROT_BTI protection is done by the operating system. On AArch64 glibc uses PT_GNU_PROPERTY instead of PT_NOTE to check the properties of a binary because PT_NOTE can be unreliable with old linkers (old linkers just append the notes of input objects together and add them to the output without checking them for consistency, which means multiple incompatible GNU property notes can be present in PT_NOTE). The BTI property is handled in the loader even if glibc is not built with BTI support, so in theory user code can be BTI protected independently of glibc. In practice, though, user binaries are not marked with the BTI property if glibc has no support, because the statically linked libc objects (crt files, libc_nonshared.a) are unmarked. This patch relies on a Linux userspace API that is not yet in a Linux release but is in v5.8-rc1, so it is scheduled to be in Linux 5.8. Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Szabolcs Nagy	5f846c8b0d	aarch64: fix RTLD_START for BTI Tailcalls must use x16 or x17 for the indirect branch instruction to be compatible with code that uses BTI c at function entries. (Other forms of indirect branches can only land on BTI j.) Also added a BTI c at the ELF entry point of rtld, this is not strictly necessary since the kernel does not use indirect branch to get there, but it seems safest once building glibc itself with BTI is supported. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Sudakshina Das	91181954f9	aarch64: Add BTI support to assembly files To enable building glibc with branch protection, assembly code needs BTI landing pads and ELF object file markings in the form of a GNU property note. The landing pads are unconditionally added to all functions that may be indirectly called. When the code segment is not mapped with PROT_BTI these instructions are nops. They are kept in the code when BTI is not supported so that the layout of performance critical code is unchanged across configurations. The GNU property notes are only added when there is support for BTI in the toolchain, because old binutils does not handle the notes right. (Does not know how to merge them nor to put them in PT_GNU_PROPERTY segment instead of PT_NOTE, and some versions of binutils emit warnings about the unknown GNU property. In such cases the produced libc binaries would not have valid ELF marking so BTI would not be enabled.) Note: functions using ENTRY or ENTRY_ALIGN now start with an additional BTI c, so alignment of the following code changes, but ENTRY_ALIGN_AND_PAD was fixed so there is no change to the existing code layout. Some string functions may need to be tuned for optimal performance after this commit. Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Szabolcs Nagy	2a4c2dde49	aarch64: Rename place holder .S files to .c The compiler can add required elf markings based on CFLAGS but the assembler cannot, so using C code for empty files creates less of a maintenance problem. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Szabolcs Nagy	1b0a4f58f5	aarch64: configure test for BTI support Check BTI support in the compiler and linker. The check also requires a READELF that understands the BTI GNU property note. It is expected to succeed with gcc >=gcc-9 configured with --enable-standard-branch-protection and binutils >=binutils-2.33. Note: passing -mbranch-protection=bti in CFLAGS when building glibc may not be enough to get a glibc that supports BTI because crtbegin* and crtend* provided by the compiler need to be BTI compatible too. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Alex Butler	03e1378f94	aarch64: MTE compatible strncmp Add support for MTE to strncmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	adac54ffc5	aarch64: MTE compatible strcmp Add support for MTE to strcmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	79160c06c7	aarch64: MTE compatible strrchr Add support for MTE to strrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	df06b0d90f	aarch64: MTE compatible memrchr Add support for MTE to memrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	7ff899969f	aarch64: MTE compatible memchr Add support for MTE to memchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Gabor Kertesz <gabor.kertesz@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	bb2c12aecb	aarch64: MTE compatible strcpy Add support for MTE to strcpy. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Adhemerval Zanella	ea04f02131	aarch64: Remove fpu Makefile The -fno-math-errno is already added by default and the minimum required GCC to build glibc (6.2) makes -ffinite-math-only superfluous. Checked on aarch64-linux-gnu.	2020-06-22 11:09:50 -03:00
Adhemerval Zanella	271afad8f4	aarch64: Use math-use-builtins for ceil{f} The define is already set on the math-use-builtins-ceil.h, the patch just removes the implementations (it was missed on `c9feb1be93`). Checked on aarch64-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	e80501a5c9	math: Decompose math-use-builtins.h Each symbol definition is moved to a separate file, covering all symbol type definitions (float, double, long double, and float128). This allows architectures to set support without the boilerplate of copying default values. Checked with a build on the affected ABIs.	2020-06-22 11:09:45 -03:00
Andrea Corallo	a365ac45b7	aarch64: MTE compatible strlen Introduce an Arm MTE compatible strlen implementation. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance on modern cores. On cores with less efficient Advanced SIMD implementation such as Cortex-A53 it can be slower. Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-09 09:21:11 +01:00
Andrea Corallo	49beaaec1b	aarch64: MTE compatible strchr Introduce an Arm MTE compatible strchr implementation. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-09 09:20:27 +01:00
Andrea Corallo	f7de454f20	aarch64: MTE compatible strchrnul Introduce an Arm MTE compatible strchrnul implementation. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-09 09:20:27 +01:00
Krzysztof Koch	d1f75e9644	AArch64: Merge Falkor memcpy and memmove implementations Falkor's memcpy and memmove share some implementation details; therefore, the two routines are moved to a single source file for code reuse. The two routines now share code for small and medium copies (up to and including 128 bytes). Large copies in memcpy do not handle overlap correctly; consequently, the loops for moving/copying more than 128 bytes stay separate for memcpy and memmove. To increase code reuse a number of small modifications were made: 1. The old implementation of memcpy copied the first 16 bytes as soon as the size of data was determined to be greater than 32 bytes. For memcpy code to also work when copying small/medium overlapping data, the first load and store was moved to the large copy case. 2. Medium memcpy case no longer assumes that 16 bytes were already copied and uses 8 registers to copy up to 128 bytes. 3. Small case for memmove was enlarged to that of memcpy, which is less than or equal to 32 bytes. 4. Medium case for memmove was enlarged to that of memcpy, which is less than or equal to 128 bytes. Other changes include: 1. Improve alignment of existing loop bodies. 2. 'Delouse' memmove and memcpy input arguments. Make sure that upper 32 bits of input registers are zeroed if unused. 3. Do one more iteration in memmove loops and reduce the number of copies made from the start/end of the buffer, depending on the direction of the memmove loop. Benchmarking: Looking at the results from bench-memcpy-random.out, we can see that now memmove_falkor is about 5% faster than memcpy_falkor_old, while memmove_falkor_old was more than 15% slower. The memcpy implementation remained largely unmodified, so there is no significant performance change. The reason for such a significant memmove performance gain is the increase of the upper bound on the small copy case to 32 bytes and the increase of the upper bound on the medium copy case to 128 bytes. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-06-08 14:13:05 +01:00
Vineet Gupta	c9feb1be93	aarch/fpu: use generic builtins based math functions introduce sysdep header math-use-builtins.h to replace aarch64 implementations with corresponding generic ones. - newly introduced generic sqrt{,f}, fma{,f} - existing floor{,f}, nearbyint{,f}, rint{,f}, round{,f}, trunc{,f} - Note that generic copysign was already enabled (via generic math-use-builtins.h), now through sysdep header Tested with build-many-glibcs for aarch64-linux-gnu. This is a non-functional change and aarch64 libm before/after was byte invariant as compared below: | cd /SCRATCH/vgupta/gnu/install-glibc-A-baseline | for i in `find . -name libm-2.31.9000.so`; do | echo $i; diff $i /SCRATCH/vgupta/gnu/install-glibc-C-reduce-scope/$i ; | echo $?; | done | ./aarch64-linux-gnu/lib64/libm-2.31.9000.so | 0 | ./arm-linux-gnueabi/lib/libm-2.31.9000.so | 0 | ./x86_64-linux-gnu/lib64/libm-2.31.9000.so | 0 | ./arm-linux-gnueabihf/lib/libm-2.31.9000.so | 0 | ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so | 0 | ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so | 0 | ./powerpc-linux-gnu/lib/libm-2.31.9000.so | 0 | ./microblaze-linux-gnu/lib/libm-2.31.9000.so | 0 | ./nios2-linux-gnu/lib/libm-2.31.9000.so | 0 | ./hppa-linux-gnu/lib/libm-2.31.9000.so | 0 | ./s390x-linux-gnu/lib64/libm-2.31.9000.so | 0 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-06-03 10:23:33 -07:00
Lexi Shao	59b64f9cbb	aarch64: fix strcpy and strnlen for big-endian [BZ #25824] This patch fixes the optimized implementation of strcpy and strnlen on a big-endian arm64 machine. The optimized method uses Neon, which can process 128 bits with one instruction. On a big-endian machine, the byte order should be reversed for the whole 128-bit quadword. But the instruction rev64 datav.16b, datav.16b reverses 64 bits in the two halves rather than reversing 128 bits. There is no such instruction as rev128 to reverse the 128 bits, but we can fix this by loading the data registers accordingly. Fixes 0237b61526e7 ("aarch64: Optimized implementation of strcpy") and 2911cb68ed3d ("aarch64: Optimized implementation of strnlen"). Signed-off-by: Lexi Shao <shaolexi@huawei.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2020-05-15 12:15:56 +01:00
Adhemerval Zanella	6a0474c769	Update aarch64 libm-test-ulps	2020-04-08 13:52:44 -03:00
Adhemerval Zanella	1c15464ca0	math: Remove inline math tests With mathinline removal there is no need to keep building and testing inline math tests. The gen-libm-tests.py support to generate ULP_I_* is removed and all libm-test-ulps files are updated to no longer have the i{float,double,ldouble} entries. The support for no-test-inline is also removed from gen-auto-libm-tests, and the auto-libm-test-out-* files were regenerated. Checked on x86_64-linux-gnu and i686-linux-gnu.	2020-03-19 11:45:44 -03:00
Wilco Dijkstra	7000651327	[AArch64] Improve integer memcpy Further optimize integer memcpy. Small cases now include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Comments have been rewritten.	2020-03-11 17:15:25 +00:00
Florian Weimer	f4349837d9	Introduce <elf-initfini.h> and ELF_INITFINI for all architectures This supersedes the init_array sysdeps directory. It allows us to check for ELF_INITFINI in both C and assembler code, and skip DT_INIT and DT_FINI processing completely on newer architectures. A new header file is needed because <dl-machine.h> is incompatible with assembler code. <sysdep.h> is compatible with assembler code, but it cannot be included in all assembler files because on some architectures, it redefines register names, and some assembler files conflict with that. <elf-initfini.h> is replicated for legacy architectures which need DT_INIT/DT_FINI support. New architectures follow the generic default and disable it.	2020-02-18 15:12:25 +01:00


288 Commits