glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-02 09:40:13 +00:00

Author	SHA1	Message	Date
Wilco Dijkstra	9e97f239ea	Remove dbl-64/wordsize-64 (part 2) Remove the wordsize-64 implementations by merging them into the main dbl-64 directory. The second patch just moves all wordsize-64 files and removes a few wordsize-64 uses in comments and Implies files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-07 15:26:26 +00:00
Shuo Wang	f5082c7010	aarch64: push the set of rules before falling into slow path It is supposed to save the rules for the instructions before falling into slow path. Tested in glibc-2.28 before fixing: Thread 2 "xxxxxxx" hit Breakpoint 1, _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:149 149 stp x1, x2, [sp, #-32]! Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.3.0-20190804.h24.aarch64 (gdb) ni _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:150 150 stp x3, x4, [sp, #16] (gdb) _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:157 157 mrs x4, tpidr_el0 (gdb) 158 ldr PTR_REG (1), [x0,#TLSDESC_ARG] (gdb) 159 ldr PTR_REG (0), [x4,#TCBHEAD_DTV] (gdb) 160 ldr PTR_REG (3), [x1,#TLSDESC_GEN_COUNT] (gdb) 161 ldr PTR_REG (2), [x0,#DTV_COUNTER] (gdb) 162 cmp PTR_REG (3), PTR_REG (2) (gdb) 163 b.hi 2f (gdb) 165 ldp PTR_REG (2), PTR_REG (3), [x1,#TLSDESC_MODID] (gdb) 166 add PTR_REG (0), PTR_REG (0), PTR_REG (2), lsl #(PTR_LOG_SIZE + 1) (gdb) 167 ldr PTR_REG (0), [x0] /* Load val member of DTV entry. */ (gdb) 168 cmp PTR_REG (0), #TLS_DTV_UNALLOCATED (gdb) 169 b.eq 2f (gdb) bt #0 _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:169 #1 0x0000ffffbe4fbb44 in OurFunction (threadId=4294967295) at /home/test/test_function.c:30 #2 0x0000000000400c08 in initaaa () at thread.c:58 #3 0x0000000000400c50 in thread_proc (param=0x0) at thread.c:71 #4 0x0000ffffbf6918bc in start_thread (arg=0xfffffffff29f) at pthread_create.c:486 #5 0x0000ffffbf5669ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 (gdb) ni _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:184 184 stp x29, x30, [sp,#-16*NSAVEXREGPAIRS]! (gdb) bt #0 _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:184 #1 0x0000ffffbe4fbb44 in OurFunction (threadId=4294967295) at /home/test/test_function.c:30 #2 0x0000000000000000 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Co-authored-by: liqingqing <liqingqing3@huawei.com>	2021-01-05 09:25:19 +00:00
Shuo Wang	cd6274089f	aarch64: fix stack missing after sp is updated After sp is updated, the CFA offset should be set before next instruction. Tested in glibc-2.28: Thread 2 "xxxxxxx" hit Breakpoint 1, _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:149 149 stp x1, x2, [sp, #-32]! Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.3.0-20190804.h24.aarch64 (gdb) bt #0 _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:149 #1 0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184) at /home/test/test_function.c:30 #2 0x0000000000400c08 in initaaa () at thread.c:58 #3 0x0000000000400c50 in thread_proc (param=0x0) at thread.c:71 #4 0x0000ffffbf6918bc in start_thread (arg=0xfffffffff29f) at pthread_create.c:486 #5 0x0000ffffbf5669ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 (gdb) ni _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:150 150 stp x3, x4, [sp, #16] (gdb) bt #0 _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:150 #1 0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184) at /home/test/test_function.c:30 #2 0x0000000000000000 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) ni _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:157 157 mrs x4, tpidr_el0 (gdb) bt #0 _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:157 #1 0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184) at /home/test/test_function.c:30 #2 0x0000000000400c08 in initaaa () at thread.c:58 #3 0x0000000000400c50 in thread_proc (param=0x0) at thread.c:71 #4 0x0000ffffbf6918bc in start_thread (arg=0xfffffffff29f) at pthread_create.c:486 #5 0x0000ffffbf5669ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 Signed-off-by: liqingqing <liqingqing3@huawei.com> Signed-off-by: Shuo Wang <wangshuo47@huawei.com>	2021-01-04 15:37:06 +00:00
Paul Eggert	2b778ceb40	Update copyright dates with scripts/update-copyrights I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted of lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: * pre-commit check failed ... remote: * error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master	2021-01-02 12:17:34 -08:00
Szabolcs Nagy	45b1e17e91	aarch64: use PTR_ARG and SIZE_ARG instead of DELOUSE DELOUSE was added to asm code to make them compatible with non-LP64 ABIs, but it is an unfortunate name and the code was not compatible with ABIs where pointer and size_t are different. Glibc currently only supports the LP64 ABI so these macros are not really needed or tested, but for now the name is changed to be more meaningful instead of removing them completely. Some DELOUSE macros were dropped: clone, strlen and strnlen used it unnecessarily. The out of tree ILP32 patches are currently not maintained and will likely need a rework to rebase them on top of the time64 changes.	2020-12-31 16:50:58 +00:00
Szabolcs Nagy	682cdd6e1a	aarch64: update ulps. For new test cases in commit `cad5ad81d2`	2020-12-21 16:40:34 +00:00
Richard Earnshaw	d27f0e5d88	aarch64: Add aarch64-specific files for memory tagging support This final patch provides the architecture-specific implementation of the memory-tagging support hooks for aarch64.	2020-12-21 15:25:25 +00:00
Szabolcs Nagy	4033f21eb2	aarch64: remove the strlen_asimd symbol This symbol is not in the implementation reserved namespace for static linking and it was never used: it seems it was mistakenly added in the original strlen_asimd commit `436e4d5b96`	2020-12-15 14:42:45 +00:00
Guillaume Gardet	d4136903a2	aarch64: fix static PIE start code for BTI [BZ #27068] A bti c was missing from rcrt1.o which made all -static-pie binaries fail at program startup on BTI enabled systems. Fixes bug 27068.	2020-12-15 13:48:45 +00:00
Szabolcs Nagy	cd543b5eb3	aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831] Re-mmap executable segments if possible instead of using mprotect to add PROT_BTI. This allows using BTI protection with security policies that prevent mprotect with PROT_EXEC. If the fd of the ELF module is not available because it was kernel mapped then mprotect is used and failures are ignored. To protect the main executable even when mprotect is filtered the linux kernel will have to be changed to add PROT_BTI to it. The delayed failure reporting is mainly needed because currently _dl_process_gnu_properties does not propagate failures such that the required cleanups happen. Using the link_map_machine struct for error propagation is not ideal, but this seemed to be the least intrusive solution. Fixes bug 26831. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-11 15:46:02 +00:00
Szabolcs Nagy	c00452d775	elf: Pass the fd to note processing To handle GNU property notes on aarch64 some segments need to be mmaped again, so the fd of the loaded ELF module is needed. When the fd is not available (kernel loaded modules), then -1 is passed. The fd is passed to both _dl_process_pt_gnu_property and _dl_process_pt_note for consistency. Target specific note processing functions are updated accordingly. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-11 15:45:37 +00:00
Szabolcs Nagy	8b8f616e6a	aarch64: align address for BTI protection [BZ #26988] Handle unaligned executable load segments (the bfd linker is not expected to produce such binaries, but other linkers may). Computing the mapping bounds follows _dl_map_object_from_fd more closely now. Fixes bug 26988. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-11 15:04:39 +00:00
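A minimal sketch of the bounds computation this commit describes, assuming a power-of-two page size: the segment start is rounded down and its end rounded up to page boundaries, so an unaligned segment is still fully covered. The struct and function names are hypothetical, not glibc's; glibc's own code works directly on the program header and link map.

```c
#include <assert.h>
#include <stdint.h>

/* Page-aligned [start, end) range covering one load segment.  */
struct page_range
{
  uintptr_t start;
  uintptr_t end;
};

/* Hypothetical helper: the segment occupies [load_bias + vaddr,
   load_bias + vaddr + memsz).  pagesz must be a power of two.  */
static struct page_range
segment_page_range (uintptr_t load_bias, uintptr_t vaddr, uintptr_t memsz,
                    uintptr_t pagesz)
{
  assert (pagesz != 0 && (pagesz & (pagesz - 1)) == 0);
  uintptr_t start = load_bias + vaddr;
  uintptr_t end = start + memsz;
  struct page_range r;
  r.start = start & ~(pagesz - 1);               /* round down to a page */
  r.end = (end + pagesz - 1) & ~(pagesz - 1);    /* round up to a page */
  return r;
}
```

The length handed to mprotect or mmap would then be `r.end - r.start`.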
Szabolcs Nagy	72739c79f6	aarch64: Fix missing BTI protection from dependencies [BZ #26926] The _dl_open_check and _rtld_main_check hooks are not called on the dependencies of a loaded module, so BTI protection was missed on every module other than the main executable and directly dlopened libraries. The fix just iterates over dependencies to enable BTI. Fixes bug 26926.	2020-12-11 14:52:13 +00:00
Florian Weimer	1daccf403b	nptl: Move stack list variables into _rtld_global Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT, formerly __wait_lookup_done) can be implemented directly in ld.so, eliminating the unprotected GL (dl_wait_lookup_done) function pointer. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-11-16 19:33:30 +01:00
Florian Weimer	5edf3d9fd6	aarch64: Add unwind information to _start (bug 26853) This adds CFI directives which communicate that the stack ends with this function. Fixes bug 26853.	2020-11-09 11:31:04 +01:00
Szabolcs Nagy	e156dabc76	aarch64: Add variant PCS lazy binding test [BZ #26798] This test fails without bug 26798 fixed because some integer registers likely get clobbered by lazy binding and variant PCS only allows x16 and x17 to be clobbered at call time. The test requires binutils 2.32.1 or newer for handling variant PCS symbols. SVE registers are not covered by this test, to avoid the complexity of handling multiple compile- and runtime feature support cases.	2020-11-02 09:39:24 +00:00
Szabolcs Nagy	558251bd87	aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798] The variant PCS support was ineffective because in the common case linkmap->l_mach.plt == 0 but then the symbol table flags were ignored and normal lazy binding was used instead of resolving the relocs early. (This was a misunderstanding about how GOT[1] is set up by the linker.) In practice this mainly affects SVE calls when the vector length is more than 128 bits, then the top bits of the argument registers get clobbered during lazy binding. Fixes bug 26798.	2020-11-02 09:39:24 +00:00
Wilco Dijkstra	e11ed9d2b4	AArch64: Use __memcpy_simd on Neoverse N2/V1 Add CPU detection of Neoverse N2 and Neoverse V1, and select __memcpy_simd as the memcpy/memmove ifunc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-10-14 14:27:50 +01:00
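As an illustration of MIDR-based CPU detection, the sketch below decodes the implementer and part-number fields of a MIDR_EL1 value and reports whether the core is one of the Neoverse parts that would select __memcpy_simd (N1 was made the default in an earlier commit, N2 and V1 are added here). The field layout and part numbers (implementer 0x41; parts 0xd0c, 0xd40, 0xd49) follow Arm's published values; the predicate name is hypothetical, and glibc's own detection uses macros such as IS_NEOVERSE_N1.

```c
#include <stdint.h>

/* MIDR_EL1 fields: implementer in bits [31:24], part number in [15:4].  */
#define MIDR_IMPLEMENTER(midr) (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfffu)

#define IMPLEMENTER_ARM 0x41u
#define PART_NEOVERSE_N1 0xd0cu
#define PART_NEOVERSE_V1 0xd40u
#define PART_NEOVERSE_N2 0xd49u

/* Hypothetical predicate: nonzero if this core would use __memcpy_simd.  */
static int
prefers_memcpy_simd (uint32_t midr)
{
  if (MIDR_IMPLEMENTER (midr) != IMPLEMENTER_ARM)
    return 0;
  uint32_t part = MIDR_PARTNUM (midr);
  return part == PART_NEOVERSE_N1 || part == PART_NEOVERSE_N2
         || part == PART_NEOVERSE_V1;
}
```

Reading the register itself is left out; the MIDR value is simply a parameter here.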
Szabolcs Nagy	238032ead6	aarch64: enforce >=64K guard size [BZ #26691] There are several compiler implementations that allow large stack allocations to jump over the guard page at the end of the stack and corrupt memory beyond that. See CVE-2017-1000364. Compilers can emit code to probe the stack such that the guard page cannot be skipped, but on aarch64 the probe interval is 64K by default instead of the minimum supported page size (4K). This patch enforces at least 64K guard on aarch64 unless the guard is disabled by setting its size to 0. For backward compatibility reasons the increased guard is not reported, so it is only observable by exhausting the address space or parsing /proc/self/maps on linux. On other targets the patch has no effect. If the stack probe interval is larger than a page size on a target then ARCH_MIN_GUARD_SIZE can be defined to get large enough stack guard on libc allocated stacks. The patch does not affect threads with user allocated stacks. Fixes bug 26691.	2020-10-02 09:57:44 +01:00
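The rule in this commit (round the requested guard up to whole pages, raise it to at least 64K, but keep 0 meaning "no guard") can be sketched as follows. The function name and the MIN_GUARD_SIZE macro are illustrative stand-ins for glibc's ARCH_MIN_GUARD_SIZE logic, and overflow of very large requests is deliberately not handled in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum guard this commit enforces on aarch64 (64K).  */
#define MIN_GUARD_SIZE ((size_t) 64 * 1024)

/* Hypothetical helper: effective guard size for a libc-allocated stack.
   pagesize must be a power of two.  */
static size_t
effective_guard_size (size_t requested, size_t pagesize)
{
  assert (pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
  if (requested == 0)
    return 0;                                 /* guard explicitly disabled */
  /* Round up to whole pages.  */
  size_t guard = (requested + pagesize - 1) & ~(pagesize - 1);
  if (guard < MIN_GUARD_SIZE)
    guard = MIN_GUARD_SIZE;                   /* enforce the 64K minimum */
  return guard;
}
```

Because the enlarged guard is not reported back to the caller, a program querying its thread attributes would still see the size it requested.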
Wilco Dijkstra	bd394d131c	AArch64: Improve backwards memmove performance On some microarchitectures performance of the backwards memmove improves if the stores use STR with decreasing addresses. So change the memmove loop in memcpy_advsimd.S to use 2x STR rather than STP. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-08-28 17:51:40 +01:00
Szabolcs Nagy	12b2fd0ef9	aarch64: update ulps. For new j0 test.	2020-08-13 13:02:35 +01:00
Szabolcs Nagy	2dc33b928b	aarch64: Use future HWCAP2_MTE in ifunc resolver Make glibc MTE-safe on systems where MTE is available. This allows using heap tagging with an LD_PRELOADed malloc implementation that enables MTE. We don't document this as a guaranteed contract yet, so glibc may not be MTE safe when HWCAP2_MTE is set (older glibcs certainly aren't). This is mainly for testing and debugging. The HWCAP flag is not exposed in public headers until Linux adds it to its uapi. The HWCAP value reservation will be in Linux 5.9.	2020-07-27 12:54:22 +01:00
Szabolcs Nagy	7ebd114211	aarch64: Respect p_flags when protecting code with PROT_BTI Use PROT_READ and PROT_WRITE according to the load segment p_flags when adding PROT_BTI. This is before processing relocations which may drop PROT_BTI in case of textrels. Executable stacks are not protected via PROT_BTI either. PROT_BTI is hardening in case memory corruption happened; its value is reduced if there is writable and executable memory available so missing it on such memory is fine, but we should respect the p_flags and should not drop PROT_WRITE.	2020-07-24 08:52:22 +01:00
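A sketch of deriving the protection flags from p_flags as this commit describes, for a segment already known to be executable. PROT_BTI is 0x10 in the arm64 Linux headers; the fallback definition below is an assumption that only lets the sketch compile on other hosts, and the function name is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <sys/mman.h>

#ifndef PROT_BTI
# define PROT_BTI 0x10  /* arm64 Linux value; fallback so this builds elsewhere */
#endif

/* Hypothetical helper: protection for an executable PT_LOAD segment that
   adds PROT_BTI while keeping the read/write bits requested in p_flags.  */
static int
bti_prot_from_flags (Elf64_Word p_flags)
{
  assert (p_flags & PF_X);      /* only executable segments get PROT_BTI */
  int prot = PROT_EXEC | PROT_BTI;
  if (p_flags & PF_R)
    prot |= PROT_READ;
  if (p_flags & PF_W)
    prot |= PROT_WRITE;         /* do not silently drop write permission */
  return prot;
}
```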
Wilco Dijkstra	f46ef33ad1	AArch64: Improve strlen_asimd performance (bug 25824) Optimize strlen using a mix of scalar and SIMD code. On modern micro architectures large strings are 2.6 times faster than existing strlen_asimd and 35% faster than the new MTE version of strlen. On a random strlen benchmark using small sizes the speedup is 7% vs strlen_asimd and 40% vs the MTE strlen. This fixes the main strlen regressions on Cortex-A53 and other cores with a simple Neon unit. Rename __strlen_generic to __strlen_mte, and select strlen_asimd when MTE is not enabled (this is waiting on support for a HWCAP_MTE bit). This fixes big-endian bug 25824. Passes GLIBC regression tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2020-07-17 15:07:23 +01:00
Wilco Dijkstra	0f6278a879	AArch64: Rename IS_ARES to IS_NEOVERSE_N1 Rename IS_ARES to IS_NEOVERSE_N1 since that is a bit clearer. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-15 16:58:07 +01:00
Wilco Dijkstra	4a733bf375	AArch64: Add optimized Q-register memcpy Add a new memcpy using 128-bit Q registers - this is faster on modern cores and reduces codesize. Similar to the generic memcpy, small cases include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Large copies align the source rather than the destination. bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1, so make this memcpy the default on N1 (on Centriq it is 15% faster than memcpy_falkor). Passes GLIBC regression tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2020-07-15 16:55:07 +01:00
Wilco Dijkstra	34f0d01d5e	AArch64: Align ENTRY to a cacheline Given almost all uses of ENTRY are for string/memory functions, align ENTRY to a cacheline to simplify things. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-15 16:50:02 +01:00
Szabolcs Nagy	d174ec248d	aarch64: redefine RETURN_ADDRESS to strip PAC RETURN_ADDRESS is used at several places in glibc to mean a valid code address of the call site, but with pac-ret it may contain a pointer authentication code (PAC), so its definition is adjusted. This is gcc PR target/94891: __builtin_return_address should not expose signed pointers to user code where it can cause ABI issues. In glibc RETURN_ADDRESS is only changed if it is built with pac-ret. There is no detection for the specific gcc issue because it is hard to test and the additional xpac does not cause problems. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Szabolcs Nagy	c94767712b	aarch64: fix pac-ret support in _mcount Currently gcc -pg -mbranch-protection=pac-ret passes signed return address to _mcount, so _mcount now has to always strip pac from the frompc since that's from user code that may be built with pac-ret. This is gcc PR target/94791: signed pointers should not escape and get passed across extern call boundaries, since that's an ABI break, but because existing gcc has this issue we work it around in glibc until that is resolved. This is compatible with a fixed gcc and it is a nop on systems without PAuth support. The bug was introduced in gcc-7 with -msign-return-address=non-leaf|all support which in gcc-9 got renamed to -mbranch-protection=pac-ret|pac-ret+leaf|standard. strip_pac uses inline asm instead of __builtin_aarch64_xpaclri since that is not a documented api and not available in all supported gccs. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
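A minimal sketch of stripping a PAC with inline assembly, in the spirit of the strip_pac helper this commit mentions (the body is an assumption, not copied from glibc). `hint 7` is the encoding of XPACLRI, which clears the authentication code from x30 and executes as a NOP on cores without PAuth; on non-AArch64 hosts the sketch simply returns its argument.

```c
#include <stdint.h>

/* Strip a pointer authentication code from a code pointer.  */
static inline uintptr_t
strip_pac (uintptr_t p)
{
#if defined __aarch64__
  register uintptr_t x30 __asm__ ("x30") = p;
  /* hint 7 is XPACLRI: strips the PAC from x30, a NOP without PAuth.  */
  __asm__ ("hint 7 // xpaclri" : "+r" (x30));
  return x30;
#else
  return p;  /* other targets: nothing to strip in this sketch */
#endif
}
```

A RETURN_ADDRESS-style macro, or _mcount's handling of frompc, could then pass the raw return address through such a helper before using it.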
Szabolcs Nagy	1be3d6eb82	aarch64: Add pac-ret support to assembly files Use return address signing in assembly files for functions that save LR when pac-ret is enabled in the compiler. The GNU property note for PAC-RET is not meaningful to the dynamic linker so it is not strictly required, but it may be used to track the security property of binaries. (The PAC-RET property is only set if BTI is set too because BTI implies working GNU property support.) Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Szabolcs Nagy	9e1751e6d6	aarch64: configure check for pac-ret code generation Return address signing requires unwinder support, which is present in libgcc since >=gcc-7, however due to bugs the support may be broken in <gcc-10 (and similarly there may be issues in custom unwinders), so pac-ret is not always safe to use. So in assembly code glibc should only use pac-ret if the compiler uses it too. Unfortunately there is no predefined feature macro for it set by the compiler so pac-ret is inferred from the code generation. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Szabolcs Nagy	de9301c02e	aarch64: ensure objects are BTI compatible When glibc is built with branch protection (i.e. with a gcc configured with --enable-standard-branch-protection), all glibc binaries should be BTI compatible and marked as such. It is easy to link BTI incompatible objects by accident and this is silent currently which is usually not the expectation, so this is changed into a link error. (There is no linker flag for failing on BTI incompatible inputs so all warnings are turned into fatal errors outside the test system when building glibc with branch protection.) Unfortunately, outlined atomic functions are not BTI compatible in libgcc (PR libgcc/96001), so to build glibc with current gcc use 'CC=gcc -mno-outline-atomics', this should be fixed in libgcc soon and then glibc can be built and tested without such workarounds. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:38 +01:00
Sudakshina Das	605338745b	aarch64: enable BTI at runtime Binaries can opt-in to using BTI via an ELF object file marking. The dynamic linker has to then mprotect the executable segments with PROT_BTI. In case of static linked executables or in case of the dynamic linker itself, PROT_BTI protection is done by the operating system. On AArch64 glibc uses PT_GNU_PROPERTY instead of PT_NOTE to check the properties of a binary because PT_NOTE can be unreliable with old linkers (old linkers just append the notes of input objects together and add them to the output without checking them for consistency which means multiple incompatible GNU property notes can be present in PT_NOTE). BTI property is handled in the loader even if glibc is not built with BTI support, so in theory user code can be BTI protected independently of glibc. In practice though user binaries are not marked with the BTI property if glibc has no support because the static linked libc objects (crt files, libc_nonshared.a) are unmarked. This patch relies on Linux userspace API that is not yet in a linux release but in v5.8-rc1 so scheduled to be in Linux 5.8. Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
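To illustrate what checking the BTI marking involves, the sketch below scans the descriptor of an NT_GNU_PROPERTY_TYPE_0 note for the AArch64 feature property and tests its BTI bit. It assumes a 64-bit ELF (property data padded to 8 bytes) in host byte order, and it starts at the descriptor, after the note header and "GNU" name. The constants mirror GNU_PROPERTY_AARCH64_FEATURE_1_AND and GNU_PROPERTY_AARCH64_FEATURE_1_BTI from `<elf.h>`; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FEATURE_1_AND 0xc0000000u  /* GNU_PROPERTY_AARCH64_FEATURE_1_AND */
#define FEATURE_1_BTI 0x1u         /* GNU_PROPERTY_AARCH64_FEATURE_1_BTI */

/* Hypothetical helper: return 1 if the property array in DESC (SIZE bytes)
   marks the object as BTI compatible, 0 otherwise or if malformed.  */
static int
desc_has_bti (const unsigned char *desc, size_t size)
{
  size_t off = 0;
  while (size - off >= 8)
    {
      uint32_t type, datasz;
      memcpy (&type, desc + off, 4);        /* pr_type */
      memcpy (&datasz, desc + off + 4, 4);  /* pr_datasz */
      off += 8;
      size_t padded = ((size_t) datasz + 7) & ~(size_t) 7;
      if (padded > size - off)
        return 0;                           /* truncated or malformed */
      if (type == FEATURE_1_AND && datasz == 4)
        {
          uint32_t bits;
          memcpy (&bits, desc + off, 4);
          return (bits & FEATURE_1_BTI) != 0;
        }
      off += padded;                        /* skip to the next property */
    }
  return 0;
}
```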
Szabolcs Nagy	5f846c8b0d	aarch64: fix RTLD_START for BTI Tailcalls must use x16 or x17 for the indirect branch instruction to be compatible with code that uses BTI c at function entries. (Other forms of indirect branches can only land on BTI j.) Also added a BTI c at the ELF entry point of rtld, this is not strictly necessary since the kernel does not use indirect branch to get there, but it seems safest once building glibc itself with BTI is supported. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Sudakshina Das	91181954f9	aarch64: Add BTI support to assembly files To enable building glibc with branch protection, assembly code needs BTI landing pads and ELF object file markings in the form of a GNU property note. The landing pads are unconditionally added to all functions that may be indirectly called. When the code segment is not mapped with PROT_BTI these instructions are nops. They are kept in the code when BTI is not supported so that the layout of performance critical code is unchanged across configurations. The GNU property notes are only added when there is support for BTI in the toolchain, because old binutils does not handle the notes right. (Does not know how to merge them nor to put them in PT_GNU_PROPERTY segment instead of PT_NOTE, and some versions of binutils emit warnings about the unknown GNU property. In such cases the produced libc binaries would not have valid ELF marking so BTI would not be enabled.) Note: functions using ENTRY or ENTRY_ALIGN now start with an additional BTI c, so alignment of the following code changes, but ENTRY_ALIGN_AND_PAD was fixed so there is no change to the existing code layout. Some string functions may need to be tuned for optimal performance after this commit. Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Szabolcs Nagy	2a4c2dde49	aarch64: Rename place holder .S files to .c The compiler can add required elf markings based on CFLAGS but the assembler cannot, so using C code for empty files creates less of a maintenance problem. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Szabolcs Nagy	1b0a4f58f5	aarch64: configure test for BTI support Check BTI support in the compiler and linker. The check also requires READELF that understands the BTI GNU property note. It is expected to succeed with gcc >=gcc-9 configured with --enable-standard-branch-protection and binutils >=binutils-2.33. Note: passing -mbranch-protection=bti in CFLAGS when building glibc may not be enough to get a glibc that supports BTI because crtbegin* and crtend* provided by the compiler needs to be BTI compatible too. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-07-08 15:02:37 +01:00
Alex Butler	03e1378f94	aarch64: MTE compatible strncmp Add support for MTE to strncmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	adac54ffc5	aarch64: MTE compatible strcmp Add support for MTE to strcmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	79160c06c7	aarch64: MTE compatible strrchr Add support for MTE to strrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	df06b0d90f	aarch64: MTE compatible memrchr Add support for MTE to memrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	7ff899969f	aarch64: MTE compatible memchr Add support for MTE to memchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Gabor Kertesz <gabor.kertesz@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	bb2c12aecb	aarch64: MTE compatible strcpy Add support for MTE to strcpy. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
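The idea shared by these MTE commits is that every access stays within one 16-byte tag granule, so a read never touches a neighboring granule that may carry a different tag. The portable-C model below is not the assembly algorithm, and its name is hypothetical: it aligns the start index down to a granule and ignores the bytes before the string in that first granule. A vector implementation would instead load each aligned granule with a single access and mask those leading bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE 16  /* MTE tags memory in 16-byte granules */

/* Hypothetical model: length of the string at buf[start], examined one
   16-byte granule at a time.  buf must be 16-byte aligned and contain the
   terminating NUL; no granule past the one holding the NUL is touched.  */
static size_t
granule_strlen (const unsigned char *buf, size_t start)
{
  assert ((uintptr_t) buf % MTE_GRANULE == 0);
  size_t chunk = start & ~(size_t) (MTE_GRANULE - 1);  /* align down */
  size_t skip = start - chunk;  /* bytes of the first granule before start */
  for (;; chunk += MTE_GRANULE, skip = 0)
    for (size_t i = skip; i < MTE_GRANULE; i++)
      if (buf[chunk + i] == '\0')
        return chunk + i - start;
}
```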
Adhemerval Zanella	ea04f02131	aarch64: Remove fpu Makefile The -fno-math-errno is already added by default and the minimum required GCC to build glibc (6.2) makes the -ffinite-math-only flag superfluous. Checked on aarch64-linux-gnu.	2020-06-22 11:09:50 -03:00
Adhemerval Zanella	271afad8f4	aarch64: Use math-use-builtins for ceil{f} The define is already set on the math-use-builtins-ceil.h, the patch just removes the implementations (it was missed on `c9feb1be93`). Checked on aarch64-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	e80501a5c9	math: Decompose math-use-builtins.h Each symbol's definitions are moved to a separate file that covers all symbol types (float, double, long double, and float128). This allows architectures to set support without the boilerplate of copying default values. Checked with a build on the affected ABIs.	2020-06-22 11:09:45 -03:00
Andrea Corallo	a365ac45b7	aarch64: MTE compatible strlen Introduce an Arm MTE compatible strlen implementation. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance on modern cores. On cores with less efficient Advanced SIMD implementation such as Cortex-A53 it can be slower. Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-09 09:21:11 +01:00
Andrea Corallo	49beaaec1b	aarch64: MTE compatible strchr Introduce an Arm MTE compatible strchr implementation. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-09 09:20:27 +01:00
Andrea Corallo	f7de454f20	aarch64: MTE compatible strchrnul Introduce an Arm MTE compatible strchrnul implementation. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-09 09:20:27 +01:00
Krzysztof Koch	d1f75e9644	AArch64: Merge Falkor memcpy and memmove implementations Falkor's memcpy and memmove share some implementation details, therefore, the two routines are moved to a single source file for code reuse. The two routines now share code for small and medium copies (up to and including 128 bytes). Large copies in memcpy do not handle overlap correctly, consequently, the loops for moving/copying more than 128 bytes stay separate for memcpy and memmove. To increase code reuse a number of small modifications were made: 1. The old implementation of memcpy copied the first 16-bytes as soon as the size of data was determined to be greater than 32 bytes. For memcpy code to also work when copying small/medium overlapping data, the first load and store was moved to the large copy case. 2. Medium memcpy case no longer assumes that 16 bytes were already copied and uses 8 registers to copy up to 128 bytes. 3. Small case for memmove was enlarged to that of memcpy, which is less than or equal to 32 bytes. 4. Medium case for memmove was enlarged to that of memcpy, which is less than or equal to 128 bytes. Other changes include: 1. Improve alignment of existing loop bodies. 2. 'Delouse' memmove and memcpy input arguments. Make sure that upper 32-bits of input registers are zeroed if unused. 3. Do one more iteration in memmove loops and reduce the number of copies made from the start/end of the buffer, depending on the direction of the memmove loop. Benchmarking: Looking at the results from bench-memcpy-random.out, we can see that now memmove_falkor is about 5% faster than memcpy_falkor_old, while memmove_falkor_old was more than 15% slower. The memcpy implementation remained largely unmodified, so there is no significant performance change. The reason for such a significant memmove performance gain is the increase of the upper bound on the small copy case to 32 bytes and the increase of the upper bound on the medium copy case to 128 bytes. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-06-08 14:13:05 +01:00
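The direction choice that keeps the large memmove loops separate from memcpy can be sketched with the usual overlap test: if the destination starts inside the source range, copy from the end backwards. This byte loop is only a model with a hypothetical name; the Falkor code moves data in much larger blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of memmove's direction choice.  */
static void *
memmove_model (void *dstv, const void *srcv, size_t n)
{
  unsigned char *dst = dstv;
  const unsigned char *src = srcv;
  /* Unsigned difference: if dst < src it wraps to a huge value, so any
     destination outside (src, src + n) takes the forward path.  */
  if ((uintptr_t) dst - (uintptr_t) src >= n)
    for (size_t i = 0; i < n; i++)     /* forward copy */
      dst[i] = src[i];
  else
    for (size_t i = n; i-- > 0;)       /* dst overlaps the tail: backwards */
      dst[i] = src[i];
  return dstv;
}
```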
Vineet Gupta	c9feb1be93	aarch/fpu: use generic builtins based math functions introduce sysdep header math-use-builtins.h to replace aarch64 implementations with corresponding generic ones. - newly introduced generic sqrt{,f}, fma{,f} - existing floor{,f}, nearbyint{,f}, rint{,f}, round{,f}, trunc{,f} - Note that generic copysign was already enabled (via generic math-use-builtins.h) now through sysdep header Tested with build-many-glibcs for aarch64-linux-gnu This is a non-functional change and aarch64 libm before/after was byte invariant as compared below: | cd /SCRATCH/vgupta/gnu/install-glibc-A-baseline | for i in `find . -name libm-2.31.9000.so`; do | echo $i; diff $i /SCRATCH/vgupta/gnu/install-glibc-C-reduce-scope/$i ; | echo $?; | done | ./aarch64-linux-gnu/lib64/libm-2.31.9000.so | 0 | ./arm-linux-gnueabi/lib/libm-2.31.9000.so | 0 | ./x86_64-linux-gnu/lib64/libm-2.31.9000.so | 0 | ./arm-linux-gnueabihf/lib/libm-2.31.9000.so | 0 | ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so | 0 | ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so | 0 | ./powerpc-linux-gnu/lib/libm-2.31.9000.so | 0 | ./microblaze-linux-gnu/lib/libm-2.31.9000.so | 0 | ./nios2-linux-gnu/lib/libm-2.31.9000.so | 0 | ./hppa-linux-gnu/lib/libm-2.31.9000.so | 0 | ./s390x-linux-gnu/lib64/libm-2.31.9000.so | 0 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-06-03 10:23:33 -07:00
Lexi Shao	59b64f9cbb	aarch64: fix strcpy and strnlen for big-endian [BZ #25824] This patch fixes the optimized implementation of strcpy and strnlen on a big-endian arm64 machine. The optimized method uses NEON, which can process 128 bits with one instruction. On a big-endian machine, the byte order should be reversed across the whole 128-bit quadword. But the instruction rev64 datav.16b, datav.16b reverses the bytes within each of the two 64-bit halves rather than across all 128 bits. There is no such instruction as rev128 to reverse all 128 bits, but we can fix this by loading the data registers accordingly. Fixes 0237b61526e7("aarch64: Optimized implementation of strcpy") and 2911cb68ed3d("aarch64: Optimized implementation of strnlen"). Signed-off-by: Lexi Shao <shaolexi@huawei.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2020-05-15 12:15:56 +01:00
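The difference between what `rev64 v.16b, v.16b` does and the full 128-bit reversal a big-endian fix would need can be modelled on a byte array. This is an illustrative C model only; `rev64_16b` and `rev128_16b` are hypothetical names, and as the commit notes, no single AArch64 instruction performs the second operation.

```c
#include <stdint.h>

/* Model of `rev64 v.16b, v.16b`: reverse the bytes within each 64-bit
   half of a 128-bit vector, without swapping the two halves. */
static void rev64_16b(uint8_t out[16], const uint8_t in[16])
{
    for (int half = 0; half < 2; half++)
        for (int i = 0; i < 8; i++)
            out[half * 8 + i] = in[half * 8 + 7 - i];
}

/* Model of a full 128-bit byte reversal (a hypothetical "rev128"). */
static void rev128_16b(uint8_t out[16], const uint8_t in[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = in[15 - i];
}
```

With input bytes 0..15, the first model yields 7..0 followed by 15..8, while the second yields 15..0; that mismatch is why the optimized routines gave wrong results on big-endian.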
Adhemerval Zanella	6a0474c769	Update aarch64 libm-test-ulps	2020-04-08 13:52:44 -03:00
Adhemerval Zanella	1c15464ca0	math: Remove inline math tests With mathinline removal there is no need to keep building and testing inline math tests. The gen-libm-tests.py support to generate ULP_I_* is removed and all libm-test-ulps files are updated to no longer have the i{float,double,ldouble} entries. The support for no-test-inline is also removed from gen-auto-libm-tests, and the auto-libm-test-out-* files were regenerated. Checked on x86_64-linux-gnu and i686-linux-gnu.	2020-03-19 11:45:44 -03:00
Wilco Dijkstra	7000651327	[AArch64] Improve integer memcpy Further optimize integer memcpy. Small cases now include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Comments have been rewritten.	2020-03-11 17:15:25 +00:00
Florian Weimer	f4349837d9	Introduce <elf-initfini.h> and ELF_INITFINI for all architectures This supersedes the init_array sysdeps directory. It allows us to check for ELF_INITFINI in both C and assembler code, and skip DT_INIT and DT_FINI processing completely on newer architectures. A new header file is needed because <dl-machine.h> is incompatible with assembler code. <sysdep.h> is compatible with assembler code, but it cannot be included in all assembler files because on some architectures, it redefines register names, and some assembler files conflict with that. <elf-initfini.h> is replicated for legacy architectures which need DT_INIT/DT_FINI support. New architectures follow the generic default and disable it.	2020-02-18 15:12:25 +01:00
Andreas Schwab	4970c9e0b5	nptl: add missing pthread-offsets.h All architectures using their own definition of struct __pthread_rwlock_arch_t need to provide their own pthread-offsets.h.	2020-02-10 17:01:21 +01:00
Wilco Dijkstra	220622dde5	Add libm_alias_finite for _finite symbols This patch adds a new macro, libm_alias_finite, to define all _finite symbols. It sets all _finite symbols as compat symbols based on their first version (obtained from the definition in the build-generated first-versions.h). The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need special treatment in code that is shared between long double and float128. It is done by adding a list, similar to the internal symbol redefinition list, in sysdeps/ieee754/float128/float128_private.h. Alpha also needs some tricky changes to ensure we still emit 2 compat symbols for sqrt(f). Passes buildmanyglibc. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2020-01-03 10:02:04 -03:00
Joseph Myers	d614a75396	Update copyright dates with scripts/update-copyrights.	2020-01-01 00:14:33 +00:00
Xuelei Zhang	863d775c48	aarch64: add default memcpy version for kunpeng920 Checked on aarch64-linux-gnu.	2019-12-27 11:59:37 -03:00
Xuelei Zhang	10df95cdaf	aarch64: ifunc rename for kunpeng Rename ifunc for kunpeng to kunpeng920, and modify the corresponding function files including IS_KUNPENG920 judgement. Checked on aarch64-linux-gnu.	2019-12-27 11:59:51 -03:00
Xuelei Zhang	64297d49b3	aarch64: Modify error-shown comments for strcpy Checked on aarch64-linux-gnu.	2019-12-27 11:59:37 -03:00
Xuelei Zhang	525de033a9	aarch64: Optimized memset for Kunpeng processor. Due to a branch prediction issue on the Kunpeng processor, we found that memset_generic performs poorly for medium sizes, so we restructured the logic and unrolled the loop in set_long 4 times to solve the problem; even sizes below 1K benefit. Another change is that DC ZVA does not seem to help when setting zero, so we discarded it and used set_long to set zero instead. Fewer branches and predictions also slightly improve the zero case. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2019-12-19 16:31:04 -03:00
Xuelei Zhang	c2150769d0	aarch64: Optimized strlen for strlen_asimd Optimize the strlen implementation by using vector operations and loop unrolling in the main loop. Compared to __strlen_generic, it reduces latency of cases in bench-strlen by 7%~18% when the length of src is greater than 128 bytes, with gains throughout the benchmark. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2019-12-19 16:31:04 -03:00
Xuelei Zhang	a7611806d5	aarch64: Optimized implementation of memrchr Considering the excellent performance of memchr.S on glibc 2.30, the same algorithm is used to search for the character. Compared to memrchr.c, this memrchr.S achieves an average performance improvement of 58% based on benchtest and its extension cases. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2019-12-19 16:31:04 -03:00
Xuelei Zhang	2911cb68ed	aarch64: Optimized implementation of strnlen Optimize the strnlen implementation by using vector operations and loop unrolling in the main loop. Compared to aarch64/strnlen.S, it reduces latency of cases in bench-strnlen by 11%~24% when the length of src is greater than 64 bytes, with gains throughout the benchmark. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2019-12-19 16:31:04 -03:00
Xuelei Zhang	0237b61526	aarch64: Optimized implementation of strcpy Optimize the strcpy implementation by using vector loads and operations in the main loop. Compared to aarch64/strcpy.S, it reduces latency of cases in bench-strlen by 5%~18% when the length of src is greater than 64 bytes, with gains throughout the benchmark. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2019-12-19 16:31:04 -03:00
Xuelei Zhang	233efd433d	aarch64: Optimized implementation of memcmp The loop body is expanded from a 16-byte comparison to a 64-byte comparison, and the ldp addressing mode is changed from post-index to base plus offset. As a result, comparisons of more than 128 bytes are around 18% faster overall. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2019-12-19 16:31:04 -03:00
Florian Weimer	4db71d2f98	elf: Do not run IFUNC resolvers for LD_DEBUG=unused [BZ #24214 ] This commit adds missing skip_ifunc checks to aarch64, arm, i386, sparc, and x86_64. A new test case ensures that IRELATIVE IFUNC resolvers do not run in various diagnostic modes of the dynamic loader. Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>	2019-12-02 14:55:22 +01:00
Adhemerval Zanella	7ddac7f265	nptl: Add default pthread-offsets.h This patch adds a default pthread-offsets.h based on default thread definitions from struct_mutex.h and struct_rwlock.h. The idea is to simplify new ports inclusion. Checked with a build on affected abis. Change-Id: I7785a9581e651feb80d1413b9e03b5ac0452668a	2019-11-26 13:53:36 +00:00
Adhemerval Zanella	7df8af43ad	nptl: Add struct_rwlock.h This patch adds a new generic __pthread_rwlock_arch_t definition meant to be used by new ports. Its layout mimics the current usage on some 64-bit ports and it allows some ports to use the generic definition. The arch __pthread_rwlock_arch_t definition is moved from pthreadtypes-arch.h to another arch-specific header (struct_rwlock.h). Also the static initialization macro for pthread_rwlock_t is set to use an arch-defined one (__PTHREAD_RWLOCK_INITIALIZER), which simplifies its implementation. The default pthread_rwlock_t layout differs from current ports in that: 1. The internal layout is the same for 32 bits and 64 bits. 2. The internal flag is an unsigned short, so it should not require additional padding to align to a word boundary (if that is the case for the ABI). Checked with a build on affected abis. Change-Id: I776a6a986c23199929d28a3dcd30272db21cd1d0	2019-11-26 13:53:36 +00:00
Adhemerval Zanella	1c3f9acf1f	nptl: Add struct_mutex.h The current way of defining the common mutex definition for POSIX and C11 in pthreadtypes-arch.h (added by commit `06be6368da`) is not really the best option for newer ports. It requires defining some misleading flags that should always be defined as 0 (__PTHREAD_COMPAT_PADDING_MID and __PTHREAD_COMPAT_PADDING_END), it exposes options used solely for linuxthreads compat mode (__PTHREAD_MUTEX_USE_UNION and __PTHREAD_MUTEX_NUSERS_AFTER_KIND), and it requires newer ports to explicitly define them (adding more boilerplate code). This patch adds a new default __pthread_mutex_s definition meant to be used by newer ports. Its layout mimics the current usage on both 32-bit and 64-bit ports and it allows most ports to use the generic definition. Only ports that use some arch-specific definition (such as hardware lock-elision or linuxthreads compat) require specific headers. For 32 bit, the generic definitions mimic the other 32-bit ports by using a union to define the fields used by adaptive and robust mutexes (thus not allowing both uses at the same time) and by using a singly linked list for robust mutexes. Both decisions seem to follow what recent ports have done and make the resulting pthread_mutex_t/mtx_t object smaller. Also the static initialization macro for pthread_mutex_t is set to use a macro __PTHREAD_MUTEX_INITIALIZER, which the architecture can redefine in its struct_mutex.h if it requires additional fields to be initialized. Checked with a build on affected abis. Change-Id: I30a22c3e3497805fd6e52994c5925897cffcfe13	2019-11-26 13:53:36 +00:00
Adhemerval Zanella	0377a7fde6	nptl: Remove rwlock elision definitions The new rwlock implementation added by `cc25c8b4c1` (2.25) removed support for lock-elision. This patch removes the remaining unused arch-specific definitions. Checked with a build against all affected ABIs. Change-Id: I5dec8af50e3cd56d7351c52ceff4aa3771b53cd6	2019-11-26 13:53:36 +00:00
Adhemerval Zanella	48dbce60cf	nptl: Add tests for internal pthread_rwlock_t offsets This patch adds new build tests to check the internal field offsets of the internal pthread_rwlock_t definition. Although the '__data.__flags' field layout should be preserved due to static initializers, the patch also adds tests for the futexes that may be used in shared memory (although using different libc versions in such a scenario is not really supported). Checked with a build against all affected ABIs. Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148	2019-11-26 13:53:36 +00:00
Adhemerval Zanella	71d260c107	nptl: Cleanup mutex internal offset tests The offsets of pthread_mutex_t __data.__nusers, __data.__spins, __data.elision, __data.list are not required to be constant over the releases. Only the __data.__kind is used for static initializers. This patch also adds an additional size check for __data.__kind. Checked with a build against affected ABIs. Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f	2019-11-26 13:53:36 +00:00
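A build-time offset test fails compilation when a field moves. A minimal C11 sketch of the idea uses `_Static_assert` with `offsetof`. The struct below is hypothetical and only loosely modelled on an rwlock; it is not glibc's `__pthread_rwlock_arch_t`, and glibc's own tests may be generated and structured differently.

```c
#include <stddef.h>

/* Hypothetical layout for illustration only; not glibc's definition. */
struct example_rwlock
{
    unsigned int readers;
    unsigned int writers;
    unsigned int wrphase_futex;   /* futex word possibly shared across processes */
    unsigned int writers_futex;
    unsigned char flags;          /* field fixed by static initializers */
};

/* The build fails if any of these fields changes position. */
_Static_assert (offsetof (struct example_rwlock, wrphase_futex) == 8,
                "wrphase_futex offset changed");
_Static_assert (offsetof (struct example_rwlock, writers_futex) == 12,
                "writers_futex offset changed");
_Static_assert (offsetof (struct example_rwlock, flags) == 16,
                "flags offset changed");
```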
Krzysztof Koch	b9f145df85	aarch64: Increase small and medium cases for __memcpy_generic Increase the upper bound on medium cases from 96 to 128 bytes. Now, up to 128 bytes are copied unrolled. Increase the upper bound on small cases from 16 to 32 bytes so that copies of 17-32 bytes are not impacted by the larger medium case. Benchmarking: The attached figures show relative timing difference with respect to 'memcpy_generic', which is the existing implementation. 'memcpy_med_128' denotes the version of memcpy_generic with only the medium case enlarged. The 'memcpy_med_128_small_32' numbers are for the version of memcpy_generic submitted in this patch, which has both medium and small cases enlarged. The figures were generated using the script from: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html Depending on the platform, the performance improvement in the bench-memcpy-random.c benchmark ranges from 6% to 20% between the original and final version of memcpy.S. Tested against GLIBC testsuite and randomized tests.	2019-11-12 17:08:18 +00:00
Alistair Francis	aa706e13f4	Split up endian.h to minimize exposure of BYTE_ORDER. With only two exceptions (sys/types.h and sys/param.h, both of which historically might have defined BYTE_ORDER) the public headers that include <endian.h> only want to be able to test __BYTE_ORDER against ___ENDIAN. This patch creates a new bits/endian.h that can be included by any header that wants to be able to test __BYTE_ORDER and/or __FLOAT_WORD_ORDER against the ___ENDIAN constants, or needs __LONG_LONG_PAIR. It only defines macros in the implementation namespace. The existing bits/endian.h (which could not be included independently of endian.h, and only defines __BYTE_ORDER and maybe __FLOAT_WORD_ORDER) is renamed to bits/endianness.h. I also took the opportunity to canonicalize the form of this header, which we are stuck with having one copy of per architecture. Since they are so short, this means git doesn’t understand that they were renamed from existing headers, sigh. endian.h itself is a nonstandard header and its only remaining use from a standard header is guarded by __USE_MISC, so I dropped the __USE_MISC conditionals from around all of the public-namespace things it defines. (This means, an application that requests strict library conformance but includes endian.h will still see the definition of BYTE_ORDER.) A few changes to specific bits/endian(ness).h variants deserve mention: - sysdeps/unix/sysv/linux/ia64/bits/endian.h is moved to sysdeps/ia64/bits/endianness.h. If I remember correctly, ia64 did have selectable endianness, but we have assembly code in sysdeps/ia64 that assumes it’s little-endian, so there is no reason to treat the ia64 endianness.h as linux-specific. - The C-SKY port does not fully support big-endian mode, the compile will error out if __CSKYBE__ is defined. - The PowerPC port had extra logic in its bits/endian.h to detect a broken compiler, which strikes me as unnecessary, so I removed it. 
- The only files that defined __FLOAT_WORD_ORDER always defined it to the same value as __BYTE_ORDER, so I removed those definitions. The SH bits/endian(ness).h had comments inconsistent with the actual setting of __FLOAT_WORD_ORDER, which I also removed. - I removed copyright boilerplate from the few bits/endian(ness).h headers that had it; these files record a single fact in a fashion dictated by an external spec, so I do not think they are copyrightable. As long as I was changing every copy of ieee754.h in the tree, I noticed that only the MIPS variant includes float.h, because it uses LDBL_MANT_DIG to decide among three different versions of ieee854_long_double. This patch makes it not include float.h when GCC’s intrinsic __LDBL_MANT_DIG__ is available. * string/endian.h: Unconditionally define LITTLE_ENDIAN, BIG_ENDIAN, PDP_ENDIAN, and BYTE_ORDER. Condition byteswapping macros only on !__ASSEMBLER__. Move the definitions of __BIG_ENDIAN, __LITTLE_ENDIAN, __PDP_ENDIAN, __FLOAT_WORD_ORDER, and __LONG_LONG_PAIR to... * string/bits/endian.h: ...this new file, which includes the renamed header bits/endianness.h for the definition of __BYTE_ORDER and possibly __FLOAT_WORD_ORDER. * string/Makefile: Install bits/endianness.h. * include/bits/endian.h: New wrapper. * bits/endian.h: Rename to bits/endianness.h. Add multiple-include guard. Rewrite the comment explaining what the machine-specific variants of this file should do. * sysdeps/unix/sysv/linux/ia64/bits/endian.h: Move to sysdeps/ia64. 
* sysdeps/aarch64/bits/endian.h * sysdeps/alpha/bits/endian.h * sysdeps/arm/bits/endian.h * sysdeps/csky/bits/endian.h * sysdeps/hppa/bits/endian.h * sysdeps/ia64/bits/endian.h * sysdeps/m68k/bits/endian.h * sysdeps/microblaze/bits/endian.h * sysdeps/mips/bits/endian.h * sysdeps/nios2/bits/endian.h * sysdeps/powerpc/bits/endian.h * sysdeps/riscv/bits/endian.h * sysdeps/s390/bits/endian.h * sysdeps/sh/bits/endian.h * sysdeps/sparc/bits/endian.h * sysdeps/x86/bits/endian.h: Rename to endianness.h; canonicalize form of file; remove redundant definitions of __FLOAT_WORD_ORDER. * sysdeps/powerpc/bits/endianness.h: Remove logic to check for broken compilers. * ctype/ctype.h * sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h * sysdeps/arm/nptl/bits/pthreadtypes-arch.h * sysdeps/csky/nptl/bits/pthreadtypes-arch.h * sysdeps/ia64/ieee754.h * sysdeps/ieee754/ieee754.h * sysdeps/ieee754/ldbl-128/ieee754.h * sysdeps/ieee754/ldbl-128ibm/ieee754.h * sysdeps/m68k/nptl/bits/pthreadtypes-arch.h * sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h * sysdeps/mips/ieee754/ieee754.h * sysdeps/mips/nptl/bits/pthreadtypes-arch.h * sysdeps/nios2/nptl/bits/pthreadtypes-arch.h * sysdeps/nptl/pthread.h * sysdeps/riscv/nptl/bits/pthreadtypes-arch.h * sysdeps/sh/nptl/bits/pthreadtypes-arch.h * sysdeps/sparc/sparc32/ieee754.h * sysdeps/unix/sysv/linux/generic/bits/stat.h * sysdeps/unix/sysv/linux/generic/bits/statfs.h * sysdeps/unix/sysv/linux/sys/acct.h * wctype/bits/wctype-wchar.h: Include bits/endian.h, not endian.h. * sysdeps/unix/sysv/linux/hppa/pthread.h: Don’t include endian.h. * sysdeps/mips/ieee754/ieee754.h: Use __LDBL_MANT_DIG__ in ifdefs, instead of LDBL_MANT_DIG. Only include float.h when __LDBL_MANT_DIG__ is not predefined, in which case define __LDBL_MANT_DIG__ to equal LDBL_MANT_DIG.	2019-10-01 14:54:46 -07:00
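From application code, the usual pattern is to test `__BYTE_ORDER` against the `__LITTLE_ENDIAN`/`__BIG_ENDIAN` constants after including `<endian.h>`. The sketch below (glibc-specific header; function names are hypothetical) compares that compile-time answer with a runtime probe of a 32-bit value's first byte.

```c
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Compile-time answer from glibc's <endian.h>. */
static int compile_time_little_endian(void)
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
    return 1;
#else
    return 0;
#endif
}

/* Runtime answer: on a little-endian machine the least significant byte
   of the value 1 is stored first. */
static int runtime_little_endian(void)
{
    uint32_t v = 1;
    unsigned char first;
    memcpy(&first, &v, 1);
    return first == 1;
}
```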
Paul Eggert	5a82c74822	Prefer https to http for gnu.org and fsf.org URLs Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '.po' \ ! -name 'ChangeLog' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. 
git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: * error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: * error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S	2019-09-07 02:43:31 -07:00
Feng Xue	b68fabfbbc	aarch64: Disable using DC ZVA in emag memset * sysdeps/aarch64/multiarch/memset_base64.S (DC_ZVA_THRESHOLD): Disable DC ZVA code if this macro is defined as zero. * sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD): Change to zero to disable using DC ZVA.	2019-08-14 10:58:21 +08:00
Joseph Myers	0175c9e9be	Declare most TS 18661-1 interfaces for C2X. C2X adds the interfaces from TS 18661-1, and all except a handful in Annex F are unconditionally visible in C2X rather than only visible when __STDC_WANT_IEC_60559_BFP_EXT__ is defined. This patch updates glibc headers accordingly: most uses of __GLIBC_USE (IEC_60559_BFP_EXT) are changed to a new __GLIBC_USE (IEC_60559_BFP_EXT_C2X). (Regarding totalorder and totalordermag, the type-generic macros in tgmath.h will go away when the functions are changed to take pointer arguments.) * bits/libc-header-start.h (__GLIBC_USE_IEC_60559_BFP_EXT): Update comment. (__GLIBC_USE_IEC_60559_BFP_EXT_C2X): New macro. * bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Change to [__GLIBC_USE (IEC_60559_BFP_EXT_C2X)]. * include/limits.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * math/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * math/math.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * stdlib/bits/stdlib-ldbl.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * stdlib/stdint.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * stdlib/stdlib.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/csky/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/m68k/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/microblaze/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. 
* sysdeps/riscv/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise. * math/bits/mathcalls.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise, except for totalorder, totalordermag, getpayload, setpayload and setpayloadsig. * math/tgmath.h [__GLIBC_USE (IEC_60559_BFP_EXT)]: Likewise, except for totalorder and totalordermag.	2019-08-13 11:28:51 +00:00
Szabolcs Nagy	30ba037546	aarch64: simplify the DT_AARCH64_VARIANT_PCS handling code Remove unnecessary variant_pcs field: the dynamic tag can be checked directly. * sysdeps/aarch64/dl-machine.h (elf_machine_runtime_setup): Remove the DT_AARCH64_VARIANT_PCS check. (elf_machine_lazy_rel): Use l_info[DT_AARCH64 (VARIANT_PCS)]. * sysdeps/aarch64/linkmap.h (struct link_map_machine): Remove variant_pcs.	2019-07-10 15:28:00 +01:00
Szabolcs Nagy	2b8a3c86e7	aarch64: new ifunc resolver ABI Passing a second argument to the ifunc resolver allows accessing AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2 on linux because for ilp32 to remain compatible with lp64 ABI no more than 32bit hwcap flags can be in AT_HWCAP which is already used up. Currently the relocation ordering logic does not guarantee that ifunc resolvers can call libc apis or access libc objects, so only the resolver arguments and runtime environment dependent instructions can be used to do the dispatch (this affects ifunc resolvers outside of the libc). Since ifunc resolver is target specific and only supposed to be called by the dynamic linker, the call ABI can be changed in a backward compatible way: Old call ABI passed hwcap as uint64_t, new abi sets the _IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument that's a pointer to an extendible struct. A resolver has to check the _IFUNC_ARG_HWCAP flag before accessing the second argument. The new sys/ifunc.h installed header has the definitions for the new ABI, everything is in the implementation reserved namespace. An alternative approach is to try to support extern calls from ifunc resolvers such as getauxval, but that seems non-trivial https://sourceware.org/ml/libc-alpha/2017-01/msg00468.html * sysdeps/aarch64/Makefile: Install sys/ifunc.h and add tests. * sysdeps/aarch64/dl-irel.h (elf_ifunc_invoke): Update to new ABI. * sysdeps/aarch64/sys/ifunc.h: New file. * sysdeps/aarch64/tst-ifunc-arg-1.c: New file. * sysdeps/aarch64/tst-ifunc-arg-2.c: New file.	2019-07-04 11:13:32 +01:00
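A resolver written against the new ABI must check the `_IFUNC_ARG_HWCAP` flag before touching its second argument. The sketch below mirrors the aarch64 `<sys/ifunc.h>` definitions locally (under `EXAMPLE_` names) so it builds on any host; the bit value `1ULL << 62` and the `_size`/`_hwcap`/`_hwcap2` struct layout are assumptions to check against the installed header. The implementations and the hwcap2 feature bit are hypothetical, and the resolver is called directly here rather than by the dynamic linker.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of <sys/ifunc.h> (assumed values; on aarch64 glibc, include
   the real header and use _IFUNC_ARG_HWCAP and __ifunc_arg_t instead). */
#define EXAMPLE_IFUNC_ARG_HWCAP (1ULL << 62)
typedef struct
{
    unsigned long _size;
    unsigned long _hwcap;
    unsigned long _hwcap2;
} example_ifunc_arg_t;

/* Hypothetical feature bit and implementations. */
#define EXAMPLE_HWCAP2_FEATURE (1UL << 0)
static int impl_base(void) { return 1; }
static int impl_feature(void) { return 2; }

typedef int (*impl_fn)(void);

/* Only dereference `arg` when the loader set the flag in `hwcap`
   (old loaders pass hwcap alone), and only read fields that `_size`
   says are present. */
static impl_fn example_resolver(uint64_t hwcap, const example_ifunc_arg_t *arg)
{
    if ((hwcap & EXAMPLE_IFUNC_ARG_HWCAP) != 0
        && arg != NULL
        && arg->_size >= offsetof(example_ifunc_arg_t, _hwcap2) + sizeof arg->_hwcap2
        && (arg->_hwcap2 & EXAMPLE_HWCAP2_FEATURE) != 0)
        return impl_feature;
    return impl_base;
}
```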
Szabolcs Nagy	82bc69c012	aarch64: handle STO_AARCH64_VARIANT_PCS Avoid lazy binding of symbols that may follow a variant PCS with a different register usage convention from the base PCS. Currently the lazy binding entry code does not preserve all the registers required for AdvSIMD and SVE vector calls. Saving and restoring all registers unconditionally may break existing binaries, even if they never use vector calls, because of the larger stack requirement for lazy resolution, which can be significant on an SVE system. The solution is to mark all symbols in the symbol table that may follow a variant PCS so the dynamic linker can handle them specially. In this patch such symbols are always resolved at load time, not lazily. So LD_AUDIT is currently not supported for variant PCS symbols; supporting it would require changing the _dl_runtime_profile entry, e.g. to unconditionally save/restore all registers (but pass down arg and retval registers to pltentry/exit callbacks according to the base PCS). This patch also removes a __builtin_expect from the modified code because the branch prediction hint did not seem useful. * sysdeps/aarch64/dl-dtprocnum.h: New file. * sysdeps/aarch64/dl-machine.h (DT_AARCH64): Define. (elf_machine_runtime_setup): Handle DT_AARCH64_VARIANT_PCS. (elf_machine_lazy_rel): Check STO_AARCH64_VARIANT_PCS and bind such symbols at load time. * sysdeps/aarch64/linkmap.h (struct link_map_machine): Add variant_pcs.	2019-06-13 09:45:00 +01:00
Anton Youdkevitch	32e902a94e	aarch64: thunderx2 memmove performance improvements The performance improvement is about 20%-30% for larger cases and about 1%-5% for smaller cases. Used SIMD load/store instead of GPR for large overlapping forward moves. Reused existing memcpy implementation for smaller or overlapping backward moves. Fixed the existing memcpy implementation to allow it to deal with the overlapping case. Simplified loop tails in the memcpy implementation - use branchless overlapping sequence of fixed length load/stores instead of branching depending on the size. A cleanup/optimization converting str's to stp's. Added __memmove_thunderx2 to the list of the available implementations.	2019-05-03 11:01:34 -07:00
Anton Youdkevitch	94e358f6d4	aarch64: thunderx2 memcpy implementation cleanup and streamlining Here is the updated patch for improving the long unaligned code path (the one using the "ext" instruction). 1. The always-taken conditional branch at the beginning is removed. 2. Epilogue code is placed after the end of the loop to reduce the number of branches. 3. The redundant "mov" instructions inside the loop are gone due to the changed order of the registers in the "ext" instructions inside the loop; the prologue has an additional "ext" instruction. 4. Updating count in the prologue was hoisted out as it is the same update for each prologue. 5. Invariant code of the loop epilogue was hoisted out. 6. As the current size of the ext chunk is exactly 16 instructions long, a "nop" was added at the beginning of the code sequence so that the loop entry for all the chunks is aligned. * sysdeps/aarch64/multiarch/memcpy_thunderx2.S: Cleanup branching and remove redundant code.	2019-04-05 13:59:54 -07:00
Joseph Myers	a04549c194	Break more lines before not after operators. This patch makes further coding style fixes where code was breaking lines after an operator, contrary to the GNU Coding Standards. As with the previous patch, it is limited to files following a reasonable approximation to GNU style already, and is not exhaustive; more such issues remain to be fixed. Tested for x86_64, and with build-many-glibcs.py. * dirent/dirent.h [!_DIRENT_HAVE_D_NAMLEN && _DIRENT_HAVE_D_RECLEN] (_D_ALLOC_NAMLEN): Break lines before rather than after operators. * elf/cache.c (print_cache): Likewise. * gshadow/fgetsgent_r.c (__fgetsgent_r): Likewise. * htl/pt-getattr.c (__pthread_getattr_np): Likewise. * hurd/hurdinit.c (_hurd_setproc): Likewise. * hurd/hurdkill.c (_hurd_sig_post): Likewise. * hurd/hurdlookup.c (__file_name_lookup_under): Likewise. * hurd/hurdsig.c (_hurd_internal_post_signal): Likewise. (reauth_proc): Likewise. * hurd/lookup-at.c (__file_name_lookup_at): Likewise. (__file_name_split_at): Likewise. (__directory_name_split_at): Likewise. * hurd/lookup-retry.c (__hurd_file_name_lookup_retry): Likewise. * hurd/port2fd.c (_hurd_port2fd): Likewise. * iconv/gconv_dl.c (do_print): Likewise. * inet/netinet/in.h (struct sockaddr_in): Likewise. * libio/wstrops.c (_IO_wstr_seekoff): Likewise. * locale/setlocale.c (new_composite_name): Likewise. * malloc/memusagestat.c (main): Likewise. * misc/fstab.c (fstab_convert): Likewise. * nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_usercnt): Likewise. * nss/nss_compat/compat-grp.c (getgrent_next_nss): Likewise. (getgrent_next_file): Likewise. (internal_getgrnam_r): Likewise. (internal_getgrgid_r): Likewise. * nss/nss_compat/compat-initgroups.c (getgrent_next_nss): Likewise. (internal_getgrent_r): Likewise. * nss/nss_compat/compat-pwd.c (getpwent_next_nss_netgr): Likewise. (getpwent_next_nss): Likewise. (getpwent_next_file): Likewise. (internal_getpwnam_r): Likewise. (internal_getpwuid_r): Likewise. 
* nss/nss_compat/compat-spwd.c (getspent_next_nss_netgr): Likewise. (getspent_next_nss): Likewise. (internal_getspnam_r): Likewise. * pwd/fgetpwent_r.c (__fgetpwent_r): Likewise. * shadow/fgetspent_r.c (__fgetspent_r): Likewise. * string/strchr.c (STRCHR): Likewise. * string/strchrnul.c (STRCHRNUL): Likewise. * sysdeps/aarch64/fpu/fpu_control.h (_FPU_FPCR_IEEE): Likewise. * sysdeps/aarch64/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/csky/dl-machine.h (elf_machine_rela): Likewise. * sysdeps/generic/memcopy.h (PAGE_COPY_FWD_MAYBE): Likewise. * sysdeps/generic/symbol-hacks.h (__stack_chk_fail_local): Likewise. * sysdeps/gnu/netinet/ip_icmp.h (ICMP_INFOTYPE): Likewise. * sysdeps/gnu/updwtmp.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/gnu/utmp_file.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/hppa/jmpbuf-unwind.h (_JMPBUF_UNWINDS): Likewise. * sysdeps/mach/hurd/bits/stat.h (S_ISPARE): Likewise. * sysdeps/mach/hurd/dl-sysdep.c (_dl_sysdep_start): Likewise. (open_file): Likewise. * sysdeps/mach/hurd/htl/pt-mutexattr-setprotocol.c (pthread_mutexattr_setprotocol): Likewise. * sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise. * sysdeps/mach/hurd/mmap.c (__mmap): Likewise. * sysdeps/mach/hurd/ptrace.c (ptrace): Likewise. * sysdeps/mach/hurd/spawni.c (__spawni): Likewise. * sysdeps/microblaze/dl-machine.h (elf_machine_type_class): Likewise. (elf_machine_rela): Likewise. * sysdeps/mips/mips32/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/mips/mips64/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/mips/sys/asm.h (multiple #if conditionals): Likewise. * sysdeps/posix/rename.c (rename): Likewise. * sysdeps/powerpc/novmx-sigjmp.c (__novmx__sigjmp_save): Likewise. * sysdeps/powerpc/sigjmp.c (__vmx__sigjmp_save): Likewise. * sysdeps/s390/fpu/fenv_libc.h (FPC_VALID_MASK): Likewise. * sysdeps/s390/utf8-utf16-z9.c (gconv_end): Likewise. * sysdeps/unix/grantpt.c (grantpt): Likewise. * sysdeps/unix/sysv/linux/a.out.h (N_TXTOFF): Likewise. 
* sysdeps/unix/sysv/linux/updwtmp.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/unix/sysv/linux/utmp_file.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/x86/cpu-features.c (get_common_indices): Likewise. * time/tzfile.c (__tzfile_compute): Likewise.	2019-02-25 13:19:19 +00:00
Feng Xue	83d1cc42d8	aarch64: Optimized memchr specific to AmpereComputing emag This version uses general register based memory instruction to load data, because vector register based is slightly slower in emag. Character-matching is performed on 16-byte (both size and alignment) memory block in parallel each iteration. * sysdeps/aarch64/memchr.S (__memchr): Rename to MEMCHR. [!MEMCHR](MEMCHR): Set to __memchr. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memchr_generic and memchr_nosimd. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add memchr ifuncs. * sysdeps/aarch64/multiarch/memchr.c: New file. * sysdeps/aarch64/multiarch/memchr_generic.S: Likewise. * sysdeps/aarch64/multiarch/memchr_nosimd.S: Likewise.	2019-02-01 08:14:21 -05:00
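The memchr entry above describes matching a character against a 16-byte block per iteration using general-purpose registers rather than vector registers. Below is a minimal C sketch of that general technique, the well-known SWAR "has zero byte" test, assuming a 64-bit target. `swar_memchr` and `has_zero_byte` are hypothetical names. This is not the emag assembly: the sketch uses `memcpy` loads for portability instead of 16-byte-aligned loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero iff some byte of w is zero.
   Classic SWAR test: (w - 0x01..01) & ~w & 0x80..80.  */
static inline uint64_t has_zero_byte (uint64_t w)
{
  return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Sketch of a general-register memchr: test 16 bytes (two 64-bit
   words) per iteration, then locate the match byte by byte.  */
void *swar_memchr (const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  /* Broadcast the target byte into all 8 byte lanes.  */
  uint64_t rep = 0x0101010101010101ULL * (unsigned char) c;

  while (n >= 16)
    {
      uint64_t w0, w1;
      memcpy (&w0, p, 8);       /* memcpy avoids alignment/aliasing UB.  */
      memcpy (&w1, p + 8, 8);
      /* XOR turns every matching byte into a zero byte.  */
      if (has_zero_byte (w0 ^ rep) | has_zero_byte (w1 ^ rep))
        break;                  /* A match lies in this block.  */
      p += 16;
      n -= 16;
    }
  /* Tail (or the block known to contain a match): scan bytewise.  */
  for (; n > 0; p++, n--)
    if (*p == (unsigned char) c)
      return (void *) p;
  return NULL;
}
```

The block-level test only answers "is there a match somewhere in these 16 bytes", which keeps the hot loop branch-light; finding the exact position is deferred to the rare iteration that hits.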
Feng Xue	c7d3890ff5	aarch64: Optimized memset specific to AmpereComputing emag This version uses general register based memory store instead of vector register based, for the former is faster than the latter in emag. The fact that DC ZVA size in emag is 64-byte, is used by IFUNC dispatch to select this memset, so that cost of runtime-check on DC ZVA size can be saved. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memset_emag. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __memset_emag to memset ifunc. * sysdeps/aarch64/multiarch/memset.c (libc_ifunc): Add IS_EMAG check for ifunc dispatch. * sysdeps/aarch64/multiarch/memset_base64.S: New file. * sysdeps/aarch64/multiarch/memset_emag.S: New file.	2019-02-01 07:59:18 -05:00
Wilco Dijkstra	02f440c1ef	[AArch64] Add ifunc support for Ares Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares supports 2 128-bit loads/stores, use Neon registers for memcpy by selecting __memcpy_falkor by default (we should rename this to __memcpy_simd or similar). * manual/tunables.texi (glibc.cpu.name): Add ares tunable. * sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use __memcpy_falkor for ares. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES): Add new define. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list): Add ares cpu.	2019-01-09 10:35:34 +00:00
Joseph Myers	04277e02d7	Update copyright dates with scripts/update-copyrights. * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.	2019-01-01 00:11:28 +00:00
Wilco Dijkstra	5770c0ad1e	[AArch64] Adjust writeback in non-zero memset This fixes an inefficiency in the non-zero memset. Delaying the writeback until the end of the loop is slightly faster on some cores - this shows ~5% performance gain on Cortex-A53 when doing large non-zero memsets. * sysdeps/aarch64/memset.S (MEMSET): Improve non-zero memset loop.	2018-11-20 12:37:00 +00:00
Steve Ellcey	f0da0bcf8b	Remove extra space at end of line.	2018-10-16 11:02:03 -07:00
Anton Youdkevitch	75c1aee500	aarch64: optimized memcpy implementation for thunderx2 Since aligned loads and stores are a huge performance advantage, the implementation always tries to do aligned accesses. Besides the cases where src and dst are both aligned or equally misaligned, there are cases where src and dst are misaligned by different amounts. For such cases (if the length is big enough) the ext instruction is used to merge-and-shift two memory chunks loaded from two adjacent aligned locations, and the adjusted chunk is then stored to an aligned address. Performance gain against the current T2 implementation: memcpy-large: 65K-32M: +40% - +10% memcpy-walk: 128-32M: +20% - +2%	2018-10-16 11:00:27 -07:00
Joseph Myers	c52944e8cc	Remove unnecessary math_private.h includes. After my changes to move various macros, inlines and other content from math_private.h to more specific headers, many files including math_private.h no longer need to do so. Furthermore, since the optimized inlines of various functions have been moved to include/fenv.h or replaced by use of function names GCC inlines automatically, a missing math_private.h include where one is appropriate will reliably cause a build failure rather than possibly causing code to be less well optimized while still building successfully. Thus, this patch removes includes of math_private.h that are now unnecessary. In the case of two RISC-V files, the include is replaced by one of stdbool.h because the files in question were relying on math_private.h to get a definition of bool. Tested for x86_64 and x86, and with build-many-glibcs.py. * math/fromfp.h: Do not include <math_private.h>. * math/s_cacosh_template.c: Likewise. * math/s_casin_template.c: Likewise. * math/s_casinh_template.c: Likewise. * math/s_ccos_template.c: Likewise. * math/s_cproj_template.c: Likewise. * math/s_fdim_template.c: Likewise. * math/s_fmaxmag_template.c: Likewise. * math/s_fminmag_template.c: Likewise. * math/s_iseqsig_template.c: Likewise. * math/s_ldexp_template.c: Likewise. * math/s_nextdown_template.c: Likewise. * math/w_log1p_template.c: Likewise. * math/w_scalbln_template.c: Likewise. * sysdeps/aarch64/fpu/feholdexcpt.c: Likewise. * sysdeps/aarch64/fpu/fesetround.c: Likewise. * sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise. * sysdeps/aarch64/fpu/ftestexcept.c: Likewise. * sysdeps/aarch64/fpu/s_llrint.c: Likewise. * sysdeps/aarch64/fpu/s_llrintf.c: Likewise. * sysdeps/aarch64/fpu/s_lrint.c: Likewise. * sysdeps/aarch64/fpu/s_lrintf.c: Likewise. * sysdeps/i386/fpu/s_atanl.c: Likewise. * sysdeps/i386/fpu/s_f32xaddf64.c: Likewise. * sysdeps/i386/fpu/s_f32xsubf64.c: Likewise. * sysdeps/i386/fpu/s_fdim.c: Likewise. 
* sysdeps/i386/fpu/s_logbl.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/i386/fpu/s_significandl.c: Likewise. * sysdeps/ia64/fpu/s_matherrf.c: Likewise. * sysdeps/ia64/fpu/s_matherrl.c: Likewise. * sysdeps/ieee754/dbl-64/s_atan.c: Likewise. * sysdeps/ieee754/dbl-64/s_cbrt.c: Likewise. * sysdeps/ieee754/dbl-64/s_fma.c: Likewise. * sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise. * sysdeps/ieee754/flt-32/s_cbrtf.c: Likewise. * sysdeps/ieee754/k_standardf.c: Likewise. * sysdeps/ieee754/k_standardl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_finitel.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_fpclassifyl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_isinfl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_isnanl.c: Likewise. * sysdeps/ieee754/ldbl-64-128/s_signbitl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_cbrtl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fma.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise. * sysdeps/ieee754/s_signgam.c: Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c: Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c: Likewise. * sysdeps/powerpc/power7/fpu/s_logbf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise. * sysdeps/riscv/rv64/rvd/s_round.c: Likewise. * sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvd/s_finite.c: Likewise. * sysdeps/riscv/rvd/s_fmax.c: Likewise. * sysdeps/riscv/rvd/s_fmin.c: Likewise. * sysdeps/riscv/rvd/s_fpclassify.c: Likewise. * sysdeps/riscv/rvd/s_isinf.c: Likewise. * sysdeps/riscv/rvd/s_isnan.c: Likewise. * sysdeps/riscv/rvd/s_issignaling.c: Likewise. * sysdeps/riscv/rvf/fegetround.c: Likewise. * sysdeps/riscv/rvf/feholdexcpt.c: Likewise. * sysdeps/riscv/rvf/fesetenv.c: Likewise. * sysdeps/riscv/rvf/fesetround.c: Likewise. * sysdeps/riscv/rvf/feupdateenv.c: Likewise. 
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise. * sysdeps/riscv/rvf/ftestexcept.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/riscv/rvf/s_finitef.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/riscv/rvf/s_fmaxf.c: Likewise. * sysdeps/riscv/rvf/s_fminf.c: Likewise. * sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise. * sysdeps/riscv/rvf/s_isinff.c: Likewise. * sysdeps/riscv/rvf/s_isnanf.c: Likewise. * sysdeps/riscv/rvf/s_issignalingf.c: Likewise. * sysdeps/riscv/rvf/s_nearbyintf.c: Likewise. * sysdeps/riscv/rvf/s_roundevenf.c: Likewise. * sysdeps/riscv/rvf/s_roundf.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Include <stdbool.h> instead of <math_private.h>. * sysdeps/riscv/rvf/s_rintf.c: Likewise.	2018-09-28 21:53:33 +00:00
Joseph Myers	9755bc4686	Use round functions not __round functions in glibc libm. Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __round functions to call the corresponding round names instead, with asm redirection to __round when the calls are not inlined. An additional complication arises in sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the result converted to int, gets converted by the compiler to call lroundl in the case of 32-bit long, so resulting in localplt test failures. It's logically correct to let the compiler make such an optimization; an appropriate asm redirection of lroundl to __lroundl is thus added to that file (it's not needed anywhere else). Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_roundf.c: Likewise. * sysdeps/ieee754/dbl-64/s_round.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise. * sysdeps/ieee754/float128/s_roundf128.c: Likewise. * sysdeps/ieee754/flt-32/s_roundf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise. * sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_round.c: Likewise. * sysdeps/riscv/rvf/s_roundf.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise. (round): Redirect to __round. (__roundl): Call round instead of __round. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round): Remove macro. 
[_ARCH_PWR5X] (__roundf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round functions instead of __round variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to __lroundl. (__ieee754_expl): Call roundl instead of __roundl.	2018-09-27 12:35:23 +00:00
Joseph Myers	7abf97bed9	Use trunc functions not __trunc functions in glibc libm. Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __trunc functions to call the corresponding trunc names instead, with asm redirection to __trunc when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_truncf.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise. * sysdeps/ieee754/float128/s_truncf128.c: Likewise. * sysdeps/ieee754/dbl-64/s_trunc.c: Likewise. * sysdeps/ieee754/flt-32/s_truncf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise. (ceil): Redirect to __ceil. (floor): Redirect to __floor. (trunc): Redirect to __trunc. (__truncl): Call trunc instead of __trunc. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc): Remove macro. [_ARCH_PWR5X] (__truncf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use trunc functions instead of __trunc variants. 
* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise.	2018-09-20 21:11:10 +00:00
Joseph Myers	71223ef909	Use ceil functions not __ceil functions in glibc libm. Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __ceil functions to call the corresponding ceil names instead, with asm redirection to __ceil when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_ceilf.c: Likewise. * sysdeps/ieee754/dbl-64/s_ceil.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise. * sysdeps/ieee754/float128/s_ceilf128.c: Likewise. * sysdeps/ieee754/flt-32/s_ceilf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil): Remove macro. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil functions instead of __ceil variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. 
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.	2018-09-17 20:42:06 +00:00
Joseph Myers	f29b6f17e4	Use rint functions not __rint functions in glibc libm. Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __rint functions to call the corresponding rint names instead, with asm redirection to __rint when the calls are not inlined. The x86_64 math_private.h is removed as no longer useful after this patch. This patch is relative to a tree with my floor patch <https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied, and much the same considerations arise regarding possibly replacing an IFUNC call with a direct inline expansion. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_rintf.c: Likewise. * sysdeps/alpha/fpu/s_rint.c: Likewise. * sysdeps/alpha/fpu/s_rintf.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/ieee754/dbl-64/s_rint.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise. * sysdeps/ieee754/float128/s_rintf128.c: Likewise. * sysdeps/ieee754/flt-32/s_rintf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise. * sysdeps/powerpc/fpu/s_rint.c: Likewise. * sysdeps/powerpc/fpu/s_rintf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Likewise. * sysdeps/riscv/rvf/s_rintf.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise. 
* sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/math_private.h: Remove file. * math/e_scalb.c (invalid_fn): Use rint functions instead of __rint variants. * math/e_scalbf.c (invalid_fn): Likewise. * math/e_scalbl.c (invalid_fn): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise. * sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.	2018-09-14 13:10:39 +00:00
Joseph Myers	e44acb2063	Use floor functions not __floor functions in glibc libm. Similar to the changes that were made to call sqrt functions directly in glibc, instead of __ieee754_sqrt variants, so that the compiler could inline them automatically without needing special inline definitions in lots of math_private.h headers, this patch makes libm code call floor functions directly instead of __floor variants, removing the inlines / macros for x86_64 (SSE4.1) and powerpc (POWER5). The redirection used to ensure that __ieee754_sqrt does still get called when the compiler doesn't inline a built-in function expansion is refactored so it can be applied to other functions; the refactoring is arranged so it's not limited to unary functions either (it would be reasonable to use this mechanism for copysign - removing the inline in math_private_calls.h but also eliminating unnecessary local PLT entry use in the cases (powerpc soft-float and e500v1, for IBM long double) where copysign calls don't get inlined). The point of this change is that more architectures can get floor calls inlined where they weren't previously (AArch64, for example), without needing special inline definitions in their math_private.h, and existing such definitions in math_private.h headers can be removed. Note that it's possible that in some cases an inline may be used where an IFUNC call was previously used - this is the case on x86_64, for example. I think the direct calls to floor are still appropriate; if there's any significant performance cost from inline SSE2 floor instead of an IFUNC call ending up with SSE4.1 floor, that indicates that either the function should be doing something else that's faster than using floor at all, or it should itself have IFUNC variants, or that the compiler choice of inlining for generic tuning should change to allow for the possibility that, by not inlining, an SSE4.1 IFUNC might be called at runtime - but not that glibc should avoid calling floor internally. 
(After all, all the same considerations would apply to any user program calling floor, where it might either be inlined or left as an out-of-line call allowing for a possible IFUNC.) Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (floor): Likewise. * sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_floorf.c: Likewise. * sysdeps/ieee754/dbl-64/s_floor.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise. * sysdeps/ieee754/float128/s_floorf128.c: Likewise. * sysdeps/ieee754/flt-32/s_floorf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. 
* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor): Remove macro. [_ARCH_PWR5X] (__floorf): Likewise. * sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove inline function. [__SSE4_1__] (__floorf): Likewise. * math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions instead of __floor variants. * math/w_lgamma_r_compat.c (__lgamma_r): Likewise. * math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise. * math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise. * math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise. * math/w_lgammal_r_compat.c (__lgammal_r): Likewise. * math/w_tgamma_compat.c (__tgamma): Likewise. * math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise. * math/w_tgammaf_compat.c (__tgammaf): Likewise. * math/w_tgammal_compat.c (__tgammal): Likewise. * sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise. * sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2): Likewise. * sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise. * sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise. * sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. 
* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.	2018-09-14 13:09:01 +00:00
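The floor, rint, ceil, trunc and round entries rely on asm redirection: a call to the public name is emitted as a call to an internal symbol whenever the compiler does not inline it as a built-in. The C sketch below shows only that underlying GCC/Clang asm-label mechanism under hypothetical names (`REDIRECT_UNARY`, `demo_floor`, `impl_demo_floor`, `floor_of_half`). It is not glibc's `MATH_REDIRECT` macro, and it assumes an ELF target where C names get no assembler prefix.

```c
/* Hypothetical mini version of the redirection idea: declare FUNC so
   that out-of-line calls bind to the assembler symbol PREFIX "FUNC".
   Assumes GCC or Clang on an ELF target (no user label prefix).  */
#define REDIRECT_UNARY(FUNC, PREFIX) \
  double FUNC (double) __asm__ (PREFIX #FUNC)

REDIRECT_UNARY (demo_floor, "impl_");

/* Out-of-line implementation under the internal symbol name.
   Valid only for finite |x| < 2^63.  */
double impl_demo_floor (double x)
{
  double t = (double) (long long) x;   /* truncate toward zero */
  return t > x ? t - 1.0 : t;          /* step down for negative non-integers */
}

/* Callers write the public-looking name; the call is emitted
   against the symbol "impl_demo_floor".  */
double floor_of_half (double x)
{
  return demo_floor (x / 2);
}
```

For a real libm function such as `floor`, the compiler may still expand the call inline as a built-in. The redirect only decides which symbol is referenced when it does not, which is the trade-off the floor entry discusses.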
Szabolcs Nagy	e70c176825	Add new exp and exp2 implementations Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. 
* sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.	2018-09-05 16:22:00 +01:00
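The exp/exp2 entry describes a lookup table of fractional powers of 2. The general scheme writes x = k·ln2/N + r with |r| <= ln2/(2N), so that exp(x) = 2^floor(k/N) · 2^((k mod N)/N) · exp(r): the first factor goes into the exponent bits, the second comes from the table, and exp(r) needs only a short polynomial. The C sketch below makes illustrative assumptions throughout: a 32-entry table, a plain degree-6 Taylor polynomial, no high/low split of ln 2, and no overflow, underflow, NaN or rounding-mode handling. `table_exp`, `init_table`, `exp_series` and `pow2i` are hypothetical names. The actual variants and constants are in `e_exp_data.c` and `math_config.h`.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_BITS 5                  /* Hypothetical choice: N = 32 entries.  */
#define N (1 << TABLE_BITS)

static const double LN2 = 0.69314718055994530942;

/* Taylor series for exp(y), used only to build the table (|y| < ln2).  */
static double exp_series (double y)
{
  double sum = 1.0, term = 1.0;
  for (int i = 1; i < 30; i++)
    {
      term *= y / i;
      sum += term;
    }
  return sum;
}

static double table[N];               /* table[j] = 2^(j/N).  */

static void init_table (void)
{
  for (int j = 0; j < N; j++)
    table[j] = exp_series (j * LN2 / N);
}

/* 2^e built directly from the exponent bits; assumes -1022 <= e <= 1023.  */
static double pow2i (int e)
{
  uint64_t bits = (uint64_t) (e + 1023) << 52;
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Sketch: exp(x) = 2^floor(k/N) * table[k mod N] * exp(r).  */
double table_exp (double x)
{
  double z = x * (N / LN2);
  long k = (long) (z < 0 ? z - 0.5 : z + 0.5);   /* round to nearest */
  double r = x - (double) k * (LN2 / N);          /* |r| <= ln2/(2N) */
  /* Degree-6 Taylor polynomial for exp(r) on the small interval.  */
  double p = 1 + r * (1 + r * (1.0 / 2 + r * (1.0 / 6 + r * (1.0 / 24
             + r * (1.0 / 120 + r * (1.0 / 720))))));
  long j = k & (N - 1);               /* k mod N, in [0, N).  */
  long e = (k - j) / N;               /* floor (k / N), exact.  */
  return pow2i ((int) e) * table[j] * p;
}
```

`init_table` must run once before `table_exp` is called. A larger table shrinks the range of r and so allows a shorter polynomial, at the cost of more read-only data; that is the kind of trade-off the entry says can be selected in `math_config.h`.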

344 Commits