glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-26 04:31:03 +00:00

Author	SHA1	Message	Date
Noah Goldstein	bcc41f66a4	x86: Optimize svml_s_tanhf8_core_avx2.S Optimizations are: 1. Reduce code size (-81 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Prefer registers which get short instruction encoding. 5. Reduce rodata size (-32 bytes). Result is roughly a 17-18% speedup: Function, New Time, Old Time, New / Old _ZGVdN8v_tanhf, 1.977, 2.402, 0.823	2022-06-09 12:51:22 -07:00
Noah Goldstein	3a49ce8799	x86: Add data file that can be shared by tanhf-avx2 and tanhf-sse4 tanhf-avx2 and tanhf-sse4 use the same data tables, so sharing one data table saves over 4 KB. This does increase the memory footprint of the sse4 version (all the targets are now 32 bytes instead of 16), but the code size saving generally seems worth it. NB: This patch doesn't do anything by itself; it is setup for future patches.	2022-06-09 12:51:15 -07:00
Noah Goldstein	e560b3c2d2	x86: Optimize svml_s_tanhf16_core_avx512.S Optimizations are: 1. Reduce code size (-67 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Reduce rodata usage (-448 bytes). Result is roughly a 14% speedup: Function, New Time, Old Time, New / Old _ZGVeN16v_tanhf, 0.649, 0.752, 0.863	2022-06-09 12:51:12 -07:00
Noah Goldstein	fe1915d4f6	x86: Improve svml_s_atanhf4_core_sse4.S Improvements are: 1. Reduce code size (-62 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Prefer registers which get short instruction encoding. 5. Reduce rodata usage (-16 bytes). The throughput improvement is not significant as the port 0 bottleneck is unavoidable. Function, New Time, Old Time, New / Old _ZGVbN4v_atanhf, 8.821, 8.903, 0.991	2022-06-09 12:51:09 -07:00
Noah Goldstein	65897e9916	x86: Improve svml_s_atanhf8_core_avx2.S Improvements are: 1. Reduce code size (-60 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Prefer registers which get short instruction encoding. 5. Shrink rodata usage (-32 bytes). The throughput improvement is not that significant (3-5%) as the port 0 bottleneck is unavoidable. Function, New Time, Old Time, New / Old _ZGVdN8v_atanhf, 2.799, 2.923, 0.958	2022-06-09 12:51:04 -07:00
Noah Goldstein	73bae395cf	x86: Improve svml_s_atanhf16_core_avx512.S Improvements are: 1. Reduce code size (-64 bytes). 2. Remove redundant move instructions. 3. Slightly improve instruction selection/scheduling where possible. 4. Reduce rodata size ([-128, -188] bytes). The throughput improvement is not significant as the port 0 bottleneck is unavoidable. Function, New Time, Old Time, New / Old _ZGVeN16v_atanhf, 1.39, 1.408, 0.987	2022-06-09 12:50:58 -07:00
Noah Goldstein	0f91811333	x86: Align varshift table to 32-bytes This ensures the load will never split a cache line.	2022-06-09 12:50:26 -07:00
Noah Goldstein	4654e7fd5a	x86: Add copyright to strpbrk-c.c	2022-06-09 12:50:00 -07:00
Sam James	ace9e3edbc	nss: handle stat failure in check_reload_and_get (BZ #28752) Skip the chroot test if the database isn't loaded correctly (because the chroot test uses some existing DB state). The __stat64_time64 -> fstatat call can fail if running under an (aggressive) seccomp filter, like Firefox seems to use. This manifested in a crash when using glib built with FAM support with such a Firefox build. Suggested-by: DJ Delorie <dj@redhat.com> Signed-off-by: Sam James <sam@gentoo.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2022-06-08 21:29:39 -04:00
Sam James	3fdf0a205b	nss: add assert to DB_LOOKUP_FCT (BZ #28752) It's interesting if we have a null action list, so an assert is worthwhile. Suggested-by: DJ Delorie <dj@redhat.com> Signed-off-by: Sam James <sam@gentoo.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2022-06-08 21:29:39 -04:00
Noah Goldstein	2c9af8421d	x86: Fix page cross case in rawmemchr-avx2 [BZ #29234] commit `6dcbb7d95d` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Mon Jun 6 21:11:33 2022 -0700 x86: Shrink code size of memchr-avx2.S Changed how the page cross case aligned string (rdi) in rawmemchr. This was incompatible with how `L(cross_page_continue)` expected the pointer to be aligned and would cause rawmemchr to read data that started before the beginning of the string. What it read was in valid memory, but it could count CHAR matches there, resulting in an incorrect return value. This commit fixes that issue by essentially reverting the changes to the L(page_cross) case, as they didn't really matter. Test cases were added and all pass with the new code (and were confirmed to fail with the old code). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-08 17:07:34 -07:00
Paul E. Murphy	aa13fd1618	nptl_db: disable DT_RELR on libthread_db.so Some nptl tests inadvertently use the host's gdb to verify libthread_db.so, which is loaded with the host's runtime. This causes a couple of test failures when the host glibc does not support DT_RELR. The simple, though not strictly correct, workaround is to build without DT_RELR, as this library is otherwise likely to load on glibc 2.17 and newer today. This allows tst-pthread-gdb-attach{,-static} to continue working when testing on a gdb loaded with an older glibc. This avoids a failure in tst-pthread-gdb-attach similar to: Trying host libthread_db library: .../build/glibc/nptl_db/libthread_db.so.1. dlopen failed: /lib64/libc.so.6: version `GLIBC_ABI_DT_RELR' not found (required by .../build/glibc/nptl_db/libthread_db.so.1). Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2022-06-08 11:17:47 -05:00
Andreas Schwab	c2f39be490	elf: add missing newlines in lateglobal test	2022-06-08 15:28:41 +02:00
Adhemerval Zanella	c7d36dcecc	nptl: Fix __libc_cleanup_pop_restore asynchronous restore (BZ#29214) This was due to a wrong revert done in `404656009b`. Checked on x86_64-linux-gnu.	2022-06-08 09:23:02 -03:00
Noah Goldstein	c28db9cb29	x86: ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST expect no transactions Give fall-through path to `vzeroupper` and taken-path to `vzeroall`. Generally even on machines with RTM the expectation is the string-library functions will not be called in transactions. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:10:32 -07:00
Noah Goldstein	56da3fe1dd	x86: Shrink code size of memchr-evex.S This is not meant as a performance optimization. The previous code was far too liberal in aligning targets and wasted code size unnecessarily. The total code size saving is: 64 bytes There are no non-negligible changes in the benchmarks. Geometric Mean of all benchmarks New / Old: 1.000 Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:10:32 -07:00
Noah Goldstein	6dcbb7d95d	x86: Shrink code size of memchr-avx2.S This is not meant as a performance optimization. The previous code was far too liberal in aligning targets and wasted code size unnecessarily. The total code size saving is: 59 bytes There are no major changes in the benchmarks. Geometric Mean of all benchmarks New / Old: 0.967 Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:10:31 -07:00
Noah Goldstein	af5306a735	x86: Optimize memrchr-avx2.S The new code: 1. prioritizes smaller user-arg lengths more. 2. optimizes target placement more carefully 3. reuses logic more 4. fixes up various inefficiencies in the logic. The biggest case here is the `lzcnt` logic for checking returns which saves either a branch or multiple instructions. The total code size saving is: 306 bytes Geometric Mean of all benchmarks New / Old: 0.760 Regressions: There are some regressions. Particularly where the length (user arg length) is large but the position of the match char is near the beginning of the string (in first VEC). This case has roughly a 10-20% regression. This is because the new logic gives the hot path for immediate matches to shorter lengths (the more common input). This case has roughly a 15-45% speedup. Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:10:27 -07:00
Noah Goldstein	b4209615a0	x86: Optimize memrchr-evex.S The new code: 1. prioritizes smaller user-arg lengths more. 2. optimizes target placement more carefully 3. reuses logic more 4. fixes up various inefficiencies in the logic. The biggest case here is the `lzcnt` logic for checking returns which saves either a branch or multiple instructions. The total code size saving is: 263 bytes Geometric Mean of all benchmarks New / Old: 0.755 Regressions: There are some regressions. Particularly where the length (user arg length) is large but the position of the match char is near the beginning of the string (in first VEC). This case has roughly a 20% regression. This is because the new logic gives the hot path for immediate matches to shorter lengths (the more common input). This case has roughly a 35% speedup. Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:10:24 -07:00
Noah Goldstein	731feee386	x86: Optimize memrchr-sse2.S The new code: 1. prioritizes smaller lengths more. 2. optimizes target placement more carefully. 3. reuses logic more. 4. fixes up various inefficiencies in the logic. The total code size saving is: 394 bytes Geometric Mean of all benchmarks New / Old: 0.874 Regressions: 1. The page cross case is now colder, especially re-entry from the page cross case if a match is not found in the first VEC (roughly 50%). My general opinion is that this is acceptable given the "coldness" of this case (less than 4%) and the general performance improvement in the other, far more common cases. 2. There are some regressions of 5-15% for medium/large user-arg lengths that have a match in the first VEC. This is because the logic was rewritten to optimize finds in the first VEC if the user-arg length is shorter (where we see roughly 20-50% performance improvements). It is not always the case that this is a regression. My intuition is that some frontend quirk partially explains the data, although I haven't been able to find the root cause. Full xcheck passes on x86_64. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:09:36 -07:00
Noah Goldstein	d0370d992e	Benchtests: Improve memrchr benchmarks Add a second iteration for memrchr to set `pos` starting from the end of the buffer. Previously `pos` was only set relative to the beginning of the buffer. This isn't really useful for memrchr because the beginning of the search space is (buf + len). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:09:16 -07:00
Noah Goldstein	dd5c483b25	x86: Add COND_VZEROUPPER that can replace vzeroupper if no `ret` The RTM vzeroupper mitigation has no way of replacing an inline vzeroupper that is not before a return. This can be useful when hoisting a vzeroupper to save code size, for example:
```
L(foo):
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	VZEROUPPER_RETURN
L(bar):
	xorl	%eax, %eax
	VZEROUPPER_RETURN
```
This can become:
```
L(foo):
	COND_VZEROUPPER
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	ret
L(bar):
	xorl	%eax, %eax
	ret
```
This code does not change any existing functionality. There is no difference in the objdump of libc.so before and after this patch. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:08:28 -07:00
Noah Goldstein	8a780a6b91	x86: Create header for VEC classes in x86 strings library This patch does not touch any existing code and is only meant to be a tool for future patches so that simple source files can more easily be maintained to target multiple VEC classes. There is no difference in the objdump of libc.so before and after this patch. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-07 13:08:28 -07:00
Matheus Castanho	0218463dd8	powerpc: Fix VSX register number on __strncpy_power9 [BZ #29197] __strncpy_power9 initializes VR 18 with zeroes to be used throughout the code, including when zero-padding the destination string. However, the v18 reference was mistakenly being used for stxv and stxvl, which take a VSX vector as operand. The code ended up using the uninitialized VSR 18 register by mistake. Both occurrences have been changed to use the proper VSX number for VR 18 (i.e. VSR 50). Tested on powerpc, powerpc64 and powerpc64le. Signed-off-by: Kewen Lin <linkw@gcc.gnu.org>	2022-06-07 15:07:25 -03:00
Wilco Dijkstra	eea282d9c6	AArch64: Sort makefile entries Sort makefile entries to reduce conflicts.	2022-06-07 16:58:15 +01:00
Wilco Dijkstra	9f298bfe1f	AArch64: Add SVE memcpy Add an initial SVE memcpy implementation. Copies of up to 32 bytes use SVE vectors, which improves the random memcpy benchmark significantly. Clean up the memcpy and memmove ifunc selectors.	2022-06-07 16:58:03 +01:00
Raghuveer Devulapalli	5082a287d5	x86_64: Add strstr function with 512-bit EVEX Adding a 512-bit EVEX version of strstr. The algorithm works as follows: (1) We spend a few cycles at the beginning to peek into the needle. We locate an edge in the needle (the first occurrence of 2 consecutive distinct characters) and also store the first 64 bytes into a zmm register. (2) We search for the edge in the haystack by looking at one cache line of the haystack at a time. This avoids reading past a page boundary, which could cause a segfault. (3) If an edge is found in the haystack, we first compare the first 64 bytes of the needle (already stored in a zmm register) before proceeding with a full byte-by-byte string compare. Benchmarking results: (old = strstr_sse2_unaligned, new = strstr_avx512) Geometric mean of all benchmarks: new / old = 0.66 Difficult skiptable(0) : new / old = 0.02 Difficult skiptable(1) : new / old = 0.01 Difficult 2-way : new / old = 0.25 Difficult testing first 2 : new / old = 1.26 Difficult skiptable(0) : new / old = 0.05 Difficult skiptable(1) : new / old = 0.06 Difficult 2-way : new / old = 0.26 Difficult testing first 2 : new / old = 1.05 Difficult skiptable(0) : new / old = 0.42 Difficult skiptable(1) : new / old = 0.24 Difficult 2-way : new / old = 0.21 Difficult testing first 2 : new / old = 1.04 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-06-06 19:46:55 -07:00
Adhemerval Zanella	8521001731	scripts/glibcelf.py: Add PT_AARCH64_MEMTAG_MTE constant It was added in commit `603e5c8ba7`. This caused the elf/tst-glibcelf consistency check to fail. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2022-06-06 15:56:48 -03:00
Dmitriy Fedchenko	999835533b	socket: Fix mistyped define statement in socket/sys/socket.h (BZ #29225)	2022-06-06 12:46:14 -03:00
Joseph Myers	828c72519f	Declare timegm for ISO C2X The next revision of the ISO C standard has added the timegm function (that was already supported in glibc). Update the feature test conditionals on its declaration in <time.h> accordingly. Tested for x86_64.	2022-06-06 14:47:03 +00:00
Joseph Myers	603e5c8ba7	Add PT_AARCH64_MEMTAG_MTE from Linux 5.18 to elf.h Linux 5.18 defines a new AArch64 ELF segment type PT_AARCH64_MEMTAG_MTE; add it to elf.h. Tested with build-many-glibcs.py for aarch64-linux-gnu.	2022-06-06 14:45:34 +00:00
Sam James	7df596a58c	grep: egrep -> grep -E, fgrep -> grep -F Newer versions of GNU grep (after grep 3.7, not inclusive) will warn on 'egrep' and 'fgrep' invocations. Convert usages within the tree to their expanded non-aliased counterparts to avoid irritating warnings during ./configure and the test suite. Signed-off-by: Sam James <sam@gentoo.org> Reviewed-by: Fangrui Song <maskray@google.com>	2022-06-05 12:09:02 -07:00
H.J. Lu	3c23fa9f44	string.h: Fix boolean spelling in comments	2022-06-03 10:22:38 -07:00
Carlos O'Donell	48f4b30780	elf: Add #include <errno.h> for use of E* constants. In __strerror_r we use errno constants and must include errno.h. Tested on x86_64 and i686 without regression.	2022-06-02 15:20:36 -04:00
Carlos O'Donell	62c888b337	elf: Add #include <sys/param.h> for MAX usage. In _dl_audit_pltenter we use MAX and so need to include param.h. Tested on x86_64 and i686 without regression.	2022-06-02 15:20:36 -04:00
Adhemerval Zanella	1002f1af1c	linux: Add process_mrelease Added in Linux 5.15 (884a7e5964e06ed93c7771c0d7cf19c09a8946f1), the new syscall allows a caller to free the memory of a dying target process. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2022-06-02 15:43:28 -03:00
Adhemerval Zanella	d19ee3473d	linux: Add process_madvise It was added in Linux 5.10 (ecb8ac8b1f146915aa6b96449b66dd48984caacc) with the same functionality as madvise, but using a pidfd of the target process. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2022-06-02 15:43:28 -03:00
Adhemerval Zanella	7d3e91ba19	linux: Set tst-pidfd-consts unsupported for kernel headers older than 5.10 Instead of failing when trying to build the compare source file. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Matheus Castanho <msc@linux.ibm.com>	2022-06-02 15:43:25 -03:00
Florian Weimer	bb8887379f	testrun.sh: Support passing strace and valgrind arguments This is a bit of a hack, but it works quite well in practice. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-06-02 18:37:30 +02:00
Florian Weimer	4b527650e0	Linux: Adjust struct rseq definition to current kernel version This definition is only used as a fallback with old kernel headers. The change follows kernel commit bfdf4e6208051ed7165b2e92035b4bf11 ("rseq: Remove broken uapi field layout on 32-bit little endian"). Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2022-06-02 16:29:59 +02:00
Adhemerval Zanella	c789e6e409	iconv: Use 64 bit stat for gconv_parseconfdir (BZ# 29213) The issue only arises when it is used within libc.so (iconvconfig already builds with _TIME_SIZE=64). This spot was missed by the initial change in `52a5fe70a2`. Checked on i686-linux-gnu.	2022-06-01 13:23:16 -03:00
Adhemerval Zanella	634f566c3e	catgets: Use 64 bit stat for __open_catalog (BZ# 29211) This spot was missed by the initial change in `52a5fe70a2`. Checked on i686-linux-gnu.	2022-06-01 13:23:16 -03:00
Adhemerval Zanella	3cd4785ea0	inet: Use 64 bit stat for ruserpass (BZ# 29210) This spot was missed by the initial change in `52a5fe70a2`. Checked on i686-linux-gnu.	2022-06-01 13:23:16 -03:00
Adhemerval Zanella	87f1ec12e7	socket: Use 64 bit stat for isfdtype (BZ# 29209) This spot was missed by the initial change in `52a5fe70a2`. Checked on i686-linux-gnu.	2022-06-01 13:23:16 -03:00
Adhemerval Zanella	6e7137f28c	posix: Use 64 bit stat for fpathconf (_PC_ASYNC_IO) (BZ# 29208) This spot was missed by the initial change in `52a5fe70a2`. Checked on i686-linux-gnu.	2022-06-01 13:23:16 -03:00
Adhemerval Zanella	574ba60fc8	posix: Use 64 bit stat for posix_fallocate fallback (BZ# 29207) This spot was missed by the initial change in `52a5fe70a2`. Checked on i686-linux-gnu.	2022-06-01 13:23:16 -03:00
Adhemerval Zanella	ec995fb215	misc: Use 64 bit stat for getusershell (BZ# 29203) This spot was missed by the initial change in `52a5fe70a2`. Checked on i686-linux-gnu.	2022-06-01 13:23:16 -03:00
Adhemerval Zanella	3fbc33010c	misc: Use 64 bit stat for daemon (BZ# 29203) This spot was missed by the initial change in `52a5fe70a2`. Checked on i686-linux-gnu.	2022-06-01 13:23:13 -03:00
WANG Xuerui	e6547d635b	linux: use statx for fstat if neither newfstatat nor fstatat64 is present LoongArch is going to be the first architecture supported by Linux that has neither fstat* nor newfstatat [1], instead exclusively relying on statx. So in fstatat64's implementation, we need to also enable statx usage if neither fstatat64 nor newfstatat is present, to prepare for this new case of kernel ABI. [1]: https://lore.kernel.org/all/20220518092619.1269111-1-chenhuacai@loongson.cn/ Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-06-01 12:29:01 -03:00
Joseph Myers	de3501d60f	Add MADV_DONTNEED_LOCKED from Linux 5.18 to bits/mman-linux.h Linux 5.18 adds a constant MADV_DONTNEED_LOCKED (defined in multiple header files, but with the same value on all architectures). Add this constant to bits/mman-linux.h. Tested for x86_64.	2022-06-01 14:45:48 +00:00


39150 Commits