glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-22 13:00:06 +00:00

Author	SHA1	Message	Date
Paul Eggert	c52ef24829	regex: fix buffer read overrun in search [BZ#28470] Problem reported by Benno Schulenberg in: https://lists.gnu.org/r/bug-gnulib/2021-10/msg00035.html * posix/regexec.c (re_search_internal): Use better bounds check.	2021-11-24 14:16:09 -08:00
Sunil K Pandey	c58d3b7d00	x86-64: Add vector sin/sinf to libmvec microbenchmark Add vector sin/sinf and input files to libmvec microbenchmark. libmvec-sin-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-sinf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:50:23 -08:00
Sunil K Pandey	6a556bac81	x86-64: Add vector pow/powf to libmvec microbenchmark Add vector pow/powf and input files to libmvec microbenchmark. libmvec-pow-inputs: arg1: 90% Normal random distribution range: (0.0, 256.0) mean: 0.0 sigma: 32.0 10% uniform random distribution in range (0.0, 256.0) arg2: 90% Normal random distribution range: (-127.0, 127.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-127.0, 127.0) libmvec-powf-inputs: arg1: 90% Normal random distribution range: (0.0f, 100.0f) mean: 0.0f sigma: 16.0f 10% uniform random distribution in range (0.0f, 100.0f) arg2: 90% Normal random distribution range: (-10.0f, 10.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-10.0f, 10.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:49:14 -08:00
Sunil K Pandey	8ab8afb336	x86-64: Add vector log/logf to libmvec microbenchmark Add vector log/logf and input files to libmvec microbenchmark. libmvec-log-inputs: 70% Normal random distribution range: (0.0, DBL_MAX) mean: 1.0 sigma: 50.0 30% uniform random distribution in range (0.0, DBL_MAX) libmvec-logf-inputs: 70% Normal random distribution range: (0.0f, FLT_MAX) mean: 1.0f sigma: 50.0f 30% uniform random distribution in range (0.0f, FLT_MAX) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:48:14 -08:00
Sunil K Pandey	37df38bd5f	x86-64: Add vector exp/expf to libmvec microbenchmark Add vector exp/expf and input files to libmvec microbenchmark. libmvec-exp-inputs: 90% Normal random distribution range: (-708.0, 709.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-500.0, 500.0) libmvec-expf-inputs: 90% Normal random distribution range: (-87.0f, 88.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-50.0f, 50.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:46:59 -08:00
Sunil K Pandey	4443695598	x86-64: Add vector cos/cosf to libmvec microbenchmark Add vector cos/cosf and input files to libmvec microbenchmark. libmvec-cos-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-cosf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:45:20 -08:00
Adhemerval Zanella	456b3c08b6	io: Refactor close_range and closefrom Now that Hurd implements both close_range and closefrom (`f2c996597d`), we can make close_range() a base ABI, and make the default closefrom() implementation on top of close_range(). The generic closefrom() implementation based on __getdtablesize() is moved to generic close_range(). On Linux it will be overridden by the auto-generation syscall while on Hurd it will be a system specific implementation. The closefrom() now calls close_range() and __closefrom_fallback(). Since on Hurd close_range() does not fail, __closefrom_fallback() is an empty static inline function set by __ASSUME_CLOSE_RANGE. The __ASSUME_CLOSE_RANGE also allows optimizing the Linux __closefrom_fallback() implementation when --enable-kernel=5.9 or higher is used. Finally the Linux specific tst-close_range.c is moved to io and enabled as default. The Linuxism and CLOSE_RANGE_UNSHARE are guarded so it can be built for Hurd (I have not actually tested it). Checked on x86_64-linux-gnu, i686-linux-gnu, and with an i686-gnu build.	2021-11-24 09:09:37 -03:00
Florian Weimer	e186fc5a31	nptl: Do not set signal mask on second setjmp return [BZ #28607] __libc_signal_restore_set was in the wrong place: It also ran when setjmp returned the second time (after pthread_exit or pthread_cancel). This is observable with blocked pending signals during thread exit. Fixes commit `b3cae39dcb` ("nptl: Start new threads with all signals blocked [BZ #25098]"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-24 08:59:54 +01:00
Adhemerval Zanella	aac54dcd37	powerpc: Define USE_PPC64_NOTOC iff compiler supports it The @notoc usage only yields an advantage on ISA 3.1+ machines (power10) and, for ld.bfd, only when it sees pcrel relocations used in the code (generated if the compiler targets ISA 3.1+). In the bfd case, ISA 3.1+ instructions on stubs are used iff the linker also sees the new pc-relative relocations (for instance R_PPC64_D34), otherwise it generates default stubs (ppc64_elf_check_relocs:4700). This patch also helps on linkers that do not implement this optimization, since building for an older ISA (such as 3.0 / power9) will also trigger power10 stubs generation if the assembly code uses the NOTOC macro. Checked on powerpc64le-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>	2021-11-22 14:49:11 -03:00
Adhemerval Zanella	bc801b3a40	setjmp: Replace jmp_buf-macros.h with jmp_buf-macros.sym It requires less boilerplate code for newer ports. The _Static_assert checks from internal setjmp are moved to its own internal test since setjmp.h is included early by multiple headers (to generate rtld-sizes.sym). The riscv jmp_buf-macros.h check is also redundant, it is already done by riscv configure.ac. Checked with a build for the affected architectures.	2021-11-22 13:43:22 -03:00
Joseph Myers	5c3ece451d	Update kernel version to 5.15 in tst-mman-consts.py This patch updates the kernel version in the test tst-mman-consts.py to 5.15. (There are no new MAP_* constants covered by this test in 5.15 that need any other header changes.) Tested with build-many-glibcs.py.	2021-11-22 15:30:12 +00:00
Florian Weimer	3d981795cd	socket: Do not use AF_NETLINK in __opensock It is not possible to use interface ioctls with netlink sockets on all Linux kernels. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-22 14:47:13 +01:00
Adhemerval Zanella	ed3ce71f5c	elf: Move la_activity (LA_ACT_ADD) after _dl_add_to_namespace_list() (BZ #28062) It ensures that the namespace is guaranteed to not be empty. Checked on x86_64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-11-18 17:17:58 -03:00
Joseph Myers	bdeb7a8fa9	Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add these constants to bits/socket.h. Tested for x86_64.	2021-11-17 14:25:16 +00:00
Stafford Horne	f1bcfde3a7	malloc: Fix malloc debug for 2.35 onwards The change `1e5a5866cb` ("Remove malloc hooks [BZ #23328]") has broken ports that are using GLIBC_2_35, like the new OpenRISC port I am working on. The libc_malloc_debug.so library used to bring in the debug infrastructure is currently essentially empty for GLIBC_2_35 ports like mine causing mtrace tests to fail: cat sysdeps/unix/sysv/linux/or1k/shlib-versions DEFAULT GLIBC_2.35 ld=ld-linux-or1k.so.1 FAIL: posix/bug-glob2-mem FAIL: posix/bug-regex14-mem FAIL: posix/bug-regex2-mem FAIL: posix/bug-regex21-mem FAIL: posix/bug-regex31-mem FAIL: posix/bug-regex36-mem FAIL: malloc/tst-mtrace. The issue seems to be with the ifdefs in malloc/malloc-debug.c. The ifdefs are currently essentially excluding all symbols for ports > 2.35. Removing the top level SHLIB_COMPAT ifdef allows things to just work. Fixes: `1e5a5866cb` ("Remove malloc hooks [BZ #23328]") Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-11-17 21:33:39 +09:00
Florian Weimer	f1d333b5bf	elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres This will be used to deallocate memory allocated using the non-minimal malloc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-17 12:20:29 +01:00
Florian Weimer	8bd336a00a	nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c And make it an installed header. This addresses a few aliasing violations (which do not seem to result in miscompilation due to the use of atomics), and also enables use of wide counters in other parts of the library. The debug output in nptl/tst-cond22 has been adjusted to print the 32-bit values instead because it avoids a big-endian/little-endian difference. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-17 12:20:13 +01:00
Sunil K Pandey	a43c0b5483	x86-64: Create microbenchmark infrastructure for libmvec Add python script to generate libmvec microbenchmark from the input values for each libmvec function using skeleton benchmark template. Creates double and float benchmarks with vector length 1, 2, 4, 8, and 16 for each libmvec function. Vector length 1 corresponds to scalar version of function and is included for vector function perf comparison. Co-authored-by: Haochen Jiang <haochen.jiang@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-16 11:37:39 -08:00
Adhemerval Zanella	d8c2e8e043	elf: hidden visibility for __minimal_malloc functions Since `b05fae4d8e`, __minimal malloc code is used during static startup before PIE self-relocation (_dl_relocate_static_pie). So it requires the same fix done for other objects by `47618209d0`. Checked on aarch64, x86_64, and i686 with and without static-pie.	2021-11-16 16:03:31 -03:00
H.J. Lu	1f67d8286b	elf: Use a temporary file to generate Makefile fragments [BZ #28550] 1. Use a temporary file to generate Makefile fragments for DSO sorting tests and use -include on them. 2. Add Makefile fragments to postclean-generated so that a "make clean" removes the autogenerated fragments and a subsequent "make" regenerates them. This partially fixes BZ #28550. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-16 05:18:01 -08:00
H.J. Lu	b4bbedb1e7	dso-ordering-test.py: Put all sources in one directory [BZ #28550] Put all sources for DSO sorting tests in the dso-sort-tests-src directory and compile test relocatable objects with $(objpfx)tst-dso-ordering1-dir/tst-dso-ordering1-a.os: $(objpfx)dso-sort-tests-src/tst-dso-ordering1-a.c $(compile.c) $(OUTPUT_OPTION) to avoid random $< values from $(before-compile) when compiling test relocatable objects with $(objpfx)%$o: $(objpfx)%.c $(before-compile); $$(compile-command.c) compile-command.c = $(compile.c) $(OUTPUT_OPTION) $(compile-mkdep-flags) compile.c = $(CC) $< -c $(CFLAGS) $(CPPFLAGS) for 3 "make -j 28" parallel builds on a machine with 112 cores at the same time. This partially fixes BZ #28550. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-15 11:53:40 -08:00
Adhemerval Zanella	54816ae98d	elf: Move LAV_CURRENT to link_lavcurrent.h No functional change.	2021-11-15 15:28:17 -03:00
H.J. Lu	120ac6d238	Move assignment out of the CAS condition Update commit `49302b8fdf` Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Nov 11 06:54:01 2021 -0800 Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. and commit `0b82747dc4` Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Nov 11 06:31:51 2021 -0800 Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. by moving assignment out of the CAS condition.	2021-11-15 05:50:56 -08:00
H.J. Lu	cbcd65c8b5	Add a comment for --enable-initfini-array [BZ #27945] Document that --enable-initfini-array is enabled by default in GCC 12, which can be removed when GCC 12 becomes the minimum requirement.	2021-11-13 09:50:07 -08:00
Stafford Horne	afbf26492a	tst-tzset: output reason when creating 4GiB file fails Currently, if the temporary file creation fails the create_tz_file function returns NULL. The NULL pointer is then passed to setenv which causes a SIGSEGV. Rather than failing with a SIGSEGV print a warning and exit.	2021-11-13 08:08:47 +09:00
H.J. Lu	d672a98a1a	Add LLL_MUTEX_READ_LOCK [BZ #28537] CAS instruction is expensive. From the x86 CPU's point of view, getting a cache line for writing is more expensive than reading. See Appendix A.2 Spinlock in: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf The full compare and swap will grab the cache line exclusive and cause excessive cache line bouncing. Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock loop if compare may fail to reduce cache line bouncing on contended locks. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-11-12 10:32:09 -08:00
H.J. Lu	49302b8fdf	Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-11-12 10:31:31 -08:00
H.J. Lu	0b82747dc4	Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-11-12 10:30:57 -08:00
Noah Goldstein	6c1e3c0fd0	String: Split memcpy tests so that parallel build is faster No bug. This commit splits test-memcpy.c into test-memcpy.c and test-memcpy-large.c. The idea is parallel builds will be able to run both in parallel speeding up the process. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-10 20:14:09 -06:00
Noah Goldstein	2f9062d717	x86: Shrink memcmp-sse4.S code size No bug. This implementation refactors memcmp-sse4.S primarily with minimizing code size in mind. It does this by removing the lookup table logic and removing the unrolled check from (256, 512] bytes. memcmp-sse4 code size reduction : -3487 bytes wmemcmp-sse4 code size reduction: -1472 bytes The current memcmp-sse4.S implementation has a large code size cost. This has serious adverse effects on the ICache / ITLB. While in micro-benchmarks the implementation appears fast, traces of real-world code have shown that the speed in micro benchmarks does not translate when the ICache/ITLB are not primed, and that the cost of the code size has measurable negative effects on overall application performance. See https://research.google/pubs/pub48320/ for more details. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-10 20:12:10 -06:00
Joseph Myers	309548bec3	Support C2X printf %b, %B C2X adds a printf %b format (see <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2630.pdf>, accepted for C2X), for outputting integers in binary. It also has recommended practice for a corresponding %B format (like %b, but %#B starts the output with 0B instead of 0b). Add support for these formats to glibc. One existing test uses %b as an example of an unknown format, to test how glibc printf handles unknown formats; change that to %v. Use of %b and %B as user-registered format specifiers continues to work (and we already have a test that covers that, tst-printfsz.c). Note that C2X also has scanf %b support, plus support for binary constants starting 0b in strtol (base 0 and 2) and scanf %i (strtol base 0 and scanf %i coming from a previous paper that added binary integer literals). I intend to implement those features in a separate patch or patches; as discussed in the thread starting at <https://sourceware.org/pipermail/libc-alpha/2020-December/120414.html>, they will be more complicated because they involve adding extra public symbols to ensure compatibility with existing code that might not expect 0b constants to be handled by strtol base 0 and 2 and scanf %i, whereas simply adding a new format specifier poses no such compatibility concerns. Note that the actual conversion from integer to string uses existing code in _itoa.c. That code has special cases for bases 8, 10 and 16, probably so that the compiler can optimize division by an integer constant in the code for those bases. If desired such special cases could easily be added for base 2 as well, but that would be an optimization, not actually needed for these printf formats to work. Tested for x86_64 and x86. Also tested with build-many-glibcs.py for aarch64-linux-gnu with GCC mainline to make sure that the test does indeed build with GCC 12 (where format checking warnings are enabled for most of the test).	2021-11-10 15:52:21 +00:00
Joseph Myers	3387c40a8b	Update syscall lists for Linux 5.15 Linux 5.15 has one new syscall, process_mrelease (and also enables the clone3 syscall for RV32). It also has a macro __NR_SYSCALL_MASK for Arm, which is not a syscall but matches the pattern used for syscall macro names. Add __NR_SYSCALL_MASK to the names filtered out in the code dealing with syscall lists, update syscall-names.list for the new syscall and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py.	2021-11-10 15:21:19 +00:00
Florian Weimer	98966749f2	s390: Use long branches across object boundaries (jgh instead of jh) Depending on the layout chosen by the linker, the 16-bit displacement of the jh instruction is insufficient to reach the target label. Analysis of the linker failure was carried out by Nick Clifton. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Stefan Liebler <stli@linux.ibm.com>	2021-11-10 15:21:37 +01:00
H.J. Lu	0bd356df1a	Remove the unused +mkdep/+make-deps/s-proto.S/s-proto-cancel.S Since commit `d73f5331ce` Author: Roland McGrath <roland@gnu.org> Date: Fri May 2 02:20:45 2003 +0000 2003-05-01 Roland McGrath <roland@redhat.com> dependency is generated by passing -MD -MF to compiler. Remove the unused +mkdep, +make-deps, s-proto.S and s-proto-cancel.S. This fixes BZ #28554.	2021-11-10 04:54:18 -08:00
Adhemerval Zanella	824dd3ec49	Fix build and check failures after `b05fae4d8e` The include cleanup on dl-minimal.c removed too much for some targets. Also for Hurd, __sbrk is removed from localplt.data now that tunables allocate memory through mmap. Checked with a build for all affected architectures.	2021-11-09 23:21:22 -03:00
Adhemerval Zanella	b05fae4d8e	elf: Use the minimal malloc on tunables_strdup The rtld_malloc functions are moved to their own file so they can be used in csu code. Also, the functions are renamed to __minimal_* (since they are now used not only in loader code). Using the __minimal_malloc on tunables_strdup() avoids potential issues with sbrk() calls while processing the tunables (I see sporadic elf/tst-dso-ordering9 failures on powerpc64le, with different tests failing due to ASLR). Also, using __minimal_malloc over plain mmap optimizes the memory allocation in both the static and dynamic cases (since it will use any unused space either in the last page of the data segments, avoiding the mmap() call, or from the previous mmap() call). Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-11-09 14:11:25 -03:00
Joseph Myers	db6c4935fa	Fix memmove call in vfprintf-internal.c:group_number A recent GCC mainline change introduces errors of the form: vfprintf-internal.c: In function 'group_number': vfprintf-internal.c:2093:15: error: 'memmove' specified bound between 9223372036854775808 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=] 2093 \| memmove (w, s, (front_ptr -s) * sizeof (CHAR_T)); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a genuine bug in the glibc code: s > front_ptr is always true at this point in the code, and the intent is clearly for the subtraction to be the other way round. The other arguments to the memmove call here also appear to be wrong; w and s point just after the destination and source for copying the rest of the number, so the size needs to be subtracted to get appropriate pointers for the copying. Adjust the memmove call to conform to the apparent intent of the code, so fixing the -Wstringop-overflow error. Now, if the original code were ever executed, a buffer overrun would result. However, I believe this code (introduced in commit `edc1686af0`, "vfprintf: Reuse work_buffer in group_number", so in glibc 2.26) is unreachable in prior glibc releases (so there is no need for a bug in Bugzilla, no need to consider any backports unless someone wants to build older glibc releases with GCC 12 and no possibility of this buffer overrun resulting in a security issue). work_buffer is 1000 bytes / 250 wide characters. This case is only reachable if an initial part of the number, plus a grouped copy of the rest of the number, fail to fit in that space; that is, if the grouped number fails to fit in the space. 
In the wide character case, grouping is always one wide character, so even with a locale (of which there aren't any in glibc) grouping every digit, a number would need to occupy at least 125 wide characters to overflow, and a 64-bit integer occupies at most 23 characters in octal including a leading 0. In the narrow character case, the multibyte encoding of the grouping separator would need to be at least 42 bytes to overflow, again supposing grouping every digit, but MB_LEN_MAX is 16. So even if we admit the case of artificially constructed locales not shipped with glibc, given that such a locale would need to use one of the character sets supported by glibc, this code cannot be reached at present. (And POSIX only actually specifies the ' flag for grouping for decimal output, though glibc acts on it for other bases as well.) With binary output (if you consider use of grouping there to be valid), you'd need a 15-byte multibyte character for overflow; I don't know if any supported character set has such a character (if, again, we admit constructed locales using grouping every digit and a grouping separator chosen to have a multibyte encoding as long as possible, as well as accepting use of grouping with binary), but given that we have this code at all (clearly it's not correct, or in accordance with the principle of avoiding arbitrary limits, to skip grouping on running out of internal space like that), I don't think it should need any further changes for binary printf support to go in. On the other hand, support for large sizes of _BitInt in printf (see the N2858 proposal) would require something to be done about such arbitrary limits (presumably using dynamic allocation in printf again, for sufficiently large _BitInt arguments only - currently only floating-point uses dynamic allocation, and, as previously discussed, that could actually be replaced by bounded allocation given smarter code). Tested with build-many-glibcs.py for aarch64-linux-gnu (GCC mainline). 
Also tested natively for x86_64.	2021-11-08 19:11:51 +00:00
Adhemerval Zanella	3a523ccd78	locale: Fix localedata/sort-test undefined behavior The collate-test.c triggers UB with a signed integer overflow, which results in an error on some architectures (powerpc32). Checked on x86_64, i686, and powerpc.	2021-11-08 15:28:48 -03:00
H.J. Lu	a6a9c1a36b	test-memcpy.c: Double TIMEOUT to (8 * 60) commit `d585ba47fc` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Mon Nov 1 00:49:48 2021 -0500 string: Make tests bidirectional test-memcpy.c This commit updates the memcpy tests to test both dst > src and dst < src. This is because there is logic in the code based on the Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> significantly increased the number of tests. On Intel Core i7-1165G7, test-memcpy takes 120 seconds to run when machine is idle. Double TIMEOUT to (8 * 60) for test-memcpy to avoid timeout when machine is under heavy load. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-11-07 10:09:33 -08:00
Samuel Thibault	d41985b71e	hurd: Remove unused __libc_close_range That was just cargo-culted.	2021-11-07 16:23:51 +01:00
Sergey Bugaev	f2c996597d	hurd: Implement close_range and closefrom The close_range () function implements the same API as the Linux and FreeBSD syscalls. It operates atomically and reliably. The specified upper bound is clamped to the actual size of the file descriptor table; it is expected that the most common use case is with last = UINT_MAX. Like in the Linux syscall, it is also possible to pass the CLOSE_RANGE_CLOEXEC flag to mark the file descriptors in the range cloexec instead of actually closing them. Also, add a Hurd version of the closefrom () function. Since unlike on Linux, close_range () cannot fail due to being unsupported by the running kernel, a fallback implementation is never necessary. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20211106153524.82700-1-bugaevc@gmail.com>	2021-11-07 16:16:11 +01:00
Noah Goldstein	475b63702e	x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h No bug. This patch doubles the rep_movsb_threshold when using ERMS. Based on benchmarks the vector copy loop, especially now that it handles 4k aliasing, is better for these medium ranged. On Skylake with ERMS: Size, Align1, Align2, dst>src,(rep movsb) / (vec copy) 4096, 0, 0, 0, 0.975 4096, 0, 0, 1, 0.953 4096, 12, 0, 0, 0.969 4096, 12, 0, 1, 0.872 4096, 44, 0, 0, 0.979 4096, 44, 0, 1, 0.83 4096, 0, 12, 0, 1.006 4096, 0, 12, 1, 0.989 4096, 0, 44, 0, 0.739 4096, 0, 44, 1, 0.942 4096, 12, 12, 0, 1.009 4096, 12, 12, 1, 0.973 4096, 44, 44, 0, 0.791 4096, 44, 44, 1, 0.961 4096, 2048, 0, 0, 0.978 4096, 2048, 0, 1, 0.951 4096, 2060, 0, 0, 0.986 4096, 2060, 0, 1, 0.963 4096, 2048, 12, 0, 0.971 4096, 2048, 12, 1, 0.941 4096, 2060, 12, 0, 0.977 4096, 2060, 12, 1, 0.949 8192, 0, 0, 0, 0.85 8192, 0, 0, 1, 0.845 8192, 13, 0, 0, 0.937 8192, 13, 0, 1, 0.939 8192, 45, 0, 0, 0.932 8192, 45, 0, 1, 0.927 8192, 0, 13, 0, 0.621 8192, 0, 13, 1, 0.62 8192, 0, 45, 0, 0.53 8192, 0, 45, 1, 0.516 8192, 13, 13, 0, 0.664 8192, 13, 13, 1, 0.659 8192, 45, 45, 0, 0.593 8192, 45, 45, 1, 0.575 8192, 2048, 0, 0, 0.854 8192, 2048, 0, 1, 0.834 8192, 2061, 0, 0, 0.863 8192, 2061, 0, 1, 0.857 8192, 2048, 13, 0, 0.63 8192, 2048, 13, 1, 0.629 8192, 2061, 13, 0, 0.627 8192, 2061, 13, 1, 0.62 Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:18:08 -05:00
Noah Goldstein	a6b7502ec0	x86: Optimize memmove-vec-unaligned-erms.S No bug. The optimizations are as follows: 1) Always align entry to 64 bytes. This makes behavior more predictable and makes other frontend optimizations easier. 2) Make the L(more_8x_vec) cases 4k aliasing aware. This can have significant benefits in the case that: 0 < (dst - src) < [256, 512] 3) Align before `rep movsb`. For ERMS this is roughly a [0, 30%] improvement and for FSRM [-10%, 25%]. In addition to these primary changes there is general cleanup throughout to optimize the aligning routines and control flow logic. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:18:03 -05:00
Noah Goldstein	ac759b1fbf	benchtests: Add partial overlap case in bench-memmove-walk.c This commit adds a new partial overlap benchmark. This is generally the most interesting performance case for memmove and was missing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:17:59 -05:00
Noah Goldstein	5e6cce9b34	benchtests: Add additional cases to bench-memcpy.c and bench-memmove.c This commit adds more benchmarks for the common memcpy/memmove benchmarks. The most significant cases are the half page offsets. The current version leaves dst and src near page aligned which leads to false 4k aliasing on x86_64. This can add noise due to false dependencies from one run to the next. As well, this seems like more of an edge case than the common case so it shouldn't be the only thing Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:17:51 -05:00
Noah Goldstein	d585ba47fc	string: Make tests bidirectional test-memcpy.c This commit updates the memcpy tests to test both dst > src and dst < src. This is because there is logic in the code based on the Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:17:30 -05:00
H.J. Lu	d465e5e0da	Remove the last trace of generate-md5 [BZ #28554] generate-md5 was removed by commit `d73f5331ce` Author: Roland McGrath <roland@gnu.org> Date: Fri May 2 02:20:45 2003 +0000 2003-05-01 Roland McGrath <roland@redhat.com> Remove its last trace. This fixes BZ #28554.	2021-11-06 06:21:44 -07:00
Sunil K Pandey	2856829ee7	Revert "benchtests: Add acosf function to bench-math" This reverts commit `79d0fc6539`.	2021-11-05 16:13:12 -07:00
H.J. Lu	a586fe9c80	Configure GCC with --enable-initfini-array [BZ #27945] Starting from GCC 12, the .init_array and .fini_array sections are enabled unconditionally by commit 13a39886940331149173b25d6ebde0850668d8b9 Author: H.J. Lu <hjl.tools@gmail.com> Date: Tue Jun 8 16:09:24 2021 -0700 Always enable DT_INIT_ARRAY/DT_FINI_ARRAY on Linux configure GCC with --enable-initfini-array to enable them when using GCC release branches. Fixes BZ #27945.	2021-11-05 15:30:02 -07:00
Florian Weimer	ea32ec354c	elf: Earlier missing dynamic segment check in _dl_map_object_from_fd Separated debuginfo files have PT_DYNAMIC with p_filesz == 0. We need to check for that before the _dl_map_segments call because that could attempt to write to mappings that extend beyond the end of the file, resulting in SIGBUS. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-05 19:34:16 +01:00
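
The BZ #28537 entries in the log above (LLL_MUTEX_READ_LOCK, and the two "Avoid extra load with CAS" commits) describe two ideas: do a plain atomic load before attempting a CAS in a spin loop, and use a CAS that returns the observed value rather than a boolean. The following is only a minimal sketch of those two ideas in portable C11 atomics, not the glibc code. The names sketch_lock, cas_val, sketch_trylock_spin and sketch_unlock are hypothetical and do not exist in glibc, and the lock word encoding (0 = free, 1 = held) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = held.  */
typedef struct { atomic_int word; } sketch_lock;

/* "Value CAS": returns the value that was observed in *word, so the
   caller can inspect it without issuing a separate load afterwards.
   On success the observed value equals `expected`; on failure the C11
   primitive writes the observed value back into `expected`.  */
static int
cas_val (atomic_int *word, int expected, int desired)
{
  atomic_compare_exchange_strong_explicit (word, &expected, desired,
                                           memory_order_acquire,
                                           memory_order_relaxed);
  return expected;
}

/* Spin at most max_spins times trying to take the lock.
   Returns 1 if the lock was acquired, 0 otherwise.  */
static int
sketch_trylock_spin (sketch_lock *l, int max_spins)
{
  for (int i = 0; i < max_spins; i++)
    {
      /* Read-only check first: a load does not need the cache line in
         exclusive state, so a held lock is not hammered with CAS.  */
      if (atomic_load_explicit (&l->word, memory_order_relaxed) != 0)
        continue;
      /* The lock looked free; only now attempt the CAS.  */
      if (cas_val (&l->word, 0, 1) == 0)
        return 1;
    }
  return 0;
}

/* Release the lock; the caller must currently hold it.  */
static void
sketch_unlock (sketch_lock *l)
{
  int old = atomic_exchange_explicit (&l->word, 0, memory_order_release);
  assert (old == 1);
  (void) old; /* Silence unused-variable warnings under NDEBUG.  */
}
```

A caller would pair sketch_trylock_spin with sketch_unlock and fall back to some blocking path when the spin budget runs out; that fallback is outside the scope of this sketch.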

38151 Commits