glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-21 20:40:05 +00:00

Author	SHA1	Message	Date
Adhemerval Zanella	4f047d9ede	elf: Fix localplt.awk for DT_RELR-enabled builds (BZ 31978) For each input readelf output, localplt.awk parses each 'Relocation section' entry, checks its offset against the dynamic section entry, and saves each DT_JMPREL, DT_RELA, and DT_REL offset value it finds. After all lines are read, the script checks if any segment offset differed from 0, meaning at least one 'Relocation section' was matched. However, if the shared object was built with RELR support and the static linker could place all the relocation on DT_RELR, there would be no DT_JMPREL, DT_RELA, and DT_REL entries; only a DT_RELR. For the current three ABIs that support (aarch64, x86, and powerpc64), the powerpc64 ld.so shows the behavior above. Both x86_64 and aarch64 show extra relocations on '.rela.dyn', which makes the script check to succeed. This patch fixes by handling DT_RELR, where the offset is checked against the dynamic section entries and if the shared object contains an entry it means that there are no extra PLT entries (since all relocations are relative). It fixes the elf/check-localplt failure on powerpc. Checked with a build/check for aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu, arm-linux-gnueabihf, s390x-linux-gnu, powerpc-linux-gnu, powerpc64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-07-19 22:50:26 +02:00
Adhemerval Zanella	6b7e2e1d61	linux: Also check pkey_get for ENOSYS on tst-pkey (BZ 31996) The powerpc pkey_get/pkey_set support was only added for 64-bit [1], and tst-pkey only checks if the support was present with pkey_alloc (which does not fail on powerpc32, at least running a 64-bit kernel). Checked on powerpc-linux-gnu. [1] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a803367bab167f5ec4fde1f0d0ec447707c29520 Reviewed-By: Andreas K. Huettel <dilfridge@gentoo.org>	2024-07-19 22:39:44 +02:00
Adhemerval Zanella	e0f7da7235	powerpc: Update soft-fp ulps Results based on regen-ulps using gcc 11.2.1 on a POWER8 machine.	2024-07-19 19:29:35 +02:00
John David Anglin	8cfa4ecff2	Fix usage of _STACK_GROWS_DOWN and _STACK_GROWS_UP defines [BZ 31989] Signed-off-by: John David Anglin <dave.anglin@bell.net> Reviewed-By: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-07-19 10:10:17 -04:00
Florian Weimer	91eb62d638	Adjust check-local-headers test for libaudit 4.0 The new version introduces /usr/include/audit_logging.h and /usr/include/audit-records.h.	2024-07-19 15:57:46 +02:00
Adhemerval Zanella	3c354d62f5	elf: Parse the auxv values as unsigned on tst-tunables-enable_secure-env.c (BZ 31890) AT_HWCAP on some architecture can indeed use all bits. Checked on x86_64-linux-gnu and powerpc-linux-gnu. Reviewed-By: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-07-19 08:50:38 -03:00
H.J. Lu	66f2cd6e1a	x32: xfail elf/tst-platform-1 [BZ #22363 ] Xfail elf/tst-platform-1 on x32 since kernel passes i686 in AT_PLATFORM. See https://sourceware.org/bugzilla/show_bug.cgi?id=22363 Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>	2024-07-19 10:34:38 +02:00
Xi Ruoyao	d905183f0b	elf/tst-rtld-does-not-exist: Pass --inhibit-cache to rtld This avoids a test failure when the system has no /etc/ld.so.cache. Tested on x86_64-linux-gnu. Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-19 01:15:53 -07:00
Andreas K. Hüttel	910aae6e5a	Revert "LoongArch: Add cfi instructions for _dl_tlsdesc_dynamic" We're in freeze for the 2.40 release. This reverts commit `43224b1379`. Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-07-17 15:24:51 +02:00
Samuel Thibault	6ed76f4efc	htl: Fix __pthread_init_thread declaration and definition `0e75c4a463` ("hurd: Fix pthread_self() without libpthread") added a declaration for ___pthread_init_thread instead of __pthread_init_thread, and missed defining the external hidden symbol.	2024-07-17 15:04:25 +02:00
Samuel Thibault	0e75c4a463	hurd: Fix pthread_self() without libpthread `5476f8cd2e` ("htl: move pthread_self info libc.") moved the htl pthread_self() function from libpthread to libc, replacing the previous libc stub that just returns 0. And `53da64d1cf` ("htl: Initialize ___pthread_self early") added initialization code which is needed before being able to call pthread_self. It is currently in libpthread, and thus never called before programs can call pthread_self from libc, which then segfaults when accessing _pthread_self()->thread. This moves the initialization to libc itself, as initialized variables, so pthread_self can always be called fine.	2024-07-17 14:14:21 +02:00
mengqinggang	43224b1379	LoongArch: Add cfi instructions for _dl_tlsdesc_dynamic In _dl_tlsdesc_dynamic, there are three 'addi.d sp, sp, -size' instructions to allocate stack size for Float/LSX/LASX registers. Every 'addi.d sp, sp, -size' needs a cfi_adjust_cfa_offset because of sp is used to compute CFA. But only one 'addi.d sp, sp, -size' will be run according to HWCAP value. And all cfi_adjust_cfa_offset will be executed in stack unwinding, it result in incorrect CFA. Change _dl_tlsdesc_dynamic to _dl_tlsdesc_dynamic, _dl_tlsdesc_dynamic_lsx and _dl_tlsdesc_dynamic_lasx. Conflicting cfi instructions can be distributed to the three functions. And cfi instructions can correspond to stack down instructions.	2024-07-17 09:32:25 +08:00
Noah Goldstein	5bcf6265f2	x86: Disable non-temporal memset on Skylake Server The original commit enabling non-temporal memset on Skylake Server had erroneous benchmarks (actually done on ICX). Further benchmarks indicate non-temporal stores may in fact by a regression on Skylake Server. This commit may be over-cautious in some cases, but should avoid any regressions for 2.40. Tested using qemu on all x86_64 cpu arch supported by both qemu + GLIBC. Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-07-16 17:20:18 +08:00
Flavio Cruz	2dcc908538	Add pthread_getname_np and pthread_setname_np for Hurd We use thread_get_name and thread_set_name to get and set the thread name, so nothing is stored in the thread structure since these functions are supposed to be called sparingly. One notable difference with Linux is that the thread name is up to 32 chars, whereas Linux's is 16. Also added a mach_RPC_CHECK to check for the existing of gnumach RPCs.	2024-07-16 09:21:52 +02:00
Andreas K. Hüttel	a11e15ea0a	math: Update alpha ulps Linux alphadev 6.9.8-gentoo-alpha #1 Sun Jul 7 00:45:49 EDT 2024 alpha EV68CB Titan GNU/Linux gcc (Gentoo 14.1.1_p20240622 p2) 14.1.1 20240622 GNU ld (Gentoo 2.42 p6) 2.42.0 Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-07-14 12:44:15 +02:00
Samuel Thibault	c8b4ce0b36	hurd: Fix restoring message to be retried save_data stores the start of the original message to be retried, overwritten by the EINTR reply. In 64b builds the overwrite is however rounded up to the 64b pointer size, so we have to save more than just the 32b err. Thanks a lot to Luca Dariz for the investigation!	2024-07-13 17:05:13 +02:00
Maciej W. Rozycki	4b2a1b602f	nptl: Convert tst-sem11 and tst-sem12 tests to use the test driver Fix an issue with commit `2af4e3e566` ("Test of semaphores.") by making the tst-sem11 and tst-sem12 tests use the test driver, preventing them from ever causing testing to hang forever and never complete, such as currently happening with the 'mips-linux-gnu' (o32 ABI) target. Adjust the name of the PREPARE macro, which clashes with the interpretation of its presence by the test driver, by using a TF_ prefix in reference to the name of the 'tf' function. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-12 20:41:08 +02:00
Maciej W. Rozycki	9d8995833e	nptl: Add copyright notice tst-sem11 and tst-sem12 tests Add a copyright notice to the tst-sem11 and tst-sem12 tests, observing that they have been originally contributed back in 2007, with commit `2af4e3e566` ("Test of semaphores."). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-12 20:40:36 +02:00
Andreas K. Hüttel	ef7005628f	tests: XFAIL audit tests failing on all mips configurations, bug 29404 Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-07-12 18:49:42 +02:00
Samuel Dobron	255df9299f	time/Makefile: Split and sort tests Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-12 17:33:28 +02:00
Stefan Liebler	9b76514103	s390x: Fix segfault in wcsncmp [BZ #31934 ] The z13/vector-optimized wcsncmp implementation segfaults if n=1 and there is only one character (equal on both strings) before the page end. Then it loads and compares one character and misses to check n again. The following load fails. This patch removes the extra load and compare of the first character and just start with the loop which uses vector-load-to-block-boundary. This code-path also checks n. With this patch both tests are passing: - the simplified one mentioned in the bugzilla 31934 - the full one in Florian Weimer's patch: "manual: Document a GNU extension for strncmp/wcsncmp" (https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/): On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-07-11 15:08:57 +02:00
Florian Weimer	2e456ccf0c	Linux: Make __rseq_size useful for feature detection (bug 31965) The __rseq_size value is now the active area of struct rseq (so 20 initially), not the full struct size including padding at the end (32 initially). Update misc/tst-rseq to print some additional diagnostics. Reviewed-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>	2024-07-09 19:33:37 +02:00
Andreas K. Hüttel	7e7f35278c	po: incorporate translations (bg) Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-07-09 13:35:12 +02:00
DJ Delorie	6c0be74305	manual: add syscalls The purpose of this patch is to add some system calls that (1) aren't otherwise documented, and (2) are merely redirected to the kernel, so can refer to their documentation; and define a standard way of doing so in the future. A more detailed explaination of how system calls are wrapped is added along with reference to the Linux Man-Pages project. Default version of man-pages is in configure.ac but can be overridden by --with-man-pages=X.Y Reviewed-by: Alejandro Colomar <alx@kernel.org>	2024-07-09 11:54:29 +02:00
Andreas Schwab	2213b37b70	libio: handle opening a file when all files are closed (bug 31963) _IO_list_all becomes NULL when all files (including standard files) are closed.	2024-07-09 10:12:36 +02:00
Adam Sampson	895294e51d	ldconfig: Ignore all GDB extension files ldconfig already ignores files with the -gdb.py suffix, but GDB also looks for -gdb.gdb and -gdb.scm files. These aren't as widely used, but libguile at least comes with a -gdb.scm file. Rename is_gdb_python_file to is_gdb_extension_file, and make it recognise all three types of GDB extension. Signed-off-by: Adam Sampson <ats@offog.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-08 22:15:34 +02:00
Adam Sampson	ed2b8d3a86	ldconfig: Move endswithn into a new header file is_gdb_python_file is doing a similar test, so it can use this helper function as well. Signed-off-by: Adam Sampson <ats@offog.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-08 22:14:22 +02:00
Andreas K. Hüttel	ab6045728f	math: Update m68k ULPs This hasn't been looked at for a loong time (already guessing from the number of missing entries), and it ain't pretty. There are some 9-ulps results for float. - ZaZaZebra (qemu-system-m68k clone of PowerBook 190 system) - GCC 13.3.1 20240614 (Gentoo 13.3.1_p20240614 p17) - ld GNU ld (Gentoo 2.42 p6) 2.42.0 - Linux ZaZaZebra 4.19.0-5-m68k #1 Gentoo 4.19.37-5 (2019-06-19) m68k 68040 68040 GNU/Linux - manual build - ../glibc/configure --enable-fortify-source --prefix=/usr - Tested by Immolo (via Andreas K. Hüttel) Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-07-08 21:51:03 +02:00
Adhemerval Zanella	184b9e530e	stdlib: fix arc4random fallback to /dev/urandom (BZ 31612) The __getrandom_nocancel used by __arc4random_buf uses INLINE_SYSCALL_CALL (which returns -1/errno) and the loop checks for the return value instead of errno to fallback to /dev/urandom. The malloc code now uses __getrandom_nocancel_nostatus, which uses INTERNAL_SYSCALL_CALL, so there is no need to use the variant that does not set errno (BZ#29624). Checked on x86_64-linux-gnu. Reviewed-by: Xi Ruoyao <xry111@xry111.site>	2024-07-08 10:05:10 -03:00
Adhemerval Zanella	9fc639f654	elf: Make dl-rseq-symbols Linux only And avoid a Hurd build failures. Checked on x86_64-linux-gnu.	2024-07-04 10:09:07 -03:00
Michael Jeanson	2b92982e23	nptl: fix potential merge of __rseq_* relro symbols While working on a patch to add support for the extensible rseq ABI, we came across an issue where a new 'const' variable would be merged with the existing '__rseq_size' variable. We tracked this to the use of '-fmerge-all-constants' which allows the compiler to merge identical constant variables. This means that all 'const' variables in a compile unit that are of the same size and are initialized to the same value can be merged. In this specific case, on 32 bit systems 'unsigned int' and 'ptrdiff_t' are both 4 bytes and initialized to 0 which should trigger the merge. However for reasons we haven't delved into when the attribute 'section (".data.rel.ro")' is added to the mix, only variables of the same exact types are merged. As far as we know this behavior is not specified anywhere and could change with a new compiler version, hence this patch. Move the definitions of these variables into an assembler file and add hidden writable aliases for internal use. This has the added bonus of removing the asm workaround to set the values on rseq registration. Tested on Debian 12 with GCC 12.2. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-07-03 21:40:30 +02:00
Darius Rad	b85a23d736	riscv: Update nofpu libm test ulps Fixes 32 test failures.	2024-07-03 21:05:34 +02:00
Florian Weimer	7dde7f82d9	manual: Recommendations for dynamic linker hardening This new section in the manual provides recommendations for use of glibc in environments with higher integrity requirements. It's reflecting both current implementation shortcomings, and challenges we inherit from ELF and psABI requirements. Reviewed-by: Jonathan Wakely <jwakely@redhat.com>	2024-07-03 19:26:14 +02:00
Sergey Kolosov	50f5a09e68	socket: Add new test for shutdown This commit adds shutdown test with SHUT_RD, SHUT_WR, SHUT_RDWR for an UNIX socket connection. Reviewed-by: DJ Delorie <dj@redhat.com>	2024-07-03 12:55:44 -04:00
Stefan Liebler	d2f6ceaccb	elf/rtld: Fix auxiliary vector for enable_secure Starting with commit `59974938fe` elf/rtld: Count skipped environment variables for enable_secure The new testcase elf/tst-tunables-enable_secure-env segfaults on s390 (31bit). There _start parses the auxiliary vector for some additional checks. Therefore it skips over the zeros after the environment variables ... 0x7fffac20: 0x7fffbd17 0x7fffbd32 0x7fffbd69 0x00000000 ------------------------------------------------^^^last environment variable ... and then it parses the auxiliary vector and stops at AT_NULL. 0x7fffac30: 0x00000000 0x00000021 0x00000000 0x00000000 --------------------------------^^^AT_SYSINFO_EHDR--------------^^^AT_NULL ----------------^^^newp-----------------------------------------^^^oldp Afterwards it tries to access AT_PHDR which points to somewhere and segfaults. Due to not incorporating the skip_env variable in the computation of oldp when shuffling down the auxv in rtld.c, it just copies one entry with AT_NULL and value 0x00000021 and stops the loop. In reality we have skipped GLIBC_TUNABLES environment variable (=> skip_env=1). Thus we should copy from here: 0x7fffac40: 0x00000021 0x7ffff000 0x00000010 0x007fffff ----------------^^^fixed-oldp This patch fixes the computation of oldp when shuffling down auxiliary vector. It also adds some checks in the testcase. Those checks also fail on s390x (64bit) and x86_64 without the fix. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-03 13:01:44 +02:00
John David Anglin	4737e6a7a3	hppa/vdso: Provide 64-bit clock_gettime() vDSO only Adhemerval noticed that the gettimeofday() and 32-bit clock_gettime() vDSO calls won't be used by glibc on hppa, so there is no need to declare them. Both syscalls will be emulated by utilizing return values of the 64-bit clock_gettime() vDSO instead. Signed-off-by: Helge Deller <deller@gmx.de> Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>	2024-07-02 16:26:32 -04:00
Adhemerval Zanella	9f80d8134a	debug: Fix clang open fortify wrapper (BZ 31927) The clang open fortify wrapper from `4228baef1a` added a restriction where open with 3 arguments where flags do not contain O_CREAT or O_TMPFILE are handled as invalid. They are not invalid, since the third argument is ignored, and the gcc wrapper also allows it. Checked x86_64-linux-gnu and with a yocto build for some affected packages. Tested-by: “Khem Raj <raj.khen@gmail.com>”	2024-07-02 10:19:44 -03:00
H.J. Lu	ba144c179e	Add --disable-static-c++-tests option [BZ #31797 ] By default, if the C++ toolchain lacks support for static linking, configure fails to find the C++ header files and the glibc build fails. The --disable-static-c++-link-check option allows the glibc build to finish, but static C++ tests will fail if the C++ toolchain doesn't have the necessary static C++ libraries which may not be easily installed. Add --disable-static-c++-tests option to skip the static C++ link check and tests. This fixes BZ #31797. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-07-02 00:51:34 -07:00
H.J. Lu	23f12e6e0c	Add --disable-static-c++-link-check option [BZ #31412 ] The current minimum GCC version of glibc build is GCC 6.2 or newer. But building i686 glibc with GCC 6.4 on Fedora 40 failed since the C++ header files couldn't be found which was caused by the static C++ link check failure due to missing __divmoddi4 which was referenced in i686 libc.a and added to GCC 7. Add --disable-static-c++-link-check configure option to disable the static C++ link test. The newly built i686 libc.a can be used by GCC 6.4 to create static C++ tests. This fixes BZ #31412. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-07-02 00:51:34 -07:00
DJ Delorie	dce754b155	Update mmap() flags and errors lists Extend the list of MAP_* macros to include all macros available to the average program (gcc -E -dM \| grep MAP_*) Extend the list of errno codes. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-07-01 16:44:55 -04:00
YunQiang Su	9d0e9c8a13	MIPSr6/math: Use builtin fma and fmaf MIPSr6 has MADDF.s/MADDF.d instructions, which are fused. In MIPS ISA, double support can be subsetted. Only FMAF is enabled for this case. * sysdeps/mips/fpu/math-use-builtins-fma.h Signed-off-by: YunQiang Su <syq@gcc.gnu.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-01 14:52:30 -03:00
Florian Weimer	018f0fc3b8	elf: Support recursive use of dynamic TLS in interposed malloc It turns out that quite a few applications use bundled mallocs that have been built to use global-dynamic TLS (instead of the recommended initial-exec TLS). The previous workaround from commit `afe42e935b` ("elf: Avoid some free (NULL) calls in _dl_update_slotinfo") does not fix all encountered cases unfortunatelly. This change avoids the TLS generation update for recursive use of TLS from a malloc that was called during a TLS update. This is possible because an interposed malloc has a fixed module ID and TLS slot. (It cannot be unloaded.) If an initially-loaded module ID is encountered in __tls_get_addr and the dynamic linker is already in the middle of a TLS update, use the outdated DTV, thus avoiding another call into malloc. It's still necessary to update the DTV to the most recent generation, to get out of the slow path, which is why the check for recursion is needed. The bookkeeping is done using a global counter instead of per-thread flag because TLS access in the dynamic linker is tricky. All this will go away once the dynamic linker stops using malloc for TLS, likely as part of a change that pre-allocates all TLS during pthread_create/dlopen. Fixes commit `d2123d6827` ("elf: Fix slow tls access after dlopen [BZ #19924]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-07-01 19:02:11 +02:00
Carlos O'Donell	a7fe3e805d	Fix conditionals on mtrace-based tests (bug 31892) The conditionals for several mtrace-based tests in catgets, elf, libio, malloc, misc, nptl, posix, and stdio-common were incorrect leading to test failures when bootstrapping glibc without perl. The correct conditional for mtrace-based tests requires three checks: first checking for run-built-tests, then build-shared, and lastly that PERL is not equal to "no" (missing perl). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-01 17:20:30 +02:00
Michel Lind	4f7eb238d0	signal/Makefile: Split and sort tests This will make it easier to add tests later; see also the test section in malloc/Makefile that inspires this. Signed-off-by: Michel Lind <salimma@fedoraproject.org> Suggested-by: Arjun Shankar <arjun@redhat.com> Reviewed-by: Arjun Shankar <arjun@redhat.com>	2024-07-01 13:47:27 +02:00
MayShao-oc	9dc645cb56	x86: Set default non_temporal_threshold for Zhaoxin processors Current 'non_temporal_threshold' set to 'non_temporal_threshold_lowbound' on Zhaoxin processors without ERMS. The default 'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000 Zhaoxin processors, this patch updates the value to 'shared / cachesize_non_temporal_divisor'. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-30 06:26:43 -07:00
MayShao-oc	c19457aec6	x86_64: Optimize large size copy in memmove-ssse3 This patch optimizes large size copy using normal store when src > dst and overlap. Make it the same as the logic in memmove-vec-unaligned-erms.S. Current memmove-ssse3 use '__x86_shared_cache_size_half' as the non- temporal threshold, this patch updates that value to '__x86_shared_non_temporal_threshold'. Currently, the __x86_shared_non_temporal_threshold is cpu-specific, and different CPUs will have different values based on the related nt-benchmark results. However, in memmove-ssse3, the nontemporal threshold uses '__x86_shared_cache_size_half', which sounds unreasonable. The performance is not changed drastically although shows overall improvements without any major regressions or gains. Results on Zhaoxin KX-7000: bench-memcpy geometric_mean(N=20) New / Original: 0.999 bench-memcpy-random geometric_mean(N=20) New / Original: 0.999 bench-memcpy-large geometric_mean(N=20) New / Original: 0.978 bench-memmove geometric_mean(N=20) New / Original: 1.000 bench-memmmove-large geometric_mean(N=20) New / Original: 0.962 Results on Intel Core i5-6600K: bench-memcpy geometric_mean(N=20) New / Original: 1.001 bench-memcpy-random geometric_mean(N=20) New / Original: 0.999 bench-memcpy-large geometric_mean(N=20) New / Original: 1.001 bench-memmove geometric_mean(N=20) New / Original: 0.995 bench-memmmove-large geometric_mean(N=20) New / Original: 0.936 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-30 06:26:43 -07:00
MayShao-oc	44d757eb9f	x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors Fix code formatting under the Zhaoxin branch and add comments for different Zhaoxin models. Unaligned AVX load are slower on KH-40000 and KX-7000, so disable the AVX_Fast_Unaligned_Load. Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to use sse2_unaligned version of memset,strcpy and strcat. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-30 06:26:43 -07:00
Andrew Pinski	2f1f7a5f8a	Aarch64: Add new memset for Qualcomm's oryon-1 core Qualcom's new core, oryon-1, has a different characteristics for memset than the current versions of memset. For non-zero, larger sizes, using GPRs rather than the SIMD stores is ~30% faster. For even larger sizes, using the nontemporal stores is needed not to polute the L1/L2 caches. For zero values, using `dc zva` should be used. Since we know the size will always be 64 bytes, we don't need to figure out the size there. I started with the emag memset and added back the `dc zva` code. Changes since v1: * v3: Fix comment formating Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-30 13:47:17 +02:00
Andrew Pinski	4dc83cac78	Aarch64: Add memcpy for qualcomm's oryon-1 core Qualcomm's new core (oryon-1) has a different performance characteristic than other cores. For memcpy, it is faster to use the GPRs to do the copy for large sizes (2x faster). For even larger sizes, it is better to use the nontemporal load/store instructions so we don't pollute the L1/L2 caches. For smaller sizes, the characteristic are very similar to other cores. I used the thunderx memcpy as a starting point and expanded from there. Changes since v1: * v2: Fix ordering in Makefile. * v3: Fix comment grammar about the ldnp/stnp instructions. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-30 13:46:33 +02:00
Adhemerval Zanella	4228baef1a	debug: Fix clang open fortify wrapper (BZ 31927) The fcntl.h fortify wrapper for clang added by `86889e22db` missed the __fortify_clang_overload_arg and and also added the mode argument for the __fortify_function_error_function function, which leads clang to be able to correct resolve which overloaded function it should emit. Checked on x86_64-linux-gnu. Reported-by: Khem Raj <raj.khem@gmail.com> Tested-by: Khem Raj <raj.khem@gmail.com>	2024-06-27 13:32:48 -03:00

1 2 3 4 5 ...

41205 Commits