glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-13 06:40:09 +00:00

Author	SHA1	Message	Date
Aurelien Jarno	b9cbb8dd48	x86-64: Require BMI2 for AVX2 strncmp implementation The AVX2 strncmp implementation uses the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `fc7de1d9b9`)	2022-10-04 00:00:59 +02:00
Aurelien Jarno	e1561d8cf0	x86-64: Require BMI2 for AVX2 strcmp implementation The AVX2 strcmp implementation uses the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `4d64c64457`)	2022-10-04 00:00:59 +02:00
Aurelien Jarno	414fc856ff	x86-64: Require BMI2 for AVX2 str(n)casecmp implementations The AVX2 str(n)casecmp implementations use the 'bzhi' instruction, which belongs to the BMI2 CPU feature. NB: They also use the 'tzcnt' BMI1 instruction, but it is executed as BSF if the CPU doesn't support TZCNT, and produces the same result for non-zero input. Partially fixes: `b77b06e0e2` ("x86: Optimize strcmp-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `10f79d3670`)	2022-10-04 00:00:58 +02:00
Aurelien Jarno	95f5089d4a	x86: include BMI1 and BMI2 in x86-64-v3 level The "System V Application Binary Interface AMD64 Architecture Processor Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3 level. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `b80f16adbd`)	2022-10-04 00:00:58 +02:00
Wangyang Guo	ea69248445	nptl: Add backoff mechanism to spinlock loop When multiple threads are waiting for a lock at the same time, once the lock owner releases the lock, the waiters all see it available and all try to take it, which may cause an expensive CAS storm. Binary exponential backoff with random jitter is introduced. As the number of try-lock attempts increases, it becomes more likely that a larger number of threads are competing for the adaptive mutex, so the wait time is increased exponentially. A random jitter is also added to avoid synchronous try-lock attempts from other threads. v2: Remove read-check before try-lock for performance. v3: 1. Restore read-check since it works well on some platforms. 2. Make backoff arch dependent, and enable it for x86_64. 3. Limit max backoff to reduce latency in large critical sections. v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h v5: Commit log updated for regression in large critical sections. Result of pthread-mutex-locks bench. Test Platform: Xeon 8280L (2 sockets, 112 CPUs in total). First row: thread count. First column (cs len): critical-section length. Values: backoff vs upstream, time based, lower is better.

non-critical-length: 1
cs len     1     2     4     8    16    32    64   112   140
     0  0.99  0.58  0.52  0.49  0.43  0.44  0.46  0.52  0.54
     1  0.98  0.43  0.56  0.50  0.44  0.45  0.50  0.56  0.57
     2  0.99  0.41  0.57  0.51  0.45  0.47  0.48  0.60  0.61
     4  0.99  0.45  0.59  0.53  0.48  0.49  0.52  0.64  0.65
     8  1.00  0.66  0.71  0.63  0.56  0.59  0.66  0.72  0.71
    16  0.97  0.78  0.91  0.73  0.67  0.70  0.79  0.80  0.80
    32  0.95  1.17  0.98  0.87  0.82  0.86  0.89  0.90  0.90
    64  0.96  0.95  1.01  1.01  0.98  1.00  1.03  0.99  0.99
   128  0.99  1.01  1.01  1.17  1.08  1.12  1.02  0.97  1.02

non-critical-length: 32
cs len     1     2     4     8    16    32    64   112   140
     0  1.03  0.97  0.75  0.65  0.58  0.58  0.56  0.70  0.70
     1  0.94  0.95  0.76  0.65  0.58  0.58  0.61  0.71  0.72
     2  0.97  0.96  0.77  0.66  0.58  0.59  0.62  0.74  0.74
     4  0.99  0.96  0.78  0.66  0.60  0.61  0.66  0.76  0.77
     8  0.99  0.99  0.84  0.70  0.64  0.66  0.71  0.80  0.80
    16  0.98  0.97  0.95  0.76  0.70  0.73  0.81  0.85  0.84
    32  1.04  1.12  1.04  0.89  0.82  0.86  0.93  0.91  0.91
    64  0.99  1.15  1.07  1.00  0.99  1.01  1.05  0.99  0.99
   128  1.00  1.21  1.20  1.22  1.25  1.31  1.12  1.10  0.99

non-critical-length: 128
cs len     1     2     4     8    16    32    64   112   140
     0  1.02  1.00  0.99  0.67  0.61  0.61  0.61  0.74  0.73
     1  0.95  0.99  1.00  0.68  0.61  0.60  0.60  0.74  0.74
     2  1.00  1.04  1.00  0.68  0.59  0.61  0.65  0.76  0.76
     4  1.00  0.96  0.98  0.70  0.63  0.63  0.67  0.78  0.77
     8  1.01  1.02  0.89  0.73  0.65  0.67  0.71  0.81  0.80
    16  0.99  0.96  0.96  0.79  0.71  0.73  0.80  0.84  0.84
    32  0.99  0.95  1.05  0.89  0.84  0.85  0.94  0.92  0.91
    64  1.00  0.99  1.16  1.04  1.00  1.02  1.06  0.99  0.99
   128  1.00  1.06  0.98  1.14  1.39  1.26  1.08  1.02  0.98

There is a regression for large critical sections. But the adaptive mutex is aimed at "quick" locks: small critical sections are more common when users choose an adaptive pthread_mutex. Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `8162147872`)	2022-09-28 07:34:53 -07:00
Noah Goldstein	04efdcfac4	sysdeps: Add 'get_fast_jitter' interface in fast-jitter.h 'get_fast_jitter' is meant to be used purely for performance purposes. Wherever it is used, it should be acceptable to get no randomness (see the default case). An example use case is setting jitter for retries between threads at a lock. There is a performance benefit to having jitter, but only if the jitter can be generated very quickly, and ultimately there is no serious issue if no jitter is generated. The implementation generally uses 'HP_TIMING_NOW' iff it is inlined (to avoid any potential syscall paths). Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `911c63a51c`)	2022-09-28 07:34:31 -07:00
Jangwoong Kim	43760d33d7	nptl: Effectively skip CAS in spinlock loop The commit: "Add LLL_MUTEX_READ_LOCK [BZ #28537]" SHA1: `d672a98a1a` introduced LLL_MUTEX_READ_LOCK, to skip CAS in the spinlock loop if the atomic load fails. But "continue" inside a do-while loop does not skip the evaluation of the loop condition, so the CAS is not skipped. Replace do-while with while and skip LLL_MUTEX_TRYLOCK if LLL_MUTEX_READ_LOCK fails. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `6b8dbbd03a`)	2022-09-28 07:34:08 -07:00
H.J. Lu	6bcfbee727	Move assignment out of the CAS condition Update commit `49302b8fdf` Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Nov 11 06:54:01 2021 -0800 Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. and commit `0b82747dc4` Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Nov 11 06:31:51 2021 -0800 Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. by moving assignment out of the CAS condition. (cherry picked from commit `120ac6d238`)	2022-09-28 07:33:49 -07:00
H.J. Lu	a6b81f605d	Add LLL_MUTEX_READ_LOCK [BZ #28537] The CAS instruction is expensive. From the x86 CPU's point of view, getting a cache line for writing is more expensive than reading. See Appendix A.2 Spinlock in: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf The full compare and swap will grab the cache line exclusive and cause excessive cache line bouncing. Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in the spinlock loop if the compare may fail, to reduce cache line bouncing on contended locks. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit `d672a98a1a`)	2022-09-28 07:33:27 -07:00
H.J. Lu	ed8300c054	Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit `49302b8fdf`)	2022-09-28 07:33:09 -07:00
H.J. Lu	a2e259014f	Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> (cherry picked from commit `0b82747dc4`)	2022-09-28 07:32:55 -07:00
Florian Weimer	044755e2fa	resolv: Fix building tst-resolv-invalid-cname for earlier C standards This fixes this compiler error: tst-resolv-invalid-cname.c: In function ‘test_mode_to_string’: tst-resolv-invalid-cname.c:164:10: error: label at end of compound statement case test_mode_num: ^~~~~~~~~~~~~ Fixes commit `9caf782276` ("resolv: Add new tst-resolv-invalid-cname"). (cherry picked from commit `d09aa4a172`)	2022-09-21 19:37:24 +02:00
Florian Weimer	2def56a349	nss_dns: Rewrite _nss_dns_gethostbyname4_r using current interfaces Introduce struct alloc_buffer to this function, and use it and struct ns_rr_cursor in gaih_getanswer_slice. Adjust gaih_getanswer and gaih_getanswer_noaaaa accordingly. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `1d495912a7`) (conflict in resolv/nss_dns/dns-host.c due to missing noaaaa support)	2022-09-21 19:37:24 +02:00
Florian Weimer	480c820493	resolv: Add new tst-resolv-invalid-cname This test checks resolution through CNAME chains that do not contain host names (bug 12154). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `9caf782276`)	2022-09-21 19:37:24 +02:00
Florian Weimer	c36e7cca35	nss_dns: In gaih_getanswer_slice, skip strange aliases (bug 12154) If the name is not a host name, skip adding it to the result, instead of reporting query failure. This fixes bug 12154 for getaddrinfo. This commit still keeps the old parsing code, and only adjusts when a host name is copied. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `32b599ac8c`)	2022-09-21 19:37:24 +02:00
Florian Weimer	9abc40d9b5	nss_dns: Rewrite getanswer_r to match getanswer_ptr (bug 12154, bug 29305) Allocate the pointer arrays only at the end, when their sizes are known. This addresses bug 29305. Skip over invalid names instead of failing lookups. This partially fixes bug 12154 (for gethostbyname, fixing getaddrinfo requires different changes). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `d101d836e7`)	2022-09-21 19:37:17 +02:00
Florian Weimer	7267341ec1	nss_dns: Remove remnants of IPv6 address mapping res_use_inet6 always returns false since commit `3f8b44be0a` ("resolv: Remove support for RES_USE_INET6 and the inet6 option"). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `a7fc30b522`)	2022-09-21 19:36:12 +02:00
Florian Weimer	32e5db3768	nss_dns: Rewrite _nss_dns_gethostbyaddr2_r and getanswer_ptr The simplification takes advantage of the split from getanswer_r. It fixes various alias issues, and optimizes NSS buffer usage. The new DNS packet parsing helpers are used, too. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `e32547d661`)	2022-09-21 19:36:12 +02:00
Florian Weimer	d9c979abf9	nss_dns: Split getanswer_ptr from getanswer_r And expand the use of name_ok and qtype in getanswer_ptr (the former also in getanswer_r). After further cleanups, not much code will be shared between the two functions. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `0dcc43e998`)	2022-09-21 19:36:12 +02:00
Florian Weimer	e7c03f4765	resolv: Add DNS packet parsing helpers geared towards wire format The public parser functions around the ns_rr record type produce textual domain names, but usually, this is not what we need while parsing DNS packets within glibc. This commit adds two new helper functions, __ns_rr_cursor_init and __ns_rr_cursor_next, for writing packet parsers, and struct ns_rr_cursor, struct ns_rr_wire as supporting types. In theory, it is possible to avoid copying the owner name into the rname field in __ns_rr_cursor_next, but this would need more functions that work on compressed names. Eventually, __res_context_send could be enhanced to preserve the result of the packet parsing that is necessary for matching the incoming UDP packets, so that this work does not have to be done twice. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `857c890d9b`)	2022-09-21 19:36:12 +02:00
Florian Weimer	c288e032ae	resolv: Add internal __ns_name_length_uncompressed function This function is useful for checking that the question name is uncompressed (as it should be). Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `78b1a4f0e4`)	2022-09-21 19:36:12 +02:00
Florian Weimer	bb8adbba4f	resolv: Add the __ns_samebinaryname function During packet parsing, only the binary name is available. If the name equality check is performed before conversion to text, we can sometimes skip the last step. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `394085a34d`)	2022-09-21 19:36:12 +02:00
Florian Weimer	4d2e67d6e5	resolv: Add internal __res_binary_hnok function During packet parsing, only the binary representation is available, and it is convenient to check that directly for conformance with host name requirements. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `c79327bf00`)	2022-09-21 19:36:12 +02:00
Florian Weimer	6a833d798e	resolv: Add tst-resolv-aliases Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `87aa98aa80`)	2022-09-21 19:36:12 +02:00
Florian Weimer	1a3afdfe31	resolv: Add tst-resolv-byaddr for testing reverse lookup Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `0b99828d54`)	2022-09-21 19:36:12 +02:00
Florian Weimer	f50a6c843a	gconv: Use 64-bit interfaces in gconv_parseconfdir (bug 29583) It's possible that inode numbers are outside the 32-bit range. The existing code only handles the in-libc case correctly, and still uses the legacy interfaces when building iconv. Suggested-by: Helge Deller <deller@gmx.de> (cherry picked from commit `f97905f246`)	2022-09-21 13:13:02 +02:00
Javier Pello	2ff6775ad3	elf: Fix hwcaps string size overestimation Commit `dad90d5282` added glibc-hwcaps support for LD_LIBRARY_PATH and, for this, it adjusted the total string size required in _dl_important_hwcaps. However, in doing so it inadvertently altered the calculation of the size required for the power set strings, as the computation of the power set string size depended on the first value assigned to the total variable, which is later shifted, resulting in overallocation of string space. Fix this now by using a different variable to hold the string size required for glibc-hwcaps. Signed-off-by: Javier Pello <devel@otheo.eu> (cherry picked from commit `a23820f605`)	2022-09-15 15:44:14 +02:00
Florian Weimer	bc5cb538e5	elf: Run tst-audit-tlsdesc, tst-audit-tlsdesc-dlopen everywhere The test is valid for all TLS models, but we want to make a reasonable effort to test the GNU2 model specifically. For example, aarch64 defaults to GNU2, but does not have -mtls-dialect=gnu2, and the test was not run there. Suggested-by: Martin Coufal <mcoufal@redhat.com> (cherry picked from commit `dd2315a866`) Fixes early backport commit `536ddc5c02` ("elf: Call __libc_early_init for reused namespaces (bug 29528)"); it had a wrong conflict resolution.	2022-09-13 20:47:59 +02:00
Fabian Vogt	2b3d020055	nscd: Fix netlink cache invalidation if epoll is used [BZ #29415] Processes cache network interface information such as whether IPv4 or IPv6 are enabled. This is only checked again if the "netlink timestamp" provided by nscd changed, which is triggered by netlink socket activity. However, the epoll handler for the netlink socket did not assign the new timestamp to the nscd database. The handler for plain poll did that properly; copy that over. This bug caused, e.g., processes that started before network configuration to get unusable addresses from getaddrinfo, such as IPv6-only results even though only IPv4 is available: https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1041 It's a bit hard to reproduce, so I verified this by checking the timestamp on calls to __check_pf manually. Without this patch it's stuck at 1; now it increases on network changes as expected. Signed-off-by: Fabian Vogt <fvogt@suse.de> (cherry picked from commit `02ca25fef2`)	2022-09-06 17:17:35 +02:00
Raphael Moreira Zinsly	b41c535f46	Apply asm redirections in wchar.h before first use Similar to `d0fa09a770`, but for wchar.h. Fixes [BZ #27087] by applying all long double related asm redirections before using functions in bits/wchar2.h. Moves the function declarations from wcsmbs/bits/wchar2.h to a new file wcsmbs/bits/wchar2-decl.h that will be included first in wcsmbs/wchar.h. Tested with build-many-glibcs.py. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit `c7509d49c4`)	2022-08-31 10:29:54 +02:00
Tulio Magno Quites Machado Filho	2a44960cbc	Apply asm redirections in stdio.h before first use [BZ #27087] Compilers may not be able to apply asm redirections to functions after these functions are used for the first time, e.g. clang 13. Fix [BZ #27087] by applying all long double-related asm redirections before using functions in bits/stdio.h. However, as these asm redirections depend on the declarations provided by libio/bits/stdio2.h, this header was split in 2: - libio/bits/stdio2-decl.h contains all function declarations; - libio/bits/stdio2.h keeps the remaining contents, including redirections. This also adds the access attribute to __vsnprintf_chk that was missing. Tested with build-many-glibcs.py. Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com> (cherry picked from commit `d0fa09a770`)	2022-08-31 10:29:46 +02:00
Florian Weimer	536ddc5c02	elf: Call __libc_early_init for reused namespaces (bug 29528) libc_map is never reset to NULL, neither during dlclose nor on a dlopen call which reuses the namespace structure. As a result, if a namespace is reused, its libc is not initialized properly. The most visible result is a crash in the <ctype.h> functions. To prevent similar bugs on namespace reuse from surfacing, unconditionally initialize the chosen namespace to zero using memset. (cherry picked from commit `d0e357ff45`)	2022-08-30 16:30:03 +02:00
Arjun Shankar	68507377f2	socket: Check lengths before advancing pointer in CMSG_NXTHDR The inline and library functions that the CMSG_NXTHDR macro may expand to increment the pointer to the header before checking the stride of the increment against available space. Since C only allows incrementing pointers to one past the end of an array, the increment must be done after a length check. This commit fixes that and includes a regression test for CMSG_FIRSTHDR and CMSG_NXTHDR. The Linux, Hurd, and generic headers are all changed. Tested on Linux on armv7hl, i686, x86_64, aarch64, ppc64le, and s390x. [BZ #28846] Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `9c443ac455`)	2022-08-22 18:59:26 +02:00
Florian Weimer	1fcc7bfee2	alpha: Fix generic brk system call emulation in __brk_call (bug 29490) The kernel special-cases the zero argument for alpha brk, and we can use that to restore the generic Linux error handling behavior. Fixes commit `b57ab258c1` ("Linux: Introduce __brk_call for invoking the brk system call"). (cherry picked from commit `e7ad26ee3c`)	2022-08-22 11:12:40 +02:00
Noah Goldstein	4bc889c01c	stdlib: Fixup mbstowcs NULL __dst handling. [BZ #29279] commit `464d189b96` (origin/master, origin/HEAD) Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 22 08:24:21 2022 -0700 stdlib: Remove attr_write from mbstows if dst is NULL [BZ: 29265] incorrectly called `__mbstowcs_chk` in the NULL __dst case, which is wrong because in the NULL __dst case we are explicitly skipping the objsize checks. As well, remove the `__always_inline` attribute, which is already included in `__fortify_function`. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `220b83d83d`)	2022-08-15 23:14:15 +02:00
Noah Goldstein	a88f07f71f	stdlib: Remove attr_write from mbstows if dst is NULL [BZ: 29265] mbstowcs is defined for a NULL dst and is special-cased in that case, so the fortify objsize check is incorrect there. Tested on x86-64 linux. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `464d189b96`)	2022-08-15 23:14:13 +02:00
Joseph Myers	4ab59ce4e5	Update syscall lists for Linux 5.19 Linux 5.19 has no new syscalls, but enables memfd_secret in the uapi headers for RISC-V. Update the version number in syscall-names.list to reflect that it is still current for 5.19 and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py. (cherry picked from commit `fccadcdf5b`)	2022-08-08 07:54:02 +02:00
Florian Weimer	875b2414cd	dlfcn: Pass caller pointer to static dlopen implementation (bug 29446) Fixes commit `0c1c3a771e` ("dlfcn: Move dlopen into libc"). (cherry picked from commit `ed0185e412`)	2022-08-04 20:57:18 +02:00
Florian Weimer	b2f32e7464	malloc: Simplify implementation of __malloc_assert It is prudent not to run too much code after detecting heap corruption, and __fxprintf is really complex. The line number and file name do not carry much information, so they are not included in the error message. (__libc_message only supports %s formatting.) The function name and assertion should provide some context. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `ac8047cdf3`)	2022-07-21 17:06:27 +02:00
Joseph Myers	b991af5063	Update syscall-names.list for Linux 5.18 Linux 5.18 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 5.18. Tested with build-many-glibcs.py. (cherry picked from commit `3d9926663c`)	2022-07-21 17:06:27 +02:00
Noah Goldstein	ccc54bd61c	x86: Add missing IS_IN (libc) check to strncmp-sse4_2.S It was missing, so for the multiarch build rtld-strncmp-sse4_2.os was being built and exporting symbols: build/glibc/string/rtld-strncmp-sse4_2.os: 0000000000000000 T __strncmp_sse42 Introduced in: commit `11ffcacb64` Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jun 21 12:10:50 2017 -0700 x86-64: Implement strcmp family IFUNC selectors in C (cherry picked from commit `96ac447d91`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	35f9c72c8b	x86: Move mem{p}{mov|cpy}_{chk_}erms to its own file The primary memmove_{impl}_unaligned_erms implementations don't interact with this function. Putting them in the same file both wastes space and unnecessarily bloats a hot code section. (cherry picked from commit `21925f6473`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	7079931c51	x86: Move and slightly improve memset_erms Implementation wise: 1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not use the L(stosb) label that was previously defined. 2. Don't give the hotpath (fallthrough) to zero size. Code positioning wise: Move memset_{chk}_erms to its own file. Leaving it in between the memset_{impl}_unaligned both adds unnecessary complexity to the file and wastes space in a relatively hot cache section. (cherry picked from commit `4a3f29e7e4`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	f4598f0351	x86: Add definition for __wmemset_chk AVX2 RTM in ifunc impl list This was simply missing and meant we weren't testing it properly. (cherry picked from commit `2a1099020c`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	aadd0a1c7c	x86: Put wcs{n}len-sse4.1 in the sse4.1 text section The section was previously missing, but the two implementations should not go in the sse2 (generic) text section. (cherry picked from commit `afc6e4328f`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	d201c59177	x86: Align entry for memrchr to 64-bytes. The function was tuned around 64-byte entry alignment and performs better for all sizes with it. As well, different code paths were explicitly written to touch the minimum number of cache lines, i.e. sizes <= 32 touch only the entry cache line. (cherry picked from commit `227afaa672`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	c51d8d383c	x86: Add BMI1/BMI2 checks for ISA_V3 check BMI1/BMI2 are part of the ISA V3 requirements: https://en.wikipedia.org/wiki/X86-64 and are defined by GCC when building with `-march=x86-64-v3` (cherry picked from commit `8da9f346cb`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	ba1c3f23d9	x86: Cleanup bounds checking in large memcpy case 1. Fix incorrect lower-bound threshold in L(large_memcpy_2x). It was previously using `__x86_rep_movsb_threshold` and should have been using `__x86_shared_non_temporal_threshold`. 2. Avoid reloading __x86_shared_non_temporal_threshold before the L(large_memcpy_4x) bounds check. 3. Document the second bounds check for L(large_memcpy_4x) more clearly. (cherry picked from commit `89a25c6f64`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	94b0dc9419	x86: Add bounds `x86_non_temporal_threshold` The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed by memmove-vec-unaligned-erms. The lower-bound is needed because memmove-vec-unaligned-erms unrolls the loop aggressively in the L(large_memcpy_4x) case. The upper-bound is needed because memmove-vec-unaligned-erms left-shifts the value of `x86_non_temporal_threshold` by LOG_4X_MEMCPY_THRESH (4), which without a bound may overflow. The lack of lower-bound can be a correctness issue. The lack of upper-bound cannot. (cherry picked from commit `b446822b6a`)	2022-07-18 20:45:21 -07:00
Noah Goldstein	9d50e162ee	x86: Add sse42 implementation to strcmp's ifunc This has been missing since the ifuncs were added. The performance of SSE4.2 is preferable to SSE2. Measured on Tigerlake with N = 20 runs. Geometric Mean of all benchmarks SSE4.2 / SSE2: 0.906 (cherry picked from commit `ff439c4717`)	2022-07-18 20:45:21 -07:00

38079 Commits