The AVX2 strncmp implementations uses the 'bzhi' instruction, which
belongs to the BMI2 CPU feature.
NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
as BSF if the CPU doesn't support TZCNT, and produces the same result
for non-zero input.
Partially fixes: b77b06e0e2 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit fc7de1d9b9)
The AVX2 strcmp implementation uses the 'bzhi' instruction, which
belongs to the BMI2 CPU feature.
NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
as BSF if the CPU doesn't support TZCNT, and produces the same result
for non-zero input.
Partially fixes: b77b06e0e2 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 4d64c64457)
The AVX2 str(n)casecmp implementations use the 'bzhi' instruction, which
belongs to the BMI2 CPU feature.
NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
as BSF if the CPU doesn't support TZCNT, and produces the same result
for non-zero input.
Partially fixes: b77b06e0e2 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 10f79d3670)
The "System V Application Binary Interface AMD64 Architecture Processor
Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3
level.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit b80f16adbd)
'get_fast_jitter' is meant to be used purely for performance
purposes. In all cases it's used it should be acceptable to get no
randomness (see default case). An example use case is in setting
jitter for retries between threads at a lock. There is a
performance benefit to having jitter, but only if the jitter can
be generated very quickly and ultimately there is no serious issue
if no jitter is generated.
The implementation generally uses 'HP_TIMING_NOW' iff it is
inlined (avoid any potential syscall paths).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 911c63a51c)
The commit:
"Add LLL_MUTEX_READ_LOCK [BZ #28537]"
SHA1: d672a98a1a
introduced LLL_MUTEX_READ_LOCK, to skip CAS in spinlock loop
if atomic load fails. But, "continue" inside of do-while loop
does not skip the evaluation of escape expression, thus CAS
is not skipped.
Replace do-while with while and skip LLL_MUTEX_TRYLOCK if
LLL_MUTEX_READ_LOCK fails.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 6b8dbbd03a)
Update
commit 49302b8fdf
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Nov 11 06:54:01 2021 -0800
Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537]
Replace boolean CAS with value CAS to avoid the extra load.
and
commit 0b82747dc4
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Nov 11 06:31:51 2021 -0800
Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537]
Replace boolean CAS with value CAS to avoid the extra load.
by moving assignment out of the CAS condition.
(cherry picked from commit 120ac6d238)
CAS instruction is expensive. From the x86 CPU's point of view, getting
a cache line for writing is more expensive than reading. See Appendix
A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line exclusive and cause
excessive cache line bouncing.
Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock
loop if compare may fail to reduce cache line bouncing on contended locks.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit d672a98a1a)
This fixes this compiler error:
tst-resolv-invalid-cname.c: In function ‘test_mode_to_string’:
tst-resolv-invalid-cname.c:164:10: error: label at end of compound statement
case test_mode_num:
^~~~~~~~~~~~~
Fixes commit 9caf782276
("resolv: Add new tst-resolv-invalid-cname").
(cherry picked from commit d09aa4a172)
Introduce struct alloc_buffer to this function, and use it and
struct ns_rr_cursor in gaih_getanswer_slice. Adjust gaih_getanswer
and gaih_getanswer_noaaaa accordingly.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 1d495912a7)
(conflict in resolv/nss_dns/dns-host.c due to missing noaaaa support)
This test checks resolution through CNAME chains that do not contain
host names (bug 12154).
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 9caf782276)
If the name is not a host name, skip adding it to the result, instead
of reporting query failure. This fixes bug 12154 for getaddrinfo.
This commit still keeps the old parsing code, and only adjusts when
a host name is copied.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 32b599ac8c)
Allocate the pointer arrays only at the end, when their sizes
are known. This addresses bug 29305.
Skip over invalid names instead of failing lookups. This partially
fixes bug 12154 (for gethostbyname, fixing getaddrinfo requires
different changes).
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit d101d836e7)
res_use_inet6 always returns false since commit 3f8b44be0a
("resolv: Remove support for RES_USE_INET6 and the inet6 option").
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit a7fc30b522)
The simplification takes advantage of the split from getanswer_r.
It fixes various aliases issues, and optimizes NSS buffer usage.
The new DNS packet parsing helpers are used, too.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit e32547d661)
And expand the use of name_ok and qtype in getanswer_ptr (the
former also in getanswer_r).
After further cleanups, not much code will be shared between the
two functions.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 0dcc43e998)
The public parser functions around the ns_rr record type produce
textual domain names, but usually, this is not what we need while
parsing DNS packets within glibc. This commit adds two new helper
functions, __ns_rr_cursor_init and __ns_rr_cursor_next, for writing
packet parsers, and struct ns_rr_cursor, struct ns_rr_wire as
supporting types.
In theory, it is possible to avoid copying the owner name
into the rname field in __ns_rr_cursor_next, but this would need
more functions that work on compressed names.
Eventually, __res_context_send could be enhanced to preserve the
result of the packet parsing that is necessary for matching the
incoming UDP packets, so that this works does not have to be done
twice.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 857c890d9b)
This function is useful for checking that the question name is
uncompressed (as it should be).
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 78b1a4f0e4)
During packet parsing, only the binary name is available. If the name
equality check is performed before conversion to text, we can sometimes
skip the last step.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 394085a34d)
During package parsing, only the binary representation is available,
and it is convenient to check that directly for conformance with host
name requirements.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit c79327bf00)
It's possible that inode numbers are outside the 32-bit range.
The existing code only handles the in-libc case correctly, and
still uses the legacy interfaces when building iconv.
Suggested-by: Helge Deller <deller@gmx.de>
(cherry picked from commit f97905f246)
Commit dad90d5282 added glibc-hwcaps
support for LD_LIBRARY_PATH and, for this, it adjusted the total
string size required in _dl_important_hwcaps. However, in doing so
it inadvertently altered the calculation of the size required for
the power set strings, as the computation of the power set string
size depended on the first value assigned to the total variable,
which is later shifted, resulting in overallocation of string
space. Fix this now by using a different variable to hold the
string size required for glibc-hwcaps.
Signed-off-by: Javier Pello <devel@otheo.eu>
(cherry picked from commit a23820f605)
The test is valid for all TLS models, but we want to make a reasonable
effort to test the GNU2 model specifically. For example, aarch64
defaults to GNU2, but does not have -mtls-dialect=gnu2, and the test
was not run there.
Suggested-by: Martin Coufal <mcoufal@redhat.com>
(cherry picked from commit dd2315a866)
Fixes early backport commit 536ddc5c02
("elf: Call __libc_early_init for reused namespaces (bug 29528)");
it had a wrong conflict resolution.
Processes cache network interface information such as whether IPv4 or IPv6
are enabled. This is only checked again if the "netlink timestamp" provided
by nscd changed, which is triggered by netlink socket activity.
However, in the epoll handler for the netlink socket, it was missed to
assign the new timestamp to the nscd database. The handler for plain poll
did that properly, copy that over.
This bug caused that e.g. processes which started before network
configuration got unusuable addresses from getaddrinfo, like IPv6 only even
though only IPv4 is available:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1041
It's a bit hard to reproduce, so I verified this by checking the timestamp
on calls to __check_pf manually. Without this patch it's stuck at 1, now
it's increasing on network changes as expected.
Signed-off-by: Fabian Vogt <fvogt@suse.de>
(cherry picked from commit 02ca25fef2)
Similar to d0fa09a770, but for wchar.h. Fixes [BZ #27087] by applying
all long double related asm redirections before using functions in
bits/wchar2.h.
Moves the function declarations from wcsmbs/bits/wchar2.h to a new file
wcsmbs/bits/wchar2-decl.h that will be included first in wcsmbs/wchar.h.
Tested with build-many-glibcs.py.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit c7509d49c4)
Compilers may not be able to apply asm redirections to functions after
these functions are used for the first time, e.g. clang 13.
Fix [BZ #27087] by applying all long double-related asm redirections
before using functions in bits/stdio.h.
However, as these asm redirections depend on the declarations provided
by libio/bits/stdio2.h, this header was split in 2:
- libio/bits/stdio2-decl.h contains all function declarations;
- libio/bits/stdio2.h remains with the remaining contents, including
redirections.
This also adds the access attribute to __vsnprintf_chk that was missing.
Tested with build-many-glibcs.py.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
(cherry picked from commit d0fa09a770)
libc_map is never reset to NULL, neither during dlclose nor on a
dlopen call which reuses the namespace structure. As a result, if a
namespace is reused, its libc is not initialized properly. The most
visible result is a crash in the <ctype.h> functions.
To prevent similar bugs on namespace reuse from surfacing,
unconditionally initialize the chosen namespace to zero using memset.
(cherry picked from commit d0e357ff45)
The inline and library functions that the CMSG_NXTHDR macro may expand
to increment the pointer to the header before checking the stride of
the increment against available space. Since C only allows incrementing
pointers to one past the end of an array, the increment must be done
after a length check. This commit fixes that and includes a regression
test for CMSG_FIRSTHDR and CMSG_NXTHDR.
The Linux, Hurd, and generic headers are all changed.
Tested on Linux on armv7hl, i686, x86_64, aarch64, ppc64le, and s390x.
[BZ #28846]
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 9c443ac455)
The kernel special-cases the zero argument for alpha brk, and we can
use that to restore the generic Linux error handling behavior.
Fixes commit b57ab258c1 ("Linux:
Introduce __brk_call for invoking the brk system call").
(cherry picked from commit e7ad26ee3c)
commit 464d189b96 (origin/master, origin/HEAD)
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Jun 22 08:24:21 2022 -0700
stdlib: Remove attr_write from mbstows if dst is NULL [BZ: 29265]
Incorrectly called `__mbstowcs_chk` in the NULL __dst case which is
incorrect as in the NULL __dst case we are explicitly skipping
the objsize checks.
As well, remove the `__always_inline` attribute which exists in
`__fortify_function`.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 220b83d83d)
mbstows is defined if dst is NULL and is defined to special cased if
dst is NULL so the fortify objsize check if incorrect in that case.
Tested on x86-64 linux.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 464d189b96)
Linux 5.19 has no new syscalls, but enables memfd_secret in the uapi
headers for RISC-V. Update the version number in syscall-names.list
to reflect that it is still current for 5.19 and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.
Tested with build-many-glibcs.py.
(cherry picked from commit fccadcdf5b)
It is prudent not to run too much code after detecting heap
corruption, and __fxprintf is really complex. The line number
and file name do not carry much information, so it is not included
in the error message. (__libc_message only supports %s formatting.)
The function name and assertion should provide some context.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit ac8047cdf3)
Linux 5.18 has no new syscalls. Update the version number in
syscall-names.list to reflect that it is still current for 5.18.
Tested with build-many-glibcs.py.
(cherry picked from commit 3d9926663c)
Was missing to for the multiarch build rtld-strncmp-sse4_2.os was
being built and exporting symbols:
build/glibc/string/rtld-strncmp-sse4_2.os:
0000000000000000 T __strncmp_sse42
Introduced in:
commit 11ffcacb64
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jun 21 12:10:50 2017 -0700
x86-64: Implement strcmp family IFUNC selectors in C
(cherry picked from commit 96ac447d91)
The primary memmove_{impl}_unaligned_erms implementations don't
interact with this function. Putting them in same file both
wastes space and unnecessarily bloats a hot code section.
(cherry picked from commit 21925f6473)
Implementation wise:
1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not
use the L(stosb) label that was previously defined.
2. Don't give the hotpath (fallthrough) to zero size.
Code positioning wise:
Move memset_{chk}_erms to its own file. Leaving it in between the
memset_{impl}_unaligned both adds unnecessary complexity to the
file and wastes space in a relatively hot cache section.
(cherry picked from commit 4a3f29e7e4)
The function was tuned around 64-byte entry alignment and performs
better for all sizes with it.
As well different code boths where explicitly written to touch the
minimum number of cache line i.e sizes <= 32 touch only the entry
cache line.
(cherry picked from commit 227afaa672)
1. Fix incorrect lower-bound threshold in L(large_memcpy_2x).
Previously was using `__x86_rep_movsb_threshold` and should
have been using `__x86_shared_non_temporal_threshold`.
2. Avoid reloading __x86_shared_non_temporal_threshold before
the L(large_memcpy_4x) bounds check.
3. Document the second bounds check for L(large_memcpy_4x)
more clearly.
(cherry picked from commit 89a25c6f64)
The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed
by memmove-vec-unaligned-erms.
The lower-bound is needed because memmove-vec-unaligned-erms unrolls
the loop aggressively in the L(large_memset_4x) case.
The upper-bound is needed because memmove-vec-unaligned-erms
right-shifts the value of `x86_non_temporal_threshold` by
LOG_4X_MEMCPY_THRESH (4) which without a bound may overflow.
The lack of lower-bound can be a correctness issue. The lack of
upper-bound cannot.
(cherry picked from commit b446822b6a)
This has been missing since the the ifuncs where added.
The performance of SSE4.2 is preferable to to SSE2.
Measured on Tigerlake with N = 20 runs.
Geometric Mean of all benchmarks SSE4.2 / SSE2: 0.906
(cherry picked from commit ff439c4717)