These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 9abdae94c7)
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 4d4da5aab9)
This addresses more (correct) use-after-free warnings reported by
GCC 12 on some targets.
Fixes commit c094c232eb ("Avoid
-Wuse-after-free in tests [BZ #26779].").
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit d653fd2d9e)
This avoids potential memory corruption when the underlying NSS
callback function does not use the buffer space to store all strings
(e.g., for constant strings).
Instead of custom buffer management, two scratch buffers are used.
This increases stack usage somewhat.
Scratch buffer allocation failure is handled by return -1
(an invalid timeout value) instead of terminating the process.
This fixes bug 31679.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit c04a21e050)
The addgetnetgrentX call in addinnetgrX may have failed to produce
a result, so the result variable in addinnetgrX can be NULL.
Use db->negtimeout as the fallback value if there is no result data;
the timeout is also overwritten below.
Also avoid sending a second not-found response. (The client
disconnects after receiving the first response, so the data stream did
not go out of sync even without this fix.) It is still beneficial to
add the negative response to the mapping, so that the client can get
it from there in the future, instead of going through the socket.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit b048a482f0)
If we failed to add a not-found response to the cache, the dataset
point can be null, resulting in a null pointer dereference.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 7835b00dbc)
Using alloca matches what other caches do. The request length is
bounded by MAXKEYLEN.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 87801a8fd0)
ISO-2022-CN-EXT uses escape sequences to indicate character set changes
(as specified by RFC 1922). While the SOdesignation has the expected
bounds checks, neither SS2designation nor SS3designation have its;
allowing a write overflow of 1, 2, or 3 bytes with fixed values:
'$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'.
Checked on aarch64-linux-gnu.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit f9dc609e06)
This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
(cherry picked from commit 14e56bd4ce)
Old Linux kernels disable SVE after every system call. Calling the
SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
As a result, applications with a high use of syscalls may run slower with
the SVE memcpy. This is true for kernels between 4.15.0 and before 6.2.0,
except for 5.14.0 which was patched. Avoid this by checking the kernel
version and selecting the SVE ifunc on modern kernels.
Parse the kernel version reported by uname() into a 24-bit kernel.major.minor
value without calling any library functions. If uname() is not supported or
if the version format is not recognized, assume the kernel is modern.
Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2e94e2f5d2)
Due to GCC bug 110901 -mcpu can override -march setting when compiling
asm code and thus a compiler targetting a specific cpu can fail the
configure check even when binutils gas supports SVE.
The workaround is that explicit .arch directive overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses the
configure check should use that too even if the GCC issue is fixed
independently.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 73c26018ed)
The .cfi_return_column directive changes the return column for the whole
FDE range. But the actual intent is to tell the unwinder that the value
in x30 (lr) now resides in x15 after the move, and that is expressed by
the .cfi_register directive.
(cherry picked from commit 3f79842788)
The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.
[1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 2f5524cc53)
Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size. Since the code is identical to
__memset_falkor, remove the latter.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 3d7090f14b)
Cleanup emag memset - merge the memset_base64.S file, remove
the unused ZVA code (since it is disabled on emag).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 9627ab99b5)
Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename
strlen_mte to strlen_generic. Remove rtld-memset.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 9fd3409842)
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for
memcpy, memmove and memset (use .inst for now so it works with all binutils
versions without needing complex configure and conditional compilation).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2bd0017988)
Linux 6.5 adds a new AArch64 HWCAP2 value, HWCAP2_MOPS. Add it to
glibc's bits/hwcap.h.
Tested with build-many-glibcs.py for aarch64-linux-gnu.
(cherry picked from commit ff5d2abd18)
Improve SVE memcpy by copying 2 vectors if the size is small enough.
This improves performance of random memcpy by ~9% on Neoverse V1, and
33-64 byte copies are ~16% faster.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit d2d3f3720c)
Use shrn for narrowing the mask which simplifies code and speeds up small
strings. Unroll the first search loop to improve performance on large
strings.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 55599d4804)
Optimize strnlen using the shrn instruction and improve the main loop.
Small strings are around 10% faster, large strings are 40% faster on
modern CPUs.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit ad098893ba)
Optimize strlen by unrolling the main loop. Large strings are 64% faster on
modern CPUs.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 03c8ce5000)
Unroll the main loop. Large strings are around 20% faster on modern CPUs.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 349e48c01e)
Simplify calculation of the mask using shrn. Unroll the main loop.
Small strings are 20% faster on modern CPUs.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 51541a2297)
Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment.
Performance improves slightly as a result.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 1bbb1a2022)
Optimize the main loop - large strings are 43% faster on modern CPUs.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 0077624177)
Optimize the main loop - large strings are 40% faster on modern CPUs.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit ce758d4f06)
Since __memcpy_simd is the fastest memcpy on almost all cores, replace
the generic memcpy with it. If SVE is available, a SVE memcpy will be
used by default (including for Neoverse N2).
(cherry picked from commit e6f3fe362f)
We found that string functions were using AND+ADDP
to find the nibble/syndrome mask but there is an easier
opportunity through `SHRN dst.8b, src.8h, 4` (shift
right every 2 bytes by 4 and narrow to 1 byte) and has
same latency on all SIMD ARMv8 targets as ADDP. There
are also possible gaps for memcmp but that's for
another patch.
We see 10-20% savings for small-mid size cases (<=128)
which are primary cases for general workloads.
(cherry picked from commit 3c99806989)
Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE
vectors which improves the random memcpy benchmark significantly.
Cleanup the memcpy and memmove ifunc selectors.
(cherry picked from commit 9f298bfe1f)
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit b51eb35c57)
This restore the 2.33 semantic for arena_get2. It was changed by
11a02b035b to avoid arena_get2 call malloc (back when __get_nproc
was refactored to use an scratch_buffer - 903bc7dcc2). The
__get_nproc was refactored over then and now it also avoid to call
malloc.
The 11a02b035b did not take in consideration any performance
implication, which should have been discussed properly. The
__get_nprocs_sched is still used as a fallback mechanism if procfs
and sysfs is not acessible.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 472894d2cf)
Ffsll function randomly regress by ~20%, depending on how code gets
aligned in memory. Ffsll function code size is 17 bytes. Since default
function alignment is 16 bytes, it can load on 16, 32, 48 or 64 bytes
aligned memory. When ffsll function load at 16, 32 or 64 bytes aligned
memory, entire code fits in single 64 bytes cache line. When ffsll
function load at 48 bytes aligned memory, it splits in two cache line,
hence random regression.
Ffsll function size reduction from 17 bytes to 12 bytes ensures that it
will always fit in single 64 bytes cache line.
This patch fixes ffsll function random performance regression.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9d94997b5f)
_dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer
via the tcb field in TCB:
_dl_tlsdesc_undefweak:
_CET_ENDBR
movq 8(%rax), %rax
subq %fs:0, %rax
ret
_dl_tlsdesc_dynamic:
...
subq %fs:0, %rax
movq -8(%rsp), %rdi
ret
Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location,
not 64-bit. It should use "sub %fs:0, %RAX_LP" instead. Since
_dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic
returns void *, RAX_LP is appropriate here for x32 and x86-64. This
fixes BZ #31185.
(cherry picked from commit 81be2a61da)
On x32, I got
FAIL: elf/tst-tlsgap
$ gdb elf/tst-tlsgap
...
open tst-tlsgap-mod1.so
Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 2268754]
_dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
108 movq (%rsi), %rax
(gdb) p/x $rsi
$4 = 0xf7dbf9005655fb18
(gdb)
This is caused by
_dl_tlsdesc_dynamic:
_CET_ENDBR
/* Preserve call-clobbered registers that we modify.
We need two scratch regs anyway. */
movq %rsi, -16(%rsp)
movq %fs:DTV_OFFSET, %rsi
Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit
location, not 64-bit. Load the dtv field to RSI_LP instead of rsi.
This fixes BZ #31184.
(cherry picked from commit 3502440397)
_dl_assign_tls_modid() assigns a slotinfo entry for a new module, but
does *not* do anything to the generation counter. The first time this
happens, the generation is zero and map_generation() returns the current
generation to be used during relocation processing. However, if
a slotinfo entry is later reused, it will already have a generation
assigned. If this generation has fallen behind the current global max
generation, then this causes an obsolete generation to be assigned
during relocation processing, as map_generation() returns this
generation if nonzero. _dl_add_to_slotinfo() eventually resets the
generation, but by then it is too late. This causes DTV updates to be
skipped, leading to NULL or broken TLS slot pointers and segfaults.
Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
so it behaves the same as the first time a slot is assigned.
_dl_add_to_slotinfo() will still assign the correct static generation
later during module load, but relocation processing will no longer use
an obsolete generation.
Note that slotinfo entry (aka modid) reuse typically happens after a
dlclose and only TLS access via dynamic tlsdesc is affected. Because
tlsdesc is optimized to use the optional part of static TLS, dynamic
tlsdesc can be avoided by increasing the glibc.rtld.optional_static_tls
tunable to a large enough value, or by LD_PRELOAD-ing the affected
modules.
Fixes bug 29039.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 3921c5b40f)
The string parsing routine may end up writing beyond bounds of tunestr
if the input tunable string is malformed, of the form name=name=val.
This gets processed twice, first as name=name=val and next as name=val,
resulting in tunestr being name=name=val:name=val, thus overflowing
tunestr.
Terminate the parsing loop at the first instance itself so that tunestr
does not overflow.
This also fixes up tst-env-setuid-tunables to actually handle failures
correct and add new tests to validate the fix for this CVE.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1056e5b4c3)
This patch fixes a very recently added leak in getaddrinfo.
This was assigned CVE-2023-5156.
Resolves: BZ #30884
Related: BZ #30842
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit ec6b95c330)
When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.
The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query. For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second
for a v4 lookup. In this case, if the first call reallocates tmpbuf
enough number of times, resulting in a malloc, th->h_name (that
res->at->name refers to) ends up on a heap allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name. This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.
Fix this by copying h_name over and freeing it at the end. This
resolves BZ #30843, which is assigned CVE-2023-4806.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 973fe93a56)
All other cases of failures due to lack of memory return EAI_MEMORY, so
it seems wrong to return EAI_SYSTEM here. The only reason
convert_hostent_to_gaih_addrtuple could fail is on calloc failure.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit b587456c0e)
Simplify the loop a wee bit and clean up variable names too.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit ac4653ef50)