Commit Graph

38193 Commits

Author SHA1 Message Date
Wilco Dijkstra
a4c93ae6d5 AArch64: Remove Falkor memcpy
The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.

[1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 2f5524cc53)
2024-04-09 17:59:00 +01:00
Wilco Dijkstra
ff17116c1e AArch64: Add memset_zva64
Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size.  Since the code is identical to
__memset_falkor, remove the latter.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 3d7090f14b)
2024-04-09 17:58:28 +01:00
Wilco Dijkstra
7e999181c2 AArch64: Cleanup emag memset
Cleanup emag memset - merge the memset_base64.S file, remove
the unused ZVA code (since it is disabled on emag).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 9627ab99b5)
2024-04-09 17:57:54 +01:00
Wilco Dijkstra
bfca39cce7 AArch64: Cleanup ifuncs
Cleanup ifuncs.  Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation.  Rename
strlen_mte to strlen_generic.  Remove rtld-memset.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 9fd3409842)
2024-04-09 17:57:15 +01:00
Wilco Dijkstra
3b79e57c1c AArch64: Add support for MOPS memcpy/memmove/memset
Add support for MOPS in cpu_features and INIT_ARCH.  Add ifuncs using MOPS for
memcpy, memmove and memset (use .inst for now so it works with all binutils
versions without needing complex configure and conditional compilation).

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 2bd0017988)
2024-04-09 17:39:07 +01:00
Joseph Myers
cb0012f076 Add HWCAP2_MOPS from Linux 6.5 to AArch64 bits/hwcap.h
Linux 6.5 adds a new AArch64 HWCAP2 value, HWCAP2_MOPS.  Add it to
glibc's bits/hwcap.h.

Tested with build-many-glibcs.py for aarch64-linux-gnu.

(cherry picked from commit ff5d2abd18)
2024-04-09 17:38:54 +01:00
Wilco Dijkstra
49f1bd6256 AArch64: Improve SVE memcpy and memmove
Improve SVE memcpy by copying 2 vectors if the size is small enough.
This improves performance of random memcpy by ~9% on Neoverse V1, and
33-64 byte copies are ~16% faster.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit d2d3f3720c)
2024-04-09 17:38:05 +01:00
Wilco Dijkstra
3972a306be AArch64: Improve strrchr
Use shrn for narrowing the mask which simplifies code and speeds up small
strings.  Unroll the first search loop to improve performance on large
strings.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 55599d4804)
2024-04-09 17:37:56 +01:00
Wilco Dijkstra
34bf1b598e AArch64: Optimize strnlen
Optimize strnlen using the shrn instruction and improve the main loop.
Small strings are around 10% faster, large strings are 40% faster on
modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit ad098893ba)
2024-04-09 17:37:47 +01:00
Wilco Dijkstra
07666811d4 AArch64: Optimize strlen
Optimize strlen by unrolling the main loop.  Large strings are 64% faster on
modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 03c8ce5000)
2024-04-09 17:37:38 +01:00
Wilco Dijkstra
20a2b9dc71 AArch64: Optimize strcpy
Unroll the main loop.  Large strings are around 20% faster on modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 349e48c01e)
2024-04-09 17:37:24 +01:00
Wilco Dijkstra
b2cf48dd84 AArch64: Improve strchrnul
Unroll the main loop, which improves performance slightly.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 09ebd8549b)
2024-04-09 17:37:15 +01:00
Wilco Dijkstra
979cfc9dc3 AArch64: Optimize strchr
Simplify calculation of the mask using shrn.  Unroll the main loop.
Small strings are 20% faster on modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 51541a2297)
2024-04-09 17:37:05 +01:00
Wilco Dijkstra
90255d909c AArch64: Improve strlen_asimd
Use shrn for the mask, merge tst+bne into cbnz, and tweak code alignment.
Performance improves slightly as a result.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 1bbb1a2022)
2024-04-09 17:36:54 +01:00
Wilco Dijkstra
3474ab871b AArch64: Optimize memrchr
Optimize the main loop - large strings are 43% faster on modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 0077624177)
2024-04-09 17:36:45 +01:00
Wilco Dijkstra
a68b7934a2 AArch64: Optimize memchr
Optimize the main loop - large strings are 40% faster on modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit ce758d4f06)
2024-04-09 17:36:35 +01:00
Wilco Dijkstra
f44f5957ba aarch64: Use memcpy_simd as the default memcpy
Since __memcpy_simd is the fastest memcpy on almost all cores, replace
the generic memcpy with it.  If SVE is available, a SVE memcpy will be
used by default (including for Neoverse N2).

(cherry picked from commit e6f3fe362f)
2024-04-09 17:36:09 +01:00
Wilco Dijkstra
88ff6835da aarch64: Cleanup memset ifunc
Cleanup memset ifunc selectors. The A64FX memset relies on a ZVA size of
256, so add an explicit check.

(cherry picked from commit a8e72913fe)
2024-04-09 17:34:30 +01:00
Wilco Dijkstra
41c34cc7c7 AArch64: Fix typo in sve configure check (BZ# 29394)
Fix a typo in the SVE configure check. This fixes [BZ# 29394].

(cherry picked from commit 12182ba18d)
2024-04-09 17:34:20 +01:00
Danila Kutenin
c4f4b53eee aarch64: Optimize string functions with shrn instruction
We found that string functions were using AND+ADDP
to find the nibble/syndrome mask but there is an easier
opportunity through `SHRN dst.8b, src.8h, 4` (shift
right every 2 bytes by 4 and narrow to 1 byte) and has
same latency on all SIMD ARMv8 targets as ADDP. There
are also possible gaps for memcmp but that's for
another patch.

We see 10-20% savings for small-mid size cases (<=128)
which are primary cases for general workloads.

(cherry picked from commit 3c99806989)
2024-04-09 17:34:07 +01:00
Wilco Dijkstra
3393c72eb0 AArch64: Sort makefile entries
Sort makefile entries to reduce conflicts.

(cherry picked from commit eea282d9c6)
2024-04-09 17:33:54 +01:00
Wilco Dijkstra
45a13a518a AArch64: Add SVE memcpy
Add an initial SVE memcpy implementation.  Copies up to 32 bytes use SVE
vectors which improves the random memcpy benchmark significantly.
Cleanup the memcpy and memmove ifunc selectors.

(cherry picked from commit 9f298bfe1f)
2024-04-09 17:33:25 +01:00
Wilco Dijkstra
82dee12d3b AArch64: Optimize memcmp
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit b51eb35c57)
2024-04-09 17:32:53 +01:00
Adhemerval Zanella
0f82b26a8d Revert "elf: Fix wrong break removal from 8ee878592c"
(2.34 does not have 8ee878592c)

This reverts commit 62ac4b09c4.
2024-02-12 10:46:34 -03:00
Adhemerval Zanella
3fcb410465 malloc: Use __get_nprocs on arena_get2 (BZ 30945)
This restore the 2.33 semantic for arena_get2.  It was changed by
11a02b035b to avoid arena_get2 call malloc (back when __get_nproc
was refactored to use an scratch_buffer - 903bc7dcc2).  The
__get_nproc was refactored over then and now it also avoid to call
malloc.

The 11a02b035b did not take in consideration any performance
implication, which should have been discussed properly.  The
__get_nprocs_sched is still used as a fallback mechanism if procfs
and sysfs is not acessible.

Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>

(cherry picked from commit 472894d2cf)
2024-02-12 10:20:44 -03:00
Adhemerval Zanella
62ac4b09c4 elf: Fix wrong break removal from 8ee878592c
Reported-by: Alexander Monakov <amonakov@ispras.ru>
(cherry picked from commit 546a1ba664)
2024-02-05 09:55:09 -03:00
Sunil K Pandey
a08677d389 x86_64: Optimize ffsll function code size.
Ffsll function randomly regress by ~20%, depending on how code gets
aligned in memory.  Ffsll function code size is 17 bytes.  Since default
function alignment is 16 bytes, it can load on 16, 32, 48 or 64 bytes
aligned memory.  When ffsll function load at 16, 32 or 64 bytes aligned
memory, entire code fits in single 64 bytes cache line.  When ffsll
function load at 48 bytes aligned memory, it splits in two cache line,
hence random regression.

Ffsll function size reduction from 17 bytes to 12 bytes ensures that it
will always fit in single 64 bytes cache line.

This patch fixes ffsll function random performance regression.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9d94997b5f)
2024-01-31 18:54:26 -08:00
H.J. Lu
43ac0f94f1 NEWS: Mention bug fixes for 29039/30745/30843 2023-12-23 09:09:18 -08:00
H.J. Lu
2143fcd540 x86-64: Fix the tcb field load for x32 [BZ #31185]
_dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer
via the tcb field in TCB:

_dl_tlsdesc_undefweak:
        _CET_ENDBR
        movq    8(%rax), %rax
        subq    %fs:0, %rax
        ret

_dl_tlsdesc_dynamic:
	...
        subq    %fs:0, %rax
        movq    -8(%rsp), %rdi
        ret

Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location,
not 64-bit. It should use "sub %fs:0, %RAX_LP" instead.  Since
_dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic
returns void *, RAX_LP is appropriate here for x32 and x86-64.  This
fixes BZ #31185.

(cherry picked from commit 81be2a61da)
2023-12-23 09:05:42 -08:00
H.J. Lu
ba52b325c4 x86-64: Fix the dtv field load for x32 [BZ #31184]
On x32, I got

FAIL: elf/tst-tlsgap

$ gdb elf/tst-tlsgap
...
open tst-tlsgap-mod1.so

Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 2268754]
_dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
108		movq	(%rsi), %rax
(gdb) p/x $rsi
$4 = 0xf7dbf9005655fb18
(gdb)

This is caused by

_dl_tlsdesc_dynamic:
        _CET_ENDBR
        /* Preserve call-clobbered registers that we modify.
           We need two scratch regs anyway.  */
        movq    %rsi, -16(%rsp)
        movq    %fs:DTV_OFFSET, %rsi

Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit
location, not 64-bit.  Load the dtv field to RSI_LP instead of rsi.
This fixes BZ #31184.

(cherry picked from commit 3502440397)
2023-12-23 09:04:14 -08:00
Hector Martin
f95fe70608 elf: Fix TLS modid reuse generation assignment (BZ 29039)
_dl_assign_tls_modid() assigns a slotinfo entry for a new module, but
does *not* do anything to the generation counter. The first time this
happens, the generation is zero and map_generation() returns the current
generation to be used during relocation processing. However, if
a slotinfo entry is later reused, it will already have a generation
assigned. If this generation has fallen behind the current global max
generation, then this causes an obsolete generation to be assigned
during relocation processing, as map_generation() returns this
generation if nonzero. _dl_add_to_slotinfo() eventually resets the
generation, but by then it is too late. This causes DTV updates to be
skipped, leading to NULL or broken TLS slot pointers and segfaults.

Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
so it behaves the same as the first time a slot is assigned.
_dl_add_to_slotinfo() will still assign the correct static generation
later during module load, but relocation processing will no longer use
an obsolete generation.

Note that slotinfo entry (aka modid) reuse typically happens after a
dlclose and only TLS access via dynamic tlsdesc is affected. Because
tlsdesc is optimized to use the optional part of static TLS, dynamic
tlsdesc can be avoided by increasing the glibc.rtld.optional_static_tls
tunable to a large enough value, or by LD_PRELOAD-ing the affected
modules.

Fixes bug 29039.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 3921c5b40f)
2023-12-22 14:42:53 +00:00
Siddhesh Poyarekar
dcc367f148 tunables: Terminate if end of input is reached (CVE-2023-4911)
The string parsing routine may end up writing beyond bounds of tunestr
if the input tunable string is malformed, of the form name=name=val.
This gets processed twice, first as name=name=val and next as name=val,
resulting in tunestr being name=name=val:name=val, thus overflowing
tunestr.

Terminate the parsing loop at the first instance itself so that tunestr
does not overflow.

This also fixes up tst-env-setuid-tunables to actually handle failures
correct and add new tests to validate the fix for this CVE.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1056e5b4c3)
2023-10-02 15:44:25 -04:00
Siddhesh Poyarekar
c3b99f8328 Document CVE-2023-4806 and CVE-2023-5156 in NEWS
These are tracked in BZ #30884 and BZ #30843.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit fd134feba3)
2023-09-26 15:40:06 -04:00
Romain Geissler
8006457ab7 Fix leak in getaddrinfo introduced by the fix for CVE-2023-4806 [BZ #30843]
This patch fixes a very recently added leak in getaddrinfo.

This was assigned CVE-2023-5156.

Resolves: BZ #30884
Related: BZ #30842

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit ec6b95c330)
2023-09-26 15:38:49 -04:00
Siddhesh Poyarekar
e09ee267c0 getaddrinfo: Fix use after free in getcanonname (CVE-2023-4806)
When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.

The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query.  For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second
for a v4 lookup.  In this case, if the first call reallocates tmpbuf
enough number of times, resulting in a malloc, th->h_name (that
res->at->name refers to) ends up on a heap allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name.  This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.

Fix this by copying h_name over and freeing it at the end.  This
resolves BZ #30843, which is assigned CVE-2023-4806.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 973fe93a56)
2023-09-15 20:00:04 -04:00
Siddhesh Poyarekar
cc4544ef80 gethosts: Return EAI_MEMORY on allocation failure
All other cases of failures due to lack of memory return EAI_MEMORY, so
it seems wrong to return EAI_SYSTEM here.  The only reason
convert_hostent_to_gaih_addrtuple could fail is on calloc failure.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit b587456c0e)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
92478a808f gaih_inet: Split result generation into its own function
Simplify the loop a wee bit and clean up variable names too.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit ac4653ef50)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
6e3fed9d20 gaih_inet: split loopback lookup into its own function
Flatten the condition nesting and replace the alloca for RET.AT/ATR with
a single array LOCAL_AT[2].  This gets rid of alloca and alloca
accounting.

`git diff -b` is probably the best way to view this change since much of
the diff is whitespace changes.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 657472b2a5)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
4d59769087 gaih_inet: make gethosts into a function
The macro is quite a pain to debug, so make gethosts into a function to
make it easier to maintain.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit cfa3bd48cb)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
ec71cb9611 gaih_inet: separate nss lookup loop into its own function
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 906cecbe08)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
5914a1d55b gaih_inet: Split nscd lookup code into its own function.
Add a new member got_ipv6 to indicate if the results have an IPv6
result and use it instead of the local got_ipv6.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit e7e5315b7f)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
3b5a3e5009 gaih_inet: Split simple gethostbyname into its own function
Add a free_at flag in gaih_result to indicate if res.at needs to be
freed by the caller.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit b44389cb7f)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
922f2614d6 gaih_inet: make numeric lookup a separate routine
Introduce the gaih_result structure and general paradigm for cleanups
that follow to process the lookup request and return a result.  A lookup
function (like text_to_binary_address), should return an integer error
code and set members of gaih_result based on what it finds.  If the
function does not have a result and no errors have occurred during the
lookup, it should return 0 and res.at should be set to NULL, allowing a
subsequent function to do the lookup until we run out of options.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 26dea46119)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
e05e5889b8 gaih_inet: Simplify service resolution
Refactor the code to split out the service resolution code into a
separate function.  Allocate the service tuples array just once to the
size of the typeproto array, thus avoiding the unnecessary pointer
chasing and stack allocations.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8d6cf99f2f)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
f7efb43738 getaddrinfo: Fix leak with AI_ALL [BZ #28852]
Use realloc in convert_hostent_to_gaih_addrtuple and fix up pointers in
the result list so that a single block is maintained for
hostbyname3_r/hostbyname2_r and freed in gaih_inet.  This result is
never merged with any other results, since the hosts database does not
permit merging.

Resolves BZ #28852.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 3004604607)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
b195fd86c6 gaih_inet: Simplify canon name resolution
Simplify logic for allocation of canon to remove the canonbuf variable;
canon now always points to an allocated block.  Also pull the canon name
set into a separate function.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit d01411f6bc)
2023-09-15 19:53:24 -04:00
Siddhesh Poyarekar
01671608a3 gethosts: Remove unused argument _type
The generated code is unchanged.

(cherry picked from commit b17e842a60)
2023-09-15 19:53:18 -04:00
Siddhesh Poyarekar
51948fdf0f nss: Sort tests and tests-container and put one test per line
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit e2f68b54e8)
2023-09-15 19:52:17 -04:00
Siddhesh Poyarekar
228cdb00a0 Simplify allocations and fix merge and continue actions [BZ #28931]
Allocations for address tuples is currently a bit confusing because of
the pointer chasing through PAT, making it hard to observe the sequence
in which allocations have been made.  Narrow scope of the pointer
chasing through PAT so that it is only used where necessary.

This also tightens actions behaviour with the hosts database in
getaddrinfo to comply with the manual text.  The "continue" action
discards previous results and the "merge" action results in an immedate
lookup failure.  Consequently, chaining of allocations across modules is
no longer necessary, thus opening up cleanup opportunities.

A test has been added that checks some combinations to ensure that they
work correctly.

Resolves: BZ #28931

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 1c37b8022e)
2023-09-14 22:46:01 -04:00
Noah Goldstein
7a6b1f06e7 x86: Fix incorrect scope of setting shared_per_thread [BZ# 30745]
The:

```
    if (shared_per_thread > 0 && threads > 0)
      shared_per_thread /= threads;
```

Code was accidentally moved to inside the else scope.  This doesn't
match how it was previously (before af992e7abd).

This patch fixes that by putting the division after the `else` block.

(cherry picked from commit 084fb31bc2)
2023-09-05 17:17:52 -05:00