Commit Graph

38690 Commits

Author SHA1 Message Date
Sunil K Pandey
1a3326df93 x86_64: Optimize ffsll function code size.
Ffsll function randomly regress by ~20%, depending on how code gets
aligned in memory.  Ffsll function code size is 17 bytes.  Since default
function alignment is 16 bytes, it can load on 16, 32, 48 or 64 bytes
aligned memory.  When ffsll function load at 16, 32 or 64 bytes aligned
memory, entire code fits in single 64 bytes cache line.  When ffsll
function load at 48 bytes aligned memory, it splits in two cache line,
hence random regression.

Ffsll function size reduction from 17 bytes to 12 bytes ensures that it
will always fit in single 64 bytes cache line.

This patch fixes ffsll function random performance regression.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9d94997b5f)
2024-01-31 18:53:26 -08:00
H.J. Lu
914af4fcca NEWS: Mention bug fixes for 29039/30745/30843 2023-12-23 08:00:37 -08:00
H.J. Lu
5d1fe26b49 x86-64: Fix the tcb field load for x32 [BZ #31185]
_dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer
via the tcb field in TCB:

_dl_tlsdesc_undefweak:
        _CET_ENDBR
        movq    8(%rax), %rax
        subq    %fs:0, %rax
        ret

_dl_tlsdesc_dynamic:
	...
        subq    %fs:0, %rax
        movq    -8(%rsp), %rdi
        ret

Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location,
not 64-bit. It should use "sub %fs:0, %RAX_LP" instead.  Since
_dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic
returns void *, RAX_LP is appropriate here for x32 and x86-64.  This
fixes BZ #31185.

(cherry picked from commit 81be2a61da)
2023-12-23 07:50:29 -08:00
H.J. Lu
2d87262c1c x86-64: Fix the dtv field load for x32 [BZ #31184]
On x32, I got

FAIL: elf/tst-tlsgap

$ gdb elf/tst-tlsgap
...
open tst-tlsgap-mod1.so

Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 2268754]
_dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
108		movq	(%rsi), %rax
(gdb) p/x $rsi
$4 = 0xf7dbf9005655fb18
(gdb)

This is caused by

_dl_tlsdesc_dynamic:
        _CET_ENDBR
        /* Preserve call-clobbered registers that we modify.
           We need two scratch regs anyway.  */
        movq    %rsi, -16(%rsp)
        movq    %fs:DTV_OFFSET, %rsi

Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit
location, not 64-bit.  Load the dtv field to RSI_LP instead of rsi.
This fixes BZ #31184.

(cherry picked from commit 3502440397)
2023-12-23 07:49:23 -08:00
Hector Martin
5f08ec08d0 elf: Fix TLS modid reuse generation assignment (BZ 29039)
_dl_assign_tls_modid() assigns a slotinfo entry for a new module, but
does *not* do anything to the generation counter. The first time this
happens, the generation is zero and map_generation() returns the current
generation to be used during relocation processing. However, if
a slotinfo entry is later reused, it will already have a generation
assigned. If this generation has fallen behind the current global max
generation, then this causes an obsolete generation to be assigned
during relocation processing, as map_generation() returns this
generation if nonzero. _dl_add_to_slotinfo() eventually resets the
generation, but by then it is too late. This causes DTV updates to be
skipped, leading to NULL or broken TLS slot pointers and segfaults.

Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
so it behaves the same as the first time a slot is assigned.
_dl_add_to_slotinfo() will still assign the correct static generation
later during module load, but relocation processing will no longer use
an obsolete generation.

Note that slotinfo entry (aka modid) reuse typically happens after a
dlclose and only TLS access via dynamic tlsdesc is affected. Because
tlsdesc is optimized to use the optional part of static TLS, dynamic
tlsdesc can be avoided by increasing the glibc.rtld.optional_static_tls
tunable to a large enough value, or by LD_PRELOAD-ing the affected
modules.

Fixes bug 29039.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 3921c5b40f)
2023-12-22 14:39:29 +00:00
Florian Weimer
01ea8d9dde Revert "elf: Move l_init_called_next to old place of l_text_end in link map"
This reverts commit 59ee83b0c2.

Reason for revert: Preserve internal ABI.
2023-10-19 09:32:49 +02:00
Florian Weimer
0222f2392d Revert "elf: Always call destructors in reverse constructor order (bug 30785)"
This reverts commit 02a67e102f.

Reason for revert: Incompatibility with existing applications.
2023-10-18 14:32:54 +02:00
Florian Weimer
6aa8380cf5 Revert "elf: Remove unused l_text_end field from struct link_map"
This reverts commit 34b07bdbdd.

Reason for revert: Preserve ABI after revert of commit 02a67e102f.
2023-10-18 14:32:12 +02:00
Siddhesh Poyarekar
c84018a05a tunables: Terminate if end of input is reached (CVE-2023-4911)
The string parsing routine may end up writing beyond bounds of tunestr
if the input tunable string is malformed, of the form name=name=val.
This gets processed twice, first as name=name=val and next as name=val,
resulting in tunestr being name=name=val:name=val, thus overflowing
tunestr.

Terminate the parsing loop at the first instance itself so that tunestr
does not overflow.

This also fixes up tst-env-setuid-tunables to actually handle failures
correct and add new tests to validate the fix for this CVE.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1056e5b4c3)
2023-10-02 15:41:55 -04:00
Siddhesh Poyarekar
73d4ce728a Document CVE-2023-4806 and CVE-2023-5156 in NEWS
These are tracked in BZ #30884 and BZ #30843.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit fd134feba3)
2023-09-26 15:36:54 -04:00
Romain Geissler
17092c0311 Fix leak in getaddrinfo introduced by the fix for CVE-2023-4806 [BZ #30843]
This patch fixes a very recently added leak in getaddrinfo.

This was assigned CVE-2023-5156.

Resolves: BZ #30884
Related: BZ #30842

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit ec6b95c330)
2023-09-26 15:34:45 -04:00
Aurelien Jarno
762a747fae io: Fix record locking contants for powerpc64 with __USE_FILE_OFFSET64
Commit 5f828ff824 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for
powerpc64") fixed an issue with the value of the lock constants on
powerpc64 when not using __USE_FILE_OFFSET64, but it ended-up also
changing the value when using __USE_FILE_OFFSET64 causing an API change.

Fix that by also checking that define, restoring the pre
4d0fe291ae commit values:

Default values:
- F_GETLK: 5
- F_SETLK: 6
- F_SETLKW: 7

With -D_FILE_OFFSET_BITS=64:
- F_GETLK: 12
- F_SETLK: 13
- F_SETLKW: 14

At the same time, it has been noticed that there was no test for io lock
with __USE_FILE_OFFSET64, so just add one.

Tested on x86_64-linux-gnu, i686-linux-gnu and
powerpc64le-unknown-linux-gnu.

Resolves: BZ #30804.
Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit 434bf72a94)
2023-09-16 10:59:23 +02:00
Siddhesh Poyarekar
e3ccb230a9 getaddrinfo: Fix use after free in getcanonname (CVE-2023-4806)
When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.

The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query.  For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second
for a v4 lookup.  In this case, if the first call reallocates tmpbuf
enough number of times, resulting in a malloc, th->h_name (that
res->at->name refers to) ends up on a heap allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name.  This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.

Fix this by copying h_name over and freeing it at the end.  This
resolves BZ #30843, which is assigned CVE-2023-4806.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 973fe93a56)
2023-09-15 18:32:43 -04:00
Siddhesh Poyarekar
1b9087dcec gethosts: Return EAI_MEMORY on allocation failure
All other cases of failures due to lack of memory return EAI_MEMORY, so
it seems wrong to return EAI_SYSTEM here.  The only reason
convert_hostent_to_gaih_addrtuple could fail is on calloc failure.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit b587456c0e)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
f5f88f142a gaih_inet: Split result generation into its own function
Simplify the loop a wee bit and clean up variable names too.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit ac4653ef50)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
a6da106892 gaih_inet: split loopback lookup into its own function
Flatten the condition nesting and replace the alloca for RET.AT/ATR with
a single array LOCAL_AT[2].  This gets rid of alloca and alloca
accounting.

`git diff -b` is probably the best way to view this change since much of
the diff is whitespace changes.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 657472b2a5)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
8b70d97b08 gaih_inet: make gethosts into a function
The macro is quite a pain to debug, so make gethosts into a function to
make it easier to maintain.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit cfa3bd48cb)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
9098deb96a gaih_inet: separate nss lookup loop into its own function
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 906cecbe08)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
ce64e72b7d gaih_inet: Split nscd lookup code into its own function.
Add a new member got_ipv6 to indicate if the results have an IPv6
result and use it instead of the local got_ipv6.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit e7e5315b7f)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
4897bf7968 gaih_inet: Split simple gethostbyname into its own function
Add a free_at flag in gaih_result to indicate if res.at needs to be
freed by the caller.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit b44389cb7f)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
571c531b3b gaih_inet: make numeric lookup a separate routine
Introduce the gaih_result structure and general paradigm for cleanups
that follow to process the lookup request and return a result.  A lookup
function (like text_to_binary_address), should return an integer error
code and set members of gaih_result based on what it finds.  If the
function does not have a result and no errors have occurred during the
lookup, it should return 0 and res.at should be set to NULL, allowing a
subsequent function to do the lookup until we run out of options.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 26dea46119)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
9aad91abe6 gaih_inet: Simplify service resolution
Refactor the code to split out the service resolution code into a
separate function.  Allocate the service tuples array just once to the
size of the typeproto array, thus avoiding the unnecessary pointer
chasing and stack allocations.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 8d6cf99f2f)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
d02808dee9 getaddrinfo: Fix leak with AI_ALL [BZ #28852]
Use realloc in convert_hostent_to_gaih_addrtuple and fix up pointers in
the result list so that a single block is maintained for
hostbyname3_r/hostbyname2_r and freed in gaih_inet.  This result is
never merged with any other results, since the hosts database does not
permit merging.

Resolves BZ #28852.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 3004604607)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
f366eaa608 gaih_inet: Simplify canon name resolution
Simplify logic for allocation of canon to remove the canonbuf variable;
canon now always points to an allocated block.  Also pull the canon name
set into a separate function.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit d01411f6bc)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
b126325fc7 nss: Sort tests and tests-container and put one test per line
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit e2f68b54e8)
2023-09-15 18:32:39 -04:00
Siddhesh Poyarekar
6e867146ee Simplify allocations and fix merge and continue actions [BZ #28931]
Allocations for address tuples is currently a bit confusing because of
the pointer chasing through PAT, making it hard to observe the sequence
in which allocations have been made.  Narrow scope of the pointer
chasing through PAT so that it is only used where necessary.

This also tightens actions behaviour with the hosts database in
getaddrinfo to comply with the manual text.  The "continue" action
discards previous results and the "merge" action results in an immedate
lookup failure.  Consequently, chaining of allocations across modules is
no longer necessary, thus opening up cleanup opportunities.

A test has been added that checks some combinations to ensure that they
work correctly.

Resolves: BZ #28931

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 1c37b8022e)
2023-09-14 14:32:44 -04:00
Florian Weimer
59ee83b0c2 elf: Move l_init_called_next to old place of l_text_end in link map
This preserves all member offsets and the GLIBC_PRIVATE ABI
for backporting.
2023-09-11 09:38:01 +02:00
Florian Weimer
34b07bdbdd elf: Remove unused l_text_end field from struct link_map
It is a left-over from commit 52a01100ad
("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]").

When backporting commmit 6985865bc3
("elf: Always call destructors in reverse constructor order
(bug 30785)"), we can move the l_init_called_next field to this
place, so that the internal GLIBC_PRIVATE ABI does not change.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 53df2ce688)
2023-09-11 09:38:01 +02:00
Florian Weimer
02a67e102f elf: Always call destructors in reverse constructor order (bug 30785)
The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors.  Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles.  (The force_first handling in
_dl_sort_maps is not effective for dlclose.)  After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.

A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.

In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated.  (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.

After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse destructor order.  These destructors
can call dlopen.  Previously, new objects do not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them.  Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)

After _dl_init_called_list traversal, two more loops follow.  The
processing order changes to the original link map order in the
namespace.  Previously, dependency order was used.  The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.

The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list.  The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.

tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications.  The new approach for checking la_activity should
make it clearer that la_activty calls come in pairs around namespace
updates.

The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.

There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.

Fixes commit 1df71d32fe ("elf: Implement
force_first handling in _dl_sort_maps_dfs (bug 28937)").

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 6985865bc3)
2023-09-11 09:37:59 +02:00
Florian Weimer
aeea91fd15 elf: Do not run constructors for proxy objects
Otherwise, the ld.so constructor runs for each audit namespace
and each dlmopen namespace.

(cherry picked from commit f6c8204fd7)
2023-09-11 09:37:40 +02:00
Florian Weimer
1d828d5855 elf: Introduce to _dl_call_fini
This consolidates the destructor invocations from _dl_fini and
dlclose.  Remove the micro-optimization that avoids
calling _dl_call_fini if they are no destructors (as dlclose is quite
expensive anyway).  The debug log message is now printed
unconditionally.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-11 09:37:40 +02:00
Noah Goldstein
561e9dadc0 x86: Fix incorrect scope of setting shared_per_thread [BZ# 30745]
The:

```
    if (shared_per_thread > 0 && threads > 0)
      shared_per_thread /= threads;
```

Code was accidentally moved to inside the else scope.  This doesn't
match how it was previously (before af992e7abd).

This patch fixes that by putting the division after the `else` block.

(cherry picked from commit 084fb31bc2)
2023-09-05 17:17:25 -05:00
Noah Goldstein
1c3ecf5858 x86: Use 3/4*sizeof(per-thread-L3) as low bound for NT threshold.
On some machines we end up with incomplete cache information. This can
make the new calculation of `sizeof(total-L3)/custom-divisor` end up
lower than intended (and lower than the prior value). So reintroduce
the old bound as a lower bound to avoid potentially regressing code
where we don't have complete information to make the decision.
Reviewed-by: DJ Delorie <dj@redhat.com>

(cherry picked from commit 8b9a0af8ca)
2023-09-05 17:17:25 -05:00
Noah Goldstein
47c7d2eb03 x86: Fix slight bug in shared_per_thread cache size calculation.
After:
```
    commit af992e7abd
    Author: Noah Goldstein <goldstein.w.n@gmail.com>
    Date:   Wed Jun 7 13:18:01 2023 -0500

        x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`
```

Split `shared` (cumulative cache size) from `shared_per_thread` (cache
size per socket), the `shared_per_thread` *can* be slightly off from
the previous calculation.

Previously we added `core` even if `threads_l2` was invalid, and only
used `threads_l2` to divide `core` if it was present. The changed
version only included `core` if `threads_l2` was valid.

This change restores the old behavior if `threads_l2` is invalid by
adding the entire value of `core`.
Reviewed-by: DJ Delorie <dj@redhat.com>

(cherry picked from commit 47f7472178)
2023-09-05 17:17:25 -05:00
Noah Goldstein
d1b1da26ea x86: Increase non_temporal_threshold to roughly sizeof_L3 / 4
Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 /
ncores_per_socket'. This patch updates that value to roughly
'sizeof_L3 / 4`

The original value (specifically dividing the `ncores_per_socket`) was
done to limit the amount of other threads' data a `memcpy`/`memset`
could evict.

Dividing by 'ncores_per_socket', however leads to exceedingly low
non-temporal thresholds and leads to using non-temporal stores in
cases where REP MOVSB is multiple times faster.

Furthermore, non-temporal stores are written directly to main memory
so using it at a size much smaller than L3 can place soon to be
accessed data much further away than it otherwise could be. As well,
modern machines are able to detect streaming patterns (especially if
REP MOVSB is used) and provide LRU hints to the memory subsystem. This
in affect caps the total amount of eviction at 1/cache_associativity,
far below meaningfully thrashing the entire cache.

As best I can tell, the benchmarks that lead this small threshold
where done comparing non-temporal stores versus standard cacheable
stores. A better comparison (linked below) is to be REP MOVSB which,
on the measure systems, is nearly 2x faster than non-temporal stores
at the low-end of the previous threshold, and within 10% for over
100MB copies (well past even the current threshold). In cases with a
low number of threads competing for bandwidth, REP MOVSB is ~2x faster
up to `sizeof_L3`.

The divisor of `4` is a somewhat arbitrary value. From benchmarks it
seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs
such as Broadwell prefer something closer to `8`. This patch is meant
to be followed up by another one to make the divisor cpu-specific, but
in the meantime (and for easier backporting), this patch settles on
`4` as a middle-ground.

Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable
stores where done using:
https://github.com/goldsteinn/memcpy-nt-benchmarks

Sheets results (also available in pdf on the github):
https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

(cherry picked from commit af992e7abd)
2023-09-05 17:17:25 -05:00
Florian Weimer
e19af583b4 elf: _dl_find_object may return 1 during early startup (bug 30515)
Success is reported with a 0 return value, and failure is -1.
Enhance the kitchen sink test elf/tst-audit28 to cover
_dl_find_object as well.

Fixes commit 5d28a8962d ("elf: Add _dl_find_object function")
and bug 30515.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 1bcfe0f732)
2023-07-07 11:40:49 +02:00
Adhemerval Zanella
cbceb903c4 io: Fix F_GETLK, F_SETLK, and F_SETLKW for powerpc64
Different than other 64 bit architectures, powerpc64 defines the
LFS POSIX lock constants  with values similar to 32 ABI, which
are meant to be used with fcntl64 syscall.  Since powerpc64 kABI
does not have fcntl, the constants are adjusted with the
FCNTL_ADJUST_CMD macro.

The 4d0fe291ae changed the logic of generic constants
LFS value are equal to the default values; which is now wrong
for powerpc64.

Fix the value by explicit define the previous glibc constants
(powerpc64 does not need to use the 32 kABI value, but it simplifies
the FCNTL_ADJUST_CMD which should be kept as compatibility).

Checked on powerpc64-linux-gnu and powerpc-linux-gnu.

(cherry picked from commit 5f828ff824)
2023-05-31 15:31:44 -03:00
Adhemerval Zanella
0967fb5861 io: Fix record locking contants on 32 bit arch with 64 bit default time_t (BZ#30477)
For architecture with default 64 bit time_t support, the kernel
does not provide LFS and non-LFS values for F_GETLK, F_GETLK, and
F_GETLK (the default value used for 64 bit architecture are used).

This is might be considered an ABI break, but the currenct exported
values is bogus anyway.

The POSIX lockf is not affected since it is aliased to lockf64,
which already uses the LFS values.

Checked on i686-linux-gnu and the new tests on a riscv32.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 4d0fe291ae)
2023-05-30 09:31:18 -03:00
H.J. Lu
739de21d30 Document BZ #20975 fix 2023-05-23 16:46:00 -07:00
H.J. Lu
2b9906f9a0 __check_pf: Add a cancellation cleanup handler [BZ #20975]
There are reports for hang in __check_pf:

https://github.com/JoeDog/siege/issues/4

It is reproducible only under specific configurations:

1. Large number of cores (>= 64) and large number of threads (> 3X of
the number of cores) with long lived socket connection.
2. Low power (frequency) mode.
3. Power management is enabled.

While holding lock, __check_pf calls make_request which calls __sendto
and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
lock held by __check_pf won't be released and can cause deadlock when
thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
cleanup handler for __check_pf to unlock the lock when cancelled by
another thread.  This fixes BZ #20975 and the siege hang issue.

(cherry picked from commit a443bd3fb2)
2023-05-23 13:10:40 -07:00
Florian Weimer
7035f2174f gmon: Revert addition of tunables to preserve GLIBC_PRIVATE ABI
Otherwise, processes are likely to crash during concurrent updates
to a new glibc version on the stable release branch.

The test gmon/tst-mcount-overflow depends on those tunables, so
it has to be removed as well.
2023-04-28 18:13:51 +02:00
Simon Kissane
e698e8bd8e gmon: fix memory corruption issues [BZ# 30101]
V2 of this patch fixes an issue in V1, where the state was changed to ON not
OFF at end of _mcleanup. I hadn't noticed that (counterintuitively) ON=0 and
OFF=3, hence zeroing the buffer turned it back on. So set the state to OFF
after the memset.

1. Prevent double free, and reads from unallocated memory, when
   _mcleanup is (incorrectly) called two or more times in a row,
   without an intervening call to __monstartup; with this patch, the
   second and subsequent calls effectively become no-ops instead.
   While setting tos=NULL is minimal fix, safest action is to zero the
   whole gmonparam buffer.

2. Prevent memory leak when __monstartup is (incorrectly) called two
   or more times in a row, without an intervening call to _mcleanup;
   with this patch, the second and subsequent calls effectively become
   no-ops instead.

3. After _mcleanup, treat __moncontrol(1) as __moncontrol(0) instead.
   With zeroing of gmonparam buffer in _mcleanup, this stops the
   state incorrectly being changed to GMON_PROF_ON despite profiling
   actually being off. If we'd just done the minimal fix to _mcleanup
   of setting tos=NULL, there is risk of far worse memory corruption:
   kcount would point to deallocated memory, and the __profil syscall
   would make the kernel write profiling data into that memory,
   which could have since been reallocated to something unrelated.

4. Ensure __moncontrol(0) still turns off profiling even in error
   state. Otherwise, if mcount overflows and sets state to
   GMON_PROF_ERROR, when _mcleanup calls __moncontrol(0), the __profil
   syscall to disable profiling will not be invoked. _mcleanup will
   free the buffer, but the kernel will still be writing profiling
   data into it, potentially corrupted arbitrary memory.

Also adds a test case for (1). Issues (2)-(4) are not feasible to test.

Signed-off-by: Simon Kissane <skissane@gmail.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit bde1218720)
2023-04-28 16:06:43 +02:00
Simon Kissane
9f81b8fa65 gmon: improve mcount overflow handling [BZ# 27576]
When mcount overflows, no gmon.out file is generated, but no message is printed
to the user, leaving the user with no idea why, and thinking maybe there is
some bug - which is how BZ 27576 ended up being logged. Print a message to
stderr in this case so the user knows what is going on.

As a comment in sys/gmon.h acknowledges, the hardcoded MAXARCS value is too
small for some large applications, including the test case in that BZ. Rather
than increase it, add tunables to enable MINARCS and MAXARCS to be overridden
at runtime (glibc.gmon.minarcs and glibc.gmon.maxarcs). So if a user gets the
mcount overflow error, they can try increasing maxarcs (they might need to
increase minarcs too if the heuristic is wrong in their case.)

Note setting minarcs/maxarcs too large can cause monstartup to fail with an
out of memory error. If you set them large enough, it can cause an integer
overflow in calculating the buffer size. I haven't done anything to defend
against that - it would not generally be a security vulnerability, since these
tunables will be ignored in suid/sgid programs (due to the SXID_ERASE default),
and if you can set GLIBC_TUNABLES in the environment of a process, you can take
it over anyway (LD_PRELOAD, LD_LIBRARY_PATH, etc). I thought about modifying
the code of monstartup to defend against integer overflows, but doing so is
complicated, and I realise the existing code is susceptible to them even prior
to this change (e.g. try passing a pathologically large highpc argument to
monstartup), so I decided just to leave that possibility in-place.

Add a test case which demonstrates mcount overflow and the tunables.

Document the new tunables in the manual.

Signed-off-by: Simon Kissane <skissane@gmail.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 31be941e43)
2023-04-28 16:05:29 +02:00
Леонид Юрьев (Leonid Yuriev)
f2820e478c gmon: Fix allocated buffer overflow (bug 29444)
The `__monstartup()` allocates a buffer used to store all the data
accumulated by the monitor.

The size of this buffer depends on the size of the internal structures
used and the address range for which the monitor is activated, as well
as on the maximum density of call instructions and/or callable functions
that could be potentially on a segment of executable code.

In particular a hash table of arcs is placed at the end of this buffer.
The size of this hash table is calculated in bytes as
   p->fromssize = p->textsize / HASHFRACTION;

but actually should be
   p->fromssize = ROUNDUP(p->textsize / HASHFRACTION, sizeof(*p->froms));

This results in writing beyond the end of the allocated buffer when an
added arc corresponds to a call near from the end of the monitored
address range, since `_mcount()` check the incoming caller address for
monitored range but not the intermediate result hash-like index that
uses to write into the table.

It should be noted that when the results are output to `gmon.out`, the
table is read to the last element calculated from the allocated size in
bytes, so the arcs stored outside the buffer boundary did not fall into
`gprof` for analysis. Thus this "feature" help me to found this bug
during working with https://sourceware.org/bugzilla/show_bug.cgi?id=29438

Just in case, I will explicitly note that the problem breaks the
`make test t=gmon/tst-gmon-dso` added for Bug 29438.
There, the arc of the `f3()` call disappears from the output, since in
the DSO case, the call to `f3` is located close to the end of the
monitored range.

Signed-off-by: Леонид Юрьев (Leonid Yuriev) <leo@yuriev.ru>

Another minor error seems a related typo in the calculation of
`kcountsize`, but since kcounts are smaller than froms, this is
actually to align the p->froms data.

Co-authored-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 801af9fafd)
2023-04-28 16:04:58 +02:00
Adam Yi
413af1eb02 posix: Fix system blocks SIGCHLD erroneously [BZ #30163]
Fix bug that SIGCHLD is erroneously blocked forever in the following
scenario:

1. Thread A calls system but hasn't returned yet
2. Thread B calls another system but returns

SIGCHLD would be blocked forever in thread B after its system() returns,
even after the system() in thread A returns.

Although POSIX does not require, glibc system implementation aims to be
thread and cancellation safe. This bug was introduced in
5fb7fc9635 when we moved reverting signal
mask to happen when the last concurrently running system returns,
despite that signal mask is per thread. This commit reverts this logic
and adds a test.

Signed-off-by: Adam Yi <ayi@janestreet.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 436a604b7d)
2023-04-28 16:04:09 +02:00
Florian Weimer
1c7f51c75a x86_64: Fix asm constraints in feraiseexcept (bug 30305)
The divss instruction clobbers its first argument, and the constraints
need to reflect that.  Fortunately, with GCC 12, generated code does
not actually change, so there is no externally visible bug.

Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 5d1ccdda7b)
2023-04-24 15:34:29 +02:00
Florian Weimer
8d07e65d15 gshadow: Matching sgetsgent, sgetsgent_r ERANGE handling (bug 30151)
Before this change, sgetsgent_r did not set errno to ERANGE, but
sgetsgent only check errno, not the return value from sgetsgent_r.
Consequently, sgetsgent did not detect any error, and reported
success to the caller, without initializing the struct sgrp object
whose address was returned.

This commit changes sgetsgent_r to set errno as well.  This avoids
similar issues in applications which only change errno.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 969e9733c7)
2023-04-24 14:36:55 +02:00
H.J. Lu
b7cc55a24e x86: Check minimum/maximum of non_temporal_threshold [BZ #29953]
The minimum non_temporal_threshold is 0x4040.  non_temporal_threshold may
be set to less than the minimum value when the shared cache size isn't
available (e.g., in an emulator) or by the tunable.  Add checks for
minimum and maximum of non_temporal_threshold.

This fixes BZ #29953.

(cherry picked from commit 48b74865c6)
2023-04-20 15:49:52 +02:00
Vitaly Buka
3f63f9dfe1 stdlib: Undo post review change to 16adc58e73 [BZ #27749]
Post review removal of "goto restart" from
https://sourceware.org/pipermail/libc-alpha/2021-April/125470.html
introduced a bug when some atexit handers skipped.

Signed-off-by: Vitaly Buka <vitalybuka@google.com>

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit fd78cfa72e)
2023-02-20 10:03:39 -03:00
Florian Weimer
757d9a6306 elf: Smoke-test ldconfig -p against system /etc/ld.so.cache
The test is sufficient to detect the ldconfig bug fixed in
commit 9fe6f63638 ("elf: Fix 64 time_t
support for installed statically binaries").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 9fd63e3537)
2023-02-08 18:12:05 +01:00