siglist.c is built with -fno-toplevel-reorder to prevent the compiler
from reordering the compat assembly directives, due to an assembler
issue [1] (fixed in 2.39).
This patch removes the compiler flag by splitting the compat symbol
generation into two phases. First, the __sys_siglist and __sys_sigabbrev
definitions, without any compat symbol directive, are preprocessed to
generate assembly source code. The generated assembly is then used as
input to a platform-agnostic siglist.S, which creates the compat
definitions. This prevents the compiler from moving any compat directive
before the _sys_errlist definition itself.
Checked with make check run-built-tests=no on all affected ABIs.
Reviewed-by: Fangrui Song <maskray@google.com>
errlist.c is built with -fno-toplevel-reorder to prevent the compiler
from reordering the compat assembly directives, due to an assembler
issue [1] (fixed in 2.39).
This patch removes the compiler flag by splitting the compat symbol
generation into two phases. First, the _sys_errlist_internal definition,
without any compat symbol directive, is preprocessed to generate
assembly source code. The generated assembly is then used as input to a
platform-agnostic errlist-data.S, which creates the compat definitions.
This prevents the compiler from moving any compat directive before the
_sys_errlist_internal definition itself.
Checked with make check run-built-tests=no on all affected ABIs.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=29012
The GNU implementation of wcrtomb assumes that there are at least
MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb
as the first argument. This is not compatible with the POSIX
definition, which only requires enough space for the input wide
character.
This does not break much in practice because when users supply buffers
smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically
allocate the buffer, which results in enough spare space (thanks to
usable_size in malloc and padding in alloca) that no actual buffer
overflow occurs. However, when the code is built with _FORTIFY_SOURCE,
it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and
hence fails. This was not evident until now because wcrtomb calls on
dynamically allocated buffers were not fortified; with
_FORTIFY_SOURCE=3 that limitation is gone, so such code now fails.
To fix this problem, introduce an internal buffer that is MB_LEN_MAX
long and use that to perform the conversion and then copy the resultant
bytes into the destination buffer. Also move the fortification check
into the main implementation, which checks the result after conversion
and aborts if the resultant byte count is greater than the destination
buffer size.
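The approach described above can be sketched roughly as follows. This is a minimal illustration, not the glibc implementation: bounded_wcrtomb and its dstlen parameter are hypothetical names, and where the real fortified path aborts, this sketch returns (size_t) -1 with errno set to ERANGE so that the behaviour is easy to observe.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert WC into a temporary MB_LEN_MAX-sized
   buffer first, then copy only the bytes actually produced into DST.
   DSTLEN is the known size of DST.  */
size_t
bounded_wcrtomb (char *dst, size_t dstlen, wchar_t wc, mbstate_t *ps)
{
  /* Unlike wcrtomb, this sketch requires a destination buffer.  */
  assert (dst != NULL);

  char tmp[MB_LEN_MAX];
  size_t n = wcrtomb (tmp, wc, ps);

  if (n == (size_t) -1)
    return n;           /* Conversion error; errno was set by wcrtomb.  */

  if (n > dstlen)
    {
      /* Assumption: report the overflow instead of aborting.  */
      errno = ERANGE;
      return (size_t) -1;
    }

  memcpy (dst, tmp, n);
  return n;
}
```

With this shape, a caller only needs a destination large enough for the character being converted, not MB_CUR_MAX bytes.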
One complication is that applications that assume the MB_CUR_MAX
limitation to be gone may not be able to run safely on older glibcs if
they use static destination buffers smaller than MB_CUR_MAX; dynamic
allocations will always have enough spare space that no actual overruns
will occur. One alternative to fixing this is to bump symbol version to
prevent them from running on older glibcs but that seems too strict a
constraint. Instead, since these users will only have made this
decision on reading the manual, I have put a note in the manual warning
them about the pitfalls of having static buffers smaller than
MB_CUR_MAX and running them on older glibc.
Benchmarking:
The wcrtomb microbenchmark shows significant increases in maximum
execution time for all locales, ranging from 10x for ar_SA.UTF-8 to
1.5x-2x for nearly everything else. The mean execution time however saw
practically no impact, with some results even being quicker, indicating
that cache locality has a much bigger role in the overhead.
Given that the additional copy uses a temporary buffer inside wcrtomb,
it's likely that a hot path will end up putting that buffer (which is
responsible for the additional overhead) in a similar place on stack,
giving the necessary cache locality to negate the overhead. However in
situations where wcrtomb ends up getting called at wildly different
spots on the call stack (or is on different call stacks, e.g. with
threads or different execution contexts) and is still a hotspot, the
performance lag will be visible.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
And keep the previous definition if it exists. This allows
disabling IA64_USE_NEW_STUB while keeping USE_DL_SYSINFO defined.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add a simple benchmark that measures wcrtomb performance with various
locales with 1-4 byte characters.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Unlike MMAP_CALL, this avoids a TCB dependency for an errno update
on failure.
<mmap_internal.h> cannot be included as is on several architectures
due to the definition of page_unit, so introduce a separate header
file for the definition of MMAP_CALL and MMAP_CALL_INTERNAL,
<mmap_call.h>.
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
The man page and code comments clearly state that abbreviations of long
option names are recognized correctly as long as they are unique.
Document this fact in the glibc manual as well.
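A small sketch of the behaviour being documented, using getopt_long with a hypothetical option table (the option names and the first_long_option helper are made up for illustration). A unique prefix such as --verb selects --verbose, while --ver is ambiguous between --verbose and --version and is rejected.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option table for illustration.  */
static const struct option longopts[] = {
  { "verbose", no_argument,       NULL, 'v' },
  { "version", no_argument,       NULL, 'V' },
  { "output",  required_argument, NULL, 'o' },
  { NULL, 0, NULL, 0 }
};

/* Return the code of the first long option in ARGV, or '?' if it is
   unknown or ambiguous.  */
int
first_long_option (int argc, char **argv)
{
  assert (argc >= 1 && argv != NULL);
  optind = 1;   /* Restart scanning so the helper can be called repeatedly.  */
  opterr = 0;   /* Suppress getopt's own diagnostics.  */
  return getopt_long (argc, argv, "", longopts, NULL);
}
```

An exact match such as --version is always accepted, even though it is also a prefix-compatible spelling of nothing else in the table.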
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
When neither DT_HASH nor DT_GNU_HASH is present, the code scans
[DT_SYMTAB, DT_STRTAB). However, there is no guarantee that .dynstr
immediately follows .dynsym (e.g. lld typically places .gnu.version
after .dynsym).
In the absence of a hash table, symbol lookup will always fail
(map->l_nbuckets == 0 in dl-lookup.c) as if the object has no symbol, so
it seems fair for dladdr to do the same.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The information is theoretically available via dl_iterate_phdr as
well, but that approach is very slow if there are many shared
objects.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
The comment indicates that --hash-style=both was used to maintain
compatibility with static dlopen, but we had many internal ABI
changes since then, so this compatibility does not add value anymore.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Improve libmvec benchmark integration so that in future other
architectures may be able to run their libmvec benchmarks as well. This
now allows libmvec benchmarks to be run with `make BENCHSET=bench-math`.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The libmvec benchmarks print a message indicating that a certain CPU
feature is unsupported and exit prematurely, which breaks the JSON in
bench.out.
Handle this more elegantly in the bench makefile target by adding
support for an UNSUPPORTED exit status (77) so that bench.out continues
to have output for valid tests.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The AT_SYMLINK_NOFOLLOW emulation uses the default 32-bit stat internal
calls, which fail with EOVERFLOW if the file contains timestamps
beyond 2038.
Checked on i686-linux-gnu.
__ehdr_start is already used in rtld.c:dl_main, and can serve the
same purpose as _begin. Besides tidying the code, using linker
defined section relative symbols rather than "-defsym _begin=0" better
reflects the intent of _dl_start_final use of _begin, which is to
refer to the load address of ld.so rather than absolute address zero.
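As an illustration of the linker-defined symbol, the sketch below reads the ELF header of the object it is linked into through __ehdr_start. It assumes a Linux toolchain whose linker defines __ehdr_start and maps the ELF header in a loaded segment; this_object_load_address is a hypothetical name and this is not the code in rtld.c.

```c
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <string.h>

/* Defined by the linker at the ELF header of the output file.  */
extern const ElfW(Ehdr) __ehdr_start __attribute__ ((visibility ("hidden")));

/* Return the address at which this object's ELF header is mapped.  */
const void *
this_object_load_address (void)
{
  /* Sanity check: the symbol should point at the ELF magic bytes.  */
  assert (memcmp (__ehdr_start.e_ident, ELFMAG, SELFMAG) == 0);
  return &__ehdr_start;
}
```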
Reviewed-by: Florian Weimer <fweimer@redhat.com>
'get_fast_jitter' is meant to be used purely for performance
purposes. In all cases where it is used, it should be acceptable to get
no randomness (see default case). An example use case is in setting
jitter for retries between threads at a lock. There is a
performance benefit to having jitter, but only if the jitter can
be generated very quickly and ultimately there is no serious issue
if no jitter is generated.
The implementation generally uses 'HP_TIMING_NOW' iff it is
inlined (to avoid any potential syscall paths).
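A rough sketch of the idea, under stated assumptions: fast_jitter_sketch is a hypothetical stand-in for get_fast_jitter that reads the x86 timestamp counter through a compiler builtin instead of HP_TIMING_NOW, and falls back to returning 0 elsewhere. backoff_spins shows the lock-retry use case mentioned above; it stays correct even when no jitter is available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for get_fast_jitter: cheap, inlined, and
   allowed to return no randomness at all.  */
static inline uint32_t
fast_jitter_sketch (void)
{
#if defined __x86_64__ || defined __i386__
  return (uint32_t) __builtin_ia32_rdtsc ();
#else
  return 0;     /* Default case: no randomness.  */
#endif
}

/* Spin count for a lock retry: BASE plus up to BASE - 1 extra spins.  */
unsigned int
backoff_spins (unsigned int base)
{
  assert (base > 0);
  return base + fast_jitter_sketch () % base;
}
```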
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Copied from gnulib/lib/glob.c in order to fix rhbz 1982608
Also fixes swbz 25659
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Benchmark for testing pthread mutex locks performance with different
threads and critical sections.
The test configuration consists of 3 parts:
1. thread number
2. critical-section length
3. non-critical-section length
The thread count starts at 1 and doubles until it reaches the number of
CPU cores (nprocs). An additional over-saturation case (1.25 * nprocs)
is also included.
The critical section is represented by a loop over a shared do_filler();
its length is determined by the number of loop iterations.
The non-critical section is similar to the critical section, except that
it is based on a non-shared do_filler().
Currently, adaptive pthread_mutex lock is tested.
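The thread-count schedule described above could look roughly like the sketch below. bench_thread_counts is a hypothetical helper, not the benchmark's code, and it makes two assumptions the description does not settle: doubling stops at the largest power of two not exceeding nprocs, and 1.25 * nprocs is rounded down.

```c
#include <assert.h>
#include <stddef.h>

/* Fill OUT (capacity CAP) with the thread counts to benchmark and
   return how many were written.  */
size_t
bench_thread_counts (int nprocs, int *out, size_t cap)
{
  assert (nprocs >= 1 && nprocs <= (1 << 20) && out != NULL);

  size_t n = 0;
  for (int t = 1; t <= nprocs && n < cap; t *= 2)
    out[n++] = t;

  /* Over-saturation case: 1.25 * nprocs, rounded down.  */
  if (n < cap)
    out[n++] = nprocs + nprocs / 4;

  return n;
}
```

For an 8-core machine this yields 1, 2, 4, 8 and 10 threads.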
In _dl_map_object, the underlying file is not opened in trace mode
(in other cases where the underlying file can't be opened,
_dl_map_object quits with an error). If any missing libraries are
being processed, they are not counted in the final nlist size
passed to _dl_sort_maps later in the function. That size is then used by
_dl_sort_maps_dfs for the stack-allocated working maps:
    /* Array to hold RPO sorting results, before we copy back to maps[]. */
    struct link_map *rpo[nmaps];

    /* The 'head' position during each DFS iteration. Note that we start at
       one past the last element due to first-decrement-then-store (see the
       bottom of above dfs_traversal() routine). */
    struct link_map **rpo_head = &rpo[nmaps];
However, while traversing 'l_initfini', dfs_traversal will still
consider the l_faked maps and thus update rpo more times than the
allocated working 'rpo' can hold, overflowing the stack object.
As suggested in bugzilla, one option would be to avoid sorting the maps
in trace mode. However, I think ignoring l_faked objects does make
sense (there is one less constraint on calling the sorting function); it
allows slightly lower stack usage for trace mode, and it is a slightly
simpler solution.
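The effect of ignoring faked objects can be shown with a toy model. Everything below is hypothetical (struct toy_map, toy_dfs and toy_sort are not glibc names); it only mirrors the first-decrement-then-store pattern on an array sized for the non-faked maps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct link_map; all names here are hypothetical.  */
struct toy_map
{
  bool faked;               /* Placeholder for a library that was not found.  */
  bool visited;
  struct toy_map **deps;    /* NULL-terminated dependency list, or NULL.  */
};

static void
toy_dfs (struct toy_map *map, struct toy_map ***rpo_head)
{
  /* Skipping faked maps keeps the number of stores within NMAPS.  */
  if (map->visited || map->faked)
    return;
  map->visited = true;
  if (map->deps != NULL)
    for (struct toy_map **d = map->deps; *d != NULL; d++)
      toy_dfs (*d, rpo_head);
  *--*rpo_head = map;       /* First decrement, then store.  */
}

/* MAPS holds NMAPS non-faked maps; RPO must have room for NMAPS
   entries.  Returns the number of entries written to the tail of RPO.  */
size_t
toy_sort (struct toy_map **maps, size_t nmaps, struct toy_map **rpo)
{
  assert (maps != NULL && rpo != NULL);
  struct toy_map **rpo_head = &rpo[nmaps];
  for (size_t i = nmaps; i-- > 0;)
    toy_dfs (maps[i], &rpo_head);
  return (size_t) (&rpo[nmaps] - rpo_head);
}
```

Without the faked check in toy_dfs, a dependency list that points at a missing library would cause one store more than rpo has room for.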
The test does trigger the stack overflow; I also tried to make it more
generic so that it checks different scenarios of missing objects.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Verify that:
1. A DT_RELR shared library without DT_NEEDED works.
2. A DT_RELR shared library without DT_VERNEED works.
3. A DT_RELR shared library without libc.so on DT_NEEDED works.
With DT_RELR, there may be no relocations in DT_RELA/DT_REL and their
entry values are zero. Don't relocate DT_RELA/DT_REL and update the
combined relocation start address if their entry values are zero.
The EI_ABIVERSION field of the ELF header in executables and shared
libraries can be bumped to indicate the minimum ABI requirement on the
dynamic linker. However, EI_ABIVERSION in executables isn't checked by
the Linux kernel ELF loader nor the existing dynamic linker. Executables
will crash mysteriously if the dynamic linker doesn't support the ABI
features required by the EI_ABIVERSION field. The dynamic linker should
be changed to check EI_ABIVERSION in executables.
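A check of this kind might look like the sketch below. abiversion_supported and its max_supported threshold are hypothetical; the text above does not say what policy the dynamic linker would apply, so this only shows where the field lives and how a minimum-requirement comparison could be made.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: accept an ELF identification block only if its
   EI_ABIVERSION does not exceed what the loader claims to support.  */
bool
abiversion_supported (const unsigned char *e_ident,
                      unsigned char max_supported)
{
  assert (e_ident != NULL);
  if (memcmp (e_ident, ELFMAG, SELFMAG) != 0)
    return false;       /* Not an ELF file at all.  */
  return e_ident[EI_ABIVERSION] <= max_supported;
}
```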
Add a glibc version, GLIBC_ABI_DT_RELR, to indicate DT_RELR support so
that the existing dynamic linkers will issue an error on executables with
GLIBC_ABI_DT_RELR dependency. When there is a DT_VERNEED entry with
libc.so on DT_NEEDED, issue an error if there is a DT_RELR entry without
GLIBC_ABI_DT_RELR dependency.
Support __placeholder_only_for_empty_version_map as the placeholder symbol
used only for empty version map to generate GLIBC_ABI_DT_RELR without any
symbols.
PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage
variables and hidden visibility variables in a shared object (ld.so)
need dynamic relocations (usually R_*_RELATIVE). PI (position
independent) in the macro name is a misnomer: a code sequence using GOT
is typically position-independent as well, but using dynamic relocations
does not meet the requirement.
Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new
ports will define PI_STATIC_AND_HIDDEN. More current ports define
PI_STATIC_AND_HIDDEN than do not. Change the configure
default.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
These failures were caught while building glibc master for Fedora
Rawhide which is built with '-mtune=generic -msse2 -mfpmath=sse'
using gcc 11.3 (gcc-11.3.1-2.fc35) on a Cascadelake Intel Xeon
processor.
When audit modules are loaded, ld.so initialization is not yet
complete, and rtld_active () returns false even though ld.so is
mostly working. Instead, the static dlopen hook is used, but that
does not work at all because this is not a static dlopen situation.
Commit 466c1ea15f ("dlfcn: Rework
static dlopen hooks") moved the hook pointer into _rtld_global_ro,
which means that separate protection is not needed anymore and the
hook pointer can be checked directly.
The guard for disabling libio vtable hardening in _IO_vtable_check
should stay for now.
Fixes commit 8e1472d2c1 ("ld.so:
Examine GLRO to detect inactive loader [BZ #20204]").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On non-PI_STATIC_AND_HIDDEN architectures, getting the address of
_rtld_local_ro (for GLRO (dl_final_object)) goes through a GOT entry.
The GOT load may be reordered before self relocation, leading to an
unrelocated/incorrect _rtld_local_ro address.
84e02af1eb tickled GCC on powerpc32 into
reordering the GOT load before relative relocations, leading to an ld.so
crash. This is similar to the m68k jump table reordering issue fixed by
a8e9b5b807.
Move code after self relocation into _dl_start_final to avoid the
reordering. This fixes powerpc32 and may help other architectures when
ELF_DYNAMIC_RELOCATE is simplified in the future.
If `__glibc_objsize (__o) == (size_t) -1` (i.e. `__o` is unknown size), fortify
checks should pass, and `__whatever_alias` should be called.
Previously, `__glibc_objsize (__o) == (size_t) -1` was explicitly checked, but
on commit a643f60c53, this was moved into `__glibc_safe_or_unknown_len`.
A comment says the -1 case should work as: "The -1 check is redundant because
since it implies that __glibc_safe_len_cond is true.". But this fails when:
* `__s > 1`
* `__osz == -1` (i.e. unknown size at compile time)
* `__l` is big enough
* `__l * __s <= __osz` can be folded to a constant
(I only found this to be true for `mbsrtowcs` and other functions in wchar2.h)
In this case `__l * __s <= __osz` is false, and `__whatever_chk_warn` will be
called by `__glibc_fortify` or `__glibc_fortify_n` and crash the program.
This commit adds the explicit `__osz == -1` check again.
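The failing case can be reproduced with plain functions. This is a sketch, not the macros from the headers: safe_len and safe_or_unknown_len are hypothetical stand-ins, and the length condition is written with a division (l <= osz / s), which is the overflow-free equivalent of `__l * __s <= __osz` for s > 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Length check alone: L elements of size S must fit in OSZ bytes.  */
static bool
safe_len (size_t l, size_t s, size_t osz)
{
  assert (s > 0);
  return l <= osz / s;
}

/* With the explicit check restored: an unknown object size
   ((size_t) -1) always passes.  */
bool
safe_or_unknown_len (size_t l, size_t s, size_t osz)
{
  return osz == (size_t) -1 || safe_len (l, s, osz);
}
```

With s == 1 the length check alone already passes for an unknown size, which is why the problem only shows up for element sizes greater than one, such as wchar_t.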
moc crashes on startup due to this, see: https://bugs.archlinux.org/task/74041
Minimal test case (test.c):
    #include <wchar.h>

    int main (void)
    {
        const char *hw = "HelloWorld";
        mbsrtowcs (NULL, &hw, (size_t)-1, NULL);
        return 0;
    }
Build with:
    gcc -O2 -Wp,-D_FORTIFY_SOURCE=2 test.c -o test && ./test
Output:
    *** buffer overflow detected ***: terminated
Fixes: BZ #29030
Signed-off-by: Joan Bruguera <joanbrugueram@gmail.com>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.755
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.832
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.741
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>