SSE registers are used for passing parameters and must be preserved
in runtime relocations. This is inside ld.so enforced through the
tests in tst-xmmymm.sh. But the malloc routines used after startup
come from libc.so and can be arbitrarily complex. It's overkill
to save the SSE registers all the time because of that. These calls
are rare. Instead we save them on demand. The new infrastructure
put in place in this patch makes this possible and efficient.
The test now takes the callgraph into account. Only code called
during runtime relocation is affected by the limitation. We now
determine the affected object files as closely as possible from
the outside. This allowed to remove some the specializations
for some of the string functions as they are only used in other
code paths.
There were several issues when the initial 31 entries hashtab filled up.
size * 3 <= tab->n_elements is always false, table can't have more elements
than its size. I assume from libiberty/hashtab.c this meant to be check for
3/4 full. Even after fixing that, _dl_higher_prime_number (31) apparently
returns 31, only _dl_higher_prime_number (32) returns 61. And, size
variable wasn't updated during reallocation, which means during reallocation
the insertion of the new entry was done into a wrong spot.
All this lead to a hang in ld.so, because a search with n_elements 31 size
31 wouldn't ever terminate.
This patch introduces a test to make sure no function modifies the
xmm/ymm registers. With the exception of the auditing functions.
The test is probably too pessimistic. All code linked into ld.so
is checked. Perhaps at some point the callgraph starting from
_dl_fixup and _dl_profile_fixup is checked and we can start using
faster SSE-using functions in parts of ld.so.
There will be more than one function which, in multiarch mode, wants
to use SSSE3. We should not test in each of them for Atoms with
slow SSSE3. Instead, disable the SSSE3 bit in the startup code for
such machines.
The posix/tst-rfc3484* test cases caused warnings in newer gccs
because the unused but copied sin_zero part of sockaddr_in wasn't
explicitly initialized.
References to unique symbols from copy relocations can only come
from executables which cannot be unloaded anyway. Optimize the
code to set the unload flag a bit.
If a locale does not have 8-bit characters with case conversion which
are different from the ASCII conversion (±0x20) then we can perform
some optimizations. These will follow later.
It just happens that __pthread_enable_asynccancel doesn't modify the $rdi
register. But this isn't guaranteed. Hence we reload the register after
the calls.