This patch adds 32bit SSE4.2 string functions. It uses -16L instead of
0xfffffffffffffff0L, which works for both 32bit and 64bit long. Tested
on 32bit Core i7 and Core 2.
This patch adds multiarch support when configured for i686. I modified
some x86-64 functions to support 32bit. I will contribute 32bit SSE string
and memory functions later.
We use sigaltstack internally which on some systems is a syscall
and should be used as such. Move the x86-64 version to the Linux
specific directory and create in its place a file which always
causes compile errors.
We use a callback function into libc.so to get access to the data
structure with the information and have special versions of the test
macros which automatically use this function.
SSE registers are used for passing parameters and must be preserved
in runtime relocations. Inside ld.so this is enforced through the
tests in tst-xmmymm.sh. But the malloc routines used after startup
come from libc.so and can be arbitrarily complex. It's overkill
to save the SSE registers all the time because of that. These calls
are rare. Instead we save them on demand. The new infrastructure
put in place in this patch makes this possible and efficient.
The test now takes the callgraph into account. Only code called
during runtime relocation is affected by the limitation. We now
determine the affected object files as closely as possible from
the outside. This allowed us to remove some of the specializations
for some of the string functions as they are only used in other
code paths.
This patch introduces a test to make sure no function modifies the
xmm/ymm registers, with the exception of the auditing functions.
The test is probably too pessimistic. All code linked into ld.so
is checked. Perhaps at some point only the callgraph starting from
_dl_fixup and _dl_profile_fixup will be checked, and we can start using
faster SSE-using functions in parts of ld.so.
There will be more than one function which, in multiarch mode, wants
to use SSSE3. We should not test in each of them for Atoms with
slow SSSE3. Instead, disable the SSSE3 bit in the startup code for
such machines.
The original AVX patch used a function pointer to handle the difference
between machines with and without AVX support. This is insecure. A
well-placed memory exploit could lead to redirection of the execution.
Using a variable and several tests is a bit slower but cannot be
exploited in this way.