On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.
This pach fixes memchr/wmemchr for x32. Tested on x86-64 and x32. On
x86-64, libc.so is the same with and withou the fix.
[BZ# 24097]
CVE-2019-6488
* sysdeps/x86_64/memchr.S: Use RDX_LP for length. Clear the
upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memchr and
tst-size_t-wmemchr.
* sysdeps/x86_64/x32/test-size_t.h: New file.
* sysdeps/x86_64/x32/tst-size_t-memchr.c: Likewise.
* sysdeps/x86_64/x32/tst-size_t-wmemchr.c: Likewise.
SSE2 memchr is extended to support wmemchr. AVX2 memchr/rawmemchr/wmemchr
are added to search 32 bytes with a single vector compare instruction.
AVX2 memchr/rawmemchr/wmemchr are as fast as SSE2 memchr/rawmemchr/wmemchr
for small sizes and up to 1.5X faster for larger sizes on Haswell and
Skylake. Select AVX2 memchr/rawmemchr/wmemchr on AVX2 machines where
vzeroupper is preferred and AVX unaligned load is fast.
NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input. TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.
* sysdeps/x86_64/memchr.S (MEMCHR): New. Depending on if
USE_AS_WMEMCHR is defined.
(PCMPEQ): Likewise.
(memchr): Renamed to ...
(MEMCHR): This. Support wmemchr if USE_AS_WMEMCHR is defined.
Replace pcmpeqb with PCMPEQ.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memchr-sse2, rawmemchr-sse2, memchr-avx2, rawmemchr-avx2,
wmemchr-sse4_1, wmemchr-avx2 and wmemchr-c.
* sysdeps/x86_64/multiarch/ifunc-avx2.h: New file.
* sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/memchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/memchr.c: Likewise.
* sysdeps/x86_64/multiarch/rawmemchr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/rawmemchr.c: Likewise.
* sysdeps/x86_64/multiarch/wmemchr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/wmemchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/wmemchr.c: Likewise.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test __memchr_avx2, __memchr_sse2,
__rawmemchr_avx2, __rawmemchr_sse2, __wmemchr_avx2 and
__wmemchr_sse2.
By x86-64 specification, 32-bit destination registers are zero-extended
to 64 bits. There is no need to use 64-bit registers when only the lower
32 bits are non-zero.
* sysdeps/x86_64/memchr.S (MEMCHR): Use 32-bit registers for
the lower 32 bits.
SSE2 memchr computes "edx + ecx - 16" where ecx is less than 16. Use
"edx - (16 - ecx)", instead of satured math, to avoid possible addition
overflow. This replaces
add %ecx, %edx
sbb %eax, %eax
or %eax, %edx
sub $16, %edx
with
neg %ecx
add $16, %ecx
sub %ecx, %edx
It is the same for x86_64, except for rcx/rdx, instead of ecx/edx.
* sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Use
"edx + ecx - 16" to avoid possible addition overflow.
* sysdeps/x86_64/memchr.S (memchr): Likewise.
Current optimized memchr for x86_64 does for input arguments pointers
module 64 in range of [49,63] if there is no searchr char in the rest
of 64-byte block a pointer addition which might overflow:
* sysdeps/x86_64/memchr.S
77 .p2align 4
78 L(unaligned_no_match):
79 add %rcx, %rdx
Add (uintptr_t)s % 16 to n in %rdx.
80 sub $16, %rdx
81 jbe L(return_null)
This patch fixes by adding a saturated math that sets a maximum pointer
value if it overflows (UINTPTR_MAX).
Checked on x86_64-linux-gnu and powerpc64-linux-gnu.
[BZ# 19387]
* sysdeps/x86_64/memchr.S (memchr): Avoid overflow in pointer
addition.
* string/test-memchr.c (do_test): Remove alignment limitation.
(test_main): Add test that trigger BZ# 19387.