1. Refactor files so that all implementations are in the multiarch
directory.
- Essentially moved sse2 {raw|w}memchr.S implementation to
multiarch/{raw|w}memchr-sse2.S
- The non-multiarch {raw|w}memchr.S file now only includes one of
the implementations in the multiarch directory based on the
compiled ISA level (only used for non-multiarch builds;
otherwise we go through the ifunc selector).
2. Add ISA level build guards to different implementations.
- E.g., memchr-avx2.S, which is ISA level 3, will only build if the
compiled ISA level <= 3. Otherwise there is no reason to include
it as we will always use one of the ISA level 4
implementations (memchr-evex{-rtm}.S).
3. Add new multiarch/rtld-{raw}memchr.S that just include the
non-multiarch {raw}memchr.S which will in turn select the best
implementation based on the compiled ISA level.
4. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guaranteed replacement) to be skipped.
- Guaranteed replacement essentially means that for any ISA level
build there must be a function that the baseline of the ISA
supports. So for {raw|w}memchr.S, since there is no ISA level 2
function, the ISA level 2 build still includes the ISA level
1 (sse2) function. Once we reach the ISA level 3 build, however,
{raw|w}memchr-avx2{-rtm}.S will always be sufficient so the ISA
level 1 implementation ({raw|w}memchr-sse2.S) will not be built.
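The build guard and guaranteed-replacement rules above can be modeled in
a few lines of C. This is only a sketch: the table, the field names and
select_impl are hypothetical, and the real selector tests individual CPU
feature bits rather than a single numeric level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one implementation.  */
struct impl
{
  const char *name;
  int cpu_level;        /* Lowest ISA level able to run it.  */
  int max_build_level;  /* Highest compiled ISA level that still builds it.  */
};

/* Ordered from most to least preferred.  sse2 stays in the level 2
   build because there is no level 2 implementation to replace it.  */
static const struct impl impls[] =
{
  { "memchr-evex", 4, 4 },
  { "memchr-avx2", 3, 3 },
  { "memchr-sse2", 1, 2 },
};

/* Build guard: skip anything below the compiled ISA level that has a
   guaranteed replacement.  */
static int
is_built (const struct impl *i, int build_level)
{
  return build_level <= i->max_build_level;
}

/* Pick the best built implementation that the running CPU supports.  */
static const char *
select_impl (int build_level, int cpu_level)
{
  /* A binary compiled for level N only runs on CPUs of level >= N.  */
  assert (cpu_level >= build_level);
  for (size_t k = 0; k < sizeof impls / sizeof impls[0]; k++)
    if (is_built (&impls[k], build_level)
        && cpu_level >= impls[k].cpu_level)
      return impls[k].name;
  return NULL;
}
```

With this table, a level 2 build on a level 2 CPU still resolves to the
sse2 entry, while a level 3 build never contains it.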
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
commit 6dcbb7d95d
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Mon Jun 6 21:11:33 2022 -0700
x86: Shrink code size of memchr-avx2.S
Changed how the page cross case aligned string (rdi) in
rawmemchr. This was incompatible with how
`L(cross_page_continue)` expected the pointer to be aligned and
would cause rawmemchr to read data that started before the
beginning of the string. What it would read was in valid memory
but could count CHAR matches resulting in an incorrect return
value.
This commit fixes that issue by essentially reverting the changes to
the L(page_cross) case as they didn't really matter.
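Why matches before the start of the string matter can be shown with a
scalar model. This is a sketch, not the assembly: it assumes a 32-bit
match mask taken from an aligned 32-byte load, with the string starting
at byte `offset` of that load. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit.  MASK must be non-zero.  */
static int
first_set_bit (uint32_t mask)
{
  int i = 0;
  assert (mask != 0);
  while ((mask & 1) == 0)
    {
      mask >>= 1;
      i++;
    }
  return i;
}

/* Correct handling: shift out the bits that belong to bytes before the
   string, then report the first match relative to the string start.
   Returns -1 if this vector holds no match.  */
static int
first_match_masked (uint32_t mask, unsigned int offset)
{
  assert (offset < 32);
  mask >>= offset;
  if (mask == 0)
    return -1;
  return first_set_bit (mask);
}

/* Faulty handling: bits from before the string are still in the mask,
   so the result can be negative, i.e. an address before the start.  */
static int
first_match_unmasked (uint32_t mask, unsigned int offset)
{
  if (mask == 0)
    return -1;
  return first_set_bit (mask) - (int) offset;
}
```

The load itself is harmless in both variants because it stays within
the same page; only the interpretation of the mask differs.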
Test cases added and all pass with the new code (and were confirmed
to fail with the old code).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This is not meant as a performance optimization. The previous code was
far too liberal in aligning targets and wasted code size unnecessarily.
The total code size saving is: 59 bytes
There are no major changes in the benchmarks.
Geometric Mean of all benchmarks New / Old: 0.967
Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
This commit fixes the bug mentioned in the previous commit.
The previous implementations of wmemchr in these files relied
on n * sizeof(wchar_t) not overflowing, which is not guaranteed by
the standard.
The new overflow tests added in the previous commit now
pass (as well as all the other tests).
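The overflow can be illustrated in C. This sketch assumes nothing about
the assembly; byte_length_overflows and wmemchr_by_count are
hypothetical names. The second function counts in elements, so it never
forms the product n * sizeof(wchar_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Non-zero if n * sizeof (wchar_t) does not fit in size_t.  */
static int
byte_length_overflows (size_t n)
{
  return n > SIZE_MAX / sizeof (wchar_t);
}

/* Reference search that decrements an element count instead of
   converting the count to a byte length.  */
static const wchar_t *
wmemchr_by_count (const wchar_t *s, wchar_t c, size_t n)
{
  assert (s != NULL || n == 0);
  for (; n != 0; n--, s++)
    if (*s == c)
      return s;
  return NULL;
}
```

A caller may pass a very large n when it knows a match exists early in
the buffer; the element-counting loop returns at the match without ever
multiplying n.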
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
No bug. This commit optimizes memchr-avx2.S. The optimizations include
replacing some branches with cmovcc, avoiding some branches entirely
in the less_4x_vec case, making the page cross logic less strict,
and saving a few instructions in the loop return path. test-memchr,
test-rawmemchr, and test-wmemchr are all passing.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with
xtest
jz 1f
vzeroall
ret
1:
vzeroupper
ret
at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
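The selection condition can be sketched as a C predicate. The function
name, the flag parameters and the returned strings are hypothetical.
Only the middle branch comes from the text above; the preference for
the EVEX version and the sse2 fallback are assumptions added to make
the example complete.

```c
#include <assert.h>
#include <string.h>

/* Pick a variant from three usable-feature flags.  */
static const char *
pick_variant (int avx2_usable, int rtm_usable, int evex256_usable)
{
  /* Assumption: with 256-bit EVEX instructions the EVEX version is
     used instead, so no RTM-aware exit is needed.  */
  if (evex256_usable)
    return "evex";
  /* From the text: usable RTM but no 256-bit EVEX selects the AVX
     version whose exit runs xtest and picks vzeroall or vzeroupper.  */
  if (avx2_usable && rtm_usable)
    return "avx2_rtm";
  if (avx2_usable)
    return "avx2";
  /* Assumption: baseline fallback.  */
  return "sse2";
}
```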
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits. The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.
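The effect of stale upper bits can be modeled in C. In this sketch the
64-bit register is an ordinary uint64_t and length_from_register is a
hypothetical helper; the cast plays the role of clearing the upper
32 bits before the full register is used as a length.

```c
#include <assert.h>
#include <stdint.h>

/* Recover the 32-bit size_t that an x32 caller passed in a 64-bit
   register whose upper half may hold unrelated data.  */
static uint64_t
length_from_register (uint64_t reg)
{
  /* Keep only the lower 32 bits; the result is zero-extended.  */
  uint64_t len = (uint32_t) reg;
  assert ((len >> 32) == 0);
  return len;
}
```

Using `reg` directly would turn a 16-byte request with garbage in the
upper half into a length far beyond the buffer.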
This patch fixes memchr/wmemchr for x32. Tested on x86-64 and x32. On
x86-64, libc.so is the same with and without the fix.
[BZ #24097]
CVE-2019-6488
* sysdeps/x86_64/memchr.S: Use RDX_LP for length. Clear the
upper 32 bits of RDX register.
* sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise.
* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memchr and
tst-size_t-wmemchr.
* sysdeps/x86_64/x32/test-size_t.h: New file.
* sysdeps/x86_64/x32/tst-size_t-memchr.c: Likewise.
* sysdeps/x86_64/x32/tst-size_t-wmemchr.c: Likewise.
SSE2 memchr is extended to support wmemchr. AVX2 memchr/rawmemchr/wmemchr
are added to search 32 bytes with a single vector compare instruction.
AVX2 memchr/rawmemchr/wmemchr are as fast as SSE2 memchr/rawmemchr/wmemchr
for small sizes and up to 1.5X faster for larger sizes on Haswell and
Skylake. Select AVX2 memchr/rawmemchr/wmemchr on AVX2 machines where
vzeroupper is preferred and AVX unaligned load is fast.
NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input. TZCNT is faster than BSF and is executed
as BSF if the machine doesn't support TZCNT.
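The compare-then-count-trailing-zeros pattern can be modeled in scalar
C. This is a sketch only: the inner loop stands in for the single
32-byte vector compare, trailing_zeros stands in for TZCNT, and
memchr_model is a hypothetical name. The mask is checked for zero
before counting, which is the only case where TZCNT and BSF differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 32

/* Number of trailing zero bits.  MASK must be non-zero.  */
static unsigned int
trailing_zeros (uint32_t mask)
{
  unsigned int n = 0;
  assert (mask != 0);
  while ((mask & 1) == 0)
    {
      mask >>= 1;
      n++;
    }
  return n;
}

/* Search N bytes at S for C, one VEC_SIZE chunk at a time.  */
static const void *
memchr_model (const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  while (n != 0)
    {
      size_t chunk = n < VEC_SIZE ? n : VEC_SIZE;
      uint32_t mask = 0;
      /* Bit i of MASK is set when byte i of the chunk equals C.  */
      for (size_t i = 0; i < chunk; i++)
        if (p[i] == (unsigned char) c)
          mask |= (uint32_t) 1 << i;
      /* The lowest set bit is the first match in this chunk.  */
      if (mask != 0)
        return p + trailing_zeros (mask);
      p += chunk;
      n -= chunk;
    }
  return NULL;
}
```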
* sysdeps/x86_64/memchr.S (MEMCHR): New. Depending on if
USE_AS_WMEMCHR is defined.
(PCMPEQ): Likewise.
(memchr): Renamed to ...
(MEMCHR): This. Support wmemchr if USE_AS_WMEMCHR is defined.
Replace pcmpeqb with PCMPEQ.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memchr-sse2, rawmemchr-sse2, memchr-avx2, rawmemchr-avx2,
wmemchr-sse4_1, wmemchr-avx2 and wmemchr-c.
* sysdeps/x86_64/multiarch/ifunc-avx2.h: New file.
* sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/memchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/memchr.c: Likewise.
* sysdeps/x86_64/multiarch/rawmemchr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/rawmemchr.c: Likewise.
* sysdeps/x86_64/multiarch/wmemchr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/wmemchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/wmemchr.c: Likewise.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test __memchr_avx2, __memchr_sse2,
__rawmemchr_avx2, __rawmemchr_sse2, __wmemchr_avx2 and
__wmemchr_sse2.