Commit Graph

8 Commits

Author SHA1 Message Date
Noah Goldstein
419c832aba x86: Use VMM API in memcmp-evex-movbe.S and minor changes
The only change to the existing generated code is `tzcnt` -> `bsf` to
save a byte of code size here and there.

Rewriting with VMM API allows for memcmp-evex-movbe to be used with
evex512 by including "x86-evex512-vecs.h" at the top.

Complete check passes on x86-64.
2022-11-08 19:19:35 -08:00
Noah Goldstein
ae308947ff x86: Add support for building {w}memcmp{eq} with explicit ISA level
1. Refactor files so that all implementations are in the multiarch
   directory
    - Moved the implementation portion of memcmp sse2 from memcmp.S to
      multiarch/memcmp-sse2.S

    - The non-multiarch file now only includes one of the
      implementations in the multiarch directory based on the compiled
      ISA level (only used for non-multiarch builds.  Otherwise we go
      through the ifunc selector).

2. Add ISA level build guards to different implementations.
    - I.e memcmp-avx2-movsb.S which is ISA level 3 will only build if
      compiled ISA level <= 3. Otherwise there is no reason to include
      it as we will always use one of the ISA level 4
      implementations (memcmp-evex-movbe.S).

3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the
   non-multiarch {w}memcmp{eq}.S which will in turn select the best
   implementation based on the compiled ISA level.

4. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-05 16:42:42 -07:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Noah Goldstein
abddd61de0 x86: Optimize L(less_vec) case in memcmp-evex-movbe.S
No bug.
Optimizations are twofold.

1) Replace page cross and 0/1 checks with masked load instructions in
   L(less_vec). In applications this reduces branch-misses in the
   hot [0, 32] case.
2) Change controlflow so that L(less_vec) case gets the fall through.

Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range.  From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to the the right tradeoff.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-27 03:17:59 -06:00
Noah Goldstein
bad852b61b x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'.

it could potentially be dangerous to use SSE2 if this function is ever
called without using 'vzeroupper' beforehand. While compilers appear
to use 'vzeroupper' before function calls if AVX2 has been used, using
SSE2 here is more brittle. Since it is not absolutely necessary it
should be avoided.

It costs 2-extra bytes but the extra bytes should only eat into
alignment padding.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-10-23 13:02:42 -05:00
Noah Goldstein
1bd8b8d58f x86: Optimize memcmp-evex-movbe.S for frontend behavior and size
No bug.

The frontend optimizations are to:
1. Reorganize logically connected basic blocks so they are either in
   the same cache line or adjacent cache lines.
2. Avoid cases when basic blocks unnecissarily cross cache lines.
3. Try and 32 byte align any basic blocks possible without sacrificing
   code size. Smaller / Less hot basic blocks are used for this.

Overall code size shrunk by 168 bytes. This should make up for any
extra costs due to aligning to 64 bytes.

In general performance before deviated a great deal dependending on
whether entry alignment % 64 was 0, 16, 32, or 48. These changes
essentially make it so that the current implementation is at least
equal to the best alignment of the original for any arguments.

The only additional optimization is in the page cross case. Branch on
equals case was removed from the size == [4, 7] case. As well the [4,
7] and [2, 3] case where swapped as [4, 7] is likely a more hot
argument size.

test-memcmp and test-wmemcmp are both passing.
2021-10-12 12:02:12 -05:00
Noah Goldstein
4ad473e97a x86: Optimize memcmp-evex-movbe.S
No bug. This commit optimizes memcmp-evex.S. The optimizations include
adding a new vec compare path for small sizes, reorganizing the entry
control flow, removing some unnecissary ALU instructions from the main
loop, and most importantly replacing the heavy use of vpcmp + kand
logic with vpxor + vptern. test-memcmp and test-wmemcmp are both
passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-05-18 22:57:51 -04:00
H.J. Lu
91264fe357 x86-64: Add memcmp family functions with 256-bit EVEX
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.
2021-03-29 07:40:17 -07:00