Commit Graph

38069 Commits

Author SHA1 Message Date
H.J. Lu
a2e259014f Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537]
Replace boolean CAS with value CAS to avoid the extra load.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit 0b82747dc4)
2022-09-28 07:32:55 -07:00
Florian Weimer
044755e2fa resolv: Fix building tst-resolv-invalid-cname for earlier C standards
This fixes this compiler error:

tst-resolv-invalid-cname.c: In function ‘test_mode_to_string’:
tst-resolv-invalid-cname.c:164:10: error: label at end of compound statement
     case test_mode_num:
          ^~~~~~~~~~~~~

Fixes commit 9caf782276
("resolv: Add new tst-resolv-invalid-cname").

(cherry picked from commit d09aa4a172)
2022-09-21 19:37:24 +02:00
Florian Weimer
2def56a349 nss_dns: Rewrite _nss_dns_gethostbyname4_r using current interfaces
Introduce struct alloc_buffer to this function, and use it and
struct ns_rr_cursor in gaih_getanswer_slice.  Adjust gaih_getanswer
and gaih_getanswer_noaaaa accordingly.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 1d495912a7)
(conflict in resolv/nss_dns/dns-host.c due to missing noaaaa support)
2022-09-21 19:37:24 +02:00
Florian Weimer
480c820493 resolv: Add new tst-resolv-invalid-cname
This test checks resolution through CNAME chains that do not contain
host names (bug 12154).

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 9caf782276)
2022-09-21 19:37:24 +02:00
Florian Weimer
c36e7cca35 nss_dns: In gaih_getanswer_slice, skip strange aliases (bug 12154)
If the name is not a host name, skip adding it to the result, instead
of reporting query failure.  This fixes bug 12154 for getaddrinfo.

This commit still keeps the old parsing code, and only adjusts when
a host name is copied.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 32b599ac8c)
2022-09-21 19:37:24 +02:00
Florian Weimer
9abc40d9b5 nss_dns: Rewrite getanswer_r to match getanswer_ptr (bug 12154, bug 29305)
Allocate the pointer arrays only at the end, when their sizes
are known.  This addresses bug 29305.

Skip over invalid names instead of failing lookups.  This partially
fixes bug 12154 (for gethostbyname, fixing getaddrinfo requires
different changes).

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit d101d836e7)
2022-09-21 19:37:17 +02:00
Florian Weimer
7267341ec1 nss_dns: Remove remnants of IPv6 address mapping
res_use_inet6 always returns false since commit 3f8b44be0a
("resolv: Remove support for RES_USE_INET6 and the inet6 option").

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit a7fc30b522)
2022-09-21 19:36:12 +02:00
Florian Weimer
32e5db3768 nss_dns: Rewrite _nss_dns_gethostbyaddr2_r and getanswer_ptr
The simplification takes advantage of the split from getanswer_r.
It fixes various aliases issues, and optimizes NSS buffer usage.
The new DNS packet parsing helpers are used, too.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit e32547d661)
2022-09-21 19:36:12 +02:00
Florian Weimer
d9c979abf9 nss_dns: Split getanswer_ptr from getanswer_r
And expand the use of name_ok and qtype in getanswer_ptr (the
former also in getanswer_r).

After further cleanups, not much code will be shared between the
two functions.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 0dcc43e998)
2022-09-21 19:36:12 +02:00
Florian Weimer
e7c03f4765 resolv: Add DNS packet parsing helpers geared towards wire format
The public parser functions around the ns_rr record type produce
textual domain names, but usually, this is not what we need while
parsing DNS packets within glibc.  This commit adds two new helper
functions, __ns_rr_cursor_init and __ns_rr_cursor_next, for writing
packet parsers, and struct ns_rr_cursor, struct ns_rr_wire as
supporting types.

In theory, it is possible to avoid copying the owner name
into the rname field in __ns_rr_cursor_next, but this would need
more functions that work on compressed names.

Eventually, __res_context_send could be enhanced to preserve the
result of the packet parsing that is necessary for matching the
incoming UDP packets, so that this works does not have to be done
twice.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 857c890d9b)
2022-09-21 19:36:12 +02:00
Florian Weimer
c288e032ae resolv: Add internal __ns_name_length_uncompressed function
This function is useful for checking that the question name is
uncompressed (as it should be).

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 78b1a4f0e4)
2022-09-21 19:36:12 +02:00
Florian Weimer
bb8adbba4f resolv: Add the __ns_samebinaryname function
During packet parsing, only the binary name is available.  If the name
equality check is performed before conversion to text, we can sometimes
skip the last step.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 394085a34d)
2022-09-21 19:36:12 +02:00
Florian Weimer
4d2e67d6e5 resolv: Add internal __res_binary_hnok function
During package parsing, only the binary representation is available,
and it is convenient to check that directly for conformance with host
name requirements.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit c79327bf00)
2022-09-21 19:36:12 +02:00
Florian Weimer
6a833d798e resolv: Add tst-resolv-aliases
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 87aa98aa80)
2022-09-21 19:36:12 +02:00
Florian Weimer
1a3afdfe31 resolv: Add tst-resolv-byaddr for testing reverse lookup
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 0b99828d54)
2022-09-21 19:36:12 +02:00
Florian Weimer
f50a6c843a gconv: Use 64-bit interfaces in gconv_parseconfdir (bug 29583)
It's possible that inode numbers are outside the 32-bit range.
The existing code only handles the in-libc case correctly, and
still uses the legacy interfaces when building iconv.

Suggested-by: Helge Deller <deller@gmx.de>
(cherry picked from commit f97905f246)
2022-09-21 13:13:02 +02:00
Javier Pello
2ff6775ad3 elf: Fix hwcaps string size overestimation
Commit dad90d5282 added glibc-hwcaps
support for LD_LIBRARY_PATH and, for this, it adjusted the total
string size required in _dl_important_hwcaps. However, in doing so
it inadvertently altered the calculation of the size required for
the power set strings, as the computation of the power set string
size depended on the first value assigned to the total variable,
which is later shifted, resulting in overallocation of string
space. Fix this now by using a different variable to hold the
string size required for glibc-hwcaps.

Signed-off-by: Javier Pello <devel@otheo.eu>
(cherry picked from commit a23820f605)
2022-09-15 15:44:14 +02:00
Florian Weimer
bc5cb538e5 elf: Run tst-audit-tlsdesc, tst-audit-tlsdesc-dlopen everywhere
The test is valid for all TLS models, but we want to make a reasonable
effort to test the GNU2 model specifically.  For example, aarch64
defaults to GNU2, but does not have -mtls-dialect=gnu2, and the test
was not run there.

Suggested-by: Martin Coufal <mcoufal@redhat.com>
(cherry picked from commit dd2315a866)

Fixes early backport commit 536ddc5c02
("elf: Call __libc_early_init for reused namespaces (bug 29528)");
it had a wrong conflict resolution.
2022-09-13 20:47:59 +02:00
Fabian Vogt
2b3d020055 nscd: Fix netlink cache invalidation if epoll is used [BZ #29415]
Processes cache network interface information such as whether IPv4 or IPv6
are enabled. This is only checked again if the "netlink timestamp" provided
by nscd changed, which is triggered by netlink socket activity.

However, in the epoll handler for the netlink socket, it was missed to
assign the new timestamp to the nscd database. The handler for plain poll
did that properly, copy that over.

This bug caused that e.g. processes which started before network
configuration got unusuable addresses from getaddrinfo, like IPv6 only even
though only IPv4 is available:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1041

It's a bit hard to reproduce, so I verified this by checking the timestamp
on calls to __check_pf manually. Without this patch it's stuck at 1, now
it's increasing on network changes as expected.

Signed-off-by: Fabian Vogt <fvogt@suse.de>
(cherry picked from commit 02ca25fef2)
2022-09-06 17:17:35 +02:00
Raphael Moreira Zinsly
b41c535f46 Apply asm redirections in wchar.h before first use
Similar to d0fa09a770, but for wchar.h.  Fixes [BZ #27087] by applying
all long double related asm redirections before using functions in
bits/wchar2.h.
Moves the function declarations from wcsmbs/bits/wchar2.h to a new file
wcsmbs/bits/wchar2-decl.h that will be included first in wcsmbs/wchar.h.

Tested with build-many-glibcs.py.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

(cherry picked from commit c7509d49c4)
2022-08-31 10:29:54 +02:00
Tulio Magno Quites Machado Filho
2a44960cbc Apply asm redirections in stdio.h before first use [BZ #27087]
Compilers may not be able to apply asm redirections to functions after
these functions are used for the first time, e.g. clang 13.
Fix [BZ #27087] by applying all long double-related asm redirections
before using functions in bits/stdio.h.
However, as these asm redirections depend on the declarations provided
by libio/bits/stdio2.h, this header was split in 2:

 - libio/bits/stdio2-decl.h contains all function declarations;
 - libio/bits/stdio2.h remains with the remaining contents, including
   redirections.

This also adds the access attribute to __vsnprintf_chk that was missing.

Tested with build-many-glibcs.py.

Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
(cherry picked from commit d0fa09a770)
2022-08-31 10:29:46 +02:00
Florian Weimer
536ddc5c02 elf: Call __libc_early_init for reused namespaces (bug 29528)
libc_map is never reset to NULL, neither during dlclose nor on a
dlopen call which reuses the namespace structure.  As a result, if a
namespace is reused, its libc is not initialized properly.  The most
visible result is a crash in the <ctype.h> functions.

To prevent similar bugs on namespace reuse from surfacing,
unconditionally initialize the chosen namespace to zero using memset.

(cherry picked from commit d0e357ff45)
2022-08-30 16:30:03 +02:00
Arjun Shankar
68507377f2 socket: Check lengths before advancing pointer in CMSG_NXTHDR
The inline and library functions that the CMSG_NXTHDR macro may expand
to increment the pointer to the header before checking the stride of
the increment against available space.  Since C only allows incrementing
pointers to one past the end of an array, the increment must be done
after a length check.  This commit fixes that and includes a regression
test for CMSG_FIRSTHDR and CMSG_NXTHDR.

The Linux, Hurd, and generic headers are all changed.

Tested on Linux on armv7hl, i686, x86_64, aarch64, ppc64le, and s390x.

[BZ #28846]

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 9c443ac455)
2022-08-22 18:59:26 +02:00
Florian Weimer
1fcc7bfee2 alpha: Fix generic brk system call emulation in __brk_call (bug 29490)
The kernel special-cases the zero argument for alpha brk, and we can
use that to restore the generic Linux error handling behavior.

Fixes commit b57ab258c1 ("Linux:
Introduce __brk_call for invoking the brk system call").

(cherry picked from commit e7ad26ee3c)
2022-08-22 11:12:40 +02:00
Noah Goldstein
4bc889c01c stdlib: Fixup mbstowcs NULL __dst handling. [BZ #29279]
commit 464d189b96 (origin/master, origin/HEAD)
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Jun 22 08:24:21 2022 -0700

    stdlib: Remove attr_write from mbstows if dst is NULL [BZ: 29265]

Incorrectly called `__mbstowcs_chk` in the NULL __dst case which is
incorrect as in the NULL __dst case we are explicitly skipping
the objsize checks.

As well, remove the `__always_inline` attribute which exists in
`__fortify_function`.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

(cherry picked from commit 220b83d83d)
2022-08-15 23:14:15 +02:00
Noah Goldstein
a88f07f71f stdlib: Remove attr_write from mbstows if dst is NULL [BZ: 29265]
mbstows is defined if dst is NULL and is defined to special cased if
dst is NULL so the fortify objsize check if incorrect in that case.

Tested on x86-64 linux.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>

(cherry picked from commit 464d189b96)
2022-08-15 23:14:13 +02:00
Joseph Myers
4ab59ce4e5 Update syscall lists for Linux 5.19
Linux 5.19 has no new syscalls, but enables memfd_secret in the uapi
headers for RISC-V.  Update the version number in syscall-names.list
to reflect that it is still current for 5.19 and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.

(cherry picked from commit fccadcdf5b)
2022-08-08 07:54:02 +02:00
Florian Weimer
875b2414cd dlfcn: Pass caller pointer to static dlopen implementation (bug 29446)
Fixes commit 0c1c3a771e ("dlfcn: Move
dlopen into libc").

(cherry picked from commit ed0185e412)
2022-08-04 20:57:18 +02:00
Florian Weimer
b2f32e7464 malloc: Simplify implementation of __malloc_assert
It is prudent not to run too much code after detecting heap
corruption, and __fxprintf is really complex.  The line number
and file name do not carry much information, so it is not included
in the error message.  (__libc_message only supports %s formatting.)
The function name and assertion should provide some context.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit ac8047cdf3)
2022-07-21 17:06:27 +02:00
Joseph Myers
b991af5063 Update syscall-names.list for Linux 5.18
Linux 5.18 has no new syscalls.  Update the version number in
syscall-names.list to reflect that it is still current for 5.18.

Tested with build-many-glibcs.py.

(cherry picked from commit 3d9926663c)
2022-07-21 17:06:27 +02:00
Noah Goldstein
ccc54bd61c x86: Add missing IS_IN (libc) check to strncmp-sse4_2.S
Was missing to for the multiarch build rtld-strncmp-sse4_2.os was
being built and exporting symbols:

build/glibc/string/rtld-strncmp-sse4_2.os:
0000000000000000 T __strncmp_sse42

Introduced in:

commit 11ffcacb64
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jun 21 12:10:50 2017 -0700

    x86-64: Implement strcmp family IFUNC selectors in C

(cherry picked from commit 96ac447d91)
2022-07-18 20:45:21 -07:00
Noah Goldstein
35f9c72c8b x86: Move mem{p}{mov|cpy}_{chk_}erms to its own file
The primary memmove_{impl}_unaligned_erms implementations don't
interact with this function. Putting them in same file both
wastes space and unnecessarily bloats a hot code section.

(cherry picked from commit 21925f6473)
2022-07-18 20:45:21 -07:00
Noah Goldstein
7079931c51 x86: Move and slightly improve memset_erms
Implementation wise:
    1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not
       use the L(stosb) label that was previously defined.

    2. Don't give the hotpath (fallthrough) to zero size.

Code positioning wise:

Move memset_{chk}_erms to its own file.  Leaving it in between the
memset_{impl}_unaligned both adds unnecessary complexity to the
file and wastes space in a relatively hot cache section.

(cherry picked from commit 4a3f29e7e4)
2022-07-18 20:45:21 -07:00
Noah Goldstein
f4598f0351 x86: Add definition for __wmemset_chk AVX2 RTM in ifunc impl list
This was simply missing and meant we weren't testing it properly.

(cherry picked from commit 2a1099020c)
2022-07-18 20:45:21 -07:00
Noah Goldstein
aadd0a1c7c x86: Put wcs{n}len-sse4.1 in the sse4.1 text section
Previously was missing but the two implementations shouldn't get in
the sse2 (generic) text section.

(cherry picked from commit afc6e4328f)
2022-07-18 20:45:21 -07:00
Noah Goldstein
d201c59177 x86: Align entry for memrchr to 64-bytes.
The function was tuned around 64-byte entry alignment and performs
better for all sizes with it.

As well different code boths where explicitly written to touch the
minimum number of cache line i.e sizes <= 32 touch only the entry
cache line.

(cherry picked from commit 227afaa672)
2022-07-18 20:45:21 -07:00
Noah Goldstein
c51d8d383c x86: Add BMI1/BMI2 checks for ISA_V3 check
BMI1/BMI2 are part of the ISA V3 requirements:
https://en.wikipedia.org/wiki/X86-64

And defined by GCC when building with `-march=x86-64-v3`

(cherry picked from commit 8da9f346cb)
2022-07-18 20:45:21 -07:00
Noah Goldstein
ba1c3f23d9 x86: Cleanup bounds checking in large memcpy case
1. Fix incorrect lower-bound threshold in L(large_memcpy_2x).
   Previously was using `__x86_rep_movsb_threshold` and should
   have been using `__x86_shared_non_temporal_threshold`.

2. Avoid reloading __x86_shared_non_temporal_threshold before
   the L(large_memcpy_4x) bounds check.

3. Document the second bounds check for L(large_memcpy_4x)
   more clearly.

(cherry picked from commit 89a25c6f64)
2022-07-18 20:45:21 -07:00
Noah Goldstein
94b0dc9419 x86: Add bounds x86_non_temporal_threshold
The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed
by memmove-vec-unaligned-erms.

The lower-bound is needed because memmove-vec-unaligned-erms unrolls
the loop aggressively in the L(large_memset_4x) case.

The upper-bound is needed because memmove-vec-unaligned-erms
right-shifts the value of `x86_non_temporal_threshold` by
LOG_4X_MEMCPY_THRESH (4) which without a bound may overflow.

The lack of lower-bound can be a correctness issue. The lack of
upper-bound cannot.

(cherry picked from commit b446822b6a)
2022-07-18 20:45:21 -07:00
Noah Goldstein
9d50e162ee x86: Add sse42 implementation to strcmp's ifunc
This has been missing since the the ifuncs where added.

The performance of SSE4.2 is preferable to to SSE2.

Measured on Tigerlake with N = 20 runs.
Geometric Mean of all benchmarks SSE4.2 / SSE2: 0.906

(cherry picked from commit ff439c4717)
2022-07-18 20:45:21 -07:00
Noah Goldstein
6e008c884d x86: Fix misordered logic for setting rep_movsb_stop_threshold
Move the setting of `rep_movsb_stop_threshold` to after the tunables
have been collected so that the `rep_movsb_stop_threshold` (which
is used to redirect control flow to the non_temporal case) will
use any user value for `non_temporal_threshold` (set using
glibc.cpu.x86_non_temporal_threshold)

(cherry picked from commit 0355915514)
2022-07-18 20:45:21 -07:00
Noah Goldstein
fc54e1fae8 x86: Align varshift table to 32-bytes
This ensures the load will never split a cache line.

(cherry picked from commit 0f91811333)
2022-07-18 20:45:21 -07:00
Noah Goldstein
820504e3ed x86: ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST expect no transactions
Give fall-through path to `vzeroupper` and taken-path to `vzeroall`.

Generally even on machines with RTM the expectation is the
string-library functions will not be called in transactions.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit c28db9cb29)
2022-07-18 20:45:21 -07:00
Noah Goldstein
3c87383a20 x86: Shrink code size of memchr-evex.S
This is not meant as a performance optimization. The previous code was
far to liberal in aligning targets and wasted code size unnecissarily.

The total code size saving is: 64 bytes

There are no non-negligible changes in the benchmarks.
Geometric Mean of all benchmarks New / Old: 1.000

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 56da3fe1dd)
2022-07-18 20:45:21 -07:00
Noah Goldstein
a910d7e164 x86: Shrink code size of memchr-avx2.S
This is not meant as a performance optimization. The previous code was
far to liberal in aligning targets and wasted code size unnecissarily.

The total code size saving is: 59 bytes

There are no major changes in the benchmarks.
Geometric Mean of all benchmarks New / Old: 0.967

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 6dcbb7d95d)

x86: Fix page cross case in rawmemchr-avx2 [BZ #29234]

commit 6dcbb7d95d
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Mon Jun 6 21:11:33 2022 -0700

    x86: Shrink code size of memchr-avx2.S

Changed how the page cross case aligned string (rdi) in
rawmemchr. This was incompatible with how
`L(cross_page_continue)` expected the pointer to be aligned and
would cause rawmemchr to read data start started before the
beginning of the string. What it would read was in valid memory
but could count CHAR matches resulting in an incorrect return
value.

This commit fixes that issue by essentially reverting the changes to
the L(page_cross) case as they didn't really matter.

Test cases added and all pass with the new code (and where confirmed
to fail with the old code).
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 2c9af8421d)
2022-07-18 20:45:21 -07:00
Noah Goldstein
b05bd59823 x86: Optimize memrchr-avx2.S
The new code:
    1. prioritizes smaller user-arg lengths more.
    2. optimizes target placement more carefully
    3. reuses logic more
    4. fixes up various inefficiencies in the logic. The biggest
       case here is the `lzcnt` logic for checking returns which
       saves either a branch or multiple instructions.

The total code size saving is: 306 bytes
Geometric Mean of all benchmarks New / Old: 0.760

Regressions:
There are some regressions. Particularly where the length (user arg
length) is large but the position of the match char is near the
beginning of the string (in first VEC). This case has roughly a
10-20% regression.

This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). This case has roughly
a 15-45% speedup.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit af5306a735)
2022-07-18 20:45:21 -07:00
Noah Goldstein
83a986e9fb x86: Optimize memrchr-evex.S
The new code:
    1. prioritizes smaller user-arg lengths more.
    2. optimizes target placement more carefully
    3. reuses logic more
    4. fixes up various inefficiencies in the logic. The biggest
       case here is the `lzcnt` logic for checking returns which
       saves either a branch or multiple instructions.

The total code size saving is: 263 bytes
Geometric Mean of all benchmarks New / Old: 0.755

Regressions:
There are some regressions. Particularly where the length (user arg
length) is large but the position of the match char is near the
beginning of the string (in first VEC). This case has roughly a
20% regression.

This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). This case has roughly
a 35% speedup.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit b4209615a0)
2022-07-18 20:45:21 -07:00
Noah Goldstein
4901009dad x86: Optimize memrchr-sse2.S
The new code:
    1. prioritizes smaller lengths more.
    2. optimizes target placement more carefully.
    3. reuses logic more.
    4. fixes up various inefficiencies in the logic.

The total code size saving is: 394 bytes
Geometric Mean of all benchmarks New / Old: 0.874

Regressions:
    1. The page cross case is now colder, especially re-entry from the
       page cross case if a match is not found in the first VEC
       (roughly 50%). My general opinion with this patch is this is
       acceptable given the "coldness" of this case (less than 4%) and
       generally performance improvement in the other far more common
       cases.

    2. There are some regressions 5-15% for medium/large user-arg
       lengths that have a match in the first VEC. This is because the
       logic was rewritten to optimize finds in the first VEC if the
       user-arg length is shorter (where we see roughly 20-50%
       performance improvements). It is not always the case this is a
       regression. My intuition is some frontend quirk is partially
       explaining the data although I haven't been able to find the
       root cause.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 731feee386)
2022-07-18 20:45:21 -07:00
Noah Goldstein
e805606193 x86: Add COND_VZEROUPPER that can replace vzeroupper if no ret
The RTM vzeroupper mitigation has no way of replacing inline
vzeroupper not before a return.

This can be useful when hoisting a vzeroupper to save code size
for example:

```
L(foo):
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	VZEROUPPER_RETURN

L(bar):
	xorl	%eax, %eax
	VZEROUPPER_RETURN
```

Can become:

```
L(foo):
	COND_VZEROUPPER
	cmpl	%eax, %edx
	jz	L(bar)
	tzcntl	%eax, %eax
	addq	%rdi, %rax
	ret

L(bar):
	xorl	%eax, %eax
	ret
```

This code does not change any existing functionality.

There is no difference in the objdump of libc.so before and after this
patch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit dd5c483b25)
2022-07-18 20:45:20 -07:00
Noah Goldstein
70be93d1c5 x86: Create header for VEC classes in x86 strings library
This patch does not touch any existing code and is only meant to be a
tool for future patches so that simple source files can more easily be
maintained to target multiple VEC classes.

There is no difference in the objdump of libc.so before and after this
patch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 8a780a6b91)
2022-07-18 20:45:20 -07:00