Commit Graph

35519 Commits

Author SHA1 Message Date
Florian Weimer
1c08c17156 debug: Mark libSegFault.so as NODELETE
The signal handler installed in the ELF constructor cannot easily
be removed again (because the program may have changed handlers
in the meantime).  Mark the object as NODELETE so that the registered
handler function is never unloaded.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 23ee92deea)
2023-07-21 16:40:30 +02:00
Noah Goldstein
2d4f26e5cf x86: Fix wcsnlen-avx2 page cross length comparison [BZ #29591]
Previous implementation was adjusting length (rsi) to match
bytes (eax), but since there is no bound to length this can cause
overflow.

Fix is to just convert the byte-count (eax) to length by dividing by
sizeof (wchar_t) before the comparison.

Full check passes on x86-64 and build succeeds w/ and w/o multiarch.

(cherry picked from commit b0969fa53a)
2022-11-24 17:15:54 -08:00
Sunil K Pandey
d4b7559457 x86-64: Require BMI2 for avx2 functions [BZ #29611]
This patch fixes BZ #29611
2022-09-28 18:08:36 -07:00
H.J. Lu
b8bb48a18d x86-64: Require BMI2 for strchr-avx2.S [BZ #29611]
Since strchr-avx2.S updated by

commit 1f745ecc21
Author: noah <goldstein.w.n@gmail.com>
Date:   Wed Feb 3 00:38:59 2021 -0500

    x86-64: Refactor and improve performance of strchr-avx2.S

uses sarx:

c4 e2 72 f7 c0       	sarx   %ecx,%eax,%eax

for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and
ifunc-avx2.h.

This fixes BZ #29611.

(cherry picked from commit 83c5b36822)
2022-09-28 18:08:22 -07:00
Andreas Schwab
c8f2a3e803 Add test for bug 29530
This tests for a bug that was introduced in commit edc1686af0 ("vfprintf:
Reuse work_buffer in group_number") and fixed as a side effect of commit
6caddd34bd ("Remove most vfprintf width/precision-dependent allocations
(bug 14231, bug 26211).").

(cherry picked from commit ca6466e8be)
2022-08-30 10:45:40 +02:00
Joseph Myers
e6ae5b25cd Fix memmove call in vfprintf-internal.c:group_number
A recent GCC mainline change introduces errors of the form:

vfprintf-internal.c: In function 'group_number':
vfprintf-internal.c:2093:15: error: 'memmove' specified bound between 9223372036854775808 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
 2093 |               memmove (w, s, (front_ptr -s) * sizeof (CHAR_T));
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is a genuine bug in the glibc code: s > front_ptr is always true
at this point in the code, and the intent is clearly for the
subtraction to be the other way round.  The other arguments to the
memmove call here also appear to be wrong; w and s point just *after*
the destination and source for copying the rest of the number, so the
size needs to be subtracted to get appropriate pointers for the
copying.  Adjust the memmove call to conform to the apparent intent of
the code, so fixing the -Wstringop-overflow error.

Now, if the original code were ever executed, a buffer overrun would
result.  However, I believe this code (introduced in commit
edc1686af0, "vfprintf: Reuse work_buffer
in group_number", so in glibc 2.26) is unreachable in prior glibc
releases (so there is no need for a bug in Bugzilla, no need to
consider any backports unless someone wants to build older glibc
releases with GCC 12 and no possibility of this buffer overrun
resulting in a security issue).

work_buffer is 1000 bytes / 250 wide characters.  This case is only
reachable if an initial part of the number, plus a grouped copy of the
rest of the number, fail to fit in that space; that is, if the grouped
number fails to fit in the space.  In the wide character case,
grouping is always one wide character, so even with a locale (of which
there aren't any in glibc) grouping every digit, a number would need
to occupy at least 125 wide characters to overflow, and a 64-bit
integer occupies at most 23 characters in octal including a leading 0.
In the narrow character case, the multibyte encoding of the grouping
separator would need to be at least 42 bytes to overflow, again
supposing grouping every digit, but MB_LEN_MAX is 16.  So even if we
admit the case of artificially constructed locales not shipped with
glibc, given that such a locale would need to use one of the character
sets supported by glibc, this code cannot be reached at present.  (And
POSIX only actually specifies the ' flag for grouping for decimal
output, though glibc acts on it for other bases as well.)

With binary output (if you consider use of grouping there to be
valid), you'd need a 15-byte multibyte character for overflow; I don't
know if any supported character set has such a character (if, again,
we admit constructed locales using grouping every digit and a grouping
separator chosen to have a multibyte encoding as long as possible, as
well as accepting use of grouping with binary), but given that we have
this code at all (clearly it's not *correct*, or in accordance with
the principle of avoiding arbitrary limits, to skip grouping on
running out of internal space like that), I don't think it should need
any further changes for binary printf support to go in.

On the other hand, support for large sizes of _BitInt in printf (see
the N2858 proposal) *would* require something to be done about such
arbitrary limits (presumably using dynamic allocation in printf again,
for sufficiently large _BitInt arguments only - currently only
floating-point uses dynamic allocation, and, as previously discussed,
that could actually be replaced by bounded allocation given smarter
code).

Tested with build-many-glibcs.py for aarch64-linux-gnu (GCC mainline).
Also tested natively for x86_64.

(cherry picked from commit db6c4935fa)
2022-08-30 10:09:58 +02:00
Joseph Myers
1dbe841a67 Remove most vfprintf width/precision-dependent allocations (bug 14231, bug 26211).
The vfprintf implementation (used for all printf-family functions)
contains complicated logic to allocate internal buffers of a size
depending on the width and precision used for a format, using either
malloc or alloca depending on that size, and with consequent checks
for size overflow and allocation failure.

As noted in bug 26211, the version of that logic used when '$' plus
argument number formats are in use is missing the overflow checks,
which can result in segfaults (quite possibly exploitable, I didn't
try to work that out) when the width or precision is in the range
0x7fffffe0 through 0x7fffffff (maybe smaller values as well in the
wprintf case on 32-bit systems, when the multiplication by sizeof
(CHAR_T) can overflow).

All that complicated logic in fact appears to be useless.  As far as I
can tell, there has been no need (outside the floating-point printf
code, which does its own allocations) for allocations depending on
width or precision since commit
3e95f6602b ("Remove limitation on size
of precision for integers", Sun Sep 12 21:23:32 1999 +0000).  Thus,
this patch removes that logic completely, thereby fixing both problems
with excessive allocations for large width and precision for
non-floating-point formats, and the problem with missing overflow
checks with such allocations.  Note that this does have the
consequence that width and precision up to INT_MAX are now allowed
where previously INT_MAX / sizeof (CHAR_T) - EXTSIZ or more would have
been rejected, so could potentially expose any other overflows where
the value would previously have been rejected by those removed checks.

I believe this completely fixes bugs 14231 and 26211.

Excessive allocations are still possible in the floating-point case
(bug 21127), as are other integer or buffer overflows (see bug 26201).
This does not address the cases where a precision larger than INT_MAX
(embedded in the format string) would be meaningful without printf's
return value overflowing (when it's used with a string format, or %g
without the '#' flag, so the actual output will be much smaller), as
mentioned in bug 17829 comment 8; using size_t internally for
precision to handle that case would be complicated by struct
printf_info being a public ABI.  Nor does it address the matter of an
INT_MIN width being negated (bug 17829 comment 7; the same logic
appears a second time in the file as well, in the form of multiplying
by -1).  There may be other sources of memory allocations with malloc
in printf functions as well (bug 24988, bug 16060).  From inspection,
I think there are also integer overflows in two copies of "if ((width
-= len) < 0)" logic (where width is int, len is size_t and a very long
string could result in spurious padding being output on a 32-bit
system before printf overflows the count of output characters).

Tested for x86-64 and x86.

(cherry picked from commit 6caddd34bd)
2022-08-30 10:09:13 +02:00
Adhemerval Zanella
5a802723db stdio: Add tests for printf multibyte convertion leak [BZ#25691]
Checked on x86_64-linux-gnu and i686-linux-gnu.

(cherry picked from commit 910a835dc9)
2022-08-30 10:06:38 +02:00
Florian Weimer
ae7748e67f stdio: Remove memory leak from multibyte convertion [BZ#25691]
This is an updated version of a previous patch [1] with the
following changes:

  - Use compiler overflow builtins on done_add_func function.
  - Define the scratch +utstring_converted_wide_string using
    CHAR_T.
  - Added a testcase and mention the bug report.

Both default and wide printf functions might leak memory when
manipulate multibyte characters conversion depending of the size
of the input (whether __libc_use_alloca trigger or not the fallback
heap allocation).

This patch fixes it by removing the extra memory allocation on
string formatting with conversion parts.

The testcase uses input argument size that trigger memory leaks
on unpatched code (using a scratch buffer the threashold to use
heap allocation is lower).

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

[1] https://sourceware.org/pipermail/libc-alpha/2017-June/082098.html

(cherry picked from commit 3cc4a8367c)
2022-08-30 10:05:43 +02:00
Aurelien Jarno
174d0b61c7 Linux: Require properly configured /dev/pts for PTYs
Current systems do not have BSD terminals, so the fallback code in
posix_openpt/getpt does not do anything.  Also remove the file system
check for /dev/pts.  Current systems always have a devpts file system
mounted there if /dev/ptmx exists.

grantpt is now essentially a no-op.  It only verifies that the
argument is a ptmx-descriptor.  Therefore, this change indirectly
addresses bug 24941.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-08-18 12:28:36 +02:00
Florian Weimer
0a167374fd Linux: Detect user namespace support in io/tst-getcwd-smallbuff
Otherwise the test fails with certain container runtimes.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 5b8e7980c5)
2022-08-18 00:14:28 +02:00
Siddhesh Poyarekar
4ad1659d8c getcwd: Set errno to ERANGE for size == 1 (CVE-2021-3999)
No valid path returned by getcwd would fit into 1 byte, so reject the
size early and return NULL with errno set to ERANGE.  This change is
prompted by CVE-2021-3999, which describes a single byte buffer
underflow and overflow when all of the following conditions are met:

- The buffer size (i.e. the second argument of getcwd) is 1 byte
- The current working directory is too long
- '/' is also mounted on the current working directory

Sequence of events:

- In sysdeps/unix/sysv/linux/getcwd.c, the syscall returns ENAMETOOLONG
  because the linux kernel checks for name length before it checks
  buffer size

- The code falls back to the generic getcwd in sysdeps/posix

- In the generic func, the buf[0] is set to '\0' on line 250

- this while loop on line 262 is bypassed:

    while (!(thisdev == rootdev && thisino == rootino))

  since the rootfs (/) is bind mounted onto the directory and the flow
  goes on to line 449, where it puts a '/' in the byte before the
  buffer.

- Finally on line 458, it moves 2 bytes (the underflowed byte and the
  '\0') to the buf[0] and buf[1], resulting in a 1 byte buffer overflow.

- buf is returned on line 469 and errno is not set.

This resolves BZ #28769.

Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Signed-off-by: Qualys Security Advisory <qsa@qualys.com>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 23e0e8f5f1)
2022-08-18 00:14:28 +02:00
Siddhesh Poyarekar
3319cea99e support: Add helpers to create paths longer than PATH_MAX
Add new helpers support_create_and_chdir_toolong_temp_directory and
support_chdir_toolong_temp_directory to create and descend into
directory trees longer than PATH_MAX.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit fb7bff12e8)
2022-08-18 00:14:28 +02:00
Florian Weimer
f733e291bb support: Fix xclone build failures on ia64 and hppa
(cherry picked from commit 97ed4749be)
2022-08-18 00:14:28 +02:00
Adhemerval Zanella
43757c70ee support: Add xclone
It is a wrapper for Linux clone syscall, to simplify the call to the
use only the most common arguments and remove architecture specific
handling (such as ia64 different name and signature).

(cherry picked from commit de8995a2a0)
2022-08-18 00:14:28 +02:00
Alexandra Hájková
29d3aeb0e8 Add xchdir to libsupport.
(cherry picked from commit a7e9dbb774)
2022-08-18 00:14:28 +02:00
Adhemerval Zanella
2d7720f316 support: Add create_temp_file_in_dir
It allows created a temporary file in a specified directory.

(cherry picked from commit 60854f40ea)
2022-08-18 00:13:15 +02:00
H.J. Lu
183709983d NEWS: Add a bug fix entry for BZ #28896 2022-02-18 19:12:04 -08:00
Noah Goldstein
d385079bd5 x86: Fix TEST_NAME to make it a string in tst-strncmp-rtm.c
Previously TEST_NAME was passing a function pointer. This didn't fail
because of the -Wno-error flag (to allow for overflow sizes passed
to strncmp/wcsncmp)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit b98d0bbf74)
2022-02-18 18:20:50 -08:00
Noah Goldstein
7df3ad6560 x86: Test wcscmp RTM in the wcsncmp overflow case [BZ #28896]
In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would
call strcmp-avx2 and wcscmp-avx2 respectively. This would have
not checks around vzeroupper and would trigger spurious
aborts. This commit fixes that.

test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
AVX2 machines with and without RTM.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 7835d611af)
2022-02-18 18:20:46 -08:00
Noah Goldstein
fc133fcf49 x86: Fallback {str|wcs}cmp RTM in the ncmp overflow case [BZ #28896]
In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would
call strcmp-avx2 and wcscmp-avx2 respectively. This would have
not checks around vzeroupper and would trigger spurious
aborts. This commit fixes that.

test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
AVX2 machines with and without RTM.

Co-authored-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit c627209832)
2022-02-18 18:20:39 -08:00
H.J. Lu
775c05b28c string: Add a testcase for wcsncmp with SIZE_MAX [BZ #28755]
Verify that wcsncmp (L("abc"), L("abd"), SIZE_MAX) == 0.  The new test
fails without

commit ddf0992cf5
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Sun Jan 9 16:02:21 2022 -0600

    x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]

and

commit 7e08db3359
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Sun Jan 9 16:02:28 2022 -0600

    x86: Fix __wcsncmp_evex in strcmp-evex.S [BZ# 28755]

This is for BZ #28755.

Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>

(cherry picked from commit aa5a720056)
2022-02-17 11:33:07 -08:00
H.J. Lu
c6b346ec55 x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064]
commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Jun 23 01:19:34 2021 -0400

    x86-64: Add wcslen optimize for sse4.1

added wcsnlen-sse4.1 to the wcslen ifunc implementation list.  Since the
random value in the the RSI register is larger than the wide-character
string length in the existing wcslen test, it didn't trigger the wcslen
test failure.  Add a test to force 0 into the RSI register before calling
wcslen.

(cherry picked from commit a6e7c3745d)
2022-02-01 12:23:36 -08:00
Noah Goldstein
0675185923 x86: Remove wcsnlen-sse4_1 from wcslen ifunc-impl-list [BZ #28064]
The following commit

commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Jun 23 01:19:34 2021 -0400

    x86-64: Add wcslen optimize for sse4.1

Added wcsnlen-sse4.1 to the wcslen ifunc implementation list and did
not add wcslen-sse4.1 to wcslen ifunc implementation list. This commit
fixes that by removing wcsnlen-sse4.1 from the wcslen ifunc
implementation list and adding wcslen-sse4.1 to the ifunc
implementation list.

Testing:
test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c are passing as
well as all other tests in wcsmbs and string.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 0679442def)
2022-02-01 12:23:29 -08:00
H.J. Lu
5db3239baf x86: Black list more Intel CPUs for TSX [BZ #27398]
Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in:

https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

This fixes BZ #27398.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 1e000d3d33)
2022-02-01 07:13:46 -08:00
H.J. Lu
5b99f172b8 x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033]
From

https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

* Intel TSX will be disabled by default.
* The processor will force abort all Restricted Transactional Memory (RTM)
  transactions by default.
* A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated,
  which is set to indicate to updated software that the loaded microcode is
  forcing RTM abort.
* On processors that enumerate support for RTM, the CPUID enumeration bits
  for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to
  be set by default after microcode update.
* Workloads that were benefited from Intel TSX might experience a change
  in performance.
* System software may use a new bit in Model-Specific Register (MSR) 0x10F
  TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock
  Elision (HLE) and RTM bits to indicate to software that Intel TSX is
  disabled.

1. Add RTM_ALWAYS_ABORT to CPUID features.
2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set.  This skips the
string/tst-memchr-rtm etc. testcases on the affected processors, which
always fail after a microcde update.
3. Check RTM feature, instead of usability, against /proc/cpuinfo.

This fixes BZ #28033.

(cherry picked from commit ea8e465a6b)
2022-02-01 07:12:35 -08:00
H.J. Lu
70d293a158 NEWS: Add a bug fix entry for BZ #27974 2022-01-27 15:50:22 -08:00
Noah Goldstein
a2be2c0f5d String: Add overflow tests for strnlen, memchr, and strncat [BZ #27974]
This commit adds tests for a bug in the wide char variant of the
functions where the implementation may assume that maxlen for wcsnlen
or n for wmemchr/strncat will not overflow when multiplied by
sizeof(wchar_t).

These tests show the following implementations failing on x86_64:

wcsnlen-sse4_1
wcsnlen-avx2

wmemchr-sse2
wmemchr-avx2

strncat would fail as well if it where on a system that prefered
either of the wcsnlen implementations that failed as it relies on
wcsnlen.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit da5a6fba0f)
2022-01-27 15:49:37 -08:00
Noah Goldstein
489006c3c5 x86: Optimize strlen-evex.S
No bug. This commit optimizes strlen-evex.S. The
optimizations are mostly small things but they add up to roughly
10-30% performance improvement for strlen. The results for strnlen are
bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and
test-wcsnlen are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 4ba6558684)
2022-01-27 15:49:32 -08:00
Noah Goldstein
937f2c783a x86: Fix overflow bug in wcsnlen-sse4_1 and wcsnlen-avx2 [BZ #27974]
This commit fixes the bug mentioned in the previous commit.

The previous implementations of wmemchr in these files relied
on maxlen * sizeof(wchar_t) which was not guranteed by the standard.

The new overflow tests added in the previous commit now
pass (As well as all the other tests).

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit a775a7a3eb)
2022-01-27 15:49:27 -08:00
Noah Goldstein
0058c73d11 x86-64: Add wcslen optimize for sse4.1
No bug. This comment adds the ifunc / build infrastructure
necessary for wcslen to prefer the sse4.1 implementation
in strlen-vec.S. test-wcslen.c is passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 6f573a27b6)
2022-01-27 15:49:00 -08:00
H.J. Lu
665d0252f1 x86-64: Move strlen.S to multiarch/strlen-vec.S
Since strlen.S contains SSE2 version of strlen/strnlen and SSE4.1
version of wcslen/wcsnlen, move strlen.S to multiarch/strlen-vec.S
and include multiarch/strlen-vec.S from SSE2 and SSE4.1 variants.
This also removes the unused symbols, __GI___strlen_sse2 and
__GI___wcsnlen_sse4_1.

(cherry picked from commit a0db678071)
2022-01-27 15:43:25 -08:00
Alice Xu
82ff13e2cc x86-64: Fix an unknown vector operation in memchr-evex.S
An unknown vector operation occurred in commit 2a76821c30. Fixed it
by using "ymm{k1}{z}" but not "ymm {k1} {z}".

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 6ea916adfa)
2022-01-27 15:43:19 -08:00
Noah Goldstein
539b593a1d x86: Optimize memchr-evex.S
No bug. This commit optimizes memchr-evex.S. The optimizations include
replacing some branches with cmovcc, avoiding some branches entirely
in the less_4x_vec case, making the page cross logic less strict,
saving some ALU in the alignment process, and most importantly
increasing ILP in the 4x loop. test-memchr, test-rawmemchr, and
test-wmemchr are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 2a76821c30)
2022-01-27 15:43:09 -08:00
Noah Goldstein
7b37ae60c6 x86: Optimize strlen-avx2.S
No bug. This commit optimizes strlen-avx2.S. The optimizations are
mostly small things but they add up to roughly 10-30% performance
improvement for strlen. The results for strnlen are bit more
ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen
are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit aaa23c3507)
2022-01-27 15:42:56 -08:00
Noah Goldstein
0381c1c10d x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974]
This commit fixes the bug mentioned in the previous commit.

The previous implementations of wmemchr in these files relied
on n * sizeof(wchar_t) which was not guranteed by the standard.

The new overflow tests added in the previous commit now
pass (As well as all the other tests).

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 645a158978)
2022-01-27 15:32:50 -08:00
Noah Goldstein
10368cb76b x86: Optimize memchr-avx2.S
No bug. This commit optimizes memchr-avx2.S. The optimizations include
replacing some branches with cmovcc, avoiding some branches entirely
in the less_4x_vec case, making the page cross logic less strict,
asaving a few instructions the in loop return loop. test-memchr,
test-rawmemchr, and test-wmemchr are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit acfd088a19)
2022-01-27 15:32:50 -08:00
H.J. Lu
66ca40582e test-strnlen.c: Check that strnlen won't go beyond the maximum length
Place strings ending at page boundary without the null byte.  If an
implementation goes beyond EXP_LEN, it will trigger the segfault.

(cherry picked from commit cb882b21b6)
2022-01-27 15:32:50 -08:00
H.J. Lu
927bcaf892 test-strnlen.c: Initialize wchar_t string with wmemset [BZ #27655]
Use wmemset to initialize wchar_t string.

(cherry picked from commit 86859b7e58)
2022-01-27 15:32:50 -08:00
H.J. Lu
0d4159c36c x86-64: Require BMI2 for __strlen_evex and __strnlen_evex
Since __strlen_evex and __strnlen_evex added by

commit 1fd8c163a8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 5 06:24:52 2021 -0800

    x86-64: Add ifunc-avx2.h functions with 256-bit EVEX

use sarx:

c4 e2 6a f7 c0       	sarx   %edx,%eax,%eax

require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c.
ifunc-avx2.h already requires BMI2 for EVEX implementation.

(cherry picked from commit 55bf411b45)
2022-01-27 15:32:50 -08:00
H.J. Lu
c0cbb9345e NEWS: Add a bug fix entry for BZ #27457 2022-01-27 12:23:42 -08:00
Sunil K Pandey
e81b975fcc x86-64: Fix ifdef indentation in strlen-evex.S
Fix some indentations of ifdef in file strlen-evex.S which are off by 1
and confusing to read.

(cherry picked from commit 595c22ecd8)
2022-01-27 11:40:18 -08:00
H.J. Lu
aa4e48e73c x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.

(cherry picked from commit e4fda46310)
2022-01-27 11:40:18 -08:00
H.J. Lu
ac911d3b57 x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.

(cherry picked from commit 4e2d8f3527)
2022-01-27 11:40:18 -08:00
H.J. Lu
20d37de533 x86: Add string/memory function tests in RTM region
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort.   When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation.  Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.

(cherry picked from commit 4bd660be40)
2022-01-27 11:40:18 -08:00
H.J. Lu
fbaa99ed41 x86-64: Add AVX optimized string/memory functions for RTM
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with

	xtest
	jz	1f
	vzeroall
	ret
1:
	vzeroupper
	ret

at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.

(cherry picked from commit 7ebba91361)
2022-01-27 11:40:18 -08:00
H.J. Lu
096e14f632 x86-64: Add memcmp family functions with 256-bit EVEX
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.

(cherry picked from commit 91264fe357)
2022-01-27 11:40:18 -08:00
H.J. Lu
f00fad4e4c x86-64: Add memset family functions with 256-bit EVEX
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM
abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.

(cherry picked from commit 1b968b6b9b)
2022-01-27 11:40:18 -08:00
H.J. Lu
cf239ddd2e x86-64: Add memmove family functions with 256-bit EVEX
Update ifunc-memmove.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.

(cherry picked from commit 63ad43566f)
2022-01-27 11:40:18 -08:00
H.J. Lu
7257ba7bf2 x86-64: Add strcpy family functions with 256-bit EVEX
Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.

(cherry picked from commit 525bc2a32c)
2022-01-27 11:40:18 -08:00