glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-09 23:00:07 +00:00

Author	SHA1	Message	Date
Florian Weimer	12d4ddf3c1	nscd: Use time_t for return type of addgetnetgrentX Using int may give false results for future dates (timeouts after the year 2028). Fixes commit 04a21e050d64a1193a6daab872bca2528bda44b ("CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX (bug 31680)"). Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `4bbca1a446`)	2024-05-03 09:26:30 +02:00
Florian Weimer	bbf5a58ccb	CVE-2024-33601, CVE-2024-33602: nscd: netgroup: Use two buffers in addgetnetgrentX (bug 31680) This avoids potential memory corruption when the underlying NSS callback function does not use the buffer space to store all strings (e.g., for constant strings). Instead of custom buffer management, two scratch buffers are used. This increases stack usage somewhat. Scratch buffer allocation failure is handled by return -1 (an invalid timeout value) instead of terminating the process. This fixes bug 31679. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `c04a21e050`)	2024-04-25 16:12:02 +02:00
Florian Weimer	8d79491837	CVE-2024-33600: nscd: Avoid null pointer crashes after notfound response (bug 31678) The addgetnetgrentX call in addinnetgrX may have failed to produce a result, so the result variable in addinnetgrX can be NULL. Use db->negtimeout as the fallback value if there is no result data; the timeout is also overwritten below. Also avoid sending a second not-found response. (The client disconnects after receiving the first response, so the data stream did not go out of sync even without this fix.) It is still beneficial to add the negative response to the mapping, so that the client can get it from there in the future, instead of going through the socket. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `b048a482f0`)	2024-04-25 16:12:02 +02:00
Florian Weimer	304ce5fe46	CVE-2024-33600: nscd: Do not send missing not-found response in addgetnetgrentX (bug 31678) If we failed to add a not-found response to the cache, the dataset point can be null, resulting in a null pointer dereference. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `7835b00dbc`)	2024-04-25 16:12:02 +02:00
Florian Weimer	69c58d5ef9	CVE-2024-33599: nscd: Stack-based buffer overflow in netgroup cache (bug 31677) Using alloca matches what other caches do. The request length is bounded by MAXKEYLEN. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `87801a8fd0`)	2024-04-25 16:12:02 +02:00
Charles Fol	3703c32a8d	iconv: ISO-2022-CN-EXT: fix out-of-bound writes when writing escape sequence (CVE-2024-2961) ISO-2022-CN-EXT uses escape sequences to indicate character set changes (as specified by RFC 1922). While the SOdesignation has the expected bounds checks, neither SS2designation nor SS3designation have its; allowing a write overflow of 1, 2, or 3 bytes with fixed values: '$+I', '$+J', '$+K', '$+L', '$+M', or '$*H'. Checked on aarch64-linux-gnu. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `f9dc609e06`)	2024-04-19 19:58:14 +02:00
Wilco Dijkstra	531717afbc	aarch64: Use memcpy_simd as the default memcpy Since __memcpy_simd is the fastest memcpy on almost all cores, replace the generic memcpy with it. (cherry picked from commit `91ac82d0c6`)	2024-04-09 19:29:00 +01:00
Sunil K Pandey	61f2272bc6	x86_64: Optimize ffsll function code size. The ffsll function randomly regresses by ~20%, depending on how the code gets aligned in memory. The ffsll function code size is 17 bytes. Since the default function alignment is 16 bytes, it can load at 16-, 32-, 48-, or 64-byte aligned memory. When the ffsll function loads at 16-, 32-, or 64-byte aligned memory, the entire code fits in a single 64-byte cache line. When it loads at 48-byte aligned memory, it splits across two cache lines, hence the random regression. Reducing the ffsll function size from 17 bytes to 12 bytes ensures that it always fits in a single 64-byte cache line. This patch fixes the ffsll random performance regression. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `9d94997b5f`)	2024-02-08 09:00:17 -08:00
Noah Goldstein	9f27ef8090	x86: Fix incorrect scope of setting `shared_per_thread` [BZ# 30745] The: ``` if (shared_per_thread > 0 && threads > 0) shared_per_thread /= threads; ``` Code was accidentally moved to inside the else scope. This doesn't match how it was previously (before `af992e7abd`). This patch fixes that by putting the division after the `else` block. (cherry picked from commit `084fb31bc2`)	2023-09-11 22:47:46 -05:00
Noah Goldstein	01a8874eba	x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold. On some machines we end up with incomplete cache information. This can make the new calculation of `sizeof(total-L3)/custom-divisor` end up lower than intended (and lower than the prior value). So reintroduce the old bound as a lower bound to avoid potentially regressing code where we don't have complete information to make the decision. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `8b9a0af8ca`)	2023-09-11 22:47:46 -05:00
Noah Goldstein	047968e81d	x86: Fix slight bug in `shared_per_thread` cache size calculation. After: ``` commit `af992e7abd` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 7 13:18:01 2023 -0500 x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` ``` Split `shared` (cumulative cache size) from `shared_per_thread` (cache size per socket), the `shared_per_thread` can be slightly off from the previous calculation. Previously we added `core` even if `threads_l2` was invalid, and only used `threads_l2` to divide `core` if it was present. The changed version only included `core` if `threads_l2` was valid. This change restores the old behavior if `threads_l2` is invalid by adding the entire value of `core`. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `47f7472178`)	2023-09-11 22:47:46 -05:00
Noah Goldstein	9e5693b446	x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 / ncores_per_socket'. This patch updates that value to roughly 'sizeof_L3 / 4` The original value (specifically dividing the `ncores_per_socket`) was done to limit the amount of other threads' data a `memcpy`/`memset` could evict. Dividing by 'ncores_per_socket', however leads to exceedingly low non-temporal thresholds and leads to using non-temporal stores in cases where REP MOVSB is multiple times faster. Furthermore, non-temporal stores are written directly to main memory so using it at a size much smaller than L3 can place soon to be accessed data much further away than it otherwise could be. As well, modern machines are able to detect streaming patterns (especially if REP MOVSB is used) and provide LRU hints to the memory subsystem. This in affect caps the total amount of eviction at 1/cache_associativity, far below meaningfully thrashing the entire cache. As best I can tell, the benchmarks that lead this small threshold where done comparing non-temporal stores versus standard cacheable stores. A better comparison (linked below) is to be REP MOVSB which, on the measure systems, is nearly 2x faster than non-temporal stores at the low-end of the previous threshold, and within 10% for over 100MB copies (well past even the current threshold). In cases with a low number of threads competing for bandwidth, REP MOVSB is ~2x faster up to `sizeof_L3`. The divisor of `4` is a somewhat arbitrary value. From benchmarks it seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs such as Broadwell prefer something closer to `8`. This patch is meant to be followed up by another one to make the divisor cpu-specific, but in the meantime (and for easier backporting), this patch settles on `4` as a middle-ground. 
Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable stores were done using: https://github.com/goldsteinn/memcpy-nt-benchmarks Sheets results (also available as a PDF on GitHub): https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `af992e7abd`)	2023-09-11 22:47:46 -05:00
Florian Weimer	1c08c17156	debug: Mark libSegFault.so as NODELETE The signal handler installed in the ELF constructor cannot easily be removed again (because the program may have changed handlers in the meantime). Mark the object as NODELETE so that the registered handler function is never unloaded. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `23ee92deea`)	2023-07-21 16:40:30 +02:00
Noah Goldstein	2d4f26e5cf	x86: Fix wcsnlen-avx2 page cross length comparison [BZ #29591 ] Previous implementation was adjusting length (rsi) to match bytes (eax), but since there is no bound to length this can cause overflow. Fix is to just convert the byte-count (eax) to length by dividing by sizeof (wchar_t) before the comparison. Full check passes on x86-64 and build succeeds w/ and w/o multiarch. (cherry picked from commit `b0969fa53a`)	2022-11-24 17:15:54 -08:00
Sunil K Pandey	d4b7559457	x86-64: Require BMI2 for avx2 functions [BZ #29611 ] This patch fixes BZ #29611	2022-09-28 18:08:36 -07:00
H.J. Lu	b8bb48a18d	x86-64: Require BMI2 for strchr-avx2.S [BZ #29611 ] Since strchr-avx2.S updated by commit `1f745ecc21` Author: noah <goldstein.w.n@gmail.com> Date: Wed Feb 3 00:38:59 2021 -0500 x86-64: Refactor and improve performance of strchr-avx2.S uses sarx: c4 e2 72 f7 c0 sarx %ecx,%eax,%eax for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and ifunc-avx2.h. This fixes BZ #29611. (cherry picked from commit `83c5b36822`)	2022-09-28 18:08:22 -07:00
Andreas Schwab	c8f2a3e803	Add test for bug 29530 This tests for a bug that was introduced in commit `edc1686af0` ("vfprintf: Reuse work_buffer in group_number") and fixed as a side effect of commit `6caddd34bd` ("Remove most vfprintf width/precision-dependent allocations (bug 14231, bug 26211)."). (cherry picked from commit `ca6466e8be`)	2022-08-30 10:45:40 +02:00
Joseph Myers	e6ae5b25cd	Fix memmove call in vfprintf-internal.c:group_number A recent GCC mainline change introduces errors of the form: vfprintf-internal.c: In function 'group_number': vfprintf-internal.c:2093:15: error: 'memmove' specified bound between 9223372036854775808 and 18446744073709551615 exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=] 2093 \| memmove (w, s, (front_ptr -s) * sizeof (CHAR_T)); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a genuine bug in the glibc code: s > front_ptr is always true at this point in the code, and the intent is clearly for the subtraction to be the other way round. The other arguments to the memmove call here also appear to be wrong; w and s point just after the destination and source for copying the rest of the number, so the size needs to be subtracted to get appropriate pointers for the copying. Adjust the memmove call to conform to the apparent intent of the code, so fixing the -Wstringop-overflow error. Now, if the original code were ever executed, a buffer overrun would result. However, I believe this code (introduced in commit `edc1686af0`, "vfprintf: Reuse work_buffer in group_number", so in glibc 2.26) is unreachable in prior glibc releases (so there is no need for a bug in Bugzilla, no need to consider any backports unless someone wants to build older glibc releases with GCC 12 and no possibility of this buffer overrun resulting in a security issue). work_buffer is 1000 bytes / 250 wide characters. This case is only reachable if an initial part of the number, plus a grouped copy of the rest of the number, fail to fit in that space; that is, if the grouped number fails to fit in the space. 
In the wide character case, grouping is always one wide character, so even with a locale (of which there aren't any in glibc) grouping every digit, a number would need to occupy at least 125 wide characters to overflow, and a 64-bit integer occupies at most 23 characters in octal including a leading 0. In the narrow character case, the multibyte encoding of the grouping separator would need to be at least 42 bytes to overflow, again supposing grouping every digit, but MB_LEN_MAX is 16. So even if we admit the case of artificially constructed locales not shipped with glibc, given that such a locale would need to use one of the character sets supported by glibc, this code cannot be reached at present. (And POSIX only actually specifies the ' flag for grouping for decimal output, though glibc acts on it for other bases as well.) With binary output (if you consider use of grouping there to be valid), you'd need a 15-byte multibyte character for overflow; I don't know if any supported character set has such a character (if, again, we admit constructed locales using grouping every digit and a grouping separator chosen to have a multibyte encoding as long as possible, as well as accepting use of grouping with binary), but given that we have this code at all (clearly it's not correct, or in accordance with the principle of avoiding arbitrary limits, to skip grouping on running out of internal space like that), I don't think it should need any further changes for binary printf support to go in. On the other hand, support for large sizes of _BitInt in printf (see the N2858 proposal) would require something to be done about such arbitrary limits (presumably using dynamic allocation in printf again, for sufficiently large _BitInt arguments only - currently only floating-point uses dynamic allocation, and, as previously discussed, that could actually be replaced by bounded allocation given smarter code). Tested with build-many-glibcs.py for aarch64-linux-gnu (GCC mainline). 
Also tested natively for x86_64. (cherry picked from commit `db6c4935fa`)	2022-08-30 10:09:58 +02:00
Joseph Myers	1dbe841a67	Remove most vfprintf width/precision-dependent allocations (bug 14231, bug 26211). The vfprintf implementation (used for all printf-family functions) contains complicated logic to allocate internal buffers of a size depending on the width and precision used for a format, using either malloc or alloca depending on that size, and with consequent checks for size overflow and allocation failure. As noted in bug 26211, the version of that logic used when '$' plus argument number formats are in use is missing the overflow checks, which can result in segfaults (quite possibly exploitable, I didn't try to work that out) when the width or precision is in the range 0x7fffffe0 through 0x7fffffff (maybe smaller values as well in the wprintf case on 32-bit systems, when the multiplication by sizeof (CHAR_T) can overflow). All that complicated logic in fact appears to be useless. As far as I can tell, there has been no need (outside the floating-point printf code, which does its own allocations) for allocations depending on width or precision since commit `3e95f6602b` ("Remove limitation on size of precision for integers", Sun Sep 12 21:23:32 1999 +0000). Thus, this patch removes that logic completely, thereby fixing both problems with excessive allocations for large width and precision for non-floating-point formats, and the problem with missing overflow checks with such allocations. Note that this does have the consequence that width and precision up to INT_MAX are now allowed where previously INT_MAX / sizeof (CHAR_T) - EXTSIZ or more would have been rejected, so could potentially expose any other overflows where the value would previously have been rejected by those removed checks. I believe this completely fixes bugs 14231 and 26211. Excessive allocations are still possible in the floating-point case (bug 21127), as are other integer or buffer overflows (see bug 26201). 
This does not address the cases where a precision larger than INT_MAX (embedded in the format string) would be meaningful without printf's return value overflowing (when it's used with a string format, or %g without the '#' flag, so the actual output will be much smaller), as mentioned in bug 17829 comment 8; using size_t internally for precision to handle that case would be complicated by struct printf_info being a public ABI. Nor does it address the matter of an INT_MIN width being negated (bug 17829 comment 7; the same logic appears a second time in the file as well, in the form of multiplying by -1). There may be other sources of memory allocations with malloc in printf functions as well (bug 24988, bug 16060). From inspection, I think there are also integer overflows in two copies of "if ((width -= len) < 0)" logic (where width is int, len is size_t and a very long string could result in spurious padding being output on a 32-bit system before printf overflows the count of output characters). Tested for x86-64 and x86. (cherry picked from commit `6caddd34bd`)	2022-08-30 10:09:13 +02:00
Adhemerval Zanella	5a802723db	stdio: Add tests for printf multibyte convertion leak [BZ#25691] Checked on x86_64-linux-gnu and i686-linux-gnu. (cherry picked from commit `910a835dc9`)	2022-08-30 10:06:38 +02:00
Florian Weimer	ae7748e67f	stdio: Remove memory leak from multibyte convertion [BZ#25691] This is an updated version of a previous patch [1] with the following changes: - Use compiler overflow builtins on done_add_func function. - Define the scratch +utstring_converted_wide_string using CHAR_T. - Added a testcase and mention the bug report. Both default and wide printf functions might leak memory when handling multibyte character conversion, depending on the size of the input (whether or not __libc_use_alloca triggers the fallback heap allocation). This patch fixes it by removing the extra memory allocation on string formatting with conversion parts. The testcase uses an input argument size that triggers memory leaks on unpatched code (with a scratch buffer, the threshold to use heap allocation is lower). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> [1] https://sourceware.org/pipermail/libc-alpha/2017-June/082098.html (cherry picked from commit `3cc4a8367c`)	2022-08-30 10:05:43 +02:00
Aurelien Jarno	174d0b61c7	Linux: Require properly configured /dev/pts for PTYs Current systems do not have BSD terminals, so the fallback code in posix_openpt/getpt does not do anything. Also remove the file system check for /dev/pts. Current systems always have a devpts file system mounted there if /dev/ptmx exists. grantpt is now essentially a no-op. It only verifies that the argument is a ptmx-descriptor. Therefore, this change indirectly addresses bug 24941. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-08-18 12:28:36 +02:00
Florian Weimer	0a167374fd	Linux: Detect user namespace support in io/tst-getcwd-smallbuff Otherwise the test fails with certain container runtimes. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `5b8e7980c5`)	2022-08-18 00:14:28 +02:00
Siddhesh Poyarekar	4ad1659d8c	getcwd: Set errno to ERANGE for size == 1 (CVE-2021-3999) No valid path returned by getcwd would fit into 1 byte, so reject the size early and return NULL with errno set to ERANGE. This change is prompted by CVE-2021-3999, which describes a single byte buffer underflow and overflow when all of the following conditions are met: - The buffer size (i.e. the second argument of getcwd) is 1 byte - The current working directory is too long - '/' is also mounted on the current working directory Sequence of events: - In sysdeps/unix/sysv/linux/getcwd.c, the syscall returns ENAMETOOLONG because the linux kernel checks for name length before it checks buffer size - The code falls back to the generic getcwd in sysdeps/posix - In the generic func, the buf[0] is set to '\0' on line 250 - this while loop on line 262 is bypassed: while (!(thisdev == rootdev && thisino == rootino)) since the rootfs (/) is bind mounted onto the directory and the flow goes on to line 449, where it puts a '/' in the byte before the buffer. - Finally on line 458, it moves 2 bytes (the underflowed byte and the '\0') to the buf[0] and buf[1], resulting in a 1 byte buffer overflow. - buf is returned on line 469 and errno is not set. This resolves BZ #28769. Reviewed-by: Andreas Schwab <schwab@linux-m68k.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Qualys Security Advisory <qsa@qualys.com> Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `23e0e8f5f1`)	2022-08-18 00:14:28 +02:00
Siddhesh Poyarekar	3319cea99e	support: Add helpers to create paths longer than PATH_MAX Add new helpers support_create_and_chdir_toolong_temp_directory and support_chdir_toolong_temp_directory to create and descend into directory trees longer than PATH_MAX. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `fb7bff12e8`)	2022-08-18 00:14:28 +02:00
Florian Weimer	f733e291bb	support: Fix xclone build failures on ia64 and hppa (cherry picked from commit `97ed4749be`)	2022-08-18 00:14:28 +02:00
Adhemerval Zanella	43757c70ee	support: Add xclone It is a wrapper for the Linux clone syscall that simplifies the call to use only the most common arguments and removes architecture-specific handling (such as ia64's different name and signature). (cherry picked from commit `de8995a2a0`)	2022-08-18 00:14:28 +02:00
Alexandra Hájková	29d3aeb0e8	Add xchdir to libsupport. (cherry picked from commit `a7e9dbb774`)	2022-08-18 00:14:28 +02:00
Adhemerval Zanella	2d7720f316	support: Add create_temp_file_in_dir It allows creating a temporary file in a specified directory. (cherry picked from commit `60854f40ea`)	2022-08-18 00:13:15 +02:00
H.J. Lu	183709983d	NEWS: Add a bug fix entry for BZ #28896	2022-02-18 19:12:04 -08:00
Noah Goldstein	d385079bd5	x86: Fix TEST_NAME to make it a string in tst-strncmp-rtm.c Previously TEST_NAME was passing a function pointer. This didn't fail because of the -Wno-error flag (to allow for overflow sizes passed to strncmp/wcsncmp) Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `b98d0bbf74`)	2022-02-18 18:20:50 -08:00
Noah Goldstein	7df3ad6560	x86: Test wcscmp RTM in the wcsncmp overflow case [BZ #28896 ] In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. These would have no checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `7835d611af`)	2022-02-18 18:20:46 -08:00
Noah Goldstein	fc133fcf49	x86: Fallback {str\|wcs}cmp RTM in the ncmp overflow case [BZ #28896 ] In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. These would have no checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Co-authored-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `c627209832`)	2022-02-18 18:20:39 -08:00
H.J. Lu	775c05b28c	string: Add a testcase for wcsncmp with SIZE_MAX [BZ #28755 ] Verify that wcsncmp (L("abc"), L("abd"), SIZE_MAX) == 0. The new test fails without commit `ddf0992cf5` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:21 2022 -0600 x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] and commit `7e08db3359` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:28 2022 -0600 x86: Fix __wcsncmp_evex in strcmp-evex.S [BZ# 28755] This is for BZ #28755. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com> (cherry picked from commit `aa5a720056`)	2022-02-17 11:33:07 -08:00
H.J. Lu	c6b346ec55	x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064 ] commit `6f573a27b6` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 23 01:19:34 2021 -0400 x86-64: Add wcslen optimize for sse4.1 added wcsnlen-sse4.1 to the wcslen ifunc implementation list. Since the random value in the RSI register is larger than the wide-character string length in the existing wcslen test, it didn't trigger the wcslen test failure. Add a test to force 0 into the RSI register before calling wcslen. (cherry picked from commit `a6e7c3745d`)	2022-02-01 12:23:36 -08:00
Noah Goldstein	0675185923	x86: Remove wcsnlen-sse4_1 from wcslen ifunc-impl-list [BZ #28064 ] The following commit commit `6f573a27b6` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 23 01:19:34 2021 -0400 x86-64: Add wcslen optimize for sse4.1 Added wcsnlen-sse4.1 to the wcslen ifunc implementation list and did not add wcslen-sse4.1 to wcslen ifunc implementation list. This commit fixes that by removing wcsnlen-sse4.1 from the wcslen ifunc implementation list and adding wcslen-sse4.1 to the ifunc implementation list. Testing: test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c are passing as well as all other tests in wcsmbs and string. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `0679442def`)	2022-02-01 12:23:29 -08:00
H.J. Lu	5db3239baf	x86: Black list more Intel CPUs for TSX [BZ #27398 ] Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in: https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html This fixes BZ #27398. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `1e000d3d33`)	2022-02-01 07:13:46 -08:00
H.J. Lu	5b99f172b8	x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033 ] From https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html * Intel TSX will be disabled by default. * The processor will force abort all Restricted Transactional Memory (RTM) transactions by default. * A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated, which is set to indicate to updated software that the loaded microcode is forcing RTM abort. * On processors that enumerate support for RTM, the CPUID enumeration bits for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to be set by default after microcode update. * Workloads that were benefited from Intel TSX might experience a change in performance. * System software may use a new bit in Model-Specific Register (MSR) 0x10F TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock Elision (HLE) and RTM bits to indicate to software that Intel TSX is disabled. 1. Add RTM_ALWAYS_ABORT to CPUID features. 2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set. This skips the string/tst-memchr-rtm etc. testcases on the affected processors, which always fail after a microcode update. 3. Check RTM feature, instead of usability, against /proc/cpuinfo. This fixes BZ #28033. (cherry picked from commit `ea8e465a6b`)	2022-02-01 07:12:35 -08:00
H.J. Lu	70d293a158	NEWS: Add a bug fix entry for BZ #27974	2022-01-27 15:50:22 -08:00
Noah Goldstein	a2be2c0f5d	String: Add overflow tests for strnlen, memchr, and strncat [BZ #27974 ] This commit adds tests for a bug in the wide char variant of the functions where the implementation may assume that maxlen for wcsnlen or n for wmemchr/strncat will not overflow when multiplied by sizeof(wchar_t). These tests show the following implementations failing on x86_64: wcsnlen-sse4_1 wcsnlen-avx2 wmemchr-sse2 wmemchr-avx2 strncat would fail as well if it were on a system that preferred either of the wcsnlen implementations that failed as it relies on wcsnlen. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `da5a6fba0f`)	2022-01-27 15:49:37 -08:00
Noah Goldstein	489006c3c5	x86: Optimize strlen-evex.S No bug. This commit optimizes strlen-evex.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `4ba6558684`)	2022-01-27 15:49:32 -08:00
Noah Goldstein	937f2c783a	x86: Fix overflow bug in wcsnlen-sse4_1 and wcsnlen-avx2 [BZ #27974 ] This commit fixes the bug mentioned in the previous commit. The previous implementations of wmemchr in these files relied on maxlen * sizeof(wchar_t) which was not guaranteed by the standard. The new overflow tests added in the previous commit now pass (As well as all the other tests). Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `a775a7a3eb`)	2022-01-27 15:49:27 -08:00
Noah Goldstein	0058c73d11	x86-64: Add wcslen optimize for sse4.1 No bug. This commit adds the ifunc / build infrastructure necessary for wcslen to prefer the sse4.1 implementation in strlen-vec.S. test-wcslen.c is passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `6f573a27b6`)	2022-01-27 15:49:00 -08:00
H.J. Lu	665d0252f1	x86-64: Move strlen.S to multiarch/strlen-vec.S Since strlen.S contains SSE2 version of strlen/strnlen and SSE4.1 version of wcslen/wcsnlen, move strlen.S to multiarch/strlen-vec.S and include multiarch/strlen-vec.S from SSE2 and SSE4.1 variants. This also removes the unused symbols, __GI___strlen_sse2 and __GI___wcsnlen_sse4_1. (cherry picked from commit `a0db678071`)	2022-01-27 15:43:25 -08:00
Alice Xu	82ff13e2cc	x86-64: Fix an unknown vector operation in memchr-evex.S An unknown vector operation occurred in commit `2a76821c30`. Fixed it by using "ymm{k1}{z}" but not "ymm {k1} {z}". Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `6ea916adfa`)	2022-01-27 15:43:19 -08:00
Noah Goldstein	539b593a1d	x86: Optimize memchr-evex.S No bug. This commit optimizes memchr-evex.S. The optimizations include replacing some branches with cmovcc, avoiding some branches entirely in the less_4x_vec case, making the page cross logic less strict, saving some ALU in the alignment process, and most importantly increasing ILP in the 4x loop. test-memchr, test-rawmemchr, and test-wmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `2a76821c30`)	2022-01-27 15:43:09 -08:00
Noah Goldstein	7b37ae60c6	x86: Optimize strlen-avx2.S No bug. This commit optimizes strlen-avx2.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `aaa23c3507`)	2022-01-27 15:42:56 -08:00
Noah Goldstein	0381c1c10d	x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974 ] This commit fixes the bug mentioned in the previous commit. The previous implementations of wmemchr in these files relied on n * sizeof(wchar_t) which was not guaranteed by the standard. The new overflow tests added in the previous commit now pass (As well as all the other tests). Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `645a158978`)	2022-01-27 15:32:50 -08:00
Noah Goldstein	10368cb76b	x86: Optimize memchr-avx2.S No bug. This commit optimizes memchr-avx2.S. The optimizations include replacing some branches with cmovcc, avoiding some branches entirely in the less_4x_vec case, making the page cross logic less strict, and saving a few instructions in the loop return. test-memchr, test-rawmemchr, and test-wmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `acfd088a19`)	2022-01-27 15:32:50 -08:00
H.J. Lu	66ca40582e	test-strnlen.c: Check that strnlen won't go beyond the maximum length Place strings ending at page boundary without the null byte. If an implementation goes beyond EXP_LEN, it will trigger the segfault. (cherry picked from commit `cb882b21b6`)	2022-01-27 15:32:50 -08:00
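
The BZ #27974 entries above describe implementations that assumed `maxlen * sizeof(wchar_t)` (or `n * sizeof(wchar_t)`) cannot overflow. The following is a minimal C sketch of that arithmetic problem, not the glibc assembly fix itself. The helper names `wide_count_overflows` and `sketch_wcsnlen` are hypothetical and exist only for illustration; the second one shows a bounded scan that compares element counts directly and so never forms the byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: returns nonzero when converting a count of wide
   characters to a byte count (n * sizeof (wchar_t)) would wrap size_t.  */
static int
wide_count_overflows (size_t n)
{
  return n > SIZE_MAX / sizeof (wchar_t);
}

/* Hypothetical reference scan: counts wide characters up to maxlen.
   It compares the element index against maxlen directly, so a huge
   maxlen such as SIZE_MAX is handled without any multiplication.  */
static size_t
sketch_wcsnlen (const wchar_t *s, size_t maxlen)
{
  assert (s != NULL);
  size_t i = 0;
  while (i < maxlen && s[i] != L'\0')
    i++;
  return i;
}
```

With these definitions, `wide_count_overflows (SIZE_MAX)` reports that the byte count would wrap, while `sketch_wcsnlen (L"abc", SIZE_MAX)` still stops at the terminating null character of the three-character string.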

35531 Commits