glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-09-19 16:10:01 +00:00

Author	SHA1	Message	Date
H.J. Lu	a486152569	string: Add a testcase for wcsncmp with SIZE_MAX [BZ #28755 ] Verify that wcsncmp (L("abc"), L("abc"), SIZE_MAX) == 0. The new test fails without commit `ddf0992cf5` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:21 2022 -0600 x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] and commit `7e08db3359` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:28 2022 -0600 x86: Fix __wcsncmp_evex in strcmp-evex.S [BZ# 28755] This is for BZ #28755. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com> (cherry picked from commit `aa5a720056`)	2022-02-17 11:43:34 -08:00
H.J. Lu	28689d6255	x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064 ] commit `6f573a27b6` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 23 01:19:34 2021 -0400 x86-64: Add wcslen optimize for sse4.1 added wcsnlen-sse4.1 to the wcslen ifunc implementation list. Since the random value in the RSI register is larger than the wide-character string length in the existing wcslen test, it didn't trigger the wcslen test failure. Add a test to force 0 into the RSI register before calling wcslen. (cherry picked from commit `a6e7c3745d`)	2022-02-01 12:51:16 -08:00
Noah Goldstein	352cb39fa0	x86: Remove wcsnlen-sse4_1 from wcslen ifunc-impl-list [BZ #28064 ] The following commit commit `6f573a27b6` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 23 01:19:34 2021 -0400 x86-64: Add wcslen optimize for sse4.1 Added wcsnlen-sse4.1 to the wcslen ifunc implementation list and did not add wcslen-sse4.1 to wcslen ifunc implementation list. This commit fixes that by removing wcsnlen-sse4.1 from the wcslen ifunc implementation list and adding wcslen-sse4.1 to the ifunc implementation list. Testing: test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c are passing as well as all other tests in wcsmbs and string. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `0679442def`)	2022-02-01 12:51:09 -08:00
H.J. Lu	5d94c86a47	x86: Black list more Intel CPUs for TSX [BZ #27398 ] Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in: https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html This fixes BZ #27398. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `1e000d3d33`)	2022-02-01 07:38:16 -08:00
H.J. Lu	7aac95739a	x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033 ] From https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html * Intel TSX will be disabled by default. * The processor will force abort all Restricted Transactional Memory (RTM) transactions by default. * A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated, which is set to indicate to updated software that the loaded microcode is forcing RTM abort. * On processors that enumerate support for RTM, the CPUID enumeration bits for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to be set by default after microcode update. * Workloads that were benefited from Intel TSX might experience a change in performance. * System software may use a new bit in Model-Specific Register (MSR) 0x10F TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock Elision (HLE) and RTM bits to indicate to software that Intel TSX is disabled. 1. Add RTM_ALWAYS_ABORT to CPUID features. 2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set. This skips the string/tst-memchr-rtm etc. testcases on the affected processors, which always fail after a microcode update. 3. Check RTM feature, instead of usability, against /proc/cpuinfo. This fixes BZ #28033. (cherry picked from commit `ea8e465a6b`)	2022-02-01 07:38:09 -08:00
H.J. Lu	a0ed5893fc	NEWS: Add a bug fix entry for BZ #27974	2022-01-27 16:27:20 -08:00
Noah Goldstein	ac1d6f25d6	String: Add overflow tests for strnlen, memchr, and strncat [BZ #27974 ] This commit adds tests for a bug in the wide char variant of the functions where the implementation may assume that maxlen for wcsnlen or n for wmemchr/strncat will not overflow when multiplied by sizeof(wchar_t). These tests show the following implementations failing on x86_64: wcsnlen-sse4_1 wcsnlen-avx2 wmemchr-sse2 wmemchr-avx2 strncat would fail as well if it were on a system that preferred either of the wcsnlen implementations that failed as it relies on wcsnlen. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `da5a6fba0f`)	2022-01-27 16:26:26 -08:00
Noah Goldstein	f9fe740ec4	x86: Optimize strlen-evex.S No bug. This commit optimizes strlen-evex.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are a bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `4ba6558684`)	2022-01-27 16:26:21 -08:00
Noah Goldstein	dd92fa3029	x86: Fix overflow bug in wcsnlen-sse4_1 and wcsnlen-avx2 [BZ #27974 ] This commit fixes the bug mentioned in the previous commit. The previous implementations of wmemchr in these files relied on maxlen * sizeof(wchar_t) which was not guaranteed by the standard. The new overflow tests added in the previous commit now pass (As well as all the other tests). Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `a775a7a3eb`)	2022-01-27 16:26:15 -08:00
Noah Goldstein	2599967449	x86-64: Add wcslen optimize for sse4.1 No bug. This commit adds the ifunc / build infrastructure necessary for wcslen to prefer the sse4.1 implementation in strlen-vec.S. test-wcslen.c is passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `6f573a27b6`)	2022-01-27 16:26:10 -08:00
H.J. Lu	ccf4e0edde	x86-64: Move strlen.S to multiarch/strlen-vec.S Since strlen.S contains SSE2 version of strlen/strnlen and SSE4.1 version of wcslen/wcsnlen, move strlen.S to multiarch/strlen-vec.S and include multiarch/strlen-vec.S from SSE2 and SSE4.1 variants. This also removes the unused symbols, __GI___strlen_sse2 and __GI___wcsnlen_sse4_1. (cherry picked from commit `a0db678071`)	2022-01-27 16:26:04 -08:00
Alice Xu	3cee1ddad2	x86-64: Fix an unknown vector operation in memchr-evex.S An unknown vector operation occurred in commit `2a76821c30`. Fixed it by using "ymm{k1}{z}" but not "ymm {k1} {z}". Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `6ea916adfa`)	2022-01-27 16:25:58 -08:00
Noah Goldstein	cba2fbe17f	x86: Optimize memchr-evex.S No bug. This commit optimizes memchr-evex.S. The optimizations include replacing some branches with cmovcc, avoiding some branches entirely in the less_4x_vec case, making the page cross logic less strict, saving some ALU in the alignment process, and most importantly increasing ILP in the 4x loop. test-memchr, test-rawmemchr, and test-wmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `2a76821c30`)	2022-01-27 16:25:53 -08:00
Noah Goldstein	0a3b2efccc	x86: Optimize strlen-avx2.S No bug. This commit optimizes strlen-avx2.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are a bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `aaa23c3507`)	2022-01-27 16:25:48 -08:00
Noah Goldstein	c51e9501c2	x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974 ] This commit fixes the bug mentioned in the previous commit. The previous implementations of wmemchr in these files relied on n * sizeof(wchar_t) which was not guaranteed by the standard. The new overflow tests added in the previous commit now pass (As well as all the other tests). Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `645a158978`)	2022-01-27 16:25:42 -08:00
Noah Goldstein	5c38877df5	x86: Optimize memchr-avx2.S No bug. This commit optimizes memchr-avx2.S. The optimizations include replacing some branches with cmovcc, avoiding some branches entirely in the less_4x_vec case, making the page cross logic less strict, and saving a few instructions in the loop return. test-memchr, test-rawmemchr, and test-wmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `acfd088a19`)	2022-01-27 16:25:37 -08:00
H.J. Lu	8fa2afcec3	test-strnlen.c: Check that strnlen won't go beyond the maximum length Place strings ending at page boundary without the null byte. If an implementation goes beyond EXP_LEN, it will trigger the segfault. (cherry picked from commit `cb882b21b6`)	2022-01-27 16:25:31 -08:00
H.J. Lu	0717959eb3	test-strnlen.c: Initialize wchar_t string with wmemset [BZ #27655 ] Use wmemset to initialize wchar_t string. (cherry picked from commit `86859b7e58`)	2022-01-27 16:25:26 -08:00
H.J. Lu	3fc179e2b2	x86-64: Require BMI2 for __strlen_evex and __strnlen_evex Since __strlen_evex and __strnlen_evex added by commit `1fd8c163a8` Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 5 06:24:52 2021 -0800 x86-64: Add ifunc-avx2.h functions with 256-bit EVEX use sarx: c4 e2 6a f7 c0 sarx %edx,%eax,%eax require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c. ifunc-avx2.h already requires BMI2 for EVEX implementation. (cherry picked from commit `55bf411b45`)	2022-01-27 16:25:20 -08:00
H.J. Lu	c3535cb6cd	NEWS: Add a bug fix entry for BZ #27457	2022-01-27 12:25:41 -08:00
Sunil K Pandey	aff5eec9ce	x86-64: Fix ifdef indentation in strlen-evex.S Fix some indentations of ifdef in file strlen-evex.S which are off by 1 and confusing to read. (cherry picked from commit `595c22ecd8`)	2022-01-27 12:07:11 -08:00
H.J. Lu	7056607e6a	x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions Update ifunc-memmove.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit. (cherry picked from commit `e4fda46310`)	2022-01-27 12:07:11 -08:00
H.J. Lu	a54fae8df4	x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit. (cherry picked from commit `4e2d8f3527`)	2022-01-27 12:07:11 -08:00
H.J. Lu	674137c9b0	x86: Add string/memory function tests in RTM region At function exit, AVX optimized string/memory functions have VZEROUPPER which triggers RTM abort. When such functions are called inside a transactionally executing RTM region, RTM abort causes severe performance degradation. Add tests to verify that string/memory functions won't cause RTM abort in RTM region. (cherry picked from commit `4bd660be40`)	2022-01-27 12:07:11 -08:00
H.J. Lu	5be8a84721	x86-64: Add AVX optimized string/memory functions for RTM Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX optimized string/memory functions with xtest jz 1f vzeroall ret 1: vzeroupper ret at function exit on processors with usable RTM, but without 256-bit EVEX instructions to avoid VZEROUPPER inside a transactionally executing RTM region. (cherry picked from commit `7ebba91361`)	2022-01-27 12:07:11 -08:00
H.J. Lu	757d90ff37	x86-64: Add memcmp family functions with 256-bit EVEX Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function exit. (cherry picked from commit `91264fe357`)	2022-01-27 12:07:11 -08:00
H.J. Lu	9650d04fb6	x86-64: Add memset family functions with 256-bit EVEX Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit. (cherry picked from commit `1b968b6b9b`)	2022-01-27 12:07:11 -08:00
H.J. Lu	2f4a98ab17	x86-64: Add memmove family functions with 256-bit EVEX Update ifunc-memmove.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit. (cherry picked from commit `63ad43566f`)	2022-01-27 12:07:11 -08:00
H.J. Lu	c90c70b8f9	x86-64: Add strcpy family functions with 256-bit EVEX Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit. (cherry picked from commit `525bc2a32c`)	2022-01-27 12:07:11 -08:00
H.J. Lu	c671b231cf	x86-64: Add ifunc-avx2.h functions with 256-bit EVEX Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and BMI2 since VZEROUPPER isn't needed at function exit. For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP is set. (cherry picked from commit `1fd8c163a8`)	2022-01-27 12:07:11 -08:00
H.J. Lu	be07b3e059	x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP 1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered by VZEROUPPER inside a transactionally executing RTM region. 2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2 loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions. (cherry picked from commit `1da50d4bda`)	2022-01-27 12:07:11 -08:00
H.J. Lu	2f3fb944b3	NEWS: Add a bug fix entry for BZ #28755	2022-01-27 07:28:12 -08:00
Noah Goldstein	be6fb78a1f	x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to __wcscmp_avx2. For x86_64 this covers the entire address range so any length larger could not possibly be used to bound `s1` or `s2`. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `ddf0992cf5`)	2022-01-27 07:28:12 -08:00
H.J. Lu	646ec4ebe5	NEWS: Add a bug fix entry for BZ #24794	2022-01-27 07:28:12 -08:00
Tulio Magno Quites Machado Filho	edceb7f520	test-container: Install with $(all-subdirs) [BZ #24794 ] Whenever a sub-make is created, it inherits the variable subdirs from its parent. This is also true when make check is called with a restricted list of subdirs. In this scenario, make install is executed "partially" and testroot.pristine ends up with an incomplete installation. [BZ #24794] * Makefile (testroot.pristine/install.stamp): Pass subdirs='$(all-subdirs)' to make install. Reviewed-by: DJ Delorie <dj@redhat.com> (cherry picked from commit `35e038c1d2`)	2022-01-27 07:28:12 -08:00
Siddhesh Poyarekar	2c9083f93d	Fix SXID_ERASE behavior in setuid programs (BZ #27471 ) When parse_tunables tries to erase a tunable marked as SXID_ERASE for setuid programs, it ends up setting the envvar string iterator incorrectly, because of which it may parse the next tunable incorrectly. Given that currently the implementation allows malformed and unrecognized tunables pass through, it may even allow SXID_ERASE tunables to go through. This change revamps the SXID_ERASE implementation so that: - Only valid tunables are written back to the tunestr string, because of which children of SXID programs will only inherit a clean list of identified tunables that are not SXID_ERASE. - Unrecognized tunables get scrubbed off from the environment and subsequently from the child environment. - This has the side-effect that a tunable that is not identified by the setxid binary, will not be passed on to a non-setxid child even if the child could have identified that tunable. This may break applications that expect this behaviour but expecting such tunables to cross the SXID boundary is wrong. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `2ed18c5b53`)	2021-04-14 11:08:02 +05:30
Siddhesh Poyarekar	dcfcb1b208	Enhance setuid-tunables test Instead of passing GLIBC_TUNABLES via the environment, pass the environment variable from parent to child. This allows us to test multiple variables to ensure better coverage. The test list currently only includes the case that's already being tested. More tests will be added later. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `061fe3f8ad`)	2021-04-14 11:07:47 +05:30
Siddhesh Poyarekar	1927d02d19	tst-env-setuid: Use support_capture_subprogram_self_sgid Use the support_capture_subprogram_self_sgid to spawn an sgid child. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `ca33528106`)	2021-04-14 11:07:47 +05:30
Siddhesh Poyarekar	2c23a4d7e0	support: Add capability to fork an sgid child Add a new function support_capture_subprogram_self_sgid that spawns an sgid child of the running program with its own image and returns the exit code of the child process. This functionality is used by at least three tests in the testsuite at the moment, so it makes sense to consolidate. There is also a new function support_subprogram_wait which should provide simple system() like functionality that does not set up file actions. This is useful in cases where only the return code of the spawned subprocess is interesting. This patch also ports tst-secure-getenv to this new function. A subsequent patch will port other tests. This also brings an important change to tst-secure-getenv behaviour. Now instead of succeeding, the test fails as UNSUPPORTED if it is unable to spawn a setgid child, which is how it should have been in the first place. Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `716a3bdc41`)	2021-04-14 11:07:45 +05:30
Siddhesh Poyarekar	23d0e97fe5	support: Typo and formatting fixes - Add a newline to the end of error messages in transfer(). - Fixed the name of support_subprocess_init(). (cherry picked from commit `95c68080a3`)	2021-04-14 11:06:05 +05:30
Siddhesh Poyarekar	ad25ea0c47	support: Pass environ to child process Pass environ to posix_spawn so that the child process can inherit environment of the test. (cherry picked from commit `e958490f8c`)	2021-04-14 11:06:05 +05:30
DJ Delorie	e9c0d7d7ff	nscd: Fix double free in netgroupcache [BZ #27462 ] In commit `745664bd79` a use-after-free was fixed, but this led to an occasional double-free. This patch tracks the "live" allocation better. Tested manually by a third party. Related: RHBZ 1927877 Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `dca565886b`)	2021-03-08 15:42:15 +05:30
Florian Weimer	a9acd88a5d	gconv: Fix assertion failure in ISO-2022-JP-3 module (bug 27256) The conversion loop to the internal encoding does not follow the interface contract that __GCONV_FULL_OUTPUT is only returned after the internal wchar_t buffer has been filled completely. This is enforced by the first of the two asserts in iconv/skeleton.c: /* We must run out of output buffer space in this rerun. */ assert (outbuf == outerr); assert (nstatus == __GCONV_FULL_OUTPUT); This commit solves this issue by queuing a second wide character which cannot be written immediately in the state variable, like other converters already do (e.g., BIG5-HKSCS or TSCII). Reported-by: Tavis Ormandy <taviso@gmail.com> (cherry picked from commit `7d88c6142c`)	2021-01-27 15:22:18 +01:00
H.J. Lu	1864775abc	x86: Check IFUNC definition in unrelocated executable [BZ #20019 ] Calling an IFUNC function defined in unrelocated executable also leads to segfault. Issue a fatal error message when calling IFUNC function defined in the unrelocated executable from a shared library. On x86, ifuncmain6pie failed with: [hjl@gnu-cfl-2 build-i686-linux]$ ./elf/ifuncmain6pie --direct ./elf/ifuncmain6pie: IFUNC symbol 'foo' referenced in '/export/build/gnu/tools-build/glibc-32bit/build-i686-linux/elf/ifuncmod6.so' is defined in the executable and creates an unsatisfiable circular dependency. [hjl@gnu-cfl-2 build-i686-linux]$ readelf -rW elf/ifuncmod6.so \| grep foo 00003ff4 00000706 R_386_GLOB_DAT 0000400c foo_ptr 00003ff8 00000406 R_386_GLOB_DAT 00000000 foo 0000400c 00000401 R_386_32 00000000 foo [hjl@gnu-cfl-2 build-i686-linux]$ Remove non-JUMP_SLOT relocations against foo in ifuncmod6.so, which trigger the circular IFUNC dependency, and build ifuncmain6pie with -Wl,-z,lazy. (cherry picked from commits `6ea5b57afa` and `7137d682eb`)	2021-01-13 14:30:42 -08:00
H.J. Lu	420ade1f64	x86: Set header.feature_1 in TCB for always-on CET [BZ #27177 ] Update dl_cet_check() to set header.feature_1 in TCB when both IBT and SHSTK are always on. (cherry picked from commit `2ef23b5205`)	2021-01-13 09:24:35 -08:00
H.J. Lu	8493ba72b1	x86-64: Avoid rep movsb with short distance [BZ #27130 ] When copying with "rep movsb", if the distance between source and destination is N*4GB + [1..63] with N >= 0, performance may be very slow. This patch updates memmove-vec-unaligned-erms.S for AVX and AVX512 versions with the distance in RCX: cmpl $63, %ecx // Don't use "rep movsb" if ECX <= 63 jbe L(Don't use rep movsb") Use "rep movsb" Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its performance impact is within noise range as "rep movsb" is only used for data size >= 4KB. (cherry picked from commit `3ec5d83d2a`)	2021-01-12 07:01:34 -08:00
Szabolcs Nagy	a3c78954ee	aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798 ] The variant PCS support was ineffective because in the common case linkmap->l_mach.plt == 0 but then the symbol table flags were ignored and normal lazy binding was used instead of resolving the relocs early. (This was a misunderstanding about how GOT[1] is setup by the linker.) In practice this mainly affects SVE calls when the vector length is more than 128 bits, then the top bits of the argument registers get clobbered during lazy binding. Fixes bug 26798. (cherry picked from commit `558251bd87`)	2020-11-04 12:26:02 +00:00
Wilco Dijkstra	28ff0f650c	AArch64: Use __memcpy_simd on Neoverse N2/V1 Add CPU detection of Neoverse N2 and Neoverse V1, and select __memcpy_simd as the memcpy/memmove ifunc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit `e11ed9d2b4`)	2020-10-14 14:31:24 +01:00
Wilco Dijkstra	be3eaffd5a	[AArch64] Improve integer memcpy Further optimize integer memcpy. Small cases now include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Comments have been rewritten. (cherry picked from commit `7000651327`)	2020-10-12 18:29:42 +01:00
Krzysztof Koch	c969e84e0c	aarch64: Increase small and medium cases for __memcpy_generic Increase the upper bound on medium cases from 96 to 128 bytes. Now, up to 128 bytes are copied unrolled. Increase the upper bound on small cases from 16 to 32 bytes so that copies of 17-32 bytes are not impacted by the larger medium case. Benchmarking: The attached figures show relative timing difference with respect to 'memcpy_generic', which is the existing implementation. 'memcpy_med_128' denotes the version of memcpy_generic with only the medium case enlarged. The 'memcpy_med_128_small_32' numbers are for the version of memcpy_generic submitted in this patch, which has both medium and small cases enlarged. The figures were generated using the script from: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html Depending on the platform, the performance improvement in the bench-memcpy-random.c benchmark ranges from 6% to 20% between the original and final version of memcpy.S Tested against GLIBC testsuite and randomized tests. (cherry picked from commit `b9f145df85`)	2020-10-12 18:29:42 +01:00
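
The BZ #27974 entries in the log describe implementations that assumed maxlen (or n) could be multiplied by sizeof(wchar_t) without wrapping. The following is a minimal sketch of that failure mode in portable C, not the glibc assembly. The names byte_count_overflows, ref_wcsnlen and ref_wmemchr are hypothetical helpers introduced only for illustration; they bound the scan by counting characters, so a count of SIZE_MAX is handled without forming a byte length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if n * sizeof (wchar_t) would wrap
   around size_t, i.e. the byte length cannot be represented.  */
static int
byte_count_overflows (size_t n)
{
  return n > SIZE_MAX / sizeof (wchar_t);
}

/* Hypothetical reference wcsnlen: counts characters up to maxlen and
   never converts maxlen into a byte count.  */
static size_t
ref_wcsnlen (const wchar_t *s, size_t maxlen)
{
  assert (s != NULL);
  size_t i = 0;
  while (i < maxlen && s[i] != L'\0')
    ++i;
  return i;
}

/* Hypothetical reference wmemchr: stops at the first match, so a huge
   n is harmless as long as the character is present.  */
static const wchar_t *
ref_wmemchr (const wchar_t *s, wchar_t c, size_t n)
{
  assert (s != NULL);
  for (size_t i = 0; i < n; ++i)
    if (s[i] == c)
      return s + i;
  return NULL;
}
```

With these helpers, ref_wcsnlen (L"abc", SIZE_MAX) yields 3 even though byte_count_overflows (SIZE_MAX) is true, which is the behaviour the overflow tests in the log are described as checking.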

34555 Commits
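
The BZ #27130 entry in the log avoids "rep movsb" when the distance between source and destination is N*4GB + [1..63], using a 32-bit compare of ECX against 63. A rough C model of that predicate is sketched below. The function name avoid_rep_movsb is hypothetical, and taking the distance as the absolute difference of the two addresses is an assumption made for this sketch; the log only says the distance is held in RCX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the check described for BZ #27130: look only
   at the low 32 bits of the distance (the ECX view of RCX) and report
   nonzero when that value is <= 63, as "cmpl $63, %ecx; jbe" would.
   Assumption: the distance is the absolute difference of addresses.  */
static int
avoid_rep_movsb (uint64_t dst, uint64_t src)
{
  uint64_t distance = dst > src ? dst - src : src - dst;
  uint32_t low = (uint32_t) distance;   /* truncation models ECX */
  return low <= 63;
}
```

Because only the low 32 bits are compared, a distance of 4GB + 63 is treated the same as a distance of 63, which matches the N*4GB + [1..63] wording in the commit message.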