glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-13 23:00:22 +00:00

Author	SHA1	Message	Date
Noah Goldstein	4ff6ae069b	x86: Small improvements for wcslen Just a few QOL changes. 1. Prefer `add` > `lea` as it has more execution units it can run on. 2. Don't break macro-fusion between `test` and `jcc` 3. Reduce code size by removing gratuitous padding bytes (-90 bytes). geometric_mean(N=20) of all benchmarks New / Original: 0.959 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `244b415d38`)	2022-05-16 18:55:09 -07:00
Noah Goldstein	80883f4354	x86: Remove AVX str{n}casecmp The rationale is: 1. SSE42 has nearly identical logic so any benefit is minimal (3.4% regression on Tigerlake using SSE42 versus AVX across the benchtest suite). 2. AVX2 version covers the majority of targets that previously preferred it. 3. The targets where AVX would still be best (SnB and IVB) are becoming outdated. All in all, the code size saving is worth it. All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `305769b2a1`)	2022-05-16 18:55:02 -07:00
Noah Goldstein	b13a2e68eb	x86: Add EVEX optimized str{n}casecmp geometric_mean(N=40) of all benchmarks EVEX / SSE42: .621 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `84e7c46df4`)	2022-05-16 18:54:52 -07:00
Noah Goldstein	3051cf3e74	x86: Add AVX2 optimized str{n}casecmp geometric_mean(N=40) of all benchmarks AVX2 / SSE42: .702 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `bbf8122234`)	2022-05-16 18:54:41 -07:00
Noah Goldstein	3605c74407	x86: Optimize str{n}casecmp TOLOWER logic in strcmp-sse42.S Slightly faster method of doing TOLOWER that saves an instruction. Also replace the hard coded 5-byte nop with .p2align 4. On builds with CET enabled this misaligned entry to strcasecmp. geometric_mean(N=40) of all benchmarks New / Original: .920 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `d154758e61`)	2022-05-16 18:54:27 -07:00
Noah Goldstein	5997011826	x86: Optimize str{n}casecmp TOLOWER logic in strcmp.S Slightly faster method of doing TOLOWER that saves an instruction. Also replace the hard coded 5-byte nop with .p2align 4. On builds with CET enabled this misaligned entry to strcasecmp. geometric_mean(N=40) of all benchmarks New / Original: .894 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `670b54bc58`)	2022-05-16 18:54:17 -07:00
Noah Goldstein	a4b1cae068	x86: Remove strspn-sse2.S and use the generic implementation The generic implementation is faster. geometric_mean(N=20) of all benchmarks New / Original: .710 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `9c8a6ad620`)	2022-05-16 18:54:09 -07:00
Noah Goldstein	3811544655	x86: Remove strpbrk-sse2.S and use the generic implementation The generic implementation is faster (see strcspn commit). All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `6533585352`)	2022-05-16 18:53:59 -07:00
Noah Goldstein	0dafa75e3c	x86: Remove strcspn-sse2.S and use the generic implementation The generic implementation is faster. geometric_mean(N=20) of all benchmarks New / Original: .678 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `fe28e7d9d9`)	2022-05-16 18:53:48 -07:00
Noah Goldstein	0a2da01110	x86: Optimize strspn in strspn-c.c Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of _mm_cmpistri. Also change offset to unsigned to avoid unnecessary sign extensions. geometric_mean(N=20) of all benchmarks that don't fall back on sse2; New / Original: .901 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `412d103431`)	2022-05-16 18:53:39 -07:00
Noah Goldstein	0ae1006967	x86: Optimize strcspn and strpbrk in strcspn-c.c Use _mm_cmpeq_epi8 and _mm_movemask_epi8 to get strlen instead of _mm_cmpistri. Also change offset to unsigned to avoid unnecessary sign extensions. geometric_mean(N=20) of all benchmarks that don't fall back on sse2/strlen; New / Original: .928 All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `30d627d477`)	2022-05-16 18:53:28 -07:00
Noah Goldstein	dd6d3a0bbc	x86: Code cleanup in strchr-evex and comment justifying branch Small code cleanup for size: -81 bytes. Add comment justifying using a branch to do NULL/non-null return. All string/memory tests pass and no regressions in benchtests. geometric_mean(N=20) of all benchmarks New / Original: .985 Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `ec285ea904`)	2022-05-16 18:53:19 -07:00
Noah Goldstein	3c55c20756	x86: Code cleanup in strchr-avx2 and comment justifying branch Small code cleanup for size: -53 bytes. Add comment justifying using a branch to do NULL/non-null return. All string/memory tests pass and no regressions in benchtests. geometric_mean(N=20) of all benchmarks Original / New: 1.00 Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `a6fbf4d51e`)	2022-05-16 18:53:07 -07:00
Adhemerval Zanella	dd457606ca	x86_64: Remove bcopy optimizations The symbol is not present in the current POSIX specification and the compiler already generates a memmove call. (cherry picked from commit `bf92893a14`)	2022-05-16 18:52:57 -07:00
H.J. Lu	37f373e334	x86-64: Remove bzero weak alias in SS2 memset commit `3d9f171bfb` Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Feb 7 05:55:15 2022 -0800 x86-64: Optimize bzero added the optimized bzero. Remove bzero weak alias in SS2 memset to avoid undefined __bzero in memset-sse2-unaligned-erms. (cherry picked from commit `0fb8800029`)	2022-05-16 18:52:47 -07:00
H.J. Lu	6cba46c858	x86_64/multiarch: Sort sysdep_routines and put one entry per line (cherry picked from commit `c328d0152d`)	2022-05-16 18:52:35 -07:00
H.J. Lu	8de6e4a199	x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ)) (cherry picked from commit `1283948f23`)	2022-05-16 18:52:19 -07:00
Siddhesh Poyarekar	b72bbba236	fortify: Ensure that __glibc_fortify condition is a constant [BZ #29141 ] The fix `c8ee1c85` introduced a -1 check for object size without also checking that object size is a constant. Because of this, the tree optimizer passes in gcc fail to fold away one of the branches in __glibc_fortify and trips on a spurious Wstringop-overflow. The warning itself is incorrect and the branch does go away eventually in DCE in the rtl passes in gcc, but the constant check is a helpful hint to simplify code early, so add it in. Resolves: BZ #29141 Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> (cherry picked from commit `61a8753010`)	2022-05-16 22:06:54 +05:30
Florian Weimer	91c2e6c3db	dlfcn: Implement the RTLD_DI_PHDR request type for dlinfo The information is theoretically available via dl_iterate_phdr as well, but that approach is very slow if there are many shared objects. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@rehdat.com> (cherry picked from commit `d056c21213`)	2022-05-11 20:56:53 +02:00
Florian Weimer	e4a2fb76ef	manual: Document the dlinfo function Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@rehdat.com> (cherry picked from commit `93804a1ee0`) Also includes partial backport of commit `5d28a8962d` (the addition of manual/dynlink.texi).	2022-05-11 20:56:40 +02:00
Noah Goldstein	e123f08ad5	x86: Fix fallback for wcsncmp_avx2 in strcmp-avx2.S [BZ #28896 ] Overflow case for __wcsncmp_avx2_rtm should be __wcscmp_avx2_rtm not __wcscmp_avx2. commit `ddf0992cf5` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:21 2022 -0600 x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] Set the wrong fallback function for `__wcsncmp_avx2_rtm`. It was set to fallback on to `__wcscmp_avx2` instead of `__wcscmp_avx2_rtm` which can cause spurious aborts. This change will need to be backported. All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `9fef7039a7`)	2022-05-05 09:13:13 -07:00
Noah Goldstein	5373c90f2e	x86: Fix bug in strncmp-evex and strncmp-avx2 [BZ #28895 ] Logic can read before the start of `s1` / `s2` if both `s1` and `s2` are near the start of a page. To avoid having the result contaminated by these comparisons the `strcmp` variants would mask off these comparisons. This was missing in the `strncmp` variants causing the bug. This commit adds the masking to `strncmp` so that out of range comparisons don't affect the result. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass as well as a full xcheck on x86_64 linux. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `e108c02a5e`)	2022-05-05 09:11:49 -07:00
Noah Goldstein	70509f9b48	x86: Set .text section in memset-vec-unaligned-erms commit `3d9f171bfb` Author: H.J. Lu <hjl.tools@gmail.com> Date: Mon Feb 7 05:55:15 2022 -0800 x86-64: Optimize bzero Remove setting the .text section for the code. This commit adds that back. (cherry picked from commit `7912236f4a`)	2022-05-05 09:11:13 -07:00
H.J. Lu	5cb6329652	x86-64: Optimize bzero memset with zero as the value to set is by far the majority value (99%+ for Python3 and GCC). bzero can be slightly more optimized for this case by using a zero-idiom xor for broadcasting the set value to a register (vector or GPR). Co-developed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `3d9f171bfb`)	2022-05-05 09:10:53 -07:00
Noah Goldstein	190ea5f7e4	x86: Remove SSSE3 instruction for broadcast in memset.S (SSE2 Only) commit `b62ace2740` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Feb 6 00:54:18 2022 -0600 x86: Improve vec generation in memset-vec-unaligned-erms.S Revert usage of 'pshufb' in broadcast logic as it is an SSSE3 instruction and memset.S is restricted to only SSE2 instructions. (cherry picked from commit `1b0c60f95b`)	2022-05-05 08:54:23 -07:00
Noah Goldstein	ea19c490a3	x86: Improve vec generation in memset-vec-unaligned-erms.S No bug. Split vec generation into multiple steps. This allows the broadcast in AVX2 to use 'xmm' registers for the L(less_vec) case. This saves an expensive lane-cross instruction and removes the need for 'vzeroupper'. For SSE2 replace 2x 'punpck' instructions with zero-idiom 'pxor' for byte broadcast. Results for memset-avx2 small (geomean of N = 20 benchset runs). size, New Time, Old Time, New / Old 0, 4.100, 3.831, 0.934 1, 5.074, 4.399, 0.867 2, 4.433, 4.411, 0.995 4, 4.487, 4.415, 0.984 8, 4.454, 4.396, 0.987 16, 4.502, 4.443, 0.987 All relevant string/wcsmbs tests are passing. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `b62ace2740`)	2022-05-05 08:54:11 -07:00
H.J. Lu	53ddafe917	x86-64: Fix strcmp-evex.S Change "movl %edx, %rdx" to "movl %edx, %edx" in: commit `8418eb3ff4` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Mon Jan 10 15:35:39 2022 -0600 x86: Optimize strcmp-evex.S (cherry picked from commit `0e0199a9e0`)	2022-05-05 08:54:03 -07:00
H.J. Lu	d299032743	x86-64: Fix strcmp-avx2.S Change "movl %edx, %rdx" to "movl %edx, %edx" in: commit `b77b06e0e2` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Mon Jan 10 15:35:38 2022 -0600 x86: Optimize strcmp-avx2.S (cherry picked from commit `c15efd011c`)	2022-05-05 08:53:50 -07:00
Noah Goldstein	c41a66767d	x86: Optimize strcmp-evex.S Optimizations are primarily to the loop logic and how the page cross logic interacts with the loop. The page cross logic is at times more expensive for short strings near the end of a page but not crossing the page. This is done to retest the page cross conditions with a non-faulty check and to improve the logic for entering the loop afterwards. This is only in particular cases, however, and is generally made up for by more than 10x improvements on the transition from the page cross -> loop case. The non-page cross cases as well are nearly universally improved. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `8418eb3ff4`)	2022-05-05 08:53:42 -07:00
Noah Goldstein	0d5b36c8cc	x86: Optimize strcmp-avx2.S Optimizations are primarily to the loop logic and how the page cross logic interacts with the loop. The page cross logic is at times more expensive for short strings near the end of a page but not crossing the page. This is done to retest the page cross conditions with a non-faulty check and to improve the logic for entering the loop afterwards. This is only in particular cases, however, and is generally made up for by more than 10x improvements on the transition from the page cross -> loop case. The non-page cross cases are improved most for smaller sizes [0, 128] and go about even for (128, 4096]. The loop page cross logic is improved so some more significant speedup is seen there as well. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `b77b06e0e2`)	2022-05-05 08:53:34 -07:00
Siddhesh Poyarekar	31af92b9c8	manual: Clarify that abbreviations of long options are allowed The man page and code comments clearly state that abbreviations of long option names are recognized correctly as long as they are unique. Document this fact in the glibc manual as well. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Andreas Schwab <schwab@linux-m68k.org> (cherry picked from commit `db1efe02c9`)	2022-05-04 15:59:39 +05:30
Joseph Myers	97cb8227b8	Add HWCAP2_AFP, HWCAP2_RPRES from Linux 5.17 to AArch64 bits/hwcap.h Add the new HWCAP2_AFP and HWCAP2_RPRES constants from Linux 5.17. Tested with build-many-glibcs.py for aarch64-linux-gnu. (cherry picked from commit `866c599182`)	2022-05-03 11:08:52 +02:00
Szabolcs Nagy	c108e87026	aarch64: Add HWCAP2_ECV from Linux 5.16 Indicates the availability of enhanced counter virtualization extension of armv8.6-a with self-synchronized virtual counter CNTVCTSS_EL0 usable in userspace. (cherry picked from commit `5a1be8ebdf`)	2022-05-03 11:08:52 +02:00
Joseph Myers	f858bc3093	Add SOL_MPTCP, SOL_MCTP from Linux 5.16 to bits/socket.h Linux 5.16 adds constants SOL_MPTCP and SOL_MCTP to the getsockopt / setsockopt levels; add these constants to bits/socket.h. Tested for x86_64. (cherry picked from commit `fdc1ae67fe`)	2022-05-03 11:08:52 +02:00
Joseph Myers	0499c3a95f	Update kernel version to 5.17 in tst-mman-consts.py This patch updates the kernel version in the test tst-mman-consts.py to 5.17. (There are no new MAP_* constants covered by this test in 5.17 that need any other header changes.) Tested with build-many-glibcs.py. (cherry picked from commit `23808a422e`)	2022-05-03 11:08:52 +02:00
Joseph Myers	81181ba5d9	Update kernel version to 5.16 in tst-mman-consts.py This patch updates the kernel version in the test tst-mman-consts.py to 5.16. (There are no new MAP_* constants covered by this test in 5.16 that need any other header changes.) Tested with build-many-glibcs.py. (cherry picked from commit `790a607e23`)	2022-05-03 11:08:52 +02:00
Joseph Myers	6af165658d	Update syscall lists for Linux 5.17 Linux 5.17 has one new syscall, set_mempolicy_home_node. Update syscall-names.list and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py. (cherry picked from commit `8ef9196b26`)	2022-05-03 11:08:52 +02:00
Joseph Myers	5146b73d72	Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol numbers for PF_CAN") but apparently missed for glibc at the time. Tested for x86_64. (cherry picked from commit `a94d9659cd`)	2022-05-03 11:07:10 +02:00
Joseph Myers	fd5dbfd1cd	Update kernel version to 5.15 in tst-mman-consts.py This patch updates the kernel version in the test tst-mman-consts.py to 5.15. (There are no new MAP_* constants covered by this test in 5.15 that need any other header changes.) Tested with build-many-glibcs.py. (cherry picked from commit `5c3ece451d`)	2022-05-03 11:07:07 +02:00
Joseph Myers	bc6fba3c80	Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add these constants to bits/socket.h. Tested for x86_64. (cherry picked from commit `bdeb7a8fa9`)	2022-05-03 11:07:03 +02:00
DJ Delorie	c66c92181d	posix/glob.c: update from gnulib Copied from gnulib/lib/glob.c in order to fix rhbz 1982608 Also fixes swbz 25659 Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com> (cherry picked from commit `7c477b57a3`)	2022-04-28 11:57:23 -04:00
Adhemerval Zanella	88a8637cb4	linux: Fix fchmodat with AT_SYMLINK_NOFOLLOW for 64 bit time_t (BZ#29097) The AT_SYMLINK_NOFOLLOW emulation uses the default 32 bit stat internal calls, which fails with EOVERFLOW if the file contains timestamps beyond 2038. Checked on i686-linux-gnu. (cherry picked from commit `118a2aee07`)	2022-04-28 10:10:30 -03:00
Carlos O'Donell	55640ed3fd	i386: Regenerate ulps These failures were caught while building glibc master for Fedora Rawhide which is built with '-mtune=generic -msse2 -mfpmath=sse' using gcc 11.3 (gcc-11.3.1-2.fc35) on a Cascadelake Intel Xeon processor. (cherry picked from commit `e465d97653`)	2022-04-27 21:20:43 -04:00
Adhemerval Zanella	9681691402	linux: Fix missing internal 64 bit time_t stat usage These are two missing spots initially done by `52a5fe70a2`. Checked on i686-linux-gnu. (cherry picked from commit `834ddd0432`)	2022-04-27 14:52:26 -03:00
Noah Goldstein	c796418d00	x86: Optimize L(less_vec) case in memcmp-evex-movbe.S No bug. Optimizations are twofold. 1) Replace page cross and 0/1 checks with masked load instructions in L(less_vec). In applications this reduces branch-misses in the hot [0, 32] case. 2) Change control flow so that L(less_vec) case gets the fall through. Change 2) helps copies in the [0, 32] size range but comes at the cost of copies in the [33, 64] size range. From profiles of GCC and Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this appears to be the right tradeoff. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `abddd61de0`)	2022-04-26 18:18:16 -07:00
H.J. Lu	f3a99b2216	x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used. (cherry picked from commit `ceeffe968c`)	2022-04-26 18:18:16 -07:00
Noah Goldstein	4bbd0f866a	x86-64: Use notl in EVEX strcmp [BZ #28646 ] Must use notl %edi here as lower bits are for CHAR comparisons potentially out of range thus can be 0 without indicating mismatch. This fixes BZ #28646. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `4df1fa6ddc`)	2022-04-26 18:18:16 -07:00
Noah Goldstein	7cb126e7e7	x86: Shrink memcmp-sse4.S code size No bug. This implementation refactors memcmp-sse4.S primarily with minimizing code size in mind. It does this by removing the lookup table logic and removing the unrolled check from (256, 512] bytes. memcmp-sse4 code size reduction : -3487 bytes wmemcmp-sse4 code size reduction: -1472 bytes The current memcmp-sse4.S implementation has a large code size cost. This has serious adverse effects on the ICache / ITLB. While in micro-benchmarks the implementation appears fast, traces of real-world code have shown that the speed in micro benchmarks does not translate when the ICache/ITLB are not primed, and that the cost of the code size has measurable negative effects on overall application performance. See https://research.google/pubs/pub48320/ for more details. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `2f9062d717`)	2022-04-26 18:18:16 -07:00
Noah Goldstein	cecbac5212	x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h No bug. This patch doubles the rep_movsb_threshold when using ERMS. Based on benchmarks the vector copy loop, especially now that it handles 4k aliasing, is better for these medium-range sizes. On Skylake with ERMS: Size, Align1, Align2, dst>src,(rep movsb) / (vec copy) 4096, 0, 0, 0, 0.975 4096, 0, 0, 1, 0.953 4096, 12, 0, 0, 0.969 4096, 12, 0, 1, 0.872 4096, 44, 0, 0, 0.979 4096, 44, 0, 1, 0.83 4096, 0, 12, 0, 1.006 4096, 0, 12, 1, 0.989 4096, 0, 44, 0, 0.739 4096, 0, 44, 1, 0.942 4096, 12, 12, 0, 1.009 4096, 12, 12, 1, 0.973 4096, 44, 44, 0, 0.791 4096, 44, 44, 1, 0.961 4096, 2048, 0, 0, 0.978 4096, 2048, 0, 1, 0.951 4096, 2060, 0, 0, 0.986 4096, 2060, 0, 1, 0.963 4096, 2048, 12, 0, 0.971 4096, 2048, 12, 1, 0.941 4096, 2060, 12, 0, 0.977 4096, 2060, 12, 1, 0.949 8192, 0, 0, 0, 0.85 8192, 0, 0, 1, 0.845 8192, 13, 0, 0, 0.937 8192, 13, 0, 1, 0.939 8192, 45, 0, 0, 0.932 8192, 45, 0, 1, 0.927 8192, 0, 13, 0, 0.621 8192, 0, 13, 1, 0.62 8192, 0, 45, 0, 0.53 8192, 0, 45, 1, 0.516 8192, 13, 13, 0, 0.664 8192, 13, 13, 1, 0.659 8192, 45, 45, 0, 0.593 8192, 45, 45, 1, 0.575 8192, 2048, 0, 0, 0.854 8192, 2048, 0, 1, 0.834 8192, 2061, 0, 0, 0.863 8192, 2061, 0, 1, 0.857 8192, 2048, 13, 0, 0.63 8192, 2048, 13, 1, 0.629 8192, 2061, 13, 0, 0.627 8192, 2061, 13, 1, 0.62 Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `475b63702e`)	2022-04-26 18:18:16 -07:00
Noah Goldstein	a7392db2ff	x86: Optimize memmove-vec-unaligned-erms.S No bug. The optimizations are as follows: 1) Always align entry to 64 bytes. This makes behavior more predictable and makes other frontend optimizations easier. 2) Make the L(more_8x_vec) cases 4k aliasing aware. This can have significant benefits in the case that: 0 < (dst - src) < [256, 512] 3) Align before `rep movsb`. For ERMS this is roughly a [0, 30%] improvement and for FSRM [-10%, 25%]. In addition to these primary changes there is general cleanup throughout to optimize the aligning routines and control flow logic. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `a6b7502ec0`)	2022-04-26 18:18:16 -07:00

38019 Commits