__ehdr_start is already used in rtld.c:dl_main, and can serve the
same purpose as _begin. Besides tidying the code, using a linker
defined section relative symbol rather than "-defsym _begin=0" better
reflects the intent of _dl_start_final's use of _begin, which is to
refer to the load address of ld.so rather than to absolute address
zero.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
'get_fast_jitter' is meant to be used purely for performance
purposes. In all cases where it is used, it should be acceptable to
get no randomness (see the default case). An example use case is
setting jitter for retries between threads at a lock. There is a
performance benefit to having jitter, but only if the jitter can be
generated very quickly, and ultimately there is no serious issue if
no jitter is generated.
The implementation generally uses 'HP_TIMING_NOW' iff it is inlined
(avoiding any potential syscall paths).
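As an illustration of the lock-retry use case above, a minimal
sketch (everything except get_fast_jitter is hypothetical, including
its assumed uint32_t return type):

  #include <stdint.h>
  #include <stdatomic.h>

  extern uint32_t get_fast_jitter (void);

  static atomic_flag lock = ATOMIC_FLAG_INIT;

  static void
  lock_with_jitter (void)
  {
    while (atomic_flag_test_and_set_explicit (&lock,
                                              memory_order_acquire))
      {
        /* Randomize the backoff so contending threads desynchronize.
           If get_fast_jitter yields no randomness (the default
           case), the loop still works, just without jitter.  */
        uint32_t spins = 32 + (get_fast_jitter () % 32);
        while (spins-- > 0)
          __asm__ volatile ("" ::: "memory");   /* plain busy-wait */
      }
  }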
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Copied from gnulib/lib/glob.c in order to fix rhbz 1982608
Also fixes swbz 25659
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Benchmark for testing pthread mutex lock performance with different
thread counts and critical-section lengths.
The test configuration consists of 3 parts:
1. thread number
2. critical-section length
3. non-critical-section length
The thread number starts at 1 and is doubled until the number of CPU
cores (nprocs) is reached. An additional over-saturation case
(1.25 * nprocs) is also included.
The critical section is represented by a loop of the shared
do_filler(); its length is determined by the number of loop
iterations. The non-critical section is similar, except that it is
based on a non-shared do_filler().
Currently, the adaptive pthread_mutex lock is tested.
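A sketch of the per-thread measurement loop described above
(do_filler comes from the text; the scaffolding and parameter names
are hypothetical):

  #include <pthread.h>

  extern void do_filler (void);          /* shared: critical work  */
  extern void do_filler_private (void);  /* non-shared counterpart */

  static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

  struct params { long rounds; int crt_len; int non_crt_len; };

  static void *
  worker (void *arg)
  {
    struct params *p = arg;
    for (long i = 0; i < p->rounds; i++)
      {
        pthread_mutex_lock (&mutex);
        for (int j = 0; j < p->crt_len; j++)      /* critical */
          do_filler ();
        pthread_mutex_unlock (&mutex);
        for (int j = 0; j < p->non_crt_len; j++)  /* non-critical */
          do_filler_private ();
      }
    return NULL;
  }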
In _dl_map_object the underlying file is not opened in trace mode
(in other cases where the underlying file can't be opened,
_dl_map_object quits with an error). If there are any missing
libraries being processed, they will not be counted in the final
nlist size passed to _dl_sort_maps later in the function. That size
is then used by _dl_sort_maps_dfs for the stack-allocated working
maps:
222 /* Array to hold RPO sorting results, before we copy back to maps[]. */
223 struct link_map *rpo[nmaps];
224
225 /* The 'head' position during each DFS iteration. Note that we start at
226 one past the last element due to first-decrement-then-store (see the
227 bottom of above dfs_traversal() routine). */
228 struct link_map **rpo_head = &rpo[nmaps];
However, while traversing 'l_initfini' in dfs_traversal, the l_faked
maps are still visited, so the results are stored through rpo_head
more times than the allocated working array 'rpo' allows, overflowing
the stack object.
As suggested in the bugzilla report, one option would be to avoid
sorting the maps in trace mode. However, I think ignoring l_faked
objects does make sense (there is one less constraint on calling the
sorting function); it allows slightly less stack usage in trace mode,
and it is a slightly simpler solution.
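To make the fix concrete, a sketch (simplified and hypothetical;
'l_faked' is the internal ld.so flag for missing objects in trace
mode, so this compiles against glibc's internal headers, not the
installed <link.h>):

  /* Count only non-faked maps before sizing the working array, so
     nmaps matches what dfs_traversal will actually visit.  */
  static void
  sort_maps_sketch (struct link_map *maps)
  {
    unsigned int nmaps = 0;
    for (struct link_map *l = maps; l != NULL; l = l->l_next)
      if (!l->l_faked)
        ++nmaps;

    struct link_map *rpo[nmaps];          /* sized to match the DFS */
    struct link_map **rpo_head = &rpo[nmaps];
    (void) rpo_head;                      /* DFS itself omitted */
  }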
The test does trigger the stack overflow, but I tried to make it
more generic to check different scenarios of missing objects.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Verify that:
1. A DT_RELR shared library without DT_NEEDED works.
2. A DT_RELR shared library without DT_VERNEED works.
3. A DT_RELR shared library without libc.so in DT_NEEDED works.
With DT_RELR, there may be no relocations in DT_RELA/DT_REL and their
entry values may be zero. Don't relocate DT_RELA/DT_REL, nor update
the combined relocation start address, if their entry values are
zero.
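A sketch of the guard (hypothetical helper, assumed to be applied
only to pointer-valued dynamic tags; the actual change lives in
ld.so's dynamic-section processing):

  #include <elf.h>
  #include <link.h>

  static void
  relocate_dyn_entry (ElfW(Dyn) *d, ElfW(Addr) l_addr)
  {
    /* With DT_RELR, DT_RELA/DT_REL entries may be zero, meaning no
       relocations; adding l_addr to zero would forge an address.  */
    if ((d->d_tag == DT_RELA || d->d_tag == DT_REL)
        && d->d_un.d_ptr == 0)
      return;
    d->d_un.d_ptr += l_addr;
  }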
The EI_ABIVERSION field of the ELF header in executables and shared
libraries can be bumped to indicate the minimum ABI requirement on the
dynamic linker. However, EI_ABIVERSION in executables isn't checked
by the Linux kernel ELF loader or by the existing dynamic linker.
Executables
will crash mysteriously if the dynamic linker doesn't support the ABI
features required by the EI_ABIVERSION field. The dynamic linker should
be changed to check EI_ABIVERSION in executables.
Add a glibc version, GLIBC_ABI_DT_RELR, to indicate DT_RELR support,
so that existing dynamic linkers will issue an error on executables
with a GLIBC_ABI_DT_RELR dependency. When there is a DT_VERNEED entry
with libc.so in DT_NEEDED, issue an error if there is a DT_RELR entry
without the GLIBC_ABI_DT_RELR dependency.
Support __placeholder_only_for_empty_version_map as the placeholder
symbol, used only for an empty version map, to generate
GLIBC_ABI_DT_RELR without any symbols.
PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage
variables and hidden visibility variables in a shared object (ld.so)
need dynamic relocations (usually R_*_RELATIVE). PI (position
independent) in the macro name is a misnomer: a code sequence using
the GOT is typically position-independent as well, but using dynamic
relocations does not meet the requirement.
Not defining PI_STATIC_AND_HIDDEN is legacy, and we expect all new
ports to define it. More current ports define PI_STATIC_AND_HIDDEN
than not, so change the configure default accordingly.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
These failures were caught while building glibc master for Fedora
Rawhide which is built with '-mtune=generic -msse2 -mfpmath=sse'
using gcc 11.3 (gcc-11.3.1-2.fc35) on a Cascadelake Intel Xeon
processor.
When audit modules are loaded, ld.so initialization is not yet
complete, and rtld_active () returns false even though ld.so is
mostly working. Instead, the static dlopen hook is used, but that
does not work at all because this is not a static dlopen situation.
Commit 466c1ea15f ("dlfcn: Rework
static dlopen hooks") moved the hook pointer into _rtld_global_ro,
which means that separate protection is not needed anymore and the
hook pointer can be checked directly.
The guard for disabling libio vtable hardening in _IO_vtable_check
should stay for now.
Fixes commit 8e1472d2c1 ("ld.so:
Examine GLRO to detect inactive loader [BZ #20204]").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On non-PI_STATIC_AND_HIDDEN architectures, getting the address of
_rtld_local_ro (for GLRO (dl_final_object)) goes through a GOT entry.
The GOT load may be reordered before self relocation, leading to an
unrelocated/incorrect _rtld_local_ro address.
Commit 84e02af1eb tickled GCC on powerpc32 into reordering the GOT
load before relative relocations, leading to an ld.so crash. This is
similar to the m68k jump table reordering issue fixed by
a8e9b5b807.
Move code after self relocation into _dl_start_final to avoid the
reordering. This fixes powerpc32 and may help other architectures when
ELF_DYNAMIC_RELOCATE is simplified in the future.
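The structural idea, as a sketch (hypothetical names, heavily
simplified; not the rtld code):

  /* A noinline call boundary: the compiler cannot hoist GOT loads
     from start_final above the relocation work in start, because
     the call is opaque.  */
  static void __attribute__ ((noinline))
  start_final (void)
  {
    /* GOT-indirect accesses (e.g. GLRO (dl_final_object)) are safe
       here: all relative relocations were applied by the caller.  */
  }

  static void
  start (void)
  {
    /* ... apply relative relocations (ELF_DYNAMIC_RELOCATE) ... */
    start_final ();
  }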
If `__glibc_objsize (__o) == (size_t) -1` (i.e. `__o` has unknown
size), fortify checks should pass, and `__whatever_alias` should be
called.
Previously, `__glibc_objsize (__o) == (size_t) -1` was explicitly
checked, but in commit a643f60c53, this was moved into
`__glibc_safe_or_unknown_len`.
A comment says the -1 case should work as: "The -1 check is redundant
because since it implies that __glibc_safe_len_cond is true." But
this fails when:
* `__s > 1`
* `__osz == -1` (i.e. unknown size at compile time)
* `__l` is big enough
* `__l * __s <= __osz` can be folded to a constant
(I only found this to be true for `mbsrtowcs` and other functions in wchar2.h)
In this case `__l * __s <= __osz` is false, and `__whatever_chk_warn`
will be called by `__glibc_fortify` or `__glibc_fortify_n`, crashing
the program.
This commit adds back the explicit `__osz == -1` check.
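A sketch of the restored logic (safe_or_unknown_len and safe_len_cond
are illustrative stand-ins for the double-underscore macros named
above):

  /* Unknown object size (-1) must pass unconditionally: with __s > 1
     the length check below can constant-fold to false even though
     nothing is known about the real object size.  */
  #define safe_or_unknown_len(__l, __s, __osz)            \
    ((__osz) == (size_t) -1                               \
     || safe_len_cond ((size_t) (__l), (__s), (__osz)))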
moc crashes on startup due to this; see: https://bugs.archlinux.org/task/74041
Minimal test case (test.c):
#include <wchar.h>

int main (void)
{
  const char *hw = "HelloWorld";
  mbsrtowcs (NULL, &hw, (size_t)-1, NULL);
  return 0;
}
Build with:
gcc -O2 -Wp,-D_FORTIFY_SOURCE=2 test.c -o test && ./test
Output:
*** buffer overflow detected ***: terminated
Fixes: BZ #29030
Signed-off-by: Joan Bruguera <joanbrugueram@gmail.com>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.755
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.832
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The new code unrolls the main loop slightly without adding too much
overhead and minimizes the comparisons for the search CHAR.
Geometric Mean of all benchmarks New / Old: 0.741
See email for all results.
Full xcheck passes on x86_64 with and without multiarch enabled.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
1. Use json-lib for printing results.
2. Expose all parameters (previously pos, seek_char, and max_char
   were not printed).
3. Add benchmarks that test multiple occurrences of seek_char in the
   string.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This is necessary to place the libio vtables into the RELRO segment.
New tests elf/tst-relro-ldso and elf/tst-relro-libc are added to
verify that this is what actually happens.
The new tests fail on ia64 due to lack of (default) RELRO support in
binutils, so they are XFAILed there.
Hopefully, this will lead to tests that are easier to maintain. The
current approach of parsing readelf -W output using regular expressions
is not necessarily easier than parsing the ELF data directly.
This module is still somewhat incomplete (e.g., coverage of relocation
types and versioning information is missing), but it is sufficient to
perform basic symbol analysis or program header analysis.
The EM_* mapping for architecture-specific constant classes (e.g.,
SttX86_64) is not yet implemented. The classes are defined for the
benefit of elf/tst-glibcelf.py.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
elf_dynamic_do_Rel checks RTLD_BOOTSTRAP in several #ifdef branches.
Create a single outer RTLD_BOOTSTRAP branch to simplify reasoning
about the function, at the cost of a few duplicated lines.
Since dl_naudit is zero in RTLD_BOOTSTRAP code, the RTLD_BOOTSTRAP
branch can avoid _dl_audit_symbind calls to decrease code size.
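The shape of the change, as a sketch (hypothetical helper; the real
function processes whole relocation tables):

  #include <link.h>

  /* Stand-in for the audit hook; the real code calls
     _dl_audit_symbind.  */
  static inline void audit_symbind_sketch (void) { }

  #ifdef RTLD_BOOTSTRAP
  static inline void
  do_reloc (ElfW(Addr) *where, ElfW(Addr) value)
  {
    *where = value;             /* no audit: dl_naudit is always 0 */
  }
  #else
  static inline void
  do_reloc (ElfW(Addr) *where, ElfW(Addr) value)
  {
    *where = value;
    audit_symbind_sketch ();    /* full path keeps the audit call */
  }
  #endif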
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
m68k is a non-PI_STATIC_AND_HIDDEN arch which uses a GOT relocation
when loading the address of a jump table. The GOT load may be
reordered before the processing of R_68K_RELATIVE relocations,
leading to an unrelocated/incorrect jump table, which will cause a
crash.
The foolproof approach is to add an optimization barrier (e.g.
calling a non-inlinable function after relative relocations are
resolved). That is non-trivial given the current code structure, so
just use the simple approach of avoiding the jump table: handle only
the essential relocations in RTLD_BOOTSTRAP code.
This is based on Andreas Schwab's patch and fixes the ld.so crash on
m68k.
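A sketch of the simple approach (hypothetical helper, heavily
simplified): an if/else chain over the essential relocation kinds
needs no jump table, so nothing is loaded through the GOT before the
relative relocations are done.

  #include <elf.h>
  #include <link.h>

  static void
  bootstrap_reloc (unsigned int r_type, ElfW(Addr) *where,
                   ElfW(Addr) l_addr, ElfW(Addr) value)
  {
    if (r_type == R_68K_RELATIVE)
      *where += l_addr;         /* adjust by the load address */
    else if (r_type == R_68K_GLOB_DAT || r_type == R_68K_JMP_SLOT)
      *where = value;
    /* Everything else is left to the full path after bootstrap.  */
  }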
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The 404656009b reversion did not set up the atomic loop to set the
cancel bits correctly. The fix is essentially what pthread_cancel did
prior to 26cfbb7162.
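The shape of such a loop, as a sketch in C11 atomics (hypothetical
names; the actual code uses glibc's internal atomics on the thread's
cancelhandling word):

  #include <stdatomic.h>

  static void
  set_cancel_bits (_Atomic int *cancelhandling, int bits)
  {
    int old = atomic_load_explicit (cancelhandling,
                                    memory_order_relaxed);
    /* On CAS failure 'old' is refreshed with the current value, so
       bits set concurrently by other threads are never lost.  */
    while (!atomic_compare_exchange_weak_explicit (cancelhandling,
                                                   &old, old | bits,
                                                   memory_order_acquire,
                                                   memory_order_relaxed))
      ;
  }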
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
commit 8804157ad9
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Fri Apr 15 12:27:59 2022 -0500
x86: Optimize memcmp SSE2 in memcmp.S
The above commit only defined wmemcmp and missed __wmemcmp. This
commit fixes that by defining __wmemcmp and setting wmemcmp as a weak
alias of __wmemcmp.
Both multiarch and disable-multiarch builds succeed and full xchecks
pass.
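In plain C, the aliasing looks like this sketch (the actual change
is in the x86 assembly; the loop body is illustrative only):

  #include <wchar.h>

  int
  __wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
  {
    for (; n > 0; s1++, s2++, n--)
      if (*s1 != *s2)
        return *s1 < *s2 ? -1 : 1;
    return 0;
  }
  extern __typeof (__wmemcmp) wmemcmp
    __attribute__ ((weak, alias ("__wmemcmp")));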
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
After 73fc4e28b9, __libc_enable_secure_decided is always 0, and a
statically linked executable may overwrite __libc_enable_secure
without considering AT_SECURE.
__libc_enable_secure is already correctly initialized in _dl_aux_init,
so just remove __libc_enable_secure_decided and __libc_init_secure.
This allows us to remove some startup_get*id functions from
22b79ed7f4.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The code didn't actually use any SSE4 instructions, since `ptest` was
removed in:
commit 2f9062d717
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Nov 10 16:18:56 2021 -0600
x86: Shrink memcmp-sse4.S code size
The new memcmp-sse2 implementation is also faster.
geometric_mean(N=20) of page cross cases SSE2 / SSE4: 0.905
Note there are two regressions when preferring SSE2: Size = 1 and
Size = 65.
Size = 1:
size, align0, align1, ret, New Time/Old Time
1, 1, 1, 0, 1.2
1, 1, 1, 1, 1.197
1, 1, 1, -1, 1.2
This is intentional. Based on profiles of GCC11 and Python3, Size = 1
is significantly less hot than sizes [4, 8] (which are made hotter):
Python3 Size = 1 -> 13.64%
Python3 Size = [4, 8] -> 60.92%
GCC11 Size = 1 -> 1.29%
GCC11 Size = [4, 8] -> 33.86%
size, align0, align1, ret, New Time/Old Time
4, 4, 4, 0, 0.622
4, 4, 4, 1, 0.797
4, 4, 4, -1, 0.805
5, 5, 5, 0, 0.623
5, 5, 5, 1, 0.777
5, 5, 5, -1, 0.802
6, 6, 6, 0, 0.625
6, 6, 6, 1, 0.813
6, 6, 6, -1, 0.788
7, 7, 7, 0, 0.625
7, 7, 7, 1, 0.799
7, 7, 7, -1, 0.795
8, 8, 8, 0, 0.625
8, 8, 8, 1, 0.848
8, 8, 8, -1, 0.914
9, 9, 9, 0, 0.625
Size = 65:
size, align0, align1, ret, New Time/Old Time
65, 0, 0, 0, 1.103
65, 0, 0, 1, 1.216
65, 0, 0, -1, 1.227
65, 65, 0, 0, 1.091
65, 0, 65, 1, 1.19
65, 65, 65, -1, 1.215
This is because A) the checks in the range [65, 96] are now unrolled
2x, and B) smaller values <= 16 are now given a hotter path. By
contrast, the SSE4 version has a branch for Size = 80. The unrolled
version gets better performance for returns which need both
comparisons.
size, align0, align1, ret, New Time/Old Time
128, 4, 8, 0, 0.858
128, 4, 8, 1, 0.879
128, 4, 8, -1, 0.888
As well, in environments that are not fully predictable (unlike
microbenchmarks), the branch will have a real cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The new code saves size (-303 bytes) and has significantly better
performance.
geometric_mean(N=20) of page cross cases New / Original: 0.634
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
It also handles the highly unlikely case where localtime might return
NULL; in this case only the PRI is set, to hopefully instruct the
relay to get the TIMESTAMP (as defined by the RFC).
Checked on x86_64-linux-gnu and i686-linux-gnu.
There is no easy solution, as described in the first comment in the
bug report, and some code (like busybox) assumes that facilitynames
exists when SYSLOG_NAMES is defined (so we can't just remove it, as
suggested in comment #2).
So use the easier solution and guard it with __USE_MISC.
A fixed-size buffer is used instead of a memstream for messages up
to 1024 bytes, to avoid the potential BUFSIZ (8K) malloc and free on
each syslog call.
Also, since the buffer size is then known, the memstream is replaced
with a malloced buffer for larger messages.
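A sketch of the two-tier strategy (hypothetical helper, not the
glibc code):

  #include <stdarg.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Format into the caller's fixed buffer first; only messages that
     do not fit pay for a malloc, sized exactly from the vsnprintf
     return value.  */
  static char *
  format_message (char *fixed, size_t fixed_len, const char *fmt,
                  va_list ap)
  {
    va_list ap2;
    va_copy (ap2, ap);
    int n = vsnprintf (fixed, fixed_len, fmt, ap);
    char *buf = fixed;
    if (n >= 0 && (size_t) n >= fixed_len)
      {
        buf = malloc ((size_t) n + 1);
        if (buf != NULL)
          vsnprintf (buf, (size_t) n + 1, fmt, ap2);
        else
          buf = fixed;          /* fall back to the truncated text */
      }
    va_end (ap2);
    return buf;
  }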
Checked on x86_64-linux-gnu.
Use a temporary buffer for strftime instead of internal libio
members, simplify the fprintf call on the memstream and the memory
allocation, use %b instead of %h, and use dprintf instead of writev
for LOG_PERROR.
Checked on x86_64-linux-gnu and i686-linux-gnu.
The tests cover:
- All possible priorities and facilities through TCP and UDP.
- The same syslog tests for vsyslog.
- Some openlog/syslog/close combinations.
- openlog with LOG_CONS, LOG_PERROR, and LOG_PID.
Internally this is done with a test-container where the main process
mimics the syslog server interface.
The tests do not cover multithreaded and async-signal usage.
Checked on x86_64-linux-gnu.
The loader does not need to pull in all the __get_errlist
definitions, and its size is decreased:
Before:
$ size elf/ld.so
text data bss dec hex filename
197774 11024 456 209254 33166 elf/ld.so
After:
$ size elf/ld.so
text data bss dec hex filename
191510 9936 456 201902 314ae elf/ld.so
Checked on x86_64-linux-gnu.
The goal is to remove most SSSE3 functions, as SSE4, AVX2, and EVEX
are generally preferable. memcpy/memmove is one exception, where
avoiding unaligned loads with `palignr` is important for some
targets.
This commit replaces memmove-ssse3 with a better optimized and lower
code footprint version. It also aliases memcpy to memmove.
Aside from this function, all other SSSE3 functions should be safe
to remove.
The performance is not changed drastically, although it shows overall
improvements without any major regressions or gains.
bench-memcpy geometric_mean(N=50) New / Original: 0.957
bench-memcpy-random geometric_mean(N=50) New / Original: 0.912
bench-memcpy-large geometric_mean(N=50) New / Original: 0.892
Benchmarks were run on a Zhaoxin KX-6840@2000MHz. See attached
numbers for all results.
More importantly, this saves 7246 bytes of code size in memmove, and
an additional 10741 bytes by reusing the memmove code for memcpy
(17987 bytes saved in total), as well as an additional 896 bytes of
rodata for the jump table entries.
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>