glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-04 19:00:09 +00:00

Author	SHA1	Message	Date
Noah Goldstein	9b7cfab180	x86_64: Add evex optimized __memcmpeq in memcmpeq-evex.S No bug. This commit adds new optimized __memcmpeq implementation for evex. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.	2021-10-27 13:03:46 -05:00
Noah Goldstein	b4ed69ba16	x86_64: Add avx2 optimized __memcmpeq in memcmpeq-avx2.S No bug. This commit adds new optimized __memcmpeq implementation for avx2. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.	2021-10-27 13:03:46 -05:00
Noah Goldstein	fa7f63d8d6	x86_64: Add sse2 optimized __memcmpeq in memcmp-sse2.S No bug. This commit does not modify any of the memcmp implementation. It just adds __memcmpeq ifdefs to skip obvious cases where computing the proper 1/-1 required by memcmp is not needed.	2021-10-27 13:03:46 -05:00
Noah Goldstein	cf4fd28ea4	x86_64: Add support for __memcmpeq using sse2, avx2, and evex No bug. This commit adds support for __memcmpeq to be implemented separately from memcmp. Support is added for versions optimized with sse2, avx2, and evex.	2021-10-27 13:03:46 -05:00
Noah Goldstein	cf3acd774f	Benchtests: Add benchtests for __memcmpeq No bug. This commit adds __memcmpeq benchmarks. The benchmarks just use the existing ones in memcmp. This will be useful for testing implementations of __memcmpeq that do not just alias memcmp.	2021-10-27 13:03:46 -05:00
Noah Goldstein	3592ccd472	String: Add __memcmpeq as build target No bug. This commit just adds __memcmpeq as a build target so that implementations for __memcmpeq that are not just aliases to memcmp can be supported.	2021-10-27 13:03:46 -05:00
Noah Goldstein	11c88336e3	NEWS: Add item for __memcmpeq	2021-10-26 16:51:29 -05:00
Noah Goldstein	d9283b71ac	String: Add tests for __memcmpeq No bug. This commit adds tests for the new function __memcmpeq. The new tests use the existing tests in 'test-memcmp.c' but relax the result requirement to only check for zero or non-zero returns. All string tests, including test-memcmpeq, are passing.	2021-10-26 16:51:29 -05:00
Noah Goldstein	9894127d20	String: Add hidden defs for __memcmpeq() to enable internal usage No bug. This commit adds hidden defs for all declarations of __memcmpeq. This enables usage of __memcmpeq without the PLT for usage internal to GLIBC.	2021-10-26 16:51:29 -05:00
Noah Goldstein	44829b3ddb	String: Add support for __memcmpeq() ABI on all targets No bug. This commit adds support for __memcmpeq() as a new ABI for all targets. In this commit __memcmpeq() is implemented only as an alias to the corresponding target's memcmp() implementation. __memcmpeq() is added as a new symbol starting with GLIBC_2.35 and defined in string.h with comments explaining its behavior. Basic tests that it is callable and works were added in string/tester.c. As discussed in the proposal "Add new ABI '__memcmpeq()' to libc", __memcmpeq() is essentially a reserved namespace for bcmp(). This means it shares the same specifications as memcmp() except the return value for non-equal byte sequences is any non-zero value. This is less strict than memcmp()'s return value specification and can be better optimized when a boolean return is all that is needed. __memcmpeq() is meant to only be called by compilers if they can prove that the return value of a memcmp() call is only used for its boolean value. All tests in string/tester.c passed. The build also succeeds on the x86_64-linux-gnu target.	2021-10-26 16:51:29 -05:00
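The relaxed return contract described in the row above can be illustrated with a minimal C sketch. The function name memcmpeq_sketch is hypothetical and is not glibc's implementation; it only shows what "zero when equal, any non-zero value otherwise" permits, namely skipping the search for the first mismatching byte and its ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the boolean-only contract: returns 0 when the
   two buffers are equal and some non-zero value otherwise.  Unlike memcmp,
   it never has to locate or order the first mismatching byte.  */
static int memcmpeq_sketch(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    unsigned char diff = 0;

    assert(n == 0 || (a != NULL && b != NULL));

    /* Accumulate every byte difference; any set bit means "not equal".  */
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(pa[i] ^ pb[i]);

    return diff;
}
```

A caller that only tests the result against zero, for example `if (memcmpeq_sketch(x, y, n) == 0)`, is the kind of use the commit message says compilers may redirect away from memcmp().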
Fangrui Song	8438135d34	configure: Don't check LD -v --help for LIBC_LINKER_FEATURE When LIBC_LINKER_FEATURE is used to check a linker option with the equal sign, it will likely fail because the LD -v --help output may look like `-z lam-report=[none|warning|error]` while the needle is something like `-z lam-report=warning`. The LD -v --help filter doesn't save much time, so just remove it.	2021-10-25 13:17:44 -07:00
H.J. Lu	f9b152c83f	elf: Make global.out depend on reldepmod4.so [BZ #28457 ] The global test is linked with globalmod1.so which dlopens reldepmod4.so. Make global.out depend on reldepmod4.so. This fixes BZ #28457. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-10-25 07:13:54 -07:00
Noah Goldstein	bad852b61b	x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'. It could potentially be dangerous to use SSE2 if this function is ever called without using 'vzeroupper' beforehand. While compilers appear to use 'vzeroupper' before function calls if AVX2 has been used, using SSE2 here is more brittle. Since it is not absolutely necessary it should be avoided. It costs 2 extra bytes but the extra bytes should only eat into alignment padding. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-23 13:02:42 -05:00
H.J. Lu	d8e7d06381	bench-math: Sort and put each bench per line Sort and put each math bench per line to prepare for new math benches. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-10-23 05:20:25 -07:00
Sunil K Pandey	4f690aad9e	x86_64: Add missing libmvec ABI tests Add vector ABI tests for cos, exp, log, pow and sin functions. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-22 06:46:49 -07:00
Adhemerval Zanella	927246e188	elf: Fix `e6fd79f379` build with --enable-tunables=no The _dl_sort_maps_init() is not defined when tunables is not enabled. Checked on x86_64-linux-gnu.	2021-10-21 17:26:32 -03:00
Chung-Lin Tang	15a0c5730d	elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645 ) This second patch contains the actual implementation of a new sorting algorithm for shared objects in the dynamic loader, which solves the slow behavior that the current "old" algorithm falls into when the DSO set contains circular dependencies. The new algorithm implemented here is simply depth-first search (DFS) to obtain the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1 bitfield is added to struct link_map to more elegantly facilitate such a search. The DFS algorithm is applied to the input maps[nmap-1] backwards towards maps[0]. This has the effect of a more "shallow" recursion depth in general since the input is in BFS. Also, when combined with the natural order of processing l_initfini[] at each node, this creates a resulting output sorting closer to the intuitive "left-to-right" order in most cases. Another notable implementation adjustment related to this _dl_sort_maps change is the removing of two char arrays 'used' and 'done' in _dl_close_worker to represent two per-map attributes. This has been changed to simply use two new bit-fields l_map_used:1, l_map_done:1 added to struct link_map. This also allows discarding the clunky 'used' array sorting that _dl_sort_maps had to sometimes do along the way. Tunable support for switching between different sorting algorithms at runtime is also added. A new tunable 'glibc.rtld.dynamic_sort' with current valid values 1 (old algorithm) and 2 (new DFS algorithm) has been added. At time of commit of this patch, the default setting is 1 (old algorithm). Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-21 11:23:53 -03:00
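The sorting idea named in the row above, a depth-first search that emits nodes in reverse post-order, can be sketched in C as follows. This is an illustration under stated assumptions, not the loader's code: struct graph, MAX_NODES, visit and rpo_sort are hypothetical names, and a small adjacency matrix stands in for struct link_map and l_initfini[]. The visited array plays the role the message assigns to the l_visited bit, which is what keeps the search from looping on circular dependencies.

```c
#include <assert.h>

#define MAX_NODES 16

/* Hypothetical dependency graph: dep[u][v] != 0 means object u depends
   on object v.  */
struct graph {
    int n;
    int dep[MAX_NODES][MAX_NODES];
};

/* Depth-first visit.  A node is written only after all of its unvisited
   dependencies, and the output is filled from the back, so the finished
   array is in reverse post-order.  */
static void visit(const struct graph *g, int u, int *visited,
                  int *order, int *next)
{
    visited[u] = 1;
    for (int v = 0; v < g->n; v++)
        if (g->dep[u][v] && !visited[v])
            visit(g, v, visited, order, next);
    order[--*next] = u;
}

/* Walk the input from the last node back to the first, as the commit
   message describes, starting a search at every node not yet visited.  */
static void rpo_sort(const struct graph *g, int *order)
{
    int visited[MAX_NODES] = {0};
    int next = g->n;

    assert(g->n >= 0 && g->n <= MAX_NODES);

    for (int u = g->n - 1; u >= 0; u--)
        if (!visited[u])
            visit(g, u, visited, order, &next);
}
```

For an acyclic graph every object ends up before the objects it depends on; for a cycle the sketch still terminates and produces some permutation, though it makes no claim to match the exact order the dynamic loader chooses.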
Chung-Lin Tang	e6fd79f379	elf: Testing infrastructure for ld.so DSO sorting (BZ #17645 ) This is the first of a 2-part patch set that fixes slow DSO sorting behavior in the dynamic loader, as reported in BZ #17645. In order to facilitate such a large modification to the dynamic loader, this first patch implements a testing framework for validating shared object sorting behavior, to enable comparison between old/new sorting algorithms, and any later enhancements. This testing infrastructure consists of a Python script 'scripts/dso-ordering-test.py' which takes in a description language, consisting of strings that describe a set of link dependency relations between DSOs, and generates testcase programs and Makefile fragments to automatically test the described situation, for example: a->b->c->d # four objects linked one after another a->[bc]->d;b->c # a depends on b and c, which both depend on d, # b depends on c (b,c linked to object a in fixed order) a->b->c;{+a;%a;-a} # a, b, c serially dependent, main program uses # dlopen/dlsym/dlclose on object a a->b->c;{}!->[abc] # a, b, c serially dependent; multiple tests generated # to test all permutations of a, b, c ordering linked # to main program (Above is just a short description of what the script can do, more documentation is in the script comments.) Two files containing several new tests, elf/dso-sort-tests-[12].def are added, including test scenarios for BZ #15311 and Redhat issue #1162810 [1]. Due to the nature of dynamic loader tests, where the sorting behavior and test output occurs before/after main(), generating testcases to use support/test-driver.c does not suffice to control meaningful timeout for ld.so. Therefore a new utility program 'support/test-run-command', based on test-driver.c/support_test_main.c has been added. This does the same testcase control, but for a program specified through a command-line rather than at the source code level. This utility is used to run the dynamic loader testcases generated by dso-ordering-test.py. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1162810 Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-21 11:23:53 -03:00
Stafford Horne	0ff2d30dae	iconv: Use TIMEOUTFACTOR for iconv test timeout Currently the timeout for each iconv test is hard coded to 3 seconds. On my OpenRISC test platform this is too slow and the test fails with a HANG error. This change uses the available TIMEOUTFACTOR to compute the timeout. The default value is still 3. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-21 11:11:07 -03:00
Adhemerval Zanella	4e32c8f568	posix: Remove alloca usage for internal fnmatch implementation This patch replaces the internal fnmatch pattern list generation to use a dynamic array. Checked on x86_64-linux-gnu.	2021-10-21 10:30:31 -03:00
Jonathan Wakely	8a9a593115	Add alloc_align attribute to memalign et al GCC 4.9.0 added the alloc_align attribute to say that a function argument specifies the alignment of the returned pointer. Clang supports the attribute too. Using the attribute can allow a compiler to generate better code if it knows the returned pointer has a minimum alignment. See https://gcc.gnu.org/PR60092 for more details. GCC implicitly knows the semantics of aligned_alloc and posix_memalign, but not the obsolete memalign. As a result, GCC generates worse code when memalign is used, compared to aligned_alloc. Clang knows about aligned_alloc and memalign, but not posix_memalign. This change adds a new __attribute_alloc_align__ macro to <sys/cdefs.h> and then uses it on memalign (where it helps GCC) and aligned_alloc (where GCC and Clang already know the semantics, but it doesn't hurt) and xposix_memalign. It can't be used on posix_memalign because that doesn't return a pointer (the allocated pointer is returned via a void** parameter instead). Unlike the alloc_size attribute, alloc_align only allows a single argument. That means the new __attribute_alloc_align__ macro doesn't really need to be used with double parentheses to protect a comma between its arguments. For consistency with __attribute_alloc_size__ this patch defines it the same way, so that double parentheses are required. Signed-off-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>	2021-10-21 00:19:20 +01:00
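As a rough illustration of the attribute discussed in the row above, the C sketch below applies alloc_align to a wrapper whose semantics the compiler would not otherwise know. SKETCH_ALLOC_ALIGN and aligned_buffer are hypothetical names introduced here; the sketch deliberately does not use glibc's __attribute_alloc_align__ macro, which per the message is written with double parentheses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper macro: expands to the attribute on GCC and Clang,
   and to nothing on other compilers.  */
#if defined(__GNUC__) || defined(__clang__)
# define SKETCH_ALLOC_ALIGN(n) __attribute__((alloc_align(n)))
#else
# define SKETCH_ALLOC_ALIGN(n)
#endif

/* Parameter 1 (alignment) states the minimum alignment of the returned
   pointer, which is exactly what alloc_align(1) tells the compiler.  */
SKETCH_ALLOC_ALIGN(1)
static void *aligned_buffer(size_t alignment, size_t size)
{
    /* The alignment must be a non-zero power of two.  */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Round the size up so it is a multiple of the alignment, which the
       original C11 wording of aligned_alloc requires.  */
    size_t rounded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, rounded);
}
```

With the attribute in place, a call such as `aligned_buffer(64, 100)` lets the compiler assume a 64-byte-aligned result; the caller releases the buffer with free().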
Fangrui Song	aa783f9a7b	linux: Fix a possibly non-constant expression in _Static_assert According to C11 6.6p6, `const int` as an operand may not make up a constant expression. GCC -O0 errors: ../sysdeps/unix/sysv/linux/opendir.c:107:19: error: static_assert expression is not an integral constant expression _Static_assert (allocation_size >= sizeof (struct dirent64), -O2 -Wpedantic has a similar warning. See https://gcc.gnu.org/PR102502 for GCC's inconsistency. Use enum which is guaranteed to be a constant expression. This also makes the file compilable with Clang. Fixes: `4b962c9e85` ("linux: Simplify opendir buffer allocation")	2021-10-20 14:22:43 -07:00
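The fix described in the row above relies on an enumeration constant being an integer constant expression, which a `const` object is not guaranteed to be. A minimal C sketch of the pattern follows; struct record, buffer_size and records_per_buffer are hypothetical names, not the ones used in opendir.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type standing in for struct dirent64.  */
struct record {
    char payload[24];
};

/* A declaration such as
       static const size_t buffer_size = 32;
   names an object, which per C11 6.6p6 need not form an integer constant
   expression, so using it in _Static_assert may be rejected.  An
   enumeration constant always qualifies.  */
enum { buffer_size = 32 };

_Static_assert(buffer_size >= sizeof(struct record),
               "buffer must hold at least one record");

/* The enum constant is also usable in ordinary expressions.  */
static size_t records_per_buffer(void)
{
    return buffer_size / sizeof(struct record);
}
```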
H.J. Lu	d962cce139	x86-64: Add sysdeps/x86_64/fpu/Makeconfig 1. Add sysdeps/x86_64/fpu/Makeconfig to auto-generate libmvec.mk, which contains libmvec ABI test dependencies and CFLAGS, in the build directory. 2. Include libmvec.mk for libmvec ABI test dependencies and CFLAGS. Tested on SSE4, AVX, AVX2 and AVX512 machines. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-10-20 11:53:45 -07:00
Romain GEISSLER	e037274c8e	stdlib: Fix tst-canon-bz26341 when the glibc build current working directory is itself using symlinks.	2021-10-20 12:01:40 -03:00
Adhemerval Zanella	82fd7314c7	powerpc: Remove backtrace implementation The powerpc optimization to provide a fast stacktrace requires some ad-hoc code to handle Linux signal frames and the change is fragile once the kernel decides to slightly change its execution sequence [1]. The generic implementation works as-is and it should be future proof since the kernel provides the expected CFI directives in the vDSO shared page. Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and powerpc64-linux-gnu. [1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html	2021-10-20 10:40:53 -03:00
Joseph Myers	2c6cabb3a4	Correct access attribute on memfrob (bug 28475) As noted in bug 28475, the access attribute on memfrob in <string.h> is incorrect: the function both reads and writes the memory pointed to by its argument, so it needs to use __read_write__, not __write_only__. This incorrect attribute results in a build failure for accessing uninitialized memory for s390x-linux-gnu-O3 with build-many-glibcs.py using GCC mainline. Correct the attribute. Fixing this shows up that some calls to memfrob in elf/ tests are reading uninitialized memory; I'm not entirely sure of the purpose of those calls, but guess they are about ensuring that the stack space is indeed allocated at that point in the function, and so it matters that they are calling a function whose semantics are unknown to the compiler. Thus, change the first memfrob call in those tests to use explicit_bzero instead, as suggested by Florian in <https://sourceware.org/pipermail/libc-alpha/2021-October/132119.html>, to avoid the use of uninitialized memory. Tested for x86_64, and with build-many-glibcs.py (GCC mainline) for s390x-linux-gnu-O3.	2021-10-20 13:38:50 +00:00
Siddhesh Poyarekar	ad6f2a010c	debug: Add tests for _FORTIFY_SOURCE=3 Add some testing coverage for _FORTIFY_SOURCE=3. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-20 18:13:05 +05:30
Siddhesh Poyarekar	a643f60c53	Make sure that the fortified function conditionals are constant In _FORTIFY_SOURCE=3, the size expression may be non-constant, resulting in branches in the inline functions remaining intact and causing a tiny overhead. Clang (and in future, gcc) make sure that the -1 case is always safe, i.e. any comparison of the generated expression with (size_t)-1 is always false so that bit is taken care of. The rest is avoidable since we want the _chk variant whenever we have a size expression and it's not -1. Rework the conditionals in a uniform way to clearly indicate two conditions at compile time: - Either the size is unknown (-1) or we know at compile time that the operation length is less than the object size. We can call the original function in this case. It could be that either the length, object size or both are non-constant, but the compiler, through range analysis, is able to fold the comparison to a constant. - The size and length are known and the compiler can see at compile time that operation length > object size. This is valid grounds for a warning at compile time, followed by emitting the _chk variant. For everything else, emit the _chk variant. This simplifies most of the fortified function implementations and at the same time, ensures that only one call from _chk or the regular function is emitted. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-20 18:12:41 +05:30
Siddhesh Poyarekar	e938c02748	Don't add access size hints to fortifiable functions In the context of a function definition, the size hints imply that the size of an object pointed to by one parameter is another parameter. This doesn't make sense for the fortified versions of the functions since that's the bit it's trying to validate. This is harmless with __builtin_object_size since it has fairly simple semantics when it comes to objects passed as function parameters. With __builtin_dynamic_object_size we could (as my patchset for gcc[1] already does) use the access attribute to determine the object size in the general case but it misleads the fortified functions. Basically the problem occurs when access attributes are present on regular functions that have inline fortified definitions to generate _chk variants; the attributes get inherited by these definitions, causing problems when analyzing them. For example with poll(fds, nfds, timeout), nfds is hinted using the __attr_access as being the size of fds. Now, when analyzing the inline function definition in bits/poll2.h, the compiler sees that nfds is the size of fds and tries to use that information in the function body. In _FORTIFY_SOURCE=3 case, where the object size could be a non-constant expression, this information results in the conclusion that nfds is the size of fds, which defeats the purpose of the implementation because we're trying to check here if nfds does indeed represent the size of fds. Hence for this case, it is best to not have the access attribute. With the attributes gone, the expression evaluation should get delayed until the function is actually inlined into its destinations. Disable the access attribute for fortified function inline functions when building at _FORTIFY_SOURCE=3 to make this work better. The access attributes remain for the _chk variants since they can be used by the compiler to warn when the caller is passing invalid arguments. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581125.html Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-10-20 08:33:31 +05:30
Fangrui Song	46baeb61e1	glibcextract.py: Place un-assemblable @@@ in a comment Unlike GCC, Clang parses asm statements and verifies they are valid instructions/directives. Place the magic @@@ into a comment to avoid a parse error.	2021-10-19 09:58:16 -07:00
Fangrui Song	53d19edf7b	nss: Unnest nested function add_key This makes makedb.c compilable with Clang which does not support nested functions. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-19 09:36:43 -07:00
H.J. Lu	2ec99d8c42	ld.so: Initialize bootstrap_map.l_ld_readonly [BZ #28340 ] 1. Define DL_RO_DYN_SECTION to initialize bootstrap_map.l_ld_readonly before calling elf_get_dynamic_info to get dynamic info in bootstrap_map, 2. Define a single static inline bool dl_relocate_ld (const struct link_map *l) { /* Don't relocate dynamic section if it is readonly */ return !(l->l_ld_readonly || DL_RO_DYN_SECTION); } This updates BZ #28340 fix.	2021-10-19 06:40:38 -07:00
Stafford Horne	1d550265a7	timex: Use 64-bit fields on 32-bit TIMESIZE=64 systems (BZ #28469 ) This was found when testing the OpenRISC port I am working on. These two tests fail with SIGSEGV: FAIL: misc/tst-ntp_gettime FAIL: misc/tst-ntp_gettimex This was found to be due to the kernel overwriting the stack space allocated by the timex structure. The reason for the overwrite being that the kernel timex has 64-bit fields and user space code only allocates enough stack space for timex with 32-bit fields. On 32-bit systems with TIMESIZE=64 __USE_TIME_BITS64 is not defined. This causes the timex structure to use 32-bit fields with type __syscall_slong_t. This patch adjusts the ifdef condition to allow 32-bit systems with TIMESIZE=64 to use the 64-bit long long timex definition. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-18 17:17:20 -03:00
Stafford Horne	ad6feef1b0	manual: Update _TIME_BITS to clarify it's user defined The current language reads "This macro determines...", changing to "Define this macro...". This is consistent with other feature macro documentation language. When I first read the previous language it seemed to indicate that the macro is already defined. By changing the language to "Define this macro..." it's clear that it's the user's responsibility to define it.	2021-10-18 13:31:15 -03:00
Stafford Horne	06acd6d1d6	nptl: Fix tst-cancel7 and tst-cancelx7 pidfile race The check for waiting for the pidfile to be created looks wrong. At the point when ACCESS is run the pid file will always be created and accessible as it is created during DO_PREPARE. This means that thread cancellation may be performed before the pid is written to the pidfile. This was found to be flaky when testing on my OpenRISC platform. Fix this by using the semaphore to wait for pidfile pid write completion. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-18 13:30:10 -03:00
Adhemerval Zanella	5118dcac68	elf: Fix elf_get_dynamic_info() for bootstrap Commit `d6d89608ac` broke powerpc for --enable-bind-now because it turned out that, contrary to the patch's assumption, rtld's elf_get_dynamic_info() does need to handle RTLD_BOOTSTRAP to avoid DT_FLAGS and DT_RUNPATH (more specifically the GLRO usage, which is not relocated yet). This patch fixes it by passing two arguments to elf_get_dynamic_info() to indicate whether it is called by rtld (bootstrap) or by static pie initialization (static_pie_bootstrap). I think using explicit arguments is much clearer than buried C preprocessor conditionals, and the compiler should remove the dead code. I checked on x86_64 and i686 with default options, --enable-bind-now, and --enable-bind-now plus --enable-static-pie. I also checked on aarch64, armhf, powerpc64, and powerpc with default and --enable-bind-now.	2021-10-18 09:51:56 -03:00
Samuel Thibault	1d3decee99	hurd if_index: Explicitly use AF_INET for if index discovery `5bf07e1b3a` ("Linux: Simplify __opensock and fix race condition [BZ #28353]") made __opensock try NETLINK then UNIX then INET. On the Hurd, only INET knows about network interfaces, so better actually specify that in if_index.	2021-10-18 01:39:02 +02:00
Samuel Thibault	1d20f33ff4	hurd: Fix intr-msg parameter/stack kludge INTR_MSG_TRAP was tinkering with esp to make it point to _hurd_intr_rpc_mach_msg's parameters, and notably use (&msg)[-1] which is meaningless in C. Instead, just push the parameters on the stack, which also avoids leaving local variables of _hurd_intr_rpc_mach_msg below esp. We now also properly express that OPTION and TIMEOUT may be updated during the trap call.	2021-10-18 00:50:41 +02:00
H.J. Lu	9d3c9a046a	x86-64: Add test-vector-abi.h/test-vector-abi-sincos.h Add templates for vector ABI test and use them for vector sincos/sincosf ABI tests.	2021-10-14 11:59:12 -07:00
Adhemerval Zanella	d6d89608ac	elf: Fix dynamic-link.h usage on rtld.c The `4af6982e4c` fix does not fully handle RTLD_BOOTSTRAP usage on rtld.c due to two issues: 1. RTLD_BOOTSTRAP is also used on dl-machine.h on various architectures and it changes the semantics of various machine relocation functions. 2. The elf_get_dynamic_info() change was done sideways: prior to `490e6c62aa`, get-dynamic-info.h was included by the first dynamic-link.h include without RTLD_BOOTSTRAP being defined. It means that the code within elf_get_dynamic_info() that uses RTLD_BOOTSTRAP is in fact unused. To fix 1. this patch now includes dynamic-link.h only once with RTLD_BOOTSTRAP defined. The ELF_DYNAMIC_RELOCATE call will now have the relocation functions with the expected semantics for the loader. And to fix 2. part of `4af6982e4c` is reverted (the check argument elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP pieces are removed. To reorganize the includes the static TLS definition is moved to its own header to avoid a circular dependency (it is defined on dynamic-link.h and dl-machine.h requires it at the same time other dynamic-link.h definitions require dl-machine.h definitions). Also ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL are moved to their own header. Only ancient ABIs need special values (arm, i386, and mips), so a generic one is used as default. The powerpc Elf64_FuncDesc is also moved to its own header, since csu code required its definition (which would require either including the elf/ folder or adding a full path with elf/). Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32, and powerpc64le. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-10-14 14:52:07 -03:00
Noah Goldstein	e59ced2384	x86: Optimize memset-vec-unaligned-erms.S No bug. The optimizations are: 1. Change control flow for L(more_2x_vec) to fall through to loop and jump for L(less_4x_vec) and L(less_8x_vec). This uses less code size and saves jumps for length > 4x VEC_SIZE. 2. For EVEX/AVX512 move L(less_vec) closer to entry. 3. Avoid complex address mode for length > 2x VEC_SIZE. 4. Slightly better aligning code for the loop from the perspective of code size and uops. 5. Align targets so they make full use of their fetch block and if possible cache line. 6. Try and reduce total number of icache lines that will need to be pulled in for a given length. 7. Include "local" version of stosb target. For AVX2/EVEX/AVX512 jumping to the stosb target in the sse2 code section will almost certainly be to a new page. The new version does increase code size marginally by duplicating the target but should get better iTLB behavior as a result. test-memset, test-wmemset, and test-bzero are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-12 13:38:02 -05:00
Noah Goldstein	1bd8b8d58f	x86: Optimize memcmp-evex-movbe.S for frontend behavior and size No bug. The frontend optimizations are to: 1. Reorganize logically connected basic blocks so they are either in the same cache line or adjacent cache lines. 2. Avoid cases when basic blocks unnecessarily cross cache lines. 3. Try and 32 byte align any basic blocks possible without sacrificing code size. Smaller / Less hot basic blocks are used for this. Overall code size shrunk by 168 bytes. This should make up for any extra costs due to aligning to 64 bytes. In general performance before deviated a great deal depending on whether entry alignment % 64 was 0, 16, 32, or 48. These changes essentially make it so that the current implementation is at least equal to the best alignment of the original for any arguments. The only additional optimization is in the page cross case. Branch on equals case was removed from the size == [4, 7] case. The [4, 7] and [2, 3] cases were also swapped, as [4, 7] is likely a more hot argument size. test-memcmp and test-wmemcmp are both passing.	2021-10-12 12:02:12 -05:00
Stafford Horne	8faa1e0449	libio: Update tst-wfile-sync to not depend on stdin The test expects stdin to be a file which is not the case when running tests over ssh where stdin is piped in. The test fails with: error: xlseek.c:27: lseek64 (0, 0, 1): Illegal seek Update the test to create a temporary file and use that to perform the test. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-12 13:25:43 -03:00
Stafford Horne	171ab1af56	elf: Update audit tests to not depend on stdout The tst-audit14, tst-audit15 and tst-audit16 tests all have audit modules that write to stdout; the test reads from stdout to confirm what was written. This assumes that stdout is a file, which is not the case when run over ssh. This patch updates the tests to use a post run cmp command to compare the output against an .exp file. This is similar to how many other tests work and it fixes the stdout limitation. Also, this means the test code can be greatly simplified. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-12 13:25:43 -03:00
Adhemerval Zanella	4af6982e4c	elf: Fix elf_get_dynamic_info definition Prior to `490e6c62aa` ('elf: Avoid nested functions in the loader [BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on the first dynamic-link.h include and later within _dl_start(). The former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used on setup_vdso() (since it is a global definition), while the latter does define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation. With the commit change, the function is now included and defined once instead of defined as a nested function. So rtld.c defines it without defining RTLD_BOOTSTRAP, and this broke at least powerpc32. This patch fixes it by moving the get-dynamic-info.h include out of dynamic-link.h, so that the caller can correctly set the expected semantics by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or RESOLVE_MAP. It was also required to enable some asserts only for the loader bootstrap to avoid issues when called from setup_vdso(). As a side note, this is another issue with nested functions: it is not clear from pre-processed output (-E -dD) how the function will be built and its semantics (since the nested function will be local and extra C defines may change it). I checked on x86_64-linux-gnu (w/o --enable-static-pie), i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4, aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and s390x-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com>	2021-10-12 13:25:43 -03:00
Joseph Myers	de82cb0da4	Add TEST_COMPARE_STRING_WIDE to support/check.h I'd like to be able to test narrow and wide string interfaces, with the narrow string tests using TEST_COMPARE_STRING and the wide string tests using something analogous (possibly generated using macros from a common test template for both the narrow and wide string tests where appropriate). Add such a TEST_COMPARE_STRING_WIDE, along with functions support_quote_blob_wide and support_test_compare_string_wide that it builds on. Those functions are built using macros from common templates shared by the narrow and wide string implementations, though I didn't do that for the tests of test functions. In support_quote_blob_wide, I chose to use the \x{} delimited escape sequence syntax proposed for C2X in N2785, rather than e.g. trying to generate the end of a string and the start of a new string when ambiguity would result from undelimited \x (when the next character after such an escape sequence is valid hex) or forcing an escape sequence to be used for the next character in the case of such ambiguity. Tested for x86_64.	2021-10-12 13:48:39 +00:00
Joseph Myers	4912c738fc	Fix nios2 localplt failure Building for nios2-linux-gnu has recently started showing a localplt test failure, arising from a reference to __floatunsidf from getloadavg after commit `b5c8a3aa82` ("Linux: implement getloadavg(3) using sysinfo(2)") (this is an architecture with soft-fp in libc). Add this as a permitted local PLT reference in localplt.data. Tested with build-many-glibcs.py for nios2-linux-gnu.	2021-10-11 21:47:32 +00:00
Fangrui Song	bf433b849a	elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) Intel MPX failed to gain wide adoption and has been deprecated for a while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in 2019. This patch removes the support code from the dynamic loader. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-11 11:14:02 -07:00
Martin Sebor	eb73b87897	resolv: Avoid GCC 12 false positive warning [BZ #28439 ]. Replace a call to sprintf with an equivalent pair of stpcpy/strcpy calls to avoid a GCC 12 -Wformat-overflow false positive due to recent optimizer improvements.	2021-10-11 09:36:57 -06:00
Noah Goldstein	5d26d12f4a	benchtests: Add medium cases and increase iters in bench-memset.c No bug. This commit adds new medium size cases for lengths in [512, 1024). It also increases the iters to INNER_LOOP_ITERS_LARGE for more reliable results. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-10-08 15:13:06 -05:00

37975 Commits