glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-30 00:31:08 +00:00

Author	SHA1	Message	Date
Sunil K Pandey	8ab8afb336	x86-64: Add vector log/logf to libmvec microbenchmark Add vector log/logf and input files to libmvec microbenchmark. libmvec-log-inputs: 70% Normal random distribution range: (0.0, DBL_MAX) mean: 1.0 sigma: 50.0 30% uniform random distribution in range (0.0, DBL_MAX) libmvec-logf-inputs: 70% Normal random distribution range: (0.0f, FLT_MAX) mean: 1.0f sigma: 50.0f 30% uniform random distribution in range (0.0f, FLT_MAX) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:48:14 -08:00
Sunil K Pandey	37df38bd5f	x86-64: Add vector exp/expf to libmvec microbenchmark Add vector exp/expf and input files to libmvec microbenchmark. libmvec-exp-inputs: 90% Normal random distribution range: (-708.0, 709.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-500.0, 500.0) libmvec-expf-inputs: 90% Normal random distribution range: (-87.0f, 88.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-50.0f, 50.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:46:59 -08:00
Sunil K Pandey	4443695598	x86-64: Add vector cos/cosf to libmvec microbenchmark Add vector cos/cosf and input files to libmvec microbenchmark. libmvec-cos-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-cosf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:45:20 -08:00
Adhemerval Zanella	456b3c08b6	io: Refactor close_range and closefrom Now that Hurd implements both close_range and closefrom (`f2c996597d`), we can make close_range() a base ABI, and build the default closefrom() implementation on top of close_range(). The generic closefrom() implementation based on __getdtablesize() is moved to generic close_range(). On Linux it will be overridden by the auto-generated syscall while on Hurd it will be a system specific implementation. The closefrom() now calls close_range() and __closefrom_fallback(). Since on Hurd close_range() does not fail, __closefrom_fallback() is an empty static inline function set by __ASSUME_CLOSE_RANGE. The __ASSUME_CLOSE_RANGE also allows optimizing the Linux __closefrom_fallback() implementation when --enable-kernel=5.9 or higher is used. Finally the Linux specific tst-close_range.c is moved to io and enabled as default. The Linuxism and CLOSE_RANGE_UNSHARE are guarded so it can be built for Hurd (I have not actually tested it). Checked on x86_64-linux-gnu, i686-linux-gnu, and with an i686-gnu build.	2021-11-24 09:09:37 -03:00
Florian Weimer	e186fc5a31	nptl: Do not set signal mask on second setjmp return [BZ #28607] __libc_signal_restore_set was in the wrong place: It also ran when setjmp returned the second time (after pthread_exit or pthread_cancel). This is observable with blocked pending signals during thread exit. Fixes commit `b3cae39dcb` ("nptl: Start new threads with all signals blocked [BZ #25098]"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-24 08:59:54 +01:00
Adhemerval Zanella	aac54dcd37	powerpc: Define USE_PPC64_NOTOC iff compiler supports it The @notoc usage only yields an advantage on ISA 3.1+ machine (power10) and for ld.bfd also when it sees pcrel relocations used on the code (generated if compiler targets ISA 3.1+). On bfd case ISA 3.1+ instruction on stubs are used iff linker also sees the new pc-relative relocations (for instance R_PPC64_D34), otherwise it generates default stubs (ppc64_elf_check_relocs:4700). This patch also help on linkers that do not implement this optimization, since building for older ISA (such as 3.0 / power9) will also trigger power10 stubs generation in the assembly code uses the NOTOC imacro. Checked on powerpc64le-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>	2021-11-22 14:49:11 -03:00
Adhemerval Zanella	bc801b3a40	setjmp: Replace jmp_buf-macros.h with jmp_buf-macros.sym It requires less boilerplate code for newer ports. The _Static_assert checks from internal setjmp are moved to its own internal test since setjmp.h is included early by multiple headers (to generate rtld-sizes.sym). The riscv jmp_buf-macros.h check is also redundant, it is already done by riscv configure.ac. Checked with a build for the affected architectures.	2021-11-22 13:43:22 -03:00
Joseph Myers	5c3ece451d	Update kernel version to 5.15 in tst-mman-consts.py This patch updates the kernel version in the test tst-mman-consts.py to 5.15. (There are no new MAP_* constants covered by this test in 5.15 that need any other header changes.) Tested with build-many-glibcs.py.	2021-11-22 15:30:12 +00:00
Joseph Myers	bdeb7a8fa9	Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add these constants to bits/socket.h. Tested for x86_64.	2021-11-17 14:25:16 +00:00
Florian Weimer	f1d333b5bf	elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres This will be used to deallocate memory allocated using the non-minimal malloc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-17 12:20:29 +01:00
Florian Weimer	8bd336a00a	nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c And make it an installed header. This addresses a few aliasing violations (which do not seem to result in miscompilation due to the use of atomics), and also enables use of wide counters in other parts of the library. The debug output in nptl/tst-cond22 has been adjusted to print the 32-bit values instead because it avoids a big-endian/little-endian difference. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-17 12:20:13 +01:00
Sunil K Pandey	a43c0b5483	x86-64: Create microbenchmark infrastructure for libmvec Add python script to generate libmvec microbenchmark from the input values for each libmvec function using skeleton benchmark template. Creates double and float benchmarks with vector length 1, 2, 4, 8, and 16 for each libmvec function. Vector length 1 corresponds to scalar version of function and is included for vector function perf comparison. Co-authored-by: Haochen Jiang <haochen.jiang@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-16 11:37:39 -08:00
Noah Goldstein	2f9062d717	x86: Shrink memcmp-sse4.S code size No bug. This implementation refactors memcmp-sse4.S primarily with minimizing code size in mind. It does this by removing the lookup table logic and removing the unrolled check from (256, 512] bytes. memcmp-sse4 code size reduction : -3487 bytes wmemcmp-sse4 code size reduction: -1472 bytes The current memcmp-sse4.S implementation has a large code size cost. This has serious adverse effects on the ICache / ITLB. While in micro-benchmarks the implementation appears fast, traces of real-world code have shown that the speed in micro benchmarks does not translate when the ICache/ITLB are not primed, and that the cost of the code size has measurable negative effects on overall application performance. See https://research.google/pubs/pub48320/ for more details. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-10 20:12:10 -06:00
Joseph Myers	3387c40a8b	Update syscall lists for Linux 5.15 Linux 5.15 has one new syscall, process_mrelease (and also enables the clone3 syscall for RV32). It also has a macro __NR_SYSCALL_MASK for Arm, which is not a syscall but matches the pattern used for syscall macro names. Add __NR_SYSCALL_MASK to the names filtered out in the code dealing with syscall lists, update syscall-names.list for the new syscall and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py.	2021-11-10 15:21:19 +00:00
Florian Weimer	98966749f2	s390: Use long branches across object boundaries (jgh instead of jh) Depending on the layout chosen by the linker, the 16-bit displacement of the jh instruction is insufficient to reach the target label. Analysis of the linker failure was carried out by Nick Clifton. Reviewed-by: Carlos O'Donell <carlos@redhat.com> Reviewed-by: Stefan Liebler <stli@linux.ibm.com>	2021-11-10 15:21:37 +01:00
H.J. Lu	0bd356df1a	Remove the unused +mkdep/+make-deps/s-proto.S/s-proto-cancel.S Since commit `d73f5331ce` Author: Roland McGrath <roland@gnu.org> Date: Fri May 2 02:20:45 2003 +0000 2003-05-01 Roland McGrath <roland@redhat.com> dependency is generated by passing -MD -MF to compiler. Remove the unused +mkdep, +make-deps, s-proto.S and s-proto-cancel.S. This fixes BZ #28554.	2021-11-10 04:54:18 -08:00
Adhemerval Zanella	824dd3ec49	Fix build and check failures after `b05fae4d8e` The include cleanup on dl-minimal.c removed too much for some targets. Also for Hurd, __sbrk is removed from localplt.data now that tunables allocate memory through mmap. Checked with a build for all affected architectures.	2021-11-09 23:21:22 -03:00
Adhemerval Zanella	b05fae4d8e	elf: Use the minimal malloc on tunables_strdup The rtld_malloc functions are moved to their own file so they can be used in csu code. Also, the functions are renamed to __minimal_* (since they are now used not only in loader code). Using the __minimal_malloc on tunables_strdup() avoids potential issues with sbrk() calls while processing the tunables (I see sporadic elf/tst-dso-ordering9 failures on powerpc64le, with different tests failing due to ASLR). Also, using __minimal_malloc over plain mmap optimizes the memory allocation in both the static and dynamic cases (since it will use any unused space either in the last page of data segments, avoiding an mmap() call, or from the previous mmap() call). Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-11-09 14:11:25 -03:00
Samuel Thibault	d41985b71e	hurd: Remove unused __libc_close_range That was just cargo-culted.	2021-11-07 16:23:51 +01:00
Sergey Bugaev	f2c996597d	hurd: Implement close_range and closefrom The close_range () function implements the same API as the Linux and FreeBSD syscalls. It operates atomically and reliably. The specified upper bound is clamped to the actual size of the file descriptor table; it is expected that the most common use case is with last = UINT_MAX. Like in the Linux syscall, it is also possible to pass the CLOSE_RANGE_CLOEXEC flag to mark the file descriptors in the range cloexec instead of actually closing them. Also, add a Hurd version of the closefrom () function. Since unlike on Linux, close_range () cannot fail due to being unsupported by the running kernel, a fallback implementation is never necessary. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20211106153524.82700-1-bugaevc@gmail.com>	2021-11-07 16:16:11 +01:00
Noah Goldstein	475b63702e	x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h No bug. This patch doubles the rep_movsb_threshold when using ERMS. Based on benchmarks the vector copy loop, especially now that it handles 4k aliasing, is better for these medium ranged. On Skylake with ERMS: Size, Align1, Align2, dst>src,(rep movsb) / (vec copy) 4096, 0, 0, 0, 0.975 4096, 0, 0, 1, 0.953 4096, 12, 0, 0, 0.969 4096, 12, 0, 1, 0.872 4096, 44, 0, 0, 0.979 4096, 44, 0, 1, 0.83 4096, 0, 12, 0, 1.006 4096, 0, 12, 1, 0.989 4096, 0, 44, 0, 0.739 4096, 0, 44, 1, 0.942 4096, 12, 12, 0, 1.009 4096, 12, 12, 1, 0.973 4096, 44, 44, 0, 0.791 4096, 44, 44, 1, 0.961 4096, 2048, 0, 0, 0.978 4096, 2048, 0, 1, 0.951 4096, 2060, 0, 0, 0.986 4096, 2060, 0, 1, 0.963 4096, 2048, 12, 0, 0.971 4096, 2048, 12, 1, 0.941 4096, 2060, 12, 0, 0.977 4096, 2060, 12, 1, 0.949 8192, 0, 0, 0, 0.85 8192, 0, 0, 1, 0.845 8192, 13, 0, 0, 0.937 8192, 13, 0, 1, 0.939 8192, 45, 0, 0, 0.932 8192, 45, 0, 1, 0.927 8192, 0, 13, 0, 0.621 8192, 0, 13, 1, 0.62 8192, 0, 45, 0, 0.53 8192, 0, 45, 1, 0.516 8192, 13, 13, 0, 0.664 8192, 13, 13, 1, 0.659 8192, 45, 45, 0, 0.593 8192, 45, 45, 1, 0.575 8192, 2048, 0, 0, 0.854 8192, 2048, 0, 1, 0.834 8192, 2061, 0, 0, 0.863 8192, 2061, 0, 1, 0.857 8192, 2048, 13, 0, 0.63 8192, 2048, 13, 1, 0.629 8192, 2061, 13, 0, 0.627 8192, 2061, 13, 1, 0.62 Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:18:08 -05:00
Noah Goldstein	a6b7502ec0	x86: Optimize memmove-vec-unaligned-erms.S No bug. The optimizations are as follows: 1) Always align entry to 64 bytes. This makes behavior more predictable and makes other frontend optimizations easier. 2) Make the L(more_8x_vec) cases 4k aliasing aware. This can have significant benefits in the case that: 0 < (dst - src) < [256, 512] 3) Align before `rep movsb`. For ERMS this is roughly a [0, 30%] improvement and for FSRM [-10%, 25%]. In addition to these primary changes there is general cleanup throughout to optimize the aligning routines and control flow logic. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:18:03 -05:00
Paul A. Clarke	9fea0f1a2a	[powerpc] Tighten constraints for asm constant parameters There are a few places where only known numeric values are acceptable for `asm` parameters, yet the constraint "i" is used. "i" can include "symbolic constants whose values will be known only at assembly time or later." Use "n" instead of "i" where known numeric values are required. Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>	2021-11-03 09:17:28 -05:00
Adhemerval Zanella	09f214528c	riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGN It allows build both glibc and tests with lld (Since lld does not support R_RISCV_ALIGN linker relaxation). Checked with a build for riscv32-linux-gnu-rv32imafdc-ilp32d and riscv64-linux-gnu-rv64imafdc-lp64d. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Fangrui Song <maskray@google.com>	2021-11-03 09:25:06 -03:00
Fangrui Song	6720d36b66	x86-64: Replace movzx with movzbl Clang cannot assemble movzx in the AT&T dialect mode. ../sysdeps/x86_64/strcmp.S:2232:16: error: invalid operand for instruction movzx (%rsi), %ecx ^~~~ Change movzx to movzbl, which follows the AT&T dialect and is used elsewhere in the file. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-02 20:59:52 -07:00
Florian Weimer	cca75bd8b5	i386: Explain why __HAVE_64B_ATOMICS has to be 0	2021-11-02 10:26:23 +01:00
Adhemerval Zanella	613cb5c7b1	arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors support The lld linker does not support TLSDESC for arm. The have-arm-tls-desc is a leftover of `56583289b1` to support NaCL. Reviewed-by: Fangrui Song <maskray@google.com>	2021-11-01 16:23:15 -03:00
Adhemerval Zanella	d6dea8c847	arm: Use internal symbol for _dl_argv on _dl_start_user The lld does not support R_ARM_GOTOFF32 to preemptible symbol (_dl_argv has default visibility). Use the internal alias instead (one option would be to use HIDDEN_JUMPTARGET, but the macro is not defined for !__ASSEMBLER__ and I made this patch arm-specific to avoid having to check extensively whether this might break something on other architectures). Checked on arm-linux-gnueabihf. Reviewed-by: Fangrui Song <maskray@google.com>	2021-11-01 16:21:53 -03:00
H.J. Lu	14dbbf46a0	x86-64: Remove Prefer_AVX2_STRCMP Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. When comparing 2 32-byte strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL. EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake.	2021-11-01 07:53:04 -07:00
H.J. Lu	c46e9afb2d	x86-64: Improve EVEX strcmp with masked load In strcmp-evex.S, to compare 2 32-byte strings, replace VMOVU (%rdi, %rdx), %YMM0 VMOVU (%rsi, %rdx), %YMM1 /* Each bit in K0 represents a mismatch in YMM0 and YMM1. */ VPCMP $4, %YMM0, %YMM1, %k0 VPCMP $0, %YMMZERO, %YMM0, %k1 VPCMP $0, %YMMZERO, %YMM1, %k2 /* Each bit in K1 represents a NULL in YMM0 or YMM1. */ kord %k1, %k2, %k1 /* Each bit in K1 represents a NULL or a mismatch. */ kord %k0, %k1, %k1 kmovd %k1, %ecx testl %ecx, %ecx jne L(last_vector) with VMOVU (%rdi, %rdx), %YMM0 VPTESTM %YMM0, %YMM0, %k2 /* Each bit cleared in K1 represents a mismatch or a null CHAR in YMM0 and 32 bytes at (%rsi, %rdx). */ VPCMP $0, (%rsi, %rdx), %YMM0, %k1{%k2} kmovd %k1, %ecx incl %ecx jne L(last_vector) It makes EVEX strcmp faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake. Co-Authored-By: Noah Goldstein <goldstein.w.n@gmail.com>	2021-11-01 07:52:56 -07:00
Stafford Horne	6446c725d4	Fix compiler issue with mmap_internal mmap_internal fails to compile when we use -1 for MMAP2_PAGE_UNIT on 32 bit architectures. The error is as follows: ../sysdeps/unix/sysv/linux/mmap_internal.h:30:8: error: unknown type name 'uint64_t' | 30 | static uint64_t page_unit; | | ^~~~~~~~ Fix by including stdint.h.	2021-10-29 09:21:37 -03:00
Noah Goldstein	1d56fd3bae	x86_64: Add memcmpeq.S to fix disable-multi-arch build The following commit: commit `cf4fd28ea4` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Tue Oct 26 19:43:18 2021 -0500 Broke --disable-multi-arch build for x86_64 because x86_64/memcmpeq.S was not defined outside of multiarch and the alias for __memcmpeq in x86_64/memcmp.S was removed. This commit fixes that issue by adding x86_64/memcmpeq.S. make xcheck passes on x86_64 with and without --disable-multi-arch	2021-10-28 16:35:50 -05:00
Fangrui Song	6838920383	riscv: Fix incorrect jal with HIDDEN_JUMPTARGET A non-local STV_DEFAULT defined symbol is by default preemptible in a shared object. j/jal cannot target a preemptible symbol. On other architectures, such a jump instruction either causes PLT [BZ #18822], or if short-ranged, sometimes rejected by the linker (but not by GNU ld's riscv port [ld PR/28509]). Use HIDDEN_JUMPTARGET to target a non-preemptible symbol instead. With this patch, ld.so and libc.so can be linked with LLD if source files are compiled/assembled with -mno-relax/-Wa,-mno-relax. Acked-by: Palmer Dabbelt <palmer@dabbelt.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-28 11:39:49 -07:00
Noah Goldstein	9b7cfab180	x86_64: Add evex optimized __memcmpeq in memcmpeq-evex.S No bug. This commit adds new optimized __memcmpeq implementation for evex. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.	2021-10-27 13:03:46 -05:00
Noah Goldstein	b4ed69ba16	x86_64: Add avx2 optimized __memcmpeq in memcmpeq-avx2.S No bug. This commit adds new optimized __memcmpeq implementation for avx2. The primary optimizations are: 1) skipping the logic to find the difference of the first mismatched byte. 2) not updating src/dst addresses as the non-equals logic does not need to be reused by different areas.	2021-10-27 13:03:46 -05:00
Noah Goldstein	fa7f63d8d6	x86_64: Add sse2 optimized __memcmpeq in memcmp-sse2.S No bug. This commit does not modify any of the memcmp implementation. It just adds __memcmpeq ifdefs to skip obvious cases where computing the proper 1/-1 required by memcmp is not needed.	2021-10-27 13:03:46 -05:00
Noah Goldstein	cf4fd28ea4	x86_64: Add support for __memcmpeq using sse2, avx2, and evex No bug. This commit adds support for __memcmpeq to be implemented separately from memcmp. Support is added for versions optimized with sse2, avx2, and evex.	2021-10-27 13:03:46 -05:00
Noah Goldstein	9894127d20	String: Add hidden defs for __memcmpeq() to enable internal usage No bug. This commit adds hidden defs for all declarations of __memcmpeq. This enables usage of __memcmpeq without the PLT for usage internal to GLIBC.	2021-10-26 16:51:29 -05:00
Noah Goldstein	44829b3ddb	String: Add support for __memcmpeq() ABI on all targets No bug. This commit adds support for __memcmpeq() as a new ABI for all targets. In this commit __memcmpeq() is implemented only as an alias to the corresponding target's memcmp() implementation. __memcmpeq() is added as a new symbol starting with GLIBC_2.35 and defined in string.h with comments explaining its behavior. Basic tests that it is callable and works were added in string/tester.c. As discussed in the proposal "Add new ABI '__memcmpeq()' to libc" __memcmpeq() is essentially a reserved namespace for bcmp(). This means it shares the same specifications as memcmp() except the return value for non-equal byte sequences is any non-zero value. This is less strict than memcmp()'s return value specification and can be better optimized when a boolean return is all that is needed. __memcmpeq() is meant to only be called by compilers if they can prove that the return value of a memcmp() call is only used for its boolean value. All tests in string/tester.c passed. The build also succeeds on the x86_64-linux-gnu target.	2021-10-26 16:51:29 -05:00
Fangrui Song	8438135d34	configure: Don't check LD -v --help for LIBC_LINKER_FEATURE When LIBC_LINKER_FEATURE is used to check a linker option with the equal sign, it will likely fail because the LD -v --help output may look like `-z lam-report=[none|warning|error]` while the needle is something like `-z lam-report=warning`. The LD -v --help filter doesn't save much time, so just remove it.	2021-10-25 13:17:44 -07:00
Noah Goldstein	bad852b61b	x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'. it could potentially be dangerous to use SSE2 if this function is ever called without using 'vzeroupper' beforehand. While compilers appear to use 'vzeroupper' before function calls if AVX2 has been used, using SSE2 here is more brittle. Since it is not absolutely necessary it should be avoided. It costs 2-extra bytes but the extra bytes should only eat into alignment padding. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-23 13:02:42 -05:00
Sunil K Pandey	4f690aad9e	x86_64: Add missing libmvec ABI tests Add vector ABI tests for cos, exp, log, pow and sin functions. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-22 06:46:49 -07:00
Adhemerval Zanella	927246e188	elf: Fix `e6fd79f379` build with --enable-tunables=no The _dl_sort_maps_init() is not defined when tunables is not enabled. Checked on x86_64-linux-gnu.	2021-10-21 17:26:32 -03:00
Chung-Lin Tang	15a0c5730d	elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645) This second patch contains the actual implementation of a new sorting algorithm for shared objects in the dynamic loader, which solves the slow behavior that the current "old" algorithm falls into when the DSO set contains circular dependencies. The new algorithm implemented here is simply depth-first search (DFS) to obtain the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1 bitfield is added to struct link_map to more elegantly facilitate such a search. The DFS algorithm is applied to the input maps[nmap-1] backwards towards maps[0]. This has the effect of a more "shallow" recursion depth in general since the input is in BFS. Also, when combined with the natural order of processing l_initfini[] at each node, this creates a resulting output sorting closer to the intuitive "left-to-right" order in most cases. Another notable implementation adjustment related to this _dl_sort_maps change is the removing of two char arrays 'used' and 'done' in _dl_close_worker to represent two per-map attributes. This has been changed to simply use two new bit-fields l_map_used:1, l_map_done:1 added to struct link_map. This also allows discarding the clunky 'used' array sorting that _dl_sort_maps had to sometimes do along the way. Tunable support for switching between different sorting algorithms at runtime is also added. A new tunable 'glibc.rtld.dynamic_sort' with current valid values 1 (old algorithm) and 2 (new DFS algorithm) has been added. At time of commit of this patch, the default setting is 1 (old algorithm). Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-21 11:23:53 -03:00
Fangrui Song	aa783f9a7b	linux: Fix a possibly non-constant expression in _Static_assert According to C11 6.6p6, `const int` as an operand may not make up a constant expression. GCC -O0 errors: ../sysdeps/unix/sysv/linux/opendir.c:107:19: error: static_assert expression is not an integral constant expression _Static_assert (allocation_size >= sizeof (struct dirent64), -O2 -Wpedantic has a similar warning. See https://gcc.gnu.org/PR102502 for GCC's inconsistency. Use enum which is guaranteed to be a constant expression. This also makes the file compilable with Clang. Fixes: `4b962c9e85` ("linux: Simplify opendir buffer allocation")	2021-10-20 14:22:43 -07:00
H.J. Lu	d962cce139	x86-64: Add sysdeps/x86_64/fpu/Makeconfig 1. Add sysdeps/x86_64/fpu/Makeconfig to auto-generate libmvec.mk, which contains libmvec ABI test dependencies and CFLAGS, in the build directory. 2. Include libmvec.mk for libmvec ABI test dependencies and CFLAGS. Tested on SSE4, AVX, AVX2 and AVX512 machines. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-10-20 11:53:45 -07:00
Adhemerval Zanella	82fd7314c7	powerpc: Remove backtrace implementation The powerpc optimization to provide a fast stacktrace requires some ad-hoc code to handle Linux signal frames and the change is fragile once the kernel decides to slightly change its execution sequence [1]. The generic implementation works as-is and it should be future proof since the kernel provides the expected CFI directives in vDSO shared page. Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and powerpc64-linux-gnu. [1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html	2021-10-20 10:40:53 -03:00
H.J. Lu	2ec99d8c42	ld.so: Initialize bootstrap_map.l_ld_readonly [BZ #28340] 1. Define DL_RO_DYN_SECTION to initialize bootstrap_map.l_ld_readonly before calling elf_get_dynamic_info to get dynamic info in bootstrap_map. 2. Define a single static inline bool dl_relocate_ld (const struct link_map *l) { /* Don't relocate dynamic section if it is readonly */ return !(l->l_ld_readonly || DL_RO_DYN_SECTION); } This updates BZ #28340 fix.	2021-10-19 06:40:38 -07:00
Stafford Horne	1d550265a7	timex: Use 64-bit fields on 32-bit TIMESIZE=64 systems (BZ #28469) This was found when testing the OpenRISC port I am working on. These two tests fail with SIGSEGV: FAIL: misc/tst-ntp_gettime FAIL: misc/tst-ntp_gettimex This was found to be due to the kernel overwriting the stack space allocated by the timex structure. The reason for the overwrite being that the kernel timex has 64-bit fields and user space code only allocates enough stack space for timex with 32-bit fields. On 32-bit systems with TIMESIZE=64 __USE_TIME_BITS64 is not defined. This causes the timex structure to use 32-bit fields with type __syscall_slong_t. This patch adjusts the ifdef condition to allow 32-bit systems with TIMESIZE=64 to use the 64-bit long long timex definition. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-18 17:17:20 -03:00
Samuel Thibault	1d3decee99	hurd if_index: Explicitly use AF_INET for if index discovery `5bf07e1b3a` ("Linux: Simplify __opensock and fix race condition [BZ #28353]") made __opensock try NETLINK then UNIX then INET. On the Hurd, only INET knows about network interfaces, so better actually specify that in if_index.	2021-10-18 01:39:02 +02:00
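
Commit `15a0c5730d` above describes sorting shared objects with a depth-first search that yields a reverse post-order sequence, walking the input from the last entry back to the first and using a per-object visited bit. The following is only a minimal sketch of that idea, not the code from the commit: `struct node`, `MAX_DEPS`, `dfs` and `sort_rpo` are hypothetical names, and it assumes every dependency also appears in the input array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical stand-in for a loaded object; this is NOT glibc's
   struct link_map.  */
struct node
{
  const char *name;
  struct node *deps[MAX_DEPS];  /* Dependencies, NULL-terminated if fewer
                                   than MAX_DEPS.  */
  bool visited;                 /* Plays the role of a visited bit.  */
};

/* Post-order DFS.  A node is stored only after all of its dependencies
   have been stored.  *TAIL is the next free slot counted from the end of
   OUT, so filling OUT backwards leaves it in reverse post-order.  */
static void
dfs (struct node *n, struct node **out, size_t *tail)
{
  if (n->visited)
    return;
  /* Mark on entry so that circular dependencies terminate.  */
  n->visited = true;
  for (size_t i = 0; i < MAX_DEPS && n->deps[i] != NULL; i++)
    dfs (n->deps[i], out, tail);
  out[--*tail] = n;
}

/* Sort IN[0..COUNT-1] into OUT.  Assumes every dependency of every node
   is itself present in IN.  The input is walked from the last element
   back to the first, as the commit message describes.  */
static void
sort_rpo (struct node **in, size_t count, struct node **out)
{
  size_t tail = count;
  for (size_t i = 0; i < count; i++)
    in[i]->visited = false;
  for (size_t i = count; i-- > 0;)
    dfs (in[i], out, &tail);
  assert (tail == 0);
}
```

For an acyclic input, every object in `out` precedes the objects it depends on. With a cycle the order among the cycle's members depends on where the walk enters it, but each object is still visited exactly once, which is what avoids the slow behavior on circular dependencies.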

14486 Commits