glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-25 14:30:06 +00:00

Author	SHA1	Message	Date
H.J. Lu	7d544dd049	x86-64/cet: Move check-cet.awk to x86_64 Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-10 05:20:16 -08:00
H.J. Lu	a1bbee9fd1	x86-64/cet: Move dl-cet.[ch] to x86_64 directories Since CET is only enabled for x86-64, move dl-cet.[ch] to x86_64 directories. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-10 05:19:32 -08:00
H.J. Lu	b45115a666	x86: Move x86-64 shadow stack startup codes Move sysdeps/x86/libc-start.h to sysdeps/x86_64/libc-start.h and use sysdeps/generic/libc-start.h for i386. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-10 05:19:32 -08:00
Adhemerval Zanella	25f1e16ef0	i386: Remove CET support CET is only support for x86_64, this patch reverts: - `faaee1f07e` x86: Support shadow stack pointer in setjmp/longjmp. - `be9ccd27c0` i386: Add _CET_ENDBR to indirect jump targets in add_n.S/sub_n.S - `c02695d776` x86/CET: Update vfork to prevent child return - `5d844e1b72` i386: Enable CET support in ucontext functions - `124bcde683` x86: Add _CET_ENDBR to functions in crti.S - `562837c002` x86: Add _CET_ENDBR to functions in dl-tlsdesc.S - `f753fa7dea` x86: Support IBT and SHSTK in Intel CET [BZ #21598] - `825b58f3fb` i386-mcount.S: Add _CET_ENDBR to _mcount and __fentry__ - `7e119cd582` i386: Use _CET_NOTRACK in i686/memcmp.S - `177824e232` i386: Use _CET_NOTRACK in memcmp-sse4.S - `0a899af097` i386: Use _CET_NOTRACK in memcpy-ssse3-rep.S - `7fb613361c` i386: Use _CET_NOTRACK in memcpy-ssse3.S - `77a8ae0948` i386: Use _CET_NOTRACK in memset-sse2-rep.S - `00e7b76a8f` i386: Use _CET_NOTRACK in memset-sse2.S - `90d15dc577` i386: Use _CET_NOTRACK in strcat-sse2.S - `f1574581c7` i386: Use _CET_NOTRACK in strcpy-sse2.S - `4031d7484a` i386/sub_n.S: Add a missing _CET_ENDBR to indirect jump - target - Checked on i686-linux-gnu.	2024-01-09 13:55:51 -03:00
Adhemerval Zanella	b7fc4a07f2	x86: Move CET infrastructure to x86_64 The CET is only supported for x86_64 and there is no plan to add kernel support for i386. Move the Makefile rules and files from the generic x86 folder to x86_64 one. Checked on x86_64-linux-gnu and i686-linux-gnu.	2024-01-09 13:55:51 -03:00
H.J. Lu	0f9afc265a	x32: Handle displacement overflow in PLT rewrite [BZ #31218] PLT rewrite calculated displacement with ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE; On x32, displacement from 0xf7fbe060 to 0x401030 was calculated as unsigned int disp = 0x401030 - 0xf7fbe060 - 5; with disp == 0x8442fcb and caused displacement overflow. The PLT entry was changed to: 0xf7fbe060 <+0>: e9 cb 2f 44 08 jmp 0x401030 0xf7fbe065 <+5>: cc int3 0xf7fbe066 <+6>: cc int3 0xf7fbe067 <+7>: cc int3 0xf7fbe068 <+8>: cc int3 0xf7fbe069 <+9>: cc int3 0xf7fbe06a <+10>: cc int3 0xf7fbe06b <+11>: cc int3 0xf7fbe06c <+12>: cc int3 0xf7fbe06d <+13>: cc int3 0xf7fbe06e <+14>: cc int3 0xf7fbe06f <+15>: cc int3 x32 has 32-bit address range, but it doesn't wrap address around at 4GB, JMP target was changed to 0x100401030 (0xf7fbe060LL + 0x8442fcbLL + 5), which is above 4GB. Always use uint64_t to calculate displacement. This fixes BZ #31218. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-06 14:25:49 -08:00
H.J. Lu	848746e88e	elf: Add ELF_DYNAMIC_AFTER_RELOC to rewrite PLT Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after relocation. For x86-64, add #define DT_X86_64_PLT (DT_LOPROC + 0) #define DT_X86_64_PLTSZ (DT_LOPROC + 1) #define DT_X86_64_PLTENT (DT_LOPROC + 3) 1. DT_X86_64_PLT: The address of the procedure linkage table. 2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage table. 3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table entry. With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the memory offset of the indirect branch instruction. Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section with direct branch after relocation when the lazy binding is disabled. PLT rewrite is disabled by default since SELinux may disallow modifying code pages and ld.so can't detect it in all cases. Use $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 to enable PLT rewrite with 32-bit direct jump at run-time or $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2 to enable PLT rewrite with 32-bit direct jump and on APX processors with 64-bit absolute jump at run-time. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-05 05:49:49 -08:00
H.J. Lu	35694d3416	x86-64/cet: Check the restore token in longjmp setcontext and swapcontext put a restore token on the old shadow stack which is used to restore the target shadow stack when switching user contexts. When longjmp from a user context, the target shadow stack can be different from the current shadow stack and INCSSP can't be used to restore the shadow stack pointer to the target shadow stack. Update longjmp to search for a restore token. If found, use the token to restore the shadow stack pointer before using INCSSP to pop the shadow stack. Stop the token search and use INCSSP if the shadow stack entry value is the same as the current shadow stack pointer. It is a user error if there is a shadow stack switch without leaving a restore token on the old shadow stack. The only difference between __longjmp.S and __longjmp_chk.S is that __longjmp_chk.S has a check for invalid longjmp usages. Merge __longjmp.S and __longjmp_chk.S by adding the CHECK_INVALID_LONGJMP macro. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-04 13:38:26 -08:00
H.J. Lu	bbfb54930c	i386: Ignore --enable-cet Since shadow stack is only supported for x86-64, ignore --enable-cet for i386. Always setting $(enable-cet) for i386 to "no" to support ifneq ($(enable-cet),no) in x86 Makefiles. We can't use ifeq ($(enable-cet),yes) since $(enable-cet) can be "yes", "no" or "permissive". Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-04 06:08:55 -08:00
Paul Eggert	dff8da6b3e	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
H.J. Lu	541641a3de	x86/cet: Enable shadow stack during startup Previously, CET was enabled by kernel before passing control to user space and the startup code must disable CET if applications or shared libraries aren't CET enabled. Since the current kernel only supports shadow stack and won't enable shadow stack before passing control to user space, we need to enable shadow stack during startup if the application and all shared library are shadow stack enabled. There is no need to disable shadow stack at startup. Shadow stack can only be enabled in a function which will never return. Otherwise, shadow stack will underflow at the function return. 1. GL(dl_x86_feature_1) is set to the CET features which are supported by the processor and are not disabled by the tunable. Only non-zero features in GL(dl_x86_feature_1) should be enabled. After enabling shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check if shadow stack is really enabled. 2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is safe since RTLD_START never returns. 3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static executable. Since the start function using ARCH_SETUP_TLS never returns, it is safe to enable shadow stack in ARCH_SETUP_TLS.	2024-01-01 05:22:48 -08:00
H.J. Lu	edb5e0c8f9	x86/cet: Sync with Linux kernel 6.6 shadow stack interface Sync with Linux kernel 6.6 shadow stack interface. Since only x86-64 is supported, i386 shadow stack codes are unchanged and CET shouldn't be enabled for i386. 1. When the shadow stack base in TCB is unset, the default shadow stack is in use. Use the current shadow stack pointer as the marker for the default shadow stack. It is used to identify if the current shadow stack is the same as the target shadow stack when switching ucontexts. If yes, INCSSP will be used to unwind shadow stack. Otherwise, shadow stack restore token will be used. 2. Allocate shadow stack with the map_shadow_stack syscall. Since there is no function to explicitly release ucontext, there is no place to release shadow stack allocated by map_shadow_stack in ucontext functions. Such shadow stacks will be leaked. 3. Rename arch_prctl CET commands to ARCH_SHSTK_XXX. 4. Rewrite the CET control functions with the current kernel shadow stack interface. Since CET is no longer enabled by kernel, a separate patch will enable shadow stack during startup.	2024-01-01 05:22:48 -08:00
H.J. Lu	81be2a61da	x86-64: Fix the tcb field load for x32 [BZ #31185] _dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer via the tcb field in TCB: _dl_tlsdesc_undefweak: _CET_ENDBR movq 8(%rax), %rax subq %fs:0, %rax ret _dl_tlsdesc_dynamic: ... subq %fs:0, %rax movq -8(%rsp), %rdi ret Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location, not 64-bit. It should use "sub %fs:0, %RAX_LP" instead. Since _dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic returns void *, RAX_LP is appropriate here for x32 and x86-64. This fixes BZ #31185.	2023-12-22 05:37:17 -08:00
H.J. Lu	3502440397	x86-64: Fix the dtv field load for x32 [BZ #31184] On x32, I got FAIL: elf/tst-tlsgap $ gdb elf/tst-tlsgap ... open tst-tlsgap-mod1.so Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault. [Switching to LWP 2268754] _dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108 108 movq (%rsi), %rax (gdb) p/x $rsi $4 = 0xf7dbf9005655fb18 (gdb) This is caused by _dl_tlsdesc_dynamic: _CET_ENDBR /* Preserve call-clobbered registers that we modify. We need two scratch regs anyway. */ movq %rsi, -16(%rsp) movq %fs:DTV_OFFSET, %rsi Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit location, not 64-bit. Load the dtv field to RSI_LP instead of rsi. This fixes BZ #31184.	2023-12-22 05:37:00 -08:00
Bruno Haible	787282dede	x86: Do not raises floating-point exception traps on fesetexceptflag (BZ 30990) According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument). The flags can be set in the 387 unit or in the SSE unit. When we need to clear a flag, we need to do so in both units, due to the way fetestexcept is implemented. When we need to set a flag, it is sufficient to do it in the SSE unit, because that is guaranteed to not trap. However, on i386 CPUs that have only a 387 unit, set the flags in the 387, as long as this cannot trap. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-12-19 15:12:38 -03:00
Matthew Sterrett	e957308723	x86: Unifies 'strlen-evex' and 'strlen-evex512' implementations. This commit uses a common implementation 'strlen-evex-base.S' for both 'strlen-evex' and 'strlen-evex512' The motivation is to reduce the number of implementations to maintain. This incidentally gives a small performance improvement. All tests pass on x86. Benchmarks were taken on SKX. https://www.intel.com/content/www/us/en/products/sku/123613/intel-core-i97900x-xseries-processor-13-75m-cache-up-to-4-30-ghz/specifications.html Geometric mean for strlen-evex512 over all benchmarks (N=10) was (new/old) 0.939 Geometric mean for wcslen-evex512 over all benchmarks (N=10) was (new/old) 0.965 Code Size Changes: strlen-evex512.S : +24 bytes wcslen-evex512.S : +54 bytes Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-12-18 12:38:01 -06:00
Noah Goldstein	9469261cf1	x86: Only align destination to 1x VEC_SIZE in memset 4x loop Current code aligns to 2x VEC_SIZE. Aligning to 2x has no affect on performance other than potentially resulting in an additional iteration of the loop. 1x maintains aligned stores (the only reason to align in this case) and doesn't incur any unnecessary loop iterations. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2023-11-28 12:06:19 -06:00
Adhemerval Zanella	55f41ef8de	elf: Remove LD_PROFILE for static binaries The _dl_non_dynamic_init does not parse LD_PROFILE, which does not enable profile for dlopen objects. Since dlopen is deprecated for static objects, it is better to remove the support. It also allows to trim down libc.a of profile support. Checked on x86_64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-11-21 16:15:42 -03:00
Adhemerval Zanella	4862d546c0	x86: Use dl-symbol-redir-ifunc.h on cpu-tunables The dl-symbol-redir-ifunc.h redirects compiler-generated libcalls to arch-specific memory implementations to avoid ifunc calls where it is not yet possible. The memcmp-isa-default-impl.h aims to fix the same issue by calling the specific memset implementation directly. Using the memcmp symbol directly allows the compiler to inline the memset calls (especially because _dl_tunable_set_hwcaps uses constants values), generating better code. Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-11-21 16:15:42 -03:00
Adhemerval Zanella	9c96c87d60	elf: Ignore GLIBC_TUNABLES for setuid/setgid binaries The tunable privilege levels were a retrofit to try and keep the malloc tunable environment variables' behavior unchanged across security boundaries. However, CVE-2023-4911 shows how tricky can be tunable parsing in a security-sensitive environment. Not only parsing, but the malloc tunable essentially changes some semantics on setuid/setgid processes. Although it is not a direct security issue, allowing users to change setuid/setgid semantics is not a good security practice, and requires extra code and analysis to check if each tunable is safe to use on all security boundaries. It also means that security opt-in features, like aarch64 MTE, would need to be explicit enabled by an administrator with a wrapper script or with a possible future system-wide tunable setting. Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com>	2023-11-21 16:15:42 -03:00
Noah Goldstein	b7f8b6b64b	x86: Fix unchecked AVX512-VBMI2 usage in strrchr-evex-base.S strrchr-evex-base used `vpcompress{b|d}` in the page cross logic but was missing the CPU_FEATURE checks for VBMI2 in the ifunc/ifunc-impl-list. The fix is either to add those checks or change the logic to not use `vpcompress{b|d}`. Choosing the latter here so that the strrchr-evex implementation is usable on SKX. New implementation is a bit slower, but this is in a cold path so its probably okay.	2023-11-15 11:09:44 -06:00
Noah Goldstein	a3c50bf46a	x86: Prepare `strrchr-evex` and `strrchr-evex512` for AVX10 This commit refactors `strrchr-evex` and `strrchr-evex512` to use a common implementation: `strrchr-evex-base.S`. The motivation is `strrchr-evex` needed to be refactored to not use 64-bit masked registers in preperation for AVX10. Once vec-width masked register combining was removed, the EVEX and EVEX512 implementations can easily be implemented in the same file without any major overhead. The net result is performance improvements (measured on TGL) for both `strrchr-evex` and `strrchr-evex512`. Although, note there are some regressions in the test suite and it may be many of the cases that make the total-geomean of improvement/regression across bench-strrchr are cold. The point of the performance measurement is to show there are no major regressions, but the primary motivation is preperation for AVX10. Benchmarks where taken on TGL: https://www.intel.com/content/www/us/en/products/sku/213799/intel-core-i711850h-processor-24m-cache-up-to-4-80-ghz/specifications.html EVEX geometric_mean(N=5) of all benchmarks New / Original : 0.74 EVEX512 geometric_mean(N=5) of all benchmarks New / Original: 0.87 Full check passes on x86.	2023-10-06 00:18:55 -05:00
Samuel Thibault	29d4591b07	hurd: Drop REG_GSFS and REG_ESDS from x86_64's ucontext These are useless on x86_64, and __NGREG was actually wrong with them.	2023-09-28 00:10:13 +02:00
Szabolcs Nagy	d2123d6827	elf: Fix slow tls access after dlopen [BZ #19924] In short: __tls_get_addr checks the global generation counter and if the current dtv is older then _dl_update_slotinfo updates dtv up to the generation of the accessed module. So if the global generation is newer than generation of the module then __tls_get_addr keeps hitting the slow dtv update path. The dtv update path includes a number of checks to see if any update is needed and this already causes measurable tls access slow down after dlopen. It may be possible to detect up-to-date dtv faster. But if there are many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at least walking the slotinfo list. This patch tries to update the dtv to the global generation instead, so after a dlopen the tls access slow path is only hit once. The modules with larger generation than the accessed one were not necessarily synchronized before, so additional synchronization is needed. This patch uses acquire/release synchronization when accessing the generation counter. Note: in the x86_64 version of dl-tls.c the generation is only loaded once, since relaxed mo is not faster than acquire mo load. I have not benchmarked this. Tested by Adhemerval Zanella on aarch64, powerpc, sparc, x86 who reported that it fixes the performance issue of bug 19924. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-09-01 08:21:37 +01:00
H.J. Lu	a8ecb126d4	x86_64: Add log1p with FMA On Skylake, it changes log1p bench performance by: Before After Improvement max 63.349 58.347 8% min 4.448 5.651 -30% mean 12.0674 10.336 14% The minimum code path is if (hx < 0x3FDA827A) /* x < 0.41422 */ { if (__glibc_unlikely (ax >= 0x3ff00000)) /* x <= -1.0 */ { ... } if (__glibc_unlikely (ax < 0x3e200000)) /* |x| < 2**-29 */ { math_force_eval (two54 + x); /* raise inexact */ if (ax < 0x3c900000) /* |x| < 2**-54 */ { ... } else return x - x * x * 0.5; FMA and non-FMA code sequences look similar. Non-FMA version is slightly faster. Since log1p is called by asinh and atanh, it improves asinh performance by: Before After Improvement max 75.645 63.135 16% min 10.074 10.071 0% mean 15.9483 14.9089 6% and improves atanh performance by: Before After Improvement max 91.768 75.081 18% min 15.548 13.883 10% mean 18.3713 16.8011 8%	2023-08-21 10:44:26 -07:00
H.J. Lu	1b214630ce	x86_64: Add expm1 with FMA On Skylake, it improves expm1 bench performance by: Before After Improvement max 70.204 68.054 3% min 20.709 16.2 22% mean 22.1221 16.7367 24% NB: Add extern long double __expm1l (long double); extern long double __expm1f128 (long double); for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is defined since __expm1 may be expanded in their declarations which causes the build failure.	2023-08-14 08:14:19 -07:00
H.J. Lu	f6b10ed8e9	x86_64: Add log2 with FMA On Skylake, it improves log2 bench performance by: Before After Improvement max 208.779 63.827 69% min 9.977 6.55 34% mean 10.366 6.8191 34%	2023-08-11 07:49:45 -07:00
H.J. Lu	881546979d	x86_64: Sort fpu/multiarch/Makefile Sort Makefile variables using scripts/sort-makefile-lines.py. No code generation changes observed in libm. No regressions on x86_64.	2023-08-10 11:23:25 -07:00
Adhemerval Zanella	51cb52214f	x86_64: Fix build with --disable-multiarch (BZ 30721) With multiarch disabled, the default memmove implementation provides the fortify routines for memcpy, mempcpy, and memmove. However, it does not provide the internal hidden definitions used when building with fortify enabled. The memset has a similar issue. Checked on x86_64-linux-gnu building with different options: default and --disable-multi-arch plus default, --disable-default-pie, --enable-fortify-source={2,3}, and --enable-fortify-source={2,3} with --disable-default-pie. Tested-by: Andreas K. Huettel <dilfridge@gentoo.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-08-10 10:29:29 -03:00
Andreas K. Hüttel	6d457ff36a	Update x86_64 libm-test-ulps (x32 ABI) Based on feedback by Mike Gilbert <floppym@gentoo.org> Linux-6.1.38-dist x86_64 AMD Phenom-tm- II X6 1055T Processor -march=amdfam10 failures occur for x32 ABI Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2023-07-19 16:56:54 +02:00
Siddhesh Poyarekar	c6cb8783b5	configure: Use autoconf 2.71 Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-07-17 10:08:10 -04:00
Frédéric Bérat	64f9857507	wchar: Avoid PLT entries with _FORTIFY_SOURCE The change is meant to avoid unwanted PLT entries for the wmemset and wcrtomb routines when _FORTIFY_SOURCE is set. On top of that, ensure that *_chk routines have their hidden builtin definitions available. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-07-05 16:59:48 +02:00
Frédéric Bérat	dd8486ffc1	string: Ensure _chk routines have their hidden builtin definition available If libc_hidden_builtin_{def,proto} isn't properly set for _chk routines, there are unwanted PLT entries in libc.so. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-07-05 16:59:48 +02:00
H.J. Lu	6259ab3941	ld.so: Always use MAP_COPY to map the first segment [BZ #30452] The first segment in a shared library may be read-only, not executable. To support LD_PREFER_MAP_32BIT_EXEC on such shared libraries, we also check MAP_DENYWRITE to decide if MAP_32BIT should be passed to mmap. Normally the first segment is mapped with MAP_COPY, which is defined as (MAP_PRIVATE | MAP_DENYWRITE). But if the segment alignment is greater than the page size, MAP_COPY isn't used to allocate enough space to ensure that the segment can be properly aligned. Map the first segment with MAP_COPY in this case to fix BZ #30452.	2023-06-30 10:42:42 -07:00
Sergey Bugaev	45e2483a6c	x86: Make dl-cache.h and readelflib.c not Linux-specific These files could be useful to any port that wants to use ld.so.cache. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-06-26 10:04:31 -03:00
Frederic Berat	1bc85effd5	sysdeps/{i386, x86_64}/mempcpy_chk.S: fix linknamespace for __mempcpy_chk On i386 and x86_64, for libc.a specifically, __mempcpy_chk calls mempcpy which leads POSIX routines to call non-POSIX mempcpy indirectly. This leads the linknamespace test to fail when glibc is built with __FORTIFY_SOURCE=3. Since calling mempcpy doesn't bring any benefit for libc.a, directly call __mempcpy instead. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-06-22 00:20:52 -04:00
H.J. Lu	a8c8889978	x86-64: Use YMM registers in memcmpeq-evex.S Since the assembly source file with -evex suffix should use YMM registers, not ZMM registers, include x86-evex256-vecs.h by default to use YMM registers in memcmpeq-evex.S Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-06-01 09:21:14 -07:00
Paul Pluzhnikov	1e9d5987fd	Fix misspellings in sysdeps/x86_64 -- BZ 25337. Applying this commit results in bit-identical rebuild of libc.so.6 math/libm.so.6 elf/ld-linux-x86-64.so.2 mathvec/libmvec.so.1 Reviewed-by: Florian Weimer <fweimer@redhat.com>	2023-05-23 10:25:11 +00:00
Paul Pluzhnikov	1d2971b525	Fix misspellings in sysdeps/x86_64/fpu/multiarch -- BZ 25337. Applying this commit results in a bit-identical rebuild of mathvec/libmvec.so.1 (which is the only binary that gets rebuilt). Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-05-23 03:28:58 +00:00
Joe Ramsay	cd94326a13	Enable libmvec support for AArch64 This patch enables libmvec on AArch64. The proposed change is mainly implementing build infrastructure to add the new routines to ABI, tests and benchmarks. I have demonstrated how this all fits together by adding implementations for vector cos, in both single and double precision, targeting both Advanced SIMD and SVE. The implementations of the routines themselves are just loops over the scalar routine from libm for now, as we are more concerned with getting the plumbing right at this point. We plan to contribute vector routines from the Arm Optimized Routines repo that are compliant with requirements described in the libmvec wiki. Building libmvec requires minimum GCC 10 for SVE ACLE. To avoid raising the minimum GCC by such a big jump, we allow users to disable libmvec if their compiler is too old. Note that at this point users have to manually call the vector math functions. This seems to be acceptable to some downstream users. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-05-03 12:09:49 +01:00
Samuel Thibault	6d4f183495	nptl: move tst-x86-64-tls-1 to nptl-only tests It is essentially nptl-only.	2023-05-01 12:59:33 +02:00
Sergey Bugaev	c02b26455b	hurd: Implement prefer_map_32bit_exec tunable This makes the prefer_map_32bit_exec tunable no longer Linux-specific. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230423215526.346009-4-bugaevc@gmail.com>	2023-04-24 22:48:35 +02:00
Sergey Bugaev	57df0f16b4	hurd: Add sys/ucontext.h and sigcontext.h for x86_64 This is based on the Linux port's version, but laid out to match Mach's struct i386_thread_state, much like the i386 version does. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>	2023-04-10 20:11:43 +02:00
Florian Weimer	5d1ccdda7b	x86_64: Fix asm constraints in feraiseexcept (bug 30305) The divss instruction clobbers its first argument, and the constraints need to reflect that. Fortunately, with GCC 12, generated code does not actually change, so there is no externally visible bug. Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-03 18:40:52 +02:00
Sergey Bugaev	8d873a4904	x86_64: Add rtld-stpncpy & rtld-strncpy Just like the other existing rtld-str* files, this provides rtld with usable versions of stpncpy and strncpy. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230319151017.531737-22-bugaevc@gmail.com>	2023-04-03 01:17:56 +02:00
Sergey Bugaev	fb9e7f6732	htl: Add tcb-offsets.sym for x86_64 The source code is the same as sysdeps/i386/htl/tcb-offsets.sym, but of course the produced tcb-offsets.h will be different. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230319151017.531737-21-bugaevc@gmail.com>	2023-04-03 01:15:30 +02:00
Adhemerval Zanella Netto	33237fe83d	Remove --enable-tunables configure option And make always supported. The configure option was added on glibc 2.25 and some features require it (such as hwcap mask, huge pages support, and lock elisition tuning). It also simplifies the build permutations. Changes from v1: * Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs more discussion. * Cleanup more code. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-03-29 14:33:06 -03:00
Joe Ramsay	e4d336f1ac	benchtests: Move libmvec benchtest inputs to benchtests directory This allows other targets to use the same inputs for their own libmvec microbenchmarks without having to duplicate them in their own subdirectory. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-03-27 17:04:03 +01:00
Sergey Bugaev	35ce4c99e7	htl: Add pthreadtypes-arch.h for x86_64 Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-Id: <20230221211932.296459-5-bugaevc@gmail.com>	2023-02-27 23:30:15 +01:00
H.J. Lu	04a558e669	x86_64: Update libm test ulps Update libm test ulps for commit `3efbf11fdf` Author: Paul Zimmermann <Paul.Zimmermann@inria.fr> Date: Tue Feb 14 11:24:59 2023 +0100 update auto-libm-test-out-hypot Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-02-27 08:39:32 -08:00


1837 Commits