glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-22 13:00:06 +00:00

Author	SHA1	Message	Date
Florian Weimer	68007900be	misc, nptl: Remove stray references to __condvar_load_64_relaxed The function was renamed to __atomic_wide_counter_load_relaxed in commit `8bd336a00a` ("nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c").	2021-12-06 08:01:08 +01:00
Florian Weimer	4fb4e7e821	csu: Always use __executable_start in gmon-start.c Current binutils defines __executable_start as the lowest text address, so using the entry point address as a fallback is no longer necessary. As a result, overriding <entry.h> is only necessary if the entry point is not called _start. The previous approach to define __ASSEMBLY__ to suppress the declaration breaks if headers included by <entry.h> are not compatible with __ASSEMBLY__. This happens with rseq integration because it is necessary to include kernel headers in more places. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-05 13:50:21 +01:00
Florian Weimer	c1cb2deeca	elf: execve statically linked programs instead of crashing [BZ #28648] Programs without dynamic dependencies and without a program interpreter are now run via execve. Previously, the dynamic linker either crashed while attempting to read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT data), or the self-relocation in the static PIE executable crashed because the outer dynamic linker had already applied RELRO protection. <dl-execve.h> is needed because execve is not available in the dynamic loader on Hurd. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-05 11:28:34 +01:00
H.J. Lu	bada2e312a	Add --with-timeoutfactor=NUM to specify TIMEOUTFACTOR On Ice Lake and Tiger Lake laptops, some test programs time out when there are 3 "make check -j8" runs in parallel. Add --with-timeoutfactor=NUM to specify an integer to scale the timeout of test programs, which can be overridden by the TIMEOUTFACTOR environment variable. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-04 12:58:28 -08:00
Noah Goldstein	4df1fa6ddc	x86-64: Use notl in EVEX strcmp [BZ #28646] Must use notl %edi here as lower bits are for CHAR comparisons potentially out of range thus can be 0 without indicating mismatch. This fixes BZ #28646. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>	2021-12-03 21:14:11 -08:00
Florian Weimer	23c77f6018	nptl: Increase default TCB alignment to 32 rseq support will use a 32-byte aligned field in struct pthread, so the whole struct needs to have at least that alignment. nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h> to obtain the fallback definition. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-03 20:43:31 +01:00
Luca Boccassi	0656b649c5	elf: add definition for ELF_NOTE_FDO and NT_FDO_PACKAGING_METADATA note As defined on: https://systemd.io/COREDUMP_PACKAGE_METADATA/ this note will be used starting from Fedora 36. Signed-off-by: Luca Boccassi <bluca@debian.org>	2021-12-02 23:01:51 +01:00
Wilco Dijkstra	b31bd11454	AArch64: Improve A64FX memcpy v2 is a complete rewrite of the A64FX memcpy. Performance is improved by streamlining the code, aligning all large copies and using a single unrolled loop for all sizes. The code size for memcpy and memmove goes down from 1796 bytes to 868 bytes. Performance is better in all cases: bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33% faster for large sizes, bench-memcpy-walk is 25% faster for small sizes and 20% for the largest sizes. The geomean of all tests in bench-memcpy is 5.1% faster, and total time is reduced by 4%. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-02 18:36:03 +00:00
Wilco Dijkstra	b51eb35c57	AArch64: Optimize memcmp Rewrite memcmp to improve performance. On small and medium inputs performance is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per iteration, which is 30-50% faster depending on the size. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-02 18:35:53 +00:00
Matheus Castanho	d120fb9941	powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532] Syscalls based on the assembly templates are missing CFI for r31, which gets clobbered when scv is used, and info for LR is inaccurate, placed in the wrong LOC and not using the proper offset. LR was also being saved to the callee's frame, while the ABI mandates it to be saved to the caller's frame. These are fixed by this commit. After this change: $ readelf -wF libc.so.6 | grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6 00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c LOC CFA r31 ra 000000000004b9d4 r1+0 u u 000000000004b9e4 r1+48 u u 000000000004b9e8 r1+48 c-16 u 000000000004b9fc r1+48 c-16 c+16 000000000004ba08 r1+48 c-16 000000000004ba18 r1+48 u 000000000004ba1c r1+0 u libc.so.6: file format elf64-powerpcle Disassembly of section .text: 000000000004b9d4 <kill>: 4b9d4: 1f 00 4c 3c addis r2,r12,31 4b9d8: 2c c3 42 38 addi r2,r2,-15572 4b9dc: 25 00 00 38 li r0,37 4b9e0: d1 ff 21 f8 stdu r1,-48(r1) 4b9e4: 20 00 e1 fb std r31,32(r1) 4b9e8: 98 8f ed eb ld r31,-28776(r13) 4b9ec: 10 00 ff 77 andis. r31,r31,16 4b9f0: 1c 00 82 41 beq 4ba0c <kill+0x38> 4b9f4: a6 02 28 7d mflr r9 4b9f8: 40 00 21 f9 std r9,64(r1) 4b9fc: 01 00 00 44 scv 0 4ba00: 40 00 21 e9 ld r9,64(r1) 4ba04: a6 03 28 7d mtlr r9 4ba08: 08 00 00 48 b 4ba10 <kill+0x3c> 4ba0c: 02 00 00 44 sc 4ba10: 00 00 bf 2e cmpdi cr5,r31,0 4ba14: 20 00 e1 eb ld r31,32(r1) 4ba18: 30 00 21 38 addi r1,r1,48 4ba1c: 18 00 96 41 beq cr5,4ba34 <kill+0x60> 4ba20: 01 f0 20 39 li r9,-4095 4ba24: 40 48 23 7c cmpld r3,r9 4ba28: 20 00 e0 4d bltlr+ 4ba2c: d0 00 63 7c neg r3,r3 4ba30: 08 00 00 48 b 4ba38 <kill+0x64> 4ba34: 20 00 e3 4c bnslr+ 4ba38: c8 32 fe 4b b 2ed00 <__syscall_error> ... 4ba44: 40 20 0c 00 .long 0xc2040 4ba48: 68 00 00 00 .long 0x68 4ba4c: 06 00 5f 5f rlwnm r31,r26,r0,0,3 4ba50: 6b 69 6c 6c xoris r12,r3,26987	2021-11-30 15:18:52 -03:00
Adhemerval Zanella	efc6b2dbc4	linux: Implement pipe in terms of __NR_pipe2 The syscall pipe2 was added in linux 2.6.27 and glibc requires linux 3.2.0. The patch removes the arch-specific implementation for alpha, ia64, mips, sh, and sparc which requires a different kernel ABI than the usual one. Checked on x86_64-linux-gnu and with a build for the affected ABIs.	2021-11-30 13:13:03 -03:00
Adhemerval Zanella	5b3e31e312	linux: Implement mremap in C Variadic function calls in syscalls.list do not work for all ABIs (for instance, where the arguments are passed on the stack instead of in registers) and might have underlying issues depending on the variadic type (for instance if a 64-bit argument is used). Checked on x86_64-linux-gnu.	2021-11-30 13:13:03 -03:00
Adhemerval Zanella	83008fa495	linux: Add prlimit64 C implementation The LFS prlimit64 requires an arch-specific implementation in syscalls.list. Instead add a generic one that handles the required symbol alias for __RLIM_T_MATCHES_RLIM64_T. HPPA is the only outlier which requires a different default symbol. Checked on x86_64-linux-gnu and with a build for the affected ABIs.	2021-11-30 13:13:03 -03:00
Florian Weimer	df4cb2280e	elf: Include <stdbool.h> in tst-tls20.c The test uses the bool type.	2021-11-30 15:39:17 +01:00
Florian Weimer	3c7c511782	elf: Include <stdint.h> in tst-tls20.c The test uses standard integer types.	2021-11-30 14:35:54 +01:00
Samuel Thibault	e49c3c5d7a	hurd: Let report-wait use a weak reference to _hurd_itimer_thread libc.so.0.3 does not seem to need this defined any more.	2021-11-28 21:26:25 +01:00
Adhemerval Zanella	137ed5ac44	linux: Use /proc/stat fallback for __get_nprocs_conf (BZ #28624) The /proc/stat fallback used when sysfs is not available was removed by `f13fb81ad3`; reinstate it. Checked on x86_64-linux-gnu.	2021-11-25 11:00:42 -03:00
Adhemerval Zanella	d150181d73	linux: Add fanotify_mark C implementation Passing 64-bit arguments in syscalls.list is tricky: it requires reimplementing the expected kernel ABI for each architecture. This is much better represented in C code, where we already have macros for this (SYSCALL_LL64). Checked on x86_64-linux-gnu.	2021-11-25 09:56:57 -03:00
Adhemerval Zanella	c3b023a782	linux: Only build fstatat fallback if required For 32-bit architectures with __ASSUME_STATX there is no need to build fstatat64_time64_stat. Checked on i686-linux-gnu.	2021-11-25 09:28:27 -03:00
Paul Eggert	c52ef24829	regex: fix buffer read overrun in search [BZ#28470] Problem reported by Benno Schulenberg in: https://lists.gnu.org/r/bug-gnulib/2021-10/msg00035.html * posix/regexec.c (re_search_internal): Use better bounds check.	2021-11-24 14:16:09 -08:00
Sunil K Pandey	c58d3b7d00	x86-64: Add vector sin/sinf to libmvec microbenchmark Add vector sin/sinf and input files to libmvec microbenchmark. libmvec-sin-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-sinf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:50:23 -08:00
Sunil K Pandey	6a556bac81	x86-64: Add vector pow/powf to libmvec microbenchmark Add vector pow/powf and input files to libmvec microbenchmark. libmvec-pow-inputs: arg1: 90% Normal random distribution range: (0.0, 256.0) mean: 0.0 sigma: 32.0 10% uniform random distribution in range (0.0, 256.0) arg2: 90% Normal random distribution range: (-127.0, 127.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-127.0, 127.0) libmvec-powf-inputs: arg1: 90% Normal random distribution range: (0.0f, 100.0f) mean: 0.0f sigma: 16.0f 10% uniform random distribution in range (0.0f, 100.0f) arg2: 90% Normal random distribution range: (-10.0f, 10.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-10.0f, 10.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:49:14 -08:00
Sunil K Pandey	8ab8afb336	x86-64: Add vector log/logf to libmvec microbenchmark Add vector log/logf and input files to libmvec microbenchmark. libmvec-log-inputs: 70% Normal random distribution range: (0.0, DBL_MAX) mean: 1.0 sigma: 50.0 30% uniform random distribution in range (0.0, DBL_MAX) libmvec-logf-inputs: 70% Normal random distribution range: (0.0f, FLT_MAX) mean: 1.0f sigma: 50.0f 30% uniform random distribution in range (0.0f, FLT_MAX) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:48:14 -08:00
Sunil K Pandey	37df38bd5f	x86-64: Add vector exp/expf to libmvec microbenchmark Add vector exp/expf and input files to libmvec microbenchmark. libmvec-exp-inputs: 90% Normal random distribution range: (-708.0, 709.0) mean: 0.0 sigma: 16.0 10% uniform random distribution in range (-500.0, 500.0) libmvec-expf-inputs: 90% Normal random distribution range: (-87.0f, 88.0f) mean: 0.0f sigma: 8.0f 10% uniform random distribution in range (-50.0f, 50.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:46:59 -08:00
Sunil K Pandey	4443695598	x86-64: Add vector cos/cosf to libmvec microbenchmark Add vector cos/cosf and input files to libmvec microbenchmark. libmvec-cos-inputs: 90% Normal random distribution range: (-DBL_MAX, DBL_MAX) mean: 0.0 sigma: 5.0 10% uniform random distribution in range (-1000.0, 1000.0) libmvec-cosf-inputs: 90% Normal random distribution range: (-FLT_MAX, FLT_MAX) mean: 0.0f sigma: 5.0f 10% uniform random distribution in range (-1000.0f, 1000.0f) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-24 07:45:20 -08:00
Adhemerval Zanella	456b3c08b6	io: Refactor close_range and closefrom Now that Hurd implements both close_range and closefrom (`f2c996597d`), we can make close_range() a base ABI, and implement the default closefrom() on top of close_range(). The generic closefrom() implementation based on __getdtablesize() is moved to generic close_range(). On Linux it will be overridden by the auto-generated syscall, while on Hurd it will be a system-specific implementation. closefrom() now calls close_range() and __closefrom_fallback(). Since on Hurd close_range() does not fail, __closefrom_fallback() is an empty static inline function set by __ASSUME_CLOSE_RANGE. __ASSUME_CLOSE_RANGE also allows optimizing the Linux __closefrom_fallback() implementation when --enable-kernel=5.9 or higher is used. Finally, the Linux-specific tst-close_range.c is moved to io and enabled by default. The Linuxisms and CLOSE_RANGE_UNSHARE are guarded so it can be built for Hurd (I have not actually tested it). Checked on x86_64-linux-gnu, i686-linux-gnu, and with an i686-gnu build.	2021-11-24 09:09:37 -03:00
Florian Weimer	e186fc5a31	nptl: Do not set signal mask on second setjmp return [BZ #28607] __libc_signal_restore_set was in the wrong place: It also ran when setjmp returned the second time (after pthread_exit or pthread_cancel). This is observable with blocked pending signals during thread exit. Fixes commit `b3cae39dcb` ("nptl: Start new threads with all signals blocked [BZ #25098]"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-24 08:59:54 +01:00
Adhemerval Zanella	aac54dcd37	powerpc: Define USE_PPC64_NOTOC iff compiler supports it The @notoc usage only yields an advantage on ISA 3.1+ machines (power10) and, for ld.bfd, only when it also sees pcrel relocations used in the code (generated if the compiler targets ISA 3.1+). In the bfd case, ISA 3.1+ instructions in stubs are used iff the linker also sees the new pc-relative relocations (for instance R_PPC64_D34); otherwise it generates default stubs (ppc64_elf_check_relocs:4700). This patch also helps with linkers that do not implement this optimization, since building for an older ISA (such as 3.0 / power9) will also trigger power10 stub generation if the assembly code uses the NOTOC macro. Checked on powerpc64le-linux-gnu. Reviewed-by: Fangrui Song <maskray@google.com> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>	2021-11-22 14:49:11 -03:00
Adhemerval Zanella	bc801b3a40	setjmp: Replace jmp_buf-macros.h with jmp_buf-macros.sym It requires less boilerplate code for newer ports. The _Static_assert checks from internal setjmp are moved to its own internal test since setjmp.h is included early by multiple headers (to generate rtld-sizes.sym). The riscv jmp_buf-macros.h check is also redundant, it is already done by riscv configure.ac. Checked with a build for the affected architectures.	2021-11-22 13:43:22 -03:00
Joseph Myers	5c3ece451d	Update kernel version to 5.15 in tst-mman-consts.py This patch updates the kernel version in the test tst-mman-consts.py to 5.15. (There are no new MAP_* constants covered by this test in 5.15 that need any other header changes.) Tested with build-many-glibcs.py.	2021-11-22 15:30:12 +00:00
Florian Weimer	3d981795cd	socket: Do not use AF_NETLINK in __opensock It is not possible to use interface ioctls with netlink sockets on all Linux kernels. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-22 14:47:13 +01:00
Adhemerval Zanella	ed3ce71f5c	elf: Move la_activity (LA_ACT_ADD) after _dl_add_to_namespace_list() (BZ #28062) It ensures that the namespace is guaranteed not to be empty. Checked on x86_64-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-11-18 17:17:58 -03:00
Joseph Myers	bdeb7a8fa9	Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add these constants to bits/socket.h. Tested for x86_64.	2021-11-17 14:25:16 +00:00
Stafford Horne	f1bcfde3a7	malloc: Fix malloc debug for 2.35 onwards The change `1e5a5866cb` ("Remove malloc hooks [BZ #23328]") has broken ports that are using GLIBC_2_35, like the new OpenRISC port I am working on. The libc_malloc_debug.so library used to bring in the debug infrastructure is currently essentially empty for GLIBC_2_35 ports like mine causing mtrace tests to fail: cat sysdeps/unix/sysv/linux/or1k/shlib-versions DEFAULT GLIBC_2.35 ld=ld-linux-or1k.so.1 FAIL: posix/bug-glob2-mem FAIL: posix/bug-regex14-mem FAIL: posix/bug-regex2-mem FAIL: posix/bug-regex21-mem FAIL: posix/bug-regex31-mem FAIL: posix/bug-regex36-mem FAIL: malloc/tst-mtrace. The issue seems to be with the ifdefs in malloc/malloc-debug.c. The ifdefs are currently essentially excluding all symbols for ports > 2.35. Removing the top level SHLIB_COMPAT ifdef allows things to just work. Fixes: `1e5a5866cb` ("Remove malloc hooks [BZ #23328]") Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-11-17 21:33:39 +09:00
Florian Weimer	f1d333b5bf	elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres This will be used to deallocate memory allocated using the non-minimal malloc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-17 12:20:29 +01:00
Florian Weimer	8bd336a00a	nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c And make it an installed header. This addresses a few aliasing violations (which do not seem to result in miscompilation due to the use of atomics), and also enables use of wide counters in other parts of the library. The debug output in nptl/tst-cond22 has been adjusted to print the 32-bit values instead because it avoids a big-endian/little-endian difference. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-17 12:20:13 +01:00
Sunil K Pandey	a43c0b5483	x86-64: Create microbenchmark infrastructure for libmvec Add python script to generate libmvec microbenchmark from the input values for each libmvec function using skeleton benchmark template. Creates double and float benchmarks with vector length 1, 2, 4, 8, and 16 for each libmvec function. Vector length 1 corresponds to scalar version of function and is included for vector function perf comparison. Co-authored-by: Haochen Jiang <haochen.jiang@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-16 11:37:39 -08:00
Adhemerval Zanella	d8c2e8e043	elf: hidden visibility for __minimal_malloc functions Since `b05fae4d8e`, __minimal malloc code is used during static startup before PIE self-relocation (_dl_relocate_static_pie). So it requires the same fix done for other objects by `47618209d0`. Checked on aarch64, x86_64, and i686 with and without static-pie.	2021-11-16 16:03:31 -03:00
H.J. Lu	1f67d8286b	elf: Use a temporary file to generate Makefile fragments [BZ #28550] 1. Use a temporary file to generate Makefile fragments for DSO sorting tests and use -include on them. 2. Add Makefile fragments to postclean-generated so that a "make clean" removes the autogenerated fragments and a subsequent "make" regenerates them. This partially fixes BZ #28550. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-16 05:18:01 -08:00
H.J. Lu	b4bbedb1e7	dso-ordering-test.py: Put all sources in one directory [BZ #28550] Put all sources for DSO sorting tests in the dso-sort-tests-src directory and compile test relocatable objects with $(objpfx)tst-dso-ordering1-dir/tst-dso-ordering1-a.os: $(objpfx)dso-sort-tests-src/tst-dso-ordering1-a.c $(compile.c) $(OUTPUT_OPTION) to avoid random $< values from $(before-compile) when compiling test relocatable objects with $(objpfx)%$o: $(objpfx)%.c $(before-compile); $$(compile-command.c) compile-command.c = $(compile.c) $(OUTPUT_OPTION) $(compile-mkdep-flags) compile.c = $(CC) $< -c $(CFLAGS) $(CPPFLAGS) for 3 "make -j 28" parallel builds on a machine with 112 cores at the same time. This partially fixes BZ #28550. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-11-15 11:53:40 -08:00
Adhemerval Zanella	54816ae98d	elf: Move LAV_CURRENT to link_lavcurrent.h No functional change.	2021-11-15 15:28:17 -03:00
H.J. Lu	120ac6d238	Move assignment out of the CAS condition Update commit `49302b8fdf` Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Nov 11 06:54:01 2021 -0800 Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. and commit `0b82747dc4` Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Nov 11 06:31:51 2021 -0800 Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. by moving assignment out of the CAS condition.	2021-11-15 05:50:56 -08:00
H.J. Lu	cbcd65c8b5	Add a comment for --enable-initfini-array [BZ #27945] Document that --enable-initfini-array is enabled by default in GCC 12, which can be removed when GCC 12 becomes the minimum requirement.	2021-11-13 09:50:07 -08:00
Stafford Horne	afbf26492a	tst-tzset: output reason when creating 4GiB file fails Currently, if the temporary file creation fails the create_tz_file function returns NULL. The NULL pointer is then passed to setenv which causes a SIGSEGV. Rather than failing with a SIGSEGV print a warning and exit.	2021-11-13 08:08:47 +09:00
H.J. Lu	d672a98a1a	Add LLL_MUTEX_READ_LOCK [BZ #28537] CAS instruction is expensive. From the x86 CPU's point of view, getting a cache line for writing is more expensive than reading. See Appendix A.2 Spinlock in: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf The full compare and swap will grab the cache line exclusive and cause excessive cache line bouncing. Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock loop if compare may fail to reduce cache line bouncing on contended locks. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-11-12 10:32:09 -08:00
H.J. Lu	49302b8fdf	Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-11-12 10:31:31 -08:00
H.J. Lu	0b82747dc4	Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537] Replace boolean CAS with value CAS to avoid the extra load. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-11-12 10:30:57 -08:00
Noah Goldstein	6c1e3c0fd0	String: Split memcpy tests so that parallel build is faster No bug. This commit splits test-memcpy.c into test-memcpy.c and test-memcpy-large.c. The idea is parallel builds will be able to run both in parallel speeding up the process. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-10 20:14:09 -06:00
Noah Goldstein	2f9062d717	x86: Shrink memcmp-sse4.S code size No bug. This implementation refactors memcmp-sse4.S primarily with minimizing code size in mind. It does this by removing the lookup table logic and removing the unrolled check from (256, 512] bytes. memcmp-sse4 code size reduction : -3487 bytes wmemcmp-sse4 code size reduction: -1472 bytes The current memcmp-sse4.S implementation has a large code size cost. This has serious adverse effects on the ICache / ITLB. While in micro-benchmarks the implementation appears fast, traces of real-world code have shown that the speed in micro-benchmarks does not translate when the ICache/ITLB are not primed, and that the cost of the code size has measurable negative effects on overall application performance. See https://research.google/pubs/pub48320/ for more details. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-10 20:12:10 -06:00
Joseph Myers	309548bec3	Support C2X printf %b, %B C2X adds a printf %b format (see <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2630.pdf>, accepted for C2X), for outputting integers in binary. It also has recommended practice for a corresponding %B format (like %b, but %#B starts the output with 0B instead of 0b). Add support for these formats to glibc. One existing test uses %b as an example of an unknown format, to test how glibc printf handles unknown formats; change that to %v. Use of %b and %B as user-registered format specifiers continues to work (and we already have a test that covers that, tst-printfsz.c). Note that C2X also has scanf %b support, plus support for binary constants starting 0b in strtol (base 0 and 2) and scanf %i (strtol base 0 and scanf %i coming from a previous paper that added binary integer literals). I intend to implement those features in a separate patch or patches; as discussed in the thread starting at <https://sourceware.org/pipermail/libc-alpha/2020-December/120414.html>, they will be more complicated because they involve adding extra public symbols to ensure compatibility with existing code that might not expect 0b constants to be handled by strtol base 0 and 2 and scanf %i, whereas simply adding a new format specifier poses no such compatibility concerns. Note that the actual conversion from integer to string uses existing code in _itoa.c. That code has special cases for bases 8, 10 and 16, probably so that the compiler can optimize division by an integer constant in the code for those bases. If desired such special cases could easily be added for base 2 as well, but that would be an optimization, not actually needed for these printf formats to work. Tested for x86_64 and x86. Also tested with build-many-glibcs.py for aarch64-linux-gnu with GCC mainline to make sure that the test does indeed build with GCC 12 (where format checking warnings are enabled for most of the test).	2021-11-10 15:52:21 +00:00
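
To illustrate the output that the %b and %#b formats described in the last entry above are meant to produce, here is a minimal C sketch. format_binary is a hypothetical helper written for this note; it is not a glibc function and not how glibc implements the conversion (the commit message says glibc reuses the existing code in _itoa.c). It deliberately avoids calling printf with %b, so it should also build against a C library that lacks that support.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of glibc: renders VALUE in base 2 into BUF.
   If ALT is nonzero and VALUE is nonzero, the result is prefixed with "0b",
   mirroring what the '#' flag does for %b.  Returns BUF on success, or NULL
   if BUF (of SIZE bytes) is too small for the digits plus the terminator.  */
static char *
format_binary (unsigned int value, int alt, char *buf, size_t size)
{
  char digits[sizeof (unsigned int) * CHAR_BIT];
  size_t ndigits = 0;
  /* The prefix is only added for nonzero values, as with %#x.  */
  size_t prefix_len = (alt && value != 0) ? 2 : 0;

  /* Collect digits least-significant first; zero still yields one digit.  */
  do
    {
      digits[ndigits++] = (char) ('0' + (value & 1u));
      value >>= 1;
    }
  while (value != 0);

  if (size < prefix_len + ndigits + 1)
    return NULL;

  size_t pos = 0;
  if (prefix_len != 0)
    {
      memcpy (buf, "0b", 2);
      pos = 2;
    }

  /* Emit the collected digits in most-significant-first order.  */
  while (ndigits > 0)
    buf[pos++] = digits[--ndigits];
  buf[pos] = '\0';
  return buf;
}
```

With a sufficiently large buffer, format_binary (5, 0, buf, sizeof buf) yields "101" and format_binary (5, 1, buf, sizeof buf) yields "0b101"; a zero value gets no prefix, which is the same way the '#' flag treats zero for %x.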

38070 Commits