glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-28 13:31:13 +00:00

Author	SHA1	Message	Date
Samuel Thibault	b9ca3f3efb	tst-cancel4-common.c: fix calling socketpair PF_UNIX was actually never intended to be passed as protocol parameter to socket() calls: it is a protocol family, not a protocol. It happens that Linux introduced accepting it during its 2.0 development, but it shouldn't. OpenBSD kernels accept it as well, but FreeBSD and NetBSD rightfully do not. GNU/Hurd does not either. * nptl/tst-cancel4-common.c (do_test): Pass 0 instead of PF_UNIX as protocol.	2020-06-26 23:51:52 +02:00
H.J. Lu	4fdd4d41a1	x86: Detect Intel Advanced Matrix Extensions Intel Advanced Matrix Extensions (Intel AMX) is a new programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and accelerators able to operate on tiles. Intel AMX is an extensible architecture. New accelerators can be added and the existing accelerator may be enhanced to provide higher performance. The initial features are AMX-BF16, AMX-TILE and AMX-INT8, which are usable only if the operating system supports both XTILECFG state and XTILEDATA state. Add AMX-BF16, AMX-TILE and AMX-INT8 support to HAS_CPU_FEATURE and CPU_FEATURE_USABLE.	2020-06-26 06:53:05 -07:00
Mike FABIAN	6e540caa21	Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120] Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-06-26 09:54:43 +02:00
Stefan Liebler	1d21fb1061	S390: Optimize __memset_z196. It turned out that a 256b-mvc instruction which depends on the result of a previous 256b-mvc instruction is counterproductive. Therefore this patch adjusts the 256b-loop by storing the first byte with stc and setting the remaining 255b with mvc. Now the 255b-mvc instruction depends on the stc instruction.	2020-06-26 09:45:11 +02:00
Stefan Liebler	0792c8ae1a	S390: Optimize __memcpy_z196. This patch introduces an extra loop without pfd instructions as it turned out that the pfd instructions are useful for copies >=64KB but are counterproductive for smaller copies.	2020-06-26 09:45:11 +02:00
Florian Weimer	2034c70e64	elf: Include <stddef.h> (for size_t), <sys/stat.h> in <ldconfig.h> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-06-25 16:51:03 +02:00
Szabolcs Nagy	087942251f	nptl: Don't madvise user provided stack User provided stack should not be released nor madvised at thread exit because it's owned by the user. If the memory is shared or file based then MADV_DONTNEED can have unwanted effects. With memory tagging on aarch64 linux the tags are dropped and thus it may invalidate pointers. Tested on aarch64-linux-gnu with MTE, it fixes FAIL: nptl/tst-stack3 FAIL: nptl/tst-stack3-mem	2020-06-25 14:19:16 +01:00
Stefan Liebler	f6b955e8ba	S390: Regenerate ULPs. Updates needed after recent exp10f commits.	2020-06-24 14:51:06 +02:00
Florian Weimer	1fb7dc751e	htl: Add wrapper header for <semaphore.h> with hidden __sem_post This is required to avoid a check-localplt failure due to a sem_post call through the PLT. Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>	2020-06-24 13:38:08 +02:00
Florian Weimer	6f3331f26d	elf: Include <stdbool.h> in <dl-tunables.h> because bool is used	2020-06-24 11:02:34 +02:00
Samuel Thibault	1b90d52df9	htl: Fix case when sem_wait is canceled while holding a token sysdeps/htl/sem-timedwait.c (struct cancel_ctx): Add cancel_wake field. (cancel_hook): When unblocking thread, set cancel_wake field to 1. (__sem_timedwait_internal): Set cancel_wake field to 0 by default. On cancellation exit, check whether we hold a token, to be put back.	2020-06-24 02:20:42 +02:00
Samuel Thibault	eca16db02d	htl: Make sem_wait cancellations points By aligning its implementation on pthread_cond_wait. sysdeps/htl/sem-timedwait.c (cancel_ctx): New structure. (cancel_hook): New function. (__sem_timedwait_internal): Check for cancellation and register cancellation hook that wakes the thread up, and check again for cancellation on exit. * nptl/tst-cancel13.c, nptl/tst-cancelx13.c: Move to... * sysdeps/pthread/: ... here. * nptl/Makefile: Move corresponding references and rules to... * sysdeps/pthread/Makefile: ... here.	2020-06-24 01:19:49 +02:00
Samuel Thibault	3513d5af3d	htl: Simplify non-cancel path of __pthread_cond_timedwait_internal Since __pthread_exit does not return, we do not need to indent the noncancel path * sysdeps/htl/pt-cond-timedwait.c (__pthread_cond_timedwait_internal): Move cancelled path before non-cancelled path, to avoid "else" indentation.	2020-06-24 01:19:48 +02:00
Samuel Thibault	9f6e508b42	htl: Enable tst-cancel25 test * nptl/tst-cancel25.c: Move to... * sysdeps/pthread/tst-cancel25.c: ... here. (tf2) Do not test for SIGCANCEL when it is not defined. * nptl/Makefile: Move corresponding reference to... * sysdeps/pthread/Makefile: ... here.	2020-06-24 00:02:31 +02:00
Tulio Magno Quites Machado Filho	ae725e3f9c	powerpc: Add new hwcap values Linux commit ID ee988c11acf6f9464b7b44e9a091bf6afb3b3a49 reserved 2 new bits in AT_HWCAP2: - PPC_FEATURE2_ARCH_3_1 indicates the availability of the POWER ISA 3.1; - PPC_FEATURE2_MMA indicates the availability of the Matrix-Multiply Assist facility.	2020-06-23 18:15:06 -03:00
Alex Butler	03e1378f94	aarch64: MTE compatible strncmp Add support for MTE to strncmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	adac54ffc5	aarch64: MTE compatible strcmp Add support for MTE to strcmp. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Branislav Rankov <branislav.rankov@arm.com> Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	79160c06c7	aarch64: MTE compatible strrchr Add support for MTE to strrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	df06b0d90f	aarch64: MTE compatible memrchr Add support for MTE to memrchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	7ff899969f	aarch64: MTE compatible memchr Add support for MTE to memchr. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Gabor Kertesz <gabor.kertesz@arm.com>	2020-06-23 17:55:39 +01:00
Alex Butler	bb2c12aecb	aarch64: MTE compatible strcpy Add support for MTE to strcpy. Regression tested with xcheck and benchmarked with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1. The existing implementation assumes that any access to the pages in which the string resides is safe. This assumption is not true when MTE is enabled. This patch updates the algorithm to ensure that accesses remain within the bounds of an MTE tag (16-byte chunks) and improves overall performance. Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>	2020-06-23 17:55:39 +01:00
Joseph Myers	8ec13b4639	Add MREMAP_DONTUNMAP from Linux 5.7 Add the new constant MREMAP_DONTUNMAP from Linux 5.7 to bits/mman-shared.h. Tested with build-many-glibcs.py.	2020-06-23 14:42:45 +00:00
H.J. Lu	ecbbadbf10	x86: Update CPU feature detection [BZ #26149] 1. Divide architecture features into the usable features and the preferred features. The usable features are for correctness and can be exported in a stable ABI. The preferred features are for performance and only for glibc internal use. 2. Change struct cpu_features to struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; unsigned int usable[USABLE_FEATURE_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; and initialize usable_p to point to the usable array so that struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; }; can be exported via a stable ABI. The cpuid and usable arrays can be expanded with backward binary compatibility for both .o and .so files. 3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16. 4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID, TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16. 5. Rename CAPABILITIES to ARCH_CAPABILITIES. 6. Check if AVX512_VP2INTERSECT, AVX512_BF16 and PKU are usable. 7. Update CPU feature detection test.	2020-06-22 13:09:33 -07:00
Adhemerval Zanella	ea04f02131	aarch64: Remove fpu Makefile The -fno-math-errno is already added by default and the minimum required GCC to build glibc (6.2) makes the -ffinite-math-only superfluous. Checked on aarch64-linux-gnu.	2020-06-22 11:09:50 -03:00
Adhemerval Zanella	9f21672b89	m68k: Use sqrt{f} builtin for coldfire Checked with a build for m68k-linux-gnu-coldfire.	2020-06-22 11:09:50 -03:00
Adhemerval Zanella	cbf3571f49	arm: Use sqrt{f} builtin Checked on arm-linux-gnueabi and armv7-linux-gnueabihf	2020-06-22 11:09:50 -03:00
Adhemerval Zanella	9dbb3fdfb7	riscv: Use sqrt{f} builtin Checked with a build for riscv64-linux-gnu-rv64imac-lp64 (no builtin support), riscv64-linux-gnu-rv64imafdc-lp64, and riscv64-linux-gnu-rv64imafdc-lp64d.	2020-06-22 11:09:50 -03:00
Adhemerval Zanella	3ca05a8e9e	s390: Use sqrt{f} builtin Checked on s390x-linux-gnu.	2020-06-22 11:09:50 -03:00
Adhemerval Zanella	c9a30f08e1	sparc: Use sqrt{f} builtin It also enables the use of fsqrtd on sparc64. Checked on sparcv9-linux-gnu and sparc64-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	32c65b28f3	mips: Use sqrt{f} builtin Checked with a build against mips-linux-gnu and mips64-linux-gnu and comparing the resulting binaries.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	8a7923b57e	alpha: Use builtin sqrt{f} The generic implementation is simplified by removing the 'optimization' for !_IEEE_FP_INEXACT (which handles neither inexact nor some values). Checked on alpha-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	b24381e50f	i386: Use builtin sqrtl Checked on i686-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	d19d25dd06	x86_64: Use builtin sqrt{f,l} Checked on x86_64-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	169ea8f928	powerpc: Use sqrt{f} builtin The powerpc sqrt implementation is also simplified: - the static constants are open coded within the implementation. - for !USE_SQRT_BUILTIN the function is implemented directly on __ieee754_sqrt (it avoids a superfluous extra jump). Checked on powerpc-linux-gnu and powerpc64le-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	a2e833667d	s390x: Use fma{f} builtin Checked on s390x-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	271afad8f4	aarch64: Use math-use-builtins for ceil{f} The define is already set on the math-use-builtins-ceil.h, the patch just removes the implementations (it was missed on `c9feb1be93`). Checked on aarch64-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	e80501a5c9	math: Decompose math-use-builtins.h Each symbol definition is moved to a separate file that covers all symbol type definitions (float, double, long double, and float128). This allows setting support for architectures without the boilerplate of copying default values. Checked with a build on the affected ABIs.	2020-06-22 11:09:45 -03:00
Samuel Thibault	c013d5d3aa	hurd: Add mremap * sysdeps/mach/hurd/mremap.c: New file. * sysdeps/mach/hurd/Makefile [misc] (sysdep_routines): Add mremap. * sysdeps/mach/hurd/Versions (libc.GLIBC_2.32): Add mremap. * sysdeps/mach/hurd/i386/libc.abilist: Add mremap.	2020-06-20 13:49:57 +00:00
Adhemerval Zanella	3297d019e1	ia64: Use generic exp10f The generic implementation is slightly worse (Itanium(R) Processor 9020): Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 3.61582e+08, "iterations": 2.384e+07, "reciprocal-throughput": 14.8334, "latency": 15.5006, "max-throughput": 6.74153e+07, "min-throughput": 6.45136e+07 } } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 3.85549e+08, "iterations": 2.384e+07, "reciprocal-throughput": 15.8391, "latency": 16.5056, "max-throughput": 6.31348e+07, "min-throughput": 6.05857e+07 } } However it fixes all the issues on both: math/test-float-exp10 math/test-float32-exp10 (all the issues are wrong results for non-default rounding modes). The existing ia64 libm interface uses matherrf and matherrl in addition to matherr for SVID error handling. However, there is no such error handling support for exp10f in ia64 libm. So replacing it with the generic implementation should be fine. Checked on ia64-linux-gnu.	2020-06-19 12:08:52 -03:00
Adhemerval Zanella	be668a8d78	New exp10f version without SVID compat wrapper This patch changes the exp10f error handling semantics to only set errno according to POSIX rules. New symbol version is introduced at GLIBC_2.32. The old wrappers are kept for compat symbols. There are some outliers that need special handling: - ia64 provides an optimized implementation of exp10f that uses ia64 specific routines to set SVID compatibility. The new symbol version is aliased to the exp10f one. - m68k also provides an optimized implementation, and the new version uses it instead of the sysdeps/ieee754/flt32 one. - riscv and csky use the generic template implementation that does not provide SVID support. For both cases a new exp10f version is not added, but rather the symbol version of the generic sysdeps/ieee754/flt32 is adjusted instead. Checked on aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu.	2020-06-19 12:08:47 -03:00
Adhemerval Zanella	4b2d8e4442	i386: Use generic exp10f The generic implementation is twice as fast. Using the exp10f benchmark: * master: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 1.02967e+09, "iterations": 4.768e+07, "reciprocal-throughput": 18.3579, "latency": 24.8331, "max-throughput": 5.44725e+07, "min-throughput": 4.02688e+07 } } * patched: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 1.01821e+09, "iterations": 6.1984e+07, "reciprocal-throughput": 13.1975, "latency": 19.6563, "max-throughput": 7.57719e+07, "min-throughput": 5.08743e+07 } } Checked on i686-linux-gnu.	2020-06-19 10:48:15 -03:00
Paul Zimmermann	6e98983c09	math: Optimized generic exp10f with wrappers It is inspired by expf and reuses its tables and internal functions. The error checks are inlined and errno setting is in separate tail called functions, but the wrappers are kept in this patch to handle the _LIB_VERSION==_SVID_ case. Double precision arithmetic is used which is expected to be faster on most targets (including soft-float) than using single precision and it is easier to get good precision result with it. Results for x86_64 (i7-4790K CPU @ 4.00GHz) are: Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.0414e+09, "iterations": 1.00128e+08, "reciprocal-throughput": 26.6818, "latency": 54.043, "max-throughput": 3.74787e+07, "min-throughput": 1.85038e+07 } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.11951e+09, "iterations": 1.23968e+08, "reciprocal-throughput": 21.0581, "latency": 45.4028, "max-throughput": 4.74876e+07, "min-throughput": 2.20251e+07 } Results for aarch64 (A72 @ 2GHz) are: Before new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.62362e+09, "iterations": 3.3376e+07, "reciprocal-throughput": 127.698, "latency": 149.365, "max-throughput": 7.831e+06, "min-throughput": 6.69501e+06 } With new code: "exp10f": { "workload-spec2017.wrf (adapted)": { "duration": 4.29108e+09, "iterations": 6.6752e+07, "reciprocal-throughput": 51.2111, "latency": 77.3568, "max-throughput": 1.9527e+07, "min-throughput": 1.29271e+07 } Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, aarch64-linux-gnu, and sparc64-linux-gnu.	2020-06-19 10:48:15 -03:00
Adhemerval Zanella	2004063fb4	benchtests: Add exp10f benchmark It is based on expf one by converting each line with the formula: new_val = (float) log10 (exp ((double) old_val))	2020-06-19 10:48:15 -03:00
H.J. Lu	27f8864bd4	x86: Update F16C detection [BZ #26133] Since F16C requires AVX, set F16C usable only when AVX is usable.	2020-06-18 07:01:58 -07:00
Sunil K Pandey	75870237ff	Fix avx2 strncmp offset compare condition check [BZ #25933] strcmp-avx2.S: In avx2 strncmp function, strings are compared in chunks of 4 vector size (i.e. 32x4=128 byte for avx2). After the first 4 vector size comparison, code must check whether it already passed the given offset. This patch implements the avx2 offset check condition for the strncmp function, if both strings compare equal for the first 4 vector size.	2020-06-17 07:07:38 -07:00
Samuel Thibault	7a508406df	nptl: Remove now-spurious tst-cancelx9 references They were to be moved to sysdeps/pthread/Makefile in `45fce058f` ('htl: Enable more cancellation tests') * nptl/Makefile: (tests): Remove tst-cancelx9. (CFLAGS-tst-cancelx9.c): Remove.	2020-06-17 15:55:52 +02:00
H.J. Lu	a35a59036e	x86_64: Use %xmmN with vpxor to clear a vector register Since "vpxor %xmmN, %xmmN, %xmmN" clears the whole vector register, use %xmmN, instead of %ymmN, with vpxor to clear a vector register.	2020-06-17 05:44:02 -07:00
H.J. Lu	b7c9bb183b	x86: Correct bit_cpu_CLFLUSHOPT [BZ #26128] bit_cpu_CLFLUSHOPT should be (1u << 23), not (1u << 22).	2020-06-17 05:32:37 -07:00
Paul E. Murphy	b637306d3e	powerpc64le: refactor e_sqrtf128.c Combine both implementations into a single file to allow building twice with appropriate multiarch support when possible.	2020-06-16 13:50:44 -05:00
Joseph Myers	b67339d0bb	Update syscall-names.list for Linux 5.7. Linux 5.7 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 5.7. Tested with build-many-glibcs.py.	2020-06-15 22:58:22 +00:00
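
The `b7c9bb183b` entry above (CLFLUSHOPT is bit 23, not bit 22) can be illustrated with a minimal sketch. This is not glibc's code: the macro `EXAMPLE_BIT_CLFLUSHOPT` and the helper `example_has_clflushopt` are hypothetical names, and the sketch assumes the caller has already obtained the EBX value of CPUID leaf 7, subleaf 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for illustration; the value follows the commit
   message: the CLFLUSHOPT bit is (1u << 23), not (1u << 22).  */
#define EXAMPLE_BIT_CLFLUSHOPT (1u << 23)

/* Hypothetical helper: EBX is assumed to be the EBX register value
   returned by CPUID leaf 7, subleaf 0.  Returns true if the
   CLFLUSHOPT bit is set in it.  */
static bool
example_has_clflushopt (uint32_t ebx)
{
  return (ebx & EXAMPLE_BIT_CLFLUSHOPT) != 0;
}
```

With the off-by-one mask `(1u << 22)`, a test like this would report the wrong feature, which is why a one-bit correction warranted its own commit.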

35978 Commits