glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-25 22:40:05 +00:00

Author	SHA1	Message	Date
Adhemerval Zanella	98d5fcb8d0	malloc: Add Huge Page support for mmap With the morecore hook removed, there is no easy way to provide huge page support with the glibc allocator without resorting to transparent huge pages. Some users and programs prefer to use huge pages directly instead of THP for multiple reasons: no splitting or re-merging by the VM, no TLB shootdowns for running processes, fast allocation from the reserve pool, no competition with the rest of the processes unlike THP, no swapping at all, etc. This patch extends the 'glibc.malloc.hugetlb' tunable: the value '2' means to use huge pages directly with the system default size, while a positive value means a specific page size that is matched against the ones supported by the system. Currently only memory allocated in sysmalloc() is handled; the arenas still use the default system page size. To test it, a new rule tests-malloc-hugetlb2 is added, which runs the added tests with the required GLIBC_TUNABLES setting. On systems without a reserved huge page pool, it just stresses the mmap(MAP_HUGETLB) allocation failure. To improve test coverage, a pool with some allocated pages must be created. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:38 -03:00
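As a rough illustration of the direct huge page path described in 98d5fcb8d0 (not glibc's actual sysmalloc() code), an application-level sketch might try mmap() with MAP_HUGETLB and fall back to regular pages when no reserved pool is available. The helper name `map_prefer_hugetlb` and the fallback policy are assumptions made for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map LEN bytes of anonymous memory, preferring
   explicit huge pages.  LEN should be a multiple of the huge page size,
   otherwise the MAP_HUGETLB attempt is expected to fail.  */
static void *
map_prefer_hugetlb (size_t len)
{
  assert (len > 0);
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
  void *p = MAP_FAILED;
#ifdef MAP_HUGETLB
  /* First attempt: ask the kernel for pages from the huge page pool.  */
  p = mmap (NULL, len, PROT_READ | PROT_WRITE, flags | MAP_HUGETLB, -1, 0);
#endif
  /* Fallback (an assumption of this sketch): use default-sized pages.  */
  if (p == MAP_FAILED)
    p = mmap (NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
  return p == MAP_FAILED ? NULL : p;
}
```

On a system without a reserved pool the first mmap() simply fails and the fallback is taken, which matches the allocation-failure case the commit message says the tests exercise.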
Adhemerval Zanella	6cc3ccc67e	malloc: Move mmap logic to its own function So it can be used with different pagesize and flags. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:15 -03:00
Adhemerval Zanella	7478c9959a	malloc: Add THP/madvise support for sbrk To increase effectiveness with Transparent Huge Pages with madvise, the large page size is used instead of the page size for the sbrk increment of the main arena. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:15 -03:00
Adhemerval Zanella	5f6d8d97c6	malloc: Add madvise support for Transparent Huge Pages Linux Transparent Huge Pages (THP) currently supports three different states: 'never', 'madvise', and 'always'. 'never' is self-explanatory and 'always' enables THP for all anonymous pages. However, 'madvise' is still the default for some systems, and in that case THP is only used if the memory range is explicitly advised by the program through a madvise(MADV_HUGEPAGE) call. To enable it a new tunable is provided, 'glibc.malloc.hugetlb', where setting it to a value different from 0 enables the madvise call. This patch issues the madvise(MADV_HUGEPAGE) call after a successful mmap() call in sysmalloc() with sizes larger than the default huge page size. The madvise() call is disabled if the system does not support THP or if it has the mode set to "never". Linux only supports one page size for THP, even if the architecture supports multiple sizes. To test it, a new rule tests-malloc-hugetlb1 is added, which runs the added tests with the required GLIBC_TUNABLES setting. Checked on x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 17:35:14 -03:00
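The mmap()-then-madvise() sequence that 5f6d8d97c6 describes can be sketched at application level as follows. This is only an illustration under assumptions: the helper name `map_with_thp_hint` is hypothetical, and the sketch ignores the THP mode and size checks that the commit message says glibc performs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map LEN bytes of anonymous memory and advise the
   kernel that the range is a good candidate for Transparent Huge Pages.  */
static void *
map_with_thp_hint (size_t len)
{
  assert (len > 0);
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
#ifdef MADV_HUGEPAGE
  /* The hint is advisory: if THP is unavailable the call fails and the
     mapping remains usable with default-sized pages.  */
  (void) madvise (p, len, MADV_HUGEPAGE);
#endif
  return p;
}
```

With THP in 'madvise' mode, only ranges marked this way are eligible for huge pages, which is why the allocator has to issue the call itself.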
Florian Weimer	cb976fba4c	powerpc: Use global register variable in <thread_pointer.h> A local register variable is merely a compiler hint, and so not appropriate in this context. Move the global register variable into <thread_pointer.h> and include it from <tls.h>, as there can only be one global definition for one particular register. Fixes commit `8dbeb0561e` ("nptl: Add <thread_pointer.h> for defining __thread_pointer"). Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>	2021-12-15 16:06:25 +01:00
Adhemerval Zanella	a6d2f948b7	Use LFS and 64 bit time for installed programs (BZ #15333 ) The installed programs are built with a combination of different values for MODULE_NAME, as below. To enable both Long File Support and 64 bt time, -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64 is added for nonlibi, nscd, lddlibc4, libresolv, ldconfig, locale_programs, iconvprogs, libnss_files, libnss_compat, libnss_db, libnss_hesiod, libutil, libpcprofile, and libSegFault. nscd/nscd nscd/nscd.o MODULE_NAME=nscd nscd/connections.o MODULE_NAME=nscd nscd/pwdcache.o MODULE_NAME=nscd nscd/getpwnam_r.o MODULE_NAME=nscd nscd/getpwuid_r.o MODULE_NAME=nscd nscd/grpcache.o MODULE_NAME=nscd nscd/getgrnam_r.o MODULE_NAME=nscd nscd/getgrgid_r.o MODULE_NAME=nscd nscd/hstcache.o MODULE_NAME=nscd nscd/gethstbyad_r.o MODULE_NAME=nscd nscd/gethstbynm3_r.o MODULE_NAME=nscd nscd/getsrvbynm_r.o MODULE_NAME=nscd nscd/getsrvbypt_r.o MODULE_NAME=nscd nscd/servicescache.o MODULE_NAME=nscd nscd/dbg_log.o MODULE_NAME=nscd nscd/nscd_conf.o MODULE_NAME=nscd nscd/nscd_stat.o MODULE_NAME=nscd nscd/cache.o MODULE_NAME=nscd nscd/mem.o MODULE_NAME=nscd nscd/nscd_setup_thread.o MODULE_NAME=nscd nscd/xmalloc.o MODULE_NAME=nscd nscd/xstrdup.o MODULE_NAME=nscd nscd/aicache.o MODULE_NAME=nscd nscd/initgrcache.o MODULE_NAME=nscd nscd/gai.o MODULE_NAME=nscd nscd/res_hconf.o MODULE_NAME=nscd nscd/netgroupcache.o MODULE_NAME=nscd nscd/cachedumper.o MODULE_NAME=nscd elf/lddlibc4 elf/lddlibc4 MODULE_NAME=lddlibc4 elf/pldd elf/pldd.o MODULE_NAME=nonlib elf/xmalloc.o MODULE_NAME=nonlib elf/sln elf/sln.o MODULE_NAME=nonlib elf/static-stubs.o MODULE_NAME=nonlib elf/sprof MODULE_NAME=nonlib elf/ldconfig elf/ldconfig.o MODULE_NAME=ldconfig elf/cache.o MODULE_NAME=nonlib elf/readlib.o MODULE_NAME=nonlib elf/xmalloc.o MODULE_NAME=nonlib elf/xstrdup.o MODULE_NAME=nonlib elf/chroot_canon.o MODULE_NAME=nonlib elf/static-stubs.o MODULE_NAME=nonlib elf/stringtable.o MODULE_NAME=nonlib io/pwd io/pwd.o MODULE_NAME=nonlib locale/locale 
locale/locale.o MODULE_NAME=locale_programs locale/locale-spec.o MODULE_NAME=locale_programs locale/charmap-dir.o MODULE_NAME=locale_programs locale/simple-hash.o MODULE_NAME=locale_programs locale/xmalloc.o MODULE_NAME=locale_programs locale/xstrdup.o MODULE_NAME=locale_programs locale/record-status.o MODULE_NAME=locale_programs locale/xasprintf.o MODULE_NAME=locale_programs locale/localedef locale/localedef.o MODULE_NAME=locale_programs locale/ld-ctype.o MODULE_NAME=locale_programs locale/ld-messages.o MODULE_NAME=locale_programs locale/ld-monetary.o MODULE_NAME=locale_programs locale/ld-numeric.o MODULE_NAME=locale_programs locale/ld-time.o MODULE_NAME=locale_programs locale/ld-paper.o MODULE_NAME=locale_programs locale/ld-name.o MODULE_NAME=locale_programs locale/ld-address.o MODULE_NAME=locale_programs locale/ld-telephone.o MODULE_NAME=locale_programs locale/ld-measurement.o MODULE_NAME=locale_programs locale/ld-identification.o MODULE_NAME=locale_programs locale/ld-collate.o MODULE_NAME=locale_programs locale/charmap.o MODULE_NAME=locale_programs locale/linereader.o MODULE_NAME=locale_programs locale/locfile.o MODULE_NAME=locale_programs locale/repertoire.o MODULE_NAME=locale_programs locale/locarchive.o MODULE_NAME=locale_programs locale/md5.o MODULE_NAME=locale_programs locale/charmap-dir.o MODULE_NAME=locale_programs locale/simple-hash.o MODULE_NAME=locale_programs locale/xmalloc.o MODULE_NAME=locale_programs locale/xstrdup.o MODULE_NAME=locale_programs locale/record-status.o MODULE_NAME=locale_programs locale/xasprintf.o MODULE_NAME=locale_programs catgets/gencat catgets/gencat.o MODULE_NAME=nonlib catgets/xmalloc.o MODULE_NAME=nonlib nss/makedb nss/makedb.o MODULE_NAME=nonlib nss/xmalloc.o MODULE_NAME=nonlib nss/hash-string.o MODULE_NAME=nonlib nss/getent nss/getent.o MODULE_NAME=nonlib posix/getconf posix/getconf.o MODULE_NAME=nonlib login/utmpdump login/utmpdump.o MODULE_NAME=nonlib debug/pcprofiledump debug/pcprofiledump.o MODULE_NAME=nonlib 
timezone/zic timezone/zic.o MODULE_NAME=nonlib timezone/zdump timezone/zdump.o MODULE_NAME=nonlib iconv/iconv_prog iconv/iconv_prog.o MODULE_NAME=nonlib iconv/iconv_charmap.o MODULE_NAME=iconvprogs iconv/charmap.o MODULE_NAME=iconvprogs iconv/charmap-dir.o MODULE_NAME=iconvprogs iconv/linereader.o MODULE_NAME=iconvprogs iconv/dummy-repertoire.o MODULE_NAME=iconvprogs iconv/simple-hash.o MODULE_NAME=iconvprogs iconv/xstrdup.o MODULE_NAME=iconvprogs iconv/xmalloc.o MODULE_NAME=iconvprogs iconv/record-status.o MODULE_NAME=iconvprogs iconv/iconvconfig iconv/iconvconfig.o MODULE_NAME=nonlib iconv/strtab.o MODULE_NAME=iconvprogs iconv/xmalloc.o MODULE_NAME=iconvprogs iconv/hash-string.o MODULE_NAME=iconvprogs nss/libnss_files.so MODULE_NAME=libnss_files nss/libnss_compat.so.2 MODULE_NAME=libnss_compat nss/libnss_db.so MODULE_NAME=libnss_db hesiod/libnss_hesiod.so MODULE_NAME=libnss_hesiod login/libutil.so MODULE_NAME=libutil debug/libpcprofile.so MODULE_NAME=libpcprofile debug/libSegFault.so MODULE_NAME=libSegFault Also, to avoid adding both LFS and 64 bit time support on internal tests they are moved to a newer 'testsuite-internal' module. It should be similar to 'nonlib' regarding internal definition and linking namespace. This patch also enables LFS and 64 bit support of libsupport container programs (echo-container, test-container, shell-container, and true-container). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-12-15 09:01:01 -03:00
H.J. Lu	4435c29892	Support target specific ALIGN for variable alignment test [BZ #28676 ] Add <tst-file-align.h> to support target specific ALIGN for variable alignment test: 1. Alpha: Use 0x10000. 2. MicroBlaze and Nios II: Use 0x8000. 3. All others: Use 0x200000. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-14 14:50:33 -08:00
H.J. Lu	f6ff87868a	NEWS: Document LD_PREFER_MAP_32BIT_EXEC as x86-64 only	2021-12-14 07:58:05 -08:00
H.J. Lu	fd6062ede3	elf: Align argument of __munmap to page size [BZ #28676 ] On Linux/x86-64, for elf/tst-align3, we now get munmap(0x7f88f9401000, 1126424) = 0 instead of munmap(0x7f1615200018, 544768) = -1 EINVAL (Invalid argument) Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-14 07:16:51 -08:00
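In the failing call quoted in fd6062ede3, the address 0x7f1615200018 is not a multiple of the page size, and munmap() rejects such addresses with EINVAL. A minimal sketch of the rounding involved, with hypothetical helper names and assuming a power-of-two page size:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round ADDR down to the start of its page.  PAGE must be a power of two.  */
static uintptr_t
page_align_down (uintptr_t addr, size_t page)
{
  assert (page != 0 && (page & (page - 1)) == 0);
  return addr & ~((uintptr_t) page - 1);
}

/* Round ADDR up to the next page boundary (unchanged if already aligned).  */
static uintptr_t
page_align_up (uintptr_t addr, size_t page)
{
  assert (page != 0 && (page & (page - 1)) == 0);
  return (addr + page - 1) & ~((uintptr_t) page - 1);
}
```

An address ending in 0x018 rounds down to one ending in 0x000 for 4096-byte pages, which is the form munmap() accepts.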
Florian Weimer	0884724a95	elf: Use new dependency sorting algorithm by default The default has to change eventually, and there are no known failures that require a delay. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-14 14:44:04 +01:00
Khem Raj	f8392bb766	intl: Emit no lines in bison generated files Improve reproducibility: Do not put any #line preprocessor commands in bison generated files. These lines contain absolute paths containing file locations on the host build machine. Signed-off-by: Juro Bystricky <juro.bystricky@intel.com> Signed-off-by: Khem Raj <raj.khem@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-14 09:33:25 -03:00
Samuel Thibault	ec06717856	hurd: Do not set PIE_UNSUPPORTED This is now supported.	2021-12-14 08:38:05 +01:00
H.J. Lu	1f3d460761	NEWS: Move LD_PREFER_MAP_32BIT_EXEC Move LD_PREFER_MAP_32BIT_EXEC to Deprecated and removed features, and other changes affecting compatibility:	2021-12-13 16:33:57 -08:00
Samuel Thibault	cf44f08379	mach: Fix spurious inclusion of stack_chk_fail_local in libmachuser.a When linking programs statically, stack_chk_fail_local already comes from libc_nonshared, so we don't need it in lib{mach,hurd}user.a.	2021-12-14 01:01:48 +01:00
H.J. Lu	57e349b1b0	Disable DT_RUNPATH on NSS tests [BZ #28455 ] The glibc internal NSS functions should always load NSS modules from the system. For testing purpose, disable DT_RUNPATH on NSS tests so that the glibc internal NSS functions can load testing NSS modules via DT_RPATH. This partially fixes BZ #28455. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-12-13 07:32:04 -08:00
Akila Welihinda	3b1402b3fc	sysdeps: Simplify sin Taylor Series calculation The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes of regaining some precision as a function of da. However the comment says we add the term `-0.5*da*a^2 + 0.5*da` which is different. This fix updates the comment to reflect the code and also simplifies the calculation by replacing `a` with `x` because they always have the same value. Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-12-13 15:31:05 +01:00
Adhemerval Zanella	104d2005d5	math: Remove the error handling wrapper from hypot and hypotf The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Wilco Dijkstra	2f44eef584	math: Use fmin/fmax on hypot It optimizes for architectures that provide fast builtins. Checked on aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Adhemerval Zanella	ecb94e9587	aarch64: Add math-use-builtins-f{max,min}.h It allows to remove the arch-specific implementations.	2021-12-13 10:08:46 -03:00
Adhemerval Zanella	583c4d424e	math: Add math-use-builtinds-fmin.h It allows the architecture to use the builtin instead of generic implementation.	2021-12-13 10:08:43 -03:00
Adhemerval Zanella	72ab1eaec7	math: Add math-use-builtinds-fmax.h It allows the architecture to use the builtin instead of generic implementation.	2021-12-13 09:08:07 -03:00
Adhemerval Zanella	2eb1cd2f47	math: Remove powerpc e_hypot The generic implementation shows only slightly worse performance (reciprocal-throughput, latency). POWER10: master 8.28478, 13.7253; new hypot 7.21945, 13.1933. POWER9: master 13.4024, 14.0967; new hypot 14.8479, 15.8061. POWER8: master 15.5767, 16.8885; new hypot 16.5371, 18.4057. One way to improve it might be to make gcc generate xsmaxdp/xsmindp for fmax/fmin (it only does so for -ffast-math; clang does for default options). Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu (power9).	2021-12-13 09:08:07 -03:00
Adhemerval Zanella	a1d3c9b642	i386: Move hypot implementation to C The generic hypotf is slightly slower, mostly due to the tricks the assembly uses to optimize isinf/isnan/issignaling. The generic hypot is way slower, since the optimized implementation uses the i386 default excess precision to issue the operation directly. A similar implementation is provided instead of using the generic implementation. Checked on i686-linux-gnu.	2021-12-13 09:08:02 -03:00
Adhemerval Zanella	c212d6397e	math: Use an improved algorithm for hypotl (ldbl-128) This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due to the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With 1e9 random input pairs in the range of [LDBL_MIN, LDBL_MAX], the current glibc implementation shows around 0.05% results with an error of 1 ulp (453266 results) while the new implementation only shows 0.0001% of total (1280). Checked on aarch64-linux-gnu and x86_64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Adhemerval Zanella	aa9c28cde3	math: Use an improved algorithm for hypotl (ldbl-96) This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due to the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for subnormal results. The main advantage of the new algorithm is its precision. With 1e8 random input pairs in the range of [LDBL_MIN, LDBL_MAX], the current glibc implementation shows around 0.02% results with an error of 1 ulp (23158 results) while the new implementation only shows 0.0001% of total (111). [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Wilco Dijkstra	ccfa865a82	math: Improve hypot performance with FMA Improve hypot performance significantly by using fma when available. The fma version has twice the throughput of the previous version and 70% of the latency. The non-fma version has 30% higher throughput and 10% higher latency. Max ULP error is 0.949 with fma and 0.792 without fma. Passes GLIBC testsuite.	2021-12-13 09:02:34 -03:00
Wilco Dijkstra	6c848d7038	math: Use an improved algorithm for hypot (dbl-64) This implementation is based on 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due to the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for denormal results. The main advantage of the new algorithm is its precision: with 1e9 random input pairs in the range of [DBL_MIN, DBL_MAX], the current glibc implementation shows around 0.34% results with an error of 1 ulp (3424869 results) while the new implementation only shows 0.002% of total (18851). The performance results are also only slightly worse than the current implementation. On x86_64 (Ryzen 5900X) with gcc 12: Before: "hypot": { "workload-random": { "duration": 3.73319e+09, "iterations": 1.12e+08, "reciprocal-throughput": 22.8737, "latency": 43.7904, "max-throughput": 4.37184e+07, "min-throughput": 2.28361e+07 } } After: "hypot": { "workload-random": { "duration": 3.7597e+09, "iterations": 9.8e+07, "reciprocal-throughput": 23.7547, "latency": 52.9739, "max-throughput": 4.2097e+07, "min-throughput": 1.88772e+07 } } Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Checked on x86_64-linux-gnu and aarch64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Adhemerval Zanella	7fe0ace3e2	math: Simplify hypotf implementation Use a more optimized comparison to check for NaN and infinity, and add an inlined issignaling implementation for float. With gcc it results in 2 FP comparisons. The file copyright is also changed to use GPL; the implementation was completely changed by `7c10fd3515` to use double precision instead of scaling, and this change removes all GET_FLOAT_WORD usage. Checked on x86_64-linux-gnu.	2021-12-13 09:02:30 -03:00
Siddhesh Poyarekar	5afe4c0d69	Cleanup encoding in comments Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8 equivalents so that files don't end up with mixed encodings. With this, all files (except tests that actually test different encodings) have a single encoding. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-13 10:01:45 +05:30
Siddhesh Poyarekar	23645707f1	Replace --enable-static-pie with --disable-default-pie Build glibc programs and tests as PIE by default and enable static-pie automatically if the architecture and toolchain supports it. Also add a new configuration option --disable-default-pie to prevent building programs as PIE. Only the following architectures now have PIE disabled by default because they do not work at the moment. hppa, ia64, alpha and csky don't work because the linker is unable to handle a pcrel relocation generated from PIE objects. The microblaze compiler is currently failing with an ICE. GNU hurd tries to enable static-pie, which does not work and hence fails. All these targets have default PIE disabled at the moment and I have left it to the target maintainers to enable PIE on their targets. build-many-glibcs runs clean for all targets. I also tested x86_64 on Fedora and Ubuntu, to verify that the default build as well as --disable-default-pie work as expected with both system toolchains. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-13 08:08:59 +05:30
Samuel Thibault	556a6126f8	hurd: Add rules for static PIE build This fixes [BZ #28671].	2021-12-12 00:42:13 +01:00
Samuel Thibault	26803075e4	hurd: Fix gmon-static We need to use crt0 for gmon-static too.	2021-12-12 00:42:12 +01:00
H.J. Lu	ea5814467a	x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656 ] Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since the first PT_LOAD segment is no longer executable due to defaulting to -z separate-code. This fixes [BZ #28656]. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-10 14:01:34 -08:00
Florian Weimer	f1eeef945d	elf: Use errcode instead of (unset) errno in rtld_chain_load	2021-12-10 21:34:30 +01:00
H.J. Lu	fc2334ab32	Add a testcase to check alignment of PT_LOAD segment [BZ #28676 ]	2021-12-10 11:26:08 -08:00
Rongwei Wang	718fdd87b1	elf: Properly align PT_LOAD segments [BZ #28676 ] When the PT_LOAD segment alignment is larger than the page size, allocate enough space to ensure that the segment can be properly aligned. This change makes it simpler for code segments to use huge pages. This fixes [BZ #28676]. Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>	2021-12-10 11:25:37 -08:00
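The "allocate enough space, then align" idea in 718fdd87b1 can be illustrated with a generic over-allocation sketch. This is not the loader's code: the helper name `reserve_aligned` is hypothetical, and the sketch assumes the alignment is a power of two no smaller than the page size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve LEN bytes (rounded up to whole pages) whose
   start address is a multiple of ALIGN.  Returns NULL on failure.  */
static void *
reserve_aligned (size_t len, size_t align)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  assert (align >= page && (align & (align - 1)) == 0);
  len = (len + page - 1) & ~(page - 1);

  /* Over-allocate by ALIGN so an aligned start must exist inside.  */
  size_t maplen = len + align;
  unsigned char *start = mmap (NULL, maplen, PROT_NONE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (start == MAP_FAILED)
    return NULL;

  uintptr_t aligned = ((uintptr_t) start + align - 1)
                      & ~((uintptr_t) align - 1);
  unsigned char *result = (unsigned char *) aligned;

  /* Give back the unused head and tail of the reservation.  */
  size_t head = (size_t) (result - start);
  size_t tail = maplen - head - len;
  if (head > 0)
    munmap (start, head);
  if (tail > 0)
    munmap (result + len, tail);
  return result;
}
```

With a 2 MiB alignment, the returned address is suitable as the start of a region that could later be backed by 2 MiB huge pages.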
Florian Weimer	2e75604f83	elf: Install a symbolic link to ld.so as /usr/bin/ld.so This makes ld.so features such as --preload, --audit, and --list-diagnostics more accessible to end users because they do not need to know the ABI name of the dynamic loader. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-12-10 16:06:58 +01:00
Florian Weimer	5cc3385654	nptl: Add one more barrier to nptl/tst-create1 Without the bar_ctor_finish barrier, it was possible that thread2 re-locked user_lock before ctor had a chance to lock it. ctor then blocked in its locking operation, xdlopen from the main thread did not return, and thread2 was stuck waiting in bar_dtor: thread 1: started. thread 2: started. thread 2: locked user_lock. constructor started: 0. thread 1: in ctor: started. thread 3: started. thread 3: done. thread 2: unlocked user_lock. thread 2: locked user_lock. Fixes the test in commit `83b5323261` ("elf: Avoid deadlock between pthread_create and ctors [BZ #28357]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-10 11:51:25 +01:00
Florian Weimer	627f5ede70	Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely introduced to support a configuration where the thread pointer does not have the same alignment as THREAD_SELF. Only ia64 seems to use that, but for the stack/pointer guard, not for storing tcbhead_t. Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift the thread pointer, potentially landing in a different residue class modulo the alignment, but the changes should not impact that. In general, given that TLS variables have their own alignment requirements, having different alignment for the (unshifted) thread pointer and struct pthread would potentially result in dynamic offsets, leading to more complexity. hppa had different values before: __alignof__ (tcbhead_t), which seems to be 4, and __alignof__ (struct pthread), which was 8 (old default) and is now 32. However, it defines THREAD_SELF as: /* Return the thread descriptor for the current thread. */ # define THREAD_SELF \ ({ struct pthread *__self; \ __self = __get_cr27(); \ __self - 1; \ }) So the thread pointer points after struct pthread (hence __self - 1), and they have to have the same alignment on hppa as well. Similarly, on ia64, the definitions were different. We have: # define TLS_PRE_TCB_SIZE \ (sizeof (struct pthread) \ + (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t) \ ? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1) \ & ~(__alignof__ (struct pthread) - 1)) \ : 0)) # define THREAD_SELF \ ((struct pthread *) ((char *) __thread_self - TLS_PRE_TCB_SIZE)) And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment (confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c). On m68k, we have a larger gap between tcbhead_t and struct pthread. But as far as I can tell, the port is fine with that. The definition of TCB_OFFSET is sufficient to handle the shifted TCB scenario.
This fixes commit `23c77f6018` ("nptl: Increase default TCB alignment to 32"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-09 23:47:49 +01:00
Florian Weimer	a41c8e9235	nptl: rseq failure after registration on main thread is fatal This simplifies the application programming model. Browser sandboxes have already been fixed: Sandbox is incompatible with rseq registration <https://bugzilla.mozilla.org/show_bug.cgi?id=1651701> Allow rseq in the Linux sandboxes. r=gcp <https://hg.mozilla.org/mozilla-central/rev/042425712eb1> Sandbox needs to support rseq system call <https://bugs.chromium.org/p/chromium/issues/detail?id=1104160> Linux sandbox: Allow rseq(2) <https://chromium.googlesource.com/chromium/src.git/+/230675d9ac8f1> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	c901c3e764	nptl: Add public rseq symbols and <sys/rseq.h> The relationship between the thread pointer and the rseq area is made explicit. The constant offset can be used by JIT compilers to optimize rseq access (e.g., for really fast sched_getcpu). Extensibility is provided through __rseq_size and __rseq_flags. (In the future, the kernel could request a different rseq size via the auxiliary vector.) Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	e3e589829d	nptl: Add glibc.pthread.rseq tunable to control rseq registration This tunable allows applications to register the rseq area instead of glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-09 09:49:32 +01:00
Florian Weimer	1d350aa060	Linux: Use rseq to accelerate sched_getcpu Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	95e114a091	nptl: Add rseq registration The rseq area is placed directly into struct pthread. rseq registration failure is not treated as an error, so it is possible that threads run with inconsistent registration status. <sys/rseq.h> is not yet installed as a public header. Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2021-12-09 09:49:32 +01:00
Florian Weimer	8d1927d8dc	nptl: Introduce THREAD_GETMEM_VOLATILE This will be needed for rseq TCB access. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	ce2248ab91	nptl: Introduce <tcb-access.h> for THREAD_* accessors These are common between most architectures. Only the x86 targets are outliers. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
Florian Weimer	8dbeb0561e	nptl: Add <thread_pointer.h> for defining __thread_pointer <tls.h> already contains a definition that is quite similar, but it is not consistent across architectures. Only architectures for which rseq support is added are covered. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
John David Anglin	409a735816	String: test-memcpy used unaligned types for buffers [BZ 28572] commit `d585ba47fc` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Mon Nov 1 00:49:48 2021 -0500 string: Make tests birdirectional test-memcpy.c Add tests that had src/dst non 4-byte aligned. Since src/dst are initialized/compared as uint32_t type which is 4-byte aligned this can break on some targets. Fix the issue by specifying a new non-aligned 4-byte `unaligned_uint32_t` for src/dst. Another alternative is to rely on memcpy/memcmp for initializing/testing src/dst. Using memcpy for initializing in memcpy tests, however, could lead to future bugs. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-12-07 22:19:50 -06:00
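The non-aligned 4-byte type mentioned in 409a735816 can be sketched with GCC/Clang type attributes. The typedef below is an illustration of the technique; the exact definition used in the test suite may differ, and the accessor helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 32-bit integer type whose required alignment is lowered to 1 byte, so
   the compiler emits accesses that are safe at any address.  */
typedef uint32_t __attribute__ ((may_alias, aligned (1))) unaligned_uint32_t;

/* Hypothetical helper: store VALUE at BUF + OFFSET, aligned or not.  */
static void
store_u32 (unsigned char *buf, size_t len, size_t offset, uint32_t value)
{
  assert (offset <= len && len - offset >= sizeof (uint32_t));
  *(unaligned_uint32_t *) (buf + offset) = value;
}

/* Hypothetical helper: load a 32-bit value from BUF + OFFSET.  */
static uint32_t
load_u32 (const unsigned char *buf, size_t len, size_t offset)
{
  assert (offset <= len && len - offset >= sizeof (uint32_t));
  return *(const unaligned_uint32_t *) (buf + offset);
}
```

Accessing the same bytes through a plain `uint32_t *` at an odd offset is undefined behavior and can trap on strict-alignment targets, which is the breakage the commit message describes.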
Aurelien Jarno	cbab7f7268	localedef: check magic value on archive load [BZ #28650 ] localedef currently blindly trusts the archive header. When passed an archive file with the wrong endianness, this leads to a segmentation fault: $ localedef --big-endian --list-archive /usr/lib/locale/locale-archive Segmentation fault (core dumped) When passed non-archive files, assertion failures are reported in the best case, but sometimes it can lead to a segmentation fault: $ localedef --list-archive /bin/true localedef: programs/locarchive.c:1643: show_archive_content: Assertion `used < GET (head->namehash_used)' failed. Aborted (core dumped) $ localedef --list-archive /usr/lib/locale/C.utf8/LC_COLLATE Segmentation fault (core dumped) This patch improves the user experience by checking the magic value, which is always written but never checked. It should still be possible to trigger a segmentation fault with crafted files, but this already catches many cases.	2021-12-07 23:32:53 +01:00
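A generic sketch of the kind of check cbab7f7268 describes: read the leading magic value in host byte order and reject anything that does not match. The constant `DEMO_ARCHIVE_MAGIC` and the helper name are hypothetical and do not reflect the real archive format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value, for illustration only.  */
#define DEMO_ARCHIVE_MAGIC 0x12345678u

/* Return 1 if DATA starts with the expected magic value, 0 otherwise.  A
   file written with the opposite endianness yields the byte-swapped value
   and is rejected, as is any file too short to hold the field.  */
static int
header_magic_ok (const void *data, size_t len)
{
  assert (data != NULL);
  uint32_t magic;
  if (len < sizeof magic)
    return 0;
  memcpy (&magic, data, sizeof magic);
  return magic == DEMO_ARCHIVE_MAGIC;
}
```

As the commit message notes, such a check only filters out obviously wrong inputs; a crafted file with a valid magic value can still carry inconsistent header fields.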
H.J. Lu	ceeffe968c	x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used.	2021-12-06 07:14:12 -08:00


38122 Commits