Remove the environment variable LD_HWCAP_MASK and the tunable
glibc.cpu.hwcap_mask, as they are no longer used in common code
after their removal from elf/dl-cache.c:search_cache().
The only remaining user is sparc32, where it is used in
elf_machine_matches_host(). If sparc32 does not need it anymore,
we can get rid of it entirely. Otherwise we could move
LD_HWCAP_MASK / the glibc.cpu.hwcap_mask tunable to be sparc32-specific.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The tuning for non-temporal stores for memset vs memcpy is not always
the same. This includes both the exact value and whether non-temporal
stores are profitable at all for a given arch.
This patch adds `x86_memset_non_temporal_threshold`. Currently we
disable non-temporal stores for non-Intel vendors, as the only
benchmarks showing their benefit have been on Intel hardware.
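For example, the new tunable can be set like the other x86 cache
tunables (the value below is purely illustrative, not a recommendation):
$ export GLIBC_TUNABLES=glibc.cpu.x86_memset_non_temporal_threshold=1048576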
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The current IFUNC selection always uses the most recent features
available via AT_HWCAP, but in some scenarios it is useful to adjust
this selection.
The environment variable:
GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,zzz,....
can be used to enable HWCAP features yyy and zzz and disable HWCAP
feature xxx, where the feature names are case-sensitive and have to
match the ones used in sysdeps/loongarch/cpu-tunables.c.
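For example, assuming LSX and LASX are among the feature names handled
in sysdeps/loongarch/cpu-tunables.c, the following would disable LASX
and enable LSX for IFUNC selection:
$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-LASX,LSX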
Signed-off-by: caiyinyu <caiyinyu@loongson.cn>
Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after
relocation.
For x86-64, add
#define DT_X86_64_PLT (DT_LOPROC + 0)
#define DT_X86_64_PLTSZ (DT_LOPROC + 1)
#define DT_X86_64_PLTENT (DT_LOPROC + 3)
1. DT_X86_64_PLT: The address of the procedure linkage table.
2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
table.
3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
entry.
The r_addend field of the R_X86_64_JUMP_SLOT relocation is set to the
memory offset of the indirect branch instruction.
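A minimal, hypothetical inspection sketch (not part of this patch)
showing how a program might read these entries from its own dynamic
section; the fallback defines mirror the values above:
  #include <elf.h>
  #include <link.h>
  #include <stdio.h>

  #ifndef DT_X86_64_PLT
  # define DT_X86_64_PLT    (DT_LOPROC + 0)
  # define DT_X86_64_PLTSZ  (DT_LOPROC + 1)
  # define DT_X86_64_PLTENT (DT_LOPROC + 3)
  #endif

  extern ElfW(Dyn) _DYNAMIC[];

  int
  main (void)
  {
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++)
      {
        /* Note: d_ptr may be relative to the object's load base.  */
        if (d->d_tag == DT_X86_64_PLT)
          printf ("PLT address: 0x%lx\n", (unsigned long) d->d_un.d_ptr);
        else if (d->d_tag == DT_X86_64_PLTSZ)
          printf ("PLT size: %lu bytes\n", (unsigned long) d->d_un.d_val);
        else if (d->d_tag == DT_X86_64_PLTENT)
          printf ("PLT entry size: %lu bytes\n", (unsigned long) d->d_un.d_val);
      }
    return 0;
  }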
Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
with direct branch after relocation when the lazy binding is disabled.
PLT rewrite is disabled by default since SELinux may disallow modifying
code pages and ld.so can't detect it in all cases. Use
$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1
to enable PLT rewrite with 32-bit direct jump at run-time or
$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2
to enable PLT rewrite with 32-bit direct jump and on APX processors with
64-bit absolute jump at run-time.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Since malloc debug support moved to a different library
(libc_malloc_debug.so), the glibc.malloc.check tunable requires
preloading the debug library to take effect. This means that
suid-debug support has not been working since 2.34.
Restoring that support would require adding extra information and
parsing to locate libc_malloc_debug.so. Leaving it as is also means
one thing less that might change AT_SECURE binaries' behavior due to
environment configuration.
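For reference, enabling glibc.malloc.check for a non-AT_SECURE binary
now looks like this (the DSO path may vary per system):
$ GLIBC_TUNABLES=glibc.malloc.check=3 LD_PRELOAD=libc_malloc_debug.so ./app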
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.
[1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The PR_SET_VMA_ANON_NAME support is only enabled through a configurable
kernel switch, mainly because assigning a name to an anonymous virtual
memory area might prevent that area from being merged with adjacent
virtual memory areas.
For instance, with the following code:
  void *p1 = mmap (NULL, 1024 * 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  void *p2 = mmap (p1 + (1024 * 4096), 1024 * 4096,
                   PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS,
                   -1, 0);
the kernel will potentially merge both mappings, resulting in only one
segment of size 0x800000. If the segments are named with
PR_SET_VMA_ANON_NAME using different names, the result is two mappings.
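A minimal sketch of how such an annotation is applied, assuming a
kernel built with CONFIG_ANON_VMA_NAME (the fallback defines use the
upstream uapi values):
  #include <sys/mman.h>
  #include <sys/prctl.h>

  #ifndef PR_SET_VMA
  # define PR_SET_VMA 0x53564d41
  # define PR_SET_VMA_ANON_NAME 0
  #endif

  static void
  name_anonymous_mapping (void)
  {
    void *p = mmap (NULL, 1024 * 4096, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED)
      /* Fails with EINVAL on kernels without CONFIG_ANON_VMA_NAME.  */
      prctl (PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long) p,
             1024 * 4096, (unsigned long) "example mapping");
  }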
Although this is unlikely to be an issue for pthread stacks and malloc
arenas (since for pthread stacks the guard page results in a PROT_NONE
segment, similar to the alignment requirement for the arena block), it
still might prevent merging of the mmap memory allocated directly by
malloc.
There is also another potential scalability issue: the prctl call
requires taking the global mmap lock, which is still not fully fixed
in Linux [1] (for pthread stacks and arenas, it is mitigated by the
stack cache and arena reuse).
So this patch disables anonymous mapping annotations by default and
adds a new tunable, glibc.mem.decorate_maps, which can be used to
enable them.
[1] https://lwn.net/Articles/906852/
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
This patch enables the option to influence the hwcaps used by PowerPC.
The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz....,
can be used to enable CPU/ARCH feature yyy and disable CPU/ARCH features
xxx and zzz, where the feature names are case-sensitive and have to match
the ones mentioned in the file sysdeps/powerpc/dl-procinfo.c.
Note that the hwcap tunables are only used in IFUNC selection.
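For example, assuming 'vsx' is one of the names listed in
sysdeps/powerpc/dl-procinfo.c, VSX-based IFUNC variants could be
disabled with:
$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-vsx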
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Created tunable glibc.pthread.stack_hugetlb to control when hugepages
can be used for stack allocation.
In case THP is enabled and glibc.pthread.stack_hugetlb is set to
0, glibc will madvise the kernel not to use hugepages for stack
allocations.
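For example, to disable hugepages for stack allocations:
$ export GLIBC_TUNABLES=glibc.pthread.stack_hugetlb=0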
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Crossing 2GB boundaries with indirect calls and jumps can use more
branch prediction resources on Intel Golden Cove CPU (see the
"Misprediction for Branches >2GB" section in Intel 64 and IA-32
Architectures Optimization Reference Manual.) There is visible
performance improvement on workloads with many PLT calls when executable
and shared libraries are mmapped below 2GB. Add the Prefer_MAP_32BIT_EXEC
bit so that mmap will try to map executable or denywrite pages in shared
libraries with MAP_32BIT first.
NB: Prefer_MAP_32BIT_EXEC reduces bits available for address space
layout randomization (ASLR), which is always disabled for SUID programs
and can only be enabled by the tunable, glibc.cpu.prefer_map_32bit_exec,
or the environment variable, LD_PREFER_MAP_32BIT_EXEC. This works only
between shared libraries or between shared libraries and executables with
addresses below 2GB. PIEs are usually loaded at a random address above
4GB by the kernel.
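For example (non-SUID programs only):
$ export GLIBC_TUNABLES=glibc.cpu.prefer_map_32bit_exec=1
or
$ export LD_PREFER_MAP_32BIT_EXEC=1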
When mcount overflows, no gmon.out file is generated, but no message is printed
to the user, leaving the user with no idea why, and thinking maybe there is
some bug - which is how BZ 27576 ended up being logged. Print a message to
stderr in this case so the user knows what is going on.
As a comment in sys/gmon.h acknowledges, the hardcoded MAXARCS value is too
small for some large applications, including the test case in that BZ. Rather
than increase it, add tunables to enable MINARCS and MAXARCS to be overridden
at runtime (glibc.gmon.minarcs and glibc.gmon.maxarcs). So if a user gets the
mcount overflow error, they can try increasing maxarcs (they might need to
increase minarcs too if the heuristic is wrong in their case.)
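For example, after an mcount overflow one might retry with larger
values (the numbers here are illustrative):
$ export GLIBC_TUNABLES=glibc.gmon.minarcs=102400:glibc.gmon.maxarcs=1048576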
Note setting minarcs/maxarcs too large can cause monstartup to fail with an
out of memory error. If you set them large enough, it can cause an integer
overflow in calculating the buffer size. I haven't done anything to defend
against that - it would not generally be a security vulnerability, since these
tunables will be ignored in suid/sgid programs (due to the SXID_ERASE default),
and if you can set GLIBC_TUNABLES in the environment of a process, you can take
it over anyway (LD_PRELOAD, LD_LIBRARY_PATH, etc). I thought about modifying
the code of monstartup to defend against integer overflows, but doing so is
complicated, and I realise the existing code is susceptible to them even prior
to this change (e.g. try passing a pathologically large highpc argument to
monstartup), so I decided just to leave that possibility in-place.
Add a test case which demonstrates mcount overflow and the tunables.
Document the new tunables in the manual.
Signed-off-by: Simon Kissane <skissane@gmail.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
This patch enables the option to influence the hwcaps and stfle bits
used by the s390-specific ifunc-resolvers. The currently x86-specific
tunable glibc.cpu.hwcaps is also used on s390x to achieve this. In
addition, the user can set a CPU arch-level like z13 instead of
single HWCAP and STFLE features (see the example after the list below).
Note that the tunable only handles the features which are really used
in the IFUNC-resolvers. All others are ignored, as their values are only
used inside glibc. Thus we can influence:
- HWCAP_S390_VXRS (z13)
- HWCAP_S390_VXRS_EXT (z14)
- HWCAP_S390_VXRS_EXT2 (z15)
- STFLE_MIE3 (z15)
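For example, either a single feature or a whole arch-level can be
selected (both lines are illustrative):
$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=-HWCAP_S390_VXRS_EXT2
$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=z13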
The influenced hwcap/stfle-bits are stored in the s390-specific
cpu_features struct which also contains reserved fields for future
usage.
The ifunc-resolvers and users of stfle bits are adjusted to use the
information from cpu_features struct.
On 31-bit, the ELF_MACHINE_IRELATIVE macro is now also defined.
Otherwise the new ifunc-resolvers segfault, as they depend on
the not yet processed _rtld_global_ro@GLIBC_PRIVATE relocation.
The new asymmetric mode is available when HWCAP2_MTE3 is set (support is
available), bit2 is set in the tunable (user request per application),
and the system is configured such that the asymmetric mode is preferred over
sync or async (per-cpu system-wide setting).
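For example, assuming the tunable in question is glibc.mem.tagging (a
bit-field, described further below in this log), requesting tagging
plus the asymmetric mode preference would set bits 0 and 2:
$ export GLIBC_TUNABLES=glibc.mem.tagging=5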
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed
by memmove-vec-unaligned-erms.
The lower-bound is needed because memmove-vec-unaligned-erms unrolls
the loop aggressively in the L(large_memset_4x) case.
The upper-bound is needed because memmove-vec-unaligned-erms
right-shifts the value of `x86_non_temporal_threshold` by
LOG_4X_MEMCPY_THRESH (4) which without a bound may overflow.
The lack of a lower bound can be a correctness issue; the lack of an
upper bound cannot.
With the morecore hook removed, there is no easy way to provide huge
pages support with the glibc allocator without resorting to transparent
huge pages (THP). And some users and programs do prefer to use huge
pages directly instead of THP, for multiple reasons: no splitting or
re-merging by the VM, no TLB shootdowns for running processes, fast
allocation from the reserve pool, no competition with the rest of the
processes unlike THP, no swapping at all, etc.
This patch extends the 'glibc.malloc.hugetlb' tunable: the value
'2' means to use huge pages directly with the system default size,
while a larger value means a specific page size that is matched
against the ones supported by the system.
Currently only memory allocated by sysmalloc() is handled; the arenas
still use the default system page size.
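For example (2 MiB is illustrative; an explicit size must match one
supported by the system):
$ export GLIBC_TUNABLES=glibc.malloc.hugetlb=2
$ export GLIBC_TUNABLES=glibc.malloc.hugetlb=2097152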
For testing, a new rule, tests-malloc-hugetlb2, is added; it runs the
added tests with the required GLIBC_TUNABLES setting. On systems without
a reserved huge pages pool, this just stresses the mmap(MAP_HUGETLB)
allocation failure path. To improve test coverage it is required to
create a pool with some allocated pages.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Linux Transparent Huge Pages (THP) currently supports three different
states: 'never', 'madvise', and 'always'. The 'never' state is
self-explanatory, and 'always' enables THP for all anonymous
pages. However, 'madvise' is still the default on some systems, and
in that case THP will only be used if the memory range is explicitly
advised by the program through a madvise(MADV_HUGEPAGE) call.
To enable it, a new tunable is provided, 'glibc.malloc.hugetlb',
where setting it to a value different from 0 enables the madvise call.
This patch issues the madvise(MADV_HUGEPAGE) call after a successful
mmap() call in sysmalloc() for sizes larger than the default huge
page size. The madvise() call is disabled if the system does not
support THP or has the mode set to 'never'. Note that Linux only
supports one page size for THP, even if the architecture supports
multiple sizes.
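A minimal sketch of the policy described above (the names are
illustrative, not the actual glibc internals; thp_pagesize stands in
for the detected default THP size, or 0 when THP is unavailable):
  #include <stddef.h>
  #include <sys/mman.h>

  static void *
  alloc_with_thp_advice (size_t size, size_t thp_pagesize)
  {
    void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    /* Skip the advice when THP is unsupported or set to "never"
       (thp_pagesize == 0 here), or for small requests.  */
    if (p != MAP_FAILED && thp_pagesize != 0 && size > thp_pagesize)
      madvise (p, size, MADV_HUGEPAGE);
    return p;
  }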
For testing, a new rule, tests-malloc-hugetlb1, is added; it runs the
added tests with the required GLIBC_TUNABLES setting.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The default has to change eventually, and there are no known failures
that require a delay.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This tunable allows applications to register the rseq area instead
of glibc.
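For example, to keep glibc from registering the rseq area so the
application can do it itself:
$ export GLIBC_TUNABLES=glibc.pthread.rseq=0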
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This second patch contains the actual implementation of a new sorting algorithm
for shared objects in the dynamic loader, which solves the slow behavior that
the current "old" algorithm falls into when the DSO set contains circular
dependencies.
The new algorithm implemented here is simply depth-first search (DFS) to obtain
the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1
bitfield is added to struct link_map to more elegantly facilitate such a search.
The DFS algorithm is applied to the input maps[nmap-1] backwards towards
maps[0]. This has the effect of a more "shallow" recursion depth in general
since the input is in BFS. Also, when combined with the natural order of
processing l_initfini[] at each node, this creates a resulting output sorting
closer to the intuitive "left-to-right" order in most cases.
Another notable implementation adjustment related to this _dl_sort_maps
change is the removal of the two char arrays 'used' and 'done' in
_dl_close_worker, used to represent two per-map attributes. This has
been changed to simply use two new bit-fields, l_map_used:1 and
l_map_done:1, added to struct link_map. This also allows discarding the
clunky 'used' array sorting that _dl_sort_maps sometimes had to do
along the way.
Tunable support for switching between different sorting algorithms at runtime is
also added. A new tunable 'glibc.rtld.dynamic_sort' with current valid values 1
(old algorithm) and 2 (new DFS algorithm) has been added. At time of commit
of this patch, the default setting is 1 (old algorithm).
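For example, to select the new DFS-based algorithm:
$ export GLIBC_TUNABLES=glibc.rtld.dynamic_sort=2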
Signed-off-by: Chung-Lin Tang <cltang@codesourcery.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
All references to libraries in the manual are without the .so suffix,
so do the same for libc_malloc_debug.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Remove all malloc hook uses from core malloc functions and move it
into a new library libc_malloc_debug.so. With this, the hooks now no
longer have any effect on the core library.
libc_malloc_debug.so is a malloc interposer that needs to be preloaded
to get hooks functionality back so that the debugging features that
depend on the hooks, i.e. malloc-check, mcheck and mtrace work again.
Without the preloaded DSO these debugging features will be nops.
These features will be ported away from hooks in subsequent patches.
Similarly, legacy applications that need hooks functionality need to
preload libc_malloc_debug.so.
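For example, mtrace-style tracing for a legacy application would look
like this (the path may vary per system, and the program still has to
call mtrace()):
$ LD_PRELOAD=libc_malloc_debug.so MALLOC_TRACE=/tmp/trace.log ./legacy-app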
The symbols exported by libc_malloc_debug.so are maintained at exactly
the same version as libc.so.
Finally, static binaries will no longer be able to use malloc
debugging features since they cannot preload the debugging DSO.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
The tunable will not work with arbitrary non-zero values, since its
list of allowed values is 0-3. Fix the documentation to reflect that.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The valgrind/helgrind test suite needs a way to make stack deallocation
more prompt, and this feature seems to be generally useful.
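Assuming this refers to the glibc.pthread.stack_cache_size tunable (the
commit text does not name it), usage would look like:
$ GLIBC_TUNABLES=glibc.pthread.stack_cache_size=0 valgrind --tool=helgrind ./app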
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This patch optimizes the performance of memcpy/memmove for A64FX [1]
which implements ARMv8-A SVE and has L1 64KB cache per core and L2 8MB
cache per NUMA node.
The performance optimization makes use of Scalable Vector Register
with several techniques such as loop unrolling, memory access
alignment, cache zero fill, and software pipelining.
SVE assembler code for memcpy/memmove is implemented as Vector Length
Agnostic code so theoretically it can be run on any SOC which supports
ARMv8-A SVE standard.
We confirmed that all testcases pass by running 'make check' and
'make xcheck', not only on A64FX but also on ThunderX2.
We also confirmed that SVE 512-bit vector register performance
is roughly 4 times better than Advanced SIMD 128-bit registers and 8
times better than scalar 64-bit registers by running 'make bench'.
[1] https://github.com/fujitsu/A64FX
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cover key corner cases (e.g., whether errno is set) that are well
settled in glibc, fix some examples to avoid integer overflow, and
update some other dated examples (code needed for K&R C, e.g.).
* manual/charset.texi (Non-reentrant String Conversion):
* manual/filesys.texi (Symbolic Links):
* manual/memory.texi (Allocating Cleared Space):
* manual/socket.texi (Host Names):
* manual/string.texi (Concatenating Strings):
* manual/users.texi (Setting Groups):
Use reallocarray instead of realloc, to avoid integer overflow issues.
* manual/filesys.texi (Scanning Directory Content):
* manual/memory.texi (The GNU Allocator, Hooks for Malloc):
* manual/tunables.texi:
Use code font for 'malloc' instead of roman font.
(Symbolic Links): Don't assume readlink return value fits in 'int'.
* manual/memory.texi (Memory Allocation and C, Basic Allocation)
(Malloc Examples, Alloca Example):
* manual/stdio.texi (Formatted Output Functions):
* manual/string.texi (Concatenating Strings, Collation Functions):
Omit pointer casts that are needed only in ancient K&R C.
* manual/memory.texi (Basic Allocation):
Say that malloc sets errno on failure.
Say "convert" rather than "cast", since casts are no longer needed.
* manual/memory.texi (Basic Allocation):
* manual/string.texi (Concatenating Strings):
In examples, use C99 declarations after statements for brevity.
* manual/memory.texi (Malloc Examples): Add portability notes for
malloc (0), errno setting, and PTRDIFF_MAX.
(Changing Block Size): Say that realloc (p, 0) acts like
(p ? (free (p), NULL) : malloc (0)).
Add xreallocarray example, since other examples can use it.
Add portability notes for realloc (0, 0), realloc (p, 0),
PTRDIFF_MAX, and improve notes for reallocating to the same size.
(Allocating Cleared Space): Reword now-confusing discussion
about replacement, and xref "Replacing malloc".
* manual/stdio.texi (Formatted Output Functions):
Don't assume message size fits in 'int'.
* manual/string.texi (Concatenating Strings):
Fix undefined behavior involving arithmetic on a freed pointer.
Add a new glibc tunable: mem.tagging. This is a decimal constant in
the range 0-255 but used as a bit-field.
Bit 0 enables use of tagged memory in the malloc family of functions.
Bit 1 enables precise faulting of tag failure on platforms where this
can be controlled.
Other bits are currently unused, but if set will cause memory tag
checking for the current process to be enabled in the kernel.
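For example, to enable tagged memory in malloc together with precise
faulting (bits 0 and 1):
$ export GLIBC_TUNABLES=glibc.mem.tagging=3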
The __x86_shared_non_temporal_threshold determines when memcpy on x86
uses non-temporal stores to avoid pushing other data out of the last
level cache.
This patch proposes to revert the calculation change made by H.J. Lu's
patch of June 2, 2017.
H.J. Lu's patch selected a threshold suitable for a single thread
getting maximum performance. It was tuned using the single-threaded
large-memcpy micro benchmark on an 8-core processor. That change moved
the threshold from using 3/4 of one thread's share of the cache to
using 3/4 of the entire cache of a multi-threaded system before
switching to non-temporal stores. Multi-threaded systems with
more than a few threads are server-class and typically have many
active threads. If one thread consumes 3/4 of the available cache for
all threads, it will cause other active threads to have data removed
from the cache. Two examples show the range of the effect. John
McCalpin's widely parallel Stream benchmark, which runs in parallel
and fetches data sequentially, saw a 20% slowdown with this patch on
an internal system test of 128 threads. This regression was discovered
when comparing OL8 performance to OL7. An example that compares
normal stores to non-temporal stores may be found at
https://vgatherps.github.io/2018-09-02-nontemporal/. A simple test
shows performance loss of 400 to 500% due to a failure to use
nontemporal stores. These performance losses are most likely to occur
when the system load is heaviest and good performance is critical.
The tunable x86_non_temporal_threshold can be used to override the
default by a knowledgeable user who really wants maximum cache
allocation to a single thread in a multi-threaded system.
The manual entry for the tunable has been expanded to provide
more information about its purpose.
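For example, a knowledgeable user could restore a larger threshold via
the tunable's full name (the value is illustrative):
$ export GLIBC_TUNABLES=glibc.cpu.x86_non_temporal_threshold=25165824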
On some targets the static TLS surplus area can be used opportunistically
for dynamically loaded modules such that the TLS access then becomes
faster (TLSDESC and powerpc TLS optimization). However, we don't want
all surplus TLS to be used for this optimization, because dynamically
loaded modules with initial-exec model TLS can only use surplus TLS.
The new contract for surplus static TLS use is:
- libc.so can have up to 192 bytes of IE TLS,
- other system libraries together can have up to 144 bytes of IE TLS.
- Some "optional" static TLS is available for opportunistic use.
The optional TLS is now tunable: rtld.optional_static_tls, so users
can directly affect the allocated static TLS size. (Note that module
unloading with dlclose does not reclaim static TLS. After the optional
TLS runs out, TLS access is no longer optimized to use static TLS.)
The default setting of rtld.optional_static_tls is 512 so the surplus
TLS is 3*192 + 4*144 + 512 = 1664 by default, the same as before.
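For example, to enlarge the opportunistic portion (value illustrative):
$ export GLIBC_TUNABLES=glibc.rtld.optional_static_tls=2048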
Fixes BZ #25051.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The new static TLS surplus size computation is
surplus_tls = 192 * (nns-1) + 144 * nns + 512
where nns is controlled via the rtld.nns tunable. This commit
accounts for audit modules too, so nns = rtld.nns + audit modules.
rtld.nns should only include the namespaces required by the
application, namespaces for audit modules are accounted on top
of that so audit modules don't use up the static TLS that is
reserved for the application. This allows loading many audit
modules without tuning rtld.nns or using up static TLS, and it
fixes
FAIL: elf/tst-auditmany
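As an illustrative calculation: with rtld.nns = 4 and two audit
modules, nns = 6 and
surplus_tls = 192 * (6-1) + 144 * 6 + 512 = 960 + 864 + 512 = 2336
bytes, compared to 1664 bytes when no audit modules are loaded.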
Note that DL_NNS is currently a hard upper limit for nns, and
if rtld.nns + audit modules go over the limit that's a fatal
error. By default rtld.nns is 4 which allows 12 audit modules.
Counting the audit modules is based on existing audit string
parsing code, we cannot use GLRO(dl_naudit) before the modules
are actually loaded.
TLS_STATIC_SURPLUS is 1664 bytes currently which is not enough to
support DL_NNS (== 16) number of dynamic link namespaces, if we
assume 192 bytes of TLS are reserved for libc use and 144 bytes
are reserved for other system libraries that use IE TLS.
A new tunable is introduced to control the number of supported
namespaces and to adjust the surplus static TLS size as follows:
surplus_tls = 192 * (rtld.nns-1) + 144 * rtld.nns + 512
The default is rtld.nns == 4 and then the surplus TLS size is the
same as before, so the behaviour is unchanged by default. If an
application creates more namespaces than the rtld.nns setting
allows, then it is not guaranteed to work, but the limit is not
checked. So existing usage will continue to work, but in the
future if an application creates more than 4 dynamic link
namespaces then the tunable will need to be set.
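As a worked example, the default rtld.nns == 4 gives
surplus_tls = 192 * 3 + 144 * 4 + 512 = 576 + 576 + 512 = 1664
bytes, matching the previous fixed value.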
In this patch DL_NNS is a fixed value and provides a maximum to
the rtld.nns setting.
Static linking used a fixed 2048-byte surplus TLS; this is changed so
that the same contract is used as for dynamic linking. With static
linking DL_NNS == 1, so the rtld.nns tunable is forced to 1, and by
default the surplus TLS is reduced to 144 + 512 = 656 bytes. This
change is not expected to cause problems.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables
to update thresholds for "rep movsb" and "rep stosb" at run-time.
Note that a user-specified threshold for "rep movsb" smaller than
the minimum threshold will be ignored.
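For example (values illustrative):
$ export GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=4096:glibc.cpu.x86_rep_stosb_threshold=8192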
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The Kunpeng processor is a 64-bit Arm-compatible CPU released by Huawei,
and we have already signed a copyright assignment with the FSF.
This patch adds it to the cpu list, along with the related macro for
IFUNC dispatch.
Checked on aarch64-linux-gnu.
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Change the tcache->counts[] entries to uint16_t - this removes
the limit set by char and allows a larger tcache. Remove a few
redundant asserts.
bench-malloc-thread with 4 threads is ~15% faster on Cortex-A72.
Reviewed-by: DJ Delorie <dj@redhat.com>
* malloc/malloc.c (MAX_TCACHE_COUNT): Increase to UINT16_MAX.
(tcache_put): Remove redundant assert.
(tcache_get): Remove redundant asserts.
(__libc_malloc): Check tcache count is not zero.
* manual/tunables.texi (glibc.malloc.tcache_count): Update maximum.
The tcache counts[] array is made up of char entries, which have a very
small range and thus may overflow. When setting the tcache_count
tunable, there is no overflow check. However the tunable must not be
larger than the maximum value representable in the tcache counts[]
array, otherwise it can overflow when filling the tcache.
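A self-contained sketch of the guard, paraphrasing the
do_set_tcache_count change (mp_.tcache_count is replaced by a plain
variable here, and the exact bound depends on the counts[] element
type):
  #include <limits.h>
  #include <stddef.h>

  #define MAX_TCACHE_COUNT CHAR_MAX   /* counts[] entries are a char.  */

  static size_t tcache_count = 7;     /* stand-in for mp_.tcache_count  */

  static int
  set_tcache_count (size_t value)
  {
    /* Only update if the new count still fits the counts[] element
       type; otherwise keep the previous setting.  */
    if (value <= MAX_TCACHE_COUNT)
      tcache_count = value;
    return 1;
  }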
[BZ #24531]
* malloc/malloc.c (MAX_TCACHE_COUNT): New define.
(do_set_tcache_count): Only update if count is small enough.
* manual/tunables.texi (glibc.malloc.tcache_count): Document max value.
Emag is a 64-bit CPU core released by Ampere Computing.
Add its name to the cpu list, and a corresponding macro as a utility
for later IFUNC dispatch.
* manual/tunables.texi (Tunable glibc.cpu.name): Add emag.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
Add emag.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_EMAG):
New macro.
Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares
supports 2 128-bit loads/stores, use Neon registers for memcpy by
selecting __memcpy_falkor by default (we should rename this to
__memcpy_simd or similar).
* manual/tunables.texi (glibc.cpu.name): Add ares tunable.
* sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use
__memcpy_falkor for ares.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES):
Add new define.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
Add ares cpu.
This patch does not have any functionality change; we only provide a
spin count tunable for pthread adaptive spin mutexes. The tunable
glibc.pthread.mutex_spin_count can be used by system administrators to
tune system performance according to different hardware capabilities
and workload characteristics.
The maximum value of the spin count is limited to 32767 to avoid
overflow of the mutex->__data.__spins variable, whose type may be
short, in pthread_mutex_lock ().
The default value of the spin count is set to 100, with reference to
the previous number of times of spinning via trylock. This value is
architecture-specific and can be tuned with various benchmarks to fit
most cases in the future.
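For example:
$ export GLIBC_TUNABLES=glibc.pthread.mutex_spin_count=200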
I would like to sincerely thank H.J. Lu for his help refining this
patch series.
* manual/tunables.texi (POSIX Thread Tunables): New node.
* nptl/Makefile (libpthread-routines): Add pthread_mutex_conf.
* nptl/nptl-init.c: Include pthread_mutex_conf.h
(__pthread_initialize_minimal_internal) [HAVE_TUNABLES]: Call
__pthread_tunables_init.
* nptl/pthreadP.h (MAX_ADAPTIVE_COUNT): Remove.
(max_adaptive_count): Define.
* nptl/pthread_mutex_conf.c: New file.
* nptl/pthread_mutex_conf.h: New file.
* sysdeps/generic/adaptive_spin_count.h: New file.
* sysdeps/nptl/dl-tunables.list: New file.
* nptl/pthread_mutex_lock.c (__pthread_mutex_lock): Use
max_adaptive_count () not MAX_ADAPTIVE_COUNT.
* nptl/pthread_mutex_timedlock.c (__pthread_mutex_timedlock):
Likewise.
Suggested-by: Andi Kleen <andi.kleen@intel.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Kemi.wang <kemi.wang@intel.com>
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to. Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses. Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.
* NEWS: Mention the change.
* elf/dl-tunables.list: Rename tune namespace to cpu.
* sysdeps/powerpc/dl-tunables.list: Likewise.
* sysdeps/x86/dl-tunables.list: Likewise.
* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
cpu.name.
* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
* manual/README.tunables: Likewise.
* manual/tunables.texi: Likewise.
* sysdeps/powerpc/cpu-features.c: Likewise.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
(init_cpu_features): Likewise.
* sysdeps/x86/cpu-features.c: Likewise.
* sysdeps/x86/cpu-features.h: Likewise.
* sysdeps/x86/cpu-tunables.c: Likewise.
* sysdeps/x86_64/Makefile: Likewise.
* sysdeps/x86/dl-cet.c: Likewise.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
On POWER8, unaligned memory accesses to cached memory have little impact
on performance, as opposed to its ancestors.
It is disabled by default and will only be available when the tunable
glibc.tune.cached_memopt is set to 1.
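For example:
$ export GLIBC_TUNABLES=glibc.tune.cached_memopt=1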
__memcpy_power8_cached __memcpy_power7
============================================================
max-size=4096: 33325.70 ( 12.65%) 38153.00
max-size=8192: 32878.20 ( 11.17%) 37012.30
max-size=16384: 33782.20 ( 11.61%) 38219.20
max-size=32768: 33296.20 ( 11.30%) 37538.30
max-size=65536: 33765.60 ( 10.53%) 37738.40
* manual/tunables.texi (Hardware Capability Tunables): Document
glibc.tune.cached_memopt.
* sysdeps/powerpc/cpu-features.c: New file.
* sysdeps/powerpc/cpu-features.h: New file.
* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
_dl_powerpc_cpu_features.
* sysdeps/powerpc/dl-tunables.list: New file.
* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
(INIT_ARCH): Initialize use_aligned_memopt.
* sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED) &&
IS_IN(rtld)]: Restrict dl_platform_init availability and
initialize CPU features used by tunables.
* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines):
Add memcpy-power8-cached.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add
__memcpy_power8_cached.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S:
New file.
Reviewed-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
This patch adds several new tunables to control the behavior of
elision on supported platforms[1]. Since elision now depends
on tunables, we should always *compile* with elision enabled,
and leave the code disabled, but available for runtime
selection. This gives us *much* better compile-time testing of
the existing code to avoid bit-rot[2].
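For example, assuming glibc.elision.enable is among the new elision
parameters, elision could be enabled at run-time with:
$ export GLIBC_TUNABLES=glibc.elision.enable=1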
Tested on ppc, ppc64, ppc64le, s390x and x86_64.
[1] This part of the patch was initially proposed by
Paul Murphy but stalled because the framework had changed
since the patch was originally proposed:
https://patchwork.sourceware.org/patch/10342/
[2] This part of the patch was initially proposed as an RFC by
Carlos O'Donell. It made sense to me to integrate it into this patch:
https://sourceware.org/ml/libc-alpha/2017-05/msg00335.html
* elf/dl-tunables.list: Add elision parameters.
* manual/tunables.texi: Add entries about elision tunable.
* sysdeps/unix/sysv/linux/powerpc/elision-conf.c:
Add callback functions to dynamically enable/disable elision.
Add multiple callbacks functions to set elision parameters.
Deleted __libc_enable_secure check.
* sysdeps/unix/sysv/linux/s390/elision-conf.c: Likewise.
* sysdeps/unix/sysv/linux/x86/elision-conf.c: Likewise.
* configure: Regenerated.
* configure.ac: Option enable_lock_elision was deleted.
* config.h.in: ENABLE_LOCK_ELISION flag was deleted.
* config.make.in: Remove references to enable_lock_elision.
* manual/install.texi: Elision configure option was removed.
* INSTALL: Regenerated to remove enable_lock_elision.
* nptl/Makefile:
Disable elision so it can verify error case for destroying a mutex.
* sysdeps/powerpc/nptl/elide.h:
Cleanup ENABLE_LOCK_ELISION check.
Deleted macros for the case when ENABLE_LOCK_ELISION was not defined.
* sysdeps/s390/configure: Regenerated.
* sysdeps/s390/configure.ac: Remove references to enable_lock_elision.
* nptl/tst-mutex8.c:
Deleted all #ifndef ENABLE_LOCK_ELISION from the test.
* sysdeps/powerpc/powerpc32/sysdep.h:
Deleted all ENABLE_LOCK_ELISION checks.
* sysdeps/powerpc/powerpc64/sysdep.h: Likewise.
* sysdeps/powerpc/sysdep.h: Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes-arch.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/force-elision.h: Likewise.
* sysdeps/unix/sysv/linux/s390/elision-conf.h: Likewise.
* sysdeps/unix/sysv/linux/s390/force-elision.h: Likewise.
* sysdeps/unix/sysv/linux/s390/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/s390/Makefile: Remove references to
enable-lock-elision.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
* manual/tunables.texi (glibc.tune.cpu): Add thunderx2t99 and
thunderx2t99p1 to list of cpu names.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
Add thunderx2t99 and thunderx2t99p1 entries to cpu_list.