glibc

mirror of https://sourceware.org/git/glibc.git synced 2025-01-06 01:21:08 +00:00

Author	SHA1	Message	Date
Florian Weimer	60d5e40ab2	x86: Remove low-level lock optimization The current approach is to do this optimizations at a higher level, in generic code, so that single-threaded cases can be specifically targeted. Furthermore, using IS_IN (libc) as a compile-time indicator that all locks are private is no longer correct once process-shared lock implementations are moved into libc. The generic <lowlevellock.h> is not compatible with assembler code (obviously), so it's necessary to remove two long-unused #includes. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-21 19:49:51 +02:00
Szabolcs Nagy	2208066603	elf: Remove lazy tlsdesc relocation related code Remove generic tlsdesc code related to lazy tlsdesc processing since lazy tlsdesc relocation is no longer supported. This includes removing GL(dl_load_lock) from _dl_make_tlsdesc_dynamic which is only called at load time when that lock is already held. Added a documentation comment too. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-21 14:35:53 +01:00
Noah Goldstein	aaa23c3507	x86: Optimize strlen-avx2.S No bug. This commit optimizes strlen-avx2.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-19 18:03:49 -07:00
Noah Goldstein	4ba6558684	x86: Optimize strlen-evex.S No bug. This commit optimizes strlen-evex.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-19 18:03:49 -07:00
Noah Goldstein	f53790272c	x86: Optimize less_vec evex and avx512 memset-vec-unaligned-erms.S No bug. This commit adds optimized cased for less_vec memset case that uses the avx512vl/avx512bw mask store avoiding the excessive branches. test-memset and test-wmemset are passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-19 15:08:04 -07:00
H.J. Lu	83c5b36822	x86-64: Require BMI2 for strchr-avx2.S Since strchr-avx2.S updated by commit `1f745ecc21` Author: noah <goldstein.w.n@gmail.com> Date: Wed Feb 3 00:38:59 2021 -0500 x86-64: Refactor and improve performance of strchr-avx2.S uses sarx: c4 e2 72 f7 c0 sarx %ecx,%eax,%eax for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and ifunc-avx2.h.	2021-04-19 11:01:45 -07:00
H.J. Lu	55bf411b45	x86-64: Require BMI2 for __strlen_evex and __strnlen_evex Since __strlen_evex and __strnlen_evex added by commit `1fd8c163a8` Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 5 06:24:52 2021 -0800 x86-64: Add ifunc-avx2.h functions with 256-bit EVEX use sarx: c4 e2 6a f7 c0 sarx %edx,%eax,%eax require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c. ifunc-avx2.h already requires BMI2 for EVEX implementation.	2021-04-19 07:51:33 -07:00
noah	1a8605b6cd	x86: Update large memcpy case in memmove-vec-unaligned-erms.S No Bug. This commit updates the large memcpy case (no overlap). The update is to perform memcpy on either 2 or 4 contiguous pages at once. This 1) helps to alleviate the affects of false memory aliasing when destination and source have a close 4k alignment and 2) In most cases and for most DRAM units is a modestly more efficient access pattern. These changes are a clear performance improvement for VEC_SIZE =16/32, though more ambiguous for VEC_SIZE=64. test-memcpy, test-memccpy, test-mempcpy, test-memmove, and tst-memmove-overflow all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-16 10:06:56 -07:00
Szabolcs Nagy	55c9f32380	x86_64: Remove lazy tlsdesc relocation related code _dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for lazy tlsdesc relocation processing which is no longer supported.	2021-04-15 09:47:47 +01:00
Szabolcs Nagy	8f7e09f4db	x86_64: Avoid lazy relocation of tlsdesc [BZ #27137 ] Lazy tlsdesc relocation is racy because the static tls optimization and tlsdesc management operations are done without holding the dlopen lock. This similar to the commit `b7cf203b5c` for aarch64, but it fixes a different race: bug 27137. Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to relocate tlsdesc lazily, but that does not work in a BIND_NOW module due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at load time fixes this bug 27721 too.	2021-04-15 09:47:37 +01:00
Paul Zimmermann	43576de04a	Improve the accuracy of tgamma (BZ #26983 ) With this patch, the maximal known error for tgamma is now reduced to 9 ulps for dbl-64, for all rounding modes. Since exhaustive testing is not possible for dbl-64, it might be that there are still cases with an error larger than 9 ulps, but all known cases are fixed (intensive tests were done to find cases with large errors). Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm, s390x, sparc, and i686). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-07 13:23:39 +02:00
Paul Zimmermann	9acda61d94	Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469 , #14470 , #14471 , #14472 ] For j0f/j1f/y0f/y1f, the largest error for all binary32 inputs is reduced to at most 9 ulps for all rounding modes. The new code is enabled only when there is a cancellation at the very end of the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not give any visible slowdown on average. Two different algorithms are used: * around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of degree 3 are used, computed using the Sollya tool (https://www.sollya.org/) * for large inputs, an asymptotic formula from [1] is used [1] Fast and Accurate Bessel Function Computation, John Harrison, Proceedings of Arith 19, 2009. Inputs yielding the new largest errors are added to auto-libm-test-in, and ulps are regenerated for various targets (thanks Adhemerval Zanella). Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-02 06:15:48 +02:00
Sunil K Pandey	595c22ecd8	x86-64: Fix ifdef indentation in strlen-evex.S Fix some indentations of ifdef in file strlen-evex.S which are off by 1 and confusing to read.	2021-04-01 16:13:33 -07:00
H.J. Lu	b1ec623ed5	x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591 ] config/i386/constraints.md in GCC has (define_constraint "e" "32-bit signed integer constant, or a symbolic reference known to fit that range (for immediate operands in sign-extending x86-64 instructions)." (match_operand 0 "x86_64_immediate_operand")) Since movq takes a signed 32-bit immediate or a register source operand, use "er", instead of "nr"/"ir", constraint for 32-bit signed integer constant or register on movq. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-04-01 07:00:22 -07:00
H.J. Lu	e4fda46310	x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions Update ifunc-memmove.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	4e2d8f3527	x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	7ebba91361	x86-64: Add AVX optimized string/memory functions for RTM Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX optimized string/memory functions with xtest jz 1f vzeroall ret 1: vzeroupper ret at function exit on processors with usable RTM, but without 256-bit EVEX instructions to avoid VZEROUPPER inside a transactionally executing RTM region.	2021-03-29 07:40:17 -07:00
H.J. Lu	91264fe357	x86-64: Add memcmp family functions with 256-bit EVEX Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	1b968b6b9b	x86-64: Add memset family functions with 256-bit EVEX Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	63ad43566f	x86-64: Add memmove family functions with 256-bit EVEX Update ifunc-memmove.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	525bc2a32c	x86-64: Add strcpy family functions with 256-bit EVEX Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	1fd8c163a8	x86-64: Add ifunc-avx2.h functions with 256-bit EVEX Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and BMI2 since VZEROUPPER isn't needed at function exit. For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP is set.	2021-03-29 07:40:17 -07:00
H.J. Lu	3e2f285c5f	nptl: Remove MULTI_PAGE_ALIASING [BZ #23554 ] MULTI_PAGE_ALIASING was introduced to mitigate an aliasing issue on Pentium 4. It is no longer needed for processors after Pentium 4.	2021-03-19 15:04:17 -07:00
Wilco Dijkstra	47ad14d789	math: Remove mpa files [BZ #15267 ] Finally remove all mpa related files, headers, declarations, probes, unused tables and update makefiles. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Wilco Dijkstra	db3f7bb558	math: Remove slow paths from asin and acos [BZ #15267 ] This patch series removes all remaining slow paths and related code. First asin/acos, tan, atan, atan2 implementations are updated, and the final patch removes the unused mpa files, headers and probes. Passes buildmanyglibc. Remove slow paths from asin/acos. Add ULP annotations based on previous slow path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Paul Zimmermann	5a051454a9	Add inputs that generate larger error bounds (Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)	2021-02-27 06:32:11 +01:00
Florian Weimer	035c012e32	Reduce the statically linked startup code [BZ #23323 ] It turns out the startup code in csu/elf-init.c has a perfect pair of ROP gadgets (see Marco-Gisbert and Ripoll-Ripoll, "return-to-csu: A New Method to Bypass 64-bit Linux ASLR"). These functions are not needed in dynamically-linked binaries because DT_INIT/DT_INIT_ARRAY are already processed by the dynamic linker. However, the dynamic linker skipped the main program for some reason. For maximum backwards compatibility, this is not changed, and instead, the main map is consulted from __libc_start_main if the init function argument is a NULL pointer. For statically linked binaries, the old approach based on linker symbols is still used because there is nothing else available. A new symbol version __libc_start_main@@GLIBC_2.34 is introduced because new binaries running on an old libc would not run their ELF constructors, leading to difficult-to-debug issues.	2021-02-25 12:13:02 +01:00
H.J. Lu	89de9d3958	x86: Use x86/nptl/pthreaddef.h 1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h. 2. Remove sysdeps/x86_64/nptl/pthreaddef.h. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-02-22 15:52:56 -08:00
noah	1f745ecc21	x86-64: Refactor and improve performance of strchr-avx2.S No bug. Just seemed the performance could be improved a bit. Observed and expected behavior are unchanged. Optimized body of main loop. Updated page cross logic and optimized accordingly. Made a few minor instruction selection modifications. No regressions in test suite. Both test-strchrnul and test-strchr passed.	2021-02-08 11:21:33 -08:00
Sajan Karumanchi	6e02b3e932	x86: Adding an upper bound for Enhanced REP MOVSB. In the process of optimizing memcpy for AMD machines, we have found the vector move operations are outperforming enhanced REP MOVSB for data transfers above the L2 cache size on Zen3 architectures. To handle this use case, we are adding an upper bound parameter on enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'. As per large-bench results, we are configuring this parameter to the L2 cache size for AMD machines and applicable from Zen3 architecture supporting the ERMS feature. For architectures other than AMD, it is the computed value of non-temporal threshold parameter. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>	2021-02-02 12:42:15 +01:00
Szabolcs Nagy	374cef32ac	configure: Check for static PIE support Add SUPPORT_STATIC_PIE that targets can define if they support static PIE. This requires PI_STATIC_AND_HIDDEN support and various linker features as described in commit `9d7a3741c9` Add --enable-static-pie configure option to build static PIE [BZ #19574] Currently defined on x86_64, i386 and aarch64 where static PIE is known to work. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-21 15:54:50 +00:00
H.J. Lu	ff6d62e9ed	<sys/platform/x86.h>: Remove the C preprocessor magic In <sys/platform/x86.h>, define CPU features as enum instead of using the C preprocessor magic to make it easier to wrap this functionality in other languages. Move the C preprocessor magic to internal header for better GCC codegen when more than one features are checked in a single expression as in x86-64 dl-hwcaps-subdirs.c. 1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX. 2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h. 3. Remove struct cpu_features and __x86_get_cpu_features from <sys/platform/x86.h>. 4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it in libc. 5. Make __get_cpu_features() private to glibc. 6. Replace __x86_get_cpu_features(N) with __get_cpu_features(). 7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE. 8. Use a single enum index for each CPU feature detection. 9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf. 10. Return zero struct cpuid_feature for the older glibc binary with a smaller CPUID_INDEX_MAX [BZ #27104]. 11. Inside glibc, use the C preprocessor magic so that cpu_features data can be loaded just once leading to more compact code for glibc. 256 bits are used for each CPUID leaf. Some leaves only contain a few features. We can add exceptions to such leaves. But it will increase code sizes and it is harder to provide backward/forward compatibilities when new features are added to such leaves in the future. When new leaves are added, _rtld_global_ro offsets will change which leads to race condition during in-place updates. We may avoid in-place updates by 1. Rename the old glibc. 2. Install the new glibc. 3. Remove the old glibc. NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy relocation issue with IFUNC resolver as shown in IFUNC resolver tests.	2021-01-21 05:58:17 -08:00
H.J. Lu	0ec583d926	libmvec: Add extra-test-objs to test-extras Add extra-test-objs to test-extras so that they are compiled with -DMODULE_NAME=testsuite instead of -DMODULE_NAME=libc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-19 06:20:46 -08:00
Adhemerval Zanella	d18f59bf92	Fix x86 build with --enable-tunable=no Checked on x86_64-linux-gnu.	2021-01-14 16:04:05 -03:00
H.J. Lu	ecce11aa07	x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717 ] GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA levels: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250 and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with GNU_PROPERTY_X86_ISA_1_V[234] marker: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13 Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by commit b0ab06937385e0ae25cebf1991787d64f439bf12 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Oct 30 06:49:57 2020 -0700 x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker and commit 32930e4edbc06bc6f10c435dbcc63131715df678 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Oct 9 05:05:57 2020 -0700 x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the micro-architecture ISA level required to execute the binary. The marker must be added by programmers explicitly in one of 3 ways: 1. Pass -mneeded to GCC. 2. Add the marker in the linker inputs as this patch does. 3. Pass -z x86-64-v[234] to the linker. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker support to ld.so if binutils 2.32 or newer is used to build glibc: 1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] markers to elf.h. 2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker to abi-note.o based on the ISA level used to compile abi-note.o, assuming that the same ISA level is used to compile the whole glibc. 3. Add isa_1 to cpu_features to record the supported x86 ISA level. 4. Rename _dl_process_cet_property_note to _dl_process_property_note and add GNU_PROPERTY_X86_ISA_1_V[234] marker detection. 5. Update _rtld_main_check and _dl_open_check to check loaded objects with the incompatible ISA level. 6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails on lesser platforms. 7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c. Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and x86-64-v4 machines. Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine and got: [hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1 ./elf/tst-isa-level-1: CPU ISA level is lower than required [hjl@gnu-cfl-2 build-x86_64-linux]$	2021-01-07 13:10:13 -08:00
Wilco Dijkstra	9e97f239ea	Remove dbl-64/wordsize-64 (part 2) Remove the wordsize-64 implementations by merging them into the main dbl-64 directory. The second patch just moves all wordsize-64 files and removes a few wordsize-64 uses in comments and Implies files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-07 15:26:26 +00:00
H.J. Lu	6ea5b57afa	x86: Check IFUNC definition in unrelocated executable [BZ #20019 ] Calling an IFUNC function defined in unrelocated executable also leads to segfault. Issue a fatal error message when calling IFUNC function defined in the unrelocated executable from a shared library.	2021-01-04 12:01:01 -08:00
H.J. Lu	3ec5d83d2a	x86-64: Avoid rep movsb with short distance [BZ #27130 ] When copying with "rep movsb", if the distance between source and destination is N*4GB + [1..63] with N >= 0, performance may be very slow. This patch updates memmove-vec-unaligned-erms.S for AVX and AVX512 versions with the distance in RCX: cmpl $63, %ecx // Don't use "rep movsb" if ECX <= 63 jbe L(Don't use rep movsb") Use "rep movsb" Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its performance impact is within noise range as "rep movsb" is only used for data size >= 4KB.	2021-01-04 07:58:57 -08:00
Paul Eggert	2b778ceb40	Update copyright dates with scripts/update-copyrights I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: * pre-commit check failed ... remote: * error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master	2021-01-02 12:17:34 -08:00
Siddhesh Poyarekar	94547d9209	x86 long double: Support pseudo numbers in isnanl This syncs up isnanl behaviour with gcc. Also move the isnanl implementation to sysdeps/x86 and remove the sysdeps/x86_64 version. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-24 06:05:40 +05:30
Siddhesh Poyarekar	b7f8815617	x86 long double: Support pseudo numbers in fpclassifyl Also move sysdeps/i386/fpu/s_fpclassifyl.c to sysdeps/x86/fpu/s_fpclassifyl.c and remove sysdeps/x86_64/fpu/s_fpclassifyl.c Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-24 06:05:26 +05:30
Paul Zimmermann	cad5ad81d2	add inputs to auto-libm-test-in yielding larger errors (binary64, x86_64)	2020-12-21 10:35:20 +05:30
Anssi Hannula	69a7ca7705	ieee754: Remove unused __sin32 and __cos32 The __sin32 and __cos32 functions were only used in the now removed slow path of asin and acos.	2020-12-18 12:10:31 +05:30
Florian Weimer	f267e1c9dd	x86_64: Add glibc-hwcaps support The subdirectories match those in the x86-64 psABI: `77566eb03b` Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-04 09:36:02 +01:00
Jakub Jelinek	1d9cbb9608	x86: Fix THREAD_SELF definition to avoid ld.so crash (bug 27004) The previous definition of THREAD_SELF did not tell the compiler that %fs (or %gs) usage is invalid for the !DL_LOOKUP_GSCOPE_LOCK case in _dl_lookup_symbol_x. As a result, ld.so could try to use the TCB before it was initialized. As the comment in tls.h explains, asm volatile is undesirable here. Using the __seg_fs (or __seg_gs) namespace does not interfere with optimization, and expresses that THREAD_SELF is potentially trapping.	2020-12-03 13:48:55 +01:00
Florian Weimer	1daccf403b	nptl: Move stack list variables into _rtld_global Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT, formerly __wait_lookup_done) can be implemented directly in ld.so, eliminating the unprotected GL (dl_wait_lookup_done) function pointer. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-11-16 19:33:30 +01:00
Adhemerval Zanella	01bd62517c	Remove tls.h inclusion from internal errno.h The tls.h inclusion is not really required and limits possible definition on more arch specific headers. This is a cleanup to allow inline functions on sysdep.h, more specifically on i386 and ia64 which requires to access some tls definitions its own. No semantic changes expected, checked with a build against all affected ABIs.	2020-11-13 12:59:19 -03:00
Florian Weimer	0f34d426ac	x86: Remove UP macro. Define LOCK_PREFIX unconditionally. The UP macro is never defined. Also define LOCK_PREFIX unconditionally, to the same string. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-11-13 15:20:03 +01:00
Samuel Thibault	3d3316b1de	hurd: keep only required PLTs in ld.so We need NO_RTLD_HIDDEN because of the need for PLT calls in ld.so. See Roland's comment in https://sourceware.org/bugzilla/show_bug.cgi?id=15605 "in the Hurd it's crucial that calls like __mmap be the libc ones instead of the rtld-local ones after the bootstrap phase, when the dynamic linker is being used for dlopen and the like." We used to just avoid all hidden use in the rtld ; this commit switches to keeping only those that should use PLT calls, i.e. essentially those defined in sysdeps/mach/hurd/dl-sysdep.c: __assert_fail __assert_perror_fail __*stat64 _exit This fixes a few startup issues, notably the call to __tunable_get_val that is made before PLTs are set up.	2020-11-11 02:36:22 +01:00
H.J. Lu	0f09154c64	x86: Initialize CPU info via IFUNC relocation [BZ 26203] X86 CPU features in ld.so are initialized by init_cpu_features, which is invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is loaded by static executable, DL_PLATFORM_INIT is never called. Also x86 cache info in libc.o and libc.a is initialized by a constructor which may be called too late. Since some fields in _rtld_global_ro in ld.so are initialized by dynamic relocation, we can also initialize x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so by initializing dummy function pointers in ld.so and libc.so via IFUNC relocation. Key points: 1. IFUNC is always supported, independent of --enable-multi-arch or --disable-multi-arch. Linker generates IFUNC relocations from input IFUNC objects and ld.so performs IFUNC relocations. 2. There are no IFUNC dependencies in ld.so before dynamic relocation have been performed, 3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT in dynamic executable and by IFUNC relocation in dlopen in static executable. 4. The x86 cache info in libc.o is initialized by IFUNC relocation. 5. In libc.a, both x86 CPU features and cache info are initialized from ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init is called. Note: _dl_x86_init_cpu_features can be called more than once from DL_PLATFORM_INIT and during relocation in ld.so.	2020-10-16 16:17:53 -07:00
Szabolcs Nagy	238032ead6	aarch64: enforce >=64K guard size [BZ #26691 ] There are several compiler implementations that allow large stack allocations to jump over the guard page at the end of the stack and corrupt memory beyond that. See CVE-2017-1000364. Compilers can emit code to probe the stack such that the guard page cannot be skipped, but on aarch64 the probe interval is 64K by default instead of the minimum supported page size (4K). This patch enforces at least 64K guard on aarch64 unless the guard is disabled by setting its size to 0. For backward compatibility reasons the increased guard is not reported, so it is only observable by exhausting the address space or parsing /proc/self/maps on linux. On other targets the patch has no effect. If the stack probe interval is larger than a page size on a target then ARCH_MIN_GUARD_SIZE can be defined to get large enough stack guard on libc allocated stacks. The patch does not affect threads with user allocated stacks. Fixes bug 26691.	2020-10-02 09:57:44 +01:00
Florian Weimer	90ccfdf176	x86: Use one ldbl2mpn.c file for both i386 and x86_64	2020-09-22 17:58:39 +02:00
H.J. Lu	9620398097	x86: Install <sys/platform/x86.h> [BZ #26124 ] Install <sys/platform/x86.h> so that programmers can do #if __has_include(<sys/platform/x86.h>) #include <sys/platform/x86.h> #endif ... if (CPU_FEATURE_USABLE (SSE2)) ... if (CPU_FEATURE_USABLE (AVX2)) ... <sys/platform/x86.h> exports only: enum { COMMON_CPUID_INDEX_1 = 0, COMMON_CPUID_INDEX_7, COMMON_CPUID_INDEX_80000001, COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007, COMMON_CPUID_INDEX_80000008, COMMON_CPUID_INDEX_7_ECX_1, /* Keep the following line at the end. / COMMON_CPUID_INDEX_MAX }; struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; }; / Get a pointer to the CPU features structure. / extern const struct cpu_features __x86_get_cpu_features (unsigned int max) __attribute__ ((const)); Since all feature checks are done through macros, programs compiled with a newer <sys/platform/x86.h> are compatible with the older glibc binaries as long as the layout of struct cpu_features is identical. The features array can be expanded with backward binary compatibility for both .o and .so files. When COMMON_CPUID_INDEX_MAX is increased to support new processor features, __x86_get_cpu_features in the older glibc binaries returns NULL and HAS_CPU_FEATURE/CPU_FEATURE_USABLE return false on the new processor feature. No new symbol version is neeeded. Both CPU_FEATURE_USABLE and HAS_CPU_FEATURE are provided. HAS_CPU_FEATURE can be used to identify processor features. Note: Although GCC has __builtin_cpu_supports, it only supports a subset of <sys/platform/x86.h> and it is equivalent to CPU_FEATURE_USABLE. It doesn't support HAS_CPU_FEATURE.	2020-09-11 17:20:52 -07:00
Ondřej Hošek	23af890b3f	x86-64: Fix FMA4 detection in ifunc [BZ #26534 ] A typo in commit `107e6a3c22` causes the FMA4 code path to be taken on systems that support FMA, even if they do not support FMA4. Fix this to detect FMA4.	2020-09-02 05:07:37 -07:00
Adhemerval Zanella	5ff35e9544	math: Update x86_64 ulps From new j0 test.	2020-08-08 16:43:11 -03:00
Andreas K. Hüttel	180d5a045f	Update x86-64 libm-test-ulps x86_64 Intel(R) Core(TM) i5-8265U gcc (Gentoo 10.1.0-r2 p3) 10.1.0 Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-25 17:10:53 -04:00
H.J. Lu	107e6a3c22	x86: Support usable check for all CPU features Support usable check for all CPU features with the following changes: 1. Change struct cpu_features to struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; so that there is a usable bit for each cpuid bit. 2. After the cpuid bits have been initialized, copy the known bits to the usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for CPU feature detection. 3. Clear the usable bits which require OS support. 4. If the feature is supported by OS, copy its cpuid bit to its usable bit. 5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE and CPU_FEATURE_USABLE_P to check if a feature is usable. 6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13. 7. Unset MPX feature since it has been deprecated. The results are 1. If the feature is known and doesn't requre OS support, its usable bit is copied from the cpuid bit. 2. Otherwise, its usable bit is copied from the cpuid bit only if the feature is known to supported by OS. 3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the feature can be used. 4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports the feature.	2020-07-13 06:05:16 -07:00
H.J. Lu	9016b6f389	x86: Remove the unused __x86_prefetchw Since commit `c867597bff` Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jun 8 13:57:50 2016 -0700 X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove removed the only usage of __x86_prefetchw, we can remove the unused __x86_prefetchw.	2020-07-11 09:34:03 -07:00
H.J. Lu	3f4b61a0b8	x86: Add thresholds for "rep movsb/stosb" to tunables Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables to update thresholds for "rep movsb" and "rep stosb" at run-time. Note that the user specified threshold for "rep movsb" smaller than the minimum threshold will be ignored. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-06 11:48:42 -07:00
Adhemerval Zanella	b24381e50f	i386: Use builtin sqrtl Checked on i686-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	d19d25dd06	x86_64: Use builtin sqrt{f,l} Checked on x86_64-linux-gnu.	2020-06-22 11:09:49 -03:00
Sunil K Pandey	75870237ff	Fix avx2 strncmp offset compare condition check [BZ #25933 ] strcmp-avx2.S: In avx2 strncmp function, strings are compared in chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4 vector size comparison, code must check whether it already passed the given offset. This patch implement avx2 offset check condition for strncmp function, if both string compare same for first 4 vector size.	2020-06-17 07:07:38 -07:00
H.J. Lu	a35a59036e	x86_64: Use %xmmN with vpxor to clear a vector register Since "vpxor %xmmN, %xmmN, %xmmN" clears the whole vector register, use %xmmN, instead of %ymmN, with vpxor to clear a vector register.	2020-06-17 05:44:02 -07:00
Vineet Gupta	8dbb7a08ec	dl-runtime: reloc_{offset,index} now functions arch overide'able The existing macros are fragile and expect local variables with a certain name. Fix this by defining them as functions with default implementation in a new header dl-runtime.h which arches can override if need be. This came up during ARC port review, hence the need for argument pltgot in reloc_index() which is not needed by existing ports. This patch potentially only affects hppa/x86 ports, build tested for both those configs and a few more. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-06-05 13:45:46 -07:00
Florian Weimer	501bdb5dd6	Linux: Remove remnants of the getcpu cache The getcpu cache was removed from the kernel in Linux 2.6.24. glibc support from the sched_getcpu implementation was removed in commit `dd26c44403` ("Consolidate sched_getcpu").	2020-05-16 15:47:51 +02:00
H.J. Lu	55c7bcc71b	x86-64: Use RDX_LP on __x86_shared_non_temporal_threshold [BZ #25966 ] Since __x86_shared_non_temporal_threshold is defined as long int __x86_shared_non_temporal_threshold; and long int is 4 bytes for x32, use RDX_LP to compare against __x86_shared_non_temporal_threshold in assembly code.	2020-05-09 12:28:15 -07:00
Joseph Myers	dbb188dd87	Remove unused floating-point configuration from gmp-impl.h. This patch removes the IEEE_DOUBLE_BIG_ENDIAN and IEEE_DOUBLE_MIXED_ENDIAN macros from gmp-impl.h and gmp-mparam.h, and the ieee_double_extract union from gmp-impl.h. The macros were used only in defining the union, which was used nowhere in glibc. As GMP's gmp-impl.h is over 5000 lines, the file in glibc is so far from the GMP version that it doesn't seem to make sense to keep things there that are not relevant in glibc. (I expect there is plenty more in the header after this patch that is also not relevant in glibc and can be cleaned up later.) Tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by this patch.	2020-04-28 15:05:09 +00:00
Adhemerval Zanella	f721171632	Revert "x86_64: Add SSE sfp-exceptions" The __sfp_handle_exceptions is not fully correct regarding raising exceptions, since there is no direct way to raise only FP_EX_OVERFLOW nor FP_EX_UNDERFLOW for SSE mode. Both libgcc and feraiseexcept rely on x87 mode to accomplish it. This reverts commit `460ee50de0`. Checked on x86_64.	2020-04-20 14:56:05 -03:00
Adhemerval Zanella	460ee50de0	x86_64: Add SSE sfp-exceptions The exported x86_64 fenv.h functions operate on both i387 and SSE (since they should work on both float, double, and long double) while the internal libc_fe* set either SSE (float, double, and float128) or i387 (long double). The libgcc __sfp_handle_exceptions (used on float128 implementation), however, will set either SEE or i387 exception depending of the exception to raise. This broke the internal assumption of float128 where only SSE operations will be used. This patch reimplements the libgcc __sfp_handle_exceptions to use only SSE operations and sets libgcc to use it instead of its own implementation. And I think we should fix libgcc in a similar manner, since checking on config/i386/64/sfp-machine.h it already only supports SSE rounding mode and x86_64 ABI also expectes float128 to use SSE registers [1] (although it is not clear on how future implementation might implement it). Checked on x86_64-linux-gnu. [1] https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI	2020-04-17 11:42:29 -03:00
Adhemerval Zanella	17fd707f88	nptl: Remove x86_64 cancellation assembly implementations [BZ #25765 ] All cancellable syscalls are done by C implementations, so there is no no need to use a specialized implementation to optimize register usage. It fixes BZ #25765. Checked on x86_64-linux-gnu.	2020-04-03 10:47:59 -03:00
Paul Zimmermann	a9d42c09a3	math: Add inputs that yield larger errors for float type (x86_64) The corner cases included were generated using exhaustive search for all float/binary32 values on x86_64 (comparing to MPFR for correct rounding to nearest). For the j0/j1/y0 functions, only cases with ulp error <= 9 were included. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-03-31 21:48:54 -04:00
Adhemerval Zanella	1c15464ca0	math: Remove inline math tests With mathinline removal there is no need to keep building and testing inline math tests. The gen-libm-tests.py support to generate ULP_I_* is removed and all libm-test-ulps files are updated to longer have the i{float,double,ldouble} entries. The support for no-test-inline is also removed from both gen-auto-libm-tests and the auto-libm-test-out-* were regenerated. Checked on x86_64-linux-gnu and i686-linux-gnu.	2020-03-19 11:45:44 -03:00
Florian Weimer	fe49a73316	x86: Avoid single-argument _Static_assert in <tls.h> Older GCC versions do not support this extension. Fixes commit `f1bdee6179` ("x86 tls: Use _Static_assert for TLS access size assertion").	2020-02-17 11:12:03 +01:00
Samuel Thibault	f1bdee6179	x86 tls: Use _Static_assert for TLS access size assertion	2020-02-17 00:40:39 +01:00
Florian Weimer	3a0ecccb59	ld.so: Do not export free/calloc/malloc/realloc functions [BZ #25486 ] Exporting functions and relying on symbol interposition from libc.so makes the choice of implementation dependent on DT_NEEDED order, which is not what some compiler drivers expect. This commit replaces one magic mechanism (symbol interposition) with another one (preprocessor-/compiler-based redirection). This makes the hand-over from the minimal malloc to the full malloc more explicit. Removing the ABI symbols is backwards-compatible because libc.so is always in scope, and the dynamic loader will find the malloc-related symbols there since commit `f0b2132b35` ("ld.so: Support moving versioned symbols between sonames [BZ #24741]"). Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-02-15 11:01:23 +01:00
Adhemerval Zanella	fcb78a5505	linux: Consolidate INLINE_SYSCALL With all Linux ABIs using the expected Linux kABI to indicate syscalls errors, there is no need to replicate the INLINE_SYSCALL. The generic Linux sysdep.h includes errno.h even for !__ASSEMBLER__, which is ok now and it allows cleanup some archaic code that assume otherwise. Checked with a build against all affected ABIs.	2020-02-14 21:09:12 -03:00
Wilco Dijkstra	220622dde5	Add libm_alias_finite for _finite symbols This patch adds a new macro, libm_alias_finite, to define all _finite symbol. It sets all _finite symbol as compat symbol based on its first version (obtained from the definition at built generated first-versions.h). The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need special treatment in code that is shared between long double and float128. It is done by adding a list, similar to internal symbol redifinition, on sysdeps/ieee754/float128/float128_private.h. Alpha also needs some tricky changes to ensure we still emit 2 compat symbols for sqrt(f). Passes buildmanyglibc. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2020-01-03 10:02:04 -03:00
Joseph Myers	d614a75396	Update copyright dates with scripts/update-copyrights.	2020-01-01 00:14:33 +00:00
Stefan Liebler	1c94bf0f0a	Always use wordsize-64 version of s_trunc.c. This patch replaces s_trunc.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:14 +01:00
Stefan Liebler	9f234eafe8	Always use wordsize-64 version of s_ceil.c. This patch replaces s_ceil.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:13 +01:00
Stefan Liebler	95b0c2c431	Always use wordsize-64 version of s_floor.c. This patch replaces s_floor.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:12 +01:00
Stefan Liebler	ab48bdd098	Always use wordsize-64 version of s_rint.c. This patch replaces s_rint.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 file. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:12 +01:00
Stefan Liebler	af123aa950	Always use wordsize-64 version of s_nearbyint.c. This patch replaces s_nearbyint.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 file. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:11 +01:00
Florian Weimer	4db71d2f98	elf: Do not run IFUNC resolvers for LD_DEBUG=unused [BZ #24214 ] This commit adds missing skip_ifunc checks to aarch64, arm, i386, sparc, and x86_64. A new test case ensures that IRELATIVE IFUNC resolvers do not run in various diagnostic modes of the dynamic loader. Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>	2019-12-02 14:55:22 +01:00
Adhemerval Zanella	48dbce60cf	nptl: Add tests for internal pthread_rwlock_t offsets This patch new build tests to check for internal fields offsets for internal pthread_rwlock_t definition. Althoug the '__data.__flags' field layout should be preserved due static initializators, the patch also adds tests for the futexes that may be used in a shared memory (although using different libc version in such scenario is not really supported). Checked with a build against all affected ABIs. Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148	2019-11-26 13:53:36 +00:00
Adhemerval Zanella	71d260c107	nptl: Cleanup mutex internal offset tests The offsets of pthread_mutex_t __data.__nusers, __data.__spins, __data.elision, __data.list are not required to be constant over the releases. Only the __data.__kind is used for static initializers. This patch also adds an additional size check for __data.__kind. Checked with a build against affected ABIs. Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f	2019-11-26 13:53:36 +00:00
Wilco Dijkstra	d0007dc53c	Remove x64 _finite tests and references Remove _finite tests and references from x86_64. Rather than calling __exp_finite, use exp directly (since it's the same entry point). x86_64 builds and passes testsuite. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-10-21 14:29:12 -03:00
Paul Eggert	5a82c74822	Prefer https to http for gnu.org and fsf.org URLs Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http\|ftp)(://(.\.)?(gnu\|fsf\|sourceware)\.org($\|[^.]\|\.[^a-z])),https\2,g s,(http\|ftp)(://(.\.)?)sources\.redhat\.com($\|[^.]\|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '.po' \ ! -name 'ChangeLog' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: * error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: * error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S	2019-09-07 02:43:31 -07:00
Wilco Dijkstra	3c05dd79d0	Use generic memset/memcpy/memmove in benchtests Use the generic C memset/memcpy/memmove in benchtests since comparing against a slow byte-oriented implementation makes no sense. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> 2019-08-29 Wilco Dijkstra <wdijkstr@arm.com> * benchtests/bench-memcpy.c (simple_memcpy): Remove. (generic_memcpy): Include generic C memcpy. * benchtests/bench-memmove.c (simple_memmove): Remove. (generic_memmove): Include generic C memmove. * benchtests/bench-memset.c (simple_memset): Remove. (generic_memset): Include generic C memset. * benchtests/bench-memset-large.c (simple_memset): Remove. (generic_memset): Include generic C memset. * benchtests/bench-memset-walk.c (simple_memset): Remove. (generic_memset): Include generic C memset. * string/memcpy.c (MEMCPY): Add defines to enable redirection. * string/memset.c (MEMSET): Likewise. * sysdeps/x86_64/memcopy.h: Remove empty file.	2019-08-30 17:21:35 +01:00
H.J. Lu	7e681561a3	x86-64: Compile branred.c with -mprefer-vector-width=128 [BZ #24603 ] When compiled with -O3 and AVX, GCC 8 and 9 optimize some loops in sysdeps/ieee754/dbl-64/branred.c with 256-bit vector instructions, which leads to store forward stall: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579 There is no easy fix in compiler. This patch limits vector width to 128 bits to work around this issue. It improves performance of sin and cos by more than 40% on Skylake compiled with -O3 -march=skylake. Tested with GCC 7/8/9 on x86-64. [BZ #24603] * sysdeps/x86_64/configure.ac: Check if -mprefer-vector-width=128 works. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set to -mprefer-vector-width=128 if supported.	2019-07-24 14:48:43 -07:00
H.J. Lu	d039da1c00	x86: Add sysdeps/x86/dl-lookupcfg.h Since sysdeps/i386/dl-lookupcfg.h and sysdeps/x86_64/dl-lookupcfg.h are identical, we can replace them with sysdeps/x86/dl-lookupcfg.h. * sysdeps/i386/dl-lookupcfg.h: Moved to ... * sysdeps/x86/dl-lookupcfg.h: Here. * sysdeps/x86_64/dl-lookupcfg.h: Removed.	2019-06-26 15:07:28 -07:00
Adhemerval Zanella	81a1443941	wcsmbs: optimize wcscat This patch rewrites wcscat using wcslen and wcscpy. This is similar to the optimization done on strcat by `6e46de42fe`. The strcpy changes are mainly to add the internal alias to avoid PLT calls. Checked on x86_64-linux-gnu and a build against the affected architectures. * include/wchar.h (__wcscpy): New prototype. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c (__wcscpy): Route internal symbol to generic implementation. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy): Add internal __wcscpy alias. * sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise. * sysdeps/s390/wcscpy.c (wcscpy): Likewise. * sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise. * wcsmbs/wcscpy.c (wcscpy): Add * sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to use generic implementation. * wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.	2019-02-27 10:00:37 -03:00
Joseph Myers	32db86d558	Add fall-through comments. This patch adds fall-through comments in some cases where -Wextra produces implicit-fallthrough warnings. The patch is non-exhaustive. Apart from architecture-specific code for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy code, probably should have such changes, but left to be dealt with separately), or places that already had comments about the fall-through but not matching the form expected by -Wimplicit-fallthrough=3 (the default level with -Wextra; my inclination is to adjust those comments to match rather than downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one place where I thought the implicit fallthrough was not correct and so should be handled separately as a bug fix. I think the key thing to consider in review of this patch is whether the fall-through is indeed intended and correct in each place where such a comment is added. Tested for x86_64. * elf/dl-exception.c (_dl_exception_create_format): Add fall-through comments. * elf/ldconfig.c (parse_conf_include): Likewise. * elf/rtld.c (print_statistics): Likewise. * locale/programs/charmap.c (parse_charmap): Likewise. * misc/mntent_r.c (__getmntent_r): Likewise. * posix/wordexp.c (parse_arith): Likewise. (parse_backtick): Likewise. * resolv/ns_ttl.c (ns_parse_ttl): Likewise. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.	2019-02-12 10:30:34 +00:00
Andreas Schwab	65f7767a91	Fix handling of collating elements in fnmatch (bug 17396, bug 16976) This fixes the same bug in fnmatch that was fixed by commit `7e2f0d2d77` for regexp matching. As a side effect it also removes the use of an unbound VLA.	2019-02-04 15:45:02 +01:00
H.J. Lu	3f635fb433	x86-64 memcmp: Use unsigned Jcc instructions on size [BZ #24155 ] Since the size argument is unsigned. we should use unsigned Jcc instructions, instead of signed, to check size. Tested on x86-64 and x32, with and without --disable-multi-arch. [BZ #24155] CVE-2019-7309 * NEWS: Updated for CVE-2019-7309. * sysdeps/x86_64/memcmp.S: Use RDX_LP for size. Clear the upper 32 bits of RDX register for x32. Use unsigned Jcc instructions, instead of signed. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp-2. * sysdeps/x86_64/x32/tst-size_t-memcmp-2.c: New test.	2019-02-04 06:31:13 -08:00
H.J. Lu	5165de69c0	x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes strnlen/wcsnlen for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strlen-avx2.S: Use RSI_LP for length. Clear the upper 32 bits of RSI register. * sysdeps/x86_64/strlen.S: Use RSI_LP for length. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strnlen and tst-size_t-wcsnlen. * sysdeps/x86_64/x32/tst-size_t-strnlen.c: New file. * sysdeps/x86_64/x32/tst-size_t-wcsnlen.c: Likewise.	2019-01-21 11:36:47 -08:00
H.J. Lu	c7c54f65b0	x86-64 strncpy: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes strncpy for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strcpy-avx2.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncpy. * sysdeps/x86_64/x32/tst-size_t-strncpy.c: New file.	2019-01-21 11:35:34 -08:00
H.J. Lu	ee915088a0	x86-64 strncmp family: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes the strncmp family for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strcmp-avx2.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/strcmp-sse42.S: Likewise. * sysdeps/x86_64/strcmp.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncasecmp, tst-size_t-strncmp and tst-size_t-wcsncmp. * sysdeps/x86_64/x32/tst-size_t-strncasecmp.c: New file. * sysdeps/x86_64/x32/tst-size_t-strncmp.c: Likewise. * sysdeps/x86_64/x32/tst-size_t-wcsncmp.c: Likewise.	2019-01-21 11:34:04 -08:00
H.J. Lu	82d0b4a4d7	x86-64 memset/wmemset: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memset/wmemset for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-wmemset. * sysdeps/x86_64/x32/tst-size_t-memset.c: New file. * sysdeps/x86_64/x32/tst-size_t-wmemset.c: Likewise.	2019-01-21 11:32:37 -08:00
H.J. Lu	ecd8b842cf	x86-64 memrchr: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memrchr for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/memrchr.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/memrchr-avx2.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memrchr. * sysdeps/x86_64/x32/tst-size_t-memrchr.c: New file.	2019-01-21 11:30:12 -08:00

1 2 3 4 5 ...

1385 Commits