glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-13 14:50:17 +00:00

Author	SHA1	Message	Date
H.J. Lu	a0db678071	x86-64: Move strlen.S to multiarch/strlen-vec.S Since strlen.S contains SSE2 version of strlen/strnlen and SSE4.1 version of wcslen/wcsnlen, move strlen.S to multiarch/strlen-vec.S and include multiarch/strlen-vec.S from SSE2 and SSE4.1 variants. This also removes the unused symbols, __GI___strlen_sse2 and __GI___wcsnlen_sse4_1.	2021-06-23 10:24:35 -07:00
H.J. Lu	79aec84102	Properly check stack alignment [BZ #27901 ] 1. Replace if ((((uintptr_t) &_d) & (__alignof (double) - 1)) != 0) which may be optimized out by compiler, with int __attribute__ ((weak, noclone, noinline)) is_aligned (void *p, int align) { return (((uintptr_t) p) & (align - 1)) != 0; } 2. Add TEST_STACK_ALIGN_INIT to TEST_STACK_ALIGN. 3. Add a common TEST_STACK_ALIGN_INIT to check 16-byte stack alignment for both i386 and x86-64. 4. Update powerpc to use TEST_STACK_ALIGN_INIT. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-05-24 07:42:12 -07:00
Noah Goldstein	1b992204f6	x86: Improve memmove-vec-unaligned-erms.S This patch changes the condition for copy 4x VEC so that if length is exactly equal to 4 * VEC_SIZE it will use the 4x VEC case instead of 8x VEC case. Results For Skylake memcpy-avx2-erms size, al1 , al2 , Cur T , New T , Win , New / Cur 128 , 0 , 0 , 9.137 , 6.873 , New , 75.22 128 , 7 , 0 , 12.933 , 7.732 , New , 59.79 128 , 0 , 7 , 11.852 , 6.76 , New , 57.04 128 , 7 , 7 , 12.587 , 6.808 , New , 54.09 Results For Icelake memcpy-evex-erms size, al1 , al2 , Cur T , New T , Win , New / Cur 128 , 0 , 0 , 9.963 , 5.416 , New , 54.36 128 , 7 , 0 , 16.467 , 8.061 , New , 48.95 128 , 0 , 7 , 14.388 , 7.644 , New , 53.13 128 , 7 , 7 , 14.546 , 7.642 , New , 52.54 Results For Tigerlake memcpy-evex-erms size, al1 , al2 , Cur T , New T , Win , New / Cur 128 , 0 , 0 , 8.979 , 4.95 , New , 55.13 128 , 7 , 0 , 14.245 , 7.122 , New , 50.0 128 , 0 , 7 , 12.668 , 6.675 , New , 52.69 128 , 7 , 7 , 13.042 , 6.802 , New , 52.15 Results For Skylake memmove-avx2-erms size, al1 , al2 , Cur T , New T , Win , New / Cur 128 , 0 , 32 , 6.181 , 5.691 , New , 92.07 128 , 32 , 0 , 6.165 , 5.752 , New , 93.3 128 , 0 , 7 , 13.923 , 9.37 , New , 67.3 128 , 7 , 0 , 12.049 , 10.182 , New , 84.5 Results For Icelake memmove-evex-erms size, al1 , al2 , Cur T , New T , Win , New / Cur 128 , 0 , 32 , 5.479 , 4.889 , New , 89.23 128 , 32 , 0 , 5.127 , 4.911 , New , 95.79 128 , 0 , 7 , 18.885 , 13.547 , New , 71.73 128 , 7 , 0 , 15.565 , 14.436 , New , 92.75 Results For Tigerlake memmove-evex-erms size, al1 , al2 , Cur T , New T , Win , New / Cur 128 , 0 , 32 , 5.275 , 4.815 , New , 91.28 128 , 32 , 0 , 5.376 , 4.565 , New , 84.91 128 , 0 , 7 , 19.426 , 14.273 , New , 73.47 128 , 7 , 0 , 15.924 , 14.951 , New , 93.89 Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-05-23 22:50:49 -04:00
Noah Goldstein	6abf27980a	x86: Improve memset-vec-unaligned-erms.S No bug. This commit makes a few small improvements to memset-vec-unaligned-erms.S. The changes are 1) only aligning to 64 instead of 128. Either alignment will perform equally well in a loop and 128 just increases the odds of having to do an extra iteration which can be significant overhead for small values. 2) Align some targets and the loop. 3) Remove an ALU from the alignment process. 4) Reorder the last 4x VEC so that they are stored after the loop. 5) Move the condition for leq 8x VEC to before the alignment process. test-memset and test-wmemset are both passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-05-20 17:28:33 -04:00
Noah Goldstein	4ad473e97a	x86: Optimize memcmp-evex-movbe.S No bug. This commit optimizes memcmp-evex.S. The optimizations include adding a new vec compare path for small sizes, reorganizing the entry control flow, removing some unnecissary ALU instructions from the main loop, and most importantly replacing the heavy use of vpcmp + kand logic with vpxor + vptern. test-memcmp and test-wmemcmp are both passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-05-18 22:57:51 -04:00
Noah Goldstein	16d12015c5	x86: Optimize memcmp-avx2-movbe.S No bug. This commit optimizes memcmp-avx2.S. The optimizations include adding a new vec compare path for small sizes, reorganizing the entry control flow, and removing some unnecissary ALU instructions from the main loop. test-memcmp and test-wmemcmp are both passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-05-18 22:57:44 -04:00
Szabolcs Nagy	f4f8f4d4e0	elf: Use relaxed atomics for racy accesses [BZ #19329 ] This is a follow up patch to the fix for bug 19329. This adds relaxed MO atomics to accesses that were previously data races but are now race conditions, and where relaxed MO is sufficient. The race conditions all follow the pattern that the write is behind the dlopen lock, but a read can happen concurrently (e.g. during tls access) without holding the lock. For slotinfo entries the read value only matters if it reads from a synchronized write in dlopen or dlclose, otherwise the related dtv entry is not valid to access so it is fine to leave it in an inconsistent state. The same applies for GL(dl_tls_max_dtv_idx) and GL(dl_tls_generation), but there the algorithm relies on the fact that the read of the last synchronized write is an increasing value. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-05-11 17:16:37 +01:00
Noah Goldstein	104c7b1967	x86: Add EVEX optimized memchr family not safe for RTM No bug. This commit adds a new implementation for EVEX memchr that is not safe for RTM because it uses vzeroupper. The benefit is that by using ymm0-ymm15 it can use vpcmpeq and vpternlogd in the 4x loop which is faster than the RTM safe version which cannot use vpcmpeq because there is no EVEX encoding for the instruction. All parts of the implementation aside from the 4x loop are the same for the two versions and the optimization is only relevant for large sizes. Tigerlake: size , algn , Pos , Cur T , New T , Win , Dif 512 , 6 , 192 , 9.2 , 9.04 , no-RTM , 0.16 512 , 7 , 224 , 9.19 , 8.98 , no-RTM , 0.21 2048 , 0 , 256 , 10.74 , 10.54 , no-RTM , 0.2 2048 , 0 , 512 , 14.81 , 14.87 , RTM , 0.06 2048 , 0 , 1024 , 22.97 , 22.57 , no-RTM , 0.4 2048 , 0 , 2048 , 37.49 , 34.51 , no-RTM , 2.98 <-- Icelake: size , algn , Pos , Cur T , New T , Win , Dif 512 , 6 , 192 , 7.6 , 7.3 , no-RTM , 0.3 512 , 7 , 224 , 7.63 , 7.27 , no-RTM , 0.36 2048 , 0 , 256 , 8.48 , 8.38 , no-RTM , 0.1 2048 , 0 , 512 , 11.57 , 11.42 , no-RTM , 0.15 2048 , 0 , 1024 , 17.92 , 17.38 , no-RTM , 0.54 2048 , 0 , 2048 , 30.37 , 27.34 , no-RTM , 3.03 <-- test-memchr, test-wmemchr, and test-rawmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-05-08 16:26:30 -04:00
Alice Xu	6ea916adfa	x86-64: Fix an unknown vector operation in memchr-evex.S An unknown vector operation occurred in commit `2a76821c30`. Fixed it by using "ymm{k1}{z}" but not "ymm {k1} {z}". Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-05-07 19:03:21 -07:00
Adhemerval Zanella	db373e4c57	Remove architecture specific sched_cpucount optimizations And replace the generic algorithm with the Brian Kernighan's one. GCC optimize it with popcnt if the architecture supports, so there is no need to add the extra POPCNT define to enable it. This is really a micro-optimization that only adds complexity: recent ABIs already support it (x86-64-v2 or power64le) and it simplifies the code for internal usage, since i686 does not allow an internal iFUNC call. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu.	2021-05-07 13:35:29 -03:00
Noah Goldstein	2a76821c30	x86: Optimize memchr-evex.S No bug. This commit optimizes memchr-evex.S. The optimizations include replacing some branches with cmovcc, avoiding some branches entirely in the less_4x_vec case, making the page cross logic less strict, saving some ALU in the alignment process, and most importantly increasing ILP in the 4x loop. test-memchr, test-rawmemchr, and test-wmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-05-03 21:18:03 -04:00
Noah Goldstein	acfd088a19	x86: Optimize memchr-avx2.S No bug. This commit optimizes memchr-avx2.S. The optimizations include replacing some branches with cmovcc, avoiding some branches entirely in the less_4x_vec case, making the page cross logic less strict, asaving a few instructions the in loop return loop. test-memchr, test-rawmemchr, and test-wmemchr are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-05-03 21:17:21 -04:00
Paul Zimmermann	e6eef0adc5	regenerate ulps on x86_64 with -march=native On x86_64, when configuring glibc with CFLAGS="-O2 -g -march=native", some tests fail. After this patch, "make check" succeeds. Tested on Intel Core i5-4590 with gcc 10.2.1.	2021-04-28 12:46:00 +02:00
Noah Goldstein	7f3e7c262c	x86: Optimize strchr-evex.S No bug. This commit optimizes strchr-evex.S. The optimizations are mostly small things such as save an ALU in the alignment process, saving a few instructions in the loop return. The one significant change is saving 2 instructions in the 4x loop. test-strchr, test-strchrnul, test-wcschr, and test-wcschrnul are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-25 10:04:39 -07:00
Noah Goldstein	ccabe7971f	x86: Optimize strchr-avx2.S No bug. This commit optimizes strchr-avx2.S. The optimizations are all small things such as save an ALU in the alignment process, saving a few instructions in the loop return, saving some bytes in the main loop, and increasing the ILP in the return cases. test-strchr, test-strchrnul, test-wcschr, and test-wcschrnul are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-25 10:04:31 -07:00
Florian Weimer	4baf02b332	nptl: Move pthread_spin_trylock into libc The symbol was moved using scripts/move-symbol-to-libc.py.	2021-04-23 17:06:48 +02:00
Florian Weimer	da8e3710d8	nptl: Move pthread_spin_lock into libc The symbol was moved using scripts/move-symbol-to-libc.py.	2021-04-23 17:06:46 +02:00
Florian Weimer	ce4b3b7bef	nptl: Move pthread_spin_init, Move pthread_spin_unlock into libc For some architectures, the two functions are aliased, so these symbols need to be moved at the same time. The symbols were moved using scripts/move-symbol-to-libc.py.	2021-04-23 17:06:44 +02:00
Florian Weimer	60d5e40ab2	x86: Remove low-level lock optimization The current approach is to do this optimizations at a higher level, in generic code, so that single-threaded cases can be specifically targeted. Furthermore, using IS_IN (libc) as a compile-time indicator that all locks are private is no longer correct once process-shared lock implementations are moved into libc. The generic <lowlevellock.h> is not compatible with assembler code (obviously), so it's necessary to remove two long-unused #includes. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-21 19:49:51 +02:00
Szabolcs Nagy	2208066603	elf: Remove lazy tlsdesc relocation related code Remove generic tlsdesc code related to lazy tlsdesc processing since lazy tlsdesc relocation is no longer supported. This includes removing GL(dl_load_lock) from _dl_make_tlsdesc_dynamic which is only called at load time when that lock is already held. Added a documentation comment too. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-21 14:35:53 +01:00
Noah Goldstein	aaa23c3507	x86: Optimize strlen-avx2.S No bug. This commit optimizes strlen-avx2.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-19 18:03:49 -07:00
Noah Goldstein	4ba6558684	x86: Optimize strlen-evex.S No bug. This commit optimizes strlen-evex.S. The optimizations are mostly small things but they add up to roughly 10-30% performance improvement for strlen. The results for strnlen are bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen are all passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-19 18:03:49 -07:00
Noah Goldstein	f53790272c	x86: Optimize less_vec evex and avx512 memset-vec-unaligned-erms.S No bug. This commit adds optimized cased for less_vec memset case that uses the avx512vl/avx512bw mask store avoiding the excessive branches. test-memset and test-wmemset are passing. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-19 15:08:04 -07:00
H.J. Lu	83c5b36822	x86-64: Require BMI2 for strchr-avx2.S Since strchr-avx2.S updated by commit `1f745ecc21` Author: noah <goldstein.w.n@gmail.com> Date: Wed Feb 3 00:38:59 2021 -0500 x86-64: Refactor and improve performance of strchr-avx2.S uses sarx: c4 e2 72 f7 c0 sarx %ecx,%eax,%eax for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and ifunc-avx2.h.	2021-04-19 11:01:45 -07:00
H.J. Lu	55bf411b45	x86-64: Require BMI2 for __strlen_evex and __strnlen_evex Since __strlen_evex and __strnlen_evex added by commit `1fd8c163a8` Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Mar 5 06:24:52 2021 -0800 x86-64: Add ifunc-avx2.h functions with 256-bit EVEX use sarx: c4 e2 6a f7 c0 sarx %edx,%eax,%eax require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c. ifunc-avx2.h already requires BMI2 for EVEX implementation.	2021-04-19 07:51:33 -07:00
noah	1a8605b6cd	x86: Update large memcpy case in memmove-vec-unaligned-erms.S No Bug. This commit updates the large memcpy case (no overlap). The update is to perform memcpy on either 2 or 4 contiguous pages at once. This 1) helps to alleviate the affects of false memory aliasing when destination and source have a close 4k alignment and 2) In most cases and for most DRAM units is a modestly more efficient access pattern. These changes are a clear performance improvement for VEC_SIZE =16/32, though more ambiguous for VEC_SIZE=64. test-memcpy, test-memccpy, test-mempcpy, test-memmove, and tst-memmove-overflow all pass. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-04-16 10:06:56 -07:00
Szabolcs Nagy	55c9f32380	x86_64: Remove lazy tlsdesc relocation related code _dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for lazy tlsdesc relocation processing which is no longer supported.	2021-04-15 09:47:47 +01:00
Szabolcs Nagy	8f7e09f4db	x86_64: Avoid lazy relocation of tlsdesc [BZ #27137 ] Lazy tlsdesc relocation is racy because the static tls optimization and tlsdesc management operations are done without holding the dlopen lock. This similar to the commit `b7cf203b5c` for aarch64, but it fixes a different race: bug 27137. Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to relocate tlsdesc lazily, but that does not work in a BIND_NOW module due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at load time fixes this bug 27721 too.	2021-04-15 09:47:37 +01:00
Paul Zimmermann	43576de04a	Improve the accuracy of tgamma (BZ #26983 ) With this patch, the maximal known error for tgamma is now reduced to 9 ulps for dbl-64, for all rounding modes. Since exhaustive testing is not possible for dbl-64, it might be that there are still cases with an error larger than 9 ulps, but all known cases are fixed (intensive tests were done to find cases with large errors). Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm, s390x, sparc, and i686). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-07 13:23:39 +02:00
Paul Zimmermann	9acda61d94	Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469 , #14470 , #14471 , #14472 ] For j0f/j1f/y0f/y1f, the largest error for all binary32 inputs is reduced to at most 9 ulps for all rounding modes. The new code is enabled only when there is a cancellation at the very end of the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not give any visible slowdown on average. Two different algorithms are used: * around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of degree 3 are used, computed using the Sollya tool (https://www.sollya.org/) * for large inputs, an asymptotic formula from [1] is used [1] Fast and Accurate Bessel Function Computation, John Harrison, Proceedings of Arith 19, 2009. Inputs yielding the new largest errors are added to auto-libm-test-in, and ulps are regenerated for various targets (thanks Adhemerval Zanella). Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-02 06:15:48 +02:00
Sunil K Pandey	595c22ecd8	x86-64: Fix ifdef indentation in strlen-evex.S Fix some indentations of ifdef in file strlen-evex.S which are off by 1 and confusing to read.	2021-04-01 16:13:33 -07:00
H.J. Lu	b1ec623ed5	x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591 ] config/i386/constraints.md in GCC has (define_constraint "e" "32-bit signed integer constant, or a symbolic reference known to fit that range (for immediate operands in sign-extending x86-64 instructions)." (match_operand 0 "x86_64_immediate_operand")) Since movq takes a signed 32-bit immediate or a register source operand, use "er", instead of "nr"/"ir", constraint for 32-bit signed integer constant or register on movq. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-04-01 07:00:22 -07:00
H.J. Lu	e4fda46310	x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions Update ifunc-memmove.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	4e2d8f3527	x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	7ebba91361	x86-64: Add AVX optimized string/memory functions for RTM Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX optimized string/memory functions with xtest jz 1f vzeroall ret 1: vzeroupper ret at function exit on processors with usable RTM, but without 256-bit EVEX instructions to avoid VZEROUPPER inside a transactionally executing RTM region.	2021-03-29 07:40:17 -07:00
H.J. Lu	91264fe357	x86-64: Add memcmp family functions with 256-bit EVEX Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	1b968b6b9b	x86-64: Add memset family functions with 256-bit EVEX Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	63ad43566f	x86-64: Add memmove family functions with 256-bit EVEX Update ifunc-memmove.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	525bc2a32c	x86-64: Add strcpy family functions with 256-bit EVEX Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.	2021-03-29 07:40:17 -07:00
H.J. Lu	1fd8c163a8	x86-64: Add ifunc-avx2.h functions with 256-bit EVEX Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to select the function optimized with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW and BMI2 since VZEROUPPER isn't needed at function exit. For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP is set.	2021-03-29 07:40:17 -07:00
H.J. Lu	3e2f285c5f	nptl: Remove MULTI_PAGE_ALIASING [BZ #23554 ] MULTI_PAGE_ALIASING was introduced to mitigate an aliasing issue on Pentium 4. It is no longer needed for processors after Pentium 4.	2021-03-19 15:04:17 -07:00
Wilco Dijkstra	47ad14d789	math: Remove mpa files [BZ #15267 ] Finally remove all mpa related files, headers, declarations, probes, unused tables and update makefiles. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Wilco Dijkstra	db3f7bb558	math: Remove slow paths from asin and acos [BZ #15267 ] This patch series removes all remaining slow paths and related code. First asin/acos, tan, atan, atan2 implementations are updated, and the final patch removes the unused mpa files, headers and probes. Passes buildmanyglibc. Remove slow paths from asin/acos. Add ULP annotations based on previous slow path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Paul Zimmermann	5a051454a9	Add inputs that generate larger error bounds (Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)	2021-02-27 06:32:11 +01:00
Florian Weimer	035c012e32	Reduce the statically linked startup code [BZ #23323 ] It turns out the startup code in csu/elf-init.c has a perfect pair of ROP gadgets (see Marco-Gisbert and Ripoll-Ripoll, "return-to-csu: A New Method to Bypass 64-bit Linux ASLR"). These functions are not needed in dynamically-linked binaries because DT_INIT/DT_INIT_ARRAY are already processed by the dynamic linker. However, the dynamic linker skipped the main program for some reason. For maximum backwards compatibility, this is not changed, and instead, the main map is consulted from __libc_start_main if the init function argument is a NULL pointer. For statically linked binaries, the old approach based on linker symbols is still used because there is nothing else available. A new symbol version __libc_start_main@@GLIBC_2.34 is introduced because new binaries running on an old libc would not run their ELF constructors, leading to difficult-to-debug issues.	2021-02-25 12:13:02 +01:00
H.J. Lu	89de9d3958	x86: Use x86/nptl/pthreaddef.h 1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h. 2. Remove sysdeps/x86_64/nptl/pthreaddef.h. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-02-22 15:52:56 -08:00
noah	1f745ecc21	x86-64: Refactor and improve performance of strchr-avx2.S No bug. Just seemed the performance could be improved a bit. Observed and expected behavior are unchanged. Optimized body of main loop. Updated page cross logic and optimized accordingly. Made a few minor instruction selection modifications. No regressions in test suite. Both test-strchrnul and test-strchr passed.	2021-02-08 11:21:33 -08:00
Sajan Karumanchi	6e02b3e932	x86: Adding an upper bound for Enhanced REP MOVSB. In the process of optimizing memcpy for AMD machines, we have found the vector move operations are outperforming enhanced REP MOVSB for data transfers above the L2 cache size on Zen3 architectures. To handle this use case, we are adding an upper bound parameter on enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'. As per large-bench results, we are configuring this parameter to the L2 cache size for AMD machines and applicable from Zen3 architecture supporting the ERMS feature. For architectures other than AMD, it is the computed value of non-temporal threshold parameter. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>	2021-02-02 12:42:15 +01:00
Szabolcs Nagy	374cef32ac	configure: Check for static PIE support Add SUPPORT_STATIC_PIE that targets can define if they support static PIE. This requires PI_STATIC_AND_HIDDEN support and various linker features as described in commit `9d7a3741c9` Add --enable-static-pie configure option to build static PIE [BZ #19574] Currently defined on x86_64, i386 and aarch64 where static PIE is known to work. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-21 15:54:50 +00:00
H.J. Lu	ff6d62e9ed	<sys/platform/x86.h>: Remove the C preprocessor magic In <sys/platform/x86.h>, define CPU features as enum instead of using the C preprocessor magic to make it easier to wrap this functionality in other languages. Move the C preprocessor magic to internal header for better GCC codegen when more than one features are checked in a single expression as in x86-64 dl-hwcaps-subdirs.c. 1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX. 2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h. 3. Remove struct cpu_features and __x86_get_cpu_features from <sys/platform/x86.h>. 4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it in libc. 5. Make __get_cpu_features() private to glibc. 6. Replace __x86_get_cpu_features(N) with __get_cpu_features(). 7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE. 8. Use a single enum index for each CPU feature detection. 9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf. 10. Return zero struct cpuid_feature for the older glibc binary with a smaller CPUID_INDEX_MAX [BZ #27104]. 11. Inside glibc, use the C preprocessor magic so that cpu_features data can be loaded just once leading to more compact code for glibc. 256 bits are used for each CPUID leaf. Some leaves only contain a few features. We can add exceptions to such leaves. But it will increase code sizes and it is harder to provide backward/forward compatibilities when new features are added to such leaves in the future. When new leaves are added, _rtld_global_ro offsets will change which leads to race condition during in-place updates. We may avoid in-place updates by 1. Rename the old glibc. 2. Install the new glibc. 3. Remove the old glibc. NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy relocation issue with IFUNC resolver as shown in IFUNC resolver tests.	2021-01-21 05:58:17 -08:00
H.J. Lu	0ec583d926	libmvec: Add extra-test-objs to test-extras Add extra-test-objs to test-extras so that they are compiled with -DMODULE_NAME=testsuite instead of -DMODULE_NAME=libc. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-19 06:20:46 -08:00
Adhemerval Zanella	d18f59bf92	Fix x86 build with --enable-tunable=no Checked on x86_64-linux-gnu.	2021-01-14 16:04:05 -03:00
H.J. Lu	ecce11aa07	x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717 ] GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA levels: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250 and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with GNU_PROPERTY_X86_ISA_1_V[234] marker: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13 Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by commit b0ab06937385e0ae25cebf1991787d64f439bf12 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Oct 30 06:49:57 2020 -0700 x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker and commit 32930e4edbc06bc6f10c435dbcc63131715df678 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Oct 9 05:05:57 2020 -0700 x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the micro-architecture ISA level required to execute the binary. The marker must be added by programmers explicitly in one of 3 ways: 1. Pass -mneeded to GCC. 2. Add the marker in the linker inputs as this patch does. 3. Pass -z x86-64-v[234] to the linker. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker support to ld.so if binutils 2.32 or newer is used to build glibc: 1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] markers to elf.h. 2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234] marker to abi-note.o based on the ISA level used to compile abi-note.o, assuming that the same ISA level is used to compile the whole glibc. 3. Add isa_1 to cpu_features to record the supported x86 ISA level. 4. Rename _dl_process_cet_property_note to _dl_process_property_note and add GNU_PROPERTY_X86_ISA_1_V[234] marker detection. 5. Update _rtld_main_check and _dl_open_check to check loaded objects with the incompatible ISA level. 6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails on lesser platforms. 7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c. Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and x86-64-v4 machines. Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine and got: [hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1 ./elf/tst-isa-level-1: CPU ISA level is lower than required [hjl@gnu-cfl-2 build-x86_64-linux]$	2021-01-07 13:10:13 -08:00
Wilco Dijkstra	9e97f239ea	Remove dbl-64/wordsize-64 (part 2) Remove the wordsize-64 implementations by merging them into the main dbl-64 directory. The second patch just moves all wordsize-64 files and removes a few wordsize-64 uses in comments and Implies files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-07 15:26:26 +00:00
H.J. Lu	6ea5b57afa	x86: Check IFUNC definition in unrelocated executable [BZ #20019 ] Calling an IFUNC function defined in unrelocated executable also leads to segfault. Issue a fatal error message when calling IFUNC function defined in the unrelocated executable from a shared library.	2021-01-04 12:01:01 -08:00
H.J. Lu	3ec5d83d2a	x86-64: Avoid rep movsb with short distance [BZ #27130 ] When copying with "rep movsb", if the distance between source and destination is N*4GB + [1..63] with N >= 0, performance may be very slow. This patch updates memmove-vec-unaligned-erms.S for AVX and AVX512 versions with the distance in RCX: cmpl $63, %ecx // Don't use "rep movsb" if ECX <= 63 jbe L(Don't use rep movsb") Use "rep movsb" Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its performance impact is within noise range as "rep movsb" is only used for data size >= 4KB.	2021-01-04 07:58:57 -08:00
Paul Eggert	2b778ceb40	Update copyright dates with scripts/update-copyrights I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: * pre-commit check failed ... remote: * error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master	2021-01-02 12:17:34 -08:00
Siddhesh Poyarekar	94547d9209	x86 long double: Support pseudo numbers in isnanl This syncs up isnanl behaviour with gcc. Also move the isnanl implementation to sysdeps/x86 and remove the sysdeps/x86_64 version. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-24 06:05:40 +05:30
Siddhesh Poyarekar	b7f8815617	x86 long double: Support pseudo numbers in fpclassifyl Also move sysdeps/i386/fpu/s_fpclassifyl.c to sysdeps/x86/fpu/s_fpclassifyl.c and remove sysdeps/x86_64/fpu/s_fpclassifyl.c Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-24 06:05:26 +05:30
Paul Zimmermann	cad5ad81d2	add inputs to auto-libm-test-in yielding larger errors (binary64, x86_64)	2020-12-21 10:35:20 +05:30
Anssi Hannula	69a7ca7705	ieee754: Remove unused __sin32 and __cos32 The __sin32 and __cos32 functions were only used in the now removed slow path of asin and acos.	2020-12-18 12:10:31 +05:30
Florian Weimer	f267e1c9dd	x86_64: Add glibc-hwcaps support The subdirectories match those in the x86-64 psABI: `77566eb03b` Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-12-04 09:36:02 +01:00
Jakub Jelinek	1d9cbb9608	x86: Fix THREAD_SELF definition to avoid ld.so crash (bug 27004) The previous definition of THREAD_SELF did not tell the compiler that %fs (or %gs) usage is invalid for the !DL_LOOKUP_GSCOPE_LOCK case in _dl_lookup_symbol_x. As a result, ld.so could try to use the TCB before it was initialized. As the comment in tls.h explains, asm volatile is undesirable here. Using the __seg_fs (or __seg_gs) namespace does not interfere with optimization, and expresses that THREAD_SELF is potentially trapping.	2020-12-03 13:48:55 +01:00
Florian Weimer	1daccf403b	nptl: Move stack list variables into _rtld_global Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT, formerly __wait_lookup_done) can be implemented directly in ld.so, eliminating the unprotected GL (dl_wait_lookup_done) function pointer. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-11-16 19:33:30 +01:00
Adhemerval Zanella	01bd62517c	Remove tls.h inclusion from internal errno.h The tls.h inclusion is not really required and limits possible definition on more arch specific headers. This is a cleanup to allow inline functions on sysdep.h, more specifically on i386 and ia64 which requires to access some tls definitions its own. No semantic changes expected, checked with a build against all affected ABIs.	2020-11-13 12:59:19 -03:00
Florian Weimer	0f34d426ac	x86: Remove UP macro. Define LOCK_PREFIX unconditionally. The UP macro is never defined. Also define LOCK_PREFIX unconditionally, to the same string. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-11-13 15:20:03 +01:00
Samuel Thibault	3d3316b1de	hurd: keep only required PLTs in ld.so We need NO_RTLD_HIDDEN because of the need for PLT calls in ld.so. See Roland's comment in https://sourceware.org/bugzilla/show_bug.cgi?id=15605 "in the Hurd it's crucial that calls like __mmap be the libc ones instead of the rtld-local ones after the bootstrap phase, when the dynamic linker is being used for dlopen and the like." We used to just avoid all hidden use in the rtld ; this commit switches to keeping only those that should use PLT calls, i.e. essentially those defined in sysdeps/mach/hurd/dl-sysdep.c: __assert_fail __assert_perror_fail __*stat64 _exit This fixes a few startup issues, notably the call to __tunable_get_val that is made before PLTs are set up.	2020-11-11 02:36:22 +01:00
H.J. Lu	0f09154c64	x86: Initialize CPU info via IFUNC relocation [BZ 26203] X86 CPU features in ld.so are initialized by init_cpu_features, which is invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is loaded by static executable, DL_PLATFORM_INIT is never called. Also x86 cache info in libc.o and libc.a is initialized by a constructor which may be called too late. Since some fields in _rtld_global_ro in ld.so are initialized by dynamic relocation, we can also initialize x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so by initializing dummy function pointers in ld.so and libc.so via IFUNC relocation. Key points: 1. IFUNC is always supported, independent of --enable-multi-arch or --disable-multi-arch. Linker generates IFUNC relocations from input IFUNC objects and ld.so performs IFUNC relocations. 2. There are no IFUNC dependencies in ld.so before dynamic relocation have been performed, 3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT in dynamic executable and by IFUNC relocation in dlopen in static executable. 4. The x86 cache info in libc.o is initialized by IFUNC relocation. 5. In libc.a, both x86 CPU features and cache info are initialized from ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init is called. Note: _dl_x86_init_cpu_features can be called more than once from DL_PLATFORM_INIT and during relocation in ld.so.	2020-10-16 16:17:53 -07:00
Szabolcs Nagy	238032ead6	aarch64: enforce >=64K guard size [BZ #26691 ] There are several compiler implementations that allow large stack allocations to jump over the guard page at the end of the stack and corrupt memory beyond that. See CVE-2017-1000364. Compilers can emit code to probe the stack such that the guard page cannot be skipped, but on aarch64 the probe interval is 64K by default instead of the minimum supported page size (4K). This patch enforces at least 64K guard on aarch64 unless the guard is disabled by setting its size to 0. For backward compatibility reasons the increased guard is not reported, so it is only observable by exhausting the address space or parsing /proc/self/maps on linux. On other targets the patch has no effect. If the stack probe interval is larger than a page size on a target then ARCH_MIN_GUARD_SIZE can be defined to get large enough stack guard on libc allocated stacks. The patch does not affect threads with user allocated stacks. Fixes bug 26691.	2020-10-02 09:57:44 +01:00
Florian Weimer	90ccfdf176	x86: Use one ldbl2mpn.c file for both i386 and x86_64	2020-09-22 17:58:39 +02:00
H.J. Lu	9620398097	x86: Install <sys/platform/x86.h> [BZ #26124 ] Install <sys/platform/x86.h> so that programmers can do #if __has_include(<sys/platform/x86.h>) #include <sys/platform/x86.h> #endif ... if (CPU_FEATURE_USABLE (SSE2)) ... if (CPU_FEATURE_USABLE (AVX2)) ... <sys/platform/x86.h> exports only: enum { COMMON_CPUID_INDEX_1 = 0, COMMON_CPUID_INDEX_7, COMMON_CPUID_INDEX_80000001, COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007, COMMON_CPUID_INDEX_80000008, COMMON_CPUID_INDEX_7_ECX_1, /* Keep the following line at the end. / COMMON_CPUID_INDEX_MAX }; struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; }; / Get a pointer to the CPU features structure. / extern const struct cpu_features __x86_get_cpu_features (unsigned int max) __attribute__ ((const)); Since all feature checks are done through macros, programs compiled with a newer <sys/platform/x86.h> are compatible with the older glibc binaries as long as the layout of struct cpu_features is identical. The features array can be expanded with backward binary compatibility for both .o and .so files. When COMMON_CPUID_INDEX_MAX is increased to support new processor features, __x86_get_cpu_features in the older glibc binaries returns NULL and HAS_CPU_FEATURE/CPU_FEATURE_USABLE return false on the new processor feature. No new symbol version is neeeded. Both CPU_FEATURE_USABLE and HAS_CPU_FEATURE are provided. HAS_CPU_FEATURE can be used to identify processor features. Note: Although GCC has __builtin_cpu_supports, it only supports a subset of <sys/platform/x86.h> and it is equivalent to CPU_FEATURE_USABLE. It doesn't support HAS_CPU_FEATURE.	2020-09-11 17:20:52 -07:00
Ondřej Hošek	23af890b3f	x86-64: Fix FMA4 detection in ifunc [BZ #26534 ] A typo in commit `107e6a3c22` causes the FMA4 code path to be taken on systems that support FMA, even if they do not support FMA4. Fix this to detect FMA4.	2020-09-02 05:07:37 -07:00
Adhemerval Zanella	5ff35e9544	math: Update x86_64 ulps From new j0 test.	2020-08-08 16:43:11 -03:00
Andreas K. Hüttel	180d5a045f	Update x86-64 libm-test-ulps x86_64 Intel(R) Core(TM) i5-8265U gcc (Gentoo 10.1.0-r2 p3) 10.1.0 Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-25 17:10:53 -04:00
H.J. Lu	107e6a3c22	x86: Support usable check for all CPU features Support usable check for all CPU features with the following changes: 1. Change struct cpu_features to struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; so that there is a usable bit for each cpuid bit. 2. After the cpuid bits have been initialized, copy the known bits to the usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for CPU feature detection. 3. Clear the usable bits which require OS support. 4. If the feature is supported by OS, copy its cpuid bit to its usable bit. 5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE and CPU_FEATURE_USABLE_P to check if a feature is usable. 6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13. 7. Unset MPX feature since it has been deprecated. The results are 1. If the feature is known and doesn't requre OS support, its usable bit is copied from the cpuid bit. 2. Otherwise, its usable bit is copied from the cpuid bit only if the feature is known to supported by OS. 3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the feature can be used. 4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports the feature.	2020-07-13 06:05:16 -07:00
H.J. Lu	9016b6f389	x86: Remove the unused __x86_prefetchw Since commit `c867597bff` Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Jun 8 13:57:50 2016 -0700 X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove removed the only usage of __x86_prefetchw, we can remove the unused __x86_prefetchw.	2020-07-11 09:34:03 -07:00
H.J. Lu	3f4b61a0b8	x86: Add thresholds for "rep movsb/stosb" to tunables Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables to update thresholds for "rep movsb" and "rep stosb" at run-time. Note that the user specified threshold for "rep movsb" smaller than the minimum threshold will be ignored. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-06 11:48:42 -07:00
Adhemerval Zanella	b24381e50f	i386: Use builtin sqrtl Checked on i686-linux-gnu.	2020-06-22 11:09:49 -03:00
Adhemerval Zanella	d19d25dd06	x86_64: Use builtin sqrt{f,l} Checked on x86_64-linux-gnu.	2020-06-22 11:09:49 -03:00
Sunil K Pandey	75870237ff	Fix avx2 strncmp offset compare condition check [BZ #25933 ] strcmp-avx2.S: In avx2 strncmp function, strings are compared in chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4 vector size comparison, code must check whether it already passed the given offset. This patch implement avx2 offset check condition for strncmp function, if both string compare same for first 4 vector size.	2020-06-17 07:07:38 -07:00
H.J. Lu	a35a59036e	x86_64: Use %xmmN with vpxor to clear a vector register Since "vpxor %xmmN, %xmmN, %xmmN" clears the whole vector register, use %xmmN, instead of %ymmN, with vpxor to clear a vector register.	2020-06-17 05:44:02 -07:00
Vineet Gupta	8dbb7a08ec	dl-runtime: reloc_{offset,index} now functions arch overide'able The existing macros are fragile and expect local variables with a certain name. Fix this by defining them as functions with default implementation in a new header dl-runtime.h which arches can override if need be. This came up during ARC port review, hence the need for argument pltgot in reloc_index() which is not needed by existing ports. This patch potentially only affects hppa/x86 ports, build tested for both those configs and a few more. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-06-05 13:45:46 -07:00
Florian Weimer	501bdb5dd6	Linux: Remove remnants of the getcpu cache The getcpu cache was removed from the kernel in Linux 2.6.24. glibc support from the sched_getcpu implementation was removed in commit `dd26c44403` ("Consolidate sched_getcpu").	2020-05-16 15:47:51 +02:00
H.J. Lu	55c7bcc71b	x86-64: Use RDX_LP on __x86_shared_non_temporal_threshold [BZ #25966 ] Since __x86_shared_non_temporal_threshold is defined as long int __x86_shared_non_temporal_threshold; and long int is 4 bytes for x32, use RDX_LP to compare against __x86_shared_non_temporal_threshold in assembly code.	2020-05-09 12:28:15 -07:00
Joseph Myers	dbb188dd87	Remove unused floating-point configuration from gmp-impl.h. This patch removes the IEEE_DOUBLE_BIG_ENDIAN and IEEE_DOUBLE_MIXED_ENDIAN macros from gmp-impl.h and gmp-mparam.h, and the ieee_double_extract union from gmp-impl.h. The macros were used only in defining the union, which was used nowhere in glibc. As GMP's gmp-impl.h is over 5000 lines, the file in glibc is so far from the GMP version that it doesn't seem to make sense to keep things there that are not relevant in glibc. (I expect there is plenty more in the header after this patch that is also not relevant in glibc and can be cleaned up later.) Tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by this patch.	2020-04-28 15:05:09 +00:00
Adhemerval Zanella	f721171632	Revert "x86_64: Add SSE sfp-exceptions" The __sfp_handle_exceptions is not fully correct regarding raising exceptions, since there is no direct way to raise only FP_EX_OVERFLOW nor FP_EX_UNDERFLOW for SSE mode. Both libgcc and feraiseexcept rely on x87 mode to accomplish it. This reverts commit `460ee50de0`. Checked on x86_64.	2020-04-20 14:56:05 -03:00
Adhemerval Zanella	460ee50de0	x86_64: Add SSE sfp-exceptions The exported x86_64 fenv.h functions operate on both i387 and SSE (since they should work on both float, double, and long double) while the internal libc_fe* set either SSE (float, double, and float128) or i387 (long double). The libgcc __sfp_handle_exceptions (used on float128 implementation), however, will set either SEE or i387 exception depending of the exception to raise. This broke the internal assumption of float128 where only SSE operations will be used. This patch reimplements the libgcc __sfp_handle_exceptions to use only SSE operations and sets libgcc to use it instead of its own implementation. And I think we should fix libgcc in a similar manner, since checking on config/i386/64/sfp-machine.h it already only supports SSE rounding mode and x86_64 ABI also expectes float128 to use SSE registers [1] (although it is not clear on how future implementation might implement it). Checked on x86_64-linux-gnu. [1] https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI	2020-04-17 11:42:29 -03:00
Adhemerval Zanella	17fd707f88	nptl: Remove x86_64 cancellation assembly implementations [BZ #25765 ] All cancellable syscalls are done by C implementations, so there is no no need to use a specialized implementation to optimize register usage. It fixes BZ #25765. Checked on x86_64-linux-gnu.	2020-04-03 10:47:59 -03:00
Paul Zimmermann	a9d42c09a3	math: Add inputs that yield larger errors for float type (x86_64) The corner cases included were generated using exhaustive search for all float/binary32 values on x86_64 (comparing to MPFR for correct rounding to nearest). For the j0/j1/y0 functions, only cases with ulp error <= 9 were included. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-03-31 21:48:54 -04:00
Adhemerval Zanella	1c15464ca0	math: Remove inline math tests With mathinline removal there is no need to keep building and testing inline math tests. The gen-libm-tests.py support to generate ULP_I_* is removed and all libm-test-ulps files are updated to longer have the i{float,double,ldouble} entries. The support for no-test-inline is also removed from both gen-auto-libm-tests and the auto-libm-test-out-* were regenerated. Checked on x86_64-linux-gnu and i686-linux-gnu.	2020-03-19 11:45:44 -03:00
Florian Weimer	fe49a73316	x86: Avoid single-argument _Static_assert in <tls.h> Older GCC versions do not support this extension. Fixes commit `f1bdee6179` ("x86 tls: Use _Static_assert for TLS access size assertion").	2020-02-17 11:12:03 +01:00
Samuel Thibault	f1bdee6179	x86 tls: Use _Static_assert for TLS access size assertion	2020-02-17 00:40:39 +01:00
Florian Weimer	3a0ecccb59	ld.so: Do not export free/calloc/malloc/realloc functions [BZ #25486 ] Exporting functions and relying on symbol interposition from libc.so makes the choice of implementation dependent on DT_NEEDED order, which is not what some compiler drivers expect. This commit replaces one magic mechanism (symbol interposition) with another one (preprocessor-/compiler-based redirection). This makes the hand-over from the minimal malloc to the full malloc more explicit. Removing the ABI symbols is backwards-compatible because libc.so is always in scope, and the dynamic loader will find the malloc-related symbols there since commit `f0b2132b35` ("ld.so: Support moving versioned symbols between sonames [BZ #24741]"). Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-02-15 11:01:23 +01:00
Adhemerval Zanella	fcb78a5505	linux: Consolidate INLINE_SYSCALL With all Linux ABIs using the expected Linux kABI to indicate syscalls errors, there is no need to replicate the INLINE_SYSCALL. The generic Linux sysdep.h includes errno.h even for !__ASSEMBLER__, which is ok now and it allows cleanup some archaic code that assume otherwise. Checked with a build against all affected ABIs.	2020-02-14 21:09:12 -03:00
Wilco Dijkstra	220622dde5	Add libm_alias_finite for _finite symbols This patch adds a new macro, libm_alias_finite, to define all _finite symbol. It sets all _finite symbol as compat symbol based on its first version (obtained from the definition at built generated first-versions.h). The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need special treatment in code that is shared between long double and float128. It is done by adding a list, similar to internal symbol redifinition, on sysdeps/ieee754/float128/float128_private.h. Alpha also needs some tricky changes to ensure we still emit 2 compat symbols for sqrt(f). Passes buildmanyglibc. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2020-01-03 10:02:04 -03:00
Joseph Myers	d614a75396	Update copyright dates with scripts/update-copyrights.	2020-01-01 00:14:33 +00:00
Stefan Liebler	1c94bf0f0a	Always use wordsize-64 version of s_trunc.c. This patch replaces s_trunc.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:14 +01:00
Stefan Liebler	9f234eafe8	Always use wordsize-64 version of s_ceil.c. This patch replaces s_ceil.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:13 +01:00
Stefan Liebler	95b0c2c431	Always use wordsize-64 version of s_floor.c. This patch replaces s_floor.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 and sparc64 files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:12 +01:00
Stefan Liebler	ab48bdd098	Always use wordsize-64 version of s_rint.c. This patch replaces s_rint.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 file. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:12 +01:00
Stefan Liebler	af123aa950	Always use wordsize-64 version of s_nearbyint.c. This patch replaces s_nearbyint.c in sysdeps/dbl-64 with the one in sysdeps/dbl-64/wordsize-64 and removes the latter one. The code is not changed except changes in code style. Also adjusted the include path in x86_64 file. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-12-11 15:12:11 +01:00
Florian Weimer	4db71d2f98	elf: Do not run IFUNC resolvers for LD_DEBUG=unused [BZ #24214 ] This commit adds missing skip_ifunc checks to aarch64, arm, i386, sparc, and x86_64. A new test case ensures that IRELATIVE IFUNC resolvers do not run in various diagnostic modes of the dynamic loader. Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>	2019-12-02 14:55:22 +01:00
Adhemerval Zanella	48dbce60cf	nptl: Add tests for internal pthread_rwlock_t offsets This patch new build tests to check for internal fields offsets for internal pthread_rwlock_t definition. Althoug the '__data.__flags' field layout should be preserved due static initializators, the patch also adds tests for the futexes that may be used in a shared memory (although using different libc version in such scenario is not really supported). Checked with a build against all affected ABIs. Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148	2019-11-26 13:53:36 +00:00
Adhemerval Zanella	71d260c107	nptl: Cleanup mutex internal offset tests The offsets of pthread_mutex_t __data.__nusers, __data.__spins, __data.elision, __data.list are not required to be constant over the releases. Only the __data.__kind is used for static initializers. This patch also adds an additional size check for __data.__kind. Checked with a build against affected ABIs. Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f	2019-11-26 13:53:36 +00:00
Wilco Dijkstra	d0007dc53c	Remove x64 _finite tests and references Remove _finite tests and references from x86_64. Rather than calling __exp_finite, use exp directly (since it's the same entry point). x86_64 builds and passes testsuite. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2019-10-21 14:29:12 -03:00
Paul Eggert	5a82c74822	Prefer https to http for gnu.org and fsf.org URLs Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http\|ftp)(://(.\.)?(gnu\|fsf\|sourceware)\.org($\|[^.]\|\.[^a-z])),https\2,g s,(http\|ftp)(://(.\.)?)sources\.redhat\.com($\|[^.]\|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '.po' \ ! -name 'ChangeLog' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: * error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: * error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S	2019-09-07 02:43:31 -07:00
Wilco Dijkstra	3c05dd79d0	Use generic memset/memcpy/memmove in benchtests Use the generic C memset/memcpy/memmove in benchtests since comparing against a slow byte-oriented implementation makes no sense. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> 2019-08-29 Wilco Dijkstra <wdijkstr@arm.com> * benchtests/bench-memcpy.c (simple_memcpy): Remove. (generic_memcpy): Include generic C memcpy. * benchtests/bench-memmove.c (simple_memmove): Remove. (generic_memmove): Include generic C memmove. * benchtests/bench-memset.c (simple_memset): Remove. (generic_memset): Include generic C memset. * benchtests/bench-memset-large.c (simple_memset): Remove. (generic_memset): Include generic C memset. * benchtests/bench-memset-walk.c (simple_memset): Remove. (generic_memset): Include generic C memset. * string/memcpy.c (MEMCPY): Add defines to enable redirection. * string/memset.c (MEMSET): Likewise. * sysdeps/x86_64/memcopy.h: Remove empty file.	2019-08-30 17:21:35 +01:00
H.J. Lu	7e681561a3	x86-64: Compile branred.c with -mprefer-vector-width=128 [BZ #24603 ] When compiled with -O3 and AVX, GCC 8 and 9 optimize some loops in sysdeps/ieee754/dbl-64/branred.c with 256-bit vector instructions, which leads to store forward stall: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579 There is no easy fix in compiler. This patch limits vector width to 128 bits to work around this issue. It improves performance of sin and cos by more than 40% on Skylake compiled with -O3 -march=skylake. Tested with GCC 7/8/9 on x86-64. [BZ #24603] * sysdeps/x86_64/configure.ac: Check if -mprefer-vector-width=128 works. * sysdeps/x86_64/configure: Regenerated. * sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New. Set to -mprefer-vector-width=128 if supported.	2019-07-24 14:48:43 -07:00
H.J. Lu	d039da1c00	x86: Add sysdeps/x86/dl-lookupcfg.h Since sysdeps/i386/dl-lookupcfg.h and sysdeps/x86_64/dl-lookupcfg.h are identical, we can replace them with sysdeps/x86/dl-lookupcfg.h. * sysdeps/i386/dl-lookupcfg.h: Moved to ... * sysdeps/x86/dl-lookupcfg.h: Here. * sysdeps/x86_64/dl-lookupcfg.h: Removed.	2019-06-26 15:07:28 -07:00
Adhemerval Zanella	81a1443941	wcsmbs: optimize wcscat This patch rewrites wcscat using wcslen and wcscpy. This is similar to the optimization done on strcat by `6e46de42fe`. The strcpy changes are mainly to add the internal alias to avoid PLT calls. Checked on x86_64-linux-gnu and a build against the affected architectures. * include/wchar.h (__wcscpy): New prototype. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c (__wcscpy): Route internal symbol to generic implementation. * sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy): Add internal __wcscpy alias. * sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise. * sysdeps/s390/wcscpy.c (wcscpy): Likewise. * sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise. * wcsmbs/wcscpy.c (wcscpy): Add * sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to use generic implementation. * wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.	2019-02-27 10:00:37 -03:00
Joseph Myers	32db86d558	Add fall-through comments. This patch adds fall-through comments in some cases where -Wextra produces implicit-fallthrough warnings. The patch is non-exhaustive. Apart from architecture-specific code for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy code, probably should have such changes, but left to be dealt with separately), or places that already had comments about the fall-through but not matching the form expected by -Wimplicit-fallthrough=3 (the default level with -Wextra; my inclination is to adjust those comments to match rather than downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one place where I thought the implicit fallthrough was not correct and so should be handled separately as a bug fix. I think the key thing to consider in review of this patch is whether the fall-through is indeed intended and correct in each place where such a comment is added. Tested for x86_64. * elf/dl-exception.c (_dl_exception_create_format): Add fall-through comments. * elf/ldconfig.c (parse_conf_include): Likewise. * elf/rtld.c (print_statistics): Likewise. * locale/programs/charmap.c (parse_charmap): Likewise. * misc/mntent_r.c (__getmntent_r): Likewise. * posix/wordexp.c (parse_arith): Likewise. (parse_backtick): Likewise. * resolv/ns_ttl.c (ns_parse_ttl): Likewise. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.	2019-02-12 10:30:34 +00:00
Andreas Schwab	65f7767a91	Fix handling of collating elements in fnmatch (bug 17396, bug 16976) This fixes the same bug in fnmatch that was fixed by commit `7e2f0d2d77` for regexp matching. As a side effect it also removes the use of an unbound VLA.	2019-02-04 15:45:02 +01:00
H.J. Lu	3f635fb433	x86-64 memcmp: Use unsigned Jcc instructions on size [BZ #24155 ] Since the size argument is unsigned. we should use unsigned Jcc instructions, instead of signed, to check size. Tested on x86-64 and x32, with and without --disable-multi-arch. [BZ #24155] CVE-2019-7309 * NEWS: Updated for CVE-2019-7309. * sysdeps/x86_64/memcmp.S: Use RDX_LP for size. Clear the upper 32 bits of RDX register for x32. Use unsigned Jcc instructions, instead of signed. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp-2. * sysdeps/x86_64/x32/tst-size_t-memcmp-2.c: New test.	2019-02-04 06:31:13 -08:00
H.J. Lu	5165de69c0	x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes strnlen/wcsnlen for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strlen-avx2.S: Use RSI_LP for length. Clear the upper 32 bits of RSI register. * sysdeps/x86_64/strlen.S: Use RSI_LP for length. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strnlen and tst-size_t-wcsnlen. * sysdeps/x86_64/x32/tst-size_t-strnlen.c: New file. * sysdeps/x86_64/x32/tst-size_t-wcsnlen.c: Likewise.	2019-01-21 11:36:47 -08:00
H.J. Lu	c7c54f65b0	x86-64 strncpy: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes strncpy for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strcpy-avx2.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Likewise. * sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncpy. * sysdeps/x86_64/x32/tst-size_t-strncpy.c: New file.	2019-01-21 11:35:34 -08:00
H.J. Lu	ee915088a0	x86-64 strncmp family: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes the strncmp family for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/strcmp-avx2.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/strcmp-sse42.S: Likewise. * sysdeps/x86_64/strcmp.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncasecmp, tst-size_t-strncmp and tst-size_t-wcsncmp. * sysdeps/x86_64/x32/tst-size_t-strncasecmp.c: New file. * sysdeps/x86_64/x32/tst-size_t-strncmp.c: Likewise. * sysdeps/x86_64/x32/tst-size_t-wcsncmp.c: Likewise.	2019-01-21 11:34:04 -08:00
H.J. Lu	82d0b4a4d7	x86-64 memset/wmemset: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memset/wmemset for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-wmemset. * sysdeps/x86_64/x32/tst-size_t-memset.c: New file. * sysdeps/x86_64/x32/tst-size_t-wmemset.c: Likewise.	2019-01-21 11:32:37 -08:00
H.J. Lu	ecd8b842cf	x86-64 memrchr: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memrchr for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/memrchr.S: Use RDX_LP for length. * sysdeps/x86_64/multiarch/memrchr-avx2.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memrchr. * sysdeps/x86_64/x32/tst-size_t-memrchr.c: New file.	2019-01-21 11:30:12 -08:00
H.J. Lu	231c56760c	x86-64 memcpy: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memcpy for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise. * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcpy. tst-size_t-wmemchr. * sysdeps/x86_64/x32/tst-size_t-memcpy.c: New file.	2019-01-21 11:27:36 -08:00
H.J. Lu	b304fc201d	x86-64 memcmp/wmemcmp: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memcmp/wmemcmp for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise. * sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp and tst-size_t-wmemcmp. * sysdeps/x86_64/x32/tst-size_t-memcmp.c: New file. * sysdeps/x86_64/x32/tst-size_t-wmemcmp.c: Likewise.	2019-01-21 11:26:07 -08:00
H.J. Lu	97700a34f3	x86-64 memchr/wmemchr: Properly handle the length parameter [BZ# 24097] On x32, the size_t parameter may be passed in the lower 32 bits of a 64-bit register with the non-zero upper 32 bits. The string/memory functions written in assembly can only use the lower 32 bits of a 64-bit register as length or must clear the upper 32 bits before using the full 64-bit register for length. This pach fixes memchr/wmemchr for x32. Tested on x86-64 and x32. On x86-64, libc.so is the same with and withou the fix. [BZ# 24097] CVE-2019-6488 * sysdeps/x86_64/memchr.S: Use RDX_LP for length. Clear the upper 32 bits of RDX register. * sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise. * sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memchr and tst-size_t-wmemchr. * sysdeps/x86_64/x32/test-size_t.h: New file. * sysdeps/x86_64/x32/tst-size_t-memchr.c: Likewise. * sysdeps/x86_64/x32/tst-size_t-wmemchr.c: Likewise.	2019-01-21 11:24:13 -08:00
Leonardo Sandoval	1a153e47fc	x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2 Optimize x86-64 strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2. It uses vector comparison as much as possible. In general, the larger the source string, the greater performance gain observed, reaching speedups of 1.6x compared to SSE2 unaligned routines. Select AVX2 strcat/strncat, strcpy/strncpy and stpcpy/stpncpy on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcat-avx2, strncat-avx2, strcpy-avx2, strncpy-avx2, stpcpy-avx2 and stpncpy-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: (__libc_ifunc_impl_list): Add tests for __strcat_avx2, __strncat_avx2, __strcpy_avx2, __strncpy_avx2, __stpcpy_avx2 and __stpncpy_avx2. * sysdeps/x86_64/multiarch/{ifunc-unaligned-ssse3.h => ifunc-strcpy.h}: rename header for a more generic name. * sysdeps/x86_64/multiarch/ifunc-strcpy.h: (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX 2 machines if AVX unaligned load is fast and vzeroupper is preferred. * sysdeps/x86_64/multiarch/stpcpy-avx2.S: New file * sysdeps/x86_64/multiarch/stpncpy-avx2.S: Likewise * sysdeps/x86_64/multiarch/strcat-avx2.S: Likewise * sysdeps/x86_64/multiarch/strcpy-avx2.S: Likewise * sysdeps/x86_64/multiarch/strncat-avx2.S: Likewise * sysdeps/x86_64/multiarch/strncpy-avx2.S: Likewise	2019-01-14 09:43:38 -06:00
Adhemerval Zanella	85c828a462	x86_64: Remove wrong THREAD_ATOMIC_* macros The x86 defines optimized THREAD_ATOMIC_* macros where reference always the current thread instead of the one indicated by input 'descr' argument. It work as long the input is the self thread pointer, however it generates wrong code if the semantic is to set a bit atomicialy from another thread. This is not an issue for current GLIBC usage, however the new cancellation code expects that some synchronization code to atomically set bits from different threads. The generic code generates an additional load to reference to TLS segment, for instance the code: THREAD_ATOMIC_BIT_SET (THREAD_SELF, cancelhandling, CANCELED_BIT); Compiles to: lock;orl $4, %fs:776 Where with patch changes it now compiles to: mov %fs:16,%rax lock;orl $4, 776(%rax) If some usage indeed proves to be a hotspot we can add an extra macro with a more descriptive name (THREAD_ATOMIC_BIT_SET_SELF for instance) where x86_64 might optimize it. Checked on x86_64-linux-gnu. * sysdeps/x86_64/nptl/tls.h (THREAD_ATOMIC_CMPXCHG_VAL, THREAD_ATOMIC_AND, THREAD_ATOMIC_BIT_SET): Remove macros.	2019-01-03 18:38:14 -02:00
Joseph Myers	04277e02d7	Update copyright dates with scripts/update-copyrights. * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.	2019-01-01 00:11:28 +00:00
H.J. Lu	ba4b8fab20	x86-64: Remove s_sincosf-sse2.S The current s_sincosf.c is faster than s_sincosf-sse2.S. On Broadwell with FMA disabled, bench-sincosf shows: Before After Improvement max 154.032 114.517 34% min 6.25 5.609 11% mean 14.8728 12.8589 15% * sysdeps/x86_64/fpu/s_sincosf.S: Removed. * sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.c: New file.	2018-12-26 06:58:31 -08:00
H.J. Lu	9412979a43	Regenerate sysdeps/x86_64/fpu/libm-test-ulps * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.	2018-12-26 06:57:30 -08:00
H.J. Lu	8700a7851b	x86-64: Vectorize sincosf_poly and update s_sincosf-fma.c Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized sincosf_poly. Add x86 sincosf_poly.h to vectorize sincosf_poly. On Broadwell, bench-sincosf shows: Before After Improvement max 160.273 114.198 40% min 6.25 5.625 11% mean 13.0325 10.6462 22% Vectorized sincosf_poly shows Before After Improvement max 138.653 114.198 21% min 5.004 5.625 -11% mean 11.5934 10.6462 9% Tested on x86-64 and i686 as well as with build-many-glibcs.py. * sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>. (sincos_t, sincosf_poly, sinf_poly): Moved to ... * sysdeps/ieee754/flt-32/sincosf_poly.h: Here. New file. * sysdeps/x86/fpu/s_sincosf_data.c: New file. * sysdeps/x86/fpu/sincosf_poly.h: Likewise. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include <sysdeps/ieee754/flt-32/s_sincosf.c>.	2018-12-26 06:56:13 -08:00
H.J. Lu	cd815050e5	x86: Merge i386/x86_64 atomic-machine.h Merge i386 and x86_64 atomic-machine.h to x86 atomic-machine.h. Tested on i686 and x86_64 as well as with build-many-glibcs.py. * sysdeps/i386/atomic-machine.h: Merged with ... * sysdeps/x86_64/atomic-machine.h: To ... * sysdeps/x86/atomic-machine.h: This. New file.	2018-12-18 04:25:26 -08:00
H.J. Lu	c22e4c2a14	x86: Extend CPUID support in struct cpu_features Extend CPUID support for all feature bits from CPUID. Add a new macro, CPU_FEATURE_USABLE, which can be used to check if a feature is usable at run-time, instead of HAS_CPU_FEATURE and HAS_ARCH_FEATURE. Add COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007 and COMMON_CPUID_INDEX_80000008 to check CPU feature bits in them. Tested on i686 and x86-64 as well as using build-many-glibcs.py with x86 targets. * sysdeps/x86/cacheinfo.c (intel_check_word): Updated for cpu_features_basic. (__cache_sysconf): Likewise. (init_cacheinfo): Likewise. * sysdeps/x86/cpu-features.c (get_extended_indeces): Also populate COMMON_CPUID_INDEX_80000007 and COMMON_CPUID_INDEX_80000008. (get_common_indices): Also populate COMMON_CPUID_INDEX_D_ECX_1. Use CPU_FEATURES_CPU_P (cpu_features, XSAVEC) to check if XSAVEC is available. Set the bit_arch_XXX_Usable bits. (init_cpu_features): Use _Static_assert on index_arch_Fast_Unaligned_Load. __get_cpuid_registers and __get_arch_feature. Updated for cpu_features_basic. Set stepping in cpu_features. * sysdeps/x86/cpu-features.h: (FEATURE_INDEX_1): Changed to enum. (FEATURE_INDEX_2): New. (FEATURE_INDEX_MAX): Changed to enum. (COMMON_CPUID_INDEX_D_ECX_1): New. (COMMON_CPUID_INDEX_80000007): Likewise. (COMMON_CPUID_INDEX_80000008): Likewise. (cpuid_registers): Likewise. (cpu_features_basic): Likewise. (CPU_FEATURE_USABLE): Likewise. (bit_arch_XXX_Usable): Likewise. (cpu_features): Use cpuid_registers and cpu_features_basic. (bit_arch_XXX): Reweritten. (bit_cpu_XXX): Likewise. (index_cpu_XXX): Likewise. (reg_XXX): Likewise. * sysdeps/x86/tst-get-cpu-features.c: Include <stdio.h> and <support/check.h>. (CHECK_CPU_FEATURE): New. (CHECK_CPU_FEATURE_USABLE): Likewise. (cpu_kinds): Likewise. (do_test): Print vendor, family, model and stepping. Check HAS_CPU_FEATURE and CPU_FEATURE_USABLE. (TEST_FUNCTION): Removed. Include <support/test-driver.c> instead of "../../test-skeleton.c". * sysdeps/x86_64/multiarch/sched_cpucount.c (__sched_cpucount): Check POPCNT instead of POPCOUNT. * sysdeps/x86_64/multiarch/test-multiarch.c (do_test): Likewise.	2018-12-03 05:54:56 -08:00
Szabolcs Nagy	a502c5294b	Remove the error handling wrapper from pow Introduce new pow symbol version that doesn't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty w_pow.c and enabled for targets with their own pow implementation or ifunc dispatch on __ieee754_pow by including math/w_pow.c. The compatibility symbol version still uses the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously powl was an alias of pow, now it points to the compatibility symbol with the wrapper, because it still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The __pow_finite symbol is now an alias of pow. Both __pow_finite and pow set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header. Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add pow. * math/w_pow_compat.c (__pow_compat): Change to versioned compat symbol. * math/w_pow.c: New file. * sysdeps/i386/fpu/w_pow.c: New file. * sysdeps/ia64/fpu/e_pow.S: Add versioned symbols. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Rename to __pow and add necessary aliases. * sysdeps/ieee754/dbl-64/w_pow.c: New file. * sysdeps/m68k/m680x0/fpu/w_pow.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__ieee754_pow): Rename to __pow. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__ieee754_pow): Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c (__ieee754_pow): Likewise. * sysdeps/x86_64/fpu/multiarch/w_pow.c: New file.	2018-11-21 09:58:36 +00:00
Szabolcs Nagy	f29b7c492d	Remove the error handling wrapper from log Introduce new log symbol version that doesn't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty w_log.c and enabled for targets with their own log implementation by including math/w_log.c. The compatibility symbol version still uses the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously logl was an alias of log, now it points to the compatibility symbol with the wrapper, because it still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The __log_finite symbol is now an alias of log. Both __log_finite and log set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header. Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add log. * math/w_log_compat.c (__log_compat): Change to versioned compat symbol. * math/w_log.c: New file. * sysdeps/i386/fpu/w_log.c: New file. * sysdeps/ia64/fpu/e_log.S: Update. * sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Rename to __log and add necessary aliases. * sysdeps/ieee754/dbl-64/w_log.c: New file. * sysdeps/m68k/m680x0/fpu/w_log.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_log-avx.c (__ieee754_log): Rename to __log. * sysdeps/x86_64/fpu/multiarch/e_log-fma.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/e_log-fma4.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/w_log.c: New file.	2018-11-21 09:56:27 +00:00
Szabolcs Nagy	c20a10561a	Remove the error handling wrapper from exp and exp2 Introduce new exp and exp2 symbol version that don't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The double precision wrappers are disabled for sysdeps/ieee754/dbl-64 by using empty w_exp.c and w_exp2.c files, the math/w_exp.c and math/w_exp2.c files use the wrapper template and can be included by targets that have their own exp and exp2 implementations or use ifunc on the glibc internal __ieee754_exp symbol. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously expl and exp2l were aliases of exp and exp2, now they point to the compatibility symbols with the wrapper, because they still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The _finite symbols are now aliases of the standard symbols (they have no performance advantage anymore). Both the standard symbols and _finite symbols set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header (the new macro name is __exp instead of __ieee754_exp which breaks some math.h macros). Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add exp and exp2. * math/w_exp2_compat.c (__exp2_compat): Change to versioned compat symbol, handle NO_LONG_DOUBLE and LONG_DOUBLE_COMPAT explicitly. * math/w_exp_compat.c (__exp_compat): Likewise. * math/w_exp.c: New file. * math/w_exp2.c: New file. * sysdeps/i386/fpu/w_exp.c: New file. * sysdeps/i386/fpu/w_exp2.c: New file. * sysdeps/ia64/fpu/e_exp.S: Add versioned symbols. * sysdeps/ia64/fpu/e_exp2.S: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Rename to __exp and add necessary aliases. * sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Rename to __exp2 and add necessary aliases. * sysdeps/ieee754/dbl-64/w_exp.c: New file. * sysdeps/ieee754/dbl-64/w_exp2.c: New file. * sysdeps/m68k/m680x0/fpu/w_exp.c: New file. * sysdeps/m68k/m680x0/fpu/w_exp2.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp.c (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/w_exp.c: New file.	2018-11-21 09:55:02 +00:00
Martin Sebor	1626a1cfcd	Add support for GCC 9 attribute copy. GCC 9 has gained an enhancement to help detect attribute mismatches between alias declarations and their targets. It consists of a new warning, -Wattribute-alias, an enhancement to an existing warning, -Wmissing-attributes, and a new attribute called copy. The purpose of the warnings is to help identify either possible bugs (an alias declared with more restrictive attributes than its target promises) or optimization or diagnostic opportunities (an alias target missing some attributes that it could be declared with that might benefit analysis and code generation). The purpose of the new attribute is to easily apply (almost) the same set of attributes to one declaration as those already present on another. As expected (and intended) the enhancement triggers warnings for many alias declarations in Glibc code. This change, tested on x86_64-linux, avoids all instances of the new warnings by making use of the attribute where appropriate. To fully benefit from the enhancement Glibc will need to be compiled with -Wattribute-alias=2 and remaining warnings reviewed and dealt with (there are a couple of thousand but most should be straightforward to deal with). ChangeLog: * include/libc-symbols.h (__attribute_copy__): Define macro unless it's already defined. (_strong_alias): Use __attribute_copy__. (_weak_alias, __hidden_ver1, __hidden_nolink2): Same. * misc/sys/cdefs.h (__attribute_copy__): New macro. * sysdeps/x86_64/multiarch/memchr.c (memchr): Use __attribute_copy__. * sysdeps/x86_64/multiarch/memcmp.c (memcmp): Same. * sysdeps/x86_64/multiarch/mempcpy.c (mempcpy): Same. * sysdeps/x86_64/multiarch/memset.c (memset): Same. * sysdeps/x86_64/multiarch/stpcpy.c (stpcpy): Same. * sysdeps/x86_64/multiarch/strcat.c (strcat): Same. * sysdeps/x86_64/multiarch/strchr.c (strchr): Same. * sysdeps/x86_64/multiarch/strcmp.c (strcmp): Same. * sysdeps/x86_64/multiarch/strcpy.c (strcpy): Same. * sysdeps/x86_64/multiarch/strcspn.c (strcspn): Same. * sysdeps/x86_64/multiarch/strlen.c (strlen): Same. * sysdeps/x86_64/multiarch/strncmp.c (strncmp): Same. * sysdeps/x86_64/multiarch/strncpy.c (strncpy): Same. * sysdeps/x86_64/multiarch/strnlen.c (strnlen): Same. * sysdeps/x86_64/multiarch/strpbrk.c (strpbrk): Same. * sysdeps/x86_64/multiarch/strrchr.c (strrchr): Same. * sysdeps/x86_64/multiarch/strspn.c (strspn): Same.	2018-11-09 17:24:12 -07:00
H.J. Lu	72771e5375	x86: Use _rdtsc intrinsic for HP_TIMING_NOW Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for HP_TIMING_NOW. This patch 1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h. 2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h can check minimum x86 ISA to decide if _rdtsc can be used. NB: Checking if __i686__ isn't sufficient since __i686__ may not be defined when building for i686 class processors. * sysdeps/i386/init-arch.h: Removed. * sysdeps/i386/i586/init-arch.h: Likewise. * sysdeps/i386/i686/init-arch.h: Likewise. * sysdeps/i386/i686/hp-timing.h: Likewise. * sysdeps/x86_64/hp-timing.h: Likewise. * sysdeps/i386/isa.h: New file. * sysdeps/i386/i586/isa.h: Likewise. * sysdeps/i386/i686/isa.h: Likewise. * sysdeps/x86_64/isa.h: Likewise. * sysdeps/x86/hp-timing.h: New file. * sysdeps/x86/init-arch.h: Include <isa.h>.	2018-10-17 15:16:45 -07:00
Joseph Myers	7abf97bed9	Use trunc functions not __trunc functions in glibc libm. Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __trunc functions to call the corresponding trunc names instead, with asm redirection to __trunc when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_truncf.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise. * sysdeps/ieee754/float128/s_truncf128.c: Likewise. * sysdeps/ieee754/dbl-64/s_trunc.c: Likewise. * sysdeps/ieee754/flt-32/s_truncf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise. (ceil): Redirect to __ceil. (floor): Redirect to __floor. (trunc): Redirect to __trunc. (__truncl): Call trunc instead of __trunc. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc): Remove macro. [_ARCH_PWR5X] (__truncf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use trunc functions instead of __trunc variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise.	2018-09-20 21:11:10 +00:00
Szabolcs Nagy	424c4f60ed	Add new pow implementation The algorithm is exp(y * log(x)), where log(x) is computed with about 1.32^-68 relative error (1.52^-68 without fma), returning the result in two doubles, and the exp part uses the same algorithm (and lookup tables) as exp, but takes the input as two doubles and a sign (to handle negative bases with odd integer exponent). The __exp1 internal symbol is no longer necessary. There is separate code path when fma is not available but the worst case error is about 0.54 ULP in both cases. The lookup table and consts for log are 4168 bytes. The .rodata+.text is decreased by 37908 bytes on aarch64. The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1] pow latency: 1.84x in [0.01 11.1]x[0.01 11.1] Tested on aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets. * NEWS: Mention pow improvements. * math/Makefile (type-double-routines): Add e_pow_log_data. * sysdeps/generic/math_private.h (__exp1): Remove. * sysdeps/i386/fpu/e_pow_log_data.c: New file. * sysdeps/ia64/fpu/e_pow_log_data.c: New file. * sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma contraction. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove. (exp_inline): Remove. (__ieee754_exp): Only single double input is handled. * sysdeps/ieee754/dbl-64/e_pow.c: Rewrite. * sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file. * sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define. (__pow_log_data): Define. * sysdeps/ieee754/dbl-64/upow.h: Remove. * sysdeps/ieee754/dbl-64/upow.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file. * sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma contraction. (CFLAGS-e_pow-fma4.c): Likewise.	2018-09-19 10:04:51 +01:00
Joseph Myers	71223ef909	Use ceil functions not __ceil functions in glibc libm. Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __ceil functions to call the corresponding ceil names instead, with asm redirection to __ceil when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_ceilf.c: Likewise. * sysdeps/ieee754/dbl-64/s_ceil.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise. * sysdeps/ieee754/float128/s_ceilf128.c: Likewise. * sysdeps/ieee754/flt-32/s_ceilf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil): Remove macro. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil functions instead of __ceil variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.	2018-09-17 20:42:06 +00:00
Joseph Myers	f29b6f17e4	Use rint functions not __rint functions in glibc libm. Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __rint functions to call the corresponding rint names instead, with asm redirection to __rint when the calls are not inlined. The x86_64 math_private.h is removed as no longer useful after this patch. This patch is relative to a tree with my floor patch <https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied, and much the same considerations arise regarding possibly replacing an IFUNC call with a direct inline expansion. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_rintf.c: Likewise. * sysdeps/alpha/fpu/s_rint.c: Likewise. * sysdeps/alpha/fpu/s_rintf.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/ieee754/dbl-64/s_rint.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise. * sysdeps/ieee754/float128/s_rintf128.c: Likewise. * sysdeps/ieee754/flt-32/s_rintf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise. * sysdeps/powerpc/fpu/s_rint.c: Likewise. * sysdeps/powerpc/fpu/s_rintf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Likewise. * sysdeps/riscv/rvf/s_rintf.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/math_private.h: Remove file. * math/e_scalb.c (invalid_fn): Use rint functions instead of __rint variants. * math/e_scalbf.c (invalid_fn): Likewise. * math/e_scalbl.c (invalid_fn): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise. * sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.	2018-09-14 13:10:39 +00:00
Joseph Myers	e44acb2063	Use floor functions not __floor functions in glibc libm. Similar to the changes that were made to call sqrt functions directly in glibc, instead of __ieee754_sqrt variants, so that the compiler could inline them automatically without needing special inline definitions in lots of math_private.h headers, this patch makes libm code call floor functions directly instead of __floor variants, removing the inlines / macros for x86_64 (SSE4.1) and powerpc (POWER5). The redirection used to ensure that __ieee754_sqrt does still get called when the compiler doesn't inline a built-in function expansion is refactored so it can be applied to other functions; the refactoring is arranged so it's not limited to unary functions either (it would be reasonable to use this mechanism for copysign - removing the inline in math_private_calls.h but also eliminating unnecessary local PLT entry use in the cases (powerpc soft-float and e500v1, for IBM long double) where copysign calls don't get inlined). The point of this change is that more architectures can get floor calls inlined where they weren't previously (AArch64, for example), without needing special inline definitions in their math_private.h, and existing such definitions in math_private.h headers can be removed. Note that it's possible that in some cases an inline may be used where an IFUNC call was previously used - this is the case on x86_64, for example. I think the direct calls to floor are still appropriate; if there's any significant performance cost from inline SSE2 floor instead of an IFUNC call ending up with SSE4.1 floor, that indicates that either the function should be doing something else that's faster than using floor at all, or it should itself have IFUNC variants, or that the compiler choice of inlining for generic tuning should change to allow for the possibility that, by not inlining, an SSE4.1 IFUNC might be called at runtime - but not that glibc should avoid calling floor internally. (After all, all the same considerations would apply to any user program calling floor, where it might either be inlined or left as an out-of-line call allowing for a possible IFUNC.) Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (floor): Likewise. * sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_floorf.c: Likewise. * sysdeps/ieee754/dbl-64/s_floor.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise. * sysdeps/ieee754/float128/s_floorf128.c: Likewise. * sysdeps/ieee754/flt-32/s_floorf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor): Remove macro. [_ARCH_PWR5X] (__floorf): Likewise. * sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove inline function. [__SSE4_1__] (__floorf): Likewise. * math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions instead of __floor variants. * math/w_lgamma_r_compat.c (__lgamma_r): Likewise. * math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise. * math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise. * math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise. * math/w_lgammal_r_compat.c (__lgammal_r): Likewise. * math/w_tgamma_compat.c (__tgamma): Likewise. * math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise. * math/w_tgammaf_compat.c (__tgammaf): Likewise. * math/w_tgammal_compat.c (__tgammal): Likewise. * sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise. * sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2): Likewise. * sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise. * sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise. * sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.	2018-09-14 13:09:01 +00:00
Joseph Myers	4e7fbdd7c2	Remove x86_64 math_private.h asms. The x86_64 math_private.h has asm versions of the macros to reinterpret between floating-point and integer types. This is the sort of thing we now strongly discourage; the expectation in such cases, where the generic C code gives the compiler all the information needed about the required semantics, is that you should get the compiler to do the right thing for the generic C code rather than writing an asm version. Trivial tests showed GCC generates the expected single instructions for reinterpretation from floating point to integer. In the other direction, it goes via memory when the asms don't; I asked about this in GCC bug 87236 and was advised this was deliberate for generic tuning because it was faster that way on some AMD processors (but -mtune=intel, and -Os with the latest GCC, avoid going via memory). The asms don't and can't know about those tuning details, so that's evidence that they are actually making the code worse. This patch removes the asms accordingly. Tested for x86_64. * sysdeps/x86_64/fpu/math_private.h (MOVD): Remove macro. (MOVQ): Likewise. (EXTRACT_WORDS64): Likewise. (INSERT_WORDS64): Likewise. (GET_FLOAT_WORD): Likewise. (SET_FLOAT_WORD): Likewise.	2018-09-11 14:51:40 +00:00
Szabolcs Nagy	e70c176825	Add new exp and exp2 implementations Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update.	2018-09-05 16:22:00 +01:00
Paul Pluzhnikov	a6e8926f8d	[BZ #20271 ] Add newlines in __libc_fatal calls.	2018-08-31 18:04:32 -07:00
Joseph Myers	ff6b24501f	Split fenv_private.h out of math_private.h more consistently. On some architectures, the parts of math_private.h relating to the floating-point environment are in a separate file fenv_private.h included from math_private.h. As this is purely an architecture-specific convention used by several architectures, however, all such architectures still need their own math_private.h, even if it has nothing to do beyond #include <fenv_private.h> and peculiarity of including the i386 file directly instead of having a shared file in sysdeps/x86. This patch makes the fenv_private.h name an architecture-independent convention in glibc. The include of fenv_private.h from math_private.h becomes architecture-independent (until callers are updated to include fenv_private.h directly so the include from math_private.h is no longer needed). Some architecture math_private.h headers are removed if no longer needed, or renamed to fenv_private.h if all they define belongs in that header; architecture fenv_private.h headers now do require #include_next <fenv_private.h>. The i386 fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is actually shared with x86_64. The generic math_private.h gets a new include of <stdbool.h>, as needed for bool in some prototypes in that header (previously that was indirectly included via include/fenv.h, which now only gets included too late in math_private.h, after those prototypes). Tested for x86_64 and x86, and tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by the patch. * sysdeps/aarch64/fpu/fenv_private.h: New file. Based on .... * sysdeps/aarch64/fpu/math_private.h: ... this file. All contents moved to fenv_private.h except for ... (TOINT_INTRINSICS): Kept in math_private.h. (roundtoint): Likewise. (converttoint): Likewise. * sysdeps/arm/fenv_private.h: Change multiple-include guard to [ARM_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/arm/math_private.h: Remove. * sysdeps/generic/fenv_private.h: New file. Contents moved from .... * sysdeps/generic/math_private.h: ... this file. Include <stdbool.h>. Do not include <fenv.h> or <get-rounding-mode.h>. Include <fenv_private.h>. Remove functions and macros moved to fenv_private.h. * sysdeps/i386/fpu/math_private.h: Remove. * sysdeps/mips/math_private.h: Move to .... * sysdeps/mips/fpu/fenv_private.h: ... here. Change multiple-include guard to [MIPS_FENV_PRIVATE_H]. Remove [__mips_hard_float] conditional. Include next <fenv_private.h>. * sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include guard to [POWERPC_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/powerpc/fpu/math_private.h: Do not include <fenv_private.h>. * sysdeps/riscv/rvf/math_private.h: Move to .... * sysdeps/riscv/rvf/fenv_private.h: ... here. Change multiple-include guard to [RISCV_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard to [SPARC_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/sparc/fpu/math_private.h: Remove. * sysdeps/i386/fpu/fenv_private.h: Move to .... * sysdeps/x86/fpu/fenv_private.h: ... here. Change multiple-include guard to [X86_FENV_PRIVATE_H]. Include next <fenv_private.h>. * sysdeps/x86_64/fpu/math_private.h: Do not include <sysdeps/i386/fpu/fenv_private.h>.	2018-08-28 20:48:49 +00:00
Wilco Dijkstra	49acec179c	Fix spaces in x86_64 ULP file Fix a few missing spaces, it's now identical to the regenerated version. Passes GLIBC tests on x64. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerate to fix spaces.	2018-08-15 12:56:22 +01:00
Wilco Dijkstra	599cf39766	Improve performance of sinf and cosf The second patch improves performance of sinf and cosf using the same algorithms and polynomials. The returned values are identical to sincosf for the same input. ULP definitions for AArch64 and x64 are updated. sinf/cosf througput gains on Cortex-A72: * \|x\| < 0x1p-12 : 1.2x * \|x\| < M_PI_4 : 1.8x * \|x\| < 2 * M_PI: 1.7x * \|x\| < 120.0 : 2.3x * \|x\| < Inf : 3.0x * NEWS: Mention sinf, cosf, sincosf. * sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf. * sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf. * sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of constants rather than including generic sincosf.h. * sysdeps/x86_64/fpu/s_sincosf_data.c: Remove. * sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite. * sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove. (reduced_cos): Remove. (sinf_poly): New function. * sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.	2018-08-14 10:45:59 +01:00
Joseph Myers	2ce7ba7d15	Move SNAN_TESTS_* out of math-tests.h. Continuing moving macros out of math-tests.h to smaller headers following typo-proof conventions instead of using #ifndef, this patch moves the SNAN_TESTS_* macros for individual types out to their own sysdeps header (while the type-generic SNAN_TESTS wrapper for those macros remains in math-tests.h). Tested for x86_64 and x86, and with build-many-glibcs.py. * sysdeps/generic/math-tests-snan.h: New file. * sysdeps/generic/math-tests.h: Include <math-tests-snan.h>. (SNAN_TESTS_float): Do not define here. (SNAN_TESTS_double): Likewise. (SNAN_TESTS_long_double): Likewise. (SNAN_TESTS_float128): Likewise. * sysdeps/i386/fpu/math-tests-snan.h: New file. * sysdeps/i386/fpu/math-tests.h: Remove file. * sysdeps/ia64/math-tests-snan.h: New file. * sysdeps/ia64/math-tests.h: Remove file. * sysdeps/x86/math-tests.h: Likewise. * sysdeps/x86_64/fpu/math-tests-snan.h: New file.	2018-08-10 19:22:01 +00:00
Wilco Dijkstra	ea5c662c62	Improve performance of sincosf This patch is a complete rewrite of sincosf. The new version is significantly faster, as well as simple and accurate. The worst-case ULP is 0.5607, maximum relative error is 0.5303 * 2^-23 over all 4 billion inputs. In non-nearest rounding modes the error is 1ULP. The algorithm uses 3 main cases: small inputs which don't need argument reduction, small inputs which need a simple range reduction and large inputs requiring complex range reduction. The code uses approximate integer comparisons to quickly decide between these cases. The small range reducer uses a single reduction step to handle values up to 120.0. It is fastest on targets which support inlined round instructions. The large range reducer uses integer arithmetic for simplicity. It does a 32x96 bit multiply to compute a 64-bit modulo result. This is more than accurate enough to handle the worst-case cancellation for values close to an integer multiple of PI/4. It could be further optimized, however it is already much faster than necessary. sincosf throughput gains on Cortex-A72: * \|x\| < 0x1p-12 : 1.6x * \|x\| < M_PI_4 : 1.7x * \|x\| < 2 * M_PI: 1.5x * \|x\| < 120.0 : 1.8x * \|x\| < Inf : 2.3x * math/Makefile: Add s_sincosf_data.c. * sysdeps/ia64/fpu/s_sincosf_data.c: New file. * sysdeps/ieee754/flt-32/s_sincosf.h (abstop12): Add new function. (sincosf_poly): Likewise. (reduce_small): Likewise. (reduce_large): Likewise. * sysdeps/ieee754/flt-32/s_sincosf.c (sincosf): Rewrite. * sysdeps/ieee754/flt-32/s_sincosf_data.c: New file with sincosf data. * sysdeps/m68k/m680x0/fpu/s_sincosf_data.c: New file. * sysdeps/x86_64/fpu/s_sincosf_data.c: New file.	2018-08-10 17:34:39 +01:00
Ilya Leoshkevich	8d997d2253	Move __fentry__ version definition to sysdeps/{i386,x86_64} __fentry__ symbol is currently not defined for other architectures. Attempts to introduce it cause abicheck to fail, because it will be available since 2.29 earliest, and not 2.13, which is the case for Intel. With the new code, abicheck passes for i686-linux-gnu, x86_64-linux-gnu and x86_64-linux-gnu32 triples. ChangeLog: * stdlib/Versions: Remove __fentry__. * sysdeps/i386/Versions: Add __fentry__. * sysdeps/x86_64/Versions: Add __fentry__.	2018-08-10 09:07:44 +02:00
H.J. Lu	fb4c32aef6	x86: Move STATE_SAVE_OFFSET/STATE_SAVE_MASK to sysdep.h Move STATE_SAVE_OFFSET and STATE_SAVE_MASK to sysdep.h to make sysdeps/x86/cpu-features.h a C header file. * sysdeps/x86/cpu-features.h (STATE_SAVE_OFFSET): Removed. (STATE_SAVE_MASK): Likewise. Don't check __ASSEMBLER__ to include <cpu-features-offsets.h>. * sysdeps/x86/sysdep.h (STATE_SAVE_OFFSET): New. (STATE_SAVE_MASK): Likewise. * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features-offsets.h> instead of <cpu-features.h>.	2018-08-06 06:25:43 -07:00
H.J. Lu	430388d5dc	x86: Don't include <init-arch.h> in assembly codes There is no need to include <init-arch.h> in assembly codes since all x86 IFUNC selector functions are written in C. Tested on i686 and x86-64. There is no code change in libc.so, ld.so and libmvec.so. * sysdeps/i386/i686/multiarch/bzero-ia32.S: Don't include <init-arch.h>. * sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: Likewise. * sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.	2018-08-03 08:05:00 -07:00

1 2 3 4 5 ...

1453 Commits