Commit Graph

1385 Commits

Author SHA1 Message Date
Florian Weimer
60d5e40ab2 x86: Remove low-level lock optimization
The current approach is to do these optimizations at a higher level,
in generic code, so that single-threaded cases can be specifically
targeted.
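A minimal C11 sketch of what targeting the single-threaded case in generic code can look like. `single_threaded`, `sketch_lock` and the two functions are hypothetical names invented for illustration. They are not glibc's <lowlevellock.h> interface, and the real mechanism differs in detail.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical process-wide flag: a real implementation would clear it
   before the first additional thread is created.  */
static bool single_threaded = true;

typedef struct { atomic_int state; } sketch_lock;  /* 0 = free, 1 = held */

static void sketch_lock_acquire(sketch_lock *l)
{
  if (single_threaded)
    {
      /* No other thread can observe the lock, so a plain relaxed store
         is enough; no LOCK-prefixed read-modify-write is needed.  */
      atomic_store_explicit(&l->state, 1, memory_order_relaxed);
      return;
    }
  int expected = 0;
  while (!atomic_compare_exchange_weak_explicit(&l->state, &expected, 1,
                                                memory_order_acquire,
                                                memory_order_relaxed))
    expected = 0;  /* spin; a real lock would futex-wait here */
}

static void sketch_lock_release(sketch_lock *l)
{
  atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

The decision is made once, in architecture-independent code, instead of being baked into each architecture's assembly lock.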

Furthermore, using IS_IN (libc) as a compile-time indicator that
all locks are private is no longer correct once process-shared lock
implementations are moved into libc.

The generic <lowlevellock.h> is not compatible with assembler code
(obviously), so it's necessary to remove two long-unused #includes.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-21 19:49:51 +02:00
Szabolcs Nagy
2208066603 elf: Remove lazy tlsdesc relocation related code
Remove generic tlsdesc code related to lazy tlsdesc processing since
lazy tlsdesc relocation is no longer supported.  This includes removing
GL(dl_load_lock) from _dl_make_tlsdesc_dynamic which is only called at
load time when that lock is already held.

Added a documentation comment too.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-21 14:35:53 +01:00
Noah Goldstein
aaa23c3507 x86: Optimize strlen-avx2.S
No bug. This commit optimizes strlen-avx2.S. The optimizations are
mostly small things but they add up to roughly 10-30% performance
improvement for strlen. The results for strnlen are a bit more
ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen
are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 18:03:49 -07:00
Noah Goldstein
4ba6558684 x86: Optimize strlen-evex.S
No bug. This commit optimizes strlen-evex.S. The
optimizations are mostly small things but they add up to roughly
10-30% performance improvement for strlen. The results for strnlen are
a bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and
test-wcsnlen are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 18:03:49 -07:00
Noah Goldstein
f53790272c x86: Optimize less_vec evex and avx512 memset-vec-unaligned-erms.S
No bug. This commit adds an optimized path for the less_vec memset case that
uses the avx512vl/avx512bw mask store, avoiding excessive
branches. test-memset and test-wmemset are passing.
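To illustrate the mask-store idea, here is a scalar C model, assuming 32-byte vectors. `masked_store32` and `small_memset` are hypothetical helpers. The loop in `masked_store32` only stands in for a single AVX512VL/AVX512BW masked store, which in hardware also suppresses faults on the masked-out bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked byte store: byte i of a 32-byte vector is
   written only if bit i of the mask is set.  */
static void masked_store32(unsigned char *dst, unsigned char byte, uint32_t mask)
{
  for (unsigned i = 0; i < 32; i++)
    if (mask & (UINT32_C(1) << i))
      dst[i] = byte;
}

/* memset for n < 32 with no branch per size class: build a mask with the
   low n bits set (BZHI computes this in one instruction with BMI2).  */
static void *small_memset(void *s, int c, size_t n)
{
  uint32_t mask = (n >= 32) ? UINT32_MAX : ((UINT32_C(1) << n) - 1);
  masked_store32(s, (unsigned char) c, mask);
  return s;
}
```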

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 15:08:04 -07:00
H.J. Lu
83c5b36822 x86-64: Require BMI2 for strchr-avx2.S
Since strchr-avx2.S updated by

commit 1f745ecc21
Author: noah <goldstein.w.n@gmail.com>
Date:   Wed Feb 3 00:38:59 2021 -0500

    x86-64: Refactor and improve performance of strchr-avx2.S

uses sarx:

c4 e2 72 f7 c0       	sarx   %ecx,%eax,%eax

for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and
ifunc-avx2.h.
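Roughly the selection logic this implies, as a sketch that uses the GCC/Clang builtin `__builtin_cpu_supports` instead of glibc's internal ifunc-avx2.h macros. The `strlen_*_sketch` variants are placeholders that simply call strlen.

```c
#include <stddef.h>
#include <string.h>

typedef size_t strlen_fn(const char *);

/* Placeholders for the optimized variants.  */
static size_t strlen_avx2_sketch(const char *s) { return strlen(s); }
static size_t strlen_sse2_sketch(const char *s) { return strlen(s); }

/* Pick the AVX2 variant only when both AVX2 and BMI2 are present,
   because that code uses the BMI2 instruction SARX.  */
static strlen_fn *select_strlen(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("bmi2"))
    return strlen_avx2_sketch;
#endif
  return strlen_sse2_sketch;
}
```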
2021-04-19 11:01:45 -07:00
H.J. Lu
55bf411b45 x86-64: Require BMI2 for __strlen_evex and __strnlen_evex
Since __strlen_evex and __strnlen_evex added by

commit 1fd8c163a8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 5 06:24:52 2021 -0800

    x86-64: Add ifunc-avx2.h functions with 256-bit EVEX

use sarx:

c4 e2 6a f7 c0       	sarx   %edx,%eax,%eax

require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c.
ifunc-avx2.h already requires BMI2 for EVEX implementation.
2021-04-19 07:51:33 -07:00
noah
1a8605b6cd x86: Update large memcpy case in memmove-vec-unaligned-erms.S
No Bug. This commit updates the large memcpy case (no overlap). The
update is to perform memcpy on either 2 or 4 contiguous pages at
once. This 1) helps to alleviate the effects of false memory aliasing
when destination and source have a close 4k alignment and 2) in most
cases and for most DRAM units is a modestly more efficient access
pattern. These changes are a clear performance improvement for
VEC_SIZE=16/32, though more ambiguous for VEC_SIZE=64. test-memcpy,
test-memccpy, test-mempcpy, test-memmove, and tst-memmove-overflow all
pass.
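A simplified C model of the access pattern, assuming 4 KiB pages and 4 pages per block. `copy_interleaved`, `CHUNK` and the per-chunk memcpy are illustrative stand-ins for the vector loads and stores in memmove-vec-unaligned-erms.S, not its actual loop structure.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096
#define PAGES_PER_BLOCK 4   /* the commit uses 2 or 4 */
#define CHUNK 64            /* stands in for one group of vector moves */

/* Walk PAGES_PER_BLOCK pages in lockstep: each step copies one CHUNK from
   every page in the block before advancing within the pages.  */
static void copy_interleaved(unsigned char *dst, const unsigned char *src, size_t len)
{
  const size_t block = (size_t) SKETCH_PAGE_SIZE * PAGES_PER_BLOCK;
  for (size_t base = 0; base + block <= len; base += block)
    for (size_t off = 0; off < SKETCH_PAGE_SIZE; off += CHUNK)
      for (size_t p = 0; p < PAGES_PER_BLOCK; p++)
        {
          size_t i = base + p * SKETCH_PAGE_SIZE + off;
          memcpy(dst + i, src + i, CHUNK);
        }
  /* Copy whatever does not fill a whole block in the ordinary way.  */
  size_t done = len - len % block;
  memcpy(dst + done, src + done, len - done);
}
```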

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-16 10:06:56 -07:00
Szabolcs Nagy
55c9f32380 x86_64: Remove lazy tlsdesc relocation related code
_dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for
lazy tlsdesc relocation processing which is no longer supported.
2021-04-15 09:47:47 +01:00
Szabolcs Nagy
8f7e09f4db x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This is similar to commit b7cf203b5c
for aarch64, but it fixes a different race: bug 27137.

Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to
relocate tlsdesc lazily, but that does not work in a BIND_NOW module
due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at
load time fixes this bug 27721 too.
2021-04-15 09:47:37 +01:00
Paul Zimmermann
43576de04a Improve the accuracy of tgamma (BZ #26983)
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).
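For readers unfamiliar with ulps: one common way to measure the distance between two doubles in ulps is to map their bit patterns to integers that increase with the value. This sketch shows the idea for finite values. `ordered_bits` and `ulp_distance` are hypothetical helpers, not glibc's libm-test code, which measures errors against the exact result and differs in detail.

```c
#include <stdint.h>
#include <string.h>

/* Map a finite double to an integer that increases monotonically with its
   value, so the difference of two mapped values counts the representable
   doubles between them.  */
static int64_t ordered_bits(double x)
{
  int64_t i;
  memcpy(&i, &x, sizeof i);
  /* Negative doubles are sign-magnitude: map them to -magnitude.  */
  return i < 0 ? INT64_MIN - i : i;
}

static uint64_t ulp_distance(double a, double b)
{
  int64_t ia = ordered_bits(a), ib = ordered_bits(b);
  return ia > ib ? (uint64_t) ia - (uint64_t) ib
                 : (uint64_t) ib - (uint64_t) ia;
}
```

Under this measure, a result that is "9 ulps" off has eight representable doubles strictly between it and the correctly rounded value.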

Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-07 13:23:39 +02:00
Paul Zimmermann
9acda61d94 Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.

The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average.  Two different algorithms are used:

* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
  degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)

* for large inputs, an asymptotic formula from [1] is used

[1] Fast and Accurate Bessel Function Computation,
    John Harrison, Proceedings of Arith 19, 2009.

Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).

Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-02 06:15:48 +02:00
Sunil K Pandey
595c22ecd8 x86-64: Fix ifdef indentation in strlen-evex.S
Fix some indentations of ifdef in file strlen-evex.S which are off by 1
and confusing to read.
2021-04-01 16:13:33 -07:00
H.J. Lu
b1ec623ed5 x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591]
config/i386/constraints.md in GCC has

(define_constraint "e"
  "32-bit signed integer constant, or a symbolic reference known
   to fit that range (for immediate operands in sign-extending x86-64
   instructions)."
  (match_operand 0 "x86_64_immediate_operand"))

Since movq takes a signed 32-bit immediate or a register source operand,
use "er", instead of "nr"/"ir", constraint for 32-bit signed integer
constant or register on movq.
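A small sketch of the constraint in use, assuming GCC or Clang on x86-64. `store64` is a hypothetical helper, not glibc's THREAD_SETMEM. With "er", a constant that fits in a sign-extended 32-bit immediate is emitted as `$imm`; anything else, such as 0x100000000, is loaded into a register first.

```c
#include <stdint.h>

/* Store a 64-bit value with an explicit movq.  "n"/"i" would also accept
   constants such as 0x100000000 that movq cannot encode as an immediate;
   "e" accepts only sign-extendable 32-bit constants, "r" covers the rest.  */
static inline void store64(uint64_t *p, uint64_t v)
{
#if defined(__x86_64__) && defined(__GNUC__)
  __asm__ ("movq %1, %0" : "=m" (*p) : "er" (v));
#else
  *p = v;  /* portable fallback */
#endif
}
```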

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-04-01 07:00:22 -07:00
H.J. Lu
e4fda46310 x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
4e2d8f3527 x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
7ebba91361 x86-64: Add AVX optimized string/memory functions for RTM
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with

	xtest
	jz	1f
	vzeroall
	ret
1:
	vzeroupper
	ret

at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
2021-03-29 07:40:17 -07:00
H.J. Lu
91264fe357 x86-64: Add memcmp family functions with 256-bit EVEX
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
1b968b6b9b x86-64: Add memset family functions with 256-bit EVEX
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM
abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
63ad43566f x86-64: Add memmove family functions with 256-bit EVEX
Update ifunc-memmove.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
525bc2a32c x86-64: Add strcpy family functions with 256-bit EVEX
Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
1fd8c163a8 x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.

For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.
2021-03-29 07:40:17 -07:00
H.J. Lu
3e2f285c5f nptl: Remove MULTI_PAGE_ALIASING [BZ #23554]
MULTI_PAGE_ALIASING was introduced to mitigate an aliasing issue on
Pentium 4.  It is no longer needed for processors after Pentium 4.
2021-03-19 15:04:17 -07:00
Wilco Dijkstra
47ad14d789 math: Remove mpa files [BZ #15267]
Finally remove all mpa related files, headers, declarations, probes, unused
tables and update makefiles.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Wilco Dijkstra
db3f7bb558 math: Remove slow paths from asin and acos [BZ #15267]
This patch series removes all remaining slow paths and related code.
First asin/acos, tan, atan, atan2 implementations are updated, and the final
patch removes the unused mpa files, headers and probes. Passes buildmanyglibc.

Remove slow paths from asin/acos. Add ULP annotations based on previous slow
path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Paul Zimmermann
5a051454a9 Add inputs that generate larger error bounds
(Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)
2021-02-27 06:32:11 +01:00
Florian Weimer
035c012e32 Reduce the statically linked startup code [BZ #23323]
It turns out the startup code in csu/elf-init.c has a perfect pair of
ROP gadgets (see Marco-Gisbert and Ripoll-Ripoll, "return-to-csu: A
New Method to Bypass 64-bit Linux ASLR").  These functions are not
needed in dynamically-linked binaries because DT_INIT/DT_INIT_ARRAY
are already processed by the dynamic linker.  However, the dynamic
linker skipped the main program for some reason.  For maximum
backwards compatibility, this is not changed, and instead, the main
map is consulted from __libc_start_main if the init function argument
is a NULL pointer.

For statically linked binaries, the old approach based on linker
symbols is still used because there is nothing else available.

A new symbol version __libc_start_main@@GLIBC_2.34 is introduced because
new binaries running on an old libc would not run their ELF
constructors, leading to difficult-to-debug issues.
2021-02-25 12:13:02 +01:00
H.J. Lu
89de9d3958 x86: Use x86/nptl/pthreaddef.h
1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h.
2. Remove sysdeps/x86_64/nptl/pthreaddef.h.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-02-22 15:52:56 -08:00
noah
1f745ecc21 x86-64: Refactor and improve performance of strchr-avx2.S
No bug. Just seemed the performance could be improved a bit. Observed
and expected behavior are unchanged. Optimized body of main
loop. Updated page cross logic and optimized accordingly. Made a few
minor instruction selection modifications. No regressions in test
suite. Both test-strchrnul and test-strchr passed.
2021-02-08 11:21:33 -08:00
Sajan Karumanchi
6e02b3e932 x86: Adding an upper bound for Enhanced REP MOVSB.
In the process of optimizing memcpy for AMD machines, we have found the
vector move operations are outperforming enhanced REP MOVSB for data
transfers above the L2 cache size on Zen3 architectures.
To handle this use case, we are adding an upper bound parameter on
enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'.
As per large-bench results, we are configuring this parameter to the
L2 cache size for AMD machines and applicable from Zen3 architecture
supporting the ERMS feature.
For architectures other than AMD, it is the computed value of
non-temporal threshold parameter.
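The resulting three-way choice can be sketched as follows. The struct, enum and function names are hypothetical; only the ordering of the two thresholds comes from the description above.

```c
#include <stddef.h>

enum copy_strategy { COPY_VECTOR, COPY_REP_MOVSB, COPY_LARGE_VECTOR };

/* Hypothetical mirror of the two tunable thresholds.  */
struct copy_thresholds
{
  size_t rep_movsb_threshold;       /* start using rep movsb at this size */
  size_t rep_movsb_stop_threshold;  /* stop using it at this size */
};

static enum copy_strategy choose_strategy(size_t n, const struct copy_thresholds *t)
{
  if (n < t->rep_movsb_threshold)
    return COPY_VECTOR;
  if (n < t->rep_movsb_stop_threshold)
    return COPY_REP_MOVSB;
  /* Above the stop threshold (the L2 size on Zen3, otherwise the
     non-temporal threshold), vector moves are used again.  */
  return COPY_LARGE_VECTOR;
}
```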

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
2021-02-02 12:42:15 +01:00
Szabolcs Nagy
374cef32ac configure: Check for static PIE support
Add SUPPORT_STATIC_PIE that targets can define if they support
static PIE. This requires PI_STATIC_AND_HIDDEN support and various
linker features as described in

  commit 9d7a3741c9
  Add --enable-static-pie configure option to build static PIE [BZ #19574]

Currently defined on x86_64, i386 and aarch64 where static PIE is
known to work.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-21 15:54:50 +00:00
H.J. Lu
ff6d62e9ed <sys/platform/x86.h>: Remove the C preprocessor magic
In <sys/platform/x86.h>, define CPU features as enum instead of using
the C preprocessor magic to make it easier to wrap this functionality
in other languages.  Move the C preprocessor magic to internal header
for better GCC codegen when more than one feature is checked in a
single expression as in x86-64 dl-hwcaps-subdirs.c.

1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX.
2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h.
3. Remove struct cpu_features and __x86_get_cpu_features from
<sys/platform/x86.h>.
4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it
in libc.
5. Make __get_cpu_features() private to glibc.
6. Replace __x86_get_cpu_features(N) with __get_cpu_features().
7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE.
8. Use a single enum index for each CPU feature detection.
9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf.
10. Return zero struct cpuid_feature for the older glibc binary with a
smaller CPUID_INDEX_MAX [BZ #27104].
11. Inside glibc, use the C preprocessor magic so that cpu_features data
can be loaded just once leading to more compact code for glibc.

256 bits are used for each CPUID leaf.  Some leaves only contain a few
features.  We can add exceptions to such leaves.  But it will increase
code sizes and it is harder to provide backward/forward compatibilities
when new features are added to such leaves in the future.

When new leaves are added, _rtld_global_ro offsets will change which
leads to a race condition during in-place updates. We may avoid in-place
updates by

1. Rename the old glibc.
2. Install the new glibc.
3. Remove the old glibc.

NB: A function, __x86_get_cpuid_feature_leaf, is used to avoid the copy
relocation issue with IFUNC resolver as shown in IFUNC resolver tests.
2021-01-21 05:58:17 -08:00
H.J. Lu
0ec583d926 libmvec: Add extra-test-objs to test-extras
Add extra-test-objs to test-extras so that they are compiled with
-DMODULE_NAME=testsuite instead of -DMODULE_NAME=libc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-19 06:20:46 -08:00
Adhemerval Zanella
d18f59bf92 Fix x86 build with --enable-tunable=no
Checked on x86_64-linux-gnu.
2021-01-14 16:04:05 -03:00
H.J. Lu
ecce11aa07 x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717]
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
levels:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with
GNU_PROPERTY_X86_ISA_1_V[234] marker:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13

Binutils support for the GNU_PROPERTY_X86_ISA_1_V[234] marker was added by

commit b0ab06937385e0ae25cebf1991787d64f439bf12
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Oct 30 06:49:57 2020 -0700

    x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker

and

commit 32930e4edbc06bc6f10c435dbcc63131715df678
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Oct 9 05:05:57 2020 -0700

    x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker

The GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicates the
micro-architecture ISA level required to execute the binary.  The marker
must be added by programmers explicitly in one of 3 ways:

1. Pass -mneeded to GCC.
2. Add the marker in the linker inputs as this patch does.
3. Pass -z x86-64-v[234] to the linker.

Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker support to ld.so if binutils 2.32 or newer is used to build glibc:

1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
markers to elf.h.
2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker to abi-note.o based on the ISA level used to compile abi-note.o,
assuming that the same ISA level is used to compile the whole glibc.
3. Add isa_1 to cpu_features to record the supported x86 ISA level.
4. Rename _dl_process_cet_property_note to _dl_process_property_note and
add GNU_PROPERTY_X86_ISA_1_V[234] marker detection.
5. Update _rtld_main_check and _dl_open_check to check loaded objects
with the incompatible ISA level.
6. Add a testcase to verify that dlopening an x86-64-v4 shared object fails
on lesser platforms.
7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c.
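The compatibility check in steps 5 and 6 boils down to a subset test on ISA-level bits. A sketch, with local stand-ins for the GNU_PROPERTY_X86_ISA_1_* values (check <elf.h> for the authoritative definitions); `isa_level_ok` is a hypothetical name, not the ld.so function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the GNU_PROPERTY_X86_ISA_1_* bits.  */
#define ISA_1_BASELINE (1U << 0)
#define ISA_1_V2       (1U << 1)
#define ISA_1_V3       (1U << 2)
#define ISA_1_V4       (1U << 3)

/* An object may be loaded only if every ISA level it needs is supported
   by the running CPU.  */
static bool isa_level_ok(uint32_t needed, uint32_t supported)
{
  return (needed & ~supported) == 0;
}
```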

Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and
x86-64-v4 machines.

Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine
and got:

[hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1
./elf/tst-isa-level-1: CPU ISA level is lower than required
[hjl@gnu-cfl-2 build-x86_64-linux]$
2021-01-07 13:10:13 -08:00
Wilco Dijkstra
9e97f239ea Remove dbl-64/wordsize-64 (part 2)
Remove the wordsize-64 implementations by merging them into the main dbl-64
directory.  The second patch just moves all wordsize-64 files and removes a
few wordsize-64 uses in comments and Implies files.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-07 15:26:26 +00:00
H.J. Lu
6ea5b57afa x86: Check IFUNC definition in unrelocated executable [BZ #20019]
Calling an IFUNC function defined in an unrelocated executable also leads to
a segfault.  Issue a fatal error message when calling an IFUNC function defined
in the unrelocated executable from a shared library.
2021-01-04 12:01:01 -08:00
H.J. Lu
3ec5d83d2a x86-64: Avoid rep movsb with short distance [BZ #27130]
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow.  This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:

	cmpl	$63, %ecx
	// Don't use "rep movsb" if ECX <= 63
	jbe	L(Don't use rep movsb)
	Use "rep movsb"

Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its
performance impact is within noise range as "rep movsb" is only used for
data size >= 4KB.
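The same test written in C, as a sketch: it takes the absolute distance between the pointers and, like `cmpl $63, %ecx`, looks only at its low 32 bits. `rep_movsb_ok` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return false when the low 32 bits of |dst - src| are <= 63, i.e. the
   distance is N*4GB + [0..63], the case where "rep movsb" may be slow.  */
static bool rep_movsb_ok(const void *dst, const void *src)
{
  uintptr_t d = (uintptr_t) dst, s = (uintptr_t) src;
  uint32_t dist = (uint32_t) (d > s ? d - s : s - d);  /* like ECX */
  return dist > 63;
}
```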
2021-01-04 07:58:57 -08:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Siddhesh Poyarekar
94547d9209 x86 long double: Support pseudo numbers in isnanl
This syncs up isnanl behaviour with gcc.  Also move the isnanl
implementation to sysdeps/x86 and remove the sysdeps/x86_64 version.
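What counts as NaN here can be sketched on the raw x87 80-bit fields. `ld80_isnan_like` is a hypothetical helper that takes the sign/exponent word and the 64-bit significand separately so it stays portable. It reflects our reading of the change (a non-zero exponent with a clear integer bit is treated like NaN, because the FPU rejects such operands as invalid), not glibc's isnanl source.

```c
#include <stdbool.h>
#include <stdint.h>

/* x87 extended format: sign bit, 15-bit exponent, 64-bit significand with
   an explicit integer bit (bit 63).  */
static bool ld80_isnan_like(uint16_t sign_exp, uint64_t significand)
{
  unsigned exp = sign_exp & 0x7fff;
  bool int_bit = (significand >> 63) & 1;
  uint64_t fraction = significand & UINT64_C(0x7fffffffffffffff);

  /* Ordinary NaN: all-ones exponent, integer bit set, nonzero fraction.  */
  if (exp == 0x7fff && int_bit && fraction != 0)
    return true;
  /* Pseudo-NaN, pseudo-infinity and unnormals: nonzero exponent but a
     clear integer bit.  */
  if (exp != 0 && !int_bit)
    return true;
  return false;  /* normals, infinities, zeros, (pseudo-)denormals */
}
```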

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-24 06:05:40 +05:30
Siddhesh Poyarekar
b7f8815617 x86 long double: Support pseudo numbers in fpclassifyl
Also move sysdeps/i386/fpu/s_fpclassifyl.c to
sysdeps/x86/fpu/s_fpclassifyl.c and remove
sysdeps/x86_64/fpu/s_fpclassifyl.c

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-24 06:05:26 +05:30
Paul Zimmermann
cad5ad81d2 add inputs to auto-libm-test-in yielding larger errors (binary64, x86_64) 2020-12-21 10:35:20 +05:30
Anssi Hannula
69a7ca7705 ieee754: Remove unused __sin32 and __cos32
The __sin32 and __cos32 functions were only used in the now removed slow
path of asin and acos.
2020-12-18 12:10:31 +05:30
Florian Weimer
f267e1c9dd x86_64: Add glibc-hwcaps support
The subdirectories match those in the x86-64 psABI:

77566eb03b

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-04 09:36:02 +01:00
Jakub Jelinek
1d9cbb9608 x86: Fix THREAD_SELF definition to avoid ld.so crash (bug 27004)
The previous definition of THREAD_SELF did not tell the compiler
that %fs (or %gs) usage is invalid for the !DL_LOOKUP_GSCOPE_LOCK
case in _dl_lookup_symbol_x.  As a result, ld.so could try to use the
TCB before it was initialized.

As the comment in tls.h explains, asm volatile is undesirable here.
Using the __seg_fs (or __seg_gs) namespace does not interfere with
optimization, and expresses that THREAD_SELF is potentially trapping.
2020-12-03 13:48:55 +01:00
Florian Weimer
1daccf403b nptl: Move stack list variables into _rtld_global
Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT,
formerly __wait_lookup_done) can be implemented directly in ld.so,
eliminating the unprotected GL (dl_wait_lookup_done) function
pointer.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-11-16 19:33:30 +01:00
Adhemerval Zanella
01bd62517c Remove tls.h inclusion from internal errno.h
The tls.h inclusion is not really required and limits possible
definitions in more arch-specific headers.

This is a cleanup to allow inline functions in sysdep.h, more
specifically on i386 and ia64, which require access to some TLS
definitions of their own.

No semantic changes expected, checked with a build against all
affected ABIs.
2020-11-13 12:59:19 -03:00
Florian Weimer
0f34d426ac x86: Remove UP macro. Define LOCK_PREFIX unconditionally.
The UP macro is never defined.  Also define LOCK_PREFIX
unconditionally, to the same string.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-11-13 15:20:03 +01:00
Samuel Thibault
3d3316b1de hurd: keep only required PLTs in ld.so
We need NO_RTLD_HIDDEN because of the need for PLT calls in ld.so.
See Roland's comment in
https://sourceware.org/bugzilla/show_bug.cgi?id=15605
"in the Hurd it's crucial that calls like __mmap be the libc ones
instead of the rtld-local ones after the bootstrap phase, when the
dynamic linker is being used for dlopen and the like."

We used to just avoid all hidden use in the rtld; this commit switches to
keeping only those that should use PLT calls, i.e. essentially those defined in
sysdeps/mach/hurd/dl-sysdep.c:

__assert_fail
__assert_perror_fail
__*stat64
_exit

This fixes a few startup issues, notably the call to __tunable_get_val that is
made before PLTs are set up.
2020-11-11 02:36:22 +01:00
H.J. Lu
0f09154c64 x86: Initialize CPU info via IFUNC relocation [BZ 26203]
X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by a static executable, DL_PLATFORM_INIT is never called.  Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Since some fields in _rtld_global_ro
in ld.so are initialized by dynamic relocation, we can also initialize
x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so
by initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.

Key points:

1. IFUNC is always supported, independent of --enable-multi-arch or
--disable-multi-arch.  Linker generates IFUNC relocations from input
IFUNC objects and ld.so performs IFUNC relocations.
2. There are no IFUNC dependencies in ld.so before dynamic relocation
has been performed.
3. The x86 CPU features in ld.so are initialized by DL_PLATFORM_INIT
in dynamic executable and by IFUNC relocation in dlopen in static
executable.
4. The x86 cache info in libc.o is initialized by IFUNC relocation.
5. In libc.a, both x86 CPU features and cache info are initialized from
ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init
is called.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
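The IFUNC trick can be illustrated with GCC's `ifunc` attribute: the resolver runs while relocations are processed, before constructors, and can initialize data as a side effect. This assumes an ELF target whose dynamic linker supports IFUNC (for example glibc on Linux). `cache_line_size`, the resolver and the value 64 are placeholders, not what glibc computes.

```c
static int cache_line_size;  /* filled in by the resolver */

static int get_line_size_impl(void) { return cache_line_size; }

/* Runs during relocation processing: initialize data, then return the
   implementation the symbol should bind to.  */
static int (*resolve_get_line_size(void))(void)
{
  cache_line_size = 64;  /* placeholder for real detection */
  return get_line_size_impl;
}

int get_line_size(void) __attribute__((ifunc("resolve_get_line_size")));
```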
2020-10-16 16:17:53 -07:00
Szabolcs Nagy
238032ead6 aarch64: enforce >=64K guard size [BZ #26691]
There are several compiler implementations that allow large stack
allocations to jump over the guard page at the end of the stack and
corrupt memory beyond that. See CVE-2017-1000364.

Compilers can emit code to probe the stack such that the guard page
cannot be skipped, but on aarch64 the probe interval is 64K by default
instead of the minimum supported page size (4K).

This patch enforces at least 64K guard on aarch64 unless the guard
is disabled by setting its size to 0.  For backward compatibility
reasons the increased guard is not reported, so it is only observable
by exhausting the address space or parsing /proc/self/maps on Linux.

On other targets the patch has no effect. If the stack probe interval
is larger than a page size on a target then ARCH_MIN_GUARD_SIZE can
be defined to get a large enough stack guard on libc-allocated stacks.

The patch does not affect threads with user allocated stacks.
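The rule described above can be sketched as a small function. `MIN_GUARD_SKETCH` is a hypothetical stand-in for ARCH_MIN_GUARD_SIZE on aarch64, and `effective_guard` is not glibc's code; page_size is assumed to be a power of two.

```c
#include <stddef.h>

/* Hypothetical stand-in for ARCH_MIN_GUARD_SIZE: 64 KiB.  */
#define MIN_GUARD_SKETCH ((size_t) 64 * 1024)

/* Effective guard for a libc-allocated stack: 0 still disables the guard;
   any other size is raised to the minimum and rounded up to whole pages.  */
static size_t effective_guard(size_t requested, size_t page_size)
{
  if (requested == 0)
    return 0;
  size_t guard = requested < MIN_GUARD_SKETCH ? MIN_GUARD_SKETCH : requested;
  return (guard + page_size - 1) & ~(page_size - 1);
}
```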

Fixes bug 26691.
2020-10-02 09:57:44 +01:00
Florian Weimer
90ccfdf176 x86: Use one ldbl2mpn.c file for both i386 and x86_64 2020-09-22 17:58:39 +02:00
H.J. Lu
9620398097 x86: Install <sys/platform/x86.h> [BZ #26124]
Install <sys/platform/x86.h> so that programmers can do

 #if __has_include(<sys/platform/x86.h>)
 #include <sys/platform/x86.h>
 #endif
 ...

   if (CPU_FEATURE_USABLE (SSE2))
 ...
   if (CPU_FEATURE_USABLE (AVX2))
 ...

<sys/platform/x86.h> exports only:

enum
{
  COMMON_CPUID_INDEX_1 = 0,
  COMMON_CPUID_INDEX_7,
  COMMON_CPUID_INDEX_80000001,
  COMMON_CPUID_INDEX_D_ECX_1,
  COMMON_CPUID_INDEX_80000007,
  COMMON_CPUID_INDEX_80000008,
  COMMON_CPUID_INDEX_7_ECX_1,
  /* Keep the following line at the end.  */
  COMMON_CPUID_INDEX_MAX
};

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
};

/* Get a pointer to the CPU features structure.  */
extern const struct cpu_features *__x86_get_cpu_features
  (unsigned int max) __attribute__ ((const));

Since all feature checks are done through macros, programs compiled with
a newer <sys/platform/x86.h> are compatible with the older glibc binaries
as long as the layout of struct cpu_features is identical.  The features
array can be expanded with backward binary compatibility for both .o and
.so files.  When COMMON_CPUID_INDEX_MAX is increased to support new
processor features, __x86_get_cpu_features in the older glibc binaries
returns NULL and HAS_CPU_FEATURE/CPU_FEATURE_USABLE return false on the
new processor feature.  No new symbol version is needed.

Both CPU_FEATURE_USABLE and HAS_CPU_FEATURE are provided.  HAS_CPU_FEATURE
can be used to identify processor features.

Note: Although GCC has __builtin_cpu_supports, it only supports a subset
of <sys/platform/x86.h> and it is equivalent to CPU_FEATURE_USABLE.  It
doesn't support HAS_CPU_FEATURE.
2020-09-11 17:20:52 -07:00
Ondřej Hošek
23af890b3f x86-64: Fix FMA4 detection in ifunc [BZ #26534]
A typo in commit 107e6a3c22 causes the
FMA4 code path to be taken on systems that support FMA, even if they do
not support FMA4. Fix this to detect FMA4.
2020-09-02 05:07:37 -07:00
Adhemerval Zanella
5ff35e9544 math: Update x86_64 ulps
From new j0 test.
2020-08-08 16:43:11 -03:00
Andreas K. Hüttel
180d5a045f Update x86-64 libm-test-ulps
x86_64 Intel(R) Core(TM) i5-8265U
gcc (Gentoo 10.1.0-r2 p3) 10.1.0
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-07-25 17:10:53 -04:00
H.J. Lu
107e6a3c22 x86: Support usable check for all CPU features
Support usable check for all CPU features with the following changes:

1. Change struct cpu_features to

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};

so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits.  EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.

The results are

1. If the feature is known and doesn't require OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to be supported by the OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
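Per 32-bit CPUID register, steps 2 to 4 amount to bit masking. A sketch with hypothetical masks (`known`, `needs_os`, `os_enabled`) rather than glibc's actual feature tables:

```c
#include <stdint.h>

static uint32_t compute_usable(uint32_t cpuid, uint32_t known,
                               uint32_t needs_os, uint32_t os_enabled)
{
  /* Steps 2-3: copy the known bits, then clear those needing OS support.  */
  uint32_t usable = cpuid & known & ~needs_os;
  /* Step 4: restore an OS-dependent bit only if the OS supports it.  */
  usable |= cpuid & known & needs_os & os_enabled;
  return usable;
}
```

For example, with bit 0 an SSE2-like feature, bit 1 an AVX-like feature that needs OS state saving, and bit 2 an unknown bit, a CPUID value of 0x7 yields 0x3 when the OS enables bit 1 and 0x1 when it does not.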
2020-07-13 06:05:16 -07:00
H.J. Lu
9016b6f389 x86: Remove the unused __x86_prefetchw
Since

commit c867597bff
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jun 8 13:57:50 2016 -0700

    X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

removed the only usage of __x86_prefetchw, we can remove the unused
__x86_prefetchw.
2020-07-11 09:34:03 -07:00
H.J. Lu
3f4b61a0b8 x86: Add thresholds for "rep movsb/stosb" to tunables
Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables
to update thresholds for "rep movsb" and "rep stosb" at run-time.

Note that the user specified threshold for "rep movsb" smaller than
the minimum threshold will be ignored.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-07-06 11:48:42 -07:00
Adhemerval Zanella
b24381e50f i386: Use builtin sqrtl
Checked on i686-linux-gnu.
2020-06-22 11:09:49 -03:00
Adhemerval Zanella
d19d25dd06 x86_64: Use builtin sqrt{f,l}
Checked on x86_64-linux-gnu.
2020-06-22 11:09:49 -03:00
Sunil K Pandey
75870237ff Fix avx2 strncmp offset compare condition check [BZ #25933]
strcmp-avx2.S: In the AVX2 strncmp function, strings are compared in
chunks of 4 vector sizes (i.e. 32x4 = 128 bytes for AVX2). After the
first 4-vector comparison, the code must check whether it has already
passed the given offset. This patch implements the AVX2 offset check
condition for strncmp when both strings compare equal for the first 4
vector sizes.
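The loop below is a rough scalar model of the control flow, not the AVX2 assembly. It compares 128-byte chunks and re-checks the length limit before every new chunk. CHUNK and chunked_strncmp are hypothetical names.

```c
#include <stddef.h>

/* 4 vectors of 32 bytes, as in the AVX2 code.  */
#define CHUNK 128

/* Scalar model of a chunked strncmp.  */
static int
chunked_strncmp (const char *s1, const char *s2, size_t n)
{
  size_t offset = 0;

  /* Before starting each new chunk, stop if the limit N has been reached.
     Without this check, strings that match for a whole chunk would be
     compared beyond N.  */
  while (offset < n)
    {
      /* Do not let the chunk extend past N.  */
      size_t end = (n - offset > CHUNK) ? offset + CHUNK : n;

      for (size_t i = offset; i < end; i++)
        {
          unsigned char c1 = (unsigned char) s1[i];
          unsigned char c2 = (unsigned char) s2[i];
          if (c1 != c2)
            return c1 - c2;
          if (c1 == '\0')
            return 0;  /* Both strings ended together.  */
        }
      offset = end;
    }
  return 0;
}
```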
2020-06-17 07:07:38 -07:00
H.J. Lu
a35a59036e x86_64: Use %xmmN with vpxor to clear a vector register
Since "vpxor %xmmN, %xmmN, %xmmN" clears the whole vector register, use
%xmmN, instead of %ymmN, with vpxor to clear a vector register.
2020-06-17 05:44:02 -07:00
Vineet Gupta
8dbb7a08ec dl-runtime: reloc_{offset,index} now functions arch overide'able
The existing macros are fragile and expect local variables with a
certain name. Fix this by defining them as functions with a default
implementation in a new header, dl-runtime.h, which architectures can
override if needed.

This came up during the ARC port review, hence the need for the pltgot
argument in reloc_index(), which is not needed by existing ports.

This patch potentially affects only the hppa and x86 ports; it was
build-tested for both of those configurations and a few more.
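Below is a sketch of what generic default helpers of this kind could look like. It assumes the generic convention that the value handed over by the PLT trampoline is a byte offset into the relocation table. The exact signatures in glibc's dl-runtime.h may differ. A port whose trampoline passes an index instead would override both functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic default: the argument passed from the PLT trampoline is
   already a byte offset into the PLT relocation table.  PLTGOT is
   unused here; it exists for ports (such as ARC) that need it.  */
static inline uintptr_t
reloc_offset (uintptr_t pltgot, uintptr_t pltn)
{
  (void) pltgot;
  return pltn;
}

/* Generic default: turn that offset into a relocation index, where
   SIZE is the size of one relocation entry.  */
static inline uintptr_t
reloc_index (uintptr_t pltgot, uintptr_t pltn, size_t size)
{
  return reloc_offset (pltgot, pltn) / size;
}
```

With 24-byte relocation entries, an offset of 48 maps to index 2.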

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-06-05 13:45:46 -07:00
Florian Weimer
501bdb5dd6 Linux: Remove remnants of the getcpu cache
The getcpu cache was removed from the kernel in Linux 2.6.24.  glibc
support from the sched_getcpu implementation was removed in commit
dd26c44403 ("Consolidate sched_getcpu").
2020-05-16 15:47:51 +02:00
H.J. Lu
55c7bcc71b x86-64: Use RDX_LP on __x86_shared_non_temporal_threshold [BZ #25966]
Since __x86_shared_non_temporal_threshold is defined as

long int __x86_shared_non_temporal_threshold;

and long int is 4 bytes for x32, use RDX_LP to compare against
__x86_shared_non_temporal_threshold in assembly code.
2020-05-09 12:28:15 -07:00
Joseph Myers
dbb188dd87 Remove unused floating-point configuration from gmp-impl.h.
This patch removes the IEEE_DOUBLE_BIG_ENDIAN and
IEEE_DOUBLE_MIXED_ENDIAN macros from gmp-impl.h and gmp-mparam.h, and
the ieee_double_extract union from gmp-impl.h.  The macros were used
only in defining the union, which was used nowhere in glibc.  As GMP's
gmp-impl.h is over 5000 lines, the file in glibc is so far from the
GMP version that it doesn't seem to make sense to keep things there
that are not relevant in glibc.  (I expect there is plenty more in the
header after this patch that is also not relevant in glibc and can be
cleaned up later.)

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.
2020-04-28 15:05:09 +00:00
Adhemerval Zanella
f721171632 Revert "x86_64: Add SSE sfp-exceptions"
This __sfp_handle_exceptions is not fully correct regarding raising
exceptions, since there is no direct way to raise only FP_EX_OVERFLOW
or only FP_EX_UNDERFLOW in SSE mode.  Both libgcc and feraiseexcept rely
on x87 mode to accomplish it.

This reverts commit 460ee50de0.

Checked on x86_64.
2020-04-20 14:56:05 -03:00
Adhemerval Zanella
460ee50de0 x86_64: Add SSE sfp-exceptions
The exported x86_64 fenv.h functions operate on both i387 and SSE (since
they should work on float, double, and long double) while the
internal libc_fe* functions set either SSE (float, double, and float128)
or i387 (long double).

The libgcc __sfp_handle_exceptions (used by the float128 implementation),
however, will set either an SSE or an i387 exception depending on the
exception to raise.  This broke the internal assumption of float128
that only SSE operations will be used.

This patch reimplements the libgcc __sfp_handle_exceptions to use only
SSE operations and sets libgcc to use it instead of its own
implementation.

And I think we should fix libgcc in a similar manner, since, judging by
config/i386/64/sfp-machine.h, it already supports only the SSE rounding
mode, and the x86_64 ABI also expects float128 to use SSE registers [1]
(although it is not clear how future implementations might implement
it).

Checked on x86_64-linux-gnu.

[1] https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI
2020-04-17 11:42:29 -03:00
Adhemerval Zanella
17fd707f88 nptl: Remove x86_64 cancellation assembly implementations [BZ #25765]
All cancellable syscalls are done by C implementations, so there is no
need to use a specialized implementation to optimize register usage.

It fixes BZ #25765.

Checked on x86_64-linux-gnu.
2020-04-03 10:47:59 -03:00
Paul Zimmermann
a9d42c09a3 math: Add inputs that yield larger errors for float type (x86_64)
The corner cases included were generated using exhaustive search
for all float/binary32 values on x86_64 (comparing to MPFR for
correct rounding to nearest).

For the j0/j1/y0 functions, only cases with ulp error <= 9 were
included.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-03-31 21:48:54 -04:00
Adhemerval Zanella
1c15464ca0 math: Remove inline math tests
With mathinline removal there is no need to keep building and testing
inline math tests.

The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to no longer have the
i{float,double,ldouble} entries.  The support for no-test-inline is
also removed from gen-auto-libm-tests, and the auto-libm-test-out-*
files were regenerated.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2020-03-19 11:45:44 -03:00
Florian Weimer
fe49a73316 x86: Avoid single-argument _Static_assert in <tls.h>
Older GCC versions do not support this extension.  Fixes commit f1bdee6179
("x86 tls: Use _Static_assert for TLS access size assertion").
2020-02-17 11:12:03 +01:00
Samuel Thibault
f1bdee6179 x86 tls: Use _Static_assert for TLS access size assertion 2020-02-17 00:40:39 +01:00
Florian Weimer
3a0ecccb59 ld.so: Do not export free/calloc/malloc/realloc functions [BZ #25486]
Exporting functions and relying on symbol interposition from libc.so
makes the choice of implementation dependent on DT_NEEDED order, which
is not what some compiler drivers expect.

This commit replaces one magic mechanism (symbol interposition) with
another one (preprocessor-/compiler-based redirection).  This makes
the hand-over from the minimal malloc to the full malloc more
explicit.

Removing the ABI symbols is backwards-compatible because libc.so is
always in scope, and the dynamic loader will find the malloc-related
symbols there since commit f0b2132b35
("ld.so: Support moving versioned symbols between sonames
[BZ #24741]").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-02-15 11:01:23 +01:00
Adhemerval Zanella
fcb78a5505 linux: Consolidate INLINE_SYSCALL
With all Linux ABIs using the expected Linux kABI to indicate
syscall errors, there is no need to replicate the INLINE_SYSCALL.

The generic Linux sysdep.h includes errno.h even for !__ASSEMBLER__,
which is fine now and allows cleaning up some archaic code that assumed
otherwise.
Checked with a build against all affected ABIs.
2020-02-14 21:09:12 -03:00
Wilco Dijkstra
220622dde5 Add libm_alias_finite for _finite symbols
This patch adds a new macro, libm_alias_finite, to define all _finite
symbols.  It sets every _finite symbol as a compat symbol based on its
first version (obtained from the definition in the build-generated
first-versions.h).

The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redefinition,
in sysdeps/ieee754/float128/float128_private.h.

Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).

Passes build-many-glibcs.py.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2020-01-03 10:02:04 -03:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Stefan Liebler
1c94bf0f0a Always use wordsize-64 version of s_trunc.c.
This patch replaces s_trunc.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is unchanged apart from code-style changes.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:14 +01:00
Stefan Liebler
9f234eafe8 Always use wordsize-64 version of s_ceil.c.
This patch replaces s_ceil.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is unchanged apart from code-style changes.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:13 +01:00
Stefan Liebler
95b0c2c431 Always use wordsize-64 version of s_floor.c.
This patch replaces s_floor.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is unchanged apart from code-style changes.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:12 +01:00
Stefan Liebler
ab48bdd098 Always use wordsize-64 version of s_rint.c.
This patch replaces s_rint.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is unchanged apart from code-style changes.

Also adjusted the include path in x86_64 file.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:12 +01:00
Stefan Liebler
af123aa950 Always use wordsize-64 version of s_nearbyint.c.
This patch replaces s_nearbyint.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is unchanged apart from code-style changes.

Also adjusted the include path in x86_64 file.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:11 +01:00
Florian Weimer
4db71d2f98 elf: Do not run IFUNC resolvers for LD_DEBUG=unused [BZ #24214]
This commit adds missing skip_ifunc checks to aarch64, arm, i386,
sparc, and x86_64.  A new test case ensures that IRELATIVE IFUNC
resolvers do not run in various diagnostic modes of the dynamic
loader.

Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>
2019-12-02 14:55:22 +01:00
Adhemerval Zanella
48dbce60cf nptl: Add tests for internal pthread_rwlock_t offsets
This patch adds new build tests to check the internal field offsets of
the internal pthread_rwlock_t definition.  Although the '__data.__flags'
field layout should be preserved due to static initializers, the patch
also adds tests for the futexes that may be used in shared memory
(although using different libc versions in such a scenario is not
really supported).
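Build-time offset tests of this kind can be written with offsetof and _Static_assert. A layout change then fails compilation instead of failing at run time. The structure below is a hypothetical stand-in, not the real pthread_rwlock_t, whose fields and offsets vary by ABI.

```c
#include <stddef.h>

/* Hypothetical stand-in for an internal lock layout (illustration only).  */
struct rwlock_data_sketch
{
  unsigned int readers;
  unsigned int writers;
  unsigned int wrphase_futex;
  unsigned int writers_futex;
  unsigned char flags;
};

/* If any of these offsets changes, the build fails.  The two-argument
   form of _Static_assert is used because the one-argument form is an
   extension that older GCC versions reject.  */
_Static_assert (offsetof (struct rwlock_data_sketch, wrphase_futex) == 8,
                "wrphase_futex offset changed");
_Static_assert (offsetof (struct rwlock_data_sketch, writers_futex) == 12,
                "writers_futex offset changed");
_Static_assert (offsetof (struct rwlock_data_sketch, flags) == 16,
                "flags offset changed");
```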

Checked with a build against all affected ABIs.

Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
71d260c107 nptl: Cleanup mutex internal offset tests
The offsets of pthread_mutex_t __data.__nusers, __data.__spins,
__data.__elision, and __data.__list are not required to be constant
across releases.  Only __data.__kind is used for static
initializers.

This patch also adds an additional size check for __data.__kind.

Checked with a build against affected ABIs.

Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f
2019-11-26 13:53:36 +00:00
Wilco Dijkstra
d0007dc53c Remove x64 _finite tests and references
Remove _finite tests and references from x86_64.  Rather than calling
__exp_finite, use exp directly (since it's the same entry point).

x86_64 builds and passes testsuite.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-10-21 14:29:12 -03:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to clean up:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Wilco Dijkstra
3c05dd79d0 Use generic memset/memcpy/memmove in benchtests
Use the generic C memset/memcpy/memmove in benchtests since comparing
against a slow byte-oriented implementation makes no sense.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

2019-08-29  Wilco Dijkstra  <wdijkstr@arm.com>

	* benchtests/bench-memcpy.c (simple_memcpy): Remove.
	(generic_memcpy): Include generic C memcpy.
	* benchtests/bench-memmove.c (simple_memmove): Remove.
	(generic_memmove): Include generic C memmove.
	* benchtests/bench-memset.c (simple_memset): Remove.
	(generic_memset): Include generic C memset.
	* benchtests/bench-memset-large.c (simple_memset): Remove.
	(generic_memset): Include generic C memset.
	* benchtests/bench-memset-walk.c (simple_memset): Remove.
	(generic_memset): Include generic C memset.
	* string/memcpy.c (MEMCPY): Add defines to enable redirection.
	* string/memset.c (MEMSET): Likewise.
	* sysdeps/x86_64/memcopy.h: Remove empty file.
2019-08-30 17:21:35 +01:00
H.J. Lu
7e681561a3 x86-64: Compile branred.c with -mprefer-vector-width=128 [BZ #24603]
When compiled with -O3 and AVX, GCC 8 and 9 optimize some loops in
sysdeps/ieee754/dbl-64/branred.c with 256-bit vector instructions,
which leads to a store-forwarding stall:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

There is no easy fix in the compiler.  This patch limits the vector width
to 128 bits to work around this issue.  It improves performance of sin
and cos by more than 40% on Skylake compiled with -O3 -march=skylake.

Tested with GCC 7/8/9 on x86-64.

	[BZ #24603]
	* sysdeps/x86_64/configure.ac: Check if -mprefer-vector-width=128
	works.
	* sysdeps/x86_64/configure: Regenerated.
	* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New.  Set
	to -mprefer-vector-width=128 if supported.
2019-07-24 14:48:43 -07:00
H.J. Lu
d039da1c00 x86: Add sysdeps/x86/dl-lookupcfg.h
Since sysdeps/i386/dl-lookupcfg.h and sysdeps/x86_64/dl-lookupcfg.h are
identical, we can replace them with sysdeps/x86/dl-lookupcfg.h.

	* sysdeps/i386/dl-lookupcfg.h: Moved to ...
	* sysdeps/x86/dl-lookupcfg.h: Here.
	* sysdeps/x86_64/dl-lookupcfg.h: Removed.
2019-06-26 15:07:28 -07:00
Adhemerval Zanella
81a1443941 wcsmbs: optimize wcscat
This patch rewrites wcscat using wcslen and wcscpy.  This is similar to
the optimization done on strcat by 6e46de42fe.

The wcscpy changes are mainly to add the internal alias to avoid PLT
calls.

Checked on x86_64-linux-gnu and a build against the affected
architectures.
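The rewrite amounts to roughly the following. It is shown here as a hypothetical wcscat_sketch rather than the actual wcsmbs/wcscat.c source.

```c
#include <wchar.h>

/* Find the end of DEST once with wcslen, then let wcscpy copy SRC
   (including its terminating L'\0') to that position.  */
static wchar_t *
wcscat_sketch (wchar_t *dest, const wchar_t *src)
{
  wcscpy (dest + wcslen (dest), src);
  return dest;
}
```

Each step can then use an optimized wcslen/wcscpy instead of a hand-written byte loop.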

	* include/wchar.h (__wcscpy): New prototype.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c
	(__wcscpy): Route internal symbol to generic implementation.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy):
	Add internal __wcscpy alias.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise.
	* sysdeps/s390/wcscpy.c (wcscpy): Likewise.
	* sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise.
	* wcsmbs/wcscpy.c (wcscpy): Add
	* sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to
	use generic implementation.
	* wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.
2019-02-27 10:00:37 -03:00
Joseph Myers
32db86d558 Add fall-through comments.
This patch adds fall-through comments in some cases where -Wextra
produces implicit-fallthrough warnings.

The patch is non-exhaustive.  Apart from architecture-specific code
for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy
code, probably should have such changes, but left to be dealt with
separately), or places that already had comments about the
fall-through but not matching the form expected by
-Wimplicit-fallthrough=3 (the default level with -Wextra; my
inclination is to adjust those comments to match rather than
downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one
place where I thought the implicit fallthrough was not correct and so
should be handled separately as a bug fix.  I think the key thing to
consider in review of this patch is whether the fall-through is indeed
intended and correct in each place where such a comment is added.
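For reference, here is a hypothetical function with an intentional fall-through, annotated the way glibc code typically does it. A comment such as "Fall through." matches the form -Wimplicit-fallthrough=3 accepts, so GCC does not warn.

```c
/* Sum N + (N-1) + ... + 1 for N in 1..3, using deliberate fall-through.  */
static int
sum_down_to_one (int n)
{
  int sum = 0;
  switch (n)
    {
    case 3:
      sum += 3;
      /* Fall through.  */
    case 2:
      sum += 2;
      /* Fall through.  */
    case 1:
      sum += 1;
      break;
    default:
      break;
    }
  return sum;
}
```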

Tested for x86_64.

	* elf/dl-exception.c (_dl_exception_create_format): Add
	fall-through comments.
	* elf/ldconfig.c (parse_conf_include): Likewise.
	* elf/rtld.c (print_statistics): Likewise.
	* locale/programs/charmap.c (parse_charmap): Likewise.
	* misc/mntent_r.c (__getmntent_r): Likewise.
	* posix/wordexp.c (parse_arith): Likewise.
	(parse_backtick): Likewise.
	* resolv/ns_ttl.c (ns_parse_ttl): Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
2019-02-12 10:30:34 +00:00
Andreas Schwab
65f7767a91 Fix handling of collating elements in fnmatch (bug 17396, bug 16976)
This fixes the same bug in fnmatch that was fixed by commit 7e2f0d2d77 for
regexp matching.  As a side effect it also removes the use of an unbounded
VLA.
2019-02-04 15:45:02 +01:00
H.J. Lu
3f635fb433 x86-64 memcmp: Use unsigned Jcc instructions on size [BZ #24155]
Since the size argument is unsigned, we should use unsigned Jcc
instructions, instead of signed, to check size.
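The C model below illustrates the difference, assuming a 64-bit register holds the size. size_le_signed and size_le_unsigned are hypothetical names. A huge unsigned length reinterpreted as signed becomes negative, so a signed comparison such as jle would treat it as small.

```c
#include <stdint.h>

/* SIZE models the 64-bit register holding the size argument.  */

/* Buggy check, like a signed Jcc (jle): a huge size looks negative.
   Converting an out-of-range value to int64_t is implementation-defined
   in ISO C; GCC and Clang wrap it modulo 2^64.  */
static int
size_le_signed (uint64_t size, uint64_t limit)
{
  return (int64_t) size <= (int64_t) limit;
}

/* Correct check, like an unsigned Jcc (jbe).  */
static int
size_le_unsigned (uint64_t size, uint64_t limit)
{
  return size <= limit;
}
```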

Tested on x86-64 and x32, with and without --disable-multi-arch.

	[BZ #24155]
	CVE-2019-7309
	* NEWS: Updated for CVE-2019-7309.
	* sysdeps/x86_64/memcmp.S: Use RDX_LP for size.  Clear the
	upper 32 bits of RDX register for x32.  Use unsigned Jcc
	instructions, instead of signed.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp-2.
	* sysdeps/x86_64/x32/tst-size_t-memcmp-2.c: New test.
2019-02-04 06:31:13 -08:00
H.J. Lu
5165de69c0 x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.
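The fix is modeled below in minimal C, assuming the register contents are passed in as a uint64_t. x32_length_from_register is a hypothetical name. Only the low 32 bits carry the size_t value, so they must be used directly or zero-extended before the register serves as a length.

```c
#include <stdint.h>

/* REG models the 64-bit register on x32: the low 32 bits hold the
   size_t argument, the upper 32 bits may be garbage.  */
static uint64_t
x32_length_from_register (uint64_t reg)
{
  /* Same effect as a 32-bit move such as "movl %edx, %edx", which
     zero-extends into the full 64-bit register.  */
  return (uint32_t) reg;
}
```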

This patch fixes strnlen/wcsnlen for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and without the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/strlen-avx2.S: Use RSI_LP for length.
	Clear the upper 32 bits of RSI register.
	* sysdeps/x86_64/strlen.S: Use RSI_LP for length.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strnlen
	and tst-size_t-wcsnlen.
	* sysdeps/x86_64/x32/tst-size_t-strnlen.c: New file.
	* sysdeps/x86_64/x32/tst-size_t-wcsnlen.c: Likewise.
2019-01-21 11:36:47 -08:00
H.J. Lu
c7c54f65b0 x86-64 strncpy: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This patch fixes strncpy for x32.  Tested on x86-64 and x32.  On x86-64,
libc.so is the same with and without the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/strcpy-avx2.S: Use RDX_LP for length.
	* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Likewise.
	* sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncpy.
	* sysdeps/x86_64/x32/tst-size_t-strncpy.c: New file.
2019-01-21 11:35:34 -08:00
H.J. Lu
ee915088a0 x86-64 strncmp family: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This patch fixes the strncmp family for x32.  Tested on x86-64 and x32.
On x86-64, libc.so is the same with and without the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/strcmp-avx2.S: Use RDX_LP for length.
	* sysdeps/x86_64/multiarch/strcmp-sse42.S: Likewise.
	* sysdeps/x86_64/strcmp.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncasecmp,
	tst-size_t-strncmp and tst-size_t-wcsncmp.
	* sysdeps/x86_64/x32/tst-size_t-strncasecmp.c: New file.
	* sysdeps/x86_64/x32/tst-size_t-strncmp.c: Likewise.
	* sysdeps/x86_64/x32/tst-size_t-wcsncmp.c: Likewise.
2019-01-21 11:34:04 -08:00
H.J. Lu
82d0b4a4d7 x86-64 memset/wmemset: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This patch fixes memset/wmemset for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and without the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Use
	RDX_LP for length.  Clear the upper 32 bits of RDX register.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-wmemset.
	* sysdeps/x86_64/x32/tst-size_t-memset.c: New file.
	* sysdeps/x86_64/x32/tst-size_t-wmemset.c: Likewise.
2019-01-21 11:32:37 -08:00
H.J. Lu
ecd8b842cf x86-64 memrchr: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This patch fixes memrchr for x32.  Tested on x86-64 and x32.  On x86-64,
libc.so is the same with and without the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/memrchr.S: Use RDX_LP for length.
	* sysdeps/x86_64/multiarch/memrchr-avx2.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memrchr.
	* sysdeps/x86_64/x32/tst-size_t-memrchr.c: New file.
2019-01-21 11:30:12 -08:00