Commit Graph

1533 Commits

Author SHA1 Message Date
Noah Goldstein
aaa23c3507 x86: Optimize strlen-avx2.S
No bug. This commit optimizes strlen-avx2.S. The optimizations are
mostly small things but they add up to roughly 10-30% performance
improvement for strlen. The results for strnlen are bit more
ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen
are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 18:03:49 -07:00
Noah Goldstein
4ba6558684 x86: Optimize strlen-evex.S
No bug. This commit optimizes strlen-evex.S. The
optimizations are mostly small things but they add up to roughly
10-30% performance improvement for strlen. The results for strnlen are
bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and
test-wcsnlen are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 18:03:49 -07:00
Noah Goldstein
f53790272c x86: Optimize less_vec evex and avx512 memset-vec-unaligned-erms.S
No bug. This commit adds optimized cased for less_vec memset case that
uses the avx512vl/avx512bw mask store avoiding the excessive
branches. test-memset and test-wmemset are passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 15:08:04 -07:00
H.J. Lu
83c5b36822 x86-64: Require BMI2 for strchr-avx2.S
Since strchr-avx2.S updated by

commit 1f745ecc21
Author: noah <goldstein.w.n@gmail.com>
Date:   Wed Feb 3 00:38:59 2021 -0500

    x86-64: Refactor and improve performance of strchr-avx2.S

uses sarx:

c4 e2 72 f7 c0       	sarx   %ecx,%eax,%eax

for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and
ifunc-avx2.h.
2021-04-19 11:01:45 -07:00
H.J. Lu
55bf411b45 x86-64: Require BMI2 for __strlen_evex and __strnlen_evex
Since __strlen_evex and __strnlen_evex added by

commit 1fd8c163a8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 5 06:24:52 2021 -0800

    x86-64: Add ifunc-avx2.h functions with 256-bit EVEX

use sarx:

c4 e2 6a f7 c0       	sarx   %edx,%eax,%eax

require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c.
ifunc-avx2.h already requires BMI2 for EVEX implementation.
2021-04-19 07:51:33 -07:00
noah
1a8605b6cd x86: Update large memcpy case in memmove-vec-unaligned-erms.S
No Bug. This commit updates the large memcpy case (no overlap). The
update is to perform memcpy on either 2 or 4 contiguous pages at
once. This 1) helps to alleviate the affects of false memory aliasing
when destination and source have a close 4k alignment and 2) In most
cases and for most DRAM units is a modestly more efficient access
pattern. These changes are a clear performance improvement for
VEC_SIZE =16/32, though more ambiguous for VEC_SIZE=64. test-memcpy,
test-memccpy, test-mempcpy, test-memmove, and tst-memmove-overflow all
pass.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-16 10:06:56 -07:00
Szabolcs Nagy
55c9f32380 x86_64: Remove lazy tlsdesc relocation related code
_dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for
lazy tlsdesc relocation processing which is no longer supported.
2021-04-15 09:47:47 +01:00
Szabolcs Nagy
8f7e09f4db x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This similar to the commit b7cf203b5c
for aarch64, but it fixes a different race: bug 27137.

Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to
relocate tlsdesc lazily, but that does not work in a BIND_NOW module
due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at
load time fixes this bug 27721 too.
2021-04-15 09:47:37 +01:00
Paul Zimmermann
43576de04a Improve the accuracy of tgamma (BZ #26983)
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).

Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-07 13:23:39 +02:00
Paul Zimmermann
9acda61d94 Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.

The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average.  Two different algorithms are used:

* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
  degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)

* for large inputs, an asymptotic formula from [1] is used

[1] Fast and Accurate Bessel Function Computation,
    John Harrison, Proceedings of Arith 19, 2009.

Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).

Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-02 06:15:48 +02:00
Sunil K Pandey
595c22ecd8 x86-64: Fix ifdef indentation in strlen-evex.S
Fix some indentations of ifdef in file strlen-evex.S which are off by 1
and confusing to read.
2021-04-01 16:13:33 -07:00
H.J. Lu
b1ec623ed5 x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591]
config/i386/constraints.md in GCC has

(define_constraint "e"
  "32-bit signed integer constant, or a symbolic reference known
   to fit that range (for immediate operands in sign-extending x86-64
   instructions)."
  (match_operand 0 "x86_64_immediate_operand"))

Since movq takes a signed 32-bit immediate or a register source operand,
use "er", instead of "nr"/"ir", constraint for 32-bit signed integer
constant or register on movq.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-04-01 07:00:22 -07:00
H.J. Lu
e4fda46310 x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
4e2d8f3527 x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
7ebba91361 x86-64: Add AVX optimized string/memory functions for RTM
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with

	xtest
	jz	1f
	vzeroall
	ret
1:
	vzeroupper
	ret

at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
2021-03-29 07:40:17 -07:00
H.J. Lu
91264fe357 x86-64: Add memcmp family functions with 256-bit EVEX
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
1b968b6b9b x86-64: Add memset family functions with 256-bit EVEX
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM
abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
63ad43566f x86-64: Add memmove family functions with 256-bit EVEX
Update ifunc-memmove.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
525bc2a32c x86-64: Add strcpy family functions with 256-bit EVEX
Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
1fd8c163a8 x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.

For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.
2021-03-29 07:40:17 -07:00
H.J. Lu
3e2f285c5f nptl: Remove MULTI_PAGE_ALIASING [BZ #23554]
MULTI_PAGE_ALIASING was introduced to mitigate an aliasing issue on
Pentium 4.  It is no longer needed for processors after Pentium 4.
2021-03-19 15:04:17 -07:00
Wilco Dijkstra
47ad14d789 math: Remove mpa files [BZ #15267]
Finally remove all mpa related files, headers, declarations, probes, unused
tables and update makefiles.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Wilco Dijkstra
db3f7bb558 math: Remove slow paths from asin and acos [BZ #15267]
This patch series removes all remaining slow paths and related code.
First asin/acos, tan, atan, atan2 implementations are updated, and the final
patch removes the unused mpa files, headers and probes. Passes buildmanyglibc.

Remove slow paths from asin/acos. Add ULP annotations based on previous slow
path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Paul Zimmermann
5a051454a9 Add inputs that generate larger error bounds
(Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)
2021-02-27 06:32:11 +01:00
Florian Weimer
035c012e32 Reduce the statically linked startup code [BZ #23323]
It turns out the startup code in csu/elf-init.c has a perfect pair of
ROP gadgets (see Marco-Gisbert and Ripoll-Ripoll, "return-to-csu: A
New Method to Bypass 64-bit Linux ASLR").  These functions are not
needed in dynamically-linked binaries because DT_INIT/DT_INIT_ARRAY
are already processed by the dynamic linker.  However, the dynamic
linker skipped the main program for some reason.  For maximum
backwards compatibility, this is not changed, and instead, the main
map is consulted from __libc_start_main if the init function argument
is a NULL pointer.

For statically linked binaries, the old approach based on linker
symbols is still used because there is nothing else available.

A new symbol version __libc_start_main@@GLIBC_2.34 is introduced because
new binaries running on an old libc would not run their ELF
constructors, leading to difficult-to-debug issues.
2021-02-25 12:13:02 +01:00
H.J. Lu
89de9d3958 x86: Use x86/nptl/pthreaddef.h
1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h.
2. Remove sysdeps/x86_64/nptl/pthreaddef.h.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-02-22 15:52:56 -08:00
noah
1f745ecc21 x86-64: Refactor and improve performance of strchr-avx2.S
No bug. Just seemed the performance could be improved a bit. Observed
and expected behavior are unchanged. Optimized body of main
loop. Updated page cross logic and optimized accordingly. Made a few
minor instruction selection modifications. No regressions in test
suite. Both test-strchrnul and test-strchr passed.
2021-02-08 11:21:33 -08:00
Sajan Karumanchi
6e02b3e932 x86: Adding an upper bound for Enhanced REP MOVSB.
In the process of optimizing memcpy for AMD machines, we have found the
vector move operations are outperforming enhanced REP MOVSB for data
transfers above the L2 cache size on Zen3 architectures.
To handle this use case, we are adding an upper bound parameter on
enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'.
As per large-bench results, we are configuring this parameter to the
L2 cache size for AMD machines and applicable from Zen3 architecture
supporting the ERMS feature.
For architectures other than AMD, it is the computed value of
non-temporal threshold parameter.

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
2021-02-02 12:42:15 +01:00
Szabolcs Nagy
374cef32ac configure: Check for static PIE support
Add SUPPORT_STATIC_PIE that targets can define if they support
static PIE. This requires PI_STATIC_AND_HIDDEN support and various
linker features as described in

  commit 9d7a3741c9
  Add --enable-static-pie configure option to build static PIE [BZ #19574]

Currently defined on x86_64, i386 and aarch64 where static PIE is
known to work.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-21 15:54:50 +00:00
H.J. Lu
ff6d62e9ed <sys/platform/x86.h>: Remove the C preprocessor magic
In <sys/platform/x86.h>, define CPU features as enum instead of using
the C preprocessor magic to make it easier to wrap this functionality
in other languages.  Move the C preprocessor magic to internal header
for better GCC codegen when more than one features are checked in a
single expression as in x86-64 dl-hwcaps-subdirs.c.

1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX.
2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h.
3. Remove struct cpu_features and __x86_get_cpu_features from
<sys/platform/x86.h>.
4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it
in libc.
5. Make __get_cpu_features() private to glibc.
6. Replace __x86_get_cpu_features(N) with __get_cpu_features().
7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE.
8. Use a single enum index for each CPU feature detection.
9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf.
10. Return zero struct cpuid_feature for the older glibc binary with a
smaller CPUID_INDEX_MAX [BZ #27104].
11. Inside glibc, use the C preprocessor magic so that cpu_features data
can be loaded just once leading to more compact code for glibc.

256 bits are used for each CPUID leaf.  Some leaves only contain a few
features.  We can add exceptions to such leaves.  But it will increase
code sizes and it is harder to provide backward/forward compatibilities
when new features are added to such leaves in the future.

When new leaves are added, _rtld_global_ro offsets will change which
leads to race condition during in-place updates. We may avoid in-place
updates by

1. Rename the old glibc.
2. Install the new glibc.
3. Remove the old glibc.

NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy
relocation issue with IFUNC resolver as shown in IFUNC resolver tests.
2021-01-21 05:58:17 -08:00
H.J. Lu
0ec583d926 libmvec: Add extra-test-objs to test-extras
Add extra-test-objs to test-extras so that they are compiled with
-DMODULE_NAME=testsuite instead of -DMODULE_NAME=libc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-19 06:20:46 -08:00
Adhemerval Zanella
d18f59bf92 Fix x86 build with --enable-tunable=no
Checked on x86_64-linux-gnu.
2021-01-14 16:04:05 -03:00
H.J. Lu
ecce11aa07 x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717]
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
levels:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with
GNU_PROPERTY_X86_ISA_1_V[234] marker:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13

Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by

commit b0ab06937385e0ae25cebf1991787d64f439bf12
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Oct 30 06:49:57 2020 -0700

    x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker

and

commit 32930e4edbc06bc6f10c435dbcc63131715df678
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Oct 9 05:05:57 2020 -0700

    x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker

GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the
micro-architecture ISA level required to execute the binary.  The marker
must be added by programmers explicitly in one of 3 ways:

1. Pass -mneeded to GCC.
2. Add the marker in the linker inputs as this patch does.
3. Pass -z x86-64-v[234] to the linker.

Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker support to ld.so if binutils 2.32 or newer is used to build glibc:

1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
markers to elf.h.
2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker to abi-note.o based on the ISA level used to compile abi-note.o,
assuming that the same ISA level is used to compile the whole glibc.
3. Add isa_1 to cpu_features to record the supported x86 ISA level.
4. Rename _dl_process_cet_property_note to _dl_process_property_note and
add GNU_PROPERTY_X86_ISA_1_V[234] marker detection.
5. Update _rtld_main_check and _dl_open_check to check loaded objects
with the incompatible ISA level.
6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails
on lesser platforms.
7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c.

Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and
x86-64-v4 machines.

Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine
and got:

[hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1
./elf/tst-isa-level-1: CPU ISA level is lower than required
[hjl@gnu-cfl-2 build-x86_64-linux]$
2021-01-07 13:10:13 -08:00
Wilco Dijkstra
9e97f239ea Remove dbl-64/wordsize-64 (part 2)
Remove the wordsize-64 implementations by merging them into the main dbl-64
directory.  The second patch just moves all wordsize-64 files and removes a
few wordsize-64 uses in comments and Implies files.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-07 15:26:26 +00:00
H.J. Lu
6ea5b57afa x86: Check IFUNC definition in unrelocated executable [BZ #20019]
Calling an IFUNC function defined in unrelocated executable also leads to
segfault.  Issue a fatal error message when calling IFUNC function defined
in the unrelocated executable from a shared library.
2021-01-04 12:01:01 -08:00
H.J. Lu
3ec5d83d2a x86-64: Avoid rep movsb with short distance [BZ #27130]
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow.  This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:

	cmpl	$63, %ecx
	// Don't use "rep movsb" if ECX <= 63
	jbe	L(Don't use rep movsb")
	Use "rep movsb"

Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its
performance impact is within noise range as "rep movsb" is only used for
data size >= 4KB.
2021-01-04 07:58:57 -08:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Siddhesh Poyarekar
94547d9209 x86 long double: Support pseudo numbers in isnanl
This syncs up isnanl behaviour with gcc.  Also move the isnanl
implementation to sysdeps/x86 and remove the sysdeps/x86_64 version.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-24 06:05:40 +05:30
Siddhesh Poyarekar
b7f8815617 x86 long double: Support pseudo numbers in fpclassifyl
Also move sysdeps/i386/fpu/s_fpclassifyl.c to
sysdeps/x86/fpu/s_fpclassifyl.c and remove
sysdeps/x86_64/fpu/s_fpclassifyl.c

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-24 06:05:26 +05:30
Paul Zimmermann
cad5ad81d2 add inputs to auto-libm-test-in yielding larger errors (binary64, x86_64) 2020-12-21 10:35:20 +05:30
Anssi Hannula
69a7ca7705 ieee754: Remove unused __sin32 and __cos32
The __sin32 and __cos32 functions were only used in the now removed slow
path of asin and acos.
2020-12-18 12:10:31 +05:30
Florian Weimer
f267e1c9dd x86_64: Add glibc-hwcaps support
The subdirectories match those in the x86-64 psABI:

77566eb03b

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-04 09:36:02 +01:00
Jakub Jelinek
1d9cbb9608 x86: Fix THREAD_SELF definition to avoid ld.so crash (bug 27004)
The previous definition of THREAD_SELF did not tell the compiler
that %fs (or %gs) usage is invalid for the !DL_LOOKUP_GSCOPE_LOCK
case in _dl_lookup_symbol_x.  As a result, ld.so could try to use the
TCB before it was initialized.

As the comment in tls.h explains, asm volatile is undesirable here.
Using the __seg_fs (or __seg_gs) namespace does not interfere with
optimization, and expresses that THREAD_SELF is potentially trapping.
2020-12-03 13:48:55 +01:00
Florian Weimer
1daccf403b nptl: Move stack list variables into _rtld_global
Now __thread_gscope_wait (the function behind THREAD_GSCOPE_WAIT,
formerly __wait_lookup_done) can be implemented directly in ld.so,
eliminating the unprotected GL (dl_wait_lookup_done) function
pointer.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-11-16 19:33:30 +01:00
Adhemerval Zanella
01bd62517c Remove tls.h inclusion from internal errno.h
The tls.h inclusion is not really required and limits possible
definition on more arch specific headers.

This is a cleanup to allow inline functions on sysdep.h, more
specifically on i386 and ia64 which requires to access some tls
definitions its own.

No semantic changes expected, checked with a build against all
affected ABIs.
2020-11-13 12:59:19 -03:00
Florian Weimer
0f34d426ac x86: Remove UP macro. Define LOCK_PREFIX unconditionally.
The UP macro is never defined.  Also define LOCK_PREFIX
unconditionally, to the same string.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-11-13 15:20:03 +01:00
Samuel Thibault
3d3316b1de hurd: keep only required PLTs in ld.so
We need NO_RTLD_HIDDEN because of the need for PLT calls in ld.so.
See Roland's comment in
https://sourceware.org/bugzilla/show_bug.cgi?id=15605
"in the Hurd it's crucial that calls like __mmap be the libc ones
instead of the rtld-local ones after the bootstrap phase, when the
dynamic linker is being used for dlopen and the like."

We used to just avoid all hidden use in the rtld ; this commit switches to
keeping only those that should use PLT calls, i.e. essentially those defined in
sysdeps/mach/hurd/dl-sysdep.c:

__assert_fail
__assert_perror_fail
__*stat64
_exit

This fixes a few startup issues, notably the call to __tunable_get_val that is
made before PLTs are set up.
2020-11-11 02:36:22 +01:00
H.J. Lu
0f09154c64 x86: Initialize CPU info via IFUNC relocation [BZ 26203]
X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called.  Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Since some fields in _rtld_global_ro
in ld.so are initialized by dynamic relocation, we can also initialize
x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so
by initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.

Key points:

1. IFUNC is always supported, independent of --enable-multi-arch or
--disable-multi-arch.  Linker generates IFUNC relocations from input
IFUNC objects and ld.so performs IFUNC relocations.
2. There are no IFUNC dependencies in ld.so before dynamic relocation
have been performed,
3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT
in dynamic executable and by IFUNC relocation in dlopen in static
executable.
4. The x86 cache info in libc.o is initialized by IFUNC relocation.
5. In libc.a, both x86 CPU features and cache info are initialized from
ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init
is called.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
2020-10-16 16:17:53 -07:00
Szabolcs Nagy
238032ead6 aarch64: enforce >=64K guard size [BZ #26691]
There are several compiler implementations that allow large stack
allocations to jump over the guard page at the end of the stack and
corrupt memory beyond that. See CVE-2017-1000364.

Compilers can emit code to probe the stack such that the guard page
cannot be skipped, but on aarch64 the probe interval is 64K by default
instead of the minimum supported page size (4K).

This patch enforces at least 64K guard on aarch64 unless the guard
is disabled by setting its size to 0.  For backward compatibility
reasons the increased guard is not reported, so it is only observable
by exhausting the address space or parsing /proc/self/maps on linux.

On other targets the patch has no effect. If the stack probe interval
is larger than a page size on a target then ARCH_MIN_GUARD_SIZE can
be defined to get large enough stack guard on libc allocated stacks.

The patch does not affect threads with user allocated stacks.

Fixes bug 26691.
2020-10-02 09:57:44 +01:00
Florian Weimer
90ccfdf176 x86: Use one ldbl2mpn.c file for both i386 and x86_64 2020-09-22 17:58:39 +02:00
H.J. Lu
9620398097 x86: Install <sys/platform/x86.h> [BZ #26124]
Install <sys/platform/x86.h> so that programmers can do

 #if __has_include(<sys/platform/x86.h>)
 #include <sys/platform/x86.h>
 #endif
 ...

   if (CPU_FEATURE_USABLE (SSE2))
 ...
   if (CPU_FEATURE_USABLE (AVX2))
 ...

<sys/platform/x86.h> exports only:

enum
{
  COMMON_CPUID_INDEX_1 = 0,
  COMMON_CPUID_INDEX_7,
  COMMON_CPUID_INDEX_80000001,
  COMMON_CPUID_INDEX_D_ECX_1,
  COMMON_CPUID_INDEX_80000007,
  COMMON_CPUID_INDEX_80000008,
  COMMON_CPUID_INDEX_7_ECX_1,
  /* Keep the following line at the end.  */
  COMMON_CPUID_INDEX_MAX
};

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
};

/* Get a pointer to the CPU features structure.  */
extern const struct cpu_features *__x86_get_cpu_features
  (unsigned int max) __attribute__ ((const));

Since all feature checks are done through macros, programs compiled with
a newer <sys/platform/x86.h> are compatible with the older glibc binaries
as long as the layout of struct cpu_features is identical.  The features
array can be expanded with backward binary compatibility for both .o and
.so files.  When COMMON_CPUID_INDEX_MAX is increased to support new
processor features, __x86_get_cpu_features in the older glibc binaries
returns NULL and HAS_CPU_FEATURE/CPU_FEATURE_USABLE return false on the
new processor feature.  No new symbol version is neeeded.

Both CPU_FEATURE_USABLE and HAS_CPU_FEATURE are provided.  HAS_CPU_FEATURE
can be used to identify processor features.

Note: Although GCC has __builtin_cpu_supports, it only supports a subset
of <sys/platform/x86.h> and it is equivalent to CPU_FEATURE_USABLE.  It
doesn't support HAS_CPU_FEATURE.
2020-09-11 17:20:52 -07:00
Ondřej Hošek
23af890b3f x86-64: Fix FMA4 detection in ifunc [BZ #26534]
A typo in commit 107e6a3c22 causes the
FMA4 code path to be taken on systems that support FMA, even if they do
not support FMA4. Fix this to detect FMA4.
2020-09-02 05:07:37 -07:00
Adhemerval Zanella
5ff35e9544 math: Update x86_64 ulps
From new j0 test.
2020-08-08 16:43:11 -03:00
Andreas K. Hüttel
180d5a045f Update x86-64 libm-test-ulps
x86_64 Intel(R) Core(TM) i5-8265U
gcc (Gentoo 10.1.0-r2 p3) 10.1.0
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-07-25 17:10:53 -04:00
H.J. Lu
107e6a3c22 x86: Support usable check for all CPU features
Support usable check for all CPU features with the following changes:

1. Change struct cpu_features to

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};

so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits.  EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.

The results are

1. If the feature is known and doesn't requre OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to supported by OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
2020-07-13 06:05:16 -07:00
H.J. Lu
9016b6f389 x86: Remove the unused __x86_prefetchw
Since

commit c867597bff
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jun 8 13:57:50 2016 -0700

    X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

removed the only usage of __x86_prefetchw, we can remove the unused
__x86_prefetchw.
2020-07-11 09:34:03 -07:00
H.J. Lu
3f4b61a0b8 x86: Add thresholds for "rep movsb/stosb" to tunables
Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables
to update thresholds for "rep movsb" and "rep stosb" at run-time.

Note that the user specified threshold for "rep movsb" smaller than
the minimum threshold will be ignored.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-07-06 11:48:42 -07:00
Adhemerval Zanella
b24381e50f i386: Use builtin sqrtl
Checked on i686-linux-gnu.
2020-06-22 11:09:49 -03:00
Adhemerval Zanella
d19d25dd06 x86_64: Use builtin sqrt{f,l}
Checked on x86_64-linux-gnu.
2020-06-22 11:09:49 -03:00
Sunil K Pandey
75870237ff Fix avx2 strncmp offset compare condition check [BZ #25933]
strcmp-avx2.S: In avx2 strncmp function, strings are compared in
chunks of 4 vector size(i.e. 32x4=128 byte for avx2). After first 4
vector size comparison, code must check whether it already passed
the given offset. This patch implement avx2 offset check condition
for strncmp function, if both string compare same for first 4 vector
size.
2020-06-17 07:07:38 -07:00
H.J. Lu
a35a59036e x86_64: Use %xmmN with vpxor to clear a vector register
Since "vpxor %xmmN, %xmmN, %xmmN" clears the whole vector register, use
%xmmN, instead of %ymmN, with vpxor to clear a vector register.
2020-06-17 05:44:02 -07:00
Vineet Gupta
8dbb7a08ec dl-runtime: reloc_{offset,index} now functions arch overide'able
The existing macros are fragile and expect local variables with a
certain name. Fix this by defining them as functions with default
implementation in a new header dl-runtime.h which arches can override
if need be.

This came up during ARC port review, hence the need for argument pltgot
in reloc_index() which is not needed by existing ports.

This patch potentially only affects hppa/x86 ports,
build tested for both those configs and a few more.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-06-05 13:45:46 -07:00
Florian Weimer
501bdb5dd6 Linux: Remove remnants of the getcpu cache
The getcpu cache was removed from the kernel in Linux 2.6.24.  glibc
support from the sched_getcpu implementation was removed in commit
dd26c44403 ("Consolidate sched_getcpu").
2020-05-16 15:47:51 +02:00
H.J. Lu
55c7bcc71b x86-64: Use RDX_LP on __x86_shared_non_temporal_threshold [BZ #25966]
Since __x86_shared_non_temporal_threshold is defined as

long int __x86_shared_non_temporal_threshold;

and long int is 4 bytes for x32, use RDX_LP to compare against
__x86_shared_non_temporal_threshold in assembly code.
2020-05-09 12:28:15 -07:00
Joseph Myers
dbb188dd87 Remove unused floating-point configuration from gmp-impl.h.
This patch removes the IEEE_DOUBLE_BIG_ENDIAN and
IEEE_DOUBLE_MIXED_ENDIAN macros from gmp-impl.h and gmp-mparam.h, and
the ieee_double_extract union from gmp-impl.h.  The macros were used
only in defining the union, which was used nowhere in glibc.  As GMP's
gmp-impl.h is over 5000 lines, the file in glibc is so far from the
GMP version that it doesn't seem to make sense to keep things there
that are not relevant in glibc.  (I expect there is plenty more in the
header after this patch that is also not relevant in glibc and can be
cleaned up later.)

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.
2020-04-28 15:05:09 +00:00
Adhemerval Zanella
f721171632 Revert "x86_64: Add SSE sfp-exceptions"
The __sfp_handle_exceptions is not fully correct regarding raising
exceptions, since there is no direct way to raise only FP_EX_OVERFLOW
nor FP_EX_UNDERFLOW for SSE mode.  Both libgcc and feraiseexcept rely
on x87 mode to accomplish it.

This reverts commit 460ee50de0.

Checked on x86_64.
2020-04-20 14:56:05 -03:00
Adhemerval Zanella
460ee50de0 x86_64: Add SSE sfp-exceptions
The exported x86_64 fenv.h functions operate on both i387 and SSE (since
they should work on both float, double, and long double) while the
internal libc_fe* set either SSE (float, double, and float128) or
i387 (long double).

The libgcc __sfp_handle_exceptions (used on float128 implementation),
however, will set either SEE or i387 exception depending of the
exception to raise.  This broke the internal assumption of float128
where only SSE operations will be used.

This patch reimplements the libgcc __sfp_handle_exceptions to use only
SSE operations and sets libgcc to use it instead of its own
implementation.

And I think we should fix libgcc in a similar manner, since checking on
config/i386/64/sfp-machine.h it already only supports SSE rounding mode
and x86_64 ABI also expectes float128 to use SSE registers [1]
(although it is not clear on how future implementation might implement
it).

Checked on x86_64-linux-gnu.

[1] https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI
2020-04-17 11:42:29 -03:00
Adhemerval Zanella
17fd707f88 nptl: Remove x86_64 cancellation assembly implementations [BZ #25765]
All cancellable syscalls are done by C implementations, so there is no
no need to use a specialized implementation to optimize register usage.

It fixes BZ #25765.

Checked on x86_64-linux-gnu.
2020-04-03 10:47:59 -03:00
Paul Zimmermann
a9d42c09a3 math: Add inputs that yield larger errors for float type (x86_64)
The corner cases included were generated using exhaustive search
for all float/binary32 values on x86_64 (comparing to MPFR for
correct rounding to nearest).

For the j0/j1/y0 functions, only cases with ulp error <= 9 were
included.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-03-31 21:48:54 -04:00
Adhemerval Zanella
1c15464ca0 math: Remove inline math tests
With mathinline removal there is no need to keep building and testing
inline math tests.

The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries.  The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2020-03-19 11:45:44 -03:00
Florian Weimer
fe49a73316 x86: Avoid single-argument _Static_assert in <tls.h>
Older GCC versions do not support this extension.  Fixes commit f1bdee6179
("x86 tls: Use _Static_assert for TLS access size assertion").
2020-02-17 11:12:03 +01:00
Samuel Thibault
f1bdee6179 x86 tls: Use _Static_assert for TLS access size assertion 2020-02-17 00:40:39 +01:00
Florian Weimer
3a0ecccb59 ld.so: Do not export free/calloc/malloc/realloc functions [BZ #25486]
Exporting functions and relying on symbol interposition from libc.so
makes the choice of implementation dependent on DT_NEEDED order, which
is not what some compiler drivers expect.

This commit replaces one magic mechanism (symbol interposition) with
another one (preprocessor-/compiler-based redirection).  This makes
the hand-over from the minimal malloc to the full malloc more
explicit.

Removing the ABI symbols is backwards-compatible because libc.so is
always in scope, and the dynamic loader will find the malloc-related
symbols there since commit f0b2132b35
("ld.so: Support moving versioned symbols between sonames
[BZ #24741]").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-02-15 11:01:23 +01:00
Adhemerval Zanella
fcb78a5505 linux: Consolidate INLINE_SYSCALL
With all Linux ABIs using the expected Linux kABI to indicate
syscalls errors, there is no need to replicate the INLINE_SYSCALL.

The generic Linux sysdep.h includes errno.h even for !__ASSEMBLER__,
which is ok now and it allows cleanup some archaic code that assume
otherwise.

Checked with a build against all affected ABIs.
2020-02-14 21:09:12 -03:00
Wilco Dijkstra
220622dde5 Add libm_alias_finite for _finite symbols
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol.  It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).

The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.

Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).

Passes buildmanyglibc.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2020-01-03 10:02:04 -03:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Stefan Liebler
1c94bf0f0a Always use wordsize-64 version of s_trunc.c.
This patch replaces s_trunc.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:14 +01:00
Stefan Liebler
9f234eafe8 Always use wordsize-64 version of s_ceil.c.
This patch replaces s_ceil.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:13 +01:00
Stefan Liebler
95b0c2c431 Always use wordsize-64 version of s_floor.c.
This patch replaces s_floor.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:12 +01:00
Stefan Liebler
ab48bdd098 Always use wordsize-64 version of s_rint.c.
This patch replaces s_rint.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 file.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:12 +01:00
Stefan Liebler
af123aa950 Always use wordsize-64 version of s_nearbyint.c.
This patch replaces s_nearbyint.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 file.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:11 +01:00
Florian Weimer
4db71d2f98 elf: Do not run IFUNC resolvers for LD_DEBUG=unused [BZ #24214]
This commit adds missing skip_ifunc checks to aarch64, arm, i386,
sparc, and x86_64.  A new test case ensures that IRELATIVE IFUNC
resolvers do not run in various diagnostic modes of the dynamic
loader.

Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>
2019-12-02 14:55:22 +01:00
Adhemerval Zanella
48dbce60cf nptl: Add tests for internal pthread_rwlock_t offsets
This patch new build tests to check for internal fields offsets for
internal pthread_rwlock_t definition.  Althoug the '__data.__flags'
field layout should be preserved due static initializators, the patch
also adds tests for the futexes that may be used in a shared memory
(although using different libc version in such scenario is not really
supported).

Checked with a build against all affected ABIs.

Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
71d260c107 nptl: Cleanup mutex internal offset tests
The offsets of pthread_mutex_t __data.__nusers, __data.__spins,
__data.elision, __data.list are not required to be constant over
the releases.  Only the __data.__kind is used for static
initializers.

This patch also adds an additional size check for __data.__kind.

Checked with a build against affected ABIs.

Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f
2019-11-26 13:53:36 +00:00
Wilco Dijkstra
d0007dc53c Remove x64 _finite tests and references
Remove _finite tests and references from x86_64.  Rather than calling
__exp_finite, use exp directly (since it's the same entry point).

x86_64 builds and passes testsuite.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-10-21 14:29:12 -03:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Wilco Dijkstra
3c05dd79d0 Use generic memset/memcpy/memmove in benchtests
Use the generic C memset/memcpy/memmove in benchtests since comparing
against a slow byte-oriented implementation makes no sense.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

2019-08-29  Wilco Dijkstra  <wdijkstr@arm.com>

	* benchtests/bench-memcpy.c (simple_memcpy): Remove.
	(generic_memcpy): Include generic C memcpy.
	* benchtests/bench-memmove.c (simple_memmove): Remove.
	(generic_memmove): Include generic C memmove.
	* benchtests/bench-memset.c (simple_memset): Remove.
	(generic_memset): Include generic C memset.
	* benchtests/bench-memset-large.c (simple_memset): Remove.
	(generic_memset): Include generic C memset.
	* benchtests/bench-memset-walk.c (simple_memset): Remove.
	(generic_memset): Include generic C memset.
	* string/memcpy.c (MEMCPY): Add defines to enable redirection.
	* string/memset.c (MEMSET): Likewise.
	* sysdeps/x86_64/memcopy.h: Remove empty file.
2019-08-30 17:21:35 +01:00
H.J. Lu
7e681561a3 x86-64: Compile branred.c with -mprefer-vector-width=128 [BZ #24603]
When compiled with -O3 and AVX, GCC 8 and 9 optimize some loops in
sysdeps/ieee754/dbl-64/branred.c with 256-bit vector instructions,
which leads to store forward stall:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

There is no easy fix in compiler.  This patch limits vector width to
128 bits to work around this issue.  It improves performance of sin
and cos by more than 40% on Skylake compiled with -O3 -march=skylake.

Tested with GCC 7/8/9 on x86-64.

	[BZ #24603]
	* sysdeps/x86_64/configure.ac: Check if -mprefer-vector-width=128
	works.
	* sysdeps/x86_64/configure: Regenerated.
	* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New.  Set
	to -mprefer-vector-width=128 if supported.
2019-07-24 14:48:43 -07:00
H.J. Lu
d039da1c00 x86: Add sysdeps/x86/dl-lookupcfg.h
Since sysdeps/i386/dl-lookupcfg.h and sysdeps/x86_64/dl-lookupcfg.h are
identical, we can replace them with sysdeps/x86/dl-lookupcfg.h.

	* sysdeps/i386/dl-lookupcfg.h: Moved to ...
	* sysdeps/x86/dl-lookupcfg.h: Here.
	* sysdeps/x86_64/dl-lookupcfg.h: Removed.
2019-06-26 15:07:28 -07:00
Adhemerval Zanella
81a1443941 wcsmbs: optimize wcscat
This patch rewrites wcscat using wcslen and wcscpy.  This is similar to
the optimization done on strcat by 6e46de42fe.

The strcpy changes are mainly to add the internal alias to avoid PLT
calls.

Checked on x86_64-linux-gnu and a build against the affected
architectures.

	* include/wchar.h (__wcscpy): New prototype.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c
	(__wcscpy): Route internal symbol to generic implementation.
	* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy):
	Add internal __wcscpy alias.
	* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise.
	* sysdeps/s390/wcscpy.c (wcscpy): Likewise.
	* sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise.
	* wcsmbs/wcscpy.c (wcscpy): Add
	* sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to
	use generic implementation.
	* wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.
2019-02-27 10:00:37 -03:00
Joseph Myers
32db86d558 Add fall-through comments.
This patch adds fall-through comments in some cases where -Wextra
produces implicit-fallthrough warnings.

The patch is non-exhaustive.  Apart from architecture-specific code
for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy
code, probably should have such changes, but left to be dealt with
separately), or places that already had comments about the
fall-through but not matching the form expected by
-Wimplicit-fallthrough=3 (the default level with -Wextra; my
inclination is to adjust those comments to match rather than
downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one
place where I thought the implicit fallthrough was not correct and so
should be handled separately as a bug fix.  I think the key thing to
consider in review of this patch is whether the fall-through is indeed
intended and correct in each place where such a comment is added.

Tested for x86_64.

	* elf/dl-exception.c (_dl_exception_create_format): Add
	fall-through comments.
	* elf/ldconfig.c (parse_conf_include): Likewise.
	* elf/rtld.c (print_statistics): Likewise.
	* locale/programs/charmap.c (parse_charmap): Likewise.
	* misc/mntent_r.c (__getmntent_r): Likewise.
	* posix/wordexp.c (parse_arith): Likewise.
	(parse_backtick): Likewise.
	* resolv/ns_ttl.c (ns_parse_ttl): Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
2019-02-12 10:30:34 +00:00
Andreas Schwab
65f7767a91 Fix handling of collating elements in fnmatch (bug 17396, bug 16976)
This fixes the same bug in fnmatch that was fixed by commit 7e2f0d2d77 for
regexp matching.  As a side effect it also removes the use of an unbound
VLA.
2019-02-04 15:45:02 +01:00
H.J. Lu
3f635fb433 x86-64 memcmp: Use unsigned Jcc instructions on size [BZ #24155]
Since the size argument is unsigned. we should use unsigned Jcc
instructions, instead of signed, to check size.

Tested on x86-64 and x32, with and without --disable-multi-arch.

	[BZ #24155]
	CVE-2019-7309
	* NEWS: Updated for CVE-2019-7309.
	* sysdeps/x86_64/memcmp.S: Use RDX_LP for size.  Clear the
	upper 32 bits of RDX register for x32.  Use unsigned Jcc
	instructions, instead of signed.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp-2.
	* sysdeps/x86_64/x32/tst-size_t-memcmp-2.c: New test.
2019-02-04 06:31:13 -08:00
H.J. Lu
5165de69c0 x86-64 strnlen/wcsnlen: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes strnlen/wcsnlen for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/strlen-avx2.S: Use RSI_LP for length.
	Clear the upper 32 bits of RSI register.
	* sysdeps/x86_64/strlen.S: Use RSI_LP for length.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strnlen
	and tst-size_t-wcsnlen.
	* sysdeps/x86_64/x32/tst-size_t-strnlen.c: New file.
	* sysdeps/x86_64/x32/tst-size_t-wcsnlen.c: Likewise.
2019-01-21 11:36:47 -08:00
H.J. Lu
c7c54f65b0 x86-64 strncpy: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes strncpy for x32.  Tested on x86-64 and x32.  On x86-64,
libc.so is the same with and withou the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/strcpy-avx2.S: Use RDX_LP for length.
	* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: Likewise.
	* sysdeps/x86_64/multiarch/strcpy-ssse3.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncpy.
	* sysdeps/x86_64/x32/tst-size_t-strncpy.c: New file.
2019-01-21 11:35:34 -08:00
H.J. Lu
ee915088a0 x86-64 strncmp family: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes the strncmp family for x32.  Tested on x86-64 and x32.
On x86-64, libc.so is the same with and withou the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/strcmp-avx2.S: Use RDX_LP for length.
	* sysdeps/x86_64/multiarch/strcmp-sse42.S: Likewise.
	* sysdeps/x86_64/strcmp.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-strncasecmp,
	tst-size_t-strncmp and tst-size_t-wcsncmp.
	* sysdeps/x86_64/x32/tst-size_t-strncasecmp.c: New file.
	* sysdeps/x86_64/x32/tst-size_t-strncmp.c: Likewise.
	* sysdeps/x86_64/x32/tst-size_t-wcsncmp.c: Likewise.
2019-01-21 11:34:04 -08:00
H.J. Lu
82d0b4a4d7 x86-64 memset/wmemset: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memset/wmemset for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: Use
	RDX_LP for length.  Clear the upper 32 bits of RDX register.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-wmemset.
	* sysdeps/x86_64/x32/tst-size_t-memset.c: New file.
	* sysdeps/x86_64/x32/tst-size_t-wmemset.c: Likewise.
2019-01-21 11:32:37 -08:00
H.J. Lu
ecd8b842cf x86-64 memrchr: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memrchr for x32.  Tested on x86-64 and x32.  On x86-64,
libc.so is the same with and withou the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/memrchr.S: Use RDX_LP for length.
	* sysdeps/x86_64/multiarch/memrchr-avx2.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memrchr.
	* sysdeps/x86_64/x32/tst-size_t-memrchr.c: New file.
2019-01-21 11:30:12 -08:00
H.J. Lu
231c56760c x86-64 memcpy: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memcpy for x32.  Tested on x86-64 and x32.  On x86-64,
libc.so is the same with and withou the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Use RDX_LP for
	length.  Clear the upper 32 bits of RDX register.
	* sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcpy.
	tst-size_t-wmemchr.
	* sysdeps/x86_64/x32/tst-size_t-memcpy.c: New file.
2019-01-21 11:27:36 -08:00
H.J. Lu
b304fc201d x86-64 memcmp/wmemcmp: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memcmp/wmemcmp for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S: Use RDX_LP for
	length.  Clear the upper 32 bits of RDX register.
	* sysdeps/x86_64/multiarch/memcmp-sse4.S: Likewise.
	* sysdeps/x86_64/multiarch/memcmp-ssse3.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memcmp and
	tst-size_t-wmemcmp.
	* sysdeps/x86_64/x32/tst-size_t-memcmp.c: New file.
	* sysdeps/x86_64/x32/tst-size_t-wmemcmp.c: Likewise.
2019-01-21 11:26:07 -08:00
H.J. Lu
97700a34f3 x86-64 memchr/wmemchr: Properly handle the length parameter [BZ# 24097]
On x32, the size_t parameter may be passed in the lower 32 bits of a
64-bit register with the non-zero upper 32 bits.  The string/memory
functions written in assembly can only use the lower 32 bits of a
64-bit register as length or must clear the upper 32 bits before using
the full 64-bit register for length.

This pach fixes memchr/wmemchr for x32.  Tested on x86-64 and x32.  On
x86-64, libc.so is the same with and withou the fix.

	[BZ# 24097]
	CVE-2019-6488
	* sysdeps/x86_64/memchr.S: Use RDX_LP for length.  Clear the
	upper 32 bits of RDX register.
	* sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise.
	* sysdeps/x86_64/x32/Makefile (tests): Add tst-size_t-memchr and
	tst-size_t-wmemchr.
	* sysdeps/x86_64/x32/test-size_t.h: New file.
	* sysdeps/x86_64/x32/tst-size_t-memchr.c: Likewise.
	* sysdeps/x86_64/x32/tst-size_t-wmemchr.c: Likewise.
2019-01-21 11:24:13 -08:00
Leonardo Sandoval
1a153e47fc x86-64: Optimize strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2
Optimize x86-64 strcat/strncat, strcpy/strncpy and stpcpy/stpncpy with AVX2.
It uses vector comparison as much as possible. In general, the larger the
source string, the greater performance gain observed, reaching speedups of
1.6x compared to SSE2 unaligned routines. Select AVX2 strcat/strncat,
strcpy/strncpy and stpcpy/stpncpy on AVX2 machines where vzeroupper is
preferred and AVX unaligned load is fast.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcat-avx2, strncat-avx2, strcpy-avx2, strncpy-avx2,
	stpcpy-avx2 and stpncpy-avx2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c:
	(__libc_ifunc_impl_list): Add tests for __strcat_avx2,
	__strncat_avx2, __strcpy_avx2, __strncpy_avx2, __stpcpy_avx2
	and __stpncpy_avx2.
	* sysdeps/x86_64/multiarch/{ifunc-unaligned-ssse3.h =>
	ifunc-strcpy.h}: rename header for a more generic name.
	* sysdeps/x86_64/multiarch/ifunc-strcpy.h:
	(IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX 2 machines if
	AVX unaligned load is fast and vzeroupper is preferred.
	* sysdeps/x86_64/multiarch/stpcpy-avx2.S: New file
	* sysdeps/x86_64/multiarch/stpncpy-avx2.S: Likewise
	* sysdeps/x86_64/multiarch/strcat-avx2.S: Likewise
	* sysdeps/x86_64/multiarch/strcpy-avx2.S: Likewise
	* sysdeps/x86_64/multiarch/strncat-avx2.S: Likewise
	* sysdeps/x86_64/multiarch/strncpy-avx2.S: Likewise
2019-01-14 09:43:38 -06:00
Adhemerval Zanella
85c828a462 x86_64: Remove wrong THREAD_ATOMIC_* macros
The x86 defines optimized THREAD_ATOMIC_* macros where reference always
the current thread instead of the one indicated by input 'descr' argument.
It work as long the input is the self thread pointer, however it generates
wrong code if the semantic is to set a bit atomicialy from another thread.

This is not an issue for current GLIBC usage, however the new cancellation
code expects that some synchronization code to atomically set bits from
different threads.

The generic code generates an additional load to reference to TLS segment,
for instance the code:

  THREAD_ATOMIC_BIT_SET (THREAD_SELF, cancelhandling, CANCELED_BIT);

Compiles to:

  lock;orl $4, %fs:776

Where with patch changes it now compiles to:

  mov %fs:16,%rax
  lock;orl $4, 776(%rax)

If some usage indeed proves to be a hotspot we can add an extra macro
with a more descriptive name (THREAD_ATOMIC_BIT_SET_SELF for instance)
where x86_64 might optimize it.

Checked on x86_64-linux-gnu.

	* sysdeps/x86_64/nptl/tls.h (THREAD_ATOMIC_CMPXCHG_VAL,
	THREAD_ATOMIC_AND, THREAD_ATOMIC_BIT_SET): Remove macros.
2019-01-03 18:38:14 -02:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
H.J. Lu
ba4b8fab20 x86-64: Remove s_sincosf-sse2.S
The current s_sincosf.c is faster than s_sincosf-sse2.S.  On Broadwell
with FMA disabled, bench-sincosf shows:

       Before         After      Improvement
max    154.032        114.517        34%
min    6.25           5.609          11%
mean   14.8728        12.8589        15%

	* sysdeps/x86_64/fpu/s_sincosf.S: Removed.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.c: New file.
2018-12-26 06:58:31 -08:00
H.J. Lu
9412979a43 Regenerate sysdeps/x86_64/fpu/libm-test-ulps
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
2018-12-26 06:57:30 -08:00
H.J. Lu
8700a7851b x86-64: Vectorize sincosf_poly and update s_sincosf-fma.c
Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized
sincosf_poly.  Add x86 sincosf_poly.h to vectorize sincosf_poly.  On
Broadwell, bench-sincosf shows:

       Before         After      Improvement
max    160.273        114.198        40%
min    6.25           5.625          11%
mean   13.0325        10.6462        22%

Vectorized sincosf_poly shows

       Before         After      Improvement
max    138.653        114.198        21%
min    5.004          5.625          -11%
mean   11.5934        10.6462        9%

Tested on x86-64 and i686 as well as with build-many-glibcs.py.

	* sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>.
	(sincos_t, sincosf_poly, sinf_poly): Moved to ...
	* sysdeps/ieee754/flt-32/sincosf_poly.h: Here.  New file.
	* sysdeps/x86/fpu/s_sincosf_data.c: New file.
	* sysdeps/x86/fpu/sincosf_poly.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include
	<sysdeps/ieee754/flt-32/s_sincosf.c>.
2018-12-26 06:56:13 -08:00
H.J. Lu
cd815050e5 x86: Merge i386/x86_64 atomic-machine.h
Merge i386 and x86_64 atomic-machine.h to x86 atomic-machine.h.

Tested on i686 and x86_64 as well as with build-many-glibcs.py.

	* sysdeps/i386/atomic-machine.h: Merged with ...
	* sysdeps/x86_64/atomic-machine.h: To ...
	* sysdeps/x86/atomic-machine.h: This.  New file.
2018-12-18 04:25:26 -08:00
H.J. Lu
c22e4c2a14 x86: Extend CPUID support in struct cpu_features
Extend CPUID support for all feature bits from CPUID.  Add a new macro,
CPU_FEATURE_USABLE, which can be used to check if a feature is usable at
run-time, instead of HAS_CPU_FEATURE and HAS_ARCH_FEATURE.

Add COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007 and
COMMON_CPUID_INDEX_80000008 to check CPU feature bits in them.

Tested on i686 and x86-64 as well as using build-many-glibcs.py with
x86 targets.

	* sysdeps/x86/cacheinfo.c (intel_check_word): Updated for
	cpu_features_basic.
	(__cache_sysconf): Likewise.
	(init_cacheinfo): Likewise.
	* sysdeps/x86/cpu-features.c (get_extended_indeces): Also
	populate COMMON_CPUID_INDEX_80000007 and
	COMMON_CPUID_INDEX_80000008.
	(get_common_indices): Also populate COMMON_CPUID_INDEX_D_ECX_1.
	Use CPU_FEATURES_CPU_P (cpu_features, XSAVEC) to check if
	XSAVEC is available.  Set the bit_arch_XXX_Usable bits.
	(init_cpu_features): Use _Static_assert on
	index_arch_Fast_Unaligned_Load.
	__get_cpuid_registers and __get_arch_feature.  Updated for
	cpu_features_basic.  Set stepping in cpu_features.
	* sysdeps/x86/cpu-features.h: (FEATURE_INDEX_1): Changed to enum.
	(FEATURE_INDEX_2): New.
	(FEATURE_INDEX_MAX): Changed to enum.
	(COMMON_CPUID_INDEX_D_ECX_1): New.
	(COMMON_CPUID_INDEX_80000007): Likewise.
	(COMMON_CPUID_INDEX_80000008): Likewise.
	(cpuid_registers): Likewise.
	(cpu_features_basic): Likewise.
	(CPU_FEATURE_USABLE): Likewise.
	(bit_arch_XXX_Usable): Likewise.
	(cpu_features): Use cpuid_registers and cpu_features_basic.
	(bit_arch_XXX): Reweritten.
	(bit_cpu_XXX): Likewise.
	(index_cpu_XXX): Likewise.
	(reg_XXX): Likewise.
	* sysdeps/x86/tst-get-cpu-features.c: Include <stdio.h> and
	<support/check.h>.
	(CHECK_CPU_FEATURE): New.
	(CHECK_CPU_FEATURE_USABLE): Likewise.
	(cpu_kinds): Likewise.
	(do_test): Print vendor, family, model and stepping.  Check
	HAS_CPU_FEATURE and CPU_FEATURE_USABLE.
	(TEST_FUNCTION): Removed.
	Include <support/test-driver.c> instead of
	"../../test-skeleton.c".
	* sysdeps/x86_64/multiarch/sched_cpucount.c (__sched_cpucount):
	Check POPCNT instead of POPCOUNT.
	* sysdeps/x86_64/multiarch/test-multiarch.c (do_test): Likewise.
2018-12-03 05:54:56 -08:00
Szabolcs Nagy
a502c5294b Remove the error handling wrapper from pow
Introduce new pow symbol version that doesn't do SVID compatible error
handling.  The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.

The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_pow.c and enabled for targets with their own pow implementation or
ifunc dispatch on __ieee754_pow by including math/w_pow.c.

The compatibility symbol version still uses the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously powl was an alias of pow, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling.  This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.

The __pow_finite symbol is now an alias of pow.  Both __pow_finite and
pow set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that
may affect that header.

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add pow.
	* math/w_pow_compat.c (__pow_compat): Change to versioned compat
	symbol.
	* math/w_pow.c: New file.
	* sysdeps/i386/fpu/w_pow.c: New file.
	* sysdeps/ia64/fpu/e_pow.S: Add versioned symbols.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Rename to __pow
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_pow.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_pow.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__ieee754_pow): Rename to
	__pow.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__ieee754_pow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_pow.c (__ieee754_pow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/w_pow.c: New file.
2018-11-21 09:58:36 +00:00
Szabolcs Nagy
f29b7c492d Remove the error handling wrapper from log
Introduce new log symbol version that doesn't do SVID compatible error
handling.  The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.

The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_log.c and enabled for targets with their own log implementation by
including math/w_log.c.

The compatibility symbol version still uses the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously logl was an alias of log, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling.  This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.

The __log_finite symbol is now an alias of log.  Both __log_finite and
log set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that may
affect that header.

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add log.
	* math/w_log_compat.c (__log_compat): Change to versioned compat
	symbol.
	* math/w_log.c: New file.
	* sysdeps/i386/fpu/w_log.c: New file.
	* sysdeps/ia64/fpu/e_log.S: Update.
	* sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Rename to __log
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_log.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_log.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_log-avx.c (__ieee754_log): Rename to
	__log.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/w_log.c: New file.
2018-11-21 09:56:27 +00:00
Szabolcs Nagy
c20a10561a Remove the error handling wrapper from exp and exp2
Introduce new exp and exp2 symbol version that don't do SVID compatible
error handling.  The standard errno and fp exception based error handling
is inline in the new code and does not have significant overhead.

The double precision wrappers are disabled for sysdeps/ieee754/dbl-64
by using empty w_exp.c and w_exp2.c files, the math/w_exp.c and
math/w_exp2.c files use the wrapper template and can be included by
targets that have their own exp and exp2 implementations or use ifunc
on the glibc internal __ieee754_exp symbol.

The compatibility symbol versions still use the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously expl and exp2l were aliases of exp and exp2,
now they point to the compatibility symbols with the wrapper, because
they still need the SVID compatible error handling.  This affects
NO_LONG_DOUBLE (e.g arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets
as well.

The _finite symbols are now aliases of the standard symbols (they have
no performance advantage anymore).  Both the standard symbols and
_finite symbols set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that may
affect that header (the new macro name is __exp instead of __ieee754_exp
which breaks some math.h macros).

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add exp and exp2.
	* math/w_exp2_compat.c (__exp2_compat): Change to versioned compat
	symbol, handle NO_LONG_DOUBLE and LONG_DOUBLE_COMPAT explicitly.
	* math/w_exp_compat.c (__exp_compat): Likewise.
	* math/w_exp.c: New file.
	* math/w_exp2.c: New file.
	* sysdeps/i386/fpu/w_exp.c: New file.
	* sysdeps/i386/fpu/w_exp2.c: New file.
	* sysdeps/ia64/fpu/e_exp.S: Add versioned symbols.
	* sysdeps/ia64/fpu/e_exp2.S: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Rename to __exp
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Rename to __exp2
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_exp.c: New file.
	* sysdeps/ieee754/dbl-64/w_exp2.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_exp.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_exp2.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp.c (__ieee754_exp): Rename to
	__exp.
	* sysdeps/x86_64/fpu/multiarch/w_exp.c: New file.
2018-11-21 09:55:02 +00:00
Martin Sebor
1626a1cfcd Add support for GCC 9 attribute copy.
GCC 9 has gained an enhancement to help detect attribute mismatches
between alias declarations and their targets.  It consists of a new
warning, -Wattribute-alias, an enhancement to an existing warning,
-Wmissing-attributes, and a new attribute called copy.

The purpose of the warnings is to help identify either possible bugs
(an alias declared with more restrictive attributes than its target
promises) or optimization or diagnostic opportunities (an alias target
missing some attributes that it could be declared with that might
benefit analysis and code generation).  The purpose of the new
attribute is to easily apply (almost) the same set of attributes
to one declaration as those already present on another.

As expected (and intended) the enhancement triggers warnings for
many alias declarations in Glibc code.  This change, tested on
x86_64-linux, avoids all instances of the new warnings by making
use of the attribute where appropriate.  To fully benefit from
the enhancement Glibc will need to be compiled with
 -Wattribute-alias=2 and remaining warnings reviewed and dealt with
(there are a couple of thousand but most should be straightforward
to deal with).

ChangeLog:

	* include/libc-symbols.h (__attribute_copy__): Define macro unless
	it's already defined.
	(_strong_alias): Use __attribute_copy__.
	(_weak_alias,  __hidden_ver1,  __hidden_nolink2): Same.
	* misc/sys/cdefs.h (__attribute_copy__): New macro.
	* sysdeps/x86_64/multiarch/memchr.c (memchr): Use __attribute_copy__.
	* sysdeps/x86_64/multiarch/memcmp.c (memcmp): Same.
	* sysdeps/x86_64/multiarch/mempcpy.c (mempcpy): Same.
	* sysdeps/x86_64/multiarch/memset.c (memset): Same.
	* sysdeps/x86_64/multiarch/stpcpy.c (stpcpy): Same.
	* sysdeps/x86_64/multiarch/strcat.c (strcat): Same.
	* sysdeps/x86_64/multiarch/strchr.c (strchr): Same.
	* sysdeps/x86_64/multiarch/strcmp.c (strcmp): Same.
	* sysdeps/x86_64/multiarch/strcpy.c (strcpy): Same.
	* sysdeps/x86_64/multiarch/strcspn.c (strcspn): Same.
	* sysdeps/x86_64/multiarch/strlen.c (strlen): Same.
	* sysdeps/x86_64/multiarch/strncmp.c (strncmp): Same.
	* sysdeps/x86_64/multiarch/strncpy.c (strncpy): Same.
	* sysdeps/x86_64/multiarch/strnlen.c (strnlen): Same.
	* sysdeps/x86_64/multiarch/strpbrk.c (strpbrk): Same.
	* sysdeps/x86_64/multiarch/strrchr.c (strrchr): Same.
	* sysdeps/x86_64/multiarch/strspn.c (strspn): Same.
2018-11-09 17:24:12 -07:00
H.J. Lu
72771e5375 x86: Use _rdtsc intrinsic for HP_TIMING_NOW
Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
HP_TIMING_NOW.  This patch

1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h.
2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h
can check minimum x86 ISA to decide if _rdtsc can be used.

NB: Checking if __i686__ isn't sufficient since __i686__ may not be
defined when building for i686 class processors.

	* sysdeps/i386/init-arch.h: Removed.
	* sysdeps/i386/i586/init-arch.h: Likewise.
	* sysdeps/i386/i686/init-arch.h: Likewise.
	* sysdeps/i386/i686/hp-timing.h: Likewise.
	* sysdeps/x86_64/hp-timing.h: Likewise.
	* sysdeps/i386/isa.h: New file.
	* sysdeps/i386/i586/isa.h: Likewise.
	* sysdeps/i386/i686/isa.h: Likewise.
	* sysdeps/x86_64/isa.h: Likewise.
	* sysdeps/x86/hp-timing.h: New file.
	* sysdeps/x86/init-arch.h: Include <isa.h>.
2018-10-17 15:16:45 -07:00
Joseph Myers
7abf97bed9 Use trunc functions not __trunc functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __trunc functions to call the
corresponding trunc names instead, with asm redirection to __trunc
when the calls are not inlined.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_truncf.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise.
	* sysdeps/ieee754/float128/s_truncf128.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_trunc.c: Likewise.
	* sysdeps/ieee754/flt-32/s_truncf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
	* sysdeps/riscv/rvf/s_truncf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise.
	(ceil): Redirect to __ceil.
	(floor): Redirect to __floor.
	(trunc): Redirect to __trunc.
	(__truncl): Call trunc instead of __trunc.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc):
	Remove macro.
	[_ARCH_PWR5X] (__truncf): Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use
	trunc functions instead of __trunc variants.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
2018-09-20 21:11:10 +00:00
Szabolcs Nagy
424c4f60ed Add new pow implementation
The algorithm is exp(y * log(x)), where log(x) is computed with about
1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result
in two doubles, and the exp part uses the same algorithm (and lookup
tables) as exp, but takes the input as two doubles and a sign (to handle
negative bases with odd integer exponent).  The __exp1 internal symbol
is no longer necessary.

There is separate code path when fma is not available but the worst case
error is about 0.54 ULP in both cases.  The lookup table and consts for
log are 4168 bytes.  The .rodata+.text is decreased by 37908 bytes on
aarch64.  The non-nearest rounding error is less than 1 ULP.

Improvements on Cortex-A72 compared to current glibc master:
pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1]
pow latency: 1.84x in [0.01 11.1]x[0.01 11.1]

Tested on
aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and
arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets.

	* NEWS: Mention pow improvements.
	* math/Makefile (type-double-routines): Add e_pow_log_data.
	* sysdeps/generic/math_private.h (__exp1): Remove.
	* sysdeps/i386/fpu/e_pow_log_data.c: New file.
	* sysdeps/ia64/fpu/e_pow_log_data.c: New file.
	* sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma
	contraction.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove.
	(exp_inline): Remove.
	(__ieee754_exp): Only single double input is handled.
	* sysdeps/ieee754/dbl-64/e_pow.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file.
	* sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define.
	(__pow_log_data): Define.
	* sysdeps/ieee754/dbl-64/upow.h: Remove.
	* sysdeps/ieee754/dbl-64/upow.tbl: Remove.
	* sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file.
	* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma
	contraction.
	(CFLAGS-e_pow-fma4.c): Likewise.
2018-09-19 10:04:51 +01:00
Joseph Myers
71223ef909 Use ceil functions not __ceil functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __ceil functions to call the
corresponding ceil names instead, with asm redirection to __ceil when
the calls are not inlined.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_ceilf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_ceil.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise.
	* sysdeps/ieee754/float128/s_ceilf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_ceilf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
	* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil):
	Remove macro.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil
	functions instead of __ceil variants.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-17 20:42:06 +00:00
Joseph Myers
f29b6f17e4 Use rint functions not __rint functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __rint functions to call the
corresponding rint names instead, with asm redirection to __rint when
the calls are not inlined.  The x86_64 math_private.h is removed as no
longer useful after this patch.

This patch is relative to a tree with my floor patch
<https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied,
and much the same considerations arise regarding possibly replacing an
IFUNC call with a direct inline expansion.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_rintf.c: Likewise.
	* sysdeps/alpha/fpu/s_rint.c: Likewise.
	* sysdeps/alpha/fpu/s_rintf.c: Likewise.
	* sysdeps/i386/fpu/s_rintl.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_rint.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise.
	* sysdeps/ieee754/float128/s_rintf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_rintf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
	* sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise.
	* sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise.
	* sysdeps/powerpc/fpu/s_rint.c: Likewise.
	* sysdeps/powerpc/fpu/s_rintf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_rint.c: Likewise.
	* sysdeps/riscv/rvf/s_rintf.c: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/x86_64/fpu/math_private.h: Remove file.
	* math/e_scalb.c (invalid_fn): Use rint functions instead of
	__rint variants.
	* math/e_scalbf.c (invalid_fn): Likewise.
	* math/e_scalbl.c (invalid_fn): Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
	Likewise.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
	Likewise.
	* sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise.
	* sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
2018-09-14 13:10:39 +00:00
Joseph Myers
e44acb2063 Use floor functions not __floor functions in glibc libm.
Similar to the changes that were made to call sqrt functions directly
in glibc, instead of __ieee754_sqrt variants, so that the compiler
could inline them automatically without needing special inline
definitions in lots of math_private.h headers, this patch makes libm
code call floor functions directly instead of __floor variants,
removing the inlines / macros for x86_64 (SSE4.1) and powerpc
(POWER5).

The redirection used to ensure that __ieee754_sqrt does still get
called when the compiler doesn't inline a built-in function expansion
is refactored so it can be applied to other functions; the refactoring
is arranged so it's not limited to unary functions either (it would be
reasonable to use this mechanism for copysign - removing the inline in
math_private_calls.h but also eliminating unnecessary local PLT entry
use in the cases (powerpc soft-float and e500v1, for IBM long double)
where copysign calls don't get inlined).

The point of this change is that more architectures can get floor
calls inlined where they weren't previously (AArch64, for example),
without needing special inline definitions in their math_private.h,
and existing such definitions in math_private.h headers can be
removed.

Note that it's possible that in some cases an inline may be used where
an IFUNC call was previously used - this is the case on x86_64, for
example.  I think the direct calls to floor are still appropriate; if
there's any significant performance cost from inline SSE2 floor
instead of an IFUNC call ending up with SSE4.1 floor, that indicates
that either the function should be doing something else that's faster
than using floor at all, or it should itself have IFUNC variants, or
that the compiler choice of inlining for generic tuning should change
to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
might be called at runtime - but not that glibc should avoid calling
floor internally.  (After all, all the same considerations would apply
to any user program calling floor, where it might either be inlined or
left as an out-of-line call allowing for a possible IFUNC.)

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT):
	New macro.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (floor): Likewise.
	* sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_floorf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_floor.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise.
	* sysdeps/ieee754/float128/s_floorf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_floorf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
	* sysdeps/riscv/rvf/s_floorf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor):
	Remove macro.
	[_ARCH_PWR5X] (__floorf): Likewise.
	* sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove
	inline function.
	[__SSE4_1__] (__floorf): Likewise.
	* math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions
	instead of __floor variants.
	* math/w_lgamma_r_compat.c (__lgamma_r): Likewise.
	* math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise.
	* math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise.
	* math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise.
	* math/w_lgammal_r_compat.c (__lgammal_r): Likewise.
	* math/w_tgamma_compat.c (__tgamma): Likewise.
	* math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise.
	* math/w_tgammaf_compat.c (__tgammaf): Likewise.
	* math/w_tgammal_compat.c (__tgammal): Likewise.
	* sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise.
	* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
	Likewise.
	* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
	* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise.
	* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
	* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
	* sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-14 13:09:01 +00:00
Joseph Myers
4e7fbdd7c2 Remove x86_64 math_private.h asms.
The x86_64 math_private.h has asm versions of the macros to
reinterpret between floating-point and integer types.

This is the sort of thing we now strongly discourage; the expectation
in such cases, where the generic C code gives the compiler all the
information needed about the required semantics, is that you should
get the compiler to do the right thing for the generic C code rather
than writing an asm version.

Trivial tests showed GCC generates the expected single instructions
for reinterpretation from floating point to integer.  In the other
direction, it goes via memory when the asms don't; I asked about this
in GCC bug 87236 and was advised this was deliberate for generic
tuning because it was faster that way on some AMD processors (but
-mtune=intel, and -Os with the latest GCC, avoid going via memory).
The asms don't and can't know about those tuning details, so that's
evidence that they are actually making the code worse.

This patch removes the asms accordingly.  Tested for x86_64.

	* sysdeps/x86_64/fpu/math_private.h (MOVD): Remove macro.
	(MOVQ): Likewise.
	(EXTRACT_WORDS64): Likewise.
	(INSERT_WORDS64): Likewise.
	(GET_FLOAT_WORD): Likewise.
	(SET_FLOAT_WORD): Likewise.
2018-09-11 14:51:40 +00:00
Szabolcs Nagy
e70c176825 Add new exp and exp2 implementations
Optimized exp and exp2 implementations using a lookup table for
fractional powers of 2.  There are several variants, see e_exp_data.c,
they can be selected by modifying math_config.h allowing different
tradeoffs.

The default selection should be acceptable as generic libm code.
Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on
aarch64 the rodata size is 2160 bytes, shared between exp and exp2.
On aarch64 .text + .rodata size decreased by 24912 bytes.

The non-nearest rounding error is less than 1 ULP even on targets
without efficient round implementation (although the error rate is
higher in that case).  Targets with single instruction, rounding mode
independent, to nearest integer rounding and conversion can use them
by setting TOINT_INTRINSICS and adding the necessary code to their
math_private.h.

The __exp1 code uses the same algorithm, so the error bound of pow
increased a bit.

New double precision error handling code was added following the
style of the single precision error handling code.

Improvements on Cortex-A72 compared to current glibc master:
exp thruput: 1.61x in [-9.9 9.9]
exp latency: 1.53x in [-9.9 9.9]
exp thruput: 1.13x in [0.5 1]
exp latency: 1.30x in [0.5 1]
exp2 thruput: 2.03x in [-9.9 9.9]
exp2 latency: 1.64x in [-9.9 9.9]

For small (< 1) inputs the current exp code uses a separate algorithm
so the speed up there is less.

Was tested on
aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and
x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and
powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets,
only non-nearest rounding ulp errors increase and they are within
acceptable bounds (ulp updates are in separate patches).

	* NEWS: Mention exp and exp2 improvements.
	* math/Makefile (libm-support): Remove t_exp.
	(type-double-routines): Add math_err and e_exp_data.
	* sysdeps/aarch64/libm-test-ulps: Update.
	* sysdeps/arm/libm-test-ulps: Update.
	* sysdeps/i386/fpu/e_exp_data.c: New file.
	* sysdeps/i386/fpu/math_err.c: New file.
	* sysdeps/i386/fpu/t_exp.c: Remove.
	* sysdeps/ia64/fpu/e_exp_data.c: New file.
	* sysdeps/ia64/fpu/math_err.c: New file.
	* sysdeps/ia64/fpu/t_exp.c: Remove.
	* sysdeps/ieee754/dbl-64/e_exp.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_exp_data.c: New file.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound.
	* sysdeps/ieee754/dbl-64/eexp.tbl: Remove.
	* sysdeps/ieee754/dbl-64/math_config.h: New file.
	* sysdeps/ieee754/dbl-64/math_err.c: New file.
	* sysdeps/ieee754/dbl-64/t_exp.c: Remove.
	* sysdeps/ieee754/dbl-64/t_exp2.h: Remove.
	* sysdeps/ieee754/dbl-64/uexp.h: Remove.
	* sysdeps/ieee754/dbl-64/uexp.tbl: Remove.
	* sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file.
	* sysdeps/m68k/m680x0/fpu/math_err.c: New file.
	* sysdeps/m68k/m680x0/fpu/t_exp.c: Remove.
	* sysdeps/powerpc/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-09-05 16:22:00 +01:00
Paul Pluzhnikov
a6e8926f8d [BZ #20271] Add newlines in __libc_fatal calls. 2018-08-31 18:04:32 -07:00
Joseph Myers
ff6b24501f Split fenv_private.h out of math_private.h more consistently.
On some architectures, the parts of math_private.h relating to the
floating-point environment are in a separate file fenv_private.h
included from math_private.h.  As this is purely an
architecture-specific convention used by several architectures,
however, all such architectures still need their own math_private.h,
even if it has nothing to do beyond #include <fenv_private.h> and
peculiarity of including the i386 file directly instead of having a
shared file in sysdeps/x86.

This patch makes the fenv_private.h name an architecture-independent
convention in glibc.  The include of fenv_private.h from
math_private.h becomes architecture-independent (until callers are
updated to include fenv_private.h directly so the include from
math_private.h is no longer needed).  Some architecture math_private.h
headers are removed if no longer needed, or renamed to fenv_private.h
if all they define belongs in that header; architecture fenv_private.h
headers now do require #include_next <fenv_private.h>.  The i386
fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is
actually shared with x86_64.  The generic math_private.h gets a new
include of <stdbool.h>, as needed for bool in some prototypes in that
header (previously that was indirectly included via include/fenv.h,
which now only gets included too late in math_private.h, after those
prototypes).

Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.

	* sysdeps/aarch64/fpu/fenv_private.h: New file.  Based on ....
	* sysdeps/aarch64/fpu/math_private.h: ... this file.  All contents
	moved to fenv_private.h except for ...
	(TOINT_INTRINSICS): Kept in math_private.h.
	(roundtoint): Likewise.
	(converttoint): Likewise.
	* sysdeps/arm/fenv_private.h: Change multiple-include guard to
	[ARM_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/arm/math_private.h: Remove.
	* sysdeps/generic/fenv_private.h: New file.  Contents moved from
	....
	* sysdeps/generic/math_private.h: ... this file.  Include
	<stdbool.h>.  Do not include <fenv.h> or <get-rounding-mode.h>.
	Include <fenv_private.h>.  Remove functions and macros moved to
	fenv_private.h.
	* sysdeps/i386/fpu/math_private.h: Remove.
	* sysdeps/mips/math_private.h: Move to ....
	* sysdeps/mips/fpu/fenv_private.h: ... here.  Change
	multiple-include guard to [MIPS_FENV_PRIVATE_H].  Remove
	[__mips_hard_float] conditional.  Include next <fenv_private.h>.
	* sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include
	guard to [POWERPC_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/powerpc/fpu/math_private.h: Do not include
	<fenv_private.h>.
	* sysdeps/riscv/rvf/math_private.h: Move to ....
	* sysdeps/riscv/rvf/fenv_private.h: ... here.  Change
	multiple-include guard to [RISCV_FENV_PRIVATE_H].  Include next
	<fenv_private.h>.
	* sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard
	to [SPARC_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/sparc/fpu/math_private.h: Remove.
	* sysdeps/i386/fpu/fenv_private.h: Move to ....
	* sysdeps/x86/fpu/fenv_private.h: ... here.  Change
	multiple-include guard to [X86_FENV_PRIVATE_H].  Include next
	<fenv_private.h>.
	* sysdeps/x86_64/fpu/math_private.h: Do not include
	<sysdeps/i386/fpu/fenv_private.h>.
2018-08-28 20:48:49 +00:00
Wilco Dijkstra
49acec179c Fix spaces in x86_64 ULP file
Fix a few missing spaces, it's now identical to the regenerated version.

Passes GLIBC tests on x64.

	* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate to fix spaces.
2018-08-15 12:56:22 +01:00
Wilco Dijkstra
599cf39766 Improve performance of sinf and cosf
The second patch improves performance of sinf and cosf using the same
algorithms and polynomials.  The returned values are identical to sincosf
for the same input.  ULP definitions for AArch64 and x64 are updated.

sinf/cosf througput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.2x
* |x| < M_PI_4  : 1.8x
* |x| < 2 * M_PI: 1.7x
* |x| < 120.0   : 2.3x
* |x| < Inf     : 3.0x

	* NEWS: Mention sinf, cosf, sincosf.
	* sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
	constants rather than including generic sincosf.h.
	* sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
	* sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
	* sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
	(reduced_cos): Remove.
	(sinf_poly): New function.
	* sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
2018-08-14 10:45:59 +01:00
Joseph Myers
2ce7ba7d15 Move SNAN_TESTS_* out of math-tests.h.
Continuing moving macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the SNAN_TESTS_* macros for individual types out to their own
sysdeps header (while the type-generic SNAN_TESTS wrapper for those
macros remains in math-tests.h).

Tested for x86_64 and x86, and with build-many-glibcs.py.

	* sysdeps/generic/math-tests-snan.h: New file.
	* sysdeps/generic/math-tests.h: Include <math-tests-snan.h>.
	(SNAN_TESTS_float): Do not define here.
	(SNAN_TESTS_double): Likewise.
	(SNAN_TESTS_long_double): Likewise.
	(SNAN_TESTS_float128): Likewise.
	* sysdeps/i386/fpu/math-tests-snan.h: New file.
	* sysdeps/i386/fpu/math-tests.h: Remove file.
	* sysdeps/ia64/math-tests-snan.h: New file.
	* sysdeps/ia64/math-tests.h: Remove file.
	* sysdeps/x86/math-tests.h: Likewise.
	* sysdeps/x86_64/fpu/math-tests-snan.h: New file.
2018-08-10 19:22:01 +00:00
Wilco Dijkstra
ea5c662c62 Improve performance of sincosf
This patch is a complete rewrite of sincosf.  The new version is
significantly faster, as well as simple and accurate.
The worst-case ULP is 0.5607, maximum relative error is 0.5303 * 2^-23 over
all 4 billion inputs.  In non-nearest rounding modes the error is 1ULP.

The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction.  The code uses approximate integer
comparisons to quickly decide between these cases.

The small range reducer uses a single reduction step to handle values up to
120.0.  It is fastest on targets which support inlined round instructions.

The large range reducer uses integer arithmetic for simplicity.  It does a
32x96 bit multiply to compute a 64-bit modulo result.  This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4.  It could be further optimized, however it is
already much faster than necessary.

sincosf throughput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.6x
* |x| < M_PI_4  : 1.7x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0   : 1.8x
* |x| < Inf     : 2.3x

	* math/Makefile: Add s_sincosf_data.c.
	* sysdeps/ia64/fpu/s_sincosf_data.c: New file.
	* sysdeps/ieee754/flt-32/s_sincosf.h (abstop12): Add new function.
	(sincosf_poly): Likewise.
	(reduce_small): Likewise.
	(reduce_large): Likewise.
	* sysdeps/ieee754/flt-32/s_sincosf.c (sincosf): Rewrite.
	* sysdeps/ieee754/flt-32/s_sincosf_data.c: New file with sincosf data.
	* sysdeps/m68k/m680x0/fpu/s_sincosf_data.c: New file.
	* sysdeps/x86_64/fpu/s_sincosf_data.c: New file.
2018-08-10 17:34:39 +01:00
Ilya Leoshkevich
8d997d2253 Move __fentry__ version definition to sysdeps/{i386,x86_64}
__fentry__ symbol is currently not defined for other architectures.
Attempts to introduce it cause abicheck to fail, because it will be
available since 2.29 earliest, and not 2.13, which is the case for
Intel.  With the new code, abicheck passes for i686-linux-gnu,
x86_64-linux-gnu and x86_64-linux-gnu32 triples.

ChangeLog:

	* stdlib/Versions: Remove __fentry__.
	* sysdeps/i386/Versions: Add __fentry__.
	* sysdeps/x86_64/Versions: Add __fentry__.
2018-08-10 09:07:44 +02:00
H.J. Lu
fb4c32aef6 x86: Move STATE_SAVE_OFFSET/STATE_SAVE_MASK to sysdep.h
Move STATE_SAVE_OFFSET and STATE_SAVE_MASK to sysdep.h to make
sysdeps/x86/cpu-features.h a C header file.

	* sysdeps/x86/cpu-features.h (STATE_SAVE_OFFSET): Removed.
	(STATE_SAVE_MASK): Likewise.
	Don't check __ASSEMBLER__ to include <cpu-features-offsets.h>.
	* sysdeps/x86/sysdep.h (STATE_SAVE_OFFSET): New.
	(STATE_SAVE_MASK): Likewise.
	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features-offsets.h>
	instead of <cpu-features.h>.
2018-08-06 06:25:43 -07:00
H.J. Lu
430388d5dc x86: Don't include <init-arch.h> in assembly codes
There is no need to include <init-arch.h> in assembly codes since all
x86 IFUNC selector functions are written in C.  Tested on i686 and
x86-64.  There is no code change in libc.so, ld.so and libmvec.so.

	* sysdeps/i386/i686/multiarch/bzero-ia32.S: Don't include
	<init-arch.h>.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
2018-08-03 08:05:00 -07:00
Siddhesh Poyarekar
dce452dc52 Rename the glibc.tune namespace to glibc.cpu
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to.  Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses.  Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.

	* NEWS: Mention the change.
	* elf/dl-tunables.list: Rename tune namespace to cpu.
	* sysdeps/powerpc/dl-tunables.list: Likewise.
	* sysdeps/x86/dl-tunables.list: Likewise.
	* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
	cpu.name.
	* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
	* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
	* manual/README.tunables: Likewise.
	* manual/tunables.texi: Likewise.
	* sysdeps/powerpc/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
	(init_cpu_features): Likewise.
	* sysdeps/x86/cpu-features.c: Likewise.
	* sysdeps/x86/cpu-features.h: Likewise.
	* sysdeps/x86/cpu-tunables.c: Likewise.
	* sysdeps/x86_64/Makefile: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-08-02 23:49:19 +05:30
H.J. Lu
9aa3113a42 x86: Rename __glibc_reserved2 to ssp_base in tcbhead_t
This will be used to record the current shadow stack base for shadow
stack switching by getcontext, makecontext, setcontext and swapcontext.
If the target shadow stack base is the same as the current shadow stack
base, we unwind the shadow stack.  Otherwise it is a stack switch and
we look for a restore token to restore the target shadow stack.

	* sysdeps/i386/nptl/tcb-offsets.sym (SSP_BASE_OFFSET): New.
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Replace __glibc_reserved2
	with ssp_base.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (SSP_BASE_OFFSET): New.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Replace __glibc_reserved2
	with ssp_base.
2018-07-25 04:39:39 -07:00
H.J. Lu
ca027e0f62 x86-64: Add endbr64 to tst-quadmod[12].S
Add endbr64 to tst-quadmod1.S and tst-quadmod2.S so that func and foo
can be called indirectly.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/x86_64/tst-quadmod1.S (func): Add endbr64 if IBT is
	enabled.
	(foo): Likewise.
	* sysdeps/x86_64/tst-quadmod2.S (func) : Likewise.
	(foo): Likewise.
2018-07-24 05:12:14 -07:00
H.J. Lu
e2d40a8822 x86-64: Use _CET_NOTRACK in memcmp-sse4.S
* sysdeps/x86_64/multiarch/memcmp-sse4.S (BRANCH_TO_JMPTBL_ENTRY):
	Add _CET_NOTRACK before indirect jump to jump table.
2018-07-18 08:07:32 -07:00
H.J. Lu
03aaf49b68 x86-64: Use _CET_NOTRACK in memcpy-ssse3.S
* sysdeps/x86_64/multiarch/memcpy-ssse3.S
	(BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump
	to jump table.
	(MEMCPY): Likewise.
2018-07-18 06:39:46 -07:00
H.J. Lu
811e9e52b2 x86-64: Use _CET_NOTRACK in memcpy-ssse3-back.S
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S
	(BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump
	to jump table.
	(MEMCPY): Likewise.
2018-07-18 06:38:23 -07:00
H.J. Lu
8817df4265 x86-64: Use _CET_NOTRACK in strcmp-sse42.S
* sysdeps/x86_64/multiarch/strcmp-sse42.S (STRCMP_SSE42): Add
	_CET_NOTRACK before indirect jump to jump table.
2018-07-18 06:37:09 -07:00
H.J. Lu
921595d151 x86-64: Use _CET_NOTRACK in strcpy-sse2-unaligned.S
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S
	(BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump
	to jump table.
2018-07-18 06:33:06 -07:00
H.J. Lu
4ef60d9597 x86_64: Use _CET_NOTRACK in strcmp.S
* sysdeps/x86_64/strcmp.S (STRCMP): Add _CET_NOTRACK before
	indirect jump to jump table.
2018-07-18 06:31:53 -07:00
H.J. Lu
5efc6777ad x86-64: Add _CET_ENDBR to STRCMP_SSE42
Add _CET_ENDBR to STRCMP_SSE42, which is called indirectly, to support
IBT.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/x86_64/multiarch/strcmp-sse42.S (STRCMP_SSE42): Add
	_CET_ENDBR.
2018-07-17 16:08:47 -07:00
H.J. Lu
562837c002 x86: Add _CET_ENDBR to functions in dl-tlsdesc.S
Add _CET_ENDBR to functions in dl-tlsdesc.S, which are called indirectly,
to support IBT.

Tested on i686 and x86-64.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/i386/dl-tlsdesc.S (_dl_tlsdesc_return): Add
	_CET_ENDBR.
	(_dl_tlsdesc_undefweak): Likewise.
	(_dl_tlsdesc_dynamic): Likewise.
	(_dl_tlsdesc_resolve_abs_plus_addend): Likewise.
	(_dl_tlsdesc_resolve_rel): Likewise.
	(_dl_tlsdesc_resolve_rela): Likewise.
	(_dl_tlsdesc_resolve_hold): Likewise.
	* sysdeps/x86_64/dl-tlsdesc.S (_dl_tlsdesc_return): Likewise.
	(_dl_tlsdesc_undefweak): Likewise.
	(_dl_tlsdesc_dynamic): Likewise.
	(_dl_tlsdesc_resolve_rela): Likewise.
	(_dl_tlsdesc_resolve_hold): Likewise.
2018-07-17 16:07:17 -07:00
H.J. Lu
124bcde683 x86: Add _CET_ENDBR to functions in crti.S
Add _CET_ENDBR to functions in crti.S, which are called indirectly, to
support IBT.

Tested on i686 and x86-64.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/i386/crti.S (_init): Add _CET_ENDBR.
	(_fini): Likewise.
	* sysdeps/x86_64/crti.S (_init): Likewise.
	(_fini): Likewise.
2018-07-17 16:05:18 -07:00
H.J. Lu
f753fa7dea x86: Support IBT and SHSTK in Intel CET [BZ #21598]
Intel Control-flow Enforcement Technology (CET) instructions:

https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en
forcement-technology-preview.pdf

includes Indirect Branch Tracking (IBT) and Shadow Stack (SHSTK).

GNU_PROPERTY_X86_FEATURE_1_IBT is added to GNU program property to
indicate that all executable sections are compatible with IBT when
ENDBR instruction starts each valid target where an indirect branch
instruction can land.  Linker sets GNU_PROPERTY_X86_FEATURE_1_IBT on
output only if it is set on all relocatable inputs.

On an IBT capable processor, the following steps should be taken:

1. When loading an executable without an interpreter, enable IBT and
lock IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the executable.
2. When loading an executable with an interpreter, enable IBT if
GNU_PROPERTY_X86_FEATURE_1_IBT is set on the interpreter.
  a. If GNU_PROPERTY_X86_FEATURE_1_IBT isn't set on the executable,
     disable IBT.
  b. Lock IBT.
3. If IBT is enabled, when loading a shared object without
GNU_PROPERTY_X86_FEATURE_1_IBT:
  a. If legacy interwork is allowed, then mark all pages in executable
     PT_LOAD segments in legacy code page bitmap.  Failure of legacy code
     page bitmap allocation causes an error.
  b. If legacy interwork isn't allowed, it causes an error.

GNU_PROPERTY_X86_FEATURE_1_SHSTK is added to GNU program property to
indicate that all executable sections are compatible with SHSTK where
return address popped from shadow stack always matches return address
popped from normal stack.  Linker sets GNU_PROPERTY_X86_FEATURE_1_SHSTK
on output only if it is set on all relocatable inputs.

On a SHSTK capable processor, the following steps should be taken:

1. When loading an executable without an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on the executable.
2. When loading an executable with an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on interpreter.
  a. If GNU_PROPERTY_X86_FEATURE_1_SHSTK isn't set on the executable
     or any shared objects loaded via the DT_NEEDED tag, disable SHSTK.
  b. Otherwise lock SHSTK.
3. After SHSTK is enabled, it is an error to load a shared object
without GNU_PROPERTY_X86_FEATURE_1_SHSTK.

To enable CET support in glibc, --enable-cet is required to configure
glibc.  When CET is enabled, both compiler and assembler must support
CET.  Otherwise, it is a configure-time error.

To support CET run-time control,

1. _dl_x86_feature_1 is added to the writable ld.so namespace to indicate
if IBT or SHSTK are enabled at run-time.  It should be initialized by
init_cpu_features.
2. For dynamic executables:
   a. A l_cet field is added to struct link_map to indicate if IBT or
      SHSTK is enabled in an ELF module.  _dl_process_pt_note or
      _rtld_process_pt_note is called to process PT_NOTE segment for
      GNU program property and set l_cet.
   b. _dl_open_check is added to check IBT and SHSTK compatibilty when
      dlopening a shared object.
3. Replace i386 _dl_runtime_resolve and _dl_runtime_profile with
_dl_runtime_resolve_shstk and _dl_runtime_profile_shstk, respectively if
SHSTK is enabled.

CET run-time control can be changed via GLIBC_TUNABLES with

$ export GLIBC_TUNABLES=glibc.tune.x86_shstk=[permissive|on|off]
$ export GLIBC_TUNABLES=glibc.tune.x86_ibt=[permissive|on|off]

1. permissive: SHSTK is disabled when dlopening a legacy ELF module.
2. on: IBT or SHSTK are always enabled, regardless if there are IBT or
SHSTK bits in GNU program property.
3. off: IBT or SHSTK are always disabled, regardless if there are IBT or
SHSTK bits in GNU program property.

<cet.h> from CET-enabled GCC is automatically included by assembly codes
to add GNU_PROPERTY_X86_FEATURE_1_IBT and GNU_PROPERTY_X86_FEATURE_1_SHSTK
to GNU program property.  _CET_ENDBR is added at the entrance of all
assembly functions whose address may be taken.  _CET_NOTRACK is used to
insert NOTRACK prefix with indirect jump table to support IBT.  It is
defined as notrack when _CET_NOTRACK is defined in <cet.h>.

	 [BZ #21598]
	* configure.ac: Add --enable-cet.
	* configure: Regenerated.
	* elf/Makefille (all-built-dso): Add a comment.
	* elf/dl-load.c (filebuf): Moved before "dynamic-link.h".
	Include <dl-prop.h>.
	(_dl_map_object_from_fd): Call _dl_process_pt_note on PT_NOTE
	segment.
	* elf/dl-open.c: Include <dl-prop.h>.
	(dl_open_worker): Call _dl_open_check.
	* elf/rtld.c: Include <dl-prop.h>.
	(dl_main): Call _rtld_process_pt_note on PT_NOTE segment.  Call
	_rtld_main_check.
	* sysdeps/generic/dl-prop.h: New file.
	* sysdeps/i386/dl-cet.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/dl-cet.h: Likewise.
	* sysdeps/x86/cet-tunables.h: Likewise.
	* sysdeps/x86/check-cet.awk: Likewise.
	* sysdeps/x86/configure: Likewise.
	* sysdeps/x86/configure.ac: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.
	* sysdeps/x86/dl-procruntime.c: Likewise.
	* sysdeps/x86/dl-prop.h: Likewise.
	* sysdeps/x86/libc-start.h: Likewise.
	* sysdeps/x86/link_map.h: Likewise.
	* sysdeps/i386/dl-trampoline.S (_dl_runtime_resolve): Add
	_CET_ENDBR.
	(_dl_runtime_profile): Likewise.
	(_dl_runtime_resolve_shstk): New.
	(_dl_runtime_profile_shstk): Likewise.
	* sysdeps/linux/x86/Makefile (sysdep-dl-routines): Add dl-cet
	if CET is enabled.
	(CFLAGS-.o): Add -fcf-protection if CET is enabled.
	(CFLAGS-.os): Likewise.
	(CFLAGS-.op): Likewise.
	(CFLAGS-.oS): Likewise.
	(asm-CPPFLAGS): Add -fcf-protection -include cet.h if CET
	is enabled.
	(tests-special): Add $(objpfx)check-cet.out.
	(cet-built-dso): New.
	(+$(cet-built-dso:=.note)): Likewise.
	(common-generated): Add $(cet-built-dso:$(common-objpfx)%=%.note).
	($(objpfx)check-cet.out): New.
	(generated): Add check-cet.out.
	* sysdeps/x86/cpu-features.c: Include <dl-cet.h> and
	<cet-tunables.h>.
	(TUNABLE_CALLBACK (set_x86_ibt)): New prototype.
	(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
	(init_cpu_features): Call get_cet_status to check CET status
	and update dl_x86_feature_1 with CET status.  Call
	TUNABLE_CALLBACK (set_x86_ibt) and TUNABLE_CALLBACK
	(set_x86_shstk).  Disable and lock CET in libc.a.
	* sysdeps/x86/cpu-tunables.c: Include <cet-tunables.h>.
	(TUNABLE_CALLBACK (set_x86_ibt)): New function.
	(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
	* sysdeps/x86/sysdep.h (_CET_NOTRACK): New.
	(_CET_ENDBR): Define if not defined.
	(ENTRY): Add _CET_ENDBR.
	* sysdeps/x86/dl-tunables.list (glibc.tune): Add x86_ibt and
	x86_shstk.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Add
	_CET_ENDBR.
	(_dl_runtime_profile): Likewise.
2018-07-16 14:08:27 -07:00
H.J. Lu
faaee1f07e x86: Support shadow stack pointer in setjmp/longjmp
Save and restore shadow stack pointer in setjmp and longjmp to support
shadow stack in Intel CET.  Use feature_1 in tcbhead_t to check if
shadow stack is enabled before saving and restoring shadow stack pointer.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/i386/__longjmp.S: Include <jmp_buf-ssp.h>.
	(__longjmp): Restore shadow stack pointer if shadow stack is
	enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
	isn't defined for __longjmp_cancel.
	* sysdeps/i386/bsd-_setjmp.S: Include <jmp_buf-ssp.h>.
	(_setjmp): Save shadow stack pointer if shadow stack is enabled
	and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/i386/bsd-setjmp.S: Include <jmp_buf-ssp.h>.
	(setjmp): Save shadow stack pointer if shadow stack is enabled
	and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/i386/setjmp.S: Include <jmp_buf-ssp.h>.
	(__sigsetjmp): Save shadow stack pointer if shadow stack is
	enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/unix/sysv/linux/i386/____longjmp_chk.S: Include
	<jmp_buf-ssp.h>.
	(____longjmp_chk): Restore shadow stack pointer if shadow stack
	is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/unix/sysv/linux/x86/Makefile (gen-as-const-headers):
	Remove jmp_buf-ssp.sym.
	* sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S: Include
	<jmp_buf-ssp.h>.
	(____longjmp_chk): Restore shadow stack pointer if shadow stack
	is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/x86/Makefile (gen-as-const-headers): Add
	jmp_buf-ssp.sym.
	* sysdeps/x86/jmp_buf-ssp.sym: New dummy file.
	* sysdeps/x86_64/__longjmp.S: Include <jmp_buf-ssp.h>.
	(__longjmp): Restore shadow stack pointer if shadow stack is
	enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
	isn't defined for __longjmp_cancel.
	* sysdeps/x86_64/setjmp.S: Include <jmp_buf-ssp.h>.
	(__sigsetjmp): Save shadow stack pointer if shadow stack is
	enabled and SHADOW_STACK_POINTER_OFFSET is defined.
2018-07-14 05:59:53 -07:00
H.J. Lu
ebff9c5cfa x86: Rename __glibc_reserved1 to feature_1 in tcbhead_t [BZ #22563]
feature_1 has X86_FEATURE_1_IBT and X86_FEATURE_1_SHSTK bits for CET
run-time control.

CET_ENABLED, IBT_ENABLED and SHSTK_ENABLED are defined to 1 or 0 to
indicate that if CET, IBT and SHSTK are enabled.

<tls-setup.h> is added to set up thread-local data.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	[BZ #22563]
	* nptl/pthread_create.c: Include <tls-setup.h>.
	(__pthread_create_2_1): Call tls_setup_tcbhead.
	* sysdeps/generic/tls-setup.h: New file.
	* sysdeps/x86/nptl/tls-setup.h: Likewise.
	* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET):
	Likewise.
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Rename __glibc_reserved1
	to feature_1.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/x86/sysdep.h (X86_FEATURE_1_IBT): New.
	(X86_FEATURE_1_SHSTK): Likewise.
	(CET_ENABLED): Likewise.
	(IBT_ENABLED): Likewise.
	(SHSTK_ENABLED): Likewise.
2018-07-14 05:56:46 -07:00
H.J. Lu
0221ce2a90 i386: Change offset of __private_ss to 0x30 [BZ #23250]
sysdeps/i386/nptl/tls.h has

typedef struct
{
  void *tcb;            /* Pointer to the TCB.  Not necessarily the
                           thread descriptor used by libpthread.  */
  dtv_t *dtv;
  void *self;           /* Pointer to the thread descriptor.  */
  int multiple_threads;
  uintptr_t sysinfo;
  uintptr_t stack_guard;
  uintptr_t pointer_guard;
  int gscope_flag;
  int __glibc_reserved1;
  /* Reservation of some values for the TM ABI.  */
  void *__private_tm[4];
  /* GCC split stack support.  */
  void *__private_ss;
} tcbhead_t;

The offset of __private_ss is 0x34.  But GCC defines

/* We steal the last transactional memory word.  */
 #define TARGET_THREAD_SPLIT_STACK_OFFSET 0x30

and libgcc/config/i386/morestack.S has

	cmpl	%gs:0x30,%eax		# See if we have enough space.
	movl	%eax,%gs:0x30		# Save the new stack boundary.
	movl	%eax,%gs:0x30		# Save the new stack boundary.
	movl	%ecx,%gs:0x30		# Save new stack boundary.
	movl	%eax,%gs:0x30
	movl	%gs:0x30,%eax
	movl	%eax,%gs:0x30

Since update TARGET_THREAD_SPLIT_STACK_OFFSET changes split stack ABI,
this patch updates tcbhead_t to match GCC.

	[BZ #23250]
	[BZ #10686]
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Change __private_tm[4]
	to _private_tm[3] and add __glibc_reserved2.
	Add _Static_assert of offset of __private_ss == 0x30.
	* sysdeps/x86_64/nptl/tls.h: Add _Static_assert of offset of
	__private_ss == 0x40 for ILP32 and == 0x70 for LP64.
2018-06-12 06:34:48 -07:00
Florian Weimer
e826574c98 x86: Make strncmp usable from rtld
Due to the way the conditions were written, the rtld build of strncmp
ended up with no definition of the strncmp symbol at all: The
implementations were renamed for use within an IFUNC resolver, but the
IFUNC resolver itself was missing (because rtld does not use IFUNCs).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-06-12 15:00:33 +02:00
H.J. Lu
67c0579669 Mark _init and _fini as hidden [BZ #23145]
_init and _fini are special functions provided by glibc for linker to
define DT_INIT and DT_FINI in executable and shared library.  They
should never be put in dynamic symbol table.  This patch marks them as
hidden to remove them from dynamic symbol table.

Tested with build-many-glibcs.py.

	[BZ #23145]
	* elf/Makefile (tests-special): Add $(objpfx)check-initfini.out.
	($(all-built-dso:=.dynsym): New target.
	(common-generated): Add $(all-built-dso:$(common-objpfx)%=%.dynsym).
	($(objpfx)check-initfini.out): New target.
	(generated): Add check-initfini.out.
	* scripts/check-initfini.awk: New file.
	* sysdeps/aarch64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/alpha/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/arm/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/hppa/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/i386/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/ia64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/m68k/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/microblaze/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/mips/mips32/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/mips/mips64/n32/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/mips/mips64/n64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/nios2/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/powerpc/powerpc32/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/powerpc/powerpc64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/s390/s390-32/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/s390/s390-64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/sh/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/sparc/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/x86_64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
2018-06-08 10:28:52 -07:00
Leonardo Sandoval
1457016337 x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2
Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector
comparison as much as possible. Peak performance observed on a SkyLake
machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
respectively. The larger the comparison length, the more benefit using
avx2 functions, except on the strcmp, where peak is observed at length
== 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper
is preferred and AVX unaligned load is fast.

NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input.  TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcmp-avx2, strncmp-avx2, wcscmp-avx2, wcscmp-sse2, wcsncmp-avx2 and
	wcsncmp-sse2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __strcmp_avx2,
	__strncmp_avx2,	__wcscmp_avx2, __wcsncmp_avx2, __wcscmp_sse2
	and __wcsncmp_sse2.
	* sysdeps/x86_64/multiarch/strcmp.c (OPTIMIZE (avx2)):
	(IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX 2 machines if
	AVX unaligned load is fast and vzeroupper is preferred.
	* sysdeps/x86_64/multiarch/strncmp.c: Likewise.
	* sysdeps/x86_64/multiarch/strcmp-avx2.S: New file.
	* sysdeps/x86_64/multiarch/strncmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp-sse2.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp.c: Likewise.
	* sysdeps/x86_64/wcscmp.S (__wcscmp): Add alias only if __wcscmp
	is undefined.
2018-06-01 16:32:43 -05:00
Paul Pluzhnikov
50d004c91c Update ulps with "make regen-ulps" on AMD Ryzen 7 1800X.
2018-05-30  Paul Pluzhnikov  <ppluzhnikov@google.com>

	* sysdeps/x86_64/fpu/libm-test-ulps (log_vlen8_avx2): Update for
	AMD Ryzen 7 1800X.
2018-05-30 09:17:47 -07:00
H.J. Lu
727b38df05 x86-64: Skip zero length in __mem[pcpy|move|set]_erms
This patch skips zero length in __mempcpy_erms, __memmove_erms and
__memset_erms.

Tested on x86-64.

	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
	(__mempcpy_erms): Skip zero length.
	(__memmove_erms): Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
	(__memset_erms): Likewise.
2018-05-23 11:25:42 -07:00
Andreas Schwab
9aaaab7c6e Don't write beyond destination in __mempcpy_avx512_no_vzeroupper (bug 23196)
When compiled as mempcpy, the return value is the end of the destination
buffer, thus it cannot be used to refer to the start of it.
2018-05-23 09:50:57 +02:00
H.J. Lu
e28e9b1ec4 x86-64: Check Prefer_FSRM in ifunc-memmove.h
Although the REP MOVSB implementations of memmove, memcpy and mempcpy
aren't used by the current processors, this patch adds Prefer_FSRM
check in ifunc-memmove.h so that they can be used in the future.

	* sysdeps/x86/cpu-features.h (bit_arch_Prefer_FSRM): New.
	(index_arch_Prefer_FSRM): Likewise.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Also check Prefer_FSRM.
	* sysdeps/x86_64/multiarch/ifunc-memmove.h (IFUNC_SELECTOR):
	Also return OPTIMIZE (erms) for Prefer_FSRM.
2018-05-21 16:54:59 -07:00
Leonardo Sandoval
e4ebc1380d x86-64: remove duplicate line on PREFETCH_ONE_SET macro
Tested on 64-bit AVX machine

       * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
       (PREFETCH_ONE_SET): Remove duplicate line
2018-05-17 09:09:13 -05:00
H.J. Lu
0068c08588 nptl: Remove __ASSUME_PRIVATE_FUTEX
Since __ASSUME_PRIVATE_FUTEX is always defined, this patch removes the
!__ASSUME_PRIVATE_FUTEX paths.

Tested with build-many-glibcs.py.

	* nptl/allocatestack.c (allocate_stack): Remove the
	!__ASSUME_PRIVATE_FUTEX paths.
	* nptl/descr.h (header): Remove the !__ASSUME_PRIVATE_FUTEX path.
	* nptl/nptl-init.c (__pthread_initialize_minimal_internal):
	Likewise.
	* sysdeps/i386/nptl/tcb-offsets.sym (PRIVATE_FUTEX): Removed.
	* sysdeps/powerpc/nptl/tcb-offsets.sym (PRIVATE_FUTEX): Likewise.
	* sysdeps/sh/nptl/tcb-offsets.sym (PRIVATE_FUTEX): Likewise.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (PRIVATE_FUTEX): Likewise.
	* sysdeps/i386/nptl/tls.h: (tcbhead_t): Remve the
	!__ASSUME_PRIVATE_FUTEX path.
	* sysdeps/s390/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/sparc/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/unix/sysv/linux/i386/lowlevellock.S: Remove the
	!__ASSUME_PRIVATE_FUTEX macros.
	* sysdeps/unix/sysv/linux/lowlevellock-futex.h: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/cancellation.S: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Likewise.
	* sysdeps/unix/sysv/linux/kernel-features.h
	(__ASSUME_PRIVATE_FUTEX): Removed.
2018-05-17 04:25:10 -07:00
H.J. Lu
04958880e0 x86-64: Use IFUNC strncat inside libc.so
Unlike i386, we can call hidden IFUNC functions inside libc.so since
x86-64 PLT is always PIC.

Tested on x86-64.

	* sysdeps/x86_64/multiarch/strncat-c.c (STRNCAT_PRIMARY): Removed.
	Include <string/strncat.c>.
	* sysdeps/x86_64/multiarch/strncat.c (__strncat): New strong
	alias.
	(__GI___strncat): New hidden alias.
2018-05-16 09:04:35 -07:00
H.J. Lu
98ee36c7a4 x86: Add sysdeps/x86/ldsodefs.h
Merge sysdeps/i386/ldsodefs.h and sysdeps/x86_64/ldsodefs.h into
sysdeps/x86/ldsodefs.h.

Tested on i686 and x86-64.

	* sysdeps/i386/ldsodefs.h: Removed.
	* sysdeps/x86_64/ldsodefs.h: Moved to ...
	* sysdeps/x86/ldsodefs.h: This.
	(La_i86_regs): New.
	(La_i86_retval): Likewise.
	(ARCH_PLTENTER_MEMBERS): Add i86_gnu_pltenter.
	(ARCH_PLTEXIT_MEMBERS): i86_gnu_pltexit.

Acked-by: Christian Brauner (Ubuntu) christian@brauner.io
2018-05-14 09:19:41 -07:00
H.J. Lu
e322ec3282 x86-64: Remove the unnecessary testl in strlen-avx2.S
Since the result of testl is never used, this patch removes it.

Tested on 64-bit AVX2 machine.

	* sysdeps/x86_64/multiarch/strlen-avx2.S (STRLEN): Remove the
	unnecessary testl.
2018-05-14 03:41:35 -07:00
H.J. Lu
50d7d351b5 x86-64/memset: Mark the debugger symbol as hidden
When MEMSET_SYMBOL (__memset, erms) is provided for debugger, mark it
as hidden so that it will be local to the library.

	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
	(MEMSET_SYMBOL (__memset, erms)): Mark the debugger symbol as
	hidden.
2018-05-07 11:01:48 -07:00
Maciej W. Rozycki
10a446ddcc elf: Unify symbol address run-time calculation [BZ #19818]
Wrap symbol address run-time calculation into a macro and use it
throughout, replacing inline calculations.

There are a couple of variants, most of them different in a functionally
insignificant way.  Most calculations are right following RESOLVE_MAP,
at which point either the map or the symbol returned can be checked for
validity as the macro sets either both or neither.  In some places both
the symbol and the map has to be checked however.

My initial implementation therefore always checked both, however that
resulted in code larger by as much as 0.3%, as many places know from
elsewhere that no check is needed.  I have decided the size growth was
unacceptable.

Having looked closer I realized that it's the map that is the culprit.
Therefore I have modified LOOKUP_VALUE_ADDRESS to accept an additional
boolean argument telling it to access the map without checking it for
validity.  This in turn has brought quite nice results, with new code
actually being smaller for i686, and MIPS o32, n32 and little-endian n64
targets, unchanged in size for x86-64 and, unusually, marginally larger
for big-endian MIPS n64, as follows:

i686:
   text    data     bss     dec     hex filename
 152255    4052     192  156499   26353 ld-2.27.9000-base.so
 152159    4052     192  156403   262f3 ld-2.27.9000-elf-symbol-value.so
MIPS/o32/el:
   text    data     bss     dec     hex filename
 142906    4396     260  147562   2406a ld-2.27.9000-base.so
 142890    4396     260  147546   2405a ld-2.27.9000-elf-symbol-value.so
MIPS/n32/el:
   text    data     bss     dec     hex filename
 142267    4404     260  146931   23df3 ld-2.27.9000-base.so
 142171    4404     260  146835   23d93 ld-2.27.9000-elf-symbol-value.so
MIPS/n64/el:
   text    data     bss     dec     hex filename
 149835    7376     408  157619   267b3 ld-2.27.9000-base.so
 149787    7376     408  157571   26783 ld-2.27.9000-elf-symbol-value.so
MIPS/o32/eb:
   text    data     bss     dec     hex filename
 142870    4396     260  147526   24046 ld-2.27.9000-base.so
 142854    4396     260  147510   24036 ld-2.27.9000-elf-symbol-value.so
MIPS/n32/eb:
   text    data     bss     dec     hex filename
 142019    4404     260  146683   23cfb ld-2.27.9000-base.so
 141923    4404     260  146587   23c9b ld-2.27.9000-elf-symbol-value.so
MIPS/n64/eb:
   text    data     bss     dec     hex filename
 149763    7376     408  157547   2676b ld-2.27.9000-base.so
 149779    7376     408  157563   2677b ld-2.27.9000-elf-symbol-value.so
x86-64:
   text    data     bss     dec     hex filename
 148462    6452     400  155314   25eb2 ld-2.27.9000-base.so
 148462    6452     400  155314   25eb2 ld-2.27.9000-elf-symbol-value.so

	[BZ #19818]
	* sysdeps/generic/ldsodefs.h (LOOKUP_VALUE_ADDRESS): Add `set'
	parameter.
	(SYMBOL_ADDRESS): New macro.
	[!ELF_FUNCTION_PTR_IS_SPECIAL] (DL_SYMBOL_ADDRESS): Use
	SYMBOL_ADDRESS for symbol address calculation.
	* elf/dl-runtime.c (_dl_fixup): Likewise.
	(_dl_profile_fixup): Likewise.
	* elf/dl-symaddr.c (_dl_symbol_address): Likewise.
	* elf/rtld.c (dl_main): Likewise.
	* sysdeps/aarch64/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/alpha/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/arm/dl-machine.h (elf_machine_rel): Likewise.
	(elf_machine_rela): Likewise.
	* sysdeps/hppa/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/hppa/dl-symaddr.c (_dl_symbol_address): Likewise.
	* sysdeps/i386/dl-machine.h (elf_machine_rel): Likewise.
	(elf_machine_rela): Likewise.
	* sysdeps/ia64/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/m68k/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/microblaze/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/mips/dl-machine.h (ELF_MACHINE_BEFORE_RTLD_RELOC):
	Likewise.
	(elf_machine_reloc): Likewise.
	(elf_machine_got_rel): Likewise.
	* sysdeps/mips/dl-trampoline.c (__dl_runtime_resolve): Likewise.
	* sysdeps/nios2/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/riscv/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/s390/s390-64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/sh/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/tile/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2018-04-04 23:09:37 +01:00
Wilco Dijkstra
19a8b9a300 [PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputs
This series of patches removes the slow patchs from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).

ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos.  The number of incorrectly rounded results
(ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5,
the average is ~850 per million between 0 and PI.

Tested on AArch64 and x86_64 with no regressions.

The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction.  Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.

	* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
	inputs.
	(__cos): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
2018-04-03 16:52:16 +01:00
Joseph Myers
ffec7b2740 Use x86_64 backtrace as generic version.
No glibc configuration uses the present debug/backtrace.c, whereas
several #include the x86_64 version.  The x86_64 version is
effectively a generic one (using _Unwind_Backtrace from libgcc, which
works much more reliably than the built-in functions used by
debug/backtrace.c).  This patch moves it to debug/backtrace.c and
removes all the #includes of the x86_64 version from other
architectures which are no longer required.

I do not know whether all the other architecture-specific backtrace
implementations that are based on _Unwind_Backtrace are required, or
whether, where their differences from the generic version do something
useful, suitable hooks could be added to the generic version to reduce
the duplication involved.

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.

	* sysdeps/x86_64/backtrace.c: Move to ....
	* debug/backtrace.c: ... here.
	* sysdeps/aarch64/backtrace.c: Remove file.
	* sysdeps/alpha/backtrace.c: Likewise.
	* sysdeps/hppa/backtrace.c: Likewise.
	* sysdeps/ia64/backtrace.c: Likewise.
	* sysdeps/mips/backtrace.c: Likewise.
	* sysdeps/nios2/backtrace.c: Likewise.
	* sysdeps/riscv/backtrace.c: Likewise.
	* sysdeps/sh/backtrace.c: Likewise.
	* sysdeps/tile/backtrace.c: Likewise.
2018-03-21 17:25:30 +00:00
Wilco Dijkstra
700593fdd7 Remove all target specific __ieee754_sqrt(f/l) inlines
Remove the now unused target specific__ieee754_sqrt(f/l) inlines.
Also remove inlines of sqrt which are for really old GCC versions.
Removing these is desirable, under the general principle of leaving
such inlining to the compiler rather than trying to do it in installed
headers, especially when only very old compilers are affected.

Note that removing inlines for __ieee754_sqrt disables inlining in the
sqrt wrapper functions.  Given the sqrt function will typically only be
called for negative arguments, it doesn't matter whether the inlining
happens or not.

	* sysdeps/aarch64/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/generic/math-type-macros.h (M_SQRT): Use sqrt.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
	* sysdeps/powerpc/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/s390/fpu/bits/mathinline.h: Remove file.
	* sysdeps/sparc/fpu/bits/mathinline.h (sqrt) Remove.
	(sqrtf): Remove.
	(sqrtl): Remove.
	(__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	(__ieee754_sqrtl): Remove.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
	* sysdeps/x86/fpu/math_private.h (__ieee754_sqrt): Remove.
	* sysdeps/x86_64/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	(__ieee754_sqrtl): Remove.
2018-03-15 19:21:36 +00:00
Samuel Thibault
a5df0318ef hurd: add gscope support
* elf/dl-support.c [!THREAD_GSCOPE_IN_TCB] (_dl_thread_gscope_count):
Define variable.
* sysdeps/generic/ldsodefs.h [!THREAD_GSCOPE_IN_TCB] (struct
rtld_global): Add _dl_thread_gscope_count member.
* sysdeps/mach/hurd/tls.h: Include <atomic.h>.
[!defined __ASSEMBLER__] (THREAD_GSCOPE_GLOBAL, THREAD_GSCOPE_SET_FLAG,
THREAD_GSCOPE_RESET_FLAG, THREAD_GSCOPE_WAIT): Define macros.
* sysdeps/generic/tls.h: Document THREAD_GSCOPE_IN_TCB.
* sysdeps/aarch64/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/alpha/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/arm/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/hppa/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/i386/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/ia64/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/m68k/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/microblaze/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/mips/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/nios2/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/powerpc/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/riscv/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/s390/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/sh/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/sparc/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/tile/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/x86_64/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
2018-03-11 13:06:33 +01:00
Wilco Dijkstra
610ee1fc93 Remove mplog and mpexp
Remove the now unused mplog and mpexp files.

	* math/Makefile: Remove mpexp.c and mplog.c
	* sysdeps/i386/fpu/mpexp.c: Delete file.
	* sysdeps/i386/fpu/mplog.c: Likewise.
	* sysdeps/ia64/fpu/mpexp.c: Likewise.
	* sysdeps/ia64/fpu/mplog.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c: Remove mention of mpexp and mplog.
	* sysdeps/ieee754/dbl-64/mpa.h (__pow_mp): Remove unused function.
	* sysdeps/ieee754/dbl-64/mpexp.c: Delete file.
	* sysdeps/ieee754/dbl-64/mplog.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/mpexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/mplog.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove mpexp* and mplog*.
	* sysdeps/x86_64/fpu/multiarch/e_log-avx.c: Remove unused defines.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-avx.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma4.c: Likewise.
2018-02-15 12:41:05 +00:00
Szabolcs Nagy
de800d8305 Remove slow paths from exp
Remove the __slowexp code, so exp is no longer correctly rounded.  The
result is computed to about 70 bits precision so the worst case ulp
error is about 0.500007 in nearest rounding mode.

	* manual/probes.texi: Remove slowexp probes.
	* math/Makefile: Remove slowexp.
	* sysdeps/generic/math_private.h (__slowexp): Remove.
	* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and
	document error bounds.
	* sysdeps/i386/fpu/slowexp.c: Remove.
	* sysdeps/ia64/fpu/slowexp.c: Remove.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove.
	* sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Remove.
	* sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
2018-02-12 11:33:33 +00:00
Wilco Dijkstra
c3d466cba1 Remove slow paths from pow
Remove the slow paths from pow.  Like several other double precision math
functions, pow is exactly rounded.  This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP.  Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.

All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP.  A simple test over a few hundred million
values shows pow is 10% faster on average.  This fixes BZ #13932.

	[BZ #13932]
	* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
	* benchtests/pow-inputs: Update comment for slow path cases.
	* manual/probes.texi (slowpow_p10): Delete removed probe.
	(slowpow_p10): Likewise.
	* math/Makefile: Remove halfulp.c and slowpow.c.
	* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/generic/math_private.h (__exp1): Remove error argument.
	(__halfulp): Remove.
	(__slowpow): Remove.
	* sysdeps/i386/fpu/halfulp.c: Delete file.
	* sysdeps/i386/fpu/slowpow.c: Likewise.
	* sysdeps/ia64/fpu/halfulp.c: Likewise.
	* sysdeps/ia64/fpu/slowpow.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
	improve comments and add error analysis.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
	(power1): Remove function:
	(log1): Remove error argument, add error analysis.
	(my_log2): Remove function.
	* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
	* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
	* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
	* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
	slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-12 10:47:09 +00:00
H.J. Lu
06fbebfff7 x86-64: Use __glibc_likely/__glibc_likely in dl-machine.h
The differences in elf/dl-reloc.os are

--- before    	2018-02-05 03:52:32.803125207 -0800
+++ after     	2018-02-05 03:52:14.913711879 -0800
@@ -1129,7 +1129,7 @@ _dl_relocate_object:
 	leaq	__PRETTY_FUNCTION__.9767(%rip), %rcx
 	leaq	.LC11(%rip), %rsi
 	leaq	.LC12(%rip), %rdi
-	movl	$540, %edx
+	movl	$539, %edx
 	call	__GI___assert_fail
 	.p2align 4,,10
 	.p2align 3

	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Replace
	__builtin_expect with __glibc_likely and __glibc_likely.
	(elf_machine_lazy_rel): Likewise.
2018-02-05 06:08:07 -08:00
Carlos O'Donell
2ec0e7eade Revert Intel CET changes to __jmp_buf_tag (Bug 22743)
In commit cba595c350 and commit
f81ddabffd, ABI compatibility with
applications was broken by increasing the size of the on-stack
allocated __pthread_unwind_buf_t beyond the oringal size.
Applications only have the origianl space available for
__pthread_unwind_register, and __pthread_unwind_next to use,
any increase in the size of __pthread_unwind_buf_t causes these
functions to write beyond the original structure into other
on-stack variables leading to segmentation faults in common
applications like vlc. The only workaround is to version those
functions which operate on the old sized objects, but this must
happen in glibc 2.28.

Thank you to Andrew Senkevich, H.J. Lu, and Aurelien Jarno, for
submitting reports and tracking the issue down.

The commit reverts the above mentioned commits and testing on
x86_64 shows that the ABI compatibility is restored. A tst-cleanup1
regression test linked with an older glibc now passes when run
with the newly built glibc. Previously a tst-cleanup1 linked with
an older glibc would segfault when run with an affected glibc build.

Tested on x86_64 with no regressions.

Signed-off-by: Carlos O'Donell <carlos@redhat.com>
2018-01-25 23:43:46 -08:00
H.J. Lu
207a72e298 x86-64: Properly align La_x86_64_retval to VEC_SIZE [BZ #22715]
_dl_runtime_profile calls _dl_call_pltexit, passing a pointer to
La_x86_64_retval which is allocated on stack.  The lrv_vector0
field in La_x86_64_retval must be aligned to size of vector register.
When allocating stack space for La_x86_64_retval, we need to make sure
that the address of La_x86_64_retval + RV_VECTOR0_OFFSET is aligned to
VEC_SIZE.  This patch checks the alignment of the lrv_vector0 field
and pads the stack space if needed.

Tested with x32 and x86-64 on SSE4, AVX and AVX512 machines.  It fixed

FAIL: elf/tst-audit10
FAIL: elf/tst-audit4
FAIL: elf/tst-audit5
FAIL: elf/tst-audit6
FAIL: elf/tst-audit7

on x32 AVX512 machine.

	[BZ #22715]
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_profile): Properly
	align La_x86_64_retval to VEC_SIZE.
2018-01-17 04:32:04 -08:00
Joseph Myers
4942c4ea48 Use LIBGCC_S_SO in x86_64 backtrace.
The x86_64 backtrace implementation is used as a generic
implementation (unwinding via unwind info and _Unwind_Backtrace) by
various other architectures.  This patch makes it more generic by
making it use LIBGCC_S_SO from gnu/lib-names.h instead of hardcoding
the libgcc_s.so.1 name, so that it can also be used on hppa which uses
libgcc_s.so.4.

Tested for x86_64.

	* sysdeps/x86_64/backtrace.c: Include <gnu/lib-names.h>.
	(init): Use LIBGCC_S_SO not hardcoded "libgcc_s.so.1".
2018-01-16 20:53:03 +00:00
H.J. Lu
c70e4e9c9e x86-64: Add sincosf with vector FMA
Since the x86-64 assembly version of sincosf is higly optimized with
vector instructions, there isn't much room for improvement.  However
s_sincosf.c written in C with vector math and intrinsics can be
optimized by GCC with FMA.

On Skylake, bench-sincosf reports performance improvement:

           Assembly       FMA         improvement
max        104.042       101.008         3%
min        9.426         8.586           10%
mean       20.6209       18.2238         13%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_sincosf-sse2 and s_sincosf-fma.
	(CFLAGS-s_sincosf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise.
	* sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if
	__sincosf is defined.
2018-01-08 08:04:40 -08:00
Samuel Thibault
4a5ce6e908 hurd: Fix build without NO_HIDDEN
* sysdeps/i386/dl-tlsdesc.S (_dl_tlsdesc_dynamic) [NO_RTLD_HIDDEN]: Call
JUMPTARGET (___tls_get_addr) instead of HIDDEN_JUMPTARGET (___tls_get_addr).
* sysdeps/x86_64/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Likewise.
2018-01-06 18:20:18 +01:00
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Joseph Myers
f1e005022e Revert exp reimplementation (causes test failures).
Revert:

	2017-12-19  Joseph Myers  <joseph@codesourcery.com>

	* sysdeps/x86_64/fpu/libm-test-ulps: Update.

	2017-12-19  Patrick McGehearty  <patrick.mcgehearty@oracle.com>

	* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
	<errno.h>.  Include "eexp.tbl".
	(half): New constant.
	(one): Likewise.
	(__ieee754_exp): Rewrite.
	(__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
	* sysdeps/i386/fpu/slowexp.c: Likewise.
	* sysdeps/ia64/fpu/slowexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
	* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
	comment.
	* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
	(CPPFLAGS-slowexp.c): Remove variable.
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
	(CFLAGS-slowexp-fma.c): Remove variable.
	(CFLAGS-slowexp-fma4.c): Likewise.
	(CFLAGS-slowexp-avx.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
	define as macro.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
	* math/Makefile (type-double-routines): Remove slowexp.
	* manual/probes.texi (slowexp_p6): Remove.
	(slowexp_p32): Likewise.
2017-12-19 18:11:37 +00:00
Joseph Myers
6f58c10ded Update x86_64 libm-test-ulps.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2017-12-19 17:38:41 +00:00
Patrick McGehearty
6fd0a3c6a8 Improve __ieee754_exp() performance by greater than 5x on sparc/x86.
These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Typical performance gains is typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc,
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from an value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.The gamma function is known to be difficult to compute
accurately.

	* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
	<errno.h>.  Include "eexp.tbl".
	(half): New constant.
	(one): Likewise.
	(__ieee754_exp): Rewrite.
	(__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
	* sysdeps/i386/fpu/slowexp.c: Likewise.
	* sysdeps/ia64/fpu/slowexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
	* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
	comment.
	* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
	(CPPFLAGS-slowexp.c): Remove variable.
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
	(CFLAGS-slowexp-fma.c): Remove variable.
	(CFLAGS-slowexp-fma4.c): Likewise.
	(CFLAGS-slowexp-avx.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
	define as macro.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
	* math/Makefile (type-double-routines): Remove slowexp.
	* manual/probes.texi (slowexp_p6): Remove.
	(slowexp_p32): Likewise.
2017-12-19 17:27:31 +00:00
H.J. Lu
cba595c350 x86: Add feature_1 to tcbhead_t [BZ #22563]
On x86, padding in struct __jmp_buf_tag is used for shadow stack pointer
to support Shadow Stack in Intel Control-flow Enforcemen Technology.
cancel_jmp_buf has been updated to include saved_mask so that it is as
large as struct __jmp_buf_tag.  We must suport the old cancel_jmp_buf
in existing binaries.  Since symbol versioning doesn't work on
cancel_jmp_buf, feature_1 is added to tcbhead_t so that setjmp and
longjmp can check if shadow stack is enabled.  NB: Shadow stack is
enabled only if all modules are shadow stack enabled.

	[BZ #22563]
	* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Add feature_1.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Rename __glibc_unused1
	to feature_1.
2017-12-19 02:45:34 -08:00
H.J. Lu
9d7a3741c9 Add --enable-static-pie configure option to build static PIE [BZ #19574]
Static PIE extends address space layout randomization to static
executables.  It provides additional security hardening benefits at
the cost of some memory and performance.

Dynamic linker, ld.so, is a standalone program which can be loaded at
any address.  This patch adds a configure option, --enable-static-pie,
to embed the part of ld.so in static executable to create static position
independent executable (static PIE).  A static PIE is similar to static
executable, but can be loaded at any address without help from a dynamic
linker.  When --enable-static-pie is used to configure glibc, libc.a is
built as PIE and all static executables, including tests, are built as
static PIE.  The resulting libc.a can be used together with GCC 8 or
above to build static PIE with the compiler option, -static-pie.  But
GCC 8 isn't required to build glibc with --enable-static-pie.  Only GCC
with PIE support is needed.  When an older GCC is used to build glibc
with --enable-static-pie, proper input files are passed to linker to
create static executables as static PIE, together with "-z text" to
prevent dynamic relocations in read-only segments, which are not allowed
in static PIE.

The following changes are made for static PIE:

1. Add a new function, _dl_relocate_static_pie, to:
   a. Get the run-time load address.
   b. Read the dynamic section.
   c. Perform dynamic relocations.
Dynamic linker also performs these steps.  But static PIE doesn't load
any shared objects.
2. Call _dl_relocate_static_pie at entrance of LIBC_START_MAIN in
libc.a.  crt1.o, which is used to create dynamic and non-PIE static
executables, is updated to include a dummy _dl_relocate_static_pie.
rcrt1.o is added to create static PIE, which will link in the real
_dl_relocate_static_pie.  grcrt1.o is also added to create static PIE
with -pg.  GCC 8 has been updated to support rcrt1.o and grcrt1.o for
static PIE.

Static PIE can work on all architectures which support PIE, provided:

1. Target must support accessing of local functions without dynamic
relocations, which is needed in start.S to call __libc_start_main with
function addresses of __libc_csu_init, __libc_csu_fini and main.  All
functions in static PIE are local functions.  If PIE start.S can't reach
main () defined in a shared object, the code sequence:

	pass address of local_main to __libc_start_main
	...

local_main:
	tail call to main via PLT

can be used.
2. start.S is updated to check PIC instead SHARED for PIC code path and
avoid dynamic relocation, when PIC is defined and SHARED isn't defined,
to support static PIE.
3. All assembly codes are updated check PIC instead SHARED for PIC code
path to avoid dynamic relocations in read-only sections.
4. All assembly codes are updated check SHARED instead PIC for static
symbol name.
5. elf_machine_load_address in dl-machine.h are updated to support static
PIE.
6. __brk works without TLS nor dynamic relocations in read-only section
so that it can be used by __libc_setup_tls to initializes TLS in static
PIE.

NB: When glibc is built with GCC defaulted to PIE, libc.a is compiled
with -fPIE, regardless if --enable-static-pie is used to configure glibc.
When glibc is configured with --enable-static-pie, libc.a is compiled
with -fPIE, regardless whether GCC defaults to PIE or not.  The same
libc.a can be used to build both static executable and static PIE.
There is no need for separate PIE copy of libc.a.

On x86-64, the normal static sln:

   text	   data	    bss	    dec	    hex	filename
 625425	   8284	   5456	 639165	  9c0bd	elf/sln

the static PIE sln:

   text	   data	    bss	    dec	    hex	filename
 657626	  20636	   5392	 683654	  a6e86	elf/sln

The code size is increased by 5% and the binary size is increased by 7%.

Linker requirements to build glibc with --enable-static-pie:

1. Linker supports --no-dynamic-linker to remove PT_INTERP segment from
static PIE.
2. Linker can create working static PIE.  The x86-64 linker needs the
fix for

https://sourceware.org/bugzilla/show_bug.cgi?id=21782

The i386 linker needs to be able to convert "movl main@GOT(%ebx), %eax"
to "leal main@GOTOFF(%ebx), %eax" if main is defined locally.

Binutils 2.29 or above are OK for i686 and x86-64.  But linker status for
other targets need to be verified.

3. Linker should resolve undefined weak symbols to 0 in static PIE:

https://sourceware.org/bugzilla/show_bug.cgi?id=22269

4. Many ELF backend linkers incorrectly check bfd_link_pic for TLS
relocations, which should check bfd_link_executable instead:

https://sourceware.org/bugzilla/show_bug.cgi?id=22263

Tested on aarch64, i686 and x86-64.

Using GCC 7 and binutils master branch, build-many-glibcs.py with
--enable-static-pie with all patches for static PIE applied have the
following build successes:

PASS: glibcs-aarch64_be-linux-gnu build
PASS: glibcs-aarch64-linux-gnu build
PASS: glibcs-armeb-linux-gnueabi-be8 build
PASS: glibcs-armeb-linux-gnueabi build
PASS: glibcs-armeb-linux-gnueabihf-be8 build
PASS: glibcs-armeb-linux-gnueabihf build
PASS: glibcs-arm-linux-gnueabi build
PASS: glibcs-arm-linux-gnueabihf build
PASS: glibcs-arm-linux-gnueabihf-v7a build
PASS: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch build
PASS: glibcs-m68k-linux-gnu build
PASS: glibcs-microblazeel-linux-gnu build
PASS: glibcs-microblaze-linux-gnu build
PASS: glibcs-mips64el-linux-gnu-n32 build
PASS: glibcs-mips64el-linux-gnu-n32-nan2008 build
PASS: glibcs-mips64el-linux-gnu-n32-nan2008-soft build
PASS: glibcs-mips64el-linux-gnu-n32-soft build
PASS: glibcs-mips64el-linux-gnu-n64 build
PASS: glibcs-mips64el-linux-gnu-n64-nan2008 build
PASS: glibcs-mips64el-linux-gnu-n64-nan2008-soft build
PASS: glibcs-mips64el-linux-gnu-n64-soft build
PASS: glibcs-mips64-linux-gnu-n32 build
PASS: glibcs-mips64-linux-gnu-n32-nan2008 build
PASS: glibcs-mips64-linux-gnu-n32-nan2008-soft build
PASS: glibcs-mips64-linux-gnu-n32-soft build
PASS: glibcs-mips64-linux-gnu-n64 build
PASS: glibcs-mips64-linux-gnu-n64-nan2008 build
PASS: glibcs-mips64-linux-gnu-n64-nan2008-soft build
PASS: glibcs-mips64-linux-gnu-n64-soft build
PASS: glibcs-mipsel-linux-gnu build
PASS: glibcs-mipsel-linux-gnu-nan2008 build
PASS: glibcs-mipsel-linux-gnu-nan2008-soft build
PASS: glibcs-mipsel-linux-gnu-soft build
PASS: glibcs-mips-linux-gnu build
PASS: glibcs-mips-linux-gnu-nan2008 build
PASS: glibcs-mips-linux-gnu-nan2008-soft build
PASS: glibcs-mips-linux-gnu-soft build
PASS: glibcs-nios2-linux-gnu build
PASS: glibcs-powerpc64le-linux-gnu build
PASS: glibcs-powerpc64-linux-gnu build
PASS: glibcs-tilegxbe-linux-gnu-32 build
PASS: glibcs-tilegxbe-linux-gnu build
PASS: glibcs-tilegx-linux-gnu-32 build
PASS: glibcs-tilegx-linux-gnu build
PASS: glibcs-tilepro-linux-gnu build

and the following build failures:

FAIL: glibcs-alpha-linux-gnu build

elf/sln is failed to link due to:

assertion fail bfd/elf64-alpha.c:4125

This is caused by linker bug and/or non-PIC code in PIE libc.a.

FAIL: glibcs-hppa-linux-gnu build

elf/sln is failed to link due to:

collect2: fatal error: ld terminated with signal 11 [Segmentation fault]

https://sourceware.org/bugzilla/show_bug.cgi?id=22537

FAIL: glibcs-ia64-linux-gnu build

elf/sln is failed to link due to:

collect2: fatal error: ld terminated with signal 11 [Segmentation fault]

FAIL: glibcs-powerpc-linux-gnu build
FAIL: glibcs-powerpc-linux-gnu-soft build
FAIL: glibcs-powerpc-linux-gnuspe build
FAIL: glibcs-powerpc-linux-gnuspe-e500v1 build

elf/sln is failed to link due to:

ld: read-only segment has dynamic relocations.

This is caused by linker bug and/or non-PIC code in PIE libc.a.  See:

https://sourceware.org/bugzilla/show_bug.cgi?id=22264

FAIL: glibcs-powerpc-linux-gnu-power4 build

elf/sln is failed to link due to:

findlocale.c:96:(.text+0x22c): @local call to ifunc memchr

This is caused by linker bug and/or non-PIC code in PIE libc.a.

FAIL: glibcs-s390-linux-gnu build

elf/sln is failed to link due to:

collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core dumped

assertion fail bfd/elflink.c:14299

This is caused by linker bug and/or non-PIC code in PIE libc.a.

FAIL: glibcs-sh3eb-linux-gnu build
FAIL: glibcs-sh3-linux-gnu build
FAIL: glibcs-sh4eb-linux-gnu build
FAIL: glibcs-sh4eb-linux-gnu-soft build
FAIL: glibcs-sh4-linux-gnu build
FAIL: glibcs-sh4-linux-gnu-soft build

elf/sln is failed to link due to:

ld: read-only segment has dynamic relocations.

This is caused by linker bug and/or non-PIC code in PIE libc.a.  See:

https://sourceware.org/bugzilla/show_bug.cgi?id=22263

Also TLS code sequence in SH assembly syscalls in glibc doesn't match TLS
code sequence expected by ld:

https://sourceware.org/bugzilla/show_bug.cgi?id=22270

FAIL: glibcs-sparc64-linux-gnu build
FAIL: glibcs-sparcv9-linux-gnu build
FAIL: glibcs-tilegxbe-linux-gnu build
FAIL: glibcs-tilegxbe-linux-gnu-32 build
FAIL: glibcs-tilegx-linux-gnu build
FAIL: glibcs-tilegx-linux-gnu-32 build
FAIL: glibcs-tilepro-linux-gnu build

elf/sln is failed to link due to:

ld: read-only segment has dynamic relocations.

This is caused by linker bug and/or non-PIC code in PIE libc.a.  See:

https://sourceware.org/bugzilla/show_bug.cgi?id=22263

	[BZ #19574]
	* INSTALL: Regenerated.
	* Makeconfig (real-static-start-installed-name): New.
	(pic-default): Updated for --enable-static-pie.
	(pie-default): New for --enable-static-pie.
	(default-pie-ldflag): Likewise.
	(+link-static-before-libc): Replace $(DEFAULT-LDFLAGS-$(@F))
	with $(if $($(@F)-no-pie),$(no-pie-ldflag),$(default-pie-ldflag)).
	Replace $(static-start-installed-name) with
	$(real-static-start-installed-name).
	(+prectorT): Updated for --enable-static-pie.
	(+postctorT): Likewise.
	(CFLAGS-.o): Add $(pie-default).
	(CFLAGS-.op): Likewise.
	* NEWS: Mention --enable-static-pie.
	* config.h.in (ENABLE_STATIC_PIE): New.
	* configure.ac (--enable-static-pie): New configure option.
	(have-no-dynamic-linker): New LIBC_CONFIG_VAR.
	(have-static-pie): Likewise.
	Enable static PIE if linker supports --no-dynamic-linker.
	(ENABLE_STATIC_PIE): New AC_DEFINE.
	(enable-static-pie): New LIBC_CONFIG_VAR.
	* configure: Regenerated.
	* csu/Makefile (omit-deps): Add r$(start-installed-name) and
	gr$(start-installed-name) for --enable-static-pie.
	(extra-objs): Likewise.
	(install-lib): Likewise.
	(extra-objs): Add static-reloc.o and static-reloc.os
	($(objpfx)$(start-installed-name)): Also depend on
	$(objpfx)static-reloc.o.
	($(objpfx)r$(start-installed-name)): New.
	($(objpfx)g$(start-installed-name)): Also depend on
	$(objpfx)static-reloc.os.
	($(objpfx)gr$(start-installed-name)): New.
	* csu/libc-start.c (LIBC_START_MAIN): Call _dl_relocate_static_pie
	in libc.a.
	* csu/libc-tls.c (__libc_setup_tls): Add main_map->l_addr to
	initimage.
	* csu/static-reloc.c: New file.
	* elf/Makefile (routines): Add dl-reloc-static-pie.
	(elide-routines.os): Likewise.
	(DEFAULT-LDFLAGS-tst-tls1-static-non-pie): Removed.
	(tst-tls1-static-non-pie-no-pie): New.
	* elf/dl-reloc-static-pie.c: New file.
	* elf/dl-support.c (_dl_get_dl_main_map): New function.
	* elf/dynamic-link.h (ELF_DURING_STARTUP): Also check
	STATIC_PIE_BOOTSTRAP.
	* elf/get-dynamic-info.h (elf_get_dynamic_info): Likewise.
	* gmon/Makefile (tests): Add tst-gmon-static-pie.
	(tests-static): Likewise.
	(DEFAULT-LDFLAGS-tst-gmon-static): Removed.
	(tst-gmon-static-no-pie): New.
	(CFLAGS-tst-gmon-static-pie.c): Likewise.
	(CRT-tst-gmon-static-pie): Likewise.
	(tst-gmon-static-pie-ENV): Likewise.
	(tests-special): Likewise.
	($(objpfx)tst-gmon-static-pie.out): Likewise.
	(clean-tst-gmon-static-pie-data): Likewise.
	($(objpfx)tst-gmon-static-pie-gprof.out): Likewise.
	* gmon/tst-gmon-static-pie.c: New file.
	* manual/install.texi: Document --enable-static-pie.
	* sysdeps/generic/ldsodefs.h (_dl_relocate_static_pie): New.
	(_dl_get_dl_main_map): Likewise.
	* sysdeps/i386/configure.ac: Check if linker supports static PIE.
	* sysdeps/x86_64/configure.ac: Likewise.
	* sysdeps/i386/configure: Regenerated.
	* sysdeps/x86_64/configure: Likewise.
	* sysdeps/mips/Makefile (ASFLAGS-.o): Add $(pie-default).
	(ASFLAGS-.op): Likewise.
2017-12-15 17:12:14 -08:00
H.J. Lu
4ca945e9c5 x86-64: Remove sysdeps/x86_64/fpu/s_cosf.S
On Ivy Bridge, bench-cosf reports performance improvement:

           s_cosf.S      s_cosf.c       Improvement
max        114.136       82.401            39%
min        13.119        11.301            16%
mean       22.1882       21.8938           1%

	* sysdeps/x86_64/fpu/s_cosf.S: Removed.
2017-12-14 04:39:53 -08:00
H.J. Lu
ac817e083b x86-64: Add cosf with FMA
On Skylake, bench-cosf reports performance improvement:

            Before        After         Improvement
max        135.362       94.552            43%
min        8.532         7.688             11%
mean       17.1446       11.8128           45%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_cosf-sse2 and s_cosf-fma.
	(CFLAGS-s_cosf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_cosf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_cosf-sse2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_cosf.c: Likewise.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-12-12 15:32:58 -08:00
H.J. Lu
9d0ffa60ad x86-64: Add sinf with FMA
On Skylake, bench-sinf reports performance improvement:

            Before        After         Improvement
max        153.996       100.094           54%
min        8.546         6.852             25%
mean       18.1223       11.802            54%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_sinf-sse2 and s_sinf-fma.
	(CFLAGS-s_sinf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_sinf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_sinf-sse2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sinf.c: Likewise.
2017-12-07 10:11:16 -08:00
H.J. Lu
9574c7b68d x86-64: Remove sysdeps/x86_64/fpu/s_sinf.S
On Ivy Bridge, bench-sinf reports performance improvement:

          s_sinf.S      s_sinf.c       Improvement
max        91.521        86.148            6%
min        14.061        11.265            25%
mean       23.3758       23.3344           0.2%

	* sysdeps/x86_64/fpu/s_sinf.S: Removed.
2017-12-07 09:44:17 -08:00
Joseph Myers
34bb10aabf Use libm_alias_float for x86_64.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes x86_64 libm function implementations use
libm_alias_float to define function aliases, or libm_alias_float_other
where the main name is defined with versioned_symbol.

Tested with the glibc testsuite for x86_64, and tested with
build-many-glibcs.py for all its x86_64 configurations that installed
stripped shared libraries are unchanged by the patch.

	* sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/e_expf.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/e_log2f.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/e_logf.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/e_powf.c: Include
	<libm-alias-float.h>.
	(exp2f): Define using libm_alias_float, or libm_alias_float_other
	if [SHARED].
	* sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Include
	<libm-alias-float.h>.
	(ceilf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Include
	<libm-alias-float.h>.
	(floorf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Include
	<libm-alias-float.h>.
	(fmaf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c: Include
	<libm-alias-float.h>.
	(nearbyintf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Include
	<libm-alias-float.h>.
	(rintf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Include
	<libm-alias-float.h>.
	(truncf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_copysignf.S: Include <libm-alias-float.h>.
	(copysignf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_cosf.S: Include <libm-alias-float.h>.
	(cosf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_fabsf.c: Include <libm-alias-float.h>.
	(fabsf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_fmaxf.S: Include <libm-alias-float.h>.
	(fmaxf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_fminf.S: Include <libm-alias-float.h>.
	(fminf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_llrintf.S: Include <libm-alias-float.h>.
	(llrintf): Define using libm_alias_float.
	[!__ILP32__] (lrintf): Likewise.
	* sysdeps/x86_64/fpu/s_sincosf.S: Include <libm-alias-float.h>.
	(sincosf): Define using libm_alias_float.
	* sysdeps/x86_64/fpu/s_sinf.S: Include <libm-alias-float.h>.
	(sinf): Define using libm_alias_float.
	* sysdeps/x86_64/x32/fpu/s_lrintf.S: Include <libm-alias-float.h>.
	(lrintf): Define using libm_alias_float.
2017-11-29 21:25:41 +00:00
Joseph Myers
011fba7e48 Use libm_alias_double for x86_64.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes x86_64 libm function implementations use
libm_alias_double to define function aliases.

Tested with the glibc testsuite for x86_64, and tested with
build-many-glibcs.py for all its x86_64 configurations that installed
stripped shared libraries are unchanged by the patch.

	* sysdeps/x86_64/fpu/multiarch/s_atan.c: Include
	<libm-alias-double.h>.
	(atan): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_ceil.c: Include
	<libm-alias-double.h>.
	(ceil): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_floor.c: Include
	<libm-alias-double.h>.
	(floor): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_fma.c: Include
	<libm-alias-double.h>.
	(fma): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_nearbyint.c: Include
	<libm-alias-double.h>.
	(nearbyint): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_rint.c: Include
	<libm-alias-double.h>.
	(rint): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_sin.c: Include
	<libm-alias-double.h>.
	(sin): Define using libm_alias_double.
	(cos): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan.c: Include
	<libm-alias-double.h>.
	(tan): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Include
	<libm-alias-double.h>.
	(trunc): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_copysign.S: Include <libm-alias-double.h>.
	(copysign): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_fabs.c: Include <libm-alias-double.h>.
	(fabs): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_fmax.S: Include <libm-alias-double.h>.
	(fmax): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_fmin.S: Include <libm-alias-double.h>.
	(fmin): Define using libm_alias_double.
	* sysdeps/x86_64/fpu/s_llrint.S: Include <libm-alias-double.h>.
	(llrint): Define using libm_alias_double.
	[!__ILP32__] (lrint): Likewise.
	* sysdeps/x86_64/x32/fpu/s_lrint.S: Include <libm-alias-double.h>.
	(lrint): Define using libm_alias_double.
2017-11-29 19:01:21 +00:00
Joseph Myers
f58e5f4809 Use libm_alias_ldouble in sysdeps/x86_64/fpu.
This patch continues the preparation for additional _FloatN / _FloatNx
function aliases by using libm_alias_ldouble for sysdeps/x86_64/fpu
long double functions, so that they can have _Float64x aliases added
in future.

Tested for x86_64, including build-many-glibcs.py tests that installed
stripped shared libraries are unchanged by the patch.

	* sysdeps/x86_64/fpu/e_expl.S: Include <libm-alias-ldouble.h>.
	[USE_AS_EXPM1L] (expm1l): Define using libm_alias_ldouble.
	* sysdeps/x86_64/fpu/s_ceill.S: Include <libm-alias-ldouble.h>.
	(ceill): Define using libm_alias_ldouble.
	* sysdeps/x86_64/fpu/s_copysignl.S: Include
	<libm-alias-ldouble.h>.
	(copysignl): Define using libm_alias_ldouble.
	* sysdeps/x86_64/fpu/s_fabsl.S: Include <libm-alias-ldouble.h>.
	(fabsl): Define using libm_alias_ldouble.
	* sysdeps/x86_64/fpu/s_floorl.S: Include <libm-alias-ldouble.h>.
	(floorl): Define using libm_alias_ldouble.
	* sysdeps/x86_64/fpu/s_fmaxl.S: Include <libm-alias-ldouble.h>.
	(fmaxl): Define using libm_alias_ldouble.
	* sysdeps/x86_64/fpu/s_fminl.S: Include <libm-alias-ldouble.h>.
	(fminl): Define using libm_alias_ldouble.
	* sysdeps/x86_64/fpu/s_llrintl.S: Include <libm-alias-ldouble.h>.
	(llrintl): Define using libm_alias_ldouble.
	(lrintl): Likewise.
	* sysdeps/x86_64/fpu/s_nearbyintl.S: Include
	<libm-alias-ldouble.h>.
	(nearbyintl): Define using libm_alias_ldouble.
	* sysdeps/x86_64/fpu/s_truncl.S: Include <libm-alias-ldouble.h>.
	(truncl): Define using libm_alias_ldouble.
	* sysdeps/x86_64/x32/fpu/s_lrintl.S: Include
	<libm-alias-ldouble.h>.
	(lrintl): Define using libm_alias_ldouble.
2017-11-17 23:39:11 +00:00
Adhemerval Zanella
dff91cd45e nptl: Add tests for internal pthread_mutex_t offsets
This patch adds a new build test to check for internal fields
offsets for user visible internal field.  Although currently
the only field which is statically initialized to a non zero value
is pthread_mutex_t.__data.__kind value, the tests also check the
offset of __kind, __spins, __elision (if supported), and __list
internal member.  A internal header (pthread-offset.h) is added
to each major ABI with the reference value.

Checked on x86_64-linux-gnu and with a build check for all affected
ABIs (aarch64-linux-gnu, alpha-linux-gnu, arm-linux-gnueabihf,
hppa-linux-gnu, i686-linux-gnu, ia64-linux-gnu, m68k-linux-gnu,
microblaze-linux-gnu, mips64-linux-gnu, mips64-n32-linux-gnu,
mips-linux-gnu, powerpc64le-linux-gnu, powerpc-linux-gnu,
s390-linux-gnu, s390x-linux-gnu, sh4-linux-gnu, sparc64-linux-gnu,
sparcv9-linux-gnu, tilegx-linux-gnu, tilegx-linux-gnu-x32,
tilepro-linux-gnu, x86_64-linux-gnu, and x86_64-linux-x32).

	* nptl/pthreadP.h (ASSERT_PTHREAD_STRING,
	ASSERT_PTHREAD_INTERNAL_OFFSET): New macro.
	* nptl/pthread_mutex_init.c (__pthread_mutex_init): Add build time
	checks for internal pthread_mutex_t offsets.
	* sysdeps/aarch64/nptl/pthread-offsets.h
	(__PTHREAD_MUTEX_NUSERS_OFFSET, __PTHREAD_MUTEX_KIND_OFFSET,
	__PTHREAD_MUTEX_SPINS_OFFSET, __PTHREAD_MUTEX_ELISION_OFFSET,
	__PTHREAD_MUTEX_LIST_OFFSET): New macro.
	* sysdeps/alpha/nptl/pthread-offsets.h: Likewise.
	* sysdeps/arm/nptl/pthread-offsets.h: Likewise.
	* sysdeps/hppa/nptl/pthread-offsets.h: Likewise.
	* sysdeps/i386/nptl/pthread-offsets.h: Likewise.
	* sysdeps/ia64/nptl/pthread-offsets.h: Likewise.
	* sysdeps/m68k/nptl/pthread-offsets.h: Likewise.
	* sysdeps/microblaze/nptl/pthread-offsets.h: Likewise.
	* sysdeps/mips/nptl/pthread-offsets.h: Likewise.
	* sysdeps/nios2/nptl/pthread-offsets.h: Likewise.
	* sysdeps/powerpc/nptl/pthread-offsets.h: Likewise.
	* sysdeps/s390/nptl/pthread-offsets.h: Likewise.
	* sysdeps/sh/nptl/pthread-offsets.h: Likewise.
	* sysdeps/sparc/nptl/pthread-offsets.h: Likewise.
	* sysdeps/tile/nptl/pthread-offsets.h: Likewise.
	* sysdeps/x86_64/nptl/pthread-offsets.h: Likewise.

Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-11-07 09:48:28 -02:00
H.J. Lu
95b93c6e0d x86: Add sysdeps/x86/sysdep.h
Add a new header file, sysdeps/x86/sysdep.h, for common assembly code
macros between i386 and x86-64.  Tested on i686 and x86-64.  There are
no differences in outputs of "readelf -a" and "objdump -dw" on all glibc
shared objects before and after the patch.

	* sysdeps/i386/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
	of <sysdeps/generic/sysdep.h>.
	(ALIGNARG): Removed.
	(ASM_SIZE_DIRECTIVE): Likewise.
	(ENTRY): Likewise.
	(END): Likewise.
	(ENTRY_CHK): Likewise.
	(END_CHK): Likewise.
	(syscall_error): Likewise.
	(mcount): Likewise.
	(PSEUDO_END): Likewise.
	(L): Likewise.
	(atom_text_section): Likewise.
	* sysdeps/x86/sysdep.h: New file.
	* sysdeps/x86_64/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
	of <sysdeps/generic/sysdep.h>.
	(ALIGNARG): Removed.
	(ASM_SIZE_DIRECTIVE): Likewise.
	(ENTRY): Likewise.
	(END): Likewise.
	(ENTRY_CHK): Likewise.
	(END_CHK): Likewise.
	(syscall_error): Likewise.
	(mcount): Likewise.
	(PSEUDO_END): Likewise.
	(L): Likewise.
	(atom_text_section): Likewise.
2017-11-01 05:37:26 -07:00
H.J. Lu
a122dbfb2e Replace "if if " with "if " in comments
* include/alloc_buffer.h: Replace "if if " with "if " in
	comments.
	* sysdeps/mips/memcpy.S: Likkewise.
	* sysdeps/mips/memset.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S:
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S:
	Likewise.
2017-10-25 08:05:51 -07:00
H.J. Lu
80bb593563 x86-64: Add powf with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      35.4713          27.3842       29%
latency                    82.4537          66.3175       24%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_powf-fma.
	(CFLAGS-e_powf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_powf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_powf.c: Likewise.
2017-10-22 08:08:00 -07:00
H.J. Lu
5c7adbd8ed x86-64: Add log2f with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      16.5937          14.0789       17%
latency                    41.7755          35.3586       18%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_log2f-fma.
	(CFLAGS-e_log2f-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_log2f-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_log2f.c: Likewise.
2017-10-22 08:06:58 -07:00
H.J. Lu
0ccc7153cc x86-64: Add logf with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      16.1534          13.8874       16%
latency                    41.9642          34.3072       22%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_logf-fma.
	(CFLAGS-e_logf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_logf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_logf.c: Likewise.
2017-10-22 08:05:15 -07:00
H.J. Lu
5d15c96975 x86-64: Add exp2f with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      13.0291          11.2225       16%
latency                    44.5154          37.5766       18%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_exp2f-fma.
	(CFLAGS-e_exp2f-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_exp2f-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Likewise.
2017-10-22 07:57:50 -07:00
H.J. Lu
e1f59bebd8 x86-64: Replace assembly versions of e_expf with generic e_expf.c
This patch replaces x86-64 assembly versions of e_expf with generic
e_expf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      36.039           20.7749       73%
latency                    58.8096          40.8715       43%

On Skylake, it improves

                           Before            After     Improvement
reciprocal-throughput      18.4436          11.1693       65%
latency                    47.5162          37.5411       26%

	* sysdeps/x86_64/fpu/e_expf.S: Removed.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: Likewise.
	* sysdeps/x86_64/fpu/w_expf.c: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Updated for generic
	e_expf.c.
	* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_expf-fma.c):
	New.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
	Renamed to ...
	(__redirect_expf): This.
	(SYMBOL_NAME): Changed to expf.
	(__ieee754_expf): Renamed to ...
	(__expf): This.
	(__GI___expf): This.
	(__ieee754_expf): Add strong_alias.
	(__expf_finite): Likewise.
	(__expf): New.
	Include <sysdeps/ieee754/flt-32/e_expf.c>.
2017-10-22 07:49:55 -07:00
H.J. Lu
b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
H.J. Lu
4d916f0f12 x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299]
Since ld.so expands $PLATFORM with GLRO(dl_platform), don't set
GLRO(dl_platform) to NULL.

	[BZ #22299]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
	GLRO(dl_platform) to NULL.
	* sysdeps/x86_64/Makefile (tests): Add tst-platform-1.
	(modules-names): Add tst-platformmod-1 and
	x86_64/tst-platformmod-2.
	(CFLAGS-tst-platform-1.c): New.
	(CFLAGS-tst-platformmod-1.c): Likewise.
	(CFLAGS-tst-platformmod-2.c): Likewise.
	(LDFLAGS-tst-platformmod-2.so): Likewise.
	($(objpfx)tst-platform-1): Likewise.
	($(objpfx)tst-platform-1.out): Likewise.
	(tst-platform-1-ENV): Likewise.
	($(objpfx)x86_64/tst-platformmod-2.os): Likewise.
	* sysdeps/x86_64/tst-platform-1.c: New file.
	* sysdeps/x86_64/tst-platformmod-1.c: Likewise.
	* sysdeps/x86_64/tst-platformmod-2.c: Likewise.
2017-10-19 08:28:26 -07:00
H.J. Lu
02d2d8927d Revert x86: Allow undefined _DYNAMIC in static executable
This code is used in non-PIE static executable and static PIE.  It checks
if _DYNAMIC is undefined before using it to compute load address.  But
not all targets can convert access _DYNAMIC via GOT, which needs dynamic
relocation, to PC-relative at link-time.

	* sysdeps/i386/dl-machine.h (elf_machine_load_address): Don't
	allow undefined _DYNAMIC in PIE libc.a.
	* sysdeps/x86_64/dl-machine.h (elf_machine_load_address):
	Likewse.
2017-10-03 17:49:09 -07:00
Joseph Myers
527cd19c3d Make dbl-64 atan and tan into weak aliases.
This patch converts the dbl-64 implementations of atan and tan into
weak aliases of __atan and __tan, in preparation for making them use
libm_alias_double.  Consequent changes are made to the x86_64
multiarch versions wrapping round them (with the dbl-64 functions,
like other such functions, being made not to define their aliases at
all if __atan or __tan are defined as macros by an including file).

Tested for x86_64, and with build-many-glibcs.py.

	* sysdeps/ieee754/dbl-64/s_atan.c (atan): Rename to __atan and
	define as weak alias of __atan.  Do not define any aliases if
	[__atan].
	[NO_LONG_DOUBLE] (__atanl): Define as strong alias of __atan.
	[NO_LONG_DOUBLE] (atanl): Define as weak alias of __atanl.
	* sysdeps/ieee754/dbl-64/s_tan.c (tan): Rename to __tan and define
	as weak alias of __tan.  Do not define any aliases if [__tan].
	[NO_LONG_DOUBLE] (__tanl): Define as strong alias of __tan.
	[NO_LONG_DOUBLE] (tanl): Define as weak alias of __tanl.
	* sysdeps/x86_64/fpu/multiarch/s_atan-avx.c (atan): Rename to
	__atan.
	* sysdeps/x86_64/fpu/multiarch/s_atan-fma.c (atan): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan-fma4.c (atan): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan.c (atan): Rename to __atan
	and define as weak alias of __atan.
	* sysdeps/x86_64/fpu/multiarch/s_tan-avx.c (tan): Rename to
	__atan.
	* sysdeps/x86_64/fpu/multiarch/s_tan-fma.c (tan): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan-fma4.c (tan): Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan.c (tan): Rename to __tan and
	define as weak alias of __tan.
2017-10-02 20:20:52 +00:00
Szabolcs Nagy
f7a0b063e7 Do not wrap expf and exp2f
The new generic expf and exp2f code don't need wrappers any more, they
set errno inline, so only use the wrappers on targets that need it.
(If the wrapper is needed, then the top level wrapper code is included,
otherwise empty w_exp*f.c is used to suppress the wrapper.)

A powerpc64 expf implementation includes the expf c code directly which
needed some changes.

	* sysdeps/ieee754/flt-32/e_exp2f.c (__exp2f): Define without wrapper.
	* sysdeps/ieee754/flt-32/e_expf.c (__expf): Likewise
	* sysdeps/ieee754/flt-32/w_exp2f.c: New file.
	* sysdeps/ieee754/flt-32/w_expf.c: New file.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Update for
	the new expf code.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: New file.
	* sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_exp2f.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_expf.c: New file.
	* sysdeps/i386/fpu/w_exp2f.c: New file.
	* sysdeps/i386/fpu/w_expf.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/w_expf.c: New file.
	* sysdeps/x86_64/fpu/w_expf.c: New file.
2017-10-02 14:38:54 +01:00
Joseph Myers
2f92505d20 Update x86_64 libm-test-ulps.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2017-09-29 18:03:48 +00:00
H.J. Lu
4088d8dd29 x86: Allow undefined _DYNAMIC in static executable
When --enable-static-pie is used to build static PIE, _DYNAMIC is used
to compute the load address of static PIE.  But _DYNAMIC is undefined
when creating static executable.  This patch makes _DYNAMIC weak in PIE
libc.a so that it can be undefined.

	* sysdeps/i386/dl-machine.h (elf_machine_load_address): Allow
	undefined _DYNAMIC in PIE libc.a.
	* sysdeps/x86_64/dl-machine.h (elf_machine_load_address):
	Likewse.
2017-09-28 15:28:12 -07:00
Joseph Myers
ae8372d7e4 Add SSE4.1 trunc, truncf (bug 20142).
This patch adds SSE4.1 versions of trunc and truncf, using the roundsd
/ roundss instructions, similar to the versions of ceil, floor, rint
and nearbyint functions we already have.  In my testing with the glibc
benchtests these are about 30% faster than the C versions for double,
20% faster for float.

Tested for x86_64.

	[BZ #20142]
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1.
	* sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
2017-09-20 16:54:05 +00:00
H.J. Lu
ef8adeb041 x86: Add MathVec_Prefer_No_AVX512 to cpu-features [BZ #21967]
AVX512 functions in mathvec are used on machines with AVX512.  An AVX2
wrapper is also provided and it can be used when the AVX512 version
isn't profitable.  MathVec_Prefer_No_AVX512 is addded to cpu-features.
If glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 is set in GLIBC_TUNABLES
environment variable, the AVX2 wrapper will be used.

Tested on x86-64 machines with and without AVX512.  Also verified
glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 on AVX512 machine.

	[BZ #21967]
	* sysdeps/x86/cpu-features.h (bit_arch_MathVec_Prefer_No_AVX512):
	New.
	(index_arch_MathVec_Prefer_No_AVX512): Likewise.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Handle MathVec_Prefer_No_AVX512.
	* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h
	(IFUNC_SELECTOR): Return AVX2 version if MathVec_Prefer_No_AVX512
	is set.
2017-09-12 07:54:47 -07:00
H.J. Lu
45ff34638f x86: Add x86_64 to x86-64 HWCAP [BZ #22093]
Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the
"x86_64" subdirectory when loading a shared library.  ld.so in glibc
2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based
on supported ISAs.  This led to shared library loading failure for
shared libraries placed under the "x86_64" subdirectory.

This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always
search the "x86_64" subdirectory when loading a shared library.

NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip
the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi"
machines.

Tested on i686 and x86-64.

	[BZ #22093]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Initialize
	GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64.
	* sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated.
	(HWCAP_IMPORTANT): Likewise.
	(HWCAP_X86_64): New enum.
	(HWCAP_X86_AVX512_1): Updated.
	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64".
	* sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1.
	(modules-names): Add x86_64/tst-x86_64mod-1.
	(LDFLAGS-tst-x86_64mod-1.so): New.
	($(objpfx)tst-x86_64-1): Likewise.
	($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise.
	(tst-x86_64-1-clean): Likewise.
	* sysdeps/x86_64/tst-x86_64-1.c: New file.
	* sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
2017-09-11 08:18:32 -07:00
Markus Trippelsdorf
4c03a69680 Update x86_64 ulps for AMD Ryzen.
* sysdeps/x86_64/fpu/libm-test-ulps: Update for AMD Ryzen.
2017-09-08 19:57:12 +00:00
Adhemerval Zanella
65687ac76c Remove remaining _HAVE_STRING_ARCH_* definitions (BZ #18858)
Since the removal of bits/string.h, _HAVE_STRING_ARCH_* are no
longer used.  This patch removes the unused macros from i686
and x86_64 sysdeps folder.

Checked on x86_64-linux-gnu and i686-linux-gnu.

	* sysdeps/i386/i686/multiarch/strncpy.c (_HAVE_STRING_ARCH_strncpy):
	Remove define.
	* sysdeps/x86_64/multiarch/stpcpy.c (_HAVE_STRING_ARCH_stpcpy):
	Likewise.
	* sysdeps/x86_64/multiarch/strcspn.c (_HAVE_STRING_ARCH_strcspn):
	Likewise.
	* sysdeps/x86_64/multiarch/strncat.c (_HAVE_STRING_ARCH_strncat):
	Likewise.
	* sysdeps/x86_64/multiarch/strncpy.c (_HAVE_STRING_ARCH_strncpy):
	Likewise.
	* sysdeps/x86_64/multiarch/strpbrk.c (_HAVE_STRING_ARCH_strpbrk):
	Likewise.
	* sysdeps/x86_64/multiarch/strspn.c (_HAVE_STRING_ARCH_strspn):
	Likewise.
2017-09-06 14:35:23 -03:00
Joseph Myers
5a80d39d0d Obsolete pow10 functions.
This patch obsoletes the pow10, pow10f and pow10l functions (makes
them into compat symbols, not available for new ports or static
linking).  The exp10 names for these functions are standardized (in TS
18661-4) and were added in the same glibc version (2.1) as pow10 so
source code can change to use them without any loss of portability.
Since pow10 is deliberately not provided for _Float128, only exp10,
this slightly simplifies moving to the new wrapper templates in the
!LIBM_SVID_COMPAT case, by avoiding needing to arrange for pow10,
pow10f and pow10l to be defined by those templates.

Tested for x86_64, and with build-many-glibcs.py.

	* manual/math.texi (pow10): Do not document.
	(pow10f): Likewise.
	(pow10l): Likewise.
	* math/bits/mathcalls.h [__USE_GNU] (pow10): Do not declare.
	* math/bits/math-finite.h [__USE_GNU] (pow10): Likewise.
	* math/libm-test-exp10.inc (pow10_test): Remove.
	(do_test): Do not call pow10.
	* math/w_exp10_compat.c (pow10): Make into compat symbol.
	[NO_LONG_DOUBLE] (pow10l): Likewise.
	* math/w_exp10f_compat.c (pow10f): Likewise.
	* math/w_exp10l_compat.c (pow10l): Likewise.
	* sysdeps/ia64/fpu/e_exp10.S: Include <shlib-compat.h>.
	(pow10): Make into compat symbol.
	* sysdeps/ia64/fpu/e_exp10f.S: Include <shlib-compat.h>.
	(pow10f): Make into compat symbol.
	* sysdeps/ia64/fpu/e_exp10l.S: Include <shlib-compat.h>.
	(pow10l): Make into compat symbol.
	* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Remove
	pow10.
	(CFLAGS-nldbl-pow10.c): Remove variable..
	* sysdeps/ieee754/ldbl-opt/nldbl-pow10.c: Remove file.
	* sysdeps/ieee754/ldbl-opt/w_exp10_compat.c (pow10l): Condition on
	[SHLIB_COMPAT (libm, GLIBC_2_1, GLIBC_2_27)].
	* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (compat_symbol):
	Undefine and redefine.
	(pow10l): Make into compat symbol.
	* sysdeps/aarch64/libm-test-ulps: Remove pow10 ulps.
	* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
	* sysdeps/arm/libm-test-ulps: Likewise.
	* sysdeps/hppa/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/microblaze/libm-test-ulps: Likewise.
	* sysdeps/mips/mips32/libm-test-ulps: Likewise.
	* sysdeps/mips/mips64/libm-test-ulps: Likewise.
	* sysdeps/nios2/libm-test-ulps: Likewise.
	* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
	* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
	* sysdeps/s390/fpu/libm-test-ulps: Likewise.
	* sysdeps/sh/libm-test-ulps: Likewise.
	* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
	* sysdeps/tile/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-09-01 21:13:18 +00:00
Florian Weimer
17e00cc69e elf: Remove internal_function attribute 2017-08-31 16:59:37 +02:00
H.J. Lu
5e2bc4ff33 x86_64 __redirect_ieee754_expf: Change double to float
__redirect_ieee754_expf has type float, not double.

	* sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
	Change double to float.
2017-08-28 08:45:53 -07:00
H.J. Lu
fcaaca412f x86-64: Regenerate libm-test-ulps for AVX512 mathvec tests
Update libm-test-ulps for AVX512 mathvec tests by running
“make regen-ulps” on Intel Xeon processor with AVX512.

	* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
2017-08-23 09:11:55 -07:00
H.J. Lu
b9eaca8fa0 x86_64: Replace AVX512F .byte sequences with instructions
Since binutils 2.25 or later is required to build glibc, we can replace
AVX512F .byte sequences with AVX512F instructions.

Tested on x86-64 and x32.  There are no code differences in libmvec.so
and libmvec.a.

	* sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Replace AVX512F
	.byte sequences with AVX512F instructions.
	* sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: Likewise.
	* sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
	* sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S:
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
	Likewise.
2017-08-23 06:26:44 -07:00
H.J. Lu
7550717ed7 Mark internal SSE2 functions with attribute_hidden [BZ #18822]
Mark internal SSE2 functions with attribute_hidden to allow direct
access within libc.so and libc.a without using GOT nor PLT.

	[BZ #18822]
	* sysdeps/x86_64/multiarch/strcspn-c.c (STRCSPN_SSE2): Add
	attribute_hidden.
	(__strspn_sse2): Likewise.
2017-08-19 16:46:53 -07:00
H.J. Lu
098b9dd468 x86-64: Check FMA_Usable in ifunc-mathvec-avx2.h [BZ #21966]
Since the AVX2 version of mathvec functions uses FMA, it can only be
used when FMA is usable.

	[BZ #21966]
	* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h
	(IFUNC_SELECTOR): Don't use the AVX2 version if FMA isn't
	usable.
2017-08-18 06:19:07 -07:00
H.J. Lu
24a2e6588d x86-64: Optimize e_expf with FMA [BZ #21912]
FMA optimized e_expf improves performance by more than 50% on Skylake.

	[BZ #21912]
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_expf-fma.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: New file.
	* sysdeps/x86_64/fpu/multiarch/e_expf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/ifunc-fma.h: Likewise.
2017-08-16 08:43:48 -07:00
H.J. Lu
f59f7adb4a x86-64: Align L(SP_RANGE)/L(SP_INF_0) to 8 bytes [BZ #21955]
sysdeps/x86_64/fpu/e_expf.S has

        lea     L(SP_RANGE)(%rip), %rdx /* load over/underflow bound */
        cmpl    (%rdx,%rax,4), %ecx     /* |x|<under/overflow bound ? */
...
        /* Here if |x| is Inf */
        lea     L(SP_INF_0)(%rip), %rdx /* depending on sign of x: */
        movss   (%rdx,%rax,4), %xmm0    /* return zero or Inf */
        ret
...
         .section .rodata.cst8,"aM",@progbits,8
...
        .p2align 2
L(SP_RANGE): /* single precision overflow/underflow bounds */
        .long   0x42b17217      /* if x>this bound, then result overflows */
        .long   0x42cff1b4      /* if x<this bound, then result underflows */
        .type L(SP_RANGE), @object
        ASM_SIZE_DIRECTIVE(L(SP_RANGE))

        .p2align 2
L(SP_INF_0):
        .long   0x7f800000      /* single precision Inf */
        .long   0               /* single precision zero */
        .type L(SP_INF_0), @object
        ASM_SIZE_DIRECTIVE(L(SP_INF_0))

Since L(SP_RANGE) and L(SP_INF_0) are in .rodata.cst8 section, they must
be aligned to 8 bytes.

	[BZ #21955]
	* sysdeps/x86_64/fpu/e_expf.S (L(SP_RANGE)): Aligned to 8 bytes.
	(L(SP_INF_0)): Likewise.
2017-08-15 14:05:14 -07:00
Florian Weimer
2449ae7b2d ld.so: Introduce struct dl_exception
This commit separates allocating and raising exceptions.  This
simplifies catching and re-raising them because it is no longer
necessary to make a temporary, on-stack copy of the exception message.
2017-08-10 16:54:57 +02:00
H.J. Lu
57a72fa350 x86-64: Add FMA multiarch functions to libm
This patch adds multiarch functions optimized with -mfma -mavx2 to libm.
e_pow-fma.c is compiled with $(config-cflags-nofma) due to PR 19003.

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_exp-fma, e_log-fma, e_pow-fma, s_atan-fma, e_asin-fma,
	e_atan2-fma, s_sin-fma, s_tan-fma, mplog-fma, mpa-fma,
	slowexp-fma, slowpow-fma, sincos32-fma, doasin-fma, dosincos-fma,
	halfulp-fma, mpexp-fma, mpatan2-fma, mpatan-fma, mpsqrt-fma,
	and mptan-fma.
	(CFLAGS-doasin-fma.c): New.
	(CFLAGS-dosincos-fma.c): Likewise.
	(CFLAGS-e_asin-fma.c): Likewise.
	(CFLAGS-e_atan2-fma.c): Likewise.
	(CFLAGS-e_exp-fma.c): Likewise.
	(CFLAGS-e_log-fma.c): Likewise.
	(CFLAGS-e_pow-fma.c): Likewise.
	(CFLAGS-halfulp-fma.c): Likewise.
	(CFLAGS-mpa-fma.c): Likewise.
	(CFLAGS-mpatan-fma.c): Likewise.
	(CFLAGS-mpatan2-fma.c): Likewise.
	(CFLAGS-mpexp-fma.c): Likewise.
	(CFLAGS-mplog-fma.c): Likewise.
	(CFLAGS-mpsqrt-fma.c): Likewise.
	(CFLAGS-mptan-fma.c): Likewise.
	(CFLAGS-s_atan-fma.c): Likewise.
	(CFLAGS-sincos32-fma.c): Likewise.
	(CFLAGS-slowexp-fma.c): Likewise.
	(CFLAGS-slowpow-fma.c): Likewise.
	(CFLAGS-s_sin-fma.c): Likewise.
	(CFLAGS-s_tan-fma.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/doasin-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/dosincos-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_asin-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_atan2-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/ifunc-avx-fma4.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/ifunc-fma4.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpa-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpatan-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpatan2-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpsqrt-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mptan-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sin-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/sincos32-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_asin.c: Rewrite.
	* sysdeps/x86_64/fpu/multiarch/e_atan2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_pow.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sin.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan.c: Likewise.
2017-08-07 08:20:56 -07:00
H.J. Lu
8537e0f6cf x86-64: Implement libmathvec IFUNC selectors in C
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines)
	Add svml_d_cos2_core-sse2, svml_d_cos4_core-sse,
	svml_d_cos8_core-avx2, svml_d_exp2_core-sse2,
	svml_d_exp4_core-sse, svml_d_exp8_core-avx2,
	svml_d_log2_core-sse2, svml_d_log4_core-sse,
	svml_d_log8_core-avx2, svml_d_pow2_core-sse2,
	svml_d_pow4_core-sse, svml_d_pow8_core-avx2
	svml_d_sin2_core-sse2, svml_d_sin4_core-sse,
	svml_d_sin8_core-avx2, svml_d_sincos2_core-sse2,
	svml_d_sincos4_core-sse, svml_d_sincos8_core-avx2,
	svml_s_cosf16_core-avx2, svml_s_cosf4_core-sse2,
	svml_s_cosf8_core-sse, svml_s_expf16_core-avx2,
	svml_s_expf4_core-sse2, svml_s_expf8_core-sse,
	svml_s_logf16_core-avx2, svml_s_logf4_core-sse2,
	svml_s_logf8_core-sse, svml_s_powf16_core-avx2,
	svml_s_powf4_core-sse2, svml_s_powf8_core-sse,
	svml_s_sincosf16_core-avx2, svml_s_sincosf4_core-sse2,
	svml_s_sincosf8_core-sse, svml_s_sinf16_core-avx2,
	svml_s_sinf4_core-sse2 and svml_s_sinf8_core-sse.
	* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx2.h: New file.
	* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-sse4_1.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN2v_cos): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN4v_cos): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN8v_cos): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN2v_exp): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN4v_exp): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN8v_exp): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log2_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_log2_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN2v_log): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_log4_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN4v_log): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_log8_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN8v_log): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN2vv_pow): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN4vv_pow): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN8vv_pow): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN2v_sin): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN4v_sin): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN8v_sin): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN2vvv_sincos): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN4vvv_sincos): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN8vvv_sincos): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf16_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN16v_cosf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf4_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN4v_cosf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_cosf8_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN8v_cosf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf16_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN16v_expf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf4_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN4v_expf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_expf8_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN8v_expf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf16_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN16v_logf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf4_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN4v_logf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_logf8_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN8v_logf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf16_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN16vv_powf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf4_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN4vv_powf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_powf8_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN8vv_powf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf16_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN16vvv_sincosf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf4_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN4vvv_sincosf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincosf8_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN8vvv_sincosf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf16_core-avx2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVeN16v_sinf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf4_core-sse2.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVbN4v_sinf): Removed.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core.S:  Renamed to
	...
	* sysdeps/x86_64/fpu/multiarch/svml_d_sinf8_core-sse.S: This.
	Don't include <sysdep.h> nor <init-arch.h>.
	(_ZGVdN8v_sinf): Removed.
2017-08-04 13:03:58 -07:00
H.J. Lu
10a87ca476 x86-64: Implement libm IFUNC selectors in C
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_ceil-sse4_1, s_ceilf-sse4_1, s_floor-sse4_1,
	s_floorf-sse4_1, s_nearbyint-sse4_1, s_nearbyintf-sse4_1,
	s_rint-sse4_1 and s_rintf-sse4_1.
	* sysdeps/x86_64/fpu/multiarch/ifunc-sse4_1.h: New file.
	* sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_nearbyint.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_ceil.S: Renamed to ...
	* sysdeps/x86_64/fpu/multiarch/s_ceil-sse4_1.S: This.  Don't
	include <machine/asm.h> nor <init-arch.h>.  Include <sysdep.h>.
	(__ceil): Removed.
	* sysdeps/x86_64/fpu/multiarch/s_ceilf.S: Renamed to ...
	* sysdeps/x86_64/fpu/multiarch/s_ceilf-sse4_1.S: This.  Don't
	include <machine/asm.h> nor <init-arch.h>.  Include <sysdep.h>.
	(__ceilf): Removed.
	* sysdeps/x86_64/fpu/multiarch/s_floor.S: Renamed to ...
	* sysdeps/x86_64/fpu/multiarch/s_floor-sse4_1.S: This.  Don't
	include <machine/asm.h> nor <init-arch.h>.  Include <sysdep.h>.
	(__floor): Removed.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.S: Renamed to ...
	* sysdeps/x86_64/fpu/multiarch/s_floorf-sse4_1.S: This.  Don't
	include <machine/asm.h> nor <init-arch.h>.  Include <sysdep.h>.
	(__floorf): Removed.
	* sysdeps/x86_64/fpu/multiarch/s_nearbyint.S: Renamed to ...
	* sysdeps/x86_64/fpu/multiarch/s_nearbyint-sse4_1.S: This.  Don't
	include <machine/asm.h> nor <init-arch.h>.  Include <sysdep.h>.
	(__nearbyint): Removed.
	* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.S: Renamed to ...
	* sysdeps/x86_64/fpu/multiarch/s_nearbyintf-sse4_1.S: This.  Don't
	include <machine/asm.h> nor <init-arch.h>.  Include <sysdep.h>.
	(__nearbyintf): Removed.
	* sysdeps/x86_64/fpu/multiarch/s_rint.S: Renamed to ...
	* sysdeps/x86_64/fpu/multiarch/s_rint-sse4_1.S: This.  Don't
	include <machine/asm.h> nor <init-arch.h>.  Include <sysdep.h>.
	(__rint): Removed.
	* sysdeps/x86_64/fpu/multiarch/s_rintf.S: Renamed to ...
	* sysdeps/x86_64/fpu/multiarch/s_rintf-sse4_1.S: This.  Don't
	include <machine/asm.h> nor <init-arch.h>.  Include <sysdep.h>.
	(__rintf): Removed.
2017-08-04 13:02:13 -07:00
H.J. Lu
fc11ff8d0a x86-64: Use IFUNC memcpy and mempcpy in libc.a
Since apply_irel is called before memcpy and mempcpy are called, we
can use IFUNC memcpy and mempcpy in libc.a.

	* sysdeps/x86_64/memmove.S (MEMCPY_SYMBOL): Don't check SHARED.
	(MEMPCPY_SYMBOL): Likewise.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test memcpy and mempcpy in libc.a.
	* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S: Also include
	in libc.a.
	* sysdeps/x86_64/multiarch/memcpy-ssse3.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memcpy.c: Also include in libc.a.
	(__hidden_ver1): Don't use in libc.a.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
	(__mempcpy): Don't create a weak alias in libc.a.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Support
	libc.a.
	* sysdeps/x86_64/multiarch/mempcpy.c: Also include in libc.a.
	(__hidden_ver1): Don't use in libc.a.
2017-08-04 12:27:18 -07:00
H.J. Lu
19f1a11e7e Check linker support for INSERT in linker script
Since gold doesn't support INSERT in linker script:

https://sourceware.org/bugzilla/show_bug.cgi?id=21676

tst-split-dynreloc fails to link with gold.  Check if linker supports
INSERT in linker script before using it.

	* config.make.in (have-insert): New.
	* configure.ac (libc_cv_insert): New.  Set to yes if linker
	supports INSERT in linker script.
	(AC_SUBST(libc_cv_insert): New.
	* configure: Regenerated.
	* sysdeps/x86_64/Makefile (tests): Add tst-split-dynreloc only
	if $(have-insert) == yes.
2017-08-04 12:17:30 -07:00
H.J. Lu
c8a0e6ec03 x86: Remove __memset_zero_constant_len_parameter [BZ #21790]
__memset_zero_constant_len_parameter should be removed by

commit 61062f5630
Author: Ulrich Drepper <drepper@redhat.com>
Date:   Tue Mar 1 00:35:23 2005 +0000

    2005-02-24  Roland McGrath  <roland@redhat.com>

            * debug/Versions (libc: GLIBC_2.4): Remove
            __memset_zero_constant_len_parameter.
            * sysdeps/generic/memset_chk.c: Remove alias and warning.
            * misc/sys/cdefs.h (__warndecl): New macro.
            * debug/warning-nop.c: New file.
            * string/bits/string3.h (memset): Call __warn_memset_zero_len with no
            arguments, instead of calling __memset_zero_constant_len_parameter.
            Use __warndecl for __warn_memset_zero_len.
            * debug/Makefile (routines): Add $(static-only-routines).
            (static-only-routines): New variable.

This patch removes the last emaining pieces of it.  Tested it on i586,
i686 and x86-64.

	[BZ #21790]
	* sysdeps/i386/i586/memset.S
	(__memset_zero_constant_len_parameter): Removed.
	* sysdeps/i386/i686/memset.S
	(__memset_zero_constant_len_parameter): Likewise.
	* sysdeps/i386/i686/multiarch/memset_chk.S
	(__memset_zero_constant_len_parameter): Likewise.
	* sysdeps/x86_64/memset.S (__memset_zero_constant_len_parameter):
	Likewise.
2017-08-04 10:56:51 -07:00
H.J. Lu
5b736bc9b5 x86-64: Check PIC instead of SHARED in start.S
Since start.o may be compiled as PIC, we should check PIC instead of
SHARED.

	* sysdeps/x86_64/start.S (_start): Check PIC instead of SHARED.
2017-08-02 10:27:34 -07:00
H.J. Lu
7a499756ab x86-64: Test memmove_chk and memset_chk only in libc.so [BZ #21741]
Since there are no multiarch versions of memmove_chk and memset_chk,
test multiarch versions of memmove_chk and memset_chk only in libc.so.

	[BZ #21741]
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test memmove_chk and memset_chk only
	in libc.so.
2017-07-10 04:44:38 -07:00
H.J. Lu
58d021c836 x86-64: Update comments in IFUNC selectors
* sysdeps/x86_64/multiarch/memcmp.c: Update comments.
	* sysdeps/x86_64/multiarch/memmove.c: Likewise.
	* sysdeps/x86_64/multiarch/memrchr.c: Likewise.
	* sysdeps/x86_64/multiarch/memset.c: Likewise.
	* sysdeps/x86_64/multiarch/rawmemchr.c: Likewise.
	* sysdeps/x86_64/multiarch/strchrnul.c: Likewise.
	* sysdeps/x86_64/multiarch/strlen.c: Likewise.
	* sysdeps/x86_64/multiarch/strnlen.c: Likewise.
	* sysdeps/x86_64/multiarch/wcschr.c: Likewise.
	* sysdeps/x86_64/multiarch/wcscpy.c: Likewise.
	* sysdeps/x86_64/multiarch/wcslen.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsnlen.c: Likewise.
	* sysdeps/x86_64/multiarch/wmemchr.c: Likewise.
	* sysdeps/x86_64/multiarch/wmemcmp.c: Likewise.
	* sysdeps/x86_64/multiarch/wmemset.c: Likewise.
	* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
2017-07-09 11:43:20 -07:00
H.J. Lu
4df54c89bb x86-64: Update comments in ifunc-impl-list.c
All x86-64 IFUNC selectors are written in C now.  Update comments to
reflect it.

	* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Update comments.
2017-07-09 11:38:37 -07:00
H.J. Lu
031e519c95 x86-64: Align the stack in __tls_get_addr [BZ #21609]
This change forces realignment of the stack pointer in __tls_get_addr, so
that binaries compiled by GCCs older than GCC 4.9:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

continue to work even if vector instructions are used in glibc which
require the ABI stack realignment.

__tls_get_addr_slow is added to handle the slow paths in the default
implementation of__tls_get_addr in elf/dl-tls.c.  The new __tls_get_addr
calls __tls_get_addr_slow after realigning the stack.  Internal calls
within ld.so go directly to the default implementation of __tls_get_addr
because they do not need stack realignment.

	[BZ #21609]
	* sysdeps/x86_64/Makefile (sysdep-dl-routines): Add tls_get_addr.
	(gen-as-const-headers): Add rtld-offsets.sym.
	* sysdeps/x86_64/dl-tls.c: New file.
	* sysdeps/x86_64/rtld-offsets.sym: Likwise.
	* sysdeps/x86_64/tls_get_addr.S: Likewise.
	* sysdeps/x86_64/dl-tls.h: Add multiple inclusion guards.
	* sysdeps/x86_64/tlsdesc.sym (TI_MODULE_OFFSET): New.
	(TI_OFFSET_OFFSET): Likwise.
2017-07-06 04:43:20 -07:00
Joseph Myers
073e8fa773 Require binutils 2.25 or later to build glibc.
This patch implements a requirement of binutils >= 2.25 (up from 2.22)
to build glibc.  Tests for 2.24 or later on x86_64 and s390 are
removed.  It was already the case, as indicated by buildbot results,
that 2.24 was too old for building tests for 32-bit x86 (produced
internal linker errors linking elf/tst-gnu2-tls1mod.so).  I don't know
if any configure tests for binutils features are obsolete given the
increased version requirement.

Tested for x86_64.

	* configure.ac (AS): Require binutils 2.25 or later.
	(LD): Likewise.
	* configure: Regenerated.
	* sysdeps/s390/configure.ac (AS): Remove version check.
	* sysdeps/s390/configure: Regenerated.
	* sysdeps/x86_64/configure.ac (AS): Remove version check.
	* sysdeps/x86_64/configure: Regenerated.
	* manual/install.texi (Tools for Compilation): Document
	requirement for binutils 2.25 or later.
	* INSTALL: Regenerated.
2017-06-28 11:31:50 +00:00
H.J. Lu
e94c310357 x86-64: Optimize memcmp-avx2-movbe.S for short difference
Check the first 32 bytes before checking size when size >= 32 bytes
to avoid unnecessary branch if the difference is in the first 32 bytes.
Replace vpmovmskb/subl/jnz with vptest/jnc.

On Haswell, the new version is as fast as the previous one.  On Skylake,
the new version is a little bit faster.

	* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (MEMCMP): Check
	the first 32 bytes before checking size when size >= 32 bytes.
	Replace vpmovmskb/subl/jnz with vptest/jnc.
2017-06-27 07:55:00 -07:00
Joseph Myers
c86ed71d63 Add float128 support for x86_64, x86.
This patch enables float128 support for x86_64 and x86.  All GCC
versions that can build glibc provide the required support, but since
GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN
tests and some tests of NaN payloads need to be disabled with such
compilers (this does not affect the generated glibc binaries at all,
just the tests).  bits/floatn.h declares float128 support to be
available for GCC versions that provide the required libgcc support
(4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd);
compilation-only support was present some time before then, but not
really useful without the libgcc functions.

fenv_private.h needed updating to avoid trying to put _Float128 values
in registers.  I make no assertion of optimality of the
math_opt_barrier / math_force_eval definitions for this case; they are
simply intended to be sufficient to work correctly.

Tested for x86_64 and x86, with GCC 7 and GCC 6.  (Testing for x32 was
compilation tests only with build-many-glibcs.py to verify the ABI
baseline updates.  I have not done any testing for Hurd, although the
float128 support is enabled there as for GNU/Linux.)

	* sysdeps/i386/Implies: Add ieee754/float128.
	* sysdeps/x86_64/Implies: Likewise.
	* sysdeps/x86/bits/floatn.h: New file.
	* sysdeps/x86/float128-abi.h: Likewise.
	* manual/math.texi (Mathematics): Document support for _Float128
	on x86_64 and x86.
	* sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>.
	(math_opt_barrier): Do not put _Float128 values in floating-point
	registers.
	(math_force_eval): Likewise.
	[__x86_64__] (SET_RESTORE_ROUNDF128): New macro.
	* sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append
	to Makefile variable.
	* sysdeps/x86/fpu/e_sqrtf128.c: New file.
	* sysdeps/x86/fpu/sfp-machine.h: Likewise.  Based on libgcc.
	* sysdeps/x86/math-tests.h: New file.
	* math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro.
	* math/libm-test-getpayload.inc (getpayload_test_data): Use
	XFAIL_FLOAT128_PAYLOAD.
	* math/libm-test-setpayload.inc (setpayload_test_data): Likewise.
	* math/libm-test-totalorder.inc (totalorder_test_data): Likewise.
	* math/libm-test-totalordermag.inc (totalordermag_test_data):
	Likewise.
	* sysdeps/unix/sysv/linux/i386/libc.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-06-26 22:02:24 +00:00
H.J. Lu
049816c3be x86-64: Optimize L(between_2_3) in memcmp-avx2-movbe.S
Turn

	movzbl	-1(%rdi, %rdx), %edi
	movzbl	-1(%rsi, %rdx), %esi
	orl	%edi, %eax
	orl	%esi, %ecx

into

	movb	-1(%rdi, %rdx), %al
	movb	-1(%rsi, %rdx), %cl

	* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3):
	Replace movzbl and orl with movb.
2017-06-23 12:46:12 -07:00
Florian Weimer
bc0382ae90 x86-64: Fix comment typo in memcmp-avx2-movbe.S 2017-06-23 19:00:58 +02:00
Florian Weimer
3ec7c02cc3 x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
This code:

L(between_2_3):
	/* Load as big endian with overlapping loads and bswap to avoid
	   branches.  */
	movzwl	-2(%rdi, %rdx), %eax
	movzwl	-2(%rsi, %rdx), %ecx
	shll	$16, %eax
	shll	$16, %ecx
	movzwl	(%rdi), %edi
	movzwl	(%rsi), %esi
	orl	%edi, %eax
	orl	%esi, %ecx
	bswap	%eax
	bswap	%ecx
	subl	%ecx, %eax
	ret

needs a saturating subtract because the full register is used.
With this commit, only the lower 24 bits of the register are used,
so a regular subtraction suffices.

The test case change adds coverage for these kinds of bugs.
2017-06-23 17:24:40 +02:00
H.J. Lu
11ffcacb64 x86-64: Implement strcmp family IFUNC selectors in C
Implement strcmp family IFUNC selectors in C.

All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register.  For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized.  Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for strcmp family functions within libc.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcmp-sse2, strcmp-sse4_2, strncmp-sse2, strncmp-sse4_2,
	strcasecmp_l-sse2, strcasecmp_l-sse4_2, strcasecmp_l-avx,
	strncase_l-sse2, strncase_l-sse4_2 and strncase_l-avx.
	* sysdeps/x86_64/multiarch/ifunc-strcasecmp.h: New file.
	* sysdeps/x86_64/multiarch/strcasecmp.c: Likewise.
	* sysdeps/x86_64/multiarch/strcasecmp_l-avx.S: Likewise.
	* sysdeps/x86_64/multiarch/strcasecmp_l-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strcasecmp_l-sse4_2.S: Likewise.
	* sysdeps/x86_64/multiarch/strcasecmp_l.c: Likewise.
	* sysdeps/x86_64/multiarch/strcmp-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strcmp-sse4_2.S: Likewise.
	* sysdeps/x86_64/multiarch/strcmp.c: Likewise.
	* sysdeps/x86_64/multiarch/strncase.c: Likewise.
	* sysdeps/x86_64/multiarch/strncase_l-avx.S : Likewise.
	* sysdeps/x86_64/multiarch/strncase_l-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strncase_l-sse4_2.S: Likewise.
	* sysdeps/x86_64/multiarch/strncase_l.c: Likewise.
	* sysdeps/x86_64/multiarch/strncmp-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strncmp-sse4_2.S: Likewise.
	* sysdeps/x86_64/multiarch/strncmp.c: Likewise.
	* sysdeps/x86_64/multiarch/strcasecmp_l.S: Removed.
	* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
	* sysdeps/x86_64/multiarch/strncase_l.S: Likewise.
	* sysdeps/x86_64/multiarch/strncmp.S: Likewise.
	* sysdeps/x86_64/multiarch/strcmp-sse42.S: Include <sysdep.h>.
	(STRCMP_SSE42): New.  Defined to __strcmp_sse42 if not defined.
	[USE_AS_STRCASECMP_L || USE_AS_STRNCASECMP_L]: Include
	"locale-defines.h".
	(UPDATE_STRNCMP_COUNTER): New.
	(SECTION): Likewise.
	(GLABEL): Likewise.
	(LABEL): Likewise.
	* sysdeps/x86_64/multiarch/strncmp-ssse3.S: Rewrite and enable
	for libc.a.
2017-06-21 12:11:06 -07:00
Zack Weinberg
af85385f31 Use locale_t, not __locale_t, throughout glibc
<locale.h> is specified to define locale_t in POSIX.1-2008, and so are
all of the headers that define functions that take locale_t arguments.
Under _GNU_SOURCE, the additional headers that define such functions
have also always defined locale_t.  Therefore, there is no need to use
__locale_t in public function prototypes, nor in any internal code.

	* ctype/ctype-c99_l.c, ctype/ctype.h, ctype/ctype_l.c
	* include/monetary.h, include/stdlib.h, include/time.h
	* include/wchar.h, locale/duplocale.c, locale/freelocale.c
	* locale/global-locale.c, locale/langinfo.h, locale/locale.h
	* locale/localeinfo.h, locale/newlocale.c
	* locale/nl_langinfo_l.c, locale/uselocale.c
	* localedata/bug-usesetlocale.c, localedata/tst-xlocale2.c
	* stdio-common/vfscanf.c, stdlib/monetary.h, stdlib/stdlib.h
	* stdlib/strfmon_l.c, stdlib/strtod_l.c, stdlib/strtof_l.c
	* stdlib/strtol.c, stdlib/strtol_l.c, stdlib/strtold_l.c
	* stdlib/strtoll_l.c, stdlib/strtoul_l.c, stdlib/strtoull_l.c
	* string/strcasecmp.c, string/strcoll_l.c, string/string.h
	* string/strings.h, string/strncase.c, string/strxfrm_l.c
	* sysdeps/ieee754/float128/strtof128_l.c
	* sysdeps/ieee754/float128/wcstof128.c
	* sysdeps/ieee754/float128/wcstof128_l.c
	* sysdeps/ieee754/ldbl-128ibm/strtold_l.c
	* sysdeps/ieee754/ldbl-64-128/strtold_l.c
	* sysdeps/ieee754/ldbl-opt/nldbl-compat.c
	* sysdeps/ieee754/ldbl-opt/nldbl-strfmon_l.c
	* sysdeps/ieee754/ldbl-opt/nldbl-strtold_l.c
	* sysdeps/ieee754/ldbl-opt/nldbl-wcstold_l.c
	* sysdeps/powerpc/powerpc32/power7/strcasecmp.S
	* sysdeps/powerpc/powerpc64/power7/strcasecmp.S
	* sysdeps/x86_64/strcasecmp_l-nonascii.c
	* sysdeps/x86_64/strncase_l-nonascii.c, time/strftime_l.c
	* time/strptime_l.c, time/time.h, wcsmbs/mbsrtowcs_l.c
	* wcsmbs/wchar.h, wcsmbs/wcscasecmp.c, wcsmbs/wcsncase.c
	* wcsmbs/wcstod.c, wcsmbs/wcstod_l.c, wcsmbs/wcstof.c
	* wcsmbs/wcstof_l.c, wcsmbs/wcstol_l.c, wcsmbs/wcstold.c
	* wcsmbs/wcstold_l.c, wcsmbs/wcstoll_l.c, wcsmbs/wcstoul_l.c
	* wcsmbs/wcstoull_l.c, wctype/iswctype_l.c
	* wctype/towctrans_l.c, wctype/wcfuncs_l.c
	* wctype/wctrans_l.c, wctype/wctype.h, wctype/wctype_l.c:
	Change all uses of __locale_t to locale_t.
2017-06-20 20:30:06 -04:00
Zack Weinberg
c0b23001a8 Fix fallout from bits/string.h removal.
Remove one more string inline that was defined directly in string.h;
in the absence of the rest of the inlines, it broke the build.

Like other ifunc shims for these functions,
x86_64/multiarch/{mem,st}pcpy.c need to define __NO_STRING_INLINES and
NO_MEMPCPY_STPCPY_REDIRECT.

	* string/string.h (__mempcpy_inline): Delete.
	* sysdeps/x86_64/multiarch/mempcpy.c
	* sysdeps/x86_64/multiarch/stpcpy.c:
	Define NO_MEMPCPY_STPCPY_REDIRECT and __NO_STRING_INLINES
	before including string.h.
2017-06-20 09:39:08 -04:00
Zack Weinberg
09a596cc2c Remove bits/string.h.
These machine-dependent inline string functions have never been on by
default, and even if they were a good idea at the time they were
introduced, they haven't really been touched in ten to fifteen years
and probably aren't a good idea on current-gen processors.  Current
thinking is that this class of optimization is best left to the
compiler.

	* bits/string.h, string/bits/string.h
	* sysdeps/aarch64/bits/string.h
	* sysdeps/m68k/m680x0/m68020/bits/string.h
	* sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h
	* sysdeps/x86/bits/string.h: Delete file.

	* string/string.h: Don't include bits/string.h.
	* string/bits/string3.h: Rename to bits/string_fortified.h.
	No need to undef various symbols that the removed headers
	might have defined as macros.
	* string/Makefile (headers): Remove bits/string.h, change
	bits/string3.h to bits/string_fortified.h.
	* string/string-inlines.c: Update commentary.  Remove definitions
	of various macros that nothing looks at anymore.  Don't directly
	include bits/string.h. Set _STRING_INLINE_unaligned here, based on
	compiler-predefined macros.
	* string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY
	_is_ defined, provide internal hidden alias __strncat.
	* include/string.h: Declare internal hidden alias __strncat.
	Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is
	not defined.
	* include/bits/string3.h: Rename to bits/string_fortified.h,
	update to match above.

	* sysdeps/i386/string-inlines.c: Define compat symbols for
	everything formerly defined by sysdeps/x86/bits/string.h.
	Make existing definitions into compat symbols as well.
	Remove some no-longer-necessary messing around with macros.

	* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c
	* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c
	* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
	* sysdeps/s390/multiarch/mempcpy.c
	No need to define _HAVE_STRING_ARCH_mempcpy.
	Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT.

	* sysdeps/i386/i686/multiarch/strncat-c.c
	* sysdeps/s390/multiarch/strncat-c.c
	* sysdeps/x86_64/multiarch/strncat-c.c
	Define STRNCAT_PRIMARY.  Don't change definition of libc_hidden_def.
2017-06-20 08:21:24 -04:00
Siddhesh Poyarekar
629ebc873a Fix typo when undefining weak_alias
The macro directive #undef was miswritten as #undefine.

	* sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Fix typo.
2017-06-19 14:56:40 +05:30
H.J. Lu
70fe2eb794 x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C
Implement strcspn/strpbrk/strspn IFUNC selectors in C

All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register.  For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized.  Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for strcspn/strpbrk/strspn functions within libc.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcspn-sse2, strpbrk-sse2 and strspn-sse2.
	* sysdeps/x86_64/strcspn.S (STRPBRK_P): Removed.
	Check USE_AS_STRPBRK instead of STRPBRK_P.
	* sysdeps/x86_64/strpbrk.S (USE_AS_STRPBRK): New.
	* sysdeps/x86_64/multiarch/ifunc-sse4_2.h: New file.
	* sysdeps/x86_64/multiarch/strcspn-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strcspn.c: Likewise.
	* sysdeps/x86_64/multiarch/strpbrk-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strpbrk.c: Likewise.
	* sysdeps/x86_64/multiarch/strspn-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strspn.c: Likewise.
	* sysdeps/x86_64/multiarch/strcspn.S: Removed.
	* sysdeps/x86_64/multiarch/strpbrk.S: Likewise.
	* sysdeps/x86_64/multiarch/strspn.S: Likewise.
	* sysdeps/x86_64/multiarch/strpbrk-c.c: Remove "#ifdef SHARED"
	and "#endif".
2017-06-15 08:59:05 -07:00
H.J. Lu
9f4254b8bd x86-64: Implement wcscpy IFUNC selector in C
* sysdeps/x86_64/multiarch/wcscpy.S: Removed.
	* sysdeps/x86_64/multiarch/wcscpy.c: New file.
2017-06-15 08:57:52 -07:00
H.J. Lu
9ed0aa15d3 x86-64: Implement strcat family IFUNC selectors in C
Implement strcat family IFUNC selectors in C.

All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register.  For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized.  Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for strcat family functions within libc.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcat-sse2.
	* sysdeps/x86_64/multiarch/strcat-sse2.S: New file.
	* sysdeps/x86_64/multiarch/strcat.c: Likewise.
	* sysdeps/x86_64/multiarch/strncat.c: Likewise.
	* sysdeps/x86_64/multiarch/strcat.S: Removed.
	* sysdeps/x86_64/multiarch/strncat.S: Likewise.
2017-06-15 08:56:59 -07:00
H.J. Lu
b91a52d0d7 x86-64: Implement memcmp family IFUNC selectors in C
Implement memcmp family IFUNC selectors in C.

All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register.  For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized.  Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for memcmp family functions within libc.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memcmp-sse2.
	* sysdeps/x86_64/multiarch/ifunc-memcmp.h: New file.
	* sysdeps/x86_64/multiarch/memcmp-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/memcmp.c: Likewise.
	* sysdeps/x86_64/multiarch/wmemcmp.c: Likewise.
	* sysdeps/x86_64/multiarch/memcmp.S: Removed.
	* sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.
2017-06-15 08:49:57 -07:00
H.J. Lu
93e46f8773 x86-64: Implement memset family IFUNC selectors in C
Implement memset family IFUNC selectors in C.

All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register.  For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized.  Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for memset functions within libc.

2017-06-07  H.J. Lu  <hongjiu.lu@intel.com>
	    Erich Elsen  <eriche@google.com>

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memset-sse2-unaligned-erms, and memset_chk-nonshared.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add test for __memset_chk_erms.
	Update comments.
	* sysdeps/x86_64/multiarch/ifunc-memset.h: New file.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
	* sysdeps/x86_64/multiarch/memset.c: Likewise.
	* sysdeps/x86_64/multiarch/memset_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/memset_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/memset.S: Removed.
	* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
	(__memset_chk_erms): New function.
2017-06-15 08:33:35 -07:00
H.J. Lu
5c3e322d3b x86-64: Implement memmove family IFUNC selectors in C
Implement memmove family IFUNC selectors in C.

All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register.  For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized.  Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for memmove family functions within libc.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memmove-sse2-unaligned-erms, memcpy_chk-nonshared,
	mempcpy_chk-nonshared and memmove_chk-nonshared.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __memmove_chk_erms,
	__memcpy_chk_erms and __mempcpy_chk_erms.  Update comments.
	* sysdeps/x86_64/multiarch/ifunc-memmove.h: New file.
	* sysdeps/x86_64/multiarch/memcpy.c: Likewise.
	* sysdeps/x86_64/multiarch/memcpy_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/memcpy_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove.c: Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy.c: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk.c: Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S: Removed.
	* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
	(__mempcpy_chk_erms): New function.
	(__memmove_chk_erms): Likewise.
	(__memcpy_chk_erms): New alias.
2017-06-14 12:11:10 -07:00
Zack Weinberg
fd860eaaa8 Remove __need macros from errno.h (__need_Emath, __need_error_t).
This is fairly complicated, not because the users of __need_Emath and
__need_error_t have complicated requirements, but because the core
changes had a lot of fallout.

__need_error_t exists for gnulib compatibility in argz.h and argp.h.
error_t itself is a Hurdism, an enum containing all the E-constants,
so you can do 'p (error_t) errno' in gdb and get a symbolic value.
argz.h and argp.h use it for function return values, and they want to
fall back to 'int' when that's not available.  There is no reason why
these nonstandard headers cannot just go ahead and include all of
errno.h; so we do that.

__need_Emath is defined only by .S files; what they _really_ need is
for errno.h to avoid declaring anything other than the E-constants
(e.g. 'extern int __errno_location(void);' is a syntax error in
assembly language). This is replaced with a check for __ASSEMBLER__ in
errno.h, plus a carefully documented requirement for bits/errno.h not
to define anything other than macros.  That in turn has the
consequence that bits/errno.h must not define errno - fortunately, all
live ports use the same definition of errno, so I've moved it to
errno.h.  The Hurd bits/errno.h must also take care not to define
error_t when __ASSEMBLER__ is defined, which involves repeating all of
the definitions twice, but it's a generated file so that's okay.

	* stdlib/errno.h: Remove __need_Emath and __need_error_t logic.
	Reorganize file.  Declare errno here.  When __ASSEMBLER__ is
	defined, don't declare anything other than the E-constants.

	* include/errno.h: Change conditional for exposing internal
	declarations to (not _ISOMAC and not __ASSEMBLER__).
	* bits/errno.h: Remove logic for __need_Emath.  Document
	requirements for a port-specific bits/errno.h.

	* sysdeps/unix/sysv/linux/bits/errno.h
	* sysdeps/unix/sysv/linux/alpha/bits/errno.h
	* sysdeps/unix/sysv/linux/hppa/bits/errno.h
	* sysdeps/unix/sysv/linux/mips/bits/errno.h
	* sysdeps/unix/sysv/linux/sparc/bits/errno.h:
	Add multiple-include guard and check against improper inclusion.
	Remove __need_Emath logic.  Don't declare errno here.  Ensure all
	constants are defined as simple integer literals.  Consistent
	formatting.
	* sysdeps/mach/hurd/errnos.awk: Likewise.  Only define error_t and
	enum __error_t_codes if __ASSEMBLER__ is not defined.
	* sysdeps/mach/hurd/bits/errno.h: Regenerate.

	* argp/argp.h, string/argz.h: Don't define __need_error_t before
	including errno.h.
	* sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S
	* sysdeps/i386/i686/fpu/multiarch/s_sincosf-sse2.S
	* sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S
	* sysdeps/x86_64/fpu/s_cosf.S
	* sysdeps/x86_64/fpu/s_sincosf.S
	* sysdeps/x86_64/fpu/s_sinf.S:
	Just include errno.h; don't define __need_Emath or include
	bits/errno.h directly.
2017-06-14 08:14:34 -04:00
Alan Modra
0572433b5b PowerPC64 ELFv2 PPC64_OPT_LOCALENTRY
ELFv2 functions with localentry:0 are those with a single entry point,
ie. global entry == local entry, that have no requirement on r2 or
r12 and guarantee r2 is unchanged on return.  Such an external
function can be called via the PLT without saving r2 or restoring it
on return, avoiding a common load-hit-store for small functions.

This patch implements the ld.so changes necessary for this
optimization.  ld.so needs to check that an optimized plt call
sequence is in fact calling a function implemented with localentry:0,
end emit a fatal error otherwise.

The elf/testobj6.c change is to stop "error while loading shared
libraries: expected localentry:0 `preload'" when running
elf/preloadtest, which we'd get otherwise.

	* elf/elf.h (PPC64_OPT_LOCALENTRY): Define.
	* sysdeps/alpha/dl-machine.h (elf_machine_fixup_plt): Add
	refsym and sym parameters.  Adjust callers.
	* sysdeps/aarch64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/arm/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/generic/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/i386/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/ia64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/m68k/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/microblaze/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/mips/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/nios2/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_fixup_plt):
	Likewise.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/s390/s390-64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sh/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/tile/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/powerpc/powerpc64/dl-machine.c (_dl_error_localentry): New.
	(_dl_reloc_overflow): Increase buffser size.  Formatting.
	* sysdeps/powerpc/powerpc64/dl-machine.h (ppc64_local_entry_offset):
	Delete reloc param, add refsym and sym.  Check optimized plt
	call stubs for localentry:0 functions.  Adjust callers.
	(elf_machine_fixup_plt, elf_machine_plt_conflict): Add refsym
	and sym parameters.  Adjust callers.
	(_dl_reloc_overflow): Move attribute.
	(_dl_error_localentry): Declare.
	* elf/dl-runtime.c (_dl_fixup): Save original sym.  Pass
	refsym and sym to elf_machine_fixup_plt.
	* elf/testobj6.c (preload): Call printf.
2017-06-14 10:47:25 +09:30
H.J. Lu
5a103908c0 x86-64: Implement strcpy family IFUNC selectors in C
Implement strcpy family IFUNC selectors in C.

All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register.  For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized.  Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for strcpy family functions within libc.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcpy-sse2 and stpcpy-sse2.
	* sysdeps/x86_64/multiarch/ifunc-unaligned-ssse3.h: New file.
	* sysdeps/x86_64/multiarch/stpcpy-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/stpcpy.c: Likewise.
	* sysdeps/x86_64/multiarch/stpncpy.c: Likewise.
	* sysdeps/x86_64/multiarch/strcpy-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strcpy.c: Likewise.
	* sysdeps/x86_64/multiarch/strncpy.c: Likewise.
	* sysdeps/x86_64/multiarch/stpcpy.S: Removed.
	* sysdeps/x86_64/multiarch/stpncpy.S: Likewise.
	* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
	* sysdeps/x86_64/multiarch/strncpy.S: Likewise.
	* sysdeps/x86_64/multiarch/stpncpy-c.c (weak_alias): New.
	(libc_hidden_def): Always defined as empty.
	* sysdeps/x86_64/multiarch/strncpy-c.c (libc_hidden_builtin_def):
	Always Defined as empty.
2017-06-12 09:06:09 -07:00
H.J. Lu
6b6710e55b x86-64: Correct comments in ifunc-impl-list.c
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Correct comments.
2017-06-09 05:53:45 -07:00
H.J. Lu
d2538b9156 x86-64: Optimize strrchr/wcsrchr with AVX2
Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector
instructions.  It is as fast as SSE2 version for small data sizes
and up to 1X faster for large data sizes on Haswell.  Select AVX2
version on AVX2 machines where vzeroupper is preferred and AVX
unaligned load is fast.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strrchr-sse2, strrchr-avx2, wcsrchr-sse2 and wcsrchr-avx2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __strrchr_avx2,
	__strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2.
	* sysdeps/x86_64/multiarch/strrchr-avx2.S: New file.
	* sysdeps/x86_64/multiarch/strrchr-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strrchr.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsrchr-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsrchr-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsrchr.c: Likewise.
2017-06-09 05:45:52 -07:00
H.J. Lu
5ac7aa1d7c x86-64: Optimize memrchr with AVX2
Optimize memrchr with AVX2 to search 32 bytes with a single vector
compare instruction.  It is as fast as SSE2 memrchr for small data
sizes and up to 1X faster for large data sizes on Haswell.  Select
AVX2 memrchr on AVX2 machines where vzeroupper is preferred and AVX
unaligned load is fast.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memrchr-sse2 and memrchr-avx2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __memrchr_avx2 and
	__memrchr_sse2.
	* sysdeps/x86_64/multiarch/memrchr-avx2.S: New file.
	* sysdeps/x86_64/multiarch/memrchr-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/memrchr.c: Likewise.
2017-06-09 05:44:41 -07:00