Commit Graph

11343 Commits

Author SHA1 Message Date
Michael Collison
5062680c60 aarch64: Implement math acceleration via builtins
This patch converts asm statements into builtins for AArch64.  As an
example for the file sysdeps/aarch64/fpu/s_ceil.c, we convert the
function from

double
__ceil (double x)
{
  double result;
  asm ("frintp\t%d0, %d1" :
       "=w" (result) : "w" (x) );
  return result;
}

into

double
__ceil (double x)
{
  return __builtin_ceil (x);
}

Tested on aarch64-linux-gnu with gcc-4.9.4 and gcc-6.

	* sysdeps/aarch64/fpu/e_sqrt.c (ieee754_sqrt): Replace asm statements
	with __builtin_sqrt.
	* sysdeps/aarch64/fpu/e_sqrtf.c (ieee754_sqrtf): Replace asm statements
	with __builtin_sqrtf.
	* sysdeps/aarch64/fpu/s_ceil.c (__ceil): Replace asm statements
	with __builtin_ceil.
	* sysdeps/aarch64/fpu/s_ceilf.c (__ceilf): Replace asm statements
	with __builtin_ceilf.
	* sysdeps/aarch64/fpu/s_floor.c (__floor): Replace asm statements
	with __builtin_floor.
	* sysdeps/aarch64/fpu/s_floorf.c (__floorf): Replace asm statements
	with __builtin_floorf.
	* sysdeps/aarch64/fpu/s_fma.c (__fma): Replace asm statements
	with __builtin_fma.
	* sysdeps/aarch64/fpu/s_fmaf.c (__fmaf): Replace asm statements
	with __builtin_fmaf.
	* sysdeps/aarch64/fpu/s_fmax.c (__fmax): Replace asm statements
	with __builtin_fmax.
	* sysdeps/aarch64/fpu/s_fmaxf.c (__fmaxf): Replace asm statements
	with __builtin_fmaxf.
	* sysdeps/aarch64/fpu/s_fmin.c (__fmin): Replace asm statements
	with __builtin_fmin.
	* sysdeps/aarch64/fpu/s_fminf.c (__fminf): Replace asm statements
	with __builtin_fminf.
	* sysdeps/aarch64/fpu/s_frint.c: Delete file.
	* sysdeps/aarch64/fpu/s_frintf.c: Delete file.
	* sysdeps/aarch64/fpu/s_llrint.c (__llrint): Replace asm statements
	with builtin_rint and conversion to int.
	* sysdeps/aarch64/fpu/s_llrintf.c (__llrintf): Likewise.
	* sysdeps/aarch64/fpu/s_llround.c (__llround): Replace asm statements
	with builtin_llround.
	* sysdeps/aarch64/fpu/s_llroundf.c (__llroundf): Likewise.
	* sysdeps/aarch64/fpu/s_lrint.c (__lrint): Replace asm statements
	with builtin_rint and conversion to long int.
	* sysdeps/aarch64/fpu/s_lrintf.c (__lrintf): Likewise.
	* sysdeps/aarch64/fpu/s_lround.c (__lround): Replace asm statements
	with builtin_lround.
	* sysdeps/aarch64/fpu/s_lroundf.c (__lroundf): Replace asm statements
	with builtin_lroundf.
	* sysdeps/aarch64/fpu/s_nearbyint.c (__nearbyint): Replace asm
	statements with __builtin_nearbyint.
	* sysdeps/aarch64/fpu/s_nearbyintf.c (__nearbyintf): Replace asm
	statements with __builtin_nearbyintf.
	* sysdeps/aarch64/fpu/s_rint.c (__rint): Replace asm statements
	with __builtin_rint.
	* sysdeps/aarch64/fpu/s_rintf.c (__rintf): Replace asm statements
	with __builtin_rintf.
	* sysdeps/aarch64/fpu/s_round.c (__round): Replace asm statements
	with __builtin_round.
	* sysdeps/aarch64/fpu/s_roundf.c (__roundf): Replace asm statements
	with __builtin_roundf.
	* sysdeps/aarch64/fpu/s_trunc.c (__trunc): Replace asm statements
	with __builtin_trunc.
	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace asm statements
	with __builtin_truncf.
	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with -fno-math-errno.
2017-10-23 10:32:56 +01:00
Alan Modra
174935af03 PowerPC64 power8 strncpy cfi fixes
cfi info for stack adjust needs to be on the insn doing the adjust.
cfi describing register saves can be anywhere after the save insn but
before the reg is altered.  Fewer locations with cfi result in smaller
cfi programs and possibly slightly faster exception handling.  Thus
the LR cfi_offset move.

The idea behind ajusting sp after restoring regs is to break a
register dependency chain, in this case not be using r1 immediately
after it is modified.

The missing LR cfi_restore meant that code after the blr,
unaligned_lt_16 and other labels, would have cfi that said LR was at
cfa+16, but that code is reached without LR being saved.

	* sysdeps/powerpc/powerpc64/power8/strncpy.S: Move LR cfi.
	Adjust stack after restoring regs.  Add missing LR cfi_restore.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
2017-10-23 07:46:58 +10:30
Alan Modra
750a0e4967 PowerPC64 power7 strncpy stack handling and cfi
This patch moves the frame setup and teardown to immediately around
the single memset call, as has been done for power8.  I've also
decreased FRAMESIZE to that needed to save the two callee-saved
registers used.  Plus added cfi.

	* sysdeps/powerpc/powerpc64/power7/strncpy.S: Decrease FRAMESIZE.
	Move LR save and frame setup/teardown and LR restore to
	immediately around memset call.  Provide cfi.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
2017-10-23 07:46:07 +10:30
H.J. Lu
5313581cb5 i386: Replace assembly versions of e_powf with generic e_powf.c
This patch replaces i386 assembly versions of e_powf with generic
e_powf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      230.855          78.3358       194%
latency                    231.685          94.1259       146%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      239.858          47.4713       405%
latency                    247.57           93.8798       163%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      269.078          63.3758       324%
latency                    271.473          102.091       165%

	* sysdeps/i386/fpu/e_powf.S: Removed.
	* sysdeps/i386/fpu/e_powf_log2_data.c: Likewise.
	* sysdeps/i386/fpu/w_powf.c: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_powf.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_powf-sse2.
	(CFLAGS-e_powf-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_powf-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_powf.c: Likewise.
2017-10-22 08:12:41 -07:00
H.J. Lu
6089a3ee24 i386: Replace assembly versions of e_log2f with generic e_log2f.c
This patch replaces i386 assembly versions of e_log2f with generic
e_log2f.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      92.3845          30.8752       199%
latency                    112.855          54.8645       105%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      98.7488          22.7507       334%
latency                    118.01           51.6083       128%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      106.635          28.8596       269%
latency                    129.888          56.9187       128%

	* sysdeps/i386/fpu/e_log2f.S: Removed.
	* sysdeps/i386/fpu/e_log2f_data.c: Likewise.
	* sysdeps/i386/fpu/w_log2f.c: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_log2f.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_log2f-sse2.
	(CFLAGS-e_log2f-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_log2f-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_log2f.c: Likewise.
2017-10-22 08:10:18 -07:00
H.J. Lu
80bb593563 x86-64: Add powf with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      35.4713          27.3842       29%
latency                    82.4537          66.3175       24%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_powf-fma.
	(CFLAGS-e_powf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_powf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_powf.c: Likewise.
2017-10-22 08:08:00 -07:00
H.J. Lu
5c7adbd8ed x86-64: Add log2f with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      16.5937          14.0789       17%
latency                    41.7755          35.3586       18%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_log2f-fma.
	(CFLAGS-e_log2f-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_log2f-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_log2f.c: Likewise.
2017-10-22 08:06:58 -07:00
H.J. Lu
0ccc7153cc x86-64: Add logf with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      16.1534          13.8874       16%
latency                    41.9642          34.3072       22%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_logf-fma.
	(CFLAGS-e_logf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_logf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_logf.c: Likewise.
2017-10-22 08:05:15 -07:00
H.J. Lu
fe596486d6 i386: Replace assembly versions of e_logf with generic e_logf.c
This patch replaces i386 assembly versions of e_logf with generic
e_logf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      73.3865          40.0454       83%
latency                    90.0985          54.4479       65%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      75.1384          22.1452       239%
latency                    91.9441          50.7925       81%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      84.5575          28.7879       193%
latency                    103.971          57.5231       80%

	* sysdeps/i386/fpu/e_logf.S: Removed.
	* sysdeps/i386/fpu/e_logf_data.c: Likewise.
	* sysdeps/i386/fpu/w_logf.c: Likewise.
	* sysdeps/i386/i686/fpu/e_logf.S: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_logf.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_logf-sse2.
	(CFLAGS-e_logf-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_logf-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_logf.c: Likewise.
2017-10-22 08:02:58 -07:00
H.J. Lu
7eda65f69e i386: Replace assembly versions of e_exp2f with generic e_exp2f.c
This patch replaces i386 assembly versions of e_exp2f with generic
e_exp2f.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      112.996          40.0454       182%
latency                    126.581          54.4479       132%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      113.14           39.447        186%
latency                    136.068          55.684        144%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      132.521          40.3759       228%
latency                    145.791          58.4587       149%

	* sysdeps/i386/fpu/e_exp2f.S: Removed.
	* sysdeps/i386/fpu/w_exp2f.c: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_exp2f.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_exp2f-sse2.
	(CFLAGS-e_exp2f-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_exp2f-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_exp2f.c: Likewise.
2017-10-22 08:00:18 -07:00
H.J. Lu
5d15c96975 x86-64: Add exp2f with FMA
For workload-spec2017.wrf, on Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      13.0291          11.2225       16%
latency                    44.5154          37.5766       18%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_exp2f-fma.
	(CFLAGS-e_exp2f-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/e_exp2f-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Likewise.
2017-10-22 07:57:50 -07:00
H.J. Lu
b2f6137ea5 i386: Replace assembly versions of e_expf with generic e_expf.c
This patch replaces i386 assembly versions of e_expf with generic
e_expf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      55.5724          40.2664       38%
latency                    80.0687          60.8517       31%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      62.4056          39.4188       58%
latency                    85.5496          59.6377       43%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      133.707          40.3778       231%
latency                    149.191          63.2515       135%

	* sysdeps/i386/fpu/e_exp2f_data.c: Removed.
	* sysdeps/i386/fpu/e_expf.S: Likewise.
	* sysdeps/i386/fpu/math_errf.c: Likewise.
	* sysdeps/i386/fpu/w_expf.c: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/e_expf-ia32.S: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/w_expf.c: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_expf.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove e_expf-ia32.
	(CFLAGS-e_expf-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_expf.c: Rewritten.
2017-10-22 07:54:50 -07:00
H.J. Lu
e1f59bebd8 x86-64: Replace assembly versions of e_expf with generic e_expf.c
This patch replaces x86-64 assembly versions of e_expf with generic
e_expf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      36.039           20.7749       73%
latency                    58.8096          40.8715       43%

On Skylake, it improves

                           Before            After     Improvement
reciprocal-throughput      18.4436          11.1693       65%
latency                    47.5162          37.5411       26%

	* sysdeps/x86_64/fpu/e_expf.S: Removed.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: Likewise.
	* sysdeps/x86_64/fpu/w_expf.c: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Updated for generic
	e_expf.c.
	* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_expf-fma.c):
	New.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
	Renamed to ...
	(__redirect_expf): This.
	(SYMBOL_NAME): Changed to expf.
	(__ieee754_expf): Renamed to ...
	(__expf): This.
	(__GI___expf): This.
	(__ieee754_expf): Add strong_alias.
	(__expf_finite): Likewise.
	(__expf): New.
	Include <sysdeps/ieee754/flt-32/e_expf.c>.
2017-10-22 07:49:55 -07:00
Joseph Myers
797ba44ba2 Add bits/floatn.h defines for more _FloatN / _FloatNx types.
The bits/floatn.h header currently only has defines relating to
_Float128.  This patch adds defines relating to other _FloatN /
_FloatNx types.

The approach taken is to add defines for all _FloatN / _FloatNx types
known to GCC, and to put them in a common bits/floatn-common.h header
included at the end of all the individual bits/floatn.h headers.  If
in future some defines become different for different glibc
configurations, they will move out into the separate bits/floatn.h
headers.

Some defines are expected always to be the same across glibc ports.
Corresponding defines are nevertheless put in this header.  The intent
is that where there are conditionals (in headers or in non-installed
files) that can just repeat the same or nearly the same logic for each
floating-point type, they should do so, even if in fact the cases for
some types could be unconditionally present or absent because the same
conditionals are true or false for all glibc configurations.  This
should make the glibc code with such conditionals easier to read,
because the reader can just see that the same conditionals are
repeated for each type, rather than seeing different conditionals for
different types and needing to reason, at each location with such
differences, why those differences are indeed correct there.  (Cases
involving per-format rather than per-type logic are more likely still
to need differences in how they handle different types.)

Having such defines and conditionals also helps in incremental
preparation for adding _Float32 / _Float64 / _Float32x / _Float64x
function aliases.  I intend subsequent patches to add such
conditionals corresponding to those already present for _Float128, as
well as making more architecture-specific function implementations use
common macros to define aliases in preparation for adding such _FloatN
/ _FloatNx aliases.

Tested for x86_64.

	* bits/floatn-common.h: New file.
	* math/Makefile (headers): Add bits/floatn-common.h.
	* bits/floatn.h: Include <bits/floatn-common.h>.
	* sysdeps/ia64/bits/floatn.h: Likewise.
	* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
	* sysdeps/mips/ieee754/bits/floatn.h: Likewise.
	* sysdeps/powerpc/bits/floatn.h: Likewise.
	* sysdeps/x86/bits/floatn.h: Likewise.
2017-10-20 21:42:51 +00:00
Adhemerval Zanella
fe05e1cb6d posix: Fix improper assert in Linux posix_spawn (BZ#22273)
As noted by Florian Weimer, current Linux posix_spawn implementation
can trigger an assert if the auxiliary process is terminated before
actually setting the err member:

    340   /* Child must set args.err to something non-negative - we rely on
    341      the parent and child sharing VM.  */
    342   args.err = -1;
    [...]
    362   new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
    363                    CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
    364
    365   if (new_pid > 0)
    366     {
    367       ec = args.err;
    368       assert (ec >= 0);

Another possible issue is killing the child between setting the err and
actually calling execve.  In this case the process will not ran, but
posix_spawn also will not report any error:

    269
    270   args->err = 0;
    271   args->exec (args->file, args->argv, args->envp);

As suggested by Andreas Schwab, this patch removes the faulty assert
and also handles any signal that happens before fork and execve as the
spawn was successful (and thus relaying the handling to the caller to
figure this out).  Different than Florian, I can not see why using
atomics to set err would help here, essentially the code runs
sequentially (due CLONE_VFORK) and I think it would not be legal the
compiler evaluate ec without checking for new_pid result (thus there
is no need to compiler barrier).

Summarizing the possible scenarios on posix_spawn execution, we
have:

  1. For default case with a success execution, args.err will be 0, pid
     will not be collected and it will be reported to caller.

  2. For default failure case, args.err will be positive and the it will
     be collected by the waitpid.  An error will be reported to the
     caller.

  3. For the unlikely case where the process was terminated and not
     collected by a caller signal handler, it will be reported as succeful
     execution and not be collected by posix_spawn (since args.err will
     be 0). The caller will need to actually handle this case.

  4. For the unlikely case where the process was terminated and collected
     by caller we have 3 other possible scenarios:

     4.1. The auxiliary process was terminated with args.err equal to 0:
	  it will handled as 1. (so it does not matter if we hit the pid
          reuse race since we won't possible collect an unexpected
          process).

     4.2. The auxiliary process was terminated after execve (due a failure
          in calling it) and before setting args.err to -1: it will also
          be handle as 1. but with the issue of not be able to report the
          caller a possible execve failures.

     4.3. The auxiliary process was terminated after args.err is set to -1:
          this is the case where it will be possible to hit the pid reuse
          case where we will need to collected the auxiliary pid but we
          can not be sure if it will be expected one.  I think for this
          case we need to actually change waitpid to use WNOHANG to avoid
          hanging indefinitely on the call and report an error to caller
          since we can't differentiate between a default failure as 2.
          and a possible pid reuse race issue.

Checked on x86_64-linux-gnu.

	* sysdeps/unix/sysv/linux/spawni.c (__spawnix): Handle the case where
	the auxiliary process is terminated by a signal before calling _exit
	or execve.
2017-10-20 16:25:59 -02:00
H.J. Lu
b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
H.J. Lu
9ba7e81028 m68k: Update elf_machine_load_address for static PIE
When --enable-static-pie is used to configure glibc, we need to use
_dl_relocate_static_pie to compute load address in static PIE.

	* sysdeps/m68k/dl-machine.h (elf_machine_load_address): Use
	_dl_relocate_static_pie instead of _dl_start to compute load
	address in static PIE.
2017-10-20 03:36:47 -07:00
H.J. Lu
4027a4fda0 m68k: Check PIC instead of SHARED in start.S
Since start.o may be compiled as PIC, we should check PIC instead of
SHARED.

	* sysdeps/m68k/start.S (_start): Check PIC instead of SHARED.
2017-10-20 03:34:53 -07:00
Florian Weimer
63b4baa44e sysconf: Fix missing definition of UIO_MAXIOV on Linux [BZ #22321]
After commit 37f802f864 (Remove
__need_IOV_MAX and __need_FOPEN_MAX), UIO_MAXIOV is no longer supplied
(indirectly) through <bits/stdio_lim.h>, so sysdeps/posix/sysconf.c no
longer sees the definition.
2017-10-20 04:10:15 +02:00
H.J. Lu
95ccb619f5 i386: Regenerate libm-test-ulps
Regenerate libm-test-ulps for --disable-multi-arch.

	* sysdeps/i386/fpu/libm-test-ulps: Regenerated.
2017-10-19 11:51:57 -07:00
Joseph Myers
76f2ed922a Add MIPS bits/floatn.h.
This patch adds a MIPS-specific bits/floatn.h header.  This header is
identical to the ldbl-128 version except for the comment at the top;
the purpose is to ensure that a 32-bit MIPS build installs a header
that is the same as in a 64-bit MIPS build and so properly shows
_Float128 support to be available for 64-bit compilations, on the
general principle of an installation for one multilib providing
headers also suitable for other multilibs.

Tested with build-many-glibcs.py.

	* sysdeps/mips/ieee754/bits/floatn.h: New file.
2017-10-19 17:59:41 +00:00
Joseph Myers
37bb78cb8c Install correct bits/long-double.h for MIPS64 (bug 22322).
Similar to bug 21987 for SPARC, MIPS64 wrongly installs the ldbl-128
version of bits/long-double.h, meaning incorrect results when using
headers installed from a 64-bit installation for a 32-bit build.  (I
haven't actually seen this cause build failures before its interaction
with bits/floatn.h did so - installed headers wrongly expecting
_Float128 to be available in a 32-bit configuration.)

This patch fixes the bug by moving the MIPS header to
sysdeps/mips/ieee754, which comes before sysdeps/ieee754/ldbl-128 in
the sysdeps directory ordering.  (bits/floatn.h will need a similar
fix - duplicating the ldbl-128 version for MIPS will suffice - for
headers from a 32-bit installation to be correct for 64-bit builds.)

Tested with build-many-glibcs.py (compilers build for
mips64-linux-gnu, where there was previously a libstdc++ build failure
as at
<https://sourceware.org/ml/libc-testresults/2017-q4/msg00130.html>).

	[BZ #22322]
	* sysdeps/mips/bits/long-double.h: Move to ....
	* sysdeps/mips/ieee754/bits/long-double.h: ... here.
2017-10-19 17:32:20 +00:00
H.J. Lu
4d916f0f12 x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299]
Since ld.so expands $PLATFORM with GLRO(dl_platform), don't set
GLRO(dl_platform) to NULL.

	[BZ #22299]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
	GLRO(dl_platform) to NULL.
	* sysdeps/x86_64/Makefile (tests): Add tst-platform-1.
	(modules-names): Add tst-platformmod-1 and
	x86_64/tst-platformmod-2.
	(CFLAGS-tst-platform-1.c): New.
	(CFLAGS-tst-platformmod-1.c): Likewise.
	(CFLAGS-tst-platformmod-2.c): Likewise.
	(LDFLAGS-tst-platformmod-2.so): Likewise.
	($(objpfx)tst-platform-1): Likewise.
	($(objpfx)tst-platform-1.out): Likewise.
	(tst-platform-1-ENV): Likewise.
	($(objpfx)x86_64/tst-platformmod-2.os): Likewise.
	* sysdeps/x86_64/tst-platform-1.c: New file.
	* sysdeps/x86_64/tst-platformmod-1.c: Likewise.
	* sysdeps/x86_64/tst-platformmod-2.c: Likewise.
2017-10-19 08:28:26 -07:00
Joseph Myers
81325b12b1 Add _Float128 function aliases.
This patch adds support for *f128 function aliases on platforms where
long double has the binary128 format (and thus GCC 7 provides the
_Float128 type with the same ABI as long double but as a distinct type
in terms of C type compatibility).  This is the same API as provided
in glibc 2.26 for powerpc64le / x86_64 / x86 / ia64 where _Float128
has a different format from long double, with the bulk of the API
coming from TS 18661-3.  All the functions alias the corresponding
long double functions, and __* function names are not provided since
those are only needed once for each floating-point format, not more
than once for different types with the same format (so for example,
-ffinite-math-only maps foof128 to __fool_finite, while type-generic
macros end up calling e.g. __issignalingl for _Float128 arguments on
such platforms).

The preparation for this feature was done in previous patches, so this
one just needs to add the relevant makefile and header definitions,
and update macro definitions of libm_alias_ldouble_other_r, to turn on
the feature, and update documentation and ABI baselines.

Tested (a) for x86_64, (b) for aarch64, (c) with build-many-glibcs.py
with both GCC 6 and GCC 7.

	* sysdeps/ieee754/ldbl-128/Makeconfig: New file.
	* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
	* sysdeps/ieee754/ldbl-128/float128-abi.h: Likewise.
	* sysdeps/generic/libm-alias-ldouble.h: Include <bits/floatn.h>.
	[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128]
	(libm_alias_ldouble_other_r): Also create _Float128 alias.
	* sysdeps/ieee754/ldbl-opt/libm-alias-ldouble.h: Include
	<bits/floatn.h>.
	[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128]
	(libm_alias_ldouble_other_r): Also create _Float128 alias.
	* manual/math.texi (Mathematics): Document additional architecture
	support for _Float128.
	* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
2017-10-18 17:37:18 +00:00
Szabolcs Nagy
a68ba2f3cd [AARCH64] Rewrite elf_machine_load_address using _DYNAMIC symbol
This patch rewrites aarch64 elf_machine_load_address to use special _DYNAMIC
symbol instead of _dl_start.

The static address of _DYNAMIC symbol is stored in the first GOT entry.
Here is the change which makes this solution work (part of binutils 2.24):
https://sourceware.org/ml/binutils/2013-06/msg00248.html

i386, x86_64 targets use the same method to do this as well.

The original implementation relies on a trick that R_AARCH64_ABS32 relocation
being resolved at link time and the static address fits in the 32bits.
However, in LP64, normally, the address is defined to be 64 bit.

Here is the C version one which should be portable in all cases.

	* sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
	_DYNAMIC symbol to calculate load address.
2017-10-18 17:35:16 +01:00
Paul Clarke
346729f66b powerpc: fix check-before-set in SET_RESTORE_ROUND
A performance regression was introduced by commit
84d74e427a "powerpc: Cleanup fenv_private.h".

In the powerpc implementation of SET_RESTORE_ROUND, there is the
following code in the "SET" function (slightly simplified):
--
  old.fenv = fegetenv_register ();

  new.l = (old.l & _FPU_MASK_TRAPS_RN) | r; (1)

  if (new.l != old.l)                       (2)
    {
      if ((old.l & _FPU_ALL_TRAPS) != 0)
        (void) __fe_mask_env ();
      fesetenv_register (new.fenv);         (3)
--

Line (1) sets the value of "new" to the current value of FPSCR,
but masks off summary bits, exceptions, non-IEEE mode, and
rounding mode, then ORs in the new rounding mode.

Line (2) compares this new value to the current value in order to
avoid setting a new value in the FPSCR (line (3)) unless something
significant has changed (exception enables or rounding mode).

The summary bits are not germane to the comparison, but are cleared
in "new" and preserved in "old", resulting in false negative
comparisons, and unnecessarily setting the FPSCR in those cases
with associated negative performance impacts.

The solution is to treat the summaries identically for "new" and "old":
- save them in SET
- leave them alone otherwise
- restore the saved values in RESTORE

Also minor changes:
- expand _FPU_MASK_RN to 64bit hex, to match other MASKs
- treat bit 52 (left-to-right) as reserved (since it is)

	* sysdeps/powerpc/fpu/fenv_private.h (_FPU_MASK_TRAPS_RN):
	(_FPU_MASK_FRAC_INEX_RET_CC): Fix masks to more properly handle
	summary bits.
	(_FPU_MASK_RN): Expand _FPU_MASK_RN to 64bit hex.
	(_FPU_MASK_NOT_RN_NI): Treat bit 52 (left-to-right) as reserved.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
2017-10-18 12:08:28 -02:00
Adhemerval Zanella
71d85045fd posix: Add p{readv,writev}2 flags to generic uio-ext.h
* bits/uio-ext.h (RWF_HIPRI, RWF_DSYNC, RWF_SYNC, RWF_NOWAIT): New
	defines.
2017-10-17 17:52:04 -02:00
Adhemerval Zanella
4e17c78e4a Add common ifunc-init.h header
This patch moves the generic definition from x86_64 init-arch
to a common header ifunc-init.h.  No functional changes is expected.

Checked on a x86_64-linux-gnu build.

	* sysdeps/generic/ifunc-init.h: New file.
	* sysdeps/x86/init-arch.h: Use generic ifunc-init.h.
2017-10-17 12:01:22 -02:00
Joseph Myers
c38a4bfd59 Move some float128 symbol version definitions.
With support for _Float128 functions on platforms where that type has
the same ABI as long double, as well as on platforms where it is
ABI-distinct, those functions will need to be exported from glibc's
shared libraries at appropriate symbol versions in each case.

This patch avoids duplication of lists of symbols to export by moving
the symbols other than __* to math/Versions and stdlib/Versions.
There, they are conditional on <float128-abi.h> defining
FLOAT128_VERSION and a default version of that header is added that
does not define that macro.  Enabling the float128 function aliases
will then include adding a sysdeps/ieee754/ldbl-128/float128-abi.h
that defines FLOAT128_VERSION to GLIBC_2.27.  Symbols __* remain in
sysdeps/ieee754/float128/Versions; those symbols should be present
only once per floating-point format, not once per type.

Note that if any platforms currently lacking support for a type with
binary128 format get glibc support for such a type in future (whether
only as _Float128, or also as a new long double format), and new libm
functions (present for all types) have been added by then, additional
macros will be needed to allow such functions to get a version of the
form "GLIBC_2.28 if the platform had _Float128 support by then, or the
later version at which that platform had _Float128 support added".
This is not however a preexisting condition, but would have applied
equally to the existing support for _Float128 as an ABI-distinct
type.  New all-type libm functions should just be added to the
appropriate symbol version (currently GLIBC_2.27) for all types, with
such special-case handling for _Float128 versions (and _Float64x as
well in future) waiting until someone actually wants to add support
for _Float128 to an existing platform after a release in which that
platform and a post-2.26 libm function had support but that platform
lacked _Float128 support.

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.  Also tested in conjunction
with the remaining changes to enable float128 aliases.

	* sysdeps/generic/float128-abi.h: New file.
	* sysdeps/ieee754/float128/Versions (FLOAT128_VERSION): Move
	non-__prefixed symbols to ....
	* math/Versions: ... here.  Include <float128-abi.h>.
	* stdlib/Versions ... and here.  Include <float128-abi.h>
2017-10-16 22:04:42 +00:00
Joseph Myers
02010e79ce Support strtof128 etc. aliases.
This patch adds support for building strtof128, wcstof128, strtof128_l
and wcstof128_l as aliases, in the case of __HAVE_FLOAT128 &&
!__HAVE_DISTINCT_FLOAT128.

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.  Also tested together with
changes to enable float128 aliases.

	* stdlib/strtold.c: Include <bits/floatn.h>
	[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (strtof128): Define
	and later undefine as macro.  Define as weak alias if
	[!USE_WIDE_CHAR].
	[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (wcstof128): Define
	and later undefine as macro.  Define as weak alias if
	[USE_WIDE_CHAR].
	* sysdeps/ieee754/ldbl-128/strtold_l.c [__HAVE_FLOAT128 &&
	!__HAVE_DISTINCT_FLOAT128] (strtof128_l): Define and later
	undefine as macro.  Define as weak alias if [!USE_WIDE_CHAR].
	[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (wcstof128_l):
	Define and later undefine as macro.  Define as weak alias if
	[USE_WIDE_CHAR].
	* sysdeps/ieee754/ldbl-64-128/strtold_l.c: Include
	<bits/floatn.h>.
	[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (strtof128_l):
	Define and later undefine as macro.  Define as weak alias if
	[!USE_WIDE_CHAR].
	[__HAVE_FLOAT128 && !__HAVE_DISTINCT_FLOAT128] (wcstof128_l):
	Define and later undefine as macro.  Define as weak alias if
	[USE_WIDE_CHAR].
2017-10-16 13:22:11 +00:00
Joseph Myers
f8718a9e16 Use libm_alias_ldouble_other in ldbl-64-128/s_nextafterl.c.
This patch makes ldbl-64-128/s_nextafterl.c restore the default
weak_alias definition and use libm_alias_ldouble_other (having
undefined and redefined weak_alias for the include of
ldbl-128/s_nextafterl.c, so the libm_alias_ldouble use in the latter
file is ineffective).

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.  Also tested together with
changes to enable float128 aliases.

	* sysdeps/ieee754/ldbl-64-128/s_nextafterl.c (weak_alias):
	Undefine and restore default definition.  Use
	libm_alias_ldouble_other.
2017-10-13 23:05:15 +00:00
James Clarke
8644588807 Fix TLS relocations against local symbols on powerpc32, sparc32 and sparc64
Normally, TLS relocations against local symbols are optimised by the linker
to be absolute.  However, gold does not do this, and so it is possible to
end up with, for example, R_SPARC_TLS_DTPMOD64 referring to a local symbol.
Since sym_map is left as null in elf_machine_rela for the special local
symbol case, the relocation handling thinks it has nothing to do, and so
the module gets left as 0.  Havoc then ensues when the variable in question
is accessed.

Before this fix, the main_local_gold program would receive a SIGBUS on
sparc64, and SIGSEGV on powerpc32.  With this fix applied, that test now
passes like the rest of them.

	* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela):
	Assign sym_map to be map for local symbols, as TLS relocations
	use sym_map to determine whether the symbol is defined and to
	extract the TLS information.
	* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela): Likewise.
2017-10-13 16:14:16 -03:00
Tulio Magno Quites Machado Filho
e8dbd6a36d powerpc: Avoid putting floating point values in memory [BZ #22189]
[BZ #22189]
	* sysdeps/powerpc/fpu/math_private.h (math_opt_barrier):
	(math_force_eval): Add powerpc version.
2017-10-13 15:44:39 -03:00
Tulio Magno Quites Machado Filho
179dcdb7af [BZ #22142] powerpc: Fix the carry bit on mpn_[add|sub]_n on POWER7
Fix the ifdef clause that was being used in the opposite way, setting
a wrong value of the carry bit.

This is also correcting 2 memory accesses that were mistakenly referring
to r0 while they were supposed to mean the immediate value 0.

	[BZ #22142]
	* stdio-common/tst-printf.c (fp_test): Add tests for DBL_MAX and
	-DBL_MAX.
	(do_test): Likewise.
	* stdio-common/tst-printf.sh: Likewise.
	* sysdeps/powerpc/powerpc64/power7/add_n.S: Invert the initial
	ifdef clause in order to set the carry bit right.  Replace r0 by
	0 without changing the behavior.
2017-10-13 15:44:39 -03:00
Joseph Myers
006e766437 Use libm_alias_ldouble for SPARC fabsl.
This patch makes SPARC fabsl implementation use libm_alias_ldouble, to
prepare them for also defining _Float128 function aliases.

Tested with build-many-glibcs.py that installed stripped shared
libraries (sparc64-linux-gnu and sparcv9-linux-gnu) are unchanged by
the patch.

	* sysdeps/sparc/sparc32/fpu/s_fabsl.c: Include
	<libm-alias-ldouble.h>.
	(fabsl): Define using libm_alias_ldouble.
	* sysdeps/sparc/sparc64/fpu/s_fabsl.c: Include
	<libm-alias-ldouble.h>.
	(fabsl): Define using libm_alias_ldouble.
2017-10-13 16:43:18 +00:00
Joseph Myers
1def91b304 Fix ldbl-opt/w_lgamma_compatl.c libm_alias_ldouble_other usage.
Testing with changes to enable _Float128 function aliases shows that
the libm_alias_ldouble_other usage in ldbl-opt/w_lgamma_compatl.c does
not in fact work.  Furthermore, it is unnecessary; the relevant
aliases get created through w_lgammal_compat2.c.  This patch removes
the problem code.

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by the patch.  Also tested in conjunction with
patches to enable _Float128 function aliases.

	* sysdeps/ieee754/ldbl-opt/w_lgamma_compatl.c [BUILD_LGAMMA]:
	Remove conditional code.
2017-10-13 16:38:37 +00:00
Joseph Myers
7d25d410c2 Fix ldbl-opt/s_clog10l.c libm_alias_ldouble_other usage.
Testing with changes to enable _Float128 function aliases shows that
the libm_alias_ldouble_other usage in ldbl-opt/s_clog10l.c does not in
fact work, because __clog10l is defined with long_double_symbol rather
than as a normal C alias.  This patch fixes this by renaming the
__clog10l__internal alias (not strictly necessary, but avoids a hack
with "__clog10l_interna" / "__clog10l__interna" as first argument to
libm_alias_ldouble_other) and using the renamed alias when calling
libm_alias_ldouble_other.

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanges by the patch.  Also tested in conjunction with
patches to enable _Float128 function aliases.

	* sysdeps/ieee754/ldbl-opt/s_clog10l.c (__clog10l__internal):
	Rename to __clog10_internal_l.
	(__clog10_internal_l): Define aliases using
	libm_alias_ldouble_other instead of using libm_alias_ldouble_other
	with __clog10.
2017-10-13 16:36:45 +00:00
Adhemerval Zanella
09c76a7409 Linux: Consolidate {RTLD_}SINGLE_THREAD_P definition
Current GLIBC has two ways to implement the single thread optimization
on syscalls to avoid calling the cancellation path: either by using
global variables (__{libc,pthread}_multiple_thread) or by accessing
the TCB field (defined by TLS_MULTIPLE_THREADS_IN_TCB).  Both the
variables and the macros to acces its value are defined in the
architecture sysdep-cancel.h header.

This patch consolidates its definition on only one header,
sysdeps/unix/sysv/linux/sysdep-cancel.h, and adds a new define
(SINGLE_THREAD_BY_GLOBAL) which the architecture defines if it prefer
to use the global variables instead of the TCB field.  This is an
optimization, so if the architecture does not define it, the TCB
method will be used as default.

Checked on x86_64-linux-gnu and on a build with major touched
ABIs (aarch64-linux-gnu, alpha-linux-gnu, arm-linux-gnueabihf,
hppa-linux-gnu, i686-linux-gnu, m68k-linux-gnu, microblaze-linux-gnu,
mips-linux-gnu, mips64-linux-gnu, powerpc-linux-gnu,
powerpc64le-linux-gnu, s390-linux-gnu, s390x-linux-gnu, sh4-linux-gnu,
sparcv9-linux-gnu, sparc64-linux-gnu, tilegx-linux-gnu).

	* sysdeps/unix/sysv/linux/aarch64/sysdep-cancel.h: Remove file.
	* sysdeps/unix/sysv/linux/alpha/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/arm/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/hppa/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/mips/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/nios2/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/sh/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/tile/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/sysdep-cancel.h: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h
	(SINGLE_THREAD_BY_GLOBAL): Define.
	* sysdeps/unix/sysv/linux/aarch64/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
	Likewise.
	* sysdeps/unix/sysv/linux/alpha/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
	Likewise.
	* sysdeps/unix/sysv/linux/arm/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
	Likewise.
	* sysdeps/unix/sysv/linux/hppa/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
	Likewise.
	* sysdeps/unix/sysv/linux/microblaze/sysdep.h
	(SINGLE_THREAD_BY_GLOBAL): Likewise.
	* sysdeps/unix/sysv/linux/x86_64/sysdep.h (SINGLE_THREAD_BY_GLOBAL):
	Likewise.
2017-10-11 14:27:24 -03:00
Joseph Myers
0ff64d3a18 Use generic alias macros in ldbl-opt.
This patch fixes ldbl-opt code to use generic libm alias macros in
preparation for getting _FloatN / _FloatNx aliases where appropriate.

Four functions are affected, that undefine and redefine alias macros
before including the implementations they wrap in such a way that
_FloatN / _FloatNx aliases would not appear.  s_clog10l.c undefines
and redefined declare_mgen_alias, so just needs a
libm_alias_ldouble_other call added.  w_exp10l_compat.c undefines and
redefines weak_alias, but in fact does not need to do so, since
math/w_exp10l_compat.c uses libm_alias_ldouble and does not use
weak_alias other than through that, so the undefines and redefines of
weak_alias are removed.  w_lgamma_compatl.c and w_remainderl_compat.c
are made to use libm_alias_ldouble_other in conjunction with restoring
the original definition of weak_alias so this is effective.

Tested with build-many-glibcs.py.  Installed stripped shared libraries
are unchanged by this patch.

	* sysdeps/ieee754/ldbl-opt/s_clog10l.c: Use
	libm_alias_ldouble_other.
	* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (weak_alias): Do not
	undefine and redefine.
	[LIBM_SVID_COMPAT && !LONG_DOUBLE_COMPAT (libm, GLIBC_2_1)]
	(exp10l): Do not define here.
	* sysdeps/ieee754/ldbl-opt/w_lgamma_compatl.c [BUILD_LGAMMA]
	(weak_alias): Undefine and redefine.
	[BUILD_LGAMMA]: Use libm_alias_ldouble_other.
	* sysdeps/ieee754/ldbl-opt/w_remainderl_compat.c
	[LIBM_SVID_COMPAT] (weak_alias): Undefine and redefine here.
	[LIBM_SVID_COMPAT]: Use libm_alias_ldouble_other.
2017-10-11 02:51:39 +00:00
Joseph Myers
24b6515d87 Add libm_alias_*_other_r macros.
Some libm functions are unable to use the generic alias macros such as
libm_alias_double because they have special symbol versioning
requirements for the main float, double or long double public names.

To facilitate adding _FloatN / _FloatNx function aliases in future,
it's still desirable to have generic macros those functions can use as
far as possible.  This patch adds macros such as
libm_alias_double_other, which only define names for _FloatN /
_FloatNx aliases, not for float / double / long double.  As present,
all these new macros do nothing, but they are called in the
appropriate places in macros such as libm_alias_double.  This patch
also arranges for lgamma implementations, and the recently added
optimized float function implementations, to use the new macros to
make them ready for addition of _FloatN / _FloatNx aliases.

Tested for x86_64, and tested with build-many-glibcs.py that installed
stripped shared libraries are unchanged by this patch.

	* sysdeps/generic/libm-alias-double.h (libm_alias_double_other_r):
	New macro.
	(libm_alias_double_other): Likewise.
	(libm_alias_double_r): Use libm_alias_double_other_r.
	* sysdeps/generic/libm-alias-float.h (libm_alias_float_other_r):
	New macro.
	(libm_alias_float_other): Likewise.
	(libm_alias_float_r): Use libm_alias_float_other_r.
	* sysdeps/generic/libm-alias-float128.h
	(libm_alias_float128_other_r): New macro.
	(libm_alias_float128_other): Likewise.
	(libm_alias_float128_r): Use libm_alias_float128_other_r.
	* sysdeps/generic/libm-alias-ldouble.h
	(libm_alias_ldouble_other_r): New macro.
	(libm_alias_ldouble_other): Likewise.
	(libm_alias_ldouble_r): Use libm_alias_ldouble_other_r.
	* sysdeps/ieee754/ldbl-opt/libm-alias-double.h
	(libm_alias_double_other_r): New macro.
	(libm_alias_double_other): Likewise.
	(libm_alias_double_r): Use libm_alias_double_other_r.
	* sysdeps/ieee754/ldbl-opt/libm-alias-ldouble.h
	(libm_alias_ldouble_other_r): New macro.
	(libm_alias_ldouble_other): Likewise.
	(libm_alias_ldouble_r): Use libm_alias_ldouble_other_r.
	* math/w_lgamma_main.c: Include <libm-alias-double.h>.
	[!USE_AS_COMPAT]: Use libm_alias_double_other.
	* math/w_lgammaf_main.c: Include <libm-alias-float.h>.
	[!USE_AS_COMPAT]: Use libm_alias_float_other.
	* math/w_lgammal_main.c: Include <libm-alias-ldouble.h>.
	[!USE_AS_COMPAT]: Use libm_alias_ldouble_other.
	* math/w_exp2f.c: Use libm_alias_float_other.
	* math/w_expf.c: Likewise.
	* math/w_log2f.c: Likewise.
	* math/w_logf.c: Likewise.
	* math/w_powf.c: Likewise.
	* sysdeps/ieee754/flt-32/e_exp2f.c: Include <libm-alias-float.h>.
	[!__exp2f]: Use libm_alias_float_other.
	* sysdeps/ieee754/flt-32/e_expf.c: Include <libm-alias-float.h>.
	[!__expf]: Use libm_alias_float_other.
	* sysdeps/ieee754/flt-32/e_log2f.c: Include <libm-alias-float.h>.
	[!__log2f]: Use libm_alias_float_other.
	* sysdeps/ieee754/flt-32/e_logf.c: Include <libm-alias-float.h>.
	[!__logf]: Use libm_alias_float_other.
	* sysdeps/ieee754/flt-32/e_powf.c: Include <libm-alias-float.h>.
	[!__powf]: Use libm_alias_float_other.
2017-10-10 21:29:11 +00:00
Joseph Myers
a8dce6197a Use generic macros for lgamma_r function aliases.
Continuing the use of generic macros for defining libm function
aliases, in preparation for adding more _FloatN / _FloatNx function
names, this patch makes the lgamma_r functions use such macros.

declare_mgen_alias_r becomes a standard macro in math-type-macros.h
instead of being locally defined in w_lgamma_r_templace.c.  This in
turn must be defined by each math-type-macros-<type>.h.  Rather than
providing an unused default in math-type-macros.h, that header is made
to give an error if math-type-macros-<type>.h failed to define
declare_mgen_alias or declare_mgen_alias_r.  The compat lgamma_r
wrappers are updated similarly.  The ldbl-opt versions are removed as
no longer needed.

Tested for x86_64, and with build-many-glibcs.py.  Installed stripped
shared libraries are unchanged except for powerpc64le (where the usual
issue applies that an ldbl-opt long double function previously used
long_double_symbol unconditionally and now the symbol versions on
powerpc64le mean weak_alias is used instead, resulting in the same
symbol versions in the final shared library but still enough
difference in the input objects for that library not to be
byte-identical).

	* sysdeps/generic/math-type-macros.h [!declare_mgen_alias]: Give
	error.  Remove default definition of declare_mgen_alias.
	[!declare_mgen_alias_r]: Likewise.
	* sysdeps/generic/math-type-macros-double.h
	[!declare_mgen_alias_r] (declare_mgen_alias_r): New macro.
	* sysdeps/generic/math-type-macros-float.h [!declare_mgen_alias_r]
	(declare_mgen_alias_r): Likewise.
	* sysdeps/generic/math-type-macros-float128.h
	[!declare_mgen_alias_r] (declare_mgen_alias_r): Likewise.
	* sysdeps/generic/math-type-macros-ldouble.h
	[!declare_mgen_alias_r] (declare_mgen_alias_r): Likewise.
	* math/w_lgamma_r_template.c (declare_mgen_alias_r_x): Remove
	macro.
	(declare_mgen_alias_r_s): Likewise.
	(declare_mgen_alias_r): Likewise.
	* math/w_lgamma_r_compat.c: Include <libm-alias-double.h>.
	(lgamma_r): Define using libm_alias_double_r.
	* math/w_lgammaf_r_compat.c: Include <libm-alias-float.h>.
	(lgammaf_r): Define using libm_alias_float_r.
	* math/w_lgammal_r_compat.c: Include <libm-alias-ldouble.h>.
	(lgammal_r): Define using libm_alias_ldouble_r.
	* sysdeps/ieee754/ldbl-opt/w_lgamma_r_compat.c: Remove file.
	* sysdeps/ieee754/ldbl-opt/w_lgammal_r_compat.c: Likewise.
2017-10-09 22:04:18 +00:00
Joseph Myers
c7509db215 Remove ldbl-opt w_scalbln.c.
The ldbl-opt version of w_scalbln.c is not in fact needed; it handles
compat symbol versions for libc, but this file isn't built for libc,
only for libm.  This patch removes this file.

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.

	* sysdeps/ieee754/ldbl-opt/w_scalbln.c: Remove file.
2017-10-09 16:52:56 +00:00
Joseph Myers
f85a176f3f Use libm_alias_double in ldbl-128, ldbl-96 fma.
This patch makes the ldbl-128 and ldbl-96 implementations of fma use
libm_alias_double.

Tested for x86_64, and tested with build-many-glibcs.py that installed
stripped shared libraries are unchanged by the patch.

	* sysdeps/ieee754/ldbl-128/s_fma.c: Include <libm-alias-double.h>.
	[!__fma] (fma): Define using libm_alias_double.
	* sysdeps/ieee754/ldbl-96/s_fma.c: Include <libm-alias-double.h>.
	[!__fma] (fma): Define using libm_alias_double.
2017-10-06 20:23:58 +00:00
Joseph Myers
fd3b4e7c8a Use libm_alias_ldouble for ldbl-128 functions.
This patch makes ldbl-128 functions use libm_alias_ldouble to define
function aliases.  float128_private.h is updated accordingly.  Most of
the ldbl-64-128 wrappers are removed as no longer needed with this
change (leaving those that involve versioning for functions in libc or
that shouldn't be exported from libm for _Float128 / _Float64x types
with the same format as long double).

Tested for x86_64, and tested with build-many-glibcs.py that installed
stripped shared libraries are unchanged by this patch.

	* sysdeps/ieee754/float128/float128_private.h: Include
	<libm-alias-ldouble.h> and <libm-alias-float128.h>.
	(libm_alias_ldouble_r): Undefine and redefine.
	* sysdeps/ieee754/ldbl-128/s_asinhl.c: Include
	<libm-alias-ldouble.h>.
	(asinhl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_atanl.c: Include
	<libm-alias-ldouble.h>.
	(atanl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_cbrtl.c: Include
	<libm-alias-ldouble.h>.
	(cbrtl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_ceill.c: Include
	<libm-alias-ldouble.h>.
	(ceill): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_copysignl.c: Include
	<libm-alias-ldouble.h>.
	(copysignl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_cosl.c: Include
	<libm-alias-ldouble.h>.
	(cosl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_erfl.c: Include
	<libm-alias-ldouble.h>.
	(erfl): Define using libm_alias_ldouble.
	(erfcl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_expm1l.c: Include
	<libm-alias-ldouble.h>.
	(expm1l): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_fabsl.c: Include
	<libm-alias-ldouble.h>.
	(fabsl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_floorl.c: Include
	<libm-alias-ldouble.h>.
	(floorl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_fmal.c: Include
	<libm-alias-ldouble.h>.
	(fmal): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_frexpl.c: Include
	<libm-alias-ldouble.h>.
	(frexpl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_fromfpl.c (fromfpl): Define using
	libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_fromfpl_main.c: Include
	<libm-alias-ldouble.h>.
	* sysdeps/ieee754/ldbl-128/s_fromfpxl.c (fromfpxl): Define using
	libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_getpayloadl.c: Include
	<libm-alias-ldouble.h>.
	(getpayloadl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_llrintl.c: Include
	<libm-alias-ldouble.h>.
	(llrintl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_llroundl.c: Include
	<libm-alias-ldouble.h>.
	(llroundl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_logbl.c: Include
	<libm-alias-ldouble.h>.
	(logbl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_lrintl.c: Include
	<libm-alias-ldouble.h>.
	(lrintl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_lroundl.c: Include
	<libm-alias-ldouble.h>.
	(lroundl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_modfl.c: Include
	<libm-alias-ldouble.h>.
	(modfl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Include
	<libm-alias-ldouble.h>.
	(nearbyintl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_nextafterl.c: Include
	<libm-alias-ldouble.h>.
	(nextafterl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_nextupl.c: Include
	<libm-alias-ldouble.h>.
	(nextupl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_remquol.c: Include
	<libm-alias-ldouble.h>.
	(remquol): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_rintl.c: Include
	<libm-alias-ldouble.h>.
	(rintl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_roundevenl.c: Include
	<libm-alias-ldouble.h>.
	(roundevenl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_roundl.c: Include
	<libm-alias-ldouble.h>.
	(roundl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_setpayloadl.c (setpayloadl): Define
	using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_setpayloadl_main.c: Include
	<libm-alias-ldouble.h>.
	* sysdeps/ieee754/ldbl-128/s_setpayloadsigl.c (setpayloadsigl):
	Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_sincosl.c: Include
	<libm-alias-ldouble.h>.
	(sincosl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_sinl.c: Include
	<libm-alias-ldouble.h>.
	(sinl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_tanhl.c: Include
	<libm-alias-ldouble.h>.
	(tanhl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_tanl.c: Include
	<libm-alias-ldouble.h>.
	(tanl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_totalorderl.c: Include
	<libm-alias-ldouble.h>.
	(totalorderl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_totalordermagl.c: Include
	<libm-alias-ldouble.h>.
	(totalordermagl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_truncl.c: Include
	<libm-alias-ldouble.h>.
	(truncl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_ufromfpl.c (ufromfpl): Define using
	libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-128/s_ufromfpxl.c (ufromfpxl): Define using
	libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-64-128/s_copysignl.c: Include
	<libm-alias-ldouble.h>.
	(weak_alias): Do not undefine and redefine.
	[IS_IN (libc)] (libm_alias_ldouble): Undefine and redefine.
	(copysignl): Define with long_double_symbol only if [IS_IN
	(libc)].
	* sysdeps/ieee754/ldbl-64-128/s_frexpl.c: Include
	<libm-alias-ldouble.h>.
	(weak_alias): Do not undefine and redefine.
	[IS_IN (libc)] (libm_alias_ldouble): Undefine and redefine.
	(frexpl): Define with long_double_symbol only if [IS_IN (libc)].
	* sysdeps/ieee754/ldbl-64-128/s_modfl.c: Include
	<libm-alias-ldouble.h>.
	(weak_alias): Do not undefine and redefine.
	[IS_IN (libc)] (libm_alias_ldouble): Undefine and redefine.
	(modfl): Define with long_double_symbol only if [IS_IN (libc)].
	* sysdeps/ieee754/ldbl-64-128/s_asinhl.c: Remove file.
	* sysdeps/ieee754/ldbl-64-128/s_atanl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_cbrtl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_ceill.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_cosl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_erfl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_expm1l.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_fabsl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_floorl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_fmal.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_llrintl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_llroundl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_logbl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_lrintl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_lroundl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_nearbyintl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_remquol.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_rintl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_roundl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_sincosl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_sinl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_tanhl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_tanl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_truncl.c: Likewise.
2017-10-06 17:45:05 +00:00
Joseph Myers
6dff198369 Remove redundant ldbl-64-128 files.
Various source files in ldbl-64-128 are redundant, because they wrap
files that no longer provide public symbols that need special
versioning (those symbols having moved to separate errno-setting
wrappers), or, in the case of w_scalblnl.c, because the type-generic
template now does everything required (it deals with symbol versioning
for use in libm, and this file is never built for libc anyway - the
compat scalbln* symbols in libc, as opposed to scalbn*, are only for
i386 and m68k and are aliases to the corresponding scalbn* symbols).
This patch removes those redundant files.

Tested with build-many-glibcs.py (for all ldbl-64-128 configurations)
that installed stripped shared libraries are unchanged by this patch.

	* sysdeps/ieee754/ldbl-64-128/e_ilogbl.c: Remove file.
	* sysdeps/ieee754/ldbl-64-128/s_log1pl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_scalblnl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/s_scalbnl.c: Likewise.
	* sysdeps/ieee754/ldbl-64-128/w_scalblnl.c: Likewise.
2017-10-06 15:02:12 +00:00
Rajalakshmi Srinivasaraghavan
5a90716891 powerpc: Fix IFUNC for memrchr
Recent commit 59ba2d2b54 missed to add __memrchr_power8 in
ifunc list.  Also handled discarding unwanted bytes for
unaligned inputs in power8 optimization.

2017-10-05  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>

	* sysdeps/powerpc/powerpc64/multiarch/memrchr-ppc64.c: Revert
	back to powerpc32 file.
	* sysdeps/powerpc/powerpc64/multiarch/memrchr.c
	(memrchr): Add __memrchr_power8 to ifunc list.
	* sysdeps/powerpc/powerpc64/power8/memrchr.S: Mask
	extra bytes for unaligned inputs.
2017-10-06 10:04:52 +05:30
Joseph Myers
0db0b931cf Update ARM libm-test-ulps.
* sysdeps/arm/libm-test-ulps: Update.
2017-10-05 22:17:30 +00:00
Joseph Myers
86f9568af6 Use libm_alias_ldouble for ldbl-96 functions.
This patch makes ldbl-96 functions use libm_alias_ldouble to define
function aliases.

Tested for x86_64, and tested with build-many-glibcs.py that installed
stripped shared libraries are unchanged by the patch.

	* sysdeps/ieee754/ldbl-96/s_asinhl.c: Include
	<libm-alias-ldouble.h>.
	(asinhl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_cbrtl.c: Include
	<libm-alias-ldouble.h>.
	(cbrtl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_copysignl.c: Include
	<libm-alias-ldouble.h>.
	(copysignl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_cosl.c: Include
	<libm-alias-ldouble.h>.
	(cosl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_erfl.c: Include
	<libm-alias-ldouble.h>.
	(erfl): Define using libm_alias_ldouble.
	(erfcl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_fmal.c: Include
	<libm-alias-ldouble.h>.
	(fmal): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_frexpl.c: Include
	<libm-alias-ldouble.h>.
	(frexpl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_fromfpl.c (fromfpl): Define using
	libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_fromfpl_main.c: Include
	<libm-alias-ldouble.h>.
	* sysdeps/ieee754/ldbl-96/s_fromfpxl.c (fromfpxl): Define using
	libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_getpayloadl.c: Include
	<libm-alias-ldouble.h>.
	(getpayloadl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_llrintl.c: Include
	<libm-alias-ldouble.h>.
	(llrintl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_llroundl.c: Include
	<libm-alias-ldouble.h>.
	(llroundl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_lrintl.c: Include
	<libm-alias-ldouble.h>.
	(lrintl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_lroundl.c: Include
	<libm-alias-ldouble.h>.
	(lroundl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_modfl.c: Include
	<libm-alias-ldouble.h>.
	(modfl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_nextupl.c: Include
	<libm-alias-ldouble.h>.
	(nextupl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_remquol.c: Include
	<libm-alias-ldouble.h>.
	(remquol): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_roundevenl.c: Include
	<libm-alias-ldouble.h>.
	(roundevenl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_roundl.c: Include
	<libm-alias-ldouble.h>.
	(roundl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_setpayloadl.c (setpayloadl): Define
	using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_setpayloadl_main.c: Include
	<libm-alias-ldouble.h>.
	* sysdeps/ieee754/ldbl-96/s_setpayloadsigl.c: Include
	<libm-alias-ldouble.h>.
	(setpayloadsigl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_sincosl.c: Include
	<libm-alias-ldouble.h>.
	(sincosl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_sinl.c: Include
	<libm-alias-ldouble.h>.
	(sinl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_tanhl.c: Include
	<libm-alias-ldouble.h>.
	(tanhl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_tanl.c: Include
	<libm-alias-ldouble.h>.
	(tanl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_totalorderl.c: Include
	<libm-alias-ldouble.h>.
	(totalorderl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_totalordermagl.c: Include
	<libm-alias-ldouble.h>.
	(totalordermagl): Define using libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_ufromfpl.c (ufromfpl): Define using
	libm_alias_ldouble.
	* sysdeps/ieee754/ldbl-96/s_ufromfpxl.c (ufromfpxl): Define using
	libm_alias_ldouble.
2017-10-05 21:13:40 +00:00
Siddhesh Poyarekar
dd5bc7f1b3 aarch64: Optimized implementation of memmove for Qualcomm Falkor
This is an optimized memmove implementation for the Qualcomm Falkor
processor core.  Due to the way the falkor memcpy needs to be written,
code cannot be easily shared between memmove and memcpy like in case
of other aarch64 memcpy implementations due to which this routine is
separate.  The underlying principle is the same as that of memcpy
where it tries to use registers with the same lower 4 bits for
fetching the same stream, thus optimizing hardware prefetcher
performance.

The memcpy copy loop copies 64 bytes at a time using the same register
pair since that's the way to train the hardware prefetcher on the
falkor core.  memmove cannot quite do that since it needs to avoid
overlaps, so it does the next best thing, i.e. has a 32 byte loop with
a 32 byte end (prefetch a loop ahead to account for overlapping
locations) with register pairs that alias so that they hit the same
prefetcher.  Due to this difference in loop size, they have to
currently be separate implementations but efforts are on to try and
get memmove to fall back into memcpy whenever it can without simply
duplicating all of the code.

Performance:

The routine fares around 20-25% better than the generic memmove for
most medium to large sizes (i.e. > 128 bytes) for the new walking
memmove benchmark (memmove-walk) with an unexplained regression
between 1K and 2K.  The minor regression is something worth looking
into for us, but the remaining gains are significant enough that we
would like this included upstream as we looking into the cause for the
regression.  Here is a snippet of the numbers as generated from the
microbenchmark by the compare_strings script.  Comparisons are against
__memmove_generic:

Function: memmove
Variant: walk
                                    __memmove_thunderx	__memmove_falkor	__memmove_generic
========================================================================================================================
<snip>
                        length=16384:  12508800.00 (  6.09%)	 11486800.00 ( 13.76%)	 13319600.00
                        length=16400:  13614200.00 ( -0.67%)	 11585000.00 ( 14.33%)	 13523600.00
                        length=16385:  13448400.00 (  0.10%)	 11732700.00 ( 12.84%)	 13461200.00
                        length=16399:  13594100.00 ( -0.22%)	 11859600.00 ( 12.57%)	 13564400.00
                        length=16386:  13211600.00 (  1.13%)	 11503800.00 ( 13.91%)	 13362400.00
                        length=16398:  13218600.00 (  2.12%)	 11573200.00 ( 14.30%)	 13504700.00
                        length=16387:  13510900.00 ( -0.37%)	 11744200.00 ( 12.76%)	 13461300.00
                        length=16397:  13603700.00 ( -0.15%)	 11878200.00 ( 12.55%)	 13583200.00
                        length=16388:  13461700.00 ( -0.13%)	 11558000.00 ( 14.03%)	 13444100.00
                        length=16396:  13517500.00 ( -0.03%)	 11561300.00 ( 14.45%)	 13513900.00
                        length=16389:  13534100.00 (  0.17%)	 11756800.00 ( 13.28%)	 13556900.00
                        length=16395:  13585600.00 (  0.11%)	 11791800.00 ( 13.30%)	 13601200.00
                        length=16390:  13480100.00 ( -0.13%)	 11685500.00 ( 13.20%)	 13462100.00
                        length=16394:  13529900.00 ( -0.23%)	 11549800.00 ( 14.43%)	 13498200.00
                        length=16391:  13595400.00 ( -0.26%)	 11768200.00 ( 13.22%)	 13560600.00
                        length=16393:  13567000.00 (  0.20%)	 11779700.00 ( 13.35%)	 13594700.00
                        length=32768:  71308800.00 ( -6.53%)	 50220800.00 ( 24.98%)	 66939200.00
                        length=32784:  72100800.00 (-11.55%)	 50114100.00 ( 22.47%)	 64636300.00
                        length=32769:  71767000.00 ( -7.10%)	 51238400.00 ( 23.54%)	 67010000.00
                        length=32783:  70113700.00 (-40.95%)	 51129000.00 ( -2.78%)	 49744400.00
                        length=32770:  71367600.00 ( -6.52%)	 50244700.00 ( 25.01%)	 67000900.00
                        length=32782:  64366700.00 (  4.71%)	 50101400.00 ( 25.83%)	 67545600.00
                        length=32771:  71440100.00 ( -6.51%)	 51263900.00 ( 23.57%)	 67074900.00
                        length=32781:  66993000.00 (  0.34%)	 51108300.00 ( 23.97%)	 67220300.00
                        length=32772:  71443900.00 (-60.50%)	 50062100.00 (-12.47%)	 44512600.00
                        length=32780:  71759100.00 ( -6.58%)	 50263200.00 ( 25.35%)	 67328600.00
                        length=32773:  71714900.00 (-33.21%)	 51076600.00 (  5.12%)	 53835400.00
                        length=32779:  71756900.00 ( -6.56%)	 51290800.00 ( 23.83%)	 67337800.00
                        length=32774:  59689300.00 (-34.55%)	 50068400.00 (-12.86%)	 44363300.00
                        length=32778:  71847500.00 (-18.20%)	 50084100.00 ( 17.61%)	 60786500.00
                        length=32775:  71599300.00 ( -6.54%)	 51278200.00 ( 23.70%)	 67204800.00
                        length=32777:  71862900.00 (-60.85%)	 51094000.00 (-14.36%)	 44677900.00
                        length=65536: 282848000.00 ( -6.60%)	199187000.00 ( 24.93%)	265325000.00
                        length=65552: 243285000.00 (-41.61%)	198512000.00 (-15.54%)	171805000.00
                        length=65537: 255415000.00 (-23.47%)	202499000.00 (  2.11%)	206858000.00
                        length=65551: 280122000.00 (-62.95%)	203349000.00 (-18.29%)	171911000.00
                        length=65538: 283676000.00 (-14.46%)	198368000.00 ( 19.96%)	247848000.00
                        length=65550: 275566000.00 (-51.76%)	198494000.00 ( -9.31%)	181581000.00
                        length=65539: 283699000.00 ( -6.58%)	203453000.00 ( 23.57%)	266195000.00
                        length=65549: 286572000.00 ( -6.65%)	202607000.00 ( 24.60%)	268712000.00
                        length=65540: 283710000.00 ( -6.59%)	199161000.00 ( 25.17%)	266160000.00
                        length=65548: 237573000.00 ( 11.48%)	198462000.00 ( 26.06%)	268395000.00
                        length=65541: 284150000.00 ( -6.58%)	203273000.00 ( 23.75%)	266600000.00
                        length=65547: 286250000.00 ( -6.70%)	202594000.00 ( 24.48%)	268263000.00
                        length=65542: 284167000.00 ( -6.60%)	199122000.00 ( 25.31%)	266584000.00
                        length=65546: 285656000.00 ( -6.59%)	198443000.00 ( 25.95%)	268002000.00
                        length=65543: 284600000.00 ( -6.58%)	203247000.00 ( 23.89%)	267030000.00
                        length=65545: 285665000.00 ( -6.40%)	202575000.00 ( 24.55%)	268472000.00
<snip>

	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
	memmove_falkor.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Likewise.
	* sysdeps/aarch64/multiarch/memmove.c: Likewise.
	* sysdeps/aarch64/multiarch/memmove_falkor.S: New file.
2017-10-05 22:20:23 +05:30
Joseph Myers
644d38570a Remove add-ons mechanism.
glibc has an add-ons mechanism to allow additional software to be
integrated into the glibc build.  Such add-ons may be within the glibc
source tree, or outside it at a path passed to the --enable-add-ons
configure option.

localedata and crypt were once add-ons, distributed in separate
release tarballs, but long since stopped using that mechanism.
Linuxthreads was always an add-on.  Ports spent some time as an add-on
with separate release tarballs, then was first moved into the glibc
source tree, then had its sysdeps files moved into the main sysdeps
hierarchy so the add-ons mechanism was no longer used.  NPTL spent
some time as an add-on in the main glibc tree before stopping using
the add-on mechanism.  libidn used to have separate release tarballs
but no longer does so, but still uses the add-ons mechanism within the
glibc source tree.  Various other software has supported building with
the add-ons mechanism at times in the past, but I don't think any is
still widely used.

Add-ons involve significant, little-used complexity in the glibc build
system, and make it hard to understand what the space of possible
glibc configurations is.  This patch removes the add-ons mechanism.
libidn is now built via the Subdirs mechanism to cause any
configuration using sysdeps/unix/inet to build libidn; HAVE_LIBIDN
(which effectively means shared libraries are available) is now
defined via sysdeps/unix/inet/configure.  Various references to
add-ons around the source tree are removed (in the case of maint.texi,
the example list of sysdeps directories is still very out of date).

Externally maintained ports should now put their files in the normal
sysdeps directory structure rather than being arranged as add-ons;
they probably need to change e.g. elf.h anyway, rather than actually
being able to work just as a drop-in subtree.  Hurd libpthread should
be arranged similarly to NPTL, so some files might go in a
hurd-pthreads (or similar) top-level directory in glibc, while sysdeps
files should go in the normal sysdeps directory structure (possibly in
hurd or hurd-pthreads subdirectories, just as there are nptl
subdirectories in the sysdeps tree).

Tested for x86_64, and with build-many-glibcs.py.

	* configure.ac (--enable-add-ons): Remove option.
	(machine): Do not mention add-ons in comment.
	(LIBC_PRECONFIGURE): Likewise.
	(add_ons): Remove variable and sanity checks and logic to locate
	add-ons.
	(add_ons_automatic): Remove variable.
	(configured_add_ons): Likewise.
	(add_ons_sfx): Likewise.
	(add_ons_pfx): Likewise.
	(add_on_subdirs): Likewise.
	(sysnames_add_ons): Likewise.  Remove loop over add-ons and
	consideration of add-ons in Implies handling.
	(sysdeps_add_ons): Likewise.
	* configure: Regenerated.
	* libidn/configure.ac: Remove.
	* libidn/configure: Likewise.
	* sysdeps/unix/inet/configure.ac: New file.
	* sysdeps/unix/inet/configure: New generated file.
	* sysdeps/unix/inet/Subdirs: Add libidn.
	* Makeconfig (sysdeps-srcdirs): Remove variable.
	(+sysdep_dirs): Do not include $(sysdeps-srcdirs).
	($(common-objpfx)config.status): Do not depend on add-on files.
	($(common-objpfx)shlib-versions.v.i): Do not mention add-ons in
	comment.
	(all-subdirs): Do not include $(add-on-subdirs).
	* Makefile (dist-prepare): Do not use $(sysdeps-add-ons).
	* config.make.in (add-ons): Remove variable.
	(add-on-subdirs): Likewise.
	(sysdeps-add-ons): Likewise.
	* manual/Makefile (add-chapters): Remove.
	($(objpfx)texis): Do not depend on $(add-chapters).
	(nonexamples): Do not handle $(add-chapters).
	(examples): Do not handle $(add-ons).
	(chapters.% top-menu.%): Do not pass '$(add-chapters)' to
	libc-texinfo.sh.
	* manual/install.texi (Installation): Do not mention add-ons.
	(--enable-add-ons): Do not document configure option.
	* INSTALL: Regenerated.
	* manual/libc-texinfo.sh: Do not handle $2 add-ons argument.
	* manual/maint.texi (Hierarchy Conventions): Do not mention
	add-ons.
	* scripts/build-many-glibcs.py (Glibc.build_glibc): Do not use
	--enable-add-ons.
	* scripts/gen-sorted.awk: Do not handle Subdirs files from
	add-ons.
	* scripts/test-installation.pl: Do not handle glibc-compat add-on.
	* sysdeps/nptl/Makeconfig: Do not mention add-ons in comment.
2017-10-05 15:58:13 +00:00