Commit Graph

32975 Commits

Author SHA1 Message Date
Patrick McGehearty
6fd0a3c6a8 Improve __ieee754_exp() performance by greater than 5x on sparc/x86.
These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Typical performance gains is typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc,
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from an value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.The gamma function is known to be difficult to compute
accurately.

	* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
	<errno.h>.  Include "eexp.tbl".
	(half): New constant.
	(one): Likewise.
	(__ieee754_exp): Rewrite.
	(__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
	* sysdeps/i386/fpu/slowexp.c: Likewise.
	* sysdeps/ia64/fpu/slowexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
	* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
	comment.
	* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
	(CPPFLAGS-slowexp.c): Remove variable.
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
	(CFLAGS-slowexp-fma.c): Remove variable.
	(CFLAGS-slowexp-fma4.c): Likewise.
	(CFLAGS-slowexp-avx.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
	define as macro.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
	* math/Makefile (type-double-routines): Remove slowexp.
	* manual/probes.texi (slowexp_p6): Remove.
	(slowexp_p32): Likewise.
2017-12-19 17:27:31 +00:00
Adhemerval Zanella
3bb1ef58b9 ia64: Fix memchr for large input sizes (BZ #22603)
Current optimized ia64 memchr uses a strategy to check for last address
by adding the input one with expected size.  However it does not take
care for possible overflow.

It was triggered by 3038145ca2 where default rawmemchr now uses memchr
(p, c, (size_t)-1).

This patch fixes it by implement a satured addition where overflows
sets the maximum pointer size to UINTPTR_MAX.

Checked on ia64-linux-gnu where it fixes both stratcliff and
test-rawmemchr failures.

	Adhemerval Zanella  <adhemerval.zanella@linaro.org>
	James Clarke <jrtc27@jrtc27.com>

	[BZ #22603]
	* sysdeps/ia64/memchr.S (__memchr): Avoid overflow in pointer
	addition.
2017-12-19 12:02:36 -02:00
Adhemerval Zanella
554e3d51ef sh: Fix clone exit return code (BZ #22605)
Since 3f823e87cc (Call exit directly in clone (BZ #21512)) SH clone
implementation fails to set the exit code resulting in the failures:

FAIL: nptl/tst-align-clone
FAIL: nptl/tst-getpid1

This patch fixes the both testcases.

	[BZ #22605]
	* sysdeps/unix/sysv/linux/sh/clone.S (__clone): Fix exit return
	code.

Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-12-19 12:02:01 -02:00
H.J. Lu
cba595c350 x86: Add feature_1 to tcbhead_t [BZ #22563]
On x86, padding in struct __jmp_buf_tag is used for shadow stack pointer
to support Shadow Stack in Intel Control-flow Enforcemen Technology.
cancel_jmp_buf has been updated to include saved_mask so that it is as
large as struct __jmp_buf_tag.  We must suport the old cancel_jmp_buf
in existing binaries.  Since symbol versioning doesn't work on
cancel_jmp_buf, feature_1 is added to tcbhead_t so that setjmp and
longjmp can check if shadow stack is enabled.  NB: Shadow stack is
enabled only if all modules are shadow stack enabled.

	[BZ #22563]
	* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Add feature_1.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Rename __glibc_unused1
	to feature_1.
2017-12-19 02:45:34 -08:00
H.J. Lu
f81ddabffd Linux/x86: Update cancel_jmp_buf to match __jmp_buf_tag [BZ #22563]
On x86, padding in struct __jmp_buf_tag is used for shadow stack pointer
to support shadow stack in Intel Control-flow Enforcemen Technology.
Since the cancel_jmp_buf array is passed to setjmp and longjmp by
casting it to pointer to struct __jmp_buf_tag, it should be as large
as struct __jmp_buf_tag.  Otherwise when shadow stack is enabled,
setjmp and longjmp will write and read beyond cancel_jmp_buf when saving
and restoring shadow stack pointer.

This patch adds bits/types/__cancel_jmp_buf_tag.h to define struct
__cancel_jmp_buf_tag so that Linux/x86 can add saved_mask to
cancel_jmp_buf.

Tested natively on i386, x86_64 and x32.  Tested hppa-linux-gnu with
build-many-glibcs.py.

	[BZ #22563]
	* bits/types/__cancel_jmp_buf_tag.h: New file.
	* sysdeps/unix/sysv/linux/x86/bits/types/__cancel_jmp_buf_tag.h
	* sysdeps/unix/sysv/linux/x86/pthreaddef.h: Likewise.
	* sysdeps/unix/sysv/linux/x86/nptl/pthreadP.h: Likewise.
	* nptl/Makefile (headers): Add
	bits/types/__cancel_jmp_buf_tag.h.
	* nptl/descr.h [NEED_SAVED_MASK_IN_CANCEL_JMP_BUF]
	(pthread_unwind_buf): Add saved_mask to cancel_jmp_buf.
	* sysdeps/nptl/pthread.h: Include
	<bits/types/__cancel_jmp_buf_tag.h>.
	(__pthread_unwind_buf_t): Use struct __cancel_jmp_buf_tag with
	__cancel_jmp_buf.
	* sysdeps/unix/sysv/linux/hppa/pthread.h: Likewise.
2017-12-19 02:44:04 -08:00
H.J. Lu
1a49fc59e4 Add --enable-static-pie variants to x86_64, x32 and i686
Since the default GCC and binutils versions used by build-many-glibcs.py,
which are GCC 7 branch and binutils 2.29 branch, support static PIE on
x86_64, x32 and i686, this patch adds --enable-static-pie glibc variants
to x86_64, x32 and i686 to get some coverage for static PIE.

Tested with build-many-glibcs.py.

	* scripts/build-many-glibcs.py (Context.add_all_configs): Add
	--enable-static-pie variants to x86_64, x32 and i686.
2017-12-18 18:11:47 -08:00
Joseph Myers
6642518592 Fix m68k bits/mathinline.h attributes (bug 22631).
m68k bits/mathinline.h declares various functions with const
attributes.  These are inappropriate for functions that have results
depending on the rounding mode; the machine-independent
bits/mathcalls.h only uses const attributes for a very few functions
with no rounding mode dependence, and the m68k header should do
likewise.  GCC uses pure for such functions with -frounding-math,
resulting in GCC mainline warning for conflicts with between the
header and the built-in attributes and glibc failing to build for m68k
with GCC mainline.

This patch fixes the attributes to avoid using const except when
bits/mathcalls.h does so.  (There are a few functions where maybe
bits/mathcalls.h could do so but doesn't, but keeping the headers in
sync in this regard seems to be the safe approach.)

Tested compilation with build-many-glibcs.py with GCC mainline.

	[BZ #22631]
	* sysdeps/m68k/m680x0/fpu/bits/mathinline.h (__m81_defun): Add
	argument for attrubutes.  All callers changed.
	(__inline_mathop1): Likewise.  All callers changed.
	(__inline_mathop): Likewise.  All callers changed.
	[__USE_MISC] (scalbn): Use __inline_forward instead of
	__inline_forward_c.
	[__USE_ISOC99] (scalbln): Likewise.
	[__USE_ISOC99] (nearbyint): Likewise.
	[__USE_ISOC99] (lrint): Likewise.
	[__USE_MISC] (scalbnf): Likewise.
	[__USE_ISOC99] (scalblnf): Likewise.
	[__USE_ISOC99] (nearbyintf): Likewise.
	[__USE_ISOC99] (lrintf): Likewise.
	[__USE_MISC] (scalbnl): Likewise.
	[__USE_ISOC99] (scalblnl): Likewise.
	[__USE_ISOC99] (nearbyintl): Likewise.
	[__USE_ISOC99] (lrintl): Likewise.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h: All callers of
	__inline_mathop and __m81_defun changed.
2017-12-19 02:02:26 +00:00
Joseph Myers
8e52f573a1 Fix build-many-glibcs.py arm-linux-gnueabihf builds with mainline GCC.
My fix to make the arm-linux-gnueabihf build-many-glibcs.py builds
actually use the hard-float ABI as intended showed up another issue
when building with mainline GCC: GCC now determines an FPU based on
the selected CPU or architecture and gives an error for
-mfloat-abi=hard when the CPU does not imply a choice of FPU.  This
patch fixes all the affected configurations to specify a suitable
--with-cpu, --with-fpu or -mfpu option explicitly to avoid that error
from GCC.

Tested the relevant configurations with build-many-glibcs.py with
mainline GCC.

	* scripts/build-many-glibcs.py (Context.add_all_configs): Specify
	CPU or FPU for ARM hard-float configurations.
2017-12-19 00:08:49 +00:00
Joseph Myers
40c4162df6 Disable -Wrestrict for two nptl/tst-attr3.c tests.
nptl/tst-attr3 fails to build with GCC mainline because of
(deliberate) aliasing between the second (attributes) and fourth
(argument to thread start routine) arguments to pthread_create.

Although both those arguments are restrict-qualified in POSIX,
pthread_create does not actually dereference its fourth argument; it's
an opaque pointer passed to the thread start routine.  Thus, the
aliasing is actually valid in this case, and it's deliberate in the
test.  So this patch makes the test disable -Wrestrict for the two
pthread_create calls in question.  (-Wrestrict was added in GCC 7,
hence the __GNUC_PREREQ conditions, but the particular warning in
question is new in GCC 8.)

Tested compilation with build-many-glibcs.py for aarch64-linux-gnu.

	* nptl/tst-attr3.c: Include <libc-diag.h>.
	(do_test) [__GNUC_PREREQ (7, 0)]: Ignore -Wrestrict for two tests.
2017-12-18 22:55:28 +00:00
Joseph Myers
5983df320a Fix truncation warnings in posix/tst-glob_symlinks.c.
The test posix/tst-glob_symlinks.c fails to build with GCC mainline:

tst-glob_symlinks.c: In function 'do_test':
tst-glob_symlinks.c:124:30: error: 'snprintf' output may be truncated before the last format character [-Werror=format-truncation=]
   snprintf (buf, sizeof buf, "%s?", dangling_link);
                              ^~~~~
tst-glob_symlinks.c:124:3: note: 'snprintf' output between 2 and 4097 bytes into a destination of size 4096
   snprintf (buf, sizeof buf, "%s?", dangling_link);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tst-glob_symlinks.c:128:30: error: 'snprintf' output may be truncated before the last format character [-Werror=format-truncation=]
   snprintf (buf, sizeof buf, "%s*", dangling_link);
                              ^~~~~
tst-glob_symlinks.c:128:3: note: 'snprintf' output between 2 and 4097 bytes into a destination of size 4096
   snprintf (buf, sizeof buf, "%s*", dangling_link);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This patch fixes the test to avoid such truncation warnings by
increasing the buffer in question by one byte, to ensure it can hold
any possible result of %s? or %s* formats where %s comes from a buffer
of size PATH_MAX.

Tested compilation with build-many-glibcs.py for aarch64-linux-gnu.

	* posix/tst-glob_symlinks.c (do_test): Increase size of buf.
2017-12-18 22:54:01 +00:00
Joseph Myers
1421f39b7e Disable strncat test array-bounds warnings for GCC 8.
Some strncat tests fail to build with GCC 8 because of -Warray-bounds
warnings.  These tests are deliberately test over-large size arguments
passed to strncat, and already disable -Wstringop-overflow warnings,
but now the warnings for these tests come under -Warray-bounds so that
option needs disabling for them as well, which this patch does (with
an update on the comments; the DIAG_IGNORE_NEEDS_COMMENT call for
-Warray-bounds doesn't need to be conditional itself, because that
option is supported by all versions of GCC that can build glibc).

Tested compilation with build-many-glibcs.py for aarch64-linux-gnu.

	* string/tester.c (test_strncat): Also disable -Warray-bounds
	warnings for two tests.
2017-12-18 22:52:41 +00:00
H.J. Lu
00c714df39 Pass -no-pie to GCC only if GCC defaults to PIE [BZ #22614]
After --enable-static-pie is added to configure, libc_cv_pie_default is
set to yes when either --enable-static-pie is used to configure glibc
or GCC defaults to PIE.  We should set no-pie-ldflag to -no-pie, which
is supported on GCC 6 and later, only if GCC defaults to PIE, not when
--enable-static-pie is used to configure glibc.

Tested on x32 with --enable-static-pie using GCC 5 and without
--enable-static-pie using GCC 7.

	[BZ #22614]
	* Makeconfig (no-pie-ldflag): Set to -no-pie only if
	$(cc-pie-default) == yes.
	* config.make.in (cc-pie-default): New.
	* configure.ac (libc_cv_pie_default): Renamed to ...
	(libc_cv_cc_pie_default): This.
	(libc_cv_pie_default): Set to $libc_cv_cc_pie_default.
	* configure: Regenerated.
2017-12-18 12:24:38 -08:00
Florian Weimer
8e1472d2c1 ld.so: Examine GLRO to detect inactive loader [BZ #20204]
GLRO (_rtld_global_ro) is read-only after initialization and can
therefore not be patched at run time, unlike the hook table addresses
and their contents, so this is a desirable hardening feature.

The hooks are only needed if ld.so has not been initialized, and this
happens only after static dlopen (dlmopen uses a single ld.so object
across all namespaces).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2017-12-18 20:04:13 +01:00
Joseph Myers
49b036bce9 Fix nscd readlink argument aliasing (bug 22446).
Current GCC mainline detects that nscd calls readlink with the same
buffer for both input and output, which is not valid (those arguments
are both restrict-qualified in POSIX).  This patch makes it use a
separate buffer for readlink's input (with a size that is sufficient
to avoid truncation, so there should be no problems with warnings
about possible truncation, though not strictly minimal, but much
smaller than the buffer for output) to avoid this problem.

Tested compilation for aarch64-linux-gnu with build-many-glibcs.py.

	[BZ #22446]
	* nscd/connections.c (handle_request) [SO_PEERCRED]: Use separate
	buffers for readlink input and output.
2017-12-18 18:50:40 +00:00
Sergei Trofimovich
c85c564d14 mips32: fix clobbering s0 in setjmp() [BZ #22624]
Similar to commit 1ab47db00dfbc0128119e3503d3ed640ffc4830b
("mips64: fix clobbering s0 in setjmp() [BZ #22624]")
as sysdeps/mips/setjmp_aux.c is almost an identical copy
of sysdeps/mips/mips64/setjmp_aux.c.

	[BZ #22624]
	* sysdeps/mips/setjmp_aux.c (__sigsetjmp_aux): Use
	inhibit_stack_protector.
2017-12-18 18:26:49 +00:00
Sergei Trofimovich
368b6c8da9 mips64: fix clobbering s0 in setjmp() [BZ #22624]
When configured as --enable-stack-protector=all glibc
inserts stack checking canary into every function
including __sigsetjmp_aux(). Stack checking code
ends up using s0 register to temporary hold address
of global canary value.

Unfortunately __sigsetjmp_aux assumes no caller' caller-save
registers should be clobbered as it stores them as-is.

The fix is to disable stack protection of __sigsetjmp_aux.

Tested on the following test:

    #include <setjmp.h>
    #include <stdio.h>

    int main() {
        jmp_buf jb;
        volatile register long s0 asm ("$s0");
        s0 = 1234;
        if (setjmp(jb) == 0)
            longjmp(jb, 1);
        printf ("$s0 = %lu\n", s0);
    }

Without the fix:
    $ qemu-mipsn32 -L . ./mips-longjmp-bug
    $s0 = 1082346228

With the fix:
    $ qemu-mipsn32 -L . ./mips-longjmp-bug
    $s0 = 1234

	[BZ #22624]
	* sysdeps/mips/mips64/setjmp_aux.c (__sigsetjmp_aux): Use
	inhibit_stack_protector.
2017-12-18 17:23:02 +00:00
Szabolcs Nagy
c8e939f12a Update NEWS with aarch64 static pie support
The required binutils patches are not in a released version yet, but
glibc has support for static linked pie on aarch64 since commit
14d886edbd
2017-12-18 16:57:07 +00:00
Dmitry V. Levin
bb195224ac elf: do not substitute dst in $LD_LIBRARY_PATH twice [BZ #22627]
Starting with commit
glibc-2.18.90-470-g2a939a7e6d81f109d49306bc2e10b4ac9ceed8f9 that
introduced substitution of dynamic string tokens in fillin_rpath,
_dl_init_paths invokes _dl_dst_substitute for $LD_LIBRARY_PATH twice:
the first time it's called directly, the second time the result
is passed on to fillin_rpath which calls expand_dynamic_string_token
which in turn calls _dl_dst_substitute, leading to the following
behaviour:

$ mkdir -p /tmp/'$ORIGIN' && cd /tmp/'$ORIGIN' &&
  echo 'int main(){}' |gcc -xc - &&
  strace -qq -E LD_LIBRARY_PATH='$ORIGIN' -e /open ./a.out
open("/tmp//tmp/$ORIGIN/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/tmp//tmp/$ORIGIN/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/tmp//tmp/$ORIGIN/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/tmp//tmp/$ORIGIN/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3

Fix this by removing the direct _dl_dst_substitute invocation.

* elf/dl-load.c (_dl_init_paths): Remove _dl_dst_substitute preparatory
code and invocation.
2017-12-18 12:24:48 +00:00
Szabolcs Nagy
14d886edbd aarch64: fix start code for static pie
There are three flavors of the crt startup code:

1) crt1.o used for non-pie,
2) Scrt1.o used for dynamic linked pie (dynamic linker relocates),
3) rcrt1.o used for static linked pie (self relocation is needed)

In the --enable-static-pie case crt1.o is built with -DPIC and in case
of static linking it interposes _dl_relocate_static_pie in libc to
avoid self relocation.

Scrt1.o is built with -DPIC -DSHARED and it relies on GOT entries that
the static linker cannot relax and thus need relocation before the
start code is executed, so rcrt1.o needs separate implementation.

This implementation does not work for .text > 4G position independent
executables, which is fine since the toolchain does not support
-mcmodel=large with -fPIE.

Tests pass with ld/22269 and ld/22263 binutils bugs fixed.

	* sysdeps/aarch64/start.S (_start): Handle PIC && !SHARED case.
2017-12-18 10:07:07 +00:00
Aurelien Jarno
7d38eb3897 ldconfig: set LC_COLLATE to C [BZ #22505]
ldconfig supports `include' directives and use the glob function to
process them. The glob function sort entries according to the LC_COLLATE
category. When using a standard "include /etc/ld.so.conf.d/*.conf" entry
in /etc/ld.so.conf, the order therefore depends on the locale used to
run ldconfig. A few examples of locale specific order that might be
disturbing in that context compared to the C locale:
- The cs_CZ and sk_SK locales sort the digits after the letters.
- The et_EE locale sorts the 'z' between 's' and 't'.

This patch fixes that by setting LC_COLLATE to C in order to process
files in deterministic order, independently of the locale used to launch
ldconfig.

NOTE: This should NOT be backported to older release branches.

Changelog:
	[BZ #22505]
	* elf/ldconfig.c (main): Call setlocale to force LC_COLLATE to C.
2017-12-16 12:25:41 +01:00
Rajalakshmi Srinivasaraghavan
2e77deef67 s390: Update ulps
* sysdeps/s390/fpu/libm-test-ulps: Update.
2017-12-16 14:11:56 +05:30
Rajalakshmi Srinivasaraghavan
0b9bef6d22 powerpc: Update ulps
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
2017-12-16 14:04:14 +05:30
Rajalakshmi Srinivasaraghavan
984ae9967b New generic sincosf
This implementation is based on generic s_sinf.c and s_cosf.c.
Tested on s390x, powerpc64le and powerpc32.
2017-12-16 14:01:37 +05:30
Carlos O'Donell
93930ea935 Fix tst-leaks1 (bug 14681)
The test tst-leaks1 exercises calling dlopen with a $ORIGIN DST.

This results in a theoretical leak e.g.

Memory not freed:
-----------------
           Address     Size     Caller
0x0000000001d766c0     0x21  at 0x7fb1bd8bf4ab

Or as seen via valgrind:

==27582== 33 bytes in 1 blocks are still reachable in loss record 1 of 1
==27582==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==27582==    by 0x40124AA: _dl_get_origin (dl-origin.c:50)
==27582==    by 0x4007DB9: expand_dynamic_string_token (dl-load.c:382)
==27582==    by 0x400899C: _dl_map_object (dl-load.c:2160)
==27582==    by 0x4013020: dl_open_worker (dl-open.c:224)
==27582==    by 0x5166F9B: _dl_catch_exception (dl-error-skeleton.c:198)
==27582==    by 0x4012BD9: _dl_open (dl-open.c:594)
==27582==    by 0x4E39EF5: dlopen_doit (dlopen.c:66)
==27582==    by 0x5166F9B: _dl_catch_exception (dl-error-skeleton.c:198)
==27582==    by 0x516700E: _dl_catch_error (dl-error-skeleton.c:217)
==27582==    by 0x4E3A514: _dlerror_run (dlerror.c:162)
==27582==    by 0x4E39F70: dlopen@@GLIBC_2.2.5 (dlopen.c:87)

There is no real leak.

The calling link map (the executable's link map) has it's l_origin
expanded for future use as part of _dl_get_origin, and that results
in the main executable link map having a N-byte allocation for
l->l_origin that is never freed since the executable's link map is
just a part of the process.

To take this into account we do one dlopen with $ORIGIN before
calling mtrace to force the initialization of the executable link
map.

Signed-off-by: Carlos O'Donell <carlos@redhat.com>
2017-12-15 20:22:29 -08:00
H.J. Lu
9d7a3741c9 Add --enable-static-pie configure option to build static PIE [BZ #19574]
Static PIE extends address space layout randomization to static
executables.  It provides additional security hardening benefits at
the cost of some memory and performance.

Dynamic linker, ld.so, is a standalone program which can be loaded at
any address.  This patch adds a configure option, --enable-static-pie,
to embed the part of ld.so in static executable to create static position
independent executable (static PIE).  A static PIE is similar to static
executable, but can be loaded at any address without help from a dynamic
linker.  When --enable-static-pie is used to configure glibc, libc.a is
built as PIE and all static executables, including tests, are built as
static PIE.  The resulting libc.a can be used together with GCC 8 or
above to build static PIE with the compiler option, -static-pie.  But
GCC 8 isn't required to build glibc with --enable-static-pie.  Only GCC
with PIE support is needed.  When an older GCC is used to build glibc
with --enable-static-pie, proper input files are passed to linker to
create static executables as static PIE, together with "-z text" to
prevent dynamic relocations in read-only segments, which are not allowed
in static PIE.

The following changes are made for static PIE:

1. Add a new function, _dl_relocate_static_pie, to:
   a. Get the run-time load address.
   b. Read the dynamic section.
   c. Perform dynamic relocations.
Dynamic linker also performs these steps.  But static PIE doesn't load
any shared objects.
2. Call _dl_relocate_static_pie at entrance of LIBC_START_MAIN in
libc.a.  crt1.o, which is used to create dynamic and non-PIE static
executables, is updated to include a dummy _dl_relocate_static_pie.
rcrt1.o is added to create static PIE, which will link in the real
_dl_relocate_static_pie.  grcrt1.o is also added to create static PIE
with -pg.  GCC 8 has been updated to support rcrt1.o and grcrt1.o for
static PIE.

Static PIE can work on all architectures which support PIE, provided:

1. Target must support accessing of local functions without dynamic
relocations, which is needed in start.S to call __libc_start_main with
function addresses of __libc_csu_init, __libc_csu_fini and main.  All
functions in static PIE are local functions.  If PIE start.S can't reach
main () defined in a shared object, the code sequence:

	pass address of local_main to __libc_start_main
	...

local_main:
	tail call to main via PLT

can be used.
2. start.S is updated to check PIC instead SHARED for PIC code path and
avoid dynamic relocation, when PIC is defined and SHARED isn't defined,
to support static PIE.
3. All assembly codes are updated check PIC instead SHARED for PIC code
path to avoid dynamic relocations in read-only sections.
4. All assembly codes are updated check SHARED instead PIC for static
symbol name.
5. elf_machine_load_address in dl-machine.h are updated to support static
PIE.
6. __brk works without TLS nor dynamic relocations in read-only section
so that it can be used by __libc_setup_tls to initializes TLS in static
PIE.

NB: When glibc is built with GCC defaulted to PIE, libc.a is compiled
with -fPIE, regardless if --enable-static-pie is used to configure glibc.
When glibc is configured with --enable-static-pie, libc.a is compiled
with -fPIE, regardless whether GCC defaults to PIE or not.  The same
libc.a can be used to build both static executable and static PIE.
There is no need for separate PIE copy of libc.a.

On x86-64, the normal static sln:

   text	   data	    bss	    dec	    hex	filename
 625425	   8284	   5456	 639165	  9c0bd	elf/sln

the static PIE sln:

   text	   data	    bss	    dec	    hex	filename
 657626	  20636	   5392	 683654	  a6e86	elf/sln

The code size is increased by 5% and the binary size is increased by 7%.

Linker requirements to build glibc with --enable-static-pie:

1. Linker supports --no-dynamic-linker to remove PT_INTERP segment from
static PIE.
2. Linker can create working static PIE.  The x86-64 linker needs the
fix for

https://sourceware.org/bugzilla/show_bug.cgi?id=21782

The i386 linker needs to be able to convert "movl main@GOT(%ebx), %eax"
to "leal main@GOTOFF(%ebx), %eax" if main is defined locally.

Binutils 2.29 or above are OK for i686 and x86-64.  But linker status for
other targets need to be verified.

3. Linker should resolve undefined weak symbols to 0 in static PIE:

https://sourceware.org/bugzilla/show_bug.cgi?id=22269

4. Many ELF backend linkers incorrectly check bfd_link_pic for TLS
relocations, which should check bfd_link_executable instead:

https://sourceware.org/bugzilla/show_bug.cgi?id=22263

Tested on aarch64, i686 and x86-64.

Using GCC 7 and binutils master branch, build-many-glibcs.py with
--enable-static-pie with all patches for static PIE applied have the
following build successes:

PASS: glibcs-aarch64_be-linux-gnu build
PASS: glibcs-aarch64-linux-gnu build
PASS: glibcs-armeb-linux-gnueabi-be8 build
PASS: glibcs-armeb-linux-gnueabi build
PASS: glibcs-armeb-linux-gnueabihf-be8 build
PASS: glibcs-armeb-linux-gnueabihf build
PASS: glibcs-arm-linux-gnueabi build
PASS: glibcs-arm-linux-gnueabihf build
PASS: glibcs-arm-linux-gnueabihf-v7a build
PASS: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch build
PASS: glibcs-m68k-linux-gnu build
PASS: glibcs-microblazeel-linux-gnu build
PASS: glibcs-microblaze-linux-gnu build
PASS: glibcs-mips64el-linux-gnu-n32 build
PASS: glibcs-mips64el-linux-gnu-n32-nan2008 build
PASS: glibcs-mips64el-linux-gnu-n32-nan2008-soft build
PASS: glibcs-mips64el-linux-gnu-n32-soft build
PASS: glibcs-mips64el-linux-gnu-n64 build
PASS: glibcs-mips64el-linux-gnu-n64-nan2008 build
PASS: glibcs-mips64el-linux-gnu-n64-nan2008-soft build
PASS: glibcs-mips64el-linux-gnu-n64-soft build
PASS: glibcs-mips64-linux-gnu-n32 build
PASS: glibcs-mips64-linux-gnu-n32-nan2008 build
PASS: glibcs-mips64-linux-gnu-n32-nan2008-soft build
PASS: glibcs-mips64-linux-gnu-n32-soft build
PASS: glibcs-mips64-linux-gnu-n64 build
PASS: glibcs-mips64-linux-gnu-n64-nan2008 build
PASS: glibcs-mips64-linux-gnu-n64-nan2008-soft build
PASS: glibcs-mips64-linux-gnu-n64-soft build
PASS: glibcs-mipsel-linux-gnu build
PASS: glibcs-mipsel-linux-gnu-nan2008 build
PASS: glibcs-mipsel-linux-gnu-nan2008-soft build
PASS: glibcs-mipsel-linux-gnu-soft build
PASS: glibcs-mips-linux-gnu build
PASS: glibcs-mips-linux-gnu-nan2008 build
PASS: glibcs-mips-linux-gnu-nan2008-soft build
PASS: glibcs-mips-linux-gnu-soft build
PASS: glibcs-nios2-linux-gnu build
PASS: glibcs-powerpc64le-linux-gnu build
PASS: glibcs-powerpc64-linux-gnu build
PASS: glibcs-tilegxbe-linux-gnu-32 build
PASS: glibcs-tilegxbe-linux-gnu build
PASS: glibcs-tilegx-linux-gnu-32 build
PASS: glibcs-tilegx-linux-gnu build
PASS: glibcs-tilepro-linux-gnu build

and the following build failures:

FAIL: glibcs-alpha-linux-gnu build

elf/sln is failed to link due to:

assertion fail bfd/elf64-alpha.c:4125

This is caused by linker bug and/or non-PIC code in PIE libc.a.

FAIL: glibcs-hppa-linux-gnu build

elf/sln is failed to link due to:

collect2: fatal error: ld terminated with signal 11 [Segmentation fault]

https://sourceware.org/bugzilla/show_bug.cgi?id=22537

FAIL: glibcs-ia64-linux-gnu build

elf/sln is failed to link due to:

collect2: fatal error: ld terminated with signal 11 [Segmentation fault]

FAIL: glibcs-powerpc-linux-gnu build
FAIL: glibcs-powerpc-linux-gnu-soft build
FAIL: glibcs-powerpc-linux-gnuspe build
FAIL: glibcs-powerpc-linux-gnuspe-e500v1 build

elf/sln is failed to link due to:

ld: read-only segment has dynamic relocations.

This is caused by linker bug and/or non-PIC code in PIE libc.a.  See:

https://sourceware.org/bugzilla/show_bug.cgi?id=22264

FAIL: glibcs-powerpc-linux-gnu-power4 build

elf/sln is failed to link due to:

findlocale.c:96:(.text+0x22c): @local call to ifunc memchr

This is caused by linker bug and/or non-PIC code in PIE libc.a.

FAIL: glibcs-s390-linux-gnu build

elf/sln is failed to link due to:

collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core dumped

assertion fail bfd/elflink.c:14299

This is caused by linker bug and/or non-PIC code in PIE libc.a.

FAIL: glibcs-sh3eb-linux-gnu build
FAIL: glibcs-sh3-linux-gnu build
FAIL: glibcs-sh4eb-linux-gnu build
FAIL: glibcs-sh4eb-linux-gnu-soft build
FAIL: glibcs-sh4-linux-gnu build
FAIL: glibcs-sh4-linux-gnu-soft build

elf/sln is failed to link due to:

ld: read-only segment has dynamic relocations.

This is caused by linker bug and/or non-PIC code in PIE libc.a.  See:

https://sourceware.org/bugzilla/show_bug.cgi?id=22263

Also TLS code sequence in SH assembly syscalls in glibc doesn't match TLS
code sequence expected by ld:

https://sourceware.org/bugzilla/show_bug.cgi?id=22270

FAIL: glibcs-sparc64-linux-gnu build
FAIL: glibcs-sparcv9-linux-gnu build
FAIL: glibcs-tilegxbe-linux-gnu build
FAIL: glibcs-tilegxbe-linux-gnu-32 build
FAIL: glibcs-tilegx-linux-gnu build
FAIL: glibcs-tilegx-linux-gnu-32 build
FAIL: glibcs-tilepro-linux-gnu build

elf/sln is failed to link due to:

ld: read-only segment has dynamic relocations.

This is caused by linker bug and/or non-PIC code in PIE libc.a.  See:

https://sourceware.org/bugzilla/show_bug.cgi?id=22263

	[BZ #19574]
	* INSTALL: Regenerated.
	* Makeconfig (real-static-start-installed-name): New.
	(pic-default): Updated for --enable-static-pie.
	(pie-default): New for --enable-static-pie.
	(default-pie-ldflag): Likewise.
	(+link-static-before-libc): Replace $(DEFAULT-LDFLAGS-$(@F))
	with $(if $($(@F)-no-pie),$(no-pie-ldflag),$(default-pie-ldflag)).
	Replace $(static-start-installed-name) with
	$(real-static-start-installed-name).
	(+prectorT): Updated for --enable-static-pie.
	(+postctorT): Likewise.
	(CFLAGS-.o): Add $(pie-default).
	(CFLAGS-.op): Likewise.
	* NEWS: Mention --enable-static-pie.
	* config.h.in (ENABLE_STATIC_PIE): New.
	* configure.ac (--enable-static-pie): New configure option.
	(have-no-dynamic-linker): New LIBC_CONFIG_VAR.
	(have-static-pie): Likewise.
	Enable static PIE if linker supports --no-dynamic-linker.
	(ENABLE_STATIC_PIE): New AC_DEFINE.
	(enable-static-pie): New LIBC_CONFIG_VAR.
	* configure: Regenerated.
	* csu/Makefile (omit-deps): Add r$(start-installed-name) and
	gr$(start-installed-name) for --enable-static-pie.
	(extra-objs): Likewise.
	(install-lib): Likewise.
	(extra-objs): Add static-reloc.o and static-reloc.os
	($(objpfx)$(start-installed-name)): Also depend on
	$(objpfx)static-reloc.o.
	($(objpfx)r$(start-installed-name)): New.
	($(objpfx)g$(start-installed-name)): Also depend on
	$(objpfx)static-reloc.os.
	($(objpfx)gr$(start-installed-name)): New.
	* csu/libc-start.c (LIBC_START_MAIN): Call _dl_relocate_static_pie
	in libc.a.
	* csu/libc-tls.c (__libc_setup_tls): Add main_map->l_addr to
	initimage.
	* csu/static-reloc.c: New file.
	* elf/Makefile (routines): Add dl-reloc-static-pie.
	(elide-routines.os): Likewise.
	(DEFAULT-LDFLAGS-tst-tls1-static-non-pie): Removed.
	(tst-tls1-static-non-pie-no-pie): New.
	* elf/dl-reloc-static-pie.c: New file.
	* elf/dl-support.c (_dl_get_dl_main_map): New function.
	* elf/dynamic-link.h (ELF_DURING_STARTUP): Also check
	STATIC_PIE_BOOTSTRAP.
	* elf/get-dynamic-info.h (elf_get_dynamic_info): Likewise.
	* gmon/Makefile (tests): Add tst-gmon-static-pie.
	(tests-static): Likewise.
	(DEFAULT-LDFLAGS-tst-gmon-static): Removed.
	(tst-gmon-static-no-pie): New.
	(CFLAGS-tst-gmon-static-pie.c): Likewise.
	(CRT-tst-gmon-static-pie): Likewise.
	(tst-gmon-static-pie-ENV): Likewise.
	(tests-special): Likewise.
	($(objpfx)tst-gmon-static-pie.out): Likewise.
	(clean-tst-gmon-static-pie-data): Likewise.
	($(objpfx)tst-gmon-static-pie-gprof.out): Likewise.
	* gmon/tst-gmon-static-pie.c: New file.
	* manual/install.texi: Document --enable-static-pie.
	* sysdeps/generic/ldsodefs.h (_dl_relocate_static_pie): New.
	(_dl_get_dl_main_map): Likewise.
	* sysdeps/i386/configure.ac: Check if linker supports static PIE.
	* sysdeps/x86_64/configure.ac: Likewise.
	* sysdeps/i386/configure: Regenerated.
	* sysdeps/x86_64/configure: Likewise.
	* sysdeps/mips/Makefile (ASFLAGS-.o): Add $(pie-default).
	(ASFLAGS-.op): Likewise.
2017-12-15 17:12:14 -08:00
Joseph Myers
95511aab9d Fix testing with read-only source directory.
Three tests fail with a read-only source directory because they try to
write into the source directory.  None of these write into it in a way
that should actually be problematic for concurrent builds sharing the
same writable source directory, but avoiding any writing into the
source directory (from testing, or from building glibc if the source
timestamps are properly ordered) is still a good idea, as being able
to build with read-only sources helps make sure there isn't anything
that could cause problems for concurrent builds.

This patch changes the tests in question to use either /tmp or the
build directory to write their temporary files (or to test O_TMPFILE,
as applicable).

Tested for x86_64.

	* io/Makefile (tst-open-tmpfile-ARGS): New variable.
	* posix/tst-mmap-offset.c (fname): Use /tmp.
	* stdlib/tst-setcontext3.sh (tempfile): Use ${objpfx}.
2017-12-15 22:37:17 +00:00
Steve Ellcey
a7e3edf4f2 Increase buffer size due to warning from ToT GCC
* nscd/dbg_log.c (dbg_log): Increase msg buffer size.
2017-12-15 09:08:23 -08:00
Thomas Schwinge
d232f2e137 Don't set errno in Hurd rtld's __access_noerrno
* sysdeps/mach/hurd/dl-sysdep.c (__access_noerrno): Don't set
	errno.

Fixes commit 819ea3347e.

Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2017-12-15 18:02:56 +01:00
Joseph Myers
5170fa49b2 Correct build-many-glibcs.py arm-linux-gnueabihf configurations.
The conventional configure triplet for ARM GNU/Linux with hard-float
ABI is arm-*-linux-gnueabihf.  However, GCC does not automatically use
the hard-float ABI based on that triplet.  This patch fixes
build-many-glibcs.py to pass --with-float=hard so that the
arm-linux-gnueabihf configurations actually build with the intended
ABI.

Tested building the affected configurations with build-many-glibcs.py.

	* scripts/build-many-glibcs.py (Context.add_all_configs): Use
	--with-float=hard for arm-linux-gnueabihf configurations.
2017-12-15 16:57:29 +00:00
Joseph Myers
f2da2fd81f Do not build .mo files in source directory (bug 14121).
Building and installing glibc leaves .mo files (compiled message
translations) behind in the source directory.  Building those files in
the source directory may once have made sense, if they were included
in release tarballs; now that release tarballs are just the output of
"git archive", building any non-checked-in files in the source
directory does not make sense.  This patch changes these files to be
built in the build directory instead.  The realclean rule is changed
to simply adding the .mo files to the "generated" variable, since once
the files are in the build directory it make no sense to exclude them
from normal cleanup rules.

This is necessary but not sufficient to avoid build-many-glibcs.py
needing to copy the glibc source directory.  Its list of files to
touch on checkout to avoid subsequent regeneration (configure,
preconfigure, *-kw.h) is incomplete (missing at least INSTALL,
sysdeps/gnu/errlist.c, posix/testcases.h, posix/ptestcases.h,
locale/C-translit.h, (only regenerated for Hurd builds)
sysdeps/mach/hurd/bits/errno.h, (only regenerated for 32-bit SPARC
builds) sysdeps/sparc/sparc32/{sdiv,udiv,rem,urem}.S) - the existing
list may be sufficient to prevent regeneration that actually changes
the file contents depending on the installed build tools, but not to
ensure there is no regeneration at all - and there might well be other
things writing into the source directory in the course of building and
testing (so needing appropriate testing with read-only source
directories with different timestamp orderings to find and eliminate
all such cases).

Tested for x86_64.

	[BZ #14121]
	* po/Makefile (generated): Add $(ALL_LINGUAS:%=%.mo).
	(%.mo): Change to $(objpfx)%.mo.  Use $(make-target-directory).
	($(mo-installed)): Use $(objpfx)%.mo.
	(realclean): Remove rule.
2017-12-15 14:27:20 +00:00
Joseph Myers
0c4fe28d7a Remove old po/ code for copying .po files from shared directory.
po/Makefile has both old code for copying .po files from a shared
directory /com/share/ftp/gnu/po/maint/glibc (presumably once present
on some GNU server), and new code for downloading them from the
Translation Project.  This patch removes the old code, leading only
the new code.

Tested for x86_64.

	* po/Makefile (linguas): Remove rule and dependencies.
	(linguas.mo): Likewise.
	(.PHONY): Do not depend on linguas and linguas.mo.
	(podir): Remove variable.
	(pofiles): Likewise.
	[$(pofiles)] (%.po): Remove rule.
2017-12-15 14:20:52 +00:00
Joseph Myers
174edbde7e Update SPARC divrem generation to match output.
While working on another patch I noticed that (a)
sysdeps/sparc/sparc32/Makefile is the only place with special
realclean settings, apart from po/, and (b) the generated files with a
rule in that Makefile to generate them (using m4) had been patched
manually so no longer corresponded with the output of the generator -
so if the timestamps were wrong, a build would result in changes to
the files in the source directory.  (They also didn't correspond
because of changes in make 3.81 to how make handles whitespace at the
start of a line in a sequence of backslash-newline continuation lines
within a recipe.)

This patch fixes the generation and output files to match.  The issue
with make and whitespace at start of continuation lines is fixed by
putting those newlines outside of arguments to echo, so the number of
spaces in the argument matches the number in the existing generated
files.  Then divrem.m4 is changed to avoid generating whitespace-only
lines (my fix to the outputs from 2013; this fix to the generator also
changes the indentation of a label in the output files) and to
generate an alias in udiv.S (Adhemerval's fix from March).

build-many-glibcs.py doesn't have a non-v9 SPARC configuration,
because non-v9 32-bit SPARC didn't build when I set up
build-many-glibcs.py but sparcv9 did build.  Whether or not non-v9
32-bit SPARC now builds (or indeed whether or not support for it is
obsolete), I tested by removing the sparcv8 and sparcv9 versions of
the four files in question, so forcing the generated files to be built
and used, and the compilation parts of the glibc testsuite passed.

	* sysdeps/sparc/sparc32/Makefile
	($(divrem:%=$(sysdep_dir)/sparc/sparc32/%.S)): Do not include
	start-of-line whitespace in argument of echo.
	* sysdeps/sparc/sparc32/divrem.m4: Avoid generating lines starting
	with whitespace.  Generate __wrap_.udiv alias.
	* sysdeps/sparc/sparc32/rem.S: Regenerated.
	* sysdeps/sparc/sparc32/sdiv.S: Likewise.
	* sysdeps/sparc/sparc32/udiv.S: Likewise.
	* sysdeps/sparc/sparc32/urem.S: Likewise.
2017-12-15 14:06:07 +00:00
Rajalakshmi Srinivasaraghavan
1e36806fb8 powerpc: st{r,p}cpy optimization for aligned strings
This patch makes use of vectors for aligned inputs.  Improvements
upto 30% seen for larger aligned inputs.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
2017-12-15 10:55:58 +05:30
Siddhesh Poyarekar
5f1603c331 Convert strcmp benchmark output to json format
The format is now parseable with the compare_strings.py script.
2017-12-15 02:31:41 +05:30
Siddhesh Poyarekar
aa6932aa7b Remove redundant mention of SXID_ERASE
SXID_ERASE is implicit for all environment variables.  Avoid
mentioning it in the tunables list; that way only the ones with
SXID_IGNORE remain prominent and mentioned.  TODO: we need to audit
each of those cases and drop them to SXID_ERASE wherever possible.
2017-12-15 00:48:12 +05:30
Florian Weimer
3ff3dfa5af elf: Count components of the expanded path in _dl_init_path [BZ #22607] 2017-12-14 15:31:46 +01:00
Florian Weimer
8a0b17e48b elf: Compute correct array size in _dl_init_paths [BZ #22606] 2017-12-14 15:27:08 +01:00
Florian Weimer
f58bd7f055 support: Simplify compiling most of support/ outside of glibc
Some include files were missing because they are implied by the
in-tree build process.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2017-12-14 13:48:56 +01:00
H.J. Lu
4ca945e9c5 x86-64: Remove sysdeps/x86_64/fpu/s_cosf.S
On Ivy Bridge, bench-cosf reports performance improvement:

           s_cosf.S      s_cosf.c       Improvement
max        114.136       82.401            39%
min        13.119        11.301            16%
mean       22.1882       21.8938           1%

	* sysdeps/x86_64/fpu/s_cosf.S: Removed.
2017-12-14 04:39:53 -08:00
Patrick McGehearty
e6a1c5dc77 sparc: M7 optimized memset/bzero
Support added to identify Sparc M7/T7/S7/M8/T8 processor capability.
Performance tests run on Sparc S7 using new code and old niagara4 code.

Optimizations for memset also apply to bzero as they share code.

For memset/bzero, performance comparison with niagara4 code:
For memset nonzero data,
  256-1023 bytes - 60-90% gain (in cache); 5% gain (out of cache)
  1K+ bytes - 80-260% gain (in cache); 40-80% gain (out of cache)
For memset zero data (and bzero),
  256-1023 bytes - 80-120% gain (in cache), 0% gain (out of cache)
  1024+ bytes - 2-4x gain (in cache), 10-35% gain (out of cache)

Tested in sparcv9-*-* and sparc64-*-* targets in both multi and
non-multi arch configurations.

	Patrick McGehearty <patrick.mcgehearty@oracle.com>
	Adhemerval Zanella  <adhemerval.zanella@linaro.org>

	* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile
	(sysdeps_routines): Add memset-niagara7.
	* sysdeps/sparc/sparc64/multiarch/Makefile (sysdes_rotuines):
	Likewise.
	* sysdeps/sparc/sparc32/sparcv9/multiarch/memset-niagara7.S: New
	file.
	* sysdeps/sparc/sparc64/multiarch/memset-niagara7.S: Likewise.
	* sysdeps/sparc/sparc64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add __bzero_niagara7 and __memset_niagara7.
	* sysdeps/sparc/sparc64/multiarch/ifunc-memset.h (IFUNC_SELECTOR):
	Add niagara7 option.
	* NEWS: Mention sparc m7 optimized memcpy, mempcpy, memmove, and
	memset.
2017-12-14 08:48:19 -02:00
Patrick McGehearty
1b6e07f8e0 sparc: M7 optimized memcpy/mempcpy/memmove
Support added to identify Sparc M7/T7/S7/M8/T8 processor capability.
Performance tests run on Sparc S7 using new code and old niagara4 code.

Optimizations for memcpy also apply to mempcpy and memmove
where they share code. Optimizations for memset also apply
to bzero as they share code.

For memcpy/mempcpy/memmove, performance comparison with niagara4 code:
Long word aligned data
  0-127 bytes - minimal changes
  128-1023 bytes - 7-30% gain
  1024+ bytes - 1-7% gain (in cache); 30-100% gain (out of cache)
Word aligned data
  0-127 bytes - 50%+ gain
  128-1023 bytes - 10-200% gain
  1024+ bytes - 0-15% gain (in cache); 5-50% gain (out of cache)
Unaligned data
  0-127 bytes - 0-70%+ gain
  128-447 bytes - 40-80%+ gain
  448-511 bytes - 1-3% loss
  512-4096 bytes - 2-3% gain (in cache); 0-20% gain (out of cache)
  4096+ bytes - ± 3% (in cache); 20-50% gain (out of cache)

Tested in sparcv9-*-* and sparc64-*-* targets in both multi and
non-multi arch configurations.

	Patrick McGehearty  <patrick.mcgehearty@oracle.com>
	Adhemerval Zanella  <adhemerval.zanella@linaro.org>

	* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile
	(sysdeps_routines): Add memcpy-memmove-niagara7 and memmove-ultra1.
	* sysdeps/sparc/sparc64/multiarch/Makefile (sysdeps_routines):
	Likewise.
	* sysdeps/sparc/sparc32/sparcv9/multiarch/memcpy-memmove-niagara7.S:
	New file.
	* sysdeps/sparc/sparc32/sparcv9/multiarch/memmove-ultra1.S: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/multiarch/rtld-memmove.c: Likewise.
	* sysdeps/sparc/sparc64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add __memcpy_niagara7, __mempcpy_niagara7,
	and __memmove_niagara7.
	* sysdeps/sparc/sparc64/multiarch/ifunc-memcpy.h (IFUNC_SELECTOR):
	Add niagara7 option.
	* sysdeps/sparc/sparc64/multiarch/memmove.c: New file.
	* sysdeps/sparc/sparc64/multiarch/ifunc-memmove.h: Likewise.
	* sysdeps/sparc/sparc64/multiarch/memcpy-memmove-niagara7.S: Likewise.
	* sysdeps/sparc/sparc64/multiarch/memmove-ultra1.S: Likewise.
	* sysdeps/sparc/sparc64/multiarch/rtld-memmove.c: Likewise.
2017-12-14 08:47:38 -02:00
Jose E. Marchesi
767a26d6a0 sparc: assembly version of memmove for ultra1+
Tested in sparcv9-*-* and sparc64-*-* targets in both non-multi-arch and
multi-arch configurations.

	* sysdeps/sparc/sparc32/sparcv9/memmove.S: New file.
	* sysdeps/sparc/sparc32/sparcv9/rtld-memmove.c: Likewise.
	* sysdeps/sparc/sparc64/memmove.S: Likewise.
	* sysdeps/sparc/sparc64/rtld-memmove.c: Likewise.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-12-14 08:47:09 -02:00
Jose E. Marchesi
bfb7bf2273 sparc: support the ADP hw capability
This patch adds support for the ADP (also known as adi) hardware
capability, as reported by the kernel sparc port when running on M7
machines.

Tested in both sparcv9-*-* and sparc64-*-* targets.

	* sysdeps/sparc/bits/hwcap.h (HWCAP_SPARC_ADP): Defined.
	* sysdeps/sparc/dl-procinfo.c: Added "adp" to the
	_dl_sparc_cap_flags array.
	* sysdeps/sparc/dl-procinfo.h (_DL_HWCAP_COUNT): Increment.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2017-12-14 08:47:07 -02:00
Siddhesh Poyarekar
2bce01ebba aarch64: Improve strcmp unaligned performance
Replace the simple byte-wise compare in the misaligned case with a
dword compare with page boundary checks in place.  For simplicity I've
chosen a 4K page boundary so that we don't have to query the actual
page size on the system.

This results in up to 3x improvement in performance in the unaligned
case on falkor and about 2.5x improvement on mustang as measured using
bench-strcmp.

	* sysdeps/aarch64/strcmp.S (misaligned8): Compare dword at a
	time whenever possible.
2017-12-13 18:50:27 +05:30
Carlos O'Donell
243b63337c Fix testing with nss-crypt.
A glibc master build with --enable-nss-crypt using the NSS
crypto libraries fails during make check with the following error:

<command-line>:0:0: error: "USE_CRYPT" redefined [-Werror]
<command-line>:0:0: note: this is the location of the previous
definition

This is caused by commit 36975e8e7e
by H.J. Lu which replaces all = with +=. The fix is to undefine
USE_CRYPT before defining it to zero.

Committed as an obvious fix. Fixes the build issue on x86_64 with
no regressions.

Signed-off-by: Carlos O'Donell <carlos@redhat.com>
2017-12-12 18:34:33 -08:00
Joseph Myers
6f7c009282 Add sysdeps/ieee754/soft-fp.
The default sysdeps/ieee754 fma implementations rely on exceptions and
rounding modes to achieve correct results through internal use of
round-to-odd.  Thus, glibc configurations without support for
exceptions and rounding modes instead need to use implementations of
fma based on soft-fp.

At present, this is achieved via having implementation files in
soft-fp/ that are #included by sysdeps files for each glibc
configuration that needs them.  In general this means such a
configuration has its own s_fma.c and s_fmaf.c.

TS 18661-1 adds functions that do an operation (+ - * / sqrt fma) on
arguments wider than the return type, with a single rounding of the
infinite-precision result to that return type.  These are also
naturally implemented using round-to-odd on platforms with hardware
support for rounding modes and exceptions but lacking hardware support
for these narrowing operations themselves.  (Platforms that have
direct hardware support for such narrowing operations include at least
ia64, and Power ISA 2.07 or later, which I think means POWER8 or
later.)

So adding the remaining TS 18661-1 functions would mean at least six
narrowing function implementations (fadd fsub fmul fdiv ffma fsqrt),
with aliases for other types and further implementations in some
configurations, that need to be overridden for configurations lacking
hardware exceptions and rounding modes.  Requiring all such
configurations (currently seven of them) to have their own source
files for all those functions seems undesirable.

Thus, this patch adds a directory sysdeps/ieee754/soft-fp to contain
libm function implementations based on soft-fp.  This directory is
then used via Implies from all the configurations that need it, so no
more files need adding to every such configuration when adding more
functions with soft-fp implementations.  A configuration can still
selectively #include a particular file from this directory if desired;
thus, the MIPS #include of the fmal implementation is retained, since
that's appropriate even for hard float (because long double is always
implementated in software for MIPS64, so the soft-fp implementation of
fmal is better than the ldbl-128 one).

This also provides additional motivation for my recent patch removing
--with-fp / --without-fp: previously there was no need for correct use
of --without-fp for no-FPU ARM or SH3, and now we have autodetection
nofpu/ sysdeps directories can be used by this patch for those
configurations without imposing any new requirements on how glibc is
configured.

(The mips64/*/fpu/s_fma.c files added by this patch are needed to keep
the dbl-64 version of fma for double, rather than the ldbl-128 one,
used in that case.)

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.

	* soft-fp/fmadf4.c: Move to ....
	* sysdeps/ieee754/soft-fp/s_fma.c: ... here.
	* soft-fp/fmasf4.c: Move to ....
	* sysdeps/ieee754/soft-fp/s_fmaf.c: ... here.
	* soft-fp/fmatf4.c: Move to ....
	* sysdeps/ieee754/soft-fp/s_fmal.c: ... here.
	* sysdeps/ieee754/soft-fp/Makefile: New file.
	* sysdeps/arm/preconfigure.ac: Define with_fp_cond.
	* sysdeps/arm/preconfigure: Regenerated.
	* sysdeps/arm/nofpu/Implies: New file.
	* sysdeps/arm/s_fma.c: Remove file.
	* sysdeps/arm/s_fmaf.c: Likewise.
	* sysdeps/m68k/coldfire/nofpu/Implies: New file.
	* sysdeps/m68k/coldfire/nofpu/s_fma.c: Remove file.
	* sysdeps/m68k/coldfire/nofpu/s_fmaf.c: Likewise.
	* sysdeps/microblaze/Implies: Add ieee754/soft-fp.
	* sysdeps/microblaze/s_fma.c: Remove file.
	* sysdeps/microblaze/s_fmaf.c: Likewise.
	* sysdeps/mips/mips32/nofpu/Implies: New file.
	* sysdeps/mips/mips64/n32/fpu/s_fma.c: Likewise.
	* sysdeps/mips/mips64/n32/nofpu/Implies: Likewise.
	* sysdeps/mips/mips64/n64/fpu/s_fma.c: Likewise.
	* sysdeps/mips/mips64/n64/nofpu/Implies: Likewise.
	* sysdeps/mips/ieee754/s_fma.c: Remove file.
	* sysdeps/mips/ieee754/s_fmaf.c: Likewise.
	* sysdeps/mips/ieee754/s_fmal.c: Update include for move of fmal
	implementation.
	* sysdeps/nios2/Implies: Add ieee754/soft-fp.
	* sysdeps/nios2/s_fma.c: Remove file.
	* sysdeps/nios2/s_fmaf.c: Likewise.
	* sysdeps/sh/nofpu/Implies: New file.
	* sysdeps/sh/s_fma.c: Remove file.
	* sysdeps/sh/s_fmaf.c: Likewise.
	* sysdeps/tile/Implies: Add ieee754/soft-fp.
	* sysdeps/tile/s_fma.c: Remove file.
	* sysdeps/tile/s_fmaf.c: Likewise.
2017-12-12 23:35:21 +00:00
H.J. Lu
ac817e083b x86-64: Add cosf with FMA
On Skylake, bench-cosf reports performance improvement:

            Before        After         Improvement
max        135.362       94.552            43%
min        8.532         7.688             11%
mean       17.1446       11.8128           45%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_cosf-sse2 and s_cosf-fma.
	(CFLAGS-s_cosf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_cosf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_cosf-sse2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_cosf.c: Likewise.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-12-12 15:32:58 -08:00
Steve Ellcey
eb4285768b Use memcpy instead of strncpy in nscd/nscd.h to fix build problem with ToT GCC
* nscd/nscd.h (init_traced_file): Change strncpy to memcpy.
2017-12-12 13:47:32 -08:00
Adhemerval Zanella
cc683f7ed4 libio: Free backup area when it not required (BZ#22415)
Some libio operations fail to correctly free the backup area (created
by _IO_{w}default_pbackfail on unget{w}c) resulting in either invalid
buffer free operations or memory leaks.

For instance, on the example provided by BZ#22415 a following
fputc after a fseek to rewind the stream issues an invalid free on
the buffer.  It is because although _IO_file_overflow correctly
(from fputc) correctly calls _IO_free_backup_area, the
_IO_new_file_seekoff (called by fseek) updates the FILE internal
pointers without first free the backup area (resulting in invalid
values in the internal pointers).

The wide version also shows an issue, but instead of accessing invalid
pointers it leaks the backup memory on fseek/fputwc operation.

Checked on x86_64-linux-gnu and i686-linux-gnu.

	* libio/Makefile (tests): Add tst-bz22415.
	(tst-bz22415-ENV): New rule.
	(generated): Add tst-bz22415.mtrace and tst-bz22415.check.
	(tests-special): Add tst-bz22415-mem.out.
	($(objpfx)tst-bz22415-mem.out): New rule.
	* libio/fileops.c (_IO_new_file_seekoff): Call _IO_free_backup_area
	in case of a successful seek operation.
	* libio/wfileops.c (_IO_wfile_seekoff): Likewise.
	(_IO_wfile_overflow): Call _IO_free_wbackup_area in case a write
	buffer is required.
	* libio/tst-bz22415.c: New test.
2017-12-12 17:29:54 -02:00
Adhemerval Zanella
c80acdc325 Update IA64 libm-test-ulps
Ran on Itanium Processor 9020, GCC 7.2.1.

	* sysdeps/ia64/fpu/libm-test-ulps: Update.

Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-12-12 16:57:41 -02:00