Commit Graph

10176 Commits

Author SHA1 Message Date
Joseph Myers
f9b437d5ef Update sysdeps/unix/sysv/linux/bits/socket.h for Linux 4.6.
This patch updates sysdeps/unix/sysv/linux/bits/socket.h for new
constants added in Linux 4.6.  AF_KCM / PF_KCM are added.  SOL_KCM is
new, and I added a lot of SOL_* values postdating the last one present
in the header, since I saw no apparent reason for the set in glibc to
stop at SOL_IRDA.  MSG_BATCH is added; Linux also has
MSG_SENDPAGE_NOTLAST which is not in glibc, but given the comment
starts "sendpage() internal" I presume it's correct for it not to be
in glibc.

(Note that this is a case where the Linux kernel header with userspace
relevant values is *not* a uapi header but include/linux/socket.h - I
don't know why, but at least this header, as well as uapi headers,
needs reviewing for glibc-relevant changes each release.)

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).

	* sysdeps/unix/sysv/linux/bits/socket.h (PF_KCM): New macro.
	(PF_MAX): Update value.
	(AF_KCM): New macro.
	(SOL_NETBEUI): Likewise.
	(SOL_LLC): Likewise.
	(SOL_DCCP): Likewise.
	(SOL_NETLINK): Likewise.
	(SOL_TIPC): Likewise.
	(SOL_RXRPC): Likewise.
	(SOL_PPPOL2TP): Likewise.
	(SOL_BLUETOOTH): Likewise.
	(SOL_PNPIPE): Likewise.
	(SOL_RDS): Likewise.
	(SOL_IUCV): Likewise.
	(SOL_CAIF): Likewise.
	(SOL_ALG): Likewise.
	(SOL_NFC): Likewise.
	(SOL_KCM): Likewise.
	(MSG_BATCH): New enum value and macro.
2016-05-23 13:27:37 +00:00
H.J. Lu
b7598b1b85 Remove special L2 cache case for Knights Landing
L2 cache is shared by 2 cores on Knights Landing, which has 4 threads
per core:

https://en.wikipedia.org/wiki/Xeon_Phi#Knights_Landing

So L2 cache is shared by 8 threads on Knights Landing as reported by
CPUID.  We should remove special L2 cache case for Knights Landing.

	[BZ #18185]
	* sysdeps/x86/cacheinfo.c (init_cacheinfo): Don't limit threads
	sharing L2 cache to 2 for Knights Landing.
2016-05-20 14:42:00 -07:00
Joseph Myers
ffe9aaf2b9 Implement proper fmal for ldbl-128ibm (bug 13304).
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in
most cases, with no attempt at actually being a fused operation.

This patch replaces it with a genuine fused operation.  It is not
necessarily correctly rounding, but should produce a result at least
as accurate as the long double arithmetic operations in libgcc, which
I think is all that can reasonably be expected for such a non-IEEE
format where arithmetic is approximate rather than rounded according
to any particular rule for determining the exact result.  Like the
libgcc arithmetic, it may produce spurious overflow and underflow
results, and it falls back to the libgcc multiplication in the case of
(finite, finite, zero).

This concludes the fixes for bug 13304; any subsequently found fma
issues should go in separate Bugzilla bugs.  Various other pieces of
bug 13304 were fixed in past releases over the past several years.

Tested for powerpc.

	[BZ #13304]
	* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>,
	<float.h>, <math_private.h> and <stdlib.h>.
	(add_split): New function.
	(mul_split): Likewise.
	(ext_val): New typedef.
	(store_ext_val): New function.
	(mul_ext_val): New function.
	(compare): New function.
	(add_split_ext): New function.
	(__fmal): After checking for Inf, NaN and zero, compute result as
	an exact sum of scaled double values in round-to-nearest before
	adding those up and adjusting for other rounding modes.
	* math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from
	tests of fma.
	* math/auto-libm-test-out: Regenerated.
2016-05-19 20:10:56 +00:00
H.J. Lu
de71e0421b Correct Intel processor level type mask from CPUID
Intel CPUID with EAX == 11 returns:

ECX Bits 07 - 00: Level number. Same value in ECX input.
    Bits 15 - 08: Level type.
    ^^^^^^^^^^^^^^^^^^^^^^^^ This is level type.
    Bits 31 - 16: Reserved.

Intel processor level type mask should be 0xff00, not 0xff0.

	[BZ #20119]
	* sysdeps/x86/cacheinfo.c (init_cacheinfo): Correct Intel
	processor level type mask for CPUID with EAX == 11.
2016-05-19 10:02:36 -07:00
H.J. Lu
7c08d791ee Check the HTT bit before counting logical threads
Skip counting logical threads for Intel processors if the HTT bit is 0
which indicates there is only a single logical processor.

	* sysdeps/x86/cacheinfo.c (init_cacheinfo): Skip counting
	logical threads if the HTT bit is 0.
	* sysdeps/x86/cpu-features.h (bit_cpu_HTT): New.
	(index_cpu_HTT): Likewise.
	(reg_HTT): Likewise.
2016-05-19 09:09:00 -07:00
H.J. Lu
eb2c88c7c8 Remove alignments on jump targets in memset
X86-64 memset-vec-unaligned-erms.S aligns many jump targets, which
increases code sizes, but not necessarily improve performance.  As
memset benchtest data of align vs no align on various Intel and AMD
processors

https://sourceware.org/bugzilla/attachment.cgi?id=9277

shows that aligning jump targets isn't necessary.

	[BZ #20115]
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset):
	Remove alignments on jump targets.
2016-05-19 08:49:55 -07:00
H.J. Lu
16cd2b35c2 Don't call internal _Unwind_Resume via PLT
There is no need to call the internal funtion, _Unwind_Resume, which
is defined in unwind-forcedunwind.c, via PLT.

	* sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S
	(__condvar_cleanup2): Remove JUMPTARGET from  _Unwind_Resume
	call.
	* sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
	(__condvar_cleanup1): Likewise.
2016-05-18 13:43:26 -07:00
H.J. Lu
d29261db22 Don't call internal __pthread_unwind via PLT
Add PTHREAD_UNWIND to replace JUMPTARGET(__pthread_unwind) and define
it to __GI___pthread_unwind within libpthread.

	* sysdeps/unix/sysv/linux/x86_64/cancellation.S (PTHREAD_UNWIND):
	New
	(__pthread_unwind): Renamed to ...
	(PTHREAD_UNWIND): This.
	(__pthread_enable_asynccancel): Replace
	JUMPTARGET(__pthread_unwind) with PTHREAD_UNWIND.
2016-05-18 13:41:55 -07:00
Joseph Myers
48526672b6 Add CLONE_NEWCGROUP from Linux 4.6 to bits/sched.h.
This patch adds CLONE_NEWCGROUP, new in Linux 4.6, to
sysdeps/unix/sysv/linux/bits/sched.h.

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).

	* sysdeps/unix/sysv/linux/bits/sched.h [__USE_GNU]
	(CLONE_NEWCGROUP): New macro.
2016-05-18 17:46:52 +00:00
Joseph Myers
2a1aa52824 Add Q_GETNEXTQUOTA from Linux 4.6 to sys/quota.h.
This patch adds Q_GETNEXTQUOTA, new in Linux 4.6, to
sysdeps/unix/sysv/linux/sys/quota.h.

Tested for x86_64 and x86 (testsuite, and that installed shared
libraries are unchanged by the patch).

	* sysdeps/unix/sysv/linux/sys/quota.h [_LINUX_QUOTA_VERSION >= 2]
	(Q_GETNEXTQUOTA): New macro.
2016-05-18 13:15:11 +00:00
H.J. Lu
4facca0b0e Call init_cpu_features only if SHARED is defined
In static executable, since init_cpu_features is called early from
__libc_start_main, there is no need to call it again in dl_platform_init.

	[BZ #20072]
	* sysdeps/i386/dl-machine.h (dl_platform_init): Call
	init_cpu_features only if SHARED is defined.
	* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
2016-05-13 08:29:33 -07:00
H.J. Lu
9e4ec3e816 Support non-inclusive caches on Intel processors
* sysdeps/x86/cacheinfo.c (init_cacheinfo): Check and support
	non-inclusive caches on Intel processors.
2016-05-13 07:18:35 -07:00
Wilco Dijkstra
a8c5a2a952 This is an optimized memset for AArch64. Memset is split into 4 main cases:
small sets of up to 16 bytes, medium of 16..96 bytes which are fully unrolled.
Large memsets of more than 96 bytes align the destination and use an unrolled
loop processing 64 bytes per iteration.  Memsets of zero of more than 256 use
the dc zva instruction, and there are faster versions for the common ZVA sizes
64 or 128.  STP of Q registers is used to reduce codesize without loss of
performance.

The speedup on test-memset is 1% on Cortex-A57 and 8% on Cortex-A53.

	* sysdeps/aarch64/memset.S (__memset):
	Rewrite of optimized memset.
2016-05-12 16:44:53 +01:00
Florian Weimer
56290d6e76 Increase fork signal safety for single-threaded processes [BZ #19703]
This provides a band-aid and addresses the scenario where fork is
called from a signal handler while the process is in the malloc
subsystem (or has acquired the libio list lock).  It does not
address the general issue of async-signal-safety of fork;
multi-threaded processes are not covered, and some glibc
subsystems have fork handlers which are not async-signal-safe.
2016-05-12 15:26:55 +02:00
Florian Weimer
cd065b6843 getaddrinfo: Convert from extend_alloca to struct scratch_buffer 2016-05-12 14:07:56 +02:00
Stefan Liebler
c64a10e544 S390: Use fPIC to avoid R_390_GOT12 relocation in gcrt1.o.
if glibc is build with -march=z900 | -march=z990,
the startup file gcrt1.o (used if you link with gcc -pg)
contains R_390_GOT12 | R_390_GOT20 relocations.
Thus, an entry in the GOT can be addressed relative to the GOT pointer
with a 12 | 20 bit displacement value.
The startup files should not contain R_390_GOT12,
R_390_GOT20 relocations, but R_390_GOTENT ones.

This patch removes the overrides of pic-ccflag and
the default pic-ccflag = -fPIC in Makeconfig
is used instead to get the R_390_GOTENT relocations in gcrt1.o.

ChangeLog:

	* sysdeps/s390/s390-32/Makefile (pic-ccflag): Remove.
	* sysdeps/s390/s390-64/Makefile: Likewise.
2016-05-11 15:51:25 +02:00
H.J. Lu
2a1f15b1a9 Remove x86 ifunc-defines.sym and rtld-global-offsets.sym
Merge x86 ifunc-defines.sym with x86 cpu-features-offsets.sym.  Remove
x86 ifunc-defines.sym and rtld-global-offsets.sym.  No code changes on
i686 and x86-64.

	* sysdeps/i386/i686/multiarch/Makefile (gen-as-const-headers):
	Remove ifunc-defines.sym.
	* sysdeps/x86_64/multiarch/Makefile (gen-as-const-headers):
	Likewise.
	* sysdeps/i386/i686/multiarch/ifunc-defines.sym: Removed.
	* sysdeps/x86/rtld-global-offsets.sym: Likewise.
	* sysdeps/x86_64/multiarch/ifunc-defines.sym: Likewise.
	* sysdeps/x86/Makefile (gen-as-const-headers): Remove
	rtld-global-offsets.sym.
	* sysdeps/x86_64/multiarch/ifunc-defines.sym: Merged with ...
	* sysdeps/x86/cpu-features-offsets.sym: This.
	* sysdeps/x86/cpu-features.h: Include <cpu-features-offsets.h>
	instead of <ifunc-defines.h> and <rtld-global-offsets.h>.
2016-05-11 05:51:39 -07:00
Florian Weimer
8db2cf163e getaddrinfo: Restore RES_USE_INET6 flag on error path [BZ #19994] 2016-05-10 10:09:24 +02:00
Stefan Liebler
b91a333ecb S390: Add support for vdso getcpu symbol.
This patch adds support for symbol __kernel_getcpu in vDSO,
which is available with kernel 4.5.
Now sched_getcpu is using this symbol if available in mapped vDSO
by defining macro HAVE_GETCPU_VSYSCALL. If not available at runtime,
the former syscall is used.
2016-05-09 11:05:45 +02:00
H.J. Lu
a9558b49b3 Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86
Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86.  No code changes on x86
and x86_64.

	* sysdeps/i386/cacheinfo.c: Include <sysdeps/x86/cacheinfo.c>
	instead of <sysdeps/x86_64/cacheinfo.c>.
	* sysdeps/x86_64/cacheinfo.c: Moved to ...
	* sysdeps/x86/cacheinfo.c: Here.
2016-05-08 08:49:18 -07:00
Samuel Thibault
04794f3e7e Revert "aio: fix newp->running data race"
This reverts commit fd67a9cf7b.
2016-05-04 15:52:30 +02:00
Samuel Thibault
fd67a9cf7b aio: fix newp->running data race
* sysdeps/pthread/aio_misc.c (__aio_enqueue_request): Do not write
	`running` field of `newp` when a thread was started to process it,
	since that thread will not take `__aio_requests_mutex`, and the field
	already has the proper value actually.
2016-05-04 15:14:29 +02:00
Gabriel F. T. Gomes
eb3b8a4924 powerpc: Fix operand prefixes
The file sysdeps/powerpc/sysdeps.h defines aliases for condition register
operands.  E.g.: 'cr7' means condition register 7.  On the one hand, this
increases readability, as it makes it easier for readers to know whether the
operand is a condition register, a general purpose register or an immediate.
On the other hand, this permits that condition registers be written as if they
were general purpose, and vice-versa, thus reducing the readability of the
code.

This commit removes some of these unintentional misuses.

The changes have no effect on the final code.  Checked with objdump.
2016-05-04 09:14:52 -03:00
Florian Weimer
5171f3079f CVE-2016-1234: glob: Do not copy d_name field of struct dirent [BZ #19779]
Instead, we store the data we need from the return value of
readdir in an object of the new type struct readdir_result.
This type is independent of the layout of struct dirent.
2016-05-04 12:09:35 +02:00
Paul E. Murphy
cbc06bc486 powerpc: Add missing insn in swapcontext [BZ #20004]
A missing instruction was discovered in the compat version of
swapcontext while running the GCC test suite.
2016-05-03 10:45:51 -05:00
Adhemerval Zanella
230528c467 powerpc: Fix clone CLONE_VM compare
This patch fixes the clone CLONE_VM change from 0cb313f (BZ#19957)
where the commit changed the register that contains the save flags
argument to compare with (from r28 to r29).  This patch changes
back to correct register.

Tested on powerpc32 (thanks to Tulio Magno Quites Machado Filho).

	* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S (__clone): Fix
	flags CLONE_VM compare.
2016-05-02 17:44:00 -03:00
Andreas Schwab
8a9ea3ccc5 m68k: use large PIC model for gcrt1.o 2016-04-30 18:51:43 +02:00
Andreas Schwab
4816d802ff m68k: avoid local labels in symbol table 2016-04-30 18:50:39 +02:00
Adhemerval Zanella
0cb313f7cb Fix clone (CLONE_VM) pid/tid reset (BZ#19957)
As discussed in libc-alpha [1] current clone with CLONE_VM (without
CLONE_THREAD set) will reset the pthread pid/tid fields to -1.  The
issue is since memory is shared between the parent and child it will
clobber parent's cached pid/tid leading to internal inconsistencies
if the value is not restored.

And even it is restored it may lead to racy conditions when between
set/restore a thread might invoke pthread function that validate the
pthread with INVALID_TD_P/INVALID_NOT_TERMINATED_TD_P and thus get
wrong results.

As stated in BZ19957, previously reports of this behaviour was close
with EWONTFIX due the fact usage of clone outside glibc is tricky
since glibc requires consistent internal pthread, while using clone
directly may not provide it. However since now posix_spawn uses
clone (CLONE_VM) to fixes various issues related to previous vfork
usage this issue requires fixing.

The vfork implementation also does something similar, but instead
it negates and restores only the *pid* field and functions that
might access its value know to handle such case (getpid, raise
and pthread ones that uses INVALID_TD_P/INVALID_NOT_TERMINATED_TD_P
macros that check only *tid* field).  Also vfork does not call
__clone directly, instead calling either __NR_vfork or __NR_clone
directly.

So this patch removes this clone behavior by avoiding setting
the pthread pid/tid field for CLONE_VM. There is no need to
check for CLONE_THREAD, since the minimum supported kernel in all
architecture implies that CLONE_VM must be used with CLONE_THREAD,
otherwise clone returns EINVAL.

Instead of current approach of:

   int clone(int (*fn)(void *), void *child_stack, int flags, ...)
      [...]
      if (flags & CLONE_THREAD)
        goto do_syscall;
      pid_t new_value;
      if (flags & CLONE_VM)
        new_value = -1;
      else
        new_value = getpid ();
      THREAD_SETMEM (THREAD_SELF, pid, new_value);
      THREAD_SETMEM (THREAD_SELF, tid, new_value);

    do_syscall:
      [...]

The new approach uses:

   int clone(int (*fn)(void *), void *child_stack, int flags, ...)
      [...]
      if (flags & CLONE_VM)
        goto do_syscall;
      pid_t new_value = getpid ();
      THREAD_SETMEM (THREAD_SELF, pid, new_value);
      THREAD_SETMEM (THREAD_SELF, tid, new_value);

    do_syscall:
      [...]

It also removes the linux tst-getpid2.c test which expects the previous
behavior and instead add another clone test.

Tested on x86_64, i686, x32, powerpc64le, aarch64, armhf, s390, and
s390x. I also did limited check on mips32 and sparc64 (using the new
added test).

I also got reviews from both m68k, hppa, and tile.  So I presume for
these architecture the patch works.

The fixes for alpha, microblaze, sh, ia64, and nio2 have not been
tested.

[1] https://sourceware.org/ml/libc-alpha/2016-04/msg00307.html

	* sysdeps/unix/sysv/linux/Makefile [$(subdir) == nptl] (test): Remove
	tst-getpid2.
	(test): Add tst-clone2.
	* sysdeps/unix/sysv/linux/tst-clone2.c: New file.
	* sysdeps/unix/sysv/linux/aarch64/clone.S (__clone): Do not change
	pid/tid fields for CLONE_VM.
	* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/i386/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/mips/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/clone.S: Likewise.
	* sysdeps/unix/sysv/linux/tst-getpid2.c: Remove file.
2016-04-29 18:19:30 -03:00
Gabriel F. T. Gomes
72c11b353e powerpc: Zero pad using memset in strncpy/stpncpy
Call __memset_power8 to pad, with zeros, the remaining bytes in the
dest string on __strncpy_power8 and __stpncpy_power8.  This improves
performance when n is larger than the input string, giving ~30% gain for
larger strings without impacting much shorter strings.
2016-04-29 10:05:33 -03:00
Florian Weimer
4ab2ab03d4 CVE-2016-3706: getaddrinfo: stack overflow in hostent conversion [BZ #20010]
When converting a struct hostent response to struct gaih_addrtuple, the
gethosts macro (which is called from gaih_inet) used alloca, without
malloc fallback for large responses.  This commit changes this code to
use calloc unconditionally.

This commit also consolidated a second hostent-to-gaih_addrtuple
conversion loop (in gaih_inet) to use the new conversion function.
2016-04-29 10:35:34 +02:00
Stefan Liebler
b06549a5e6 Add missing iucv related defines.
this patch adds the missing SOL_IUCV socket level definition
and socket options SO_IPRMDATA_MSG, SO_MSGLIMIT, SO_MSGSIZE
which can be used with get/setsockopt().
SCM_IUCV_TRGCLS is needed to send/receive ancillary data with send/recvmsg().

The defines are copied from kernel-source:
include/net/iucv/af_iucv.h
include/linux/socket.h
2016-04-27 09:08:29 +02:00
Paul E. Murphy
8f1b841e45 powerpc: Add optimized strcspn for P8
A few minor adjustments to the P8 strspn gives us
an almost equally optimized P8 strcspn.
2016-04-25 09:11:02 -05:00
Chung-Lin Tang
a5507dfa60 Fix stdlib/tst-makecontext regression for Nios II 2016-04-25 00:08:17 -07:00
Rajalakshmi Srinivasaraghavan
e413b14e18 powerpc: strcasestr optmization for power8
This patch optimizes strcasestr function for power >= 8 systems.  The average
improvement of this optimization is ~40% and compares 16 bytes at a time
using vector instructions.  This patch is tested on powerpc64 and powerpc64le.
2016-04-22 19:23:13 +05:30
Samuel Thibault
6f8222a1c5 Fix gprof timing
* sysdeps/mach/hurd/profil.c (__profile_frequency): Return tick
	frequency instead of tick length in us.
2016-04-19 23:27:27 +02:00
Samuel Thibault
593285ac15 hurd: fix profiling short-living processes
* sysdeps/mach/hurd/profil.c (update_waiter): Initialize
	profil_reply_port.
	(profile_waiter): Do not initialize profil_reply_port.
2016-04-19 00:54:24 +02:00
Carlos Eduardo Seo
1b045ee53e powerpc: Optimization for strlen for POWER8.
This implementation takes advantage of vectorization to improve performance of
the loop over the current strlen implementation for POWER7.
2016-04-15 17:19:19 -03:00
H.J. Lu
2e2d9796da Detect Intel Goldmont and Airmont processors
Updated from the model numbers of Goldmont and Airmont processors in
Intel64 And IA-32 Processor Architectures Software Developer's Manual
Volume 3 Revision 058.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Detect Intel
	Goldmont and Airmont processors.
2016-04-15 05:23:06 -07:00
Adhemerval Zanella
41e77f36d4 Fix pread consolidation on ports that require argument alignment
This patch fixes the __ALIGNMENT_{ARG,COUNT} definition for ports that
define __ASSUME_ALIGNED_REGISTER_PAIRS by including the kernel-features.h
(where it is defined if the case).

This was shown on arm with failing cases:

FAIL: debug/tst-chk1
FAIL: debug/tst-chk2
FAIL: debug/tst-chk3
FAIL: debug/tst-chk4
FAIL: debug/tst-chk5
FAIL: debug/tst-chk6
FAIL: debug/tst-lfschk1
FAIL: debug/tst-lfschk2
FAIL: debug/tst-lfschk3
FAIL: debug/tst-lfschk4
FAIL: debug/tst-lfschk5
FAIL: debug/tst-lfschk6
FAIL: posix/tst-preadwrite
FAIL: posix/tst-preadwrite64

The patches fixes it.  Tested on armhf.

	* sysdeps/unix/sysv/linux/sysdep.h: Include kernel-features.h.
2016-04-14 16:49:40 -03:00
Florian Weimer
ae9e94e744 malloc: Remove unused definitions of thread_atfork, thread_atfork_static 2016-04-14 09:17:36 +02:00
Florian Weimer
29d794863c malloc: Run fork handler as late as possible [BZ #19431]
Previously, a thread M invoking fork would acquire locks in this order:

  (M1) malloc arena locks (in the registered fork handler)
  (M2) libio list lock

A thread F invoking flush (NULL) would acquire locks in this order:

  (F1) libio list lock
  (F2) individual _IO_FILE locks

A thread G running getdelim would use this order:

  (G1) _IO_FILE lock
  (G2) malloc arena lock

After executing (M1), (F1), (G1), none of the threads can make progress.

This commit changes the fork lock order to:

  (M'1) libio list lock
  (M'2) malloc arena locks

It explicitly encodes the lock order in the implementations of fork,
and does not rely on the registration order, thus avoiding the deadlock.
2016-04-14 09:17:02 +02:00
Florian Weimer
b49ab5f450 Remove union wait [BZ #19613]
The overloading approach in the W* macros was incompatible with
integer expressions of a type different from int.  Applications
using union wait and these macros will have to migrate to the
POSIX-specified int status type.
2016-04-14 08:54:57 +02:00
Andreas Schwab
b4bcb3aec6 Register extra test objects
This makes sure that the extra test objects are compiled with the correct
MODULE_NAME and dependencies are tracked.
2016-04-13 17:07:13 +02:00
H.J. Lu
a057f5f8cd X86-64: Use non-temporal store in memcpy on large data
The large memcpy micro benchmark in glibc shows that there is a
regression with large data on Haswell machine.  non-temporal store in
memcpy on large data can improve performance significantly.  This
patch adds a threshold to use non temporal store which is 6 times of
shared cache size.  When size is above the threshold, non temporal
store will be used, but avoid non-temporal store if there is overlap
between destination and source since destination may be in cache when
source is loaded.

For size below 8 vector register width, we load all data into registers
and store them together.  Only forward and backward loops, which move 4
vector registers at a time, are used to support overlapping addresses.
For forward loop, we load the last 4 vector register width of data and
the first vector register width of data into vector registers before the
loop and store them after the loop.  For backward loop, we load the first
4 vector register width of data and the last vector register width of
data into vector registers before the loop and store them after the loop.

	[BZ #19928]
	* sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold):
	New.
	(init_cacheinfo): Set __x86_shared_non_temporal_threshold to 6
	times of shared cache size.
	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S
	(VMOVNT): New.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
	(VMOVNT): Likewise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
	(VMOVNT): Likewise.
	(VMOVU): Changed to movups for smaller code sizes.
	(VMOVA): Changed to movaps for smaller code sizes.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update
	comments.
	(PREFETCH): New.
	(PREFETCH_SIZE): Likewise.
	(PREFETCHED_LOAD_SIZE): Likewise.
	(PREFETCH_ONE_SET): Likewise.
	Rewrite to use forward and backward loops, which move 4 vector
	registers at a time, to support overlapping addresses and use
	non temporal store if size is above the threshold and there is
	no overlap between destination and source.
2016-04-12 08:10:47 -07:00
Matthew Fortune
b39d84adff VDSO support for MIPS
This patch adds support for using the implementations of gettimeofday()
and clock_gettime() provided by the kernel in the VDSO. The VDSO will
always provide clock_gettime() as CLOCK_{REALTIME,MONOTONIC}_COARSE can
be implemented regardless of platform. CLOCK_{REALTIME,MONOTONIC}, along
with gettimeofday(), are only implemented on platforms which make use of
either the CP0 count or GIC as their clocksource. On other platforms,
the VDSO does not provide the __vdso_gettimeofday symbol, as it is
never useful.

The VDSO functions return ENOSYS when they encounter an unsupported
request, in which case glibc should fall back to the standard syscall.

Tested with upstream kernel 4.5 and QEMU emulating Malta.

./vdsotest gettimeofday bench
gettimeofday: syscall: 1021 nsec/call
gettimeofday:    libc: 262 nsec/call
gettimeofday:    vdso: 174 nsec/call

	* sysdeps/unix/sysv/linux/mips/Makefile (sysdep_routines):
	Include dl-vdso.
	* sysdeps/unix/sysv/linux/mips/Versions: Add
	__vdso_clock_gettime.
	* sysdeps/unix/sysv/linux/mips/init-first.c: New file.
	* sysdeps/unix/sysv/linux/mips/libc-vdso.h: New file.
	* sysdeps/unix/sysv/linux/mips/mips32/sysdep.h:
	(INTERNAL_VSYSCALL_CALL): Define to be compatible with MIPS
	definitions of INTERNAL_SYSCALL_{ERROR_P,ERRNO}.
	(HAVE_CLOCK_GETTIME_VSYSCALL): Define.
	(HAVE_GETTIMEOFDAY_VSYSCALL): Define.
	* sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/n64/sysdep.h: Likewise.
2016-04-12 11:05:13 +01:00
Adhemerval Zanella
071af4769f Consolidate pwrite/pwrite64 implementations
This patch consolidates all the pwrite/pwrite64 implementation for Linux
in only one (sysdeps/unix/sysv/linux/pwrite{64}.c).  It also removes the
syscall from the auto-generation using assembly macros.

For pwrite{64} offset argument placement the new SYSCALL_LL{64} macro
is used.  For pwrite ports that do not define __NR_pwrite will use
__NR_pwrite64 and for pwrite64 ports that dot define __NR_pwrite64 will
use __NR_pwrite for the syscall.

Checked on x86_64, x32, i386, aarch64, and ppc64le.

	* sysdeps/unix/sysv/linux/arm/pwrite.c: Remove file.
	* sysdeps/unix/sysv/linux/arm/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/pwrite.c: Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/pwrite.c: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/wordsize-64/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (prite): Remove
	syscalls generation.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h
	[__NR_pwrite64] (__NR_write): Remove define.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h
	[__NR_pwrite64] (__NR_write): Remove define.
	* sysdeps/unix/sysv/linux/pwrite.c [__NR_pwrite64] (__NR_pwrite):
	Remove define.
	(__libc_pwrite): Use SYSCALL_LL macro on offset argument.
	* sysdeps/unix/sysv/linux/pwrite64.c [__NR_pwrite64] (__NR_pwrite):
	Remove define.
	(__libc_pwrite64): Use SYSCALL_LL64 macro on offset argument.
	* sysdeps/unix/sysv/linux/sh/pwrite.c: Rewrite using default
	Linux implementation as base.
	* sysdeps/unix/sysv/linux/sh/pwrite64.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/pwrite.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/pwrite64.c: Likewise.
2016-04-11 10:08:01 -03:00
Adhemerval Zanella
77a4fbd536 Consolidate pread/pread64 implementations
This patch consolidates all the pread/pread64 implementation for Linux
in only one (sysdeps/unix/sysv/linux/pread.c).  It also removes the
syscall from the auto-generation using assembly macros.

For pread{64} offset argument placement the new SYSCALL_LL{64} macro
is used.  For pread ports that do not define __NR_pread will use
__NR_pread64 and for pread64 ports that dot define __NR_pread64 will
use __NR_pread for the syscall.

Checked on x86_64, x32, i386, aarch64, and ppc64le.

	* sysdeps/unix/sysv/linux/arm/pread.c: Remove file.
	* sysdeps/unix/sysv/linux/arm/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/pread.c: Likewise.
	* sysdeps/unix/sysv/linux/generic/wordsize-32/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/pread.c: Likewise,
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/wordsize-64/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (pread): Remove
	syscall generation.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/sysdep.h
	[__NR_pread64] (__NR_pread): Remove define.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/sysdep.h:
	[__NR_pread64] (__NR_pread): Likewise.
	* sysdeps/unix/sysv/linux/pread.c [__NR_pread64] (__NR_pread): Remove
	define.
	(__libc_pread): Use SYSCALL_LL macro on offset argument.
	* sysdeps/unix/sysv/linux/pread64.c [__NR_pread64] (__NR_pread):
	Remove define.
	(__libc_pread64): Use SYSCALL_LL64 macro on offset argument.
	* sysdeps/unix/sysv/linux/sh/pread.c: Rewrite using default
	Linux implementation as base.
	* sysdeps/unix/sysv/linux/sh/pread64.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/pread.c: Likewise.
	* sysdeps/unix/sysv/linux/mips/pread64.c: Likewise.
2016-04-11 10:08:01 -03:00
Adhemerval Zanella
eeddfa91cb Consolidate off_t/off64_t syscall argument passing
This patch add three new macros (SYSCALL_LL, SYSCALL_LL64, and
__ASSUME_WORDSIZE64_ILP32) to use along with off_t and off64_t argument
syscalls.  The rationale for this change is:

1. Remove multiple implementations for the same syscall for different
   architectures (for instance, pread have 6 different implementations).

2. Also remove the requirement to use syscall wrappers for cancellable
   entrypoints.

The macro usage should be used along __ALIGNMENT_ARG to follow ABI constrains
for architecture where it applies.  For instance, pread can be rewritten as:

  return SYSCALL_CANCEL (pread, fd, buf, count,
                         __ALIGNMENT_ARG SYSCALL_LL (offset));

Another macro, SYSCALL_LL64, is provided for off64_t.  The macro
__ASSUME_WORDSIZE64_ILP32 is used by the ABI to define is uses 64-bit register
even if ABI is ILP32 (for instance x32 and mips64-n32).

The changes itself are not currently used in any implementation, so no
code change is expected.

	* sysdeps/unix/sysv/linux/generic/sysdep.h (__ALIGNMENT_ARG): Move
	definition.
	(__ALIGNMENT_COUNT): Likewise.
	* sysdeps/unix/sysv/linux/sysdep.h (__ALIGNMENT_ARG): To here.
	(__ALIGNMENT_COUNT): Likewise.
	(SYSCALL_LL): New define.
	(SYSCALL_LL64): Likewise.
	* sysdeps/unix/sysv/linux/mips/kernel-features.h:
	[_MIPS_SIM == _ABIO32] (__ASSUME_WORDSIZE64_ILP32): Define.
	* sysdeps/unix/sysv/linux/x86_64/kernel-features.h:
	[ILP32] (__ASUME_WORDSIZE64_ILP32): Likewise.
2016-04-11 10:07:53 -03:00
Adhemerval Zanella
482b2f87a8 Define __ASSUME_ALIGNED_REGISTER_PAIRS for missing ports
This patch defines __ASSUME_ALIGNED_REGISTER_PAIRS for the missing
ports that require 64-bit value (e.g., long long) to be aligned to
an even register pair in argument passing.

No code change is expected, tested with builds for powerpc32,
mips-o32, and armhf.

	* sysdeps/unix/sysv/linux/arm/kernel-features.h
	(__ASSUME_ALIGNED_REGISTER_PAIRS): Define.
	* sysdeps/unix/sysv/linux/mips/kernel-features.h
	[_MIPS_SIM == _ABIO32] (__ASSUME_ALIGNED_REGISTER_PAIRS): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/kernel-features.h
	[!__powerpc64__] (__ASSUME_ALIGNED_REGISTER_PAIRS): Likewise.
2016-04-11 09:15:11 -03:00