The optimization is achieved by following techniques:
> data alignment [gain from aligned memory access on read/write]
> POWER7 gains performance with loop unrolling/unwinding
[gain by reduction of branch penalty].
> zero padding done by calling optimized memset
This patch changes de default symbol redirection for internal call of
memcpy, memset, memchr, and strlen to the IFUNC resolved ones. The
performance improvement is noticeable in algorithms that uses these
symbols extensible, like the regex functions.
In 7447ccd98e, the XDR currency was
removed from locale/iso-4217.def, despite the fact that it's both
still a part of the standard, according to the official table:
http://www.currency-iso.org/dam/downloads/table_a1.xml
... and, more importantly, is referenced from localedata/i18n, so
any quick-and-dirty locale definition that uses "copy i18n" for
LC_MONETARY wouldn't work anymore.
Define FEATURE_INDEX_1 and FEATURE_INDEX_MAX as macros
for use by both assembly and C code. This fixes the
-Wundef error for cases where FEATURE_INDEX_1 was not
defined but used the correct value of 0 for an undefined
macro.
[BZ #16885]
* sysdeps/sparc/sparc64/strcmp.S: Fix end comparison handling when
multiple zero bytes exist at the end of a string.
Reported by Aurelien Jarno <aurelien@aurel32.net>
* string/test-strcmp.c (check): Add explicit test for situations where
there are multiple zero bytes after the first.
lowlevellock.c for arm differs from the generic lowlevellock.c only in
insignificant ways, so can be removed. Happily, this fixes BZ 15119
(unnecessary busy loop in __lll_timedlock_wait on arm).
The notable differences between the arm and generic implementations are:
1) arm __lll_timedlock_wait has a fast path out if futex has been set
to 0 between since the function was called. This seems unlikely to
happen very often, so it seems at worst harmless to lose this fast
path.
2) Some function in arm's lowlevellock.c set futex to 2 if it was 1.
The generic version always sets the futex to 2. As futex can only be
0, 1 or 2 on entry into these functions, the behaviour is equivalent.
(If the futex manages to be 0 on entry then we've just lost another
unlikely fast path out.)
There are no test suite regressions.
Note that hppa and sparc also have their own lowlevellock.c. I believe
hppa can also be removed, so I'll send a separate patch for that
shortly. sparc's seems to be genuinely needed as it uses a different
locking structure.
Also note that the analysis at
https://sourceware.org/ml/libc-ports/2013-02/msg00021.html indicates a
further locking performance bug to fix - I've got a partial patch for
that which I can submit once I've finished testing.
2014-05-01 Bernard Ogden <bernie.ogden@linaro.org>
[BZ #15119]
* sysdeps/unix/sysv/linux/arm/nptl/lowlevellock.c: Remove file.
This patch fixes what I believe to be a bug in the handling of
R_ARM_IRELATIVE RELA relocations. At present, these are handled the
same as REL relocations: i.e. the addend is loaded from the relocation
address. Most of the time this isn't a problem because RELA relocations
aren't used on ARM (GNU/Linux at least) anyway, but it causes problems
with prelink, which uses RELA on all targets for its conflict table.
(Support for ifunc prelinking requires a prelink patch, not yet posted.)
Anyway, this patch works, though I'm not 100% sure if it is correct: I
notice that this code path received attention last year:
https://sourceware.org/ml/libc-ports/2013-07/msg00000.html
I'm not sure under what circumstances that patch would have had an
effect, nor if my patch conflicts with that case.
No regressions using Mentor's usual glibc cross-testing infrastructure.
[BZ #16888]
* sysdeps/arm/dl-machine.h (elf_machine_rela): Fix R_ARM_IRELATIVE
handling.
This patch increases the minimum Linux kernel version for glibc to
2.6.32, as discussed in the thread starting at
<https://sourceware.org/ml/libc-alpha/2014-01/msg00511.html>.
This patch just does the minimal change to arch_minimum_kernel
settings (and LIBC_LINUX_VERSION, which determines the minimum kernel
headers version, as it doesn't make sense for that to be older than
the minimum kernel that can be used at runtime). Followups would be
expected to do, roughly and not necessarily precisely in this order:
* Remove __LINUX_KERNEL_VERSION checks in kernel-features.h files
where those checks are always true / always false for kernels 2.6.32
and above.
* Otherwise simplify/improve conditionals in those files (for example,
where defining once in the main file then undefining in
architecture-specific files makes things clearer than having lots of
separate definitions of the same macro), possibly fixing in the
process cases where a macro should optimally have been defined for a
given architecture but wasn't. (In the review in preparation for
this version increase I checked what the right conditions should be
for all macros in the main kernel-features.h whose definitions there
would have been affected by the increase - but I only fixed that
subset of the issues found where --enable-kernel=2.6.32 would have
caused a kernel feature to be wrongly assumed to be present, not any
cases where a feature is not assumed but could be assumed.)
* Remove conditionals on __ASSUME_* where they can now be taken to be
always-true, and the definitions when the macros are only used in
Linux-specific files.
* Split more architectures out of the main kernel-features.h (like
ex-ports architectures), once various of the architecture
conditionals there have been eliminated so the new
architecture-specific files are no larger than actually necessary.
Tested x86_64.
2014-03-27 Joseph Myers <joseph@codesourcery.com>
[BZ #9894]
* sysdeps/unix/sysv/linux/configure.ac (LIBC_LINUX_VERSION):
Change to 2.6.32.
(arch_minimum_kernel): Change all 2.6.16 settings to 2.6.32.
* sysdeps/unix/sysv/linux/configure: Regenerated.
* sysdeps/unix/sysv/linux/microblaze/configure.ac: Remove file.
* sysdeps/unix/sysv/linux/microblaze/configure: Likewise.
* sysdeps/unix/sysv/linux/tile/configure.ac: Likewise.
* sysdeps/unix/sysv/linux/tile/configure: Likewise.
* README: Update reference to required Linux kernel version.
* manual/install.texi (Linux): Update reference to required Linux
kernel headers version.
* INSTALL: Regenerated.
Continuing the series of patches to clean up conformtest expectations
for "POSIX" (1995/6) based on review of the expectations against the
standard, this patch cleans up expectations for stdlib.h and
string.h. Tested x86_64; no new XFAILs needed.
* conform/data/stdlib.h-data [POSIX] (stddef.h): Do not allow
header inclusion.
[POSIX] (limits.h): Likewise.
[POSIX] (math.h): Likewise.
[POSIX] (sys/wait.h): Likewise.
* conform/data/string.h-data [POSIX || UNIX98] (strtok_r): Require
function.
[POSIX] (stddef.h): Do not allow header inclusion.
lll_unlock() will be called again if it goes to "wake_all" in
pthread_cond_broadcast(). This may make another thread which is
waiting for lock in pthread_cond_timedwait() unlock. So there are
more than one threads get the lock, it will break the shared data.
It's introduced by commit 8313cb997d2d("FUTEX_*_REQUEUE_PI support for
non-x86 code")
The datahead structure has an unused padding field that remains
uninitialized. Valgrind prints out a warning for it on querying a
netgroups entry. This is harmless, but is a potential data leak since
it would result in writing out an uninitialized byte to the cache
file. Besides, this happens only when there is a cache miss, so we're
not adding computation to any fast path.
This patch consolidates the code to initialize the header of a dataset
into a single set of functions (one for positive and another for
negative datasets) primarily to reduce repetition of code. The
secondary reason is to simplify Patch 2/2 which fixes the problem of
an uninitialized byte in the header by initializing an unused field in
the structure and hence preventing a possible data leak into the cache
file.
[Fixes BZ #14308, #12994, #13651]
AF_UNSPEC results in sending two queries in parallel, one for the A
record and the other for the AAAA record. If one of these is a
referral, then the query fails, which is wrong. It should return at
least the one successful response.
The fix has two parts. The first part makes the referral fall back to
the SERVFAIL path, which results in using the successful response.
There is a bug in that path however, due to which the second part is
necessary. The bug here is that if the first response is a failure
and the second succeeds, __libc_res_nsearch does not detect that and
assumes a failure. The case where the first response is a success and
the second fails, works correctly.
This condition is produced by buggy routers, so here's a crude
interposable library that can simulate such a condition. The library
overrides the recvfrom syscall and modifies the header of the packet
received to reproduce this scenario. It has two key variables:
mod_packet and first_error.
The mod_packet variable when set to 0, results in odd packets being
modified to be a referral. When set to 1, even packets are modified
to be a referral.
The first_error causes the first response to be a failure so that a
domain-appended search is performed to test the second part of the
__libc_nsearch fix.
The driver for this fix is a simple getaddrinfo program that does an
AF_UNSPEC query. I have omitted this since it should be easy to
implement.
I have tested this on x86_64.
The interceptor library source:
/* Override recvfrom and modify the header of the first DNS response to make it
a referral and reproduce bz #845218. We have to resort to this ugly hack
because we cannot make bind return the buggy response of a referral for the
AAAA record and an authoritative response for the A record. */
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdbool.h>
#include <endian.h>
#include <dlfcn.h>
#include <stdlib.h>
/* Lifted from resolv/arpa/nameser_compat.h. */
typedef struct {
unsigned id :16; /*%< query identification number */
#if BYTE_ORDER == BIG_ENDIAN
/* fields in third byte */
unsigned qr: 1; /*%< response flag */
unsigned opcode: 4; /*%< purpose of message */
unsigned aa: 1; /*%< authoritive answer */
unsigned tc: 1; /*%< truncated message */
unsigned rd: 1; /*%< recursion desired */
/* fields
* in
* fourth
* byte
* */
unsigned ra: 1; /*%< recursion available */
unsigned unused :1; /*%< unused bits (MBZ as of 4.9.3a3) */
unsigned ad: 1; /*%< authentic data from named */
unsigned cd: 1; /*%< checking disabled by resolver */
unsigned rcode :4; /*%< response code */
#endif
#if BYTE_ORDER == LITTLE_ENDIAN || BYTE_ORDER == PDP_ENDIAN
/* fields
* in
* third
* byte
* */
unsigned rd :1; /*%< recursion desired */
unsigned tc :1; /*%< truncated message */
unsigned aa :1; /*%< authoritive answer */
unsigned opcode :4; /*%< purpose of message */
unsigned qr :1; /*%< response flag */
/* fields
* in
* fourth
* byte
* */
unsigned rcode :4; /*%< response code */
unsigned cd: 1; /*%< checking disabled by resolver */
unsigned ad: 1; /*%< authentic data from named */
unsigned unused :1; /*%< unused bits (MBZ as of 4.9.3a3) */
unsigned ra :1; /*%< recursion available */
#endif
/* remaining
* bytes
* */
unsigned qdcount :16; /*%< number of question entries */
unsigned ancount :16; /*%< number of answer entries */
unsigned nscount :16; /*%< number of authority entries */
unsigned arcount :16; /*%< number of resource entries */
} HEADER;
static int done = 0;
/* Packets to modify. 0 for the odd packets and 1 for even packets. */
static const int mod_packet = 0;
/* Set to true if the first request should result in an error, resulting in a
search query. */
static bool first_error = true;
static ssize_t (*real_recvfrom) (int sockfd, void *buf, size_t len, int flags,
struct sockaddr *src_addr, socklen_t *addrlen);
void
__attribute__ ((constructor))
init (void)
{
real_recvfrom = dlsym (RTLD_NEXT, "recvfrom");
if (real_recvfrom == NULL)
{
printf ("Failed to get reference to recvfrom: %s\n", dlerror ());
printf ("Cannot simulate test\n");
abort ();
}
}
/* Modify the second packet that we receive to set the header in a manner as to
reproduce BZ #845218. */
static void
mod_buf (HEADER *h, int port)
{
if (done % 2 == mod_packet || (first_error && done == 1))
{
printf ("(Modifying header)");
if (first_error && done == 1)
h->rcode = 3;
else
h->rcode = 0; /* NOERROR == 0. */
h->ancount = 0;
h->aa = 0;
h->ra = 0;
h->arcount = 0;
}
done++;
}
ssize_t
recvfrom (int sockfd, void *buf, size_t len, int flags,
struct sockaddr *src_addr, socklen_t *addrlen)
{
ssize_t ret = real_recvfrom (sockfd, buf, len, flags, src_addr, addrlen);
int port = htons (((struct sockaddr_in *) src_addr)->sin_port);
struct in_addr addr = ((struct sockaddr_in *) src_addr)->sin_addr;
const char *host = inet_ntoa (addr);
printf ("\n*** From %s:%d: ", host, port);
mod_buf (buf, port);
printf ("returned %zd\n", ret);
return ret;
}
This patch optimizes the FPSCR update on exception and rounding change
functions by just updating its value if new value if different from
current one. It also optimizes fedisableexcept and feenableexcept by
removing an unecessary FPSCR read.
__int128 was added in GCC 4.6 and __int128_t was added before x86-64
was supported. This patch replaces __int128 with __int128_t so that
the installed bits/link.h can be used with older GCC.
* sysdeps/x86/bits/link.h (La_x86_64_regs): Replace __int128
with __int128_t.
(La_x86_64_retval): Likewise.