These macros often set up a variable that later macros sometimes do
not use. Add unused attribute to avoid that.
Similarly, the ia64 code tends to check the err field rather than
the val (which is opposite of most arches) leading to the same
kind of warning. Replace this with a dummy reference.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The current code declares double constants by using a char buffer and
then casting the pointer to a different type. This makes the aliasing
logic unhappy. Change it to use a union instead to avoid that.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Function pointers on ia64 are like parisc -- they're plabels. While
the parisc port enjoys a gcc builtin for extracting the address here,
ia64 has no such luck.
Casting & dereferencing in one go triggers a strict aliasing warning.
Use a union to fix that.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The ia64_rse_is_rnat_slot func expects an unsigned pointer, but we're
passing in a signed pointer. The signness doesn't matter here, so
convert it to unsigned.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The strcpy and strchr (and related) functions are four times faster
than the byte-by-byte default versions.
The strlen function is twice as fast for long strings and 50% faster
for short strings over the armv4 version.
* sysdeps/unix/sysv/linux/bits/mman-linux.h (MAP_ANONYMOUS):
Allow definition via __MAP_ANONYMOUS.
* sysdeps/unix/sysv/linux/mips/bits/mman.h: Remove all defines
provided by bits/mman-linux.h and include <bits/mman-linux.h>.
(__MAP_ANONYMOUS): Define.
Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
GPL 3, but tested with the GMP testsuite.
This is 250% faster than the generic code as measured on Cortex-A15,
and the same speed as GMP on the same core, and probably everywhere.
Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
GPL 3, but tested with the GMP testsuite.
This is 50% faster than the generic code as measured on Cortex-A15.
It is 25% slower than the current GMP routine on the same core.
Written from scratch rather than copied from GMP, due to LGPL 2.1 vs
GPL 3, but tested with the GMP testsuite.
This is 25% faster than the generic code as measured on Cortex-A15,
and the same speed as GMP on the same core. It's probably slower
than GMP on the A8 and A9 cores though.
There was only one user. It's "condition" argument was used
for "ia" rather than an actual condition. The apcs26 syntax
is almost certainly not needed, given current binutils requirements.
For arm this makes no difference--the result is bit-for-bit identical;
for thumb this results in smaller encodings. Perhaps it ought not and
this is in fact an assembler bug, but I also think it's clearer.
The preceeding patches have allowed for the few incompatibilities
between arm and thumb2 mode, or have marked the file as not wanting
to use thumb2 mode.
Factor out the sequence needed to call kuser_get_tls, as we can't
play subtract into pc games in thumb mode. Prepare for hard-tp,
pulling the save of LR into the macro.
There are several places in which we access negative offsets from
the thread-pointer, but thumb2 only supports positive offsets in
memory references.
Avoid duplicating the rather large macros in which these references
are embedded by abstracting out the operation.
Some routines are written with complex LDM/STM insns that cannot be
used in thumb mode, or are highly conditional requiring excessive
IT insns.
When a future patch goes in to enable thumb2 by default, this marker
will be used to override that default.
New defines from gcc 4.8:
#define __ARM_ARCH_ISA_ARM 1
#define __ARM_ARCH_PROFILE 65
#define __ARM_ARCH_ISA_THUMB 2
#define __ARM_ARCH 7
all of which got in the way of the one we wanted:
#define __ARM_ARCH_7A__ 1
This feature is specifically for the C++ compiler to offload calling
thread_local object destructors on thread program exit, to glibc.
This is to overcome the possible complication of destructors of
thread_local objects getting called after the DSO in which they're
defined is unloaded by the dynamic linker. The DSO is marked as
'unloadable' if it has a constructed thread_local object and marked as
'unloadable' again when all the constructed thread_local objects
defined in it are destroyed.
There hasn't been a use for lll_unlock_wake_cb since it was
removed globally in 2007-05-29. This patch removes the
function from hppa's lowlevellock.[ch] implementation.
ARM now supports loading unmarked objects from
the dynamic loader cache. Unmarked objects can
be used with the hard-float or soft-float ABI.
We must support loading unmarked objects during
the transition period from a binutils that does
not mark objects to one that does mark them with
the correct ELF flags.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
That convention requires the instruction immediately preceding SYSCALL
to initialize $v0 with the syscall number. Then if a restart triggers,
$v0 will have been clobbered by the syscall interrupted, and needs to be
reinititalized. The kernel will decrement the PC by 4 before switching
back to the user mode so that $v0 has been reloaded before SYSCALL is
executed again. This implies the place $v0 is loaded from must be
preserved across a syscall, e.g. an immediate, static register, stack
slot, etc.
The restriction was lifted with Linux 2.6.36 kernel release and no
special requirements are placed around the SYSCALL instruction anymore,
however we still support older kernel binaries.
Previously, we would see a bad frame in the gdb backtrace output, e.g.:
(gdb) bt
#0 foo () at foo.c:5
#1 0x000000aaaab68ee8 in start_thread () from /lib/libpthread.so.0
#2 0x000000aaaad01c88 in clone () from /lib/libc.so.6
#3 0x0000000000000000 in ?? ()
With this change the bogus frame #3 is gone and we have the
same output as x86 does for the same program.
In bdd7898a58 we added self-definitions
of __isnan and friends in order to indicate specialized architecture
support, and avoid redefinitions within various generic math_private.h.
There is no generic math_private.h that concerns ldbl-128, and while
we provide __isnanl in the alpha math_private.h there's no need to
protect the function against redefinition.
In bdd7898a58 we added self-definitions
of __isnan and friends in order to indicate specialized architecture
support, and avoid redefinitions within various generic math_private.h.
There is no generic math_private.h that concerns ldbl-128, and while
we provide __isnanl in the alpha math_private.h there's no need to
protect the function against redefinition.
* sysdeps/unix/sysv/linux/aarch64/ldconfig.h: Add entries
for /lib/ld-linux.so.3 and /lib/ld-linux-armhf.so.3.
Signed-off-by: Steve McIntyre <steve.mcintyre@linaro.org>
Since we no longer support __ASSUME_POSIX_CPU_TIMERS, the ia64 code
no longer needs to override HAS_CPUCLOCK in the common file. Drop
the ia64 shim as well.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Use the new FLAG_AARCH64_LIB64 ldconfig cache tag for AArch64,
similarly to the way tags are handled for other architectures.
Signed-off-by: Steve McIntyre <steve.mcintyre@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
Signed-off-by: Marcus Shawcroft <marcus.shawcroft@linaro.org>
Atomic ops are issued directly from the core, rather than
potentially sitting in the write buffer, so can improve the
performance of other waiters. In addition, if we didn't end
up pulling a copy of the cache line where the lock is into cache,
by using an atomic op we don't have to acquire the cache line
before we can unlock.
With gcc 4.8 tilegx has support for -mcmodel=large, to tolerate very
large shared objects. This option changes the compiler output to
not include direct jump instructions, which have a range of only
2^30, i.e +/- 512MB. Instead the compiler marshalls the target PCs
into registers and then uses jump- or call-to-register instructions.
For glibc, the upshot is that we need to arrange for a few functions
to tolerate the possibility of a large range between the PC and
the target. In particular, the crti.S and start.S code needs
to be able to reach from .init to the PLT, as does gmon-start.c.
The elf-init.c code has the reverse problem, needing to call from
libc_nonshared.a (linked at the end of shared objects) back to the
_init section at the beginning.
No other functions in *_nonshared.a need to be built this way, as
they only call the PLT (or potentially each other), but all of that
code is linked at the very end of the shared object.
We don't build the standard -static archives with this option as the
performance cost is high enough and the use case is rare enough that
it doesn't seem worthwhile. Instead, we would encourage developers
who need the -static model with huge executables to build a private
copy of glibc and configure it with -mcmodel=large.
Note that libc.so et al don't need any changes; the only changes
are for code that is statically linked into user code built with
-mcmodel=large.
For the assembly code, I just rewrote it so that it unconditionally
uses the large model. To be able to pass -mcmodel=large to
csu/elf-init.c and csu/gmon-start.c, I need to check to see if the
compiler supports that flag, since gcc 4.7 doesn't; I added the
support by creating a small Makefile fragment that just runs the
compiler to check.
Normally, the simulator is notified of absolute pathnames by the
_dl_load_hook hook. However, when a relative pathname is used, the
simulator may not know that the relative path matches a path that
it could figure out in the file system that it has access to.
Instead we provide a simplified version of the realpath function
so we can pass a plausible absolute pathname to the simulator.
Since we're now doing more work at object load time, we also add
a guard so we do no work at all if we're not running on the simulator.
- Override <memcopy.h> so we use full 8-byte word copies on tilegx32
for memmove, then use op_t in memcpy instead of the previous
locally-defined word_t just to avoid proliferating identical types.
- Fix bug in memcpy prefetch that caused us to never prefetch past
the first cache line.
- Optimize misaligned memcpy by inlining _wordcopy_fwd_dest_aligned
instead of just doing a dumb word-at-a-time copy.
- Make memcpy safe for forward copies by doing all the loads from
a given cache line prior to doing a wh64 (cache line zero-fill)
on the destination. Remove now-redundant src == dst check.
- Copy and optimize the generic wordcopy.c routines to use the tile
"double align" instruction instead of the MERGE macro; to avoid
offset addressing mode (which tile doesn't have) by rewriting the
pointer math to load and store with a zero index; and to use
post-increment addresses in the inner loops to improve scheduling.