This patch adds an optimized memset implementation for POWER8. For
sizes from 0 to 255 bytes, a word/doubleword algorithm similar to
POWER7 optimized one is used.
For size higher than 255 two strategies are used:
1. If the constant is different than 0, the memory is written with
altivec vector instruction;
2. If constant is 0, dbcz instructions are used. The loop is unrolled
to clear 512 byte at time.
Using vector instructions increases throughput considerable, with a
double performance for sizes larger than 1024. The dcbz loops unrolls
also shows performance improvement, by doubling throughput for sizes
larger than 8192 bytes.
This patch cleanups the multiarch bzero for powerpc64 by remove
the multiarch objects and use instead the the memset embedded
implementation presented in each multiarch optimization. The
code generate is essentially the same, but the TB_TOCLESS (which
is not essential).
GCC 4.4, the minimum compiler version, supports this option. Unlike
other warnings, -Wimplicit-function-declaration warnings should be
independent of compiler versions, so this change should not cause
compiler-specific build failures.
Merge roland/nptl-hppa to master, update and test for hppa-linux-gnu.
This commit squashes and commits the work done by Roland McGrath on
roland/nptl-hppa to migrate hppa to the new non-addon NPTL. Some
additional tweaks were required for tcb-offsets.sym to work correctly
along with clone.S (unique to hppa).
Some types of relocations technically need to be signed rather than
unsigned: in particular ones that are used with moveli or movei,
or for jump and branch. This is almost never a problem. Jump and
branch opcodes are pretty much uniformly resolved by the static linker
(unless you omit -fpic for a shared library, which is not recommended).
The moveli and movei opcodes that need to be sign-extended generally
are for positive displacements, like the construction of the address of
main() from _start(). However, tst-pie1 ends up with main below _start
(in a different module) and the test failed due to signedness issues in
relocation handling.
This commit treats the value as signed when shifting (to preserve the
high bit) and also sign-extends the value generated from the updated
bundle when comparing with the desired bundle, which we do to make sure
no overflow occurred. As a result, the tst-pie1 test now passes.
generic HAVE_RM_CTX implementation which is used for ppc/e500 as well
has introduced calls to fegetenv which should be resolved internally
with in libm
Signed-off-by: Khem Raj <raj.khem@gmail.com>
* sysdeps/powerpc/powerpc32/e500/nofpu/fegetenv.c (fegetenv): Add
libm_hidden_ver.
If e.g. a signal is being received while we are running fork(), the signal
thread may be having our SS lock when we make the space copy, and thus in the
child we can not take the SS lock any more.
* sysdeps/mach/hurd/fork.c (__fork): Lock SS->lock around __proc_dostop call.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
TLS_INIT_TP in sysdeps/i386/nptl/tls.h uses some hand written asm to
generate a set_thread_area that might result in exchanging ebx and esp
around the syscall causing introspection tools like valgrind to loose
track of the user stack. Just use INTERNAL_SYSCALL which makes sure
esp isn't changed arbitrarily.
Before the patch the code would generate:
mov $0xf3,%eax
movl $0xfffff,0x8(%esp)
movl $0x51,0xc(%esp)
xchg %esp,%ebx
int $0x80
xchg %esp,%ebx
Using INTERNAL_SYSCALL instead will generate:
movl $0xfffff,0x8(%esp)
movl $0x51,0xc(%esp)
xchg %ecx,%ebx
mov $0xf3,%eax
int $0x80
xchg %ecx,%ebx
Thanks to Florian Weimer for analysing why the original code generated
the bogus esp usage:
_segdescr.desc happens to be at the top of the stack, so its address
is in %esp. The asm statement says that %3 is an input, so its value
will not change, and GCC can use %esp as the input register for the
expression &_segdescr.desc. But the constraints do not fully describe
the asm statement because the %3 register is actually modified, albeit
only temporarily.
[BZ #17319]
* sysdeps/i386/nptl/tls.h (TLS_INIT_TP): Use INTERNAL_SYSCALL
to call set_thread_area instead of hand written asm.
(__NR_set_thread_area): Removed define.
(TLS_FLAG_WRITABLE): Likewise.
(__ASSUME_SET_THREAD_AREA): Remove check.
(TLS_EBX_ARG): Remove define.
(TLS_LOAD_EBX): Likewise.
pthread_atfork is already built in an extra-libs context, which gives
it NOT_IN_libc in its CPPFLAGS. Adding the same definition to CFLAGS
is pointless.
Verified that the code is unchanged on x86_64.
These programs get the NOT_IN_libc twice, once through the 'other'
target and another explicitly. Remove the explicitly added CPFLAG.
* catgets/Makefile (CPPFLAGS-gencat): Remove.
* iconv/Makefile (CPPFLAGS-iconv_prog): Likewise.
(CPPFLAGS-iconvconfig): Likewise.
* timezone/Makefile (CPPFLAGS-zic): Likewise.
In my powerpc32 testing I've observed misc/test-gettimebasefreq
failing.
This is a glibc build (soft-float, though that's not relevant here)
without any --with-cpu and without any special configuration of the
default CPU for GCC either. In particular, it's one not using
sysdeps/powerpc/powerpc32/power4/hp-timing.h (although in fact the
processor I'm using for testing is POWER4-based), so hp_timing_t is
32-bit not 64-bit. But the VDSO call being used by
INTERNAL_VSYSCALL_NO_SYSCALL_FALLBACK is generating a 64-bit result
(high part in r3, low part in r4). The code extracting that result,
however, expects a result of the type hp_timing_t as passed to
INTERNAL_VSYSCALL_NO_SYSCALL_FALLBACK, meaning that only r3 (= 0) is
used and the value in r4 is ignored. This patch fixes this by always
using uint64_t as the type in INTERNAL_VSYSCALL_NO_SYSCALL_FALLBACK -
reflecting the actual ABI (unconditional in the kernel) of that VDSO
call. This is the minimal change for this issue - no check for
overflow, no change of the type of the timebase_freq variable or the
return type of __get_clockfreq to something other than hp_timing_t
(such a change would simply move the implicit conversions to the over
callers of that function), no change to hp_timing_t itself.
Tested for powerpc32 soft float.
[BZ #17263]
* sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c: Include
<stdint.h>.
(__get_clockfreq): Use uint64_t instead of hp_timing_t in
INTERNAL_VSYSCALL_NO_SYSCALL_FALLBACK call.
Since:
commit 409e00bd69
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jan 29 07:51:41 2014 -0800
Disable x87 inline functions for SSE2 math
When i386 and x86-64 mathinline.h was merged into a single mathinline.h,
"gcc -m32" enables x87 inline functions on x86-64 even when -mfpmath=sse
and SSE2 is enabled. It is a regression on x86-64. We should check
__SSE2_MATH__ instead of __x86_64__ when disabling x87 inline functions.
gcc-3.2 is unable to correctly compile x86_64 routines for llrint
since it gets redefined. This is because gcc 3.2 does not set
__SSE2_MATH__ for x86_64, thus exposing the duplicate definition.
The correct fix ought to be to check for both __SSE2_MATH__ and
__x86_64__ and enable those bits only when neither are defined.
Tested fix with the reproducer for
409e00bd69 as well as with gcc-3.2.
The compiler doesn't know that the cpuid asm statement in intel_check_word
will trash RBX. We are lucky that it doesn't cause any problems since
RBX is also used by compiler for other purposes so that RBX is saved and
restored. This patch replaces it with __cpuid_count.
[BZ #17259]
* sysdeps/x86_64/cacheinfo.c (intel_check_word): Replace cpuid
asm statement with __cpuid_count.
Older versions of ld on ia64 support __ehdr_start, but generate relocs
when they shouldn't. This causes the ld.so to not run because it tries
to resolve the __ehdr_start symbol (but it's not exported).
On powerpc, floating-point environment macros are defined as pointers
to constants in the library that contain the bit-patterns of the
desired environment, instead of being magic constants cast to pointer
type.
For soft-float, the bit-patterns used for fenv_t are not laid out the
same as for hard-float. (e500 has a third layout used; that's not an
ABI issue because these values are only meaningful within a single
process, all of whose glibc libraries must come from the same build of
glibc.) While the __fe_dfl_env value for soft-float was appropriate
for the soft-float fenv_t representation, the other two constants had
the same bit-patterns as for hard-float. Those bit patterns had the
effect of having exceptions already raised, causing
math/test-fenv-return to fail; this patch fixes the patterns used.
(__fe_nonieee_env also had exceptions unmasked, though they should be
masked to match hard-float semantics. Since there is no separate
non-IEEE mode for soft-float, it's most appropriate for
__fe_nonieee_env to be the same as __fe_dfl_env; this patch makes it
an alias.)
Tested for powerpc-nofpu.
[BZ #17261]
* sysdeps/powerpc/nofpu/fenv_const.c (__fe_enabled_env): Change
value to 0.
(__fe_nonieee_env): Define as an alias for __fe_dfl_env.
2014-08-12 Bernard Ogden <bernie.ogden@linaro.org>
[BZ #16892]
* sysdeps/nptl/lowlevellock.h (__lll_timedlock): Use
atomic_compare_and_exchange_bool_acq rather than atomic_exchange_acq.
This test should be more robust about setting up its lang dirs.
I had two completely different systems (ia64 & x86_64) get wedged
in a way where the test just kept FAILing on me due to some of the
files missing. This probably wasn't a big deal until the recent
commit which made checking of the locale dirs more robust (for
security reasons).