This patch tries to organize the implies files for ppc, since there are
a number of processors and most of them are compatible with each other
(backwards compatible).
Having in mind that we start the search for processor-specific files in
the sysdeps/unix/sysv/linux tree
(sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]/fpu to be
exact), we would like to grab any linux-specific code from that tree
prior to going through the other tree (sysdeps/powerpc/...).
For that, i removed the Implies files that were originally inside the
fpu directories and placed then in the non-fpu directories (still inside
the unix/sysv/linux tree). If no processor-specific/linux-specific files
could be found, we "imply" the other tree's (sysdeps/powerpc/...) fpu
directory for that specific processor AND also the non-fpu directory for
that same tree.
If, again, no processor-specific code is found, we read another Implies
file that will point to the most compatible processor that we should
grab code from, and so on, until we reach the power4 processor.
So, in summary, the Implies files will live inside these directories
now:
* sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]
* sysdeps/powerpc/powerpc[32|64]/[processor]
Practical example of the order we will use to pick power6-specific code
with the new structure.
sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/power6/fpu ->
sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/power6 ->
sysdeps/powerpc/powerpc[32|64]/power6/fpu ->
sysdeps/powerpc/powerpc[32|64]/power6 ->
sysdeps/powerpc/powerpc[32|64]/power5+/fpu ->
sysdeps/powerpc/powerpc[32|64]/power5+ ->
sysdeps/powerpc/powerpc[32|64]/power5/fpu ->
sysdeps/powerpc/powerpc[32|64]/power5 ->
sysdeps/powerpc/powerpc[32|64]/power4/fpu ->
sysdeps/powerpc/powerpc[32|64]/power4 (from here, it'll go to the
generic path as usual)
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.
The getdents64 syscall adds on 32-but platforms padding which isn't needed
and not included in the userlevel data structure definition. We have to
avoid copying those padding bytes in the readdir64_r function.
When doing i686-unknown-linux-gnu build configured with --enable-kernel=2.6.24,
there are several warnings like this:
../sysdeps/unix/sysv/linux/i386/fcntl.c:36:12: warning: ‘miss_F_GETOWN_EX’ defined but not used
It's already so marked in dl-sysdep.c. Failure to so mark
in the header file leads the compiler to believe that the
variable should be addressable via the .sdata section.
Signed-off-by: Richard Henderson <rth@twiddle.net>
GCC 4.5 warns about "extern void _end; &end;".
Use char[] instead, as that also doesn't fall foul
of a target's .sdata optimizations.
Signed-off-by: Richard Henderson <rth@twiddle.net>
When not using gethostbyname4 methods we immediately aborted the loop
over the nss modules on the first successful lookup. While this is
almost always what is wanted the nsswitch.conf file allows to select
something different.
The old implementation uses fd 0 to determine the login TTY. This
was needed because using /dev/tty it is not possible to deduce the
login TTY. For some time now there is the pseudo-file
/proc/self/loginuid which directly helps us to find the user. Prefer
using this file. It also works if stdin is closed, redirected, or
re-opened.
msgrcv() does not work on sparc64, as it passes the 6th argument using
the ipc kludge, while the kernel waits for a 6 arguments syscall. This
patches fixes the problem by using a sparc64 specific version of
msgrcv.c.
2010-03-03 Aurelien Jarno <aurelien@aurel32.net>
* sysdeps/unix/sysv/linux/sparc/sparc64/msgrcv.c: New file.
2010-03-03 David S. Miller <davem@davemloft.net>
* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_lazy_rel): Must
pass '1' for 't' argument to sparc_fixup_plt.
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_lazy_rel):
Likewise.
* sysdeps/sparc/sparc32/dl-plt.h (OPCODE_BA_PT): Define.
(sparc_fixup_plt): Document 't' argument. Enable branch
optimization and use v9 branches when possible. Explain why we
cannot unconditionally patch the branch into the first PLT
instruction.
* sysdeps/sparc/sparc64/dl-plt.h (sparc64_fixup_plt): Document 't'
argument. Use v9 branches when possible. Explain why we can in
fact unconditionally use a branch in the first PLT instruction
here.
When prelinking fails we have to rewrite the PLT, but the code
doing so forgets to adjust all rela->r_offset addresses by the
location of where the object was actually mapped.
When prelinking fails we have to rewrite the PLT, but the code
doing so forgets to adjust all rela->r_offset addresses by the
location of where the object was actually mapped.
If a binary gets invoked by passing it as argument to ld.so the stack
still holds the auxiliary vector of ld.so when entering the _start
routine of the executable. So the invocation via ld.so is not fully
transparent to the executable. This causes problems if the executable
wants to scan the auxv itself.
grantpt was performing two consecutive calls to stat with the same
file name. Avoid this by creating a special version of the ptsname
function which allows to pass the stat result back to the caller.
The pt_chown program is completely transparently called. It might
not be able to live with the various file descriptors the program
has open at the time of the call (e.g., under SELinux). Close all
but the needed descriptor and connect stdin, stdout, and stderr
with /dev/null. pt_chown shouldn't print anything when called to
do real work.
This is a minor optimization. The tty group mustn't change so a
successful call to getgrnam will always return the same information.
Cache it and reuse it.
The ntp_gettime implementation of NTP exports the tai field the kernel
now produces. This requires an ABI change since the ntptimeval structure
changed. Upstream kept the same name, there is nothing to do. This
patch changes the ntptimeval structure but keeps the old ntp_gettime
definition. A new ntp_gettimex function which is transparently invoked
through the old name is introduced. This has the advantage that even
object files can remain compatible. This wouldn't be the case if
symbol versioning would be used to overload the name ntp_gettime.
I've noticed that sync_file_range is a stub on ppc/ppc64.
The kernel on these arches provides sync_file_range2 syscall with swapped
parameters.
The following completely untested patch ought to fix this.
Due to alignment of 64bit parameters there is a dummy second argument.
But other than that the syscall arguments are directly mapped to the
function arguments.
I've just committed STT_GNU_IFUNC ppc/ppc64 support into prelink,
and this patch is needed on the glibc side. Without it ld.so segfaults,
as in dl-conflict.c sym_map is always NULL. While dl-machine.h could use
RESOLVE_CONFLICT_FIND_MAP macro to compute it, it doesn't make sense,
because with prelink we know it is already properly relocated (all relative
relocations are applied by prelink).
As reported in http://bugzilla.redhat.com/533063 , preadv/pwritev prototypes
are wrong on 32-bit arches with -D_FILE_OFFSET_BITS=64 and as I've just
found, fallocate is wrong too.
The problem is that only off_t is remapped to the 64-bit type transparently,
__off_t is not.
The implementation of posix_openpt on Linux can fail in a few extra
ways if the appropriate pseudo filesystems are not mounted etc. In
some of these cases we have to explicitly set errno.
If a second call to ttyname is not for the same type of device (e.g.,
serial vs ptty) the prefix of the buffer was wrong. Don't rely on
the previous content, always reinitialize it.
The syscall conventions on some Linux archs prevented F_GETOWN from working
correctly in some situations. This can be rectified when using the new
F_GETOWN_EX command.
tst-longjmp_chk passes, tst-longjmp_chk2 fails but that is because
of some limitations of kernel signal delivery on sparc that I need
to fix, it has nothing to do with the longjmp_chk implementation.
(The problem with tst-longjmp_chk2 is that it tries to do a stack
fault SIGSEGV within a stack fault SIGSEGV , and the Linux kernel
will refuse to setup the signal stack and deliver the signal if the
register windows can't be written out to the stack first)
If a signal arrived during a symbol lookup and the signal handler also
required a symbol lookup, the end of the lookup in the signal handler reset
the flag whether restoring AVX/SSE registers is needed. Resetting means
in this case that the tail part of the outer lookup code will try to
restore the registers and this can fail miserably. We now restore to the
previous value which makes nesting calls possible.
On 64-bit machines we should not split doubles into two 32 bit
integer and handle the words separately. We have wide registers.
This patch implements a 64-bit ceil version. Ideally all other
functions will be converted over time.
This patch fixes mixed SSE/AVX audit and checks AVX only once in
_dl_runtime_profile. When an AVX or SSE register value in pltenter is
modified, we have to make sure that the SSE part value is the same in both
lr_xmm and lr_vector fields so that pltexit will get the correct value
from either lr_xmm or lr_vector fields. AVX-enabled pltenter should
update both lr_xmm and lr_vector fields to support stacked AVX/SSE
pltenter functions.
The meaning of the 25-14 bits in EAX returned from cpuid with EAX = 4
has been changed from "the maximum number of threads sharing the cache"
to "the maximum number of addressable IDs for logical processors sharing
the cache" if cpuid takes EAX = 11. We need to use results from both
EAX = 4 and EAX = 11 to get the number of threads sharing the cache.
The 25-14 bits in EAX on Core i7 is 15 although the number of logical
processors is 8. Here is a white paper on this:
http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/
This patch correctly counts number of logical processors on Intel CPUs
with EAX = 11 support on cpuid. Tested on Dinnington, Core i7 and
Nehalem EX/EP.
It also fixed Pentium Ds workaround since EBX may not have the right
value returned from cpuid with EAX = 1.
This patch adds 32bit SSE4.2 string functions. It uses -16L instead of
0xfffffffffffffff0L, which works for both 32bit and 64bit long. Tested
on 32bit Core i7 and Core 2.
This patch adds multiarch support when configured for i686. I modified
some x86-64 functions to support 32bit. I will contribute 32bit SSE string
and memory functions later.
We use sigaltstack internally which on some systems is a syscall
and should be used as such. Move the x86-64 version to the Linux
specific directory and create in its place a file which always
causes compile errors.
We use a callback function into libc.so to get access to the data
structure with the information and have special versions of the test
macros which automatically use this function.
SSE registers are used for passing parameters and must be preserved
in runtime relocations. This is inside ld.so enforced through the
tests in tst-xmmymm.sh. But the malloc routines used after startup
come from libc.so and can be arbitrarily complex. It's overkill
to save the SSE registers all the time because of that. These calls
are rare. Instead we save them on demand. The new infrastructure
put in place in this patch makes this possible and efficient.
The test now takes the callgraph into account. Only code called
during runtime relocation is affected by the limitation. We now
determine the affected object files as closely as possible from
the outside. This allowed to remove some the specializations
for some of the string functions as they are only used in other
code paths.
This patch introduces a test to make sure no function modifies the
xmm/ymm registers. With the exception of the auditing functions.
The test is probably too pessimistic. All code linked into ld.so
is checked. Perhaps at some point the callgraph starting from
_dl_fixup and _dl_profile_fixup is checked and we can start using
faster SSE-using functions in parts of ld.so.
There will be more than one function which, in multiarch mode, wants
to use SSSE3. We should not test in each of them for Atoms with
slow SSSE3. Instead, disable the SSSE3 bit in the startup code for
such machines.
The original AVX patch used a function pointer to handle the difference
between machines with and without AVX support. This is insecure. A
well-placed memory exploit could lead to redirection of the execution.
Using a variable and several tests is a bit slower but cannot be
exploited in this way.
Some symbols have to be identified process-wide by their name. This is
particularly important for some C++ features (e.g., class local static data
and static variables in inline functions). This cannot completely be
implemented with ELF functionality so far. The STB_GNU_UNIQUE binding
helps by ensuring the dynamic linker will always use the same definition for
all symbols with the same name and this binding.
Some of the new multi-arch string functions for x86-64 were
not aligned to 16 byte boundarie,s possibly creating unnecessary
cache line misses and delays.
This patch adds SSSE3 strcpy/stpcpy. I got up to 4X speed up on Core 2
and Core i7. I disabled it on Atom since SSSE3 version is slower for
shorter (<64byte) data.
The terminal output etc is not visible in a core file. The new
libc-internal variable __abort_msg will point to a string with the
message which has been printed before the abort in case abort is
called from inside libc. BZ #10217
The dl-lookup.c changes are needed for prelink (support in prelink
checked into SVN, tested for both i?86 and x86-64), dl-irel.h just
something I discovered by code inspection.
The test to call the indirect function now includes a subtest to
checked whether the symbol is defined. When coming to that point
this is almost always the case. The test for STT_GNU_IFUNC on the
other hand rarely is true. Move it to the front means we don't have
to perform the second test unless really necessary.
SO far Intel and AMD use exactly the same bits meaning the same
things in CPUID index 1. Simplify the code. Should an architecture
come along which doesn't use the same semantics then it must use a
different index value than COMMON_CPUID_INDEX_1.
The latest stratcliff extension exposed a bug in the IA-64 memchr which
uses non-speculative loads to prefetch data. Change the code to use
speculative loads with appropriate fixup. Fixes BZ 10162.