The personality system call, starting with Linux kernel commit
v2.6.29-6609-g11d06b2a1e5658f448a308aa3beb97bacd64a940, always
successfully changes the personality if requested. The syscall
wrapper, however, can still return an error in the following cases:
- the value returned by the system call looks like an error
due to architecture limitations of 32-bit kernels;
- a personality greater than 0xffffffff is passed to the system call,
and the 64-bit kernel does not have commit
v2.6.35-rc1-372-g485d527686850d68a0e9006dd9904f19f122485e
that would truncate this value to unsigned int;
- on sparc64, the value returned by the system call looks like an error
due to a sparc64 kernel sign extension bug.
The solution is three-fold:
- move the generic syscalls.list personality entry to the generic
64-bit syscalls.list file;
- for each 32-bit architecture that uses negated errno semantics,
add a NOERRNO personality entry to its syscalls.list file;
- for sparc64 and 32-bit architectures that use dedicated registers
to flag syscall errors, add a wrapper around the personality syscall;
if the system call return value is flagged as an error, this wrapper
returns the negated "would be errno" value, otherwise it returns
the system call return value; on sparc64, it also truncates the
personality argument to unsigned int before passing it to the kernel.
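
The third case above can be modelled in plain C. This is only a
sketch under stated assumptions: struct raw_result, model_kernel_exit
and model_personality_wrapper are hypothetical names invented here,
and the model assumes a 32-bit result register where values in
[-4095, -1] are reported as a positive errno with an error flag set.
It is not the code added in personality.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of what user space sees after the syscall on a
   32-bit architecture that uses a dedicated error flag.  */
struct raw_result
{
  int32_t value;   /* content of the result register */
  int error_flag;  /* nonzero if the result was flagged as an error */
};

/* Assumed model of the kernel exit path: a result in [-4095, -1]
   is reported as a positive errno with the error flag set.  */
static struct raw_result
model_kernel_exit (int32_t ret)
{
  struct raw_result r;
  if (ret < 0 && ret >= -4095)
    {
      r.value = -ret;
      r.error_flag = 1;
    }
  else
    {
      r.value = ret;
      r.error_flag = 0;
    }
  return r;
}

/* Sketch of the wrapper logic described above: a flagged result is
   not a real error, so return the negated "would be errno" value,
   which recovers the previous personality.  */
static int32_t
model_personality_wrapper (struct raw_result r)
{
  return r.error_flag ? -r.value : r.value;
}
```

In this model a previous personality whose bit pattern reads as -2
is flagged as an error on the way out, and the wrapper returns -2
again instead of setting errno.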
[BZ #19408]
* sysdeps/unix/sysv/linux/personality.c: New file.
* sysdeps/unix/sysv/linux/sparc/sparc64/personality.c: Likewise.
* sysdeps/unix/sysv/linux/tst-personality.c: Likewise.
* sysdeps/unix/sysv/linux/Makefile [$(subdir) == misc]
(sysdep_routines): Add personality.
(tests): Add tst-personality.
* sysdeps/unix/sysv/linux/syscalls.list (personality): Move ...
* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list: ... here.
* sysdeps/unix/sysv/linux/arm/syscalls.list (personality): New entry.
* sysdeps/unix/sysv/linux/hppa/syscalls.list (personality): Likewise.
* sysdeps/unix/sysv/linux/i386/syscalls.list (personality): Likewise.
* sysdeps/unix/sysv/linux/m68k/syscalls.list (personality): Likewise.
* sysdeps/unix/sysv/linux/microblaze/syscalls.list (personality):
Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/syscalls.list (personality):
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/syscalls.list (personality):
Likewise.
* sysdeps/unix/sysv/linux/sh/syscalls.list (personality): Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/syscalls.list (personality):
Likewise.
Various Linux kernel syscalls have become obsolete over time.
Specifically, the following are obsolete in all kernel versions
supported by glibc and are not present for architectures more
recently added to the kernel. As such, the wrapper functions for them
should be compat symbols, not in static libc and not available for new
links with shared libc.
* bdflush: in Linux 2.6, does nothing if present.
* create_module get_kernel_syms query_module: Linux 2.4 module
interface, syscalls not present in Linux 2.6.
* uselib: part of the mechanism for loading a.out shared libraries,
irrelevant with ELF.
This patch adds support for syscalls.list to list syscall aliases of
the form NAME@VERSION:OBSOLETED, with SHLIB_COMPAT conditionals being
generated for such aliases. Those five syscalls are then made into
compat symbols (obsoleted in glibc 2.23, so future ports won't have
these symbols at all), with the header <sys/kdaemon.h> declaring
bdflush being removed. When we move to 3.2 as minimum kernel version,
the same can be done for nfsservctl (removed in Linux 3.1) as well.
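
To make the NAME@VERSION:OBSOLETED form concrete, here is a small
sketch that splits such an alias into its three parts. The names
struct compat_alias, copy_field and parse_compat_alias are
hypothetical, and the sample string in the usage note (in particular
the GLIBC_2.0 version) is an illustrative assumption. The real
handling lives in make-syscalls.sh, not in C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical holder for the three parts of NAME@VERSION:OBSOLETED.  */
struct compat_alias
{
  char name[64];
  char version[64];
  char obsoleted[64];
};

/* Copy LEN bytes of SRC into DST as a string; reject empty or
   oversized fields.  Returns 1 on success, 0 otherwise.  */
static int
copy_field (char *dst, size_t dst_size, const char *src, size_t len)
{
  if (len == 0 || len >= dst_size)
    return 0;
  memcpy (dst, src, len);
  dst[len] = '\0';
  return 1;
}

/* Split ENTRY at the first '@' and the following ':'.  Returns 1 if
   ENTRY has the form NAME@VERSION:OBSOLETED, 0 otherwise.  */
static int
parse_compat_alias (const char *entry, struct compat_alias *out)
{
  const char *at = strchr (entry, '@');
  if (at == NULL)
    return 0;
  const char *colon = strchr (at + 1, ':');
  if (colon == NULL)
    return 0;
  return (copy_field (out->name, sizeof out->name, entry,
                      (size_t) (at - entry))
          && copy_field (out->version, sizeof out->version, at + 1,
                         (size_t) (colon - (at + 1)))
          && copy_field (out->obsoleted, sizeof out->obsoleted,
                         colon + 1, strlen (colon + 1)));
}
```

For example, parsing "bdflush@GLIBC_2.0:GLIBC_2.23" yields the name
bdflush, the version GLIBC_2.0 and the obsoleting version GLIBC_2.23;
a plain alias without ':' is rejected.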
Tested for x86_64 and x86 (testsuite, as well as checking that the
symbols in question indeed become compat symbols, that they are indeed
omitted from static libc, and that the generated SHLIB_COMPAT
conditionals look right).
[BZ #18472]
* sysdeps/unix/Makefile ($(objpfx)stub-syscalls.c): Handle entries
of the form NAME@VERSION:OBSOLETED and generate SHLIB_COMPAT
conditionals for them.
* sysdeps/unix/make-syscalls.sh (emit_weak_aliases): Likewise.
* sysdeps/unix/sysv/linux/sys/kdaemon.h: Remove file.
* sysdeps/unix/sysv/linux/Makefile (sysdep_headers): Remove
sys/kdaemon.h.
* sysdeps/unix/sysv/linux/syscalls.list (bdflush): Make into
compat-only syscall, obsoleted in glibc 2.23.
(create_module): Likewise.
(get_kernel_syms): Likewise.
(query_module): Likewise.
(uselib): Likewise.
* manual/sysinfo.texi (System Parameters): Do not mention bdflush.
Profiling git's test suite, Linus noted [1] that a disproportionately
large amount of time was spent reading /proc/meminfo. This is done by
the glibc functions get_phys_pages and get_avphys_pages, but they only
need the MemTotal and MemFree fields, respectively. That same
information can be obtained with a single syscall, sysinfo, instead of
six: open, fstat, mmap, read, close, munmap. While sysinfo also
provides more than necessary, it does a lot less work than what the
kernel needs to do to provide the entire /proc/meminfo. Both strace -T
and in-app microbenchmarks show that the sysinfo() approach is
roughly an order of magnitude faster.
sysinfo() is much older than what glibc currently requires, so I don't
think there's any reason to keep the old parsing code. Moreover, this
makes get_[av]phys_pages work even in the absence of /proc.
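
A minimal sketch of the sysinfo() approach, for illustration only.
The function names pages_from_units, sketch_phys_pages and
sketch_avphys_pages are hypothetical, and the conversion shown here
(multiply by mem_unit, divide by the page size, using a 64-bit
intermediate) is an assumption about one reasonable way to do it, not
the code in getsysstats.c.

```c
#include <assert.h>
#include <sys/sysinfo.h>
#include <unistd.h>

/* Convert a count of MEM_UNIT-sized blocks into pages.  The 64-bit
   intermediate avoids overflow where unsigned long is 32 bits.  */
static long
pages_from_units (unsigned long units, unsigned int mem_unit,
                  long page_size)
{
  unsigned long long bytes = (unsigned long long) units * mem_unit;
  return (long) (bytes / (unsigned long long) page_size);
}

/* Total physical pages via one sysinfo call; -1 on failure.  */
static long
sketch_phys_pages (void)
{
  struct sysinfo info;
  long page_size = sysconf (_SC_PAGESIZE);
  if (page_size <= 0 || sysinfo (&info) != 0)
    return -1;
  return pages_from_units (info.totalram, info.mem_unit, page_size);
}

/* Free physical pages via one sysinfo call; -1 on failure.  */
static long
sketch_avphys_pages (void)
{
  struct sysinfo info;
  long page_size = sysconf (_SC_PAGESIZE);
  if (page_size <= 0 || sysinfo (&info) != 0)
    return -1;
  return pages_from_units (info.freeram, info.mem_unit, page_size);
}
```

Neither function opens a file, so both keep working when /proc is
not mounted.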
Linus noted that something as simple as 'bash -c "echo"' would trigger
the reading of /proc/meminfo, but gdb says that many more applications
than just bash are affected:
Starting program: /bin/bash "-c" "echo"
Breakpoint 1, __get_phys_pages () at ../sysdeps/unix/sysv/linux/getsysstats.c:283
283 ../sysdeps/unix/sysv/linux/getsysstats.c: No such file or directory.
(gdb) bt
So it seems that any application that uses qsort on a moderately sized
array will incur this cost (once), which is obviously proportionately
more expensive for lots of short-lived processes (such as the git test
suite).
[1] http://thread.gmane.org/gmane.linux.kernel/2019285
Signed-off-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
* sysdeps/unix/sysv/linux/getsysstats.c (__get_phys_pages):
Use sysinfo system call instead of parsing /proc/meminfo.
* sysdeps/unix/sysv/linux/getsysstats.c (__get_avphys_pages):
Likewise.
mq_receive calls mq_timedreceive, and mq_send calls mq_timedsend. But
mq_receive and mq_send were in POSIX by 1996, while mq_timed* were
added in the 2001 edition of POSIX, so these calls bring names into
the link namespace that the older standards do not reserve. This
patch fixes this by making mq_timed* into weak aliases for
__mq_timed* and calling the __mq_timed* names.
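
The weak alias pattern can be sketched outside glibc with GCC's
alias attribute on an ELF target. All names below
(demo_internal_timedreceive, demo_timedreceive, demo_receive) are
hypothetical stand-ins; glibc itself uses reserved __ names and its
own weak_alias and hidden_* macros rather than the raw attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the strong internal name (__mq_timedreceive in the
   patch).  The body is a dummy so that the call path is observable.  */
int
demo_internal_timedreceive (int queue, const int *timeout)
{
  return timeout == NULL ? queue : queue + *timeout;
}

/* Public name from the newer standard: only a weak alias, so a
   program written to the older standard may define its own symbol
   with this name without a clash.  */
int demo_timedreceive (int queue, const int *timeout)
  __attribute__ ((weak, alias ("demo_internal_timedreceive")));

/* Function from the older standard: calls the internal name and
   never references the newer public name.  */
int
demo_receive (int queue)
{
  return demo_internal_timedreceive (queue, NULL);
}
```

Because demo_receive refers only to the internal name, linking it
does not depend on demo_timedreceive being the library's definition.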
Tested for x86_64 and x86 (testsuite, and that disassembly of
installed shared libraries is unchanged by the patch).
[BZ #18545]
* rt/mq_timedreceive.c (mq_timedreceive): Rename to
__mq_timedreceive and define as alias of __mq_timedreceive. Use
hidden_weak.
* rt/mq_timedsend.c (mq_timedsend): Rename to __mq_timedsend and
define as alias of __mq_timedsend. Use hidden_weak.
* sysdeps/unix/sysv/linux/syscalls.list (mq_timedsend): Use
__mq_timedsend as strong name.
(mq_timedreceive): Use __mq_timedreceive as strong name.
* include/mqueue.h (__mq_timedsend): Declare. Use hidden_proto.
(__mq_timedreceive): Likewise.
* sysdeps/unix/sysv/linux/mq_receive.c (mq_receive): Call
__mq_timedreceive instead of mq_timedreceive.
* sysdeps/unix/sysv/linux/mq_send.c (mq_send): Call __mq_timedsend
instead of mq_timedsend.
* conform/Makefile (test-xfail-UNIX98/mqueue.h/linknamespace):
Remove variable.
The syscall wrappers mechanism automatically creates hidden aliases
for syscalls with libc_hidden_def / libc_hidden_weak. The use of
libc_hidden_* has the side-effect that for syscall wrappers in
non-libc libraries those aliases are not created. In turn, this means
that three mq_* syscalls in sysdeps/unix/sysv/linux/syscalls.list list
the __GI_* names explicitly.
The use of libc_hidden_* dates back to the original introduction of
that support in
2002-08-03 Roland McGrath <roland@redhat.com>
* sysdeps/unix/make-syscalls.sh: Generate libc_hidden_def or
libc_hidden_weak for every system call symbol defined.
(predating the non-libc syscalls in question) and I see no reason for
excluding non-libc syscalls. This patch changes the code to use
hidden_def / hidden_weak (via a wrapper syscall_hidden_def in the case
where the argument is itself a macro, so that the argument gets
expanded before concatenation with __GI_), so avoiding the need to
specify the hidden aliases explicitly in this case.
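
The need for a wrapper macro follows from how the C preprocessor
treats operands of ##: they are not macro-expanded before pasting.
The sketch below shows the effect with hypothetical macro names
(PASTE_DIRECT, PASTE_EXPANDED, SYSCALL_SYMBOL) and a GI_ prefix
standing in for __GI_; it is not the syscall_hidden_def definition.

```c
#include <assert.h>
#include <string.h>

/* Pastes its argument directly: an argument that is itself a macro
   is NOT expanded first.  */
#define PASTE_DIRECT(name) GI_##name

/* One extra level: the argument is expanded here, then pasted.  */
#define PASTE_EXPANDED(name) PASTE_DIRECT (name)

/* Two-level stringification so the pasted result can be inspected.  */
#define STR_DIRECT(x) #x
#define STR(x) STR_DIRECT (x)

/* Hypothetical macro naming the symbol, as in the templated case.  */
#define SYSCALL_SYMBOL my_call

static const char *
direct_name (void)
{
  return STR (PASTE_DIRECT (SYSCALL_SYMBOL));
}

static const char *
expanded_name (void)
{
  return STR (PASTE_EXPANDED (SYSCALL_SYMBOL));
}
```

Here direct_name () yields "GI_SYSCALL_SYMBOL", while
expanded_name () yields "GI_my_call", which is the behavior the
wrapper macro is there to obtain.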
Tested for x86_64 and x86 (testsuite, and that disassembly of
installed stripped shared libraries is unchanged by the patch; the
mq_* symbols change from weak to strong, which is of no significance
and two of them will shortly change back to weak as part of a fix for
bug 18545).
* sysdeps/unix/make-syscalls.sh (emit_weak_aliases): Use
hidden_def and hidden_weak instead of libc_hidden_def and
libc_hidden_weak.
(top level): Refer to hidden_def in comment.
* sysdeps/unix/syscall-template.S (syscall_hidden_def): New
macro. Use it instead of libc_hidden_def.
* sysdeps/unix/sysv/linux/syscalls.list (mq_timedsend): Do not
specify __GI_* name explicitly.
(mq_timedreceive): Likewise.
(mq_setattr): Likewise.
Continuing the removal of unused __libc_* function names, this patch
removes the __libc_nanosleep name.
Tested for x86_64 (testsuite, and that the disassembly of installed
shared libraries is unchanged by the patch; __nanosleep changes from
weak to strong, which is of no significance).
* posix/nanosleep.c (__libc_nanosleep): Rename to __nanosleep.
(__nanosleep): Do not define as alias.
(nanosleep): Define as alias of __nanosleep.
* sysdeps/unix/sysv/linux/syscalls.list (nanosleep): Remove
__libc_nanosleep name.
glibc has lots of __libc_* function names that no longer serve any
purpose (are not used for any calls or exported at a public symbol
version). This patch removes __libc_creat. It has the effect of
creat becoming a strong symbol instead of a weak symbol in various
cases, but that's fine; in shared libraries it doesn't matter at all,
while for static linking the only other symbol sometimes defined in
the same object is creat64, and whenever creat64 is a reserved name so
is creat.
Other such cases of unnecessary __libc_* symbols are expected to be
dealt with in separate patches over time.
Tested for x86_64 (testsuite, and that the disassembly of installed
shared libraries is unchanged by the patch).
* include/fcntl.h (__libc_creat): Remove declaration.
* io/creat.c (__libc_creat): Rename to creat.
(creat): Do not define as alias.
* sysdeps/unix/sysv/linux/alpha/creat.c (creat64): Define as alias
of creat instead of __libc_creat.
* sysdeps/unix/sysv/linux/generic/creat.c (__libc_creat): Rename
to creat.
(creat): Do not define as alias.
[__WORDSIZE == 64] (creat64): Define as alias of creat instead of
__libc_creat.
* sysdeps/unix/sysv/linux/syscalls.list (creat): Do not define
__libc_creat name.
* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (creat):
Likewise.
Bug 14132 is the removal of the old INTDEF/INTUSE system of *_internal
aliases as obsoleted by the hidden_proto / hidden_def system. Various
cases were cleaned up in 2012, but some remain. This patch removes
the use of this mechanism for __adjtimex.
Tested for x86_64 that stripped installed shared libraries are
unchanged by the patch.
[BZ #14132]
* sysdeps/unix/sysv/linux/include/sys/timex.h: New file.
* sysdeps/unix/sysv/linux/adjtime.c [!ADJTIMEX] (ADJTIMEX): Do not
use INTUSE.
[!ADJTIMEX] (INTUSE(__adjtimex)): Remove declaration.
* sysdeps/unix/sysv/linux/alpha/adjtime.c (__adjtimex_internal):
Remove alias.
(__adjtimex): Define using libc_hidden_ver.
* sysdeps/unix/sysv/linux/ntp_gettime.c (INTUSE(__adjtimex)):
Remove declaration.
(ntp_gettime): Call __adjtimex directly.
* sysdeps/unix/sysv/linux/ntp_gettimex.c (INTUSE(__adjtimex)):
Remove declaration.
(ntp_gettimex): Call __adjtimex directly.
* sysdeps/unix/sysv/linux/syscalls.list (adjtimex): Remove
__adjtimex_internal alias.
Continuing the move of syscall definitions to syscalls.list, where
previous cleanups have made this possible, this patch moves the
definition of execve. (In this case, it was the removal of bounded
pointers support, rather than old kernel support, which made the move
possible.)
Tested for x86_64.
[BZ #14138]
* sysdeps/unix/sysv/linux/execve.c: Remove file.
* sysdeps/unix/sysv/linux/syscalls.list (execve): Add syscall.
Continuing the move of syscall definitions to syscalls.list, where the
removal of support for old kernel versions has made this possible,
this patch moves definitions of various *at functions in
sysdeps/unix/sysv/linux/.
These particular moves are straightforward: there are no #includes of
these source files, no special architecture-specific versions, no
special symbol version handling and no aliases. Each source file can
be replaced by a single line in sysdeps/unix/sysv/linux/syscalls.list.
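
For a sense of how thin such a wrapper is, the sketch below forwards
its arguments to the kernel through the generic syscall() function.
sketch_mkdirat is a hypothetical name, and this is neither one of the
removed files nor the code generated from syscalls.list; it only
assumes a Linux target where SYS_mkdirat is defined.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Pass the three arguments straight to the kernel.  syscall()
   returns -1 and sets errno on failure, which is all this kind of
   wrapper needs to do.  */
static int
sketch_mkdirat (int dirfd, const char *path, mode_t mode)
{
  return (int) syscall (SYS_mkdirat, dirfd, path, mode);
}
```

With no extra logic between the arguments and the system call, there
is nothing a hand-written source file adds over a generated wrapper.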
Tested for x86_64.
[BZ #14138]
* sysdeps/unix/sysv/linux/syscalls.list (fchownat): New syscall.
(linkat): Likewise.
(mkdirat): Likewise.
(readlinkat): Likewise.
(renameat): Likewise.
(symlinkat): Likewise.
(unlinkat): Likewise.
* sysdeps/unix/sysv/linux/fchownat.c: Remove file.
* sysdeps/unix/sysv/linux/linkat.c: Likewise.
* sysdeps/unix/sysv/linux/mkdirat.c: Likewise.
* sysdeps/unix/sysv/linux/readlinkat.c: Likewise.
* sysdeps/unix/sysv/linux/renameat.c: Likewise.
* sysdeps/unix/sysv/linux/symlinkat.c: Likewise.
* sysdeps/unix/sysv/linux/unlinkat.c: Likewise.
type __THROW marker of splice, vmsplice, and tee.
* sysdeps/unix/sysv/linux/ia64/bits/fcntl.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/fcntl.h: Likewise.
* sysdeps/unix/sysv/linux/s390/bits/fcntl.h: Likewise.
* sysdeps/unix/sysv/linux/sh/bits/fcntl.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/fcntl.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/bits/fcntl.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/bits/fcntl.h: Likewise.
* sysdeps/unix/sysv/linux/syscalls.list: Mark splice, vmsplice, and tee
as cancellation points.
* stdio-common/vfprintf.c (vfprintf): Don't shadow workstart variable,
reinitialize workend at the start of each do_positional format spec
loop, free workstart before do_positional loops.
(printf_unknown): Fix size of work_buffer.
* stdio-common/tst-sprintf.c (main): Add 3 new testcases.
* sysdeps/unix/sysv/linux/posix_madvise.c: New file.
* sysdeps/unix/sysv/linux/syscalls.list: Remove posix_madvise entry.
* stdio-common/tfformat.c (sprint_doubles): Some more tests.
* sysdeps/unix/sysv/linux/Versions [libc, GLIBC_2.4]: Export
unshare.
* sysdeps/unix/sysv/linux/syscalls.list: Add unshare syscall.
* sysdeps/unix/Makefile ($(objpfx)stub-syscalls.c): Add some
preprocessor magic so that the compiler won't see the prototypes
for the functions we are defining as stubs.
epoll_wait): Align with poll, make cancelable.
2005-11-15 Jakub Jelinek <jakub@redhat.com>
* io/sys/stat.h (fstatat): Don't use __THROW together with
__REDIRECT_NTH.