All cancellable syscalls are done by C implementations, so there is no
no need to use a specialized implementation to optimize register usage.
It fixes BZ #25765.
Checked on x86_64-linux-gnu.
Now there is a generic __timeval32 and helpers we can use them for Alpha
instead of the Alpha specific ones.
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The Linux kernel expects rusage to use a 32-bit time_t, even on archs
with a 64-bit time_t (like RV32). To address this let's convert
rusage to/from 32-bit and 64-bit to ensure the kernel always gets
a 32-bit time_t.
While we are converting these functions let's also convert them to be
the y2038 safe versions. This means there is a *64 function that is
called by a backwards compatible wrapper.
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The Linux kernel expects itimerval to use a 32-bit time_t, even on archs
with a 64-bit time_t (like RV32). To address this let's convert
itimerval to/from 32-bit and 64-bit to ensure the kernel always gets
a 32-bit time_t.
While we are converting these functions let's also convert them to be
the y2038 safe versions. This means there is a *64 function that is
called by a backwards compatible wrapper.
Tested-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On y2038 safe 32-bit systems the Linux kernel expects itimerval
and rusage to use a 32-bit time_t, even though the other time_t's
are 64-bit. There are currently no plans to make 64-bit time_t versions
of these structs.
There are also other occurrences where the time passed to the kernel via
timeval doesn't match the wordsize.
To handle these cases let's define a new macro
__KERNEL_OLD_TIMEVAL_MATCHES_TIMEVAL64. This macro specifies if the
kernel's old_timeval matches the new timeval64. This should be 1 for
64-bit architectures except for Alpha's osf syscalls. The define should
be 0 for 32-bit architectures and Alpha's osf syscalls.
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The corner cases included were generated using exhaustive search
for all float/binary32 values on x86_64 (comparing to MPFR for
correct rounding to nearest).
For the j0/j1/y0 functions, only cases with ulp error <= 9 were
included.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Adds a POWER9 version of fmaf128 that uses the xsmaddqp
instruction.
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This addresses an issue that is present mainly on SMP machines running
threaded code. In a typical indirect call or PLT import stub, the
target address is loaded first. Then the global pointer is loaded into
the PIC register in the delay slot of a branch to the target address.
During lazy binding, the target address is a trampoline which transfers
to _dl_runtime_resolve().
_dl_runtime_resolve() uses the relocation offset stored in the global
pointer and the linkage map stored in the trampoline to find the
relocation. Then, the function descriptor is updated.
In a multi-threaded application, it is possible for the global pointer
to be updated between the load of the target address and the global
pointer. When this happens, the relocation offset has been replaced
by the new global pointer. The function pointer has probably been
updated as well but there is no way to find the address of the function
descriptor and to transfer to the target. So, _dl_runtime_resolve()
typically crashes.
HP-UX addressed this problem by adding an extra pc-relative branch to
the trampoline. The descriptor is initially setup to point to the
branch. The branch then transfers to the trampoline. This allowed
the trampoline code to figure out which descriptor was being used
without any modification to user code. I didn't use this approach
as it is more complex and changes function pointer canonicalization.
The order of loading the target address and global pointer in
indirect calls was not consistent with the order used in import stubs.
In particular, $$dyncall and some inline versions of it loaded the
global pointer first. This was inconsistent with the global pointer
being updated first in dl-machine.h. Assuming the accesses are
ordered, we want elf_machine_fixup_plt() to store the global pointer
first and calls to load it last. Then, the global pointer will be
correct when the target function is entered.
However, just to make things more fun, HP added support for
out-of-order execution of accesses in PA 2.0. The accesses used by
calls are weakly ordered. So, it's possibly under some circumstances
that a function might be entered with the wrong global pointer.
However, HP uses weakly ordered accesses in 64-bit HP-UX, so I assume
that loading the global pointer in the delay slot of the branch must
work consistently.
The basic fix for the race is a combination of modifying user code to
preserve the address of the function descriptor in register %r22 and
setting the least-significant bit in the relocation offset. The
latter was suggested by Carlos as a way to distinguish relocation
offsets from global pointer values. Conventionally, %r22 is used
as the address of the function descriptor in calls to $$dyncall.
So, it wasn't hard to preserve the address in %r22.
I have updated gcc trunk and gcc-9 branch to not clobber %r22 in
$$dyncall and inline indirect calls. I have also modified the import
stubs in binutils trunk and the 2.33 branch to preserve %r22. This
required making the stubs one instruction longer but we save one
relocation. I also modified binutils to align the .plt section on
a 8-byte boundary. This allows descriptors to be updated atomically
with a floting-point store.
With these changes, _dl_runtime_resolve() can fallback to an alternate
mechanism to find the relocation offset when it has been clobbered.
There's just one additional instruction in the fast path. I tested
the fallback function, _dl_fix_reloc_arg(), by changing the branch to
always use the fallback. Old code still runs as it did before.
Fixes bug 23296.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Similar to fenvinline.h removal, this kind of optimization is better
implemented by the compiler. Also newer code avoid setting exceptions
directly (for instance the code to make new logf, log2f and powf
implementatation to now support SVID compat).
The BZ#94194 [1] the corresponding GCC bug for adding replacements
for these on x86.
Checked on x86_64-linux-gnu and i686-linux-gnu.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94194
Similar to string2.h (18b10de7ce) and string3.h (09a596cc2c) this
patch removes the fenvinline.h on all architectures. Currently
only powerpc implements some optimizations. This kind of optimization
is better implemented by the compiler (which handles the architecture
ISA transparently).
Also, for the specific optimized powerpc implementation the code is
becoming convoluted and these micro-optimization are hardly wildly
used, even more being a possible hotspot in realword cases
(non-default rounding are used only on specific cases and exception
handling are done most likely only on errors path). Only x86
implements similar optimization (on fenv.h) also indicates that
these should no be on libc.
The math/test-fenv already covers all math/test-fenvinline tests,
so it is safe to remove it.
The powerpc fegetround optimization is moved to internal
fenv_libc.h.
The BZ#94193 [1] the corresponding GCC bug for adding replacements
for these on powerpc.
Checked on x86_64-linux-gnu and powerpc64le-linux-gnu.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94193
These functions are alpha specifc, rename them to be clear.
Let's also rename the header file from tv32-compat.h to
alpha-tv32-compat.h. This is to avoid conflicts with the one we will
introduce later.
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Some of these files depend on the avoidance of using the various
register sets of POWER. When enabling the IEEE 128 long double,
we must be sure to disable this ABI as some compilers will
refuse to compile if -mno-vsx and -mabi=ieeelongdouble are both
present.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
In practice, this flag should be applied globally, but it makes a good
sanity check to ensure ibm128 and ieee128 long double files are not
getting mismatched. _Float128 files use no long double, thus are
always safe to use this option.
Similarly, when investigating the linker complaints, difftime
makes trivial, self contained, usage of long double, so thus it
is also explicitly marked as such.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This better resembles the default linking process with the gnulibs,
and also resolves the increasingly difficult to maintain
f128-loader-link usage on powerpc64le as some libgcc symbols are
dependent on those found in the loader (ld).
Ensure the correct ldouble abi flags are applied to ibm128 files and
nldbl files. Remove the IEEE options if used, and apply the flags
used to build ldouble files which are ibm128 abi.
nldbl tests are a little tricky. To use the support, we must remove
all ldouble abi flags, and ensure -mlong-double-64 is used.
Co-authored-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Co-authored-by: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
Tweak the PLT bypass magic when building glibc with long double
redirects. This is made more difficult by the fact we only get
one chance to redirect functions. This happens via the public
headers.
There are roughly three classes of redirect we need to attend to
today:
1. Simple redirects, redirected via cdef macro overrides and
and new libc_hidden_ldbl_proto macro.
2. Internal usage of internal API, e.g __snprintf, which has
no direct analogue. This is bypassed directly on case-by-
case basis.
3. Double redirects, e.g sscanf and related. These require
a heavier handed approach of macro renaming to existing
symbols.
Most simple redirects are handled via 1. Ideally, the libc_*
macro would live in libc-symbols.h, but in practice the macros
needed for it to do anything useful live in cdefs.h, so they
are defined in the local override.
Notably, the internal name of the asprintf generated for ieee ldbl
redirects is renamed to work with internal prefixed usage.
This resolves the local plt usage introduced when building glibc
with ldbl == ieee128 on ppc64le.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
During the conversion to support 64 bit time on some architectures with
__WORDSIZE == 32 && __TIMESIZE != 64 the libc_hidden_def attribute for
eligible functions was by mistake omitted.
This patch fixes this issue and exports (and allows using) those
functions when Y2038 support is enabled in glibc.
With mathinline removal there is no need to keep building and testing
inline math tests.
The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries. The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.
Checked on x86_64-linux-gnu and i686-linux-gnu.
This is similar to x86 (da75c1b180) and powerpc (32ea729996)
mathinline.h removal. The required macros to build the fpu routines
are moved to mathimpl.h, while the inline optimization macros for
atan, tanh, rint, log1p, significand, trunc, floor, ceil, isinf,
finite, scalbn, isnan, scalbln, nearbyint, lrint, and sincos are removed.
The gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94204 was
created to track builtin support.
Checked with a build against m68k-linux-gnu, resulting binaries
are similar with and without the patch.
Since legacy bitmap doesn't cover jitted code generated by legacy JIT
engine, it isn't very useful. This patch removes ARCH_CET_LEGACY_BITMAP
and treats indirect branch tracking similar to shadow stack by removing
legacy bitmap support.
Tested on CET Linux/x86-64 and non-CET Linux/x86-64.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Further optimize integer memcpy. Small cases now include copies up
to 32 bytes. 64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies. Comments have been rewritten.
This conversion patch for supporting 64 bit time for futimesat only differs
from the work performed for futimes (when providing __futimes64) with passing
also the file name (and path) to utimensat.
All the design and conversion decisions are exactly the same as for futimens
conversion.
This conversion patch for supporting 64 bit time for lutimes mostly differs from
the work performed for futimes (when providing __futimes64) with adding the
AT_SYMLINK_NOFOLLOW flag to utimensat.
It also supports passing file name instead of file descriptor number, but this
is not relevant for utimensat used to implement it.
All the design and conversion decisions are exactly the same as for futimens
conversion.
This patch provides new __futimes64 explicit 64 bit function for setting file's
64 bit attributes for access and modification time (by specifying file
descriptor number).
Internally, the __utimensat64_helper function is used. This patch is necessary
for having architectures with __WORDSIZE == 32 Y2038 safe.
Moreover, a 32 bit version - __futimes has been refactored to internally use
__futimes64.
The __futimes is now supposed to be used on systems still supporting 32
bit time (__TIMESIZE != 64) - hence the necessary conversion of struct timeval
to 64 bit struct __timeval64.
The check if struct timevals' usec fields are in the range between 0 and 1000000
has been removed as Linux kernel performs it internally in the implementation
of utimensat (the conversion between struct __timeval64 and __timespec64 is not
relevant for this particular check).
Last but not least, checks for tvp{64} not being NULL have been preserved from
the original code as some legacy user space programs may rely on it.
Build tests:
./src/scripts/build-many-glibcs.py glibcs
Run-time tests:
- Run specific tests on ARM/x86 32bit systems (qemu):
https://github.com/lmajewski/meta-y2038 and run tests:
https://github.com/lmajewski/y2038-tests/commits/master
Above tests were performed with Y2038 redirection applied as well as without to
test the proper usage of both __futimes64 and __futimes.
It seems that some gcc versions might generates a stack frame for the
sigreturn stub requires on sparc signal handling. For instance:
$ cat test.c
#define _GNU_SOURCE
#include <sys/syscall.h>
__attribute__ ((__optimize__ ("-fno-stack-protector")))
void
__sigreturn_stub (void)
{
__asm__ ("mov %0, %%g1\n\t"
"ta 0x10\n\t"
: /* no outputs */
: "i" (SYS_rt_sigreturn));
}
$ gcc -v
[...]
gcc version 9.2.1 20200224 (Debian 9.2.1-30)
$ gcc -O2 -m64 test.c -S -o -
[...]
__sigreturn_stub:
save %sp, -176, %sp
#APP
! 9 "t.c" 1
mov 101, %g1
ta 0x10
! 0 "" 2
#NO_APP
.size __sigreturn_stub, .-__sigreturn_stub
As indicated by kernel developers [1], the sigreturn stub can not change
the register window or the stack pointer since the kernel has setup the
restore frame at a precise location relative to the stack pointer when
the stub is invoked.
I tried to play with some compiler flags and even with _Noreturn and
__builtin_unreachable after the asm does not help (and Sparc does not
support naked functions).
To avoid similar issues, as the stack-protector support also have
stumbled, this patch moves the implementation of the sigreturn stubs to
assembly.
Checked on sparcv9-linux-gnu and sparc64-linux-gnu with gcc 9.2.1
and gcc 7.5.0.
[1] https://lkml.org/lkml/2016/5/27/465
Soon, powerpc64le will need to provide extra compiler flags to the long
double files in order to continue to build using the IBM 128-bit
extended floating point type as long double.
This patch creates test-ibm128* tests from the long double function tests.
In order to explicitly test IBM long double functions -mabi=ibmlongdouble is
added to CFLAGS.
Likewise, update the test headers to correct choose ULPs when redirects
are enabled.
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Co-authored-by: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
A recent change to fenvinline.h modified the check if __e is a
a power of 2 inside feraiseexcept and feclearexcept macros. It
introduced the use of the powerof2 macro but also removed the
if statement checking whether __e != 0 before issuing an mtfsb*
instruction. This is problematic because powerof2 (0) evaluates
to 1 and without the removed if __e is allowed to be 0 when
__builtin_clz is called. In that case the value 32 is passed
to __MTFSB*, which is invalid.
This commit uses __builtin_popcount instead of powerof2 to fix this
issue and avoid the extra check for __e != 0. This was the approach
used by the initial versions of that previous patch.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
The commit "arm: Split BE/LE abilist"
(1673ba87fe) changed the soft-fp order for
ARM selection when __SOFTFP__ is defined by the compiler.
On 2.30 the sysdeps order is:
2.30
sysdeps/unix/sysv/linux/arm
sysdeps/arm/nptl
sysdeps/unix/sysv/linux
sysdeps/nptl
sysdeps/pthread
sysdeps/gnu
sysdeps/unix/inet
sysdeps/unix/sysv
sysdeps/unix/arm
sysdeps/unix
sysdeps/posix
sysdeps/arm/nofpu
sysdeps/ieee754/soft-fp
sysdeps/arm
sysdeps/wordsize-32
sysdeps/ieee754/flt-32
sysdeps/ieee754/dbl-64
sysdeps/ieee754
sysdeps/generic
While on master is:
sysdeps/unix/sysv/linux/arm/le
sysdeps/unix/sysv/linux/arm
sysdeps/arm/nptl
sysdeps/unix/sysv/linux
sysdeps/nptl
sysdeps/pthread
sysdeps/gnu
sysdeps/unix/inet
sysdeps/unix/sysv
sysdeps/unix/arm
sysdeps/unix
sysdeps/posix
sysdeps/arm/le
sysdeps/arm
sysdeps/wordsize-32
sysdeps/ieee754/flt-32
sysdeps/ieee754/dbl-64
sysdeps/arm/nofpu
sysdeps/ieee754/soft-fp
sysdeps/ieee754
sysdeps/generic
It make the build select some routines (fadd, fdiv, fmul, fsub, and fma)
on ieee754/flt-32 and ieee754/dbl-64 that requires fenv support to be
correctly rounded which in turns lead to math failures since the
__SOFTFP__ does not have fenv support.
With this patch the order is now:
sysdeps/unix/sysv/linux/arm/le
sysdeps/unix/sysv/linux/arm
sysdeps/arm/nptl
sysdeps/unix/sysv/linux
sysdeps/nptlsysdeps/pthread
sysdeps/gnu
sysdeps/unix/inet
sysdeps/unix/sysv
sysdeps/unix/arm
sysdeps/unix
sysdeps/posix
sysdeps/arm/le/nofpu
sysdeps/arm/nofpu
sysdeps/ieee754/soft-fp
sysdeps/arm/le
sysdeps/arm
sysdeps/wordsize-32
sysdeps/ieee754/flt-32
sysdeps/ieee754/dbl-64
sysdeps/ieee754
sysdeps/generic
Checked on arm-linux-gnuaebi.
The kernel might not clear the padding value for the ipc_perm mode
fields in compat mode (32 bit running on a 64 bit kernel). It was
fixed on v4.14 when the ipc compat code was refactored to move
(commits 553f770ef71b, 469391684626, c0ebccb6fa1e).
Although it is most likely a kernel issue, it was shown only due
BZ#18231 fix which made all the SysVIPC mode_t 32-bit regardless of
the kABI.
This patch fixes it by explicitly zeroing the upper bits for such
cases. The __ASSUME_SYSVIPC_BROKEN_MODE_T case already handles
it with the shift.
(The aarch64 ipc_priv.h is superflous since
__ASSUME_SYSVIPC_DEFAULT_IPC_64 is now defined as default).
Checked on i686-linux-gnu on 3.10 and on 4.15 kernel.
fstatat64 depends on inlining to produce the desired __fxstatat64
call, which does not happen with -Os, leading to a link failure
with an undefined reference to fstatat64. __fxstatat64 has a macro
definition in include/sys/stat.h and thus avoids the problem.
After recent discussions:
- "[PATCH] s390: Remove backchain-based fallback from backtrace"
https://www.sourceware.org/ml/libc-alpha/2020-02/msg00287.html
- "Re: [PATCH 07/11] s390: Implement backtrace on top of <unwind-link.h>"
https://www.sourceware.org/ml/libc-alpha/2020-02/msg00637.html
We've checked and decided to remove the backchain:
We don't know of any environments without libgcc. Thus the backchain
unwinder is not used. If somebody builds with -mbackchain and without
fasynchronous-unwind-tables and has libgcc installed, then the
libgcc unwinder is called but not the backchain-based fallback.
This step allows to get rid of the s390x specific backtrace.c files at all.
Furthermore the now used debug/backtrace.c version has some more
advantages:
- Free all resources if necessary. (libc_freeres_fn)
- Remove NULL address above _start.
- Check whether we make any progress while getting addresses.
The combination of GCC 10 and binutils 2.35 (both unreleased) is no
longer able to link the dynamic linker, due to a GP16 relocation
overflow error:
glibc/alpha-linux-gnu/elf/librtld.os: in function `calloc': glibc/elf/../include/rtld-malloc.h:44:(.text+0xd98): relocation truncated to fit: GPREL16 against symbol `__rtld_calloc' defined in .data.rel.ro section in glibc/alpha-linux-gnu/elf/librtld.os
glibc/alpha-linux-gnu/elf/librtld.os: in function `malloc': glibc/elf/../include/rtld-malloc.h:56:(.text+0x2978): relocation truncated to fit: GPREL16 against symbol `__rtld_malloc' defined in .data.rel.ro section in glibc/alpha-linux-gnu/elf/librtld.os
This is arguably a linker bug; the object files and their section size
requirements look reasonable enough.
Using -fPIC (the default) works around this issue.
This patch replaces auto generated wrapper (as described in
sysdeps/unix/sysv/linux/syscalls.list) for utime with one which adds extra
support for setting file's access and modification 64 bit time on machines
with __TIMESIZE != 64.
Internally, the __utimensat_time64 helper function is used. This patch is
necessary for having architectures with __WORDSIZE == 32 && __TIMESIZE != 64
Y2038 safe.
Moreover, a 32 bit version - __utime has been refactored to internally use
__utime64.
The __utime is now supposed to be used on systems still supporting 32
bit time (__TIMESIZE != 64) - hence the necessary conversion between struct
utimbuf and struct __utimbuf64.
Build tests:
./src/scripts/build-many-glibcs.py glibcs
Run-time tests:
- Run specific tests on ARM/x86 32bit systems (qemu):
https://github.com/lmajewski/meta-y2038 and run tests:
https://github.com/lmajewski/y2038-tests/commits/master
Above tests were performed with Y2038 redirection applied as well as
without to test proper usage of both __utime64 and __utime.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch provides new __utimes64 explicit 64 bit function for setting file's
64 bit attributes for access and modification time.
Internally, the __utimensat64_helper function is used. This patch is necessary
for having architectures with __WORDSIZE == 32 Y2038 safe.
Moreover, a 32 bit version - __utimes has been refactored to internally use
__utimes64.
The __utimes is now supposed to be used on systems still supporting 32
bit time (__TIMESIZE != 64) - hence the necessary conversion of struct
timeval to 64 bit struct __timeval64.
Build tests:
./src/scripts/build-many-glibcs.py glibcs
Run-time tests:
- Run specific tests on ARM/x86 32bit systems (qemu):
https://github.com/lmajewski/meta-y2038 and run tests:
https://github.com/lmajewski/y2038-tests/commits/master
Above tests were performed with Y2038 redirection applied as well as without
to test proper usage of both __utimes64 and __utimes.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Due to the built-in tables, __NR_vfork is always defined, so the
fork-based fallback code is never used.
(It appears that the vfork system call was wired up when the port was
contributed to the kernel.)
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>