Includes test for setcontext too.
The test directly checks after longjmp if ZA got disabled and the
ZA contents got saved following the lazy saving scheme. It does not
use ACLE code to verify that gcc can interoperate with glibc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The runtime support routines for the call ABI of the Scalable Matrix
Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
libgcc_s.so have an implementation of __arm_za_disable in libc for
libc internal use in longjmp and similar APIs.
__libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
but it's a hidden symbol so it does not need variant PCS marking.
Using __libc_fatal instead of abort because it can print a message and
works in ld.so too. But for now we don't need SME routines in ld.so.
To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
_rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
object is accessed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
clang emits an warning when a double alias redirection is used, to warn
the the original symbol will be used even when weak definition is
overridden. However, this is a common pattern for weak_alias, where
multiple alias are set to same symbol.
Reviewed-by: Fangrui Song <maskray@google.com>
Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.
This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc together can
work together on optimizing that.
This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering all
together. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.
This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.
getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fallback to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.
The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S. It is used as default and only
little-endian is supported (BE uses generic code).
As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.
On a virtualized Linux on Apple M1 it shows the following
improvements (using formatted bench-arc4random data):
GENERIC MB/s
-----------------------------------------------
arc4random [single-thread] 380.89
arc4random_buf(16) [single-thread] 500.73
arc4random_buf(32) [single-thread] 552.61
arc4random_buf(48) [single-thread] 566.82
arc4random_buf(64) [single-thread] 574.01
arc4random_buf(80) [single-thread] 581.02
arc4random_buf(96) [single-thread] 591.19
arc4random_buf(112) [single-thread] 592.29
arc4random_buf(128) [single-thread] 596.43
-----------------------------------------------
OPTIMIZED MB/s
-----------------------------------------------
arc4random [single-thread] 569.60
arc4random_buf(16) [single-thread] 825.78
arc4random_buf(32) [single-thread] 987.03
arc4random_buf(48) [single-thread] 1042.39
arc4random_buf(64) [single-thread] 1075.50
arc4random_buf(80) [single-thread] 1094.68
arc4random_buf(96) [single-thread] 1130.16
arc4random_buf(112) [single-thread] 1129.58
arc4random_buf(128) [single-thread] 1137.91
-----------------------------------------------
Checked on aarch64-linux-gnu.
A separate asm file is easier to maintain than a macro that expands to
inline asm.
The RTLD_START macro is only needed now because _dl_start is local in
rtld.c, but _start has to call it, if _dl_start was made hidden then it
could be empty.
_dl_skip_args is no longer needed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The rtld audit support show two problems on aarch64:
1. _dl_runtime_resolve does not preserve x8, the indirect result
location register, which might generate wrong result calls
depending of the function signature.
2. The NEON Q registers pushed onto the stack by _dl_runtime_resolve
were twice the size of D registers extracted from the stack frame by
_dl_runtime_profile.
While 2. might result in wrong information passed on the PLT tracing,
1. generates wrong runtime behaviour.
The aarch64 rtld audit support is changed to:
* Both La_aarch64_regs and La_aarch64_retval are expanded to include
both x8 and the full sized NEON V registers, as defined by the
ABI.
* dl_runtime_profile needed to extract registers saved by
_dl_runtime_resolve and put them into the new correctly sized
La_aarch64_regs structure.
* The LAV_CURRENT check is change to only accept new audit modules
to avoid the undefined behavior of not save/restore x8.
* Different than other architectures, audit modules older than
LAV_CURRENT are rejected (both La_aarch64_regs and La_aarch64_retval
changed their layout and there are no requirements to support multiple
audit interface with the inherent aarch64 issues).
* A new field is also reserved on both La_aarch64_regs and
La_aarch64_retval to support variant pcs symbols.
Similar to x86, a new La_aarch64_vector type to represent the NEON
register is added on the La_aarch64_regs (so each type can be accessed
directly).
Since LAV_CURRENT was already bumped to support bind-now, there is
no need to increase it again.
Checked on aarch64-linux-gnu.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
The malloc-check debugging feature is tightly integrated into glibc
malloc, so thanks to an idea from Florian Weimer, much of the malloc
implementation has been moved into libc_malloc_debug.so to support
malloc-check. Due to this, glibc malloc and malloc-check can no
longer work together; they use altogether different (but identical)
structures for heap management. This should not make a difference
though since the malloc check hook is not disabled anywhere.
malloc_set_state does, but it does so early enough that it shouldn't
cause any problems.
The malloc check tunable is now in the debug DSO and has no effect
when the DSO is not preloaded.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
This is a common operation when heap tagging is enabled, so inline the
instruction instead of using an extern call.
The .inst directive is used instead of the name of the instruction (or
acle intrinsics) because malloc.c is not compiled for armv8.5-a+memtag
architecture, runtime cpu support detection is used.
Prototypes are removed from the comments as they were not always
correct.
The memset api is suboptimal and does not provide much benefit. Memory
tagging only needs a zeroing memset (and only for memory that's sized
and aligned to multiples of the tag granule), so change the internal
api and the target hooks accordingly. This is to simplify the
implementation of the target hook.
Reviewed-by: DJ Delorie <dj@redhat.com>
This test fails without bug 26798 fixed because some integer registers
likely get clobbered by lazy binding and variant PCS only allows x16
and x17 to be clobbered at call time.
The test requires binutils 2.32.1 or newer for handling variant PCS
symbols. SVE registers are not covered by this test, to avoid the
complexity of handling multiple compile- and runtime feature support
cases.
When glibc is built with branch protection (i.e. with a gcc configured
with --enable-standard-branch-protection), all glibc binaries should
be BTI compatible and marked as such.
It is easy to link BTI incompatible objects by accident and this is
silent currently which is usually not the expectation, so this is
changed into a link error. (There is no linker flag for failing on
BTI incompatible inputs so all warnings are turned into fatal errors
outside the test system when building glibc with branch protection.)
Unfortunately, outlined atomic functions are not BTI compatible in
libgcc (PR libgcc/96001), so to build glibc with current gcc use
'CC=gcc -mno-outline-atomics', this should be fixed in libgcc soon
and then glibc can be built and tested without such workarounds.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Binaries can opt-in to using BTI via an ELF object file marking.
The dynamic linker has to then mprotect the executable segments
with PROT_BTI. In case of static linked executables or in case
of the dynamic linker itself, PROT_BTI protection is done by the
operating system.
On AArch64 glibc uses PT_GNU_PROPERTY instead of PT_NOTE to check
the properties of a binary because PT_NOTE can be unreliable with
old linkers (old linkers just append the notes of input objects
together and add them to the output without checking them for
consistency which means multiple incompatible GNU property notes
can be present in PT_NOTE).
BTI property is handled in the loader even if glibc is not built
with BTI support, so in theory user code can be BTI protected
independently of glibc. In practice though user binaries are not
marked with the BTI property if glibc has no support because the
static linked libc objects (crt files, libc_nonshared.a) are
unmarked.
This patch relies on Linux userspace API that is not yet in a
linux release but in v5.8-rc1 so scheduled to be in Linux 5.8.
Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Passing a second argument to the ifunc resolver allows accessing
AT_HWCAP2 values from the resolver. AArch64 will start using AT_HWCAP2
on linux because for ilp32 to remain compatible with lp64 ABI no more
than 32bit hwcap flags can be in AT_HWCAP which is already used up.
Currently the relocation ordering logic does not guarantee that ifunc
resolvers can call libc apis or access libc objects, so only the
resolver arguments and runtime environment dependent instructions can
be used to do the dispatch (this affects ifunc resolvers outside of
the libc).
Since ifunc resolver is target specific and only supposed to be
called by the dynamic linker, the call ABI can be changed in a
backward compatible way:
Old call ABI passed hwcap as uint64_t, new abi sets the
_IFUNC_ARG_HWCAP flag in the hwcap and passes a second argument
that's a pointer to an extendible struct. A resolver has to check
the _IFUNC_ARG_HWCAP flag before accessing the second argument.
The new sys/ifunc.h installed header has the definitions for the
new ABI, everything is in the implementation reserved namespace.
An alternative approach is to try to support extern calls from ifunc
resolvers such as getauxval, but that seems non-trivial
https://sourceware.org/ml/libc-alpha/2017-01/msg00468.html
* sysdeps/aarch64/Makefile: Install sys/ifunc.h and add tests.
* sysdeps/aarch64/dl-irel.h (elf_ifunc_invoke): Update to new ABI.
* sysdeps/aarch64/sys/ifunc.h: New file.
* sysdeps/aarch64/tst-ifunc-arg-1.c: New file.
* sysdeps/aarch64/tst-ifunc-arg-2.c: New file.
As per <https://sourceware.org/ml/libc-alpha/2014-10/msg00369.html>,
there should not be separate sysdeps/<arch>/soft-fp directories when
those are used by all configurations that use sysdeps/<arch>, and,
more generally, should not be sysdeps/foo/Implies files pointing to a
subdirectory foo/bar. This patch eliminates the
sysdeps/aarch64/soft-fp directory accordingly, merging its contents
into sysdeps/aarch64.
Tested with build-many-glibcs.py that installed stripped shared
libraries for aarch64 configurations are unchanged by this patch.
* sysdeps/aarch64/Implies: Remove aarch64/soft-fp.
* sysdeps/aarch64/Makefile [$(subdir) = math] (CPPFLAGS): Add
-I../soft-fp. Moved from ....
* sysdeps/aarch64/soft-fp/Makefile: ... here. Remove file.
* sysdeps/aarch64/soft-fp/e_sqrtl.c: Move to ....
* sysdeps/aarch64/e_sqrtl.c: ... here.
* sysdeps/aarch64/soft-fp/sfp-machine.h: Move to ....
* sysdeps/aarch64/sfp-machine.h: ... here.
Add unwind info to __libc_start_main so that unwinding continues one
extra level to _start. Similarly add unwind info to backtrace.
Given many targets require this, do this in a general way.
* csu/Makefile: Add -funwind-tables to libc-start.c.
* debug/Makefile: Add -funwind-tables to backtrace.c.
* sysdeps/aarch64/Makefile: Remove CFLAGS-backtrace.c.
* sysdeps/arm/Makefile: Likewise.
* sysdeps/i386/Makefile: Likewise.
* sysdeps/m68k/Makefile: Likewise.
* sysdeps/mips/Makefile: Likewise.
* sysdeps/nios2/Makefile: Likewise.
* sysdeps/sh/Makefile: Likewise.
* sysdeps/sparc/Makefile: Likewise.
When glibc is built with --enable-profile, the ENTRY of
asm functions includes CALL_MCOUNT for profiling.
(matters for binaries static linked against libc_p.a.)
CALL_MCOUNT did not save/restore argument registers
around the _mcount call so it clobbered them.
(it is enough to only save/restore the arguments passed
to a given asm function, but that would be too many asm
changes so it is simpler to always save all argument
registers in this macro.)
float args are not saved: mcount does not clobber the
float regs and currently no asm function takes float
arguments anyway.
[BZ #18707]
* sysdeps/aarch64/Makefile (CFLAGS-mcount.c): Add -mgeneral-regs-only.
* sysdeps/aarch64/sysdep.h (CALL_MCOUNT): Save argument registers.
This patch moves the AArch64 port to the main sysdeps hierarchy. The
move is essentially:
git mv ports/sysdeps/aarch64 sysdeps/aarch64
git mv ports/sysdeps/unix/sysv/linux/aarch64 sysdeps/unix/sysv/linux/aarch64
The README is updated and I've updated ChangeLog.aarch64 along the
lines of the ARM move. The AArch64 build has been tested to confirm
that there were no changes in objdump -dr output or the shared
objects.