The powerpc optimization to provide a fast stacktrace requires some
ad-hoc code to handle Linux signal frames and the change is fragile
once the kernel decides to slight change its execution sequence [1].
The generic implementation work as-is and it should be future proof
since the kernel provides the expected CFI directives in vDSO shared
page.
Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and
powerpc64-linux-gnu.
[1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html
As noted in bug 28475, the access attribute on memfrob in <string.h>
is incorrect: the function both reads and writes the memory pointed to
by its argument, so it needs to use __read_write__, not
__write_only__. This incorrect attribute results in a build failure
for accessing uninitialized memory for s390x-linux-gnu-O3 with
build-many-glibcs.py using GCC mainline.
Correct the attribute. Fixing this shows up that some calls to
memfrob in elf/ tests are reading uninitialized memory; I'm not
entirely sure of the purpose of those calls, but guess they are about
ensuring that the stack space is indeed allocated at that point in the
function, and so it matters that they are calling a function whose
semantics are unknown to the compiler. Thus, change the first memfrob
call in those tests to use explicit_bzero instead, as suggested by
Florian in
<https://sourceware.org/pipermail/libc-alpha/2021-October/132119.html>,
to avoid the use of uninitialized memory.
Tested for x86_64, and with build-many-glibcs.py (GCC mainline) for
s390x-linux-gnu-O3.
In _FORTIFY_SOURCE=3, the size expression may be non-constant,
resulting in branches in the inline functions remaining intact and
causing a tiny overhead. Clang (and in future, gcc) make sure that
the -1 case is always safe, i.e. any comparison of the generated
expression with (size_t)-1 is always false so that bit is taken care
of. The rest is avoidable since we want the _chk variant whenever we
have a size expression and it's not -1.
Rework the conditionals in a uniform way to clearly indicate two
conditions at compile time:
- Either the size is unknown (-1) or we know at compile time that the
operation length is less than the object size. We can call the
original function in this case. It could be that either the length,
object size or both are non-constant, but the compiler, through
range analysis, is able to fold the *comparison* to a constant.
- The size and length are known and the compiler can see at compile
time that operation length > object size. This is valid grounds for
a warning at compile time, followed by emitting the _chk variant.
For everything else, emit the _chk variant.
This simplifies most of the fortified function implementations and at
the same time, ensures that only one call from _chk or the regular
function is emitted.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
In the context of a function definition, the size hints imply that the
size of an object pointed to by one parameter is another parameter.
This doesn't make sense for the fortified versions of the functions
since that's the bit it's trying to validate.
This is harmless with __builtin_object_size since it has fairly simple
semantics when it comes to objects passed as function parameters.
With __builtin_dynamic_object_size we could (as my patchset for gcc[1]
already does) use the access attribute to determine the object size in
the general case but it misleads the fortified functions.
Basically the problem occurs when access attributes are present on
regular functions that have inline fortified definitions to generate
_chk variants; the attributes get inherited by these definitions,
causing problems when analyzing them. For example with poll(fds, nfds,
timeout), nfds is hinted using the __attr_access as being the size of
fds.
Now, when analyzing the inline function definition in bits/poll2.h, the
compiler sees that nfds is the size of fds and tries to use that
information in the function body. In _FORTIFY_SOURCE=3 case, where the
object size could be a non-constant expression, this information results
in the conclusion that nfds is the size of fds, which defeats the
purpose of the implementation because we're trying to check here if nfds
does indeed represent the size of fds. Hence for this case, it is best
to not have the access attribute.
With the attributes gone, the expression evaluation should get delayed
until the function is actually inlined into its destinations.
Disable the access attribute for fortified function inline functions
when building at _FORTIFY_SOURCE=3 to make this work better. The
access attributes remain for the _chk variants since they can be used
by the compiler to warn when the caller is passing invalid arguments.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581125.html
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Unlike GCC, Clang parses asm statements and verifies they are valid
instructions/directives. Place the magic @@@ into a comment to avoid
a parse error.
1. Define DL_RO_DYN_SECTION to initalize bootstrap_map.l_ld_readonly
before calling elf_get_dynamic_info to get dynamic info in bootstrap_map,
2. Define a single
static inline bool
dl_relocate_ld (const struct link_map *l)
{
/* Don't relocate dynamic section if it is readonly */
return !(l->l_ld_readonly || DL_RO_DYN_SECTION);
}
This updates BZ #28340 fix.
This was found when testing the OpenRISC port I am working on. These
two tests fail with SIGSEGV:
FAIL: misc/tst-ntp_gettime
FAIL: misc/tst-ntp_gettimex
This was found to be due to the kernel overwriting the stack space
allocated by the timex structure. The reason for the overwrite being
that the kernel timex has 64-bit fields and user space code only
allocates enough stack space for timex with 32-bit fields.
On 32-bit systems with TIMESIZE=64 __USE_TIME_BITS64 is not defined.
This causes the timex structure to use 32-bit fields with type
__syscall_slong_t.
This patch adjusts the ifdef condition to allow 32-bit systems with
TIMESIZE=64 to use the 64-bit long long timex definition.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The current language reads "This macro determines...", changing to
"Define this macro...". This is consistent with other feature macro
documentation language.
When I first read the previous language it seems to indicate that the
macro is already defined. By changing the language to "Define this
macro..." it's clear that its the user's responsibility to define it.
The check for waiting for the pidfile to be created looks wrong. At the
point when ACCESS is run the pid file will always be created and
accessible as it is created during DO_PREPARE. This means that thread
cancellation may be performed before the pid is written to the pidfile.
This was found to be flaky when testing on my OpenRISC platform.
Fix this by using the semaphore to wait for pidfile pid write
completion.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
THe d6d89608ac broke powerpc for --enable-bind-now because it turned
out that different than patch assumption rtld elf_get_dynamic_info()
does require to handle RTLD_BOOTSTRAP to avoid DT_FLAGS and
DT_RUNPATH (more specially the GLRO usage which is not reallocate
yet).
This patch fixes by passing two arguments to elf_get_dynamic_info()
to inform that by rtld (bootstrap) or static pie initialization
(static_pie_bootstrap). I think using explicit argument is way more
clear and burried C preprocessor, and compiler should remove the
dead code.
I checked on x86_64 and i686 with default options, --enable-bind-now,
and --enable-bind-now and --enable--static-pie. I also check on
aarch64, armhf, powerpc64, and powerpc with default and
--enable-bind-now.
5bf07e1b3a ("Linux: Simplify __opensock and fix race condition [BZ #28353]")
made __opensock try NETLINK then UNIX then INET. On the Hurd, only INET
knows about network interfaces, so better actually specify that in
if_index.
INTR_MSG_TRAP was tinkering with esp to make it point to
_hurd_intr_rpc_mach_msg's parameters, and notably use (&msg)[-1] which is
meaningless in C.
Instead, just push the parameters on the stack, which also avoids leaving
local variables of _hurd_intr_rpc_mach_msg below esp. We now also
properly express that OPTION and TIMEOUT may be updated during the trap
call.
The 4af6982e4c fix does not fully handle RTLD_BOOTSTRAP usage on
rtld.c due two issues:
1. RTLD_BOOTSTRAP is also used on dl-machine.h on various
architectures and it changes the semantics of various machine
relocation functions.
2. The elf_get_dynamic_info() change was done sideways, previously
to 490e6c62aa get-dynamic-info.h was included by the first
dynamic-link.h include *without* RTLD_BOOTSTRAP being defined.
It means that the code within elf_get_dynamic_info() that uses
RTLD_BOOTSTRAP is in fact unused.
To fix 1. this patch now includes dynamic-link.h only once with
RTLD_BOOTSTRAP defined. The ELF_DYNAMIC_RELOCATE call will now have
the relocation fnctions with the expected semantics for the loader.
And to fix 2. part of 4af6982e4c is reverted (the check argument
elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP
pieces are removed.
To reorganize the includes the static TLS definition is moved to
its own header to avoid a circular dependency (it is defined on
dynamic-link.h and dl-machine.h requires it at same time other
dynamic-link.h definition requires dl-machine.h defitions).
Also ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL
are moved to its own header. Only ancient ABIs need special values
(arm, i386, and mips), so a generic one is used as default.
The powerpc Elf64_FuncDesc is also moved to its own header, since
csu code required its definition (which would require either include
elf/ folder or add a full path with elf/).
Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32,
and powerpc64le.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
No bug.
Optimization are
1. change control flow for L(more_2x_vec) to fall through to loop and
jump for L(less_4x_vec) and L(less_8x_vec). This uses less code
size and saves jumps for length > 4x VEC_SIZE.
2. For EVEX/AVX512 move L(less_vec) closer to entry.
3. Avoid complex address mode for length > 2x VEC_SIZE
4. Slightly better aligning code for the loop from the perspective of
code size and uops.
5. Align targets so they make full use of their fetch block and if
possible cache line.
6. Try and reduce total number of icache lines that will need to be
pulled in for a given length.
7. Include "local" version of stosb target. For AVX2/EVEX/AVX512
jumping to the stosb target in the sse2 code section will almost
certainly be to a new page. The new version does increase code size
marginally by duplicating the target but should get better iTLB
behavior as a result.
test-memset, test-wmemset, and test-bzero are all passing.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
No bug.
The frontend optimizations are to:
1. Reorganize logically connected basic blocks so they are either in
the same cache line or adjacent cache lines.
2. Avoid cases when basic blocks unnecissarily cross cache lines.
3. Try and 32 byte align any basic blocks possible without sacrificing
code size. Smaller / Less hot basic blocks are used for this.
Overall code size shrunk by 168 bytes. This should make up for any
extra costs due to aligning to 64 bytes.
In general performance before deviated a great deal dependending on
whether entry alignment % 64 was 0, 16, 32, or 48. These changes
essentially make it so that the current implementation is at least
equal to the best alignment of the original for any arguments.
The only additional optimization is in the page cross case. Branch on
equals case was removed from the size == [4, 7] case. As well the [4,
7] and [2, 3] case where swapped as [4, 7] is likely a more hot
argument size.
test-memcmp and test-wmemcmp are both passing.
The test expects stdin to be a file which is not the case when running
tests over ssh where stdin is piped in.
The test fails with:
error: xlseek.c:27: lseek64 (0, 0, 1): Illegal seek
Update the test to create a temporary file and use that to perform the
test.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The tst-audit14, tst-audit15 and tst-audit16 tests all have audit
modules that write to stdout; the test reads from stdout to confirm
what was written. This assumes the stdout is a file which is not the
case when run over ssh.
This patch updates the tests to use a post run cmp command to compare
the output against and .exp file. This is similar to how many other
tests work and it fixes the stdout limitation. Also, this means the
test code can be greatly simplified.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Before to 490e6c62aa ('elf: Avoid nested functions in the loader
[BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on
the first dynamic-link.h include and later within _dl_start(). The
former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used
on setup_vdso() (since it is a global definition), while the former does
define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation.
With the commit change, the function is now included and defined once
instead of defined as a nested function. So rtld.c defines without
defining RTLD_BOOTSTRAP and it brokes at least powerpc32.
This patch fixes by moving the get-dynamic-info.h include out of
dynamic-link.h, which then the caller can corirectly set the expected
semantic by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or
RESOLVE_MAP.
It also required to enable some asserts only for the loader bootstrap
to avoid issues when called from setup_vdso().
As a side note, this is another issues with nested functions: it is
not clear from pre-processed output (-E -dD) how the function will
be build and its semantic (since nested function will be local and
extra C defines may change it).
I checked on x86_64-linux-gnu (w/o --enable-static-pie),
i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4,
aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and
s390x-linux-gnu.
Reviewed-by: Fangrui Song <maskray@google.com>
I'd like to be able to test narrow and wide string interfaces, with
the narrow string tests using TEST_COMPARE_STRING and the wide string
tests using something analogous (possibly generated using macros from
a common test template for both the narrow and wide string tests where
appropriate).
Add such a TEST_COMPARE_STRING_WIDE, along with functions
support_quote_blob_wide and support_test_compare_string_wide that it
builds on. Those functions are built using macros from common
templates shared by the narrow and wide string implementations, though
I didn't do that for the tests of test functions. In
support_quote_blob_wide, I chose to use the \x{} delimited escape
sequence syntax proposed for C2X in N2785, rather than e.g. trying to
generate the end of a string and the start of a new string when
ambiguity would result from undelimited \x (when the next character
after such an escape sequence is valid hex) or forcing an escape
sequence to be used for the next character in the case of such
ambiguity.
Tested for x86_64.
Building for nios2-linux-gnu has recently started showing a localplt
test failure, arising from a reference to __floatunsidf from
getloadavg after commit b5c8a3aa82
("Linux: implement getloadavg(3) using sysinfo(2)") (this is an
architecture with soft-fp in libc). Add this as a permitted local PLT
reference in localplt.data.
Tested with build-many-glibcs.py for nios2-linux-gnu.
Intel MPX failed to gain wide adoption and has been deprecated for a
while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in
2019.
This patch removes the support code from the dynamic loader.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Replace a call to sprintf with an equivalent pair of stpcpy/strcpy calls
to avoid a GCC 12 -Wformat-overflow false positive due to recent optimizer
improvements.
No bug.
This commit adds new medium size cases for lengths in [512, 1024). As
well it increase the iters to INNER_LOOP_ITERS_LARGE for more reliable
results.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
No bug.
This change adds a new macro ENTRY_P2ALIGN which takes a second
argument, log2 of the desired function alignment.
The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this
doesn't affect any existing functionality.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
It is at least "more random" than 0xffff & __getpid ();
Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This is a follow-up to the tst-cpuclock1.c change here:
9a29f1a2ae
This test, like tst-cpuclock1, may fail on heavily loaded VM
servers (and has occasionally failed on the 32bit trybot),
so tests that rely on "wall time" have been removed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
dynamic-link.h is included more than once in some elf/ files (rtld.c,
dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested
functions. This harms readability and the nested functions usage
is the biggest obstacle prevents Clang build (Clang doesn't support GCC
nested functions).
The key idea for unnesting is to add extra parameters (struct link_map
*and struct r_scope_elm *[]) to RESOLVE_MAP,
ELF_MACHINE_BEFORE_RTLD_RELOC, ELF_DYNAMIC_RELOCATE, elf_machine_rel[a],
elf_machine_lazy_rel, and elf_machine_runtime_setup. (This is inspired
by Stan Shebs' ppc64/x86-64 implementation in the
google/grte/v5-2.27/master which uses mixed extra parameters and static
variables.)
Future simplification:
* If mips elf_machine_runtime_setup no longer needs RESOLVE_GOTSYM,
elf_machine_runtime_setup can drop the `scope` parameter.
* If TLSDESC no longer need to be in elf_machine_lazy_rel,
elf_machine_lazy_rel can drop the `scope` parameter.
Tested on aarch64, i386, x86-64, powerpc64le, powerpc64, powerpc32,
sparc64, sparcv9, s390x, s390, hppa, ia64, armhf, alpha, and mips64.
In addition, tested build-many-glibcs.py with {arc,csky,microblaze,nios2}-linux-gnu
and riscv64-linux-gnu-rv64imafdc-lp64d.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
When performing symbol lookup for references in executable without
indirect external access:
1. Disallow copy relocations in executable against protected data symbols
in a shared object with indirect external access.
2. Disallow non-zero symbol values of undefined function symbols in
executable, which are used as the function pointer, against protected
function symbols in a shared object with indirect external access.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
1. Add GNU_PROPERTY_1_NEEDED:
#define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO
to indicate the needed properties by the object file.
2. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:
#define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)
to indicate that the object file requires canonical function pointers and
cannot be used with copy relocation.
3. Scan GNU_PROPERTY_1_NEEDED property and store it in l_1_needed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The 106ff08526 did not take in consideration the buffer might be
reallocated if the total path is larger than PATH_MAX. The realloc
uses 'dirbuf', where 'dirstreams' is the allocated buffer.
Checked on x86_64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
* time/tzfile.c (__tzfile_compute): Fix unlikely off-by-one bug
that accessed before start of an array when an oddball-but-valid
TZif file was queried with an unusual time_t value.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Both new HWCAPs were introduced in these kernel commits:
- 7e8403ecaf884f307b627f3c371475913dd29292
"s390: add HWCAP_S390_PCI_MIO to ELF hwcaps"
- 7e82523f2583e9813e4109df3656707162541297
"s390/hwcaps: make sie capability regular hwcap"
Also note that the kernel commit 511ad531afd4090625def4d9aba1f5227bd44b8e
"s390/hwcaps: shorten HWCAP defines" has shortened the prefix of the macros
from "HWCAP_S390_" to "HWCAP_". For compatibility reasons, we do not
change the prefix in public glibc header file.
The fd validity check in open_dev_null checks if fd > 0, which would
lead to a leaked fd if it is == 0.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Linker creates the DT_DEBUG entry only in executables. Don't fill the
non-existent DT_DEBUG entry in ld.so with the run-time address of the
r_debug structure. This fixes BZ #28129.
Building stdlib/tst-setcontext.c fails with GCC mainline:
tst-setcontext.c: In function 'f2':
tst-setcontext.c:61:16: error: comparison between two arrays [-Werror=array-compare]
61 | if (on_stack < st2 || on_stack >= st2 + sizeof (st2))
| ^
tst-setcontext.c:61:16: note: use '&on_stack[0] < &st2[0]' to compare the addresses
The comparison in this case is deliberate, so adjust it as suggested
in that note.
Tested with build-many-glibcs.py (GCC mainline) for aarch64-linux-gnu.
The largest errors over the full binary32 range are after this
patch (on x86_64):
RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7
Inputs that were yielding huge errors have been added to "make check".
Reviewed-by: Adhemeral Zanella <adhemerval.zanella@linaro.org>
My glibc bot shows failures building the testsuite with GCC mainline
across all architectures:
tst-vfprintf-width-prec.c: In function 'do_test':
tst-vfprintf-width-prec.c:90:16: error: the comparison will always evaluate as 'false' for the address of 'result' will never be NULL [-Werror=address]
90 | if (result == NULL)
| ^~
tst-vfprintf-width-prec.c:89:13: note: 'result' declared here
89 | wchar_t result[100];
| ^~~~~~
This is clearly a correct warning; the comparison against NULL is
clearly a cut-and-paste mistake from an earlier case in the test that
does use calloc. Thus, remove the unnecessary check for NULL shown up
by the warning.
Similarly, two other tests have bogus comparisons against NULL; remove
those as well:
scanf14a.c:95:13: error: the comparison will always evaluate as 'false' for the address of 'fname' will never be NULL [-Werror=address]
95 | if (fname == NULL)
| ^~
scanf14a.c:93:8: note: 'fname' declared here
93 | char fname[strlen (tmpdir) + sizeof "/tst-scanf14.XXXXXX"];
| ^~~~~
scanf16a.c:125:13: error: the comparison will always evaluate as 'false' for the address of 'fname' will never be NULL [-Werror=address]
125 | if (fname == NULL)
| ^~
scanf16a.c:123:8: note: 'fname' declared here
123 | char fname[strlen (tmpdir) + sizeof "/tst-scanf16.XXXXXX"];
| ^~~~~
Tested with build-many-glibcs.py (GCC mainline) for aarch64-linux-gnu.
Building benchmarks as static executables:
=========================================
To build benchmarks as static executables, on the build system, run:
$ make STATIC-BENCHTESTS=yes bench-build
You can copy benchmark executables to another machine and run them
without copying the source nor build directories.
The fix for bug 19329 caused a regression such that pthread_create can
deadlock when concurrent ctors from dlopen are waiting for it to finish.
Use a new GL(dl_load_tls_lock) in pthread_create that is not taken
around ctors in dlopen.
The new lock is also used in __tls_get_addr instead of GL(dl_load_lock).
The new lock is held in _dl_open_worker and _dl_close_worker around
most of the logic before/after the init/fini routines. When init/fini
routines are running then TLS is in a consistent, usable state.
In _dl_open_worker the new lock requires catching and reraising dlopen
failures that happen in the critical section.
The new lock is reinitialized in a fork child, to keep the existing
behaviour and it is kept recursive in case malloc interposition or TLS
access from signal handlers can retake it. It is not obvious if this
is necessary or helps, but avoids changing the preexisting behaviour.
The new lock may be more appropriate for dl_iterate_phdr too than
GL(dl_load_write_lock), since TLS state of an incompletely loaded
module may be accessed. If the new lock can replace the old one,
that can be a separate change.
Fixes bug 28357.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Running the test on a 4.4 kernel within KVM, the precision used on
ITIMER_VIRTUAL and ITIMER_PROF seems to different than the one used
for ITIMER_REAL (it seems the same used for CLOCK_REALTIME_COARSE and
CLOCK_MONOTONIC_COARSE). I did not see it on other kernels, for
instance 5.11 and 4.15.
To avoid trying to guess the resolution used, do not check the
nanosecond internal values for the specific timers.
Checked on i686-linux-gnu with a 4.4 kernel.