Some compiler versions, e.g. GCC 7, complain when -mlong-double-128 is
used together with -mabi=ibmlongdouble or -mabi=ieeelongdouble,
producing the following error message:
cc1: error: ‘-mabi=ibmlongdouble’ requires ‘-mlong-double-128’
This patch removes -mlong-double-128 from the compilation lines that
explicitly request -mabi=*longdouble.
Tested for powerpc64le.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Some of the files that provide stdio.h and wchar.h functions have a
filename prefixed with 'io', such as 'iovsprintf.c'. On platforms that
imply ldbl-128ibm-compat, these files must be compiled with the flag
-mabi=ibmlongdouble. This patch adds this flag to their compilation.
Notice that this is not required for the other files that provide
similar functions, because filenames that are not prefixed with 'io'
have ldbl-128ibm-compat counterparts in the Makefile, which already adds
-mabi=ibmlongdouble to them.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
On platforms where long double has IEEE binary128 format as a third
option (initially, only powerpc64le), many exported functions are
redirected to their __*ieee128 equivalents. This redirection is
provided by installed headers such as stdio-ldbl.h, and is supposed to
work correctly with user code.
However, during the build of glibc, similar redirections are employed,
in internal headers, such as include/stdio.h, in order to avoid extra
PLT entries. These redirections conflict with the redirections to
__*ieee128, and must be avoided during the build. This patch protects
the second redirections with a test for __LONG_DOUBLE_USES_FLOAT128, a
new macro that is defined to 1 when functions that deal with long double
typed values reuse the _Float128 implementation (this is currently only
true for powerpc64le).
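The guard described above can be sketched as follows. This is only an
illustration and not the actual glibc header: the fallback definition of
__LONG_DOUBLE_USES_FLOAT128 and the function internal_redirects_enabled
are assumptions, added so that the snippet compiles outside the glibc
tree.

```c
#include <stdio.h>

/* Fallback for toolchains whose headers do not define the macro.
   Inside glibc the macro comes from the long double configuration;
   this definition is only an assumption made for the sketch.  */
#ifndef __LONG_DOUBLE_USES_FLOAT128
# define __LONG_DOUBLE_USES_FLOAT128 0
#endif

/* Returns 1 when the internal (PLT-avoiding) redirections would be
   declared, and 0 when they are skipped in favor of __*ieee128.  */
int
internal_redirects_enabled (void)
{
#if !__LONG_DOUBLE_USES_FLOAT128
  return 1;
#else
  return 0;
#endif
}
```

On targets where the macro is 0, the internal redirections stay in
place and the helper returns 1.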
Tested for powerpc64le, x86_64, and with build-many-glibcs.py.
Co-authored-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
All architectures now use the Linux generic implementation, which
uses __NR_rt_sigprocmask.
Checked on x86_64-linux-gnu, sparc64-linux-gnu, ia64-linux-gnu,
s390x-linux-gnu, and alpha-linux-gnu.
The functions do not fail regardless of the argument value. Also, for
Linux the return value is not correct on some platforms due to the
missing use of the INTERNAL_SYSCALL_ERROR_P / INTERNAL_SYSCALL_ERRNO
macros.
Checked on x86_64-linux-gnu, i686-linux-gnu, and sparc64-linux-gnu.
On powerpc64le, the libm_alias_float128_other_r_ldbl macro is
used to create an alias between totalorderf128 and __totalorderlieee128,
as well as between the totalordermagf128 and __totalordermaglieee128.
However, the totalorder* and totalordermag* functions changed their
parameter type since commit ID 42760d7646 and got compat symbols for
their old versions. With this change, the aforementioned macro would
create two conflicting aliases for __totalorderlieee128 and
__totalordermaglieee128.
This patch avoids the creation of the alias between the IEEE long double
symbols (__totalorderl*ieee128) and the compat symbols, because the IEEE
long double functions have never been exported and thus do not need such
compat symbols.
Tested for powerpc64le.
Reviewed-by: Joseph Myers <joseph@codesourcery.com>
This patch adds IEEE long double versions of q*cvt* functions for
powerpc64le. Unlike all other long double to/from string conversion
functions, these do not rely on internal functions that can take
floating-point numbers with different formats and act on them
accordingly. Instead, the related files are rebuilt with the
-mabi=ieeelongdouble compiler flag set.
Having -mabi=ieeelongdouble passed to the compiler causes the object
files to be marked with a .gnu_attribute that is incompatible with the
.gnu_attribute in files built with -mabi=ibmlongdouble (the default).
The difference causes error messages similar to the following:
ld: libc_pic.a(s_isinfl.os) uses IBM long double,
libc_pic.a(ieee128-qefgcvt_r.os) uses IEEE long double.
collect2: error: ld returned 1 exit status
make[2]: *** [../Makerules:649: libc_pic.os] Error 1
Although this diagnostic is useful in other situations, the library
actually needs to have functions with different long double formats, so
.gnu_attribute generation is explicitly disabled for these files with
the use of -mno-gnu-attribute.
Tested for powerpc64le on the branch that actually enables the
sysdeps/ieee754/ldbl-128ibm-compat for powerpc64le.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
This patch refactors the *cvt functions implementation in a way that
makes it easier to re-use them for implementing the IEEE long double on
powerpc64le. By removing the macros that generate the function names
(APPEND combined with FUNC_PREFIX), the new code makes it easier to
define new function names, such as __qecvtieee128.
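For readers unfamiliar with the removed pattern, the sketch below shows
how token-pasting macros of this kind assemble a function name, next to
the explicit naming that replaces them. Only the macro names APPEND and
FUNC_PREFIX come from the text above; the helper APPEND2, the prefix
value, and the functions demo_ecvt_len and demo_qecvtieee128_len are
hypothetical.

```c
#include <string.h>

/* Old style: the function name is assembled by the preprocessor.  */
#define APPEND(a, b) APPEND2 (a, b)
#define APPEND2(a, b) a##b
#define FUNC_PREFIX demo_

/* Expands to: size_t demo_ecvt_len (const char *digits).  */
size_t
APPEND (FUNC_PREFIX, ecvt_len) (const char *digits)
{
  return strlen (digits);
}

/* New style: the name is written out, so a variant such as an IEEE
   long double entry point can be named freely.  */
size_t
demo_qecvtieee128_len (const char *digits)
{
  return strlen (digits);
}
```

With the pasted form, every new name has to fit the prefix scheme; the
explicit form has no such constraint.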
Tested that installed stripped binaries for all build-many-glibcs
targets remain identical before and after this patch. Also tested for
powerpc64le and x86_64.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
This patch refactors the *cvt functions implementation in a way that
makes it easier to re-use them for implementing the IEEE long double on
powerpc64le. By splitting the implementation per se in one file
(efgcvt-template.c) and the alias definitions in others (e.g. efgcvt.c),
the new code makes it easier to define new function names, such as
__qecvtieee128.
Tested that installed stripped binaries for all build-many-glibcs
targets remain identical before and after this patch. Also tested for
powerpc64le and x86_64.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Because of a branch prediction issue on the Kunpeng processor, we found
that memset_generic performs poorly when setting medium sizes. We
therefore restructured the logic and unrolled the loop in set_long by a
factor of 4 to solve the problem; even sizes below 1K benefit.
Another change is that DZ_ZVA does not seem to help when setting zero,
so we discarded it and used set_long to set zero instead. Fewer branches
and predictions also make the zero case slightly faster.
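The unrolling idea can be illustrated in portable C. This is a
hypothetical analogue, not the aarch64 assembly from the patch: the
function name sketch_memset_unrolled and the use of memcpy for the
16-byte stores are assumptions made for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C analogue of a 4x-unrolled store loop: each iteration
   writes four 16-byte chunks (64 bytes), so the loop branch is taken
   once per 64 bytes instead of once per 16 bytes.  */
void *
sketch_memset_unrolled (void *dst, int c, size_t n)
{
  unsigned char *p = dst;
  unsigned char chunk[16];

  memset (chunk, c, sizeof chunk);   /* One 16-byte pattern.  */

  while (n >= 64)
    {
      memcpy (p, chunk, 16);
      memcpy (p + 16, chunk, 16);
      memcpy (p + 32, chunk, 16);
      memcpy (p + 48, chunk, 16);
      p += 64;
      n -= 64;
    }

  /* Remaining bytes, fewer than 64.  */
  while (n > 0)
    {
      *p++ = (unsigned char) c;
      n--;
    }

  return dst;
}
```

The zero case takes the same path with c equal to 0, which mirrors the
decision to reuse the long-set loop instead of a dedicated zeroing path.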
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Optimize the strlen implementation by using vector operations and
loop unrolling in the main loop. Compared to __strlen_generic, it reduces
the latency of cases in bench-strlen by 7% to 18% when the length of src
is greater than 128 bytes, with gains throughout the benchmark.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The Kunpeng processor is a 64-bit Arm-compatible CPU released by Huawei,
and we have already signed a copyright assignment with the FSF.
This patch adds it to the CPU list, along with the related macro for
IFUNC.
Checked on aarch64-linux-gnu.
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Considering the excellent performance of memchr.S on glibc 2.30, the
same algorithm is used to find chrin. Compared to memrchr.c, this
method with memrchr.S achieves an average performance improvement
of 58% based on benchtest and its extension cases.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Optimize the strnlen implementation by using vector operations and
loop unrolling in the main loop. Compared to aarch64/strnlen.S, it
reduces the latency of cases in bench-strnlen by 11% to 24% when the
length of src is greater than 64 bytes, with gains throughout the
benchmark.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Optimize the strcpy implementation by using vector loads and operations
in the main loop. Compared to aarch64/strcpy.S, it reduces the latency of
cases in bench-strlen by 5% to 18% when the length of src is greater than
64 bytes, with gains throughout the benchmark.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The loop body is expanded from a 16-byte comparison to a 64-byte
comparison, and the addressing mode used by ldp is changed from
post-index to base plus offset. As a result, comparisons of more than
128 bytes are about 18% faster overall.
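A portable C analogue of a 64-byte-per-iteration comparison loop is
sketched below. It is not the assembly from the patch: the name
sketch_memcmp64 and the word-wise XOR accumulation are assumptions
chosen to show one loop branch per 64-byte block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C analogue: compare 64 bytes per loop iteration and
   fall back to a byte-wise scan only to locate the differing byte.  */
int
sketch_memcmp64 (const void *a, const void *b, size_t n)
{
  const unsigned char *p = a;
  const unsigned char *q = b;

  /* Main loop: one branch per 64-byte block.  */
  while (n >= 64)
    {
      uint64_t x[8], y[8], diff = 0;

      memcpy (x, p, 64);
      memcpy (y, q, 64);
      for (int i = 0; i < 8; i++)
        diff |= x[i] ^ y[i];
      if (diff != 0)
        break;

      p += 64;
      q += 64;
      n -= 64;
    }

  /* Tail, or the block that contains the first difference.  */
  for (size_t i = 0; i < n; i++)
    if (p[i] != q[i])
      return p[i] < q[i] ? -1 : 1;

  return 0;
}
```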
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
If the wait4 syscall is not available (such as on y2038-safe 32-bit
systems), waitid should be used instead. However, prior to Linux 5.4
waitid is not a full superset of the other wait syscalls, since it
does not include support for waiting for the current process group.
It is possible to emulate wait4 by issuing an extra syscall to get
the current process group, but this is inherently racy: after the
current process group is retrieved and before it is passed to waitid, a
signal could arrive and cause the current process group to change.
So waitid is used when wait4 is not defined only if the build is
configured with a minimum kernel of 5.4+. The new assumption macro
__ASSUME_WAITID_PID0_P_PGID is added, and an error is issued if waitid
cannot be implemented by either __NR_wait4 or
__NR_waitid && __ASSUME_WAITID_PID0_P_PGID.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Co-authored-by: Alistair Francis <alistair.francis@wdc.com>
The POSIX implementation is used as the default and both the BSD and
Linux versions are removed. It simplifies the implementation for
architectures that do not provide either __NR_waitpid or
__NR_wait4.
Checked on x86_64-linux-gnu and powerpc64le-linux-gnu.
It enables and disables cancellation with pthread_setcancelstate
before calling waitpid. It simplifies the waitpid implementation
for architectures that do not provide either __NR_waitpid or
__NR_wait4.
Checked on x86_64-linux-gnu.
Previously, ld.so was invoked only with the elf subdirectory on the
library search path. Since the soname link for libc.so only exists in
the top-level build directory, this leaked the system libc into the
test.
The posix_spawn on sparc issues invalid sigprocmask calls:
rt_sigprocmask(0xffe5e15c /* SIG_??? */, ~[], 0xffe5e1dc, 8) = -1 EINVAL (Invalid argument)
which makes support/tst-support_capture_subprocess fail with random
output (due to the child signal being wrongly captured by the parent).
The culprit seems to be wrong code generation in INTERNAL_SYSCALL due
to the automatic sigset_t used in __libc_signal_block_all:
return INTERNAL_SYSCALL (rt_sigprocmask, err, 4, SIG_BLOCK, &SIGALL_SET,
set, _NSIG / 8);
Where SIGALL_SET is defined as:
((__sigset_t) { .__val = {[0 ... _SIGSET_NWORDS-1 ] = -1 } })
Building the expanded __libc_signal_block_all on sparc64 with a recent
compiler (gcc 8.3.1 and 9.1.1):
#include <signal.h>
int
__libc_signal_block_all (sigset_t *set)
{
  INTERNAL_SYSCALL_DECL (err);
  return INTERNAL_SYSCALL (rt_sigprocmask, err, 4, SIG_BLOCK, &SIGALL_SET,
                           set, _NSIG / 8);
}
The first argument (SIG_BLOCK) is not correctly set in the 'o0' register:
__libc_signal_block_all:
save %sp, -304, %sp
add %fp, 1919, %o0
mov 128, %o2
sethi %hi(.LC0), %o1
call memcpy, 0
or %o1, %lo(.LC0), %o1
add %fp, 1919, %o1
mov %i0, %o2
mov 8, %o3
mov 103, %g1
ta 0x6d;
bcc,pt %xcc, 1f
mov 0, %g1
sub %g0, %o0, %o0
mov 1, %g1
1: sra %o0, 0, %i0
return %i7+8
nop
Whereas if SIGALL_SET is defined as a const object, gcc correctly sets
the expected kernel argument in the correct register:
sethi %hi(.LC0), %o1
call memcpy, 0
or %o1, %lo(.LC0), %o1
-> mov 1, %o0
add %fp, 1919, %o1
Another possible fix is to use a static const object. Although there
should not be a difference between a const compound literal and a static
const object, the gcc C99 status page [1] has a note stating that this
optimization is not implemented:
"const-qualified compound literals could share storage with each
other and with string literals, but currently don't.".
This patch fixes it by moving both sigset_t objects that represent the
signal sets to static const data objects. It generates slightly better
code, where the object reference is used directly instead of a stack
allocation plus the content materialization.
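The difference can be illustrated with a simplified stand-in for the
signal set. The names sketch_sigset_t, SKETCH_NWORDS, sketch_sigall_set,
and sketch_all_signals are hypothetical; only the idea of a single
static const object comes from the text above.

```c
#include <stddef.h>

#define SKETCH_NWORDS 16

typedef struct
{
  unsigned long int val[SKETCH_NWORDS];
} sketch_sigset_t;

/* One read-only object in static storage.  The range designator
   [first ... last] is a GNU C extension, as in the original macro.  */
static const sketch_sigset_t sketch_sigall_set =
  { .val = { [0 ... SKETCH_NWORDS - 1] = -1UL } };

/* Callers get a pointer to the shared object; nothing is copied to
   the stack before the (hypothetical) system call.  */
const sketch_sigset_t *
sketch_all_signals (void)
{
  return &sketch_sigall_set;
}
```

Every call returns the same address, which is the property the compound
literal version lacked: there, each use materialized a fresh automatic
object.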
Checked on x86_64-linux-gnu, i686-linux-gnu, and sparc64-linux-gnu.
[1] https://gcc.gnu.org/c99status.html
This patch adds the missing bits for powerpc and fixes both
tst-ifunc-fault-lazy and tst-ifunc-fault-bindnow failures on
powerpc-linux-gnu.
Checked on powerpc-linux-gnu and powerpc-linux-gnu-power4.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
After commit f7649d5780 ("dlopen: Do not
block signals"), the dynamic linker no longer uses sigprocmask, which
means that it does not have to be made available explicitly on hurd.
This reverts commit 892badc9bb
("hurd: Make __sigprocmask GLIBC_PRIVATE") and commit
d5ed9ba29a ("hurd: Fix ld.so link"),
but keeps the comment changes from the second commit.
* sysdeps/mach/hurd/getrandom.c (__getrandom): Open the random source
with O_NONBLOCK when the GRND_NONBLOCK flag is provided.
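The flag handling described in this entry can be sketched as below. The
helper name sketch_random_open_flags is hypothetical, and the sketch
covers only the flag translation, not the opening or reading of the
random source.

```c
#include <fcntl.h>
#include <sys/random.h>

/* Hypothetical helper: compute the open flags for the random source
   from the flags passed to getrandom.  */
int
sketch_random_open_flags (unsigned int flags)
{
  int open_flags = O_RDONLY;

  if (flags & GRND_NONBLOCK)
    open_flags |= O_NONBLOCK;

  return open_flags;
}
```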
Message-Id: <20191217182929.90989-1-jrtc27@jrtc27.com>
GCC 10 (PR 91233) no longer silently allows registers that are not
architecturally available to be present in the clobber list, resulting in a
build failure for mips*r6 targets in the form of:
...
.../sysdep.h:146:2: error: the register ‘lo’ cannot be clobbered in ‘asm’ for the current target
146 | __asm__ volatile ( \
| ^~~~~~~
This is because the base R6 ISA does not define the hi and lo registers without
the DSP extension.
This patch provides the alternative definitions of __SYSCALL_CLOBBERS for r6
targets that won't include those registers.
* sysdeps/unix/sysv/linux/mips/mips32/sysdep.h (__SYSCALL_CLOBBERS): Exclude
hi and lo from the clobber list for __mips_isa_rev >= 6.
* sysdeps/unix/sysv/linux/mips/mips64/n32/sysdep.h (__SYSCALL_CLOBBERS): Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/sysdep.h (__SYSCALL_CLOBBERS): Likewise.
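The conditional definition can be sketched as follows. The clobber lists
here are shortened for illustration and the real lists in sysdep.h name
more registers; SKETCH_SYSCALL_CLOBBERS, sketch_clobbers, and
sketch_clobber_count are hypothetical names. Storing the list in an
array is only a device to make the selection inspectable on any host.

```c
#include <stddef.h>

/* Illustrative clobber lists only; hi and lo are left out when the
   compiler targets MIPS release 6 or later.  */
#if defined __mips_isa_rev && __mips_isa_rev >= 6
# define SKETCH_SYSCALL_CLOBBERS "$1", "$3", "memory"
#else
# define SKETCH_SYSCALL_CLOBBERS "$1", "$3", "hi", "lo", "memory"
#endif

/* Expose the selected list so that it can be inspected.  */
static const char *const sketch_clobbers[] = { SKETCH_SYSCALL_CLOBBERS };

size_t
sketch_clobber_count (void)
{
  return sizeof sketch_clobbers / sizeof sketch_clobbers[0];
}
```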
In the format string for *scanf functions, the '%as', '%aS', and '%a[]'
modifiers behave differently depending on ISO C99 compatibility. When
_GNU_SOURCE is defined and -std=c89 is passed to the compiler, these
functions behave like ascanf, and the modifiers allocate memory for the
output. Otherwise, the ISO C99 compliant version of these functions is
used, and the modifiers consume a floating-point argument. This patch
adds the IEEE binary128 variant of ISO C99 compliant functions for the
third long double format on powerpc64le.
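The ISO C99 meaning of the conversion can be illustrated with a small
sketch. The helper name sketch_parse_a is hypothetical, and the snippet
assumes a build that does not combine _GNU_SOURCE with -std=c89, so that
the C99-compliant sscanf is selected.

```c
#include <stdio.h>

/* Parse a floating-point value with the ISO C99 meaning of %a.
   Returns 1 on success, 0 otherwise.  */
int
sketch_parse_a (const char *text, float *out)
{
  return sscanf (text, "%a", out) == 1;
}
```

For example, the hexadecimal floating-point string "0x1.8p1" denotes
1.5 times 2, so a successful call stores 3.0 in the output.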
Tested for powerpc64le.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Since commit
commit 03992356e6
Author: Zack Weinberg <zackw@panix.com>
Date: Sat Feb 10 11:58:35 2018 -0500
Use C99-compliant scanf under _GNU_SOURCE with modern compilers.
the selection of the GNU versions of scanf functions requires both
_GNU_SOURCE and -std=c89. This patch changes the tests in
ldbl-128ibm-compat so that they actually test the GNU versions (without
this change, the redirection to the ISO C99 version always happens, so
GNU versions of the new implementation (e.g. __scanfieee128) were left
untested).
Tested for powerpc64le.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Blocking signals causes issues with certain anti-malware solutions
which rely on an unblocked SIGSYS signal for system calls they
intercept.
This reverts commit a2e8aa0d9e
("Block signals during the initial part of dlopen") and adds
comments related to async signal safety to activate_nodelete and
its caller.
Note that this does not make lazy binding async-signal-safe with regard
to dlopen. It merely avoids introducing new async-signal-safety hazards
as part of the NODELETE changes.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Commit a2e8aa0d9e ("Block signals during
the initial part of dlopen") was deemed necessary because of
read-modify-write operations like the one in add_dependency in
elf/dl-lookup.c. In the old code, we check for any kind of NODELETE
status and bail out:
/* Redo the NODELETE check, as when dl_load_lock wasn't held
yet this could have changed. */
if (map->l_nodelete != link_map_nodelete_inactive)
goto out;
And then set pending status (during relocation):
if (flags & DL_LOOKUP_FOR_RELOCATE)
map->l_nodelete = link_map_nodelete_pending;
else
map->l_nodelete = link_map_nodelete_active;
If a signal arrives during relocation and the signal handler, through
lazy binding, adds a global scope dependency on the same map, it will
set map->l_nodelete to link_map_nodelete_active. This will be
overwritten with link_map_nodelete_pending by the dlopen relocation
code.
To avoid such problems in relation to the l_nodelete member, this
commit introduces two flags for active NODELETE status (irrevocable)
and pending NODELETE status (revocable until activate_nodelete is
invoked). As a result, NODELETE processing in dlopen does not
introduce further reasons why lazy binding from signal handlers
is unsafe during dlopen, and a subsequent commit can remove signal
blocking from dlopen.
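The two-flag scheme can be sketched with a simplified stand-in for the
link map. The structure sketch_map, its member names, and the functions
sketch_mark_nodelete and sketch_activate_nodelete are hypothetical; the
sketch only shows why separate flags avoid the overwrite described
above.

```c
#include <stdbool.h>

/* Simplified stand-in for the relevant part of struct link_map.  */
struct sketch_map
{
  bool nodelete_active;    /* Irrevocable.  */
  bool nodelete_pending;   /* Revocable until activation.  */
};

/* Record a NODELETE request.  Each path sets only its own flag, so a
   pending request can never overwrite an active status.  */
void
sketch_mark_nodelete (struct sketch_map *map, bool for_relocate)
{
  if (for_relocate)
    map->nodelete_pending = true;
  else
    map->nodelete_active = true;
}

/* Called once dlopen can no longer fail: commit pending requests.  */
void
sketch_activate_nodelete (struct sketch_map *map)
{
  if (map->nodelete_pending)
    {
      map->nodelete_active = true;
      map->nodelete_pending = false;
    }
}
```

If a signal handler marks the map active while relocation marks it
pending, both facts are retained. Revoking the pending flag after a
failed dlopen then leaves the active status intact.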
This does not address pre-existing issues (unrelated to the NODELETE
changes) which make lazy binding in a signal handler during dlopen
unsafe, such as the use of malloc in both cases.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The assumption behind the assert in activate_nodelete was wrong:
Inconsistency detected by ld.so: dl-open.c: 459: activate_nodelete:
Assertion `!imap->l_init_called || imap->l_type != lt_loaded' failed!
It can happen that an already-loaded object that is in the local
scope is promoted to NODELETE status, via binding to a unique
symbol.
Similarly, it is possible that such NODELETE promotion occurs to
an already-loaded object from the global scope. This is why the
loop in activate_nodelete has to cover all objects in the namespace
of the new object.
In do_lookup_unique, it could happen that the NODELETE status of
an already-loaded object was overwritten with a pending NODELETE
status. As a result, if dlopen fails, this could cause a loss of
the NODELETE status of the affected object, eventually resulting
in an incorrect unload.
Fixes commit f63b73814f ("Remove all
loaded objects if dlopen fails, ignoring NODELETE [BZ #20839]").
Not only libc/rtld use __close_nocancel_nostatus.
* sysdeps/mach/hurd/Makefile [$(subdir) == io] (sysdep_routines): Add
close_nocancel_nostatus.
* sysdeps/mach/hurd/Versions (libc): Add __close_nocancel_nostatus to
GLIBC_PRIVATE.
* sysdeps/mach/hurd/not-cancel.h (__close_nocancel_nostatus): Declare
function instead of defining inline.
[IS_IN (libc) || IS_IN (rtld)] (__close_nocancel_nostatus): Make
function hidden.
* sysdeps/mach/hurd/close_nocancel_nostatus.c: New file.