The path auxv[*].a_val could either be an integer or a string,
depending on the a_type value. Use a separate field, a_val_string, to
simplify mechanical parsing of the --list-diagnostics output.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On Skylake, it changes log1p bench performance by:
Before After Improvement
max 63.349 58.347 8%
min 4.448 5.651 -30%
mean 12.0674 10.336 14%
The minimum code path is
if (hx < 0x3FDA827A) /* x < 0.41422 */
{
if (__glibc_unlikely (ax >= 0x3ff00000)) /* x <= -1.0 */
{
...
}
if (__glibc_unlikely (ax < 0x3e200000)) /* |x| < 2**-29 */
{
math_force_eval (two54 + x); /* raise inexact */
if (ax < 0x3c900000) /* |x| < 2**-54 */
{
...
}
else
return x - x * x * 0.5;
FMA and non-FMA code sequences look similar. Non-FMA version is slightly
faster. Since log1p is called by asinh and atanh, it improves asinh
performance by:
Before After Improvement
max 75.645 63.135 16%
min 10.074 10.071 0%
mean 15.9483 14.9089 6%
and improves atanh performance by:
Before After Improvement
max 91.768 75.081 18%
min 15.548 13.883 10%
mean 18.3713 16.8011 8%
The static PIE configure check uses link tests. When bootstrapping
a cross-toolchain, the link tests fail due to missing crt-files /
libc.so. As we explicitely want to test an issue in binutils (ld),
we now also explicitely check for known linker versions.
See also commit 368b7c614b
S390: Use compile-only instead of also link-tests in configure.
These implementations improve the time to copy data in the glibc
microbenchmark as below:
memcpy-lasx reduces the runtime about 8%-76%
memcpy-lsx reduces the runtime about 8%-72%
memcpy-unaligned reduces the runtime of unaligned data copying up to 40%
memcpy-aligned reduece the runtime of unaligned data copying up to 25%
memmove-lasx reduces the runtime about 20%-73%
memmove-lsx reduces the runtime about 50%
memmove-unaligned reduces the runtime of unaligned data moving up to 40%
memmove-aligned reduces the runtime of unaligned data moving up to 25%
These implementations improve the time to run strchr{nul}
microbenchmark in glibc as below:
strchr-lasx reduces the runtime about 50%-83%
strchr-lsx reduces the runtime about 30%-67%
strchr-aligned reduces the runtime about 10%-20%
strchrnul-lasx reduces the runtime about 50%-83%
strchrnul-lsx reduces the runtime about 36%-65%
strchrnul-aligned reduces the runtime about 6%-10%
SYS_modify_ldt requires CONFIG_MODIFY_LDT_SYSCALL to be set in the kernel, which
some distributions may disable for hardening. Check if that's the case (unset)
and mark the test as UNSUPPORTED if so.
Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
All callers pass 1 or 0x11 anyway (same meaning according to man page),
but still.
Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
On Skylake, it improves expm1 bench performance by:
Before After Improvement
max 70.204 68.054 3%
min 20.709 16.2 22%
mean 22.1221 16.7367 24%
NB: Add
extern long double __expm1l (long double);
extern long double __expm1f128 (long double);
for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
strlen-lasx is implemeted by LASX simd instructions(256bit)
strlen-lsx is implemeted by LSX simd instructions(128bit)
strlen-align is implemented by LA basic instructions and never use unaligned memory acess
LoongArch glibc can add some LASX/LSX vector instructions codes,
change the required minimum binutils version to 2.41 which could
support vector instructions. HAVE_LOONGARCH_VEC_ASM is removed
accordingly.
The following usage of macro LEAF/ENTRY are all feasible:
1. LEAF(fcn) -- the align value of fcn is .align 3(default value)
2. LEAF(fcn, 6) -- the align value of fcn is .align 6
The:
```
if (shared_per_thread > 0 && threads > 0)
shared_per_thread /= threads;
```
Code was accidentally moved to inside the else scope. This doesn't
match how it was previously (before af992e7abd).
This patch fixes that by putting the division after the `else` block.
The nscd daemon caches hosts data from NSS modules verbatim, without
filtering protocol families or sorting them (otherwise separate caches
would be needed for certain ai_flags combinations). The cache
implementation is complete separate from the getaddrinfo code. This
means that rebuilding getaddrinfo is not needed. The only function
actually used is __bump_nl_timestamp from check_pf.c, and this change
moves it into nscd/connections.c.
Tested on x86_64-linux-gnu with -fexceptions, built with
build-many-glibcs.py. I also backported this patch into a distribution
that still supports nscd and verified manually that caching still works.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Since i686 provides the fortified wrappers for memcpy, mempcpy,
memmove, and memset on the same string implementation, the static
build tries to optimized it by not tying the fortified wrappers
to string routine (to avoid pulling the fortify function if
they are not required).
Checked on i686-linux-gnu building with different option:
default and --disable-multi-arch plus default, --disable-default-pie,
--enable-fortify-source={2,3}, and --enable-fortify-source={2,3}
with --disable-default-pie.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
With multiarch disabled, the default memmove implementation provides
the fortify routines for memcpy, mempcpy, and memmove. However, it
does not provide the internal hidden definitions used when building
with fortify enabled. The memset has a similar issue.
Checked on x86_64-linux-gnu building with different options:
default and --disable-multi-arch plus default, --disable-default-pie,
--enable-fortify-source={2,3}, and --enable-fortify-source={2,3}
with --disable-default-pie.
Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Linux 6.4 adds new constants PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
and PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG. Add those to all
relevant sys/ptrace.h headers, along with adding the associated
argument structure to bits/ptrace-shared.h (named struct
__ptrace_sud_config there following the usual convention for such
structures).
Tested for x86_64 and with build-many-glibcs.py.
Making error_t defined to enum __error_t_codes conveniently makes the
debugger print symbolic values, but in C++ int is not interoperable with
enum __error_t_codes, leading to C++ application build issues, so let's
revert error_t to int in C++.
This is the only missing part in struct statvfs.
The LSB calls [f]statfs() deprecated, and its weird types are definitely
off-putting. However, its use is required to get f_type.
Instead, allocate one of the six spares to f_type,
copied directly from struct statfs.
This then becomes a small glibc extension to the standard interface
on Linux and the Hurd, instead of two different interfaces, one of which
is quite odd due to being an ABI type, and there no longer is any reason
to use statfs().
The underlying kernel type is a mess, but all architectures agree on u32
(or more) for the ABI, and all filesystem magicks are 32-bit integers.
We don't lose any generality by using u32, and by doing so we both make
the API consistent with the Hurd, and allow C++
switch(f_type) { case RAMFS_MAGIC: ...; }
Also fix tst-statvfs so that it actually fails;
as it stood, all it did was return 0 always.
Test statfs()' and statvfs()' f_types are the same.
Link: https://lore.kernel.org/linux-man/f54kudgblgk643u32tb6at4cd3kkzha6hslahv24szs4raroaz@ogivjbfdaqtb/t/#u
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
When using jemalloc, malloc() needs to use TSD, while libpthread
initialization needs malloc(). Having ___pthread_self set early to some
static storage allows TSD to work early, thus allowing jemalloc and
libpthread to initialize together.
This incidentaly simplifies __pthread_enable/disable_asynccancel and
__pthread_self, now that ___pthread_self is always initialized.
When using jemalloc, malloc() needs to use TSD, while libpthread
initialization needs malloc(). Supporting a static TSD area allows jemalloc
and libpthread to initialize together.
Some legacy AMD CPUs and hypervisors have the _cpuid_ '0x8000_001D'
set to Zero, thus resulting in zeroed-out computed cache values.
This patch reintroduces the old way of cache computation as a
fail-safe option to handle these exceptions.
Fixed 'level4_cache_size' value through handle_amd().
Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Tested-by: Florian Weimer <fweimer@redhat.com>
04bf7d2d8a ("chk: Add and fix hidden builtin definitions for *_chk")
added an #undef for longjmp and siglongjmp to compensate for the
definition in include/setjmp.h, but missed doing so for the powerpc
version too.
Fixes: 04bf7d2d8a ("chk: Add and fix hidden builtin definitions for
*_chk")
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.4. (There are no new
constants covered by these tests in 6.4 that need any other header
changes.)
Tested with build-many-glibcs.py.
This patch enables the option to influence hwcaps used by PowerPC.
The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz....,
can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx
and zzz, where the feature name is case-sensitive and has to match the ones
mentioned in the file{sysdeps/powerpc/dl-procinfo.c}.
Note that the hwcap tunables only used in the IFUNC selection.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On __convert_scm_timestamps GCC 6 issues an warning that tvts[0]/tvts[1]
maybe be used uninitialized, however it would be used if type is set to a
value different than 0 (done by either COMPAT_SO_TIMESTAMP_OLD or
COMPAT_SO_TIMESTAMPNS_OLD) which will fallthrough to 'common' label.
It does not show with gcc 7 or more recent versions.
Checked on i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Similar to memcpy, mempcpy, and memmove there is no need for an
specific memset_chk-nonshared.S. It can be provided by
memset-ia32.S itself for static library.
Checked on i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The i386 string routines provide multiple internal definitions
for memcpy, memmove, and mempcpy chk routines:
$ objdump -t libc.a | grep __memcpy_chk
00000000 g F .text 0000000e __memcpy_chk
00000000 g F .text 00000013 __memcpy_chk
$ objdump -t libc.a | grep __mempcpy_chk
00000000 g F .text 0000000e __mempcpy_chk
00000000 g F .text 00000013 __mempcpy_chk
$ objdump -t libc.a | grep __memmove_chk
00000000 g F .text 0000000e __memmove_chk
00000000 g F .text 00000013 __memmove_chk
Although is not an issue for normal static builds, with fortify=3
glibc itself might use the fortify chk functions and thus static
build might fail with multiple definitions. For instance:
x86_64-glibc-linux-gnu-gcc -m32 -march=i686 -o [...]math/test-signgam-uchar-static -nostdlib -nostartfiles -static -static-pie [...]
x86_64-glibc-linux-gnu/bin/ld: [...]/libc.a(mempcpy-ia32.o):
in function `__mempcpy_chk': [...]/glibc-git/string/../sysdeps/i386/i686/mempcpy.S:32: multiple definition of `__mempcpy_chk';
[...]/libc.a(mempcpy_chk-nonshared.o):[...]/debug/../sysdeps/i386/mempcpy_chk.S:28: first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [../Rules:298:
There is no need for mem*-nonshared.S, the __mem*_chk routines
are already provided by the assembly routines.
Checked on i686-linux-gnu with gcc 13 built with fortify=1,2,3 and
without fortify.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The compiler might not see that internal definition is an alias
due the libc_ifunc macro, which redefines __strchrnul. With
gcc 6 it fails with:
In file included from <command-line>:0:0:
./../include/libc-symbols.h:472:33: error: ‘__EI___strchrnul’ aliased to
undefined symbol ‘__GI___strchrnul’
extern thread __typeof (name) __EI_##name \
^
./../include/libc-symbols.h:468:3: note: in expansion of macro
‘__hidden_ver2’
__hidden_ver2 (, local, internal, name)
^~~~~~~~~~~~~
./../include/libc-symbols.h:476:29: note: in expansion of macro
‘__hidden_ver1’
# define hidden_def(name) __hidden_ver1(__GI_##name, name, name);
^~~~~~~~~~~~~
./../include/libc-symbols.h:557:32: note: in expansion of macro
‘hidden_def’
# define libc_hidden_def(name) hidden_def (name)
^~~~~~~~~~
../sysdeps/powerpc/powerpc64/multiarch/strchrnul.c:38:1: note: in
expansion of macro ‘libc_hidden_def’
libc_hidden_def (__strchrnul)
^~~~~~~~~~~~~~~
Use libc_ifunc_hidden as stpcpy. Checked on powerpc64 with
gcc 6 and gcc 13.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Generated on a Cavium Octeon III 2 board running Linux version 4.19.249
and GCC 13.1.0.
Needed due to commit cf7ffdd8a5 ("added pair of inputs for hypotf in
binary32").
Starting with commit 2c6b4b272e
"nptl: Unconditionally use a 32-byte rseq area", the testcase
misc/tst-rseq-disable is UNSUPPORTED as RSEQ_SIG is not defined.
The mentioned commit removes inclusion of sys/rseq.h in nptl/descr.h.
Thus just include sys/rseq.h in the tst-rseq-disable.c as also done
in tst-rseq.c and tst-rseq-nptl.c.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Generated on a VisionFive 2 board running Linux version 6.4.2 and
GCC 13.1.0.
Needed due to commit cf7ffdd8a5 ("added pair of inputs for hypotf in
binary32").
Based on feedback by Mike Gilbert <floppym@gentoo.org>
Linux-6.1.38-dist x86_64 AMD Phenom-tm- II X6 1055T Processor
-march=amdfam10
failures occur for x32 ABI
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>