Commit Graph

16220 Commits

Author SHA1 Message Date
Adhemerval Zanella
ae515ba530 powerpc: Fix __fesetround_inline_nocheck on POWER9+ (BZ 31682)
The e68b1151f7 commit changed the
__fesetround_inline_nocheck implementation to use mffscrni
(through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
ceil/floor/trunc, the function is supposed to disable the
floating-point inexact exception enable bit, however mffscrni
does not change any exception enable bits.

This patch fixes by reverting the optimization for the
__fesetround_inline_nocheck.

Checked on powerpc-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
2024-05-09 08:59:30 -03:00
Gabi Falk
dd5f891c1a x86_64: Fix missing wcsncat function definition without multiarch (x86-64-v4)
This code expects the WCSCAT preprocessor macro to be predefined in case
the evex implementation of the function should be defined with a name
different from __wcsncat_evex.  However, when glibc is built for
x86-64-v4 without multiarch support, sysdeps/x86_64/wcsncat.S defines
WCSNCAT variable instead of WCSCAT to build it as wcsncat.  Rename the
variable to WCSNCAT, as it is actually a better naming choice for the
variable in this case.

Reported-by: Kenton Groombridge
Link: https://bugs.gentoo.org/921945
Fixes: 64b8b6516b ("x86: Add evex optimized functions for the wchar_t strcpy family")
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-05-08 07:37:59 -07:00
Adhemerval Zanella
1e1ad714ee support: Add envp argument to support_capture_subprogram
So tests can specify a list of environment variables.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2024-05-07 12:16:36 -03:00
Adhemerval Zanella
bcae44ea85 elf: Only process multiple tunable once (BZ 31686)
The 680c597e9c commit made loader reject ill-formatted strings by
first tracking all set tunables and then applying them. However, it does
not take into consideration if the same tunable is set multiple times,
where parse_tunables_string appends the found tunable without checking
if it was already in the list. It leads to a stack-based buffer overflow
if the tunable is specified more than the total number of tunables.  For
instance:

  GLIBC_TUNABLES=glibc.malloc.check=2:... (repeat over the number of
  total support for different tunable).

Instead, use the index of the tunable list to get the expected tunable
entry.  Since now the initial list is zero-initialized, the compiler
might emit an extra memset and this requires some minor adjustment
on some ports.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reported-by: Yuto Maeda <maeda@cyberdefense.jp>
Reported-by: Yutaro Shimizu <shimizu@cyberdefense.jp>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2024-05-07 12:16:36 -03:00
H.J. Lu
5f245f3bfb Add crt1-2.0.o for glibc 2.0 compatibility tests
Starting from glibc 2.1, crt1.o contains _IO_stdin_used which is checked
by _IO_check_libio to provide binary compatibility for glibc 2.0.  Add
crt1-2.0.o for tests against glibc 2.0.  Define tests-2.0 for glibc 2.0
compatibility tests.  Add and update glibc 2.0 compatibility tests for
stderr, matherr and pthread_kill.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-05-06 07:49:40 -07:00
Amrita H S
23f0d81608 powerpc: Optimized strncmp for power10
This patch is based on __strcmp_power10.

Improvements from __strncmp_power9:

    1. Uses new POWER10 instructions
       - This code uses lxvp to decrease contention on load
	 by loading 32 bytes per instruction.

    2. Performance implication
       - This version has around 38% better performance on average.
       - Minor performance regression is seen for few small sizes
	 and specific combination of alignments.

Signed-off-by: Amrita H S <amritahs@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-05-06 09:01:29 -05:00
Stafford Horne
643d9d38d5 or1k: Add hard float support
This patch adds hardware floating point support to OpenRISC.  Hardware
floating point toolchain builds are enabled by passing the machine
specific argument -mhard-float to gcc via CFLAGS.  With this enabled GCC
generates floating point instructions for single-precision operations
and exports __or1k_hard_float__.

There are 2 main parts to this patch.

 - Implement fenv functions to update the FPCSR flags keeping it in sync
   with sfp (software floating point).
 - Update machine context functions to store and restore the FPCSR
   state.

*On mcontext_t ABI*

This patch adds __fpcsr to mcontext_t.  This is an ABI change, but also
an ABI fix.  The Linux kernel has always defined padding in mcontext_t
that space was missing from the glibc ABI.  In Linux this unused space
has now been re-purposed for storing the FPCSR.  This patch brings
OpenRISC glibc in line with the Linux kernel and other libc
implementation (musl).

Compatibility getcontext, setcontext, etc symbols have been added to
allow for binaries expecting the old ABI to continue to work.

*Hard float ABI*

The calling conventions and types do not change with OpenRISC hard-float
so glibc hard-float builds continue to use dynamic linker
/lib/ld-linux-or1k.so.1.

*Testing*

I have tested this patch both with hard-float and soft-float builds and
the test results look fine to me.  Results are as follows:

Hard Float

    # failures
    FAIL: elf/tst-sprof-basic		(Haven't figured out yet, not related to hard-float)
    FAIL: gmon/tst-gmon-pie		(PIE bug in or1k toolchain)
    FAIL: gmon/tst-gmon-pie-gprof	(PIE bug in or1k toolchain)
    FAIL: iconvdata/iconv-test		(timeout, passed when run manually)
    FAIL: nptl/tst-cond24		(Timeout)
    FAIL: nptl/tst-mutex10		(Timeout)

    # summary
	  6 FAIL
       4289 PASS
	 86 UNSUPPORTED
	 16 XFAIL
	  2 XPASS

    # versions
    Toolchain: or1k-smhfpu-linux-gnu
    Compiler:  gcc version 14.0.1 20240324 (experimental) [master r14-9649-gbb04a11418f] (GCC)
    Binutils:  GNU assembler version 2.42.0 (or1k-smhfpu-linux-gnu) using BFD version (GNU Binutils) 2.42.0.20240324
    Linux:     Linux buildroot 6.9.0-rc1-00008-g4dc70e1aadfa #112 SMP Sat Apr 27 06:43:11 BST 2024 openrisc GNU/Linux
    Tester:    shorne
    Glibc:     2024-04-25 b62928f907 Florian Weimer   x86: In ld.so, diagnose missing APX support in APX-only builds  (origin/master, origin/HEAD)

Soft Float

    # failures
    FAIL: elf/tst-sprof-basic
    FAIL: gmon/tst-gmon-pie
    FAIL: gmon/tst-gmon-pie-gprof
    FAIL: nptl/tst-cond24
    FAIL: nptl/tst-mutex10

    # summary
	  5 FAIL
       4295 PASS
	 81 UNSUPPORTED
	 16 XFAIL
	  2 XPASS

    # versions
    Toolchain: or1k-smh-linux-gnu
    Compiler:  gcc version 14.0.1 20240324 (experimental) [master r14-9649-gbb04a11418f] (GCC)
    Binutils:  GNU assembler version 2.42.0 (or1k-smh-linux-gnu) using BFD version (GNU Binutils) 2.42.0.20240324
    Linux:     Linux buildroot 6.9.0-rc1-00008-g4dc70e1aadfa #112 SMP Sat Apr 27 06:43:11 BST 2024 openrisc GNU/Linux
    Tester:    shorne
    Glibc:     2024-04-25 b62928f907 Florian Weimer   x86: In ld.so, diagnose missing APX support in APX-only builds  (origin/master, origin/HEAD)

Documentation: https://raw.githubusercontent.com/openrisc/doc/master/openrisc-arch-1.4-rev0.pdf
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-03 18:28:18 +01:00
Stafford Horne
b57adfa49b or1k: Add hard float libm-test-ulps
This patch adds the ulps test file to prepare for the upcoming
hard float patch.  This is separated out to make the hard float patch
smaller.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-03 18:28:18 +01:00
Gabi Falk
5a2cf833f5
i686: Fix multiple definitions of __memmove_chk and __memset_chk
Commit c73c96a4a1 updated memcpy.S and
mempcpy.S, but omitted memmove.S and memset.S.  As a result, the static
library built as PIC, whether with or without multiarch support,
contains two definitions for each of the __memmove_chk and __memset_chk
symbols.

/usr/lib/gcc/i686-pc-linux-gnu/14/../../../../i686-pc-linux-gnu/bin/ld: /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../lib/libc.a(memset-ia32.o): in function `__memset_chk':
/var/tmp/portage/sys-libs/glibc-2.39-r3/work/glibc-2.39/string/../sysdeps/i386/i686/memset.S:32: multiple definition of `__memset_chk'; /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../lib/libc.a(memset_chk.o):/var/tmp/portage/sys-libs/glibc-2.39-r3/work/glibc-2.39/debug/../sysdeps/i386/i686/multiarch/memset_chk.c:24: first defined here

After this change, regardless of PIC options, the static library, built
for i686 with multiarch contains implementations of these functions
respectively from debug/memmove_chk.c and debug/memset_chk.c, and
without multiarch contains implementations of these functions
respectively from sysdeps/i386/memmove_chk.S and
sysdeps/i386/memset_chk.S.  This ensures that memmove and memset won't
pull in __chk_fail and the routines it calls.

Reported-by: Sam James <sam@gentoo.org>
Tested-by: Sam James <sam@gentoo.org>
Fixes: c73c96a4a1 ("i686: Fix build with --disable-multiarch")
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
2024-05-02 11:51:10 +01:00
Gabi Falk
0fdf4ba48c
i586: Fix multiple definitions of __memcpy_chk and __mempcpy_chk
/home/bmg/install/compilers/x86_64-linux-gnu/lib/gcc/x86_64-glibc-linux-gnu/13.2.1/../../../../x86_64-glibc-linux-gnu/bin/ld: /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(memcpy_chk.o): in function `__memcpy_chk':
/home/bmg/src/glibc/debug/../sysdeps/i386/memcpy_chk.S:29: multiple definition of `__memcpy_chk';/home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(memcpy.o):/home/bmg/src/glibc/string/../sysdeps/i386/i586/memcpy.S:31: first defined here /home/bmg/install/compilers/x86_64-linux-gnu/lib/gcc/x86_64-glibc-linux-gnu/13.2.1/../../../../x86_64-glibc-linux-gnu/bin/ld: /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(mempcpy_chk.o): in function `__mempcpy_chk': /home/bmg/src/glibc/debug/../sysdeps/i386/mempcpy_chk.S:28: multiple definition of `__mempcpy_chk'; /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(mempcpy.o):/home/bmg/src/glibc/string/../sysdeps/i386/i586/memcpy.S:31: first defined here

After this change, the static library built for i586, regardless of PIC
options, contains implementations of these functions respectively from
sysdeps/i386/memcpy_chk.S and sysdeps/i386/mempcpy_chk.S.  This ensures
that memcpy and mempcpy won't pull in __chk_fail and the routines it
calls.

Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
2024-05-02 11:50:21 +01:00
Carlos O'Donell
91695ee459 time: Allow later version licensing.
The FSF's Licensing and Compliance Lab noted a discrepancy in the
licensing of several files in the glibc package.

When timespect_get.c was impelemented the license did not include
the standard ", or (at your option) any later version." text.

Change the license in timespec_get.c and all copied files to match
the expected license.

This change was previously approved in principle by the FSF in
RT ticket #1316403. And a similar instance was fixed in
commit 46703efa02.
2024-05-01 09:03:26 -04:00
Wilco Dijkstra
6dae61567f AArch64: Remove unused defines of CPU names
Remove unused defines of CPU names in cpu-features.h.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-30 13:32:29 +01:00
Florian Weimer
b62928f907 x86: In ld.so, diagnose missing APX support in APX-only builds
At this point, this is mainly a tool for testing the early ld.so
CPU compatibility diagnostics: GCC uses the new instructions in most
functions, so it's easy to spot if some of the early code is not
built correctly.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-25 17:20:28 +02:00
Florian Weimer
3a3a449742 i386: ulp update for SSE2 --disable-multi-arch configurations 2024-04-25 12:56:48 +02:00
H.J. Lu
46c9997413 x86: Define MINIMUM_X86_ISA_LEVEL in config.h [BZ #31676]
Define MINIMUM_X86_ISA_LEVEL at configure time to avoid

/usr/bin/ld: …/build/elf/librtld.os: in function `init_cpu_features':
…/git/elf/../sysdeps/x86/cpu-features.c:1202: undefined reference to `_dl_runtime_resolve_fxsave'
/usr/bin/ld: …/build/elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

when glibc is built with -march=x86-64-v3 and configured with
--with-rtld-early-cflags=-march=x86-64, which is used to allow ld.so to
print an error message on unsupported CPUs:

Fatal glibc error: CPU does not support x86-64-v3

This fixes BZ #31676.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-04-24 04:50:56 -07:00
caiyinyu
095067efdf LoongArch: Add glibc.cpu.hwcap support.
The current IFUNC selection is always using the most recent
features which are available via AT_HWCAP.  But in
some scenarios it is useful to adjust this selection.

The environment variable:

GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,zzz,....

can be used to enable HWCAP feature yyy, disable HWCAP feature xxx,
where the feature name is case-sensitive and has to match the ones
used in sysdeps/loongarch/cpu-tunables.c.

Signed-off-by: caiyinyu <caiyinyu@loongson.cn>
2024-04-24 18:22:38 +08:00
Florian Weimer
f4724843ad nptl: Fix tst-cancel30 on kernels without ppoll_time64 support
Fall back to ppoll if ppoll_time64 fails with ENOSYS.
Fixes commit 370da8a121 ("nptl: Fix
tst-cancel30 on sparc64").

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-04-23 21:16:32 +02:00
Florian Weimer
5361ad3910 login: Use unsigned 32-bit types for seconds-since-epoch
These fields store timestamps when the system was running.  No Linux
systems existed before 1970, so these values are unused.  Switching
to unsigned types allows continued use of the existing struct layouts
beyond the year 2038.

The intent is to give distributions more time to switch to improved
interfaces that also avoid locking/data corruption issues.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer
9abdae94c7 login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS.  This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer
4d4da5aab9 login: Check default sizes of structs utmp, utmpx, lastlog
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer
14e56bd4ce powerpc: Fix ld.so address determination for PCREL mode (bug 31640)
This seems to have stopped working with some GCC 14 versions,
which clobber r2.  With other compilers, the kernel-provided
r2 value is still available at this point.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-04-14 08:24:51 +02:00
Florian Weimer
aea52e3d2b Revert "x86_64: Suppress false positive valgrind error"
This reverts commit a1735e0aa8.

The test failure is a real valgrind bug that needs to be fixed before
valgrind is usable with a glibc that has been built with
CC="gcc -march=x86-64-v3".  The proposed valgrind patch teaches
valgrind to replace ld.so strcmp with an unoptimized scalar
implementation, thus avoiding any AVX2-related problems.

Valgrind bug: <https://bugs.kde.org/show_bug.cgi?id=485487>

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-13 17:42:13 +02:00
Adhemerval Zanella
686d542025 posix: Sync tempname with gnulib
The gnulib version contains an important change (9ce573cde), which
fixes some problems with multithreading, entropy loss, and ASLR leak
nfo.  It also fixes an issue where getrandom is not being used
on some new files generation (only for __GT_NOCREATE on first try).

The 044bf893ac removed __path_search, which is now moved to another
gnulib shared files (stdio-common/tmpdir.{c,h}).  Tthis patch
also fixes direxists to use __stat64_time64 instead of __xstat64,
and move the include of pathmax.h for !_LIBC (since it is not used
by glibc).  The license is also changed from GPL 3.0 to 2.1, with
permission from the authors (Bruno Haible and Paul Eggert).

The sync also removed the clock fallback, since clock_gettime
with CLOCK_REALTIME is expected to always succeed.

It syncs with gnulib commit 323834962817af7b115187e8c9a833437f8d20ec.

Checked on x86_64-linux-gnu.

Co-authored-by: Bruno Haible <bruno@clisp.org>
Co-authored-by: Paul Eggert <eggert@cs.ucla.edu>
Reviewed-by: Bruno Haible <bruno@clisp.org>
2024-04-10 14:53:39 -03:00
Florian Weimer
f8d8b1b1e6 aarch64: Enhanced CPU diagnostics for ld.so
This prints some information from struct cpu_features, and the midr_el1
and dczid_el0 system register contents on every CPU.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-08 16:48:55 +02:00
Florian Weimer
7a430f40c4 x86: Add generic CPUID data dumper to ld.so --list-diagnostics
This is surprisingly difficult to implement if the goal is to produce
reasonably sized output.  With the current approaches to output
compression (suppressing zeros and repeated results between CPUs,
folding ranges of identical subleaves, dealing with the %ecx
reflection issue), the output is less than 600 KiB even for systems
with 256 logical CPUs.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-08 16:48:55 +02:00
Florian Weimer
5653ccd847 elf: Add CPU iteration support for future use in ld.so diagnostics
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-08 16:48:55 +02:00
H.J. Lu
9e1f4aef86 x86-64: Exclude FMA4 IFUNC functions for -mapxf
When -mapxf is used to build glibc, the resulting glibc will never run
on FMA4 machines.  Exclude FMA4 IFUNC functions when -mapxf is used.
This requires GCC which defines __APX_F__ for -mapxf with commit:

1df56719bd8 x86: Define __APX_F__ for -mapxf

Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-04-06 05:03:55 -07:00
Adhemerval Zanella
c27f8763cf Reinstate generic features-time64.h
The a4ed0471d7 removed the generic version which is included by
features.h and used by Hurd.

Checked by building i686-gnu and x86_64-gnu with build-many-glibc.py.
2024-04-05 09:02:36 -03:00
Adhemerval Zanella
460d9e2dfe Cleanup __tls_get_addr on alpha/microblaze localplt.data
They are not required.

Checked with a make check for both ABIs.
2024-04-04 17:20:33 -03:00
Adhemerval Zanella
95700e7998 arm: Remove ld.so __tls_get_addr plt usage
Use the hidden alias instead.

Checked on arm-linux-gnueabihf.
2024-04-04 17:03:32 -03:00
Adhemerval Zanella
50c2be2390 aarch64: Remove ld.so __tls_get_addr plt usage
Use the hidden alias instead.

Checked on aarch64-linux-gnu.
2024-04-04 17:02:32 -03:00
Adhemerval Zanella
44ccc2465c math: x86 trunc traps when FE_INEXACT is enabled (BZ 31603)
The implementations of trunc functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Adhemerval Zanella
932544efa4 math: x86 floor traps when FE_INEXACT is enabled (BZ 31601)
The implementations of floor functions using x87 floating point (i386 and
86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Adhemerval Zanella
637bfc392f math: x86 ceill traps when FE_INEXACT is enabled (BZ 31600)
The implementations of ceil functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Joe Ramsay
87cb1dfcd6 aarch64/fpu: Add vector variants of erfc
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:24 +01:00
Joe Ramsay
3d3a4fb8e4 aarch64/fpu: Add vector variants of tanh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:20 +01:00
Joe Ramsay
eedbbca0bf aarch64/fpu: Add vector variants of sinh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:16 +01:00
Joe Ramsay
8b67920528 aarch64/fpu: Add vector variants of atanh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:12 +01:00
Joe Ramsay
81406ea3c5 aarch64/fpu: Add vector variants of asinh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:02 +01:00
Joe Ramsay
b09fee1d21 aarch64/fpu: Add vector variants of acosh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:32:58 +01:00
Joe Ramsay
bdb5705b7b aarch64/fpu: Add vector variants of cosh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:32:52 +01:00
Joe Ramsay
cb5d84f1f8 aarch64/fpu: Add vector variants of erf
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:32:48 +01:00
Stafford Horne
3db9d208dd misc: Add support for Linux uio.h RWF_NOAPPEND flag
In Linux 6.9 a new flag is added to allow for Per-io operations to
disable append mode even if a file was opened with the flag O_APPEND.
This is done with the new RWF_NOAPPEND flag.

This caused two test failures as these tests expected the flag 0x00000020
to be unused.  Adding the flag definition now fixes these tests on Linux
6.9 (v6.9-rc1).

  FAIL: misc/tst-preadvwritev2
  FAIL: misc/tst-preadvwritev64v2

This patch adds the flag, adjusts the test and adds details to
documentation.

Link: https://lore.kernel.org/all/20200831153207.GO3265@brightrain.aerifal.cx/
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-04 09:41:27 +01:00
Adhemerval Zanella
4dcd674b66 powerpc: Add missing arch flags on rounding ifunc variants
The ifunc variants now uses the powerpc implementation which in turn
uses the compiler builtin.  Without the proper -mcpu switch the builtin
does not generate the expected optimization.

Checked on powerpc-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-04-02 15:49:31 -03:00
Adhemerval Zanella
a4ed0471d7 Always define __USE_TIME_BITS64 when 64 bit time_t is used
It was raised on libc-help [1] that some Linux kernel interfaces expect
the libc to define __USE_TIME_BITS64 to indicate the time_t size for the
kABI.  Different than defined by the initial y2038 design document [2],
the __USE_TIME_BITS64 is only defined for ABIs that support more than
one time_t size (by defining the _TIME_BITS for each module).

The 64 bit time_t redirects are now enabled using a different internal
define (__USE_TIME64_REDIRECTS). There is no expected change in semantic
or code generation.

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and
arm-linux-gnueabi

[1] https://sourceware.org/pipermail/libc-help/2024-January/006557.html
[2] https://sourceware.org/glibc/wiki/Y2038ProofnessDesign

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-04-02 15:28:36 -03:00
Adhemerval Zanella
721314c980 x86_64: Remove avx512 strstr implementation
As indicated in a recent thread, this it is a simple brute-force
algorithm that checks the whole needle at a matching character pair
(and does so 1 byte at a time after the first 64 bytes of a needle).
Also it never skips ahead and thus can match at every haystack
position after trying to match all of the needle, which generic
implementation avoids.

As indicated by Wilco, a 4x larger needle and 16x larger haystack gives
a clear 65x slowdown both basic_strstr and __strstr_avx512:

  "ifuncs": ["basic_strstr", "twoway_strstr", "__strstr_avx512",
"__strstr_sse2_unaligned", "__strstr_generic"],

    {
     "len_haystack": 65536,
     "len_needle": 1024,
     "align_haystack": 0,
     "align_needle": 0,
     "fail": 1,
     "desc": "Difficult bruteforce needle",
     "timings": [4.0948e+07, 15094.5, 3.20818e+07, 108558, 10839.2]
    },
    {
     "len_haystack": 1048576,
     "len_needle": 4096,
     "align_haystack": 0,
     "align_needle": 0,
     "fail": 1,
     "desc": "Difficult bruteforce needle",
     "timings": [2.69767e+09, 100797, 2.08535e+09, 495706, 82666.9]
    }

PS: I don't have an AVX512 capable machine to verify this issues, but
    skimming through the code it does seems to follow what Wilco has
    described.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-03-27 13:48:16 -03:00
Adhemerval Zanella
2e53eb9234 signal: Avoid system signal disposition to interfere with tests
Both tst-sigset2 and tst-signal1 expectes that SIGINT disposition
is set to SIG_DFL.
2024-03-27 13:47:09 -03:00
Palmer Dabbelt
96d1b9ac23 RISC-V: Fix the static-PIE non-relocated object check
The value of l_scope is only valid post relocation, so this original
check was triggering undefined behavior.  Instead just directly check to
see if the object has been relocated, at which point using l_scope is
safe.

Reported-by: Andreas Schwab <schwab@suse.de>
Closes: BZ #31317
Fixes: e0590f41fe ("RISC-V: Enable static-pie.")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-25 15:17:13 +01:00
Sergey Bugaev
dc1a77269c htl: Implement some support for TLS_DTV_AT_TP
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-19-bugaevc@gmail.com>
2024-03-23 23:00:30 +01:00
Sergey Bugaev
a4273efa21 htl: Respect GL(dl_stack_flags) when allocating stacks
Previously, HTL would always allocate non-executable stacks.  This has
never been noticed, since GNU Mach on x86 ignores VM_PROT_EXECUTE and
makes all pages implicitly executable.  Since GNU Mach on AArch64
supports non-executable pages, HTL forgetting to pass VM_PROT_EXECUTE
immediately breaks any code that (unfortunately, still) relies on
executable stacks.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-7-bugaevc@gmail.com>
2024-03-23 22:48:44 +01:00
Sergey Bugaev
b467cfcaee hurd: Use the RETURN_ADDRESS macro
This gives us PAC stripping on AArch64.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-6-bugaevc@gmail.com>
2024-03-23 22:48:01 +01:00
Sergey Bugaev
6afeac1289 hurd: Disable Prefer_MAP_32BIT_EXEC on non-x86_64 for now
While we could support it on any architecture, the tunable is currently
only defined on x86_64.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-5-bugaevc@gmail.com>
2024-03-23 22:47:46 +01:00
Sergey Bugaev
7f02511e5b hurd: Move internal functions to internal header
Move _hurd_self_sigstate (), _hurd_critical_section_lock (), and
_hurd_critical_section_unlock () inline implementations (that were
already guarded by #if defined _LIBC) to the internal version of the
header.  While at it, add <tls.h> to the includes, and use
__LIBC_NO_TLS () unconditionally.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-2-bugaevc@gmail.com>
2024-03-23 22:43:07 +01:00
Stafford Horne
ad05a42370 or1k: Add prctl wrapper to unwrap variadic args
On OpenRISC variadic functions and regular functions have different
calling conventions so this wrapper is needed to translate.  This
wrapper is copied from x86_64/x32.  I don't know the build system enough
to find a cleaner way to share the code between x86_64/x32 and or1k
(maybe Implies?), so I went with the straight copy.

This fixes test failures:

  misc/tst-prctl
  nptl/tst-setgetname
2024-03-22 15:43:34 +00:00
Stafford Horne
df7e29e2a4 or1k: Only define fpu rouding and exceptions with hard-float
This test failure:

  math/test-fenv

If rounding mode and exception macros are defined then the fenv tests
run and always fail.  This patch adds an ifdef using the
__or1k_hard_float__ macro provided by gcc to avoid defining these fenv
macros when they cnnot be used.  This is similar to what is done in csky.

Note, I will post the or1k hard-float support soon. So, I prefer to
leave the hard-float bits here for now.
2024-03-22 15:43:34 +00:00
Stafford Horne
2e982a3937 or1k: Update libm test ulps
To fix test failures:

    FAIL: math/test-float-hypot
    FAIL: math/test-float32-hypot
2024-03-22 15:43:34 +00:00
Wilco Dijkstra
2e94e2f5d2 AArch64: Check kernel version for SVE ifuncs
Old Linux kernels disable SVE after every system call.  Calling the
SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
As a result, applications with a high use of syscalls may run slower with
the SVE memcpy.  This is true for kernels between 4.15.0 and before 6.2.0,
except for 5.14.0 which was patched.  Avoid this by checking the kernel
version and selecting the SVE ifunc on modern kernels.

Parse the kernel version reported by uname() into a 24-bit kernel.major.minor
value without calling any library functions.  If uname() is not supported or
if the version format is not recognized, assume the kernel is modern.

Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-03-21 16:50:51 +00:00
Amrita H S
1ea0511456 powerpc: Placeholder and infrastructure/build support to add Power11 related changes.
The following three changes have been added to provide initial Power11 support.
    1. Add the directories to hold Power11 files.
    2. Add support to select Power11 libraries based on AT_PLATFORM.
    3. Let submachine=power11 be set automatically.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-03-19 21:11:34 -05:00
Manjunath Matti
3ab9b88e2a powerpc: Add HWCAP3/HWCAP4 data to TCB for Power Architecture.
This patch adds a new feature for powerpc.  In order to get faster
access to the HWCAP3/HWCAP4 masks, similar to HWCAP/HWCAP2 (i.e. for
implementing __builtin_cpu_supports() in GCC) without the overhead of
reading them from the auxiliary vector, we now reserve space for them
in the TCB.

This is an ABI change for GLIBC 2.39.

Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-03-19 17:19:27 -05:00
Adhemerval Zanella
3d53d18fc7 elf: Enable TLS descriptor tests on aarch64
The aarch64 uses 'trad' for traditional tls and 'desc' for tls
descriptors, but unlike other targets it defaults to 'desc'.  The
gnutls2 configure check does not set aarch64 as an ABI that uses
TLS descriptors, which then disable somes stests.

Also rename the internal machinery fron gnu2 to tls descriptors.

Checked on aarch64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-03-19 14:53:30 -03:00
Adhemerval Zanella
64c7e34428 arm: Update _dl_tlsdesc_dynamic to preserve caller-saved registers (BZ 31372)
ARM _dl_tlsdesc_dynamic slow path has two issues:

  * The ip/r12 is defined by AAPCS as a scratch register, and gcc is
    used to save the stack pointer before on some function calls.  So it
    should also be saved/restored as well.  It fixes the tst-gnu2-tls2.

  * None of the possible VFP registers are saved/restored.  ARM has the
    additional complexity to have different VFP bank sizes (depending of
    VFP support by the chip).

The tst-gnu2-tls2 test is extended to check for VFP registers, although
only for hardfp builds.  Different than setcontext, _dl_tlsdesc_dynamic
does not have  HWCAP_ARM_IWMMXT (I don't have a way to properly test
it and it is almost a decade since newer hardware was released).

With this patch there is no need to mark tst-gnu2-tls2 as XFAIL.

Checked on arm-linux-gnueabihf.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-03-19 14:53:30 -03:00
Andreas Schwab
fd7ee2e6c5 Add tst-gnu2-tls2mod1 to test-internal-extras
That allows sysdeps/x86_64/tst-gnu2-tls2mod1.S to use internal headers.

Fixes: 717ebfa85c ("x86-64: Allocate state buffer space for RDI, RSI and RBX")
2024-03-19 14:28:28 +01:00
H.J. Lu
717ebfa85c x86-64: Allocate state buffer space for RDI, RSI and RBX
_dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11.  Define
TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
STATE_SAVE_OFFSET(%rsp).

   +==================+<- stack frame start aligned at 8 or 16 bytes
   |                  |<- RDI saved in the red zone
   |                  |<- RSI saved in the red zone
   |                  |<- RBX saved in the red zone
   |                  |<- paddings for stack realignment of 64 bytes
   |------------------|<- xsave buffer end aligned at 64 bytes
   |                  |<-
   |                  |<-
   |                  |<-
   |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
   |                  |<- 8-byte padding for 64-byte alignment
   |                  |<- 8-byte padding for 64-byte alignment
   |                  |<- R11
   |                  |<- R10
   |                  |<- R9
   |                  |<- R8
   |                  |<- RDX
   |                  |<- RCX
   +==================+<- RSP aligned at 64 bytes

Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
and RBX are saved onto stack without adjusting stack pointer first, using
the red-zone.  This fixes BZ #31501.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-03-18 19:45:13 -07:00
Darius Rad
f44f3aed31 riscv: Update nofpu libm test ulps
Fix two test failures.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-03-18 11:28:50 +01:00
Florian Weimer
7a76f21867 linux: Use rseq area unconditionally in sched_getcpu (bug 31479)
Originally, nptl/descr.h included <sys/rseq.h>, but we removed that
in commit 2c6b4b272e ("nptl:
Unconditionally use a 32-byte rseq area").  After that, it was
not ensured that the RSEQ_SIG macro was defined during sched_getcpu.c
compilation that provided a definition.  This commit always checks
the rseq area for CPU number information before using the other
approaches.

This adds an unnecessary (but well-predictable) branch on
architectures which do not define RSEQ_SIG, but its cost is small
compared to the system call.  Most architectures that have vDSO
acceleration for getcpu also have rseq support.

Fixes: 2c6b4b272e
Fixes: 1d350aa060
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2024-03-15 19:08:24 +01:00
Szabolcs Nagy
73c26018ed aarch64: fix check for SVE support in assembler
Due to GCC bug 110901 -mcpu can override -march setting when compiling
asm code and thus a compiler targetting a specific cpu can fail the
configure check even when binutils gas supports SVE.

The workaround is that explicit .arch directive overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses the
configure check should use that too even if the GCC issue is fixed
independently.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-03-14 14:27:56 +00:00
Joseph Myers
2367bf468c Update kernel version to 6.8 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.8.  (There are no new
constants covered by these tests in 6.8 that need any other header
changes.)

Tested with build-many-glibcs.py.
2024-03-13 19:46:21 +00:00
Joseph Myers
3de2f8755c Update syscall lists for Linux 6.8
Linux 6.8 adds five new syscalls.  Update syscall-names.list and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.

Tested with build-many-glibcs.py.
2024-03-13 13:57:56 +00:00
Adhemerval Zanella
4a76fb1da8 powerpc: Remove power8 strcasestr optimization
Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
improvement compared to the generic implementation.  The geomean
on bench-strcasestr shows:

            __strcasestr_power8  __strcasestr_ppc
  power10                  1159              1120
  power9                   1640              1469
  power8                   1787              1904

The strcasestr uses the same 'trick' as power7 strstr to detect
potential quadradic behavior, which only adds overheads for input
that trigger quadradic behavior and it is really a hack.

Checked on powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-03-12 17:11:01 -03:00
Adhemerval Zanella
2149da3683 riscv: Fix alignment-ignorant memcpy implementation
The memcpy optimization (commit 587a1290a1) has a series
of mistakes:

  - The implementation is wrong: the chunk size calculation is wrong
    leading to invalid memory access.

  - It adds ifunc supports as default, so --disable-multi-arch does
    not work as expected for riscv.

  - It mixes Linux files (memcpy ifunc selection which requires the
    vDSO/syscall mechanism)  with generic support (the memcpy
    optimization itself).

  - There is no __libc_ifunc_impl_list, which makes testing only
    check the selected implementation instead of all supported
    by the system.

This patch also simplifies the required bits to enable ifunc: there
is no need to memcopy.h; nor to add Linux-specific files.

The __memcpy_noalignment tail handling now uses a branchless strategy
similar to aarch64 (overlap 32-bits copies for sizes 4..7 and byte
copies for size 1..3).

Checked on riscv64 and riscv32 by explicitly enabling the function
on __libc_ifunc_impl_list on qemu-system.

Changes from v1:
* Implement the memcpy in assembly to correctly handle RISCV
  strict-alignment.
Reviewed-by: Evan Green <evan@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-12 14:38:08 -03:00
Andreas Schwab
2173173d57 linux/sigsetops: fix type confusion (bug 31468)
Each mask in the sigset array is an unsigned long, so fix __sigisemptyset
to use that instead of int.  The __sigword function returns a simple array
index, so it can return int instead of unsigned long.
2024-03-12 10:00:22 +01:00
caiyinyu
aeee41f1cf LoongArch: Correct {__ieee754, _}_scalb -> {__ieee754, _}_scalbf 2024-03-12 14:07:27 +08:00
Sunil K Pandey
b6e3898194 x86-64: Simplify minimum ISA check ifdef conditional with if
Replace minimum ISA check ifdef conditional with if.  Since
MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants,
compiler will perform constant folding optimization, getting same
results.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-03-03 15:47:53 -08:00
Evan Green
587a1290a1
riscv: Add and use alignment-ignorant memcpy
For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:15:01 -08:00
Evan Green
a2b47f7d46
riscv: Add ifunc helper method to hwprobe.h
Add a little helper method so it's easier to fetch a single value from
the hwprobe function when used within an ifunc selector.

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:15:00 -08:00
Evan Green
a29bb320a1
riscv: Enable multi-arg ifunc resolvers
RISC-V is apparently the first architecture to pass more than one
argument to ifunc resolvers. The helper macros in libc-symbols.h,
__ifunc_resolver(), __ifunc(), and __ifunc_hidden(), are incompatible
with this. These macros have an "arg" (non-final) parameter that
represents the parameter signature of the ifunc resolver. The result is
an inability to pass the required comma through in a single preprocessor
argument.

Rearrange the __ifunc_resolver() macro to be variadic, and pass the
types as those variable parameters. Move the guts of __ifunc() and
__ifunc_hidden() into new macros, __ifunc_args(), and
__ifunc_args_hidden(), that pass the variable arguments down through to
__ifunc_resolver(). Then redefine __ifunc() and __ifunc_hidden(), which
are used in a bunch of places, to simply shuffle the arguments down into
__ifunc_args[_hidden]. Finally, define a riscv-ifunc.h header, which
provides convenience macros to those looking to write ifunc selectors
that use both arguments.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:59 -08:00
Evan Green
78308ce77a
riscv: Add __riscv_hwprobe pointer to ifunc calls
The new __riscv_hwprobe() function is designed to be used by ifunc
selector functions. This presents a challenge for applications and
libraries, as ifunc selectors are invoked before all relocations have
been performed, so an external call to __riscv_hwprobe() from an ifunc
selector won't work. To address this, pass a pointer to the
__riscv_hwprobe() function into ifunc selectors as the second
argument (alongside dl_hwcap, which was already being passed).

Include a typedef as well for convenience, so that ifunc users don't
have to go through contortions to call this routine. Users will need to
remember to check the second argument for NULL, to account for older
glibcs that don't pass the function.

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:58 -08:00
Evan Green
e7919e0db2
riscv: Add hwprobe vdso call support
The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:57 -08:00
Evan Green
c6c33339b4
linux: Introduce INTERNAL_VSYSCALL
Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
a regular syscall, but without setting errno. Instead, the return value
is plumbed straight out of the macro.

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:56 -08:00
Evan Green
426d0e1aa8
riscv: Add Linux hwprobe syscall support
Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to
do things like dynamically choose a memcpy implementation.

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:55 -08:00
H.J. Lu
9b7091415a x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers
_dl_tlsdesc_dynamic should also preserve AMX registers which are
caller-saved.  Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID
to x86-64 TLSDESC_CALL_STATE_SAVE_MASK.  Compute the AMX state size
and save it in xsave_state_full_size which is only used by
_dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec.  This fixes
the AMX part of BZ #31372.  Tested on AMX processor.

AMX test is enabled only for compilers with the fix for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

GCC 14 and GCC 11/12/13 branches have the bug fix.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-02-29 04:30:01 -08:00
H.J. Lu
a1735e0aa8 x86_64: Suppress false positive valgrind error
When strcmp-avx2.S is used as the default, elf/tst-valgrind-smoke fails
with

==1272761== Conditional jump or move depends on uninitialised value(s)
==1272761==    at 0x4022C98: strcmp (strcmp-avx2.S:462)
==1272761==    by 0x400B05B: _dl_name_match_p (dl-misc.c:75)
==1272761==    by 0x40085F3: _dl_map_object (dl-load.c:1966)
==1272761==    by 0x401AEA4: map_doit (rtld.c:644)
==1272761==    by 0x4001488: _dl_catch_exception (dl-catch.c:237)
==1272761==    by 0x40015AE: _dl_catch_error (dl-catch.c:256)
==1272761==    by 0x401B38F: do_preload (rtld.c:816)
==1272761==    by 0x401C116: handle_preload_list (rtld.c:892)
==1272761==    by 0x401EDF5: dl_main (rtld.c:1842)
==1272761==    by 0x401A79E: _dl_sysdep_start (dl-sysdep.c:140)
==1272761==    by 0x401BEEE: _dl_start_final (rtld.c:494)
==1272761==    by 0x401BEEE: _dl_start (rtld.c:581)
==1272761==    by 0x401AD87: ??? (in */elf/ld.so)

The assembly codes are:

   0x0000000004022c80 <+144>:	vmovdqu 0x20(%rdi),%ymm0
   0x0000000004022c85 <+149>:	vpcmpeqb 0x20(%rsi),%ymm0,%ymm1
   0x0000000004022c8a <+154>:	vpcmpeqb %ymm0,%ymm15,%ymm2
   0x0000000004022c8e <+158>:	vpandn %ymm1,%ymm2,%ymm1
   0x0000000004022c92 <+162>:	vpmovmskb %ymm1,%ecx
   0x0000000004022c96 <+166>:	inc    %ecx
=> 0x0000000004022c98 <+168>:	jne    0x4022c32 <strcmp+66>

strcmp-avx2.S has 32-byte vector loads of strings which are shorter than
32 bytes:

(gdb) p (char *) ($rdi + 0x20)
$6 = 0x1ffeffea20 "memcheck-amd64-linux.so"
(gdb) p (char *) ($rsi + 0x20)
$7 = 0x4832640 "core-amd64-linux.so"
(gdb) call (int) strlen ((char *) ($rsi + 0x20))
$8 = 19
(gdb) call (int) strlen ((char *) ($rdi + 0x20))
$9 = 23
(gdb)

It triggers the valgrind error.  The above code is safe since the loads
don't cross the page boundary.  Update tst-valgrind-smoke.sh to accept
an optional suppression file and pass a suppression file to valgrind when
strcmp-avx2.S is the default implementation of strcmp.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-02-28 13:40:55 -08:00
H.J. Lu
8c7c188d62 x86: Don't check XFD against /proc/cpuinfo
Since /proc/cpuinfo doesn't report XFD, don't check it against
/proc/cpuinfo.
2024-02-28 11:50:38 -08:00
H.J. Lu
befe2d3c4d x86-64: Don't use SSE resolvers for ISA level 3 or above
When glibc is built with ISA level 3 or above enabled, SSE resolvers
aren't available and glibc fails to build:

ld: .../elf/librtld.os: in function `init_cpu_features':
.../elf/../sysdeps/x86/cpu-features.c:1200:(.text+0x1445f): undefined reference to `_dl_runtime_resolve_fxsave'
ld: .../elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/local/bin/ld: final link failed: bad value

For ISA level 3 or above, don't use _dl_runtime_resolve_fxsave nor
_dl_tlsdesc_dynamic_fxsave.

This fixes BZ #31429.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-02-28 11:49:30 -08:00
H.J. Lu
0aac205a81 x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
Compiler generates the following instruction sequence for GNU2 dynamic
TLS access:

	leaq	tls_var@TLSDESC(%rip), %rax
	call	*tls_var@TLSCALL(%rax)

or

	leal	tls_var@TLSDESC(%ebx), %eax
	call	*tls_var@TLSCALL(%eax)

CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS and RAX/EAX, are unchanged after CALL.  When
_dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
path.  __tls_get_addr is a normal function which doesn't preserve any
caller-saved registers.  _dl_tlsdesc_dynamic saved and restored integer
caller-saved registers, but didn't preserve any other caller-saved
registers.  Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
XSAVE and XSAVEC to save and restore all caller-saved registers.  This
fixes BZ #31372.

Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
to optimize elf_machine_runtime_setup.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-02-28 09:02:56 -08:00
H.J. Lu
e6350be7e9 sysdeps/unix/sysv/linux/x86_64/Makefile: Add the end marker
Add the end marker to tests, tests-container and modules-names.
2024-02-28 05:48:27 -08:00
Adhemerval Zanella
b53e73ea80 s390: Improve static-pie configure tests
Instead of tying based on the linker name and version, check for the
required support:

  * whether it does not generate dynamic TLS relocations in PIE
    (binutils PR ld/22263);
  * if it accepts --no-dynamic-linker (by using -static-pie);
  * and if it adds a DT_JMPREL pointing to .rela.iplt with static pie.

The patch also trims the comments, for binutils one of the tests should
already cover it.  The kernel ones are not clear which version should
have the backport, nor it is something that glibc can do much about
it.  Finally, the glibc is somewhat confusing, since it refers
to commits not related to s390x.

Checked with a build for s390x-linux-gnu.

Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
2024-02-28 10:09:53 -03:00
H.J. Lu
24c8db87c9 x86: Change ENQCMD test to CHECK_FEATURE_PRESENT
Since ENQCMD is mainly used in kernel, change the ENQCMD test to
CHECK_FEATURE_PRESENT.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-02-27 11:50:52 -08:00
Joe Ramsay
e302e10213 aarch64/fpu: Sync libmvec routines from 2.39 and before with AOR
This includes a fix for big-endian in AdvSIMD log, some cosmetic
changes, and numerous small optimisations mainly around inlining and
using indexed variants of MLA intrinsics.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-02-26 09:45:50 -03:00
Stefan Liebler
02782fd128 S390: Do not clobber r7 in clone [BZ #31402]
Starting with commit e57d8fc97b
"S390: Always use svc 0"
clone clobbers the call-saved register r7 in error case:
function or stack is NULL.

This patch restores the saved registers also in the error case.
Furthermore the existing test misc/tst-clone is extended to check
all error cases and that clone does not clobber registers in this
error case.
2024-02-26 13:37:46 +01:00
Sunil K Pandey
9f78a7c1d0 x86_64: Exclude SSE, AVX and FMA4 variants in libm multiarch
When glibc is built with ISA level 3 or higher by default, the resulting
glibc binaries won't run on SSE or FMA4 processors.  Exclude SSE, AVX and
FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by
default.

When glibc is built with ISA level 2 enabled by default, only keep SSE4.1
variant.

Fixes BZ 31335.

NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind
doesn't support AVX512 instructions:

https://bugs.kde.org/show_bug.cgi?id=383010

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-02-25 13:20:51 -08:00
H.J. Lu
dfb05f8e70 x86-64: Save APX registers in ld.so trampoline
Add APX registers to STATE_SAVE_MASK so that APX registers are saved in
ld.so trampoline.  This fixes BZ #31371.

Also update STATE_SAVE_OFFSET and STATE_SAVE_MASK for i386 which will
be used by i386 _dl_tlsdesc_dynamic.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-02-25 09:22:15 -08:00
Simon Chopin
59e0441d4a tests: gracefully handle AppArmor userns containment
Recent AppArmor containment allows restricting unprivileged user
namespaces, which is enabled by default on recent Ubuntu systems.
When this happens, as is common with Linux Security Modules, the syscall
will fail with -EACCESS.

When that happens, the affected tests will now be considered unsupported
rather than simply failing.

Further information:

* https://gitlab.com/apparmor/apparmor/-/wikis/unprivileged_userns_restriction
* https://ubuntu.com/blog/ubuntu-23-10-restricted-unprivileged-user-namespaces
* https://manpages.ubuntu.com/manpages/jammy/man5/apparmor.d.5.html (for
  the return code)

V2:
* Fix duplicated line in check_unshare_hints
* Also handle similar failure in tst-pidfd_getpid

V3:
* Comment formatting
* Aded some more documentation on syscall return value

Signed-off-by: Simon Chopin <simon.chopin@canonical.com>
2024-02-23 08:50:00 -03:00
Adhemerval Zanella
1e9a550ba4 powerpc: Remove power7 strstr optimization
The optimization is not faster than the generic algorithm,
using the bench-strstr the geometric mean running on a POWER10 machine
using gcc 13.1.1 is 482.47 while the default __strstr_ppc is 340.97
(which uses the generic implementation).

Also, there is no need to redirect the internal str*/mem* call
to optimized version, internal ifunc is supported and enabled
for internal calls (meaning that the generic implementation
will use any asm optimization if available).

Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-02-23 08:50:00 -03:00
Adhemerval Zanella
f4c142bb9f arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)
Instead of __dl_iterate_phdr. On ARM dlfo_eh_frame/dlfo_eh_count
maps to PT_ARM_EXIDX vaddr start / length.

On a Neoverse N1 machine with 160 cores, the following program:

  $ cat test.c
  #include <stdlib.h>
  #include <pthread.h>
  #include <assert.h>

  enum {
    niter = 1024,
    ntimes = 128,
  };

  static void *
  tf (void *arg)
  {
    int a = (int) arg;

    for (int i = 0; i < niter; i++)
      {
        void *p[ntimes];
        for (int j = 0; j < ntimes; j++)
  	p[j] = malloc (a * 128);
        for (int j = 0; j < ntimes; j++)
  	free (p[j]);
      }

    return NULL;
  }

  int main (int argc, char *argv[])
  {
    enum { nthreads = 16 };
    pthread_t t[nthreads];

    for (int i = 0; i < nthreads; i ++)
      assert (pthread_create (&t[i], NULL, tf, (void *) i) == 0);

    for (int i = 0; i < nthreads; i++)
      {
        void *r;
        assert (pthread_join (t[i], &r) == 0);
        assert (r == NULL);
      }

    return 0;
  }
  $ arm-linux-gnueabihf-gcc -fsanitize=address test.c -o test

Improves from ~15s to 0.5s.

Checked on arm-linux-gnueabihf.
2024-02-23 08:50:00 -03:00
Xi Ruoyao
e2a65ecc4b
math: Update mips64 ulps
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-02-22 21:28:25 +01:00
Daniel Cederman
aa4106db1d sparc: Treat the version field in the FPU control word as reserved
The FSR version field is read-only and might be non-zero.

This allows math/test-fpucw* to correctly pass when the version is
non-zero.

Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-02-19 10:55:50 -03:00
Flavio Cruz
88b771ab5e Implement setcontext/getcontext/makecontext/swapcontext for Hurd x86_64
Tested with the tests provided by glibc plus some other toy examples.
Message-ID: <20240217202535.1860803-1-flaviocruz@gmail.com>
2024-02-17 21:45:35 +01:00
Flavio Cruz
e3da8f9bad Use proc_getchildren_rusage when available in getrusage and times.
Message-ID: <20240217164846.1837223-1-flaviocruz@gmail.com>
2024-02-17 21:14:39 +01:00
Florian Weimer
6a04404521 Linux: Switch back to assembly syscall wrapper for prctl (bug 29770)
Commit ff026950e2 ("Add a C wrapper for
prctl [BZ #25896]") replaced the assembler wrapper with a C function.
However, on powerpc64le-linux-gnu, the C variadic function
implementation requires extra work in the caller to set up the
parameter save area.  Calling a function that needs a parameter save
area without one (because the prototype used indicates the function is
not variadic) corrupts the caller's stack.   The Linux manual pages
project documents prctl as a non-variadic function.  This has resulted
in various projects over the years using non-variadic prototypes,
including the sanitizer libraries in LLVm and GCC (GCC PR 113728).

This commit switches back to the assembler implementation on most
targets and only keeps the C implementation for x86-64 x32.

Also add the __prctl_time64 alias from commit
b39ffab860 ("Linux: Add time64 alias for
prctl") to sysdeps/unix/sysv/linux/syscalls.list; it was not yet
present in commit ff026950e2.

This restores the old ABI on powerpc64le-linux-gnu, thus fixing
bug 29770.

Reviewed-By: Simon Chopin <simon.chopin@canonical.com>
2024-02-17 09:17:04 +01:00