Commit Graph

16114 Commits

Author SHA1 Message Date
Samuel Thibault
a43003ebf6 htl: avoid exposing the vm_region symbol 2023-09-09 10:07:39 +02:00
Florian Weimer
6985865bc3 elf: Always call destructors in reverse constructor order (bug 30785)
The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors.  Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles.  (The force_first handling in
_dl_sort_maps is not effective for dlclose.)  After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.

A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.

In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated.  (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.

After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse destructor order.  These destructors
can call dlopen.  Previously, new objects do not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them.  Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)

After _dl_init_called_list traversal, two more loops follow.  The
processing order changes to the original link map order in the
namespace.  Previously, dependency order was used.  The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.

The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list.  The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.

tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications.  The new approach for checking la_activity should
make it clearer that la_activty calls come in pairs around namespace
updates.

The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.

There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.

Fixes commit 1df71d32fe ("elf: Implement
force_first handling in _dl_sort_maps_dfs (bug 28937)").

Reviewed-by: DJ Delorie <dj@redhat.com>
2023-09-08 12:34:27 +02:00
Aurelien Jarno
434bf72a94 io: Fix record locking contants for powerpc64 with __USE_FILE_OFFSET64
Commit 5f828ff824 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for
powerpc64") fixed an issue with the value of the lock constants on
powerpc64 when not using __USE_FILE_OFFSET64, but it ended-up also
changing the value when using __USE_FILE_OFFSET64 causing an API change.

Fix that by also checking that define, restoring the pre
4d0fe291ae commit values:

Default values:
- F_GETLK: 5
- F_SETLK: 6
- F_SETLKW: 7

With -D_FILE_OFFSET_BITS=64:
- F_GETLK: 12
- F_SETLK: 13
- F_SETLKW: 14

At the same time, it has been noticed that there was no test for io lock
with __USE_FILE_OFFSET64, so just add one.

Tested on x86_64-linux-gnu, i686-linux-gnu and
powerpc64le-unknown-linux-gnu.

Resolves: BZ #30804.
Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2023-09-07 21:56:31 +02:00
Joe Simmons-Talbott
955a47a4bf getaddrinfo: Get rid of alloca
Use a scratch_buffer rather than alloca to avoid potential stack
overflow.
2023-09-06 13:33:02 +00:00
Christoph Müllner
3d6fcf1bd7 riscv: Add support for XTheadBb in string-fz[a,i].h
XTheadBb has similar instructions like Zbb, which allow optimized
string processing:
* th.ff0: find-first zero is a CLZ instruction.
* th.tstnbz: Similar like orc.b, but with a bit-inverted result.

The instructions are documented here:
  https://github.com/T-head-Semi/thead-extension-spec/tree/master/xtheadbb

These instructions can be found in the T-Head C906 and the C910.

Tested with the string tests.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-06 09:27:43 -03:00
Siddhesh Poyarekar
3bf7bab88b getcanonname: Fix a typo
This code is generally unused in practice since there don't seem to be
any NSS modules that only implement _nss_MOD_gethostbyname2_r and not
_nss_MOD_gethostbyname3_r.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-09-05 17:04:05 -04:00
Adhemerval Zanella Netto
e7190fc73d linux: Add pidfd_getpid
This interface allows to obtain the associated process ID from the
process file descriptor.  It is done by parsing the procps fdinfo
information.  Its prototype is:

   pid_t pidfd_getpid (int fd)

It returns the associated pid or -1 in case of an error and sets the
errno accordingly.  The possible errno values are those from open, read,
and close (used on procps parsing), along with:

   - EBADF if the FD is negative, does not have a PID associated, or if
     the fdinfo fields contain a value larger than pid_t.

   - EREMOTE if the PID is in a separate namespace.

   - ESRCH if the process is already terminated.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:59 -03:00
Adhemerval Zanella Netto
0d6f9f6265 posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
Returning a pidfd allows a process to keep a race-free handle for a
child process, otherwise, the caller will need to either use pidfd_open
(which still might be subject to TOCTOU) or keep the old racy interface
base on pid_t.

To correct use pifd_spawn, the kernel must support not only returning
the pidfd with clone/clone3 but also waitid (P_PIDFD) (added on Linux
5.4).  If kernel does not support the waitid, pidfd return ENOSYS.
It avoids the need to racy workarounds, such as reading the procfs
fdinfo to get the pid to use along with other wait interfaces.

These interfaces are similar to the posix_spawn and posix_spawnp, with
the only difference being it returns a process file descriptor (int)
instead of a process ID (pid_t).  Their prototypes are:

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict file,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict])

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict path,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict_arr],
                    char *const envp[restrict_arr]);

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the return
argument lifetime.  Although on Linux pid_t and int are interchangeable,
POSIX only states that pid_t should be a signed integer.

Both symbols reuse the posix_spawn posix_spawn_file_actions_t and
posix_spawnattr_t, to void rehash posix_spawn API or add a new one. It
also means that both interfaces support the same attribute and file
actions, and a new flag or file action on posix_spawn is also added
automatically for pidfd_spawn.

Also, using posix_spawn plumbing allows the reusing of most of the
current testing with some changes:

  - waitid is used instead of waitpid since it is a more generic
    interface.

  - tst-posix_spawn-setsid.c is adapted to take into consideration that
    the caller can check for session id directly.  The test now spawns
itself and writes the session id as a file instead.

  - tst-spawn3.c need to know where pidfd_spawn is used so it keeps an
    extra file description unused.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:59 -03:00
Adhemerval Zanella Netto
ce2bfb8569 linux: Add posix_spawnattr_{get, set}cgroup_np (BZ 26371)
These functions allow to posix_spawn and posix_spawnp to use
CLONE_INTO_CGROUP with clone3, allowing the child process to
be created in a different cgroup version 2.  These are GNU
extensions that are available only for Linux, and also only
for the architectures that implement clone3 wrapper
(HAVE_CLONE3_WRAPPER).

To create a process on a different cgroupv2, one can use the:

  posix_spawnattr_t attr;
  posix_spawnattr_init (&attr);
  posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
  posix_spawnattr_setcgroup_np (&attr, cgroup);
  posix_spawn (...)

Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP control
whether the cgroup file descriptor will be used or not with
clone3.

There is no fallback if either clone3 does not support the flag
or if the architecture does not provide the clone3 wrapper, in
this case posix_spawn returns EOPNOTSUPP.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:48 -03:00
Adhemerval Zanella Netto
ad77b1bcca linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc
Not all architectures added clone3 syscall.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 10:15:48 -03:00
Adhemerval Zanella Netto
e7d1c58664 mips: Add the clone3 wrapper
It follows the internal signature:

extern int clone3 (struct clone_args *__cl_args, size_t __size,
                   int (*__func) (void *__arg), void *__arg);

Checked on mips64el-linux-gnueabihf, mips64el-n32-linux-gnu, and
mipsel-linux-gnu.
2023-09-05 10:15:48 -03:00
Adhemerval Zanella Netto
b56f7fe79e arm: Add the clone3 wrapper
It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
		    int (*__func) (void *__arg), void *__arg);

Checked on arm-linux-gnueabihf.
2023-09-05 10:15:48 -03:00
Samuel Thibault
8076906109 htl: Fix stack information for main thread
We can easily directly ask the kernel with vm_region rather than
assuming a one-page stack.
2023-09-03 21:11:29 +02:00
Szabolcs Nagy
d2123d6827 elf: Fix slow tls access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter and if
the current dtv is older then _dl_update_slotinfo updates dtv up to the
generation of the accessed module. So if the global generation is newer
than generation of the module then __tls_get_addr keeps hitting the
slow dtv update path. The dtv update path includes a number of checks
to see if any update is needed and this already causes measurable tls
access slow down after dlopen.

It may be possible to detect up-to-date dtv faster.  But if there are
many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
least walking the slotinfo list.

This patch tries to update the dtv to the global generation instead, so
after a dlopen the tls access slow path is only hit once.  The modules
with larger generation than the accessed one were not necessarily
synchronized before, so additional synchronization is needed.

This patch uses acquire/release synchronization when accessing the
generation counter.

Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.

I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
powerpc, sparc, x86 who reported that it fixes the performance issue
of bug 19924.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-01 08:21:37 +01:00
H.J. Lu
1493622f4f x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643]
The old Intel software developer manual specified that the low byte of
EAX of CPUID leaf 2 returned 1 which indicated the number of rounds of
CPUDID leaf 2 was needed to retrieve the complete cache information. The
newer Intel manual has been changed to that it should always return 1
and be ignored.  If the lower byte isn't 1, CPUID leaf 2 can't be used.
In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead.  If
CPUID leaf 4 doesn't contain the cache information, cache information
isn't available at all.  This addresses BZ #30643.
2023-08-29 12:57:41 -07:00
dengjianbo
693918b6dd LoongArch: Change loongarch to LoongArch in comments 2023-08-29 10:35:38 +08:00
dengjianbo
ea7698a616 LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx}
According to glibc memcmp microbenchmark test results(Add generic
memcmp), this implementation have performance improvement
except the length is less than 3, details as below:

Name             Percent of time reduced
memcmp-lasx      16%-74%
memcmp-lsx       20%-50%
memcmp-aligned   5%-20%
2023-08-29 10:35:38 +08:00
dengjianbo
1b1e9b7c10 LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx}
According to glibc memset microbenchmark test results, for LSX and LASX
versions, A few cases with length less than 8 experience performace
degradation, overall, the LASX version could reduce the runtime about
15% - 75%, LSX version could reduce the runtime about 15%-50%.

The unaligned version uses unaligned memmory access to set data which
length is less than 64 and make address aligned with 8. For this part,
the performace is better than aligned version. Comparing with the generic
version, the performance is close when the length is larger than 128. When
the length is 8-128, the unaligned version could reduce the runtime about
30%-70%, the aligned version could reduce the runtime about 20%-50%.
2023-08-29 10:35:38 +08:00
dengjianbo
55e84dc6ed LoongArch: Add ifunc support for memrchr{lsx, lasx}
According to glibc memrchr microbenchmark, this implementation could reduce
the runtime as following:

Name            Percent of rutime reduced
memrchr-lasx    20%-83%
memrchr-lsx     20%-64%
2023-08-29 10:35:38 +08:00
dengjianbo
60bcb9acbf LoongArch: Add ifunc support for memchr{aligned, lsx, lasx}
According to glibc memchr microbenchmark, this implementation could reduce
the runtime as following:

Name               Percent of runtime reduced
memchr-lasx        37%-83%
memchr-lsx         30%-66%
memchr-aligned     0%-15%
2023-08-29 10:35:38 +08:00
dengjianbo
f8664fe215 LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx}
According to glibc rawmemchr microbenchmark, A few cases tested with
char '\0' experience performance degradation due to the lasx and lsx
versions don't handle the '\0' separately. Overall, rawmemchr-lasx
implementation could reduce the runtime about 40%-80%, rawmemchr-lsx
implementation could reduce the runtime about 40%-66%, rawmemchr-aligned
implementation could reduce the runtime about 20%-40%.
2023-08-29 10:35:38 +08:00
Xi Ruoyao
3efa26749e LoongArch: Micro-optimize LD_PCREL
We are requiring Binutils >= 2.41, so explicit relocation syntax is
always supported by the assembler.  Use it to reduce one instruction.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Xi Ruoyao
aac842d0ed LoongArch: Remove support code for old linker in start.S
We are requiring Binutils >= 2.41, so la.pcrel always works here.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Xi Ruoyao
e757412c3e LoongArch: Simplify the autoconf check for static PIE
We are strictly requiring GAS >= 2.41 now, so we don't need to check
assembler capability anymore.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Kir Kolyshkin
42c960a4f1 Add F_SEAL_EXEC from Linux 6.3 to bits/fcntl-linux.h.
This patch adds the new F_SEAL_EXEC constant from Linux 6.3 (see Linux
commit 6fd7353829c ("mm/memfd: add F_SEAL_EXEC") to bits/fcntl-linux.h.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-28 14:51:39 -03:00
Adhemerval Zanella
87ced255bd m68k: Use M68K_SCALE_AVAILABLE on __mpn_lshift and __mpn_rshift
This patch adds a new macro, M68K_SCALE_AVAILABLE, similar to gmp
scale_available_p (mpn/m68k/m68k-defs.m4) that expand to 1 if a
scale factor can be used in addressing modes.  This is used
instead of __mc68020__ for some optimization decisions.

Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
2023-08-25 10:07:24 -03:00
Adhemerval Zanella
b85880633f m68k: Fix build with -mcpu=68040 or higher (BZ 30740)
GCC currently does not define __mc68020__ for -mcpu=68040 or higher,
which memcpy/memmove assumptions.  Since this memory copy optimization
seems only intended for m68020, disable for other m680X0 variants.

Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
2023-08-25 10:07:24 -03:00
dengjianbo
ddbb74f5c2 LoongArch: Add ifunc support for strncmp{aligned, lsx}
Based on the glibc microbenchmark, only a few short inputs with this
strncmp-aligned and strncmp-lsx implementation experience performance
degradation, overall, strncmp-aligned could reduce the runtime 0%-10%
for aligned comparision, 10%-25% for unaligend comparision, strncmp-lsx
could reduce the runtime about 0%-60%.
2023-08-24 17:19:47 +08:00
dengjianbo
82d9426e4a LoongArch: Add ifunc support for strcmp{aligned, lsx}
Based on the glibc microbenchmark, strcmp-aligned implementation could
reduce the runtime 0%-10% for aligned comparison, 10%-20% for unaligned
comparison, strcmp-lsx implemenation could reduce the runtime 0%-50%.
2023-08-24 17:19:47 +08:00
dengjianbo
e74d959862 LoongArch: Add ifunc support for strnlen{aligned, lsx, lasx}
Based on the glibc microbenchmark, strnlen-aligned implementation could
reduce the runtime more than 10%, strnlen-lsx implementation could reduce
the runtime about 50%-78%, strnlen-lasx implementation could reduce the
runtime about 50%-88%.
2023-08-24 17:19:47 +08:00
Guy-Fleury Iteriteka
1dc0bc8f07 htl: move pthread_attr_setdetachstate into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-11-gfleury@disroot.org>
2023-08-24 01:57:22 +02:00
Guy-Fleury Iteriteka
92a6c26470 htl: move pthread_attr_getdetachstate into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-10-gfleury@disroot.org>
2023-08-24 01:57:17 +02:00
Guy-Fleury Iteriteka
c2c9feebdc htl: move pthread_attr_setschedpolicy into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-9-gfleury@disroot.org>
2023-08-24 01:57:16 +02:00
Guy-Fleury Iteriteka
0f3a39072b htl: move pthread_attr_getschedpolicy into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-8-gfleury@disroot.org>
2023-08-24 01:57:14 +02:00
Guy-Fleury Iteriteka
fb2d92a5b3 htl: move pthread_attr_setinheritsched into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-7-gfleury@disroot.org>
2023-08-24 01:57:13 +02:00
Guy-Fleury Iteriteka
62cf5d2bb3 htl: move pthread_attr_getinheritsched into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-6-gfleury@disroot.org>
2023-08-24 01:57:11 +02:00
Guy-Fleury Iteriteka
79de1a0ca2 htl: move pthread_attr_getschedparam into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-5-gfleury@disroot.org>
2023-08-24 01:57:10 +02:00
Guy-Fleury Iteriteka
3caa6362d0 htl: move pthread_setschedparam into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-4-gfleury@disroot.org>
2023-08-24 01:57:08 +02:00
Guy-Fleury Iteriteka
a1a942fb5f htl: move pthread_getschedparam into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-3-gfleury@disroot.org>
2023-08-24 01:57:04 +02:00
Guy-Fleury Iteriteka
9dfa256216 htl: move pthread_equal into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-2-gfleury@disroot.org>
2023-08-24 01:56:57 +02:00
Florian Weimer
65a5112ede Linux: Avoid conflicting types in ld.so --list-diagnostics
The path auxv[*].a_val could either be an integer or a string,
depending on the a_type value.  Use a separate field, a_val_string, to
simplify mechanical parsing of the --list-diagnostics output.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-23 08:12:48 +02:00
H.J. Lu
a8ecb126d4 x86_64: Add log1p with FMA
On Skylake, it changes log1p bench performance by:

        Before       After     Improvement
max     63.349       58.347       8%
min     4.448        5.651        -30%
mean    12.0674      10.336       14%

The minimum code path is

 if (hx < 0x3FDA827A)                          /* x < 0.41422  */
    {
      if (__glibc_unlikely (ax >= 0x3ff00000))           /* x <= -1.0 */
        {
	   ...
        }
      if (__glibc_unlikely (ax < 0x3e200000))           /* |x| < 2**-29 */
        {
          math_force_eval (two54 + x);          /* raise inexact */
          if (ax < 0x3c900000)                  /* |x| < 2**-54 */
            {
	      ...
            }
          else
            return x - x * x * 0.5;

FMA and non-FMA code sequences look similar.  Non-FMA version is slightly
faster.  Since log1p is called by asinh and atanh, it improves asinh
performance by:

        Before       After     Improvement
max     75.645       63.135       16%
min     10.074       10.071       0%
mean    15.9483      14.9089      6%

and improves atanh performance by:

        Before       After     Improvement
max     91.768       75.081       18%
min     15.548       13.883       10%
mean    18.3713      16.8011      8%
2023-08-21 10:44:26 -07:00
Andreas Schwab
ce99601fa8 Remove references to the defunct db2 subdir
The db2 subdir has been removed more than 20 years ago.
2023-08-21 18:20:53 +02:00
Stefan Liebler
f5f96b784b s390x: Fix static PIE condition for toolchain bootstrapping.
The static PIE configure check uses link tests.  When bootstrapping
a cross-toolchain, the link tests fail due to missing crt-files /
libc.so.  As we explicitely want to test an issue in binutils (ld),
we now also explicitely check for known linker versions.

See also commit 368b7c614b
S390: Use compile-only instead of also link-tests in configure.
2023-08-18 10:57:59 +02:00
Andreas Schwab
464fd8249e m68k: fix __mpn_lshift and __mpn_rshift for non-68020
From revision 03f3d275d0d6 in the gmp repository.
2023-08-17 21:56:14 +02:00
Sam James
369f373057
sysdeps: tst-bz21269: fix -Wreturn-type
Thanks to Andreas Schwab for reporting.

Fixes: 652b9fdb77
Signed-off-by: Sam James <sam@gentoo.org>
2023-08-17 09:30:57 +01:00
dengjianbo
8944ba483f Loongarch: Add ifunc support for memcpy{aligned, unaligned, lsx, lasx} and memmove{aligned, unaligned, lsx, lasx}
These implementations improve the time to copy data in the glibc
microbenchmark as below:
memcpy-lasx       reduces the runtime about 8%-76%
memcpy-lsx        reduces the runtime about 8%-72%
memcpy-unaligned  reduces the runtime of unaligned data copying up to 40%
memcpy-aligned    reduece the runtime of unaligned data copying up to 25%
memmove-lasx      reduces the runtime about 20%-73%
memmove-lsx       reduces the runtime about 50%
memmove-unaligned reduces the runtime of unaligned data moving up to 40%
memmove-aligned   reduces the runtime of unaligned data moving up to 25%
2023-08-17 10:12:18 +08:00
dengjianbo
ba67bc8e0a Loongarch: Add ifunc support for strchr{aligned, lsx, lasx} and strchrnul{aligned, lsx, lasx}
These implementations improve the time to run strchr{nul}
microbenchmark in glibc as below:
strchr-lasx       reduces the runtime about 50%-83%
strchr-lsx        reduces the runtime about 30%-67%
strchr-aligned    reduces the runtime about 10%-20%
strchrnul-lasx    reduces the runtime about 50%-83%
strchrnul-lsx     reduces the runtime about 36%-65%
strchrnul-aligned reduces the runtime about 6%-10%
2023-08-17 10:12:18 +08:00
Sam James
652b9fdb77 sysdeps: tst-bz21269: handle ENOSYS & skip appropriately
SYS_modify_ldt requires CONFIG_MODIFY_LDT_SYSCALL to be set in the kernel, which
some distributions may disable for hardening. Check if that's the case (unset)
and mark the test as UNSUPPORTED if so.

Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
2023-08-16 21:01:39 +01:00
Sam James
e0b712dd91 sysdeps: tst-bz21269: fix test parameter
All callers pass 1 or 0x11 anyway (same meaning according to man page),
but still.

Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
2023-08-16 21:01:37 +01:00
Samuel Thibault
81dcf8b3d1 hurd: Fix strictness of <mach/thread_state.h>
Fixes: db25bc52026f ("hurd: Add prototype for and thus fix _hurdsig_abort_rpcs call")
2023-08-16 00:12:52 +02:00
H.J. Lu
1b214630ce x86_64: Add expm1 with FMA
On Skylake, it improves expm1 bench performance by:

        Before       After     Improvement
max     70.204       68.054       3%
min     20.709       16.2         22%
mean    22.1221      16.7367      24%

NB: Add

extern long double __expm1l (long double);
extern long double __expm1f128 (long double);

for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
2023-08-14 08:14:19 -07:00
dengjianbo
135407f431 Loongarch: Add ifunc support and add different versions of strlen
strlen-lasx is implemeted by LASX simd instructions(256bit)
strlen-lsx is implemeted by LSX simd instructions(128bit)
strlen-align is implemented by LA basic instructions and never use unaligned memory acess
2023-08-14 09:47:09 +08:00
dengjianbo
cb7954c4c2 LoongArch: Add minuimum binutils required version
LoongArch glibc can add some LASX/LSX vector instructions codes,
change the required minimum binutils version to 2.41 which could
support vector instructions. HAVE_LOONGARCH_VEC_ASM is removed
accordingly.
2023-08-14 09:47:09 +08:00
dengjianbo
57b2c14272 LoongArch: Redefine macro LEAF/ENTRY.
The following usage of macro LEAF/ENTRY are all feasible:
1. LEAF(fcn) -- the align value of fcn is .align 3(default value)
2. LEAF(fcn, 6) -- the align value of fcn is .align 6
2023-08-14 09:47:09 +08:00
Noah Goldstein
084fb31bc2 x86: Fix incorrect scope of setting shared_per_thread [BZ# 30745]
The:

```
    if (shared_per_thread > 0 && threads > 0)
      shared_per_thread /= threads;
```

Code was accidentally moved to inside the else scope.  This doesn't
match how it was previously (before af992e7abd).

This patch fixes that by putting the division after the `else` block.
2023-08-11 15:33:08 -05:00
H.J. Lu
f6b10ed8e9 x86_64: Add log2 with FMA
On Skylake, it improves log2 bench performance by:

        Before       After     Improvement
max     208.779      63.827       69%
min     9.977        6.55         34%
mean    10.366       6.8191       34%
2023-08-11 07:49:45 -07:00
Florian Weimer
039ff51ac7 nscd: Do not rebuild getaddrinfo (bug 30709)
The nscd daemon caches hosts data from NSS modules verbatim, without
filtering protocol families or sorting them (otherwise separate caches
would be needed for certain ai_flags combinations).  The cache
implementation is complete separate from the getaddrinfo code.  This
means that rebuilding getaddrinfo is not needed.  The only function
actually used is __bump_nl_timestamp from check_pf.c, and this change
moves it into nscd/connections.c.

Tested on x86_64-linux-gnu with -fexceptions, built with
build-many-glibcs.py.  I also backported this patch into a distribution
that still supports nscd and verified manually that caching still works.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-08-11 10:10:16 +02:00
H.J. Lu
881546979d x86_64: Sort fpu/multiarch/Makefile
Sort Makefile variables using scripts/sort-makefile-lines.py.

No code generation changes observed in libm.  No regressions on x86_64.
2023-08-10 11:23:25 -07:00
Adhemerval Zanella
c73c96a4a1 i686: Fix build with --disable-multiarch
Since i686 provides the fortified wrappers for memcpy, mempcpy,
memmove, and memset on the same string implementation, the static
build tries to optimized it by not tying the fortified wrappers
to string routine (to avoid pulling the fortify function if
they are not required).

Checked on i686-linux-gnu building with different option:
default and --disable-multi-arch plus default, --disable-default-pie,
--enable-fortify-source={2,3}, and --enable-fortify-source={2,3}
with --disable-default-pie.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-08-10 10:29:29 -03:00
Adhemerval Zanella
51cb52214f x86_64: Fix build with --disable-multiarch (BZ 30721)
With multiarch disabled, the default memmove implementation provides
the fortify routines for memcpy, mempcpy, and memmove.  However, it
does not provide the internal hidden definitions used when building
with fortify enabled.  The memset has a similar issue.

Checked on x86_64-linux-gnu building with different options:
default and --disable-multi-arch plus default, --disable-default-pie,
--enable-fortify-source={2,3}, and --enable-fortify-source={2,3}
with --disable-default-pie.
Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-08-10 10:29:29 -03:00
Joseph Myers
b163fca6c3 Add PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG etc. from Linux 6.4 to sys/ptrace.h
Linux 6.4 adds new constants PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
and PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG.  Add those to all
relevant sys/ptrace.h headers, along with adding the associated
argument structure to bits/ptrace-shared.h (named struct
__ptrace_sud_config there following the usual convention for such
structures).

Tested for x86_64 and with build-many-glibcs.py.
2023-08-08 14:38:22 +00:00
Joseph Myers
c8c20039c7 Add PACKET_VNET_HDR_SZ from Linux 6.4 to netpacket/packet.h
Linux 6.4 adds a new constant PACKET_VNET_HDR_SZ; add it to glibc's
netpacket/packet.h.

Tested for x86_64.
2023-08-08 14:37:45 +00:00
Samuel Thibault
e3ae80adbc hurd: Make error_t an int in C++
Making error_t defined to enum __error_t_codes conveniently makes the
debugger print symbolic values, but in C++ int is not interoperable with
enum __error_t_codes, leading to C++ application build issues, so let's
revert error_t to int in C++.
2023-08-08 16:07:57 +02:00
наб
92861d93cd linux: statvfs: allocate spare for f_type
This is the only missing part in struct statvfs.
The LSB calls [f]statfs() deprecated, and its weird types are definitely
off-putting. However, its use is required to get f_type.

Instead, allocate one of the six spares to f_type,
copied directly from struct statfs.
This then becomes a small glibc extension to the standard interface
on Linux and the Hurd, instead of two different interfaces, one of which
is quite odd due to being an ABI type, and there no longer is any reason
to use statfs().

The underlying kernel type is a mess, but all architectures agree on u32
(or more) for the ABI, and all filesystem magicks are 32-bit integers.
We don't lose any generality by using u32, and by doing so we both make
the API consistent with the Hurd, and allow C++
  switch(f_type) { case RAMFS_MAGIC: ...; }

Also fix tst-statvfs so that it actually fails;
as it stood, all it did was return 0 always.
Test statfs()' and statvfs()' f_types are the same.

Link: https://lore.kernel.org/linux-man/f54kudgblgk643u32tb6at4cd3kkzha6hslahv24szs4raroaz@ogivjbfdaqtb/t/#u
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-08 09:29:06 -03:00
наб
a9847e2c66 hurd: statvfs: __f_type -> f_type
No further changes needed ([f]statvfs() just cast to struct statfs *
and call [f]statfs()).

Link: https://lore.kernel.org/linux-man/f54kudgblgk643u32tb6at4cd3kkzha6hslahv24szs4raroaz@ogivjbfdaqtb/t/#u
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2023-08-08 09:29:06 -03:00
Samuel Thibault
53da64d1cf htl: Initialize ___pthread_self early
When using jemalloc, malloc() needs to use TSD, while libpthread
initialization needs malloc(). Having ___pthread_self set early to some
static storage allows TSD to work early, thus allowing jemalloc and
libpthread to initialize together.

This incidentaly simplifies __pthread_enable/disable_asynccancel and
__pthread_self, now that ___pthread_self is always initialized.
2023-08-08 12:19:29 +02:00
Samuel Thibault
644aa127b9 htl: Add support for static TSD data
When using jemalloc, malloc() needs to use TSD, while libpthread
initialization needs malloc(). Supporting a static TSD area allows jemalloc
and libpthread to initialize together.
2023-08-08 12:17:48 +02:00
Sajan Karumanchi
dcad5c8578 x86: Fix for cache computation on AMD legacy cpus.
Some legacy AMD CPUs and hypervisors have the _cpuid_ '0x8000_001D'
set to Zero, thus resulting in zeroed-out computed cache values.
This patch reintroduces the old way of cache computation as a
fail-safe option to handle these exceptions.
Fixed 'level4_cache_size' value through handle_amd().

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Tested-by: Florian Weimer <fweimer@redhat.com>
2023-08-06 19:10:42 +05:30
Samuel Thibault
53850f044f hurd: Rework generating errno.h
We only need to give to gawk the headers that actually define error
numbers, so let's rather filter out the other included headers early.
2023-08-06 22:35:01 +02:00
Samuel Thibault
41d8c3bc33 powerpc longjmp: Fix build after chk hidden builtin fix
04bf7d2d8a ("chk: Add and fix hidden builtin definitions for *_chk")
added an #undef for longjmp and siglongjmp to compensate for the
definition in include/setjmp.h, but missed doing so for the powerpc
version too.

Fixes: 04bf7d2d8a ("chk: Add and fix hidden builtin definitions for
*_chk")
2023-08-04 10:03:59 +02:00
Yang Yujie
c579293f67 LoongArch: Fix static PIE condition for toolchain bootstrapping.
This patch allows the static PIE startfile rcrt1.o to be built
without requiring libgcc_s.so from GCC, which depends on libc
in the first place.
2023-08-04 14:04:37 +08:00
Joseph Myers
bd154cdb9e Add IP_PROTOCOL from Linux 6.4 to bits/in.h
Linux 6.4 adds a new constant IP_PROTOCOL; add it to glibc's
bits/in.h.

Tested for x86_64.
2023-08-01 17:22:12 +00:00
Joseph Myers
47b76f6d1d Update kernel version to 6.4 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.4.  (There are no new
constants covered by these tests in 6.4 that need any other header
changes.)

Tested with build-many-glibcs.py.
2023-08-01 12:43:04 +00:00
Mahesh Bodapati
21841f0d56 PowerPC: Influence cpu/arch hwcap features via GLIBC_TUNABLES
This patch enables the option to influence hwcaps used by PowerPC.
The environment variable, GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz....,
can be used to enable CPU/ARCH feature yyy, disable CPU/ARCH feature xxx
and zzz, where the feature name is case-sensitive and has to match the ones
mentioned in the file{sysdeps/powerpc/dl-procinfo.c}.

Note that the hwcap tunables only used in the IFUNC selection.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-01 07:41:17 -05:00
H.J. Lu
1547d6a64f <sys/platform/x86.h>: Add APX support
Add support for Intel Advanced Performance Extensions:

https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

to <sys/platform/x86.h>.
2023-07-27 08:42:32 -07:00
Adhemerval Zanella Netto
dbc4b032dc linux: Fix i686 with gcc6
On __convert_scm_timestamps GCC 6 issues an warning that tvts[0]/tvts[1]
maybe be used uninitialized, however it would be used if type is set to a
value different than 0 (done by either COMPAT_SO_TIMESTAMP_OLD or
COMPAT_SO_TIMESTAMPNS_OLD) which will fallthrough to 'common' label.

It does not show with gcc 7 or more recent versions.

Checked on i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-26 09:45:55 -03:00
Adhemerval Zanella Netto
0b1a76c577 i386: Remove memset_chk-nonshared.S
Similar to memcpy, mempcpy, and memmove there is no need for an
specific memset_chk-nonshared.S.  It can be provided by
memset-ia32.S itself for static library.

Checked on i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-26 09:45:55 -03:00
Adhemerval Zanella Netto
f8f9a27257 i386: Fix build with --enable-fortify=3
The i386 string routines provide multiple internal definitions
for memcpy, memmove, and mempcpy chk routines:

  $ objdump -t libc.a | grep __memcpy_chk
  00000000 g     F .text  0000000e __memcpy_chk
  00000000 g     F .text  00000013 __memcpy_chk
  $ objdump -t libc.a | grep __mempcpy_chk
  00000000 g     F .text  0000000e __mempcpy_chk
  00000000 g     F .text  00000013 __mempcpy_chk
  $ objdump -t libc.a | grep __memmove_chk
  00000000 g     F .text  0000000e __memmove_chk
  00000000 g     F .text  00000013 __memmove_chk

Although is not an issue for normal static builds, with fortify=3
glibc itself might use the fortify chk functions and thus static
build might fail with multiple definitions.  For instance:

x86_64-glibc-linux-gnu-gcc -m32 -march=i686 -o [...]math/test-signgam-uchar-static -nostdlib -nostartfiles -static -static-pie [...]
x86_64-glibc-linux-gnu/bin/ld: [...]/libc.a(mempcpy-ia32.o):
in function `__mempcpy_chk': [...]/glibc-git/string/../sysdeps/i386/i686/mempcpy.S:32: multiple definition of `__mempcpy_chk';
[...]/libc.a(mempcpy_chk-nonshared.o):[...]/debug/../sysdeps/i386/mempcpy_chk.S:28: first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [../Rules:298:

There is no need for mem*-nonshared.S, the __mem*_chk routines
are already provided by the assembly routines.

Checked on i686-linux-gnu with gcc 13 built with fortify=1,2,3 and
without fortify.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-26 09:45:55 -03:00
Adhemerval Zanella Netto
648c3b574d powerpc: Fix powerpc64 strchrnul build with old gcc
The compiler might not see that internal definition is an alias
due the libc_ifunc macro, which redefines __strchrnul.  With
gcc 6 it fails with:

In file included from <command-line>:0:0:
./../include/libc-symbols.h:472:33: error: ‘__EI___strchrnul’ aliased to
undefined symbol ‘__GI___strchrnul’
   extern thread __typeof (name) __EI_##name \
                                 ^
./../include/libc-symbols.h:468:3: note: in expansion of macro
‘__hidden_ver2’
   __hidden_ver2 (, local, internal, name)
   ^~~~~~~~~~~~~
./../include/libc-symbols.h:476:29: note: in expansion of macro
‘__hidden_ver1’
 #  define hidden_def(name)  __hidden_ver1(__GI_##name, name, name);
                             ^~~~~~~~~~~~~
./../include/libc-symbols.h:557:32: note: in expansion of macro
‘hidden_def’
 # define libc_hidden_def(name) hidden_def (name)
                                ^~~~~~~~~~
../sysdeps/powerpc/powerpc64/multiarch/strchrnul.c:38:1: note: in
expansion of macro ‘libc_hidden_def’
 libc_hidden_def (__strchrnul)
 ^~~~~~~~~~~~~~~

Use libc_ifunc_hidden as stpcpy.  Checked on powerpc64 with
gcc 6 and gcc 13.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-26 09:45:22 -03:00
Aurelien Jarno
a3eac15251 MIPS: Update mips32 and mip64 libm test ulps
Generated on a Cavium Octeon III 2 board running Linux version 4.19.249
and GCC 13.1.0.

Needed due to commit cf7ffdd8a5 ("added pair of inputs for hypotf in
binary32").
2023-07-25 22:20:57 +02:00
Stefan Liebler
637aac2ae3 Include sys/rseq.h in tst-rseq-disable.c
Starting with commit 2c6b4b272e
"nptl: Unconditionally use a 32-byte rseq area", the testcase
misc/tst-rseq-disable is UNSUPPORTED as RSEQ_SIG is not defined.

The mentioned commit removes inclusion of sys/rseq.h in nptl/descr.h.
Thus just include sys/rseq.h in the tst-rseq-disable.c as also done
in tst-rseq.c and tst-rseq-nptl.c.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-07-25 12:27:30 +02:00
Aurelien Jarno
7fcdc2380c riscv: Update rvd libm test ulps
Generated on a VisionFive 2 board running Linux version 6.4.2 and
GCC 13.1.0.

Needed due to commit cf7ffdd8a5 ("added pair of inputs for hypotf in
binary32").
2023-07-22 15:55:33 +02:00
Andreas K. Hüttel
6d457ff36a
Update x86_64 libm-test-ulps (x32 ABI)
Based on feedback by Mike Gilbert <floppym@gentoo.org>
Linux-6.1.38-dist x86_64 AMD Phenom-tm- II X6 1055T Processor
-march=amdfam10
failures occur for x32 ABI

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2023-07-19 16:56:54 +02:00
Noah Goldstein
8b9a0af8ca [PATCH v1] x86: Use 3/4*sizeof(per-thread-L3) as low bound for NT threshold.
On some machines we end up with incomplete cache information. This can
make the new calculation of `sizeof(total-L3)/custom-divisor` end up
lower than intended (and lower than the prior value). So reintroduce
the old bound as a lower bound to avoid potentially regressing code
where we don't have complete information to make the decision.
Reviewed-by: DJ Delorie <dj@redhat.com>
2023-07-18 22:34:34 -05:00
Noah Goldstein
47f7472178 x86: Fix slight bug in shared_per_thread cache size calculation.
After:
```
    commit af992e7abd
    Author: Noah Goldstein <goldstein.w.n@gmail.com>
    Date:   Wed Jun 7 13:18:01 2023 -0500

        x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4`
```

Split `shared` (cumulative cache size) from `shared_per_thread` (cache
size per socket), the `shared_per_thread` *can* be slightly off from
the previous calculation.

Previously we added `core` even if `threads_l2` was invalid, and only
used `threads_l2` to divide `core` if it was present. The changed
version only included `core` if `threads_l2` was valid.

This change restores the old behavior if `threads_l2` is invalid by
adding the entire value of `core`.
Reviewed-by: DJ Delorie <dj@redhat.com>
2023-07-18 20:56:25 -05:00
Andreas K. Hüttel
2037f8ad01
Update i686 libm-test-ulps (again)
Based on feedback by Arsen Arsenović <arsen@gentoo.org>
Linux-6.1.38-gentoo-dist-hardened x86_64 AMD Ryzen 7 3800X 8-Core Processor
-march=x86-64-v2

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2023-07-19 01:32:13 +02:00
Andreas K. Hüttel
86e56ecf2f
Update i686 libm-test-ulps
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2023-07-18 23:12:24 +02:00
Siddhesh Poyarekar
c6cb8783b5 configure: Use autoconf 2.71
Bump autoconf requirement to 2.71 to allow regenerating configure on
more recent distributions.  autoconf 2.71 has been in Fedora since F36
and is the current version in Debian stable (bookworm).  It appears to
be current in Gentoo as well.

All sysdeps configure and preconfigure scripts have also been
regenerated; all changes are trivial transformations that do not affect
functionality.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-17 10:08:10 -04:00
Adhemerval Zanella
5a70ac9d39 Update sparc libm-test-ulps 2023-07-17 10:09:44 -03:00
Adhemerval Zanella
721f30116c s390: Add the clone3 wrapper
It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
                     int (*__func) (void *__arg), void *__arg);

Checked on s390x-linux-gnu and s390-linux-gnu.
2023-07-13 10:26:34 -03:00
Adhemerval Zanella
dddc88587a sparc: Fix la_symbind for bind-now (BZ 23734)
The sparc ABI has multiple cases on how to handle JMP_SLOT relocations,
(sparc_fixup_plt/sparc64_fixup_plt).  For BINDNOW, _dl_audit_symbind
will be responsible to setup the final relocation value; while for
lazy binding _dl_fixup/_dl_profile_fixup will call the audit callback
and tail cail elf_machine_fixup_plt (which will call
sparc64_fixup_plt).

This patch fixes by issuing the SPARC specific routine on bindnow and
forwarding the audit value to elf_machine_fixup_plt for lazy resolution.
It fixes the la_symbind for bind-now tests on sparc64 and sparcv9:

  elf/tst-audit24a
  elf/tst-audit24b
  elf/tst-audit24c
  elf/tst-audit24d

Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-12 15:29:08 -03:00
Andreas Schwab
ca230f5833 i386: make debug wrappers compatible with static PIE
Static PIE requires the use of PLT relocation.
2023-07-12 14:38:13 +02:00
caiyinyu
0e1324e655 LoongArch: Fix soft-float bug about _dl_runtime_resolve{,lsx,lasx} 2023-07-11 11:57:12 +08:00
caiyinyu
7f079fdc16 LoongArch: Add vector implementation for _dl_runtime_resolve. 2023-07-11 10:56:01 +08:00
caiyinyu
0d341d09f2 LoongArch: config: Added HAVE_LOONGARCH_VEC_ASM.
This patch checks if assembler supports vector instructions to
generate LASX/LSX code or not, and then define HAVE_LOONGARCH_VEC_ASM macro

We have added support for vector instructions in binutils-2.41
See:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=75b2f521b101d974354f6ce9ed7c054d8b2e3b7a

commit 75b2f521b101d974354f6ce9ed7c054d8b2e3b7a
Author: mengqinggang <mengqinggang@loongson.cn>
Date:   Thu Jun 22 10:35:28 2023 +0800

LoongArch: gas: Add lsx and lasx instructions support

gas/ChangeLog:

        * config/tc-loongarch.c (md_parse_option): Add lsx and lasx option.
        (loongarch_after_parse_args): Add lsx and lasx option.

opcodes/ChangeLog:

        * loongarch-opc.c (struct loongarch_ase): Add lsx and lasx
        instructions.
2023-07-11 10:56:01 +08:00
Frédéric Bérat
19f9f7f9d5 sysdeps: Add missing hidden definitions for i386
Add missing libc_hidden_builtin_def for memset_chk and MEMCPY_CHK on
i386.
2023-07-10 14:48:07 +02:00
Frédéric Bérat
e30048fdc1 sysdeps/s390: Exclude fortified routines from being built with _FORTIFY_SOURCE
Depending on build configuration, the [routine]-c.c files may be chosen
to provide fortified routines implementation. While [routines].c
implementation were automatically excluded, the [routines]-c.c ones were
not. This patch fixes that by adding these file to the list to be
filtered.
2023-07-10 14:48:04 +02:00
caiyinyu
0567edf1b2 LoongArch: config: Rewrite check on static PIE.
It's better to add "\" before "EOF" and remove "\"
before "$".
2023-07-07 09:01:51 +08:00
John David Anglin
5000549746 Revert "hppa: Drop 16-byte pthread lock alignment"
This change reverts commits c4468cd399
and ab991a3d1b.
2023-07-06 15:47:50 +00:00
Frédéric Bérat
02261d1bd9 sysdeps/ieee754/ldbl-128ibm-compat: Fix warn unused result
Return value from *scanf and *asprintf routines are now properly checked
in test-scanf-ldbl-compat-template.c and test-printf-ldbl-compat.c.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-07-05 16:59:48 +02:00
Frédéric Bérat
ba745eff46 misc/bits/syslog.h: Clearly separate declaration from definition
This allows to include bits/syslog-decl.h in include/sys/syslog.h and
therefore be able to create the libc_hidden_builtin_proto (__syslog_chk)
prototype.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-07-05 16:59:48 +02:00
Frédéric Bérat
64f9857507 wchar: Avoid PLT entries with _FORTIFY_SOURCE
The change is meant to avoid unwanted PLT entries for the wmemset and
wcrtomb routines when _FORTIFY_SOURCE is set.

On top of that, ensure that *_chk routines have their hidden builtin
definitions available.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-07-05 16:59:48 +02:00
Frédéric Bérat
505c884aeb stdio: Ensure *_chk routines have their hidden builtin definition available
If libc_hidden_builtin_{def,proto} isn't properly set for *_chk routines,
there are unwanted PLT entries in libc.so.

There is a special case with __asprintf_chk:
If ldbl_* macros are used for asprintf, ABI gets broken on s390x,
if it isn't, ppc64le isn't building due to multiple asm redirections.

This is due to the inclusion of bits/stdio-lbdl.h for ppc64le whereas it
isn't for s390x. This header creates redirections, which are not
compatible with the ones generated using libc_hidden_def.
Yet, we can't use libc_hidden_ldbl_proto on s390x since it will not
create a simple strong alias (e.g. as done on x86_64), but a versioned
alias, leading to ABI breakage.

This results in errors on s390x:
/usr/bin/ld: glibc/iconv/../libio/bits/stdio2.h:137: undefined reference
to `__asprintf_chk'

Original __asprintf_chk symbols:
00000000001395b0 T __asprintf_chk
0000000000177e90 T __nldbl___asprintf_chk

__asprintf_chk symbols with ldbl_* macros:
000000000012d590 t ___asprintf_chk
000000000012d590 t __asprintf_chk@@GLIBC_2.4
000000000012d590 t __GI___asprintf_chk
000000000012d590 t __GL____asprintf_chk___asprintf_chk
0000000000172240 T __nldbl___asprintf_chk

__asprintf_chk symbols with the patch:
000000000012d590 t ___asprintf_chk
000000000012d590 T __asprintf_chk
000000000012d590 t __GI___asprintf_chk
0000000000172240 T __nldbl___asprintf_chk

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-05 16:59:48 +02:00
Frédéric Bérat
dd8486ffc1 string: Ensure *_chk routines have their hidden builtin definition available
If libc_hidden_builtin_{def,proto} isn't properly set for *_chk routines,
there are unwanted PLT entries in libc.so.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-07-05 16:59:48 +02:00
Frédéric Bérat
ba96ff24b2 sysdeps: Ensure ieee128*_chk routines to be properly named
The *_chk routines naming doesn't match the name that would be generated
using libc_hidden_ldbl_proto. Since the macro is needed for some of
these *_chk functions for _FORTIFY_SOURCE to be enabled, that needed to
be fixed.
While at it, all the *_chk function get renamed appropriately for
consistency, even if not strictly necessary.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
2023-07-05 16:59:48 +02:00
Frédéric Bérat
20c894d21e Exclude routines from fortification
Since the _FORTIFY_SOURCE feature uses some routines of Glibc, they need to
be excluded from the fortification.

On top of that:
 - some tests explicitly verify that some level of fortification works
   appropriately, we therefore shouldn't modify the level set for them.
 - some objects need to be build with optimization disabled, which
   prevents _FORTIFY_SOURCE to be used for them.

Assembler files that implement architecture specific versions of the
fortified routines were not excluded from _FORTIFY_SOURCE as there is no
C header included that would impact their behavior.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-07-05 16:59:48 +02:00
Sergey Bugaev
27cb2bb93d hurd: Implement MAP_EXCL
MAP_FIXED is defined to silently replace any existing mappings at the
address range being mapped over. This, however, is a dangerous, and only
rarely desired behavior.

Various Unix systems provide replacements or additions to MAP_FIXED:

* SerenityOS and Linux provide MAP_FIXED_NOREPLACE. If the address space
  already contains a mapping in the requested range, Linux returns
  EEXIST. SerenityOS returns ENOMEM, however that is a bug, as the
  MAP_FIXED_NOREPLACE implementation is intended to be compatible with
  Linux.

* FreeBSD provides the MAP_EXCL flag that has to be used in combination
  with MAP_FIXED. It returns EINVAL if the requested range already
  contains existing mappings. This is directly analogous to the O_EXCL
  flag in the open () call.

* DragonFly BSD, NetBSD, and OpenBSD provide MAP_TRYFIXED, but with
  different semantics. DragonFly BSD returns ENOMEM if the requested
  range already contains existing mappings. NetBSD does not return an
  error, but instead creates the mapping at a different address if the
  requested range contains mappings. OpenBSD behaves the same, but also
  notes that this is the default behavior even without MAP_TRYFIXED
  (which is the case on the Hurd too).

Since the Hurd leans closer to the BSD side, add MAP_EXCL as the primary
API to request the behavior of not replacing existing mappings. Declare
MAP_FIXED_NOREPLACE and MAP_TRYFIXED as aliases of (MAP_FIXED|MAP_EXCL),
so any existing software that checks for either of those macros will
pick them up automatically. For compatibility with Linux, return EEXIST
if a mapping already exists.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230625231751.404120-5-bugaevc@gmail.com>
2023-07-03 01:38:14 +02:00
Sergey Bugaev
19c3b31812 hurd: Fix mapping at address 0 with MAP_FIXED
Zero address passed to mmap () typically means the caller doesn't have
any specific preferred address. Not so if MAP_FIXED is passed: in this
case 0 means literal 0. Fix this case to pass anywhere = 0 into vm_map.

Also add some documentation.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230625231751.404120-4-bugaevc@gmail.com>
2023-07-03 01:38:12 +02:00
Sergey Bugaev
f84c3ceb04 hurd: Fix calling vm_deallocate (NULL)
Only call vm_deallocate when we do have the old buffer, and check for
unexpected errors.

Spotted while debugging a msgids/readdir issue on x86_64-gnu.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230625231751.404120-3-bugaevc@gmail.com>
2023-07-03 01:38:12 +02:00
Sergey Bugaev
4b5e576fc2 hurd: Map brk non-executable
The rest of the heap (backed by individual pages) is already mapped RW.
Mapping these pages RWX presents a security hazard.

Also, in another branch memory gets allocated using vm_allocate, which
sets memory protection to VM_PROT_DEFAULT (which is RW). The mismatch
between protections prevents Mach from coalescing the VM map entries.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230625231751.404120-2-bugaevc@gmail.com>
2023-07-03 01:38:08 +02:00
Sergey Bugaev
019b0bbc84 htl: Let Mach place thread stacks
Instead of trying to allocate a thread stack at a specific address,
looping over the address space, just set the ANYWHERE flag in
vm_allocate (). The previous behavior:

- defeats ASLR (for Mach versions that support ASLR),
- is particularly slow if the lower 4 GB of the address space are mapped
  inaccessible, as we're planning to do on 64-bit Hurd,
- is just silly.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230625231751.404120-1-bugaevc@gmail.com>
2023-07-03 01:25:33 +02:00
Samuel Thibault
efdb85183a mach: strerror must not return NULL (bug 30555)
This follows 1d44530a5b ("string: strerror must not return NULL (bug 30555)"):

«
    For strerror, this fixes commit 28aff04781 ("string:
    Implement strerror in terms of strerror_l").  This commit avoids
    returning NULL for strerror_l as well, although POSIX allows this
    behavior for strerror_l.
»
2023-07-02 11:27:51 +00:00
John David Anglin
181e991dfb hppa: xfail debug/tst-ssp-1 when have-ssp is yes (gcc-12 and later) 2023-07-01 18:26:18 +00:00
Samuel Thibault
494714d407 hurd: Make getrandom return ENOSYS when /dev/random is not set up
So that callers (e.g. __arc4random_buf) don't try calling it again.
2023-07-01 14:23:40 +02:00
H.J. Lu
6259ab3941 ld.so: Always use MAP_COPY to map the first segment [BZ #30452]
The first segment in a shared library may be read-only, not executable.
To support LD_PREFER_MAP_32BIT_EXEC on such shared libraries, we also
check MAP_DENYWRITE to decide if MAP_32BIT should be passed to mmap.
Normally the first segment is mapped with MAP_COPY, which is defined
as (MAP_PRIVATE | MAP_DENYWRITE).  But if the segment alignment is
greater than the page size, MAP_COPY isn't used to allocate enough
space to ensure that the segment can be properly aligned.  Map the
first segment with MAP_COPY in this case to fix BZ #30452.
2023-06-30 10:42:42 -07:00
Joe Ramsay
4a9392ffc2 aarch64: Add vector implementations of exp routines
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines.

As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-06-30 09:04:26 +01:00
Joe Ramsay
78c01a5cbe aarch64: Add vector implementations of log routines
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines. Log lookup table
added as HIDDEN symbol to allow it to be shared between AdvSIMD and
SVE variants.

As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-06-30 09:04:22 +01:00
Joe Ramsay
3bb1af2051 aarch64: Add vector implementations of sin routines
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines.

As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-06-30 09:04:16 +01:00
Joe Ramsay
aed39a3aa3 aarch64: Add vector implementations of cos routines
Replace the loop-over-scalar placeholder routines with optimised
implementations from Arm Optimized Routines (AOR).

Also add some headers containing utilities for aarch64 libmvec
routines, and update libm-test-ulps.

Data tables for new routines are used via a pointer with a
barrier on it, in order to prevent overly aggressive constant
inlining in GCC. This allows a single adrp, combined with offset
loads, to be used for every constant in the table.

Special-case handlers are marked NOINLINE in order to confine the
save/restore overhead of switching from vector to normal calling
standard. This way we only incur the extra memory access in the
exceptional cases. NOINLINE definitions have been moved to
math_private.h in order to reduce duplication.

AOR exposes a config option, WANT_SIMD_EXCEPT, to enable
selective masking (and later fixing up) of invalid lanes, in
order to trigger fp exceptions correctly (AdvSIMD only). This is
tested and maintained in AOR, however it is configured off at
source level here for performance reasons. We keep the
WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify
the upstreaming process from AOR to glibc.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-06-30 09:04:10 +01:00
Joseph Myers
1a21693e16 Update syscall lists for Linux 6.4
Linux 6.4 adds the riscv_hwprobe syscall on riscv and enables
memfd_secret on s390.  Update syscall-names.list and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.
2023-06-28 21:22:14 +00:00
Adhemerval Zanella
d35fbd3e68 linux: Return unsupported if procfs can not be mount on tst-ttyname-namespace
Trying to mount procfs can fail due multiples reasons: proc is locked
due the container configuration, mount syscall is filtered by a
Linux Secuirty Module, or any other security or hardening mechanism
that Linux might eventually add.

The tests does require a new procfs without binding to parent, and
to fully fix it would require to change how the container was created
(which is out of the scope of the test itself).  Instead of trying to
foresee any possible scenario, if procfs can not be mount fail with
unsupported.

Checked on aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-28 09:19:11 -03:00
Adhemerval Zanella
a9fed5ea81 linux: Split tst-ttyname
The tst-ttyname-direct.c checks the ttyname with procfs mounted in
bind mode (MS_BIND|MS_REC), while tst-ttyname-namespace.c checks
with procfs mount with MS_NOSUID|MS_NOEXEC|MS_NODEV in a new
namespace.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-28 09:18:23 -03:00
Adhemerval Zanella
b29e70657d x86: Adjust Linux x32 dl-cache inclusion path
It fixes the x32 build failure introduced by 45e2483a6c.

Checked on a x86_64-linux-gnu-x32 build.
2023-06-26 16:51:30 -03:00
Joe Simmons-Talbott
9a17a193b4 check_native: Get rid of alloca
Use malloc rather than alloca to avoid potential stack overflow.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-06-26 10:17:47 -03:00
Joe Simmons-Talbott
48170127d9 ifaddrs: Get rid of alloca
Use scratch_buffer and malloc rather than alloca to avoid potential stack
overflows.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-06-26 10:17:39 -03:00
Sergey Bugaev
45e2483a6c x86: Make dl-cache.h and readelflib.c not Linux-specific
These files could be useful to any port that wants to use ld.so.cache.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-06-26 10:04:31 -03:00
Frederic Berat
99f9ae4ed0 benchtests: fix warn unused result
Few tests needed to properly check for asprintf and system calls return
values with _FORTIFY_SOURCE enabled.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-22 00:21:19 -04:00
Frederic Berat
d636339306 sysdeps/powerpc/fpu/tst-setcontext-fpscr.c: Fix warn unused result
The fread routine return value needs to be checked when fortification
is enabled, hence use xfread helper.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-22 00:21:17 -04:00
Frederic Berat
1bc85effd5 sysdeps/{i386, x86_64}/mempcpy_chk.S: fix linknamespace for __mempcpy_chk
On i386 and x86_64, for libc.a specifically, __mempcpy_chk calls
mempcpy which leads POSIX routines to call non-POSIX mempcpy indirectly.

This leads the linknamespace test to fail when glibc is built with
__FORTIFY_SOURCE=3.

Since calling mempcpy doesn't bring any benefit for libc.a, directly
call __mempcpy instead.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-22 00:20:52 -04:00
Joe Simmons-Talbott
9e6863a537 hurd: readv: Get rid of alloca
Replace alloca with a scratch_buffer to avoid potential stack overflows.

Checked on i686-gnu and x86_64-linux-gnu
Message-Id: <20230619144334.2902429-1-josimmon@redhat.com>
2023-06-20 19:15:10 +02:00
Joe Simmons-Talbott
c6957bddb9 hurd: writev: Add back cleanup handler
There is a potential memory leak for large writes due to writev being a
"shall occur" cancellation point.  Add back the cleanup handler removed
in cf30aa43a5.

Checked on i686-gnu and x86_64-linux-gnu.
Message-Id: <20230619143842.2901522-1-josimmon@redhat.com>
2023-06-20 18:37:04 +02:00
Paul Pluzhnikov
4290aed051 Fix misspellings -- BZ 25337 2023-06-19 21:58:33 +00:00
Frédéric Bérat
20b6b8e8a5 tests: replace read by xread
With fortification enabled, read calls return result needs to be checked,
has it gets the __wur macro enabled.

Note on read call removal from  sysdeps/pthread/tst-cancel20.c and
sysdeps/pthread/tst-cancel21.c:
It is assumed that this second read call was there to overcome the race
condition between pipe closure and thread cancellation that could happen
in the original code. Since this race condition got fixed by
d0e3ffb7a5 the second call seems
superfluous. Hence, instead of checking for the return value of read, it
looks reasonable to simply remove it.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-19 09:14:56 -04:00
Joe Simmons-Talbott
cf30aa43a5 hurd: writev: Get rid of alloca
Use a scratch_buffer rather than alloca to avoid potential stack
overflows.

Checked on i686-gnu and x86_64-linux-gnu
Message-Id: <20230608155844.976554-1-josimmon@redhat.com>
2023-06-19 02:45:19 +02:00
Joe Simmons-Talbott
01dd2875f8 grantpt: Get rid of alloca
Replace alloca with a scratch_buffer to avoid potential stack overflows.
Message-Id: <20230613191631.1080455-1-josimmon@redhat.com>
2023-06-18 01:08:04 +02:00
Florian Weimer
388ae538dd hurd: Add strlcpy, strlcat, wcslcpy, wcslcat to libc.abilist 2023-06-15 10:05:25 +02:00
Florian Weimer
b54e5d1c92 Add the wcslcpy, wcslcat functions
These functions are about to be added to POSIX, under Austin Group
issue 986.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-14 18:10:24 +02:00
Florian Weimer
454a20c875 Implement strlcpy and strlcat [BZ #178]
These functions are about to be added to POSIX, under Austin Group
issue 986.

The fortified strlcat implementation does not raise SIGABRT if the
destination buffer does not contain a null terminator, it just
inherits the non-failing regular strlcat behavior.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-14 18:10:08 +02:00
Frederic Berat
7ba426a111 tests: replace fgets by xfgets
With fortification enabled, fgets calls return result needs to be checked,
has it gets the __wur macro enabled.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-13 19:59:08 -04:00
Dridi Boukelmoune
658f601f2a posix: Handle success in gai_strerror()
Signed-off-by: Dridi Boukelmoune <dridi.boukelmoune@gmail.com>
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2023-06-13 20:54:49 +02:00
caiyinyu
eaa5b1cce8 LoongArch: Add support for dl_runtime_profile
This commit can fix the FAIL item: elf/tst-sprof-basic.
2023-06-13 10:27:45 +08:00
Noah Goldstein
180897c161 x86: Make the divisor in setting non_temporal_threshold cpu specific
Different systems prefer a different divisors.

From benchmarks[1] so far the following divisors have been found:
    ICX     : 2
    SKX     : 2
    BWD     : 8

For Intel, we are generalizing that BWD and older prefers 8 as a
divisor, and SKL and newer prefers 2. This number can be further tuned
as benchmarks are run.

[1]: https://github.com/goldsteinn/memcpy-nt-benchmarks
Reviewed-by: DJ Delorie <dj@redhat.com>
2023-06-12 11:33:39 -05:00
Noah Goldstein
f193ea20ed x86: Refactor Intel init_cpu_features
This patch should have no affect on existing functionality.

The current code, which has a single switch for model detection and
setting prefered features, is difficult to follow/extend. The cases
use magic numbers and many microarchitectures are missing. This makes
it difficult to reason about what is implemented so far and/or
how/where to add support for new features.

This patch splits the model detection and preference setting stages so
that CPU preferences can be set based on a complete list of available
microarchitectures, rather than based on model magic numbers.
Reviewed-by: DJ Delorie <dj@redhat.com>
2023-06-12 11:33:39 -05:00
Noah Goldstein
af992e7abd x86: Increase non_temporal_threshold to roughly sizeof_L3 / 4
Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 /
ncores_per_socket'. This patch updates that value to roughly
'sizeof_L3 / 4`

The original value (specifically dividing the `ncores_per_socket`) was
done to limit the amount of other threads' data a `memcpy`/`memset`
could evict.

Dividing by 'ncores_per_socket', however leads to exceedingly low
non-temporal thresholds and leads to using non-temporal stores in
cases where REP MOVSB is multiple times faster.

Furthermore, non-temporal stores are written directly to main memory
so using it at a size much smaller than L3 can place soon to be
accessed data much further away than it otherwise could be. As well,
modern machines are able to detect streaming patterns (especially if
REP MOVSB is used) and provide LRU hints to the memory subsystem. This
in affect caps the total amount of eviction at 1/cache_associativity,
far below meaningfully thrashing the entire cache.

As best I can tell, the benchmarks that lead this small threshold
where done comparing non-temporal stores versus standard cacheable
stores. A better comparison (linked below) is to be REP MOVSB which,
on the measure systems, is nearly 2x faster than non-temporal stores
at the low-end of the previous threshold, and within 10% for over
100MB copies (well past even the current threshold). In cases with a
low number of threads competing for bandwidth, REP MOVSB is ~2x faster
up to `sizeof_L3`.

The divisor of `4` is a somewhat arbitrary value. From benchmarks it
seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs
such as Broadwell prefer something closer to `8`. This patch is meant
to be followed up by another one to make the divisor cpu-specific, but
in the meantime (and for easier backporting), this patch settles on
`4` as a middle-ground.

Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable
stores where done using:
https://github.com/goldsteinn/memcpy-nt-benchmarks

Sheets results (also available in pdf on the github):
https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-06-12 11:33:39 -05:00
Florian Weimer
7d42120928 pthreads: Use _exit to terminate the tst-stdio1 test
Previously, the exit function was used, but this causes the test to
block (until the timeout) once exit is changed to lock stdio streams
during flush.
2023-06-06 11:39:06 +02:00
Adhemerval Zanella
d4963a844d linux: Fail as unsupported if personality call is filtered
Container management default seccomp filter [1] only accepts
personality(2) with PER_LINUX, (0x0), UNAME26 (0x20000),
PER_LINUX32 (0x8), UNAME26 | PER_LINUX32, and 0xffffffff (to query
current personality)

Although the documentation only state it is blocked to prevent
'enabling BSD emulation' (PER_BSD, not implemented by Linux), checking
on repository log the real reason is to block ASLR disable flag
(ADDR_NO_RANDOMIZE) and other poorly support emulations.

So handle EPERM and fail as UNSUPPORTED if we can really check for
BZ#19408.

Checked on aarch64-linux-gnu.

[1] https://github.com/moby/moby/blob/master/profiles/seccomp/default.json

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-06-05 12:51:48 -03:00
Joseph Myers
be9b883ddd Remove MAP_VARIABLE from hppa bits/mman.h
As suggested in
<https://sourceware.org/pipermail/libc-alpha/2023-February/145890.html>,
remove the MAP_VARIABLE define from the hppa bits/mman.h, for
consistency with Linux 6.2 which removed the define there.

Tested with build-many-glibcs.py for hppa-linux-gnu.
2023-06-05 14:35:25 +00:00
Sergey Bugaev
67f704ab69 hurd: Fix x86_64 sigreturn restoring bogus reply_port
Since the area of the user's stack we use for the registers dump (and
otherwise as __sigreturn2's stack) can and does overlap the sigcontext,
we have to be very careful about the order of loads and stores that we
do. In particular we have to load sc_reply_port before we start
clobbering the sigcontext.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-06-04 19:05:51 +02:00
Paul Pluzhnikov
2cbeda847b Fix a few more typos I missed in previous round -- BZ 25337 2023-06-02 23:46:32 +00:00
Alejandro Colomar
5013f6fc6c Use __nonnull for the epoll_wait(2) family of syscalls
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-06-01 14:50:42 -03:00
Alejandro Colomar
cc5372806a Fix invalid use of NULL in epoll_pwait2(2) test
epoll_pwait2(2)'s second argument should be nonnull.  We're going to add
__nonnull to the prototype, so let's fix the test accordingly.  We can
use a dummy variable to avoid passing NULL.

Reported-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
2023-06-01 14:50:35 -03:00
Joe Simmons-Talbott
884012db20 getipv4sourcefilter: Get rid of alloca
Use a scratch_buffer rather than alloca to avoid potential stack
overflows.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-06-01 14:47:12 -03:00
Joe Simmons-Talbott
d1eaab5a79 getsourcefilter: Get rid of alloca.
Use a scratch_buffer rather than alloca to avoid potential stack
overflows.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-06-01 14:46:09 -03:00
Frédéric Bérat
29e25f6f13 tests: fix warn unused results
With fortification enabled, few function calls return result need to be
checked, has they get the __wur macro enabled.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-01 13:01:32 -04:00
Frédéric Bérat
026a84a54d tests: replace write by xwrite
Using write without cheks leads to warn unused result when __wur is
enabled.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-01 12:40:05 -04:00
H.J. Lu
a8c8889978 x86-64: Use YMM registers in memcmpeq-evex.S
Since the assembly source file with -evex suffix should use YMM registers,
not ZMM registers, include x86-evex256-vecs.h by default to use YMM
registers in memcmpeq-evex.S
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-06-01 09:21:14 -07:00
Adhemerval Zanella
5f828ff824 io: Fix F_GETLK, F_SETLK, and F_SETLKW for powerpc64
Different than other 64 bit architectures, powerpc64 defines the
LFS POSIX lock constants  with values similar to 32 ABI, which
are meant to be used with fcntl64 syscall.  Since powerpc64 kABI
does not have fcntl, the constants are adjusted with the
FCNTL_ADJUST_CMD macro.

The 4d0fe291ae changed the logic of generic constants
LFS value are equal to the default values; which is now wrong
for powerpc64.

Fix the value by explicit define the previous glibc constants
(powerpc64 does not need to use the 32 kABI value, but it simplifies
the FCNTL_ADJUST_CMD which should be kept as compatibility).

Checked on powerpc64-linux-gnu and powerpc-linux-gnu.
2023-05-31 15:31:02 -03:00
Paul Pluzhnikov
65cc53fe7c Fix misspellings in sysdeps/ -- BZ 25337 2023-05-30 23:02:29 +00:00
Adhemerval Zanella
4d0fe291ae io: Fix record locking contants on 32 bit arch with 64 bit default time_t (BZ#30477)
For architecture with default 64 bit time_t support, the kernel
does not provide LFS and non-LFS values for F_GETLK, F_GETLK, and
F_GETLK (the default value used for 64 bit architecture are used).

This is might be considered an ABI break, but the currenct exported
values is bogus anyway.

The POSIX lockf is not affected since it is aliased to lockf64,
which already uses the LFS values.

Checked on i686-linux-gnu and the new tests on a riscv32.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-05-30 08:53:07 -03:00
caiyinyu
3eed5f3a1e LoongArch: Fix inconsistency in SHMLBA macro values between glibc and kernel
The LoongArch glibc was using the value of the SHMLBA macro from common code,
which is __getpagesize() (16k), but this was inconsistent with the value of
the SHMLBA macro in the kernel, which is SZ_64K (64k). This caused several
shmat-related tests in LTP (Linux Test Project) to fail. This commit fixes
the issue by ensuring that the glibc's SHMLBA macro value matches the value
used in the kernel like other architectures.
2023-05-30 14:13:06 +08:00
Adhemerval Zanella
a1950a0758 riscv: Add the clone3 wrapper
It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
 int (*__func) (void *__arg), void *__arg);

Checked on riscv64-linux-gnu-rv64imafdc-lp64d.

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-29 17:39:57 -03:00
Dridi Boukelmoune
33d7c0e1cb posix: Add error message for EAI_OVERFLOW
Signed-off-by: Dridi Boukelmoune <dridi.boukelmoune@gmail.com>
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2023-05-29 15:30:14 +02:00
Joe Simmons-Talbott
d9055634a3 setsourcefilter: Replace alloca with a scratch_buffer.
Use a scratch_buffer rather than either alloca or malloc to reduce the
possibility of a stack overflow.

Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-05-29 09:16:00 -04:00
Noah Goldstein
ed2f9dc942 x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429]
If `non_temporal_threshold` is below `minimum_non_temporal_threshold`,
it almost certainly means we failed to read the systems cache info.

In this case, rather than defaulting the minimum correct value, we
should default to a value that gets at least reasonable
performance. 64MB is chosen conservatively to be at the very high
end. This should never cause non-temporal stores when, if we had read
cache info, we wouldn't have otherwise.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-05-27 21:32:57 -05:00
Samuel Thibault
9ffdcf5b79 hurd: Fix setting up signal thread stack alignment
x86_64 needs special alignment when calling functions, so we have to use
MACHINE_THREAD_STATE_SETUP_CALL for the signal thread when forking.
2023-05-28 00:30:26 +02:00
Joseph Myers
9a51f4e2b6 Add MFD_NOEXEC_SEAL, MFD_EXEC from Linux 6.3 to bits/mman-shared.h
Linux 6.3 adds new constants MFD_NOEXEC_SEAL and MFD_EXEC.  Add these
to bits/mman-shared.h (conditional on MFD_NOEXEC_SEAL not already
being defined, similar to the existing conditional on the older MFD_*
macros).

Tested for x86_64.
2023-05-26 15:04:51 +00:00
Joseph Myers
a33c211b11 Add IP_LOCAL_PORT_RANGE from Linux 6.3 to bits/in.h
Linux 6.3 adds a new constant IP_LOCAL_PORT_RANGE.  Add it to the
corresponding bits/in.h in glibc.

Tested for x86_64.
2023-05-26 15:04:13 +00:00
Joe Simmons-Talbott
02f3d4c53a setipv4sourcefilter: Avoid using alloca.
Use a scratch_buffer rather than alloca/malloc to avoid potential stack
overflow.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-05-26 09:58:27 -04:00
Frédéric Bérat
7aec73c406 sysdeps/pthread/eintr.c: fix warn unused result
Fix unused result warnings, detected when _FORTIFY_SOURCE is enabled in
glibc.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-05-24 21:52:31 -04:00
Paul Pluzhnikov
6b3ddc9ae5 Regenerate configure fragment -- BZ 25337.
In commit 0b25c28e02 I updated congure.ac
but neglected to regenerate updated configure.

Fix this here.
2023-05-23 16:21:29 +00:00
Paul Pluzhnikov
0b25c28e02 Fix misspellings in sysdeps/powerpc -- BZ 25337
All fixes are in comments, so the binaries should be identical
before/after this commit, but I can't verify this.

Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2023-05-23 15:23:09 +00:00
Paul Pluzhnikov
d13733c166 Fix misspellings in sysdeps/unix -- BZ 25337
Applying this commit results in bit-identical rebuild of
libc.so.6 math/libm.so.6 elf/ld-linux-x86-64.so.2 mathvec/libmvec.so.1

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-05-23 11:59:23 +00:00
Paul Pluzhnikov
1e9d5987fd Fix misspellings in sysdeps/x86_64 -- BZ 25337.
Applying this commit results in bit-identical rebuild of libc.so.6
math/libm.so.6 elf/ld-linux-x86-64.so.2 mathvec/libmvec.so.1

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-05-23 10:25:11 +00:00
Samuel Thibault
ec9a66cd01 mach: Fix accessing mach_i386.h
Fixes: 196358ae26 ("mach: Fix installing mach_i386.h")
2023-05-23 09:46:47 +02:00
Paul Pluzhnikov
1d2971b525 Fix misspellings in sysdeps/x86_64/fpu/multiarch -- BZ 25337.
Applying this commit results in a bit-identical rebuild of
mathvec/libmvec.so.1 (which is the only binary that gets rebuilt).

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-05-23 03:28:58 +00:00
Samuel Thibault
196358ae26 mach: Fix installing mach_i386.h
We do not want mach_i386.h to get installed into machine/, but into
i386/ or x86_64/ depending where mach_i386.defs was found, i.e.
according to 32/64 bitness.
2023-05-23 01:47:05 +02:00
Samuel Thibault
6151d3b79a hurd: Fix making ld.so run static binaries with retry
We need O_EXEC for __rtld_execve
2023-05-23 01:47:05 +02:00
Ronan Pigott
8f59fc79b7 Add voice-admit DSCP code point from RFC-5865 2023-05-22 22:13:41 +02:00
Andreas Schwab
ea08d8dcea Remove last remnants of have-protected 2023-05-22 13:31:04 +02:00
Stefan Liebler
368b7c614b S390: Use compile-only instead of also link-tests in configure.
Some of the s390-specific configure checks are using compile and
link configure tests.  Now use only compile tests as the link
tests fails when e.g. bootstrapping a cross-toolchain due to
missing crt-files/libc.so.  This is achieved by using
AC_COMPILE_IFELSE in configure.ac file.

This is observable e.g. when using buildroot which builds glibc
only once or the build-many-glibcs.py script.  Note that the latter
one is building glibc twice in the compilers-step (configure-checks
fails) and in the glibcs-step (configure-checks succeed).

Note, that the s390 specific configure tests for static PIE have to
link an executable to test binutils support.  Thus we can't fix
those tests.
2023-05-22 09:58:58 +02:00
Sergey Bugaev
70d0dda0c1 htl: Use __hurd_fail () instead of assigning errno
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230520115531.3911877-2-bugaevc@gmail.com>
2023-05-20 18:16:06 +02:00
Sergey Bugaev
9ec31e5727 hurd: Use __hurd_fail () instead of assigning errno
The __hurd_fail () inline function is the dedicated, idiomatic way of
reporting errors in the Hurd part of glibc. Not only is it more concise
than '{ errno = err; return -1; }', it is since commit
6639cc1002
"hurd: Mark error functions as __COLD" marked with the cold attribute,
telling the compiler that this codepath is unlikely to be executed.

In one case, use __hurd_dfail () over the plain __hurd_fail ().

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230520115531.3911877-1-bugaevc@gmail.com>
2023-05-20 18:14:01 +02:00
Mahesh Bodapati
36cc908ed5 powerpc:GCC(<10) doesn't allow -mlong-double-64 after -mabi=ieeelongdouble
Removed -mabi=ieeelongdouble on failing tests. It resolves the error.
error: ‘-mabi=ieeelongdouble’ requires ‘-mlong-double-128’
2023-05-19 17:35:01 -05:00
Sergey Bugaev
b44c1e1252 hurd: Fix using interposable hurd_thread_self
Create a private hidden __hurd_thread_self alias, and use that one.

Fixes 2f8ecb58a5
"hurd: Fix x86_64 _hurd_tls_fork" and
c7fcce38c8
"hurd: Make sure to not use tcb->self"

Reported-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-05-19 20:45:51 +02:00
Sergey Bugaev
4d3f846b88 hurd: Fix __TIMESIZE on x86_64
We had sizeof (time_t) == 8, but __TIMESIZE == 32.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230519171516.3698754-1-bugaevc@gmail.com>
2023-05-19 20:25:37 +02:00
Samuel Thibault
4bd0f1b6ce hurd: Fix expected c++ types
90604f670c ("hurd 64bit: Add data for check-c++-types") actually added
the 32bit version. This fixes it into a 64bit version.
2023-05-19 01:45:06 +02:00
Joseph Myers
5460fbbfea Add HWCAP2_SME* from Linux 6.3 to AArch64 bits/hwcap.h
Linux 6.3 adds six HWCAP2_SME* constants for AArch64; add them to the
corresponding bits/hwcap.h in glibc.

Tested with build-many-glibcs.py for aarch64-linux-gnu.
2023-05-18 14:55:27 +00:00
Sergey Bugaev
c93ee967cd hurd: Also make it possible to call strlen very early
strlen, which is another ifunc-selected function, is invoked during
early static executable startup if the argv arrives from the exec
server. Make it not crash.

Checked on x86_64-gnu: statically linked executables launched after the
exec server is up now start up successfully.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-10-bugaevc@gmail.com>
2023-05-17 23:03:23 +02:00
Sergey Bugaev
70fd6b3b23 hurd: Fix setting up pthreads
On x86_64, we have to pass function arguments in registers, not on the
stack. We also have to align the stack pointer in a specific way. Since
sharing the logic with i386 does not bring much benefit, split the file
back into i386- and x86_64-specific versions, and fix the x86_64 version
to set up the thread properly.

Bonus: i386 keeps doing the extra RPC inside __thread_set_pcsptp to
fetch the state of the thread before setting it; but x86_64 no lnoger
does that.

Checked on x86_64-gnu and i686-gnu.

Fixes be6d002ca2
"hurd: Set up the basic tree for x86_64-gnu"

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-9-bugaevc@gmail.com>
2023-05-17 23:02:08 +02:00
Sergey Bugaev
2f8ecb58a5 hurd: Fix x86_64 _hurd_tls_fork
It is illegal to call thread_get_state () on mach_thread_self (), so
this codepath cannot be used as-is to fork the calling thread's TLS.
Fortunately we can use THREAD_SELF (aka %fs:0x0) to find out the value
of our fs_base without calling into the kernel.

Fixes: f6cf701efc
"hurd: Implement TLS for x86_64"

Checked on x86_64-gnu: fork () now works!

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-8-bugaevc@gmail.com>
2023-05-17 23:00:59 +02:00
Sergey Bugaev
c7fcce38c8 hurd: Make sure to not use tcb->self
Unlike sigstate->thread, tcb->self did not hold a Mach port reference on
the thread port it names. This means that the port can be deallocated,
and the name reused for something else, without anyone noticing. Using
tcb->self will then lead to port use-after-free.

Fortunately nothing was accessing tcb->self, other than it being
intially set to then-valid thread port name upon TCB initialization. To
assert that this keeps being the case without altering TCB layout,
rename self -> self_do_not_use, and stop initializing it.

Also, do not (re-)allocate a whole separate and unused stack for the
main thread, and just exit __pthread_setup early in this case.

Found upon attempting to use tcb->self and getting unexpected crashes.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-7-bugaevc@gmail.com>
2023-05-17 22:59:50 +02:00
Sergey Bugaev
aa19c68d2b hurd: Use __mach_setup_thread_call ()
...instead of mach_setup_thread (), which is unsuitable for setting up
function calls.

Checked on x86_64-gnu: the signal thread no longer crashes upon trying
to process a message.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-6-bugaevc@gmail.com>
2023-05-17 22:57:06 +02:00
Sergey Bugaev
4a373ea7d6 mach: Define MACHINE_THREAD_STATE_SETUP_CALL
The existing two macros, MACHINE_THREAD_STATE_SET_PC and
MACHINE_THREAD_STATE_SET_SP, can be used to set program counter and the
stack pointer registers in a machine-specific thread state structure.

Useful as it is, this may not be enough to set up the thread to make a
function call, because the machine-specific ABI may impose additional
requirements. In particular, x86_64 ABI requires that upon function
entry, the stack pointer is 8 less than 16-byte aligned (sp & 15 == 8).

To deal with this, introduce a new macro,
MACHINE_THREAD_STATE_SETUP_CALL (), which sets both stack and
instruction pointers, and also applies any machine-specific requirements
to make a valid function call. The default implementation simply
forwards to MACHINE_THREAD_STATE_SET_PC and MACHINE_THREAD_STATE_SET_SP,
but on x86_64 we additionally align the stack pointer.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-3-bugaevc@gmail.com>
2023-05-17 22:52:39 +02:00
Flavio Cruz
3f7b800d54 Use TASK_THREAD_TIMES_INFO_COUNT when calling task_info with TASK_THREAD_TIMES_INFO
This hasn't caused any problems yet but we are passing a pointer to struct
task_thread_times_info which can cause problems if we populate over the
existing size of the struct.
Message-Id: <ZGRDDNcOM2hA3CuT@jupiter.tail36e24.ts.net>
2023-05-17 19:23:10 +02:00
Joseph Myers
4f009060fb Update kernel version to 6.3 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.3.  (There are no new
constants covered by these tests in 6.3 that need any other header
changes.)

Tested with build-many-glibcs.py.
2023-05-16 23:15:13 +00:00
DJ Delorie
088136aa02 i386: Use pthread_barrier for synchronization on tst-bz21269
So I was able to reproduce the hangs in the original source, and debug
it, and fix it.  In doing so, I realized that we can't use anything
complex to trigger the thread because that "anything" might also cause
the expected segfault and force everything out of sync again.

Here's what I ended up with, and it doesn't seem to hang where the
original one hung quite often (in a tight while..end loop).  The key
changes are:

1. Calls to futex are error checked, with retries, to ensure that the
   futexes are actually doing what they're supposed to be doing.  In the
   original code, nearly every futex call returned an error.

2. The main loop has checks for whether the thread ran or not, and
   "unlocks" the thread if it didn't (this is how the original source
   hangs).

Note: the usleep() is not for timing purposes, but just to give the
kernel an excuse to run the other thread at that time.  The test will
not hang without it, but is more likely to test the right bugfix
if the usleep() is present.
2023-05-16 15:09:18 -04:00
Sergey Bugaev
114f1b7881 hurd: Fix computing user stack pointer
Fixes b574ae0a28
"hurd: Implement sigreturn for x86_64"

Checked on x86_64-gnu.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230515083323.1358039-5-bugaevc@gmail.com>
2023-05-16 16:09:02 +02:00
Sergey Bugaev
e333759f77 hurd: Fix sc_i386_thread_state layout
The real i386_thread_state Mach structure has an alignment of 8 on
x86_64. However, in struct sigcontext, the compiler was packing sc_gs
(which is the first member of sc_i386_thread_state) into the same 8-byte
slot as sc_error; this resulted in the rest of sc_i386_thread_state
members having wrong offsets relative to each other, and the overall
sc_i386_thread_state layout mismatching that of i386_thread_state.

Fix this by explicitly adding the required padding members, and
statically asserting that this results in the desired alignment.

The same goes for sc_i386_float_state.

Checked on x86_64-gnu.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230515083323.1358039-4-bugaevc@gmail.com>
2023-05-16 16:09:00 +02:00
Sergey Bugaev
ce96593c88 hurd: Align signal stack pointer after allocating stackframe
sizeof (*stackframe) appears to be divisible by 16, but we should not
rely on that. So make sure to leave enough space for the stackframe
first, and then align the final pointer at 16 bytes.

Checked on x86_64-gnu.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230515083323.1358039-3-bugaevc@gmail.com>
2023-05-16 16:08:58 +02:00
Sergey Bugaev
ff0f87632a hurd: Fix aligning signal stack pointer
Fixes 60f9bf9746
"hurd: Port trampoline.c to x86_64"

Checked on x86_64-gnu.

Reported-by: Bruno Haible <bruno@clisp.org>
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230515083323.1358039-2-bugaevc@gmail.com>
2023-05-16 16:08:45 +02:00
Carlos O'Donell
dccee96e6d linux: Reformat Makefile.
Reflow Makefile.
Sort using scripts/sort-makefile-lines.py.

No code generation changes observed in binary artifacts.
No regressions on x86_64 and i686.
2023-05-16 07:19:31 -04:00
Joseph Myers
eeef96f56c Update syscall lists for Linux 6.3
Linux 6.3 has no new syscalls.  Update the version number in
syscall-names.list to reflect that it is still current for 6.3.

Tested with build-many-glibcs.py.
2023-05-15 22:26:56 +00:00
Samuel Thibault
d6c72f976c hurd: rule out some mach headers when generating errno.h
While mach/kern_return.h happens to pull mach/machine/kern_return.h,
mach/machine/boolean.h, and mach/machine/vm_types.h (and realpath-ing them
exposes the machine-specific machine symlink content), those headers do not
actually define anything machine-specific for the content of errno.h.

So we can just rule out these machine-specific from the dependency
comment.
2023-05-11 01:53:49 +02:00
Flavio Cruz
3ca9f43d10 Stop checking if MiG supports retcode.
We already did the same change for Hurd
(https://git.savannah.gnu.org/cgit/hurd/hurd.git/commit/?id=ef5924402864ef049f40a39e73967628583bc1a4)

Due to MiG requiring the subsystem to be defined early in order to know the
size of a port, this was causing a division by zero error during ./configure.
We could have just move subsystem to the top of the snippet, however it is
simpler to just remove the check given that we have no plans to use some other
MiG anyway.

HAVE_MIG_RETCODE is removed completely since this will be a no-op either
way (compiling against old Hurd headers will work the same, new Hurd
headers will result in the same stubs since retcode is a no-op).
Message-Id: <ZFspor91aoMwbh9T@jupiter.tail36e24.ts.net>
2023-05-11 01:28:34 +02:00
Sachin Monga
1a57ab0c92 Added Redirects to longdouble error functions [BZ #29033]
This patch redirects the error functions to the appropriate
longdouble variants which enables the compiler to optimize
for the abi ieeelongdouble.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
2023-05-10 13:59:48 -05:00
Carlos O'Donell
f0dbe112f5 nptl: Reformat Makefile.
Reflow all long lines adding comment terminators.
Sort all reflowed text using scripts/sort-makefile-lines.py.

No code generation changes observed in binary artifacts.
No regressions on x86_64 and i686.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-05-10 13:15:13 -04:00
Florian Weimer
bf88b47ecb Revert "riscv: Resolve symbols directly for symbols with STO_RISCV_VARIANT_CC."
This reverts commit 117e8b341c.

Reason for revert: Causes elf/tst-glibcelf and elf/tst-relro-*
to fail on all architectures.
2023-05-07 14:16:03 +02:00
Flavio Cruz
84b4a81aeb Update hurd/hurdselect.c to be more portable.
Summary of changes:
- Use BAD_TYPECHECK to perform type checking in a cleaner way.
  BAD_TYPECHECK is moved into sysdeps/mach/rpc.h to avoid duplication.
- Remove assertions for mach_msg_type_t since those won't work for
  x86_64.
- Update message structs to use mach_msg_type_t directly.
- Use designated initializers.
Message-Id: <ZFa+roan3ioo0ONM@jupiter.tail36e24.ts.net>
2023-05-06 23:10:55 +02:00
Samuel Thibault
e64b7c26d4 hurd: Fix ld.so name
This was set to ld-x86-64.so.1 in gcc.
2023-05-06 21:00:56 +02:00
Samuel Thibault
d2593d452a hurd: Add ioperm symbol on x86_64 2023-05-06 19:06:39 +02:00
Szabolcs Nagy
642f1b9b3d aarch64: More configure checks for libmvec
Check assembler and linker support too, not just SVE ACLE in the
compiler, since variant PCS requires at least binutils 2.32.1.
2023-05-05 11:34:44 +01:00
Szabolcs Nagy
ee68e9cba4 aarch64: SVE ACLE configure test cleanups
Use more idiomatic configure test for better autoconf cache and logs.
2023-05-05 10:28:29 +01:00
Sam James
c8bd171caf hppa: Fix 'concurrency' typo in comment
Signed-off-by: Sam James <sam@gentoo.org>
2023-05-05 10:12:39 +01:00
Flavio Cruz
3f433cb895 Update sysdeps/mach/hurd/ioctl.c to make it more portable
Summary of the changes:
- Update msg_align to use ALIGN_UP like we have done in previous
  patches. Use it below whenever necessary to avoid repeating the same
  alignment logic.
- Define BAD_TYPECHECK to make it easier to do type checking in a few
  places below.
- Update io2mach_type to use designated initializers.
- Make RetCodeType use mach_msg_type_t. mach_msg_type_t is 8 byte for
  x86_64, so this make it portable.
- Also call msg_align for _IOT_COUNT2/_IOT_TYPE2 since it is more
  correct.
Message-Id: <ZFMvVsuFKwIy2dUS@jupiter.tail36e24.ts.net>
2023-05-05 02:22:31 +02:00
Szabolcs Nagy
1a62d7e5c3 aarch64: fix SVE ACLE check for bootstrap glibc builds
arm_sve.h depends on stdint.h but that relies on libc headers unless
compiled in freestanding mode.  Without this change a bootstrap glibc
build (that uses a compiler without installed libc headers) failed with

checking for availability of SVE ACLE... In file included from [...]/arm_sve.h:28,
                 from conftest.c:1:
[...]/stdint.h:9:16: fatal error: stdint.h: No such file or directory
    9 | # include_next <stdint.h>
      |                ^~~~~~~~~~
compilation terminated.
configure: error: mathvec is enabled but compiler does not have SVE ACLE. [...]
2023-05-04 10:19:11 +01:00
Joe Ramsay
cd94326a13 Enable libmvec support for AArch64
This patch enables libmvec on AArch64. The proposed change is mainly
implementing build infrastructure to add the new routines to ABI,
tests and benchmarks. I have demonstrated how this all fits together
by adding implementations for vector cos, in both single and double
precision, targeting both Advanced SIMD and SVE.

The implementations of the routines themselves are just loops over the
scalar routine from libm for now, as we are more concerned with
getting the plumbing right at this point. We plan to contribute vector
routines from the Arm Optimized Routines repo that are compliant with
requirements described in the libmvec wiki.

Building libmvec requires minimum GCC 10 for SVE ACLE. To avoid raising
the minimum GCC by such a big jump, we allow users to disable libmvec
if their compiler is too old.

Note that at this point users have to manually call the vector math
functions. This seems to be acceptable to some downstream users.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-05-03 12:09:49 +01:00
Samuel Thibault
0ec48e3337 hurd 64bit: Make dev_t word type
dev_t are 64bit on Linux ports, so better increase their size on 64bit
Hurd. It happens that this helps with BZ 23084 there: st_dev has type fsid_t
(quad) and is specified by POSIX to have type dev_t. Making dev_t 64bit
makes these match.
2023-05-02 21:29:26 +02:00
Samuel Thibault
e2b3d7f485 hurd 64bit: Fix struct msqid_ds and shmid_ds fields
The standards want msg_lspid/msg_lrpid/shm_cpid/shm_lpid to be pid_t, see BZ
23083 and 23085.

We can leave them __rpc_pid_t on i386 for ABI compatibility, but avoid
hitting the issue on 64bit.
2023-05-01 15:07:51 +02:00
Samuel Thibault
e3a3616dbf hurd 64bit: Fix ipc_perm fields types
The standards want uid/cuid to be uid_t, gid/cgid to be gid_t and mode to be
mode_t, see BZ 23082.

We can leave them short ints on i386 for ABI compatibility, but avoid
hitting the issue on 64bit.

bits/ipc.h ends up being exactly the same in sysdeps/gnu/ and
sysdeps/unix/sysv/linux/, so remove the latter.
2023-05-01 15:05:09 +02:00
Samuel Thibault
d5e2f9eaf7 hurd 64bit: Fix flock fields types
The standards want l_type and l_whence to be short ints, see BZ 23081.

We can leave them ints on i386 for ABI compatibility, but avoid hitting the
issue on 64bit.
2023-05-01 15:05:09 +02:00
Samuel Thibault
90604f670c hurd 64bit: Add data for check-c++-types 2023-05-01 15:05:09 +02:00
Samuel Thibault
65d1407d55 hurd 64bit: Fix pthread_t/thread_t type to long
So that they can be trivially cast to pointer type, like with nptl.
2023-05-01 15:05:09 +02:00
Samuel Thibault
e11a6734c4 hurd 64bit: Add missing data file for check-localplt test 2023-05-01 13:38:57 +02:00
Samuel Thibault
d44995a4b3 hurd 64bit: Add missing libanl
The move of libanl to libc was in glibc 2.34 for nptl only.
2023-05-01 13:36:14 +02:00
Samuel Thibault
d90470a37e hurd: Also XFAIL missing SA_NOCLDWAIT on 64bit 2023-05-01 13:28:53 +02:00
Samuel Thibault
14f16bd482 hurd: Fix tst-writev test
There is no compile-time IOV_MAX constraint on GNU/Hurd
2023-05-01 13:01:30 +02:00
Samuel Thibault
6d4f183495 nptl: move tst-x86-64-tls-1 to nptl-only tests
It is essentially nptl-only.
2023-05-01 12:59:33 +02:00
Sergey Bugaev
adca662202 hurd: Add expected abilist files for x86_64
These were created by creating stub files, running 'make update-abi',
and reviewing the results.

Also, set baseline ABI to GLIBC_2.38, the (upcoming) first glibc
release to first have x86_64-gnu support.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-05-01 12:10:20 +02:00
Sergey Bugaev
4e506f67cb hurd: Replace reply port with a dead name on failed interruption
If we're trying to interrupt an interruptible RPC, but the server fails
to respond to our __interrupt_operation () call, we instead destroy the
reply port we were expecting the reply to the RPC on.

Instead of deallocating the name completely, replace it with a dead
name, so the name won't get reused for some other right, and deallocate
it in _hurd_intr_rpc_mach_msg once we return from the signal handler.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230429201822.2605207-4-bugaevc@gmail.com>
2023-05-01 03:18:48 +02:00
Flavio Cruz
eb14819c14 Define __mig_strlen to support dynamically sized strings in hurd RPCs
We make lib{mach,hurd}user.so only call __mig_strlen which can be
relocated before libc.so is relocated, similar to what is done with
__mig_memcpy.
Message-Id: <ZE8DTRDpY2hpPZlJ@jupiter.tail36e24.ts.net>
2023-05-01 02:24:04 +02:00
Sergey Bugaev
2bc516020f hurd: Make it possible to call memcpy very early
Normally, in static builds, the first code that runs is _start, in e.g.
sysdeps/x86_64/start.S, which quickly calls __libc_start_main, passing
it the argv etc. Among the first things __libc_start_main does is
initializing the tunables (based on env), then CPU features, and then
calls _dl_relocate_static_pie (). Specifically, this runs ifunc
resolvers to pick, based on the CPU features discovered earlier, the
most suitable implementation of "string" functions such as memcpy.

Before that point, calling memcpy (or other ifunc-resolved functions)
will not work.

In the Hurd port, things are more complex. In order to get argv/env for
our process, glibc normally needs to do an RPC to the exec server,
unless our args/env are already located on the stack (which is what
happens to bootstrap processes spawned by GNU Mach). Fetching our
argv/env from the exec server has to be done before the call to
__libc_start_main, since we need to know what our argv/env are to pass
them to __libc_start_main.

On the other hand, the implementation of the RPC (and other initial
setup needed on the Hurd before __libc_start_main can be run) is not
very trivial. In particular, it may (and on x86_64, will) use memcpy.
But as described above, calling memcpy before __libc_start_main can not
work, since the GOT entry for it is not yet initialized at that point.

Work around this by pre-filling the GOT entry with the baseline version
of memcpy, __memcpy_sse2_unaligned. This makes it possible for early
calls to memcpy to just work. The initial value of the GOT entry is
unused on x86_64, and changing it won't interfere with the relocation
being performed later: once _dl_relocate_static_pie () is called, the
baseline version will get replaced with the most suitable one, and that
is what subsequent calls of memcpy are going to call.

Checked on x86_64-gnu.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230429201822.2605207-6-bugaevc@gmail.com>
2023-05-01 01:21:23 +02:00
Sergey Bugaev
e6136c6939 hurd: Implement longjmp for x86_64
Checked on x86_64-gnu.

[samuel.thibault@ens-lyon.org: Restored same comments as on i386]
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230429201822.2605207-3-bugaevc@gmail.com>
2023-05-01 01:13:59 +02:00
Sergey Bugaev
b574ae0a28 hurd: Implement sigreturn for x86_64
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230429201822.2605207-2-bugaevc@gmail.com>
2023-05-01 01:06:17 +02:00
Sergey Bugaev
41aac87234 hurd: Make _exit work during early boot-up
If any of the early boot-up tasks calls exit () or returns from main (),
terminate it properly instead of crashing on trying to dereference
_hurd_ports and getting forcibly terminated by the kernel.

We sadly cannot make the __USEPORT macro do the check for _hurd_ports
being unset, because it evaluates to the value of the expression
provided as the second argument, and that can be of any type; so there
is no single suitable fallback value for the macro to evaluate to in
case _hurd_ports is unset. Instead, each use site that wants to care for
this case will have to do its own checking.

Checked on x86_64-gnu.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230429131354.2507443-4-bugaevc@gmail.com>
2023-04-29 16:53:47 +02:00
H.J. Lu
a443bd3fb2 __check_pf: Add a cancellation cleanup handler [BZ #20975]
There are reports for hang in __check_pf:

https://github.com/JoeDog/siege/issues/4

It is reproducible only under specific configurations:

1. Large number of cores (>= 64) and large number of threads (> 3X of
the number of cores) with long lived socket connection.
2. Low power (frequency) mode.
3. Power management is enabled.

While holding lock, __check_pf calls make_request which calls __sendto
and __recvmsg.  Since __sendto and __recvmsg are cancellation points,
lock held by __check_pf won't be released and can cause deadlock when
thread cancellation happens in __sendto or __recvmsg.  Add a cancellation
cleanup handler for __check_pf to unlock the lock when cancelled by
another thread.  This fixes BZ #20975 and the siege hang issue.
2023-04-28 13:38:38 -07:00
Hsiangkai Wang
117e8b341c
riscv: Resolve symbols directly for symbols with STO_RISCV_VARIANT_CC.
In some cases, we do not want to go through the resolver for function
calls. For example, functions with vector arguments will use vector
registers to pass arguments. In the resolver, we do not save/restore the
vector argument registers for lazy binding efficiency. To avoid ruining
the vector arguments, functions with vector arguments will not go
through the resolver.

To achieve the goal, we will annotate the function symbols with
STO_RISCV_VARIANT_CC flag and add DT_RISCV_VARIANT_CC tag in the dynamic
section. In the first pass on PLT relocations, we do not set up to call
_dl_runtime_resolve. Instead, we resolve the functions directly.

Signed-off-by: Hsiangkai Wang <kai.wang@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://inbox.sourceware.org/libc-alpha/20230314162512.35802-1-kito.cheng@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-28 07:02:42 -07:00
Joseph Myers
af16a59ee1 Fix Hurd getcwd build with GCC >= 13
The build of glibc for i686-gnu has been failing for a while with GCC
mainline / GCC 13:

../sysdeps/mach/hurd/getcwd.c: In function '__hurd_canonicalize_directory_name_internal':
../sysdeps/mach/hurd/getcwd.c:242:48: error: pointer 'file_name' may be used after 'realloc' [-Werror=use-after-free]
  242 |                   file_namep = &buf[file_namep - file_name + size / 2];
      |                                     ~~~~~~~~~~~^~~~~~~~~~~
../sysdeps/mach/hurd/getcwd.c:236:25: note: call to 'realloc' here
  236 |                   buf = realloc (file_name, size);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~

Fix by doing the subtraction before the reallocation.

Tested with build-many-glibcs.py for i686-gnu.

[samuel.thibault@ens-lyon.rg: Removed mention of this being a bug]

Message-Id: <18587337-7815-4056-ebd0-724df262d591@codesourcery.com>
2023-04-27 01:27:28 +02:00
Joseph Myers
bcca5ae804 Regenerate sysdeps/mach/hurd/bits/errno.h
This file was out of date, as shown by build-many-glibcs.py runs
resulting in a modified source directory.
2023-04-26 17:11:41 +00:00
Joe Simmons-Talbott
a3461d4923 if_index: Remove unneeded alloca.h include
Nothing is being used from this header.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-04-26 08:06:49 -04:00
Joe Simmons-Talbott
19fdc3542b gethostid: Do not include alloca.h
Nothing from alloca.h is being used here.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-04-26 08:06:44 -04:00
Samuel Thibault
323fe6a1a9 hurd: Do not take any flag from the CMSG_DATA
As fixed in 0822e3552a ("hurd: Don't pass FD_CLOEXEC in CMSG_DATA"),
senders currently don't have any flag to pass.  We shouldn't blindly take
random flags that senders could be erroneously giving us.
2023-04-25 00:14:58 +02:00
Sergey Bugaev
5fa8945605 hurd: Implement MSG_CMSG_CLOEXEC
This is a new flag that can be passed to recvmsg () to make it
atomically set the CLOEXEC flag on all the file descriptors received
using the SCM_RIGHTS mechanism. This is useful for all the same reasons
that the other XXX_CLOEXEC flags are useful: namely, it provides
atomicity with respect to another thread of the same process calling
(fork and then) exec at the same time.

This flag is already supported on Linux and FreeBSD. The flag's value,
0x40000, is choosen to match FreeBSD's.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230423160548.126576-2-bugaevc@gmail.com>
2023-04-24 23:09:50 +02:00
Samuel Thibault
0822e3552a hurd: Don't pass FD_CLOEXEC in CMSG_DATA
The flags are used by _hurd_intern_fd, which takes O_* flags, not FD_*.

Also, it is of no concern to the receiving process whether or not
the sender process wants to close its copy of sent file descriptor
upon exec, and it should not influence whether or not the received
file descriptor gets the FD_CLOEXEC flag set in the receiving process.

The latter should in fact be dependent on the MSG_CMSG_CLOEXEC flag
being passed to the recvmsg () call, which is going to be implemented
in the following commit.

Fixes 344e755248
"hurd: Support sending file descriptors over Unix sockets"

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-04-24 23:05:15 +02:00
Sergey Bugaev
c02b26455b hurd: Implement prefer_map_32bit_exec tunable
This makes the prefer_map_32bit_exec tunable no longer Linux-specific.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230423215526.346009-4-bugaevc@gmail.com>
2023-04-24 22:48:35 +02:00
Sergey Bugaev
35b7bf2fe0 hurd: Don't attempt to deallocate MACH_PORT_DEAD
...in some more places.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230423215526.346009-2-bugaevc@gmail.com>
2023-04-24 22:44:53 +02:00
Sergey Bugaev
4c39333050 hurd: Only deallocate addrport when it's valid
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230423160548.126576-3-bugaevc@gmail.com>
2023-04-24 22:44:18 +02:00
Sergey Bugaev
70b9173caa hurd: Implement MAP_32BIT
This is a flag that can be passed to mmap () to request that the mapping
being established should be located in the lower 2 GB area of the
address space, so only the lower 31 (not 32) bits can be set in its
address, and the address can be represented as a 32-bit integer without
truncating it.

This flag is intended to be compatible with Linux, FreeBSD, and Darwin
flags of the same name. Out of those systems, it appears Linux and
FreeBSD take MAP_32BIT to mean "map 31 bit", whereas Darwin allows the
32nd bit to be set in the address as well. The Hurd follows Linux and
FreeBSD behavior.

Unlike on those systems, on the Hurd MAP_32BIT is defined on all
supported architectures (which currently are only i386 and x86_64).

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230423215526.346009-1-bugaevc@gmail.com>
2023-04-24 22:42:12 +02:00
Sergey Bugaev
533deafbdf Use O_CLOEXEC in more places (BZ #15722)
When opening a temporary file without O_CLOEXEC we risk leaking the
file descriptor if another thread calls (fork and then) exec while we
have the fd open. Fix this by consistently passing O_CLOEXEC everywhere
where we open a file for internal use (and not to return it to the user,
in which case the API defines whether or not the close-on-exec flag
shall be set on the returned fd).

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230419160207.65988-4-bugaevc@gmail.com>
2023-04-22 13:50:14 +02:00
Sergey Bugaev
8e78a2e1d1 hurd: Don't migrate reply port into __init1_tcbhead
Properly differentiate between setting up the real TLS with
TLS_INIT_TP, and setting up the early TLS (__init1_tcbhead) in static
builds. In the latter case, don't yet migrate the reply port into the
TCB, and don't yet set __libc_tls_initialized to 1.

This also lets us move the __init1_desc assignment inside
_hurd_tls_init ().

Fixes cd019ddd89
"hurd: Don't leak __hurd_reply_port0"

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-04-21 03:02:04 +02:00
Sergey Bugaev
88cc282a9a hurd: Make dl-sysdep's open () cope with O_IGNORE_CTTY
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230419160207.65988-6-bugaevc@gmail.com>
2023-04-20 23:05:54 +02:00
Cupertino Miranda
b630be0922 Created tunable to force small pages on stack allocation.
Created tunable glibc.pthread.stack_hugetlb to control when hugepages
can be used for stack allocation.
In case THP are enabled and glibc.pthread.stack_hugetlb is set to
0, glibc will madvise the kernel not to use allow hugepages for stack
allocations.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-04-20 13:54:24 -03:00
Adhemerval Zanella
320768a664 linux: Re-flow and sort multiline Makefile definitions 2023-04-20 10:40:54 -03:00
Sergey Bugaev
8895a99c10 hurd: Microoptimize sigreturn
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-04-18 16:20:09 +02:00
Sergey Bugaev
e411e31b7b hurd: Fix restoring reply port in sigreturn
We must not use the user's reply port (scp->sc_reply_port) for any of
our own RPCs, otherwise various things break. So, use MACH_PORT_DEAD as
a reply port when destroying our reply port, and make sure to do this
after _hurd_sigstate_unlock (), which may do a gsync_wake () RPC.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-04-17 21:00:02 +02:00
Wilco Dijkstra
76d0f094dd math: Improve fmod(f) performance
Optimize the fast paths (x < y) and (x/y < 2^12).  Delay handling of special
cases to reduce the number of instructions executed before the fast paths.
Performance improvements for fmod:

		Skylake	Zen2	Neoverse V1
subnormals	11.8%	4.2%	11.5%
normal		3.9%	0.01%	-0.5%
close-exponents	6.3%	5.6%	19.4%

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-04-17 13:03:10 +01:00
Sergey Bugaev
e275690332 hurd: Only check for TLS initialization inside rtld or in static builds
When glibc is built as a shared library, TLS is always initialized by
the call of TLS_INIT_TP () macro made inside the dynamic loader, prior
to running the main program (see dl-call_tls_init_tp.h). We can take
advantage of this: we know for sure that __LIBC_NO_TLS () will evaluate
to 0 in all other cases, so let the compiler know that explicitly too.

Also, only define _hurd_tls_init () and TLS_INIT_TP () under the same
conditions (either !SHARED or inside rtld), to statically assert that
this is the case.

Other than a microoptimization, this also helps with avoiding awkward
sharing of the __libc_tls_initialized variable between ld.so and libc.so
that we would have to do otherwise -- we know for sure that no sharing
is required, simply because __libc_tls_initialized would always be set
to true inside libc.so.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-25-bugaevc@gmail.com>
2023-04-14 10:31:22 +00:00
Sergey Bugaev
ba00d787f3 hurd: Remove __hurd_local_reply_port
Now that the signal code no longer accesses it, the only real user of it
was mig-reply.c, so move the logic for managing the port there.

If we're in SHARED and outside of rtld, we know that __LIBC_NO_TLS ()
always evaluates to 0, and a TLS reply port will always be used, not
__hurd_reply_port0. Still, the compiler does not see that
__hurd_reply_port0 is never used due to its address being taken. To deal
with this, explicitly compile out __hurd_reply_port0 when we know we
won't use it.

Also, instead of accessing the port via THREAD_SELF->reply_port, this
uses THREAD_GETMEM and THREAD_SETMEM directly, avoiding possible
miscompilations.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-04-14 10:31:22 +00:00
Adhemerval Zanella
05fe3ecfff malloc: Assure that THP mode read do write OOB end of stringt 2023-04-14 08:22:40 -03:00
Adhemerval Zanella
801deb07f6 malloc: Assure that THP mode is always null terminated 2023-04-13 17:18:04 -03:00
Samuel Thibault
decf02d382 hurd: Mark two tests as unsupported
They make the whole testsuite hang/crash.
2023-04-13 02:02:38 +02:00
Samuel Thibault
6538a288be hurd: Restore destroying receive rights on sigreturn
Just subtracting a ref is making signal/tst-signal signal/tst-raise
signal/tst-minsigstksz-5 htl/tst-raise1 fail.
2023-04-13 00:49:16 +02:00
Samuel Thibault
5473a1747a Revert "hurd: Only check for TLS initialization inside rtld or in static builds"
This reverts commit b37899d34d.

Apparently we load libc.so (and thus start using its functions) before
calling TLS_INIT_TP, so libc.so functions should not actually assume
that TLS is always set up.
2023-04-11 18:45:47 +00:00
Sergey Bugaev
cd019ddd89 hurd: Don't leak __hurd_reply_port0
Previously, once we set up TLS, we would implicitly switch from using
__hurd_reply_port0 to reply_port inside the TCB, leaving the former
unused. But we never deallocated it, so it got leaked.

Instead, migrate the port into the new TCB's reply_port slot. This
avoids both the port leak and an extra syscall to create a new reply
port for the TCB.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-28-bugaevc@gmail.com>
2023-04-11 00:24:40 +02:00
Sergey Bugaev
747812349d hurd: Improve reply port handling when exiting signal handlers
If we're doing signals, that means we've already got the signal thread
running, and that implies TLS having been set up. So we know that
__hurd_local_reply_port will resolve to THREAD_SELF->reply_port, and can
access that directly using the THREAD_GETMEM and THREAD_SETMEM macros.
This avoids potential miscompilations, and should also be a tiny bit
faster.

Also, use mach_port_mod_refs () and not mach_port_destroy () to destroy
the receive right. mach_port_destroy () should *never* be used on
mach_task_self (); this can easily lead to port use-after-free
vulnerabilities if the task has any other references to the same port.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-26-bugaevc@gmail.com>
2023-04-10 23:54:28 +02:00
Sergey Bugaev
b37899d34d hurd: Only check for TLS initialization inside rtld or in static builds
When glibc is built as a shared library, TLS is always initialized by
the call of TLS_INIT_TP () macro made inside the dynamic loader, prior
to running the main program (see dl-call_tls_init_tp.h). We can take
advantage of this: we know for sure that __LIBC_NO_TLS () will evaluate
to 0 in all other cases, so let the compiler know that explicitly too.

Also, only define _hurd_tls_init () and TLS_INIT_TP () under the same
conditions (either !SHARED or inside rtld), to statically assert that
this is the case.

Other than a microoptimization, this also helps with avoiding awkward
sharing of the __libc_tls_initialized variable between ld.so and libc.so
that we would have to do otherwise -- we know for sure that no sharing
is required, simply because __libc_tls_initialized would always be set
to true inside libc.so.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-25-bugaevc@gmail.com>
2023-04-10 23:33:30 +02:00
Sergey Bugaev
4644fb9c4c elf: Stop including tls.h in ldsodefs.h
Nothing in there needs tls.h

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-24-bugaevc@gmail.com>
2023-04-10 23:26:28 +02:00
Sergey Bugaev
60f9bf9746 hurd: Port trampoline.c to x86_64
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230403115621.258636-3-bugaevc@gmail.com>
2023-04-10 20:44:43 +02:00
Sergey Bugaev
645da826bb hurd: Do not declare local variables volatile
These are just regular local variables that are not accessed in any
funny ways, not even though a pointer. There's absolutely no reason to
declare them volatile. It only ends up hurting the quality of the
generated machine code.

If anything, it would make sense to decalre sigsp as *pointing* to
volatile memory (volatile void *sigsp), but evidently that's not needed
either.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230403115621.258636-2-bugaevc@gmail.com>
2023-04-10 20:42:28 +02:00
Sergey Bugaev
892f702827 hurd: Implement x86_64/intr-msg.h
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-18-bugaevc@gmail.com>
2023-04-10 20:39:28 +02:00
Sergey Bugaev
57df0f16b4 hurd: Add sys/ucontext.h and sigcontext.h for x86_64
This is based on the Linux port's version, but laid out to match Mach's
struct i386_thread_state, much like the i386 version does.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-04-10 20:11:43 +02:00
Flavio Cruz
f7f7dd8009 hurd: Stop depending on the default_pager stubs provided by gnumach
The hurd source tree already provides the same stubs and they are only
needed there.
Message-Id: <ZDN3rDdjMowtUWf7@jupiter.tail36e24.ts.net>
2023-04-10 19:01:52 +02:00
H.J. Lu
81a3cc956e <sys/platform/x86.h>: Add PREFETCHI support
Add PREFETCHI support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
b05521c916 <sys/platform/x86.h>: Add AMX-COMPLEX support
Add AMX-COMPLEX support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
609b7b2d3c <sys/platform/x86.h>: Add AVX-NE-CONVERT support
Add AVX-NE-CONVERT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
4c120c88a6 <sys/platform/x86.h>: Add AVX-VNNI-INT8 support
Add AVX-VNNI-INT8 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
b39741b45f <sys/platform/x86.h>: Add MSRLIST support
Add MSRLIST support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
96037c697d <sys/platform/x86.h>: Add AVX-IFMA support
Add AVX-IFMA support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
8b4cc05eab <sys/platform/x86.h>: Add AMX-FP16 support
Add AMX-FP16 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
227983551d <sys/platform/x86.h>: Add WRMSRNS support
Add WRMSRNS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
a00db8305d <sys/platform/x86.h>: Add ArchPerfmonExt support
Add Architectural Performance Monitoring Extended Leaf (EAX = 23H)
support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
2f02d0d8e1 <sys/platform/x86.h>: Add CMPCCXADD support
Add CMPCCXADD support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
aa528a579b <sys/platform/x86.h>: Add LASS support
Add Linear Address Space Separation (LASS) support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
231bf916ce <sys/platform/x86.h>: Add RAO-INT support
Add RAO-INT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
fb90dc8513 <sys/platform/x86.h>: Add LBR support
Add architectural LBR support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
f47b7d96fb <sys/platform/x86.h>: Add RTM_FORCE_ABORT support
Add RTM_FORCE_ABORT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
f6790a489d <sys/platform/x86.h>: Add SGX-KEYS support
Add SGX-KEYS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
09cc5fee21 <sys/platform/x86.h>: Add BUS_LOCK_DETECT support
Add Bus lock debug exceptions (BUS_LOCK_DETECT) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
8c8e391166 <sys/platform/x86.h>: Add LA57 support
Add 57-bit linear addresses and five-level paging (LA57) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
2d8c590a5e <bits/platform/x86.h>: Rename to x86_cpu_INDEX_7_ECX_15
Rename x86_cpu_INDEX_7_ECX_1 to x86_cpu_INDEX_7_ECX_15 for the unused bit
15 in ECX from CPUID with EAX == 0x7 and ECX == 0.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
John David Anglin
c4468cd399 hppa: Update struct __pthread_rwlock_arch_t comment.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-04-05 18:54:47 +00:00
John David Anglin
e9327e8584 hppa: Revise __TIMESIZE define to use __WORDSIZE
Handle both 32 and 64-bit ABIs.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-04-05 18:35:38 +00:00
Guy-Fleury Iteriteka
5476f8cd2e htl: move pthread_self info libc.
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-4-gfleury@disroot.org>
2023-04-05 01:26:36 +02:00
Guy-Fleury Iteriteka
f987e9b7a3 htl: move ___pthread_self into libc.
sysdeps/mach/hurd/htl/pt-pthread_self.c: New file.
htl/Makefile: .. Add it to libc routine.
sysdeps/mach/hurd/htl/pt-sysdep.c(__pthread_self): Remove it.
sysdeps/mach/hurd/htl/pt-sysdep.h(__pthread_self): Add hidden propertie.
htl/Versions(__pthread_self) Version it as private symbol.

Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-3-gfleury@disroot.org>
2023-04-05 01:26:34 +02:00
Andreas Schwab
856bab7717 x86/dl-cacheinfo: remove unsused parameter from handle_amd
Also replace an unreachable assert with __builtin_unreachable.
2023-04-04 16:16:21 +02:00
Adhemerval Zanella
59db5735e6 powerpc: Disable stack protector in early static initialization
Similar to fb95c31638, also disable
for string-ppc64.c (pulled on rltd as the default string
implementation).

Checked on powerpc64-linux-gnu.
2023-04-03 17:42:08 -03:00
Adhemerval Zanella
370da8a121 nptl: Fix tst-cancel30 on sparc64
As indicated by sparc kernel-features.h, even though sparc64 defines
__NR_pause,  it is not supported (ENOSYS).  Always use ppoll or the
64 bit time_t variant instead.
2023-04-03 17:41:59 -03:00
Adhemerval Zanella Netto
16439f419b math: Remove the error handling wrapper from fmod and fmodf
The error handling is moved to sysdeps/ieee754 version with no SVID
support.  The compatibility symbol versions still use the wrapper
with SVID error handling around the new code.  There is no new symbol
version nor compatibility code on !LIBM_SVID_COMPAT targets
(e.g. riscv).

The ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.  For both i686 and m68k,
which provive arch specific implementation, wrappers are added so
no new symbol are added (which would require to change the
implementations).

It shows an small improvement, the results for fmod:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 12.5049  | 9.40992
  x86_64 (Ryzen 9) | normal          | 296.939  | 296.738
  x86_64 (Ryzen 9) | close-exponents | 16.0244  | 13.119
  aarch64 (N1)     | subnormal       | 6.81778  | 4.33313
  aarch64 (N1)     | normal          | 155.620  | 152.915
  aarch64 (N1)     | close-exponents | 8.21306  | 5.76138
  armhf (N1)       | subnormal       | 15.1083  | 14.5746
  armhf (N1)       | normal          | 244.833  | 241.738
  armhf (N1)       | close-exponents | 21.8182  | 22.457

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:45:27 -03:00
Adhemerval Zanella Netto
cf9cf33199 math: Improve fmodf
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:

   mx * 2^ex == 2 * mx * 2^(ex - 1)

   while (ex > ey)
     {
       mx *= 2;
       --ex;
       mx %= my;
     }

With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 8 (which is sizeof of uint32_t minus
MANTISSA_WIDTH plus the signal bit):

   while (ex > ey)
     {
       mx << 8;
       ex -= 8;
       mx %= my;
     }  */

The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles.  Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.

I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.0318
  x86_64 (Ryzen 9) | normal          | 85.4096  | 49.9641
  x86_64 (Ryzen 9) | close-exponents | 19.1072  | 15.8224
  aarch64 (N1)     | subnormal       | 10.2182  | 6.81778
  aarch64 (N1)     | normal          | 60.0616  | 20.3667
  aarch64 (N1)     | close-exponents | 11.5256  | 8.39685

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, and multiplication):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 11.6662  | 10.8955
  armhf (N1)       | normal          | 69.2759  | 34.1524
  armhf (N1)       | close-exponents | 13.6472  | 18.2131

Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.

Co-authored-by: kirill <kirill.okhotnikov@gmail.com>

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:45:18 -03:00
Adhemerval Zanella Netto
34b9f8bc17 math: Improve fmod
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:

   mx * 2^ex == 2 * mx * 2^(ex - 1)

   while (ex > ey)
     {
       mx *= 2;
       --ex;
       mx %= my;
     }

With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 11 (which is sizeo of uint64_t minus
MANTISSA_WIDTH plus the signal bit):

   while (ex > ey)
     {
       mx << 11;
       ex -= 11;
       mx %= my;
     }  */

The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles.  Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.

I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 19.1584  | 12.5049
  x86_64 (Ryzen 9) | normal          | 1016.51  | 296.939
  x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.0244
  aarch64 (N1)     | subnormal       | 11.153   | 6.81778
  aarch64 (N1)     | normal          | 528.649  | 155.62
  aarch64 (N1)     | close-exponents | 11.4517  | 8.21306

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, clz, ctz, and multiplication):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 15.908   | 15.1083
  armhf (N1)       | normal          | 837.525  | 244.833
  armhf (N1)       | close-exponents | 16.2111  | 21.8182

Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.

Co-authored-by: kirill <kirill.okhotnikov@gmail.com>

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:36:24 -03:00