Commit Graph

39855 Commits

Author SHA1 Message Date
Sergey Bugaev
747812349d hurd: Improve reply port handling when exiting signal handlers
If we're doing signals, that means we've already got the signal thread
running, and that implies TLS having been set up. So we know that
__hurd_local_reply_port will resolve to THREAD_SELF->reply_port, and can
access that directly using the THREAD_GETMEM and THREAD_SETMEM macros.
This avoids potential miscompilations, and should also be a tiny bit
faster.

Also, use mach_port_mod_refs () and not mach_port_destroy () to destroy
the receive right. mach_port_destroy () should *never* be used on
mach_task_self (); this can easily lead to port use-after-free
vulnerabilities if the task has any other references to the same port.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-26-bugaevc@gmail.com>
2023-04-10 23:54:28 +02:00
Sergey Bugaev
b37899d34d hurd: Only check for TLS initialization inside rtld or in static builds
When glibc is built as a shared library, TLS is always initialized by
the call to the TLS_INIT_TP () macro made inside the dynamic loader, prior
to running the main program (see dl-call_tls_init_tp.h). We can take
advantage of this: we know for sure that __LIBC_NO_TLS () will evaluate
to 0 in all other cases, so let the compiler know that explicitly too.

Also, only define _hurd_tls_init () and TLS_INIT_TP () under the same
conditions (either !SHARED or inside rtld), to statically assert that
this is the case.
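A minimal sketch of the pattern, using hypothetical names: SHARED and IN_RTLD stand in for the build-mode conditions, and NO_TLS () plays the role of __LIBC_NO_TLS (). In the shared-library branch the macro expands to the constant 0, so the compiler can drop the fallback path entirely. This illustrates the idea only and is not glibc's actual definition.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for glibc's build-mode macros.  */
#if !defined SHARED || defined IN_RTLD
/* Static build, or code inside the dynamic loader: TLS may not be set
   up yet, so the check has to consult a runtime flag.  */
static bool tls_initialized;
# define NO_TLS() (!tls_initialized)
#else
/* Shared libc: the loader initialized TLS before any of this code can
   run, so the check is a compile-time constant and the fallback
   branch below becomes dead code.  */
# define NO_TLS() 0
#endif

static int global_fallback;
static _Thread_local int per_thread_value;

/* Return where the calling thread should store its value.  */
static int *value_location (void)
{
  if (NO_TLS ())
    return &global_fallback;   /* TLS not usable yet */
  return &per_thread_value;
}
```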

Other than a microoptimization, this also helps with avoiding awkward
sharing of the __libc_tls_initialized variable between ld.so and libc.so
that we would have to do otherwise -- we know for sure that no sharing
is required, simply because __libc_tls_initialized would always be set
to true inside libc.so.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-25-bugaevc@gmail.com>
2023-04-10 23:33:30 +02:00
Sergey Bugaev
4644fb9c4c elf: Stop including tls.h in ldsodefs.h
Nothing in there needs tls.h.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-24-bugaevc@gmail.com>
2023-04-10 23:26:28 +02:00
Sergey Bugaev
60f9bf9746 hurd: Port trampoline.c to x86_64
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230403115621.258636-3-bugaevc@gmail.com>
2023-04-10 20:44:43 +02:00
Sergey Bugaev
645da826bb hurd: Do not declare local variables volatile
These are just regular local variables that are not accessed in any
funny ways, not even through a pointer. There's absolutely no reason to
declare them volatile. It only ends up hurting the quality of the
generated machine code.

If anything, it would make sense to declare sigsp as *pointing* to
volatile memory (volatile void *sigsp), but evidently that's not needed
either.
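The distinction drawn above can be illustrated with a minimal C sketch (function names are hypothetical): `volatile int *p` is a pointer to volatile memory, whereas `int *volatile local` makes the local pointer object itself volatile, which is the kind of qualifier on locals that this patch removes.

```c
/* Pointer to volatile memory (the "volatile void *sigsp" form):
   each *p is performed as its own access, while p itself is an
   ordinary local that the compiler may keep in a register.  */
static int sum_two_reads (volatile int *p)
{
  int a = *p;   /* first load */
  int b = *p;   /* second load, not merged with the first */
  return a + b;
}

/* Volatile local object: every read and write of 'local' itself must
   happen exactly as written, which usually forces it into memory and
   only makes the generated code worse.  */
static int read_through_volatile_local (int *q)
{
  int *volatile local = q;
  return *local;
}
```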

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230403115621.258636-2-bugaevc@gmail.com>
2023-04-10 20:42:28 +02:00
Sergey Bugaev
892f702827 hurd: Implement x86_64/intr-msg.h
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-18-bugaevc@gmail.com>
2023-04-10 20:39:28 +02:00
Sergey Bugaev
57df0f16b4 hurd: Add sys/ucontext.h and sigcontext.h for x86_64
This is based on the Linux port's version, but laid out to match Mach's
struct i386_thread_state, much like the i386 version does.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-04-10 20:11:43 +02:00
Flavio Cruz
f7f7dd8009 hurd: Stop depending on the default_pager stubs provided by gnumach
The hurd source tree already provides the same stubs and they are only
needed there.
Message-Id: <ZDN3rDdjMowtUWf7@jupiter.tail36e24.ts.net>
2023-04-10 19:01:52 +02:00
Paul Eggert
54ae6d81c9 manual: update AddressSanitizer discussion
* manual/string.texi (Truncating Strings): Update obsolescent
reference and use the more-generic term “AddressSanitizer”.
Mention fortification, too.  -fcheck-pointer-bounds is no longer
supported.
2023-04-08 13:53:28 -07:00
Paul Eggert
f173e27272 manual: document snprintf truncation better 2023-04-08 13:53:22 -07:00
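The usual way to detect snprintf truncation is to compare its return value, which is the length the complete output would have had, against the buffer size. A minimal sketch with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Format "name=value" into buf.  Returns true if the whole string fit;
   on truncation buf still holds a NUL-terminated prefix (size > 0).  */
static bool format_pair (char *buf, size_t size, const char *name, int value)
{
  int n = snprintf (buf, size, "%s=%d", name, value);
  /* n is the length the complete output would have had, not counting
     the NUL; a negative n reports an output error.  */
  return n >= 0 && (size_t) n < size;
}
```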
Paul Eggert
1fb225923a manual: improve string section wording
* manual/string.texi: Editorial fixes.  Do not say “text” when
“string” or “string contents” is meant, as a C string can contain
bytes that are not valid text in the current encoding.
When warning about strcat efficiency, warn similarly about strncat
and wcscat.  “coping” → “copying”.
Mention at the start of the two problematic sections that problems
are discussed at section end.
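The strcat efficiency concern can be sketched as follows (helper names are hypothetical, and the caller is assumed to supply a large enough buffer). Each strcat call rescans the destination to find its end, so appending n pieces takes quadratic time, while stpcpy returns the new end and keeps the loop linear.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpcpy */
#include <stddef.h>
#include <string.h>

/* Quadratic: every strcat call rescans dst from the start to find
   its terminating NUL.  dst must be large enough for the result.  */
static void join_strcat (char *dst, const char *const *parts, size_t n)
{
  dst[0] = '\0';
  for (size_t i = 0; i < n; i++)
    strcat (dst, parts[i]);
}

/* Linear: stpcpy returns a pointer to the NUL it just wrote, so the
   next piece is copied there without rescanning.  */
static void join_stpcpy (char *dst, const char *const *parts, size_t n)
{
  char *end = dst;
  *end = '\0';
  for (size_t i = 0; i < n; i++)
    end = stpcpy (end, parts[i]);
}
```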
2023-04-08 13:51:26 -07:00
Paul Eggert
a778333951 manual: fix texinfo typo
* manual/creature.texi (Feature Test Macros): Fix
“creature.texi:309: warning: `.' or `,' must follow @xref, not f”.
2023-04-08 13:51:26 -07:00
Florian Weimer
0d5cb2ae27 <stdio.h>: Make fopencookie, vasprintf, asprintf available by default
FreeBSD makes these functions available by default, so we should
not treat them as GNU-specific and restrict them to _GNU_SOURCE.
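A minimal sketch of asprintf use, with a hypothetical helper name. The _GNU_SOURCE definition is kept only so the example also builds against glibc releases that predate this change.

```c
#define _GNU_SOURCE  /* only required with glibc releases before this change */
#include <stdio.h>
#include <stdlib.h>

/* Build "name#id" in a freshly allocated buffer; the caller frees it.  */
static char *make_label (const char *name, int id)
{
  char *s;
  /* asprintf sizes and allocates the buffer itself and returns -1 on
     failure, in which case s must not be used.  */
  if (asprintf (&s, "%s#%d", name, id) < 0)
    return NULL;
  return s;
}
```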

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-04-06 16:41:15 +02:00
Florian Weimer
30e3ca78f9 <string.h>: Make strchrnul, strcasestr, memmem available by default
FreeBSD makes them available by default, too, so there does not seem
to be a reason to restrict these functions to _GNU_SOURCE.
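A minimal sketch of two of these functions, with hypothetical helper names; as above, _GNU_SOURCE only matters for glibc releases that predate this change. strchrnul never returns NULL, which removes a branch, and memmem searches byte buffers that may contain NUL bytes.

```c
#define _GNU_SOURCE  /* only required with glibc releases before this change */
#include <stddef.h>
#include <string.h>

/* Length of the first field of a colon-separated line.  strchrnul
   points at the ':' or, if there is none, at the final NUL.  */
static size_t first_field_len (const char *line)
{
  return (size_t) (strchrnul (line, ':') - line);
}

/* Does the byte buffer contain the given byte sequence?  */
static int contains (const void *hay, size_t hlen,
                     const void *needle, size_t nlen)
{
  return memmem (hay, hlen, needle, nlen) != NULL;
}
```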

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-04-06 16:41:12 +02:00
H.J. Lu
81a3cc956e <sys/platform/x86.h>: Add PREFETCHI support
Add PREFETCHI support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
b05521c916 <sys/platform/x86.h>: Add AMX-COMPLEX support
Add AMX-COMPLEX support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
609b7b2d3c <sys/platform/x86.h>: Add AVX-NE-CONVERT support
Add AVX-NE-CONVERT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
4c120c88a6 <sys/platform/x86.h>: Add AVX-VNNI-INT8 support
Add AVX-VNNI-INT8 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
b39741b45f <sys/platform/x86.h>: Add MSRLIST support
Add MSRLIST support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
96037c697d <sys/platform/x86.h>: Add AVX-IFMA support
Add AVX-IFMA support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
8b4cc05eab <sys/platform/x86.h>: Add AMX-FP16 support
Add AMX-FP16 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
227983551d <sys/platform/x86.h>: Add WRMSRNS support
Add WRMSRNS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
a00db8305d <sys/platform/x86.h>: Add ArchPerfmonExt support
Add Architectural Performance Monitoring Extended Leaf (EAX = 23H)
support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
2f02d0d8e1 <sys/platform/x86.h>: Add CMPCCXADD support
Add CMPCCXADD support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
aa528a579b <sys/platform/x86.h>: Add LASS support
Add Linear Address Space Separation (LASS) support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
231bf916ce <sys/platform/x86.h>: Add RAO-INT support
Add RAO-INT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
fb90dc8513 <sys/platform/x86.h>: Add LBR support
Add architectural LBR support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
f47b7d96fb <sys/platform/x86.h>: Add RTM_FORCE_ABORT support
Add RTM_FORCE_ABORT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
f6790a489d <sys/platform/x86.h>: Add SGX-KEYS support
Add SGX-KEYS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
09cc5fee21 <sys/platform/x86.h>: Add BUS_LOCK_DETECT support
Add Bus lock debug exceptions (BUS_LOCK_DETECT) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
8c8e391166 <sys/platform/x86.h>: Add LA57 support
Add 57-bit linear addresses and five-level paging (LA57) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
083204a0e2 platform.texi: Move LAM after LAHF64_SAHF64
Move LAM after LAHF64_SAHF64 to sort x86 features.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
2d8c590a5e <bits/platform/x86.h>: Rename to x86_cpu_INDEX_7_ECX_15
Rename x86_cpu_INDEX_7_ECX_1 to x86_cpu_INDEX_7_ECX_15 for the unused bit
15 in ECX from CPUID with EAX == 0x7 and ECX == 0.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
John David Anglin
c4468cd399 hppa: Update struct __pthread_rwlock_arch_t comment.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-04-05 18:54:47 +00:00
John David Anglin
e9327e8584 hppa: Revise __TIMESIZE define to use __WORDSIZE
Handle both 32 and 64-bit ABIs.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-04-05 18:35:38 +00:00
Adhemerval Zanella
21a171bcb2 libio: Remove unused pragma weak on vtable
Both _IO_file_jumps_alias and _IO_wfile_jumps_alias are defined as
aliases.
2023-04-05 09:03:42 -03:00
Adhemerval Zanella
b47d02b9c6 malloc: Only set pragma weak for rpc freemem if required
Both __rpc_freemem and __rpc_thread_destroy are only used if the
compat symbols are required.
2023-04-05 09:03:42 -03:00
Guy-Fleury Iteriteka
5476f8cd2e htl: move pthread_self into libc.
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-4-gfleury@disroot.org>
2023-04-05 01:26:36 +02:00
Guy-Fleury Iteriteka
f987e9b7a3 htl: move ___pthread_self into libc.
sysdeps/mach/hurd/htl/pt-pthread_self.c: New file.
htl/Makefile: Add it to libc routines.
sysdeps/mach/hurd/htl/pt-sysdep.c(__pthread_self): Remove it.
sysdeps/mach/hurd/htl/pt-sysdep.h(__pthread_self): Add hidden property.
htl/Versions(__pthread_self): Version it as a private symbol.

Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-3-gfleury@disroot.org>
2023-04-05 01:26:34 +02:00
Guy-Fleury Iteriteka
7bba5bd8e8 htl: move __pthread_total into libc
htl/pt-nthreads.c: New file.
htl/Makefile: Add it to routines.
htl/Versions: Version it as a private libc symbol.
htl/pt-create.c: Remove its definition here.
htl/pt-internal.h: Add property to its declaration.

Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-2-gfleury@disroot.org>
2023-04-05 01:26:29 +02:00
Nisha Menon
51a121eb36 compare_strings.py : Add --gmean flag
To calculate the geometric mean of string benchmark results.
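For reference, the geometric mean of n positive timings t_1, ..., t_n is shown here together with its log form, which avoids overflowing the product; this commit does not say which form the script uses.

```latex
G = \Bigl(\prod_{i=1}^{n} t_i\Bigr)^{1/n}
  = \exp\Bigl(\frac{1}{n}\sum_{i=1}^{n} \ln t_i\Bigr)
```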

Signed-off-by: Nisha Poyarekar <nisha.s.menon@gmail.com>
2023-04-04 13:51:45 -05:00
Andreas Schwab
856bab7717 x86/dl-cacheinfo: remove unused parameter from handle_amd
Also replace an unreachable assert with __builtin_unreachable.
2023-04-04 16:16:21 +02:00
Adhemerval Zanella
59db5735e6 powerpc: Disable stack protector in early static initialization
Similar to fb95c31638, also disable the stack protector
for string-ppc64.c (pulled into rtld as the default string
implementation).

Checked on powerpc64-linux-gnu.
2023-04-03 17:42:08 -03:00
Adhemerval Zanella
370da8a121 nptl: Fix tst-cancel30 on sparc64
As indicated by sparc kernel-features.h, even though sparc64 defines
__NR_pause, it is not supported (ENOSYS).  Always use ppoll or the
64-bit time_t variant instead.
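A minimal sketch of the idea behind the fix: a pause-like wait built on ppoll with no descriptors and no timeout, which blocks until a signal handler runs and then fails with EINTR. The helper names are hypothetical, and this is not the test's or glibc's actual code. The demo arms a one-second alarm, so it takes about a second to run.

```c
#define _GNU_SOURCE  /* for ppoll */
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Wait until a signal handler runs, without the pause syscall:
   no descriptors, no timeout (NULL) and no signal-mask change.  */
static int pause_via_ppoll (void)
{
  return ppoll (NULL, 0, NULL, NULL);
}

static volatile sig_atomic_t alarm_seen;

static void on_alarm (int sig)
{
  (void) sig;
  alarm_seen = 1;
}

/* Demo: returns 1 if the wait was interrupted by SIGALRM as expected.  */
static int demo_wait_for_alarm (void)
{
  struct sigaction sa = { .sa_handler = on_alarm };
  sigemptyset (&sa.sa_mask);
  if (sigaction (SIGALRM, &sa, NULL) != 0)
    return 0;
  alarm (1);
  int rc = pause_via_ppoll ();
  return rc == -1 && errno == EINTR && alarm_seen;
}
```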
2023-04-03 17:41:59 -03:00
Adhemerval Zanella Netto
16439f419b math: Remove the error handling wrapper from fmod and fmodf
The error handling is moved to the sysdeps/ieee754 version, with no SVID
support.  The compatibility symbol versions still use the wrapper
with SVID error handling around the new code.  There is no new symbol
version and no compatibility code on !LIBM_SVID_COMPAT targets
(e.g. riscv).

ia64 is unchanged, since it still uses the arch-specific
__libm_error_region in its implementation.  For i686 and m68k,
which provide arch-specific implementations, wrappers are added so
that no new symbols are added (which would require changing the
implementations).

It shows a small improvement; the results for fmod:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 12.5049  | 9.40992
  x86_64 (Ryzen 9) | normal          | 296.939  | 296.738
  x86_64 (Ryzen 9) | close-exponents | 16.0244  | 13.119
  aarch64 (N1)     | subnormal       | 6.81778  | 4.33313
  aarch64 (N1)     | normal          | 155.620  | 152.915
  aarch64 (N1)     | close-exponents | 8.21306  | 5.76138
  armhf (N1)       | subnormal       | 15.1083  | 14.5746
  armhf (N1)       | normal          | 244.833  | 241.738
  armhf (N1)       | close-exponents | 21.8182  | 22.457

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:45:27 -03:00
Adhemerval Zanella Netto
cf9cf33199 math: Improve fmodf
This uses a new algorithm similar to one already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:

   mx * 2^ex == 2 * mx * 2^(ex - 1)

   while (ex > ey)
     {
       mx *= 2;
       --ex;
       mx %= my;
     }

With mx/my being the mantissas of single-precision floating-point
numbers, the argument reduction can advance 8 bits per step (the width
of uint32_t minus MANTISSA_WIDTH minus the sign bit):

   while (ex > ey)
     {
       mx <<= 8;
       ex -= 8;
       mx %= my;
     }

The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to floats.  Unlike the original patch, this
patch assumes the modulo/divide operation is slow, so it uses
multiplication by inverse values instead.
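The simplest one-bit-per-step reduction above can be made concrete as a small, self-contained C sketch. It is not the patch's implementation: it handles only finite x and finite nonzero y, uses plain % instead of multiplication by inverse values, and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Split a positive, finite, nonzero float into m and e, f == m * 2^e.  */
static void split_float (float f, uint32_t *m, int *e)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  uint32_t biased = bits >> 23;          /* sign bit is 0 since f > 0 */
  uint32_t frac = bits & 0x7fffff;
  if (biased == 0)
    { *m = frac; *e = -149; }            /* subnormal */
  else
    { *m = frac | 0x800000; *e = (int) biased - 150; }
}

/* Rebuild r * 2^e; the value is exactly representable, and so is
   every intermediate, so no rounding occurs.  */
static float scale_float (uint32_t r, int e)
{
  float v = (float) r;                   /* r < 2^24: exact */
  for (; e > 0; e--)
    v *= 2.0f;
  for (; e < 0; e++)
    v *= 0.5f;
  return v;
}

/* fmodf for finite x and finite nonzero y (NaN, Inf and y == 0 are
   out of scope).  */
static float fmodf_sketch (float x, float y)
{
  float ax = x < 0 ? -x : x;
  float ay = y < 0 ? -y : y;
  if (ax < ay)
    return x;                            /* |x| < |y|: result is x */

  uint32_t mx, my;
  int ex, ey;
  split_float (ax, &mx, &ex);
  split_float (ay, &my, &ey);            /* |x| >= |y| implies ex >= ey */

  uint32_t r = mx % my;
  for (; ex > ey; ex--)
    r = (r << 1) % my;                   /* mx * 2^ex == 2 * mx * 2^(ex - 1) */

  float res = scale_float (r, ey);
  return x < 0 ? -res : res;             /* result takes the sign of x */
}
```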

I see the following performance improvements using fmod benchtests
(the results show only the 'mean' value):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.0318
  x86_64 (Ryzen 9) | normal          | 85.4096  | 49.9641
  x86_64 (Ryzen 9) | close-exponents | 19.1072  | 15.8224
  aarch64 (N1)     | subnormal       | 10.2182  | 6.81778
  aarch64 (N1)     | normal          | 60.0616  | 20.3667
  aarch64 (N1)     | close-exponents | 11.5256  | 8.39685

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where a lot of soft-fp implementation is used (for
modulo and multiplication), although the close-exponents case regresses:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 11.6662  | 10.8955
  armhf (N1)       | normal          | 69.2759  | 34.1524
  armhf (N1)       | close-exponents | 13.6472  | 18.2131

Instead of the math_private.h definitions, I used math_config.h,
which is used by newer math implementations.

Co-authored-by: kirill <kirill.okhotnikov@gmail.com>

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:45:18 -03:00
Adhemerval Zanella Netto
34b9f8bc17 math: Improve fmod
This uses a new algorithm similar to one already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:

   mx * 2^ex == 2 * mx * 2^(ex - 1)

   while (ex > ey)
     {
       mx *= 2;
       --ex;
       mx %= my;
     }

With mx/my being the mantissas of double-precision floating-point
numbers, the argument reduction can advance 11 bits per step (the width
of uint64_t minus MANTISSA_WIDTH minus the sign bit):

   while (ex > ey)
     {
       mx <<= 11;
       ex -= 11;
       mx %= my;
     }

The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles.  Unlike the original patch, this
patch assumes the modulo/divide operation is slow, so it uses
multiplication by inverse values instead.
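A self-contained C sketch of the multi-bit variant for doubles, reducing up to 11 exponent bits per step. It is an illustration only: it handles finite x and finite nonzero y, uses plain % rather than the patch's inverse-value multiplication and clz/ctz normalization, and its names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Split a positive, finite, nonzero double into m and e, d == m * 2^e.  */
static void split_double (double d, uint64_t *m, int *e)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  uint64_t biased = bits >> 52;                 /* sign bit is 0 */
  uint64_t frac = bits & ((UINT64_C (1) << 52) - 1);
  if (biased == 0)
    { *m = frac; *e = -1074; }                  /* subnormal */
  else
    { *m = frac | (UINT64_C (1) << 52); *e = (int) biased - 1075; }
}

/* fmod for finite x and finite nonzero y.  r < my < 2^53, so
   r << 11 still fits in 64 bits.  */
static double fmod_sketch (double x, double y)
{
  double ax = x < 0 ? -x : x;
  double ay = y < 0 ? -y : y;
  if (ax < ay)
    return x;

  uint64_t mx, my;
  int ex, ey;
  split_double (ax, &mx, &ex);
  split_double (ay, &my, &ey);                  /* ex >= ey here */

  uint64_t r = mx % my;
  while (ex > ey)
    {
      int step = ex - ey < 11 ? ex - ey : 11;   /* never overshoot ey */
      r = (r << step) % my;
      ex -= step;
    }

  /* r * 2^ey is exactly representable; scale without rounding.  */
  double res = (double) r;
  for (; ey > 0; ey--)
    res *= 2.0;
  for (; ey < 0; ey++)
    res *= 0.5;
  return x < 0 ? -res : res;
}
```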

I see the following performance improvements using fmod benchtests
(the results show only the 'mean' value):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 19.1584  | 12.5049
  x86_64 (Ryzen 9) | normal          | 1016.51  | 296.939
  x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.0244
  aarch64 (N1)     | subnormal       | 11.153   | 6.81778
  aarch64 (N1)     | normal          | 528.649  | 155.62
  aarch64 (N1)     | close-exponents | 11.4517  | 8.21306

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where a lot of soft-fp implementation is used (for
modulo, clz, ctz, and multiplication), although the close-exponents
case regresses:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 15.908   | 15.1083
  armhf (N1)       | normal          | 837.525  | 244.833
  armhf (N1)       | close-exponents | 16.2111  | 21.8182

Instead of the math_private.h definitions, I used math_config.h,
which is used by newer math implementations.

Co-authored-by: kirill <kirill.okhotnikov@gmail.com>

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:36:24 -03:00
Adhemerval Zanella Netto
5c11701c51 benchtests: Add fmodf benchmark
1. Subnormals: 128 inputs.
2. Normal numbers with large exponent difference (|x/y| > 2^8):
   1024 inputs between FLT_MIN and FLT_MAX;
3. Close exponents (ey >= -103 and |x/y| < 2^8): 1024 inputs with
   exponents between -10 and 10.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:13:55 -03:00
Adhemerval Zanella Netto
3ba0c9593f benchtests: Add fmod benchmark
Add three different datasets of random floating-point numbers:

1. Subnormals: 128 inputs.
2. Normal numbers with large exponent difference (|x/y| > 2^52):
   1024 inputs between DBL_MIN and DBL_MAX;
3. Close exponents (ey >= -907 and |x/y| < 2^52): 1024 inputs with
   exponents between -10 and 10.
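The commit does not say how the random inputs were produced. One way to draw values of the kinds listed, assuming rand () as the source and using hypothetical helper names, is to assemble the IEEE 754 bit pattern directly: a biased exponent of 0 with a nonzero fraction gives a subnormal, and a chosen unbiased exponent e gives a value in [2^e, 2^(e+1)).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 52 random fraction bits; rand () yields at least 15 bits per call.  */
static uint64_t random_fraction (void)
{
  uint64_t frac = 0;
  for (int i = 0; i < 4; i++)
    frac = (frac << 15) ^ (uint64_t) (rand () & 0x7fff);
  return frac & ((UINT64_C (1) << 52) - 1);
}

/* Positive normal double with unbiased exponent drawn from [emin, emax]
   (requires -1022 <= emin <= emax <= 1023).  */
static double random_double_with_exponent (int emin, int emax)
{
  int e = emin + rand () % (emax - emin + 1);
  uint64_t bits = ((uint64_t) (e + 1023) << 52) | random_fraction ();
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;                                   /* in [2^e, 2^(e+1)) */
}

/* Positive subnormal double: biased exponent 0, nonzero fraction.  */
static double random_subnormal (void)
{
  uint64_t bits = random_fraction ();
  if (bits == 0)
    bits = 1;
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}
```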
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:13:55 -03:00
H.J. Lu
743113d42e x86: Set FSGSBASE to active if enabled by kernel
Linux kernel uses AT_HWCAP2 to indicate if FSGSBASE instructions are
enabled.  If the HWCAP2_FSGSBASE bit in AT_HWCAP2 is set, FSGSBASE
instructions can be used in user space.  Define dl_check_hwcap2 to set
the FSGSBASE feature to active on Linux when the HWCAP2_FSGSBASE bit is
set.

Add a test to verify that FSGSBASE is active on current kernels.
NB: This test will fail if the kernel doesn't set the HWCAP2_FSGSBASE
bit in AT_HWCAP2 while fsgsbase shows up in /proc/cpuinfo.
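Outside glibc, the same kernel bit can be inspected with getauxval. A minimal sketch, assuming x86_64 Linux; the helper name is hypothetical, and HWCAP2_FSGSBASE is defined locally (bit 1, matching the kernel's <asm/hwcap2.h>) in case that header is not included. With this change, CPU_FEATURE_ACTIVE (FSGSBASE) from <sys/platform/x86.h> is expected to reflect the same bit.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* Same value as the x86 kernel UAPI header <asm/hwcap2.h>.  */
#ifndef HWCAP2_FSGSBASE
# define HWCAP2_FSGSBASE (1 << 1)
#endif

/* True if the kernel reports that user space may execute the
   RDFSBASE/WRFSBASE/RDGSBASE/WRGSBASE instructions.  */
static bool fsgsbase_enabled_by_kernel (void)
{
#if defined __x86_64__
  return (getauxval (AT_HWCAP2) & HWCAP2_FSGSBASE) != 0;
#else
  return false;   /* the bit means something else on other targets */
#endif
}
```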
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-04-03 11:36:48 -07:00