Commit Graph

15601 Commits

Author SHA1 Message Date
Sergey Bugaev
cd019ddd89 hurd: Don't leak __hurd_reply_port0
Previously, once we set up TLS, we would implicitly switch from using
__hurd_reply_port0 to reply_port inside the TCB, leaving the former
unused. But we never deallocated it, so it got leaked.

Instead, migrate the port into the new TCB's reply_port slot. This
avoids both the port leak and an extra syscall to create a new reply
port for the TCB.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-28-bugaevc@gmail.com>
2023-04-11 00:24:40 +02:00
Sergey Bugaev
747812349d hurd: Improve reply port handling when exiting signal handlers
If we're doing signals, that means we've already got the signal thread
running, and that implies TLS having been set up. So we know that
__hurd_local_reply_port will resolve to THREAD_SELF->reply_port, and can
access that directly using the THREAD_GETMEM and THREAD_SETMEM macros.
This avoids potential miscompilations, and should also be a tiny bit
faster.

Also, use mach_port_mod_refs () and not mach_port_destroy () to destroy
the receive right. mach_port_destroy () should *never* be used on
mach_task_self (); this can easily lead to port use-after-free
vulnerabilities if the task has any other references to the same port.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-26-bugaevc@gmail.com>
2023-04-10 23:54:28 +02:00
Sergey Bugaev
b37899d34d hurd: Only check for TLS initialization inside rtld or in static builds
When glibc is built as a shared library, TLS is always initialized by
the call of TLS_INIT_TP () macro made inside the dynamic loader, prior
to running the main program (see dl-call_tls_init_tp.h). We can take
advantage of this: we know for sure that __LIBC_NO_TLS () will evaluate
to 0 in all other cases, so let the compiler know that explicitly too.

Also, only define _hurd_tls_init () and TLS_INIT_TP () under the same
conditions (either !SHARED or inside rtld), to statically assert that
this is the case.

Other than a microoptimization, this also helps with avoiding awkward
sharing of the __libc_tls_initialized variable between ld.so and libc.so
that we would have to do otherwise -- we know for sure that no sharing
is required, simply because __libc_tls_initialized would always be set
to true inside libc.so.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-25-bugaevc@gmail.com>
2023-04-10 23:33:30 +02:00
Sergey Bugaev
4644fb9c4c elf: Stop including tls.h in ldsodefs.h
Nothing in there needs tls.h

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-24-bugaevc@gmail.com>
2023-04-10 23:26:28 +02:00
Sergey Bugaev
60f9bf9746 hurd: Port trampoline.c to x86_64
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230403115621.258636-3-bugaevc@gmail.com>
2023-04-10 20:44:43 +02:00
Sergey Bugaev
645da826bb hurd: Do not declare local variables volatile
These are just regular local variables that are not accessed in any
funny ways, not even though a pointer. There's absolutely no reason to
declare them volatile. It only ends up hurting the quality of the
generated machine code.

If anything, it would make sense to decalre sigsp as *pointing* to
volatile memory (volatile void *sigsp), but evidently that's not needed
either.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230403115621.258636-2-bugaevc@gmail.com>
2023-04-10 20:42:28 +02:00
Sergey Bugaev
892f702827 hurd: Implement x86_64/intr-msg.h
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-18-bugaevc@gmail.com>
2023-04-10 20:39:28 +02:00
Sergey Bugaev
57df0f16b4 hurd: Add sys/ucontext.h and sigcontext.h for x86_64
This is based on the Linux port's version, but laid out to match Mach's
struct i386_thread_state, much like the i386 version does.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
2023-04-10 20:11:43 +02:00
Flavio Cruz
f7f7dd8009 hurd: Stop depending on the default_pager stubs provided by gnumach
The hurd source tree already provides the same stubs and they are only
needed there.
Message-Id: <ZDN3rDdjMowtUWf7@jupiter.tail36e24.ts.net>
2023-04-10 19:01:52 +02:00
H.J. Lu
81a3cc956e <sys/platform/x86.h>: Add PREFETCHI support
Add PREFETCHI support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
b05521c916 <sys/platform/x86.h>: Add AMX-COMPLEX support
Add AMX-COMPLEX support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
609b7b2d3c <sys/platform/x86.h>: Add AVX-NE-CONVERT support
Add AVX-NE-CONVERT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
4c120c88a6 <sys/platform/x86.h>: Add AVX-VNNI-INT8 support
Add AVX-VNNI-INT8 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
b39741b45f <sys/platform/x86.h>: Add MSRLIST support
Add MSRLIST support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
96037c697d <sys/platform/x86.h>: Add AVX-IFMA support
Add AVX-IFMA support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
8b4cc05eab <sys/platform/x86.h>: Add AMX-FP16 support
Add AMX-FP16 support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
227983551d <sys/platform/x86.h>: Add WRMSRNS support
Add WRMSRNS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
a00db8305d <sys/platform/x86.h>: Add ArchPerfmonExt support
Add Architectural Performance Monitoring Extended Leaf (EAX = 23H)
support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
2f02d0d8e1 <sys/platform/x86.h>: Add CMPCCXADD support
Add CMPCCXADD support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
aa528a579b <sys/platform/x86.h>: Add LASS support
Add Linear Address Space Separation (LASS) support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
231bf916ce <sys/platform/x86.h>: Add RAO-INT support
Add RAO-INT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
fb90dc8513 <sys/platform/x86.h>: Add LBR support
Add architectural LBR support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
f47b7d96fb <sys/platform/x86.h>: Add RTM_FORCE_ABORT support
Add RTM_FORCE_ABORT support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
f6790a489d <sys/platform/x86.h>: Add SGX-KEYS support
Add SGX-KEYS support to <sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
09cc5fee21 <sys/platform/x86.h>: Add BUS_LOCK_DETECT support
Add Bus lock debug exceptions (BUS_LOCK_DETECT) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
8c8e391166 <sys/platform/x86.h>: Add LA57 support
Add 57-bit linear addresses and five-level paging (LA57) support to
<sys/platform/x86.h>.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
H.J. Lu
2d8c590a5e <bits/platform/x86.h>: Rename to x86_cpu_INDEX_7_ECX_15
Rename x86_cpu_INDEX_7_ECX_1 to x86_cpu_INDEX_7_ECX_15 for the unused bit
15 in ECX from CPUID with EAX == 0x7 and ECX == 0.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-05 14:46:10 -07:00
John David Anglin
c4468cd399 hppa: Update struct __pthread_rwlock_arch_t comment.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-04-05 18:54:47 +00:00
John David Anglin
e9327e8584 hppa: Revise __TIMESIZE define to use __WORDSIZE
Handle both 32 and 64-bit ABIs.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-04-05 18:35:38 +00:00
Guy-Fleury Iteriteka
5476f8cd2e htl: move pthread_self info libc.
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-4-gfleury@disroot.org>
2023-04-05 01:26:36 +02:00
Guy-Fleury Iteriteka
f987e9b7a3 htl: move ___pthread_self into libc.
sysdeps/mach/hurd/htl/pt-pthread_self.c: New file.
htl/Makefile: .. Add it to libc routine.
sysdeps/mach/hurd/htl/pt-sysdep.c(__pthread_self): Remove it.
sysdeps/mach/hurd/htl/pt-sysdep.h(__pthread_self): Add hidden propertie.
htl/Versions(__pthread_self) Version it as private symbol.

Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230318095826.1125734-3-gfleury@disroot.org>
2023-04-05 01:26:34 +02:00
Andreas Schwab
856bab7717 x86/dl-cacheinfo: remove unsused parameter from handle_amd
Also replace an unreachable assert with __builtin_unreachable.
2023-04-04 16:16:21 +02:00
Adhemerval Zanella
59db5735e6 powerpc: Disable stack protector in early static initialization
Similar to fb95c31638, also disable
for string-ppc64.c (pulled on rltd as the default string
implementation).

Checked on powerpc64-linux-gnu.
2023-04-03 17:42:08 -03:00
Adhemerval Zanella
370da8a121 nptl: Fix tst-cancel30 on sparc64
As indicated by sparc kernel-features.h, even though sparc64 defines
__NR_pause,  it is not supported (ENOSYS).  Always use ppoll or the
64 bit time_t variant instead.
2023-04-03 17:41:59 -03:00
Adhemerval Zanella Netto
16439f419b math: Remove the error handling wrapper from fmod and fmodf
The error handling is moved to sysdeps/ieee754 version with no SVID
support.  The compatibility symbol versions still use the wrapper
with SVID error handling around the new code.  There is no new symbol
version nor compatibility code on !LIBM_SVID_COMPAT targets
(e.g. riscv).

The ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.  For both i686 and m68k,
which provive arch specific implementation, wrappers are added so
no new symbol are added (which would require to change the
implementations).

It shows an small improvement, the results for fmod:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 12.5049  | 9.40992
  x86_64 (Ryzen 9) | normal          | 296.939  | 296.738
  x86_64 (Ryzen 9) | close-exponents | 16.0244  | 13.119
  aarch64 (N1)     | subnormal       | 6.81778  | 4.33313
  aarch64 (N1)     | normal          | 155.620  | 152.915
  aarch64 (N1)     | close-exponents | 8.21306  | 5.76138
  armhf (N1)       | subnormal       | 15.1083  | 14.5746
  armhf (N1)       | normal          | 244.833  | 241.738
  armhf (N1)       | close-exponents | 21.8182  | 22.457

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:45:27 -03:00
Adhemerval Zanella Netto
cf9cf33199 math: Improve fmodf
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:

   mx * 2^ex == 2 * mx * 2^(ex - 1)

   while (ex > ey)
     {
       mx *= 2;
       --ex;
       mx %= my;
     }

With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 8 (which is sizeof of uint32_t minus
MANTISSA_WIDTH plus the signal bit):

   while (ex > ey)
     {
       mx << 8;
       ex -= 8;
       mx %= my;
     }  */

The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles.  Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.

I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.0318
  x86_64 (Ryzen 9) | normal          | 85.4096  | 49.9641
  x86_64 (Ryzen 9) | close-exponents | 19.1072  | 15.8224
  aarch64 (N1)     | subnormal       | 10.2182  | 6.81778
  aarch64 (N1)     | normal          | 60.0616  | 20.3667
  aarch64 (N1)     | close-exponents | 11.5256  | 8.39685

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, and multiplication):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 11.6662  | 10.8955
  armhf (N1)       | normal          | 69.2759  | 34.1524
  armhf (N1)       | close-exponents | 13.6472  | 18.2131

Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.

Co-authored-by: kirill <kirill.okhotnikov@gmail.com>

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:45:18 -03:00
Adhemerval Zanella Netto
34b9f8bc17 math: Improve fmod
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:

   mx * 2^ex == 2 * mx * 2^(ex - 1)

   while (ex > ey)
     {
       mx *= 2;
       --ex;
       mx %= my;
     }

With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 11 (which is sizeo of uint64_t minus
MANTISSA_WIDTH plus the signal bit):

   while (ex > ey)
     {
       mx << 11;
       ex -= 11;
       mx %= my;
     }  */

The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles.  Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.

I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 19.1584  | 12.5049
  x86_64 (Ryzen 9) | normal          | 1016.51  | 296.939
  x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.0244
  aarch64 (N1)     | subnormal       | 11.153   | 6.81778
  aarch64 (N1)     | normal          | 528.649  | 155.62
  aarch64 (N1)     | close-exponents | 11.4517  | 8.21306

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, clz, ctz, and multiplication):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 15.908   | 15.1083
  armhf (N1)       | normal          | 837.525  | 244.833
  armhf (N1)       | close-exponents | 16.2111  | 21.8182

Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.

Co-authored-by: kirill <kirill.okhotnikov@gmail.com>

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:36:24 -03:00
H.J. Lu
743113d42e x86: Set FSGSBASE to active if enabled by kernel
Linux kernel uses AT_HWCAP2 to indicate if FSGSBASE instructions are
enabled.  If the HWCAP2_FSGSBASE bit in AT_HWCAP2 is set, FSGSBASE
instructions can be used in user space.  Define dl_check_hwcap2 to set
the FSGSBASE feature to active on Linux when the HWCAP2_FSGSBASE bit is
set.

Add a test to verify that FSGSBASE is active on current kernels.
NB: This test will fail if the kernel doesn't set the HWCAP2_FSGSBASE
bit in AT_HWCAP2 while fsgsbase shows up in /proc/cpuinfo.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-04-03 11:36:48 -07:00
Florian Weimer
5d1ccdda7b x86_64: Fix asm constraints in feraiseexcept (bug 30305)
The divss instruction clobbers its first argument, and the constraints
need to reflect that.  Fortunately, with GCC 12, generated code does
not actually change, so there is no externally visible bug.

Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-04-03 18:40:52 +02:00
Sergey Bugaev
17841fa7d4 hurd: Add vm_param.h for x86_64
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-30-bugaevc@gmail.com>
2023-04-03 01:24:13 +02:00
Sergey Bugaev
20427b8f23 hurd: Implement _hurd_longjmp_thread_state for x86_64
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-29-bugaevc@gmail.com>
2023-04-03 01:23:30 +02:00
Sergey Bugaev
e0bbae0062 htl: Implement thread_set_pcsptp for x86_64
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-23-bugaevc@gmail.com>
2023-04-03 01:18:27 +02:00
Sergey Bugaev
8d873a4904 x86_64: Add rtld-stpncpy & rtld-strncpy
Just like the other existing rtld-str* files, this provides rtld with
usable versions of stpncpy and strncpy.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-22-bugaevc@gmail.com>
2023-04-03 01:17:56 +02:00
Sergey Bugaev
fb9e7f6732 htl: Add tcb-offsets.sym for x86_64
The source code is the same as sysdeps/i386/htl/tcb-offsets.sym, but of
course the produced tcb-offsets.h will be different.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-21-bugaevc@gmail.com>
2023-04-03 01:15:30 +02:00
Sergey Bugaev
d8b69e89d8 hurd: Move a couple of signal-related files to x86
These do not need any changes to be used on x86_64.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-20-bugaevc@gmail.com>
2023-04-03 01:14:51 +02:00
Sergey Bugaev
a1fbae7527 hurd: Use uintptr_t for register values in trampoline.c
This is more correct, if only because these fields are defined as having
the type unsigned int in the Mach headers, so casting them to a signed
int and then back is suboptimal.

Also, remove an extra reassignment of uesp -- this is another remnant of
the ecx kludge.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-16-bugaevc@gmail.com>
2023-04-03 01:13:28 +02:00
Sergey Bugaev
b43cb67457 hurd: Move rtld-strncpy-c.c out of mach/hurd/
There's nothing Mach- or Hurd-specific about it; any port that ends
up with rtld pulling in strncpy will need this.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-15-bugaevc@gmail.com>
2023-04-03 01:10:23 +02:00
Sergey Bugaev
0001a23f7a hurd: More 64-bit integer casting fixes
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-13-bugaevc@gmail.com>
2023-04-03 01:03:06 +02:00
Sergey Bugaev
af2942cc62 mach, hurd: Drop __libc_lock_self0
This was used for the value of libc-lock's owner when TLS is not yet set
up, so THREAD_SELF can not be used. Since the value need not be anything
specific -- it just has to be non-NULL -- we can just use a plain
constant, such as (void *) 1, for this. This avoids accessing the symbol
through GOT, and exporting it from libc.so in the first place.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-12-bugaevc@gmail.com>
2023-04-03 01:02:44 +02:00
Sergey Bugaev
71232da3b3 hurd: Remove __hurd_threadvar_stack_{offset,mask}
Noone is or should be using __hurd_threadvar_stack_{offset,mask}, we
have proper TLS now. These two remaining variables are never set to
anything other than zero, so any code that would try to use them as
described would just dereference a zero pointer and crash. So remove
them entirely.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-6-bugaevc@gmail.com>
2023-04-03 00:53:25 +02:00