On x86_64, we have to pass function arguments in registers, not on the
stack. We also have to align the stack pointer in a specific way. Since
sharing the logic with i386 does not bring much benefit, split the file
back into i386- and x86_64-specific versions, and fix the x86_64 version
to set up the thread properly.
Bonus: i386 keeps doing the extra RPC inside __thread_set_pcsptp to
fetch the state of the thread before setting it; but x86_64 no lnoger
does that.
Checked on x86_64-gnu and i686-gnu.
Fixes be6d002ca2
"hurd: Set up the basic tree for x86_64-gnu"
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-9-bugaevc@gmail.com>
It is illegal to call thread_get_state () on mach_thread_self (), so
this codepath cannot be used as-is to fork the calling thread's TLS.
Fortunately we can use THREAD_SELF (aka %fs:0x0) to find out the value
of our fs_base without calling into the kernel.
Fixes: f6cf701efc
"hurd: Implement TLS for x86_64"
Checked on x86_64-gnu: fork () now works!
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-8-bugaevc@gmail.com>
Unlike sigstate->thread, tcb->self did not hold a Mach port reference on
the thread port it names. This means that the port can be deallocated,
and the name reused for something else, without anyone noticing. Using
tcb->self will then lead to port use-after-free.
Fortunately nothing was accessing tcb->self, other than it being
intially set to then-valid thread port name upon TCB initialization. To
assert that this keeps being the case without altering TCB layout,
rename self -> self_do_not_use, and stop initializing it.
Also, do not (re-)allocate a whole separate and unused stack for the
main thread, and just exit __pthread_setup early in this case.
Found upon attempting to use tcb->self and getting unexpected crashes.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230517191436.73636-7-bugaevc@gmail.com>
The real i386_thread_state Mach structure has an alignment of 8 on
x86_64. However, in struct sigcontext, the compiler was packing sc_gs
(which is the first member of sc_i386_thread_state) into the same 8-byte
slot as sc_error; this resulted in the rest of sc_i386_thread_state
members having wrong offsets relative to each other, and the overall
sc_i386_thread_state layout mismatching that of i386_thread_state.
Fix this by explicitly adding the required padding members, and
statically asserting that this results in the desired alignment.
The same goes for sc_i386_float_state.
Checked on x86_64-gnu.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230515083323.1358039-4-bugaevc@gmail.com>
These were created by creating stub files, running 'make update-abi',
and reviewing the results.
Also, set baseline ABI to GLIBC_2.38, the (upcoming) first glibc
release to first have x86_64-gnu support.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Normally, in static builds, the first code that runs is _start, in e.g.
sysdeps/x86_64/start.S, which quickly calls __libc_start_main, passing
it the argv etc. Among the first things __libc_start_main does is
initializing the tunables (based on env), then CPU features, and then
calls _dl_relocate_static_pie (). Specifically, this runs ifunc
resolvers to pick, based on the CPU features discovered earlier, the
most suitable implementation of "string" functions such as memcpy.
Before that point, calling memcpy (or other ifunc-resolved functions)
will not work.
In the Hurd port, things are more complex. In order to get argv/env for
our process, glibc normally needs to do an RPC to the exec server,
unless our args/env are already located on the stack (which is what
happens to bootstrap processes spawned by GNU Mach). Fetching our
argv/env from the exec server has to be done before the call to
__libc_start_main, since we need to know what our argv/env are to pass
them to __libc_start_main.
On the other hand, the implementation of the RPC (and other initial
setup needed on the Hurd before __libc_start_main can be run) is not
very trivial. In particular, it may (and on x86_64, will) use memcpy.
But as described above, calling memcpy before __libc_start_main can not
work, since the GOT entry for it is not yet initialized at that point.
Work around this by pre-filling the GOT entry with the baseline version
of memcpy, __memcpy_sse2_unaligned. This makes it possible for early
calls to memcpy to just work. The initial value of the GOT entry is
unused on x86_64, and changing it won't interfere with the relocation
being performed later: once _dl_relocate_static_pie () is called, the
baseline version will get replaced with the most suitable one, and that
is what subsequent calls of memcpy are going to call.
Checked on x86_64-gnu.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230429201822.2605207-6-bugaevc@gmail.com>
Checked on x86_64-gnu.
[samuel.thibault@ens-lyon.org: Restored same comments as on i386]
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230429201822.2605207-3-bugaevc@gmail.com>
Properly differentiate between setting up the real TLS with
TLS_INIT_TP, and setting up the early TLS (__init1_tcbhead) in static
builds. In the latter case, don't yet migrate the reply port into the
TCB, and don't yet set __libc_tls_initialized to 1.
This also lets us move the __init1_desc assignment inside
_hurd_tls_init ().
Fixes cd019ddd89
"hurd: Don't leak __hurd_reply_port0"
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
When glibc is built as a shared library, TLS is always initialized by
the call of TLS_INIT_TP () macro made inside the dynamic loader, prior
to running the main program (see dl-call_tls_init_tp.h). We can take
advantage of this: we know for sure that __LIBC_NO_TLS () will evaluate
to 0 in all other cases, so let the compiler know that explicitly too.
Also, only define _hurd_tls_init () and TLS_INIT_TP () under the same
conditions (either !SHARED or inside rtld), to statically assert that
this is the case.
Other than a microoptimization, this also helps with avoiding awkward
sharing of the __libc_tls_initialized variable between ld.so and libc.so
that we would have to do otherwise -- we know for sure that no sharing
is required, simply because __libc_tls_initialized would always be set
to true inside libc.so.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-25-bugaevc@gmail.com>
This reverts commit b37899d34d.
Apparently we load libc.so (and thus start using its functions) before
calling TLS_INIT_TP, so libc.so functions should not actually assume
that TLS is always set up.
Previously, once we set up TLS, we would implicitly switch from using
__hurd_reply_port0 to reply_port inside the TCB, leaving the former
unused. But we never deallocated it, so it got leaked.
Instead, migrate the port into the new TCB's reply_port slot. This
avoids both the port leak and an extra syscall to create a new reply
port for the TCB.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-28-bugaevc@gmail.com>
When glibc is built as a shared library, TLS is always initialized by
the call of TLS_INIT_TP () macro made inside the dynamic loader, prior
to running the main program (see dl-call_tls_init_tp.h). We can take
advantage of this: we know for sure that __LIBC_NO_TLS () will evaluate
to 0 in all other cases, so let the compiler know that explicitly too.
Also, only define _hurd_tls_init () and TLS_INIT_TP () under the same
conditions (either !SHARED or inside rtld), to statically assert that
this is the case.
Other than a microoptimization, this also helps with avoiding awkward
sharing of the __libc_tls_initialized variable between ld.so and libc.so
that we would have to do otherwise -- we know for sure that no sharing
is required, simply because __libc_tls_initialized would always be set
to true inside libc.so.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230319151017.531737-25-bugaevc@gmail.com>
This is based on the Linux port's version, but laid out to match Mach's
struct i386_thread_state, much like the i386 version does.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>