* Micro-optimize TLS access using GCC's native support for gs-based
addressing when available;
* Just use THREAD_GETMEM and THREAD_SETMEM instead of more inline
assembly;
* Sync tcbhead_t layout with NPTL, in particular update/fix __private_ss
offset;
* Statically assert that the two offsets that are a part of ABI are what
we expect them to be.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230214173722.428140-2-bugaevc@gmail.com>
ISO C does not support omitting parameter names in function definitions
before C2X,the compiler is giving an error with older versions of gcc and
this commit will resolve the test failure "error: parameter name omitted"
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
These are supposed to stay 32-bit even on 64-bit systems. This matches
BSD and Linux, as well as how these types are already defined in
tioctl.defs
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
'sem' is the opaque 'sem_t', 'isem' is the actual 'struct new_sem'.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230212111044.610942-6-bugaevc@gmail.com>
This does not seem like it is supposed to return negative error codes.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230212111044.610942-5-bugaevc@gmail.com>
When casting between a pointer and an integer of a different size, GCC
emits a warning (which is escalated to a build failure by -Werror).
Indeed, if what you start with is a pointer, which you then cast to a
shorter integer and then back again, you're going to cut off some bits
of the pointer.
But if you start with an integer (such as mach_port_t), then cast it to
a longer pointer (void *), and then back to a shorter integer, you are
fine. To keep GCC happy, cast through an intermediary uintptr_t, which
is always the same size as a pointer.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230212111044.610942-4-bugaevc@gmail.com>
It has been decided that on x86_64, mach_msg_type_number_t stays 32-bit.
Therefore, it's not possible to use mach_msg_type_number_t
interchangeably with size_t, in particular this breaks when a pointer to
a variable is passed to a MIG routine.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230212111044.610942-3-bugaevc@gmail.com>
Make the code flow more linear using early returns where possible. This
makes it so much easier to reason about what runs on error / successful
code paths.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20230212111044.610942-2-bugaevc@gmail.com>
Likewise use __builtin_LINE instead of __LINE__.
When building C++, inline functions are required to have the exact same
sequence of tokens in every translation unit. But __FILE__ token, when
used in a header file, does not necessarily expand to the exact same
string literal, and that may cause compilation failure when C++ modules
are being used. (It would also cause unpredictable output on assertion
failure at runtime, but this rarely matters in practice.)
For example, given the following sources:
// a.h
#include <assert.h>
inline void fn () { assert (0); }
// a.cc
#include "a.h"
// b.cc
#include "foo/../a.h"
preprocessing a.cc will yield a call to __assert_fail("0", "a.h", ...)
but b.cc will yield __assert_fail("0", "foo/../a.h", ...)
We used to use .cfi_adjust_cfa_offset around %esp manipulation
asm instructions to fix unwinding, but when building glibc with
-fno-omit-frame-pointer this is bogus since in that case %ebp is the CFA and
does not move.
Instead, let's force -fno-omit-frame-pointer when building intr-msg.c so
that %ebp can always be used and no .cfi_adjust_cfa_offset is needed.
It follows the internal signature:
extern int clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
The powerpc64 ABI requires an initial stackframe so the child can
store/restore the TOC. It is create prior calling clone3 by
adjusting the stack size (since kernel will compute the stack as
stack plus size).
Checked on powerpc64-linux-gnu (power8, kernel 6.0) and
powerpc64le-linux-gnu (power9, kernel 4.18).
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
For powerpc, strncmp is used on _dl_string_platform issued by
__tcb_parse_hwcap_and_convert_at_platform.
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Although static linker can optimize it to local call, it follows the
internal scheme to provide hidden proto and definitions.
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Although static linker can optimize it to local call, it follows the
internal scheme to provide hidden proto and definitions.
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
The test is sufficient to detect the ldconfig bug fixed in
commit 9fe6f63638 ("elf: Fix 64 time_t
support for installed statically binaries").
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The hard float abi and hard float are different,
Hard float abi: Use float register to pass float type arguments.
Hard float: Enable the hard float ISA feature.
So the with_fp_cond cannot represent these two features. When
-mfloat-abi=softfp, the float abi is soft and hard float is enabled.
So add 'with_hard_float_abi' in preconfigure and define 'CSKY_HARD_FLOAT_ABI'
if float abi is hard, and use 'CSKY_HARD_FLOAT_ABI' to determine
dynamic linker because it is what determines compatibility.
And with_fp_cond is still needed to tell glibc whether to enable
hard floating feature.
In addition, use AC_TRY_COMMAND to test gcc to ensure compatibility
between different versions of gcc. The original way has a problem
that __CSKY_HARD_FLOAT_FPU_SF__ means the target only has single
hard float-points ISA, so it's not defined in CPUs like ck810f.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch enables the option to influence hwcaps and stfle bits used
by the s390 specific ifunc-resolvers. The currently x86-specific
tunable glibc.cpu.hwcaps is also used on s390x to achieve the task. In
addition the user can also set a CPU arch-level like z13 instead of
single HWCAP and STFLE features.
Note that the tunable only handles the features which are really used
in the IFUNC-resolvers. All others are ignored as the values are only
used inside glibc. Thus we can influence:
- HWCAP_S390_VXRS (z13)
- HWCAP_S390_VXRS_EXT (z14)
- HWCAP_S390_VXRS_EXT2 (z15)
- STFLE_MIE3 (z15)
The influenced hwcap/stfle-bits are stored in the s390-specific
cpu_features struct which also contains reserved fields for future
usage.
The ifunc-resolvers and users of stfle bits are adjusted to use the
information from cpu_features struct.
On 31bit, the ELF_MACHINE_IRELATIVE macro is now also defined.
Otherwise the new ifunc-resolvers segfaults as they depend on
the not yet processed_rtld_global_ro@GLIBC_PRIVATE relocation.
It uses the bitmanip extension to optimize index_fist and index_last
with clz/ctz (using generic implementation that routes to compiler
builtin) and orc.b to check null bytes.
Checked the string test on riscv64 user mode.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
While ppc has the more important string functions in assembly,
there are still a few generic routines used.
Use the Power 6 CMPB insn for testing of zeros.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
While arm has the more important string functions in assembly,
there are still a few generic routines used.
Use the UQSUB8 insn for testing of zeros.
Checked on armv7-linux-gnueabihf
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
While alpha has the more important string functions in assembly,
there are still a few for find the generic routines are used.
Use the CMPBGE insn, via the builtin, for testing of zeros. Use a
simplified expansion of __builtin_ctz when the insn isn't available.
Checked on alpha-linux-gnu.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Use UXOR,SBZ to test for a zero byte within a word. While we can
get semi-decent code out of asm-goto, we would do slightly better
with a compiler builtin.
For index_zero et al, sequential testing of bytes is less expensive than
any tricks that involve a count-leading-zeros insn that we don't have.
Checked on hppa-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
GCC's combine pass cannot merge (x >> c | y << (32 - c)) into a
double-word shift unless (1) the subtract is in the same basic block
and (2) the result of the subtract is used exactly once. Neither
condition is true for any use of MERGE.
By forcing the use of a double-word shift, we not only reduce
contention on SAR, but also allow the setting of SAR to be hoisted
outside of a loop.
Checked on hppa-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Now that both strlen and memrchr have word vectorized implementation,
it should be faster to implement strrchr based on memrchr over the
string length instead of calling strchr on a loop.
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).
New algorithm read the lastaligned address and mask off the unwanted
bytes. The loop now read word-aligned address and check using the
has_eq macro.
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
It also cleanups the multiple inclusion by leaving the ifunc
implementation to undef the weak_alias and libc_hidden_def.
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
New algorithm read the first aligned address and mask off the
unwanted bytes (this strategy is similar to arch-specific
implementations used on powerpc, sparc, and sh).
The loop now read word-aligned address and check using the has_eq
macro.
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Now that stpcpy is vectorized based on op_t, it should be better to
call it instead of strlen plus memcpy.
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
and powerpc-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>