Commit Graph

267 Commits

Author SHA1 Message Date
Szabolcs Nagy
d174ec248d aarch64: redefine RETURN_ADDRESS to strip PAC
RETURN_ADDRESS is used at several places in glibc to mean a valid
code address of the call site, but with pac-ret it may contain a
pointer authentication code (PAC), so its definition is adjusted.

This is gcc PR target/94891: __builtin_return_address should not
expose signed pointers to user code where it can cause ABI issues.
In glibc RETURN_ADDRESS is only changed if it is built with pac-ret.
There is no detection for the specific gcc issue because it is
hard to test and the additional xpac does not cause problems.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-07-08 15:02:38 +01:00
Szabolcs Nagy
c94767712b aarch64: fix pac-ret support in _mcount
Currently gcc -pg -mbranch-protection=pac-ret passes signed return
address to _mcount, so _mcount now has to always strip pac from the
frompc since that's from user code that may be built with pac-ret.

This is gcc PR target/94791: signed pointers should not escape and get
passed across extern call boundaries, since that's an ABI break, but
because existing gcc has this issue we work it around in glibc until
that is resolved. This is compatible with a fixed gcc and it is a nop
on systems without PAuth support. The bug was introduced in gcc-7 with
-msign-return-address=non-leaf|all support which in gcc-9 got renamed
to -mbranch-protection=pac-ret|pac-ret+leaf|standard.

strip_pac uses inline asm instead of __builtin_aarch64_xpaclri since
that is not a documented api and not available in all supported gccs.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-07-08 15:02:38 +01:00
Szabolcs Nagy
1be3d6eb82 aarch64: Add pac-ret support to assembly files
Use return address signing in assembly files for functions that save
LR when pac-ret is enabled in the compiler.

The GNU property note for PAC-RET is not meaningful to the dynamic
linker so it is not strictly required, but it may be used to track
the security property of binaries. (The PAC-RET property is only set
if BTI is set too because BTI implies working GNU property support.)

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-07-08 15:02:38 +01:00
Szabolcs Nagy
9e1751e6d6 aarch64: configure check for pac-ret code generation
Return address signing requires unwinder support, which is
present in libgcc since >=gcc-7, however due to bugs the
support may be broken in <gcc-10 (and similarly there may
be issues in custom unwinders), so pac-ret is not always
safe to use. So in assembly code glibc should only use
pac-ret if the compiler uses it too. Unfortunately there
is no predefined feature macro for it set by the compiler
so pac-ret is inferred from the code generation.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-07-08 15:02:38 +01:00
Szabolcs Nagy
de9301c02e aarch64: ensure objects are BTI compatible
When glibc is built with branch protection (i.e. with a gcc configured
with --enable-standard-branch-protection), all glibc binaries should
be BTI compatible and marked as such.

It is easy to link BTI incompatible objects by accident and this is
silent currently which is usually not the expectation, so this is
changed into a link error. (There is no linker flag for failing on
BTI incompatible inputs so all warnings are turned into fatal errors
outside the test system when building glibc with branch protection.)

Unfortunately, outlined atomic functions are not BTI compatible in
libgcc (PR libgcc/96001), so to build glibc with current gcc use
'CC=gcc -mno-outline-atomics', this should be fixed in libgcc soon
and then glibc can be built and tested without such workarounds.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-07-08 15:02:38 +01:00
Sudakshina Das
605338745b aarch64: enable BTI at runtime
Binaries can opt-in to using BTI via an ELF object file marking.
The dynamic linker has to then mprotect the executable segments
with PROT_BTI. In case of static linked executables or in case
of the dynamic linker itself, PROT_BTI protection is done by the
operating system.

On AArch64 glibc uses PT_GNU_PROPERTY instead of PT_NOTE to check
the properties of a binary because PT_NOTE can be unreliable with
old linkers (old linkers just append the notes of input objects
together and add them to the output without checking them for
consistency which means multiple incompatible GNU property notes
can be present in PT_NOTE).

BTI property is handled in the loader even if glibc is not built
with BTI support, so in theory user code can be BTI protected
independently of glibc. In practice though user binaries are not
marked with the BTI property if glibc has no support because the
static linked libc objects (crt files, libc_nonshared.a) are
unmarked.

This patch relies on Linux userspace API that is not yet in a
linux release but in v5.8-rc1 so scheduled to be in Linux 5.8.

Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-07-08 15:02:37 +01:00
Szabolcs Nagy
5f846c8b0d aarch64: fix RTLD_START for BTI
Tailcalls must use x16 or x17 for the indirect branch instruction
to be compatible with code that uses BTI c at function entries.
(Other forms of indirect branches can only land on BTI j.)

Also added a BTI c at the ELF entry point of rtld, this is not
strictly necessary since the kernel does not use indirect branch
to get there, but it seems safest once building glibc itself with
BTI is supported.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-07-08 15:02:37 +01:00
Sudakshina Das
91181954f9 aarch64: Add BTI support to assembly files
To enable building glibc with branch protection, assembly code
needs BTI landing pads and ELF object file markings in the form
of a GNU property note.

The landing pads are unconditionally added to all functions that
may be indirectly called. When the code segment is not mapped
with PROT_BTI these instructions are nops. They are kept in the
code when BTI is not supported so that the layout of performance
critical code is unchanged across configurations.

The GNU property notes are only added when there is support for
BTI in the toolchain, because old binutils does not handle the
notes right. (Does not know how to merge them nor to put them in
PT_GNU_PROPERTY segment instead of PT_NOTE, and some versions
of binutils emit warnings about the unknown GNU property. In
such cases the produced libc binaries would not have valid
ELF marking so BTI would not be enabled.)

Note: functions using ENTRY or ENTRY_ALIGN now start with an
additional BTI c, so alignment of the following code changes,
but ENTRY_ALIGN_AND_PAD was fixed so there is no change to the
existing code layout. Some string functions may need to be
tuned for optimal performance after this commit.

Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-07-08 15:02:37 +01:00
Szabolcs Nagy
2a4c2dde49 aarch64: Rename place holder .S files to .c
The compiler can add required elf markings based on CFLAGS
but the assembler cannot, so using C code for empty files
creates less of a maintenance problem.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-07-08 15:02:37 +01:00
Szabolcs Nagy
1b0a4f58f5 aarch64: configure test for BTI support
Check BTI support in the compiler and linker.  The check also
requires READELF that understands the BTI GNU property note.
It is expected to succeed with gcc >=gcc-9 configured with
--enable-standard-branch-protection and binutils >=binutils-2.33.

Note: passing -mbranch-protection=bti in CFLAGS when building glibc
may not be enough to get a glibc that supports BTI because crtbegin*
and crtend* provided by the compiler needs to be BTI compatible too.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-07-08 15:02:37 +01:00
Alex Butler
03e1378f94 aarch64: MTE compatible strncmp
Add support for MTE to strncmp. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.

The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.

Co-authored-by: Branislav Rankov <branislav.rankov@arm.com>
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-23 17:55:39 +01:00
Alex Butler
adac54ffc5 aarch64: MTE compatible strcmp
Add support for MTE to strcmp. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.

The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.

Co-authored-by: Branislav Rankov <branislav.rankov@arm.com>
Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-23 17:55:39 +01:00
Alex Butler
79160c06c7 aarch64: MTE compatible strrchr
Add support for MTE to strrchr. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.

The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-23 17:55:39 +01:00
Alex Butler
df06b0d90f aarch64: MTE compatible memrchr
Add support for MTE to memrchr. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.

The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-23 17:55:39 +01:00
Alex Butler
7ff899969f aarch64: MTE compatible memchr
Add support for MTE to memchr. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.

The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.

Co-authored-by: Gabor Kertesz <gabor.kertesz@arm.com>
2020-06-23 17:55:39 +01:00
Alex Butler
bb2c12aecb aarch64: MTE compatible strcpy
Add support for MTE to strcpy. Regression tested with xcheck and benchmarked
with glibc's benchtests on the Cortex-A53, Cortex-A72, and Neoverse N1.

The existing implementation assumes that any access to the pages in which the
string resides is safe. This assumption is not true when MTE is enabled. This
patch updates the algorithm to ensure that accesses remain within the bounds
of an MTE tag (16-byte chunks) and improves overall performance.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-23 17:55:39 +01:00
Adhemerval Zanella
ea04f02131 aarch64: Remove fpu Makefile
The -fno-math-errno is already added by default and the minimum
required GCC to build glibc (6.2) make the -ffinite-math-only
superflous.

Checked on aarch64-linux-gnu.
2020-06-22 11:09:50 -03:00
Adhemerval Zanella
271afad8f4 aarch64: Use math-use-builtins for ceil{f}
The define is already set on the math-use-builtins-ceil.h, the patch
just removes the implementations (it was missed on c9feb1be93).

Checked on aarch64-linux-gnu.
2020-06-22 11:09:49 -03:00
Adhemerval Zanella
e80501a5c9 math: Decompose math-use-builtins.h
Each symbol definitions are moved on a separated file and it
cover all symbol type definitions (float, double, long double,
and float128).

It allows to set support for architectures without the boiler
place of copying default values.

Checked with a build on the affected ABIs.
2020-06-22 11:09:45 -03:00
Andrea Corallo
a365ac45b7 aarch64: MTE compatible strlen
Introduce an Arm MTE compatible strlen implementation.

The existing implementation assumes that any access to the pages in
which the string resides is safe.  This assumption is not true when
MTE is enabled.  This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance on modern cores. On cores with less
efficient Advanced SIMD implementation such as Cortex-A53 it can
be slower.

Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-09 09:21:11 +01:00
Andrea Corallo
49beaaec1b aarch64: MTE compatible strchr
Introduce an Arm MTE compatible strchr implementation.

The existing implementation assumes that any access to the pages in
which the string resides is safe.  This assumption is not true when
MTE is enabled.  This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance.

Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-09 09:20:27 +01:00
Andrea Corallo
f7de454f20 aarch64: MTE compatible strchrnul
Introduce an Arm MTE compatible strchrnul implementation.

The existing implementation assumes that any access to the pages in
which the string resides is safe.  This assumption is not true when
MTE is enabled.  This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance.

Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-09 09:20:27 +01:00
Krzysztof Koch
d1f75e9644 AArch64: Merge Falkor memcpy and memmove implementations
Falkor's memcpy and memmove share some implementation details,
therefore, the two routines are moved to a single source file
for code reuse.

The two routines now share code for small and medium copies
(up to and including 128 bytes). Large copies in memcpy do not
handle overlap correctly, consequently, the loops for
moving/copying more than 128 bytes stay separate for memcpy
and memmove.

To increase code reuse a number of small modifications were made:

1. The old implementation of memcpy copied the first 16-bytes as
   soon as the size of data was determined to be greater than 32 bytes.
   For memcpy code to also work when copying small/medium overlapping
   data, the first load and store was moved to the large copy case.
2. Medium memcpy case no longer assumes that 16 bytes were already
   copied and uses 8 registers to copy up to 128 bytes.
3. Small case for memmove was enlarged to that of memcpy, which is
   less than or equal to 32 bytes.
4. Medium case for memmove was enlarged to that of memcpy, which is
   less than or equal to 128 bytes.

Other changes include:

1. Improve alignment of existing loop bodies.
2. 'Delouse' memmove and memcpy input arguments. Make sure that
   upper 32-bits of input registers are zeroed if unused.
3. Do one more iteration in memmove loops and reduce the number of
   copies made from the start/end of the buffer, depending on
   the direction of the memmove loop.

Benchmarking:

Looking at the results from bench-memcpy-random.out, we can see that
now memmove_falkor is about 5% faster than memcpy_falkor_old, while
memmove_falkor_old was more than 15% slower. The memcpy implementation
remained largely unmodified, so there is no significant performance
change.

The reason for such a significant memmove performance gain is the
increase of the upper bound on the small copy case to 32 bytes and
the increase of the upper bound on the medium copy case to 128 bytes.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-06-08 14:13:05 +01:00
Vineet Gupta
c9feb1be93 aarch/fpu: use generic builtins based math functions
introduce sysdep header math-use-builtins.h to replace aarch64
implementations with corresponding generic ones.

 - newly inroduced generic sqrt{,f}, fma{,f}
 - existing floor{,f}, nearbyint{,f}, rint{,f}, round{,f}, trunc{,f}
 - Note that generic copysign was already enabled (via generic
   math-use-builtins.h) now thru sysdep header

Tested with build-many-glibcs for aarch64-linux-gnu

This is a non functional change and aarch64 libm before/after was
byte invariant as compared below:

| cd /SCRATCH/vgupta/gnu/install-glibc-A-baseline
| for i in `find . -name libm-2.31.9000.so`; do
|   echo $i; diff $i /SCRATCH/vgupta/gnu/install-glibc-C-reduce-scope/$i ;
|   echo $?;
| done

| ./aarch64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabi/lib/libm-2.31.9000.so
| 0
| ./x86_64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabihf/lib/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./powerpc-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./microblaze-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./nios2-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./hppa-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./s390x-linux-gnu/lib64/libm-2.31.9000.so
| 0

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-06-03 10:23:33 -07:00
Lexi Shao
59b64f9cbb aarch64: fix strcpy and strnlen for big-endian [BZ #25824]
This patch fixes the optimized implementation of strcpy and strnlen
on a big-endian arm64 machine.

The optimized method uses neon, which can process 128bit with one
instruction. On a big-endian machine, the bit order should be reversed
for the whole 128-bits double word. But with instuction
	rev64	datav.16b, datav.16b
it reverses 64bits in the two halves rather than reversing 128bits.
There is no such instruction as rev128 to reverse the 128bits, but we
can fix this by loading the data registers accordingly.

Fixes 0237b61526e7("aarch64: Optimized implementation of strcpy") and
2911cb68ed3d("aarch64: Optimized implementation of strnlen").

Signed-off-by: Lexi Shao <shaolexi@huawei.com>
Reviewed-by: Szabolcs Nagy  <szabolcs.nagy@arm.com>
2020-05-15 12:15:56 +01:00
Adhemerval Zanella
6a0474c769 Update aarch64 libm-test-ulps 2020-04-08 13:52:44 -03:00
Adhemerval Zanella
1c15464ca0 math: Remove inline math tests
With mathinline removal there is no need to keep building and testing
inline math tests.

The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries.  The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2020-03-19 11:45:44 -03:00
Wilco Dijkstra
7000651327 [AArch64] Improve integer memcpy
Further optimize integer memcpy.  Small cases now include copies up
to 32 bytes.  64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies.  Comments have been rewritten.
2020-03-11 17:15:25 +00:00
Florian Weimer
f4349837d9 Introduce <elf-initfini.h> and ELF_INITFINI for all architectures
This supersedes the init_array sysdeps directory.  It allows us to
check for ELF_INITFINI in both C and assembler code, and skip DT_INIT
and DT_FINI processing completely on newer architectures.

A new header file is needed because <dl-machine.h> is incompatible
with assembler code.  <sysdep.h> is compatible with assembler code,
but it cannot be included in all assembler files because on some
architectures, it redefines register names, and some assembler files
conflict with that.

<elf-initfini.h> is replicated for legacy architectures which need
DT_INIT/DT_FINI support.  New architectures follow the generic default
and disable it.
2020-02-18 15:12:25 +01:00
Andreas Schwab
4970c9e0b5 nptl: add missing pthread-offsets.h
All architectures using their own definition of struct
__pthread_rwlock_arch_t need to provide their own pthread-offsets.h.
2020-02-10 17:01:21 +01:00
Wilco Dijkstra
220622dde5 Add libm_alias_finite for _finite symbols
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol.  It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).

The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.

Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).

Passes buildmanyglibc.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2020-01-03 10:02:04 -03:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Xuelei Zhang
863d775c48 aarch64: add default memcpy version for kunpeng920
Checked on aarch64-linux-gnu.
2019-12-27 11:59:37 -03:00
Xuelei Zhang
10df95cdaf aarch64: ifunc rename for kunpeng
Rename ifunc for kunpeng to kunpeng920, and modify the corresponding
function files including IS_KUNPENG920 judgement.

Checked on aarch64-linux-gnu.
2019-12-27 11:59:51 -03:00
Xuelei Zhang
64297d49b3 aarch64: Modify error-shown comments for strcpy
Checked on aarch64-linux-gnu.
2019-12-27 11:59:37 -03:00
Xuelei Zhang
525de033a9 aarch64: Optimized memset for Kunpeng processor.
Due to the branch prediction issue of Kunpeng processor, we found
memset_generic has poor performance on middle sizes setting, and so
we reconstructed the logic, expanded the loop by 4 times in set_long
to solve the problem, even when setting below 1K sizes have benefit.

Another change is that DZ_ZVA seems no work when setting zero, so we
discarded it and used set_long to set zero instead. Fewer branches and
predictions also make the zero case have slightly improvement.

Checked on aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2019-12-19 16:31:04 -03:00
Xuelei Zhang
c2150769d0 aarch64: Optimized strlen for strlen_asimd
Optimize the strlen implementation by using vector operations and
loop unrolling in main loop.Compared to __strlen_generic,it reduces
latency of cases in bench-strlen by 7%~18% when the length of src
is greater than 128 bytes, with gains throughout the benchmark.

Checked on aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2019-12-19 16:31:04 -03:00
Xuelei Zhang
a7611806d5 aarch64: Optimized implementation of memrchr
Considering the excellent performance of memchr.S on glibc 2.30, the
same algorithm is used to find chrin. Compared to memrchr.c, this
method with memrchr.S achieves an average performance improvement
of 58% based on benchtest and its extension cases.

Checked on aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2019-12-19 16:31:04 -03:00
Xuelei Zhang
2911cb68ed aarch64: Optimized implementation of strnlen
Optimize the strlen implementation by using vector operations and
loop unrooling in main loop. Compared to aarch64/strnlen.S, it
reduces latency of cases in bench-strnlen by 11%~24% when the length
of src is greater than 64 bytes, with gains throughout the benchmark.

Checked on aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2019-12-19 16:31:04 -03:00
Xuelei Zhang
0237b61526 aarch64: Optimized implementation of strcpy
Optimize the strcpy implementation by using vector loads and operations
in main loop.Compared to aarch64/strcpy.S, it reduces latency of cases
in bench-strlen by 5%~18% when the length of src is greater than 64
bytes, with gains throughout the benchmark.

Checked on aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2019-12-19 16:31:04 -03:00
Xuelei Zhang
233efd433d aarch64: Optimized implementation of memcmp
The loop body is expanded from a 16-byte comparison to a 64-byte
comparison, and the usage of ldp is replaced by the Post-index
mode to the Base plus offset mode. Hence, compare can faster 18%
around > 128 bytes in all.

Checked on aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2019-12-19 16:31:04 -03:00
Florian Weimer
4db71d2f98 elf: Do not run IFUNC resolvers for LD_DEBUG=unused [BZ #24214]
This commit adds missing skip_ifunc checks to aarch64, arm, i386,
sparc, and x86_64.  A new test case ensures that IRELATIVE IFUNC
resolvers do not run in various diagnostic modes of the dynamic
loader.

Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>
2019-12-02 14:55:22 +01:00
Adhemerval Zanella
7ddac7f265 nptl: Add default pthread-offsets.h
This patch adds a default pthread-offsets.h based on default
thread definitions from struct_mutex.h and struct_rwlock.h.
The idea is to simplify new ports inclusion.

Checked with a build on affected abis.

Change-Id: I7785a9581e651feb80d1413b9e03b5ac0452668a
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
7df8af43ad nptl: Add struct_rwlock.h
This patch adds a new generic __pthread_rwlock_arch_t definition meant
to be used by new ports.  Its layout mimics the current usage on some
64 bits ports and it allows some ports to use the generic definition.
The arch __pthread_rwlock_arch_t definition is moved from
pthreadtypes-arch.h to another arch-specific header (struct_rwlock.h).

Also the static intialization macro for pthread_rwlock_t is set to use
an arch defined on (__PTHREAD_RWLOCK_INITIALIZER) which simplifies its
implementation.

The default pthread_rwlock_t layout differs from current ports with:

  1. Internal layout is the same for 32 bits and 64 bits.

  2. Internal flag is an unsigned short so it should not required
     additional padding to align for word boundary (if it is the case
     for the ABI).

Checked with a build on affected abis.

Change-Id: I776a6a986c23199929d28a3dcd30272db21cd1d0
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
1c3f9acf1f nptl: Add struct_mutex.h
The current way of defining the common mutex definition for POSIX and
C11 on pthreadtypes-arch.h (added by commit 06be6368da) is
not really the best options for newer ports.  It requires define some
misleading flags that should be always defined as 0
(__PTHREAD_COMPAT_PADDING_MID and __PTHREAD_COMPAT_PADDING_END), it
exposes options used solely for linuxthreads compat mode
(__PTHREAD_MUTEX_USE_UNION and __PTHREAD_MUTEX_NUSERS_AFTER_KIND), and
requires newer ports to explicit define them (adding more boilerplate
code).

This patch adds a new default __pthread_mutex_s definition meant to
be used by newer ports.  Its layout mimics the current usage on both
32 and 64 bits ports and it allows most ports to use the generic
definition.  Only ports that use some arch-specific definition (such
as hardware lock-elision or linuxthreads compat) requires specific
headers.

For 32 bit, the generic definitions mimic the other 32-bit ports
of using an union to define the fields uses on adaptive and robust
mutexes (thus not allowing both usage at same time) and by using a
single linked-list for robust mutexes.  Both decisions seemed to
follow what recent ports have done and make the resulting
pthread_mutex_t/mtx_t object smaller.

Also the static intialization macro for pthread_mutex_t is set to use
a macro __PTHREAD_MUTEX_INITIALIZER where the architecture can redefine
in its struct_mutex.h if it requires additional fields to be
initialized.

Checked with a build on affected abis.

Change-Id: I30a22c3e3497805fd6e52994c5925897cffcfe13
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
0377a7fde6 nptl: Remove rwlock elision definitions
The new rwlock implementation added by cc25c8b4c1 (2.25) removed
support for lock-elision.  This patch removes remaining the
arch-specific unused definitions.

Checked with a build against all affected ABIs.

Change-Id: I5dec8af50e3cd56d7351c52ceff4aa3771b53cd6
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
48dbce60cf nptl: Add tests for internal pthread_rwlock_t offsets
This patch new build tests to check for internal fields offsets for
internal pthread_rwlock_t definition.  Althoug the '__data.__flags'
field layout should be preserved due static initializators, the patch
also adds tests for the futexes that may be used in a shared memory
(although using different libc version in such scenario is not really
supported).

Checked with a build against all affected ABIs.

Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
71d260c107 nptl: Cleanup mutex internal offset tests
The offsets of pthread_mutex_t __data.__nusers, __data.__spins,
__data.elision, __data.list are not required to be constant over
the releases.  Only the __data.__kind is used for static
initializers.

This patch also adds an additional size check for __data.__kind.

Checked with a build against affected ABIs.

Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f
2019-11-26 13:53:36 +00:00
Krzysztof Koch
b9f145df85 aarch64: Increase small and medium cases for __memcpy_generic
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.

Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.

Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html

Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final version of memcpy.S

Tested against GLIBC testsuite and randomized tests.
2019-11-12 17:08:18 +00:00
Alistair Francis
aa706e13f4 Split up endian.h to minimize exposure of BYTE_ORDER.
With only two exceptions (sys/types.h and sys/param.h, both of which
historically might have defined BYTE_ORDER) the public headers that
include <endian.h> only want to be able to test __BYTE_ORDER against
__*_ENDIAN.

This patch creates a new bits/endian.h that can be included by any
header that wants to be able to test __BYTE_ORDER and/or
__FLOAT_WORD_ORDER against the __*_ENDIAN constants, or needs
__LONG_LONG_PAIR.  It only defines macros in the implementation
namespace.

The existing bits/endian.h (which could not be included independently
of endian.h, and only defines __BYTE_ORDER and maybe __FLOAT_WORD_ORDER)
is renamed to bits/endianness.h.  I also took the opportunity to
canonicalize the form of this header, which we are stuck with having
one copy of per architecture.  Since they are so short, this means git
doesn’t understand that they were renamed from existing headers, sigh.

endian.h itself is a nonstandard header and its only remaining use
from a standard header is guarded by __USE_MISC, so I dropped the
__USE_MISC conditionals from around all of the public-namespace things
it defines.  (This means, an application that requests strict library
conformance but includes endian.h will still see the definition of
BYTE_ORDER.)

A few changes to specific bits/endian(ness).h variants deserve
mention:

 - sysdeps/unix/sysv/linux/ia64/bits/endian.h is moved to
   sysdeps/ia64/bits/endianness.h.  If I remember correctly, ia64 did
   have selectable endianness, but we have assembly code in
   sysdeps/ia64 that assumes it’s little-endian, so there is no reason
   to treat the ia64 endianness.h as linux-specific.

 - The C-SKY port does not fully support big-endian mode, the compile
   will error out if __CSKYBE__ is defined.

 - The PowerPC port had extra logic in its bits/endian.h to detect a
   broken compiler, which strikes me as unnecessary, so I removed it.

 - The only files that defined __FLOAT_WORD_ORDER always defined it to
   the same value as __BYTE_ORDER, so I removed those definitions.
   The SH bits/endian(ness).h had comments inconsistent with the
   actual setting of __FLOAT_WORD_ORDER, which I also removed.

 - I *removed* copyright boilerplate from the few bits/endian(ness).h
   headers that had it; these files record a single fact in a fashion
   dictated by an external spec, so I do not think they are copyrightable.

As long as I was changing every copy of ieee754.h in the tree, I
noticed that only the MIPS variant includes float.h, because it uses
LDBL_MANT_DIG to decide among three different versions of
ieee854_long_double.  This patch makes it not include float.h when
GCC’s intrinsic __LDBL_MANT_DIG__ is available.

	* string/endian.h: Unconditionally define LITTLE_ENDIAN,
	BIG_ENDIAN, PDP_ENDIAN, and BYTE_ORDER.	 Condition byteswapping
	macros only on !__ASSEMBLER__.	Move the definitions of
	__BIG_ENDIAN, __LITTLE_ENDIAN, __PDP_ENDIAN, __FLOAT_WORD_ORDER,
	and __LONG_LONG_PAIR to...
	* string/bits/endian.h: ...this new file, which includes
	the renamed header bits/endianness.h for the definition of
	__BYTE_ORDER and possibly __FLOAT_WORD_ORDER.

	* string/Makefile: Install bits/endianness.h.
	* include/bits/endian.h: New wrapper.

	* bits/endian.h: Rename to bits/endianness.h.
	Add multiple-include guard.  Rewrite the comment explaining what
	the machine-specific variants of this file should do.

	* sysdeps/unix/sysv/linux/ia64/bits/endian.h:
	Move to sysdeps/ia64.

	* sysdeps/aarch64/bits/endian.h
	* sysdeps/alpha/bits/endian.h
	* sysdeps/arm/bits/endian.h
	* sysdeps/csky/bits/endian.h
	* sysdeps/hppa/bits/endian.h
	* sysdeps/ia64/bits/endian.h
	* sysdeps/m68k/bits/endian.h
	* sysdeps/microblaze/bits/endian.h
	* sysdeps/mips/bits/endian.h
	* sysdeps/nios2/bits/endian.h
	* sysdeps/powerpc/bits/endian.h
	* sysdeps/riscv/bits/endian.h
	* sysdeps/s390/bits/endian.h
	* sysdeps/sh/bits/endian.h
	* sysdeps/sparc/bits/endian.h
	* sysdeps/x86/bits/endian.h:
	Rename to endianness.h; canonicalize form of file; remove
	redundant definitions of __FLOAT_WORD_ORDER.

	* sysdeps/powerpc/bits/endianness.h: Remove logic to check for
	broken compilers.

	* ctype/ctype.h
	* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
	* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
	* sysdeps/csky/nptl/bits/pthreadtypes-arch.h
	* sysdeps/ia64/ieee754.h
	* sysdeps/ieee754/ieee754.h
	* sysdeps/ieee754/ldbl-128/ieee754.h
	* sysdeps/ieee754/ldbl-128ibm/ieee754.h
	* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
	* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
	* sysdeps/mips/ieee754/ieee754.h
	* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
	* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
	* sysdeps/nptl/pthread.h
	* sysdeps/riscv/nptl/bits/pthreadtypes-arch.h
	* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
	* sysdeps/sparc/sparc32/ieee754.h
	* sysdeps/unix/sysv/linux/generic/bits/stat.h
	* sysdeps/unix/sysv/linux/generic/bits/statfs.h
	* sysdeps/unix/sysv/linux/sys/acct.h
	* wctype/bits/wctype-wchar.h:
	Include bits/endian.h, not endian.h.

	* sysdeps/unix/sysv/linux/hppa/pthread.h: Don’t include endian.h.

	* sysdeps/mips/ieee754/ieee754.h: Use __LDBL_MANT_DIG__
	in ifdefs, instead of LDBL_MANT_DIG.  Only include float.h
	when __LDBL_MANT_DIG__ is not predefined, in which case
	define __LDBL_MANT_DIG__ to equal LDBL_MANT_DIG.
2019-10-01 14:54:46 -07:00