Commit Graph

42 Commits

Author SHA1 Message Date
H.J. Lu
dc76a059fd Add a generic malloc test for MALLOC_ALIGNMENT
1. Add sysdeps/generic/malloc-size.h to define size related macros for
malloc.
2. Move x86_64/tst-mallocalign1.c to malloc and replace ALIGN_MASK with
MALLOC_ALIGN_MASK.
3. Add tst-mallocalign1 to tests-exclude-mcheck for i386 and x32 since
mcheck doesn't honor MALLOC_ALIGNMENT.
2021-07-09 06:39:30 -07:00
H.J. Lu
a6e7c3745d x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064]
commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Jun 23 01:19:34 2021 -0400

    x86-64: Add wcslen optimize for sse4.1

added wcsnlen-sse4.1 to the wcslen ifunc implementation list.  Since the
random value in the the RSI register is larger than the wide-character
string length in the existing wcslen test, it didn't trigger the wcslen
test failure.  Add a test to force 0 into the RSI register before calling
wcslen.
2021-07-08 18:55:40 -04:00
H.J. Lu
b1ec623ed5 x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591]
config/i386/constraints.md in GCC has

(define_constraint "e"
  "32-bit signed integer constant, or a symbolic reference known
   to fit that range (for immediate operands in sign-extending x86-64
   instructions)."
  (match_operand 0 "x86_64_immediate_operand"))

Since movq takes a signed 32-bit immediate or a register source operand,
use "er", instead of "nr"/"ir", constraint for 32-bit signed integer
constant or register on movq.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-04-01 07:00:22 -07:00
Florian Weimer
f267e1c9dd x86_64: Add glibc-hwcaps support
The subdirectories match those in the x86-64 psABI:

77566eb03b

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-04 09:36:02 +01:00
H.J. Lu
107e6a3c22 x86: Support usable check for all CPU features
Support usable check for all CPU features with the following changes:

1. Change struct cpu_features to

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};

so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits.  EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.

The results are

1. If the feature is known and doesn't requre OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to supported by OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
2020-07-13 06:05:16 -07:00
H.J. Lu
9016b6f389 x86: Remove the unused __x86_prefetchw
Since

commit c867597bff
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jun 8 13:57:50 2016 -0700

    X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

removed the only usage of __x86_prefetchw, we can remove the unused
__x86_prefetchw.
2020-07-11 09:34:03 -07:00
Siddhesh Poyarekar
dce452dc52 Rename the glibc.tune namespace to glibc.cpu
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to.  Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses.  Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.

	* NEWS: Mention the change.
	* elf/dl-tunables.list: Rename tune namespace to cpu.
	* sysdeps/powerpc/dl-tunables.list: Likewise.
	* sysdeps/x86/dl-tunables.list: Likewise.
	* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
	cpu.name.
	* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
	* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
	* manual/README.tunables: Likewise.
	* manual/tunables.texi: Likewise.
	* sysdeps/powerpc/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
	(init_cpu_features): Likewise.
	* sysdeps/x86/cpu-features.c: Likewise.
	* sysdeps/x86/cpu-features.h: Likewise.
	* sysdeps/x86/cpu-tunables.c: Likewise.
	* sysdeps/x86_64/Makefile: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-08-02 23:49:19 +05:30
H.J. Lu
b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
H.J. Lu
4d916f0f12 x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299]
Since ld.so expands $PLATFORM with GLRO(dl_platform), don't set
GLRO(dl_platform) to NULL.

	[BZ #22299]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
	GLRO(dl_platform) to NULL.
	* sysdeps/x86_64/Makefile (tests): Add tst-platform-1.
	(modules-names): Add tst-platformmod-1 and
	x86_64/tst-platformmod-2.
	(CFLAGS-tst-platform-1.c): New.
	(CFLAGS-tst-platformmod-1.c): Likewise.
	(CFLAGS-tst-platformmod-2.c): Likewise.
	(LDFLAGS-tst-platformmod-2.so): Likewise.
	($(objpfx)tst-platform-1): Likewise.
	($(objpfx)tst-platform-1.out): Likewise.
	(tst-platform-1-ENV): Likewise.
	($(objpfx)x86_64/tst-platformmod-2.os): Likewise.
	* sysdeps/x86_64/tst-platform-1.c: New file.
	* sysdeps/x86_64/tst-platformmod-1.c: Likewise.
	* sysdeps/x86_64/tst-platformmod-2.c: Likewise.
2017-10-19 08:28:26 -07:00
H.J. Lu
45ff34638f x86: Add x86_64 to x86-64 HWCAP [BZ #22093]
Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the
"x86_64" subdirectory when loading a shared library.  ld.so in glibc
2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based
on supported ISAs.  This led to shared library loading failure for
shared libraries placed under the "x86_64" subdirectory.

This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always
search the "x86_64" subdirectory when loading a shared library.

NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip
the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi"
machines.

Tested on i686 and x86-64.

	[BZ #22093]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Initialize
	GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64.
	* sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated.
	(HWCAP_IMPORTANT): Likewise.
	(HWCAP_X86_64): New enum.
	(HWCAP_X86_AVX512_1): Updated.
	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64".
	* sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1.
	(modules-names): Add x86_64/tst-x86_64mod-1.
	(LDFLAGS-tst-x86_64mod-1.so): New.
	($(objpfx)tst-x86_64-1): Likewise.
	($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise.
	(tst-x86_64-1-clean): Likewise.
	* sysdeps/x86_64/tst-x86_64-1.c: New file.
	* sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
2017-09-11 08:18:32 -07:00
H.J. Lu
19f1a11e7e Check linker support for INSERT in linker script
Since gold doesn't support INSERT in linker script:

https://sourceware.org/bugzilla/show_bug.cgi?id=21676

tst-split-dynreloc fails to link with gold.  Check if linker supports
INSERT in linker script before using it.

	* config.make.in (have-insert): New.
	* configure.ac (libc_cv_insert): New.  Set to yes if linker
	supports INSERT in linker script.
	(AC_SUBST(libc_cv_insert): New.
	* configure: Regenerated.
	* sysdeps/x86_64/Makefile (tests): Add tst-split-dynreloc only
	if $(have-insert) == yes.
2017-08-04 12:17:30 -07:00
H.J. Lu
031e519c95 x86-64: Align the stack in __tls_get_addr [BZ #21609]
This change forces realignment of the stack pointer in __tls_get_addr, so
that binaries compiled by GCCs older than GCC 4.9:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

continue to work even if vector instructions are used in glibc which
require the ABI stack realignment.

__tls_get_addr_slow is added to handle the slow paths in the default
implementation of__tls_get_addr in elf/dl-tls.c.  The new __tls_get_addr
calls __tls_get_addr_slow after realigning the stack.  Internal calls
within ld.so go directly to the default implementation of __tls_get_addr
because they do not need stack realignment.

	[BZ #21609]
	* sysdeps/x86_64/Makefile (sysdep-dl-routines): Add tls_get_addr.
	(gen-as-const-headers): Add rtld-offsets.sym.
	* sysdeps/x86_64/dl-tls.c: New file.
	* sysdeps/x86_64/rtld-offsets.sym: Likwise.
	* sysdeps/x86_64/tls_get_addr.S: Likewise.
	* sysdeps/x86_64/dl-tls.h: Add multiple inclusion guards.
	* sysdeps/x86_64/tlsdesc.sym (TI_MODULE_OFFSET): New.
	(TI_OFFSET_OFFSET): Likwise.
2017-07-06 04:43:20 -07:00
H.J. Lu
3403a17fea x86-64: Verify that _dl_runtime_resolve preserves vector registers
On x86-64, _dl_runtime_resolve must preserve the first 8 vector
registers.  Add 3 _dl_runtime_resolve tests to verify that SSE,
AVX and AVX512 registers are preserved.

	* sysdeps/x86_64/Makefile (tests): Add tst-sse, tst-avx and
	tst-avx512.
	(test-extras): Add tst-avx-aux and tst-avx512-aux.
	(extra-test-objs): Add tst-avx-aux.o and tst-avx512-aux.o.
	(modules-names): Add tst-ssemod, tst-avxmod and tst-avx512mod.
	($(objpfx)tst-sse): New rule.
	($(objpfx)tst-avx): Likewise.
	($(objpfx)tst-avx512): Likewise.
	(CFLAGS-tst-avx-aux.c): New.
	(CFLAGS-tst-avxmod.c): Likewise.
	(CFLAGS-tst-avx512-aux.c): Likewise.
	(CFLAGS-tst-avx512mod.c): Likewise.
	* sysdeps/x86_64/tst-avx-aux.c: New file.
	* sysdeps/x86_64/tst-avx.c: Likewise.
	* sysdeps/x86_64/tst-avx512-aux.c: Likewise.
	* sysdeps/x86_64/tst-avx512.c: Likewise.
	* sysdeps/x86_64/tst-avx512mod.c: Likewise.
	* sysdeps/x86_64/tst-avxmod.c: Likewise.
	* sysdeps/x86_64/tst-sse.c: Likewise.
	* sysdeps/x86_64/tst-ssemod.c: Likewise.
2017-02-09 12:19:58 -08:00
Nick Alcock
fcd942370f x86_64: tst-quad1pie, tst-quad2pie: compile with -fPIE [BZ #7065]
With stack protection enabled, these files have external symbol
references for the first time, so the fact that they are not compiled
with -fPIE and are then linked into a -pie binary starts to hurt.
2016-12-21 12:04:12 +01:00
Andreas Schwab
b4bcb3aec6 Register extra test objects
This makes sure that the extra test objects are compiled with the correct
MODULE_NAME and dependencies are tracked.
2016-04-13 17:07:13 +02:00
Florian Weimer
3c0f7407ee tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269]
This ensures that GCC will not use unsupported instructions before
the run-time check to ensure support.
2016-03-07 16:00:25 +01:00
H.J. Lu
97f7112728 Add a comment in sysdeps/x86_64/Makefile
Mention recursive calls when ENTRY is used in _mcount.S.

	* sysdeps/x86_64/Makefile (sysdep_noprof): Add a comment.
2016-03-04 08:44:58 -08:00
H.J. Lu
87a07a4376 Copy x86_64 _mcount.op from _mcount.o
No need to compile x86_64 _mcount.S with -pg.  We can just copy the
normal static object.

	* gmon/Makefile (noprof): Add $(sysdep_noprof).
	* sysdeps/x86_64/Makefile (sysdep_noprof): Add _mcount.
2016-03-03 06:56:22 -08:00
Joseph Myers
ae5d8eaed0 Remove configure tests for -mno-vzeroupper support.
GCC added support for -mno-vzeroupper in version 4.6.  Thus the
configure tests for this support are obsolete, and this patch removes
them.

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).

	* sysdeps/i386/configure.ac (libc_cv_cc_novzeroupper): Remove
	configure test.
	* sysdeps/i386/configure: Regenerated.
	* sysdeps/x86_64/configure.ac (libc_cv_cc_novzeroupper): Remove
	configure test.
	* sysdeps/x86_64/configure: Regenerated.
	* sysdeps/x86_64/Makefile [$(config-cflags-novzeroupper) = yes]:
	Make code unconditional.
2015-10-09 16:03:48 +00:00
Joseph Myers
1b12cd7f4d Remove configure tests for AVX support.
GCC added support for -mavx and -msse2avx in version 4.4.  Thus the
configure tests for this support are obsolete, and this patch removes
them.

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).

	* sysdeps/i386/configure.ac (libc_cv_cc_avx): Remove configure
	test.
	(libc_cv_cc_sse2avx): Likewise.
	* sysdeps/i386/configure: Regenerated.
	* sysdeps/i386/i686/multiarch/Makefile
	[$(subdir)$(config-cflags-avx) = mathyes]: Change conditional to
	[$(subdir) = math].
	* sysdeps/i386/i686/multiarch/s_fma-fma.c [HAVE_AVX_SUPPORT]: Make
	code unconditional.
	* sysdeps/i386/i686/multiarch/s_fma.c [HAVE_AVX_SUPPORT]:
	Likewise.
	* sysdeps/i386/i686/multiarch/s_fmaf-fma.c [HAVE_AVX_SUPPORT]:
	Likewise.
	* sysdeps/i386/i686/multiarch/s_fmaf.c [HAVE_AVX_SUPPORT]:
	Likewise.
	* sysdeps/x86_64/configure.ac (libc_cv_cc_avx): Remove configure
	test.
	(libc_cv_cc_sse2avx): Likewise.
	* sysdeps/x86_64/configure: Regenerated.
	* sysdeps/x86_64/Makefile [$(config-cflags-avx) = yes]: Make code
	unconditional.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_profile)
	[HAVE_AVX_SUPPORT || HAVE_AVX512_ASM_SUPPORT]: Make code
	unconditional.
	(_dl_runtime_profile)
	[!(HAVE_AVX_SUPPORT || HAVE_AVX512_ASM_SUPPORT)]: Remove
	conditional code.
	* sysdeps/x86_64/fpu/multiarch/Makefile
	[$(config-cflags-sse2avx) = yes]: Make code unconditional.
	* sysdeps/x86_64/fpu/multiarch/e_atan2.c
	[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp.c
	[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log.c
	[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_atan.c
	[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_fma.c [HAVE_AVX_SUPPORT]:
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_fmaf.c [HAVE_AVX_SUPPORT]:
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sin.c
	[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_tan.c
	[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Likewise.
	* sysdeps/x86_64/multiarch/strcmp.S [HAVE_AVX_SUPPORT]: Likewise.
	* config.h.in (HAVE_AVX_SUPPORT): Remove #undef.
	(HAVE_SSE2AVX_SUPPORT): Likewise.
2015-10-08 15:59:32 +00:00
H.J. Lu
38d22f9f48 Don't disable SSE in x86-64 ld.so
Since x86-64 ld.so preserves vector registers now, we can use SSE in
x86-64 ld.so.  We should run tst-ld-sse-use.sh only on i386.

	* sysdeps/x86/Makefile [$(subdir) == elf] (CFLAGS-.os,
	tests-special, $(objpfx)tst-ld-sse-use.out): Moved to ...
	* sysdeps/i386/Makefile [$(subdir) == elf] (CFLAGS-.os,
	tests-special, $(objpfx)tst-ld-sse-use.out): Here.  Update
	comments.
	* sysdeps/x86_64/Makefile [$(subdir) == elf] (CFLAGS-.os): Add
	-mno-mmx for $(all-rtld-routines).
	* sysdeps/x86/tst-ld-sse-use.sh: Moved to ...
	* sysdeps/i386/tst-ld-sse-use.sh: Here.  Replace x86-64 with
	i386.
2015-08-26 07:55:42 -07:00
H.J. Lu
f3dcae82d5 Save and restore vector registers in x86-64 ld.so
This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve
and _dl_runtime_profile, which save and restore the first 8 vector
registers used for parameter passing.  elf_machine_runtime_setup
selects the proper _dl_runtime_resolve or _dl_runtime_profile based
on _dl_x86_cpu_features.  It avoids race condition caused by
FOREIGN_CALL macros, which are only used for x86-64.

Performance impact of saving and restoring 8 vector registers are
negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when
ld.so is optimized with SSE2.

	[BZ #15128]
	* sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add
	ifuncmain8.
	(modules-names): Add ifuncmod8.
	($(objpfx)ifuncmain8): New rule.
	* sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and
	<cpuid.h>.
	(elf_machine_runtime_setup): Use _dl_runtime_resolve_sse,
	_dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512,
	_dl_runtime_profile_sse, _dl_runtime_profile_avx, or
	_dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE.
	* sysdeps/x86_64/dl-trampoline.S: Rewrite.
	* sysdeps/x86_64/dl-trampoline.h: Likewise.
	* sysdeps/x86_64/ifuncmain8.c: New file.
	* sysdeps/x86_64/ifuncmod8.c: Likewise.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE):
	Removed.
	* sysdeps/x86_64/nptl/tls.h (__128bits): Removed.
	(tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1.
	Change rtld_savespace_sse to __glibc_unused2.
	(RTLD_CHECK_FOREIGN_CALL): Removed.
	(RTLD_ENABLE_FOREIGN_CALL): Likewise.
	(RTLD_PREPARE_FOREIGN_CALL): Likewise.
	(RTLD_FINALIZE_FOREIGN_CALL): Likewise.
2015-08-25 04:34:13 -07:00
Petar Jovanovic
fa19d5c48a Fix dynamic linker issue with bind-now
Fix the bind-now case when DT_REL and DT_JMPREL sections are separate
and there is a gap between them.

	[BZ #14341]
	* elf/dynamic-link.h (elf_machine_lazy_rel): Properly handle the
	case when there is a gap between DT_REL and DT_JMPREL sections.
	* sysdeps/x86_64/Makefile (tests): Add tst-split-dynreloc.
	(LDFLAGS-tst-split-dynreloc): New.
	(tst-split-dynreloc-ENV): Likewise.
	* sysdeps/x86_64/tst-split-dynreloc.c: New file.
	* sysdeps/x86_64/tst-split-dynreloc.lds: Likewise.
2015-08-19 05:37:01 -07:00
Roland McGrath
ac9e0e5e40 Clean up sysdep-dl-routines variable. 2015-02-06 10:42:08 -08:00
Richard Henderson
428dd03f5a Remove HP_TIMING_DIFF_INIT and dl_hp_timing_overhead
Without HP_TIMING_ACCUM, dl_hp_timing_overhead is write-only.
If we remove it, there's no point in HP_TIMING_DIFF_INIT either.
2014-07-03 08:38:25 -07:00
Igor Zamyatin
2d63a517e4 Save and restore AVX-512 zmm registers to x86-64 ld.so
AVX-512 ISA adds 512-bit zmm registers.  This patch updates
_dl_runtime_profile to pass zmm registers to run-time audit. It also
changes _dl_x86_64_save_sse and _dl_x86_64_restore_sse to upport zmm
registers, which are called when only when RTLD_PREPARE_FOREIGN_CALL
is used.  Its performance impact is minimum.

	* config.h.in (HAVE_AVX512_SUPPORT): New #undef.
	(HAVE_AVX512_ASM_SUPPORT): Likewise.
	* sysdeps/x86_64/bits/link.h (La_x86_64_zmm): New.
	(La_x86_64_vector): Add zmm.
	* sysdeps/x86_64/Makefile (tests): Add tst-audit10.
	(modules-names): Add tst-auditmod10a and tst-auditmod10b.
	($(objpfx)tst-audit10): New target.
	($(objpfx)tst-audit10.out): Likewise.
	(tst-audit10-ENV): New.
	(AVX512-CFLAGS): Likewise.
	(CFLAGS-tst-audit10.c): Likewise.
	(CFLAGS-tst-auditmod10a.c): Likewise.
	(CFLAGS-tst-auditmod10b.c): Likewise.
	* sysdeps/x86_64/configure.ac: Set config-cflags-avx512,
	HAVE_AVX512_SUPPORT and HAVE_AVX512_ASM_SUPPORT.
	* sysdeps/x86_64/configure: Regenerated.
	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Add
	AVX-512 zmm register support.
	(_dl_x86_64_save_sse): Likewise.
	(_dl_x86_64_restore_sse): Likewise.
	* sysdeps/x86_64/dl-trampoline.h: Updated to support different
	size vector registers.
	* sysdeps/x86_64/link-defines.sym (YMM_SIZE): New.
	(ZMM_SIZE): Likewise.
	* sysdeps/x86_64/tst-audit10.c: New file.
	* sysdeps/x86_64/tst-auditmod10a.c: Likewise.
	* sysdeps/x86_64/tst-auditmod10b.c: Likewise.
2014-03-13 11:19:08 -07:00
Joseph Myers
73709b2611 Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
H.J. Lu
eb48db7e89 Also run tst-xmmymm.sh on i386 ld.so 2012-11-07 13:50:08 -08:00
Dmitry V. Levin
57c69bef13 Set "fail on error" mode directly in testsuite shell scripts 2012-09-25 02:48:31 +00:00
H.J. Lu
477cc68e90 Add tst-mallocalign1 2012-05-17 09:55:25 -07:00
H.J. Lu
df8a552f6f Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
Ulrich Drepper
e9f82e0d1d Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
Ulrich Drepper
42e08a5438 Implement optimized strcaecmp for x86-64. 2010-07-30 00:14:04 -07:00
Ulrich Drepper
e83c1a8a72 Refine testing for xmm/ymm register use in x86-64 ld.so.
The test now takes the callgraph into account.  Only code called
during runtime relocation is affected by the limitation.  We now
determine the affected object files as closely as possible from
the outside.  This allowed to remove some the specializations
for some of the string functions as they are only used in other
code paths.
2009-07-27 13:40:27 -07:00
Ulrich Drepper
16d2ea4c82 Make sure no code in ld.so uses xmm/ymm registers on x86-64.
This patch introduces a test to make sure no function modifies the
xmm/ymm registers.  With the exception of the auditing functions.

The test is probably too pessimistic.  All code linked into ld.so
is checked.  Perhaps at some point the callgraph starting from
_dl_fixup and _dl_profile_fixup is checked and we can start using
faster SSE-using functions in parts of ld.so.
2009-07-26 16:10:00 -07:00
H.J. Lu
b0ecde3a63 Add AVX support to ld.so auditing for x86-64. 2009-07-10 12:04:14 -07:00
Ulrich Drepper
c9ff0187a6 Introduce TLS descriptors for i386 and x86_64.
* include/inline-hashtab.h: New file, copied from 2005's
	libiberty, with fix for memory leak imported afterwards by
	Glauber de Oliveira Costa.
	* elf/tlsdeschtab.h: New file.
	* elf/dl-reloc.c (_dl_try_allocate_static_tls): Extract from...
	(_dl_allocate_static_tls): ... here.  Rearrange failure path.
	(CHECK_STATIC_TLS): Move to...
	* elf/dynamic-link.h: ... this file.
	(TRY_STATIC_TLS): New macro.
	* elf/dl-conflict.c (CHECK_STATIC_TLS, TRY_STATIC_TLS): Override.
	* elf/elf.h (R_386_TLS_GOTDESC, R_386_TLS_DESC_CALL,
	R_386_TLS_DESC): Define.
	(R_X86_64_PC64, R_X86_GOTOFF64, R_X86_64_GOTPC32): Merge from
	binutils.
	(R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL,
	R_X86_64_TLSDESC): Define.
	(R_386_NUM, R_X86_64_NUM): Adjust.
	* sysdeps/i386/Makefile (sysdep-dl-routines, sysdep_routines,
	systep-rtld-routines): Add tlsdesc and dl-tlsdesc for elf subdir.
	(gen-as-const-headers): Add tlsdesc.sym to csu subdir.
	* sysdeps/i386/dl-lookupcfg.h: New file.  Introduce _dl_unmap to
	release tlsdesc_table.
	* sysdeps/i386/dl-machine.h: Include dl-tlsdesc.h.
	(elf_machine_type_class): Mark R_386_TLS_DESC as PLT class.
	(elf_machine_rel): Handle R_386_TLS_DESC.
	(elf_machine_rela): Likewise.
	(elf_machine_lazy_rel): Likewise.
	(elf_machine_lazy_rela): Likewise.
	* sysdeps/i386/dl-tls.h (struct dl_tls_index): Name it.
	* sysdeps/i386/dl-tlsdesc.S: New file.
	* sysdeps/i386/dl-tlsdesc.h: New file.
	* sysdeps/i386/tlsdesc.c: New file.
	* sysdeps/i386/tlsdesc.sym: New file.
	* sysdeps/i386/bits/linkmap.h (struct link_map_machine): Add
	tlsdesc_table.
	* sysdeps/x86_64/Makefile (sysdep-dl-routines, sysdep_routines,
	systep-rtld-routines): Add tlsdesc and dl-tlsdesc for elf subdir.
	(gen-as-const-headers): Add tlsdesc.sym to csu subdir.
	* sysdeps/x86_64/dl-lookupcfg.h: New file.  Introduce _dl_unmap to
	release tlsdesc_table.
	* sysdeps/x86_64/dl-machine.h: Include dl-tlsdesc.h.
	(elf_machine_runtime_setup): Set up lazy TLSDESC GOT entry.
	(elf_machine_type_class): Mark R_X86_64_TLSDESC as PLT class.
	(elf_machine_rel): Handle R_X86_64_TLSDESC.
	(elf_machine_rela): Likewise.
	(elf_machine_lazy_rel): Likewise.
	* sysdeps/x86_64/dl-tls.h (struct dl_tls_index): Name it.
	(__tls_get_addr): Do not declare for non-shared compiles.
	* sysdeps/x86_64/dl-tlsdesc.S: New file.
	* sysdeps/x86_64/dl-tlsdesc.h: New file.
	* sysdeps/x86_64/tlsdesc.c: New file.
	* sysdeps/x86_64/tlsdesc.sym: New file.
	* sysdeps/x86_64/bits/linkmap.h (struct link_map_machine): Add
	tlsdesc_table for both 32- and 64-bit structs.
2008-05-13 05:41:30 +00:00
Ulrich Drepper
bfe6f5fa89 * sysdeps/unix/sysv/linux/x86_64/sysconf.c: Move cache information
handling to ...
	* sysdeps/x86_64/cacheinfo.c: ... here.  New file.
	* sysdeps/x86_64/Makefile [subdir=string] (sysdep_routines): Add
	cacheinfo.
	* sysdeps/x86_64/memcpy.S: Complete rewrite.
	* sysdeps/x86_64/mempcpy.S: Adjust appropriately.
	Patch by Evandro Menezes <evandro.menezes@amd.com>.

	* sysdeps/unix/sysv/linux/i386/epoll_pwait.S: New file.
2007-05-21 19:21:48 +00:00
Roland McGrath
813d9c0dd8 * sysdeps/i386/i686/Makefile (elide-routines.os): Append hp-timing to
this, not ...
	(static-only-routines): ... this.
	* sysdeps/ia64/Makefile: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/Makefile: Likewise.
	* sysdeps/sparc/sparc64/Makefile: Likewise.
	* sysdeps/x86_64/Makefile: Likewise.
	* sysdeps/i386/i686/hp-timing.c: Revert copyright terms change.
	* sysdeps/ia64/hp-timing.c: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/hp-timing.c: Likewise.
	* sysdeps/sparc/sparc64/hp-timing.c: Likewise.
2004-08-16 06:46:31 +00:00
Andreas Jaeger
6dbdc56c3d Update.
2002-08-21  Andreas Jaeger  <aj@suse.de>

	* sysdeps/x86_64/sysdep.h (CALL_MCOUNT): Fix it.

	* sysdeps/x86_64/Makefile (sysdep_routines): Add _mcount.

	* sysdeps/x86_64/machine-gmon.h: New.
	* sysdeps/x86_64/_mcount.S: New.
2002-08-21 07:57:48 +00:00
Ulrich Drepper
ccdf0cab1d Update.
* elf/dl-minimal.c: Define _itoa for 32-bit machines with HP timing.
	* elf/dl-reloc.c: Pretty printing.
	* sysdeps/generic/ldsodefs.h: Move _dl_hp_timing_overhead and
	procinfo-related variables in rtld_global struct.
	* elf/dl-support.c: Likewise.
	* elf/rtld.c: Likewise.
	* sysdeps/i386/i686/Makefile: Likewise.
	* sysdeps/i386/i686/hp-timing.c: Likewise.
	* sysdeps/i386/i686/hp-timing.h: Likewise.
	* sysdeps/ia64/Makefile: Likewise.
	* sysdeps/ia64/hp-timing.c: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/Makefile: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/hp-timing.c: Likewise.
	* sysdeps/unix/sysv/linux/arm/dl-procinfo.c: Likewise.
	* sysdeps/unix/sysv/linux/arm/dl-procinfo.h: Likewise.
	* sysdeps/unix/sysv/linux/i386/Makefile: Likewise.
	* sysdeps/unix/sysv/linux/i386/dl-procinfo.c: Likewise.
	* sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Likewise.
	* sysdeps/x86_64/Makefile: Likewise.
2002-02-01 07:49:47 +00:00
Andreas Jaeger
c9cf6ddeeb Update.
* sysdeps/unix/sysv/linux/x86_64/Makefile: New file.
	* sysdeps/unix/sysv/linux/x86_64/Versions: New file.
	* sysdeps/unix/sysv/linux/x86_64/bits/fcntl.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/bits/mman.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/bits/stat.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/bits/statfs.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/bits/time.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/bits/types.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/brk.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/clone.S: New file.
	* sysdeps/unix/sysv/linux/x86_64/fstatfs64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/ftruncate64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/fxstat.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/fxstat64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/getdents.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/getdents64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/getrlimit64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/gettimeofday.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/glob64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/lxstat.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/lxstat64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/mmap64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/pread64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/profil-counter.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/pwrite64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/readdir.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/readdir64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/readdir64_r.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/readdir_r.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/recv.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/register-dump.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/send.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/setrlimit64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/sigaction.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/sigcontextinfo.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/sigpending.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/sigprocmask.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/sigsuspend.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/statfs64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/sys/perm.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/sys/procfs.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/sys/reg.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/sys/ucontext.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/sys/user.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/syscall.S: New file.
	* sysdeps/unix/sysv/linux/x86_64/syscalls.list: New file.
	* sysdeps/unix/sysv/linux/x86_64/sysdep.S: New file.
	* sysdeps/unix/sysv/linux/x86_64/sysdep.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/time.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/truncate64.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/umount.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/vfork.S: New file.
	* sysdeps/unix/sysv/linux/x86_64/xstat.c: New file.
	* sysdeps/unix/sysv/linux/x86_64/xstat64.c: New file.
	* sysdeps/unix/x86_64/sysdep.S: New file.
	* sysdeps/unix/x86_64/sysdep.h: New file.
	* sysdeps/x86_64/Implies: New file.
	* sysdeps/x86_64/Makefile: New file.
	* sysdeps/x86_64/Versions: New file.
	* sysdeps/x86_64/__longjmp.S: New file.
	* sysdeps/x86_64/abort-instr.h: New file.
	* sysdeps/x86_64/atomicity.h: New file.
	* sysdeps/x86_64/bits/endian.h: New file.
	* sysdeps/x86_64/bits/setjmp.h: New file.
	* sysdeps/x86_64/bits/string.h: New file.
	* sysdeps/x86_64/bp-asm.h: New file.
	* sysdeps/x86_64/bsd-_setjmp.S: New file.
	* sysdeps/x86_64/bsd-setjmp.S: New file.
	* sysdeps/x86_64/dl-machine.h: New file.
	* sysdeps/x86_64/elf/initfini.c: New file.
	* sysdeps/x86_64/elf/start.S: New file.
	* sysdeps/x86_64/ffs.c: New file.
	* sysdeps/x86_64/ffsll.c: New file.
	* sysdeps/x86_64/fpu/bits/fenv.h: New file.
	* sysdeps/x86_64/fpu/bits/mathdef.h: New file.
	* sysdeps/x86_64/fpu/e_acosl.c: New file.
	* sysdeps/x86_64/fpu/e_atan2l.c: New file.
	* sysdeps/x86_64/fpu/e_exp2l.S: New file.
	* sysdeps/x86_64/fpu/e_expl.c: New file.
	* sysdeps/x86_64/fpu/e_fmodl.S: New file.
	* sysdeps/x86_64/fpu/e_log10l.S: New file.
	* sysdeps/x86_64/fpu/e_log2l.S: New file.
	* sysdeps/x86_64/fpu/e_logl.S: New file.
	* sysdeps/x86_64/fpu/e_powl.S: New file.
	* sysdeps/x86_64/fpu/e_rem_pio2l.c: New file.
	* sysdeps/x86_64/fpu/e_scalbl.S: New file.
	* sysdeps/x86_64/fpu/e_sqrtl.c: New file.
	* sysdeps/x86_64/fpu/fclrexcpt.c: New file.
	* sysdeps/x86_64/fpu/fedisblxcpt.c: New file.
	* sysdeps/x86_64/fpu/feenablxcpt.c: New file.
	* sysdeps/x86_64/fpu/fegetenv.c: New file.
	* sysdeps/x86_64/fpu/fegetexcept.c: New file.
	* sysdeps/x86_64/fpu/fegetround.c: New file.
	* sysdeps/x86_64/fpu/feholdexcpt.c: New file.
	* sysdeps/x86_64/fpu/fesetenv.c: New file.
	* sysdeps/x86_64/fpu/fesetround.c: New file.
	* sysdeps/x86_64/fpu/fgetexcptflg.c: New file.
	* sysdeps/x86_64/fpu/fraiseexcpt.c: New file.
	* sysdeps/x86_64/fpu/fsetexcptflg.c: New file.
	* sysdeps/x86_64/fpu/ftestexcept.c: New file.
	* sysdeps/x86_64/fpu/libm-test-ulps: New file.
	* sysdeps/x86_64/fpu/math_ldbl.h: New file.
	* sysdeps/x86_64/fpu/printf_fphex.c: New file.
	* sysdeps/x86_64/fpu/s_atanl.c: New file.
	* sysdeps/x86_64/fpu/s_cosl.S: New file.
	* sysdeps/x86_64/fpu/s_expm1l.S: New file.
	* sysdeps/x86_64/fpu/s_fpclassifyl.c: New file.
	* sysdeps/x86_64/fpu/s_isinfl.c: New file.
	* sysdeps/x86_64/fpu/s_isnanl.c: New file.
	* sysdeps/x86_64/fpu/s_log1pl.S: New file.
	* sysdeps/x86_64/fpu/s_logbl.c: New file.
	* sysdeps/x86_64/fpu/s_nextafterl.c: New file.
	* sysdeps/x86_64/fpu/s_nexttoward.c: New file.
	* sysdeps/x86_64/fpu/s_nexttowardf.c: New file.
	* sysdeps/x86_64/fpu/s_rintl.c: New file.
	* sysdeps/x86_64/fpu/s_significandl.c: New file.
	* sysdeps/x86_64/fpu/s_sincosl.S: New file.
	* sysdeps/x86_64/fpu/s_sinl.S: New file.
	* sysdeps/x86_64/fpu/s_tanl.S: New file.
	* sysdeps/x86_64/gmp-mparam.h: New file.
	* sysdeps/x86_64/hp-timing.c: New file.
	* sysdeps/x86_64/hp-timing.h: New file.
	* sysdeps/x86_64/htonl.S: New file.
	* sysdeps/x86_64/memusage.h: New file.
	* sysdeps/x86_64/setjmp.S: New file.
	* sysdeps/x86_64/soft-fp/sfp-machine.h: New file.
	* sysdeps/x86_64/stackinfo.h: New file.
	* sysdeps/x86_64/sysdep.h: New file.
	* sysdeps/unix/sysv/linux/x86_64/ldd-rewrite.sed: New file.
2001-09-19 10:37:31 +00:00