Commit Graph

277 Commits

Author SHA1 Message Date
Joseph Myers
32db86d558 Add fall-through comments.
This patch adds fall-through comments in some cases where -Wextra
produces implicit-fallthrough warnings.

The patch is non-exhaustive.  Apart from architecture-specific code
for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy
code, probably should have such changes, but left to be dealt with
separately), or places that already had comments about the
fall-through but not matching the form expected by
-Wimplicit-fallthrough=3 (the default level with -Wextra; my
inclination is to adjust those comments to match rather than
downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one
place where I thought the implicit fallthrough was not correct and so
should be handled separately as a bug fix.  I think the key thing to
consider in review of this patch is whether the fall-through is indeed
intended and correct in each place where such a comment is added.

Tested for x86_64.

	* elf/dl-exception.c (_dl_exception_create_format): Add
	fall-through comments.
	* elf/ldconfig.c (parse_conf_include): Likewise.
	* elf/rtld.c (print_statistics): Likewise.
	* locale/programs/charmap.c (parse_charmap): Likewise.
	* misc/mntent_r.c (__getmntent_r): Likewise.
	* posix/wordexp.c (parse_arith): Likewise.
	(parse_backtick): Likewise.
	* resolv/ns_ttl.c (ns_parse_ttl): Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
2019-02-12 10:30:34 +00:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
H.J. Lu
8700a7851b x86-64: Vectorize sincosf_poly and update s_sincosf-fma.c
Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized
sincosf_poly.  Add x86 sincosf_poly.h to vectorize sincosf_poly.  On
Broadwell, bench-sincosf shows:

       Before         After      Improvement
max    160.273        114.198        40%
min    6.25           5.625          11%
mean   13.0325        10.6462        22%

Vectorized sincosf_poly shows

       Before         After      Improvement
max    138.653        114.198        21%
min    5.004          5.625          -11%
mean   11.5934        10.6462        9%

Tested on x86-64 and i686 as well as with build-many-glibcs.py.

	* sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>.
	(sincos_t, sincosf_poly, sinf_poly): Moved to ...
	* sysdeps/ieee754/flt-32/sincosf_poly.h: Here.  New file.
	* sysdeps/x86/fpu/s_sincosf_data.c: New file.
	* sysdeps/x86/fpu/sincosf_poly.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include
	<sysdeps/ieee754/flt-32/s_sincosf.c>.
2018-12-26 06:56:13 -08:00
Joseph Myers
da75c1b180 Remove x86 mathinline.h.
After previous cleanups, the only code in the x86 bits/mathinline.h
that is relevant with current compilers is the inline of
__ieee754_atan2l that is conditional on __LIBC_INTERNAL_MATH_INLINES
(i.e. for when libm itself is being built).

This inline is something that does belong in glibc not GCC, since
__ieee754_atan2l is a purely internal function name.  This patch moves
that inline to a new sysdeps/x86/fpu/math_private.h, removing the
bits/mathinline.h header.

Note that previously the inline was only for non-SSE 32-bit x86.  That
condition does not make sense, however, for a long double function; if
it's not inlined, exactly the same x87 instruction will end up getting
used by the out-of-line function, for both 32-bit and 64-bit.  So that
condition is not retained in the new version.

Tested for x86_64 and x86.  As expected, installed stripped shared
libraries are unchanged for 32-bit x86, but installed stripped libm.so
is changed for x86_64 because calls to __ieee754_atan2l start being
inlined where previously they were out of line calls.  (The same
change to start inlining the function would presumably also apply for
32-bit built with -mfpmath=sse, but that's not a configuration I've
tested.)

	* sysdeps/x86/fpu/math_private.h: New file.
	* sysdeps/x86/fpu/bits/mathinline.h: Remove.
2018-12-19 22:55:32 +00:00
Joseph Myers
515f463f52 Remove x86 mathinline.h sinh, cosh, tanh inlines.
Continuing the removal of bits/mathinline.h inlines that would better
be done by the compiler, this patch removes x86 inlines for sinh, cosh
and tanh functions (inlines only previously present for fast-math,
non-SSE 32-bit x86).  I've filed
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88556> for adding such
inlines as an optimization in GCC.

I believe the only remaining part of the x86 bits/mathinline.h that
does anything useful with current compilers after this patch is the
__LIBC_INTERNAL_MATH_INLINES inline of __ieee754_atan2l; I intend to
remove the whole header and move that inline to a sysdeps
math_private.h header in a subsequent patch.

Tested for x86_64 and x86.

	* sysdeps/x86/fpu/bits/mathinline.h (sinh): Remove inline
	definition.
	(cosh): Likewise.
	(tanh): Likewise.
2018-12-19 21:48:03 +00:00
H.J. Lu
cd815050e5 x86: Merge i386/x86_64 atomic-machine.h
Merge i386 and x86_64 atomic-machine.h to x86 atomic-machine.h.

Tested on i686 and x86_64 as well as with build-many-glibcs.py.

	* sysdeps/i386/atomic-machine.h: Merged with ...
	* sysdeps/x86_64/atomic-machine.h: To ...
	* sysdeps/x86/atomic-machine.h: This.  New file.
2018-12-18 04:25:26 -08:00
Joseph Myers
033a2c0a20 Remove x86 mathinline.h asinh, acosh, atanh inlines.
Continuing the removal of bits/mathinline.h inlines that would better
be done by the compiler, this patch removes x86 inlines for asinh,
acosh and atanh functions (only for fast-math, non-SSE 32-bit x86).
I've filed <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88502> for
adding such inlines as an optimization in GCC.

Tested for x86_64 and x86.

	* sysdeps/x86/fpu/bits/mathinline.h (asinh): Remove inline
	definition.
	(acosh): Likewise.
	(atanh): Likewise.
2018-12-14 22:35:57 +00:00
Carlos O'Donell
ade8b817fe x86: Add Hygon Dhyana support.
This patch fix Hygon Dhyana processor CPU Vendor ID detection
problem in glibc sysdep module, current glibc codes doesn't
recognize Dhyana CPU Vendor ID("HygonGenuine") and set kind to
arch_kind_other, which result to incorrect zero value for
__cache_sysconf() syscall. As Hygon Dhyana share most
architecture feature as AMD Family 17h, this patch add Hygon CPU
Vendor ID check and setup kind to arch_kind_amd and reuse AMD
code path, which lead to correct return value in
__cache_sysconf() syscall. we run the glibc test suite for both
Hygon Dhyana and AMD EPYC and found no failure case.

Background:
Chengdu Haiguang IC Design Co., Ltd (Hygon) is a Joint Venture
between AMD and Haiguang Information Technology Co.,Ltd., aims at
providing high performance x86 processor for China server market.
Its first generation processor codename is Dhyana, which
originates from AMD technology and shares most of the
architecture with AMD's family 17h, but with different CPU Vendor
ID("HygonGenuine")/Family series number(Family 18h).

Related Hygon kernel patch can be found on
http://lkml.kernel.org/r/5ce86123a7b9dad925ac583d88d2f921040e859b.1538583282.git.puwen@hygon.cn

Signed-off-by: fanjinke <fanjinke@hygon.cn>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-12-13 09:25:20 -05:00
Joseph Myers
bf8ae8c09a Remove x86 mathinline.h hypot inline.
Continuing the removal of bits/mathinline.h inlines that would better
be done by the compiler, this patch removes an x86 inline for hypot
functions (only for fast-math, only for non-SSE 32-bit x86).  I've
filed <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88474> for adding
such an inline as an optimization in GCC.

Tested for x86_64 and x86.

	* sysdeps/x86/fpu/bits/mathinline.h (hypot): Remove inline
	definition.
2018-12-12 22:33:06 +00:00
H.J. Lu
c22e4c2a14 x86: Extend CPUID support in struct cpu_features
Extend CPUID support for all feature bits from CPUID.  Add a new macro,
CPU_FEATURE_USABLE, which can be used to check if a feature is usable at
run-time, instead of HAS_CPU_FEATURE and HAS_ARCH_FEATURE.

Add COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007 and
COMMON_CPUID_INDEX_80000008 to check CPU feature bits in them.

Tested on i686 and x86-64 as well as using build-many-glibcs.py with
x86 targets.

	* sysdeps/x86/cacheinfo.c (intel_check_word): Updated for
	cpu_features_basic.
	(__cache_sysconf): Likewise.
	(init_cacheinfo): Likewise.
	* sysdeps/x86/cpu-features.c (get_extended_indeces): Also
	populate COMMON_CPUID_INDEX_80000007 and
	COMMON_CPUID_INDEX_80000008.
	(get_common_indices): Also populate COMMON_CPUID_INDEX_D_ECX_1.
	Use CPU_FEATURES_CPU_P (cpu_features, XSAVEC) to check if
	XSAVEC is available.  Set the bit_arch_XXX_Usable bits.
	(init_cpu_features): Use _Static_assert on
	index_arch_Fast_Unaligned_Load.
	__get_cpuid_registers and __get_arch_feature.  Updated for
	cpu_features_basic.  Set stepping in cpu_features.
	* sysdeps/x86/cpu-features.h: (FEATURE_INDEX_1): Changed to enum.
	(FEATURE_INDEX_2): New.
	(FEATURE_INDEX_MAX): Changed to enum.
	(COMMON_CPUID_INDEX_D_ECX_1): New.
	(COMMON_CPUID_INDEX_80000007): Likewise.
	(COMMON_CPUID_INDEX_80000008): Likewise.
	(cpuid_registers): Likewise.
	(cpu_features_basic): Likewise.
	(CPU_FEATURE_USABLE): Likewise.
	(bit_arch_XXX_Usable): Likewise.
	(cpu_features): Use cpuid_registers and cpu_features_basic.
	(bit_arch_XXX): Reweritten.
	(bit_cpu_XXX): Likewise.
	(index_cpu_XXX): Likewise.
	(reg_XXX): Likewise.
	* sysdeps/x86/tst-get-cpu-features.c: Include <stdio.h> and
	<support/check.h>.
	(CHECK_CPU_FEATURE): New.
	(CHECK_CPU_FEATURE_USABLE): Likewise.
	(cpu_kinds): Likewise.
	(do_test): Print vendor, family, model and stepping.  Check
	HAS_CPU_FEATURE and CPU_FEATURE_USABLE.
	(TEST_FUNCTION): Removed.
	Include <support/test-driver.c> instead of
	"../../test-skeleton.c".
	* sysdeps/x86_64/multiarch/sched_cpucount.c (__sched_cpucount):
	Check POPCNT instead of POPCOUNT.
	* sysdeps/x86_64/multiarch/test-multiarch.c (do_test): Likewise.
2018-12-03 05:54:56 -08:00
H.J. Lu
c089fd80c7 x86/CET: Add a re-exec test with legacy bitmap
Add a re-exec test with legacy bitmap to verify that legacy bitmap is
properly hanlded by kernel.

	* sysdeps/x86/Makefile (tests): Add tst-cet-legacy-1a.
	(tst-cet-legacy-1a-ARGS): New.
	($(objpfx)tst-cet-legacy-1a): New target.
	* sysdeps/x86/tst-cet-legacy-1a.c: New file.
2018-11-23 07:31:07 -08:00
H.J. Lu
d524fa6c35 Check multiple NT_GNU_PROPERTY_TYPE_0 notes [BZ #23509]
Linkers group input note sections with the same name into one output
note section with the same name.  One output note section is placed in
one PT_NOTE segment.  Since new linkers merge input .note.gnu.property
sections into one output .note.gnu.property section, there is only
one NT_GNU_PROPERTY_TYPE_0 note in one PT_NOTE segment with new linkers.
Since older linkers treat input .note.gnu.property section as a generic
note section and just concatenate all input .note.gnu.property sections
into one output .note.gnu.property section without merging them, we may
see multiple NT_GNU_PROPERTY_TYPE_0 notes in one PT_NOTE segment with
older linkers.

When an older linker is used to created the program on CET-enabled OS,
the linker output has a single .note.gnu.property section with multiple
NT_GNU_PROPERTY_TYPE_0 notes, some of which have IBT and SHSTK enable
bits set even if the program isn't CET enabled.  Such programs will
crash on CET-enabled machines.  This patch updates the note parser:

1. Skip note parsing if a NT_GNU_PROPERTY_TYPE_0 note has been processed.
2. Check multiple NT_GNU_PROPERTY_TYPE_0 notes.

	[BZ #23509]
	* sysdeps/x86/dl-prop.h (_dl_process_cet_property_note): Skip
	note parsing if a NT_GNU_PROPERTY_TYPE_0 note has been processed.
	Update the l_cet field when processing NT_GNU_PROPERTY_TYPE_0 note.
	Check multiple NT_GNU_PROPERTY_TYPE_0 notes.
	* sysdeps/x86/link_map.h (l_cet): Expand to 3 bits,  Add
	lc_unknown.
2018-11-08 10:07:10 -08:00
H.J. Lu
7cc65773f0 x86: Support RDTSCP for benchtests
RDTSCP waits until all previous instructions have executed and all
previous loads are globally visible before reading the counter.  RDTSC
doesn't wait until all previous instructions have been executed before
reading the counter.  All x86 processors since 2010 support RDTSCP
instruction.  This patch adds RDTSCP support to benchtests.

	* benchtests/Makefile (CPPFLAGS-nonlib): Add -DUSE_RDTSCP if
	USE_RDTSCP is defined.
	* sysdeps/x86/hp-timing.h (HP_TIMING_NOW): Use RDTSCP if
	USE_RDTSCP is defined.
2018-10-24 02:19:34 -07:00
Adhemerval Zanella
c3d8dc45c9 x86: Fix Haswell strong flags (BZ#23709)
Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
default flags for Haswell models.  Previously, new models were handled by the
default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
Prefer_PMINUB_for_stringop (only the TSX one).

This patch fixes it by disentangle the TSX flag handling from the memory
optimization ones.  The strstr case cited on patch now selects the
__strstr_sse2_unaligned as expected for the Haswell cpu.

Checked on x86_64-linux-gnu.

	[BZ #23709]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
	independently of other flags.
2018-10-23 14:57:02 -03:00
H.J. Lu
2dd8e58cc5 x86: Don't include <x86intrin.h>
Use __builtin_ia32_rdtsc directly since including <x86intrin.h> makes
building glibc very slow.  On Intel Core i5-6260U, this patch reduces
x86-64 build time from 8 minutes 33 seconds to 3 minutes 48 seconds
with "make -j4" and GCC 8.2.1.

	* sysdeps/x86/hp-timing.h: Don't include <x86intrin.h>.
	(HP_TIMING_NOW): Replace _rdtsc with __builtin_ia32_rdtsc.
2018-10-21 00:37:29 -07:00
H.J. Lu
72771e5375 x86: Use _rdtsc intrinsic for HP_TIMING_NOW
Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
HP_TIMING_NOW.  This patch

1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h.
2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h
can check minimum x86 ISA to decide if _rdtsc can be used.

NB: Checking if __i686__ isn't sufficient since __i686__ may not be
defined when building for i686 class processors.

	* sysdeps/i386/init-arch.h: Removed.
	* sysdeps/i386/i586/init-arch.h: Likewise.
	* sysdeps/i386/i686/init-arch.h: Likewise.
	* sysdeps/i386/i686/hp-timing.h: Likewise.
	* sysdeps/x86_64/hp-timing.h: Likewise.
	* sysdeps/i386/isa.h: New file.
	* sysdeps/i386/i586/isa.h: Likewise.
	* sysdeps/i386/i686/isa.h: Likewise.
	* sysdeps/x86_64/isa.h: Likewise.
	* sysdeps/x86/hp-timing.h: New file.
	* sysdeps/x86/init-arch.h: Include <isa.h>.
2018-10-17 15:16:45 -07:00
Joseph Myers
9755bc4686 Use round functions not __round functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __round functions to call the
corresponding round names instead, with asm redirection to __round
when the calls are not inlined.

An additional complication arises in
sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the
result converted to int, gets converted by the compiler to call
lroundl in the case of 32-bit long, so resulting in localplt test
failures.  It's logically correct to let the compiler make such an
optimization; an appropriate asm redirection of lroundl to __lroundl
is thus added to that file (it's not needed anywhere else).

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_roundf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_round.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise.
	* sysdeps/ieee754/float128/s_roundf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_roundf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
	* sysdeps/riscv/rvf/s_roundf.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise.
	(round): Redirect to __round.
	(__roundl): Call round instead of __round.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round):
	Remove macro.
	[_ARCH_PWR5X] (__roundf): Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round
	functions instead of __round variants.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to
	__lroundl.
	(__ieee754_expl): Call roundl instead of __roundl.
2018-09-27 12:35:23 +00:00
Joseph Myers
d90c9b1a12 Invert sense of list of i686-class processors in sysdeps/x86/cpu-features.h.
I noticed that sysdeps/x86/cpu-features.h had conditionals on whether
to define HAS_CPUID, HAS_I586 and HAS_I686 with a long list of
preprocessor macros for i686-and-later processors which however was
out of date.  This patch avoids the problem of the list getting out of
date by instead having conditionals on all the (few, old) pre-i686
processors for which GCC has preprocessor macros, rather than the
(many, expanding list) i686-and-later processors.  It seems HAS_I586
and HAS_I686 are unused so the only effect of these macros being
missing is that 32-bit glibc built for one of these processors would
end up doing runtime detection of CPUID availability.

i386 builds are prevented by a configure test so there is no need to
allow for them here.  __geode__ (no long nops?) and __k6__ (no CMOV,
at least according to GCC) are conservatively handled as i586, not
i686, here (as noted above, this is a theoretical distinction at
present in that only HAS_CPUID appears to be used).

Tested for x86.

	* sysdeps/x86/cpu-features.h [__geode__ || __k6__]: Handle like
	[__i586__ || __pentium__].
	[__i486__]: Handle explicitly.
	(HAS_CPUID): Define to 1 if above macros are undefined.
	(HAS_I586): Likewise.
	(HAS_I686): Likewise.
2018-09-20 12:43:41 +00:00
Joseph Myers
ff6b24501f Split fenv_private.h out of math_private.h more consistently.
On some architectures, the parts of math_private.h relating to the
floating-point environment are in a separate file fenv_private.h
included from math_private.h.  As this is purely an
architecture-specific convention used by several architectures,
however, all such architectures still need their own math_private.h,
even if it has nothing to do beyond #include <fenv_private.h> and
peculiarity of including the i386 file directly instead of having a
shared file in sysdeps/x86.

This patch makes the fenv_private.h name an architecture-independent
convention in glibc.  The include of fenv_private.h from
math_private.h becomes architecture-independent (until callers are
updated to include fenv_private.h directly so the include from
math_private.h is no longer needed).  Some architecture math_private.h
headers are removed if no longer needed, or renamed to fenv_private.h
if all they define belongs in that header; architecture fenv_private.h
headers now do require #include_next <fenv_private.h>.  The i386
fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is
actually shared with x86_64.  The generic math_private.h gets a new
include of <stdbool.h>, as needed for bool in some prototypes in that
header (previously that was indirectly included via include/fenv.h,
which now only gets included too late in math_private.h, after those
prototypes).

Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.

	* sysdeps/aarch64/fpu/fenv_private.h: New file.  Based on ....
	* sysdeps/aarch64/fpu/math_private.h: ... this file.  All contents
	moved to fenv_private.h except for ...
	(TOINT_INTRINSICS): Kept in math_private.h.
	(roundtoint): Likewise.
	(converttoint): Likewise.
	* sysdeps/arm/fenv_private.h: Change multiple-include guard to
	[ARM_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/arm/math_private.h: Remove.
	* sysdeps/generic/fenv_private.h: New file.  Contents moved from
	....
	* sysdeps/generic/math_private.h: ... this file.  Include
	<stdbool.h>.  Do not include <fenv.h> or <get-rounding-mode.h>.
	Include <fenv_private.h>.  Remove functions and macros moved to
	fenv_private.h.
	* sysdeps/i386/fpu/math_private.h: Remove.
	* sysdeps/mips/math_private.h: Move to ....
	* sysdeps/mips/fpu/fenv_private.h: ... here.  Change
	multiple-include guard to [MIPS_FENV_PRIVATE_H].  Remove
	[__mips_hard_float] conditional.  Include next <fenv_private.h>.
	* sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include
	guard to [POWERPC_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/powerpc/fpu/math_private.h: Do not include
	<fenv_private.h>.
	* sysdeps/riscv/rvf/math_private.h: Move to ....
	* sysdeps/riscv/rvf/fenv_private.h: ... here.  Change
	multiple-include guard to [RISCV_FENV_PRIVATE_H].  Include next
	<fenv_private.h>.
	* sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard
	to [SPARC_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/sparc/fpu/math_private.h: Remove.
	* sysdeps/i386/fpu/fenv_private.h: Move to ....
	* sysdeps/x86/fpu/fenv_private.h: ... here.  Change
	multiple-include guard to [X86_FENV_PRIVATE_H].  Include next
	<fenv_private.h>.
	* sysdeps/x86_64/fpu/math_private.h: Do not include
	<sysdeps/i386/fpu/fenv_private.h>.
2018-08-28 20:48:49 +00:00
Joseph Myers
2ce7ba7d15 Move SNAN_TESTS_* out of math-tests.h.
Continuing moving macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the SNAN_TESTS_* macros for individual types out to their own
sysdeps header (while the type-generic SNAN_TESTS wrapper for those
macros remains in math-tests.h).

Tested for x86_64 and x86, and with build-many-glibcs.py.

	* sysdeps/generic/math-tests-snan.h: New file.
	* sysdeps/generic/math-tests.h: Include <math-tests-snan.h>.
	(SNAN_TESTS_float): Do not define here.
	(SNAN_TESTS_double): Likewise.
	(SNAN_TESTS_long_double): Likewise.
	(SNAN_TESTS_float128): Likewise.
	* sysdeps/i386/fpu/math-tests-snan.h: New file.
	* sysdeps/i386/fpu/math-tests.h: Remove file.
	* sysdeps/ia64/math-tests-snan.h: New file.
	* sysdeps/ia64/math-tests.h: Remove file.
	* sysdeps/x86/math-tests.h: Likewise.
	* sysdeps/x86_64/fpu/math-tests-snan.h: New file.
2018-08-10 19:22:01 +00:00
H.J. Lu
fb4c32aef6 x86: Move STATE_SAVE_OFFSET/STATE_SAVE_MASK to sysdep.h
Move STATE_SAVE_OFFSET and STATE_SAVE_MASK to sysdep.h to make
sysdeps/x86/cpu-features.h a C header file.

	* sysdeps/x86/cpu-features.h (STATE_SAVE_OFFSET): Removed.
	(STATE_SAVE_MASK): Likewise.
	Don't check __ASSEMBLER__ to include <cpu-features-offsets.h>.
	* sysdeps/x86/sysdep.h (STATE_SAVE_OFFSET): New.
	(STATE_SAVE_MASK): Likewise.
	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features-offsets.h>
	instead of <cpu-features.h>.
2018-08-06 06:25:43 -07:00
H.J. Lu
ae67f2e562 x86: Cleanup cpu-features-offsets.sym
Remove the unused macros.  There is no code changes in libc.so nor
ld.so on i686 and x86-64.

	* sysdeps/x86/cpu-features-offsets.sym
	(rtld_global_ro_offsetof): Removed.
	(CPU_FEATURES_SIZE): Likewise.
	(CPUID_OFFSET): Likewise.
	(CPUID_SIZE): Likewise.
	(CPUID_EAX_OFFSET): Likewise.
	(CPUID_EBX_OFFSET): Likewise.
	(CPUID_ECX_OFFSET): Likewise.
	(CPUID_EDX_OFFSET): Likewise.
	(FAMILY_OFFSET): Likewise.
	(MODEL_OFFSET): Likewise.
	(FEATURE_OFFSET): Likewise.
	(FEATURE_SIZ): Likewise.
	(COMMON_CPUID_INDEX_1): Likewise.
	(COMMON_CPUID_INDEX_7): Likewise.
	(FEATURE_INDEX_1): Likewise.
	(RTLD_GLOBAL_RO_DL_X86_CPU_FEATURES_OFFSET): Updated.
2018-08-03 06:42:09 -07:00
Siddhesh Poyarekar
dce452dc52 Rename the glibc.tune namespace to glibc.cpu
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to.  Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses.  Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.

	* NEWS: Mention the change.
	* elf/dl-tunables.list: Rename tune namespace to cpu.
	* sysdeps/powerpc/dl-tunables.list: Likewise.
	* sysdeps/x86/dl-tunables.list: Likewise.
	* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
	cpu.name.
	* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
	* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
	* manual/README.tunables: Likewise.
	* manual/tunables.texi: Likewise.
	* sysdeps/powerpc/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
	(init_cpu_features): Likewise.
	* sysdeps/x86/cpu-features.c: Likewise.
	* sysdeps/x86/cpu-features.h: Likewise.
	* sysdeps/x86/cpu-tunables.c: Likewise.
	* sysdeps/x86_64/Makefile: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-08-02 23:49:19 +05:30
H.J. Lu
82c80ac2eb x86: Rename get_common_indeces to get_common_indices
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/x86/cpu-features.c (get_common_indeces): Renamed to
	...
	(get_common_indices): This.
	(init_cpu_features): Updated.
2018-08-01 04:57:50 -07:00
H.J. Lu
98864ed0e0 x86/CET: Fix property note parser [BZ #23467]
GNU_PROPERTY_X86_FEATURE_1_AND may not be the first property item.  We
need to check each property item until we reach the end of the property
or find GNU_PROPERTY_X86_FEATURE_1_AND.

This patch adds 2 tests.  The first test checks if IBT is enabled and
the second test reads the output from the first test to check if IBT
is is enabled.  The second second test fails if IBT isn't enabled
properly.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	[BZ #23467]
	* sysdeps/unix/sysv/linux/x86/Makefile (tests): Add
	tst-cet-property-1 and tst-cet-property-2 if CET is enabled.
	(CFLAGS-tst-cet-property-1.o): New.
	(ASFLAGS-tst-cet-property-dep-2.o): Likewise.
	($(objpfx)tst-cet-property-2): Likewise.
	($(objpfx)tst-cet-property-2.out): Likewise.
	* sysdeps/unix/sysv/linux/x86/tst-cet-property-1.c: New file.
	* sysdeps/unix/sysv/linux/x86/tst-cet-property-2.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/tst-cet-property-dep-2.S: Likewise.
	* sysdeps/x86/dl-prop.h (_dl_process_cet_property_note): Parse
	each property item until GNU_PROPERTY_X86_FEATURE_1_AND is found.
2018-07-30 16:15:38 -07:00
H.J. Lu
c92a00d865 x86: Add tst-get-cpu-features-static to $(tests) [BZ #23458]
All tests should be added to $(tests).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	[BZ #23458]
	* sysdeps/x86/Makefile (tests): Add tst-get-cpu-features-static.
2018-07-30 16:11:19 -07:00
H.J. Lu
4591b7db23 x86/CET: Don't parse beyond the note end
Simply check if "ptr < ptr_end" since "ptr" is always incremented by 8.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/x86/dl-prop.h (_dl_process_cet_property_note): Don't
	parse beyond the note end.
2018-07-27 13:23:31 -07:00
H.J. Lu
be525a69a6 x86: Populate COMMON_CPUID_INDEX_80000001 for Intel CPUs [BZ #23459]
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	[BZ #23459]
	* sysdeps/x86/cpu-features.c (get_extended_indices): New
	function.
	(init_cpu_features): Call get_extended_indices for both Intel
	and AMD CPUs.
	* sysdeps/x86/cpu-features.h (COMMON_CPUID_INDEX_80000001):
	Remove "for AMD" comment.
2018-07-26 13:31:11 -07:00
H.J. Lu
65d87ade1e x86: Correct index_cpu_LZCNT [BZ # 23456]
cpu-features.h has

 #define bit_cpu_LZCNT		(1 << 5)
 #define index_cpu_LZCNT	COMMON_CPUID_INDEX_1
 #define reg_LZCNT

But the LZCNT feature bit is in COMMON_CPUID_INDEX_80000001:

Initial EAX Value: 80000001H
ECX Extended Processor Signature and Feature Bits:
Bit 05: LZCNT available

index_cpu_LZCNT should be COMMON_CPUID_INDEX_80000001, not
COMMON_CPUID_INDEX_1.  The VMX feature bit is in COMMON_CPUID_INDEX_1:

Initial EAX Value: 01H
Feature Information Returned in the ECX Register:
5 VMX

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	[BZ # 23456]
	* sysdeps/x86/cpu-features.h (index_cpu_LZCNT): Set to
	COMMON_CPUID_INDEX_80000001.
2018-07-26 11:10:44 -07:00
H.J. Lu
fddcd00377 x86/CET: Add tests with legacy non-CET shared objects
Check binary compatibility of CET-enabled executables:

1. When CET-enabled executable is used with legacy non-CET shared object
at run-time, ld.so should disable SHSTK and put legacy non-CET shared
objects in legacy bitmap.
2. When IBT-enabled executable dlopens legacy non-CET shared object,
ld.so should put legacy shared object in legacy bitmap.
3. Use GLIBC_TUNABLES=glibc.tune.x86_shstk=[on|off|permissive] to
control how SHSTK is enabled.

	* sysdeps/x86/Makefile (tests): Add tst-cet-legacy-1,
	tst-cet-legacy-2, tst-cet-legacy-2a, tst-cet-legacy-3,
	tst-cet-legacy-4, tst-cet-legacy-4a, tst-cet-legacy-4b
	and tst-cet-legacy-4c.
	(modules-names): Add tst-cet-legacy-mod-1, tst-cet-legacy-mod-2
	and tst-cet-legacy-mod-4.
	(CFLAGS-tst-cet-legacy-2.c): New.
	(CFLAGS-tst-cet-legacy-mod-1.c): Likewise.
	(CFLAGS-tst-cet-legacy-mod-2.c): Likewise.
	(CFLAGS-tst-cet-legacy-3.c): Likewise.
	(CFLAGS-tst-cet-legacy-4.c): Likewise.
	(CFLAGS-tst-cet-legacy-mod-4.c): Likewise.
	($(objpfx)tst-cet-legacy-1): Likewise.
	($(objpfx)tst-cet-legacy-2): Likewise.
	($(objpfx)tst-cet-legacy-2.out): Likewise.
	($(objpfx)tst-cet-legacy-2a): Likewise.
	($(objpfx)tst-cet-legacy-2a.out): Likewise.
	($(objpfx)tst-cet-legacy-4): Likewise.
	($(objpfx)tst-cet-legacy-4.out): Likewise.
	($(objpfx)tst-cet-legacy-4a): Likewise.
	($(objpfx)tst-cet-legacy-4a.out): Likewise.
	(tst-cet-legacy-4a-ENV): Likewise.
	($(objpfx)tst-cet-legacy-4b): Likewise.
	($(objpfx)tst-cet-legacy-4b.out): Likewise.
	(tst-cet-legacy-4b-ENV): Likewise.
	($(objpfx)tst-cet-legacy-4c): Likewise.
	($(objpfx)tst-cet-legacy-4c.out): Likewise.
	(tst-cet-legacy-4c-ENV): Likewise.
	* sysdeps/x86/tst-cet-legacy-1.c: New file.
	* sysdeps/x86/tst-cet-legacy-2.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-2a.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-3.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-4.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-4a.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-4b.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-4c.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-mod-1.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-mod-2.c: Likewise.
	* sysdeps/x86/tst-cet-legacy-mod-4.c: Likewise.
2018-07-25 04:47:05 -07:00
H.J. Lu
394df3815e x86/CET: Extend arch_prctl syscall for CET control
CET arch_prctl bits should be defined in <asm/prctl.h> from Linux kernel
header files.  Add x86 <include/asm/prctl.h> for pre-CET kernel header
files.

Note: sysdeps/unix/sysv/linux/x86/include/asm/prctl.h should be removed
if <asm/prctl.h> from the required kernel header files contains CET
arch_prctl bits.

 /* CET features:
    IBT:   GNU_PROPERTY_X86_FEATURE_1_IBT
    SHSTK: GNU_PROPERTY_X86_FEATURE_1_SHSTK
  */

 /* Return CET features in unsigned long long *addr:
      features: addr[0].
      shadow stack base address: addr[1].
      shadow stack size: addr[2].
  */
 # define ARCH_CET_STATUS		0x3001
 /* Disable CET features in unsigned int features.  */
 # define ARCH_CET_DISABLE		0x3002
 /* Lock all CET features.  */
 # define ARCH_CET_LOCK			0x3003
 /* Allocate a new shadow stack with unsigned long long *addr:
      IN: requested shadow stack size: *addr.
      OUT: allocated shadow stack address: *addr.
  */
 # define ARCH_CET_ALLOC_SHSTK		0x3004
 /* Return legacy region bitmap info in unsigned long long *addr:
     address: addr[0].
     size: addr[1].
  */
 # define ARCH_CET_LEGACY_BITMAP	0x3005

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/unix/sysv/linux/x86/include/asm/prctl.h: New file.
	* sysdeps/unix/sysv/linux/x86/cpu-features.c: Include
	<sys/prctl.h> and <asm/prctl.h>.
	(get_cet_status): Call arch_prctl with ARCH_CET_STATUS.
	* sysdeps/unix/sysv/linux/x86/dl-cet.h: Include <sys/prctl.h>
	and <asm/prctl.h>.
	(dl_cet_allocate_legacy_bitmap): Call arch_prctl with
	ARCH_CET_LEGACY_BITMAP.
	(dl_cet_disable_cet): Call arch_prctl with ARCH_CET_DISABLE.
	(dl_cet_lock_cet): Call arch_prctl with ARCH_CET_LOCK.
	* sysdeps/x86/libc-start.c: Include <startup.h>.
2018-07-24 12:23:17 -07:00
H.J. Lu
e27f41ba2b Add <bits/indirect-return.h>
Add <bits/indirect-return.h> and include it in <ucontext.h>.
__INDIRECT_RETURN defined in <bits/indirect-return.h> indicates if
swapcontext requires special compiler treatment.  The default
__INDIRECT_RETURN is empty.

On x86, when shadow stack is enabled, __INDIRECT_RETURN is defined
with indirect_return attribute, which has been added to GCC 9, to
indicate that swapcontext returns via indirect branch.  Otherwise
__INDIRECT_RETURN is defined with returns_twice attribute.

When shadow stack is enabled, remove always_inline attribute from
prepare_test_buffer in string/tst-xbzero-opt.c to avoid:

tst-xbzero-opt.c: In function ‘prepare_test_buffer’:
tst-xbzero-opt.c:105:1: error: function ‘prepare_test_buffer’ can never be inlined because it uses setjmp
 prepare_test_buffer (unsigned char *buf)

when indirect_return attribute isn't available.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* bits/indirect-return.h: New file.
	* misc/sys/cdefs.h (__glibc_has_attribute): New.
	* sysdeps/x86/bits/indirect-return.h: Likewise.
	* stdlib/Makefile (headers): Add bits/indirect-return.h.
	* stdlib/ucontext.h: Include <bits/indirect-return.h>.
	(swapcontext): Add __INDIRECT_RETURN.
	* string/tst-xbzero-opt.c (ALWAYS_INLINE): New.
	(prepare_test_buffer): Use it.
2018-07-24 07:55:47 -07:00
H.J. Lu
ba2ea23d05 x86: Always include <dl-cet.h>/cet-tunables.h> for --enable-cet
Always include <dl-cet.h> and cet-tunables.h> when CET is enabled.
Otherwise, configure glibc with --enable-cet --disable-tunables will
fail to build.

	* sysdeps/x86/cpu-features.c: Always include <dl-cet.h> and
	cet-tunables.h> when CET is enabled.
2018-07-17 04:16:35 -07:00
H.J. Lu
f753fa7dea x86: Support IBT and SHSTK in Intel CET [BZ #21598]
Intel Control-flow Enforcement Technology (CET) instructions:

https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en
forcement-technology-preview.pdf

includes Indirect Branch Tracking (IBT) and Shadow Stack (SHSTK).

GNU_PROPERTY_X86_FEATURE_1_IBT is added to GNU program property to
indicate that all executable sections are compatible with IBT when
ENDBR instruction starts each valid target where an indirect branch
instruction can land.  Linker sets GNU_PROPERTY_X86_FEATURE_1_IBT on
output only if it is set on all relocatable inputs.

On an IBT capable processor, the following steps should be taken:

1. When loading an executable without an interpreter, enable IBT and
lock IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the executable.
2. When loading an executable with an interpreter, enable IBT if
GNU_PROPERTY_X86_FEATURE_1_IBT is set on the interpreter.
  a. If GNU_PROPERTY_X86_FEATURE_1_IBT isn't set on the executable,
     disable IBT.
  b. Lock IBT.
3. If IBT is enabled, when loading a shared object without
GNU_PROPERTY_X86_FEATURE_1_IBT:
  a. If legacy interwork is allowed, then mark all pages in executable
     PT_LOAD segments in legacy code page bitmap.  Failure of legacy code
     page bitmap allocation causes an error.
  b. If legacy interwork isn't allowed, it causes an error.

GNU_PROPERTY_X86_FEATURE_1_SHSTK is added to GNU program property to
indicate that all executable sections are compatible with SHSTK where
return address popped from shadow stack always matches return address
popped from normal stack.  Linker sets GNU_PROPERTY_X86_FEATURE_1_SHSTK
on output only if it is set on all relocatable inputs.

On a SHSTK capable processor, the following steps should be taken:

1. When loading an executable without an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on the executable.
2. When loading an executable with an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on interpreter.
  a. If GNU_PROPERTY_X86_FEATURE_1_SHSTK isn't set on the executable
     or any shared objects loaded via the DT_NEEDED tag, disable SHSTK.
  b. Otherwise lock SHSTK.
3. After SHSTK is enabled, it is an error to load a shared object
without GNU_PROPERTY_X86_FEATURE_1_SHSTK.

To enable CET support in glibc, --enable-cet is required to configure
glibc.  When CET is enabled, both compiler and assembler must support
CET.  Otherwise, it is a configure-time error.

To support CET run-time control,

1. _dl_x86_feature_1 is added to the writable ld.so namespace to indicate
if IBT or SHSTK are enabled at run-time.  It should be initialized by
init_cpu_features.
2. For dynamic executables:
   a. A l_cet field is added to struct link_map to indicate if IBT or
      SHSTK is enabled in an ELF module.  _dl_process_pt_note or
      _rtld_process_pt_note is called to process PT_NOTE segment for
      GNU program property and set l_cet.
   b. _dl_open_check is added to check IBT and SHSTK compatibilty when
      dlopening a shared object.
3. Replace i386 _dl_runtime_resolve and _dl_runtime_profile with
_dl_runtime_resolve_shstk and _dl_runtime_profile_shstk, respectively if
SHSTK is enabled.

CET run-time control can be changed via GLIBC_TUNABLES with

$ export GLIBC_TUNABLES=glibc.tune.x86_shstk=[permissive|on|off]
$ export GLIBC_TUNABLES=glibc.tune.x86_ibt=[permissive|on|off]

1. permissive: SHSTK is disabled when dlopening a legacy ELF module.
2. on: IBT or SHSTK are always enabled, regardless if there are IBT or
SHSTK bits in GNU program property.
3. off: IBT or SHSTK are always disabled, regardless if there are IBT or
SHSTK bits in GNU program property.

<cet.h> from CET-enabled GCC is automatically included by assembly codes
to add GNU_PROPERTY_X86_FEATURE_1_IBT and GNU_PROPERTY_X86_FEATURE_1_SHSTK
to GNU program property.  _CET_ENDBR is added at the entrance of all
assembly functions whose address may be taken.  _CET_NOTRACK is used to
insert NOTRACK prefix with indirect jump table to support IBT.  It is
defined as notrack when _CET_NOTRACK is defined in <cet.h>.

	 [BZ #21598]
	* configure.ac: Add --enable-cet.
	* configure: Regenerated.
	* elf/Makefille (all-built-dso): Add a comment.
	* elf/dl-load.c (filebuf): Moved before "dynamic-link.h".
	Include <dl-prop.h>.
	(_dl_map_object_from_fd): Call _dl_process_pt_note on PT_NOTE
	segment.
	* elf/dl-open.c: Include <dl-prop.h>.
	(dl_open_worker): Call _dl_open_check.
	* elf/rtld.c: Include <dl-prop.h>.
	(dl_main): Call _rtld_process_pt_note on PT_NOTE segment.  Call
	_rtld_main_check.
	* sysdeps/generic/dl-prop.h: New file.
	* sysdeps/i386/dl-cet.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/dl-cet.h: Likewise.
	* sysdeps/x86/cet-tunables.h: Likewise.
	* sysdeps/x86/check-cet.awk: Likewise.
	* sysdeps/x86/configure: Likewise.
	* sysdeps/x86/configure.ac: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.
	* sysdeps/x86/dl-procruntime.c: Likewise.
	* sysdeps/x86/dl-prop.h: Likewise.
	* sysdeps/x86/libc-start.h: Likewise.
	* sysdeps/x86/link_map.h: Likewise.
	* sysdeps/i386/dl-trampoline.S (_dl_runtime_resolve): Add
	_CET_ENDBR.
	(_dl_runtime_profile): Likewise.
	(_dl_runtime_resolve_shstk): New.
	(_dl_runtime_profile_shstk): Likewise.
	* sysdeps/linux/x86/Makefile (sysdep-dl-routines): Add dl-cet
	if CET is enabled.
	(CFLAGS-.o): Add -fcf-protection if CET is enabled.
	(CFLAGS-.os): Likewise.
	(CFLAGS-.op): Likewise.
	(CFLAGS-.oS): Likewise.
	(asm-CPPFLAGS): Add -fcf-protection -include cet.h if CET
	is enabled.
	(tests-special): Add $(objpfx)check-cet.out.
	(cet-built-dso): New.
	(+$(cet-built-dso:=.note)): Likewise.
	(common-generated): Add $(cet-built-dso:$(common-objpfx)%=%.note).
	($(objpfx)check-cet.out): New.
	(generated): Add check-cet.out.
	* sysdeps/x86/cpu-features.c: Include <dl-cet.h> and
	<cet-tunables.h>.
	(TUNABLE_CALLBACK (set_x86_ibt)): New prototype.
	(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
	(init_cpu_features): Call get_cet_status to check CET status
	and update dl_x86_feature_1 with CET status.  Call
	TUNABLE_CALLBACK (set_x86_ibt) and TUNABLE_CALLBACK
	(set_x86_shstk).  Disable and lock CET in libc.a.
	* sysdeps/x86/cpu-tunables.c: Include <cet-tunables.h>.
	(TUNABLE_CALLBACK (set_x86_ibt)): New function.
	(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
	* sysdeps/x86/sysdep.h (_CET_NOTRACK): New.
	(_CET_ENDBR): Define if not defined.
	(ENTRY): Add _CET_ENDBR.
	* sysdeps/x86/dl-tunables.list (glibc.tune): Add x86_ibt and
	x86_shstk.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Add
	_CET_ENDBR.
	(_dl_runtime_profile): Likewise.
2018-07-16 14:08:27 -07:00
H.J. Lu
faaee1f07e x86: Support shadow stack pointer in setjmp/longjmp
Save and restore shadow stack pointer in setjmp and longjmp to support
shadow stack in Intel CET.  Use feature_1 in tcbhead_t to check if
shadow stack is enabled before saving and restoring shadow stack pointer.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/i386/__longjmp.S: Include <jmp_buf-ssp.h>.
	(__longjmp): Restore shadow stack pointer if shadow stack is
	enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
	isn't defined for __longjmp_cancel.
	* sysdeps/i386/bsd-_setjmp.S: Include <jmp_buf-ssp.h>.
	(_setjmp): Save shadow stack pointer if shadow stack is enabled
	and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/i386/bsd-setjmp.S: Include <jmp_buf-ssp.h>.
	(setjmp): Save shadow stack pointer if shadow stack is enabled
	and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/i386/setjmp.S: Include <jmp_buf-ssp.h>.
	(__sigsetjmp): Save shadow stack pointer if shadow stack is
	enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/unix/sysv/linux/i386/____longjmp_chk.S: Include
	<jmp_buf-ssp.h>.
	(____longjmp_chk): Restore shadow stack pointer if shadow stack
	is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/unix/sysv/linux/x86/Makefile (gen-as-const-headers):
	Remove jmp_buf-ssp.sym.
	* sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S: Include
	<jmp_buf-ssp.h>.
	(____longjmp_chk): Restore shadow stack pointer if shadow stack
	is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/x86/Makefile (gen-as-const-headers): Add
	jmp_buf-ssp.sym.
	* sysdeps/x86/jmp_buf-ssp.sym: New dummy file.
	* sysdeps/x86_64/__longjmp.S: Include <jmp_buf-ssp.h>.
	(__longjmp): Restore shadow stack pointer if shadow stack is
	enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
	isn't defined for __longjmp_cancel.
	* sysdeps/x86_64/setjmp.S: Include <jmp_buf-ssp.h>.
	(__sigsetjmp): Save shadow stack pointer if shadow stack is
	enabled and SHADOW_STACK_POINTER_OFFSET is defined.
2018-07-14 05:59:53 -07:00
H.J. Lu
ebff9c5cfa x86: Rename __glibc_reserved1 to feature_1 in tcbhead_t [BZ #22563]
feature_1 has X86_FEATURE_1_IBT and X86_FEATURE_1_SHSTK bits for CET
run-time control.

CET_ENABLED, IBT_ENABLED and SHSTK_ENABLED are defined to 1 or 0 to
indicate that if CET, IBT and SHSTK are enabled.

<tls-setup.h> is added to set up thread-local data.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	[BZ #22563]
	* nptl/pthread_create.c: Include <tls-setup.h>.
	(__pthread_create_2_1): Call tls_setup_tcbhead.
	* sysdeps/generic/tls-setup.h: New file.
	* sysdeps/x86/nptl/tls-setup.h: Likewise.
	* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET):
	Likewise.
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Rename __glibc_reserved1
	to feature_1.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/x86/sysdep.h (X86_FEATURE_1_IBT): New.
	(X86_FEATURE_1_SHSTK): Likewise.
	(CET_ENABLED): Likewise.
	(IBT_ENABLED): Likewise.
	(SHSTK_ENABLED): Likewise.
2018-07-14 05:56:46 -07:00
Amit Pawar
bce5911b67 Use AVX_Fast_Unaligned_Load from Zen onwards.
From Zen onwards this will be enabled. It was disabled for the
Excavator case and will remain disabled.

Reviewd-by: Carlos O'Donell <carlos@redhat.com>
2018-07-06 09:55:36 -04:00
H.J. Lu
e28e9b1ec4 x86-64: Check Prefer_FSRM in ifunc-memmove.h
Although the REP MOVSB implementations of memmove, memcpy and mempcpy
aren't used by the current processors, this patch adds Prefer_FSRM
check in ifunc-memmove.h so that they can be used in the future.

	* sysdeps/x86/cpu-features.h (bit_arch_Prefer_FSRM): New.
	(index_arch_Prefer_FSRM): Likewise.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Also check Prefer_FSRM.
	* sysdeps/x86_64/multiarch/ifunc-memmove.h (IFUNC_SELECTOR):
	Also return OPTIMIZE (erms) for Prefer_FSRM.
2018-05-21 16:54:59 -07:00
H.J. Lu
1af30adcd5 Initial Fast Short REP MOVSB (FSRM) support
The newer Intel processors support Fast Short REP MOVSB which has a
feature bit in CPUID.  This patch adds the Fast Short REP MOVSB (FSRM)
bit to x86 cpu-features.

	* sysdeps/x86/cpu-features.h (bit_cpu_FSRM): New.
	(index_cpu_FSRM): Likewise.
	(reg_FSRM): Likewise.
2018-05-21 10:54:32 -07:00
H.J. Lu
98ee36c7a4 x86: Add sysdeps/x86/ldsodefs.h
Merge sysdeps/i386/ldsodefs.h and sysdeps/x86_64/ldsodefs.h into
sysdeps/x86/ldsodefs.h.

Tested on i686 and x86-64.

	* sysdeps/i386/ldsodefs.h: Removed.
	* sysdeps/x86_64/ldsodefs.h: Moved to ...
	* sysdeps/x86/ldsodefs.h: This.
	(La_i86_regs): New.
	(La_i86_retval): Likewise.
	(ARCH_PLTENTER_MEMBERS): Add i86_gnu_pltenter.
	(ARCH_PLTEXIT_MEMBERS): i86_gnu_pltexit.

Acked-by: Christian Brauner (Ubuntu) christian@brauner.io
2018-05-14 09:19:41 -07:00
Joseph Myers
8f5b00d375 Move math_check_force_underflow macros to separate math-underflow.h.
This patch continues cleaning up math_private.h by moving the
math_check_force_underflow set of macros to a separate header
math-underflow.h.

This header is included by the files that need it rather than from
math_private.h.  Moving these macros to a separate file removes the
math_private.h uses of macros from float.h, so the inclusion of
float.h in math_private.h is also removed; files that were depending
on that inclusion are fixed to include float.h directly.  The
inclusion of math-barriers.h from math_private.h will be removed in a
separate patch.

Tested for x86_64 and x86.  Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.

	* math/math-underflow.h: New file.
	* sysdeps/generic/math_private.h: Do not include <float.h>.
	(fabs_tg): Remove macro.  Moved to math-underflow.h.
	(min_of_type_f): Likewise.
	(min_of_type_): Likewise.
	(min_of_type_l): Likewise.
	(min_of_type_f128): Likewise.
	(min_of_type): Likewise.
	(math_check_force_underflow): Likewise.
	(math_check_force_underflow_nonneg): Likewise.
	(math_check_force_underflow_complex): Likewise.
	* math/e_exp2_template.c: Include <math-underflow.h>.
	* math/k_casinh_template.c: Likewise.
	* math/s_catan_template.c: Likewise.
	* math/s_catanh_template.c: Likewise.
	* math/s_ccosh_template.c: Likewise.
	* math/s_cexp_template.c: Likewise.
	* math/s_clog10_template.c: Likewise.
	* math/s_clog_template.c: Likewise.
	* math/s_csin_template.c: Likewise.
	* math/s_csinh_template.c: Likewise.
	* math/s_csqrt_template.c: Likewise.
	* math/s_ctan_template.c: Likewise.
	* math/s_ctanh_template.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_asin.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_atanh.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_hypot.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_j1.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_sinh.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_asinh.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_erf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_expm1.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_log1p.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_tanh.c: Likewise.
	* sysdeps/ieee754/flt-32/e_asinf.c: Likewise.
	* sysdeps/ieee754/flt-32/e_atanhf.c: Likewise.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
	* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
	* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
	* sysdeps/ieee754/flt-32/e_sinhf.c: Likewise.
	* sysdeps/ieee754/flt-32/k_sinf.c: Likewise.
	* sysdeps/ieee754/flt-32/k_tanf.c: Likewise.
	* sysdeps/ieee754/flt-32/s_asinhf.c: Likewise.
	* sysdeps/ieee754/flt-32/s_atanf.c: Likewise.
	* sysdeps/ieee754/flt-32/s_erff.c: Likewise.
	* sysdeps/ieee754/flt-32/s_expm1f.c: Likewise.
	* sysdeps/ieee754/flt-32/s_log1pf.c: Likewise.
	* sysdeps/ieee754/flt-32/s_tanhf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/e_asinl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/e_atanhl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
	* sysdeps/ieee754/ldbl-128/e_hypotl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
	* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/e_sinhl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/k_sincosl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/k_sinl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/k_tanl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_asinhl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_atanl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_erfl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_expm1l.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_log1pl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_tanhl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_powl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/k_sincosl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/k_sinl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/k_tanl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_erfl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/e_asinl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/e_atanhl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
	* sysdeps/ieee754/ldbl-96/e_hypotl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/e_j1l.c: Likewise.
	* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/e_sinhl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/k_sinl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/k_tanl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/s_asinhl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/s_erfl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/s_tanhl.c: Likewise.
	* sysdeps/powerpc/fpu/e_hypot.c: Likewise.
	* sysdeps/x86/fpu/powl_helper.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_nextup.c: Include <float.h>.
	* sysdeps/ieee754/flt-32/s_nextupf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_nextupl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_nextupl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/s_nextupl.c: Likewise.
2018-05-10 00:53:04 +00:00
Joseph Myers
9ed2e15ff4 Move math_opt_barrier, math_force_eval to separate math-barriers.h.
This patch continues cleaning up math_private.h by moving the
math_opt_barrier and math_force_eval macros to a separate header
math-barriers.h.

At present, those macros are inside a "#ifndef math_opt_barrier" in
math_private.h to allow architectures to override them and then use
a separate math-barriers.h header, no such #ifndef or #include_next is
needed; architectures just have their own alternative version of
math-barriers.h when providing their own optimized versions that avoid
going through memory unnecessarily.  The generic math-barriers.h has a
comment added to document these two macros.

In this patch, math_private.h is made to #include <math-barriers.h>,
so files using these macros do not need updating yet.  That is because
of uses of math_force_eval in math_check_force_underflow and
math_check_force_underflow_nonneg, which are still defined in
math_private.h.  Once those are moved out to a separate header, that
separate header can be made to include <math-barriers.h>, as can the
other files directly using these barrier macros, and then the include
of <math-barriers.h> from math_private.h can be removed.

Tested for x86_64 and x86.  Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.

	* sysdeps/generic/math-barriers.h: New file.
	* sysdeps/generic/math_private.h [!math_opt_barrier]
	(math_opt_barrier): Move to math-barriers.h.
	[!math_opt_barrier] (math_force_eval): Likewise.
	* sysdeps/aarch64/fpu/math-barriers.h: New file.
	* sysdeps/aarch64/fpu/math_private.h (math_opt_barrier): Move to
	math-barriers.h.
	(math_force_eval): Likewise.
	* sysdeps/alpha/fpu/math-barriers.h: New file.
	* sysdeps/alpha/fpu/math_private.h (math_opt_barrier): Move to
	math-barriers.h.
	(math_force_eval): Likewise.
	* sysdeps/x86/fpu/math-barriers.h: New file.
	* sysdeps/i386/fpu/fenv_private.h (math_opt_barrier): Move to
	math-barriers.h.
	(math_force_eval): Likewise.
	* sysdeps/m68k/m680x0/fpu/math_private.h: Move to....
	* sysdeps/m68k/m680x0/fpu/math-barriers.h: ... here.  Adjust
	multiple-include guard for rename.
	* sysdeps/powerpc/fpu/math-barriers.h: New file.
	* sysdeps/powerpc/fpu/math_private.h (math_opt_barrier): Move to
	math-barriers.h.
	(math_force_eval): Likewise.
2018-05-09 19:45:47 +00:00
H.J. Lu
d6cc1829aa x86: Use pad in pthread_unwind_buf to preserve shadow stack register
The pad array in struct pthread_unwind_buf is used by setjmp to save
shadow stack register.  We assert that size of struct pthread_unwind_buf
is no less than offset of shadow stack pointer + shadow stack pointer
size.

Since functions, like LIBC_START_MAIN, START_THREAD_DEFN as well as
these with thread cancellation, call setjmp, but never return after
__libc_unwind_longjmp, __libc_unwind_longjmp, which is defined as
__libc_longjmp on x86, doesn't need to restore shadow stack register.
__libc_longjmp, which is a private interface for thread cancellation
implementation in libpthread, is changed to call __longjmp_cancel,
instead of __longjmp.  __longjmp_cancel is a new internal function
in libc, which is similar to __longjmp, but doesn't restore shadow
stack register.

The compatibility longjmp and siglongjmp in libpthread.so are changed
to call __libc_siglongjmp, instead of __libc_longjmp, so that they will
restore shadow stack register.

Tested with build-many-glibcs.py.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* nptl/pthread_create.c (START_THREAD_DEFN): Clear previous
	handlers after setjmp.
	* setjmp/longjmp.c (__libc_longjmp): Don't define alias if
	defined.
	* sysdeps/unix/sysv/linux/x86/setjmpP.h: Include
	<libc-pointer-arith.h>.
	(_JUMP_BUF_SIGSET_BITS_PER_WORD): New.
	(_JUMP_BUF_SIGSET_NSIG): Changed to 96.
	(_JUMP_BUF_SIGSET_NWORDS): Changed to use ALIGN_UP and
	_JUMP_BUF_SIGSET_BITS_PER_WORD.
	* sysdeps/x86/Makefile (sysdep_routines): Add __longjmp_cancel.
	* sysdeps/x86/__longjmp_cancel.S: New file.
	* sysdeps/x86/longjmp.c: Likewise.
	* sysdeps/x86/nptl/pt-longjmp.c: Likewise.
2018-05-02 06:17:41 -07:00
Joseph Myers
5d75b75fb7 Remove sysdeps/x86/fpu/bits/mathinline.h __finite inline.
Continuing the removals of inline functions from the x86
bits/mathinline.h, this patch removes an inline of __finite (which was
not actually architecture-specific at all beyond its
endianness-dependence).

This inline is not normally used with GCC 4.4 or later, because
isfinite now uses __builtin_isfinite except for -fsignaling-nans.
Allowing __builtin_isfinite etc. to work properly even for
-fsignaling-nans, by implementing versions of those built-in functions
that use integer arithmetic in GCC, is
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66462> (a patch was
committed but had to be reverted because it caused problems, and that
patch didn't address all formats for all architectures, only some, so
by itself would not have been sufficient to allow glibc to use
__builtin_isfinite unconditionally for new-enough GCC).

Tested for x86_64 and x86.

	* sysdeps/x86/fpu/bits/mathinline.h [__USE_MISC] (__finite):
	Remove inline function.
2018-03-16 18:36:53 +00:00
Wilco Dijkstra
700593fdd7 Remove all target specific __ieee754_sqrt(f/l) inlines
Remove the now unused target specific__ieee754_sqrt(f/l) inlines.
Also remove inlines of sqrt which are for really old GCC versions.
Removing these is desirable, under the general principle of leaving
such inlining to the compiler rather than trying to do it in installed
headers, especially when only very old compilers are affected.

Note that removing inlines for __ieee754_sqrt disables inlining in the
sqrt wrapper functions.  Given the sqrt function will typically only be
called for negative arguments, it doesn't matter whether the inlining
happens or not.

	* sysdeps/aarch64/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/generic/math-type-macros.h (M_SQRT): Use sqrt.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
	* sysdeps/powerpc/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/s390/fpu/bits/mathinline.h: Remove file.
	* sysdeps/sparc/fpu/bits/mathinline.h (sqrt) Remove.
	(sqrtf): Remove.
	(sqrtl): Remove.
	(__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	(__ieee754_sqrtl): Remove.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
	* sysdeps/x86/fpu/math_private.h (__ieee754_sqrt): Remove.
	* sysdeps/x86_64/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	(__ieee754_sqrtl): Remove.
2018-03-15 19:21:36 +00:00
Joseph Myers
c429a8d8d6 Remove more old-compilers parts of sysdeps/x86/fpu/bits/mathinline.h.
This patch removes further parts of sysdeps/x86/fpu/bits/mathinline.h
that are only of value for optimization with older compiler versions,
in accordance with general principles of preferring the let the
compiler deal with such inlining through built-in functions.

In general, GCC supports inlining all these functions as of version
4.3 or earlier.  However, some inlines in GCC may have had excessively
restrictive conditions in past GCC versions (e.g. requiring
-ffast-math when the inline is valid under broader conditions).  (In
particular, GCC had, before GCC 7, unnecessarily restrictive
conditions on when it could apply floor and ceil inlines corresponding
to the ones removed here.  The same was true for rint, but
bits/mathinline.h *also* was excessively restrictive there.)

The removed sincos inlines are for __sincos etc. functions (not a
public interface and not currently used in this header either; not in
a part of the header ever used for building glibc itself).  Likewise,
the atan2 inlines included one for __atan2l, also not a public
interface and not used for building glibc itself (calls inside glibc
generally use __ieee754_atan2l, for which there is a separate
__LIBC_INTERNAL_MATH_INLINES case in this header).

Tested for x86_64 and x86.

	* sysdeps/x86/fpu/bits/mathinline.h [__FAST_MATH__]
	(__sincos_code): Remove define and undefine.
	[__FAST_MATH__] (__sincos): Remove inline function.
	[__FAST_MATH__] (__sincosf): Remove inline function.
	[__FAST_MATH__] (__sincosl): Remove inline function.
	(__atan2l): Remove inline functions.
	[!__GNUC_PREREQ (3, 4)] (__atan2_code): Remove macro.
	[!__GNUC_PREREQ (3, 4) && __FAST_MATH__] (atan2): Remove inline
	function.
	(floor): Remove inline function.
	(ceil): Likewise.
	[__FAST_MATH__] (__ldexp_code): Remove macro.
	[__FAST_MATH__] (ldexp): Remove inline function.
	[__FAST_MATH__ && __USE_ISOC99] (ldexpf): Likewise.
	[__FAST_MATH__ && __USE_ISOC99] (ldexpl): Likewise.
	[__FAST_MATH__ && __USE_ISOC99] (rint): Likewise.
	[__USE_ISOC99] (__lrint_code): Remove macro.
	[__USE_ISOC99] (__llrint_code): Likewise.
	[__USE_ISOC99] (lrintf): Remove inline function.
	[__USE_ISOC99] (lrint): Likewise.
	[__USE_ISOC99] (lrintl): Likewise.
	[__USE_ISOC99] (llrint): Likewise.
	[__USE_ISOC99] (llrintf): Likewise.
	[__USE_ISOC99] (llrintl): Likewise.
2018-03-15 18:26:35 +00:00
Joseph Myers
f9555d7312 Remove old-GCC parts of x86 bits/mathinline.h.
In accordance with the general principle of preferring to let the
compiler optimize function calls based on their standard semantics
rather than putting inline definitions of such functions in installed
headers, this patch removes various such inline definitions in the x86
bits/mathinline.h that were already disabled for GCC 3.5 or later and
so were only used with very old compilers (for which good optimization
is particularly unimportant); along with those inlines, a definition
of __M_SQRT2, which was only used in such inline functions, is also
removed.  This is similar to an early step in removing the string.h
inlines; I intend to follow up with further removals of
bits/mathinline.h inline definitions in appropriate logical groups
(with GCC bugs filed in cases where GCC doesn't already support
corresponding optimizations).

Tested for x86_64 and x86.

	* sysdeps/x86/fpu/bits/mathinline.h [!__GNUC_PREREQ (3, 4)]
	(lrintf): Remove definitions used only with old GCC.
	[!__GNUC_PREREQ (3, 4)] (lrint): Likewise.
	[!__GNUC_PREREQ (3, 4)] (llrintf): Likewise.
	[!__GNUC_PREREQ (3, 4)] (llrint): Likewise.
	[!__GNUC_PREREQ (3, 4)] (fmaxf): Likewise.
	[!__GNUC_PREREQ (3, 4)] (fmax): Likewise.
	[!__GNUC_PREREQ (3, 4)] (fminf): Likewise.
	[!__GNUC_PREREQ (3, 4)] (fmin): Likewise.
	[!__GNUC_PREREQ (3, 4)] (rint): Likewise.
	[!__GNUC_PREREQ (3, 4)] (rintf): Likewise.
	[!__GNUC_PREREQ (3, 4)] (nearbyint): Likewise.
	[!__GNUC_PREREQ (3, 4)] (nearbyintf): Likewise.
	[!__GNUC_PREREQ (3, 4)] (ceil): Likewise.
	[!__GNUC_PREREQ (3, 4)] (ceilf): Likewise.
	[!__GNUC_PREREQ (3, 4)] (floor): Likewise.
	[!__GNUC_PREREQ (3, 4)] (floorf): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (tan): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (fmod): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (sin): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (cos): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log10): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (asin): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (acos): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (atan): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log1p): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (logb): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log2): Likewise.
	[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (drem): Likewise.
	[__FAST_MATH__] (__M_SQRT2): Remove macro.
2018-03-14 18:26:03 +00:00
Joseph Myers
0d40d0ecba Unify and simplify bits/byteswap.h, bits/byteswap-16.h headers (bug 14508, bug 15512, bug 17082, bug 20530).
We have a general principle of preferring optimizations for library
facilities to use compiler built-in functions rather than being
located in library headers, where the compiler can reasonably optimize
code without needing to know glibc implementation details.

This patch applies this principle to bits/byteswap.h, eliminating all
the architecture-specific variants and bits/byteswap-16.h.  The
__bswap_16, __bswap_32 and __bswap_64 interfaces all become inline
functions, never macros, using the GCC built-in functions where
available and otherwise a single architecture-independent definition
using shifts and masking (which compilers may well be able to detect
and optimize; GCC has detection of various byte-swapping idioms).

The __bswap_constant_32 macro needs to stay around because of uses in
static initializers within glibc and its tests, and so for consistency
all __bswap_constant_* are kept rather than just being inlined into
the old-GCC-or-non-GCC parts of the __bswap_* inline function
definitions.

Various open bugs are addressed by this cleanup, with caveats about
exactly what is covered by those bugs and when the bugs applied at
all.

Bug 14508 reports -Wformat warnings building glibc because __bswap_*
sometimes returned the wrong types.  Obviously we already don't have
such warnings any more or the build would be failing, given -Werror,
and I suspect that bug was originally for wrong types for x86_64, as
fixed by commit d394eb742a (glibc 2.17).
The only case I saw removed by this patch where the types would still
have been wrong was the non-__GNUC__ case of __bswap_64 in the s390
header (using unsigned long long int, but uint64_t would be unsigned
long int for 64-bit).  In any case, the single header consistently
uses __uintN_t types after this patch, thereby eliminating all such
bugs.  The existing string/test-endian-types.c test already suffices
to verify that the types are correct with the compiler used to build
glibc and its tests.

Bug 15512 reports an error from __bswap_constant_16 with -Werror
-Wsign-conversion.  I am unable to reproduce this with any GCC version
supporting -Wsign-conversion - all seem to be able to avoid warning
for ((x) >> 8) & 0xffu, where x is uint16_t, which while it formally
does involve an implicit conversion from int to unsigned int, is also
a case where it should be easy for the compiler to see that the value
converted is never negative.  But in this patch __bswap_constant_16 is
changed to use signed 0xff so that no such implicit conversion occurs
at all, and a test with -Werror -Wsign-conversion is added.

Bug 17082 objects to the use of ({}) statement expressions in these
macros preventing use at file scope (in C, that's in sizeof etc.; in
C++, more generally in static initializers).  The particular case of
these interfaces is fixed by this patch as it changes them to inline
functions, eliminating all uses of ({}) in bits/byteswap.h, and a
corresponding testcase is added.  The bug tries to raise a more
general policy question about use of ({}) in macros in installed
headers, referring to "many other libc functions" (unspecified which
functions are being considered).

Since such policy questions belong on libc-alpha, and since there
*are* macros in installed headers which can't really avoid using ({})
(where they are type-generic, so can't use an inline function, but
need a temporary variable, and a few where the interface involves
returning memory from alloca so can't use an inline function either),
I propose to consider that bug fixed with this change.  That is
without prejudice to any other new bugs anyone wishes to file *for
precisely defined sets of macros* requesting moving away from ({})
*where it is clearly possible for those interfaces*.  Where ({}) can
be avoided, typically by use of an inline function, I think that's a
good idea - that inline functions are typically to be preferred to
({}) for header interfaces where such optimizations are useful but the
interface is suited to being defined using an inline function.

Bug 20530 requests use of __builtin_bswap16 when available (GCC 4.8
and later), which this patch implements.

Tested for x86_64, and with build-many-glibcs.py.  Also did an x86_64
test with the __GNUC_PREREQ conditionals changed to "#if 0" to verify
the old-GCC/non-GCC case in the headers.  (There are already existing
tests for correctness of results of these interfaces.)

	[BZ #14508]
	[BZ #15512]
	[BZ #17082]
	[BZ #20530]
	* bits/byteswap.h: Update file comment.  Do not include
	<bits/byteswap-16.h>.
	(__bswap_constant_16): Cast result to __uint16_t.  Use signed 0xff
	constant.
	(__bswap_16): Define as inline function.
	(__bswap_constant_32): Reformat definition.
	(__bswap_32): Always define as inline function, not macro, using
	__uint32_t.  Use __builtin_bswap32 if [__GNUC_PREREQ (4, 3)],
	otherwise __bswap_constant_32.
	(__bswap_constant_64): Reformat definition.  Do not use
	__extension__ here.
	(__bswap_64): Always define as inline function, not macro.  Use
	__extension__ on function definition.  Use __builtin_bswap64 if
	[__GNUC_PREREQ (4, 3)], otherwise __bswap_constant_64.
	* string/test-endian-file-scope.c: New file.
	* string/test-endian-sign-conversion.c: Likewise.
	* string/Makefile (headers): Remove bits/byteswap-16.h.
	(tests): Add test-endian-file-scope and
	test-endian-sign-conversion.
	(CFLAGS-test-endian-sign-conversion.c): New variable.
	* bits/byteswap-16.h: Remove file.
	* sysdeps/ia64/bits/byteswap-16.h: Likewise.
	* sysdeps/ia64/bits/byteswap.h: Likewise.
	* sysdeps/m68k/bits/byteswap.h: Likewise.
	* sysdeps/s390/bits/byteswap-16.h: Likewise.
	* sysdeps/s390/bits/byteswap.h: Likewise.
	* sysdeps/tile/bits/byteswap.h: Likewise.
	* sysdeps/x86/bits/byteswap-16.h: Likewise.
	* sysdeps/x86/bits/byteswap.h: Likewise.
2018-02-06 21:55:08 +00:00
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Joseph Myers
a23aa5b727 Add _Float64x function aliases.
This patch continues filling out TS 18661-3 support by adding *f64x
function aliases on platforms with _Float64x support.  (It so happens
the set of such platforms is exactly the same as the set of platforms
with _Float128 support, although on x86_64, x86 and ia32 the _Float64x
format is Intel extended rather than binary128.)  The API provided
corresponds exactly to that provided for _Float128, mostly coming from
TS 18661-3.  As these functions always alias those for another type
(long double, _Float128 or both), __* function names are not provided,
as in other cases of alias types.

Given the preparation done in previous patches, this one just enables
the feature via Makeconfig and bits/floatn.h, adds symbol versions,
and updates documentation and ABI baselines.  The symbol versions are
present unconditionally as GLIBC_2.27 in the relevant Versions files,
as it's OK for those to specify versions for functions that may not be
present in some configurations; no additional complexity is needed
unless in future some configuration gains support for this type that
didn't have such support in 2.27.  The Makeconfig additions for ia64
and x86 aren't strictly needed, as those configurations also get
float64x-alias-fcts definitions from
sysdeps/ieee754/float128/Makeconfig, but still seem appropriate given
that _Float64x is not _Float128 for those configurations.

A libm-test-ulps update for x86 is included.  This is because
bits/mathinline.h does not have _Float64x support added and for two
functions the use of out-of-line functions results in increased ulps
(ifloat64x shares ulps with ildouble / ifloat128 as appropriate).
Given that we'd like generally to eliminate bits/mathinline.h
optimizations, preferring to have such optimizations in GCC instead,
it seems reasonable not to add such support there for new types.  GCC
support for _FloatN / _FloatNx built-in functions is limited, but has
been improved in GCC 8, and at some point I hope the full set of libm
built-in functions in GCC, and other optimizations with
per-floating-type aspects, will be enabled for all _FloatN / _FloatNx
types.

Tested for x86_64 and x86, and with build-many-glibcs.py, with both
GCC 6 and GCC 7.

	* sysdeps/ia64/Makeconfig (float64x-alias-fcts): New variable.
	* sysdeps/ieee754/float128/Makeconfig (float64x-alias-fcts):
	Likewise.
	* sysdeps/ieee754/ldbl-128/Makeconfig (float64x-alias-fcts):
	Likewise.
	* sysdeps/x86/Makeconfig: New file.
	* bits/floatn-common.h (__HAVE_FLOAT64X): Remove macro.
	(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
	* bits/floatn.h (__HAVE_FLOAT64X): New macro.
	(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
	* sysdeps/ia64/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
	(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
	* sysdeps/ieee754/ldbl-128/bits/floatn.h (__HAVE_FLOAT64X):
	Likewise.
	(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
	* sysdeps/mips/ieee754/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
	(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
	* sysdeps/powerpc/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
	(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
	* sysdeps/x86/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
	(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
	* manual/math.texi (Mathematics): Document support for _Float64x.
	* math/Versions (GLIBC_2.27): Add _Float64x functions.
	* stdlib/Versions (GLIBC_2.27): Likewise.
	* wcsmbs/Versions (GLIBC_2.27): Likewise.
	* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
	Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
2017-11-27 14:16:47 +00:00
Joseph Myers
015c6dc288 Support bits/floatn.h inclusion from .S files.
Further _FloatN / _FloatNx type alias support will involve making
architecture-specific .S files use the common macros for libm function
aliases.  Making them use those macros will also serve to simplify
existing code for aliases / symbol versions in various cases, similar
to such simplifications for ldbl-opt code.

The libm-alias-*.h files sometimes need to include <bits/floatn.h> to
determine which aliases they should define.  At present, this does not
work for inclusion from .S files because <bits/floatn.h> can define
typedefs for old compilers.  This patch changes all the
<bits/floatn.h> and <bits/floatn-common.h> headers to include
__ASSEMBLER__ conditionals.  Those conditionals disable everything
related to C syntax in the __ASSEMBLER__ case, not just the problem
typedefs, as that seemed cleanest.  The __HAVE_* definitions remain in
the __ASSEMBLER__ case, as those provide information that is required
to define the correct set of aliases.

Tested with build-many-glibcs.py for a representative set of
configurations (x86_64-linux-gnu i686-linux-gnu ia64-linux-gnu
powerpc64le-linux-gnu mips64-linux-gnu-n64 sparc64-linux-gnu) with GCC
6.  Also tested with GCC 6 for i686-linux-gnu in conjunction with
changes to use alias macros in .S files.

	* bits/floatn-common.h [!__ASSEMBLER]: Disable everything related
	to C syntax instead of availability and properties of types.
	* bits/floatn.h [!__ASSEMBLER]: Likewise.
	* sysdeps/ia64/bits/floatn.h [!__ASSEMBLER]: Likewise.
	* sysdeps/ieee754/ldbl-128/bits/floatn.h [!__ASSEMBLER]: Likewise.
	* sysdeps/mips/ieee754/bits/floatn.h [!__ASSEMBLER]: Likewise.
	* sysdeps/powerpc/bits/floatn.h [!__ASSEMBLER]: Likewise.
	* sysdeps/x86/bits/floatn.h [!__ASSEMBLER]: Likewise.
2017-11-17 22:01:43 +00:00
Adhemerval Zanella
06be6368da nptl: Define __PTHREAD_MUTEX_{NUSERS_AFTER_KIND,USE_UNION}
This patch adds two new internal defines to set the internal
pthread_mutex_t layout required by the supported ABIS:

  1. __PTHREAD_MUTEX_NUSERS_AFTER_KIND which control whether to define
     __nusers fields before or after __kind.  The preferred value for
     is 0 for new ports and it sets __nusers before __kind.

  2. __PTHREAD_MUTEX_USE_UNION which control whether internal __spins and
     __list members will be place inside an union for linuxthreads
     compatibility.  The preferred value is 0 for ports and it sets
     to not use an union to define both fields.

It fixes the wrong offsets value for __kind value on x86_64-linux-gnu-x32.
Checked with a make check run-built-tests=no on all afected ABIs.

	[BZ #22298]
	* nptl/allocatestack.c (allocate_stack): Check if
	__PTHREAD_MUTEX_HAVE_PREV is non-zero, instead if
	__PTHREAD_MUTEX_HAVE_PREV is defined.
	* nptl/descr.h (pthread): Likewise.
	* nptl/nptl-init.c (__pthread_initialize_minimal_internal):
	Likewise.
	* nptl/pthread_create.c (START_THREAD_DEFN): Likewise.
	* sysdeps/nptl/fork.c (__libc_fork): Likewise.
	* sysdeps/nptl/pthread.h (PTHREAD_MUTEX_INITIALIZER): Likewise.
	* sysdeps/nptl/bits/thread-shared-types.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
	defines.
	(__pthread_internal_list): Check __PTHREAD_MUTEX_USE_UNION instead
	of __WORDSIZE for internal layout.
	(__pthread_mutex_s): Check __PTHREAD_MUTEX_NUSERS_AFTER_KIND instead
	of __WORDSIZE for internal __nusers layout and __PTHREAD_MUTEX_USE_UNION
	instead of __WORDSIZE whether to use an union for __spins and __list
	fields.
	(__PTHREAD_MUTEX_HAVE_PREV): Define also for __PTHREAD_MUTEX_USE_UNION
	case.
	* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
	defines.
	* sysdeps/alpha/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/hppa/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/ia64/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/powerpc/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/s390/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/sparc/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/tile/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.
	* sysdeps/x86/nptl/bits/pthreadtypes-arch.h
	(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
	Likewise.

Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-11-07 09:48:41 -02:00
H.J. Lu
95b93c6e0d x86: Add sysdeps/x86/sysdep.h
Add a new header file, sysdeps/x86/sysdep.h, for common assembly code
macros between i386 and x86-64.  Tested on i686 and x86-64.  There are
no differences in outputs of "readelf -a" and "objdump -dw" on all glibc
shared objects before and after the patch.

	* sysdeps/i386/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
	of <sysdeps/generic/sysdep.h>.
	(ALIGNARG): Removed.
	(ASM_SIZE_DIRECTIVE): Likewise.
	(ENTRY): Likewise.
	(END): Likewise.
	(ENTRY_CHK): Likewise.
	(END_CHK): Likewise.
	(syscall_error): Likewise.
	(mcount): Likewise.
	(PSEUDO_END): Likewise.
	(L): Likewise.
	(atom_text_section): Likewise.
	* sysdeps/x86/sysdep.h: New file.
	* sysdeps/x86_64/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
	of <sysdeps/generic/sysdep.h>.
	(ALIGNARG): Removed.
	(ASM_SIZE_DIRECTIVE): Likewise.
	(ENTRY): Likewise.
	(END): Likewise.
	(ENTRY_CHK): Likewise.
	(END_CHK): Likewise.
	(syscall_error): Likewise.
	(mcount): Likewise.
	(PSEUDO_END): Likewise.
	(L): Likewise.
	(atom_text_section): Likewise.
2017-11-01 05:37:26 -07:00
H.J. Lu
4ad5106e3b sysdeps/x86/libc-start.c: Add /* !SHARED */
* sysdeps/x86/libc-start.c: Add /* !SHARED */.
2017-10-30 13:40:28 -07:00
H.J. Lu
fe326df7b0 Reformat sysdeps/x86/libc-start.c
* sysdeps/x86/libc-start.c: Reformat.
2017-10-30 13:01:18 -07:00
Joseph Myers
91c3985c23 Update x86 fix-fp-int-compare-invalid.h for GCC 8.
The glibc implementation of iseqsig relies on ordered comparison
operators raising the "invalid" exception for quiet NaN operands, with
a workaround on platforms where a GCC bug means that exception is not
raised.  For x86, that bug has now been fixed for GCC 8, so this patch
disables the workaround in that case.  If and when the corresponding
bugs for powerpc and s390 are fixed, the headers for those platforms
should of course be updated similarly.

Tested for x86_64 and x86, including with GCC mainline.  Note that
other failures appear with GCC mainline because of spurious use of
ordered comparison instructions for unordered operations
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82692>.

	* sysdeps/x86/fpu/fix-fp-int-compare-invalid.h
	(FIX_COMPARE_INVALID): Define to 0 if [__GNUC_PREREQ (8, 0)].
2017-10-24 00:33:08 +00:00
Joseph Myers
797ba44ba2 Add bits/floatn.h defines for more _FloatN / _FloatNx types.
The bits/floatn.h header currently only has defines relating to
_Float128.  This patch adds defines relating to other _FloatN /
_FloatNx types.

The approach taken is to add defines for all _FloatN / _FloatNx types
known to GCC, and to put them in a common bits/floatn-common.h header
included at the end of all the individual bits/floatn.h headers.  If
in future some defines become different for different glibc
configurations, they will move out into the separate bits/floatn.h
headers.

Some defines are expected always to be the same across glibc ports.
Corresponding defines are nevertheless put in this header.  The intent
is that where there are conditionals (in headers or in non-installed
files) that can just repeat the same or nearly the same logic for each
floating-point type, they should do so, even if in fact the cases for
some types could be unconditionally present or absent because the same
conditionals are true or false for all glibc configurations.  This
should make the glibc code with such conditionals easier to read,
because the reader can just see that the same conditionals are
repeated for each type, rather than seeing different conditionals for
different types and needing to reason, at each location with such
differences, why those differences are indeed correct there.  (Cases
involving per-format rather than per-type logic are more likely still
to need differences in how they handle different types.)

Having such defines and conditionals also helps in incremental
preparation for adding _Float32 / _Float64 / _Float32x / _Float64x
function aliases.  I intend subsequent patches to add such
conditionals corresponding to those already present for _Float128, as
well as making more architecture-specific function implementations use
common macros to define aliases in preparation for adding such _FloatN
/ _FloatNx aliases.

Tested for x86_64.

	* bits/floatn-common.h: New file.
	* math/Makefile (headers): Add bits/floatn-common.h.
	* bits/floatn.h: Include <bits/floatn-common.h>.
	* sysdeps/ia64/bits/floatn.h: Likewise.
	* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
	* sysdeps/mips/ieee754/bits/floatn.h: Likewise.
	* sysdeps/powerpc/bits/floatn.h: Likewise.
	* sysdeps/x86/bits/floatn.h: Likewise.
2017-10-20 21:42:51 +00:00
H.J. Lu
b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
H.J. Lu
4d916f0f12 x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299]
Since ld.so expands $PLATFORM with GLRO(dl_platform), don't set
GLRO(dl_platform) to NULL.

	[BZ #22299]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
	GLRO(dl_platform) to NULL.
	* sysdeps/x86_64/Makefile (tests): Add tst-platform-1.
	(modules-names): Add tst-platformmod-1 and
	x86_64/tst-platformmod-2.
	(CFLAGS-tst-platform-1.c): New.
	(CFLAGS-tst-platformmod-1.c): Likewise.
	(CFLAGS-tst-platformmod-2.c): Likewise.
	(LDFLAGS-tst-platformmod-2.so): Likewise.
	($(objpfx)tst-platform-1): Likewise.
	($(objpfx)tst-platform-1.out): Likewise.
	(tst-platform-1-ENV): Likewise.
	($(objpfx)x86_64/tst-platformmod-2.os): Likewise.
	* sysdeps/x86_64/tst-platform-1.c: New file.
	* sysdeps/x86_64/tst-platformmod-1.c: Likewise.
	* sysdeps/x86_64/tst-platformmod-2.c: Likewise.
2017-10-19 08:28:26 -07:00
Adhemerval Zanella
4e17c78e4a Add common ifunc-init.h header
This patch moves the generic definition from x86_64 init-arch
to a common header ifunc-init.h.  No functional changes is expected.

Checked on a x86_64-linux-gnu build.

	* sysdeps/generic/ifunc-init.h: New file.
	* sysdeps/x86/init-arch.h: Use generic ifunc-init.h.
2017-10-17 12:01:22 -02:00
Wilco Dijkstra
4d3693ec1c Remove ancient __signbit inlines
Remove __signbit inlines from mathinline.h.  Math.h already uses
the builtin when supported, so additional inlines are only used
on pre 4.0 GCCs.  Similarly remove ancient copysign and fabs
inlines.

	* sysdeps/alpha/fpu/bits/mathinline.h: Delete file.
	* sysdeps/ia64/fpu/bits/mathinline.h: Delete file.
	* sysdeps/m68k/coldfire/fpu/bits/mathinline.h: Delete file.
	* sysdeps/m68k/m680x0/fpu/bits/mathinline.h: (__signbitf): Remove.
	(__signbit): Remove.
	(__signbitl): Remove.
	* sysdeps/powerpc/bits/mathinline.h (__signbitf): Remove.
	(__signbit): Remove.
	(__signbitl): Remove.
	* sysdeps/s390/fpu/bits/mathinline.h: (__signbitf): Remove.
	(__signbit): Remove.
	(__signbitl): Remove
	* sysdeps/sparc/fpu/bits/mathinline.h (__signbitf): Remove.
	(__signbit): Remove.
	(__signbitl): Remove.
	* sysdeps/tile/bits/mathinline.h: Delete file.
	* sysdeps/x86/fpu/bits/mathinline.h (__signbitf): Remove.
	(__signbit): Remove.
	(__signbitl): Remove.
2017-09-28 19:52:13 +01:00
Wilco Dijkstra
1e6d07234f Simplify C99 isgreater macros
Simplify the C99 isgreater macros.  Although some support was added
in GCC 2.97, not all targets added support until GCC 3.1.  Therefore
only use the builtins in math.h from GCC 3.1 onwards, and defer to
generic macros otherwise.  Improve the generic isunordered macro
to use compares rather than call fpclassify twice - this is not only
faster but also correct for signaling NaNs.

	* math/math.h: Improve handling of C99 isgreater macros.
	* sysdeps/alpha/fpu/bits/mathinline.h: Remove isgreater macros.
	* sysdeps/m68k/m680x0/fpu/bits/mathinline.h: Likewise.
	* sysdeps/powerpc/bits/mathinline.h: Likewise.
	* sysdeps/sparc/fpu/bits/mathinline.h: Likewise.
	* sysdeps/x86/fpu/bits/mathinline.h: Likewise.
2017-09-28 19:43:54 +01:00
H.J. Lu
ef8adeb041 x86: Add MathVec_Prefer_No_AVX512 to cpu-features [BZ #21967]
AVX512 functions in mathvec are used on machines with AVX512.  An AVX2
wrapper is also provided and it can be used when the AVX512 version
isn't profitable.  MathVec_Prefer_No_AVX512 is addded to cpu-features.
If glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 is set in GLIBC_TUNABLES
environment variable, the AVX2 wrapper will be used.

Tested on x86-64 machines with and without AVX512.  Also verified
glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 on AVX512 machine.

	[BZ #21967]
	* sysdeps/x86/cpu-features.h (bit_arch_MathVec_Prefer_No_AVX512):
	New.
	(index_arch_MathVec_Prefer_No_AVX512): Likewise.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Handle MathVec_Prefer_No_AVX512.
	* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h
	(IFUNC_SELECTOR): Return AVX2 version if MathVec_Prefer_No_AVX512
	is set.
2017-09-12 07:54:47 -07:00
H.J. Lu
45ff34638f x86: Add x86_64 to x86-64 HWCAP [BZ #22093]
Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the
"x86_64" subdirectory when loading a shared library.  ld.so in glibc
2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based
on supported ISAs.  This led to shared library loading failure for
shared libraries placed under the "x86_64" subdirectory.

This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always
search the "x86_64" subdirectory when loading a shared library.

NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip
the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi"
machines.

Tested on i686 and x86-64.

	[BZ #22093]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Initialize
	GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64.
	* sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated.
	(HWCAP_IMPORTANT): Likewise.
	(HWCAP_X86_64): New enum.
	(HWCAP_X86_AVX512_1): Updated.
	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64".
	* sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1.
	(modules-names): Add x86_64/tst-x86_64mod-1.
	(LDFLAGS-tst-x86_64mod-1.so): New.
	($(objpfx)tst-x86_64-1): Likewise.
	($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise.
	(tst-x86_64-1-clean): Likewise.
	* sysdeps/x86_64/tst-x86_64-1.c: New file.
	* sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
2017-09-11 08:18:32 -07:00
Samuel Thibault
1946d950f2 hurd: fix libm link
* sysdeps/x86/fpu/include/bits/fenv.h [NO_HIDDEN]: Redirect
	__feraiseexcept_renamed to feraiseexcept instead of
	__GI_feraiseexcept.
2017-09-03 05:32:10 +02:00
Joseph Myers
a60eca2e55 Simplify HUGE_VAL definitions.
There are various bits/huge_val*.h headers to define HUGE_VAL and
related macros.  All of them use __builtin_huge_val etc. for GCC 3.3
and later.  Then there are various fallbacks, such as using a large
hex float constant for GCC 2.96 and later, or using unions (with or
without compound literals) to construct the bytes of an infinity, with
this last being the reason for having architecture-specific files.
Supporting TS 18661-3 _FloatN / _FloatNx types that have the same
format as other supported types will mean adding more such macros;
needing to add more headers for them doesn't seem very desirable.

The fallbacks based on bytes of the representation of an infinity do
not meet the standard requirements for a constant expression.  At
least one of them is also wrong: sysdeps/sh/bits/huge_val.h is
producing a mixed-endian representation which does not match what GCC
does.

This patch eliminates all those headers, defining the macros directly
in math.h.  For GCC 3.3 and later, the built-in functions are used as
now.  For other compilers, a large constant 1e10000 (with appropriate
suffix) is used.  This is like the fallback for GCC 2.96 and later,
but without using hex floats (which have no apparent advantage here).
It is unambiguously valid standard C for all floating-point formats
with infinities, which covers all formats supported by glibc or likely
to be supported by glibc in future (C90 DR#025 said that if a
floating-point format represents infinities, all real values lie
within the range of representable values, so the constraints for
constant expressions are not violated), but may generate compiler
warnings and wouldn't handle the TS 18661-1 FENV_ROUND pragma
correctly.  If someone is actually using a compiler with glibc that
does not claim to be GCC 3.3 or later, but which has a better way to
define the HUGE_VAL macros, we can always add compiler conditionals in
with alternative definitions.

I intend to make similar changes for INF and NAN.  The SNAN macros
already just use __builtin_nans etc. with no fallback for compilers
not claiming to be GCC 3.3 or later.

Tested for x86_64.

	* math/math.h: Do not include bits/huge_val.h, bits/huge_valf.h,
	bits/huge_vall.h or bits/huge_val_flt128.h.
	(HUGE_VAL): Define directly here.
	[__USE_ISOC99] (HUGE_VALF): Likewise.
	[__USE_ISOC99] (HUGE_VALL): Likewise.
	[__HAVE_FLOAT128 && __GLIBC_USE (IEC_60559_TYPES_EXT)]
	(HUGE_VAL_F128): Likewise.
	* math/Makefile (headers): Remove bits/huge_val.h,
	bits/huge_valf.h, bits/huge_vall.h and bits/huge_val_flt128.h.
	* bits/huge_val.h: Remove.
	* bits/huge_val_flt128.h: Likewise.
	* bits/huge_valf.h: Likewise.
	* bits/huge_vall.h: Likewise.
	* sysdeps/ia64/bits/huge_vall.h: Likewise.
	* sysdeps/ieee754/bits/huge_val.h: Likewise.
	* sysdeps/ieee754/bits/huge_valf.h: Likewise.
	* sysdeps/m68k/m680x0/bits/huge_vall.h: Likewise.
	* sysdeps/sh/bits/huge_val.h: Likewise.
	* sysdeps/sparc/bits/huge_vall.h: Likewise.
	* sysdeps/x86/bits/huge_vall.h: Likewise.
2017-08-31 15:50:50 +00:00
H.J. Lu
7ab70c98e8 x86: Remove assembly versions of index_cpu_*/index_arch_*
Since assembly versions of HAS_CPU_FEATURE and HAS_ARCH_FEATURE have
been removed,  assembly versions of index_cpu_* and index_arch_* can
also be removed.

Tested on i686 and x86-64 with and without --disable-multi-arch.

	* sysdeps/x86/cpu-features.h [__ASSEMBLER__]
	(index_cpu_*, index_arch_*): Removed.
2017-08-25 10:51:04 -07:00
H.J. Lu
73322d5ff6 x86: Add IBT/SHSTK bits to cpu-features
Add IBT/SHSTK bits to cpu-features for Shadow Stack in Intel Control-flow
Enforcement Technology (CET) instructions:

https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

	* sysdeps/x86/cpu-features.h (bit_cpu_BIT): New.
	(bit_cpu_SHSTK): Likewise.
	(index_cpu_IBT): Likewise.
	(index_cpu_SHSTK): Likewise.
	(reg_IBT): Likewise.
	(reg_SHSTK): Likewise.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Handle index_cpu_IBT and index_cpu_SHSTK.
2017-08-14 05:54:38 -07:00
H.J. Lu
64d4dea6cd x86: Remove assembly versions of HAS_CPU_FEATURE/HAS_ARCH_FEATURE
Since all x86 IFUNC selectors are implemented in C, assembly versions of
HAS_CPU_FEATURE and HAS_ARCH_FEATURE can be removed.

	* sysdeps/x86/cpu-features.h [__ASSEMBLER__]
	(LOAD_RTLD_GLOBAL_RO_RDX, HAS_FEATURE, LOAD_FUNC_GOT_EAX,
	HAS_CPU_FEATURE, HAS_ARCH_FEATURE): Removed.
2017-08-04 13:38:20 -07:00
H.J. Lu
d2cf37c0a2 x86-64: Use _dl_runtime_resolve_opt only with AVX512F [BZ #21871]
On AVX machines with XGETBV (ECX == 1) like Skylake processors,

(gdb) disass _dl_runtime_resolve_avx_opt
Dump of assembler code for function _dl_runtime_resolve_avx_opt:
   0x0000000000015890 <+0>:	push   %rax
   0x0000000000015891 <+1>:	push   %rcx
   0x0000000000015892 <+2>:	push   %rdx
   0x0000000000015893 <+3>:	mov    $0x1,%ecx
   0x0000000000015898 <+8>:	xgetbv
   0x000000000001589b <+11>:	mov    %eax,%r11d
   0x000000000001589e <+14>:	pop    %rdx
   0x000000000001589f <+15>:	pop    %rcx
   0x00000000000158a0 <+16>:	pop    %rax
   0x00000000000158a1 <+17>:	and    $0x4,%r11d
   0x00000000000158a5 <+21>:	bnd je 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.

is slower than:

(gdb) disass _dl_runtime_resolve_avx_slow
Dump of assembler code for function _dl_runtime_resolve_avx_slow:
   0x0000000000015850 <+0>:	vorpd  %ymm0,%ymm1,%ymm8
   0x0000000000015854 <+4>:	vorpd  %ymm2,%ymm3,%ymm9
   0x0000000000015858 <+8>:	vorpd  %ymm4,%ymm5,%ymm10
   0x000000000001585c <+12>:	vorpd  %ymm6,%ymm7,%ymm11
   0x0000000000015860 <+16>:	vorpd  %ymm8,%ymm9,%ymm9
   0x0000000000015865 <+21>:	vorpd  %ymm10,%ymm11,%ymm10
   0x000000000001586a <+26>:	vpcmpeqd %xmm8,%xmm8,%xmm8
   0x000000000001586f <+31>:	vorpd  %ymm9,%ymm10,%ymm10
   0x0000000000015874 <+36>:	vptest %ymm10,%ymm8
   0x0000000000015879 <+41>:	bnd jae 0x158b0 <_dl_runtime_resolve_avx>
   0x000000000001587c <+44>:	vzeroupper
   0x000000000001587f <+47>:	bnd jmpq 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.
(gdb)

since xgetbv takes much more cycles than single cycle operations like
vpord/vvpcmpeq/ptest.  _dl_runtime_resolve_opt should be used only with
AVX512 where AVX512 instructions lead to lower CPU frequency on Skylake
server.

	[BZ #21871]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.
2017-08-04 11:14:33 -07:00
Joseph Myers
24ab7723b8 Consistently use uintN_t not u_intN_t in libm.
This patch changes libm code to make consistent use of C99 uintN_t
types instead of sometimes using those and sometimes using the older
nonstandard u_intN_t names.  This makes sense as a cleanup in its own
right, and also facilitates merges to GCC's libquadmath (which gets
the types from stdint.h and so may not have u_intN_t available at
all).

Tested for x86_64, and with build-many-glibcs.py.

	* math/s_nextafter.c (__nextafter): Use uintN_t instead of
	u_intN_t.
	* math/s_nexttowardf.c (__nexttowardf): Likewise.
	* sysdeps/generic/math_private.h (ieee_double_shape_type):
	Likewise.
	(ieee_float_shape_type): Likewise.
	* sysdeps/i386/fpu/s_fpclassifyl.c (__fpclassifyl): Likewise.
	* sysdeps/i386/fpu/s_isnanl.c (__isnanl): Likewise.
	* sysdeps/i386/fpu/s_nextafterl.c (__nextafterl): Likewise.
	* sysdeps/i386/fpu/s_nexttoward.c (__nexttoward): Likewise.
	* sysdeps/i386/fpu/s_nexttowardf.c (__nexttowardf): Likewise.
	* sysdeps/ieee754/dbl-64/e_acosh.c (__ieee754_acosh): Likewise.
	* sysdeps/ieee754/dbl-64/e_cosh.c (__ieee754_cosh): Likewise.
	* sysdeps/ieee754/dbl-64/e_fmod.c (__ieee754_fmod): Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
	Likewise.
	* sysdeps/ieee754/dbl-64/e_hypot.c (__ieee754_hypot): Likewise.
	* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Likewise.
	(__ieee754_yn): Likewise.
	* sysdeps/ieee754/dbl-64/e_log10.c (__ieee754_log10): Likewise.
	* sysdeps/ieee754/dbl-64/e_log2.c (__ieee754_log2): Likewise.
	* sysdeps/ieee754/dbl-64/e_rem_pio2.c (__ieee754_rem_pio2):
	Likewise.
	* sysdeps/ieee754/dbl-64/e_sinh.c (__ieee754_sinh): Likewise.
	* sysdeps/ieee754/dbl-64/s_ceil.c (__ceil): Likewise.
	* sysdeps/ieee754/dbl-64/s_copysign.c (__copysign): Likewise.
	* sysdeps/ieee754/dbl-64/s_erf.c (__erf): Likewise.
	(__erfc): Likewise.
	* sysdeps/ieee754/dbl-64/s_expm1.c (__expm1): Likewise.
	* sysdeps/ieee754/dbl-64/s_finite.c (FINITE): Likewise.
	* sysdeps/ieee754/dbl-64/s_floor.c (__floor): Likewise.
	* sysdeps/ieee754/dbl-64/s_fpclassify.c (__fpclassify): Likewise.
	* sysdeps/ieee754/dbl-64/s_isnan.c (__isnan): Likewise.
	* sysdeps/ieee754/dbl-64/s_issignaling.c (__issignaling):
	Likewise.
	* sysdeps/ieee754/dbl-64/s_llrint.c (__llrint): Likewise.
	* sysdeps/ieee754/dbl-64/s_llround.c (__llround): Likewise.
	* sysdeps/ieee754/dbl-64/s_lrint.c (__lrint): Likewise.
	* sysdeps/ieee754/dbl-64/s_lround.c (__lround): Likewise.
	* sysdeps/ieee754/dbl-64/s_modf.c (__modf): Likewise.
	* sysdeps/ieee754/dbl-64/s_nextup.c (__nextup): Likewise.
	* sysdeps/ieee754/dbl-64/s_remquo.c (__remquo): Likewise.
	* sysdeps/ieee754/dbl-64/s_round.c (__round): Likewise.
	* sysdeps/ieee754/dbl-64/s_trunc.c (__trunc): Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_issignaling.c
	(__issignaling): Likewise.
	* sysdeps/ieee754/flt-32/e_atan2f.c (__ieee754_atan2f): Likewise.
	* sysdeps/ieee754/flt-32/e_fmodf.c (__ieee754_fmodf): Likewise.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
	Likewise.
	* sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_ynf): Likewise.
	* sysdeps/ieee754/flt-32/e_log10f.c (__ieee754_log10f): Likewise.
	* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Likewise.
	* sysdeps/ieee754/flt-32/e_rem_pio2f.c (__ieee754_rem_pio2f):
	Likewise.
	* sysdeps/ieee754/flt-32/e_remainderf.c (__ieee754_remainderf):
	Likewise.
	* sysdeps/ieee754/flt-32/e_sqrtf.c (__ieee754_sqrtf): Likewise.
	* sysdeps/ieee754/flt-32/s_ceilf.c (__ceilf): Likewise.
	* sysdeps/ieee754/flt-32/s_copysignf.c (__copysignf): Likewise.
	* sysdeps/ieee754/flt-32/s_erff.c (__erff): Likewise.
	(__erfcf): Likewise.
	* sysdeps/ieee754/flt-32/s_expm1f.c (__expm1f): Likewise.
	* sysdeps/ieee754/flt-32/s_finitef.c (FINITEF): Likewise.
	* sysdeps/ieee754/flt-32/s_floorf.c (__floorf): Likewise.
	* sysdeps/ieee754/flt-32/s_fpclassifyf.c (__fpclassifyf):
	Likewise.
	* sysdeps/ieee754/flt-32/s_isnanf.c (__isnanf): Likewise.
	* sysdeps/ieee754/flt-32/s_issignalingf.c (__issignalingf):
	Likewise.
	* sysdeps/ieee754/flt-32/s_llrintf.c (__llrintf): Likewise.
	* sysdeps/ieee754/flt-32/s_llroundf.c (__llroundf): Likewise.
	* sysdeps/ieee754/flt-32/s_lrintf.c (__lrintf): Likewise.
	* sysdeps/ieee754/flt-32/s_lroundf.c (__lroundf): Likewise.
	* sysdeps/ieee754/flt-32/s_modff.c (__modff): Likewise.
	* sysdeps/ieee754/flt-32/s_remquof.c (__remquof): Likewise.
	* sysdeps/ieee754/flt-32/s_roundf.c (__roundf): Likewise.
	* sysdeps/ieee754/ldbl-128/e_acoshl.c (__ieee754_acoshl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_atan2l.c (__ieee754_atan2l):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_atanhl.c (__ieee754_atanhl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_fmodl.c (__ieee754_fmodl): Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_hypotl.c (__ieee754_hypotl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
	(__ieee754_ynl): Likewise.
	* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128/e_rem_pio2l.c (__ieee754_rem_pio2l):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_remainderl.c (__ieee754_remainderl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_sinhl.c (__ieee754_sinhl): Likewise.
	* sysdeps/ieee754/ldbl-128/k_cosl.c (__kernel_cosl): Likewise.
	* sysdeps/ieee754/ldbl-128/k_sincosl.c (__kernel_sincosl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/k_sinl.c (__kernel_sinl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_ceill.c (__ceill): Likewise.
	* sysdeps/ieee754/ldbl-128/s_copysignl.c (__copysignl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_erfl.c (__erfcl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_fabsl.c (__fabsl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_finitel.c (__finitel): Likewise.
	* sysdeps/ieee754/ldbl-128/s_floorl.c (__floorl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_fpclassifyl.c (__fpclassifyl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_frexpl.c (__frexpl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_isnanl.c (__isnanl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_issignalingl.c (__issignalingl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_llrintl.c (__llrintl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_llroundl.c (__llroundl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_lrintl.c (__lrintl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_lroundl.c (__lroundl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_modfl.c (__modfl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_nearbyintl.c (__nearbyintl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_nextafterl.c (__nextafterl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_nexttoward.c (__nexttoward):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_nexttowardf.c (__nexttowardf):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_nextupl.c (__nextupl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_remquol.c (__remquol): Likewise.
	* sysdeps/ieee754/ldbl-128/s_rintl.c (__rintl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_roundl.c (__roundl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_tanhl.c (__tanhl): Likewise.
	* sysdeps/ieee754/ldbl-128/s_truncl.c (__truncl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_fmodl.c (__ieee754_fmodl):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_rem_pio2l.c (__ieee754_rem_pio2l):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_remainderl.c
	(__ieee754_remainderl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/k_cosl.c (__kernel_cosl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/k_sinl.c (__kernel_sinl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_fabsl.c (__fabsl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_fpclassifyl.c (___fpclassifyl):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_modfl.c (__modfl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
	* sysdeps/ieee754/ldbl-96/e_acoshl.c (__ieee754_acoshl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_asinl.c (__ieee754_asinl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_atanhl.c (__ieee754_atanhl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_coshl.c (__ieee754_coshl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_hypotl.c (__ieee754_hypotl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_j0l.c (__ieee754_j0l): Likewise.
	(__ieee754_y0l): Likewise.
	(pzero): Likewise.
	(qzero): Likewise.
	* sysdeps/ieee754/ldbl-96/e_j1l.c (__ieee754_j1l): Likewise.
	(__ieee754_y1l): Likewise.
	(pone): Likewise.
	(qone): Likewise.
	* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise.
	(__ieee754_ynl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
	(__ieee754_lgammal_r): Likewise.
	* sysdeps/ieee754/ldbl-96/e_rem_pio2l.c (__ieee754_rem_pio2l):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_sinhl.c (__ieee754_sinhl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_copysignl.c (__copysignl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_erfl.c (__erfl): Likewise.
	(__erfcl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_frexpl.c (__frexpl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_issignalingl.c (__issignalingl):
	Likewise.
	* sysdeps/ieee754/ldbl-96/s_llrintl.c (__llrintl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_llroundl.c (__llroundl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_lrintl.c (__lrintl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_lroundl.c (__lroundl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_modfl.c (__modfl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_nexttoward.c (__nexttoward): Likewise.
	* sysdeps/ieee754/ldbl-96/s_nexttowardf.c (__nexttowardf):
	Likewise.
	* sysdeps/ieee754/ldbl-96/s_nextupl.c (__nextupl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_remquol.c (__remquol): Likewise.
	* sysdeps/ieee754/ldbl-96/s_roundl.c (__roundl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_tanhl.c (__tanhl): Likewise.
	* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c (__nldbl_nexttowardf):
	Likewise.
	* sysdeps/m68k/m680x0/fpu/e_pow.c (s(__ieee754_pow)): Likewise.
	* sysdeps/m68k/m680x0/fpu/s_fpclassifyl.c (__fpclassifyl):
	Likewise.
	* sysdeps/m68k/m680x0/fpu/s_llrint.c (__llrint): Likewise.
	* sysdeps/m68k/m680x0/fpu/s_llrintf.c (__llrintf): Likewise.
	* sysdeps/m68k/m680x0/fpu/s_llrintl.c (__llrintl): Likewise.
	* sysdeps/m68k/m680x0/fpu/s_nextafterl.c (__nextafterl): Likewise.
	* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise.
2017-08-03 19:55:04 +00:00
Gabriel F. T. Gomes
8466ee1cb7 float128: Add signbit alternative for old compilers
In math/math.h, __MATH_TG will expand signbit to __builtin_signbit*,
e.g.: __builtin_signbitf128, before GCC 6.  However, there has never
been a __builtin_signbitf128 in GCC and the type-generic builtin is
only available since GCC 6.  For older GCC, this patch defines
__builtin_signbitf128 to __signbitf128, so that the internal function
is used instead of the non-existent builtin.

This patch also changes the implementation of __signbitf128, because
it was reusing the implementation of __signbitl from ldbl-128, which
calls __builtin_signbitl.  Using the long double version of the
builtin is not correct on machines where _Float128 is ABI-distinct
from long double (i.e.: ia64, powerpc64le, x86, x86_84).  The new
implementation does not rely on builtins when being built with GCC
versions older than 6.0.

The new code does not currently affect powerpc64le builds, because
only GCC 6.2 fulfills the requirements from configure.  It might
affect powerpc64le builds if those requirements are backported to
older versions of the compiler.  The new code affects x86_64 builds,
since glibc is supposed to build correctly with older versions of GCC.

Tested for powerpc64le and x86_64.

	* include/math.h (__signbitf128): Define as hidden.
	* sysdeps/ieee754/float128/s_signbitf128.c (__signbitf128):
	Reimplement without builtins.
	* sysdeps/ia64/bits/floatn.h [!__GNUC_PREREQ (6, 0)]
	(__builtin_signbitf128): Define to __signbitf128.
	* sysdeps/powerpc/bits/floatn.h: Likewise.
	* sysdeps/x86/bits/floatn.h: Likewise.
2017-06-30 18:34:29 -03:00
Joseph Myers
c86ed71d63 Add float128 support for x86_64, x86.
This patch enables float128 support for x86_64 and x86.  All GCC
versions that can build glibc provide the required support, but since
GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN
tests and some tests of NaN payloads need to be disabled with such
compilers (this does not affect the generated glibc binaries at all,
just the tests).  bits/floatn.h declares float128 support to be
available for GCC versions that provide the required libgcc support
(4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd);
compilation-only support was present some time before then, but not
really useful without the libgcc functions.

fenv_private.h needed updating to avoid trying to put _Float128 values
in registers.  I make no assertion of optimality of the
math_opt_barrier / math_force_eval definitions for this case; they are
simply intended to be sufficient to work correctly.

Tested for x86_64 and x86, with GCC 7 and GCC 6.  (Testing for x32 was
compilation tests only with build-many-glibcs.py to verify the ABI
baseline updates.  I have not done any testing for Hurd, although the
float128 support is enabled there as for GNU/Linux.)

	* sysdeps/i386/Implies: Add ieee754/float128.
	* sysdeps/x86_64/Implies: Likewise.
	* sysdeps/x86/bits/floatn.h: New file.
	* sysdeps/x86/float128-abi.h: Likewise.
	* manual/math.texi (Mathematics): Document support for _Float128
	on x86_64 and x86.
	* sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>.
	(math_opt_barrier): Do not put _Float128 values in floating-point
	registers.
	(math_force_eval): Likewise.
	[__x86_64__] (SET_RESTORE_ROUNDF128): New macro.
	* sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append
	to Makefile variable.
	* sysdeps/x86/fpu/e_sqrtf128.c: New file.
	* sysdeps/x86/fpu/sfp-machine.h: Likewise.  Based on libgcc.
	* sysdeps/x86/math-tests.h: New file.
	* math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro.
	* math/libm-test-getpayload.inc (getpayload_test_data): Use
	XFAIL_FLOAT128_PAYLOAD.
	* math/libm-test-setpayload.inc (setpayload_test_data): Likewise.
	* math/libm-test-totalorder.inc (totalorder_test_data): Likewise.
	* math/libm-test-totalordermag.inc (totalordermag_test_data):
	Likewise.
	* sysdeps/unix/sysv/linux/i386/libc.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-06-26 22:02:24 +00:00
Joseph Myers
16000c8d04 Avoid localplt issues from x86 fereaiseexcept inline.
Building for x86_64 with float128 support, I get a localplt test
failure from lrintf128 calling feraiseexcept.

The problem is that an inline optimized version of feraiseexcept calls
__feraiseexcept_renamed in cases where it doesn't completely expand
inline, and that in turn is redirected to feraiseexcept for a library
call, so meaning the redirection of feraiseexcept to
__GI_feraiseexcept inside libm is lost for that call.

This patch fixes the problem by moving the redirect to an internal
header in the _LIBC case, with the internal header using
__GI_feraiseexcept where appropriate.

Tested for x86_64 (in conjunction with float128 patches).

	* sysdeps/x86/fpu/bits/fenv.h [_LIBC] (__feraiseexcept_renamed):
	Do not declare.
	* sysdeps/x86/fpu/include/bits/fenv.h [_LIBC &&
	__USE_EXTERN_INLINES] (__feraiseexcept_renamed): Declare here,
	redirected to __GI_feraiseexcept if [SHARED && IS_IN (libm)].
2017-06-23 20:04:23 +00:00
H.J. Lu
03feacb562 x86: Rename glibc.tune.ifunc to glibc.tune.hwcaps
Rename glibc.tune.ifunc to glibc.tune.hwcaps and move it to
sysdeps/x86/dl-tunables.list since it is x86 specicifc.  Also
change type of data_cache_size, data_cache_size and
non_temporal_threshold to unsigned long int to match size_t.
Remove usage DEFAULT_STRLEN from cpu-tunables.c.

	* elf/dl-tunables.list (glibc.tune.ifunc): Removed.
	* sysdeps/x86/dl-tunables.list (glibc.tune.hwcaps): New.
	Remove security_level on all fields.
	* manual/tunables.texi: Replace ifunc with hwcaps.
	* sysdeps/x86/cpu-features.c (TUNABLE_CALLBACK (set_ifunc)):
	Renamed to ..
	(TUNABLE_CALLBACK (set_hwcaps)): This.
	(init_cpu_features): Updated.
	* sysdeps/x86/cpu-features.h (cpu_features): Change type of
	data_cache_size, data_cache_size and non_temporal_threshold to
	unsigned long int.
	* sysdeps/x86/cpu-tunables.c (DEFAULT_STRLEN): Removed.
	(TUNABLE_CALLBACK (set_ifunc)): Renamed to ...
	(TUNABLE_CALLBACK (set_hwcaps)): This.  Update comments.  Don't
	use DEFAULT_STRLEN.
2017-06-21 10:21:37 -07:00
H.J. Lu
da69a35566 Move x86 specific tunables to x86/dl-tunables.list
* elf/dl-tunables.list: Move x86 specific tunables to ...
	* sysdeps/x86/dl-tunables.list: Here.  New file.
2017-06-20 14:03:31 -07:00
H.J. Lu
905947c304 tunables: Add IFUNC selection and cache sizes
The current IFUNC selection is based on microbenchmarks in glibc.  It
should give the best performance for most workloads.  But other choices
may have better performance for a particular workload or on the hardware
which wasn't available at the selection was made.  The environment
variable, GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz...., can be used
to enable CPU/ARCH feature yyy, disable CPU/ARCH feature yyy and zzz,
where the feature name is case-sensitive and has to match the ones in
cpu-features.h.  It can be used by glibc developers to override the
IFUNC selection to tune for a new processor or improve performance for
a particular workload.  It isn't intended for normal end users.

NOTE: the IFUNC selection may change over time.  Please check all
multiarch implementations when experimenting.

Also, GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=NUMBER is
provided to set threshold to use non temporal store to NUMBER,
GLIBC_TUNABLES=glibc.tune.x86_data_cache_size=NUMBER to set data cache
size, GLIBC_TUNABLES=glibc.tune.x86_shared_cache_size=NUMBER to set
shared cache size.

	* elf/dl-tunables.list (tune): Add ifunc,
	x86_non_temporal_threshold,
	x86_data_cache_size and x86_shared_cache_size.
	* manual/tunables.texi: Document glibc.tune.ifunc,
	glibc.tune.x86_data_cache_size, glibc.tune.x86_shared_cache_size
	and glibc.tune.x86_non_temporal_threshold.
	* sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file.
	* sysdeps/x86/cpu-tunables.c: Likewise.
	* sysdeps/x86/cacheinfo.c
	(init_cacheinfo): Check and get data cache size, shared cache
	size and non temporal threshold from cpu_features.
	* sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE):
	New.
	[HAVE_TUNABLES] Include <unistd.h>.
	[HAVE_TUNABLES] Include <elf/dl-tunables.h>.
	[HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise.
	[HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set
	IFUNC selection, data cache size, shared cache size and non
	temporal threshold.
	* sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size,
	shared_cache_size and non_temporal_threshold.
2017-06-20 08:37:28 -07:00
Zack Weinberg
09a596cc2c Remove bits/string.h.
These machine-dependent inline string functions have never been on by
default, and even if they were a good idea at the time they were
introduced, they haven't really been touched in ten to fifteen years
and probably aren't a good idea on current-gen processors.  Current
thinking is that this class of optimization is best left to the
compiler.

	* bits/string.h, string/bits/string.h
	* sysdeps/aarch64/bits/string.h
	* sysdeps/m68k/m680x0/m68020/bits/string.h
	* sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h
	* sysdeps/x86/bits/string.h: Delete file.

	* string/string.h: Don't include bits/string.h.
	* string/bits/string3.h: Rename to bits/string_fortified.h.
	No need to undef various symbols that the removed headers
	might have defined as macros.
	* string/Makefile (headers): Remove bits/string.h, change
	bits/string3.h to bits/string_fortified.h.
	* string/string-inlines.c: Update commentary.  Remove definitions
	of various macros that nothing looks at anymore.  Don't directly
	include bits/string.h. Set _STRING_INLINE_unaligned here, based on
	compiler-predefined macros.
	* string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY
	_is_ defined, provide internal hidden alias __strncat.
	* include/string.h: Declare internal hidden alias __strncat.
	Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is
	not defined.
	* include/bits/string3.h: Rename to bits/string_fortified.h,
	update to match above.

	* sysdeps/i386/string-inlines.c: Define compat symbols for
	everything formerly defined by sysdeps/x86/bits/string.h.
	Make existing definitions into compat symbols as well.
	Remove some no-longer-necessary messing around with macros.

	* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c
	* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c
	* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
	* sysdeps/s390/multiarch/mempcpy.c
	No need to define _HAVE_STRING_ARCH_mempcpy.
	Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT.

	* sysdeps/i386/i686/multiarch/strncat-c.c
	* sysdeps/s390/multiarch/strncat-c.c
	* sysdeps/x86_64/multiarch/strncat-c.c
	Define STRNCAT_PRIMARY.  Don't change definition of libc_hidden_def.
2017-06-20 08:21:24 -04:00
Siddhesh Poyarekar
511c5a1087 Make LD_HWCAP_MASK usable for static binaries
The LD_HWCAP_MASK environment variable was ignored in static binaries,
which is inconsistent with the behaviour of dynamically linked
binaries.  This seems to have been because of the inability of
ld_hwcap_mask being read early enough to influence anything but now
that it is in tunables, the mask is usable in static binaries as well.

This feature is important for aarch64, which relies on HWCAP_CPUID
being masked out to disable multiarch.  A sanity test on x86_64 shows
that there are no failures.  Likewise for aarch64.

	* elf/dl-hwcaps.h [HAVE_TUNABLES]: Always read hwcap_mask.
	* sysdeps/sparc/sparc32/dl-machine.h [HAVE_TUNABLES]:
	Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Always set
	up hwcap and hwcap_mask.
2017-06-07 11:11:40 +05:30
Siddhesh Poyarekar
ff08fc59e3 tunables: Use glibc.tune.hwcap_mask tunable instead of _dl_hwcap_mask
Drop _dl_hwcap_mask when building with tunables.  This completes the
transition of hwcap_mask reading from _dl_hwcap_mask to tunables.

	* elf/dl-hwcaps.h: New file.
	* elf/dl-hwcaps.c: Include it.
	(_dl_important_hwcaps)[HAVE_TUNABLES]: Read and update
	glibc.tune.hwcap_mask.
	* elf/dl-cache.c: Include dl-hwcaps.h.
	(_dl_load_cache_lookup)[HAVE_TUNABLES]: Read
	glibc.tune.hwcap_mask.
	* sysdeps/sparc/sparc32/dl-machine.h: Likewise.
	* elf/dl-support.c (_dl_hwcap2)[HAVE_TUNABLES]: Drop
	_dl_hwcap_mask.
	* elf/rtld.c (rtld_global_ro)[HAVE_TUNABLES]: Drop
	_dl_hwcap_mask.
	(process_envvars)[HAVE_TUNABLES]: Likewise.
	* sysdeps/generic/ldsodefs.h (rtld_global_ro)[HAVE_TUNABLES]:
	Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't
	initialize dl_hwcap_mask when tunables are enabled.
2017-06-07 11:11:38 +05:30
H.J. Lu
48e7bc7a55 x86: Don't use dl_x86_cpu_features in cacheinfo.c
Since cpu_features is available, use it instead of dl_x86_cpu_features.

	* sysdeps/x86/cacheinfo.c (intel_check_word): Accept cpu_features
	and use it instead of dl_x86_cpu_features.
	(handle_intel): Replace maxidx with cpu_features.  Pass
	cpu_features to intel_check_word.
	(__cache_sysconf): Pass cpu_features to handle_intel.
	(init_cacheinfo): Likewise.  Use cpu_features instead of
	dl_x86_cpu_features.
2017-06-05 16:20:11 -07:00
H.J. Lu
935971ba6b x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE
Optimize x86-64 memcmp/wmemcmp with AVX2.  It uses vector compare as
much as possible.  It is as fast as SSE4 memcmp for size <= 16 bytes
and up to 2X faster for size > 16 bytes on Haswell and Skylake.  Select
AVX2 memcmp/wmemcmp on AVX2 machines where vzeroupper is preferred and
AVX unaligned load is fast.

NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input.  TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.

Key features:

1. For size from 2 to 7 bytes, load as big endian with movbe and bswap
   to avoid branches.
2. Use overlapping compare to avoid branch.
3. Use vector compare when size >= 4 bytes for memcmp or size >= 8
   bytes for wmemcmp.
4. If size is 8 * VEC_SIZE or less, unroll the loop.
5. Compare 4 * VEC_SIZE at a time with the aligned first memory area.
6. Use 2 vector compares when size is 2 * VEC_SIZE or less.
7. Use 4 vector compares when size is 4 * VEC_SIZE or less.
8. Use 8 vector compares when size is 8 * VEC_SIZE or less.

	* sysdeps/x86/cpu-features.h (index_cpu_MOVBE): New.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memcmp-avx2 and wmemcmp-avx2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test __memcmp_avx2 and __wmemcmp_avx2.
	* sysdeps/x86_64/multiarch/memcmp-avx2.S: New file.
	* sysdeps/x86_64/multiarch/wmemcmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/memcmp.S: Use __memcmp_avx2 on AVX
	2 machines if AVX unaligned load is fast and vzeroupper is
	preferred.
	* sysdeps/x86_64/multiarch/wmemcmp.S: Use __wmemcmp_avx2 on AVX
	2 machines if AVX unaligned load is fast and vzeroupper is
	preferred.
2017-06-05 12:52:55 -07:00
H.J. Lu
9cd30491dd x86: Add macros to implement ifunce selection in C
These macros are used to implement ifunc selection in C.  To implement
an ifunc function, foo, which returns the address of __foo_sse2 or
__foo_avx2:

   __foo_avx2:

   #define foo __redirect_foo
   #define __foo __redirect___foo
   #include <foo.h>
   #undef foo
   #undef __foo
   #define SYMBOL_NAME foo
   #include <init-arch.h>

   extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2) attribute_hidden;
   extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2) attribute_hidden;

   static inline void *
   foo_selector (void)
   {
     if (use AVX2)
      return OPTIMIZE (avx2);

     return OPTIMIZE (sse2);
   }

   libc_ifunc_redirected (__redirect_foo, foo, foo_selector ());

	* sysdeps/x86/init-arch.h (PASTER1): New.
	(EVALUATOR1): Likewise.
	(PASTER2): Likewise.
	(EVALUATOR2): Likewise.
	(REDIRECT_NAME): Likewise.
	(OPTIMIZE): Likewise.
	(IFUNC_SELECTOR): Likewise.
2017-06-05 08:28:13 -07:00
H.J. Lu
808fd9e6fe x86: Update __x86_shared_non_temporal_threshold
__x86_shared_non_temporal_threshold was set to 6 times of per-core
shared cache size, based on the large memcpy micro benchmark in glibc
on a 8-core processor.  For a processor with more than 8 cores, the
threshold is too low.  Set __x86_shared_non_temporal_threshold to the
3/4 of the total shared cache size so that it is unchanged on 8-core
processors.  On processors with less than 8 cores, the threshold is
lower.

	* sysdeps/x86/cacheinfo.c (__x86_shared_non_temporal_threshold):
	Set to the 3/4 of the total shared cache size.
2017-06-02 17:32:37 -07:00
Siddhesh Poyarekar
4158ba082c Delay initialization of CPU features struct in static binaries
Allow the CPU features structure set up to be overridden by tunables
by delaying it to until after tunables are initialized.  The
initialization is already delayed in dynamically linked glibc, it is
only in static binaries that the initialization is set early to allow
it to influence IFUNC relocations that happen in libc-start.  It is a
bit too early however and there is a good place between tunables
initialization and IFUNC relocations where this can be done.

Verified that this does not regress the testsuite.

	* csu/libc-start.c [!ARCH_INIT_CPU_FEATURES]: Define
	ARCH_INIT_CPU_FEATURES.
	(LIBC_START_MAIN): Call it.
	* sysdeps/unix/sysv/linux/aarch64/libc-start.c
	(__libc_start_main): Remove.
	(ARCH_INIT_CPU_FEATURES): New macro.
	* sysdeps/x86/libc-start.c (__libc_start_main): Remove.
	(ARCH_INIT_CPU_FEATURES): New macro.
2017-05-31 06:38:33 +05:30
H.J. Lu
9c450f6f6f x86: Don't include cacheinfo.c in ld.so
Since cacheinfo.c isn't used by ld.so, there is no need to include it
in ld.so.

	* sysdeps/x86/cacheinfo.c: Skip if not in libc.
2017-05-24 06:33:43 -07:00
H.J. Lu
7c1d722554 x86: Use __get_cpu_features to get cpu_features
Remove is_intel, is_amd and max_cpuid macros.  Use __get_cpu_features
to get cpu_features instead.

	* sysdeps/x86/cacheinfo.c (is_intel): Removed.
	(is_amd): Likewise.
	(max_cpuid): Likewise.
	(__cache_sysconf): Use __get_cpu_features to get cpu_features.
	(init_cacheinfo): Likewise.
2017-05-24 06:28:52 -07:00
Adhemerval Zanella
eab380d8ec Move shared pthread definitions to common headers
This patch removes all the replicated pthread definition accross the
architectures and consolidates it on shared headers.  The new
organization is as follow:

  * Architecture specific definition (such as pthread types sizes) are
    place in the new pthreadtypes-arch.h header in arch specific path.

  * All shared structure definition are moved to a common NPTL header
    at sysdeps/nptl/bits/pthreadtypes.h (with now includes the arch
    specific one for internal definitions).

  * Also, for C11 future thread support, both mutex and condition
    definition are placed in a common header at
    sysdeps/nptl/bits/thread-shared-types.h.

It is also a refactor patch without expected functional changes.
Checked with a build for all major ABI (aarch64-linux-gnu, alpha-linux-gnu,
arm-linux-gnueabi, i386-linux-gnu, ia64-linux-gnu,
m68k-linux-gnu, microblaze-linux-gnu, mips{64}-linux-gnu, nios2-linux-gnu,
powerpc{64le}-linux-gnu, s390{x}-linux-gnu, sparc{64}-linux-gnu,
tile{pro,gx}-linux-gnu, and x86_64-linux-gnu).

	* posix/Makefile (headers): Add pthreadtypes-arch.h and
	thread-shared-types.h.
	* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h: New file: arch
	specific thread definition.
	* sysdeps/alpha/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/arm/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/hppa/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/ia64/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/mips/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/powerpc/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/s390/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/sh/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/sparc/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/tile/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/x86/nptl/bits/pthreadtypes-arch.h: Likewise.
	* sysdeps/nptl/bits/thread-shared-types.h: New file: shared
	thread definition between POSIX and C11.
	* sysdeps/aarch64/nptl/bits/pthreadtypes.h.: Remove file.
	* sysdeps/alpha/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/arm/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/hppa/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/m68k/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/microblaze/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/mips/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/nios2/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/ia64/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/powerpc/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/s390/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/sh/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/sparc/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/tile/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/x86/nptl/bits/pthreadtypes.h: Likewise.
	* sysdeps/nptl/bits/pthreadtypes.h: New file: common thread
	definitions shared across all architectures.
2017-05-09 17:49:17 -03:00
H.J. Lu
1432d38ea0 x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391]
dl_platform and dl_hwcap are set from AT_PLATFORM and AT_HWCAP very
early during startup.  They are used by dynamic linker to determine
platform and build an array of hardware capability names, which are
added to search path when loading shared object.  dl_platform and
dl_hwcap are unused on x86-64.  On i386, i386, i486, i586 and i686
platforms were supported and only SSE2 capability was used.

On x86, usage of AT_PLATFORM and AT_HWCAP to determine platform and
processor capabilities is obsolete since all information is available
in dl_x86_cpu_features.  This patch sets dl_platform and dl_hwcap from
dl_x86_cpu_features in dynamic linker.  On i386, the available plaforms
are changed to i586 and i686 since i386 has been deprecated.  On x86-64,
the available plaforms are haswell, which is for Haswell class processors
with BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, and xeon_phi, which
is for Xeon Phi class processors with AVX512F, AVX512CD, AVX512ER and
AVX512PF.  A capability, avx512_1, is also added to x86-64 for AVX512
ISAs: AVX512F, AVX512CD, AVX512BW, AVX512DQ and AVX512VL.

	[BZ #21391]
	* sysdeps/i386/dl-machine.h (dl_platform_init) [IS_IN (rtld)]:
	Only call init_cpu_features.
	[!IS_IN (rtld)]: Only set GLRO(dl_platform) to NULL if needed.
	* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
	* sysdeps/i386/dl-procinfo.h: Removed.
	* sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Don't include
	<sysdeps/i386/dl-procinfo.h> nor <ldsodefs.h>.  Include
	<sysdeps/x86/dl-procinfo.h>.
	(_dl_procinfo): Replace _DL_HWCAP_COUNT with 32.
	* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h [!IS_IN (ldconfig)]:
	Include <sysdeps/x86/dl-procinfo.h> instead of
	 <sysdeps/generic/dl-procinfo.h>.
	* sysdeps/x86/cpu-features.c: Include <dl-hwcap.h>.
	(init_cpu_features): Set dl_platform, dl_hwcap and dl_hwcap_mask.
	* sysdeps/x86/cpu-features.h (bit_cpu_LZCNT): New.
	(bit_cpu_MOVBE): Likewise.
	(bit_cpu_BMI1): Likewise.
	(bit_cpu_BMI2): Likewise.
	(index_cpu_BMI1): Likewise.
	(index_cpu_BMI2): Likewise.
	(index_cpu_LZCNT): Likewise.
	(index_cpu_MOVBE): Likewise.
	(index_cpu_POPCNT): Likewise.
	(reg_BMI1): Likewise.
	(reg_BMI2): Likewise.
	(reg_LZCNT): Likewise.
	(reg_MOVBE): Likewise.
	(reg_POPCNT): Likewise.
	* sysdeps/x86/dl-hwcap.h: New file.
	* sysdeps/x86/dl-procinfo.h: Likewise.
	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): New.
	(_dl_x86_platforms): Likewise.
2017-05-03 13:44:35 -07:00
H.J. Lu
4cb334c4d6 x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
On Skylake server, AVX512 load/store instructions in memcpy/memset may
lead to lower CPU turbo frequency in certain situations.  Use of AVX2
in memcpy/memset has been observed to have improved overall performance
in many workloads due to the higher frequency.

Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
used on Skylake server.

	[BZ #21396]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Prefer_No_AVX512 if AVX512ER isn't available.
	* sysdeps/x86/cpu-features.h (bit_arch_Prefer_No_AVX512): New.
	(index_arch_Prefer_No_AVX512): Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Don't use
	AVX512 version if Prefer_No_AVX512 is set.
	* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk.S (__memmove_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/memset.S (memset): Likewise.
	* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk):
	Likewise.
2017-04-18 14:01:45 -07:00
H.J. Lu
1c53cb49de x86: Set Prefer_No_VZEROUPPER if AVX512ER is available
AVX512ER won't be implemented in any Xeon processors and will be in
all Xeon Phi processors.  Don't check CPU model number when setting
Prefer_No_VZEROUPPER for Xeon Phi.  Instead, set Prefer_No_VZEROUPPER
if AVX512ER is available.  It works with current and future Xeon Phi
and non-Xeon Phi processors.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Prefer_No_VZEROUPPER if AVX512ER is available.
	* sysdeps/x86/cpu-features.h
	(bit_cpu_AVX512PF): New.
	(bit_cpu_AVX512ER): Likewise.
	(bit_cpu_AVX512CD): Likewise.
	(bit_cpu_AVX512BW): Likewise.
	(bit_cpu_AVX512VL): Likewise.
	(index_cpu_AVX512PF): Likewise.
	(index_cpu_AVX512ER): Likewise.
	(index_cpu_AVX512CD): Likewise.
	(index_cpu_AVX512BW): Likewise.
	(index_cpu_AVX512VL): Likewise.
	(reg_AVX512PF): Likewise.
	(reg_AVX512ER): Likewise.
	(reg_AVX512CD): Likewise.
	(reg_AVX512BW): Likewise.
	(reg_AVX512VL): Likewise.
2017-04-18 08:27:32 -07:00
Adhemerval Zanella
38efe8c5a5 Consolidate pthreadtype.h placementConsolidate pthreadtype.h placement
This patch moves all arch specific pthreadtypes.h to a similar path
for all architectures (sysdeps/unix/sysv/<arch>/bits).  No functional
or build change is expected.  The idea is mainly to organize the
header placement for all architectures.

Checked with a build for all major ABI (aarch64-linux-gnu, alpha-linux-gnu,
arm-linux-gnueabi, i386-linux-gnu, ia64-linux-gnu,
m68k-linux-gnu, microblaze-linux-gnu [1], mips{64}-linux-gnu, nios2-linux-gnu,
powerpc{64le}-linux-gnu, s390{x}-linux-gnu, sparc{64}-linux-gnu,
tile{pro,gx}-linux-gnu, and x86_64-linux-gnu).

	* sysdeps/unix/sysv/linux/x86/Implies: New file.
	* sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h: Move to ...
	* sysdeps/alpha/nptl/bits/pthreadtypes.h: ... here.
	* sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h: Move to ...
	* sysdeps/powerpc/nptl/bits/pthreadtypes.h: ... here.
	* sysdeps/x86/bits/pthreadtypes.h: Move to ...
	* sysdeps/x86/nptl/bits/pthreadtypes.h: ... here.
2017-04-10 17:33:10 -03:00
H.J. Lu
fda19e0438 Add sysdeps/x86/dl-procinfo.c
Add sysdeps/x86/dl-procinfo.c for x86 version of processor capability
information to reduce duplication between i386 and x86_64 dl-procinfo.c.

	* sysdeps/i386/dl-procinfo.c: Include
	<sysdeps/x86/dl-procinfo.c>.
	* sysdeps/x86_64/dl-procinfo.c: Likewise.
	* sysdeps/x86/dl-procinfo.c: New file.
2017-04-10 12:01:45 -07:00
H.J. Lu
bf7730194f Check if SSE is available with HAS_CPU_FEATURE
Similar to other CPU feature checks, check if SSE is available with
HAS_CPU_FEATURE.

	* sysdeps/i386/fpu/fclrexcpt.c (__feclearexcept): Use
	HAS_CPU_FEATURE to check for SSE.
	* sysdeps/i386/fpu/fedisblxcpt.c (fedisableexcept): Likewise.
	* sysdeps/i386/fpu/feenablxcpt.c (feenableexcept): Likewise.
	* sysdeps/i386/fpu/fegetenv.c (__fegetenv): Likewise.
	* sysdeps/i386/fpu/fegetmode.c (fegetmode): Likewise.
	* sysdeps/i386/fpu/feholdexcpt.c (__feholdexcept): Likewise.
	* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Likewise.
	* sysdeps/i386/fpu/fesetmode.c (fesetmode): Likewise.
	* sysdeps/i386/fpu/fesetround.c (__fesetround): Likewise.
	* sysdeps/i386/fpu/feupdateenv.c (__feupdateenv): Likewise.
	* sysdeps/i386/fpu/fgetexcptflg.c (__fegetexceptflag): Likewise.
	* sysdeps/i386/fpu/fsetexcptflg.c (__fesetexceptflag): Likewise.
	* sysdeps/i386/fpu/ftestexcept.c (fetestexcept): Likewise.
	* sysdeps/i386/setfpucw.c (__setfpucw): Likewise.
	* sysdeps/x86/cpu-features.h (bit_cpu_SSE): New.
	(index_cpu_SSE): Likewise.
	(reg_SSE): Likewise.
2017-04-07 07:44:59 -07:00
H.J. Lu
b170d2e7ab Use CPU_FEATURES_CPU_P to check if AVX is available
Don't use bit_cpu_AVX directly.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Check AVX with
	CPU_FEATURES_CPU_P.
2017-03-17 11:38:13 -07:00
Joseph Myers
2072f5c34e Remove C++ namespace handling from glibc headers.
glibc headers include some code (not particularly consistent or
systematic) to put various declarations in C++ namespaces std and
__c99, if _GLIBCPP_USE_NAMESPACES is defined.

As noted in <https://gcc.gnu.org/ml/libstdc++/2017-03/msg00025.html>,
this macro was removed from libstdc++ in 2000.  I don't expect
compilation with such old versions of libstdc++ to work with current
glibc headers anyway (whereas old *binaries* are expected to stay
working with current glibc); this patch (which should be a no-op with
any libstdc++ version postdating that removal) removes all this code
from the glibc headers.

The begin-end-check.pl test, whose comments say it is about checking
these namespace macro calls, is also removed.  The code in that test
would have covered __BEGIN_DECLS / __END_DECLS as well, but if those
weren't properly matched it would show up with the
check-installed-headers-cxx tests, so I don't think there is an actual
use for keeping begin-end-check.pl with the namespace code removed.

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).

	* misc/sys/cdefs.h (__BEGIN_NAMESPACE_STD): Remove macro.
	(__END_NAMESPACE_STD): Likewise.
	(__USING_NAMESPACE_STD): Likewise.
	(__BEGIN_NAMESPACE_C99): Likewise.
	(__END_NAMESPACE_C99): Likewise.
	(__USING_NAMESPACE_C99): Likewise.
	* math/math.h (_Mdouble_BEGIN_NAMESPACE): Do not define and
	undefine macro.
	(_Mdouble_END_NAMESPACE): Likewise.
	* ctype/ctype.h: Do not handle C++ namespaces.
	* libio/bits/stdio-ldbl.h: Likewise.
	* libio/stdio.h: Likewise.
	* locale/locale.h: Likewise.
	* math/bits/mathcalls.h: Likewise.
	* setjmp/setjmp.h: Likewise.
	* signal/signal.h: Likewise.
	* stdlib/bits/stdlib-float.h: Likewise.
	* stdlib/bits/stdlib-ldbl.h: Likewise.
	* stdlib/stdlib.h: Likewise.
	* string/string.h: Likewise.
	* sysdeps/x86/fpu/bits/mathinline.h: Likewise.
	* time/bits/types/clock_t.h: Likewise.
	* time/bits/types/struct_tm.h: Likewise.
	* time/bits/types/time_t.h: Likewise.
	* time/time.h: Likewise.
	* wcsmbs/bits/wchar-ldbl.h: Likewise.
	* wcsmbs/uchar.h: Likewise.
	* wcsmbs/wchar.h: Likewise.
	[_GLIBCPP_USE_NAMESPACES] (wint_t): Remove conditional definition.
	* wctype/wctype.h: Do not handle C++ namespaces.
	* scripts/begin-end-check.pl: Remove.
	* Makefile (installed-headers): Likewise.
	(tests-special): Do not add $(objpfx)begin-end-check.out.
	($(objpfx)begin-end-check.out): Remove.
2017-03-16 13:31:57 +00:00
Joseph Myers
ffe308e4fc Fix test-math-vector-sincos.h aliasing.
x86_64 libmvec tests have been failing to build lately with GCC
mainline with -Wuninitialized errors, and Markus Trippelsdorf traced
this to an aliasing issue
<https://sourceware.org/ml/libc-alpha/2017-03/msg00169.html>.

This patch fixes the aliasing issue, so that the vectors-of-pointers
are initialized using a union instead of pointer casts.  This also
fixes the testsuite build failures with GCC mainline.

Tested for x86_64 (full testsuite with GCC 6; testsuite build with GCC
mainline with build-many-glibcs.py).

	* sysdeps/x86/fpu/test-math-vector-sincos.h (INIT_VEC_PTRS_LOOP):
	Use a union when storing pointers.
	(VECTOR_WRAPPER_fFF_2): Do not take address of integer vector and
	cast result when passing to INIT_VEC_PTRS_LOOP.
	(VECTOR_WRAPPER_fFF_3): Likewise.
	(VECTOR_WRAPPER_fFF_4): Likewise.
2017-03-15 17:32:46 +00:00
H.J. Lu
52ac22365a Use index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit
* sysdeps/x86/cpu-features.c (init_cpu_features): Use
	index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit.
2017-02-17 11:53:26 -08:00
Torvald Riegel
cc25c8b4c1 New pthread rwlock that is more scalable.
This replaces the pthread rwlock with a new implementation that uses a
more scalable algorithm (primarily through not using a critical section
anymore to make state changes).  The fast path for rdlock acquisition and
release is now basically a single atomic read-modify write or CAS and a few
branches.  See nptl/pthread_rwlock_common.c for details.

	* nptl/DESIGN-rwlock.txt: Remove.
	* nptl/lowlevelrwlock.sym: Remove.
	* nptl/Makefile: Add new tests.
	* nptl/pthread_rwlock_common.c: New file.  Contains the new rwlock.
	* nptl/pthreadP.h (PTHREAD_RWLOCK_PREFER_READER_P): Remove.
	(PTHREAD_RWLOCK_WRPHASE, PTHREAD_RWLOCK_WRLOCKED,
	PTHREAD_RWLOCK_RWAITING, PTHREAD_RWLOCK_READER_SHIFT,
	PTHREAD_RWLOCK_READER_OVERFLOW, PTHREAD_RWLOCK_WRHANDOVER,
	PTHREAD_RWLOCK_FUTEX_USED): New.
	* nptl/pthread_rwlock_init.c (__pthread_rwlock_init): Adapt to new
	implementation.
	* nptl/pthread_rwlock_rdlock.c (__pthread_rwlock_rdlock_slow): Remove.
	(__pthread_rwlock_rdlock): Adapt.
	* nptl/pthread_rwlock_timedrdlock.c
	(pthread_rwlock_timedrdlock): Adapt.
	* nptl/pthread_rwlock_timedwrlock.c
	(pthread_rwlock_timedwrlock): Adapt.
	* nptl/pthread_rwlock_trywrlock.c (pthread_rwlock_trywrlock): Adapt.
	* nptl/pthread_rwlock_tryrdlock.c (pthread_rwlock_tryrdlock): Adapt.
	* nptl/pthread_rwlock_unlock.c (pthread_rwlock_unlock): Adapt.
	* nptl/pthread_rwlock_wrlock.c (__pthread_rwlock_wrlock_slow): Remove.
	(__pthread_rwlock_wrlock): Adapt.
	* nptl/tst-rwlock10.c: Adapt.
	* nptl/tst-rwlock11.c: Adapt.
	* nptl/tst-rwlock17.c: New file.
	* nptl/tst-rwlock18.c: New file.
	* nptl/tst-rwlock19.c: New file.
	* nptl/tst-rwlock2b.c: New file.
	* nptl/tst-rwlock8.c: Adapt.
	* nptl/tst-rwlock9.c: Adapt.
	* sysdeps/aarch64/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/arm/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/hppa/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/ia64/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/m68k/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/microblaze/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/mips/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/nios2/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/s390/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/sh/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/sparc/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/tile/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* sysdeps/unix/sysv/linux/alpha/bits/pthreadtypes.h
	(pthread_rwlock_t): Adapt.
	* sysdeps/unix/sysv/linux/powerpc/bits/pthreadtypes.h
	(pthread_rwlock_t): Adapt.
	* sysdeps/x86/bits/pthreadtypes.h (pthread_rwlock_t): Adapt.
	* nptl/nptl-printers.py (): Adapt.
	* nptl/nptl_lock_constants.pysym: Adapt.
	* nptl/test-rwlock-printers.py: Adapt.
	* nptl/test-rwlockattr-printers.c: Adapt.
	* nptl/test-rwlockattr-printers.py: Adapt.
2017-01-10 11:50:17 +01:00
Joseph Myers
bfff8b1bec Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00