This patch optimizes strcpy for ppc64/power7 for unaligned source or
destination address. The source or destination address is aligned
to doubleword and data is shifted based on the alignment and
added with the previous loaded data to be written as a doubleword.
For each load, cmpb instruction is used for faster null check.
The word aligned optimization is also removed, since the new unaligned
code path shows better results handling word-aligned strings.
More combination of unaligned inputs is also added in benchtest
to measure the improvement.The new optimization shows 2 to 80% of
performance improvement for longer string though it does not show
big difference on string size less than 16 due to additional checks.
The natural fix for some linknamespace test failures, where C90 libm
functions call C99 <fenv.h> functions, is to make fe* into weak
aliases for __fe* and call __fe* from within libm as needed.
To do this, the __fe* names need to be available for that purpose -
that is, they must not be used for something other than aliases of
fe*. On powerpc, however, __fegetround is an inline function in
fenv_libc.h, with no corresponding fegetround inline function;
fegetround has an equivalent macro expansion in bits/fenvinline.h, but
that is disabled if __NO_MATH_INLINES (which is defined for building
libm).
I see no need for that disabling; it's not even clear that
__NO_MATH_INLINES should affect <fenv.h>, and the results of
fegetround are completely defined so there is no semantic effect of
that disabling at all outside glibc. The x86 inline feraiseexcept is
conditioned on __USE_EXTERN_INLINES not __NO_MATH_INLINES (but that's
an inline function rather than a macro).
This patch removes the __NO_MATH_INLINES conditional on that
fegetround macro, so resulting in it being expanded inline inside
glibc. In turn, this means that direct calls to __fegetround from C99
functions in ldbl-128ibm can be changed to calls to fegetround, so
that nofpu fenv_libc.h files don't need to define __fegetround at all
and, by changing ldbl-128ibm files to use <fenv.h> not <fenv_libc.h>,
non-e500 nofpu no longer needs an fenv_libc.h file.
The other macros in fenvinline.h are left conditional on
__NO_MATH_INLINES, although since the only case where this should make
a difference is one involving undefined behavior (if the argument to
the function is not a valid exception macro).
The out-of-line definition for fegetround uses __fegetround (the
inline function removed by this patch). So this continues to work,
the fenvinline.h header is made to define __fegetround, and then to
define fegetround to call __fegetround.
Tested for powerpc32 (hard float) that installed stripped shared
libraries are unchanged by this patch; also tested that powerpc-nofpu
build still works. (This patch does not itself fix any bugs; it
simply cleans things up in preparation for separate bug fixes.)
* sysdeps/powerpc/bits/fenvinline.h (fegetround): Rename macro to
__fegetround and redefine to call __fegetround. Remove condition
on [!__NO_MATH_INLINES].
* sysdeps/powerpc/fpu/fenv_libc.h (__fegetround): Remove inline
function.
* sysdeps/powerpc/nofpu/fenv_libc.h: Remove file.
* sysdeps/powerpc/powerpc32/e500/nofpu/fenv_libc.h (__fegetround):
Remove macro.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Include <fenv.h>
instead of <fenv_libc.h>.
(__llrintl): Call fegetround instead of __fegetround.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Include <fenv.h>
instead of <fenv_libc.h>.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
(__lrintl): Call fegetround instead of __fegetround.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Include <fenv.h>
instead of <fenv_libc.h>.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
(__rintl): Call fegetround instead of __fegetround.
True multi-dimensional arrays were introduced in awk 4.0 and we
support awk versions as early as 3.12. Use a single subscript of the
form prefix_conf instead of two dimensions to work around this
limitation. We also need one additional array of just the conf names
subscripted by the prefix_conf to print the names for the
specifications.
* scripts/gen-posix-conf-vars.awk: Don't use multi-dimensional
arrays.
PI_STATIC_AND_HIDDEN is always defined for i386. There is no need to
check PI_STATIC_AND_HIDDEN in sysdeps/i386/dl-machine.h.
[BZ #17775]
* sysdeps/i386/dl-machine.h (PI_STATIC_AND_HIDDEN): Removed.
(elf_machine_dynamic) [!PI_STATIC_AND_HIDDEN]: Likewise.
(elf_machine_load_address) [!PI_STATIC_AND_HIDDEN]: Likewise.
Fixed 3 "make check" failures on glibc 32bit built by gcc 5.0 due to EBX
was enabled for allocation:
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00892.html
Tests elf/tst-tls3, elf/tst-execstack-needed, elf/tst-execstack-prog
were failed because EBX was used as PIC register.
* sysdeps/i386/tls-macros.h: Include <features.h>.
(TLS_LE): Use non-PIC version for GCC >= 5.0.
(TLS_IE): Likewise.
(TLS_LD): Likewise.
(TLS_GD): Likewise.
* sysdeps/unix/sysv/linux/i386/sysdep.h (check_consistency): Don't
define for GCC >= 5.0.
Due to tile missing a bunch of FP exception and rounding
support, the tests generate warnings. These changes fix the
warnings by just not compiling some unused functions, and
adding some attribute ((unused)) tags.
Various C90 and UNIX98 libm functions call feraiseexcept, which is not
in those standards. This causes linknamespace test failures - except
on x86 / x86_64, where feraiseexcept is inline (for the relevant
constant arguments) in bits/fenv.h.
This patch fixes this by making those functions call __feraiseexcept
instead. All changes are applied to all architectures rather than
considering the possibility that some might not be needed in some
cases (e.g. x86) as it seems most maintainable to keep architectures
consistent.
Where __feraiseexcept does not exist, it is added, with feraiseexcept
made a weak alias; where it is a strong alias, it is made weak.
libm_hidden_def / libm_hidden_proto are used with __feraiseexcept
(this might in some cases improve code generation for existing calls
to __feraiseexcept in some code on some architectures). Where there
are dummy feraiseexcept macros (on architectures without
floating-point exceptions support, to avoid compile errors from
references to undefined FE_* macros), corresponding dummy
__feraiseexcept macros are added. And on x86, to ensure
__feraiseexcept calls still get inlined, the inline function in
bits/fenv.h is refactored so that most of it can be reused in an
inline __feraiseexcept in a separate include/bits/fenv.h.
Calls are changed in C90/UNIX98 functions, but generally not in
functions missing from those standards. They are also changed in
libc_fe* functions (on the basis that those might be used in any libm
function), and in feupdateenv (on the same basis - may be used, via
default libc_*, in any libm function - of course feupdateenv will need
changing to __feupdateenv in a subsequent patch to make that fully
namespace-clean).
No __feraiseexcept is added corresponding to the feraiseexcept in
powerpc bits/fenvinline.h, because that macro definition is
conditional on !defined __NO_MATH_INLINES, and glibc libm is built
with -D__NO_MATH_INLINES, so changing internal calls to use
__feraiseexcept should make no difference.
Tested for x86_64 (testsuite; the only change in disassembly of
installed shared libraries is a slight code reordering in clog10, of
no apparent significance). Also tested for MIPS, where (in the
configuration tested) it eliminates math.h linknamespace failures for
n32 and n64 (some for o32 remain because of other issues).
[BZ #17723]
* include/fenv.h (__feraiseexcept): Use libm_hidden_proto.
* math/fraiseexcpt.c (__feraiseexcept): Use libm_hidden_def.
* sysdeps/aarch64/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/arm/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/hppa/fpu/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/i386/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/ia64/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/m68k/coldfire/fpu/fraiseexcpt.c (feraiseexcept):
Likewise.
* sysdeps/microblaze/math_private.h (__feraiseexcept): New macro.
* sysdeps/mips/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/powerpc/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/powerpc/nofpu/fraiseexcpt.c (__feraiseexcept): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/fraiseexcpt.c
(__feraiseexcept): Likewise.
* sysdeps/s390/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/sh/sh4/fpu/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/sparc/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/tile/math_private.h (__feraiseexcept): New macro.
* sysdeps/unix/sysv/linux/alpha/fraiseexcpt.S (__feraiseexcept):
Use libm_hidden_def.
* sysdeps/x86_64/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
(feraiseexcept): Define as weak not strong alias. Use
libm_hidden_weak.
* sysdeps/x86/fpu/bits/fenv.h (__feraiseexcept_invalid_divbyzero):
New inline function. Factored out of ...
(feraiseexcept): ... here. Use __feraiseexcept_invalid_divbyzero.
* sysdeps/x86/fpu/include/bits/fenv.h: New file.
* math/e_scalb.c (invalid_fn): Call __feraiseexcept instead of
feraiseexcept.
* math/w_acos.c (__acos): Likewise.
* math/w_asin.c (__asin): Likewise.
* math/w_ilogb.c (__ilogb): Likewise.
* math/w_j0.c (y0): Likewise.
* math/w_j1.c (y1): Likewise.
* math/w_jn.c (yn): Likewise.
* math/w_log.c (__log): Likewise.
* math/w_log10.c (__log10): Likewise.
* sysdeps/aarch64/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/aarch64/fpu/math_private.h
(libc_feupdateenv_test_aarch64): Likewise.
* sysdeps/alpha/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/arm/fenv_private.h (libc_feupdateenv_test_vfp): Likewise.
* sysdeps/arm/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/ia64/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/m68k/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/mips/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrt): Likewise.
* sysdeps/s390/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sh/sh4/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sparc/fpu/feupdateenv.c (__feupdateenv): Likewise.
These new memcpy functions are the 32-bit version of x86_64 SSE2 unaligned
memcpy. Memcpy average performace benefit is 18% on Silvermont, other
platforms also improved about 35%, benchmarked on Silvermont, Haswell, Ivy
Bridge, Sandy Bridge and Westmere, performance results attached in
https://sourceware.org/ml/libc-alpha/2014-07/msg00157.html
* sysdeps/i386/i686/multiarch/bcopy-sse2-unaligned.S: New file.
* sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove-sse2-unaligned.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy-sse2-unaligned.S: Likewise.
* sysdeps/i386/i686/multiarch/bcopy.S: Select the sse2_unaligned
version if bit_Fast_Unaligned_Load is set.
* sysdeps/i386/i686/multiarch/memcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove.S: Likewise.
* sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy.S: Likewise.
* sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/i386/i686/multiarch/Makefile (sysdep_routines): Add
bcopy-sse2-unaligned, memcpy-sse2-unaligned,
memmove-sse2-unaligned and mempcpy-sse2-unaligned.
* sysdeps/i386/i686/multiarch/ifunc-impl-list.c (MAX_IFUNC): Set
to 4.
(__libc_ifunc_impl_list): Test __bcopy_sse2_unaligned,
__memmove_chk_sse2_unaligned, __memmove_sse2_unaligned,
__memcpy_chk_sse2_unaligned, __memcpy_sse2_unaligned,
__mempcpy_chk_sse2_unaligned, and __mempcpy_sse2_unaligned.
This patch adds support to generate the spec array in getconf from the
conf.list. The generated code is mostly unchanged. the only changes
are due to the change in layout of the spec and val arrays in the ELF.
The val array can also be auto-generated from posix-conf-vars.list
once the remaining macros are added to it.
* posix/posix-conf-vars.list (SPEC:XBS5): Add sysconf prefix.
* posix/confstr.c: Define NEED_SPEC_ARRAY to 0.
* posix/posix-envs.def: Likewise.
* sysdeps/posix/sysconf.c: Likewise.
* posix/getconf.c: Define NEED_SPEC_ARRAY to 1.
(specs): Remove array.
* scripts/gen-posix-conf-vars.awk: Support generation of specs
array.
This fixes the remaining -Wundef warnings. Tested on x86_64.
* posix/posix-conf-vars.list: Add _POSIX sysconf namespace.
* sysdeps/posix/sysconf.c: Include posix-conf-vars.h.
(__sysconf): Use CONF_IS_* macros.
This patch adds a file posix-conf-vars.list that is used to generate
macros to determine if a macro is defined as set, unset or not
defined. gen-posix-conf-vars.awk processes this file and generates a
header (posix-conf-vars-def.h) with these macros. A new header
posix-conf-vars.h includes this generated header and defines accessor
macros for the generated macros.
Tested on x86_64.
* posix/Makefile (before-compile): Add posix-conf-vars-def.h.
($(objpfx)posix-conf-vars-def.h): New target.
* posix/posix-conf-vars.list: New file.
* posix/posix-conf-vars.h: New file.
* posix/confstr.c: Include posix-conf-vars.h.
(confstr): Use CONF_IS_* macros.
* posix/posix-envs.def: Include posix-conf-vars.h. Use
CONF_IS_* macros.
* scripts/gen-posix-conf-vars.awk: New file.
These avoid having tile generate real calls to the no-op
functions, which then causes linknamespace test failures.
It might make sense to factor all of these out into a common
header that can be shared by tile, microblaze, etc., but for
now just fix the test failures.
These definitions were added back before __ASSUME_POSIX_CPU_TIMERS
was removed. There used to be a vsyscall to clock_getres() in
maybe_syscall_settime_cpu(), but that function was removed in commit
26889eac. The presence of the vsyscall definitions means that platforms
that don't provide clock_getres as a vsyscall hit a symbol redefinition
warning in this file, becoming fatal with -Werror. Removing the
vsyscall definitions is the obvious fix.
No change to generated code on x86_64.
The symbol for HAVE_CLOCK_GETTIME_VSYSCALL was being
only conditionally defined under [SHARED]. However, it turns
out this causes a preprocessor symbol redefinition warning
when building clock_gettime.o. Move the symbol definition
down to make it unconditional, like other platforms do.
The _Unwind_GetCFA() routine returns a 64-bit value,
which we interpret as a pointer. Add an intermediate
cast to long so that in ILP32 mode we don't get a warning
about casting a wrong-sized integer to a pointer.
I missed this during the initial port. Some testing shows that
enabling this mode does, unsurprisingly, yield some nice speedups
on the math functions in question.
This patch makes __ASSUME_UTIMES hppa-specific, removing mentions of
the macro from architecture-independent code and code for other
architectures. (All other architectures either have the utimes
syscall in all relevant kernel versions, or use the asm-generic
interface so only have utimensat and won't get the utimes syscall.) A
similar approach is used to that used for futimesat for MicroBlaze: if
the kernel is recent enough that the utimes syscall can be assumed to
be present, use the implementation in terms of the utimes syscall, and
otherwise use the linux/generic implementation in terms of utimensat.
Tested x86_64 that the disassembly of installed shared libraries is
unchanged by the patch. Not tested for hppa.
* sysdeps/unix/sysv/linux/kernel-features.h (__ASSUME_UTIMES): Do
not define.
* sysdeps/unix/sysv/linux/utimes.c: Do not include
<kernel-features.h>.
(__utimes) [__NR_utimes]: Make code unconditional.
(__utimes) [!__ASSUME_UTIMES]: Remove conditional code.
* sysdeps/unix/sysv/linux/aarch64/kernel-features.h
(__ASSUME_UTIMES): Do not undefine.
* sysdeps/unix/sysv/linux/tile/kernel-features.h
(__ASSUME_UTIMES): Likewise.
* sysdeps/unix/sysv/linux/hppa/kernel-features.h
(__ASSUME_UTIMES): Define for [__LINUX_KERNEL_VERSION >= 0x030e00]
instead of undefining for [__LINUX_KERNEL_VERSION < 0x030e00].
* sysdeps/unix/sysv/linux/hppa/utimes.c: New file.
[BZ #17747]
The y0/y1/yn and j0/j1/jn functions provided a strong_alias
to the "l"-suffixed variants when no long double support is
being compiled. This breaks namespace conformance when the
basename versions conform but the l-suffixed ones don't.
Fixed by making them weak aliases instead.
[BZ #17746]
The __builtin_expect() truncated a uint64_t to a 32-bit long
in ILP32 mode, discarding the high 32 bits, and potentially
missing the NUL terminator that we were searching for with SIMD
operations. Explicitly compare to zero to fix the problem.
Bug 17724 reports references to fesetround being brought in by
ldbl-128ibm rintl via references to __rintl from __kernel_standard_l.
Because all three __kernel_standard* functions are in the same file,
this gets brought in even though only the long double version
__kernel_standard_l needs __rintl, and the C90 functions use only
__kernel_standard.
This patch fixes this by splitting the three versions into separate
files; it's fine for long double functions to refer to fe* functions
directly, unless they get called by C90 double functions.
Tested for x86_64 (testsuite; the reordering of code means disassembly
of shared libraries can't usefully be compared). Tested for powerpc
that the relevant issue disappears from the linknamespace test
output.
[BZ #17724]
* sysdeps/ieee754/k_standard.c: Don't include <float.h>.
(__kernel_standard_f): Remove. Moved to k_standardf.c.
(__kernel_standard_l): Remove. Moved to k_standardl.c with
(char *) casts added.
* sysdeps/ieee754/k_standardf.c: New file.
* sysdeps/ieee754/k_standardl.c: Likewise.
* math/Makefile (libm-support): Remove k_standard.
(libm-calls): Add k_standard.
On Linux architectures using socketcall, the resolver ends up bringing
in strong symbols for bind and getsockname, which are not in
POSIX.1-1996. This causes linknamespace test failures:
FAIL: conform/POSIX/pthread.h/linknamespace
FAIL: conform/POSIX/sched.h/linknamespace
FAIL: conform/POSIX/time.h/linknamespace
These functions are defined as strong symbols with __bind and
__getsockname as weak aliases. This patch switches this to the other
way round by removing the NO_WEAK_ALIAS definitions and so letting the
default case in socket.S act; I see no reason for the existing
arrangements.
Tested for x86 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch).
[BZ #17733]
* sysdeps/unix/sysv/linux/bind.S (NO_WEAK_ALIAS): Do not define.
(__bind): Do not define as weak alias.
* sysdeps/unix/sysv/linux/getsockname.S (NO_WEAK_ALIAS): Do not
define.
(__getsockname): Do not define as weak alias.
The merge of the latest gettext code introduced changes to the yacc
parser source that are incompatible with versions of bison older
than 2.7. Add a configure check for the appropriate versions and
document the requirement in INSTALL.
ChangeLog:
2014-12-22 Will Newton <will.newton@linaro.org>
* manual/install.texi: Document that we require bison 2.7
or above.
* INSTALL: Regenerate.
* configure.ac: Use AC_CHECK_PROG_VER instead of
AC_PATH_PROG when checking for bison and check for
version 2.7 or above.
* configure: Regenerate.