On POWER8, unaligned memory accesses to cached memory has little impact
on performance as opposed to its ancestors.
It is disabled by default and will only be available when the tunable
glibc.tune.cached_memopt is set to 1.
__memcpy_power8_cached __memcpy_power7
============================================================
max-size=4096: 33325.70 ( 12.65%) 38153.00
max-size=8192: 32878.20 ( 11.17%) 37012.30
max-size=16384: 33782.20 ( 11.61%) 38219.20
max-size=32768: 33296.20 ( 11.30%) 37538.30
max-size=65536: 33765.60 ( 10.53%) 37738.40
* manual/tunables.texi (Hardware Capability Tunables): Document
glibc.tune.cached_memopt.
* sysdeps/powerpc/cpu-features.c: New file.
* sysdeps/powerpc/cpu-features.h: New file.
* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
_dl_powerpc_cpu_features.
* sysdeps/powerpc/dl-tunables.list: New file.
* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
(INIT_ARCH): Initialize use_aligned_memopt.
* sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED &&
IS_IN(rtld))]: Restrict dl_platform_init availability and
initialize CPU features used by tunables.
* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines):
Add memcpy-power8-cached.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add
__memcpy_power8_cached.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S:
New file.
Reviewed-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
These machine-dependent inline string functions have never been on by
default, and even if they were a good idea at the time they were
introduced, they haven't really been touched in ten to fifteen years
and probably aren't a good idea on current-gen processors. Current
thinking is that this class of optimization is best left to the
compiler.
* bits/string.h, string/bits/string.h
* sysdeps/aarch64/bits/string.h
* sysdeps/m68k/m680x0/m68020/bits/string.h
* sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h
* sysdeps/x86/bits/string.h: Delete file.
* string/string.h: Don't include bits/string.h.
* string/bits/string3.h: Rename to bits/string_fortified.h.
No need to undef various symbols that the removed headers
might have defined as macros.
* string/Makefile (headers): Remove bits/string.h, change
bits/string3.h to bits/string_fortified.h.
* string/string-inlines.c: Update commentary. Remove definitions
of various macros that nothing looks at anymore. Don't directly
include bits/string.h. Set _STRING_INLINE_unaligned here, based on
compiler-predefined macros.
* string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY
_is_ defined, provide internal hidden alias __strncat.
* include/string.h: Declare internal hidden alias __strncat.
Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is
not defined.
* include/bits/string3.h: Rename to bits/string_fortified.h,
update to match above.
* sysdeps/i386/string-inlines.c: Define compat symbols for
everything formerly defined by sysdeps/x86/bits/string.h.
Make existing definitions into compat symbols as well.
Remove some no-longer-necessary messing around with macros.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
* sysdeps/s390/multiarch/mempcpy.c
No need to define _HAVE_STRING_ARCH_mempcpy.
Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT.
* sysdeps/i386/i686/multiarch/strncat-c.c
* sysdeps/s390/multiarch/strncat-c.c
* sysdeps/x86_64/multiarch/strncat-c.c
Define STRNCAT_PRIMARY. Don't change definition of libc_hidden_def.
The current s390 ifunc resolver for vector optimized functions and the common
libc_ifunc macro in include/libc-symbols.h uses something like that to generate ifunc'ed functions:
extern void *__resolve___strlen(unsigned long int dl_hwcap) asm (strlen);
asm (".type strlen, %gnu_indirect_function");
This leads to false debug information:
objdump --dwarf=info libc.so:
...
<1><1e6424>: Abbrev Number: 43 (DW_TAG_subprogram)
<1e6425> DW_AT_external : 1
<1e6425> DW_AT_name : (indirect string, offset: 0x1146e): __resolve___strlen
<1e6429> DW_AT_decl_file : 1
<1e642a> DW_AT_decl_line : 23
<1e642b> DW_AT_linkage_name: (indirect string, offset: 0x1147a): strlen
<1e642f> DW_AT_prototyped : 1
<1e642f> DW_AT_type : <0x1e4ccd>
<1e6433> DW_AT_low_pc : 0x998e0
<1e643b> DW_AT_high_pc : 0x16
<1e6443> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<1e6445> DW_AT_GNU_all_call_sites: 1
<1e6445> DW_AT_sibling : <0x1e6459>
<2><1e6449>: Abbrev Number: 44 (DW_TAG_formal_parameter)
<1e644a> DW_AT_name : (indirect string, offset: 0x1845): dl_hwcap
<1e644e> DW_AT_decl_file : 1
<1e644f> DW_AT_decl_line : 23
<1e6450> DW_AT_type : <0x1e4c8d>
<1e6454> DW_AT_location : 0x122115 (location list)
...
The debuginfo for the ifunc-resolver function contains the DW_AT_linkage_name
field, which names the real function name "strlen". If you perform an inferior
function call to strlen in lldb, then it fails due to something like that:
"error: no matching function for call to 'strlen'
candidate function not viable: no known conversion from 'const char [6]'
to 'unsigned long' for 1st argument"
The unsigned long is the dl_hwcap argument of the resolver function.
The strlen function itself has no debufinfo.
The s390 ifunc resolver for memset & co uses something like that:
asm (".globl FUNC"
".type FUNC, @gnu_indirect_function"
".set FUNC, __resolve_FUNC");
This way the debuginfo for the ifunc-resolver function does not conain the
DW_AT_linkage_name field and the real function has no debuginfo, too.
Using this strategy for the vector optimized functions leads to some troubles
for functions like strnlen. Here we have __strnlen and a weak alias strnlen.
The __strnlen function is the ifunc function, which is realized with the asm-
statement above. The weak_alias-macro can't be used here due to undefined symbol:
gcc ../sysdeps/s390/multiarch/strnlen.c -c ...
In file included from <command-line>:0:0:
../sysdeps/s390/multiarch/strnlen.c:28:24: error: ‘strnlen’ aliased to undefined symbol ‘__strnlen’
weak_alias (__strnlen, strnlen)
^
./../include/libc-symbols.h:111:26: note: in definition of macro ‘_weak_alias’
extern __typeof (name) aliasname __attribute__ ((weak, alias (#name)));
^
../sysdeps/s390/multiarch/strnlen.c:28:1: note: in expansion of macro ‘weak_alias’
weak_alias (__strnlen, strnlen)
^
make[2]: *** [build/string/strnlen.o] Error 1
As the __strnlen function is defined with asm-statements the function name
__strnlen isn't known by gcc. But the weak alias can also be done with an
asm statement to resolve this issue:
__asm__ (".weak strnlen\n\t"
".set strnlen,__strnlen\n");
In order to use the weak_alias macro, gcc needs to know the ifunc function. The
minimum gcc to build glibc is currently 4.7, which supports attribute((ifunc)).
See https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/Function-Attributes.html.
It is only supported if gcc is configured with --enable-gnu-indirect-function
or gcc supports it by default for at least intel and s390x architecture.
This patch uses the old behaviour if gcc support is not available.
Usage of attribute ifunc is something like that:
__typeof (FUNC) FUNC __attribute__ ((ifunc ("__resolve_FUNC")));
Then gcc produces the same .globl, .type, .set assembler instructions like above.
And the debuginfo does not contain the DW_AT_linkage_name field and there is no
debuginfo for the real function, too.
But in order to get it work, there is also some extra work to do.
Currently, the glibc internal symbol on s390x e.g. __GI___strnlen is not the
ifunc symbol, but the fallback __strnlen_c symbol. Thus I have to omit the
libc_hidden_def macro in strnlen.c (here is the ifunc function __strnlen)
because it is already handled in strnlen-c.c (here is __strnlen_c).
Due to libc_hidden_proto (__strnlen) in string.h, compiling fails:
gcc ../sysdeps/s390/multiarch/strnlen.c -c ...
In file included from <command-line>:0:0:
../sysdeps/s390/multiarch/strnlen.c:53:24: error: ‘strnlen’ aliased to undefined symbol ‘__strnlen’
weak_alias (__strnlen, strnlen)
^
./../include/libc-symbols.h:111:26: note: in definition of macro ‘_weak_alias’
extern __typeof (name) aliasname __attribute__ ((weak, alias (#name)));
^
../sysdeps/s390/multiarch/strnlen.c:53:1: note: in expansion of macro ‘weak_alias’
weak_alias (__strnlen, strnlen)
^
make[2]: *** [build/string/strnlen.os] Error 1
I have to redirect the prototypes for __strnlen in string.h and create a copy
of the prototype for using as ifunc function:
__typeof (__redirect___strnlen) __strnlen __attribute__ ((ifunc ("__resolve_strnlen")));
weak_alias (__strnlen, strnlen)
This way there is no trouble with the internal __GI_* symbols.
Glibc builds fine with this construct and the debuginfo is "correct".
For functions without a __GI_* symbol like memccpy this redirection is not needed.
This patch adjusts the common libc_ifunc and libm_ifunc macro to use gcc
attribute ifunc. Due to this change, the macro users where the __GI_* symbol
does not target the ifunc symbol have to be prepared with the redirection
construct.
Furthermore a configure check to test gcc support is added. If it is not supported,
the old behaviour is used.
This patch also prepares the libc_ifunc macro to be useable in s390-ifunc-macro.
The s390 ifunc-resolver-functions do have an hwcaps parameter and not all
resolvers need the same initialization code. The next patch in this series
changes the s390 ifunc macros to use this common one.
ChangeLog:
* include/libc-symbols.h (__ifunc_resolver):
New macro is used by __ifunc* macros.
(__ifunc): New macro uses gcc attribute ifunc or inline assembly
depending on HAVE_GCC_IFUNC.
(libc_ifunc, libm_ifunc): Use __ifunc as base macro.
(libc_ifunc_redirected, libc_ifunc_hidden, libm_ifunc_init): New macro.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c:
Redirect ifunced function in header for using as type for ifunc function.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memmove.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memset.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strlen.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strstr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c:
Add libc_hidden_def() and use libc_ifunc_hidden() macro
instead of libc_ifunc() macro.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c: Likewise.
Various code in glibc uses __strnlen instead of strnlen for namespace
reasons. However, __strnlen does not use libc_hidden_proto /
libc_hidden_def (as is normally done for any function defined and
called within the same library, whether or not exported from the
library and whatever namespace it is in), so the compiler does not
know that those calls are to a function within libc.
This patch uses libc_hidden_proto / libc_hidden_def with __strnlen.
On x86_64, it makes no difference to the installed stripped shared
libraries. On 32-bit x86, it causes __strnlen calls to go to the same
place as strnlen calls (the fallback strnlen implementation), rather
than through a PLT entry for the strnlen IFUNC; I'm not sure of the
logic behind when calls from within libc should use IFUNCs versus when
they should go direct to a particular function implementation, but
clearly it doesn't make sense for strnlen and __strnlen to be handled
differently in this regard.
Tested for x86_64 and x86 (testsuite, and comparison of installed
shared libraries as described above).
* string/strnlen.c [!STRNLEN] (__strnlen): Use libc_hidden_def.
* include/string.h (__strnlen): Use libc_hidden_proto.
* sysdeps/aarch64/strnlen.S (__strnlen): Use libc_hidden_def.
* sysdeps/i386/i686/multiarch/strnlen-c.c [SHARED]
(libc_hidden_def): Define __GI___strnlen as well as __GI_strnlen.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-power7.S
(libc_hidden_def): Undefine and redefine.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-ppc32.c
[SHARED] (libc_hidden_def): Define __GI___strnlen as well as
__GI_strnlen.
* sysdeps/powerpc/powerpc32/power7/strnlen.S (__strnlen): Use
libc_hidden_def.
* sysdeps/tile/tilegx/strnlen.c (__strnlen): Likewise.
This patch fix the static build for strftime, which uses __wcschr.
Current powerpc32 implementation defines the __wcschr be an alias to
__wcschr_ppc32 and current implementation misses the correct alias for
static build.
It also changes the default wcschr.c logic so a IFUNC implementation
should just define WCSCHR and undefine the required alias/internal
definitions.
This patch cleanup some multiarch code related to memmmove
optimization. Initial IFUNC support added specialized wordcopy
symbols which turned in local IFUNC calls used by memmove default
implementation. The patch removes the internal IFUNC for wordcopy
symbols and uses local branches in the memmmove optimization instead.
Use of strftime, a C90 function, ends up bringing in wcschr, which is
not a C90 function. Although not a conformance bug (C90 reserves
wcs*), this is still contrary to glibc practice of avoiding relying on
those reservations; this patch arranges for the internal uses to use
__wcschr instead, with wcschr being a weak alias. This is more
complicated than some such patches because of the various IFUNC
definitions of wcschr (which include code redefining libc_hidden_def
in a way that involves creating __GI_wcschr manually and so also needs
to create __GI___wcschr after the change of internal uses to use
__wcschr).
Tested for x86_64 and 32-bit x86 (testsuite, and that disassembly of
installed shared libraries is unchanged by the patch).
2014-12-10 Joseph Myers <joseph@codesourcery.com>
Adhemerval Zanella <azanella@linux.vnet.ibm.com>
[BZ #17634]
* wcsmbs/wcschr.c [!WCSCHR] (wcschr): Define as __wcschr.
Undefine after defining function. Define as weak alias of
__wcschr. Use libc_hidden_weak.
* include/wchar.h (__wcschr): Declare. Use libc_hidden_proto.
* sysdeps/i386/i686/multiarch/wcschr-c.c [IS_IN (libc) && SHARED]
(libc_hidden_def): Also define __GI___wcschr alias.
* sysdeps/i386/i686/multiarch/wcschr.S (wcschr): Rename to
__wcschr and define as weak alias of __wcschr.
* sysdeps/powerpc/power6/wcschr.c [!WCSCHR] (WCSCHR): Define as
__wcschr.
[!WCSCHR] (DEFAULT_WCSCHR): Define.
[DEFAULT_WCSCHR] (__wcschr): Use libc_hidden_def.
[DEFAULT_WCSCHR] (wcschr): Define as weak alias of __wcschr. Use
libc_hidden_weak. Do not use libc_hidden_def.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr-ppc32.c
[IS_IN (libc) && SHARED] (libc_hidden_def): Also define
__GI___wcschr alias.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcschr.c
[IS_IN (libc)] (wcschr): Define as macro expanding to
__redirect_wcschr.
[IS_IN (libc)] (__wcschr_ppc): Use __redirect_wcschr in typeof.
[IS_IN (libc)] (__wcschr_power6): Likewise.
[IS_IN (libc)] (__wcschr_power7): Likewise.
[IS_IN (libc)] (__libc_wcschr): New. Define with libc_ifunc
instead of wcschr.
[IS_IN (libc)] (wcschr): Undefine and define as weak alias of
__libc_wcschr.
[!IS_IN (libc)] (libc_hidden_def): Do not undefine and redefine.
* sysdeps/powerpc/powerpc64/multiarch/wcschr.c (wcschr): Rename to
__wcschr and define as weak alias of __wcschr. Use
libc_hidden_builtin_def.
* sysdeps/x86_64/wcschr.S (wcschr): Rename to __wcschr and define
as weak alias of __wcschr. Use libc_hidden_weak.
* time/alt_digit.c (_nl_get_walt_digit): Use __wcschr instead of
wcschr.
* time/era.c (_nl_init_era_entries): Likewise.
* conform/Makefile (test-xfail-ISO/time.h/linknamespace): Remove
variable.
(test-xfail-XPG3/time.h/linknamespace): Likewise.
(test-xfail-XPG4/time.h/linknamespace): Likewise.
This patch fixes the build of C mempcpy and stpcpy by disabling the
redirection to __mempcpy and __stpcpy asm names if
NO_MEMPCPY_STPCPY_REDIRECT is defined, and defining that macro in the
relevant source files.
Tested for powerpc32 that the build is fixed.
* include/string.h [NO_MEMPCPY_STPCPY_REDIRECT] (mempcpy): Do not
redeclare with asm name.
[NO_MEMPCPY_STPCPY_REDIRECT] (stpcpy): Likewise.
* string/mempcpy.c (NO_MEMPCPY_STPCPY_REDIRECT): Define before
including <string.h>.
* string/stpcpy.c (NO_MEMPCPY_STPCPY_REDIRECT): Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c
[!NOT_IN_libc] (NO_MEMPCPY_STPCPY_REDIRECT): Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c
[!NOT_IN_libc] (NO_MEMPCPY_STPCPY_REDIRECT): Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
[SHARED && !NOT_IN_libc] (NO_MEMPCPY_STPCPY_REDIRECT): Likewise.
Now that MEMCPY_OK_FOR_FWD_MEMMOVE should be define on memcopy.h there
is no need to specialized powerpc memmove implementation. This patch
moves the define set to powerpc memcopy and cleanup its definition on
powerpc code.
This patch fixes a similar issue to
736c304a1a, where for PPC32 if the symbol
is defined as hidden (memchr) then compiler will create a local branc
(symbol@local) and the linker will not create a required PLT call to
make the ifunc work. It changes the default hidden symbol (__GI_memchr)
to default memchr symbol for powerpc32 (__memchr_ppc32).
This patch changes de default symbol redirection for internal call of
memcpy, memset, memchr, and strlen to the IFUNC resolved ones. The
performance improvement is noticeable in algorithms that uses these
symbols extensible, like the regex functions.
This patch fixes an issue for powerpc32-fpu static build which fails
with an 'bzero' undefined reference. This patch adds bzero ifunc selector
for static builds and fixes the '__bzero_ppc' reference to default
memset symbol (since static memset build does not provide ifunc
selector).
Fixes BZ#16689.
This patch add a optimized isnan/isnanf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.