glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-12 14:20:13 +00:00

Author	SHA1	Message	Date
H.J. Lu	6a094c0ff1	x86-64: Regenerate libm-test-ulps for AVX512 mathvec tests Update libm-test-ulps for AVX512 mathvec tests by running “make regen-ulps” on Intel Xeon processor with AVX512. * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated. (cherry picked from commit `fcaaca412f`)	2017-11-13 13:31:18 +01:00
Adhemerval Zanella	5712f8db26	nptl: Add tests for internal pthread_mutex_t offsets This patch adds a new build test to check for internal fields offsets for user visible internal field. Although currently the only field which is statically initialized to a non zero value is pthread_mutex_t.__data.__kind value, the tests also check the offset of __kind, __spins, __elision (if supported), and __list internal member. A internal header (pthread-offset.h) is added to each major ABI with the reference value. Checked on x86_64-linux-gnu and with a build check for all affected ABIs (aarch64-linux-gnu, alpha-linux-gnu, arm-linux-gnueabihf, hppa-linux-gnu, i686-linux-gnu, ia64-linux-gnu, m68k-linux-gnu, microblaze-linux-gnu, mips64-linux-gnu, mips64-n32-linux-gnu, mips-linux-gnu, powerpc64le-linux-gnu, powerpc-linux-gnu, s390-linux-gnu, s390x-linux-gnu, sh4-linux-gnu, sparc64-linux-gnu, sparcv9-linux-gnu, tilegx-linux-gnu, tilegx-linux-gnu-x32, tilepro-linux-gnu, x86_64-linux-gnu, and x86_64-linux-x32). * nptl/pthreadP.h (ASSERT_PTHREAD_STRING, ASSERT_PTHREAD_INTERNAL_OFFSET): New macro. * nptl/pthread_mutex_init.c (__pthread_mutex_init): Add build time checks for internal pthread_mutex_t offsets. * sysdeps/aarch64/nptl/pthread-offsets.h (__PTHREAD_MUTEX_NUSERS_OFFSET, __PTHREAD_MUTEX_KIND_OFFSET, __PTHREAD_MUTEX_SPINS_OFFSET, __PTHREAD_MUTEX_ELISION_OFFSET, __PTHREAD_MUTEX_LIST_OFFSET): New macro. * sysdeps/alpha/nptl/pthread-offsets.h: Likewise. * sysdeps/arm/nptl/pthread-offsets.h: Likewise. * sysdeps/hppa/nptl/pthread-offsets.h: Likewise. * sysdeps/i386/nptl/pthread-offsets.h: Likewise. * sysdeps/ia64/nptl/pthread-offsets.h: Likewise. * sysdeps/m68k/nptl/pthread-offsets.h: Likewise. * sysdeps/microblaze/nptl/pthread-offsets.h: Likewise. * sysdeps/mips/nptl/pthread-offsets.h: Likewise. * sysdeps/nios2/nptl/pthread-offsets.h: Likewise. * sysdeps/powerpc/nptl/pthread-offsets.h: Likewise. * sysdeps/s390/nptl/pthread-offsets.h: Likewise. * sysdeps/sh/nptl/pthread-offsets.h: Likewise. * sysdeps/sparc/nptl/pthread-offsets.h: Likewise. * sysdeps/tile/nptl/pthread-offsets.h: Likewise. * sysdeps/x86_64/nptl/pthread-offsets.h: Likewise. Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> (cherry picked from commit `dff91cd45e`)	2017-11-07 10:57:49 -02:00
H.J. Lu	4b692dffb9	x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299 ] Since ld.so expands $PLATFORM with GLRO(dl_platform), don't set GLRO(dl_platform) to NULL. [BZ #22299] * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set GLRO(dl_platform) to NULL. * sysdeps/x86_64/Makefile (tests): Add tst-platform-1. (modules-names): Add tst-platformmod-1 and x86_64/tst-platformmod-2. (CFLAGS-tst-platform-1.c): New. (CFLAGS-tst-platformmod-1.c): Likewise. (CFLAGS-tst-platformmod-2.c): Likewise. (LDFLAGS-tst-platformmod-2.so): Likewise. ($(objpfx)tst-platform-1): Likewise. ($(objpfx)tst-platform-1.out): Likewise. (tst-platform-1-ENV): Likewise. ($(objpfx)x86_64/tst-platformmod-2.os): Likewise. * sysdeps/x86_64/tst-platform-1.c: New file. * sysdeps/x86_64/tst-platformmod-1.c: Likewise. * sysdeps/x86_64/tst-platformmod-2.c: Likewise. (cherry picked from commit `4d916f0f12`)	2017-10-26 07:36:47 -07:00
H.J. Lu	f82a6fc223	x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265 ] In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector, mask and bound registers. It simplifies _dl_runtime_resolve and supports different calling conventions. ld.so code size is reduced by more than 1 KB. However, use fxsave/xsave/xsavec takes a little bit more cycles than saving and restoring vector and bound registers individually. Latency for _dl_runtime_resolve to lookup the function, foo, from one shared library plus libc.so: Before After Change Westmere (SSE)/fxsave 345 866 151% IvyBridge (AVX)/xsave 420 643 53% Haswell (AVX)/xsave 713 1252 75% Skylake (AVX+MPX)/xsavec 559 719 28% Skylake (AVX512+MPX)/xsavec 145 272 87% Ryzen (AVX)/xsavec 280 553 97% This is the worst case where portion of time spent for saving and restoring registers is bigger than majority of cases. With smaller _dl_runtime_resolve code size, overall performance impact is negligible. On IvyBridge, differences in build and test time of binutils with lazy binding GCC and binutils are noises. On Westmere, differences in bootstrap and "makc check" time of GCC 7 with lazy binding GCC and binutils are also noises. [BZ #21265] * sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET): New. * sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>. (get_common_indeces): Set xsave_state_size, xsave_state_full_size and bit_arch_XSAVEC_Usable if needed. (init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow and bit_arch_Use_dl_runtime_resolve_opt. * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): Removed. (bit_arch_Use_dl_runtime_resolve_slow): Likewise. (bit_arch_Prefer_No_AVX512): Updated. (bit_arch_MathVec_Prefer_No_AVX512): Likewise. (bit_arch_XSAVEC_Usable): New. (STATE_SAVE_OFFSET): Likewise. (STATE_SAVE_MASK): Likewise. [__ASSEMBLER__]: Include <cpu-features-offsets.h>. (cpu_features): Add xsave_state_size and xsave_state_full_size. (index_arch_Use_dl_runtime_resolve_opt): Removed. (index_arch_Use_dl_runtime_resolve_slow): Likewise. (index_arch_XSAVEC_Usable): New. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow. * sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables is enabled. * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, _dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt, _dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec. * sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE): Removed. (DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT instead of VEC_SIZE. (REGISTER_SAVE_BND0): Removed. (REGISTER_SAVE_BND1): Likewise. (REGISTER_SAVE_BND3): Likewise. (REGISTER_SAVE_RAX): Always defined to 0. (VMOV): Removed. (_dl_runtime_resolve_avx): Likewise. (_dl_runtime_resolve_avx_slow): Likewise. (_dl_runtime_resolve_avx_opt): Likewise. (_dl_runtime_resolve_avx512): Likewise. (_dl_runtime_resolve_avx512_opt): Likewise. (_dl_runtime_resolve_sse): Likewise. (_dl_runtime_resolve_sse_vex): Likewise. (USE_FXSAVE): New. (_dl_runtime_resolve_fxsave): Likewise. (USE_XSAVE): Likewise. (_dl_runtime_resolve_xsave): Likewise. (USE_XSAVEC): Likewise. (_dl_runtime_resolve_xsavec): Likewise. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512): Removed. (_dl_runtime_resolve_avx512_opt): Likewise. (_dl_runtime_resolve_avx): Likewise. (_dl_runtime_resolve_avx_opt): Likewise. (_dl_runtime_resolve_sse): Likewise. (_dl_runtime_resolve_sse_vex): Likewise. (_dl_runtime_resolve_fxsave): New. (_dl_runtime_resolve_xsave): Likewise. (_dl_runtime_resolve_xsavec): Likewise. (cherry picked from commit `b52b0d793d`)	2017-10-22 07:41:00 -07:00
H.J. Lu	b2c78ae69e	x86: Add x86_64 to x86-64 HWCAP [BZ #22093 ] Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the "x86_64" subdirectory when loading a shared library. ld.so in glibc 2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based on supported ISAs. This led to shared library loading failure for shared libraries placed under the "x86_64" subdirectory. This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always search the "x86_64" subdirectory when loading a shared library. NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi" machines. Tested on i686 and x86-64. [BZ #22093] * sysdeps/x86/cpu-features.c (init_cpu_features): Initialize GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64. * sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated. (HWCAP_IMPORTANT): Likewise. (HWCAP_X86_64): New enum. (HWCAP_X86_AVX512_1): Updated. * sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64". * sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1. (modules-names): Add x86_64/tst-x86_64mod-1. (LDFLAGS-tst-x86_64mod-1.so): New. ($(objpfx)tst-x86_64-1): Likewise. ($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise. (tst-x86_64-1-clean): Likewise. * sysdeps/x86_64/tst-x86_64-1.c: New file. * sysdeps/x86_64/tst-x86_64mod-1.c: Likewise. (cherry picked from commit `45ff34638f`)	2017-10-22 04:16:39 -07:00
Markus Trippelsdorf	5f5532caf8	Update x86_64 ulps for AMD Ryzen. * sysdeps/x86_64/fpu/libm-test-ulps: Update for AMD Ryzen. (cherry picked from commit `4c03a69680`)	2017-09-08 20:00:10 +00:00
H.J. Lu	7a499756ab	x86-64: Test memmove_chk and memset_chk only in libc.so [BZ #21741 ] Since there are no multiarch versions of memmove_chk and memset_chk, test multiarch versions of memmove_chk and memset_chk only in libc.so. [BZ #21741] * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test memmove_chk and memset_chk only in libc.so.	2017-07-10 04:44:38 -07:00
H.J. Lu	58d021c836	x86-64: Update comments in IFUNC selectors * sysdeps/x86_64/multiarch/memcmp.c: Update comments. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memrchr.c: Likewise. * sysdeps/x86_64/multiarch/memset.c: Likewise. * sysdeps/x86_64/multiarch/rawmemchr.c: Likewise. * sysdeps/x86_64/multiarch/strchrnul.c: Likewise. * sysdeps/x86_64/multiarch/strlen.c: Likewise. * sysdeps/x86_64/multiarch/strnlen.c: Likewise. * sysdeps/x86_64/multiarch/wcschr.c: Likewise. * sysdeps/x86_64/multiarch/wcscpy.c: Likewise. * sysdeps/x86_64/multiarch/wcslen.c: Likewise. * sysdeps/x86_64/multiarch/wcsnlen.c: Likewise. * sysdeps/x86_64/multiarch/wmemchr.c: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.c: Likewise. * sysdeps/x86_64/multiarch/wmemset.c: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.	2017-07-09 11:43:20 -07:00
H.J. Lu	4df54c89bb	x86-64: Update comments in ifunc-impl-list.c All x86-64 IFUNC selectors are written in C now. Update comments to reflect it. * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Update comments.	2017-07-09 11:38:37 -07:00
H.J. Lu	031e519c95	x86-64: Align the stack in __tls_get_addr [BZ #21609 ] This change forces realignment of the stack pointer in __tls_get_addr, so that binaries compiled by GCCs older than GCC 4.9: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 continue to work even if vector instructions are used in glibc which require the ABI stack realignment. __tls_get_addr_slow is added to handle the slow paths in the default implementation of__tls_get_addr in elf/dl-tls.c. The new __tls_get_addr calls __tls_get_addr_slow after realigning the stack. Internal calls within ld.so go directly to the default implementation of __tls_get_addr because they do not need stack realignment. [BZ #21609] * sysdeps/x86_64/Makefile (sysdep-dl-routines): Add tls_get_addr. (gen-as-const-headers): Add rtld-offsets.sym. * sysdeps/x86_64/dl-tls.c: New file. * sysdeps/x86_64/rtld-offsets.sym: Likwise. * sysdeps/x86_64/tls_get_addr.S: Likewise. * sysdeps/x86_64/dl-tls.h: Add multiple inclusion guards. * sysdeps/x86_64/tlsdesc.sym (TI_MODULE_OFFSET): New. (TI_OFFSET_OFFSET): Likwise.	2017-07-06 04:43:20 -07:00
Joseph Myers	073e8fa773	Require binutils 2.25 or later to build glibc. This patch implements a requirement of binutils >= 2.25 (up from 2.22) to build glibc. Tests for 2.24 or later on x86_64 and s390 are removed. It was already the case, as indicated by buildbot results, that 2.24 was too old for building tests for 32-bit x86 (produced internal linker errors linking elf/tst-gnu2-tls1mod.so). I don't know if any configure tests for binutils features are obsolete given the increased version requirement. Tested for x86_64. * configure.ac (AS): Require binutils 2.25 or later. (LD): Likewise. * configure: Regenerated. * sysdeps/s390/configure.ac (AS): Remove version check. * sysdeps/s390/configure: Regenerated. * sysdeps/x86_64/configure.ac (AS): Remove version check. * sysdeps/x86_64/configure: Regenerated. * manual/install.texi (Tools for Compilation): Document requirement for binutils 2.25 or later. * INSTALL: Regenerated.	2017-06-28 11:31:50 +00:00
H.J. Lu	e94c310357	x86-64: Optimize memcmp-avx2-movbe.S for short difference Check the first 32 bytes before checking size when size >= 32 bytes to avoid unnecessary branch if the difference is in the first 32 bytes. Replace vpmovmskb/subl/jnz with vptest/jnc. On Haswell, the new version is as fast as the previous one. On Skylake, the new version is a little bit faster. * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (MEMCMP): Check the first 32 bytes before checking size when size >= 32 bytes. Replace vpmovmskb/subl/jnz with vptest/jnc.	2017-06-27 07:55:00 -07:00
Joseph Myers	c86ed71d63	Add float128 support for x86_64, x86. This patch enables float128 support for x86_64 and x86. All GCC versions that can build glibc provide the required support, but since GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN tests and some tests of NaN payloads need to be disabled with such compilers (this does not affect the generated glibc binaries at all, just the tests). bits/floatn.h declares float128 support to be available for GCC versions that provide the required libgcc support (4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd); compilation-only support was present some time before then, but not really useful without the libgcc functions. fenv_private.h needed updating to avoid trying to put _Float128 values in registers. I make no assertion of optimality of the math_opt_barrier / math_force_eval definitions for this case; they are simply intended to be sufficient to work correctly. Tested for x86_64 and x86, with GCC 7 and GCC 6. (Testing for x32 was compilation tests only with build-many-glibcs.py to verify the ABI baseline updates. I have not done any testing for Hurd, although the float128 support is enabled there as for GNU/Linux.) * sysdeps/i386/Implies: Add ieee754/float128. * sysdeps/x86_64/Implies: Likewise. * sysdeps/x86/bits/floatn.h: New file. * sysdeps/x86/float128-abi.h: Likewise. * manual/math.texi (Mathematics): Document support for _Float128 on x86_64 and x86. * sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>. (math_opt_barrier): Do not put _Float128 values in floating-point registers. (math_force_eval): Likewise. [__x86_64__] (SET_RESTORE_ROUNDF128): New macro. * sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append to Makefile variable. * sysdeps/x86/fpu/e_sqrtf128.c: New file. * sysdeps/x86/fpu/sfp-machine.h: Likewise. Based on libgcc. * sysdeps/x86/math-tests.h: New file. * math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro. * math/libm-test-getpayload.inc (getpayload_test_data): Use XFAIL_FLOAT128_PAYLOAD. * math/libm-test-setpayload.inc (setpayload_test_data): Likewise. * math/libm-test-totalorder.inc (totalorder_test_data): Likewise. * math/libm-test-totalordermag.inc (totalordermag_test_data): Likewise. * sysdeps/unix/sysv/linux/i386/libc.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise. * sysdeps/i386/fpu/libm-test-ulps: Likewise. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.	2017-06-26 22:02:24 +00:00
H.J. Lu	049816c3be	x86-64: Optimize L(between_2_3) in memcmp-avx2-movbe.S Turn movzbl -1(%rdi, %rdx), %edi movzbl -1(%rsi, %rdx), %esi orl %edi, %eax orl %esi, %ecx into movb -1(%rdi, %rdx), %al movb -1(%rsi, %rdx), %cl * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3): Replace movzbl and orl with movb.	2017-06-23 12:46:12 -07:00
Florian Weimer	bc0382ae90	x86-64: Fix comment typo in memcmp-avx2-movbe.S	2017-06-23 19:00:58 +02:00
Florian Weimer	3ec7c02cc3	x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662 ] This code: L(between_2_3): /* Load as big endian with overlapping loads and bswap to avoid branches. */ movzwl -2(%rdi, %rdx), %eax movzwl -2(%rsi, %rdx), %ecx shll $16, %eax shll $16, %ecx movzwl (%rdi), %edi movzwl (%rsi), %esi orl %edi, %eax orl %esi, %ecx bswap %eax bswap %ecx subl %ecx, %eax ret needs a saturating subtract because the full register is used. With this commit, only the lower 24 bits of the register are used, so a regular subtraction suffices. The test case change adds coverage for these kinds of bugs.	2017-06-23 17:24:40 +02:00
H.J. Lu	11ffcacb64	x86-64: Implement strcmp family IFUNC selectors in C Implement strcmp family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcmp family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcmp-sse2, strcmp-sse4_2, strncmp-sse2, strncmp-sse4_2, strcasecmp_l-sse2, strcasecmp_l-sse4_2, strcasecmp_l-avx, strncase_l-sse2, strncase_l-sse4_2 and strncase_l-avx. * sysdeps/x86_64/multiarch/ifunc-strcasecmp.h: New file. * sysdeps/x86_64/multiarch/strcasecmp.c: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l-avx.S: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l-sse4_2.S: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l.c: Likewise. * sysdeps/x86_64/multiarch/strcmp-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcmp-sse4_2.S: Likewise. * sysdeps/x86_64/multiarch/strcmp.c: Likewise. * sysdeps/x86_64/multiarch/strncase.c: Likewise. * sysdeps/x86_64/multiarch/strncase_l-avx.S : Likewise. * sysdeps/x86_64/multiarch/strncase_l-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strncase_l-sse4_2.S: Likewise. * sysdeps/x86_64/multiarch/strncase_l.c: Likewise. * sysdeps/x86_64/multiarch/strncmp-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strncmp-sse4_2.S: Likewise. * sysdeps/x86_64/multiarch/strncmp.c: Likewise. * sysdeps/x86_64/multiarch/strcasecmp_l.S: Removed. * sysdeps/x86_64/multiarch/strcmp.S: Likewise. * sysdeps/x86_64/multiarch/strncase_l.S: Likewise. * sysdeps/x86_64/multiarch/strncmp.S: Likewise. * sysdeps/x86_64/multiarch/strcmp-sse42.S: Include <sysdep.h>. (STRCMP_SSE42): New. Defined to __strcmp_sse42 if not defined. [USE_AS_STRCASECMP_L \|\| USE_AS_STRNCASECMP_L]: Include "locale-defines.h". (UPDATE_STRNCMP_COUNTER): New. (SECTION): Likewise. (GLABEL): Likewise. (LABEL): Likewise. * sysdeps/x86_64/multiarch/strncmp-ssse3.S: Rewrite and enable for libc.a.	2017-06-21 12:11:06 -07:00
Zack Weinberg	af85385f31	Use locale_t, not __locale_t, throughout glibc <locale.h> is specified to define locale_t in POSIX.1-2008, and so are all of the headers that define functions that take locale_t arguments. Under _GNU_SOURCE, the additional headers that define such functions have also always defined locale_t. Therefore, there is no need to use __locale_t in public function prototypes, nor in any internal code. * ctype/ctype-c99_l.c, ctype/ctype.h, ctype/ctype_l.c * include/monetary.h, include/stdlib.h, include/time.h * include/wchar.h, locale/duplocale.c, locale/freelocale.c * locale/global-locale.c, locale/langinfo.h, locale/locale.h * locale/localeinfo.h, locale/newlocale.c * locale/nl_langinfo_l.c, locale/uselocale.c * localedata/bug-usesetlocale.c, localedata/tst-xlocale2.c * stdio-common/vfscanf.c, stdlib/monetary.h, stdlib/stdlib.h * stdlib/strfmon_l.c, stdlib/strtod_l.c, stdlib/strtof_l.c * stdlib/strtol.c, stdlib/strtol_l.c, stdlib/strtold_l.c * stdlib/strtoll_l.c, stdlib/strtoul_l.c, stdlib/strtoull_l.c * string/strcasecmp.c, string/strcoll_l.c, string/string.h * string/strings.h, string/strncase.c, string/strxfrm_l.c * sysdeps/ieee754/float128/strtof128_l.c * sysdeps/ieee754/float128/wcstof128.c * sysdeps/ieee754/float128/wcstof128_l.c * sysdeps/ieee754/ldbl-128ibm/strtold_l.c * sysdeps/ieee754/ldbl-64-128/strtold_l.c * sysdeps/ieee754/ldbl-opt/nldbl-compat.c * sysdeps/ieee754/ldbl-opt/nldbl-strfmon_l.c * sysdeps/ieee754/ldbl-opt/nldbl-strtold_l.c * sysdeps/ieee754/ldbl-opt/nldbl-wcstold_l.c * sysdeps/powerpc/powerpc32/power7/strcasecmp.S * sysdeps/powerpc/powerpc64/power7/strcasecmp.S * sysdeps/x86_64/strcasecmp_l-nonascii.c * sysdeps/x86_64/strncase_l-nonascii.c, time/strftime_l.c * time/strptime_l.c, time/time.h, wcsmbs/mbsrtowcs_l.c * wcsmbs/wchar.h, wcsmbs/wcscasecmp.c, wcsmbs/wcsncase.c * wcsmbs/wcstod.c, wcsmbs/wcstod_l.c, wcsmbs/wcstof.c * wcsmbs/wcstof_l.c, wcsmbs/wcstol_l.c, wcsmbs/wcstold.c * wcsmbs/wcstold_l.c, wcsmbs/wcstoll_l.c, wcsmbs/wcstoul_l.c * wcsmbs/wcstoull_l.c, wctype/iswctype_l.c * wctype/towctrans_l.c, wctype/wcfuncs_l.c * wctype/wctrans_l.c, wctype/wctype.h, wctype/wctype_l.c: Change all uses of __locale_t to locale_t.	2017-06-20 20:30:06 -04:00
Zack Weinberg	c0b23001a8	Fix fallout from bits/string.h removal. Remove one more string inline that was defined directly in string.h; in the absence of the rest of the inlines, it broke the build. Like other ifunc shims for these functions, x86_64/multiarch/{mem,st}pcpy.c need to define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT. * string/string.h (__mempcpy_inline): Delete. * sysdeps/x86_64/multiarch/mempcpy.c * sysdeps/x86_64/multiarch/stpcpy.c: Define NO_MEMPCPY_STPCPY_REDIRECT and __NO_STRING_INLINES before including string.h.	2017-06-20 09:39:08 -04:00
Zack Weinberg	09a596cc2c	Remove bits/string.h. These machine-dependent inline string functions have never been on by default, and even if they were a good idea at the time they were introduced, they haven't really been touched in ten to fifteen years and probably aren't a good idea on current-gen processors. Current thinking is that this class of optimization is best left to the compiler. * bits/string.h, string/bits/string.h * sysdeps/aarch64/bits/string.h * sysdeps/m68k/m680x0/m68020/bits/string.h * sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h * sysdeps/x86/bits/string.h: Delete file. * string/string.h: Don't include bits/string.h. * string/bits/string3.h: Rename to bits/string_fortified.h. No need to undef various symbols that the removed headers might have defined as macros. * string/Makefile (headers): Remove bits/string.h, change bits/string3.h to bits/string_fortified.h. * string/string-inlines.c: Update commentary. Remove definitions of various macros that nothing looks at anymore. Don't directly include bits/string.h. Set _STRING_INLINE_unaligned here, based on compiler-predefined macros. * string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY _is_ defined, provide internal hidden alias __strncat. * include/string.h: Declare internal hidden alias __strncat. Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is not defined. * include/bits/string3.h: Rename to bits/string_fortified.h, update to match above. * sysdeps/i386/string-inlines.c: Define compat symbols for everything formerly defined by sysdeps/x86/bits/string.h. Make existing definitions into compat symbols as well. Remove some no-longer-necessary messing around with macros. * sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c * sysdeps/powerpc/powerpc64/multiarch/mempcpy.c * sysdeps/powerpc/powerpc64/multiarch/stpcpy.c * sysdeps/s390/multiarch/mempcpy.c No need to define _HAVE_STRING_ARCH_mempcpy. Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT. * sysdeps/i386/i686/multiarch/strncat-c.c * sysdeps/s390/multiarch/strncat-c.c * sysdeps/x86_64/multiarch/strncat-c.c Define STRNCAT_PRIMARY. Don't change definition of libc_hidden_def.	2017-06-20 08:21:24 -04:00
Siddhesh Poyarekar	629ebc873a	Fix typo when undefining weak_alias The macro directive #undef was miswritten as #undefine. * sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Fix typo.	2017-06-19 14:56:40 +05:30
H.J. Lu	70fe2eb794	x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C Implement strcspn/strpbrk/strspn IFUNC selectors in C All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcspn/strpbrk/strspn functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcspn-sse2, strpbrk-sse2 and strspn-sse2. * sysdeps/x86_64/strcspn.S (STRPBRK_P): Removed. Check USE_AS_STRPBRK instead of STRPBRK_P. * sysdeps/x86_64/strpbrk.S (USE_AS_STRPBRK): New. * sysdeps/x86_64/multiarch/ifunc-sse4_2.h: New file. * sysdeps/x86_64/multiarch/strcspn-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcspn.c: Likewise. * sysdeps/x86_64/multiarch/strpbrk-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strpbrk.c: Likewise. * sysdeps/x86_64/multiarch/strspn-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strspn.c: Likewise. * sysdeps/x86_64/multiarch/strcspn.S: Removed. * sysdeps/x86_64/multiarch/strpbrk.S: Likewise. * sysdeps/x86_64/multiarch/strspn.S: Likewise. * sysdeps/x86_64/multiarch/strpbrk-c.c: Remove "#ifdef SHARED" and "#endif".	2017-06-15 08:59:05 -07:00
H.J. Lu	9f4254b8bd	x86-64: Implement wcscpy IFUNC selector in C * sysdeps/x86_64/multiarch/wcscpy.S: Removed. * sysdeps/x86_64/multiarch/wcscpy.c: New file.	2017-06-15 08:57:52 -07:00
H.J. Lu	9ed0aa15d3	x86-64: Implement strcat family IFUNC selectors in C Implement strcat family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcat family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcat-sse2. * sysdeps/x86_64/multiarch/strcat-sse2.S: New file. * sysdeps/x86_64/multiarch/strcat.c: Likewise. * sysdeps/x86_64/multiarch/strncat.c: Likewise. * sysdeps/x86_64/multiarch/strcat.S: Removed. * sysdeps/x86_64/multiarch/strncat.S: Likewise.	2017-06-15 08:56:59 -07:00
H.J. Lu	b91a52d0d7	x86-64: Implement memcmp family IFUNC selectors in C Implement memcmp family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memcmp family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memcmp-sse2. * sysdeps/x86_64/multiarch/ifunc-memcmp.h: New file. * sysdeps/x86_64/multiarch/memcmp-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memcmp.c: Likewise. * sysdeps/x86_64/multiarch/wmemcmp.c: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Removed. * sysdeps/x86_64/multiarch/wmemcmp.S: Likewise.	2017-06-15 08:49:57 -07:00
H.J. Lu	93e46f8773	x86-64: Implement memset family IFUNC selectors in C Implement memset family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memset functions within libc. 2017-06-07 H.J. Lu <hongjiu.lu@intel.com> Erich Elsen <eriche@google.com> * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, and memset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add test for __memset_chk_erms. Update comments. * sysdeps/x86_64/multiarch/ifunc-memset.h: New file. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset.c: Likewise. * sysdeps/x86_64/multiarch/memset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.c: Likewise. * sysdeps/x86_64/multiarch/memset.S: Removed. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset_chk_erms): New function.	2017-06-15 08:33:35 -07:00
H.J. Lu	5c3e322d3b	x86-64: Implement memmove family IFUNC selectors in C Implement memmove family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for memmove family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memcpy_chk-nonshared, mempcpy_chk-nonshared and memmove_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __memmove_chk_erms, __memcpy_chk_erms and __mempcpy_chk_erms. Update comments. * sysdeps/x86_64/multiarch/ifunc-memmove.h: New file. * sysdeps/x86_64/multiarch/memcpy.c: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memcpy_chk.c: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove.c: Likewise. * sysdeps/x86_64/multiarch/memmove_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy.c: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.c: Likewise. * sysdeps/x86_64/multiarch/memcpy.S: Removed. * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove.S: Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (__mempcpy_chk_erms): New function. (__memmove_chk_erms): Likewise. (__memcpy_chk_erms): New alias.	2017-06-14 12:11:10 -07:00
Zack Weinberg	fd860eaaa8	Remove __need macros from errno.h (__need_Emath, __need_error_t). This is fairly complicated, not because the users of __need_Emath and __need_error_t have complicated requirements, but because the core changes had a lot of fallout. __need_error_t exists for gnulib compatibility in argz.h and argp.h. error_t itself is a Hurdism, an enum containing all the E-constants, so you can do 'p (error_t) errno' in gdb and get a symbolic value. argz.h and argp.h use it for function return values, and they want to fall back to 'int' when that's not available. There is no reason why these nonstandard headers cannot just go ahead and include all of errno.h; so we do that. __need_Emath is defined only by .S files; what they _really_ need is for errno.h to avoid declaring anything other than the E-constants (e.g. 'extern int __errno_location(void);' is a syntax error in assembly language). This is replaced with a check for __ASSEMBLER__ in errno.h, plus a carefully documented requirement for bits/errno.h not to define anything other than macros. That in turn has the consequence that bits/errno.h must not define errno - fortunately, all live ports use the same definition of errno, so I've moved it to errno.h. The Hurd bits/errno.h must also take care not to define error_t when __ASSEMBLER__ is defined, which involves repeating all of the definitions twice, but it's a generated file so that's okay. * stdlib/errno.h: Remove __need_Emath and __need_error_t logic. Reorganize file. Declare errno here. When __ASSEMBLER__ is defined, don't declare anything other than the E-constants. * include/errno.h: Change conditional for exposing internal declarations to (not _ISOMAC and not __ASSEMBLER__). * bits/errno.h: Remove logic for __need_Emath. Document requirements for a port-specific bits/errno.h. * sysdeps/unix/sysv/linux/bits/errno.h * sysdeps/unix/sysv/linux/alpha/bits/errno.h * sysdeps/unix/sysv/linux/hppa/bits/errno.h * sysdeps/unix/sysv/linux/mips/bits/errno.h * sysdeps/unix/sysv/linux/sparc/bits/errno.h: Add multiple-include guard and check against improper inclusion. Remove __need_Emath logic. Don't declare errno here. Ensure all constants are defined as simple integer literals. Consistent formatting. * sysdeps/mach/hurd/errnos.awk: Likewise. Only define error_t and enum __error_t_codes if __ASSEMBLER__ is not defined. * sysdeps/mach/hurd/bits/errno.h: Regenerate. * argp/argp.h, string/argz.h: Don't define __need_error_t before including errno.h. * sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S * sysdeps/i386/i686/fpu/multiarch/s_sincosf-sse2.S * sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S * sysdeps/x86_64/fpu/s_cosf.S * sysdeps/x86_64/fpu/s_sincosf.S * sysdeps/x86_64/fpu/s_sinf.S: Just include errno.h; don't define __need_Emath or include bits/errno.h directly.	2017-06-14 08:14:34 -04:00
Alan Modra	0572433b5b	PowerPC64 ELFv2 PPC64_OPT_LOCALENTRY ELFv2 functions with localentry:0 are those with a single entry point, ie. global entry == local entry, that have no requirement on r2 or r12 and guarantee r2 is unchanged on return. Such an external function can be called via the PLT without saving r2 or restoring it on return, avoiding a common load-hit-store for small functions. This patch implements the ld.so changes necessary for this optimization. ld.so needs to check that an optimized plt call sequence is in fact calling a function implemented with localentry:0, end emit a fatal error otherwise. The elf/testobj6.c change is to stop "error while loading shared libraries: expected localentry:0 `preload'" when running elf/preloadtest, which we'd get otherwise. * elf/elf.h (PPC64_OPT_LOCALENTRY): Define. * sysdeps/alpha/dl-machine.h (elf_machine_fixup_plt): Add refsym and sym parameters. Adjust callers. * sysdeps/aarch64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/arm/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/generic/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/i386/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/ia64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/m68k/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/microblaze/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/mips/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/nios2/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/s390/s390-32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/s390/s390-64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sh/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sparc/sparc32/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/tile/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_fixup_plt): Likewise. * sysdeps/powerpc/powerpc64/dl-machine.c (_dl_error_localentry): New. (_dl_reloc_overflow): Increase buffser size. Formatting. * sysdeps/powerpc/powerpc64/dl-machine.h (ppc64_local_entry_offset): Delete reloc param, add refsym and sym. Check optimized plt call stubs for localentry:0 functions. Adjust callers. (elf_machine_fixup_plt, elf_machine_plt_conflict): Add refsym and sym parameters. Adjust callers. (_dl_reloc_overflow): Move attribute. (_dl_error_localentry): Declare. * elf/dl-runtime.c (_dl_fixup): Save original sym. Pass refsym and sym to elf_machine_fixup_plt. * elf/testobj6.c (preload): Call printf.	2017-06-14 10:47:25 +09:30
H.J. Lu	5a103908c0	x86-64: Implement strcpy family IFUNC selectors in C Implement strcpy family IFUNC selectors in C. All internal calls within libc.so can use IFUNC on x86-64 since unlike x86, x86-64 supports PC-relative addressing to access the GOT entry so that it can call via PLT without using an extra register. For libc.a, we can't use IFUNC for functions which are called before IFUNC has been initialized. Use IFUNC internally reduces the icache footprint since libc.so and other codes in the process use the same implementations. This patch uses IFUNC for strcpy family functions within libc. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strcpy-sse2 and stpcpy-sse2. * sysdeps/x86_64/multiarch/ifunc-unaligned-ssse3.h: New file. * sysdeps/x86_64/multiarch/stpcpy-sse2.S: Likewise. * sysdeps/x86_64/multiarch/stpcpy.c: Likewise. * sysdeps/x86_64/multiarch/stpncpy.c: Likewise. * sysdeps/x86_64/multiarch/strcpy-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.c: Likewise. * sysdeps/x86_64/multiarch/strncpy.c: Likewise. * sysdeps/x86_64/multiarch/stpcpy.S: Removed. * sysdeps/x86_64/multiarch/stpncpy.S: Likewise. * sysdeps/x86_64/multiarch/strcpy.S: Likewise. * sysdeps/x86_64/multiarch/strncpy.S: Likewise. * sysdeps/x86_64/multiarch/stpncpy-c.c (weak_alias): New. (libc_hidden_def): Always defined as empty. * sysdeps/x86_64/multiarch/strncpy-c.c (libc_hidden_builtin_def): Always Defined as empty.	2017-06-12 09:06:09 -07:00
H.J. Lu	6b6710e55b	x86-64: Correct comments in ifunc-impl-list.c * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Correct comments.	2017-06-09 05:53:45 -07:00
H.J. Lu	d2538b9156	x86-64: Optimize strrchr/wcsrchr with AVX2 Optimize strrchr/wcsrchr with AVX2 to check 32 bytes with vector instructions. It is as fast as SSE2 version for small data sizes and up to 1X faster for large data sizes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strrchr-sse2, strrchr-avx2, wcsrchr-sse2 and wcsrchr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strrchr_avx2, __strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2. * sysdeps/x86_64/multiarch/strrchr-avx2.S: New file. * sysdeps/x86_64/multiarch/strrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strrchr.c: Likewise. * sysdeps/x86_64/multiarch/wcsrchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcsrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcsrchr.c: Likewise.	2017-06-09 05:45:52 -07:00
H.J. Lu	5ac7aa1d7c	x86-64: Optimize memrchr with AVX2 Optimize memrchr with AVX2 to search 32 bytes with a single vector compare instruction. It is as fast as SSE2 memrchr for small data sizes and up to 1X faster for large data sizes on Haswell. Select AVX2 memrchr on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memrchr-sse2 and memrchr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __memrchr_avx2 and __memrchr_sse2. * sysdeps/x86_64/multiarch/memrchr-avx2.S: New file. * sysdeps/x86_64/multiarch/memrchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memrchr.c: Likewise.	2017-06-09 05:44:41 -07:00
H.J. Lu	8fe57365bf	x86-64: Optimize strchr/strchrnul/wcschr with AVX2 Optimize strchr/strchrnul/wcschr with AVX2 to search 32 bytes with vector instructions. It is as fast as SSE2 versions for size <= 16 bytes and up to 1X faster for or size > 16 bytes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strchr-sse2, strchrnul-sse2, strchr-avx2, strchrnul-avx2, wcschr-sse2 and wcschr-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strchr_avx2, __strchrnul_avx2, __strchrnul_sse2, __wcschr_avx2 and __wcschr_sse2. * sysdeps/x86_64/multiarch/strchr-avx2.S: New file. * sysdeps/x86_64/multiarch/strchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strchr.c: Likewise. * sysdeps/x86_64/multiarch/strchrnul-avx2.S: Likewise. * sysdeps/x86_64/multiarch/strchrnul-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strchrnul.c: Likewise. * sysdeps/x86_64/multiarch/wcschr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcschr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcschr.c: Likewise. * sysdeps/x86_64/multiarch/strchr.S: Removed.	2017-06-09 05:42:29 -07:00
H.J. Lu	dc485ceb2a	x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 to check 32 bytes with a single vector compare instruction. It is as fast as SSE2 versions for size <= 16 bytes and up to 1X faster for or size > 16 bytes on Haswell. Select AVX2 version on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add strlen-sse2, strnlen-sse2, strlen-avx2, strnlen-avx2, wcslen-sse2, wcslen-avx2 and wcsnlen-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for __strlen_avx2, __strlen_sse2, __strnlen_avx2, __strnlen_sse2, __wcslen_avx2, __wcslen_sse2 and __wcsnlen_avx2. * sysdeps/x86_64/multiarch/strlen-avx2.S: New file. * sysdeps/x86_64/multiarch/strlen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strlen.c: Likewise. * sysdeps/x86_64/multiarch/strnlen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/strnlen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/strnlen.c: Likewise. * sysdeps/x86_64/multiarch/wcslen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcslen-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wcslen.c: Likewise. * sysdeps/x86_64/multiarch/wcsnlen-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wcsnlen.c (OPTIMIZE (avx2)): New. (IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast.	2017-06-09 05:18:18 -07:00
H.J. Lu	2f5d20ac99	x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 SSE2 memchr is extended to support wmemchr. AVX2 memchr/rawmemchr/wmemchr are added to search 32 bytes with a single vector compare instruction. AVX2 memchr/rawmemchr/wmemchr are as fast as SSE2 memchr/rawmemchr/wmemchr for small sizes and up to 1.5X faster for larger sizes on Haswell and Skylake. Select AVX2 memchr/rawmemchr/wmemchr on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. * sysdeps/x86_64/memchr.S (MEMCHR): New. Depending on if USE_AS_WMEMCHR is defined. (PCMPEQ): Likewise. (memchr): Renamed to ... (MEMCHR): This. Support wmemchr if USE_AS_WMEMCHR is defined. Replace pcmpeqb with PCMPEQ. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memchr-sse2, rawmemchr-sse2, memchr-avx2, rawmemchr-avx2, wmemchr-sse4_1, wmemchr-avx2 and wmemchr-c. * sysdeps/x86_64/multiarch/ifunc-avx2.h: New file. * sysdeps/x86_64/multiarch/memchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/memchr.c: Likewise. * sysdeps/x86_64/multiarch/rawmemchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/rawmemchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/rawmemchr.c: Likewise. * sysdeps/x86_64/multiarch/wmemchr-avx2.S: Likewise. * sysdeps/x86_64/multiarch/wmemchr-sse2.S: Likewise. * sysdeps/x86_64/multiarch/wmemchr.c: Likewise. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memchr_avx2, __memchr_sse2, __rawmemchr_avx2, __rawmemchr_sse2, __wmemchr_avx2 and __wmemchr_sse2.	2017-06-09 05:13:31 -07:00
H.J. Lu	5e1122827a	x86-64: Rename wmemset.h to ifunc-wmemset.h No code changes. * sysdeps/x86_64/multiarch/wmemset.c: Include ifunc-wmemset.h instead of wmemset.h. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/multiarch/wmemset.h: Renamed to ... * sysdeps/x86_64/multiarch/ifunc-wmemset.h: This.	2017-06-07 14:48:34 -07:00
H.J. Lu	2e87c7d158	x86-64: Fold ifunc-sse4_1.h into wcsnlen.c Since ifunc-sse4_1.h is included only by wcsnlen.c, we can fold it into wcsnlen.c. No code changes in wcsnlen.o. 2017-06-07 H.J. Lu <hongjiu.lu@intel.com> * sysdeps/x86_64/multiarch/ifunc-sse4_1.h: Removed and folded into ... * sysdeps/x86_64/multiarch/wcsnlen.c: Here. Don't include ifunc-sse4_1.h.	2017-06-07 09:04:40 -07:00
H.J. Lu	d4cc385c6e	x86-64: Move wcsnlen.S to multiarch/wcsnlen-sse4_1.S Since wcsnlen.S uses pminud which is the part of SSE4.1, move wcsnlen.S to multiarch/wcsnlen-sse4_1.S. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wcsnlen-sse4_1 and wcsnlen-c. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __wcsnlen_sse4_1 and __wcsnlen_sse2. * sysdeps/x86_64/multiarch/ifunc-sse4_1.h: New file. * sysdeps/x86_64/multiarch/wcsnlen-c.c: Likewise. * sysdeps/x86_64/multiarch/wcsnlen-sse4_1.S: Likewise. * sysdeps/x86_64/multiarch/wcsnlen.c: Likewise. * sysdeps/x86_64/wcsnlen.S: Removed.	2017-06-06 06:12:32 -07:00
Stefan Liebler	12d2dd7060	Optimize generic spinlock code and use C11 like atomic macros. This patch optimizes the generic spinlock code. The type pthread_spinlock_t is a typedef to volatile int on all archs. Passing a volatile pointer to the atomic macros which are not mapped to the C11 atomic builtins can lead to extra stores and loads to stack if such a macro creates a temporary variable by using "__typeof ((mem)) tmp;". Thus, those macros which are used by spinlock code - atomic_exchange_acquire, atomic_load_relaxed, atomic_compare_exchange_weak - have to be adjusted. According to the comment from Szabolcs Nagy, the type of a cast expression is unqualified (see http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_423.htm): __typeof ((__typeof ((mem)) (mem)) tmp; Thus from spinlock perspective the variable tmp is of type int instead of type volatile int. This patch adjusts those macros in include/atomic.h. With this construct GCC >= 5 omits the extra stores and loads. The atomic macros are replaced by the C11 like atomic macros and thus the code is aligned to it. The pthread_spin_unlock implementation is now using release memory order instead of sequentially consistent memory order. The issue with passed volatile int pointers applies to the C11 like atomic macros as well as the ones used before. I've added a glibc_likely hint to the first atomic exchange in pthread_spin_lock in order to return immediately to the caller if the lock is free. Without the hint, there is an additional jump if the lock is free. I've added the atomic_spin_nop macro within the loop of plain reads. The plain reads are also realized by C11 like atomic_load_relaxed macro. The new define ATOMIC_EXCHANGE_USES_CAS determines if the first try to acquire the spinlock in pthread_spin_lock or pthread_spin_trylock is an exchange or a CAS. This is defined in atomic-machine.h for all architectures. The define SPIN_LOCK_READS_BETWEEN_CMPXCHG is now removed. There is no technical reason for throwing in a CAS every now and then, and so far we have no evidence that it can improve performance. If that would be the case, we have to adjust other spin-waiting loops elsewhere, too! Using a CAS loop without plain reads is not a good idea on many targets and wasn't used by one. Thus there is now no option to do so. Architectures are now using the generic spinlock automatically if they do not provide an own implementation. Thus the pthread_spin_lock.c files in sysdeps folder are deleted. ChangeLog: NEWS: Mention new spinlock implementation. * include/atomic.h: (__atomic_val_bysize): Cast type to omit volatile qualifier. (atomic_exchange_acq): Likewise. (atomic_load_relaxed): Likewise. (ATOMIC_EXCHANGE_USES_CAS): Check definition. * nptl/pthread_spin_init.c (pthread_spin_init): Use atomic_store_relaxed. * nptl/pthread_spin_lock.c (pthread_spin_lock): Use C11-like atomic macros. * nptl/pthread_spin_trylock.c (pthread_spin_trylock): Likewise. * nptl/pthread_spin_unlock.c (pthread_spin_unlock): Use atomic_store_release. * sysdeps/aarch64/nptl/pthread_spin_lock.c: Delete File. * sysdeps/arm/nptl/pthread_spin_lock.c: Likewise. * sysdeps/hppa/nptl/pthread_spin_lock.c: Likewise. * sysdeps/m68k/nptl/pthread_spin_lock.c: Likewise. * sysdeps/microblaze/nptl/pthread_spin_lock.c: Likewise. * sysdeps/mips/nptl/pthread_spin_lock.c: Likewise. * sysdeps/nios2/nptl/pthread_spin_lock.c: Likewise. * sysdeps/aarch64/atomic-machine.h (ATOMIC_EXCHANGE_USES_CAS): Define. * sysdeps/alpha/atomic-machine.h: Likewise. * sysdeps/arm/atomic-machine.h: Likewise. * sysdeps/i386/atomic-machine.h: Likewise. * sysdeps/ia64/atomic-machine.h: Likewise. * sysdeps/m68k/coldfire/atomic-machine.h: Likewise. * sysdeps/m68k/m680x0/m68020/atomic-machine.h: Likewise. * sysdeps/microblaze/atomic-machine.h: Likewise. * sysdeps/mips/atomic-machine.h: Likewise. * sysdeps/powerpc/powerpc32/atomic-machine.h: Likewise. * sysdeps/powerpc/powerpc64/atomic-machine.h: Likewise. * sysdeps/s390/atomic-machine.h: Likewise. * sysdeps/sparc/sparc32/atomic-machine.h: Likewise. * sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: Likewise. * sysdeps/sparc/sparc64/atomic-machine.h: Likewise. * sysdeps/tile/tilegx/atomic-machine.h: Likewise. * sysdeps/tile/tilepro/atomic-machine.h: Likewise. * sysdeps/unix/sysv/linux/hppa/atomic-machine.h: Likewise. * sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: Likewise. * sysdeps/unix/sysv/linux/nios2/atomic-machine.h: Likewise. * sysdeps/unix/sysv/linux/sh/atomic-machine.h: Likewise. * sysdeps/x86_64/atomic-machine.h: Likewise.	2017-06-06 09:41:56 +02:00
H.J. Lu	935971ba6b	x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE Optimize x86-64 memcmp/wmemcmp with AVX2. It uses vector compare as much as possible. It is as fast as SSE4 memcmp for size <= 16 bytes and up to 2X faster for size > 16 bytes on Haswell and Skylake. Select AVX2 memcmp/wmemcmp on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT. Key features: 1. For size from 2 to 7 bytes, load as big endian with movbe and bswap to avoid branches. 2. Use overlapping compare to avoid branch. 3. Use vector compare when size >= 4 bytes for memcmp or size >= 8 bytes for wmemcmp. 4. If size is 8 * VEC_SIZE or less, unroll the loop. 5. Compare 4 * VEC_SIZE at a time with the aligned first memory area. 6. Use 2 vector compares when size is 2 * VEC_SIZE or less. 7. Use 4 vector compares when size is 4 * VEC_SIZE or less. 8. Use 8 vector compares when size is 8 * VEC_SIZE or less. * sysdeps/x86/cpu-features.h (index_cpu_MOVBE): New. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memcmp-avx2 and wmemcmp-avx2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memcmp_avx2 and __wmemcmp_avx2. * sysdeps/x86_64/multiarch/memcmp-avx2.S: New file. * sysdeps/x86_64/multiarch/wmemcmp-avx2.S: Likewise. * sysdeps/x86_64/multiarch/memcmp.S: Use __memcmp_avx2 on AVX 2 machines if AVX unaligned load is fast and vzeroupper is preferred. * sysdeps/x86_64/multiarch/wmemcmp.S: Use __wmemcmp_avx2 on AVX 2 machines if AVX unaligned load is fast and vzeroupper is preferred.	2017-06-05 12:52:55 -07:00
H.J. Lu	ef9c4cb6c7	x86-64: Optimize wmemset with SSE2/AVX2/AVX512 The difference between memset and wmemset is byte vs int. Add stubs to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size: SSE2 wmemset: shl $0x2,%rdx movd %esi,%xmm0 mov %rdi,%rax pshufd $0x0,%xmm0,%xmm0 jmp entry_from_wmemset SSE2 memset: movd %esi,%xmm0 mov %rdi,%rax punpcklbw %xmm0,%xmm0 punpcklwd %xmm0,%xmm0 pshufd $0x0,%xmm0,%xmm0 entry_from_wmemset: Since the ERMS versions of wmemset requires "rep stosl" instead of "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset is about 6X faster on Haswell. * include/wchar.h (__wmemset_chk): New. * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_CHK_SYMBOL): Likewise. (WMEMSET_SYMBOL): Likewise. (__wmemset): Add hidden definition. (wmemset): Add weak hidden definition. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wmemset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned, __wmemset_avx2_unaligned, __wmemset_avx512_unaligned, __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned and __wmemset_chk_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated. (WMEMSET_CHK_SYMBOL): New. (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise. (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise. * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New. (libc_hidden_builtin_def): Also define __GI_wmemset and __GI___wmemset. (weak_alias): New. * sysdeps/x86_64/multiarch/wmemset.c: New file. * sysdeps/x86_64/multiarch/wmemset.h: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/wmemset.c: Likewise. * sysdeps/x86_64/wmemset_chk.c: Likewise.	2017-06-05 11:09:59 -07:00
H.J. Lu	30cb625a21	x86-64: Update strlen.S to support wcslen/wcsnlen The difference between strlen and wcslen is byte vs int. We can replace pminub and pcmpeqb with pminud and pcmpeqd to turn strlen into wcslen. * sysdeps/x86_64/strlen.S (PMINU): New. (PCMPEQ): Likewise. (SHIFT_RETURN): Likewise. (FIND_ZERO): Replace pcmpeqb with PCMPEQ. (strlen): Add SHIFT_RETURN before ret. Replace pcmpeqb and pminub with PCMPEQ and PMINU. * sysdeps/x86_64/wcsnlen.S: New file.	2017-06-05 07:58:23 -07:00
H.J. Lu	7395928b95	x86_64: Remove redundant REX bytes from memrchr.S By x86-64 specification, 32-bit destination registers are zero-extended to 64 bits. There is no need to use 64-bit registers when only the lower 32 bits are non-zero. Also 2 instructions in: mov %rdi, %rcx and $15, %rcx jz L(length_less16_offset0) mov %rdi, %rcx <<< redundant and $15, %rcx <<< redundant are redundant. * sysdeps/x86_64/memrchr.S (__memrchr): Use 32-bit registers for the lower 32 bits. Remove redundant instructions.	2017-06-05 07:41:26 -07:00
H.J. Lu	4f26ef1b67	x86_64: Remove redundant REX bytes from memchr.S By x86-64 specification, 32-bit destination registers are zero-extended to 64 bits. There is no need to use 64-bit registers when only the lower 32 bits are non-zero. * sysdeps/x86_64/memchr.S (MEMCHR): Use 32-bit registers for the lower 32 bits.	2017-05-30 12:39:14 -07:00
H.J. Lu	1f655beb08	x86_64: Remove L(return_null) from rawmemchr.S L(return_null) is unused. * sysdeps/x86_64/rawmemchr.S (L(return_null)): Removed.	2017-05-20 06:13:38 -07:00
H.J. Lu	402bf06952	x86: Optimize SSE2 memchr overflow calculation SSE2 memchr computes "edx + ecx - 16" where ecx is less than 16. Use "edx - (16 - ecx)", instead of satured math, to avoid possible addition overflow. This replaces add %ecx, %edx sbb %eax, %eax or %eax, %edx sub $16, %edx with neg %ecx add $16, %ecx sub %ecx, %edx It is the same for x86_64, except for rcx/rdx, instead of ecx/edx. * sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Use "edx + ecx - 16" to avoid possible addition overflow. * sysdeps/x86_64/memchr.S (memchr): Likewise.	2017-05-19 10:48:45 -07:00
H.J. Lu	a7fbedff76	Correct comments in x86_64/multiarch/memcmp.S * sysdeps/x86_64/multiarch/memcmp.S (__GI_memcmp): Correct comments.	2017-05-18 14:02:02 -07:00
Zack Weinberg	7c3018f9e4	Suppress internal declarations for most of the testsuite. This patch adds a new build module called 'testsuite'. IS_IN (testsuite) implies _ISOMAC, as do IS_IN_build and __cplusplus (which means several ad-hoc tests for __cplusplus can go away). libc-symbols.h now suppresses almost all of itself when _ISOMAC is defined; in particular, _ISOMAC mode does not get config.h automatically anymore. There are still quite a few tests that need to see internal gunk of one variety or another. For them, we now have 'tests-internal' and 'test-internal-extras'; files in this category will still be compiled with MODULE_NAME=nonlib, and everything proceeds as it always has. The bulk of this patch is moving tests from 'tests' to 'tests-internal'. There is also 'tests-static-internal', which has the same effect on files in 'tests-static', and 'modules-names-tests', which has the inverse effect on files in 'modules-names' (it's inverted because most of the things in modules-names are not tests). For both of these, the file must appear in both the new variable and the old one. There is also now a special case for when libc-symbols.h is included without MODULE_NAME being defined at all. (This happens during the creation of libc-modules.h, and also when preprocessing Versions files.) When this happens, IS_IN is set to be always false and _ISOMAC is not defined, which was the status quo, but now it's explicit. The remaining changes to C source files in this patch seemed likely to cause problems in the absence of the main change. They should be relatively self-explanatory. In a few cases I duplicated a definition from an internal header rather than move the test to tests-internal; this was a judgement call each time and I'm happy to change those however reviewers feel is more appropriate. * Makerules: New subdir configuration variables 'tests-internal' and 'test-internal-extras'. Test files in these categories will still be compiled with MODULE_NAME=nonlib. Test files in the existing categories (tests, xtests, test-srcs, test-extras) are now compiled with MODULE_NAME=testsuite. New subdir configuration variable 'modules-names-tests'. Files which are in both 'modules-names' and 'modules-names-tests' will be compiled with MODULE_NAME=testsuite instead of MODULE_NAME=extramodules. (gen-as-const-headers): Move to tests-internal. (do-tests-clean, common-mostlyclean): Support tests-internal. * Makeconfig (built-modules): Add testsuite. * Makefile: Change libof-check-installed-headers-c and libof-check-installed-headers-cxx to 'testsuite'. * Rules: Likewise. Support tests-internal. * benchtests/strcoll-inputs/filelist#en_US.UTF-8: Remove extra-modules.mk. * config.h.in: Don't check for __OPTIMIZE__ or __FAST_MATH__ here. * include/libc-symbols.h: Move definitions of _GNU_SOURCE, PASTE_NAME, PASTE_NAME1, IN_MODULE, IS_IN, and IS_IN_LIB to the very top of the file and rationalize their order. If MODULE_NAME is not defined at all, define IS_IN to always be false, and don't define _ISOMAC. If any of IS_IN (testsuite), IS_IN_build, or __cplusplus are true, define _ISOMAC and suppress everything else in this file, starting with the inclusion of config.h. Do check for inappropriate definitions of __OPTIMIZE__ and __FAST_MATH__ here, but only if _ISOMAC is not defined. Correct some out-of-date commentary. * include/math.h: If _ISOMAC is defined, undefine NO_LONG_DOUBLE and _Mlong_double_ before including math.h. * include/string.h: If _ISOMAC is defined, don't expose _STRING_ARCH_unaligned. Move a comment to a more appropriate location. * include/errno.h, include/stdio.h, include/stdlib.h, include/string.h * include/time.h, include/unistd.h, include/wchar.h: No need to check __cplusplus nor use __BEGIN_DECLS/__END_DECLS. * misc/sys/cdefs.h (__NTHNL): New macro. * sysdeps/m68k/m680x0/fpu/bits/mathinline.h (__m81_defun): Use __NTHNL to avoid errors with GCC 6. * elf/tst-env-setuid-tunables.c: Include config.h with _LIBC defined, for HAVE_TUNABLES. * inet/tst-checks-posix.c: No need to define _ISOMAC. * intl/tst-gettext2.c: Provide own definition of N_. * math/test-signgam-finite-c99.c: No need to define _ISOMAC. * math/test-signgam-main.c: No need to define _ISOMAC. * stdlib/tst-strtod.c: Convert to test-driver. Split locale_test to... * stdlib/tst-strtod1i.c: ...this new file. * stdlib/tst-strtod5.c: Convert to test-driver and add copyright notice. Split tests of __strtod_internal to... * stdlib/tst-strtod5i.c: ...this new file. * string/test-string.h: Include stdint.h. Duplicate definition of inhibit_loop_to_libcall here (from libc-symbols.h). * string/test-strstr.c: Provide dummy definition of libc_hidden_builtin_def when including strstr.c. * sysdeps/ia64/fpu/libm-symbols.h: Suppress entire file in _ISOMAC mode; no need to test __STRICT_ANSI__ nor __cplusplus as well. * sysdeps/x86_64/fpu/math-tests-arch.h: Include cpu-features.h. Don't include init-arch.h. * sysdeps/x86_64/multiarch/test-multiarch.h: Include cpu-features.h. Don't include init-arch.h. * elf/Makefile: Move tst-ptrguard1-static, tst-stackguard1-static, tst-tls1-static, tst-tls2-static, tst-tls3-static, loadtest, unload, unload2, circleload1, neededtest, neededtest2, neededtest3, neededtest4, tst-tls1, tst-tls2, tst-tls3, tst-tls6, tst-tls7, tst-tls8, tst-dlmopen2, tst-ptrguard1, tst-stackguard1, tst-_dl_addr_inside_object, and all of the ifunc tests to tests-internal. Don't add $(modules-names) to test-extras. * inet/Makefile: Move tst-inet6_scopeid_pton to tests-internal. Add tst-deadline to tests-static-internal. * malloc/Makefile: Move tst-mallocstate and tst-scratch_buffer to tests-internal. * misc/Makefile: Move tst-atomic and tst-atomic-long to tests-internal. * nptl/Makefile: Move tst-typesizes, tst-rwlock19, tst-sem11, tst-sem12, tst-sem13, tst-barrier5, tst-signal7, tst-tls3, tst-tls3-malloc, tst-tls5, tst-stackguard1, tst-sem11-static, tst-sem12-static, and tst-stackguard1-static to tests-internal. Link tests-internal with libpthread also. Don't add $(modules-names) to test-extras. * nss/Makefile: Move tst-field to tests-internal. * posix/Makefile: Move bug-regex5, bug-regex20, bug-regex33, tst-rfc3484, tst-rfc3484-2, and tst-rfc3484-3 to tests-internal. * stdlib/Makefile: Move tst-strtod1i, tst-strtod3, tst-strtod4, tst-strtod5i, tst-tls-atexit, and tst-tls-atexit-nodelete to tests-internal. * sunrpc/Makefile: Move tst-svc_register to tests-internal. * sysdeps/powerpc/Makefile: Move test-get_hwcap and test-get_hwcap-static to tests-internal. * sysdeps/unix/sysv/linux/Makefile: Move tst-setgetname to tests-internal. * sysdeps/x86_64/fpu/Makefile: Add all libmvec test modules to modules-names-tests.	2017-05-11 19:27:59 -04:00
H.J. Lu	1432d38ea0	x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391 ] dl_platform and dl_hwcap are set from AT_PLATFORM and AT_HWCAP very early during startup. They are used by dynamic linker to determine platform and build an array of hardware capability names, which are added to search path when loading shared object. dl_platform and dl_hwcap are unused on x86-64. On i386, i386, i486, i586 and i686 platforms were supported and only SSE2 capability was used. On x86, usage of AT_PLATFORM and AT_HWCAP to determine platform and processor capabilities is obsolete since all information is available in dl_x86_cpu_features. This patch sets dl_platform and dl_hwcap from dl_x86_cpu_features in dynamic linker. On i386, the available plaforms are changed to i586 and i686 since i386 has been deprecated. On x86-64, the available plaforms are haswell, which is for Haswell class processors with BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, and xeon_phi, which is for Xeon Phi class processors with AVX512F, AVX512CD, AVX512ER and AVX512PF. A capability, avx512_1, is also added to x86-64 for AVX512 ISAs: AVX512F, AVX512CD, AVX512BW, AVX512DQ and AVX512VL. [BZ #21391] * sysdeps/i386/dl-machine.h (dl_platform_init) [IS_IN (rtld)]: Only call init_cpu_features. [!IS_IN (rtld)]: Only set GLRO(dl_platform) to NULL if needed. * sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise. * sysdeps/i386/dl-procinfo.h: Removed. * sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Don't include <sysdeps/i386/dl-procinfo.h> nor <ldsodefs.h>. Include <sysdeps/x86/dl-procinfo.h>. (_dl_procinfo): Replace _DL_HWCAP_COUNT with 32. * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h [!IS_IN (ldconfig)]: Include <sysdeps/x86/dl-procinfo.h> instead of <sysdeps/generic/dl-procinfo.h>. * sysdeps/x86/cpu-features.c: Include <dl-hwcap.h>. (init_cpu_features): Set dl_platform, dl_hwcap and dl_hwcap_mask. * sysdeps/x86/cpu-features.h (bit_cpu_LZCNT): New. (bit_cpu_MOVBE): Likewise. (bit_cpu_BMI1): Likewise. (bit_cpu_BMI2): Likewise. (index_cpu_BMI1): Likewise. (index_cpu_BMI2): Likewise. (index_cpu_LZCNT): Likewise. (index_cpu_MOVBE): Likewise. (index_cpu_POPCNT): Likewise. (reg_BMI1): Likewise. (reg_BMI2): Likewise. (reg_LZCNT): Likewise. (reg_MOVBE): Likewise. (reg_POPCNT): Likewise. * sysdeps/x86/dl-hwcap.h: New file. * sysdeps/x86/dl-procinfo.h: Likewise. * sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): New. (_dl_x86_platforms): Likewise.	2017-05-03 13:44:35 -07:00

1 2 3 4 5 ...

1116 Commits