glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-28 21:41:08 +00:00

Author	SHA1	Message	Date
Florian Weimer	1d2a8245ff	hsearch_r: Include <limits.h> It is needed for UINT_MAX.	2016-04-07 13:48:00 +02:00
Florian Weimer	c04af6068b	scratch_buffer_set_array_size: Include <limits.h> It is needed for CHAR_BIT.	2016-04-07 13:46:28 +02:00
H.J. Lu	a7d1c51482	X86-64: Prepare memmove-vec-unaligned-erms.S Prepare memmove-vec-unaligned-erms.S to make the SSE2 version as the default memcpy, mempcpy and memmove. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S (MEMCPY_SYMBOL): New. (MEMPCPY_SYMBOL): Likewise. (MEMMOVE_CHK_SYMBOL): Likewise. Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk symbols. Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on __mempcpy symbols. Provide alias for __memcpy_chk in libc.a. Provide alias for memcpy in libc.a and ld.so.	2016-04-06 10:19:16 -07:00
H.J. Lu	4af1bb06c5	X86-64: Prepare memset-vec-unaligned-erms.S Prepare memset-vec-unaligned-erms.S to make the SSE2 version as the default memset. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (MEMSET_CHK_SYMBOL): New. Define if not defined. (__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH. Disabled for now. Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk symbols. Properly check USE_MULTIARCH on __memset symbols.	2016-04-06 09:10:35 -07:00
H.J. Lu	a25322f4e8	Add memcpy/memmove/memset benchmarks with large data Add memcpy, memmove and memset benchmarks with large data sizes. * benchtests/Makefile (string-benchset): Add memcpy-large, memmove-large and memset-large. * benchtests/bench-memcpy-large.c: New file. * benchtests/bench-memmove-large.c: Likewise. * benchtests/bench-memset-large.c: Likewise. * benchtests/bench-string.h (TIMEOUT): Don't redefine.	2016-04-06 08:37:39 -07:00
Stefan Liebler	aa7353ce5c	Mention Bug in ChangeLog for S390: Save and restore fprs/vrs while resolving symbols. The Bugzilla 19916 is added to the ChangeLog for commit `4603c51ef7`.	2016-04-06 15:21:00 +02:00
H.J. Lu	ec0cac9a1f	Force 32-bit displacement in memset-vec-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force 32-bit displacement to avoid long nop between instructions.	2016-04-05 05:21:19 -07:00
H.J. Lu	696ac77484	Add a comment in memset-sse2-unaligned-erms.S * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add a comment on VMOVU and VMOVA.	2016-04-05 05:19:18 -07:00
Florian Weimer	985fc132f2	strfmon_l: Use specified locale for number formatting [BZ #19633 ]	2016-04-04 15:18:13 +02:00
H.J. Lu	5cd7af016d	Don't put SSE2/AVX/AVX512 memmove/memset in ld.so Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX and AVX512 memmove and memset in ld.so. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip if not in libc. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise.	2016-04-03 14:35:38 -07:00
H.J. Lu	ea2785e96f	Fix memmove-vec-unaligned-erms.S __mempcpy_erms and __memmove_erms can't be placed between __memmove_chk and __memmove since it breaks __memmove_chk. Don't check source == destination first since it is less common. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: (__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk with unaligned_erms. (__memmove_erms): Skip if source == destination. (__memmove_unaligned_erms): Don't check source == destination first.	2016-04-03 12:38:25 -07:00
H.J. Lu	27d3ce1467	Remove Fast_Copy_Backward from Intel Core processors Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core processors.	2016-04-01 15:09:14 -07:00
Adhemerval Zanella	2e51bc3813	Use PTR_ALIGN_DOWN on strcspn and strspn Tested on aarch64. * string/strcspn.c (strcspn): Use PTR_ALIGN_DOWN. * string/strspn.c (strspn): Likewise.	2016-04-01 18:33:03 -03:00
H.J. Lu	344303f3cf	Test 64-byte alignment in memset benchtest Add 64-byte alignment tests in memset benchtest for 64-byte vector registers. * benchtests/bench-memset.c (do_test): Support 64-byte alignment. (test_main): Test 64-byte alignment.	2016-04-01 10:00:12 -07:00
H.J. Lu	aea44bf61a	Test 64-byte alignment in memmove benchtest Add 64-byte alignment tests in memmove benchtest for 64-byte vector registers. * benchtests/bench-memmove.c (test_main): Test 64-byte alignment.	2016-04-01 09:59:09 -07:00
H.J. Lu	32b28d24a1	Test 64-byte alignment in memcpy benchtest Add 64-byte alignment tests in memcpy benchtest for 64-byte vector registers. * benchtests/bench-memcpy.c (test_main): Test 64-byte alignment.	2016-04-01 09:57:53 -07:00
Adhemerval Zanella	528ffb3a04	Remove powerpc64 strspn, strcspn, and strpbrk implementation This patch removes the powerpc64 optimized strspn, strcspn, and strpbrk assembly implementation now that the default C one implements the same strategy. On internal glibc benchtests current implementations show similar performance with -O2. Tested on powerpc64le (POWER8). * sysdeps/powerpc/powerpc64/strcspn.S: Remove file. * sysdeps/powerpc/powerpc64/strpbrk.S: Remove file. * sysdeps/powerpc/powerpc64/strspn.S: Remove file.	2016-04-01 10:44:45 -03:00
Adhemerval Zanella	282b71f07e	Improve generic strpbrk performance With now a faster strcspn implementation, it is faster to just use it with some return tests than reimplementing strpbrk itself. As for strcspn optimization, it is generally at least 10 times faster than the existing implementation on bench-strspn on a few AArch64 implementations. Also the string/bits/string2.h inlines no longer make sense, as current implementation will already implement most of the optimizations. Tested on x86_64, i386, and aarch64. * string/strpbrk.c (strpbrk): Rewrite function. * string/bits/string2.h (strpbrk): Use __builtin_strpbrk. (__strpbrk_c2): Likewise. (__strpbrk_c3): Likewise. * string/string-inlines.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strpbrk_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strpbrk_c3): Likewise.	2016-04-01 10:44:45 -03:00
Adhemerval Zanella	91f3b75f47	Improve generic strspn performance As for strcspn, this patch improves strspn performance using a much faster algorithm. It first constructs a 256-entry table based on the accept string and then uses it as a lookup table for the input string. As for strcspn optimization, it is generally at least 10 times faster than the existing implementation on bench-strspn on a few AArch64 implementations. Also the string/bits/string2.h inlines no longer make sense, as current implementation will already implement most of the optimizations. Tested on x86_64, i686, and aarch64. * string/strspn.c (strspn): Rewrite function. * string/bits/string2.h (strspn): Use __builtin_strspn. (__strspn_c1): Remove inline function. (__strspn_c2): Likewise. (__strspn_c3): Likewise. * string/string-inlines.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c1): Add compatibility symbol. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strspn_c3): Likewise.	2016-04-01 10:44:44 -03:00
Wilco Dijkstra	d3496c9f4f	Improve generic strcspn performance Improve strcspn performance using a much faster algorithm. It is kept simple so it works well on most targets. It is generally at least 10 times faster than the existing implementation on bench-strcspn on a few AArch64 implementations, and for some tests 100 times as fast (repeatedly calling strchr on a small string is extremely slow...). In fact the string/bits/string2.h inlines no longer make sense, as GCC already uses strlen if reject is an empty string, strchrnul is 5 times as fast as __strcspn_c1, while __strcspn_c2 and __strcspn_c3 are slower than the strcspn main loop for large strings (though reject length 2-4 could be special cased in the future to gain even more performance). Tested on x86_64, i686, and aarch64. * string/Version (libc): Add GLIBC_2.24. * string/strcspn.c (strcspn): Rewrite function. * string/bits/string2.h (strcspn): Use __builtin_strcspn. (__strcspn_c1): Remove inline function. (__strcspn_c2): Likewise. (__strcspn_c3): Likewise. * string/string-inlines.c [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c1): Add compatibility symbol. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c2): Likewise. [SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c3): Likewise. * sysdeps/i386/string-inlines.c: Include generic string-inlines.c.	2016-04-01 10:44:40 -03:00
Stefan Liebler	d8a012c5c9	S390: Use ahi instead of aghi in 32bit _dl_runtime_resolve. This patch uses ahi instead of aghi in 32bit _dl_runtime_resolve to adjust the stack pointer. This is no functional change, but a cosmetic one. ChangeLog: * sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_resolve): Use ahi instead of aghi to adjust stack pointer.	2016-04-01 10:42:54 +02:00
Paul E. Murphy	37a4c70bd4	Increase internal precision of ldbl-128ibm decimal printf [BZ #19853 ] When the signs differ, the precision of the conversion sometimes drops below 106 bits. This strategy is identical to the hexadecimal variant. I've refactored tst-sprintf3 to enable testing a value with more than 30 significant digits in order to demonstrate this failure and its solution. Additionally, this implicitly fixes a typo in the shift quantities when subtracting from the high mantissa to compute the difference.	2016-03-31 12:14:33 -05:00
H.J. Lu	830566307f	Add x86-64 memset with unaligned store and rep stosb Implement x86-64 memset with unaligned store and rep stosb. Support 16-byte, 32-byte and 64-byte vector register sizes. A single file provides 2 implementations of memset, one with rep stosb and the other without rep stosb. They share the same codes when size is between 2 times of vector register size and REP_STOSB_THRESHOLD which defaults to 2KB. Key features: 1. Use overlapping store to avoid branch. 2. For size <= 4 times of vector register size, fully unroll the loop. 3. For size > 4 times of vector register size, store 4 times of vector register size at a time. [BZ #19881] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and memset-avx512-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned, __memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned, __memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned, __memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned, __memset_sse2_unaligned_erms, __memset_erms, __memset_avx2_unaligned, __memset_avx2_unaligned_erms, __memset_avx512_unaligned_erms and __memset_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Likewise.	2016-03-31 10:06:07 -07:00
H.J. Lu	88b57b8ed4	Add x86-64 memmove with unaligned load/store and rep movsb Implement x86-64 memmove with unaligned load/store and rep movsb. Support 16-byte, 32-byte and 64-byte vector register sizes. When size <= 8 times of vector register size, there is no check for address overlap between source and destination. Since overhead for overlap check is small when size > 8 times of vector register size, memcpy is an alias of memmove. A single file provides 2 implementations of memmove, one with rep movsb and the other without rep movsb. They share the same codes when size is between 2 times of vector register size and REP_MOVSB_THRESHOLD which is 2KB for 16-byte vector register size and scaled up by large vector register size. Key features: 1. Use overlapping load and store to avoid branch. 2. For size <= 8 times of vector register size, load all sources into registers and store them together. 3. If there is no address overlap between source and destination, copy from both ends with 4 times of vector register size at a time. 4. If address of destination > address of source, backward copy 8 times of vector register size at a time. 5. Otherwise, forward copy 8 times of vector register size at a time. 6. Use rep movsb only for forward copy. Avoid slow backward rep movsb by falling back to backward copy 8 times of vector register size at a time. 7. Skip when address of destination == address of source. [BZ #19776] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and memmove-avx512-unaligned-erms. 
* sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Test __memmove_chk_avx512_unaligned_2, __memmove_chk_avx512_unaligned_erms, __memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms, __memmove_chk_sse2_unaligned_2, __memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2, __memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2, __memmove_avx512_unaligned_erms, __memmove_erms, __memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms, __memcpy_chk_avx512_unaligned_2, __memcpy_chk_avx512_unaligned_erms, __memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms, __memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms, __memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms, __memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms, __memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms, __memcpy_erms, __mempcpy_chk_avx512_unaligned_2, __mempcpy_chk_avx512_unaligned_erms, __mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms, __mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms, __mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms, __mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms, __mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and __mempcpy_erms. * sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New file. * sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S: Likewise. * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Likewise.	2016-03-31 10:04:40 -07:00
Stefan Liebler	5cdd1989d1	S390: Extend structs La_s390_regs / La_s390_retval with vector-registers. Starting with z13, vector registers can also occur as argument registers. Thus the passed input/output register structs for la_s390_[32|64]_gnu_plt[enter|exit] functions should reflect those new registers. This patch extends these structs La_s390_regs and La_s390_retval and adjusts _dl_runtime_profile() to handle those fields in case of running on a z13 machine. ChangeLog: * sysdeps/s390/bits/link.h: (La_s390_vr) New typedef. (La_s390_32_regs): Append vector register lr_v24-lr_v31. (La_s390_64_regs): Likewise. (La_s390_32_retval): Append vector register lrv_v24. (La_s390_64_retval): Likewise. * sysdeps/s390/s390-32/dl-trampoline.h (_dl_runtime_profile): Handle extended structs La_s390_32_regs and La_s390_32_retval. * sysdeps/s390/s390-64/dl-trampoline.h (_dl_runtime_profile): Handle extended structs La_s390_64_regs and La_s390_64_retval.	2016-03-31 17:37:16 +02:00
Stefan Liebler	4603c51ef7	S390: Save and restore fprs/vrs while resolving symbols. On s390, no fpr/vrs were saved while resolving a symbol via _dl_runtime_resolve/_dl_runtime_profile. According to the abi, the fpr-arguments are defined as call clobbered. In leaf-functions, gcc 4.9 and newer can use fprs for saving/restoring gprs instead of saving them to the stack. If gcc does this in one of the resolver-functions, then the floating point arguments of a library-function are invalid for the first library-function-call. Thus, this patch saves/restores the fprs around the resolving code. The same could occur for vector registers. Furthermore an ifunc-resolver could also clobber the vector/floating point argument registers. Thus this patch provides the further variants _dl_runtime_resolve_vx/ _dl_runtime_profile_vx, which are used if the kernel claims that we run on a machine with vector registers. Furthermore, if _dl_runtime_profile calls _dl_call_pltexit, the pointers to inregs-/outregs-structs were set up incorrectly. Now they point to the correct location in the stack-frame. Before branching back to the caller, the return values are now restored instead of containing the return values of the _dl_call_pltexit() call. On s390-32, an endless loop occurs if _dl_call_pltexit() should be called. Now, this code-path branches to this function instead of just after the preceding basr-instruction. ChangeLog: * sysdeps/s390/s390-32/dl-trampoline.S: Include dl-trampoline.h twice to create a non-vector/vector version for _dl_runtime_resolve and _dl_runtime_profile. Move implementation to ... * sysdeps/s390/s390-32/dl-trampoline.h: ... here. (_dl_runtime_resolve) Save and restore fpr/vrs. (_dl_runtime_profile) Save and restore vrs and fix some issues if _dl_call_pltexit is called. * sysdeps/s390/s390-32/dl-machine.h (elf_machine_runtime_setup): Choose the correct resolver function if running on a machine with vx. 
* sysdeps/s390/s390-64/dl-trampoline.S: Include dl-trampoline.h twice to create a non-vector/vector version for _dl_runtime_resolve and _dl_runtime_profile. Move implementation to ... * sysdeps/s390/s390-64/dl-trampoline.h: ... here. (_dl_runtime_resolve) Save and restore fpr/vrs. (_dl_runtime_profile) Save and restore vrs and fix some issues * sysdeps/s390/s390-64/dl-machine.h: (elf_machine_runtime_setup): Choose the correct resolver function if running on a machine with vx.	2016-03-31 17:37:16 +02:00
Adhemerval Zanella	e91bd74658	Fix tst-dlsym-error build This patch fixes the new test tst-dlsym-error build on aarch64 (and possibly other architectures as well) due to a missing strchrnul definition. * elf/tst-dlsym-error.c: Include <string.h> for strchrnul.	2016-03-31 10:51:51 -03:00
Florian Weimer	7d45c163d0	Report dlsym, dlvsym lookup errors using dlerror [BZ #19509 ] * elf/dl-lookup.c (_dl_lookup_symbol_x): Report error even if skip_map != NULL. * elf/tst-dlsym-error.c: New file. * elf/Makefile (tests): Add tst-dlsym-error. (tst-dlsym-error): Link against libdl.	2016-03-31 11:26:55 +02:00
Joseph Myers	258ec8abc1	[microblaze] Remove __ASSUME_FUTIMESAT. MicroBlaze has a special version of futimesat.c because it gained the futimesat syscall later than other non-asm-generic architectures. Now the minimum kernel is recent enough that this syscall can always be assumed to be present for MicroBlaze, so this patch removes the special version and the __ASSUME_FUTIMESAT macro, resulting in the sysdeps/unix/sysv/linux/futimesat.c version being used. Untested. * sysdeps/unix/sysv/linux/microblaze/kernel-features.h (__ASSUME_FUTIMESAT): Remove macro. * sysdeps/unix/sysv/linux/microblaze/futimesat.c: Remove file.	2016-03-29 22:13:36 +00:00
Florian Weimer	317b199b4a	CVE-2016-3075: Stack overflow in _nss_dns_getnetbyname_r [BZ #19879 ] The defensive copy is not needed because the name may not alias the output buffer.	2016-03-29 12:57:56 +02:00
Florian Weimer	a6033052d0	nss_db: Propagate ERANGE error if parse_line fails [BZ #19837 ] Reproducer (needs to run as root): perl -e \ 'print "large:x:999:" . join(",", map {"user$_"} (1 .. 135))."\n"' \ >> /etc/group cd /var/db make getent -s db group After the fix, the last command should list the "large" group. The magic number 135 has been chosen so that the line is shorter than 1024 bytes, but the pointers required to encode the member array will cross the threshold, triggering the bug.	2016-03-29 11:27:32 +02:00
H.J. Lu	0791f91dff	Initial Enhanced REP MOVSB/STOSB (ERMS) support The newer Intel processors support Enhanced REP MOVSB/STOSB (ERMS) which has a feature bit in CPUID. This patch adds the Enhanced REP MOVSB/STOSB (ERMS) bit to x86 cpu-features. * sysdeps/x86/cpu-features.h (bit_cpu_ERMS): New. (index_cpu_ERMS): Likewise. (reg_ERMS): Likewise.	2016-03-28 19:23:31 -07:00
Aurelien Jarno	9ff9351d02	Synchronize <sys/personality.h> with kernel headers <sys/personality.h> is out of sync with kernel headers, missing the UNAME26, FDPIC_FUNCPTRS and PER_LINUX_FDPIC entries. Fix that. Changelog: * sysdeps/unix/sysv/linux/sys/personality.h (UNAME26, FDPIC_FUNCPTRS, PER_LINUX_FDPIC): Add.	2016-03-28 22:42:52 +02:00
H.J. Lu	064f01b10b	Make __memcpy_avx512_no_vzeroupper an alias Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make __memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper to reduce code size of libc.so. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed to ... * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This. (MEMCPY): Don't define. (MEMCPY_CHK): Likewise. (MEMPCPY): Likewise. (MEMPCPY_CHK): Likewise. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMPCPY_CHK): Renamed to ... (__mempcpy_chk_avx512_no_vzeroupper): This. (MEMCPY_CHK): Renamed to ... (__memmove_chk_avx512_no_vzeroupper): This. (MEMCPY): Renamed to ... (__memmove_avx512_no_vzeroupper): This. (__memcpy_avx512_no_vzeroupper): New alias. (__memcpy_chk_avx512_no_vzeroupper): Likewise.	2016-03-28 13:16:22 -07:00
H.J. Lu	c365e615f7	Implement x86-64 multiarch mempcpy in memcpy Implement x86-64 multiarch mempcpy in memcpy to share most of code. It reduces code size of libc.so. [BZ #18858] * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned and mempcpy-avx512-no-vzeroupper. * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New. (MEMPCPY): Likewise. * sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed. * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise. * sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise.	2016-03-28 13:13:51 -07:00
H.J. Lu	e41b395523	[x86] Add a feature bit: Fast_Unaligned_Copy On AMD processors, memcpy optimized with unaligned SSE load is slower than memcpy optimized with aligned SSSE3 while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load.	2016-03-28 04:40:03 -07:00
Florian Weimer	b66d837bb5	resolv: Always set resplen2 out parameter in send_dg [BZ #19791 ] Since commit `44d20bca52` (Implement second fallback mode for DNS requests), there is a code path which returns early, before resplen2 is initialized. This happens if the name server address is immediately recognized as invalid (because of lack of protocol support, or if it is a broadcast address such as 255.255.255.255, or another invalid address). If this happens and resplen2 was non-zero (which is the case if a previous query resulted in a failure), __libc_res_nquery would reuse an existing second answer buffer. This answer has been previously identified as unusable (for example, it could be an NXDOMAIN response). Due to the presence of a second answer, no name server switching will occur. The result is a name resolution failure, although a successful resolution would have been possible if name servers have been switched and queries had proceeded along the search path. The above paragraph still simplifies the situation. Before glibc 2.23, if the second answer needed malloc, the stub resolver would still attempt to reuse the second answer, but this is not possible because __libc_res_nsearch has freed it, after the unsuccessful call to __libc_res_nquerydomain, and set the buffer pointer to NULL. This eventually leads to an assertion failure in __libc_res_nquery: /* Make sure both hp and hp2 are defined */ assert((hp != NULL) && (hp2 != NULL)); If assertions are disabled, the consequence is a NULL pointer dereference on the next line. Starting with glibc 2.23, as a result of commit `e9db92d3ac` (CVE-2015-7547: getaddrinfo() stack-based buffer overflow (Bug 18665)), the second answer is always allocated with malloc. This means that the assertion failure happens with small responses as well because there is no buffer to reuse, as soon as there is a name resolution failure which triggers a search for an answer along the search path. 
This commit addresses the issue by ensuring that resplen2 is initialized before the send_dg function returns. This commit also addresses a bug where an invalid second reply is incorrectly returned as valid to the caller.	2016-03-25 11:49:52 +01:00
Florian Weimer	f327f5b47b	tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860 ] [BZ# 19860] * sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return zero if the compiler does not provide the AVX512F bit.	2016-03-25 11:11:42 +01:00
Joseph Myers	c898991d8b	Fix x86_64 / x86 powl inaccuracy for integer exponents (bug 19848). Bug 19848 reports cases where powl on x86 / x86_64 has error accumulation, for small integer exponents, larger than permitted by glibc's accuracy goals, at least in some rounding modes. This patch further restricts the exponent range for which the small-integer-exponent logic is used to limit the possible error accumulation. Tested for x86_64 and x86 and ulps updated accordingly. [BZ #19848] * sysdeps/i386/fpu/e_powl.S (p3): Rename to p2 and change value from 8 to 4. (__ieee754_powl): Compare integer exponent against 4 not 8. * sysdeps/x86_64/fpu/e_powl.S (p3): Rename to p2 and change value from 8 to 4. (__ieee754_powl): Compare integer exponent against 4 not 8. * math/auto-libm-test-in: Add more tests of pow. * math/auto-libm-test-out: Regenerated. * sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Likewise.	2016-03-24 01:32:52 +00:00
Aurelien Jarno	7e1ff08c26	Assume __NR_utimensat is always defined With the 2.6.32 minimum kernel on x86 and 3.2 on other architectures, __NR_utimensat is always defined. Changelog: * sysdeps/unix/sysv/linux/futimens.c (futimens) [__NR_utimensat]: Make code unconditional. [!__NR_utimensat]: Remove conditional code. * sysdeps/unix/sysv/linux/lutimes.c (lutimes) [__NR_utimensat]: Make code unconditional. [!__NR_utimensat]: Remove conditional code. * sysdeps/unix/sysv/linux/utimensat.c (utimensat) [__NR_utimensat]: Make code unconditional. [!__NR_utimensat]: Remove conditional code.	2016-03-23 23:35:08 +01:00
Aurelien Jarno	16d94f67e5	Assume __NR_openat is always defined With the 2.6.32 minimum kernel on x86 and 3.2 on other architectures, __NR_openat is always defined. Changelog: * sysdeps/unix/sysv/linux/dl-openat64.c (openat64) [__NR_openat]: Make code unconditional.	2016-03-23 23:35:08 +01:00
Nick Alcock	7a25d6a84d	x86, pthread_cond_wait: Do not depend on %eax not being clobbered The x86-specific versions of both pthread_cond_wait and pthread_cond_timedwait have (in their fall-back-to-futex-wait slow paths) calls to __pthread_mutex_cond_lock_adjust followed by __pthread_mutex_unlock_usercnt, which load the parameters before the first call but then assume that the first parameter, in %eax, will survive unaffected. This happens to have been true before now, but %eax is a call-clobbered register, and this assumption is not safe: it could change at any time, at GCC's whim, and indeed the stack-protector canary checking code clobbers %eax while checking that the canary is uncorrupted. So reload %eax before calling __pthread_mutex_unlock_usercnt. (Do this unconditionally, even when stack-protection is not in use, because it's the right thing to do, it's a slow path, and anything else is dicing with death.) * sysdeps/unix/sysv/linux/i386/pthread_cond_timedwait.S: Reload call-clobbered %eax on retry path. * sysdeps/unix/sysv/linux/i386/pthread_cond_wait.S: Likewise.	2016-03-23 13:40:14 +01:00
H.J. Lu	3c9a4cd16c	Don't set %rcx twice before "rep movsb" * sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY): Don't set %rcx twice before "rep movsb".	2016-03-22 08:36:16 -07:00
H.J. Lu	f781a9e961	Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.	2016-03-22 07:47:20 -07:00
Samuel Thibault	b87e41378b	Fix malloc threaded tests link on non-Linux * malloc/Makefile ($(objpfx)tst-malloc-backtrace, $(objpfx)tst-malloc-thread-exit, $(objpfx)tst-malloc-thread-fail): Use $(shared-thread-library) instead of hardcoding the path to libpthread.	2016-03-22 09:58:48 +01:00
Joseph Myers	37ad347359	Remove __ASSUME_GETDENTS64_SYSCALL. This patch removes the __ASSUME_GETDENTS64_SYSCALL macro, as its definition is constant given the new kernel version requirements (and was constant anyway before those requirements except for MIPS n32). Note that the "#ifdef __NR_getdents64" conditional is still needed, because MIPS n64 only has the getdents syscall (being a 64-bit ABI, that syscall is 64-bit; the difference between the two on 64-bit architectures is where d_type goes). If MIPS n64 were to gain the getdents64 syscall and we wanted to use it conditionally on the kernel version at runtime we'd have to revert this patch, but I think that's unlikely (and in any case, we could follow the simpler approach of undefining __NR_getdents64 if the syscall can't be assumed, just like we do for accept4 / recvmmsg / sendmmsg syscalls on architectures where socketcall support came first). Most of the getdents.c changes are reindentation. Tested for x86_64 and x86 that installed stripped shared libraries are unchanged by the patch. * sysdeps/unix/sysv/linux/kernel-features.h (__ASSUME_GETDENTS64_SYSCALL): Remove macro. * sysdeps/unix/sysv/linux/getdents.c [!__ASSUME_GETDENTS64_SYSCALL]: Remove conditional code. [!have_no_getdents64_defined]: Likewise. (__GETDENTS): Remove __have_no_getdents64 conditional.	2016-03-22 00:32:20 +00:00
Joseph Myers	238d60ac9b	Remove __ASSUME_SIGNALFD4. Current Linux kernel version requirements mean the signalfd4 syscall can always be assumed to be available. This patch removes __ASSUME_SIGNALFD4 and associated conditionals. Tested for x86_64 and x86 that installed stripped shared libraries are unchanged by the patch. * sysdeps/unix/sysv/linux/kernel-features.h (__ASSUME_SIGNALFD4): Remove macro. * sysdeps/unix/sysv/linux/signalfd.c: Do not include <kernel-features.h>. (signalfd) [__NR_signalfd4]: Make code unconditional. (signalfd) [!__ASSUME_SIGNALFD4]: Remove conditional code.	2016-03-21 16:30:05 +00:00
Adhemerval Zanella	67b23376fb	posix: Fix posix_spawn implicit check style This patch changes the implicit check style added in `2a69f853c` to follow the general convention. Checked on x86_64. * sysdeps/unix/sysv/linux/spawni.c (__spawnix): Fix implicit checks style.	2016-03-21 12:12:26 -03:00
H.J. Lu	893e371b2f	Use JUMPTARGET in x86-64 pthread When PLT may be used, JUMPTARGET should be used instead of calling the function directly. * sysdeps/unix/sysv/linux/x86_64/cancellation.S (__pthread_enable_asynccancel): Use JUMPTARGET to call __pthread_unwind. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S (__condvar_cleanup2): Use JUMPTARGET to call _Unwind_Resume. * sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S (__condvar_cleanup1): Likewise.	2016-03-21 06:51:05 -07:00
Mike Frysinger	a4cea54b12	localedata: standardize copyright/license information [BZ #11213 ] Use the language from the FSF in all locale files to disclaim any license/copyright on locale data. See https://sourceware.org/ml/libc-locales/2013-q1/msg00048.html	2016-03-21 02:29:56 -04:00
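
Note on commit 91f3b75f47 (Improve generic strspn performance): the message describes building a 256-entry table from the accept string and then using it as a lookup table while scanning the input. The following is a minimal sketch of that idea only, not the code from string/strspn.c; the function name table_strspn is hypothetical, and the actual glibc implementation may add unrolling or special cases that are not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of the table-lookup approach; this is not
   glibc's strspn.  Returns the length of the initial segment of s that
   consists only of bytes found in accept.  */
size_t table_strspn(const char *s, const char *accept)
{
    unsigned char table[256];

    /* Start with every byte value marked "not accepted".  */
    memset(table, 0, sizeof table);

    /* Mark each byte of accept.  The terminating NUL is never marked,
       so table[0] stays 0 and the scan below stops at the end of s.  */
    for (const unsigned char *a = (const unsigned char *) accept; *a != '\0'; ++a)
        table[*a] = 1;

    /* One table lookup per input byte, regardless of accept's length.  */
    const unsigned char *p = (const unsigned char *) s;
    while (table[*p])
        ++p;

    return (size_t) (p - (const unsigned char *) s);
}
```

The cost per input byte no longer depends on the length of accept, which is consistent with the speedup the commit message reports over repeatedly searching the accept string.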


30338 Commits
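
Note on commit 830566307f (Add x86-64 memset with unaligned store and rep stosb): the first key feature listed there is "Use overlapping store to avoid branch". The sketch below illustrates that trick in portable C for sizes between one and two vector widths. It is an assumption-laden illustration, not the assembly from memset-vec-unaligned-erms.S: the name fill_overlapping and the constant VEC_BYTES are hypothetical, and the memcpy calls stand in for unaligned vector stores.

```c
#include <stddef.h>
#include <string.h>

/* Assumed vector width in bytes (16 matches an SSE2 register).  */
enum { VEC_BYTES = 16 };

/* Hypothetical helper, not glibc code: fill n bytes at dst with c,
   for VEC_BYTES <= n <= 2 * VEC_BYTES, using exactly two block stores
   and no branch on the exact value of n.  */
void fill_overlapping(unsigned char *dst, int c, size_t n)
{
    unsigned char v[VEC_BYTES];

    /* Broadcast the fill byte, as a vector register would hold it.  */
    for (size_t i = 0; i < VEC_BYTES; ++i)
        v[i] = (unsigned char) c;

    memcpy(dst, v, VEC_BYTES);                  /* first VEC_BYTES bytes */
    memcpy(dst + n - VEC_BYTES, v, VEC_BYTES);  /* last VEC_BYTES bytes  */
}
```

When n is less than 2 * VEC_BYTES the two stores overlap in the middle; writing the same value twice is harmless, and every size in the range is handled by the same two stores.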