glibc/sysdeps/x86_64/memset.S

78 lines
2.0 KiB
ArmAsm
Raw Normal View History

/* memset/bzero -- set memory area to CH/0
Optimized version for x86-64.
Copyright (C) 2002-2022 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
Prefer https to http for gnu.org and fsf.org URLs Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 05:40:42 +00:00
<https://www.gnu.org/licenses/>. */
#include <sysdep.h>
#define USE_WITH_SSE2 1
X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.
2016-06-08 20:55:45 +00:00
#define VEC_SIZE 16
#define MOV_SIZE 3
#define RET_SIZE 1
X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.
2016-06-08 20:55:45 +00:00
#define VEC(i) xmm##i
#define VMOVU movups
#define VMOVA movaps
X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.
2016-06-08 20:55:45 +00:00
# define MEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.
2016-06-08 20:55:45 +00:00
movd d, %xmm0; \
movq r, %rax; \
punpcklbw %xmm0, %xmm0; \
punpcklwd %xmm0, %xmm0; \
pshufd $0, %xmm0, %xmm0
X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.
2016-06-08 20:55:45 +00:00
# define BZERO_ZERO_VEC0() \
pxor %xmm0, %xmm0
# define WMEMSET_SET_VEC0_AND_SET_RETURN(d, r) \
x86-64: Optimize wmemset with SSE2/AVX2/AVX512 The difference between memset and wmemset is byte vs int. Add stubs to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size: SSE2 wmemset: shl $0x2,%rdx movd %esi,%xmm0 mov %rdi,%rax pshufd $0x0,%xmm0,%xmm0 jmp entry_from_wmemset SSE2 memset: movd %esi,%xmm0 mov %rdi,%rax punpcklbw %xmm0,%xmm0 punpcklwd %xmm0,%xmm0 pshufd $0x0,%xmm0,%xmm0 entry_from_wmemset: Since the ERMS versions of wmemset requires "rep stosl" instead of "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset is about 6X faster on Haswell. * include/wchar.h (__wmemset_chk): New. * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_CHK_SYMBOL): Likewise. (WMEMSET_SYMBOL): Likewise. (__wmemset): Add hidden definition. (wmemset): Add weak hidden definition. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wmemset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned, __wmemset_avx2_unaligned, __wmemset_avx512_unaligned, __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned and __wmemset_chk_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated. (WMEMSET_CHK_SYMBOL): New. (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise. (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise. * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New. (libc_hidden_builtin_def): Also define __GI_wmemset and __GI___wmemset. (weak_alias): New. * sysdeps/x86_64/multiarch/wmemset.c: New file. * sysdeps/x86_64/multiarch/wmemset.h: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/wmemset.c: Likewise. * sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
movd d, %xmm0; \
pshufd $0, %xmm0, %xmm0; \
movq r, %rax
# define MEMSET_VDUP_TO_VEC0_HIGH()
# define MEMSET_VDUP_TO_VEC0_LOW()
# define WMEMSET_VDUP_TO_VEC0_HIGH()
# define WMEMSET_VDUP_TO_VEC0_LOW()
x86-64: Optimize wmemset with SSE2/AVX2/AVX512 The difference between memset and wmemset is byte vs int. Add stubs to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size: SSE2 wmemset: shl $0x2,%rdx movd %esi,%xmm0 mov %rdi,%rax pshufd $0x0,%xmm0,%xmm0 jmp entry_from_wmemset SSE2 memset: movd %esi,%xmm0 mov %rdi,%rax punpcklbw %xmm0,%xmm0 punpcklwd %xmm0,%xmm0 pshufd $0x0,%xmm0,%xmm0 entry_from_wmemset: Since the ERMS versions of wmemset requires "rep stosl" instead of "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset is about 6X faster on Haswell. * include/wchar.h (__wmemset_chk): New. * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_CHK_SYMBOL): Likewise. (WMEMSET_SYMBOL): Likewise. (__wmemset): Add hidden definition. (wmemset): Add weak hidden definition. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wmemset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned, __wmemset_avx2_unaligned, __wmemset_avx512_unaligned, __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned and __wmemset_chk_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated. (WMEMSET_CHK_SYMBOL): New. (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise. (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise. * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New. (libc_hidden_builtin_def): Also define __GI_wmemset and __GI___wmemset. (weak_alias): New. * sysdeps/x86_64/multiarch/wmemset.c: New file. * sysdeps/x86_64/multiarch/wmemset.h: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/wmemset.c: Likewise. * sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.
2016-06-08 20:55:45 +00:00
#define SECTION(p) p
#ifndef MEMSET_SYMBOL
# define MEMSET_CHK_SYMBOL(p,s) p
# define MEMSET_SYMBOL(p,s) memset
2004-10-15 Jakub Jelinek <jakub@redhat.com> * elf/dl-minimal.c (__chk_fail): New. Add rtld_hidden_def. * sysdeps/unix/sysv/linux/readonly-area.c: New file. * sysdeps/i386/i686/memmove.S (__memmove_chk): Add checking routine. * sysdeps/i386/i686/memcpy.S (__memcpy_chk): Likewise. * sysdeps/i386/i686/mempcpy.S (__mempcpy_chk): Likewise. * sysdeps/i386/i686/memset.S (__memset_chk): Likewise. * sysdeps/i386/i686/memmove-chk.S: New file. * sysdeps/i386/i686/memcpy-chk.S: Likewise. * sysdeps/i386/i686/mempcpy-chk.S: Likewise. * sysdeps/i386/i686/memset-chk.S: Likewise. * sysdeps/generic/strcat-chk.c (__strcat_chk): Don't __chk_fail if exactly fitting into buffer. * sysdeps/generic/strncat-chk.c (__strncat_chk): Likewise. * sysdeps/generic/readonly-area.c: New file. * sysdeps/generic/strncpy-chk.c (__strncpy_chk): Only test destlen once. * sysdeps/x86_64/memset.S (__memset_chk): Add checking routine. * sysdeps/x86_64/memcpy.S (__memcpy_chk): Likewise. * sysdeps/x86_64/mempcpy.S (__memcpy_chk): Define to __mempcpy_chk. * sysdeps/x86_64/memcpy-chk.S: New file. * sysdeps/x86_64/mempcpy-chk.S: Likewise. * sysdeps/x86_64/memset-chk.S: Likewise. * sysdeps/x86_64/strcpy-chk.S: Likewise. * sysdeps/x86_64/stpcpy-chk.S: Likewise. * argp/argp-xinl.c (__OPTIMIZE__): Define to 1 instead of nothing. * argp/argp-fs-xinl.c (__OPTIMIZE__): Likewise. * debug/tst-chk1.c: New test. * debug/tst-chk2.c: Likewise. * debug/tst-chk3.c: Likewise. * debug/test-strcpy_chk.c: Likewise. * debug/test-stpcpy_chk.c: Likewise. * debug/vsprintf_chk.c (__vsprintf_chk): If flags > 0, request _IO_FLAGS2_CHECK_PERCENT_N. Add libc_hidden_def. * debug/Makefile (routines): Add printf_chk, fprintf_chk, vprintf_chk, vfprintf_chk, gets_chk and readonly-area. (CFLAGS-*_chk.c): Set. (tests): Add tst-chk1, tst-chk2, tst-chk3, test-strcpy_chk and test-stpcpy_chk. * debug/vprintf_chk.c: New file. * debug/printf_chk.c: Likewise. * debug/vfprintf_chk.c: Likewise. * debug/fprintf_chk.c: Likewise. * debug/gets_chk.c: Likewise. * debug/chk_fail.c (__chk_fail): Add libc_hidden_def. * debug/snprintf_chk.c (__snprintf_chk): Fix order of arguments passed to __vsnprintf_chk. * debug/Versions (libc): Export __printf_chk, __fprintf_chk, __vprintf_chk, __vfprintf_chk and __gets_chk @GLIBC_2.3.4. * debug/vsnprintf_chk.c (__vsnprintf_chk): Don't call __vsnprintf, instead create a temporary file with _IO_strn_jumps jumptable. If flags > 0, request _IO_FLAGS2_CHECK_PERCENT_N. Add libc_hidden_def. * libio/Makefile (headers): Add bits/stdio2.h. * libio/stdio.h: Include <bits/stdio2.h> if __USE_FORTIFY_LEVEL. (sprintf, snprintf, vsprintf, vsnprintf): Remove defines. * libio/strfile.h (_IO_strnfile): New type. (_IO_strn_jumps): New extern. * libio/vsnprintf.c (_IO_strnfile): Remove. (_IO_strn_jumps): Remove static. * libio/bits/stdio2.h: New file. * libio/vswprintf.c (_IO_strnfile): Rename type to... (_IO_wstrnfile): ...this. Adjust all uses. * libio/libio.h (_IO_FLAGS2_CHECK_PERCENT_N): Define. * stdio-common/vfprintf.c (STR_LEN): Define. (vfprintf): Add readonly_format variable. Handle _IO_FLAGS2_CHECK_PERCENT_N. (buffered_vfprintf): Copy _flags2. * include/stdio.h (__sprintf_chk, __snprintf_chk, __vsprintf_chk, __vsnprintf_chk, __printf_chk, __fprintf_chk, __vprintf_chk, __vfprintf_chk): New prototypes. (__vsprintf_chk, __vsnprintf_chk): Add libc_hidden_proto. * include/string.h (__memcpy_chk, __memmove_chk, __mempcpy_chk, __memset_chk, __strcpy_chk, __stpcpy_chk, __strncpy_chk, __strcat_chk, __strncat_chk): New prototypes. * include/bits/string3.h: New file. * include/sys/cdefs.h (__chk_fail): Add libc_hidden_proto and rtld_hidden_proto. * string/Makefile (headers): Add bits/string3.h. * string/bits/string3.h (bcopy, bzero): New defines. (memset, memcpy, memmove, strcpy, strncpy, strcat, strncat): Change macros so that inlines are used only if unknown destination size or side-effects in destination argument. (mempcpy, stpcpy): Likewise. Protect with #ifdef __USE_GNU. 2004-09-16 Ulrich Drepper <drepper@redhat.com> * debug/Makefile (routines): Add *_chk. * debug/Versions (libc): Export __chk_fail, __memcpy_chk, __memmove_chk, __mempcpy_chk, __memset_chk, __stpcpy_chk, __strcat_chk, __strcpy_chk, __strncat_chk, __strncpy_chk, __sprintf_chk, __vsprintf_chk, __snprintf_chk, __vsnprintf_chk @GLIBC_2.3.4. * debug/chk_fail.c: New file. * debug/snprintf_chk.c: Likewise. * debug/sprintf_chk.c: Likewise. * debug/vsnprintf_chk.c: Likewise. * debug/vsprintf_chk.c: Likewise. * include/features.h (_FORTIFY_SOURCE): Document, handle. (__USE_FORTIFY_LEVEL): Define. (__GNUC_PREREQ): Move to earlier location. * include/sys/cdefs.h (__chk_fail): New prototype. * libio/bits/stdio.h (sprintf, vsprintf, snprintf, vsnprintf): Define if __USE_FORTIFY_LEVEL. * misc/sys/cdefs.h (__bos, __bos0): Define. * string/string.h: Include <bits/string3.h> if __USE_FORTIFY_LEVEL. * bits/string/string3.h: New header. * sysdeps/generic/memcpy_chk.c: New file. * sysdeps/generic/memmove_chk.c: Likewise. * sysdeps/generic/mempcpy_chk.c: Likewise. * sysdeps/generic/memset_chk.c: Likewise. * sysdeps/generic/stpcpy_chk.c: Likewise. * sysdeps/generic/strcat_chk.c: Likewise. * sysdeps/generic/strcpy_chk.c: Likewise. * sysdeps/generic/strncat_chk.c: Likewise. * sysdeps/generic/strncpy_chk.c: Likewise. 2004-10-15 Jakub Jelinek <jakub@redhat.com> * elf/dl-minimal.c (__chk_fail): New. Add rtld_hidden_def. * sysdeps/unix/sysv/linux/readonly-area.c: New file. * sysdeps/i386/i686/memmove.S (__memmove_chk): Add checking routine. * sysdeps/i386/i686/memcpy.S (__memcpy_chk): Likewise. * sysdeps/i386/i686/mempcpy.S (__mempcpy_chk): Likewise. * sysdeps/i386/i686/memset.S (__memset_chk): Likewise. * sysdeps/i386/i686/memmove-chk.S: New file. * sysdeps/i386/i686/memcpy-chk.S: Likewise. * sysdeps/i386/i686/mempcpy-chk.S: Likewise. * sysdeps/i386/i686/memset-chk.S: Likewise. * sysdeps/generic/strcat-chk.c (__strcat_chk): Don't __chk_fail if exactly fitting into buffer. * sysdeps/generic/strncat-chk.c (__strncat_chk): Likewise. * sysdeps/generic/readonly-area.c: New file. * sysdeps/generic/strncpy-chk.c (__strncpy_chk): Only test destlen once. * sysdeps/x86_64/memset.S (__memset_chk): Add checking routine. * sysdeps/x86_64/memcpy.S (__memcpy_chk): Likewise. * sysdeps/x86_64/mempcpy.S (__memcpy_chk): Define to __mempcpy_chk. * sysdeps/x86_64/memcpy-chk.S: New file. * sysdeps/x86_64/mempcpy-chk.S: Likewise. * sysdeps/x86_64/memset-chk.S: Likewise. * sysdeps/x86_64/strcpy-chk.S: Likewise. * sysdeps/x86_64/stpcpy-chk.S: Likewise. * argp/argp-xinl.c (__OPTIMIZE__): Define to 1 instead of nothing. * argp/argp-fs-xinl.c (__OPTIMIZE__): Likewise. * debug/tst-chk1.c: New test. * debug/tst-chk2.c: Likewise. * debug/tst-chk3.c: Likewise. * debug/test-strcpy_chk.c: Likewise. * debug/test-stpcpy_chk.c: Likewise. * debug/vsprintf_chk.c (__vsprintf_chk): If flags > 0, request _IO_FLAGS2_CHECK_PERCENT_N. Add libc_hidden_def. * debug/Makefile (routines): Add printf_chk, fprintf_chk, vprintf_chk, vfprintf_chk, gets_chk and readonly-area. (CFLAGS-*_chk.c): Set. (tests): Add tst-chk1, tst-chk2, tst-chk3, test-strcpy_chk and test-stpcpy_chk. * debug/vprintf_chk.c: New file. * debug/printf_chk.c: Likewise. * debug/vfprintf_chk.c: Likewise. * debug/fprintf_chk.c: Likewise. * debug/gets_chk.c: Likewise. * debug/chk_fail.c (__chk_fail): Add libc_hidden_def. * debug/snprintf_chk.c (__snprintf_chk): Fix order of arguments passed to __vsnprintf_chk. * debug/Versions (libc): Export __printf_chk, __fprintf_chk, __vprintf_chk, __vfprintf_chk and __gets_chk @GLIBC_2.3.4. * debug/vsnprintf_chk.c (__vsnprintf_chk): Don't call __vsnprintf, instead create a temporary file with _IO_strn_jumps jumptable. If flags > 0, request _IO_FLAGS2_CHECK_PERCENT_N. Add libc_hidden_def. * libio/Makefile (headers): Add bits/stdio2.h. * libio/stdio.h: Include <bits/stdio2.h> if __USE_FORTIFY_LEVEL. (sprintf, snprintf, vsprintf, vsnprintf): Remove defines. * libio/strfile.h (_IO_strnfile): New type. (_IO_strn_jumps): New extern. * libio/vsnprintf.c (_IO_strnfile): Remove. (_IO_strn_jumps): Remove static. * libio/bits/stdio2.h: New file. * libio/vswprintf.c (_IO_strnfile): Rename type to... (_IO_wstrnfile): ...this. Adjust all uses. * libio/libio.h (_IO_FLAGS2_CHECK_PERCENT_N): Define. * stdio-common/vfprintf.c (STR_LEN): Define. (vfprintf): Add readonly_format variable. Handle _IO_FLAGS2_CHECK_PERCENT_N. (buffered_vfprintf): Copy _flags2. * include/stdio.h (__sprintf_chk, __snprintf_chk, __vsprintf_chk, __vsnprintf_chk, __printf_chk, __fprintf_chk, __vprintf_chk, __vfprintf_chk): New prototypes. (__vsprintf_chk, __vsnprintf_chk): Add libc_hidden_proto. * include/string.h (__memcpy_chk, __memmove_chk, __mempcpy_chk, __memset_chk, __strcpy_chk, __stpcpy_chk, __strncpy_chk, __strcat_chk, __strncat_chk): New prototypes. * include/bits/string3.h: New file. * include/sys/cdefs.h (__chk_fail): Add libc_hidden_proto and rtld_hidden_proto. * string/Makefile (headers): Add bits/string3.h. * string/bits/string3.h (bcopy, bzero): New defines. (memset, memcpy, memmove, strcpy, strncpy, strcat, strncat): Change macros so that inlines are used only if unknown destination size or side-effects in destination argument. (mempcpy, stpcpy): Likewise. Protect with #ifdef __USE_GNU. 2004-09-16 Ulrich Drepper <drepper@redhat.com> * debug/Makefile (routines): Add *_chk. * debug/Versions (libc): Export __chk_fail, __memcpy_chk, __memmove_chk, __mempcpy_chk, __memset_chk, __stpcpy_chk, __strcat_chk, __strcpy_chk, __strncat_chk, __strncpy_chk, __sprintf_chk, __vsprintf_chk, __snprintf_chk, __vsnprintf_chk @GLIBC_2.3.4. * debug/chk_fail.c: New file. * debug/snprintf_chk.c: Likewise. * debug/sprintf_chk.c: Likewise. * debug/vsnprintf_chk.c: Likewise. * debug/vsprintf_chk.c: Likewise. * include/features.h (_FORTIFY_SOURCE): Document, handle. (__USE_FORTIFY_LEVEL): Define. (__GNUC_PREREQ): Move to earlier location. * include/sys/cdefs.h (__chk_fail): New prototype. * libio/bits/stdio.h (sprintf, vsprintf, snprintf, vsnprintf): Define if __USE_FORTIFY_LEVEL. * misc/sys/cdefs.h (__bos, __bos0): Define. * string/string.h: Include <bits/string3.h> if __USE_FORTIFY_LEVEL. * bits/string/string3.h: New header. * sysdeps/generic/memcpy_chk.c: New file. * sysdeps/generic/memmove_chk.c: Likewise. * sysdeps/generic/mempcpy_chk.c: Likewise. * sysdeps/generic/memset_chk.c: Likewise. * sysdeps/generic/stpcpy_chk.c: Likewise. * sysdeps/generic/strcat_chk.c: Likewise. * sysdeps/generic/strcpy_chk.c: Likewise. * sysdeps/generic/strncat_chk.c: Likewise. * sysdeps/generic/strncpy_chk.c: Likewise.
2004-10-18 04:17:19 +00:00
#endif
2010-11-08 08:41:34 +00:00
#ifndef BZERO_SYMBOL
# define BZERO_SYMBOL(p,s) __bzero
#endif
x86-64: Optimize wmemset with SSE2/AVX2/AVX512 The difference between memset and wmemset is byte vs int. Add stubs to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size: SSE2 wmemset: shl $0x2,%rdx movd %esi,%xmm0 mov %rdi,%rax pshufd $0x0,%xmm0,%xmm0 jmp entry_from_wmemset SSE2 memset: movd %esi,%xmm0 mov %rdi,%rax punpcklbw %xmm0,%xmm0 punpcklwd %xmm0,%xmm0 pshufd $0x0,%xmm0,%xmm0 entry_from_wmemset: Since the ERMS versions of wmemset requires "rep stosl" instead of "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset is about 6X faster on Haswell. * include/wchar.h (__wmemset_chk): New. * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_CHK_SYMBOL): Likewise. (WMEMSET_SYMBOL): Likewise. (__wmemset): Add hidden definition. (wmemset): Add weak hidden definition. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wmemset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned, __wmemset_avx2_unaligned, __wmemset_avx512_unaligned, __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned and __wmemset_chk_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated. (WMEMSET_CHK_SYMBOL): New. (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise. (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise. * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New. (libc_hidden_builtin_def): Also define __GI_wmemset and __GI___wmemset. (weak_alias): New. * sysdeps/x86_64/multiarch/wmemset.c: New file. * sysdeps/x86_64/multiarch/wmemset.h: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/wmemset.c: Likewise. * sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
#ifndef WMEMSET_SYMBOL
# define WMEMSET_CHK_SYMBOL(p,s) p
# define WMEMSET_SYMBOL(p,s) __wmemset
#endif
X86-64: Remove the previous SSE2/AVX2 memsets Since the new SSE2/AVX2 memsets are faster than the previous ones, we can remove the previous SSE2/AVX2 memsets and replace them with the new ones. This reduces the size of libc.so by about 900 bytes. No change in IFUNC selection if SSE2 and AVX2 memsets weren't used before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset optimized with Enhanced REP STOSB will be used for processors with ERMS. The new AVX512 memset will be used for processors with AVX512 which prefer vzeroupper. [BZ #19881] * sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded into ... * sysdeps/x86_64/memset.S: This. (__bzero): Removed. (__memset_tail): Likewise. (__memset_chk): Likewise. (memset): Likewise. (MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't defined. (MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined. * sysdeps/x86_64/multiarch/memset-avx2.S: Removed. (__memset_zero_constant_len_parameter): Check SHARED instead of PIC. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove memset-avx2 and memset-sse2-unaligned-erms. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Remove __memset_chk_sse2, __memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__bzero): Enabled. * sysdeps/x86_64/multiarch/memset.S (memset): Replace __memset_sse2 and __memset_avx2 with __memset_sse2_unaligned and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms or __memset_avx2_unaligned_erms if processor has ERMS. Support __memset_avx512_unaligned_erms and __memset_avx512_unaligned. (memset): Removed. (__memset_chk): Likewise. (MEMSET_SYMBOL): New. (libc_hidden_builtin_def): Replace __memset_sse2 with __memset_sse2_unaligned. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace __memset_chk_sse2 and __memset_chk_avx2 with __memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms. Use __memset_chk_sse2_unaligned_erms or __memset_chk_avx2_unaligned_erms if processor has ERMS. Support __memset_chk_avx512_unaligned_erms and __memset_chk_avx512_unaligned.
2016-06-08 20:55:45 +00:00
#include "multiarch/memset-vec-unaligned-erms.S"
Update. * sysdeps/i386/fpu/ftestexcept.c: Also check SSE status word. * include/signal.h: Use libc_hidden_proto for sigaddset and sigdelset. * signal/sigaddset.c: Add libc_hidden_def. * signal/sigdelset.c: Likewise. 2003-04-29 Jakub Jelinek <jakub@redhat.com> * sysdeps/i386/i486/string-inlines.c (__memcpy_g, __strchr_g): Move to the end of the file. * configure.in: Change __oline__ to $LINENO. (HAVE_BUILTIN_REDIRECTION): New check. * config.h.in (HAVE_BUILTIN_REDIRECTION): Add. * include/libc-symbols.h (libc_hidden_builtin_proto, libc_hidden_builtin_def, libc_hidden_builtin_weak, libc_hidden_builtin_ver): Define. * include/string.h (memchr, memcpy, memmove, memset, strcat, strchr, strcmp, strcpy, strcspn, strlen, strncmp, strncpy, strpbrk, strrchr, strspn, strstr): Add libc_hidden_builtin_proto. * intl/plural.y: Include string.h. * sysdeps/alpha/alphaev6/memchr.S (memchr): Add libc_hidden_builtin_def. * sysdeps/alpha/alphaev6/memcpy.S (memcpy): Likewise. * sysdeps/alpha/alphaev6/memset.S (memset): Likewise. * sysdeps/alpha/alphaev67/strcat.S (strcat): Likewise. * sysdeps/alpha/alphaev67/strchr.S (strchr): Likewise. * sysdeps/alpha/alphaev67/strlen.S (strlen): Likewise. * sysdeps/alpha/alphaev67/strrchr.S (strrchr): Likewise. * sysdeps/alpha/memchr.S (memchr): Likewise. * sysdeps/alpha/memset.S (memset): Likewise. * sysdeps/alpha/strcat.S (strcat): Likewise. * sysdeps/alpha/strchr.S (strchr): Likewise. * sysdeps/alpha/strcmp.S (strcmp): Likewise. * sysdeps/alpha/strcpy.S (strcpy): Likewise. * sysdeps/alpha/strlen.S (strlen): Likewise. * sysdeps/alpha/strncmp.S (strncmp): Likewise. * sysdeps/alpha/strncpy.S (strncpy): Likewise. * sysdeps/alpha/strrchr.S (strrchr): Likewise. * sysdeps/arm/memset.S (memset): Likewise. * sysdeps/arm/strlen.S (strlen): Likewise. * sysdeps/generic/memchr.c (memchr): Likewise. * sysdeps/generic/memcpy.c (memcpy): Likewise. * sysdeps/generic/memmove.c (memmove): Likewise. * sysdeps/generic/memset.c (memset): Likewise. * sysdeps/generic/strcat.c (strcat): Likewise. * sysdeps/generic/strchr.c (strchr): Likewise. * sysdeps/generic/strcmp.c (strcmp): Likewise. * sysdeps/generic/strcpy.c (strcpy): Likewise. * sysdeps/generic/strcspn.c (strcspn): Likewise. * sysdeps/generic/strlen.c (strlen): Likewise. * sysdeps/generic/strncmp.c (strncmp): Likewise. * sysdeps/generic/strncpy.c (strncpy): Likewise. * sysdeps/generic/strpbrk.c (strpbrk): Likewise. * sysdeps/generic/strrchr.c (strrchr): Likewise. * sysdeps/generic/strspn.c (strspn): Likewise. * sysdeps/generic/strstr.c (strstr): Likewise. * sysdeps/i386/i486/strcat.S (strcat): Likewise. * sysdeps/i386/i486/strlen.S (strlen): Likewise. * sysdeps/i386/i586/memcpy.S (memcpy): Likewise. * sysdeps/i386/i586/memset.S (memset): Likewise. * sysdeps/i386/i586/strchr.S (strchr): Likewise. * sysdeps/i386/i586/strcpy.S (strcpy): Likewise. * sysdeps/i386/i586/strlen.S (strlen): Likewise. * sysdeps/i386/i686/memcpy.S (memcpy): Likewise. * sysdeps/i386/i686/memmove.S (memmove): Likewise. * sysdeps/i386/i686/memset.S (memset): Likewise. * sysdeps/i386/i686/strcmp.S (strcmp): Likewise. * sysdeps/i386/memchr.S (memchr): Likewise. * sysdeps/i386/memset.c (memset): Likewise. * sysdeps/i386/strchr.S (strchr): Likewise. * sysdeps/i386/strcspn.S (strcspn): Likewise. * sysdeps/i386/strlen.c (strlen): Likewise. * sysdeps/i386/strpbrk.S (strpbrk): Likewise. * sysdeps/i386/strrchr.S (strrchr): Likewise. * sysdeps/i386/strspn.S (strspn): Likewise. * sysdeps/ia64/memchr.S (memchr): Likewise. * sysdeps/ia64/memcpy.S (memcpy): Likewise. * sysdeps/ia64/memmove.S (memmove): Likewise. * sysdeps/ia64/memset.S (memset): Likewise. * sysdeps/ia64/strcat.S (strcat): Likewise. * sysdeps/ia64/strchr.S (strchr): Likewise. * sysdeps/ia64/strcmp.S (strcmp): Likewise. * sysdeps/ia64/strcpy.S (strcpy): Likewise. * sysdeps/ia64/strlen.S (strlen): Likewise. * sysdeps/ia64/strncmp.S (strncmp): Likewise. * sysdeps/ia64/strncpy.S (strncpy): Likewise. * sysdeps/m68k/memchr.S (memchr): Likewise. * sysdeps/m68k/strchr.S (strchr): Likewise. * sysdeps/mips/mips64/memcpy.S (memcpy): Likewise. * sysdeps/mips/mips64/memset.S (memset): Likewise. * sysdeps/mips/memcpy.S (memcpy): Likewise. * sysdeps/mips/memset.S (memset): Likewise. * sysdeps/powerpc/powerpc32/memset.S (memset): Likewise. * sysdeps/powerpc/powerpc32/strchr.S (strchr): Likewise. * sysdeps/powerpc/powerpc32/strcmp.S (strcmp): Likewise. * sysdeps/powerpc/powerpc32/strcpy.S (strcpy): Likewise. * sysdeps/powerpc/powerpc32/strlen.S (strlen): Likewise. * sysdeps/powerpc/powerpc64/memcpy.S (memcpy): Likewise. * sysdeps/powerpc/powerpc64/memset.S (memset): Likewise. * sysdeps/powerpc/powerpc64/strchr.S (strchr): Likewise. * sysdeps/powerpc/powerpc64/strcmp.S (strcmp): Likewise. * sysdeps/powerpc/powerpc64/strcpy.S (strcpy): Likewise. * sysdeps/powerpc/powerpc64/strlen.S (strlen): Likewise. * sysdeps/powerpc/strcat.c (strcat): Likewise. * sysdeps/sparc/sparc32/memchr.S (memchr): Likewise. * sysdeps/sparc/sparc32/memcpy.S (memcpy): Likewise. * sysdeps/sparc/sparc32/memset.S (memset): Likewise. * sysdeps/sparc/sparc32/strcat.S (strcat): Likewise. * sysdeps/sparc/sparc32/strchr.S (strchr, strrchr): Likewise. * sysdeps/sparc/sparc32/strcmp.S (strcmp): Likewise. * sysdeps/sparc/sparc32/strcpy.S (strcpy): Likewise. * sysdeps/sparc/sparc32/strlen.S (strlen): Likewise. * sysdeps/sparc/sparc64/sparcv9b/memcpy.S (memcpy, memmove): Likewise. * sysdeps/sparc/sparc64/memchr.S (memchr): Likewise. * sysdeps/sparc/sparc64/memcpy.S (memcpy, memmove): Likewise. * sysdeps/sparc/sparc64/memset.S (memset): Likewise. * sysdeps/sparc/sparc64/strcat.S (strcat): Likewise. * sysdeps/sparc/sparc64/strchr.S (strchr, strrchr): Likewise. * sysdeps/sparc/sparc64/strcmp.S (strcmp): Likewise. * sysdeps/sparc/sparc64/strcpy.S (strcpy): Likewise. * sysdeps/sparc/sparc64/strcspn.S (strcspn): Likewise. * sysdeps/sparc/sparc64/strlen.S (strlen): Likewise. * sysdeps/sparc/sparc64/strncmp.S (strncmp): Likewise. * sysdeps/sparc/sparc64/strncpy.S (strncpy): Likewise. * sysdeps/sparc/sparc64/strpbrk.S (strpbrk): Likewise. * sysdeps/sparc/sparc64/strspn.S (strspn): Likewise. * sysdeps/sh/memcpy.S (memcpy): Likewise. * sysdeps/sh/memset.S (memset): Likewise. * sysdeps/sh/strlen.S (strlen): Likewise. * sysdeps/s390/s390-32/memchr.S (memchr): Likewise. * sysdeps/s390/s390-32/memcpy.S (memcpy): Likewise. * sysdeps/s390/s390-32/memset.S (memset): Likewise. * sysdeps/s390/s390-32/strcmp.S (strcmp): Likewise. * sysdeps/s390/s390-32/strcpy.S (strcpy): Likewise. * sysdeps/s390/s390-32/strncpy.S (strncpy): Likewise. * sysdeps/s390/s390-64/memchr.S (memchr): Likewise. * sysdeps/s390/s390-64/memcpy.S (memcpy): Likewise. * sysdeps/s390/s390-64/memset.S (memset): Likewise. * sysdeps/s390/s390-64/strcmp.S (strcmp): Likewise. * sysdeps/s390/s390-64/strcpy.S (strcpy): Likewise. * sysdeps/s390/s390-64/strncpy.S (strncpy): Likewise. * sysdeps/x86_64/memcpy.S (memcpy): Likewise. * sysdeps/x86_64/memset.S (memset): Likewise. * sysdeps/x86_64/strcat.S (strcat): Likewise. * sysdeps/x86_64/strchr.S (strchr): Likewise. * sysdeps/x86_64/strcmp.S (strcmp): Likewise. * sysdeps/x86_64/strcpy.S (strcpy): Likewise. * sysdeps/x86_64/strcspn.S (strcspn): Likewise. * sysdeps/x86_64/strlen.S (strlen): Likewise. * sysdeps/x86_64/strspn.S (strspn): Likewise. * string/string-inlines.c: Move... * sysdeps/generic/string-inlines.c: ...here. (__memcpy_g, __strchr_g): Remove. (__NO_INLINE__): Define before including <string.h>, undefine after. Include bits/string.h and bits/string2.h. * sysdeps/i386/i486/string-inlines.c: New file. * sysdeps/i386/string-inlines.c: New file. * sysdeps/i386/i486/Versions: Remove. All GLIBC_2.1.1 symbols moved... * sysdeps/i386/Versions (libc): ...here. 2003-04-29 Ulrich Drepper <drepper@redhat.com>
2003-04-29 22:49:58 +00:00
libc_hidden_builtin_def (memset)
* sysdeps/unix/sysv/linux/libc_fatal.c: Print backtrace and memory map if requested. * debug/chk_fail.c: Request backtrace and memory map dump. * Versions.def: Add GLIBC_2.4 for libc. * debug/fgets_chk.c: New file. * debug/fgets_u_chk.c: New file. * debug/getcwd_chk.c: New file. * debug/getwd_chk.c: New file. * debug/readlink_chk.c: New file. * debug/read_chk.c: New file. * debug/pread_chk.c: New file. * debug/pread64_chk.c: New file. * debug/recv_chk.c: New file. * debug/recvfrom_chk.c: New file. * debug/Versions: Add all new functions with version GLIBC_2.4. * debug/Makefile (routines): Add fgets_chk, fgets_u_chk, read_chk, pread_chk, pread64_chk, recv_chk, recvfrom_chk, readlink_chk, getwd_chk, and getcwd_chk. Plus appropriate CFLAGS definitions. * debug/tst-chk1.c: Add more tests. * libio/bits/stdio2.h: Add macros for fgets and fgets_unlocked. * include/stdio.h: Declare __fgets_chk and __fgets_unlocked_chk. * posix/unistd.h: Include <bits/unistd.h> for fortification. * posix/bits/unistd.h: New file. * posix/Makefile (headers): Add bits/unistd.h. * socket/sys/socket.h: Include <bits/socket2.h> for fortification. * socket/bits/socket2.h: New file. * socket/Makefile (headers): Add bits/socket2.h. * string/bits/string3.h: Extend memset macro to check for zero 3rd parameter and use __memset_zero_constant_len_parameter in that case. * sysdeps/generic/memset_chk.c: Add __memset_zero_constant_len_parameter alias and linker warning. * debug/Versions: Add __memset_zero_constant_len_parameter to libc with version GLIBC_2.4. * sysdeps/generic/bits/types.h: Don't unnecessarily use __extension__ in __STD_TYPE definition. 2005-02-21 Jakub Jelinek <jakub@redhat.com> * malloc/malloc.c (malloc_printerr): If MALLOC_CHECK_={5,7}, print the error message rather than program name. 2005-02-21 Ulrich Drepper <drepper@redhat.com>
2005-02-21 23:14:10 +00:00
x86-64: Optimize wmemset with SSE2/AVX2/AVX512 The difference between memset and wmemset is byte vs int. Add stubs to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size: SSE2 wmemset: shl $0x2,%rdx movd %esi,%xmm0 mov %rdi,%rax pshufd $0x0,%xmm0,%xmm0 jmp entry_from_wmemset SSE2 memset: movd %esi,%xmm0 mov %rdi,%rax punpcklbw %xmm0,%xmm0 punpcklwd %xmm0,%xmm0 pshufd $0x0,%xmm0,%xmm0 entry_from_wmemset: Since the ERMS versions of wmemset requires "rep stosl" instead of "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset is about 6X faster on Haswell. * include/wchar.h (__wmemset_chk): New. * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_CHK_SYMBOL): Likewise. (WMEMSET_SYMBOL): Likewise. (__wmemset): Add hidden definition. (wmemset): Add weak hidden definition. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wmemset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned, __wmemset_avx2_unaligned, __wmemset_avx512_unaligned, __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned and __wmemset_chk_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated. (WMEMSET_CHK_SYMBOL): New. (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise. (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise. * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New. (libc_hidden_builtin_def): Also define __GI_wmemset and __GI___wmemset. (weak_alias): New. * sysdeps/x86_64/multiarch/wmemset.c: New file. * sysdeps/x86_64/multiarch/wmemset.h: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/wmemset.c: Likewise. * sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
#if IS_IN (libc)
weak_alias (__bzero, bzero)
x86-64: Optimize wmemset with SSE2/AVX2/AVX512 The difference between memset and wmemset is byte vs int. Add stubs to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size: SSE2 wmemset: shl $0x2,%rdx movd %esi,%xmm0 mov %rdi,%rax pshufd $0x0,%xmm0,%xmm0 jmp entry_from_wmemset SSE2 memset: movd %esi,%xmm0 mov %rdi,%rax punpcklbw %xmm0,%xmm0 punpcklwd %xmm0,%xmm0 pshufd $0x0,%xmm0,%xmm0 entry_from_wmemset: Since the ERMS versions of wmemset requires "rep stosl" instead of "rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset is about 6X faster on Haswell. * include/wchar.h (__wmemset_chk): New. * sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_CHK_SYMBOL): Likewise. (WMEMSET_SYMBOL): Likewise. (__wmemset): Add hidden definition. (wmemset): Add weak hidden definition. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add wmemset_chk-nonshared. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned, __wmemset_avx2_unaligned, __wmemset_avx512_unaligned, __wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned and __wmemset_chk_avx512_unaligned. * sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ... (MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This. (WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New. (WMEMSET_SYMBOL): Likewise. * sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated. (WMEMSET_CHK_SYMBOL): New. (WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise. (WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise. * sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New. (libc_hidden_builtin_def): Also define __GI_wmemset and __GI___wmemset. (weak_alias): New. * sysdeps/x86_64/multiarch/wmemset.c: New file. * sysdeps/x86_64/multiarch/wmemset.h: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise. * sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise. * sysdeps/x86_64/wmemset.c: Likewise. * sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
libc_hidden_def (__wmemset)
weak_alias (__wmemset, wmemset)
libc_hidden_weak (wmemset)
#endif