2002-08-31 17:45:33 +00:00
|
|
|
/* memset/bzero -- set memory area to CH/0
|
|
|
|
Optimized version for x86-64.
|
2022-01-01 18:54:23 +00:00
|
|
|
Copyright (C) 2002-2022 Free Software Foundation, Inc.
|
2002-08-31 17:45:33 +00:00
|
|
|
This file is part of the GNU C Library.
|
|
|
|
|
|
|
|
The GNU C Library is free software; you can redistribute it and/or
|
|
|
|
modify it under the terms of the GNU Lesser General Public
|
|
|
|
License as published by the Free Software Foundation; either
|
|
|
|
version 2.1 of the License, or (at your option) any later version.
|
|
|
|
|
|
|
|
The GNU C Library is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
|
|
|
Lesser General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU Lesser General Public
|
2012-02-09 23:18:22 +00:00
|
|
|
License along with the GNU C Library; if not, see
|
Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:
sed -ri '
s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
$(find $(git ls-files) -prune -type f \
! -name '*.po' \
! -name 'ChangeLog*' \
! -path COPYING ! -path COPYING.LIB \
! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
! -path manual/texinfo.tex ! -path scripts/config.guess \
! -path scripts/config.sub ! -path scripts/install-sh \
! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
! -path INSTALL ! -path locale/programs/charmap-kw.h \
! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
! '(' -name configure \
-execdir test -f configure.ac -o -f configure.in ';' ')' \
! '(' -name preconfigure \
-execdir test -f preconfigure.ac ';' ')' \
-print)
and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:
chmod a+x sysdeps/unix/sysv/linux/riscv/configure
# Omit irrelevant whitespace and comment-only changes,
# perhaps from a slightly-different Autoconf version.
git checkout -f \
sysdeps/csky/configure \
sysdeps/hppa/configure \
sysdeps/riscv/configure \
sysdeps/unix/sysv/linux/csky/configure
# Omit changes that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
git checkout -f \
sysdeps/powerpc/powerpc64/ppc-mcount.S \
sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
# Omit change that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 05:40:42 +00:00
|
|
|
<https://www.gnu.org/licenses/>. */
|
2002-08-31 17:45:33 +00:00
|
|
|
|
|
|
|
#include <sysdep.h>
|
2021-09-20 21:20:15 +00:00
|
|
|
#define USE_WITH_SSE2 1
|
2002-08-31 17:45:33 +00:00
|
|
|
|
2016-06-08 20:55:45 +00:00
|
|
|
#define VEC_SIZE 16
|
2021-09-20 21:20:15 +00:00
|
|
|
#define MOV_SIZE 3
|
|
|
|
#define RET_SIZE 1
|
|
|
|
|
2016-06-08 20:55:45 +00:00
|
|
|
#define VEC(i) xmm##i
|
2021-09-20 21:20:15 +00:00
|
|
|
#define VMOVU movups
|
|
|
|
#define VMOVA movaps
|
2016-06-08 20:55:45 +00:00
|
|
|
|
x86-64: Optimize wmemset with SSE2/AVX2/AVX512
The difference between memset and wmemset is byte vs int. Add stubs
to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:
SSE2 wmemset:
shl $0x2,%rdx
movd %esi,%xmm0
mov %rdi,%rax
pshufd $0x0,%xmm0,%xmm0
jmp entry_from_wmemset
SSE2 memset:
movd %esi,%xmm0
mov %rdi,%rax
punpcklbw %xmm0,%xmm0
punpcklwd %xmm0,%xmm0
pshufd $0x0,%xmm0,%xmm0
entry_from_wmemset:
Since the ERMS versions of wmemset requires "rep stosl" instead of
"rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset
is about 6X faster on Haswell.
* include/wchar.h (__wmemset_chk): New.
* sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_CHK_SYMBOL): Likewise.
(WMEMSET_SYMBOL): Likewise.
(__wmemset): Add hidden definition.
(wmemset): Add weak hidden definition.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
wmemset_chk-nonshared.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
__wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
__wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
and __wmemset_chk_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
(WMEMSET_CHK_SYMBOL): New.
(WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
(WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
* sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Also define __GI_wmemset and
__GI___wmemset.
(weak_alias): New.
* sysdeps/x86_64/multiarch/wmemset.c: New file.
* sysdeps/x86_64/multiarch/wmemset.h: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
* sysdeps/x86_64/wmemset.c: Likewise.
* sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
|
|
|
#define MEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
|
2016-06-08 20:55:45 +00:00
|
|
|
movd d, %xmm0; \
|
|
|
|
movq r, %rax; \
|
|
|
|
punpcklbw %xmm0, %xmm0; \
|
|
|
|
punpcklwd %xmm0, %xmm0; \
|
|
|
|
pshufd $0, %xmm0, %xmm0
|
|
|
|
|
x86-64: Optimize wmemset with SSE2/AVX2/AVX512
The difference between memset and wmemset is byte vs int. Add stubs
to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:
SSE2 wmemset:
shl $0x2,%rdx
movd %esi,%xmm0
mov %rdi,%rax
pshufd $0x0,%xmm0,%xmm0
jmp entry_from_wmemset
SSE2 memset:
movd %esi,%xmm0
mov %rdi,%rax
punpcklbw %xmm0,%xmm0
punpcklwd %xmm0,%xmm0
pshufd $0x0,%xmm0,%xmm0
entry_from_wmemset:
Since the ERMS versions of wmemset requires "rep stosl" instead of
"rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset
is about 6X faster on Haswell.
* include/wchar.h (__wmemset_chk): New.
* sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_CHK_SYMBOL): Likewise.
(WMEMSET_SYMBOL): Likewise.
(__wmemset): Add hidden definition.
(wmemset): Add weak hidden definition.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
wmemset_chk-nonshared.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
__wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
__wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
and __wmemset_chk_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
(WMEMSET_CHK_SYMBOL): New.
(WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
(WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
* sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Also define __GI_wmemset and
__GI___wmemset.
(weak_alias): New.
* sysdeps/x86_64/multiarch/wmemset.c: New file.
* sysdeps/x86_64/multiarch/wmemset.h: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
* sysdeps/x86_64/wmemset.c: Likewise.
* sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
|
|
|
#define WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN(d, r) \
|
|
|
|
movd d, %xmm0; \
|
|
|
|
movq r, %rax; \
|
|
|
|
pshufd $0, %xmm0, %xmm0
|
|
|
|
|
2016-06-08 20:55:45 +00:00
|
|
|
#define SECTION(p) p
|
|
|
|
|
|
|
|
#ifndef MEMSET_SYMBOL
|
|
|
|
# define MEMSET_CHK_SYMBOL(p,s) p
|
|
|
|
# define MEMSET_SYMBOL(p,s) memset
|
2004-10-15 Jakub Jelinek <jakub@redhat.com>
* elf/dl-minimal.c (__chk_fail): New. Add rtld_hidden_def.
* sysdeps/unix/sysv/linux/readonly-area.c: New file.
* sysdeps/i386/i686/memmove.S (__memmove_chk): Add checking
routine.
* sysdeps/i386/i686/memcpy.S (__memcpy_chk): Likewise.
* sysdeps/i386/i686/mempcpy.S (__mempcpy_chk): Likewise.
* sysdeps/i386/i686/memset.S (__memset_chk): Likewise.
* sysdeps/i386/i686/memmove-chk.S: New file.
* sysdeps/i386/i686/memcpy-chk.S: Likewise.
* sysdeps/i386/i686/mempcpy-chk.S: Likewise.
* sysdeps/i386/i686/memset-chk.S: Likewise.
* sysdeps/generic/strcat-chk.c (__strcat_chk): Don't __chk_fail
if exactly fitting into buffer.
* sysdeps/generic/strncat-chk.c (__strncat_chk): Likewise.
* sysdeps/generic/readonly-area.c: New file.
* sysdeps/generic/strncpy-chk.c (__strncpy_chk): Only test
destlen once.
* sysdeps/x86_64/memset.S (__memset_chk): Add checking routine.
* sysdeps/x86_64/memcpy.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/mempcpy.S (__memcpy_chk): Define to __mempcpy_chk.
* sysdeps/x86_64/memcpy-chk.S: New file.
* sysdeps/x86_64/mempcpy-chk.S: Likewise.
* sysdeps/x86_64/memset-chk.S: Likewise.
* sysdeps/x86_64/strcpy-chk.S: Likewise.
* sysdeps/x86_64/stpcpy-chk.S: Likewise.
* argp/argp-xinl.c (__OPTIMIZE__): Define to 1 instead of nothing.
* argp/argp-fs-xinl.c (__OPTIMIZE__): Likewise.
* debug/tst-chk1.c: New test.
* debug/tst-chk2.c: Likewise.
* debug/tst-chk3.c: Likewise.
* debug/test-strcpy_chk.c: Likewise.
* debug/test-stpcpy_chk.c: Likewise.
* debug/vsprintf_chk.c (__vsprintf_chk): If flags > 0, request
_IO_FLAGS2_CHECK_PERCENT_N. Add libc_hidden_def.
* debug/Makefile (routines): Add printf_chk, fprintf_chk, vprintf_chk,
vfprintf_chk, gets_chk and readonly-area.
(CFLAGS-*_chk.c): Set.
(tests): Add tst-chk1, tst-chk2, tst-chk3, test-strcpy_chk and
test-stpcpy_chk.
* debug/vprintf_chk.c: New file.
* debug/printf_chk.c: Likewise.
* debug/vfprintf_chk.c: Likewise.
* debug/fprintf_chk.c: Likewise.
* debug/gets_chk.c: Likewise.
* debug/chk_fail.c (__chk_fail): Add libc_hidden_def.
* debug/snprintf_chk.c (__snprintf_chk): Fix order of arguments
passed to __vsnprintf_chk.
* debug/Versions (libc): Export __printf_chk, __fprintf_chk,
__vprintf_chk, __vfprintf_chk and __gets_chk @GLIBC_2.3.4.
* debug/vsnprintf_chk.c (__vsnprintf_chk): Don't call
__vsnprintf, instead create a temporary file with
_IO_strn_jumps jumptable. If flags > 0, request
_IO_FLAGS2_CHECK_PERCENT_N. Add libc_hidden_def.
* libio/Makefile (headers): Add bits/stdio2.h.
* libio/stdio.h: Include <bits/stdio2.h> if __USE_FORTIFY_LEVEL.
(sprintf, snprintf, vsprintf, vsnprintf): Remove defines.
* libio/strfile.h (_IO_strnfile): New type.
(_IO_strn_jumps): New extern.
* libio/vsnprintf.c (_IO_strnfile): Remove.
(_IO_strn_jumps): Remove static.
* libio/bits/stdio2.h: New file.
* libio/vswprintf.c (_IO_strnfile): Rename type to...
(_IO_wstrnfile): ...this. Adjust all uses.
* libio/libio.h (_IO_FLAGS2_CHECK_PERCENT_N): Define.
* stdio-common/vfprintf.c (STR_LEN): Define.
(vfprintf): Add readonly_format variable.
Handle _IO_FLAGS2_CHECK_PERCENT_N.
(buffered_vfprintf): Copy _flags2.
* include/stdio.h (__sprintf_chk, __snprintf_chk, __vsprintf_chk,
__vsnprintf_chk, __printf_chk, __fprintf_chk, __vprintf_chk,
__vfprintf_chk): New prototypes.
(__vsprintf_chk, __vsnprintf_chk): Add libc_hidden_proto.
* include/string.h (__memcpy_chk, __memmove_chk, __mempcpy_chk,
__memset_chk, __strcpy_chk, __stpcpy_chk, __strncpy_chk, __strcat_chk,
__strncat_chk): New prototypes.
* include/bits/string3.h: New file.
* include/sys/cdefs.h (__chk_fail): Add libc_hidden_proto
and rtld_hidden_proto.
* string/Makefile (headers): Add bits/string3.h.
* string/bits/string3.h (bcopy, bzero): New defines.
(memset, memcpy, memmove, strcpy, strncpy, strcat, strncat): Change
macros so that inlines are used only if unknown destination size
or side-effects in destination argument.
(mempcpy, stpcpy): Likewise. Protect with #ifdef __USE_GNU.
2004-09-16 Ulrich Drepper <drepper@redhat.com>
* debug/Makefile (routines): Add *_chk.
* debug/Versions (libc): Export __chk_fail, __memcpy_chk,
__memmove_chk, __mempcpy_chk, __memset_chk, __stpcpy_chk,
__strcat_chk, __strcpy_chk, __strncat_chk, __strncpy_chk,
__sprintf_chk, __vsprintf_chk, __snprintf_chk, __vsnprintf_chk
@GLIBC_2.3.4.
* debug/chk_fail.c: New file.
* debug/snprintf_chk.c: Likewise.
* debug/sprintf_chk.c: Likewise.
* debug/vsnprintf_chk.c: Likewise.
* debug/vsprintf_chk.c: Likewise.
* include/features.h (_FORTIFY_SOURCE): Document, handle.
(__USE_FORTIFY_LEVEL): Define.
(__GNUC_PREREQ): Move to earlier location.
* include/sys/cdefs.h (__chk_fail): New prototype.
* libio/bits/stdio.h (sprintf, vsprintf, snprintf, vsnprintf):
Define if __USE_FORTIFY_LEVEL.
* misc/sys/cdefs.h (__bos, __bos0): Define.
* string/string.h: Include <bits/string3.h> if __USE_FORTIFY_LEVEL.
* bits/string/string3.h: New header.
* sysdeps/generic/memcpy_chk.c: New file.
* sysdeps/generic/memmove_chk.c: Likewise.
* sysdeps/generic/mempcpy_chk.c: Likewise.
* sysdeps/generic/memset_chk.c: Likewise.
* sysdeps/generic/stpcpy_chk.c: Likewise.
* sysdeps/generic/strcat_chk.c: Likewise.
* sysdeps/generic/strcpy_chk.c: Likewise.
* sysdeps/generic/strncat_chk.c: Likewise.
* sysdeps/generic/strncpy_chk.c: Likewise.
2004-10-15 Jakub Jelinek <jakub@redhat.com>
* elf/dl-minimal.c (__chk_fail): New. Add rtld_hidden_def.
* sysdeps/unix/sysv/linux/readonly-area.c: New file.
* sysdeps/i386/i686/memmove.S (__memmove_chk): Add checking
routine.
* sysdeps/i386/i686/memcpy.S (__memcpy_chk): Likewise.
* sysdeps/i386/i686/mempcpy.S (__mempcpy_chk): Likewise.
* sysdeps/i386/i686/memset.S (__memset_chk): Likewise.
* sysdeps/i386/i686/memmove-chk.S: New file.
* sysdeps/i386/i686/memcpy-chk.S: Likewise.
* sysdeps/i386/i686/mempcpy-chk.S: Likewise.
* sysdeps/i386/i686/memset-chk.S: Likewise.
* sysdeps/generic/strcat-chk.c (__strcat_chk): Don't __chk_fail
if exactly fitting into buffer.
* sysdeps/generic/strncat-chk.c (__strncat_chk): Likewise.
* sysdeps/generic/readonly-area.c: New file.
* sysdeps/generic/strncpy-chk.c (__strncpy_chk): Only test
destlen once.
* sysdeps/x86_64/memset.S (__memset_chk): Add checking routine.
* sysdeps/x86_64/memcpy.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/mempcpy.S (__memcpy_chk): Define to __mempcpy_chk.
* sysdeps/x86_64/memcpy-chk.S: New file.
* sysdeps/x86_64/mempcpy-chk.S: Likewise.
* sysdeps/x86_64/memset-chk.S: Likewise.
* sysdeps/x86_64/strcpy-chk.S: Likewise.
* sysdeps/x86_64/stpcpy-chk.S: Likewise.
* argp/argp-xinl.c (__OPTIMIZE__): Define to 1 instead of nothing.
* argp/argp-fs-xinl.c (__OPTIMIZE__): Likewise.
* debug/tst-chk1.c: New test.
* debug/tst-chk2.c: Likewise.
* debug/tst-chk3.c: Likewise.
* debug/test-strcpy_chk.c: Likewise.
* debug/test-stpcpy_chk.c: Likewise.
* debug/vsprintf_chk.c (__vsprintf_chk): If flags > 0, request
_IO_FLAGS2_CHECK_PERCENT_N. Add libc_hidden_def.
* debug/Makefile (routines): Add printf_chk, fprintf_chk, vprintf_chk,
vfprintf_chk, gets_chk and readonly-area.
(CFLAGS-*_chk.c): Set.
(tests): Add tst-chk1, tst-chk2, tst-chk3, test-strcpy_chk and
test-stpcpy_chk.
* debug/vprintf_chk.c: New file.
* debug/printf_chk.c: Likewise.
* debug/vfprintf_chk.c: Likewise.
* debug/fprintf_chk.c: Likewise.
* debug/gets_chk.c: Likewise.
* debug/chk_fail.c (__chk_fail): Add libc_hidden_def.
* debug/snprintf_chk.c (__snprintf_chk): Fix order of arguments
passed to __vsnprintf_chk.
* debug/Versions (libc): Export __printf_chk, __fprintf_chk,
__vprintf_chk, __vfprintf_chk and __gets_chk @GLIBC_2.3.4.
* debug/vsnprintf_chk.c (__vsnprintf_chk): Don't call
__vsnprintf, instead create a temporary file with
_IO_strn_jumps jumptable. If flags > 0, request
_IO_FLAGS2_CHECK_PERCENT_N. Add libc_hidden_def.
* libio/Makefile (headers): Add bits/stdio2.h.
* libio/stdio.h: Include <bits/stdio2.h> if __USE_FORTIFY_LEVEL.
(sprintf, snprintf, vsprintf, vsnprintf): Remove defines.
* libio/strfile.h (_IO_strnfile): New type.
(_IO_strn_jumps): New extern.
* libio/vsnprintf.c (_IO_strnfile): Remove.
(_IO_strn_jumps): Remove static.
* libio/bits/stdio2.h: New file.
* libio/vswprintf.c (_IO_strnfile): Rename type to...
(_IO_wstrnfile): ...this. Adjust all uses.
* libio/libio.h (_IO_FLAGS2_CHECK_PERCENT_N): Define.
* stdio-common/vfprintf.c (STR_LEN): Define.
(vfprintf): Add readonly_format variable.
Handle _IO_FLAGS2_CHECK_PERCENT_N.
(buffered_vfprintf): Copy _flags2.
* include/stdio.h (__sprintf_chk, __snprintf_chk, __vsprintf_chk,
__vsnprintf_chk, __printf_chk, __fprintf_chk, __vprintf_chk,
__vfprintf_chk): New prototypes.
(__vsprintf_chk, __vsnprintf_chk): Add libc_hidden_proto.
* include/string.h (__memcpy_chk, __memmove_chk, __mempcpy_chk,
__memset_chk, __strcpy_chk, __stpcpy_chk, __strncpy_chk, __strcat_chk,
__strncat_chk): New prototypes.
* include/bits/string3.h: New file.
* include/sys/cdefs.h (__chk_fail): Add libc_hidden_proto
and rtld_hidden_proto.
* string/Makefile (headers): Add bits/string3.h.
* string/bits/string3.h (bcopy, bzero): New defines.
(memset, memcpy, memmove, strcpy, strncpy, strcat, strncat): Change
macros so that inlines are used only if unknown destination size
or side-effects in destination argument.
(mempcpy, stpcpy): Likewise. Protect with #ifdef __USE_GNU.
2004-09-16 Ulrich Drepper <drepper@redhat.com>
* debug/Makefile (routines): Add *_chk.
* debug/Versions (libc): Export __chk_fail, __memcpy_chk,
__memmove_chk, __mempcpy_chk, __memset_chk, __stpcpy_chk,
__strcat_chk, __strcpy_chk, __strncat_chk, __strncpy_chk,
__sprintf_chk, __vsprintf_chk, __snprintf_chk, __vsnprintf_chk
@GLIBC_2.3.4.
* debug/chk_fail.c: New file.
* debug/snprintf_chk.c: Likewise.
* debug/sprintf_chk.c: Likewise.
* debug/vsnprintf_chk.c: Likewise.
* debug/vsprintf_chk.c: Likewise.
* include/features.h (_FORTIFY_SOURCE): Document, handle.
(__USE_FORTIFY_LEVEL): Define.
(__GNUC_PREREQ): Move to earlier location.
* include/sys/cdefs.h (__chk_fail): New prototype.
* libio/bits/stdio.h (sprintf, vsprintf, snprintf, vsnprintf):
Define if __USE_FORTIFY_LEVEL.
* misc/sys/cdefs.h (__bos, __bos0): Define.
* string/string.h: Include <bits/string3.h> if __USE_FORTIFY_LEVEL.
* bits/string/string3.h: New header.
* sysdeps/generic/memcpy_chk.c: New file.
* sysdeps/generic/memmove_chk.c: Likewise.
* sysdeps/generic/mempcpy_chk.c: Likewise.
* sysdeps/generic/memset_chk.c: Likewise.
* sysdeps/generic/stpcpy_chk.c: Likewise.
* sysdeps/generic/strcat_chk.c: Likewise.
* sysdeps/generic/strcpy_chk.c: Likewise.
* sysdeps/generic/strncat_chk.c: Likewise.
* sysdeps/generic/strncpy_chk.c: Likewise.
2004-10-18 04:17:19 +00:00
|
|
|
#endif
|
2010-11-08 08:41:34 +00:00
|
|
|
|
x86-64: Optimize wmemset with SSE2/AVX2/AVX512
The difference between memset and wmemset is byte vs int. Add stubs
to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:
SSE2 wmemset:
shl $0x2,%rdx
movd %esi,%xmm0
mov %rdi,%rax
pshufd $0x0,%xmm0,%xmm0
jmp entry_from_wmemset
SSE2 memset:
movd %esi,%xmm0
mov %rdi,%rax
punpcklbw %xmm0,%xmm0
punpcklwd %xmm0,%xmm0
pshufd $0x0,%xmm0,%xmm0
entry_from_wmemset:
Since the ERMS versions of wmemset requires "rep stosl" instead of
"rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset
is about 6X faster on Haswell.
* include/wchar.h (__wmemset_chk): New.
* sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_CHK_SYMBOL): Likewise.
(WMEMSET_SYMBOL): Likewise.
(__wmemset): Add hidden definition.
(wmemset): Add weak hidden definition.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
wmemset_chk-nonshared.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
__wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
__wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
and __wmemset_chk_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
(WMEMSET_CHK_SYMBOL): New.
(WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
(WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
* sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Also define __GI_wmemset and
__GI___wmemset.
(weak_alias): New.
* sysdeps/x86_64/multiarch/wmemset.c: New file.
* sysdeps/x86_64/multiarch/wmemset.h: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
* sysdeps/x86_64/wmemset.c: Likewise.
* sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
|
|
|
#ifndef WMEMSET_SYMBOL
|
|
|
|
# define WMEMSET_CHK_SYMBOL(p,s) p
|
|
|
|
# define WMEMSET_SYMBOL(p,s) __wmemset
|
|
|
|
#endif
|
|
|
|
|
2016-06-08 20:55:45 +00:00
|
|
|
#include "multiarch/memset-vec-unaligned-erms.S"
|
2002-08-31 17:45:33 +00:00
|
|
|
|
Update.
* sysdeps/i386/fpu/ftestexcept.c: Also check SSE status word.
* include/signal.h: Use libc_hidden_proto for sigaddset and sigdelset.
* signal/sigaddset.c: Add libc_hidden_def.
* signal/sigdelset.c: Likewise.
2003-04-29 Jakub Jelinek <jakub@redhat.com>
* sysdeps/i386/i486/string-inlines.c (__memcpy_g, __strchr_g): Move
to the end of the file.
* configure.in: Change __oline__ to $LINENO.
(HAVE_BUILTIN_REDIRECTION): New check.
* config.h.in (HAVE_BUILTIN_REDIRECTION): Add.
* include/libc-symbols.h (libc_hidden_builtin_proto,
libc_hidden_builtin_def, libc_hidden_builtin_weak,
libc_hidden_builtin_ver): Define.
* include/string.h (memchr, memcpy, memmove, memset, strcat, strchr,
strcmp, strcpy, strcspn, strlen, strncmp, strncpy, strpbrk, strrchr,
strspn, strstr): Add libc_hidden_builtin_proto.
* intl/plural.y: Include string.h.
* sysdeps/alpha/alphaev6/memchr.S (memchr): Add
libc_hidden_builtin_def.
* sysdeps/alpha/alphaev6/memcpy.S (memcpy): Likewise.
* sysdeps/alpha/alphaev6/memset.S (memset): Likewise.
* sysdeps/alpha/alphaev67/strcat.S (strcat): Likewise.
* sysdeps/alpha/alphaev67/strchr.S (strchr): Likewise.
* sysdeps/alpha/alphaev67/strlen.S (strlen): Likewise.
* sysdeps/alpha/alphaev67/strrchr.S (strrchr): Likewise.
* sysdeps/alpha/memchr.S (memchr): Likewise.
* sysdeps/alpha/memset.S (memset): Likewise.
* sysdeps/alpha/strcat.S (strcat): Likewise.
* sysdeps/alpha/strchr.S (strchr): Likewise.
* sysdeps/alpha/strcmp.S (strcmp): Likewise.
* sysdeps/alpha/strcpy.S (strcpy): Likewise.
* sysdeps/alpha/strlen.S (strlen): Likewise.
* sysdeps/alpha/strncmp.S (strncmp): Likewise.
* sysdeps/alpha/strncpy.S (strncpy): Likewise.
* sysdeps/alpha/strrchr.S (strrchr): Likewise.
* sysdeps/arm/memset.S (memset): Likewise.
* sysdeps/arm/strlen.S (strlen): Likewise.
* sysdeps/generic/memchr.c (memchr): Likewise.
* sysdeps/generic/memcpy.c (memcpy): Likewise.
* sysdeps/generic/memmove.c (memmove): Likewise.
* sysdeps/generic/memset.c (memset): Likewise.
* sysdeps/generic/strcat.c (strcat): Likewise.
* sysdeps/generic/strchr.c (strchr): Likewise.
* sysdeps/generic/strcmp.c (strcmp): Likewise.
* sysdeps/generic/strcpy.c (strcpy): Likewise.
* sysdeps/generic/strcspn.c (strcspn): Likewise.
* sysdeps/generic/strlen.c (strlen): Likewise.
* sysdeps/generic/strncmp.c (strncmp): Likewise.
* sysdeps/generic/strncpy.c (strncpy): Likewise.
* sysdeps/generic/strpbrk.c (strpbrk): Likewise.
* sysdeps/generic/strrchr.c (strrchr): Likewise.
* sysdeps/generic/strspn.c (strspn): Likewise.
* sysdeps/generic/strstr.c (strstr): Likewise.
* sysdeps/i386/i486/strcat.S (strcat): Likewise.
* sysdeps/i386/i486/strlen.S (strlen): Likewise.
* sysdeps/i386/i586/memcpy.S (memcpy): Likewise.
* sysdeps/i386/i586/memset.S (memset): Likewise.
* sysdeps/i386/i586/strchr.S (strchr): Likewise.
* sysdeps/i386/i586/strcpy.S (strcpy): Likewise.
* sysdeps/i386/i586/strlen.S (strlen): Likewise.
* sysdeps/i386/i686/memcpy.S (memcpy): Likewise.
* sysdeps/i386/i686/memmove.S (memmove): Likewise.
* sysdeps/i386/i686/memset.S (memset): Likewise.
* sysdeps/i386/i686/strcmp.S (strcmp): Likewise.
* sysdeps/i386/memchr.S (memchr): Likewise.
* sysdeps/i386/memset.c (memset): Likewise.
* sysdeps/i386/strchr.S (strchr): Likewise.
* sysdeps/i386/strcspn.S (strcspn): Likewise.
* sysdeps/i386/strlen.c (strlen): Likewise.
* sysdeps/i386/strpbrk.S (strpbrk): Likewise.
* sysdeps/i386/strrchr.S (strrchr): Likewise.
* sysdeps/i386/strspn.S (strspn): Likewise.
* sysdeps/ia64/memchr.S (memchr): Likewise.
* sysdeps/ia64/memcpy.S (memcpy): Likewise.
* sysdeps/ia64/memmove.S (memmove): Likewise.
* sysdeps/ia64/memset.S (memset): Likewise.
* sysdeps/ia64/strcat.S (strcat): Likewise.
* sysdeps/ia64/strchr.S (strchr): Likewise.
* sysdeps/ia64/strcmp.S (strcmp): Likewise.
* sysdeps/ia64/strcpy.S (strcpy): Likewise.
* sysdeps/ia64/strlen.S (strlen): Likewise.
* sysdeps/ia64/strncmp.S (strncmp): Likewise.
* sysdeps/ia64/strncpy.S (strncpy): Likewise.
* sysdeps/m68k/memchr.S (memchr): Likewise.
* sysdeps/m68k/strchr.S (strchr): Likewise.
* sysdeps/mips/mips64/memcpy.S (memcpy): Likewise.
* sysdeps/mips/mips64/memset.S (memset): Likewise.
* sysdeps/mips/memcpy.S (memcpy): Likewise.
* sysdeps/mips/memset.S (memset): Likewise.
* sysdeps/powerpc/powerpc32/memset.S (memset): Likewise.
* sysdeps/powerpc/powerpc32/strchr.S (strchr): Likewise.
* sysdeps/powerpc/powerpc32/strcmp.S (strcmp): Likewise.
* sysdeps/powerpc/powerpc32/strcpy.S (strcpy): Likewise.
* sysdeps/powerpc/powerpc32/strlen.S (strlen): Likewise.
* sysdeps/powerpc/powerpc64/memcpy.S (memcpy): Likewise.
* sysdeps/powerpc/powerpc64/memset.S (memset): Likewise.
* sysdeps/powerpc/powerpc64/strchr.S (strchr): Likewise.
* sysdeps/powerpc/powerpc64/strcmp.S (strcmp): Likewise.
* sysdeps/powerpc/powerpc64/strcpy.S (strcpy): Likewise.
* sysdeps/powerpc/powerpc64/strlen.S (strlen): Likewise.
* sysdeps/powerpc/strcat.c (strcat): Likewise.
* sysdeps/sparc/sparc32/memchr.S (memchr): Likewise.
* sysdeps/sparc/sparc32/memcpy.S (memcpy): Likewise.
* sysdeps/sparc/sparc32/memset.S (memset): Likewise.
* sysdeps/sparc/sparc32/strcat.S (strcat): Likewise.
* sysdeps/sparc/sparc32/strchr.S (strchr, strrchr): Likewise.
* sysdeps/sparc/sparc32/strcmp.S (strcmp): Likewise.
* sysdeps/sparc/sparc32/strcpy.S (strcpy): Likewise.
* sysdeps/sparc/sparc32/strlen.S (strlen): Likewise.
* sysdeps/sparc/sparc64/sparcv9b/memcpy.S (memcpy, memmove): Likewise.
* sysdeps/sparc/sparc64/memchr.S (memchr): Likewise.
* sysdeps/sparc/sparc64/memcpy.S (memcpy, memmove): Likewise.
* sysdeps/sparc/sparc64/memset.S (memset): Likewise.
* sysdeps/sparc/sparc64/strcat.S (strcat): Likewise.
* sysdeps/sparc/sparc64/strchr.S (strchr, strrchr): Likewise.
* sysdeps/sparc/sparc64/strcmp.S (strcmp): Likewise.
* sysdeps/sparc/sparc64/strcpy.S (strcpy): Likewise.
* sysdeps/sparc/sparc64/strcspn.S (strcspn): Likewise.
* sysdeps/sparc/sparc64/strlen.S (strlen): Likewise.
* sysdeps/sparc/sparc64/strncmp.S (strncmp): Likewise.
* sysdeps/sparc/sparc64/strncpy.S (strncpy): Likewise.
* sysdeps/sparc/sparc64/strpbrk.S (strpbrk): Likewise.
* sysdeps/sparc/sparc64/strspn.S (strspn): Likewise.
* sysdeps/sh/memcpy.S (memcpy): Likewise.
* sysdeps/sh/memset.S (memset): Likewise.
* sysdeps/sh/strlen.S (strlen): Likewise.
* sysdeps/s390/s390-32/memchr.S (memchr): Likewise.
* sysdeps/s390/s390-32/memcpy.S (memcpy): Likewise.
* sysdeps/s390/s390-32/memset.S (memset): Likewise.
* sysdeps/s390/s390-32/strcmp.S (strcmp): Likewise.
* sysdeps/s390/s390-32/strcpy.S (strcpy): Likewise.
* sysdeps/s390/s390-32/strncpy.S (strncpy): Likewise.
* sysdeps/s390/s390-64/memchr.S (memchr): Likewise.
* sysdeps/s390/s390-64/memcpy.S (memcpy): Likewise.
* sysdeps/s390/s390-64/memset.S (memset): Likewise.
* sysdeps/s390/s390-64/strcmp.S (strcmp): Likewise.
* sysdeps/s390/s390-64/strcpy.S (strcpy): Likewise.
* sysdeps/s390/s390-64/strncpy.S (strncpy): Likewise.
* sysdeps/x86_64/memcpy.S (memcpy): Likewise.
* sysdeps/x86_64/memset.S (memset): Likewise.
* sysdeps/x86_64/strcat.S (strcat): Likewise.
* sysdeps/x86_64/strchr.S (strchr): Likewise.
* sysdeps/x86_64/strcmp.S (strcmp): Likewise.
* sysdeps/x86_64/strcpy.S (strcpy): Likewise.
* sysdeps/x86_64/strcspn.S (strcspn): Likewise.
* sysdeps/x86_64/strlen.S (strlen): Likewise.
* sysdeps/x86_64/strspn.S (strspn): Likewise.
* string/string-inlines.c: Move...
* sysdeps/generic/string-inlines.c: ...here.
(__memcpy_g, __strchr_g): Remove.
(__NO_INLINE__): Define before including <string.h>,
undefine after. Include bits/string.h and bits/string2.h.
* sysdeps/i386/i486/string-inlines.c: New file.
* sysdeps/i386/string-inlines.c: New file.
* sysdeps/i386/i486/Versions: Remove.
All GLIBC_2.1.1 symbols moved...
* sysdeps/i386/Versions (libc): ...here.
2003-04-29 Ulrich Drepper <drepper@redhat.com>
2003-04-29 22:49:58 +00:00
|
|
|
libc_hidden_builtin_def (memset)
|
* sysdeps/unix/sysv/linux/libc_fatal.c: Print backtrace and memory
map if requested.
* debug/chk_fail.c: Request backtrace and memory map dump.
* Versions.def: Add GLIBC_2.4 for libc.
* debug/fgets_chk.c: New file.
* debug/fgets_u_chk.c: New file.
* debug/getcwd_chk.c: New file.
* debug/getwd_chk.c: New file.
* debug/readlink_chk.c: New file.
* debug/read_chk.c: New file.
* debug/pread_chk.c: New file.
* debug/pread64_chk.c: New file.
* debug/recv_chk.c: New file.
* debug/recvfrom_chk.c: New file.
* debug/Versions: Add all new functions with version GLIBC_2.4.
* debug/Makefile (routines): Add fgets_chk, fgets_u_chk, read_chk,
pread_chk, pread64_chk, recv_chk, recvfrom_chk, readlink_chk,
getwd_chk, and getcwd_chk. Plus appropriate CFLAGS definitions.
* debug/tst-chk1.c: Add more tests.
* libio/bits/stdio2.h: Add macros for fgets and fgets_unlocked.
* include/stdio.h: Declare __fgets_chk and __fgets_unlocked_chk.
* posix/unistd.h: Include <bits/unistd.h> for fortification.
* posix/bits/unistd.h: New file.
* posix/Makefile (headers): Add bits/unistd.h.
* socket/sys/socket.h: Include <bits/socket2.h> for fortification.
* socket/bits/socket2.h: New file.
* socket/Makefile (headers): Add bits/socket2.h.
* string/bits/string3.h: Extend memset macro to check for zero 3rd
parameter and use __memset_zero_constant_len_parameter in that case.
* sysdeps/generic/memset_chk.c: Add
__memset_zero_constant_len_parameter alias and linker warning.
* debug/Versions: Add __memset_zero_constant_len_parameter to libc
with version GLIBC_2.4.
* sysdeps/generic/bits/types.h: Don't unnecessarily use __extension__
in __STD_TYPE definition.
2005-02-21 Jakub Jelinek <jakub@redhat.com>
* malloc/malloc.c (malloc_printerr): If MALLOC_CHECK_={5,7}, print
the error message rather than program name.
2005-02-21 Ulrich Drepper <drepper@redhat.com>
2005-02-21 23:14:10 +00:00
|
|
|
|
x86-64: Optimize wmemset with SSE2/AVX2/AVX512
The difference between memset and wmemset is byte vs int. Add stubs
to SSE2/AVX2/AVX512 memset for wmemset with updated constant and size:
SSE2 wmemset:
shl $0x2,%rdx
movd %esi,%xmm0
mov %rdi,%rax
pshufd $0x0,%xmm0,%xmm0
jmp entry_from_wmemset
SSE2 memset:
movd %esi,%xmm0
mov %rdi,%rax
punpcklbw %xmm0,%xmm0
punpcklwd %xmm0,%xmm0
pshufd $0x0,%xmm0,%xmm0
entry_from_wmemset:
Since the ERMS versions of wmemset requires "rep stosl" instead of
"rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
are added. The SSE2 wmemset is about 3X faster and the AVX2 wmemset
is about 6X faster on Haswell.
* include/wchar.h (__wmemset_chk): New.
* sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_CHK_SYMBOL): Likewise.
(WMEMSET_SYMBOL): Likewise.
(__wmemset): Add hidden definition.
(wmemset): Add weak hidden definition.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
wmemset_chk-nonshared.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
__wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
__wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
and __wmemset_chk_avx512_unaligned.
* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
(WMEMSET_SYMBOL): Likewise.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
(WMEMSET_CHK_SYMBOL): New.
(WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
(WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
* sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Also define __GI_wmemset and
__GI___wmemset.
(weak_alias): New.
* sysdeps/x86_64/multiarch/wmemset.c: New file.
* sysdeps/x86_64/multiarch/wmemset.h: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise.
* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
* sysdeps/x86_64/wmemset.c: Likewise.
* sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 18:09:48 +00:00
|
|
|
#if IS_IN (libc)
|
|
|
|
libc_hidden_def (__wmemset)
|
|
|
|
weak_alias (__wmemset, wmemset)
|
|
|
|
libc_hidden_weak (wmemset)
|
|
|
|
#endif
|