Since the new SSE2/AVX2 memsets are faster than the previous ones, we
can remove the previous SSE2/AVX2 memsets and replace them with the
new ones. This reduces the size of libc.so by about 900 bytes.
No change in IFUNC selection if SSE2 and AVX2 memsets weren't used
before. If SSE2 or AVX2 memset was used, the new SSE2 or AVX2 memset
optimized with Enhanced REP STOSB will be used for processors with
ERMS. The new AVX512 memset will be used for processors with AVX512
which prefer vzeroupper.
[BZ #19881]
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Folded
into ...
* sysdeps/x86_64/memset.S: This.
(__bzero): Removed.
(__memset_tail): Likewise.
(__memset_chk): Likewise.
(memset): Likewise.
(MEMSET_CHK_SYMBOL): New. Define only if MEMSET_SYMBOL isn't
defined.
(MEMSET_SYMBOL): Define only if MEMSET_SYMBOL isn't defined.
* sysdeps/x86_64/multiarch/memset-avx2.S: Removed.
(__memset_zero_constant_len_parameter): Check SHARED instead of
PIC.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
memset-avx2 and memset-sse2-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Remove __memset_chk_sse2,
__memset_chk_avx2, __memset_sse2 and __memset_avx2_unaligned.
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
(__bzero): Enabled.
* sysdeps/x86_64/multiarch/memset.S (memset): Replace
__memset_sse2 and __memset_avx2 with __memset_sse2_unaligned
and __memset_avx2_unaligned. Use __memset_sse2_unaligned_erms
or __memset_avx2_unaligned_erms if processor has ERMS. Support
__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
(memset): Removed.
(__memset_chk): Likewise.
(MEMSET_SYMBOL): New.
(libc_hidden_builtin_def): Replace __memset_sse2 with
__memset_sse2_unaligned.
* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Replace
__memset_chk_sse2 and __memset_chk_avx2 with
__memset_chk_sse2_unaligned and __memset_chk_avx2_unaligned_erms.
Use __memset_chk_sse2_unaligned_erms or
__memset_chk_avx2_unaligned_erms if processor has ERMS. Support
__memset_chk_avx512_unaligned_erms and
__memset_chk_avx512_unaligned.
This implementation speed up memset in several ways. First is avoiding
expensive computed jump. Second is using fact that arguments of memset
are most of time aligned to 8 bytes.
Benchmark results on:
kam.mff.cuni.cz/~ondra/benchmark_string/memset_profile_result27_04_13.tar.bz2
[BZ #13592]
There are several signed compares of the size argument, whereas
it really is unsigned. Depending on situations e.g. a "memset(ptr, 0,
-1)" segfault (but for the wrong reasons, because jumping into nirvana)
or succeeds even.
In normal use this is harmless, as a size with signbit set indicates
more than half the address space which on x86_64 is impossible to
allocate, but as the size is used to index some jump tables this
potentially could have other unwanted side effects.
This patch enables SSE2 memset for AMD's upcoming Orochi processor.
This patch also fixes the following bug:
For misaligned blocks larger than > 144 Bytes, memset branches into
the integer code path depending on the value of misalignment even if
the startup code chooses the SSE2 code path upfront, when multiarch
is enabled.
2008-2-26 Harsha Jagasia <harsha.jagasia@amd.com>
* sysdeps/x86_64/cacheinfo.c (NOT_USED_RIGHT_NOW): Remove ifdef guards.
* sysdeps/x86_64/memset.S: Rewrite non-SSE code path as tuned for AMD
Barcelona machine. Make default fall through branch of
__x86_64_preferred_memory_instruction check as the integer code path.
2007-10-15 H.J. Lu <hongjiu.lu@intel.com>
* sysdeps/x86_64/cacheinfo.c
(__x86_64_preferred_memory_instruction): New variable.
(init_cacheinfo): Initialize __x86_64_preferred_memory_instruction.
* sysdeps/x86_64/memset.S: Rewrite.
2008-01-08 Jakub Jelinek <jakub@redhat.com>
* malloc/malloc.c (public_cALLOc): For arenas other than
a local label rather than HIDDEN_JUMPTARGET.
2007-10-16 Jakub Jelinek <jakub@redhat.com>
* sysdeps/x86_64/memset.S: Jump from bzero to memset using
a local label rather than HIDDEN_JUMPTARGET.
map if requested.
* debug/chk_fail.c: Request backtrace and memory map dump.
* Versions.def: Add GLIBC_2.4 for libc.
* debug/fgets_chk.c: New file.
* debug/fgets_u_chk.c: New file.
* debug/getcwd_chk.c: New file.
* debug/getwd_chk.c: New file.
* debug/readlink_chk.c: New file.
* debug/read_chk.c: New file.
* debug/pread_chk.c: New file.
* debug/pread64_chk.c: New file.
* debug/recv_chk.c: New file.
* debug/recvfrom_chk.c: New file.
* debug/Versions: Add all new functions with version GLIBC_2.4.
* debug/Makefile (routines): Add fgets_chk, fgets_u_chk, read_chk,
pread_chk, pread64_chk, recv_chk, recvfrom_chk, readlink_chk,
getwd_chk, and getcwd_chk. Plus appropriate CFLAGS definitions.
* debug/tst-chk1.c: Add more tests.
* libio/bits/stdio2.h: Add macros for fgets and fgets_unlocked.
* include/stdio.h: Declare __fgets_chk and __fgets_unlocked_chk.
* posix/unistd.h: Include <bits/unistd.h> for fortification.
* posix/bits/unistd.h: New file.
* posix/Makefile (headers): Add bits/unistd.h.
* socket/sys/socket.h: Include <bits/socket2.h> for fortification.
* socket/bits/socket2.h: New file.
* socket/Makefile (headers): Add bits/socket2.h.
* string/bits/string3.h: Extend memset macro to check for zero 3rd
parameter and use __memset_zero_constant_len_parameter in that case.
* sysdeps/generic/memset_chk.c: Add
__memset_zero_constant_len_parameter alias and linker warning.
* debug/Versions: Add __memset_zero_constant_len_parameter to libc
with version GLIBC_2.4.
* sysdeps/generic/bits/types.h: Don't unnecessarily use __extension__
in __STD_TYPE definition.
2005-02-21 Jakub Jelinek <jakub@redhat.com>
* malloc/malloc.c (malloc_printerr): If MALLOC_CHECK_={5,7}, print
the error message rather than program name.
2005-02-21 Ulrich Drepper <drepper@redhat.com>
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Declare
external functions with hidden attribute.
(elf_machine_rela): Optimize.
* sysdeps/x86_64/memset.S: New file.
* sysdeps/x86_64/bzero.S: New file.
* sysdeps/x86_64/stpcpy.S: New file.
* sysdeps/x86_64/strcat.S: New file.
* sysdeps/x86_64/strchr.S: New file.
* sysdeps/x86_64/strcpy.S: New file.
* sysdeps/x86_64/strcspn.S: New file.
* sysdeps/x86_64/strlen.S: New file.
* sysdeps/x86_64/strpbrk.S: New file.
* sysdeps/x86_64/strspn.S: New file.
* sysdeps/x86_64/strcmp.S: New file.
* sysdeps/x86_64/strtok_r.S: New file.
* sysdeps/x86_64/strtok.S: New file.
* sysdeps/x86_64/memcpy.S: New file.
* sysdeps/x86_64/mempcpy.S: New file.