Besides the option being gcc specific, this approach is still fragile
and not future proof since we do not know if this will be the only
optimization option gcc will add that transforms loops to memset
(or any libcall).
This patch adds a new header, dl-symbol-redir-ifunc.h, that can be
used to redirect compiler-generated libcalls to the port's generic
memset implementation if required.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Add a proper bounds check to __libc_ifunc_impl_list. This makes MAX_IFUNC
redundant and fixes several targets that will write outside the array.
To avoid unnecessarily large diffs, pass the maximum in the argument 'i' to
IFUNC_IMPL_ADD - 'max' can be used in new ifunc definitions and existing
ones can be updated if desired.
Passes buildmanyglibc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
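As an illustration of the bounds check described above, here is a minimal
sketch. It is not the glibc code: struct demo_impl, demo_impl_add and
demo_impl_list are hypothetical names, and the real IFUNC_IMPL_ADD is a
macro whose exact definition is not reproduced here. The point is only
that each append is guarded by the caller-supplied maximum, so a list
with more candidates than slots is truncated instead of overflowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an implementation-list entry.  */
struct demo_impl
{
  const char *name;
  bool usable;
};

/* Append one entry only while COUNT is below MAX.  Returns the
   updated count, which never exceeds MAX.  */
static size_t
demo_impl_add (struct demo_impl *array, size_t max, size_t count,
               const char *name, bool usable)
{
  assert (array != NULL);
  if (count < max)
    {
      array[count].name = name;
      array[count].usable = usable;
      count++;
    }
  return count;
}

/* Fill ARRAY with up to MAX entries and return how many were written.
   The three names are placeholders, not real glibc symbols.  */
size_t
demo_impl_list (struct demo_impl *array, size_t max)
{
  size_t n = 0;
  n = demo_impl_add (array, max, n, "demo_memset_fast", true);
  n = demo_impl_add (array, max, n, "demo_memset_faster", false);
  n = demo_impl_add (array, max, n, "demo_memset_generic", true);
  return n;
}
```

With a two-slot array the third candidate is silently dropped; with a
larger array all three are recorded.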
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
Support added to identify Sparc M7/T7/S7/M8/T8 processor capability.
Performance tests run on Sparc S7 using new code and old niagara4 code.
Optimizations for memset also apply to bzero as they share code.
For memset/bzero, performance comparison with niagara4 code:
For memset nonzero data,
  256-1023 bytes - 60-90% gain (in cache); 5% gain (out of cache)
  1K+ bytes - 80-260% gain (in cache); 40-80% gain (out of cache)
For memset zero data (and bzero),
  256-1023 bytes - 80-120% gain (in cache); 0% gain (out of cache)
  1024+ bytes - 2-4x gain (in cache); 10-35% gain (out of cache)
Tested in sparcv9-*-* and sparc64-*-* targets in both multi and
non-multi arch configurations.
Patrick McGehearty <patrick.mcgehearty@oracle.com>
Adhemerval Zanella <adhemerval.zanella@linaro.org>
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile
(sysdep_routines): Add memset-niagara7.
* sysdeps/sparc/sparc64/multiarch/Makefile (sysdep_routines):
Likewise.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memset-niagara7.S: New
file.
* sysdeps/sparc/sparc64/multiarch/memset-niagara7.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add __bzero_niagara7 and __memset_niagara7.
* sysdeps/sparc/sparc64/multiarch/ifunc-memset.h (IFUNC_SELECTOR):
Add niagara7 option.
* NEWS: Mention sparc m7 optimized memcpy, mempcpy, memmove, and
memset.
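The "Add niagara7 option" entry above extends a priority-ordered
selector. A minimal sketch of that shape follows, assuming hypothetical
capability bits: DEMO_HWCAP_CRYPTO stands in for HWCAP_SPARC_CRYPTO,
DEMO_HWCAP_M7 is an invented name for whatever bit identifies the
M7-class processors, and the enum and function names are illustrative
only. The newest variant is tested first so it takes precedence when
several bits are set.

```c
/* Hypothetical capability bits; the real values come from the
   kernel-provided hwcap word and are not reproduced here.  */
#define DEMO_HWCAP_CRYPTO 0x1UL  /* stands in for HWCAP_SPARC_CRYPTO */
#define DEMO_HWCAP_M7     0x2UL  /* invented name for an M7-class bit */

enum demo_memset_impl
{
  DEMO_MEMSET_BASELINE,
  DEMO_MEMSET_NIAGARA4,
  DEMO_MEMSET_NIAGARA7
};

/* Pick the most specific variant the processor advertises, falling
   back to the baseline when no known bit is set.  */
enum demo_memset_impl
demo_select_memset (unsigned long hwcap)
{
  if (hwcap & DEMO_HWCAP_M7)
    return DEMO_MEMSET_NIAGARA7;
  if (hwcap & DEMO_HWCAP_CRYPTO)
    return DEMO_MEMSET_NIAGARA4;
  return DEMO_MEMSET_BASELINE;
}
```

Because bzero shares code with memset, a selector of this kind would
presumably apply the same ordering to both.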
Support added to identify Sparc M7/T7/S7/M8/T8 processor capability.
Performance tests run on Sparc S7 using new code and old niagara4 code.
Optimizations for memcpy also apply to mempcpy and memmove
where they share code. Optimizations for memset also apply
to bzero as they share code.
For memcpy/mempcpy/memmove, performance comparison with niagara4 code:
Long word aligned data
  0-127 bytes - minimal changes
  128-1023 bytes - 7-30% gain
  1024+ bytes - 1-7% gain (in cache); 30-100% gain (out of cache)
Word aligned data
  0-127 bytes - 50%+ gain
  128-1023 bytes - 10-200% gain
  1024+ bytes - 0-15% gain (in cache); 5-50% gain (out of cache)
Unaligned data
  0-127 bytes - 0-70%+ gain
  128-447 bytes - 40-80%+ gain
  448-511 bytes - 1-3% loss
  512-4096 bytes - 2-3% gain (in cache); 0-20% gain (out of cache)
  4096+ bytes - ± 3% (in cache); 20-50% gain (out of cache)
Tested in sparcv9-*-* and sparc64-*-* targets in both multi and
non-multi arch configurations.
Patrick McGehearty <patrick.mcgehearty@oracle.com>
Adhemerval Zanella <adhemerval.zanella@linaro.org>
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile
(sysdep_routines): Add memcpy-memmove-niagara7 and memmove-ultra1.
* sysdeps/sparc/sparc64/multiarch/Makefile (sysdep_routines):
Likewise.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memcpy-memmove-niagara7.S:
New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memmove-ultra1.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/multiarch/rtld-memmove.c: Likewise.
* sysdeps/sparc/sparc64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add __memcpy_niagara7, __mempcpy_niagara7,
and __memmove_niagara7.
* sysdeps/sparc/sparc64/multiarch/ifunc-memcpy.h (IFUNC_SELECTOR):
Add niagara7 option.
* sysdeps/sparc/sparc64/multiarch/memmove.c: New file.
* sysdeps/sparc/sparc64/multiarch/ifunc-memmove.h: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-memmove-niagara7.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memmove-ultra1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/rtld-memmove.c: Likewise.
This patch refactors the sparc64 ifunc selector to a C implementation.
Also, the generic symbol is moved to its own implementation file
(add_n-generic.S).
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
* sysdeps/sparc/sparc64/multiarch/Makefile (sysdep_routines):
Add add_n-generic.
* sysdeps/sparc/sparc64/multiarch/add_n-generic.S: New file.
* sysdeps/sparc/sparc64/multiarch/add_n.c: Likewise.
* sysdeps/sparc/sparc64/multiarch/add_n.S: Remove file.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
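To make the split between a generic routine and a C selector concrete,
here is a rough sketch under stated assumptions. None of these names are
glibc's: demo_limb, demo_add_n_generic, demo_add_n_vis3, DEMO_HWCAP_VIS3
and demo_add_n_select are hypothetical, and the "vis3" variant is only a
stand-in that forwards to the portable code. The generic routine adds
two n-limb numbers (least significant limb first) and returns the
carry out.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_limb;
typedef demo_limb (*demo_add_n_fn) (demo_limb *, const demo_limb *,
                                    const demo_limb *, size_t);

/* Hypothetical capability bit for the optimized variant.  */
#define DEMO_HWCAP_VIS3 0x1UL

/* Portable multi-limb add: res = s1 + s2 over N limbs.  Returns the
   carry out of the most significant limb (0 or 1).  */
demo_limb
demo_add_n_generic (demo_limb *res, const demo_limb *s1,
                    const demo_limb *s2, size_t n)
{
  demo_limb carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      demo_limb a = s1[i];
      demo_limb sum = a + s2[i];
      demo_limb c1 = sum < a;       /* overflow from a + b */
      demo_limb out = sum + carry;
      demo_limb c2 = out < sum;     /* overflow from adding the carry */
      res[i] = out;
      carry = c1 | c2;
    }
  return carry;
}

/* Stand-in for an optimized variant; a real one would be assembly.  */
demo_limb
demo_add_n_vis3 (demo_limb *res, const demo_limb *s1,
                 const demo_limb *s2, size_t n)
{
  return demo_add_n_generic (res, s1, s2, n);
}

/* The selector, written in C: return the variant matching HWCAP.  */
demo_add_n_fn
demo_add_n_select (unsigned long hwcap)
{
  return (hwcap & DEMO_HWCAP_VIS3) ? demo_add_n_vis3 : demo_add_n_generic;
}
```

Keeping the generic routine in its own file means the selector only has
to name two symbols and needs no assembly of its own.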
This patch refactors the sparc64 ifunc selector to a C implementation.
Also, the generic symbol is moved to its own implementation file
(submul_1-generic.S).
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
* sysdeps/sparc/sparc64/multiarch/Makefile (sysdep_routines):
Add submul_1-generic.
* sysdeps/sparc/sparc64/multiarch/submul_1-generic.S: New file.
* sysdeps/sparc/sparc64/multiarch/submul_1.c: Likewise.
* sysdeps/sparc/sparc64/multiarch/submul_1.S: Remove file.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch refactors the sparc64 ifunc selector to a C implementation.
Also, the generic symbol is moved to its own implementation file
(addmul_1-generic.S).
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
* sysdeps/sparc/sparc64/multiarch/Makefile (sysdep_routines):
Add addmul_1-generic.
* sysdeps/sparc/sparc64/multiarch/addmul_1-generic.S: New file.
* sysdeps/sparc/sparc64/multiarch/addmul_1.c: Likewise.
* sysdeps/sparc/sparc64/multiarch/addmul_1.S: Remove file.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch refactors the sparc64 ifunc selector to a C implementation.
Also, the generic symbol is moved to its own implementation file
(sub_n-generic.S).
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
* sysdeps/sparc/sparc64/multiarch/Makefile (sysdep_routines):
Add sub_n-generic.
* sysdeps/sparc/sparc64/multiarch/sub_n-generic.S: New file.
* sysdeps/sparc/sparc64/multiarch/sub_n.c: Likewise.
* sysdeps/sparc/sparc64/multiarch/sub_n.S: Remove file.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch refactors the sparc64 ifunc selector to a C implementation.
Also, the generic symbol is moved to its own implementation file
(mul_1-generic.S).
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
* sysdeps/sparc/sparc64/multiarch/Makefile (sysdep_routines):
Add mul_1-generic.
* sysdeps/sparc/sparc64/multiarch/mul_1-generic.S: New file.
* sysdeps/sparc/sparc64/multiarch/mul_1.c: Likewise.
* sysdeps/sparc/sparc64/multiarch/mul_1.S: Remove file.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch refactors the SPARC64 ifunc selector to a C implementation.
The x86_64 implementation is used as default, which resulted in common
definitions (ifunc-init.h) used on both architectures. No functional
change is expected, including ifunc resolution rules.
Checked on sparc64-linux-gnu, sparcv9-linux-gnu and x86_64-linux-gnu.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memcpy-ultra1.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memcpy.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/multiarch/mempcpy.c: Likewise.
* sysdeps/sparc/sparc64/multiarch/ifunc-memcpy.h: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-ultra1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy.c: Likewise.
* sysdeps/sparc/sparc64/multiarch/mempcpy.c: Likewise.
* sysdeps/sparc/sparc-ifunc.h (sparc_libc_ifunc_redirected): New
macro.
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile
[$(subdir) = string] (sysdep_routines): Add memcpy-ultra1.
* sysdeps/sparc/sparc64/multiarch/Makefile [$(subdir) = string]
(sysdep_routines): Add memcpy-ultra1.
* sysdeps/sparc/sparc64/multiarch/memcpy.S: Remove file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memcpy.S: Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
[BZ #16150]
* sysdeps/sparc/sparc64/multiarch/add_n.S: Resolve to the correct generic
symbol in the non-vis3 case in static builds.
* sysdeps/sparc/sparc64/multiarch/addmul_1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/mul_1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/sub_n.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/submul_1.S: Likewise.
* math/Makefile: Recognize gmp-sysdep_routines.
* sysdeps/sparc/sparc64/multiarch/Makefile: Add VIS3 optimized GMP routines
to sysdeps.
* sysdeps/sparc/sparc64/multiarch/add_n-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/add_n.S: New file.
* sysdeps/sparc/sparc64/multiarch/addmul_1-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/addmul_1.S: New file.
* sysdeps/sparc/sparc64/multiarch/mul_1-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/mul_1.S: New file.
* sysdeps/sparc/sparc64/multiarch/sub_n-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/sub_n.S: New file.
* sysdeps/sparc/sparc64/multiarch/submul_1-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/submul_1.S: New file.
* crypt/Makefile: Move test targets after toplevel Rules
inclusion. Grab any necessary sysdep routines when linking.
* crypt/md5.c (md5_process_block): Remove define, we will always
name it __md5_process_block.
(md5_finish_ctx): Update md5_process_block call.
(md5_stream): Likewise.
(md5_process_bytes): Likewise.
(md5_process_block): Rename to __md5_process_block and move to ...
* crypt/md5-block.c: ... here.
* crypt/sha256.c (sha256_process_block): Move to ...
* crypt/sha256-block.c: ... here.
* crypt/sha512.c (sha512_process_block): Move to ...
* crypt/sha512-block.c: ... here.
* locale/Makefile (CFLAGS-md5.c): Define to add crypt/ to include
path.
* sysdeps/sparc/sparc-ifunc.c (sparc_libc_ifunc): Define.
* sysdeps/sparc/sparc64/multiarch/Makefile
(libcrypt-sysdep_routines): Add crypto assembler sysdeps when in
crypt subdir.
(localedef-aux): Add md5 crypto assembler when in locale subdir.
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile: Mirror sparc64
multiarch changes.
* sysdeps/sparc/sparc64/multiarch/md5-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/md5-crop.S: New file.
* sysdeps/sparc/sparc64/multiarch/sha256-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/sha256-crop.S: New file.
* sysdeps/sparc/sparc64/multiarch/sha512-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/sha512-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/md5-block.c: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/md5-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha256-block.c: New
file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha256-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha512-block.c: New
file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha512-crop.S: New file.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara4.S: On 32-bit, clear
upper 32-bits of the length value in %o2 since we use branch-on-register
tests which consider the entire 64-bit register.
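The reason for clearing the upper half can be modelled in C. This is
only a sketch of the arithmetic, not the assembly in the patch:
demo_reg_is_zero and demo_clear_upper32 are hypothetical helpers. A
32-bit caller passes a 32-bit length, but a test that inspects the whole
64-bit register would see any stale upper bits as part of the value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a branch-on-register test: it looks at all 64 bits.  */
bool
demo_reg_is_zero (uint64_t reg)
{
  return reg == 0;
}

/* Keep only the low 32 bits, which is all a 32-bit length occupies.  */
uint64_t
demo_clear_upper32 (uint64_t reg)
{
  return reg & UINT64_C (0xffffffff);
}
```

For a register whose low half is 0 but whose upper half holds leftover
data, demo_reg_is_zero reports false until the value has been passed
through demo_clear_upper32.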
* sysdeps/sparc/sparc64/multiarch/memset-niagara4.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memset-niagara4.S: New
file.
* sysdeps/sparc/sparc64/multiarch/Makefile: Add to
sysdep_routines.
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset.S: Use Niagara-4 memset
and bzero when HWCAP_SPARC_CRYPTO is present.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara4.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memcpy-niagara4.S: New
file.
* sysdeps/sparc/sparc64/multiarch/Makefile: Add to
sysdep_routines.
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy.S: Use Niagara-4 memcpy
and mempcpy when HWCAP_SPARC_CRYPTO is set.
fmovd clears the current exception field in the %fsr, fsrc2
does not and therefore runs more efficiently on some cpus.
* sysdeps/sparc/sparc64/memcpy.S: Use fsrc2 to move 64-bit
values between float registers.
* sysdeps/sparc/sparc64/memset.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara2.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy.S: Provide a hidden def to
the IFUNC routine in the libc case.
* sysdeps/sparc/sparc64/multiarch/memcpy.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/multiarch/rtld-memset.c: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/rtld-memcpy.c: New file.
* sysdeps/sparc/sparc32/sparcv9/rtld-memset.c: New file.
* sysdeps/sparc/sparc32/sparcv9/rtld-memcpy.c: New file.
* sysdeps/sparc/sparc64/multiarch/rtld-memset.c: New file.
* sysdeps/sparc/sparc64/multiarch/rtld-memcpy.c: New file.
* sysdeps/sparc/sparc64/rtld-memset.c: New file.
* sysdeps/sparc/sparc64/rtld-memcpy.c: New file.