glibc/sysdeps
Adhemerval Zanella Netto e169aff0e9 x86: Add SSE2 optimized chacha20
Add a vectorized ChaCha20 implementation based on libgcrypt's
cipher/chacha20-amd64-ssse3.S.  ROTATE_SHUF_2 (which uses pshufb,
an SSSE3 instruction) is replaced by ROTATE2, so the implementation
requires only SSE2.
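
As a rough illustration of why dropping pshufb is enough to stay within
SSE2: a rotate of each 32-bit lane can be built from two shifts and an OR,
all of which are SSE2 operations. The sketch below is not the glibc
assembly; the function name rotl8_x4 is hypothetical, and a scalar
fallback is included only so the example builds on non-x86 targets.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: rotate each of four 32-bit lanes left by 8 bits.
   With SSSE3 a byte-granular rotate like this can be a single pshufb;
   with SSE2 only, it is expressed as shift-left, shift-right, OR.  */
static void
rotl8_x4 (uint32_t lanes[4])
{
#if defined(__SSE2__)
  __m128i v = _mm_loadu_si128 ((const __m128i *) lanes);
  v = _mm_or_si128 (_mm_slli_epi32 (v, 8), _mm_srli_epi32 (v, 24));
  _mm_storeu_si128 ((__m128i *) lanes, v);
#else
  /* Scalar fallback with the same result, for targets without SSE2.  */
  for (int i = 0; i < 4; i++)
    lanes[i] = (lanes[i] << 8) | (lanes[i] >> 24);
#endif
}
```

The shift-and-OR form costs more instructions than a shuffle, which is
the trade-off for running on every x86_64 CPU.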

As in the generic implementation, the last step that XORs the
keystream with the input is omitted.  The final clearing of the
state registers is also omitted.
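
To make the omitted XOR step concrete, here is a minimal scalar sketch of
a ChaCha20 block function that writes the keystream directly instead of
XORing it with an input buffer. It assumes the RFC 8439 state layout and
20 rounds; the names quarter_round and chacha20_keystream_block are
hypothetical and do not come from the glibc sources.

```c
#include <stdint.h>

static uint32_t
rotl32 (uint32_t x, int c)
{
  return (x << c) | (x >> (32 - c));
}

/* One ChaCha quarter round on four state words.  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}

/* Produce one 64-byte keystream block from a 16-word input state.
   A cipher would XOR this block with the input; a random number
   generator can use the keystream as-is, so that step is skipped.  */
static void
chacha20_keystream_block (const uint32_t in[16], uint8_t out[64])
{
  uint32_t x[16];
  for (int i = 0; i < 16; i++)
    x[i] = in[i];

  for (int r = 0; r < 10; r++)
    {
      /* Column rounds.  */
      quarter_round (&x[0], &x[4], &x[8], &x[12]);
      quarter_round (&x[1], &x[5], &x[9], &x[13]);
      quarter_round (&x[2], &x[6], &x[10], &x[14]);
      quarter_round (&x[3], &x[7], &x[11], &x[15]);
      /* Diagonal rounds.  */
      quarter_round (&x[0], &x[5], &x[10], &x[15]);
      quarter_round (&x[1], &x[6], &x[11], &x[12]);
      quarter_round (&x[2], &x[7], &x[8], &x[13]);
      quarter_round (&x[3], &x[4], &x[9], &x[14]);
    }

  /* Add the input state and serialize little-endian; no XOR with data.  */
  for (int i = 0; i < 16; i++)
    {
      uint32_t w = x[i] + in[i];
      out[4 * i + 0] = (uint8_t) w;
      out[4 * i + 1] = (uint8_t) (w >> 8);
      out[4 * i + 2] = (uint8_t) (w >> 16);
      out[4 * i + 3] = (uint8_t) (w >> 24);
    }
}
```

This sketch leaves its working copy of the state on the stack, mirroring
the omitted register clearing described above; code that needs the state
wiped would have to do so explicitly.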

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               443.11
arc4random_buf(16) [single-thread]       552.27
arc4random_buf(32) [single-thread]       626.86
arc4random_buf(48) [single-thread]       649.81
arc4random_buf(64) [single-thread]       663.95
arc4random_buf(80) [single-thread]       674.78
arc4random_buf(96) [single-thread]       675.17
arc4random_buf(112) [single-thread]      680.69
arc4random_buf(128) [single-thread]      683.20
-----------------------------------------------

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

Checked on x86_64-linux-gnu.
2022-07-22 11:58:27 -03:00
aarch64 aarch64: Add optimized chacha20 2022-07-22 11:58:27 -03:00
alpha alpha: Remove _dl_skip_args usage 2022-05-30 16:32:22 -03:00
arc elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA 2022-06-15 11:29:55 -07:00
arm Add bounds check to __libc_ifunc_impl_list 2022-06-10 17:13:29 +01:00
csky csky: Remove _dl_skip_args usage 2022-05-30 16:32:33 -03:00
generic aarch64: Add optimized chacha20 2022-07-22 11:58:27 -03:00
gnu
hppa hppa: Remove _dl_skip_args usage (BZ# 29165) 2022-05-30 16:32:35 -03:00
htl htl: Fix initializing the key lock 2022-02-14 19:29:02 +01:00
hurd hurd: Fix pthread_kill on exiting/ted thread 2022-01-15 15:11:54 +01:00
i386 i386: Remove -Wa,-mtune=i686 2022-07-12 11:14:32 -07:00
ia64 grep: egrep -> grep -E, fgrep -> grep -F 2022-06-05 12:09:02 -07:00
ieee754 i686: Use generic sincosf implementation for SSE2 version 2022-06-01 10:47:44 -03:00
m68k m68k: optimize RTLD_START 2022-06-25 00:22:02 +02:00
mach stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) 2022-07-22 11:58:27 -03:00
microblaze microblaze: Remove _dl_skip_args usage 2022-05-30 16:33:14 -03:00
mips mips: Remove _dl_skip_args usage 2022-05-30 16:33:16 -03:00
nios2 Remove remnant reference to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA 2022-06-15 13:02:17 -07:00
nptl stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) 2022-07-22 11:58:27 -03:00
or1k elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC 2022-04-26 09:26:22 -07:00
posix Refactor internal-signals.h 2022-06-30 14:56:21 -03:00
powerpc Add bounds check to __libc_ifunc_impl_list 2022-06-10 17:13:29 +01:00
pthread nptl: Fix __libc_cleanup_pop_restore asynchronous restore (BZ#29214) 2022-06-08 09:23:02 -03:00
riscv riscv: Use memcpy to handle unaligned access when fixing R_RISCV_RELATIVE 2022-06-30 08:04:52 -07:00
s390 s390: use LC_ALL=C for readelf call 2022-06-21 10:16:44 +02:00
sh sh: Remove _dl_skip_args usage 2022-05-30 16:33:28 -03:00
sparc Add bounds check to __libc_ifunc_impl_list 2022-06-10 17:13:29 +01:00
unix stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417) 2022-07-22 11:58:27 -03:00
wordsize-32
wordsize-64
x86 x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
x86_64 x86: Add SSE2 optimized chacha20 2022-07-22 11:58:27 -03:00