glibc/sysdeps
Patrick McGehearty d3c5702747 Reversing calculation of __x86_shared_non_temporal_threshold
The __x86_shared_non_temporal_threshold determines when memcpy on x86
uses non-temporal stores to avoid pushing other data out of the last
level cache.

This patch proposes to revert the calculation change made by H.J. Lu's
patch of June 2, 2017.

H.J. Lu's patch selected a threshold suitable for a single thread
getting maximum performance. It was tuned using the single threaded
large memcpy micro benchmark on an 8 core processor. That patch
changed the threshold from 3/4 of one thread's share of the
cache to 3/4 of the entire cache of a multi-threaded system
before switching to non-temporal stores. Multi-threaded systems with
more than a few threads are server-class and typically have many
active threads. If one thread consumes 3/4 of the available cache for
all threads, it will cause other active threads to have data removed
from the cache. Two examples show the range of the effect. John
McCalpin's Stream benchmark, which runs in parallel
and fetches data sequentially, saw a 20% slowdown with H.J. Lu's patch on
an internal system test of 128 threads. This regression was discovered
when comparing OL8 performance to OL7.  An example that compares
normal stores to non-temporal stores may be found at
https://vgatherps.github.io/2018-09-02-nontemporal/.  A simple test
shows performance loss of 400 to 500% due to a failure to use
non-temporal stores. These performance losses are most likely to occur
when the system load is heaviest and good performance is critical.
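
The difference between the two calculations can be sketched as below.
This is a minimal illustration only, not the code in
sysdeps/x86/cacheinfo.c: the function names and the explicit thread
count parameter are hypothetical, and the real code derives the shared
cache size and thread count from CPU information.

```c
#include <stddef.h>

/* Hypothetical helper: threshold as 3/4 of ONE thread's share of the
   shared cache (the calculation this patch restores).  */
static unsigned long long
per_thread_threshold (unsigned long long shared_cache_bytes,
                      unsigned int threads)
{
  /* Guard against a zero thread count; treat it as one thread.  */
  if (threads == 0)
    threads = 1;
  unsigned long long share = shared_cache_bytes / threads;
  return share * 3 / 4;
}

/* Hypothetical helper: threshold as 3/4 of the ENTIRE shared cache
   (the calculation introduced by the June 2, 2017 patch).  */
static unsigned long long
whole_cache_threshold (unsigned long long shared_cache_bytes)
{
  return shared_cache_bytes * 3 / 4;
}
```

With an assumed 32 MiB shared cache and 16 threads, the per-thread
form gives 1.5 MiB while the whole-cache form gives 24 MiB, so a
single large memcpy would have to be 16 times larger before it
switched to non-temporal stores.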

The tunable x86_non_temporal_threshold can be used to override the
default for the knowledgeable user who really wants maximum cache
allocation to a single thread in a multi-threaded system.
The manual entry for the tunable has been expanded to provide
more information about its purpose.

	modified: sysdeps/x86/cacheinfo.c
	modified: manual/tunables.texi
2020-09-28 22:10:39 +00:00
..
aarch64 AArch64: Improve backwards memmove performance 2020-08-28 17:51:40 +01:00
alpha alpha: Use builtin sqrt{f} 2020-06-22 11:09:49 -03:00
arc ARC: Build Infrastructure 2020-07-10 16:08:45 -07:00
arm arm: remove string/tst-memmove-overflow XFAIL 2020-07-16 06:56:52 +02:00
csky semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
generic Linux: Remove rseq support 2020-07-16 17:55:35 +02:00
gnu Remove internal usage of extensible stat functions 2020-09-11 14:35:32 -03:00
hppa dl-runtime: reloc_{offset,index} now functions arch overide'able 2020-06-05 13:45:46 -07:00
htl htl: Move cleanup handling to non-private libc-lock 2020-06-28 00:13:57 +00:00
hurd hurd: Fix build-many-glibcs.py 2020-07-13 14:25:03 -03:00
i386 x86: Use one ldbl2mpn.c file for both i386 and x86_64 2020-09-22 17:58:39 +02:00
ia64 x86: Use one ldbl2mpn.c file for both i386 and x86_64 2020-09-22 17:58:39 +02:00
ieee754 math: Fix inaccuracy of j0f for x >= 2^127 when sin(x)+cos(x) is tiny 2020-08-07 16:33:13 -03:00
m68k m68k: Use sqrt{f} builtin for coldfire 2020-06-22 11:09:50 -03:00
mach hurd: add ST_RELATIME 2020-09-27 18:23:27 +02:00
microblaze semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
mips mips: Use sqrt{f} builtin 2020-06-22 11:09:49 -03:00
nios2 Update Nios II libm-test-ulps file. 2020-08-03 01:42:48 -07:00
nptl nptl: Fix __futex_abstimed_wait_cancellable32 2020-09-28 16:05:32 -03:00
posix Remove internal usage of extensible stat functions 2020-09-11 14:35:32 -03:00
powerpc powerpc: Protect dl_powerpc_cpu_features on INIT_ARCH() [BZ #26615] 2020-09-22 17:45:12 -03:00
pthread C11 threads: Fix inaccuracies in testsuite 2020-09-07 11:42:52 +02:00
riscv RISC-V: Build infrastructure for 32-bit port 2020-08-27 08:17:43 -07:00
s390 S390: Sync HWCAP names with kernel by adding aliases [BZ #25971] 2020-08-21 11:23:17 +02:00
sh semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
sparc Update sparc libm-test-ulps 2020-09-11 14:39:03 -03:00
unix linux: Add time64 recvmmsg support 2020-09-28 17:28:39 -03:00
wordsize-32 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
wordsize-64 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
x86 Reversing calculation of __x86_shared_non_temporal_threshold 2020-09-28 22:10:39 +00:00
x86_64 x86: Use one ldbl2mpn.c file for both i386 and x86_64 2020-09-22 17:58:39 +02:00