x86: Enable non-temporal memset for Hygon processors

This patch clears the 'Avoid_Non_Temporal_Memset' flag on Hygon processors,
so that large memset calls on Hygon use the non-temporal implementation.
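A non-temporal memset fills memory with streaming stores that bypass the caches where possible. This avoids evicting useful cached data and pays off only for buffers much larger than the cache. The following is a minimal C sketch of the technique using SSE2 intrinsics. The name `nt_memset_sketch` is hypothetical, and glibc's real implementation is hand-written assembly, not this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical illustration (not glibc's code): fill N bytes at DST with
   byte C, using SSE2 non-temporal stores for the 16-byte-aligned middle
   of the buffer.  Without SSE2 it falls back to plain memset.  */
static void *
nt_memset_sketch (void *dst, int c, size_t n)
{
  assert (dst != NULL || n == 0);
  unsigned char *p = dst;
#if defined(__SSE2__)
  /* Head: ordinary byte stores until P is 16-byte aligned.  */
  while (n > 0 && ((uintptr_t) p & 15) != 0)
    {
      *p++ = (unsigned char) c;
      n--;
    }
  /* Body: each _mm_stream_si128 writes 16 aligned bytes with a
     non-temporal hint, so the data does not pollute the caches.  */
  __m128i v = _mm_set1_epi8 ((char) c);
  for (; n >= 16; n -= 16, p += 16)
    _mm_stream_si128 ((__m128i *) p, v);
  /* Streaming stores are weakly ordered; fence before returning.  */
  _mm_sfence ();
#endif
  /* Tail (or the whole buffer without SSE2): ordinary stores.  */
  memset (p, c, n);
  return dst;
}
```

The sketch splits the buffer into an unaligned head, an aligned streaming body and a short tail because `_mm_stream_si128` requires a 16-byte-aligned address.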

Test Results:

hygon1 arch
x86_memset_non_temporal_threshold = 8MB
size      new time / old time (lower is better)
1MB       0.994
4MB       0.996
8MB       0.670
16MB      0.343
32MB      0.355

hygon2 arch
x86_memset_non_temporal_threshold = 8MB
size      new time / old time (lower is better)
1MB       1
4MB       1
8MB       1.312
16MB      0.822
32MB      0.830

hygon3 arch
x86_memset_non_temporal_threshold = 8MB
size      new time / old time (lower is better)
1MB       1
4MB       0.990
8MB       0.737
16MB      0.390
32MB      0.401

On Hygon processors, this patch reduces memset execution time by roughly
20% to 65% for buffer sizes of 16MB and above.
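The tables report new time divided by old time, so the fractional reduction in execution time is one minus the ratio. A minimal C helper makes the conversion explicit; the name `time_reduction_percent` is hypothetical.

```c
#include <assert.h>

/* Convert a new-time / old-time ratio into a percentage reduction in
   execution time.  Positive results are speedups; negative results are
   slowdowns.  */
static double
time_reduction_percent (double new_over_old)
{
  assert (new_over_old > 0.0);
  return (1.0 - new_over_old) * 100.0;
}
```

For example, the hygon1 16MB ratio of 0.343 corresponds to about a 65.7% reduction in time, while the hygon2 8MB ratio of 1.312 is a 31.2% slowdown.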

Signed-off-by: Feifei Wang <wangfeifei@hygon.cn>
Reviewed-by: Jing Li <lijing@hygon.cn>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Feifei Wang 2024-08-19 14:57:55 +08:00 committed by H.J. Lu
parent d14aecbffc
commit ca90758b2a
2 changed files with 8 additions and 3 deletions

@@ -756,9 +756,9 @@ init_cpu_features (struct cpu_features *cpu_features)
   unsigned int stepping = 0;
   enum cpu_features_kind kind;
-  /* Default is avoid non-temporal memset for non Intel/AMD hardware. This is,
+  /* Default is avoid non-temporal memset for non Intel/AMD/Hygon hardware. This is,
      as of writing this, we only have benchmarks indicatings it profitability
-     on Intel/AMD. */
+     on Intel/AMD/Hygon. */
   cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
       |= bit_arch_Avoid_Non_Temporal_Memset;
@@ -1116,6 +1116,11 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
       get_extended_indices (cpu_features);
       update_active (cpu_features);
+      /* Benchmarks indicate non-temporal memset can be profitable on Hygon
+         hardware. */
+      cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
+          &= ~bit_arch_Avoid_Non_Temporal_Memset;
     }
   else
     {
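The two hunks in init_cpu_features follow a set-then-clear pattern: the Avoid_Non_Temporal_Memset bit is set for every vendor up front, then cleared in the Hygon branch. The sketch below models that pattern in isolation. The struct, the vendor enum, the index and bit values, and the function names are all hypothetical stand-ins rather than glibc's internal definitions, and the Intel/AMD clearing is inferred from the comment in the first hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for index_arch_Avoid_Non_Temporal_Memset and
   bit_arch_Avoid_Non_Temporal_Memset; glibc's real values differ.  */
#define AVOID_NT_MEMSET_INDEX 0
#define AVOID_NT_MEMSET_BIT (1u << 3)

enum vendor_sketch { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON, VENDOR_OTHER };

/* Hypothetical, much-reduced stand-in for struct cpu_features.  */
struct features_sketch
{
  unsigned int preferred[1];
};

static void
init_features_sketch (struct features_sketch *f, enum vendor_sketch v)
{
  assert (f != NULL);
  /* Default for every vendor: avoid non-temporal memset.  */
  f->preferred[AVOID_NT_MEMSET_INDEX] |= AVOID_NT_MEMSET_BIT;

  /* Vendors with benchmarks showing a benefit clear the bit again;
     the patch adds Hygon to this group.  */
  if (v == VENDOR_INTEL || v == VENDOR_AMD || v == VENDOR_HYGON)
    f->preferred[AVOID_NT_MEMSET_INDEX] &= ~AVOID_NT_MEMSET_BIT;
}

/* True if non-temporal memset is still avoided after initialization.  */
static bool
avoids_nt_memset_sketch (const struct features_sketch *f)
{
  return (f->preferred[AVOID_NT_MEMSET_INDEX] & AVOID_NT_MEMSET_BIT) != 0;
}
```

Setting the bit unconditionally first keeps the conservative behavior for any vendor that has no benchmark data, so only vendors that explicitly opt in get non-temporal memset.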

@@ -1071,7 +1071,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   /* Non-temporal stores are more performant on some hardware above
      non_temporal_threshold. Currently Prefer_Non_Temporal is set for for both
-     Intel and AMD hardware. */
+     Intel, AMD and Hygon hardware. */
   unsigned long int memset_non_temporal_threshold = SIZE_MAX;
   if (!CPU_FEATURES_ARCH_P (cpu_features, Avoid_Non_Temporal_Memset))
     memset_non_temporal_threshold = non_temporal_threshold;
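The last hunk shows why clearing the flag matters: while Avoid_Non_Temporal_Memset is set, the memset threshold stays at SIZE_MAX, so no request size can reach it. The sketch below isolates that selection plus a size check. Both function names are hypothetical, and treating the boundary as inclusive (size >= threshold) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the diff above: SIZE_MAX effectively
   disables non-temporal memset; otherwise the shared non-temporal
   threshold is reused.  */
static size_t
memset_nt_threshold_sketch (bool avoid_non_temporal_memset,
                            size_t non_temporal_threshold)
{
  size_t threshold = SIZE_MAX;
  if (!avoid_non_temporal_memset)
    threshold = non_temporal_threshold;
  return threshold;
}

/* Hypothetical dispatch rule (assumed inclusive): sizes at or above the
   threshold take the non-temporal path.  */
static bool
use_non_temporal_sketch (size_t n, size_t threshold)
{
  return n >= threshold;
}
```

With the 8MB threshold used in the tests above and the flag cleared, a 16MB memset would take the non-temporal path while a 4MB one would not; with the flag set, even a 32MB memset stays on the temporal path.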