Commit Graph

8 Commits

Author SHA1 Message Date
H.J. Lu
5313581cb5 i386: Replace assembly versions of e_powf with generic e_powf.c
This patch replaces i386 assembly versions of e_powf with generic
e_powf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      230.855          78.3358       194%
latency                    231.685          94.1259       146%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      239.858          47.4713       405%
latency                    247.57           93.8798       163%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      269.078          63.3758       324%
latency                    271.473          102.091       165%

	* sysdeps/i386/fpu/e_powf.S: Removed.
	* sysdeps/i386/fpu/e_powf_log2_data.c: Likewise.
	* sysdeps/i386/fpu/w_powf.c: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_powf.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_powf-sse2.
	(CFLAGS-e_powf-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_powf-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_powf.c: Likewise.
2017-10-22 08:12:41 -07:00
H.J. Lu
6089a3ee24 i386: Replace assembly versions of e_log2f with generic e_log2f.c
This patch replaces i386 assembly versions of e_log2f with generic
e_log2f.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      92.3845          30.8752       199%
latency                    112.855          54.8645       105%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      98.7488          22.7507       334%
latency                    118.01           51.6083       128%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      106.635          28.8596       269%
latency                    129.888          56.9187       128%

	* sysdeps/i386/fpu/e_log2f.S: Removed.
	* sysdeps/i386/fpu/e_log2f_data.c: Likewise.
	* sysdeps/i386/fpu/w_log2f.c: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_log2f.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_log2f-sse2.
	(CFLAGS-e_log2f-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_log2f-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_log2f.c: Likewise.
2017-10-22 08:10:18 -07:00
H.J. Lu
fe596486d6 i386: Replace assembly versions of e_logf with generic e_logf.c
This patch replaces i386 assembly versions of e_logf with generic
e_logf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      73.3865          40.0454       83%
latency                    90.0985          54.4479       65%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      75.1384          22.1452       239%
latency                    91.9441          50.7925       81%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      84.5575          28.7879       193%
latency                    103.971          57.5231       80%

	* sysdeps/i386/fpu/e_logf.S: Removed.
	* sysdeps/i386/fpu/e_logf_data.c: Likewise.
	* sysdeps/i386/fpu/w_logf.c: Likewise.
	* sysdeps/i386/i686/fpu/e_logf.S: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_logf.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_logf-sse2.
	(CFLAGS-e_logf-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_logf-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_logf.c: Likewise.
2017-10-22 08:02:58 -07:00
H.J. Lu
7eda65f69e i386: Replace assembly versions of e_exp2f with generic e_exp2f.c
This patch replaces i386 assembly versions of e_exp2f with generic
e_exp2f.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      112.996          40.0454       182%
latency                    126.581          54.4479       132%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      113.14           39.447        186%
latency                    136.068          55.684        144%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      132.521          40.3759       228%
latency                    145.791          58.4587       149%

	* sysdeps/i386/fpu/e_exp2f.S: Removed.
	* sysdeps/i386/fpu/w_exp2f.c: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_exp2f.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add e_exp2f-sse2.
	(CFLAGS-e_exp2f-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_exp2f-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_exp2f.c: Likewise.
2017-10-22 08:00:18 -07:00
H.J. Lu
b2f6137ea5 i386: Replace assembly versions of e_expf with generic e_expf.c
This patch replaces i386 assembly versions of e_expf with generic
e_expf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      55.5724          40.2664       38%
latency                    80.0687          60.8517       31%

On Skylake, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      62.4056          39.4188       58%
latency                    85.5496          59.6377       43%

On IvyBridge with --disable-multi-arch, it improves performance by:

                           Before            After     Improvement
reciprocal-throughput      133.707          40.3778       231%
latency                    149.191          63.2515       135%

	* sysdeps/i386/fpu/e_exp2f_data.c: Removed.
	* sysdeps/i386/fpu/e_expf.S: Likewise.
	* sysdeps/i386/fpu/math_errf.c: Likewise.
	* sysdeps/i386/fpu/w_expf.c: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/e_expf-ia32.S: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/w_expf.c: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_expf.c.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove e_expf-ia32.
	(CFLAGS-e_expf-sse2.c): New.
	* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_expf.c: Rewritten.
2017-10-22 07:54:50 -07:00
Liubov Dmitrieva
22bf5c1793 Add optimized sincosf for SSE2 for x86 and x86-64 2012-09-25 20:47:20 +02:00
Liubov Dmitrieva
4ffffbd272 Add optimized sinf and cosf routines for x86 and x86-64
* sysdeps/i386/i686/fpu/multiarch/Makefile (sysdep_routines):
	Add s_sinf-sse2, s_conf-sse2.

	* sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S: New file.
	* sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S: New file.
	* sysdeps/i386/i686/fpu/multiarch/s_sinf.c: New file.
	* sysdeps/i386/i686/fpu/multiarch/s_cosf.c: New file.

	* sysdeps/ieee754/flt-32/s_sinf.c (SINF, SINF_FUNC): Add macros
	for using routine as __sinf_ia32.
	Use macro for function declaration and weak_alias.
	* sysdeps/ieee754/flt-32/s_cosf.c (COSF, COSF_FUNC): Add macros
	for using routine as __cosf_ia32.
	Use macro for function declaration and weak_alias.

	* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S: Fix Copyright.
	* sysdeps/i386/i686/fpu/multiarch/e_expf.c: Fix Copyright.

	* sysdeps/x86_64/fpu/s_sinf.S: New file.
	* sysdeps/x86_64/fpu/s_cosf.S: New file.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.

	* math/libm-test.inc (cos_test): Add more test cases.
	(sin_test): Likewise.
	(sincos_test): Likewise.
2012-09-03 15:32:13 +02:00
Liubov Dmitrieva
d7bb4c428a Add optimized expf for x86
2012-05-14  Liubov Dmitrieva  <liubov.dmitrieva@gmail.com>

	* sysdeps/i386/i686/fpu/multiarch/Makefile: New file.
	* sysdeps/i386/i686fpu/multiarch/e_expf.c: New file.
	* sysdeps/i386/i686fpu/multiarch/e_expf-ia32.S: New file.
	* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S: New file.
2012-05-14 11:23:56 +02:00