Since the x86-64 assembly version of sincosf is higly optimized with
vector instructions, there isn't much room for improvement. However
s_sincosf.c written in C with vector math and intrinsics can be
optimized by GCC with FMA.
On Skylake, bench-sincosf reports performance improvement:
Assembly FMA improvement
max 104.042 101.008 3%
min 9.426 8.586 10%
mean 20.6209 18.2238 13%
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Add s_sincosf-sse2 and s_sincosf-fma.
(CFLAGS-s_sincosf-fma.c): New.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise.
* sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if
__sincosf is defined.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes x86_64 libm function implementations use
libm_alias_float to define function aliases, or libm_alias_float_other
where the main name is defined with versioned_symbol.
Tested with the glibc testsuite for x86_64, and tested with
build-many-glibcs.py for all its x86_64 configurations that installed
stripped shared libraries are unchanged by the patch.
* sysdeps/x86_64/fpu/multiarch/e_exp2f.c: Include
<libm-alias-float.h>.
(exp2f): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/x86_64/fpu/multiarch/e_expf.c: Include
<libm-alias-float.h>.
(exp2f): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/x86_64/fpu/multiarch/e_log2f.c: Include
<libm-alias-float.h>.
(exp2f): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/x86_64/fpu/multiarch/e_logf.c: Include
<libm-alias-float.h>.
(exp2f): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/x86_64/fpu/multiarch/e_powf.c: Include
<libm-alias-float.h>.
(exp2f): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Include
<libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Include
<libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c: Include
<libm-alias-float.h>.
(fmaf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/multiarch/s_nearbyintf.c: Include
<libm-alias-float.h>.
(nearbyintf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Include
<libm-alias-float.h>.
(rintf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Include
<libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/s_copysignf.S: Include <libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/s_cosf.S: Include <libm-alias-float.h>.
(cosf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/s_fabsf.c: Include <libm-alias-float.h>.
(fabsf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/s_fmaxf.S: Include <libm-alias-float.h>.
(fmaxf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/s_fminf.S: Include <libm-alias-float.h>.
(fminf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/s_llrintf.S: Include <libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
[!__ILP32__] (lrintf): Likewise.
* sysdeps/x86_64/fpu/s_sincosf.S: Include <libm-alias-float.h>.
(sincosf): Define using libm_alias_float.
* sysdeps/x86_64/fpu/s_sinf.S: Include <libm-alias-float.h>.
(sinf): Define using libm_alias_float.
* sysdeps/x86_64/x32/fpu/s_lrintf.S: Include <libm-alias-float.h>.
(lrintf): Define using libm_alias_float.
This is fairly complicated, not because the users of __need_Emath and
__need_error_t have complicated requirements, but because the core
changes had a lot of fallout.
__need_error_t exists for gnulib compatibility in argz.h and argp.h.
error_t itself is a Hurdism, an enum containing all the E-constants,
so you can do 'p (error_t) errno' in gdb and get a symbolic value.
argz.h and argp.h use it for function return values, and they want to
fall back to 'int' when that's not available. There is no reason why
these nonstandard headers cannot just go ahead and include all of
errno.h; so we do that.
__need_Emath is defined only by .S files; what they _really_ need is
for errno.h to avoid declaring anything other than the E-constants
(e.g. 'extern int __errno_location(void);' is a syntax error in
assembly language). This is replaced with a check for __ASSEMBLER__ in
errno.h, plus a carefully documented requirement for bits/errno.h not
to define anything other than macros. That in turn has the
consequence that bits/errno.h must not define errno - fortunately, all
live ports use the same definition of errno, so I've moved it to
errno.h. The Hurd bits/errno.h must also take care not to define
error_t when __ASSEMBLER__ is defined, which involves repeating all of
the definitions twice, but it's a generated file so that's okay.
* stdlib/errno.h: Remove __need_Emath and __need_error_t logic.
Reorganize file. Declare errno here. When __ASSEMBLER__ is
defined, don't declare anything other than the E-constants.
* include/errno.h: Change conditional for exposing internal
declarations to (not _ISOMAC and not __ASSEMBLER__).
* bits/errno.h: Remove logic for __need_Emath. Document
requirements for a port-specific bits/errno.h.
* sysdeps/unix/sysv/linux/bits/errno.h
* sysdeps/unix/sysv/linux/alpha/bits/errno.h
* sysdeps/unix/sysv/linux/hppa/bits/errno.h
* sysdeps/unix/sysv/linux/mips/bits/errno.h
* sysdeps/unix/sysv/linux/sparc/bits/errno.h:
Add multiple-include guard and check against improper inclusion.
Remove __need_Emath logic. Don't declare errno here. Ensure all
constants are defined as simple integer literals. Consistent
formatting.
* sysdeps/mach/hurd/errnos.awk: Likewise. Only define error_t and
enum __error_t_codes if __ASSEMBLER__ is not defined.
* sysdeps/mach/hurd/bits/errno.h: Regenerate.
* argp/argp.h, string/argz.h: Don't define __need_error_t before
including errno.h.
* sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S
* sysdeps/i386/i686/fpu/multiarch/s_sincosf-sse2.S
* sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S
* sysdeps/x86_64/fpu/s_cosf.S
* sysdeps/x86_64/fpu/s_sincosf.S
* sysdeps/x86_64/fpu/s_sinf.S:
Just include errno.h; don't define __need_Emath or include
bits/errno.h directly.