glibc

AuroraMiddleware/glibc

Fork 0

mirror of https://sourceware.org/git/glibc.git synced 2024-11-25 14:30:06 +00:00

Commit Graph

Author	SHA1	Message	Date
Joe Ramsay	5bc100bd4b	AArch64: Improve codegen in users of AdvSIMD log1pf helper log1pf is quite register-intensive - use fewer registers for the polynomial, and make various changes to shorten dependency chains in parent routines. There is now no spilling with GCC 14. Accuracy moves around a little - comments adjusted accordingly but does not require regen-ulps. Use the helper in log1pf as well, instead of having separate implementations. The more accurate polynomial means special-casing can be simplified, and the shorter dependency chain avoids the usual dance around v0, which is otherwise difficult. There is a small duplication of vectors containing 1.0f (or 0x3f800000) - GCC is not currently able to efficiently handle values which fit in FMOV but not MOVI, and are reinterpreted to integer. There may be potential for more optimisation if this is fixed. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2024-09-23 15:44:07 +01:00
Joe Ramsay	b09fee1d21	aarch64/fpu: Add vector variants of acosh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:32:58 +01:00

Author

SHA1

Message

Date

Joe Ramsay

5bc100bd4b

AArch64: Improve codegen in users of AdvSIMD log1pf helper

log1pf is quite register-intensive - use fewer registers for the
polynomial, and make various changes to shorten dependency chains in
parent routines.  There is now no spilling with GCC 14.  Accuracy moves
around a little - comments adjusted accordingly but does not require
regen-ulps.

Use the helper in log1pf as well, instead of having separate
implementations.  The more accurate polynomial means special-casing can
be simplified, and the shorter dependency chain avoids the usual dance
around v0, which is otherwise difficult.

There is a small duplication of vectors containing 1.0f (or 0x3f800000) -
GCC is not currently able to efficiently handle values which fit in FMOV
but not MOVI, and are reinterpreted to integer.  There may be potential
for more optimisation if this is fixed.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>

2024-09-23 15:44:07 +01:00

Joe Ramsay

b09fee1d21

aarch64/fpu: Add vector variants of acosh

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

2024-04-04 10:32:58 +01:00

2 Commits