This operation can be simplified to use simpler multiply-round-convert
sequence, which uses fewer instructions and constants.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
This includes a fix for big-endian in AdvSIMD log, some cosmetic
changes, and numerous small optimisations mainly around inlining and
using indexed variants of MLA intrinsics.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
* Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
* Improve register use near special-case branch
Also use overloaded intrinsics for SVE.
Remove the unnecessary extra checks for sin (-0.0) from vector sin/sinf,
improving performance. Passes regress.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines.
As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>