* Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
* Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
* Improve register use near special-case branch
Also use overloaded intrinsics for SVE.
Optimised implementations for single and double precision, Advanced
SIMD and SVE, copied from Arm Optimized Routines.
As previously, data tables are used via a barrier to prevent
overly aggressive constant inlining. Special-case handlers are
marked NOINLINE to avoid incurring the penalty of switching call
standards unnecessarily.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>