mirror of
https://sourceware.org/git/glibc.git
synced 2024-11-30 00:31:08 +00:00
9fc813e1a3
89 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Florian Weimer
|
9fc813e1a3 |
Implement <unwind-link.h> for dynamically loading the libgcc_s unwinder
This will be used to consolidate the libgcc_s access for backtrace and pthread_cancel. Unlike the existing backtrace implementations, it provides some hardening based on pointer mangling. Reviewed-by: Carlos O'Donell <carlos@redhat.com> |
||
Paul Eggert
|
2b778ceb40 |
Update copyright dates with scripts/update-copyrights
I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: *** pre-commit check failed ... remote: *** error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master |
||
Adhemerval Zanella
|
be668a8d78 |
New exp10f version without SVID compat wrapper
This patch changes the exp10f error handling semantics to only set errno according to POSIX rules. New symbol version is introduced at GLIBC_2.32. The old wrappers are kept for compat symbols. There are some outliers that need special handling: - ia64 provides an optimized implementation of exp10f that uses ia64 specific routines to set SVID compatibility. The new symbol version is aliased to the exp10f one. - m68k also provides an optimized implementation, and the new version uses it instead of the sysdeps/ieee754/flt32 one. - riscv and csky uses the generic template implementation that does not provide SVID support. For both cases a new exp10f version is not added, but rather the symbols version of the generic sysdeps/ieee754/flt32 is adjusted instead. Checked on aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu, powerpc64le-linux-gnu. |
||
Adhemerval Zanella
|
1c15464ca0 |
math: Remove inline math tests
With mathinline removal there is no need to keep building and testing inline math tests. The gen-libm-tests.py support to generate ULP_I_* is removed and all libm-test-ulps files are updated to longer have the i{float,double,ldouble} entries. The support for no-test-inline is also removed from both gen-auto-libm-tests and the auto-libm-test-out-* were regenerated. Checked on x86_64-linux-gnu and i686-linux-gnu. |
||
Adhemerval Zanella
|
4bad2e014e |
m68k: Remove mathinline.h
This is similar to x86 ( |
||
Tulio Magno Quites Machado Filho
|
c624d23260 |
Add a generic scalb implementation
This is a preparatory patch to enable building a _Float128 variant to ease reuse when building a _Float128 variant to alias this long double only symbol. Notably, stubs are added where missing to the native _Float128 sysdep dir to prevent building these newly templated variants created inside the build directories. Also noteworthy are the changes around LIBM_SVID_COMPAT. These changes are not intuitive. The templated version is only enabled when !LIBM_SVID_COMPAT, and the compat version is predicated entirely on LIBM_SVID_COMPAT. Thus, exactly one is stubbed out entirely when building. The nldbl scalb compat files are updated to account for this. Likewise, fixup the reuse of m68k's e_scalb{f,l}.c to include it's override of e_scalb.c. Otherwise, the search path finds the templated copy in the build directory. This could be futher simplified by providing an overridden template, but I lack the hardware to verify. |
||
Wilco Dijkstra
|
220622dde5 |
Add libm_alias_finite for _finite symbols
This patch adds a new macro, libm_alias_finite, to define all _finite symbol. It sets all _finite symbol as compat symbol based on its first version (obtained from the definition at built generated first-versions.h). The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need special treatment in code that is shared between long double and float128. It is done by adding a list, similar to internal symbol redifinition, on sysdeps/ieee754/float128/float128_private.h. Alpha also needs some tricky changes to ensure we still emit 2 compat symbols for sqrt(f). Passes buildmanyglibc. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> |
||
Joseph Myers
|
d614a75396 | Update copyright dates with scripts/update-copyrights. | ||
Paul Eggert
|
5a82c74822 |
Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to cleanup: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S |
||
Joseph Myers
|
e0cb7b6131 |
Add and move fall-through comments in system-specific code.
This patch fixes -Wimplicit-fallthrough warnings in system-specific code that show up building glibc with -Wextra, by adding fall-through comments, or moving existing such comments to the place required for them to work (immediately before the case label being fallen through). Tested with build-many-glibcs.py. * sysdeps/i386/dl-machine.h (elf_machine_rela): Add fall-through comments. * sysdeps/m68k/m680x0/fpu/s_cexp_template.c (s(__cexp)): Likewise. * sysdeps/m68k/memcopy.h (WORD_COPY_FWD): Likewise. (WORD_COPY_BWD): Likewise. * sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise. * sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise. * sysdeps/s390/iso-8859-1_cp037_z900.c (TR_LOOP): Likewise. * sysdeps/mips/dl-machine.h (elf_machine_reloc): Move fall-through comment. * sysdeps/mips/dl-trampoline.c (__dl_runtime_resolve): Likewise. |
||
Joseph Myers
|
04277e02d7 |
Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise. |
||
Szabolcs Nagy
|
a502c5294b |
Remove the error handling wrapper from pow
Introduce new pow symbol version that doesn't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty w_pow.c and enabled for targets with their own pow implementation or ifunc dispatch on __ieee754_pow by including math/w_pow.c. The compatibility symbol version still uses the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously powl was an alias of pow, now it points to the compatibility symbol with the wrapper, because it still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The __pow_finite symbol is now an alias of pow. Both __pow_finite and pow set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header. Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add pow. * math/w_pow_compat.c (__pow_compat): Change to versioned compat symbol. * math/w_pow.c: New file. * sysdeps/i386/fpu/w_pow.c: New file. * sysdeps/ia64/fpu/e_pow.S: Add versioned symbols. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Rename to __pow and add necessary aliases. * sysdeps/ieee754/dbl-64/w_pow.c: New file. * sysdeps/m68k/m680x0/fpu/w_pow.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__ieee754_pow): Rename to __pow. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__ieee754_pow): Likewise. * sysdeps/x86_64/fpu/multiarch/e_pow.c (__ieee754_pow): Likewise. * sysdeps/x86_64/fpu/multiarch/w_pow.c: New file. |
||
Szabolcs Nagy
|
718d6542f2 |
Remove the error handling wrapper from log2
Introduce new log2 symbol version that doesn't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty w_log2.c and enabled for targets with their own log2 implementation by including math/w_log2.c. The compatibility symbol version still uses the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously log2l was an alias of log2, now it points to the compatibility symbol with the wrapper, because it still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The __log2_finite symbol is now an alias of log2. Both __log2_finite and log2 set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add log2. * math/w_log2_compat.c (__log2_compat): Change to versioned compat symbol. * math/w_log2.c: New file. * sysdeps/i386/fpu/w_log2.c: New file. * sysdeps/ia64/fpu/e_log2.S: Add versioned symbols. * sysdeps/ieee754/dbl-64/e_log2.c (__ieee754_log2): Rename to __log2 and add necessary aliases. * sysdeps/ieee754/dbl-64/w_log2.c: New file. * sysdeps/m68k/m680x0/fpu/w_log2.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. |
||
Szabolcs Nagy
|
f29b7c492d |
Remove the error handling wrapper from log
Introduce new log symbol version that doesn't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty w_log.c and enabled for targets with their own log implementation by including math/w_log.c. The compatibility symbol version still uses the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously logl was an alias of log, now it points to the compatibility symbol with the wrapper, because it still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g. arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The __log_finite symbol is now an alias of log. Both __log_finite and log set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header. Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add log. * math/w_log_compat.c (__log_compat): Change to versioned compat symbol. * math/w_log.c: New file. * sysdeps/i386/fpu/w_log.c: New file. * sysdeps/ia64/fpu/e_log.S: Update. * sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Rename to __log and add necessary aliases. * sysdeps/ieee754/dbl-64/w_log.c: New file. * sysdeps/m68k/m680x0/fpu/w_log.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_log-avx.c (__ieee754_log): Rename to __log. * sysdeps/x86_64/fpu/multiarch/e_log-fma.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/e_log-fma4.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/e_log.c (__ieee754_log): Likewise. * sysdeps/x86_64/fpu/multiarch/w_log.c: New file. |
||
Szabolcs Nagy
|
c20a10561a |
Remove the error handling wrapper from exp and exp2
Introduce new exp and exp2 symbol version that don't do SVID compatible error handling. The standard errno and fp exception based error handling is inline in the new code and does not have significant overhead. The double precision wrappers are disabled for sysdeps/ieee754/dbl-64 by using empty w_exp.c and w_exp2.c files, the math/w_exp.c and math/w_exp2.c files use the wrapper template and can be included by targets that have their own exp and exp2 implementations or use ifunc on the glibc internal __ieee754_exp symbol. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). On targets where previously expl and exp2l were aliases of exp and exp2, now they point to the compatibility symbols with the wrapper, because they still need the SVID compatible error handling. This affects NO_LONG_DOUBLE (e.g arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well. The _finite symbols are now aliases of the standard symbols (they have no performance advantage anymore). Both the standard symbols and _finite symbols set errno and thus not const functions. The ia64 asm is changed so the compat and new symbol versions map to the same address. On x86_64 #include <math.h> was added before macro definitions that may affect that header (the new macro name is __exp instead of __ieee754_exp which breaks some math.h macros). Tested with build-many-glibcs.py. * math/Versions (GLIBC_2.29): Add exp and exp2. * math/w_exp2_compat.c (__exp2_compat): Change to versioned compat symbol, handle NO_LONG_DOUBLE and LONG_DOUBLE_COMPAT explicitly. * math/w_exp_compat.c (__exp_compat): Likewise. * math/w_exp.c: New file. * math/w_exp2.c: New file. * sysdeps/i386/fpu/w_exp.c: New file. * sysdeps/i386/fpu/w_exp2.c: New file. * sysdeps/ia64/fpu/e_exp.S: Add versioned symbols. * sysdeps/ia64/fpu/e_exp2.S: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Rename to __exp and add necessary aliases. * sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Rename to __exp2 and add necessary aliases. * sysdeps/ieee754/dbl-64/w_exp.c: New file. * sysdeps/ieee754/dbl-64/w_exp2.c: New file. * sysdeps/m68k/m680x0/fpu/w_exp.c: New file. * sysdeps/m68k/m680x0/fpu/w_exp2.c: New file. * sysdeps/mach/hurd/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update. * sysdeps/unix/sysv/linux/alpha/libm.abilist: Update. * sysdeps/unix/sysv/linux/arm/libm.abilist: Update. * sysdeps/unix/sysv/linux/hppa/libm.abilist: Update. * sysdeps/unix/sysv/linux/i386/libm.abilist: Update. * sysdeps/unix/sysv/linux/ia64/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update. * sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update. * sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update. * sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update. * sysdeps/unix/sysv/linux/nios2/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update. * sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update. * sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update. * sysdeps/unix/sysv/linux/sh/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update. * sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update. * sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__exp1): Remove. (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/e_exp.c (__ieee754_exp): Rename to __exp. * sysdeps/x86_64/fpu/multiarch/w_exp.c: New file. |
||
Joseph Myers
|
7abf97bed9 |
Use trunc functions not __trunc functions in glibc libm.
Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __trunc functions to call the corresponding trunc names instead, with asm redirection to __trunc when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_truncf.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise. * sysdeps/ieee754/float128/s_truncf128.c: Likewise. * sysdeps/ieee754/dbl-64/s_trunc.c: Likewise. * sysdeps/ieee754/flt-32/s_truncf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise. * sysdeps/riscv/rvf/s_truncf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise. (ceil): Redirect to __ceil. (floor): Redirect to __floor. (trunc): Redirect to __trunc. (__truncl): Call trunc instead of __trunc. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc): Remove macro. [_ARCH_PWR5X] (__truncf): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use trunc functions instead of __trunc variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. |
||
Szabolcs Nagy
|
424c4f60ed |
Add new pow implementation
The algorithm is exp(y * log(x)), where log(x) is computed with about 1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result in two doubles, and the exp part uses the same algorithm (and lookup tables) as exp, but takes the input as two doubles and a sign (to handle negative bases with odd integer exponent). The __exp1 internal symbol is no longer necessary. There is separate code path when fma is not available but the worst case error is about 0.54 ULP in both cases. The lookup table and consts for log are 4168 bytes. The .rodata+.text is decreased by 37908 bytes on aarch64. The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1] pow latency: 1.84x in [0.01 11.1]x[0.01 11.1] Tested on aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets. * NEWS: Mention pow improvements. * math/Makefile (type-double-routines): Add e_pow_log_data. * sysdeps/generic/math_private.h (__exp1): Remove. * sysdeps/i386/fpu/e_pow_log_data.c: New file. * sysdeps/ia64/fpu/e_pow_log_data.c: New file. * sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma contraction. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove. (exp_inline): Remove. (__ieee754_exp): Only single double input is handled. * sysdeps/ieee754/dbl-64/e_pow.c: Rewrite. * sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file. * sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define. (__pow_log_data): Define. * sysdeps/ieee754/dbl-64/upow.h: Remove. * sysdeps/ieee754/dbl-64/upow.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file. * sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma contraction. (CFLAGS-e_pow-fma4.c): Likewise. |
||
Joseph Myers
|
71223ef909 |
Use ceil functions not __ceil functions in glibc libm.
Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __ceil functions to call the corresponding ceil names instead, with asm redirection to __ceil when the calls are not inlined. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_ceilf.c: Likewise. * sysdeps/ieee754/dbl-64/s_ceil.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise. * sysdeps/ieee754/float128/s_ceilf128.c: Likewise. * sysdeps/ieee754/flt-32/s_ceilf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise. * sysdeps/riscv/rvf/s_ceilf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil): Remove macro. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil functions instead of __ceil variants. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise. |
||
Joseph Myers
|
f29b6f17e4 |
Use rint functions not __rint functions in glibc libm.
Continuing the move to use, within libm, public names for libm functions that can be inlined as built-in functions on many architectures, this patch moves calls to __rint functions to call the corresponding rint names instead, with asm redirection to __rint when the calls are not inlined. The x86_64 math_private.h is removed as no longer useful after this patch. This patch is relative to a tree with my floor patch <https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied, and much the same considerations arise regarding possibly replacing an IFUNC call with a direct inline expansion. Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect using MATH_REDIRECT. * sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_rintf.c: Likewise. * sysdeps/alpha/fpu/s_rint.c: Likewise. * sysdeps/alpha/fpu/s_rintf.c: Likewise. * sysdeps/i386/fpu/s_rintl.c: Likewise. * sysdeps/ieee754/dbl-64/s_rint.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise. * sysdeps/ieee754/float128/s_rintf128.c: Likewise. * sysdeps/ieee754/flt-32/s_rintf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise. * sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise. * sysdeps/powerpc/fpu/s_rint.c: Likewise. * sysdeps/powerpc/fpu/s_rintf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_rint.c: Likewise. * sysdeps/riscv/rvf/s_rintf.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise. * sysdeps/x86_64/fpu/math_private.h: Remove file. * math/e_scalb.c (invalid_fn): Use rint functions instead of __rint variants. * math/e_scalbf.c (invalid_fn): Likewise. * math/e_scalbl.c (invalid_fn): Likewise. * sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r): Likewise. * sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise. * sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise. |
||
Joseph Myers
|
e44acb2063 |
Use floor functions not __floor functions in glibc libm.
Similar to the changes that were made to call sqrt functions directly in glibc, instead of __ieee754_sqrt variants, so that the compiler could inline them automatically without needing special inline definitions in lots of math_private.h headers, this patch makes libm code call floor functions directly instead of __floor variants, removing the inlines / macros for x86_64 (SSE4.1) and powerpc (POWER5). The redirection used to ensure that __ieee754_sqrt does still get called when the compiler doesn't inline a built-in function expansion is refactored so it can be applied to other functions; the refactoring is arranged so it's not limited to unary functions either (it would be reasonable to use this mechanism for copysign - removing the inline in math_private_calls.h but also eliminating unnecessary local PLT entry use in the cases (powerpc soft-float and e500v1, for IBM long double) where copysign calls don't get inlined). The point of this change is that more architectures can get floor calls inlined where they weren't previously (AArch64, for example), without needing special inline definitions in their math_private.h, and existing such definitions in math_private.h headers can be removed. Note that it's possible that in some cases an inline may be used where an IFUNC call was previously used - this is the case on x86_64, for example. I think the direct calls to floor are still appropriate; if there's any significant performance cost from inline SSE2 floor instead of an IFUNC call ending up with SSE4.1 floor, that indicates that either the function should be doing something else that's faster than using floor at all, or it should itself have IFUNC variants, or that the compiler choice of inlining for generic tuning should change to allow for the possibility that, by not inlining, an SSE4.1 IFUNC might be called at runtime - but not that glibc should avoid calling floor internally. (After all, all the same considerations would apply to any user program calling floor, where it might either be inlined or left as an out-of-line call allowing for a possible IFUNC.) Tested for x86_64, and with build-many-glibcs.py. * include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT): New macro. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT. [!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (floor): Likewise. * sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before header inclusion. * sysdeps/aarch64/fpu/s_floorf.c: Likewise. * sysdeps/ieee754/dbl-64/s_floor.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise. * sysdeps/ieee754/float128/s_floorf128.c: Likewise. * sysdeps/ieee754/flt-32/s_floorf.c: Likewise. * sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/riscv/rv64/rvd/s_floor.c: Likewise. * sysdeps/riscv/rvf/s_floorf.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise. * sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise. * sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor): Remove macro. [_ARCH_PWR5X] (__floorf): Likewise. * sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove inline function. [__SSE4_1__] (__floorf): Likewise. * math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions instead of __floor variants. * math/w_lgamma_r_compat.c (__lgamma_r): Likewise. * math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise. * math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise. * math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise. * math/w_lgammal_r_compat.c (__lgammal_r): Likewise. * math/w_tgamma_compat.c (__tgamma): Likewise. * math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise. * math/w_tgammaf_compat.c (__tgammaf): Likewise. * math/w_tgammal_compat.c (__tgammal): Likewise. * sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise. * sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2): Likewise. * sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise. * sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise. * sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise. * sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise. * sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise. * sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise. * sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise. * sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise. |
||
Szabolcs Nagy
|
3e08ff544b |
Add new log2 implementation
Similar algorithm is used as in log: log2(2^k x) = k + log2(c) + log2(x/c) where the last term is approximated by a polynomial of x/c - 1, the first order coefficient is about 1/ln2 in this case. There is separate code path when fma instruction is not available for computing x/c - 1 precisely, for which the table size is doubled. The worst case error is 0.547 ULP (0.55 without fma), the read only global data size is 1168 bytes (2192 without fma) on aarch64. The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: log2 thruput: 2.00x in [0.01 11.1] log2 latency: 2.04x in [0.01 11.1] log2 thruput: 2.17x in [0.999 1.001] log2 latency: 2.88x in [0.999 1.001] Tested on aarch64-linux-gnu (defined __FP_FAST_FMA) arm-linux-gnueabihf (!defined __FP_FAST_FMA) x86_64-linux-gnu (!defined __FP_FAST_FMA) powerpc64le-linxu-gnu (defined __FP_FAST_FMA) targets. * NEWS: Mention log2 improvements. * math/Makefile (type-double-routines): Add e_log2_data. * sysdeps/i386/fpu/e_log2_data.c: New file. * sysdeps/ia64/fpu/e_log2_data.c: New file. * sysdeps/ieee754/dbl-64/e_log2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_log2_data.c: New file. * sysdeps/ieee754/dbl-64/math_config.h (__log2_data): Add. * sysdeps/ieee754/dbl-64/wordsize-64/e_log2.c: Remove. * sysdeps/m68k/m680x0/fpu/e_log2_data.c: New file. |
||
Szabolcs Nagy
|
f41b0a43e4 |
Add new log implementation
Optimized log using carefully generated lookup table with 1/c and log(c) values for small intervalls around 1. The log(c) is very near a double precision value, it has about 62 bits precision. The algorithm is log(2^k x) = k log(2) + log(c) + log(x/c), where the last term is approximated by a polynomial of x/c - 1. Near 1 a single polynomial of x - 1 is used. There is separate code path when fma instruction is not available for computing x/c - 1 precisely, in which case the table size is doubled. The code uses __builtin_fma under __FP_FAST_FMA to ensure it is inlined as an instruction. With the default configuration settings the worst case error is 0.519 ULP (and 0.520 without fma), the rodata size is 2192 bytes (4240 without fma). The non-nearest rounding error is less than 1 ULP. Improvements on Cortex-A72 compared to current glibc master: log thruput: 3.28x in [0.01 11.1] log latency: 2.23x in [0.01 11.1] log thruput: 1.56x in [0.999 1.001] log latency: 1.57x in [0.999 1.001] Tested on aarch64-linux-gnu (defined __FP_FAST_FMA) arm-linux-gnueabihf (!defined __FP_FAST_FMA) x86_64-linux-gnu (!defined __FP_FAST_FMA) powerpc64le-linux-gnu (defined __FP_FAST_FMA) targets. * NEWS: Mention log improvement. * math/Makefile (type-double-routines): Add e_log_data. * sysdeps/i386/fpu/e_log_data.c: New file. * sysdeps/ia64/fpu/e_log_data.c: New file. * sysdeps/ieee754/dbl-64/e_log.c: Rewrite. * sysdeps/ieee754/dbl-64/e_log_data.c: New file. * sysdeps/ieee754/dbl-64/math_config.h (__log_data): Add. * sysdeps/ieee754/dbl-64/ulog.h: Remove. * sysdeps/ieee754/dbl-64/ulog.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_log_data.c: New file. |
||
Szabolcs Nagy
|
e70c176825 |
Add new exp and exp2 implementations
Optimized exp and exp2 implementations using a lookup table for fractional powers of 2. There are several variants, see e_exp_data.c, they can be selected by modifying math_config.h allowing different tradeoffs. The default selection should be acceptable as generic libm code. Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on aarch64 the rodata size is 2160 bytes, shared between exp and exp2. On aarch64 .text + .rodata size decreased by 24912 bytes. The non-nearest rounding error is less than 1 ULP even on targets without efficient round implementation (although the error rate is higher in that case). Targets with single instruction, rounding mode independent, to nearest integer rounding and conversion can use them by setting TOINT_INTRINSICS and adding the necessary code to their math_private.h. The __exp1 code uses the same algorithm, so the error bound of pow increased a bit. New double precision error handling code was added following the style of the single precision error handling code. Improvements on Cortex-A72 compared to current glibc master: exp thruput: 1.61x in [-9.9 9.9] exp latency: 1.53x in [-9.9 9.9] exp thruput: 1.13x in [0.5 1] exp latency: 1.30x in [0.5 1] exp2 thruput: 2.03x in [-9.9 9.9] exp2 latency: 1.64x in [-9.9 9.9] For small (< 1) inputs the current exp code uses a separate algorithm so the speed up there is less. Was tested on aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets, only non-nearest rounding ulp errors increase and they are within acceptable bounds (ulp updates are in separate patches). * NEWS: Mention exp and exp2 improvements. * math/Makefile (libm-support): Remove t_exp. (type-double-routines): Add math_err and e_exp_data. * sysdeps/aarch64/libm-test-ulps: Update. * sysdeps/arm/libm-test-ulps: Update. * sysdeps/i386/fpu/e_exp_data.c: New file. * sysdeps/i386/fpu/math_err.c: New file. * sysdeps/i386/fpu/t_exp.c: Remove. * sysdeps/ia64/fpu/e_exp_data.c: New file. * sysdeps/ia64/fpu/math_err.c: New file. * sysdeps/ia64/fpu/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/e_exp.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite. * sysdeps/ieee754/dbl-64/e_exp_data.c: New file. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound. * sysdeps/ieee754/dbl-64/eexp.tbl: Remove. * sysdeps/ieee754/dbl-64/math_config.h: New file. * sysdeps/ieee754/dbl-64/math_err.c: New file. * sysdeps/ieee754/dbl-64/t_exp.c: Remove. * sysdeps/ieee754/dbl-64/t_exp2.h: Remove. * sysdeps/ieee754/dbl-64/uexp.h: Remove. * sysdeps/ieee754/dbl-64/uexp.tbl: Remove. * sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_err.c: New file. * sysdeps/m68k/m680x0/fpu/t_exp.c: Remove. * sysdeps/powerpc/fpu/libm-test-ulps: Update. * sysdeps/x86_64/fpu/libm-test-ulps: Update. |
||
Wilco Dijkstra
|
ca3aac57ef |
Remove unused math files
Remove empty files due to the sin/cos improvements: k_sinf.c, k_cosf.c, k_cos.c, k_sin.c. After the tanf change s_rem_pio2f.c and k_rem_pio2f.c (and the ia64, m68k and powerpc equivalents) are no longer used, so remove them. All e_rem_pio2.c files were already empty or commented out, so remove them too. Passes build-many-glibcs. * math/Makefile: Remove empty files k_sin(f).c, k_cos(f).c. Remove unused files e_rem_pio2(f).c, k_rem_pio2f.c. * sysdeps/i386/fpu/e_rem_pio2.c: Delete file. * sysdeps/ia64/fpu/e_rem_pio2.c: Likewise. * sysdeps/ia64/fpu/e_rem_pio2f.c: Likewise. * sysdeps/ia64/fpu/k_rem_pio2f.c: Likewise. * sysdeps/ieee754/dbl-64/e_rem_pio2.c: Likewise. * sysdeps/ieee754/dbl-64/k_cos.c: Likewise. * sysdeps/ieee754/dbl-64/k_sin.c: Likewise. * sysdeps/ieee754/flt-32/e_rem_pio2f.c: Likewise. * sysdeps/ieee754/flt-32/k_cosf.c: Likewise. * sysdeps/ieee754/flt-32/k_rem_pio2f.c: Likewise. * sysdeps/ieee754/flt-32/k_sinf.c: Likewise. * sysdeps/m68k/m680x0/fpu/e_rem_pio2.c: Likewise * sysdeps/m68k/m680x0/fpu/e_rem_pio2f.c: Likewise * sysdeps/m68k/m680x0/fpu/k_rem_pio2f.c: Likewise * sysdeps/powerpc/fpu/e_rem_pio2f.c: Likewise. * sysdeps/powerpc/fpu/k_rem_pio2f.c: Likewise. |
||
Wilco Dijkstra
|
ea5c662c62 |
Improve performance of sincosf
This patch is a complete rewrite of sincosf. The new version is significantly faster, as well as simple and accurate. The worst-case ULP is 0.5607, maximum relative error is 0.5303 * 2^-23 over all 4 billion inputs. In non-nearest rounding modes the error is 1ULP. The algorithm uses 3 main cases: small inputs which don't need argument reduction, small inputs which need a simple range reduction and large inputs requiring complex range reduction. The code uses approximate integer comparisons to quickly decide between these cases. The small range reducer uses a single reduction step to handle values up to 120.0. It is fastest on targets which support inlined round instructions. The large range reducer uses integer arithmetic for simplicity. It does a 32x96 bit multiply to compute a 64-bit modulo result. This is more than accurate enough to handle the worst-case cancellation for values close to an integer multiple of PI/4. It could be further optimized, however it is already much faster than necessary. sincosf throughput gains on Cortex-A72: * |x| < 0x1p-12 : 1.6x * |x| < M_PI_4 : 1.7x * |x| < 2 * M_PI: 1.5x * |x| < 120.0 : 1.8x * |x| < Inf : 2.3x * math/Makefile: Add s_sincosf_data.c. * sysdeps/ia64/fpu/s_sincosf_data.c: New file. * sysdeps/ieee754/flt-32/s_sincosf.h (abstop12): Add new function. (sincosf_poly): Likewise. (reduce_small): Likewise. (reduce_large): Likewise. * sysdeps/ieee754/flt-32/s_sincosf.c (sincosf): Rewrite. * sysdeps/ieee754/flt-32/s_sincosf_data.c: New file with sincosf data. * sysdeps/m68k/m680x0/fpu/s_sincosf_data.c: New file. * sysdeps/x86_64/fpu/s_sincosf_data.c: New file. |
||
Tulio Magno Quites Machado Filho
|
d93f4ff16b |
m68k: Reorganize log1p and significand implementations
Commit
|
||
Joseph Myers
|
b4d5b8b021 |
Do not include math-barriers.h in math_private.h.
This patch continues the math_private.h cleanup by stopping math_private.h from including math-barriers.h and making the users of the barrier macros include the latter header directly. No attempt is made to remove any math_private.h includes that are now unused, except in strtod_l.c where that is done to avoid line number changes in assertions, so that installed stripped shared libraries can be compared before and after the patch. (I think the floating-point environment support in math_private.h should also move out - some architectures already have fenv_private.h as an architecture-internal header included from their math_private.h - and after moving that out might be a better time to identify unused math_private.h includes.) Tested for x86_64 and x86, and tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by the patch. * sysdeps/generic/math_private.h: Do not include <math-barriers.h>. * stdlib/strtod_l.c: Include <math-barriers.h> instead of <math_private.h>. * math/fromfp.h: Include <math-barriers.h>. * math/math-narrow.h: Likewise. * math/s_nextafter.c: Likewise. * math/s_nexttowardf.c: Likewise. * sysdeps/aarch64/fpu/s_llrint.c: Likewise. * sysdeps/aarch64/fpu/s_llrintf.c: Likewise. * sysdeps/aarch64/fpu/s_lrint.c: Likewise. * sysdeps/aarch64/fpu/s_lrintf.c: Likewise. * sysdeps/i386/fpu/s_nextafterl.c: Likewise. * sysdeps/i386/fpu/s_nexttoward.c: Likewise. * sysdeps/i386/fpu/s_nexttowardf.c: Likewise. * sysdeps/ieee754/dbl-64/e_atan2.c: Likewise. * sysdeps/ieee754/dbl-64/e_atanh.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp2.c: Likewise. * sysdeps/ieee754/dbl-64/e_j0.c: Likewise. * sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise. * sysdeps/ieee754/dbl-64/s_expm1.c: Likewise. * sysdeps/ieee754/dbl-64/s_fma.c: Likewise. * sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise. * sysdeps/ieee754/dbl-64/s_log1p.c: Likewise. * sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise. * sysdeps/ieee754/flt-32/e_atanhf.c: Likewise. * sysdeps/ieee754/flt-32/e_j0f.c: Likewise. * sysdeps/ieee754/flt-32/s_expm1f.c: Likewise. * sysdeps/ieee754/flt-32/s_log1pf.c: Likewise. * sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise. * sysdeps/ieee754/flt-32/s_nextafterf.c: Likewise. * sysdeps/ieee754/k_standardl.c: Likewise. * sysdeps/ieee754/ldbl-128/e_asinl.c: Likewise. * sysdeps/ieee754/ldbl-128/e_expl.c: Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nextafterl.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nexttoward.c: Likewise. * sysdeps/ieee754/ldbl-128/s_nexttowardf.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c: Likewise. * sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise. * sysdeps/ieee754/ldbl-96/e_atanhl.c: Likewise. * sysdeps/ieee754/ldbl-96/e_j0l.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fma.c: Likewise. * sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise. * sysdeps/ieee754/ldbl-96/s_nexttoward.c: Likewise. * sysdeps/ieee754/ldbl-96/s_nexttowardf.c: Likewise. * sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_nextafterl.c: Likewise. |
||
Joseph Myers
|
9ed2e15ff4 |
Move math_opt_barrier, math_force_eval to separate math-barriers.h.
This patch continues cleaning up math_private.h by moving the math_opt_barrier and math_force_eval macros to a separate header math-barriers.h. At present, those macros are inside a "#ifndef math_opt_barrier" in math_private.h to allow architectures to override them and then use a separate math-barriers.h header, no such #ifndef or #include_next is needed; architectures just have their own alternative version of math-barriers.h when providing their own optimized versions that avoid going through memory unnecessarily. The generic math-barriers.h has a comment added to document these two macros. In this patch, math_private.h is made to #include <math-barriers.h>, so files using these macros do not need updating yet. That is because of uses of math_force_eval in math_check_force_underflow and math_check_force_underflow_nonneg, which are still defined in math_private.h. Once those are moved out to a separate header, that separate header can be made to include <math-barriers.h>, as can the other files directly using these barrier macros, and then the include of <math-barriers.h> from math_private.h can be removed. Tested for x86_64 and x86. Also tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by this patch. * sysdeps/generic/math-barriers.h: New file. * sysdeps/generic/math_private.h [!math_opt_barrier] (math_opt_barrier): Move to math-barriers.h. [!math_opt_barrier] (math_force_eval): Likewise. * sysdeps/aarch64/fpu/math-barriers.h: New file. * sysdeps/aarch64/fpu/math_private.h (math_opt_barrier): Move to math-barriers.h. (math_force_eval): Likewise. * sysdeps/alpha/fpu/math-barriers.h: New file. * sysdeps/alpha/fpu/math_private.h (math_opt_barrier): Move to math-barriers.h. (math_force_eval): Likewise. * sysdeps/x86/fpu/math-barriers.h: New file. * sysdeps/i386/fpu/fenv_private.h (math_opt_barrier): Move to math-barriers.h. (math_force_eval): Likewise. * sysdeps/m68k/m680x0/fpu/math_private.h: Move to.... * sysdeps/m68k/m680x0/fpu/math-barriers.h: ... here. Adjust multiple-include guard for rename. * sysdeps/powerpc/fpu/math-barriers.h: New file. * sysdeps/powerpc/fpu/math_private.h (math_opt_barrier): Move to math-barriers.h. (math_force_eval): Likewise. |
||
Wilco Dijkstra
|
22679b2c33 |
Revert m68k __ieee754_sqrt change
Revert m68k __ieee754_sqrt change as it causes a build failure in one m68k configuration. m68k-linux-gnu now passes again. * sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Revert previous commit. |
||
Wilco Dijkstra
|
700593fdd7 |
Remove all target specific __ieee754_sqrt(f/l) inlines
Remove the now unused target specific__ieee754_sqrt(f/l) inlines. Also remove inlines of sqrt which are for really old GCC versions. Removing these is desirable, under the general principle of leaving such inlining to the compiler rather than trying to do it in installed headers, especially when only very old compilers are affected. Note that removing inlines for __ieee754_sqrt disables inlining in the sqrt wrapper functions. Given the sqrt function will typically only be called for negative arguments, it doesn't matter whether the inlining happens or not. * sysdeps/aarch64/fpu/math_private.h (__ieee754_sqrt): Remove. (__ieee754_sqrtf): Remove. * sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Remove. (__ieee754_sqrtf): Remove. * sysdeps/generic/math-type-macros.h (M_SQRT): Use sqrt. * sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove. * sysdeps/powerpc/fpu/math_private.h (__ieee754_sqrt): Remove. (__ieee754_sqrtf): Remove. * sysdeps/s390/fpu/bits/mathinline.h: Remove file. * sysdeps/sparc/fpu/bits/mathinline.h (sqrt) Remove. (sqrtf): Remove. (sqrtl): Remove. (__ieee754_sqrt): Remove. (__ieee754_sqrtf): Remove. (__ieee754_sqrtl): Remove. * sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove. * sysdeps/x86/fpu/math_private.h (__ieee754_sqrt): Remove. * sysdeps/x86_64/fpu/math_private.h (__ieee754_sqrt): Remove. (__ieee754_sqrtf): Remove. (__ieee754_sqrtl): Remove. |
||
Wilco Dijkstra
|
f67a8147b0 |
Rename all __ieee754_sqrt(f/l) calls to sqrt(f/l)
Use sqrt(f/l) to enable inlining by GCC - if inlining doesn't happen, the asm redirect ensures we will still call __ieee754_sqrt(f/l). * sysdeps/ieee754/dbl-64/e_acosh.c (__ieee754_acosh): Use sqrt. * sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Likewise. * sysdeps/ieee754/dbl-64/e_hypot.c (__ieee754_hypot): Likewise. * sysdeps/ieee754/dbl-64/e_j0.c (__ieee754_j0): Likewise. * sysdeps/ieee754/dbl-64/e_j1.c (__ieee754_j1): Likewise. * sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Likewise. * sysdeps/ieee754/dbl-64/s_asinh.c (__asinh): Likewise. * sysdeps/ieee754/dbl-64/wordsize-64/e_acosh.c (__ieee754_acosh): Likewise. * sysdeps/ieee754/flt-32/e_acosf.c (__ieee754_acosf): Likewise. * sysdeps/ieee754/flt-32/e_acoshf.c (__ieee754_acoshf): Likewise. * sysdeps/ieee754/flt-32/e_asinf.c (__ieee754_asinf): Likewise. * sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise. * sysdeps/ieee754/flt-32/e_hypotf.c (__ieee754_hypotf): Likewise. * sysdeps/ieee754/flt-32/e_j0f.c (__ieee754_j0f): Likewise. * sysdeps/ieee754/flt-32/e_j1f.c (__ieee754_j1f): Likewise. * sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Likewise. * sysdeps/ieee754/flt-32/s_asinhf.c (__asinhf): Likewise. * sysdeps/ieee754/ldbl-128/e_acoshl.c (__ieee754_acoshl): Use sqrtl. * sysdeps/ieee754/ldbl-128/e_acosl.c (__ieee754_acosl): Likewise. * sysdeps/ieee754/ldbl-128/e_asinl.c (__ieee754_asinl): Likewise. * sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128/e_hypotl.c (__ieee754_hypotl): Likewise. * sysdeps/ieee754/ldbl-128/e_j0l.c (__ieee754_j0l): Likewise. * sysdeps/ieee754/ldbl-128/e_j1l.c (__ieee754_j1l): Likewise. * sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise. * sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128/s_asinhl.c (__ieee754_asinhl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_acoshl.c (__ieee754_acoshl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_acosl.c (__ieee754_acosl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_asinl.c (__ieee754_asinl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_hypotl.c (__ieee754_hypotl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_j0l.c (__ieee754_j0l): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_j1l.c (__ieee754_j1l): Likewise * sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise. * sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise. * sysdeps/ieee754/ldbl-128ibm/s_asinhl.c (__ieee754_asinhl): Likewise. * sysdeps/ieee754/ldbl-96/e_acoshl.c (__ieee754_acoshl): Use sqrtl. * sysdeps/ieee754/ldbl-96/e_asinl.c (__ieee754_asinl): Likewise. * sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Likewise. * sysdeps/ieee754/ldbl-96/e_hypotl.c (__ieee754_hypotl): Likewise. * sysdeps/ieee754/ldbl-96/e_j0l.c (__ieee754_j0l): Likewise. * sysdeps/ieee754/ldbl-96/e_j1l.c (__ieee754_j1l): Likewise. * sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise. * sysdeps/ieee754/ldbl-96/s_asinhl.c (__ieee754_asinhl): Likewise. * sysdeps/m68k/m680x0/fpu/e_pow.c (__ieee754_pow): Likewise. * sysdeps/powerpc/fpu/e_hypot.c (__ieee754_hypot): Likewise. * sysdeps/powerpc/fpu/e_hypotf.c (__ieee754_hypotf): Likewise. |
||
Wilco Dijkstra
|
610ee1fc93 |
Remove mplog and mpexp
Remove the now unused mplog and mpexp files. * math/Makefile: Remove mpexp.c and mplog.c * sysdeps/i386/fpu/mpexp.c: Delete file. * sysdeps/i386/fpu/mplog.c: Likewise. * sysdeps/ia64/fpu/mpexp.c: Likewise. * sysdeps/ia64/fpu/mplog.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c: Remove mention of mpexp and mplog. * sysdeps/ieee754/dbl-64/mpa.h (__pow_mp): Remove unused function. * sysdeps/ieee754/dbl-64/mpexp.c: Delete file. * sysdeps/ieee754/dbl-64/mplog.c: Likewise. * sysdeps/m68k/m680x0/fpu/mpexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/mplog.c: Likewise. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove mpexp* and mplog*. * sysdeps/x86_64/fpu/multiarch/e_log-avx.c: Remove unused defines. * sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/e_log-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpexp-avx.c: Delete file. * sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mpexp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mplog-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/mplog-fma4.c: Likewise. |
||
Szabolcs Nagy
|
de800d8305 |
Remove slow paths from exp
Remove the __slowexp code, so exp is no longer correctly rounded. The result is computed to about 70 bits precision so the worst case ulp error is about 0.500007 in nearest rounding mode. * manual/probes.texi: Remove slowexp probes. * math/Makefile: Remove slowexp. * sysdeps/generic/math_private.h (__slowexp): Remove. * sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and document error bounds. * sysdeps/i386/fpu/slowexp.c: Remove. * sysdeps/ia64/fpu/slowexp.c: Remove. * sysdeps/ieee754/dbl-64/slowexp.c: Remove. * sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove. * sysdeps/m68k/m680x0/fpu/slowexp.c: Remove. * sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove. |
||
Wilco Dijkstra
|
c3d466cba1 |
Remove slow paths from pow
Remove the slow paths from pow. Like several other double precision math functions, pow is exactly rounded. This is not required from math functions and causes major overheads as it requires multiple fallbacks using higher precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns of up to 100000x have been reported when the highest precision path triggers. All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1). The worst case error is ~0.506ULP. A simple test over a few hundred million values shows pow is 10% faster on average. This fixes BZ #13932. [BZ #13932] * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove. * benchtests/pow-inputs: Update comment for slow path cases. * manual/probes.texi (slowpow_p10): Delete removed probe. (slowpow_p10): Likewise. * math/Makefile: Remove halfulp.c and slowpow.c. * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1. * sysdeps/generic/math_private.h (__exp1): Remove error argument. (__halfulp): Remove. (__slowpow): Remove. * sysdeps/i386/fpu/halfulp.c: Delete file. * sysdeps/i386/fpu/slowpow.c: Likewise. * sysdeps/ia64/fpu/halfulp.c: Likewise. * sysdeps/ia64/fpu/slowpow.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument, improve comments and add error analysis. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis. (power1): Remove function: (log1): Remove error argument, add error analysis. (my_log2): Remove function. * sysdeps/ieee754/dbl-64/halfulp.c: Delete file. * sysdeps/ieee754/dbl-64/slowpow.c: Likewise. * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise. * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c. * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c, slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file. * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise. |
||
Joseph Myers
|
688903eb3e |
Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise. |
||
Joseph Myers
|
f1e005022e |
Revert exp reimplementation (causes test failures).
Revert: 2017-12-19 Joseph Myers <joseph@codesourcery.com> * sysdeps/x86_64/fpu/libm-test-ulps: Update. 2017-12-19 Patrick McGehearty <patrick.mcgehearty@oracle.com> * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and <errno.h>. Include "eexp.tbl". (half): New constant. (one): Likewise. (__ieee754_exp): Rewrite. (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/eexp.tbl: New file. * sysdeps/ieee754/dbl-64/slowexp.c: Remove file. * sysdeps/i386/fpu/slowexp.c: Likewise. * sysdeps/ia64/fpu/slowexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise. * sysdeps/generic/math_private.h (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in comment. * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math] (CPPFLAGS-slowexp.c): Remove variable. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove slowexp-fma, slowexp-fma4 and slowexp-avx. (CFLAGS-slowexp-fma.c): Remove variable. (CFLAGS-slowexp-fma4.c): Likewise. (CFLAGS-slowexp-avx.c): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define as macro. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise. * math/Makefile (type-double-routines): Remove slowexp. * manual/probes.texi (slowexp_p6): Remove. (slowexp_p32): Likewise. |
||
Patrick McGehearty
|
6fd0a3c6a8 |
Improve __ieee754_exp() performance by greater than 5x on sparc/x86.
These changes will be active for all platforms that don't provide their own exp() routines. They will also be active for ieee754 versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and erf. Typical performance gains is typically around 5x when measured on Sparc s7 for common values between exp(1) and exp(40). Using the glibc perf tests on sparc, sparc (nsec) x86 (nsec) old new old new max 17629 395 5173 144 min 399 54 15 13 mean 5317 200 1349 23 The extreme max times for the old (ieee754) exp are due to the multiprecision computation in the old algorithm when the true value is very near 0.5 ulp away from an value representable in double precision. The new algorithm does not take special measures for those cases. The current glibc exp perf tests overrepresent those values. Informal testing suggests approximately one in 200 cases might invoke the high cost computation. The performance advantage of the new algorithm for other values is still large but not as large as indicated by the chart above. Glibc correctness tests for exp() and expf() were run. Within the test suite 3 input values were found to cause 1 bit differences (ulp) when "FE_TONEAREST" rounding mode is set. No differences in exp() were seen for the tested values for the other rounding modes. Typical example: exp(-0x1.760cd2p+0) (-1.46113312244415283203125) new code: 2.31973271630014299393707e-01 0x1.db14cd799387ap-3 old code: 2.31973271630014271638132e-01 0x1.db14cd7993879p-3 exp = 2.31973271630014285508337 (high precision) Old delta: off by 0.49 ulp New delta: off by 0.51 ulp In addition, because ieee754_exp() is used by other routines, cexp() showed test results with very small imaginary input values where the imaginary portion of the result was off by 3 ulp when in upward rounding mode, but not in the other rounding modes. For x86, tgamma showed a few values where the ulp increased to 6 (max ulp for tgamma is 5). Sparc tgamma did not show these failures. I presume the tgamma differences are due to compiler optimization differences within the gamma function.The gamma function is known to be difficult to compute accurately. * sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and <errno.h>. Include "eexp.tbl". (half): New constant. (one): Likewise. (__ieee754_exp): Rewrite. (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/eexp.tbl: New file. * sysdeps/ieee754/dbl-64/slowexp.c: Remove file. * sysdeps/i386/fpu/slowexp.c: Likewise. * sysdeps/ia64/fpu/slowexp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise. * sysdeps/generic/math_private.h (__slowexp): Remove prototype. * sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in comment. * sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math] (CPPFLAGS-slowexp.c): Remove variable. * sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines): Remove slowexp-fma, slowexp-fma4 and slowexp-avx. (CFLAGS-slowexp-fma.c): Remove variable. (CFLAGS-slowexp-fma4.c): Likewise. (CFLAGS-slowexp-avx.c): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not define as macro. * sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise. * sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise. * math/Makefile (type-double-routines): Remove slowexp. * manual/probes.texi (slowexp_p6): Remove. (slowexp_p32): Likewise. |
||
Joseph Myers
|
6642518592 |
Fix m68k bits/mathinline.h attributes (bug 22631).
m68k bits/mathinline.h declares various functions with const attributes. These are inappropriate for functions that have results depending on the rounding mode; the machine-independent bits/mathcalls.h only uses const attributes for a very few functions with no rounding mode dependence, and the m68k header should do likewise. GCC uses pure for such functions with -frounding-math, resulting in GCC mainline warning for conflicts with between the header and the built-in attributes and glibc failing to build for m68k with GCC mainline. This patch fixes the attributes to avoid using const except when bits/mathcalls.h does so. (There are a few functions where maybe bits/mathcalls.h could do so but doesn't, but keeping the headers in sync in this regard seems to be the safe approach.) Tested compilation with build-many-glibcs.py with GCC mainline. [BZ #22631] * sysdeps/m68k/m680x0/fpu/bits/mathinline.h (__m81_defun): Add argument for attrubutes. All callers changed. (__inline_mathop1): Likewise. All callers changed. (__inline_mathop): Likewise. All callers changed. [__USE_MISC] (scalbn): Use __inline_forward instead of __inline_forward_c. [__USE_ISOC99] (scalbln): Likewise. [__USE_ISOC99] (nearbyint): Likewise. [__USE_ISOC99] (lrint): Likewise. [__USE_MISC] (scalbnf): Likewise. [__USE_ISOC99] (scalblnf): Likewise. [__USE_ISOC99] (nearbyintf): Likewise. [__USE_ISOC99] (lrintf): Likewise. [__USE_MISC] (scalbnl): Likewise. [__USE_ISOC99] (scalblnl): Likewise. [__USE_ISOC99] (nearbyintl): Likewise. [__USE_ISOC99] (lrintl): Likewise. * sysdeps/m68k/m680x0/fpu/mathimpl.h: All callers of __inline_mathop and __m81_defun changed. |
||
Joseph Myers
|
e53df1dee8 |
Rework m68k libm functions to use declare_mgen_alias.
Many m68k libm functions use their own system to share code between different types and functions, involving defining macros before including code for another function (for example, s_atan.c also acts as a template that can define other functions). Thes files serving as templates generate function aliases directly with e.g. "weak_alias (__CONCATX(__,FUNC), FUNC)" in s_atan.c. To be prepared to generate _Float32, _Float64 and _Float32x function aliases, this needs changing so that the libm_alias_* macros get used instead. As the macro to use varies depending on the type, that would mean additional macros to define in several different places to get the appropriate alias-generation macro used in each case. Rather than adding to the m68k-specific mechanisms, this patch converts the functions in question to use something closer to the math/ type-generic template mechanism. After this patch, these functions have m68k-specific templates such as s_atan_template.c, but those templates use all the same macros as in the math/ templates, such as FLOAT, M_DECL_FUNC, M_SUF and declare_mgen_alias. There is no automatic generation of the files such as s_atan.c that include the appropriate math-type-macros-*.h header and the template file (the existing automatic generation logic is only applicable for the fixed set of templates listed in math/ - and sysdeps sources always override files generated that way), so those files are still checked in, but they are all the obvious two-line files (with one additional definition in the case of the expm1 implementations), rather than making e.g. s_atan.c special. Functions are only converted where they should have aliases for _FloatN / _FloatNx types. Those m68k functions that do not generate public names (those that only generate __ieee754_*, with wrappers generating the public names, and classification functions that only exist once per format not once per type so don't get aliases) are unchanged. However, log1p (public names generated by wrapper) and significand (not provided for new types so no new aliases needed) needed changing because they previously included the atan implementations. Now, s_significand.c is the main implementation for functions with that prototype and using the old implementation approach, while log1p includes it in place of atan. Any further cleanups in this area (which preserve the proper set of functions getting aliases defined by libm_alias_float and libm_alias_double) are of course welcome, just not needed for the goals of this patch. Tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by the patch. * sysdeps/m68k/m680x0/fpu/s_atan_template.c: New file. * sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_cos_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_expm1_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_fabs_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_frexp_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_lrint_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_modf_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_nearbyint_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_remquo_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rint_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_sin_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_sincos_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_tan_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_tanh_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_atan.c: Reimplement to use s_atan_template.c. * sysdeps/m68k/m680x0/fpu/s_atanf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_atanl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceil.c: Reimplement to use s_ceil_template.c. * sysdeps/m68k/m680x0/fpu/s_ceilf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_ceill.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_cos.c: Reimplement to use s_cos_template.c. * sysdeps/m68k/m680x0/fpu/s_cosf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_cosl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_expm1.c: Reimplement to use s_expm1_template.c. * sysdeps/m68k/m680x0/fpu/s_expm1f.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_expm1l.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_fabs.c: Reimplement to use s_fabs_template.c. * sysdeps/m68k/m680x0/fpu/s_fabsf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_fabsl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floor.c: Reimplement to use s_floor_template.c. * sysdeps/m68k/m680x0/fpu/s_floorf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_floorl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_frexp.c: Reimplement to use s_frexp_template.c. * sysdeps/m68k/m680x0/fpu/s_frexpf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_lrint.c: Reimplement to use s_lrint_template.c. * sysdeps/m68k/m680x0/fpu/s_lrintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_lrintl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_modf.c: Reimplement to use s_modf_template.c. * sysdeps/m68k/m680x0/fpu/s_modff.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_modfl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_nearbyint.c: Reimplement to use s_nearbyint_template.c. * sysdeps/m68k/m680x0/fpu/s_nearbyintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_nearbyintl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_remquo.c: Reimplement to use s_remquo_template.c. * sysdeps/m68k/m680x0/fpu/s_remquof.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_remquol.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rint.c: Reimplement to use s_rint_template.c. * sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_sin.c: Reimplement to use s_sin_template.c. * sysdeps/m68k/m680x0/fpu/s_sinf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_sinl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_sincos.c: Reimplement to use s_sincos_template.c. * sysdeps/m68k/m680x0/fpu/s_sincosf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_sincosl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_tan.c: Reimplement to use s_tan_template.c. * sysdeps/m68k/m680x0/fpu/s_tanf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_tanl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_tanh.c: Reimplement to use s_tanh_template.c. * sysdeps/m68k/m680x0/fpu/s_tanhf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_tanhl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_trunc.c: Reimplement to use s_trunc_template.c. * sysdeps/m68k/m680x0/fpu/s_truncf.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_truncl.c: Likewise. * sysdeps/m68k/m680x0/fpu/s_significand.c: Reimplement based on s_atan.c instead of including s_atan.c. * sysdeps/m68k/m680x0/fpu/s_significandf.c: Reimplement based on s_atanf.c instead of including s_atanf.c. * sysdeps/m68k/m680x0/fpu/s_significandl.c: Reimplement based on s_atanl.c instead of including s_atanl.c. * sysdeps/m68k/m680x0/fpu/s_log1p.c: Include s_significand.c instead of s_atan.c. * sysdeps/m68k/m680x0/fpu/s_log1pf.c: Include s_significandf.c instead of s_atanf.c. * sysdeps/m68k/m680x0/fpu/s_log1pl.c: Include s_significandl.c instead of s_atanl.c. |
||
Joseph Myers
|
bd6ea9edd1 |
Use libm_alias macros in m68k llrint functions.
Most m68k libm functions share code via sources for one function including those for another function or type, in a way that will require significant changes to create function aliases in a way friendly to adding _FloatN / _FloatNx aliases. The llrint function implementations, however, use a conventional separate implementation for each floating-point type. Thus preparing them for _FloatN / _FloatNx aliases is just a matter of changing them to include the appropriate headers and use the appropriate macros, which this patch does. The llrintl changes aren't strictly required, since m68k long double does not meet the criteria for a _FloatN / _FloatNx type, but are included anyway to keep consistency between the implementations for the three types. Tested with build-many-glibcs.py that installed stripped shared libraries for m68k-linux-gnu are unchanged by the patch. * sysdeps/m68k/m680x0/fpu/s_llrint.c: Include <libm-alias-double.h>. (llrint): Define using libm_alias_double. * sysdeps/m68k/m680x0/fpu/s_llrintf.c: Include <libm-alias-float.h>. (llrintf): Define using libm_alias_float. * sysdeps/m68k/m680x0/fpu/s_llrintl.c: Include <libm-alias-ldouble.h>. (llrintl): Define using libm_alias_ldouble. |
||
Joseph Myers
|
faec63238f |
Use declare_mgen_alias in m68k templates.
Some m68k libm functions have their own templates replacing the generic math/ ones but using the type-generic template machinery. These currently define function aliases directly using weak_alias. In preparation for additional _FloatN / _FloatNx function aliases, this patch changes them to use declare_mgen_alias for creating aliases instead. Tested with build-many-glibcs.py that installed stripped shared libraries for m68k-linux-gnu are unchanged by the patch. * sysdeps/m68k/m680x0/fpu/s_ccosh_template.c (ccosh): Use declare_mgen_alias instead of weak_alias. * sysdeps/m68k/m680x0/fpu/s_cexp_template.c (cexp): Likewise. * sysdeps/m68k/m680x0/fpu/s_csin_template.c (csin): Likewise. * sysdeps/m68k/m680x0/fpu/s_csinh_template.c (csinh): Likewise. |
||
Szabolcs Nagy
|
bd4430c2a6 |
Do not wrap logf, log2f and powf
The new generic logf, log2f and powf code don't need wrappers any more, they set errno inline so only use the wrappers on targets that need it. * sysdeps/ieee754/flt-32/e_log2f.c (__log2f): Define without wrapper. * sysdeps/ieee754/flt-32/e_logf.c (__logf): Likewise * sysdeps/ieee754/flt-32/e_powf.c (__powf): Likewise * sysdeps/ieee754/flt-32/w_log2f.c: New file. * sysdeps/ieee754/flt-32/w_logf.c: New file. * sysdeps/ieee754/flt-32/w_powf.c: New file. * sysdeps/i386/fpu/w_log2f.c: New file. * sysdeps/i386/fpu/w_logf.c: New file. * sysdeps/i386/fpu/w_powf.c: New file. * sysdeps/m68k/m680x0/fpu/w_log2f.c: New file. * sysdeps/m68k/m680x0/fpu/w_logf.c: New file. * sysdeps/m68k/m680x0/fpu/w_powf.c: New file. |
||
Szabolcs Nagy
|
f7a0b063e7 |
Do not wrap expf and exp2f
The new generic expf and exp2f code don't need wrappers any more, they set errno inline, so only use the wrappers on targets that need it. (If the wrapper is needed, then the top level wrapper code is included, otherwise empty w_exp*f.c is used to suppress the wrapper.) A powerpc64 expf implementation includes the expf c code directly which needed some changes. * sysdeps/ieee754/flt-32/e_exp2f.c (__exp2f): Define without wrapper. * sysdeps/ieee754/flt-32/e_expf.c (__expf): Likewise * sysdeps/ieee754/flt-32/w_exp2f.c: New file. * sysdeps/ieee754/flt-32/w_expf.c: New file. * sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Update for the new expf code. * sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: New file. * sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: New file. * sysdeps/m68k/m680x0/fpu/w_exp2f.c: New file. * sysdeps/m68k/m680x0/fpu/w_expf.c: New file. * sysdeps/i386/fpu/w_exp2f.c: New file. * sysdeps/i386/fpu/w_expf.c: New file. * sysdeps/i386/i686/fpu/multiarch/w_expf.c: New file. * sysdeps/x86_64/fpu/w_expf.c: New file. |
||
Szabolcs Nagy
|
4ea49f4c08 |
New generic powf
without wrapper on aarch64: powf reciprocal-throughput: 4.2x faster powf latency: 2.6x faster old worst-case error: 1.11 ulp new worst-case error: 0.82 ulp aarch64 .text size: -780 bytes aarch64 .rodata size: +144 bytes powf(x,y) is implemented as exp2(y*log2(x)) with the same algorithms that are used in exp2f and log2f, except that the log2f polynomial is larger for extra precision and its output (and exp2f input) may be scaled by a power of 2 (POWF_SCALE) to simplify the argument reduction step of exp2 (possible when efficient round and convert toint operation is available). The special case handling tries to minimize the checks in the hot path. When the input of exp2_inline is checked, int arithmetics is used as that was faster on the tested aarch64 cores. * math/Makefile (type-float-routines): Add e_powf_log2_data. * sysdeps/ieee754/flt-32/e_powf.c: New implementation. * sysdeps/ieee754/flt-32/e_powf_log2_data.c: New file. * sysdeps/ieee754/flt-32/math_config.h (__powf_log2_data): Define. (issignalingf_inline): Likewise. (POWF_LOG2_TABLE_BITS): Likewise. (POWF_LOG2_POLY_ORDER): Likewise. (POWF_SCALE_BITS): Likewise. (POWF_SCALE): Likewise. * sysdeps/i386/fpu/e_powf_log2_data.c: New file. * sysdeps/ia64/fpu/e_powf_log2_data.c: New file. * sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c: New file. |
||
Szabolcs Nagy
|
875c76c704 |
New generic log2f
Similar to the new logf: double precision arithmetics and a small lookup table is used. The argument reduction step is the same as in the new logf. without wrapper on aarch64: log2f reciprocal-throughput: 2.3x faster log2f latency: 2.1x faster old worst case error: 1.72 ulp new worst case error: 0.75 ulp aarch64 .text size: -252 bytes aarch64 .rodata size: +244 bytes * math/Makefile (type-float-routines): Add e_log2f_data. * sysdeps/ieee754/flt-32/e_log2f.c: New implementation. * sysdeps/ieee754/flt-32/e_log2f_data.c: New file. * sysdeps/ieee754/flt-32/math_config.h (__log2f_data): Define. (LOG2F_TABLE_BITS, LOG2F_POLY_ORDER): Define. * sysdeps/i386/fpu/e_log2f_data.c: New file. * sysdeps/ia64/fpu/e_log2f_data.c: New file. * sysdeps/m68k/m680x0/fpu/e_log2f_data.c: New file. |
||
Szabolcs Nagy
|
bf27d3973d |
New generic logf
without wrapper on aarch64: logf reciprocal-throughput: 2.2x faster logf latency: 1.9x faster old worst case error: 0.89 ulp new worst case error: 0.82 ulp aarch64 .text size: -356 bytes aarch64 .rodata size: +240 bytes Uses double precision arithmetics and a lookup table to allow smaller polynomial and avoid the use of division. Data is in a separate translation unit with fixed layout to prevent the compiler generating suboptimal literal access. Errors are handled inline according to POSIX rules, but this patch keeps the wrapper with SVID compatible error handling. Needs libm-test-ulps adjustment for clogf in non-nearest rounding mode. * math/Makefile (type-float-routines): Add e_logf_data. * sysdeps/ieee754/flt-32/e_logf.c: New implementation. * sysdeps/ieee754/flt-32/e_logf_data.c: New file. * sysdeps/ieee754/flt-32/math_config.h (__logf_data): Define. (LOGF_TABLE_BITS, LOGF_POLY_ORDER): Define. * sysdeps/i386/fpu/e_logf_data.c: New file. * sysdeps/ia64/fpu/e_logf_data.c: New file. * sysdeps/m68k/m680x0/fpu/e_logf_data.c: New file. |
||
Wilco Dijkstra
|
4d3693ec1c |
Remove ancient __signbit inlines
Remove __signbit inlines from mathinline.h. Math.h already uses the builtin when supported, so additional inlines are only used on pre 4.0 GCCs. Similarly remove ancient copysign and fabs inlines. * sysdeps/alpha/fpu/bits/mathinline.h: Delete file. * sysdeps/ia64/fpu/bits/mathinline.h: Delete file. * sysdeps/m68k/coldfire/fpu/bits/mathinline.h: Delete file. * sysdeps/m68k/m680x0/fpu/bits/mathinline.h: (__signbitf): Remove. (__signbit): Remove. (__signbitl): Remove. * sysdeps/powerpc/bits/mathinline.h (__signbitf): Remove. (__signbit): Remove. (__signbitl): Remove. * sysdeps/s390/fpu/bits/mathinline.h: (__signbitf): Remove. (__signbit): Remove. (__signbitl): Remove * sysdeps/sparc/fpu/bits/mathinline.h (__signbitf): Remove. (__signbit): Remove. (__signbitl): Remove. * sysdeps/tile/bits/mathinline.h: Delete file. * sysdeps/x86/fpu/bits/mathinline.h (__signbitf): Remove. (__signbit): Remove. (__signbitl): Remove. |
||
Wilco Dijkstra
|
1e6d07234f |
Simplify C99 isgreater macros
Simplify the C99 isgreater macros. Although some support was added in GCC 2.97, not all targets added support until GCC 3.1. Therefore only use the builtins in math.h from GCC 3.1 onwards, and defer to generic macros otherwise. Improve the generic isunordered macro to use compares rather than call fpclassify twice - this is not only faster but also correct for signaling NaNs. * math/math.h: Improve handling of C99 isgreater macros. * sysdeps/alpha/fpu/bits/mathinline.h: Remove isgreater macros. * sysdeps/m68k/m680x0/fpu/bits/mathinline.h: Likewise. * sysdeps/powerpc/bits/mathinline.h: Likewise. * sysdeps/sparc/fpu/bits/mathinline.h: Likewise. * sysdeps/x86/fpu/bits/mathinline.h: Likewise. |
||
Szabolcs Nagy
|
72aa623345 |
Optimized generic expf and exp2f with wrappers
Based on new expf and exp2f code from https://github.com/ARM-software/optimized-routines/ with wrapper on aarch64: expf reciprocal-throughput: 2.3x faster expf latency: 1.7x faster without wrapper on aarch64: expf reciprocal-throughput: 3.3x faster expf latency: 1.7x faster without wrapper on aarch64: exp2f reciprocal-throughput: 2.8x faster exp2f latency: 1.3x faster libm.so size on aarch64: .text size: -152 bytes .rodata size: -1740 bytes expf/exp2f worst case nearest rounding error: 0.502 ulp worst case non-nearest rounding error: 1 ulp Error checks are inline and errno setting is in separate tail called functions, but the wrappers are kept in this patch to handle the _LIB_VERSION==_SVID_ case. (So e.g. errno is set twice for expf calls and once for __expf_finite calls on targets where the new code is used.) Double precision arithmetics is used which is expected to be faster on most targets (including soft-float) than using single precision and it is easier to get good precision result with it. Const data is kept in a separate translation unit which complicates maintenance a bit, but is expected to give good code for literal loads on most targets and allows sharing data across expf, exp2f and powf. (This data is disabled on i386, m68k and ia64 which have their own expf, exp2f and powf code.) Some details may need target specific tweaks: - best convert and round to int operation in the arg reduction may be different across targets. - code was optimized on fma target, optimal polynomial eval may be different without fma. - gcc does not always generate good code for fp bit representation access via unions or it may be inherently slow on some targets. The libm-test-ulps will need adjustment because.. - The argument reduction ideally uses nearest rounded rint, but that is not efficient on most targets, so the polynomial can get evaluated on a wider interval in non-nearest rounding mode making 1 ulp errors common in that case. - The polynomial is evaluated such that it may have 1 ulp error on negative tiny inputs with upward rounding. * math/Makefile (type-float-routines): Add math_errf and e_exp2f_data. * sysdeps/aarch64/fpu/math_private.h (TOINT_INTRINSICS): Define. (roundtoint, converttoint): Likewise. * sysdeps/ieee754/flt-32/e_expf.c: New implementation. * sysdeps/ieee754/flt-32/e_exp2f.c: New implementation. * sysdeps/ieee754/flt-32/e_exp2f_data.c: New file. * sysdeps/ieee754/flt-32/math_config.h: New file. * sysdeps/ieee754/flt-32/math_errf.c: New file. * sysdeps/ieee754/flt-32/t_exp2f.h: Remove. * sysdeps/i386/fpu/e_exp2f_data.c: New file. * sysdeps/i386/fpu/math_errf.c: New file. * sysdeps/ia64/fpu/e_exp2f_data.c: New file. * sysdeps/ia64/fpu/math_errf.c: New file. * sysdeps/m68k/m680x0/fpu/e_exp2f_data.c: New file. * sysdeps/m68k/m680x0/fpu/math_errf.c: New file. |
||
Joseph Myers
|
a60eca2e55 |
Simplify HUGE_VAL definitions.
There are various bits/huge_val*.h headers to define HUGE_VAL and related macros. All of them use __builtin_huge_val etc. for GCC 3.3 and later. Then there are various fallbacks, such as using a large hex float constant for GCC 2.96 and later, or using unions (with or without compound literals) to construct the bytes of an infinity, with this last being the reason for having architecture-specific files. Supporting TS 18661-3 _FloatN / _FloatNx types that have the same format as other supported types will mean adding more such macros; needing to add more headers for them doesn't seem very desirable. The fallbacks based on bytes of the representation of an infinity do not meet the standard requirements for a constant expression. At least one of them is also wrong: sysdeps/sh/bits/huge_val.h is producing a mixed-endian representation which does not match what GCC does. This patch eliminates all those headers, defining the macros directly in math.h. For GCC 3.3 and later, the built-in functions are used as now. For other compilers, a large constant 1e10000 (with appropriate suffix) is used. This is like the fallback for GCC 2.96 and later, but without using hex floats (which have no apparent advantage here). It is unambiguously valid standard C for all floating-point formats with infinities, which covers all formats supported by glibc or likely to be supported by glibc in future (C90 DR#025 said that if a floating-point format represents infinities, all real values lie within the range of representable values, so the constraints for constant expressions are not violated), but may generate compiler warnings and wouldn't handle the TS 18661-1 FENV_ROUND pragma correctly. If someone is actually using a compiler with glibc that does not claim to be GCC 3.3 or later, but which has a better way to define the HUGE_VAL macros, we can always add compiler conditionals in with alternative definitions. I intend to make similar changes for INF and NAN. The SNAN macros already just use __builtin_nans etc. with no fallback for compilers not claiming to be GCC 3.3 or later. Tested for x86_64. * math/math.h: Do not include bits/huge_val.h, bits/huge_valf.h, bits/huge_vall.h or bits/huge_val_flt128.h. (HUGE_VAL): Define directly here. [__USE_ISOC99] (HUGE_VALF): Likewise. [__USE_ISOC99] (HUGE_VALL): Likewise. [__HAVE_FLOAT128 && __GLIBC_USE (IEC_60559_TYPES_EXT)] (HUGE_VAL_F128): Likewise. * math/Makefile (headers): Remove bits/huge_val.h, bits/huge_valf.h, bits/huge_vall.h and bits/huge_val_flt128.h. * bits/huge_val.h: Remove. * bits/huge_val_flt128.h: Likewise. * bits/huge_valf.h: Likewise. * bits/huge_vall.h: Likewise. * sysdeps/ia64/bits/huge_vall.h: Likewise. * sysdeps/ieee754/bits/huge_val.h: Likewise. * sysdeps/ieee754/bits/huge_valf.h: Likewise. * sysdeps/m68k/m680x0/bits/huge_vall.h: Likewise. * sysdeps/sh/bits/huge_val.h: Likewise. * sysdeps/sparc/bits/huge_vall.h: Likewise. * sysdeps/x86/bits/huge_vall.h: Likewise. |