Change the tcache->counts[] entries to uint16_t - this removes
the limit set by char and allows a larger tcache. Remove a few
redundant asserts.
bench-malloc-thread with 4 threads is ~15% faster on Cortex-A72.
Reviewed-by: DJ Delorie <dj@redhat.com>
* malloc/malloc.c (MAX_TCACHE_COUNT): Increase to UINT16_MAX.
(tcache_put): Remove redundant assert.
(tcache_get): Remove redundant asserts.
(__libc_malloc): Check tcache count is not zero.
* manual/tunables.texi (glibc.malloc.tcache_count): Update maximum.
(cherry picked from commit 1f50f2ad85)
Add Ares to the midr_el0 list and support ifunc dispatch. Since Ares
supports 2 128-bit loads/stores, use Neon registers for memcpy by
selecting __memcpy_falkor by default (we should rename this to
__memcpy_simd or similar).
* manual/tunables.texi (glibc.cpu.name): Add ares tunable.
* sysdeps/aarch64/multiarch/memcpy.c (__libc_memcpy): Use
__memcpy_falkor for ares.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_ARES):
Add new define.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
Add ares cpu.
(cherry picked from commit 02f440c1ef)
The kernel is evolving this interface (e.g., removal of the
restriction on cross-device copies), and keeping up with that
is difficult. Applications which need the function should
run kernels which support the system call instead of relying on
the imperfect glibc emulation.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
(cherry picked from commit 5a659ccc0e)
The tcache counts[] array is a char, which has a very small range and thus
may overflow. When setting tcache_count tunable, there is no overflow check.
However the tunable must not be larger than the maximum value of the tcache
counts[] array, otherwise it can overflow when filling the tcache.
[BZ #24531]
* malloc/malloc.c (MAX_TCACHE_COUNT): New define.
(do_set_tcache_count): Only update if count is small enough.
* manual/tunables.texi (glibc.malloc.tcache_count): Document max value.
(cherry picked from commit 5ad533e8e6)
* malloc/malloc.c (tcache_entry): Add key field.
(tcache_put): Set it.
(tcache_get): Likewise.
(_int_free): Check for double free in tcache.
* malloc/tst-tcfree1.c: New.
* malloc/tst-tcfree2.c: New.
* malloc/Makefile: Run the new tests.
* manual/probes.texi: Document memory_tcache_double_free probe.
* dlfcn/dlerror.c (check_free): Prevent double frees.
(cherry picked from commit bcdaad21d4)
malloc: tcache: Validate tc_idx before checking for double-frees [BZ #23907]
The previous check could read beyond the end of the tcache entry
array. If the e->key == tcache cookie check happened to pass, this
would result in crashes.
(cherry picked from commit affec03b71)
This patch updates the manual and adds a new chapter to the manual,
explaining types macros, constants and functions defined by ISO C11
threads.h standard.
[BZ# 14092]
* manual/debug.texi: Update adjacent chapter name.
* manual/probes.texi: Likewise.
* manual/threads.texi (ISO C Threads): New section.
(POSIX Threads): Convert to a section.
The implementation falls back to renameat if renameat2 is not available
in the kernel (or in the kernel headers) and the flags argument is zero.
Without kernel support, a non-zero argument returns EINVAL, not ENOSYS.
This mirrors what the kernel does for invalid renameat2 flags.
$(file …) appears to be the only convenient way to create files
with newlines and make substitution variables. This needs make 4.0
(released in 2013), so update the requirement to match.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Some Linux distributions are experimenting with a new, separately
maintained and hopefully more agile implementation of the crypt
API. To facilitate this, add a configure option which disables
glibc's embedded libcrypt. When this option is given, libcrypt.*
and crypt.h will not be built nor installed.
This is a major rewrite of the description of 'crypt', 'getentropy',
and 'getrandom'.
A few highlights of the content changes:
- Throughout the manual, public headers, and user-visible messages,
I replaced the term "password" with "passphrase", the term
"password database" with "user database", and the term
"encrypt(ion)" with "(one-way) hashing" whenever it was applied to
passphrases. I didn't bother making this change in internal code
or tests. The use of the term "password" in ruserpass.c survives,
because that refers to a keyword in netrc files, but it is adjusted
to make this clearer.
There is a note in crypt.texi explaining that they were
traditionally called passwords but single words are not good enough
anymore, and a note in users.texi explaining that actual passphrase
hashes are found in a "shadow" database nowadays.
- There is a new short introduction to the "Cryptographic Functions"
section, explaining how we do not intend to be a general-purpose
cryptography library, and cautioning that there _are_, or have
been, legal restrictions on the use of cryptography in many
countries, without getting into any kind of detail that we can't
promise to keep up to date.
- I added more detail about what a "one-way function" is, and why
they are used to obscure passphrases for storage. I removed the
paragraph saying that systems not connected to a network need no
user authentication, because that's a pretty rare situation
nowadays. (It still says "sometimes it is necessary" to
authenticate the user, though.)
- I added documentation for all of the hash functions that glibc
actually supports, but not for the additional hash functions
supported by libxcrypt. If we're going to keep this manual section
around after the transition is more advanced, it would probably
make sense to add them then.
- There is much more detailed discussion of how to generate a salt,
and the failure behavior for crypt is documented. (Returning an
invalid hash on failure is what libxcrypt does; Solar Designer's
notes say that this was done "for compatibility with old programs
that assume crypt can never fail".)
- As far as I can tell, the header 'crypt.h' is entirely a GNU
invention, and never existed on any other Unix lineage. The
function 'crypt', however, was in Issue 1 of the SVID and is now
in the XSI component of POSIX. I tried to make all of the
@standards annotations consistent with this, but I'm not sure I got
them perfectly right.
- The genpass.c example has been improved to use getentropy instead
of the current time to generate the salt, and to use a SHA-256 hash
instead of MD5. It uses more random bytes than is strictly
necessary because I didn't want to complicate the code with proper
base64 encoding.
- The testpass.c example has three hardwired hashes now, to
demonstrate that different one-way functions produce different
hashes for the same input. It also demonstrates how DES hashing
only pays attention to the first eight characters of the input.
- There is new text explaining in more detail how a CSPRNG differs
from a regular random number generator, and how
getentropy/getrandom are not exactly a CSPRNG. I tried not to make
specific falsifiable claims here. I also tried to make the
blocking/cancellation/error behavior of both getentropy and
getrandom clearer.
In preparation for a major revision of the documentation for
crypt(_r), getentropy, and getrandom, reorganize crypt.texi. This
patch does not change any text; it only deletes and moves text.
The description of 'getpass' moves to terminal.texi, since all it does
is read a password from the controlling terminal with echo disabled.
The "Legal Problems" section of crypt.texi is dropped, and the
introductory text is shifted down to the "Encrypting Passwords"
section; the next patch will add some new introductory text.
Also, it is no longer true that crypt.texi's top @node needs to have
no pointers. That was a vestige of crypt/ being an add-on. (makeinfo
itself doesn't need @node pointers anymore, but the scripts that
assemble the libc manual's topmost node rely on each chapter-level
node having them.)
The functions encrypt, setkey, encrypt_r, setkey_r, cbc_crypt,
ecb_crypt, and des_setparity should not be used in new programs,
because they use the DES block cipher, which is unacceptably weak by
modern standards. Demote all of them to compatibility symbols, and
remove their prototypes from installed headers. cbc_crypt, ecb_crypt,
and des_setparity were already compat symbols when glibc was
configured with --disable-obsolete-rpc.
POSIX requires encrypt and setkey to be available when _XOPEN_CRYPT
is defined, so this change also removes the definition of X_OPEN_CRYPT
from <unistd.h>.
The entire "DES Encryption" section is dropped from the manual, as is
the mention of AUTH_DES and FIPS 140-2 in the introduction to
crypt.texi. The documentation of 'memfrob' cross-referenced the DES
Encryption section, which is replaced by a hyperlink to libgcrypt, and
while I was in there I spruced up the actual documentation of
'memfrob' and 'strfry' a little. It's still fairly jokey, because
those functions _are_ jokes, but they do also have real use cases, so
people trying to use them for real should have all the information
they need.
DES-based authentication for Sun RPC is also insecure and should be
deprecated or even removed, but maybe that can be left as TI-RPC's
problem.
This patch fixes the OFD ("file private") locks for architectures that
support non-LFS flock definition (__USE_FILE_OFFSET64 not defined). The
issue in this case is both F_OFD_{GETLK,SETLK,SETLKW} and
F_{SET,GET}L{W}K64 expects a flock64 argument and when using old
F_OFD_* flags with a non LFS flock argument the kernel might interpret
the underlying data wrongly. Kernel idea originally was to avoid using
such flags in non-LFS syscall, but since GLIBC uses fcntl with LFS
semantic as default it is possible to provide the functionality and
avoid the bogus struct kernel passing by adjusting the struct manually
for the required flags.
The idea follows other LFS interfaces that provide two symbols:
1. A new LFS fcntl64 is added on default ABI with the usual macros to
select it for FILE_OFFSET_BITS=64.
2. The Linux non-LFS fcntl use a stack allocated struct flock64 for
F_OFD_{GETLK,SETLK,SETLKW} copy the results on the user provided
struct.
3. Keep a compat symbol with old broken semantic for architectures
that do not define __OFF_T_MATCHES_OFF64_T.
So for architectures which defines __USE_FILE_OFFSET64, fcntl64 will
aliased to fcntl and no adjustment would be required. So to actually
use F_OFD_* with LFS support the source must be built with LFS support
(_FILE_OFFSET_BITS=64).
Also F_OFD_SETLKW command is handled a cancellation point, as for
F_SETLKW{64}.
Checked on x86_64-linux-gnu and i686-linux-gnu.
[BZ #20251]
* NEWS: Mention fcntl64 addition.
* csu/check_fds.c: Replace __fcntl_nocancel by __fcntl64_nocancel.
* login/utmp_file.c: Likewise.
* sysdeps/posix/fdopendir.c: Likewise.
* sysdeps/posix/opendir.c: Likewise.
* sysdeps/unix/pt-fcntl.c: Likewise.
* include/fcntl.h (__libc_fcntl64, __fcntl64,
__fcntl64_nocancel_adjusted): New prototype.
(__fcntl_nocancel_adjusted): Remove prototype.
* io/Makefile (routines): Add fcntl64.
(CFLAGS-fcntl64.c): New rule.
* io/Versions [GLIBC_2.28] (fcntl64): New symbol.
[GLIBC_PRIVATE] (__libc_fcntl): Rename to __libc_fcntl64.
* io/fcntl.h (fcntl64): Add prototype and redirect if
__USE_FILE_OFFSET64 is defined.
* io/fcntl64.c: New file.
* manual/llio.text: Add a note for which commands fcntl acts a
cancellation point.
* nptl/Makefile (CFLAGS-fcntl64.c): New rule.
* sysdeps/mach/hurd/fcntl.c: Alias fcntl to fcntl64 symbols.
* sysdeps/mach/hurd/i386/libc.abilist [GLIBC_2.28] (fcntl, fcntl64):
New symbols.
* sysdeps/unix/sysv/linux/fcntl.c (__libc_fcntl): Fix F_GETLK64,
F_OFD_GETLK, F_SETLK64, F_SETLKW64, F_OFD_SETLK, and F_OFD_SETLKW for
non-LFS case.
* sysdeps/unix/sysv/linux/fcntl64.c: New file.
* sysdeps/unix/sysv/linux/fcntl_nocancel.c (__fcntl_nocancel): Rename
to __fcntl64_nocancel.
(__fcntl_nocancel_adjusted): Rename to __fcntl64_nocancel_adjusted.
* sysdeps/unix/sysv/linux/not-cancel.h (__fcntl_nocancel): Rename
to __fcntl64_nocancel.
* sysdeps/unix/sysv/linux/tst-ofdlocks.c: New file.
* sysdeps/unix/sysv/linux/tst-ofdlocks-compat.c: Likewise.
* sysdeps/unix/sysv/linux/Makefile (tests): Add tst-ofdlocks.
(tests-internal): Add tst-ofdlocks-compat.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist [GLIBC_2.28]
(fcntl64): New symbol.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist: Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libc.abilist [GLIBC_2.28] (fcntl,
fcntl64): Likewise.
* sysdeps/unix/sysv/linux/hppa/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilis: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
2018-04-30 Raymond Nicholson <rain1@airmail.cc>
* manual/startup.texi (Aborting a Program): Remove inappropriate joke.
This complies with the decision of the project leader and primary and
ultimate maintainer, who partially delegated maintainership to myself
and others under certain constraints.
This is also in line with the community-agreed procedures.
It is obvious that we didn't have consensus on a decision to install
that patch, since both sides are still arguing over it.
As for the decision to reverse the deletion, if we even need one to
counter a move that did not have consensus, although nobody else offered
to install the reversal and restore the status prior to the fait
accompli, and some explicitly refused to do so themselves, nobody
objected when I offered to do so. Therefore, by the same reasoning that
led to the mistaken installation of the patch, and after a much longer
wait for objections, I understand there is consensus on my reverting it.
Since tile support has been removed from the Linux kernel for 4.17,
this patch removes the (unmaintained) port to tilegx from glibc (the
tilepro support having been previously removed). This reflects the
general principle that a glibc port needs upstream support for the
architecture in all the components it build-depends on (so binutils,
GCC and the Linux kernel, for the normal case of a port supporting the
Linux kernel but no other OS), in order to be maintainable.
Apart from removal of sysdeps/tile and sysdeps/unix/sysv/linux/tile,
there are updates to various comments referencing tile for which
removal of those references seemed appropriate. The configuration is
removed from README and from build-many-glibcs.py. contrib.texi keeps
mention of removed contributions, but I updated Chris Metcalf's entry
to reflect that he also contributed the non-removed support for the
generic Linux kernel syscall interface.
__ASSUME_FADVISE64_64_NO_ALIGN support is removed, as it was only used
by tile.
* sysdeps/tile: Remove.
* sysdeps/unix/sysv/linux/tile: Likewise.
* README (tilegx-*-linux-gnu): Remove from list of supported
configurations.
* manual/contrib.texi (Contributors): Mention Chris Metcalf's
contribution of support for generic Linux kernel syscall
interface.
* scripts/build-many-glibcs.py (Context.add_all_configs): Remove
tilegx configurations.
(Config.install_linux_headers): Do not handle tile.
* sysdeps/unix/sysv/linux/aarch64/ldsodefs.h: Do not mention Tile
in comment.
* sysdeps/unix/sysv/linux/nios2/Makefile: Likewise.
* sysdeps/unix/sysv/linux/posix_fadvise.c: Likewise.
[__ASSUME_FADVISE64_64_NO_ALIGN] (__ALIGNMENT_ARG): Remove
conditional undefine and redefine.
* sysdeps/unix/sysv/linux/posix_fadvise64.c: Do not mention Tile
in comment.
[__ASSUME_FADVISE64_64_NO_ALIGN] (__ALIGNMENT_ARG): Remove
conditional undefine and redefine.
The example did not work because the null byte was not converted, and
mbrtowc was called with a zero-length input string. This results in a
(size_t) -2 return value, so the function always returns NULL.
The size computation for the heap allocation of the result was
incorrect because it did not deal with integer overflow.
Error checking was missing, and the allocated memory was not freed on
error paths. All error returns now set errno. (Note that there is an
assumption that free does not clobber errno.)
The slightly unportable comparision against (size_t) -2 to catch both
(size_t) -1 and (size_t) -2 return values is gone as well.
A null wide character needs to be stored in the result explicitly, to
terminate it.
The description in the manual is updated to deal with these finer
points. The (size_t) -2 behavior (consuming the input bytes) matches
what is specified in ISO C11.
* manual/errno.texi (EOWNERDEAD, ENOTRECOVERABLE): Remove errno
values from Linux-specific section now that it is in the GNU section.
* sysdeps/gnu/errlist.c: Regenerate.
* hurd/Makefile (routines): Add hurdlock.
* hurd/Versions (GLIBC_PRIVATE): Added new entry to export the above
interface.
(HURD_CTHREADS_0.3): Remove __libc_getspecific.
* hurd/hurdpid.c: Include <lowlevellock.h>
(_S_msg_proc_newids): Use lll_wait to synchronize.
* hurd/hurdsig.c: (reauth_proc): Use __mutex_lock and __mutex_unlock.
* hurd/setauth.c: Include <hurdlock.h>, use integer for synchronization.
* mach/Makefile (lock-headers): Remove machine-lock.h.
* mach/lock-intern.h: Include <lowlevellock.h> instead of
<machine-lock.h>.
(__spin_lock_t): New type.
(__SPIN_LOCK_INITIALIZER): New macro.
(__spin_lock, __spin_unlock, __spin_try_lock, __spin_lock_locked,
__mutex_init, __mutex_lock_solid, __mutex_unlock_solid, __mutex_lock,
__mutex_unlock, __mutex_trylock): Use lll to implement locks.
* mach/mutex-init.c: Include <lowlevellock.h> instead of <cthreads.h>.
(__mutex_init): Initialize with lll.
* manual/errno.texi (EOWNERDEAD, ENOTRECOVERABLE): New errno values.
* sysdeps/mach/Makefile: Add libmachuser as dependencies for libs
needing lll.
* sysdeps/mach/hurd/bits/errno.h: Regenerate.
* sysdeps/mach/hurd/cthreads.c (__libc_getspecific): Remove function.
* sysdeps/mach/hurd/bits/libc-lock.h: Remove file.
* sysdeps/mach/hurd/setpgid.c: Include <lowlevellock.h>.
(__setpgid): Use lll for synchronization.
* sysdeps/mach/hurd/setsid.c: Likewise with __setsid.
* sysdeps/mach/bits/libc-lock.h: Include <tls.h> and <lowlevellock.h>
instead of <cthreads.h>.
(_IO_lock_inexpensive): New macro
(__libc_lock_recursive_t, __rtld_lock_recursive_t): New structures.
(__libc_lock_self0): New declaration.
(__libc_lock_owner_self): New macro.
(__libc_key_t): Remove type.
(_LIBC_LOCK_INITIALIZER): New macro.
(__libc_lock_define_initialized, __libc_lock_init, __libc_lock_fini,
__libc_lock_fini_recursive, __rtld_lock_fini_recursive,
__libc_lock_lock, __libc_lock_trylock, __libc_lock_unlock,
__libc_lock_define_initialized_recursive,
__rtld_lock_define_initialized_recursive,
__libc_lock_init_recursive, __libc_lock_trylock_recursive,
__libc_lock_lock_recursive, __libc_lock_unlock_recursive,
__rtld_lock_initialize, __rtld_lock_trylock_recursive,
__rtld_lock_lock_recursive, __rtld_lock_unlock_recursive
__libc_once_define, __libc_mutex_unlock): Reimplement with lll.
(__libc_lock_define_recursive, __rtld_lock_define_recursive,
_LIBC_LOCK_RECURSIVE_INITIALIZER, _RTLD_LOCK_RECURSIVE_INITIALIZER):
New macros.
Include <libc-lockP.h> to reimplement libc_key* with pthread_key*.
* hurd/hurdlock.c: New file.
* hurd/hurdlock.h: New file.
* mach/lowlevellock.h: New file
The description of the interplay between feature test macros and
compiler options in the description of _DEFAULT_SOURCE is a little
confusing, and dated, so clarify the situation, and don't assume a
specific value for _DEFAULT_SOURCE.
Also, _DEFAULT_SOURCE is supposed to be defined if none of the C/POSIX
feature test macros are defined, but the condition was lacking a test
for _ISOC11_SOURCE, so that is also addressed.
[BZ #22862]
* include/features.h: Add _ISOC11_SOURCE to test for whether
to define _DEFAULT_SOURCE.
* manual/creature.texi (_DEFAULT_SOURCE): Improve
documentation.
The current description refers to ISO C99 not being widely adopted,
which it is believed to be now.
* manual/creature.texi (_ISOC99_SOURCE): Update the dated
description.
Several feature test macros are documented in features.h but absent in
the manual, and some documented macros accept undocumented values.
This commit updates the manual to mention all the accepted macros,
along with any values that hold special meaning.
* manual/creature.texi (_POSIX_C_SOURCE): Document special
values of 199606L, 200112L, and 200809L.
(_XOPEN_SOURCE): Document special values of 600 and 700.
(_ISOC11_SOURCE): Document macro.
(_ATFILE_SOURCE): Likewise.
(_FORTIFY_SOURCE): Likewise.
This is a minor rewording to clarify the behaviour of
get_current_dir_name. Additionally, the @vindex is moved above the
@deftypefun so that following links give a better result with regard
to context.
[BZ #6889]
* manual/filesys.texi (get_current_dir_name): Clarify
behaviour.
The opening parenthesis for function arguments in an @deftypefun need
to be separated from the function name. This isn't just a matter of
the GNU coding style---it causes the "(void" (in this case) to be
rendered as a part of the function name, causing a visual defect, and
also results in a warning to the following effect during `make pdf':
Warning: unbalanced parentheses in @def...)
* manual/platform.texi (__riscv_flush_icache): Fix @deftypefun
syntax.
There are some bug reports from people setting CFLAGS not including a
-O option and then being confused when the build fails. This patch
addresses this by documenting the proper use of CC and CFLAGS in more
detail - saying what options should go where and specifying the
requirement to compile with optimization.
The previous text incorrectly used @var markup with CC and CFLAGS.
The correct markup for environment variables is @env, but it's also
the case that passing such variables explicitly on the configure
command line is preferred to passing them in the environment, so this
patch changes the documentation to describe passing them on the
command line (and uses @code).
In many cases putting options in the wrong place may in fact work, but
I believe what I've specified is the correct rule for which options to
put where.
[BZ #20980]
[BZ #21234]
* manual/install.texi (Configuring and compiling): Describe
passing CC and CFLAGS on configure command line, not as
environment variables. Use @code markup on those variables.
Specify what options go in CC and what go in CFLAGS. Note the
requirement to compile with optimization.
* INSTALL: Regenerated.
Remove the slow paths from pow. Like several other double precision math
functions, pow is exactly rounded. This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.
All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP. A simple test over a few hundred million
values shows pow is 10% faster on average. This fixes BZ #13932.
[BZ #13932]
* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
* benchtests/pow-inputs: Update comment for slow path cases.
* manual/probes.texi (slowpow_p10): Delete removed probe.
(slowpow_p10): Likewise.
* math/Makefile: Remove halfulp.c and slowpow.c.
* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/generic/math_private.h (__exp1): Remove error argument.
(__halfulp): Remove.
(__slowpow): Remove.
* sysdeps/i386/fpu/halfulp.c: Delete file.
* sysdeps/i386/fpu/slowpow.c: Likewise.
* sysdeps/ia64/fpu/halfulp.c: Likewise.
* sysdeps/ia64/fpu/slowpow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
improve comments and add error analysis.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
(power1): Remove function:
(log1): Remove error argument, add error analysis.
(my_log2): Remove function.
* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
This patch adds the narrowing add functions from TS 18661-1 to glibc's
libm: fadd, faddl, daddl, f32addf64, f32addf32x, f32xaddf64 for all
configurations; f32addf64x, f32addf128, f64addf64x, f64addf128,
f32xaddf64x, f32xaddf128, f64xaddf128 for configurations with
_Float64x and _Float128; __nldbl_daddl for ldbl-opt. As discussed for
the build infrastructure patch, tgmath.h support is deliberately
deferred, and FP_FAST_* macros are not applicable without optimized
function implementations.
Function implementations are added for all relevant pairs of formats
(including certain cases of a format and itself where more than one
type has that format). The main implementations use round-to-odd, or
a trivial computation in the case where both formats are the same or
where the wider format is IBM long double (in which case we don't
attempt to be correctly rounding). The sysdeps/ieee754/soft-fp
implementations use soft-fp, and are used automatically for
configurations without exceptions and rounding modes by virtue of
existing Implies files. As previously discussed, optimized versions
for particular architectures are possible, but not included.
i386 gets a special version of f32xaddf64 to avoid problems with
double rounding (similar to the existing fdim version), since this
function must round just once without an intermediate rounding to long
double. (No such special version is needed for any other function,
because the nontrivial functions use round-to-odd, which does the
intermediate computation with the rounding mode set to round-to-zero,
and double rounding is OK except in round-to-nearest mode, so is OK
for that intermediate round-to-zero computation.) mul and div will
need slightly different special versions for i386 (using round-to-odd
on long double instead of precision control) because of the
possibility of inexact intermediate results in the subnormal range for
double.
To reduce duplication among the different function implementations,
math-narrow.h gets macros CHECK_NARROW_ADD, NARROW_ADD_ROUND_TO_ODD
and NARROW_ADD_TRIVIAL.
In the trivial cases and for any architecture-specific optimized
implementations, the overhead of the errno setting might be
significant, but I think that's best handled through compiler built-in
functions rather than providing separate no-errno versions in glibc
(and likewise there are no __*_finite entry points for these function
provided, __*_finite effectively being no-errno versions at present in
most cases).
Tested for x86_64 and x86, with both GCC 6 and GCC 7. Tested for
mips64 (all three ABIs, both hard and soft float) and powerpc with GCC
7. Tested with build-many-glibcs.py with both GCC 6 and GCC 7.
* math/Makefile (libm-narrow-fns): Add add.
(libm-test-funcs-narrow): Likewise.
* math/Versions (GLIBC_2.28): Add narrowing add functions.
* math/bits/mathcalls-narrow.h (add): Use __MATHCALL_NARROW .
* math/gen-auto-libm-tests.c (test_functions): Add add.
* math/math-narrow.h (CHECK_NARROW_ADD): New macro.
(NARROW_ADD_ROUND_TO_ODD): Likewise.
(NARROW_ADD_TRIVIAL): Likewise.
* sysdeps/ieee754/float128/float128_private.h (__faddl): New
macro.
(__daddl): Likewise.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Add fadd and
dadd.
(CFLAGS-nldbl-dadd.c): New variable.
(CFLAGS-nldbl-fadd.c): Likewise.
* sysdeps/ieee754/ldbl-opt/Versions (GLIBC_2.28): Add
__nldbl_daddl.
* sysdeps/ieee754/ldbl-opt/nldbl-compat.h (__nldbl_daddl): New
prototype.
* manual/arith.texi (Misc FP Arithmetic): Document fadd, faddl,
daddl, fMaddfN, fMaddfNx, fMxaddfN and fMxaddfNx.
* math/auto-libm-test-in: Add tests of add.
* math/auto-libm-test-out-narrow-add: New generated file.
* math/libm-test-narrow-add.inc: New file.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_f32xaddf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fadd.c: Likewise.
* sysdeps/ieee754/float128/s_f32addf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64addf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64xaddf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_f64xaddf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-dadd.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-fadd.c: Likewise.
* sysdeps/ieee754/soft-fp/s_daddl.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fadd.c: Likewise.
* sysdeps/ieee754/soft-fp/s_faddl.c: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/mach/hurd/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
A number of cross-references to the GCC info manual cause Texinfo
warnings; e.g.:
./creature.texi:11: warning: @xref node name should not contain `.'
This is due to "gcc.info" being used in the INFO-FILE-NAME (fourth)
argument. Changing it to "gcc" removes these warnings. (Manually
confirmed equivalent behaviour for make info, html, and pdf.)
* manual/creature.texi: Convert references to gcc.info to gcc.
* manual/stdio.texi: Likewise.
* manual/string.texi: Likewise.
Remove the slow paths from log. Like several other double precision math
functions, log is exactly rounded. This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.
Interestingly removing the slow paths makes hardly any difference in practice:
the worst case error is still ~0.502ULP, and exp(log(x)) shows identical results
before/after on many millions of random cases. All GLIBC math tests pass on
AArch64 and x64 with no change in ULP error. A simple test over a few hundred
million values shows log is now 18% faster on average.
* manual/probes.texi (slowlog): Delete documentation of removed probe.
(slowlog_inexact): Likewise
* sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Remove slow paths.
* sysdeps/ieee754/dbl-64/ulog.h: Remove unused declarations.