The most common use case of math functions is with default rounding
mode, i.e. rounding to nearest. Setting and restoring rounding mode
is an unnecessary overhead for this, so I've added support for a
context, which does the set/restore only if the FP status needs a
change. The code is written such that only x86 uses these. Other
architectures should be unaffected by it, but would definitely benefit
if the set/restore has as much overhead relative to the rest of the
code, as the x86 bits do.
Here's a summary of the performance improvement due to these
improvements; I've only mentioned functions that use the set/restore
and have benchmark inputs for x86_64:
Before:
cos(): ITERS:4.69335e+08: TOTAL:28884.6Mcy, MAX:4080.28cy, MIN:57.562cy, 16248.6 calls/Mcy
exp(): ITERS:4.47604e+08: TOTAL:28796.2Mcy, MAX:207.721cy, MIN:62.385cy, 15543.9 calls/Mcy
pow(): ITERS:1.63485e+08: TOTAL:28879.9Mcy, MAX:362.255cy, MIN:172.469cy, 5660.86 calls/Mcy
sin(): ITERS:3.89578e+08: TOTAL:28900Mcy, MAX:704.859cy, MIN:47.583cy, 13480.2 calls/Mcy
tan(): ITERS:7.0971e+07: TOTAL:28902.2Mcy, MAX:1357.79cy, MIN:388.58cy, 2455.55 calls/Mcy
After:
cos(): ITERS:6.0014e+08: TOTAL:28875.9Mcy, MAX:364.283cy, MIN:45.716cy, 20783.4 calls/Mcy
exp(): ITERS:5.48578e+08: TOTAL:28764.9Mcy, MAX:191.617cy, MIN:51.011cy, 19071.1 calls/Mcy
pow(): ITERS:1.70013e+08: TOTAL:28873.6Mcy, MAX:689.522cy, MIN:163.989cy, 5888.18 calls/Mcy
sin(): ITERS:4.64079e+08: TOTAL:28891.5Mcy, MAX:6959.3cy, MIN:36.189cy, 16062.8 calls/Mcy
tan(): ITERS:7.2354e+07: TOTAL:28898.9Mcy, MAX:1295.57cy, MIN:380.698cy, 2503.7 calls/Mcy
So the improvements are:
cos: 27.9089%
exp: 22.6919%
pow: 4.01564%
sin: 19.1585%
tan: 1.96086%
The downside of the change is that it will have an adverse performance
impact on non-default rounding modes, but I think the tradeoff is
justified.
Resolves: #15465
The program name may be unavailable if the user application tampers
with argc and argv[]. Some parts of the dynamic linker caters for
this while others don't, so this patch consolidates the check and
fallback into a single macro and updates all users.
These prototypes are duplicated in many places. Add a dedicated
header for holding prototypes for program-specific functions to
avoid that.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
ARM now supports loading unmarked objects from
the dynamic loader cache. Unmarked objects can
be used with the hard-float or soft-float ABI.
We must support loading unmarked objects during
the transition period from a binutils that does
not mark objects to one that does mark them with
the correct ELF flags.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/generic/ldconfig.h (FLAG_AARCH64_LIB64): New macro.
* elf/cache.c (print_entry): Print ",AArch64" for
FLAG_AARCH64_LIB64.
Signed-off-by: Steve McIntyre <steve.mcintyre@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
* sysdeps/generic/ldconfig.h (FLAG_ARM_LIBHF): New macro.
* elf/cache.c (print_entry): Print ",hard-float" for
FLAG_ARM_LIBHF.
Signed-off-by: Steve McIntyre <steve.mcintyre@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
(__crypt_r, __crypt): Disable MD5 and DES if FIPS is enabled.
* crypt/md5c-test.c (main): Tolerate disabled MD5.
* sysdeps/unix/sysv/linux/fips-private.h: New file.
* sysdeps/generic/fips-private.h: New file, dummy fallback.
Using madvise with MADV_DONTNEED to release memory back to the kernel
is not sufficient to change the commit charge accounted against the
process on Linux. It is OK however, when overcommit is enabled or is
heuristic. However, when overcommit is restricted to a percentage of
memory setting the contents of /proc/sys/vm/overcommit_memory as 2, it
makes a difference since memory requests will fail. Hence, we do what
we do with secure exec binaries, which is to call mmap on the region
to be dropped with MAP_FIXED. This internally unmaps the pages in
question and reduces the amount of memory accounted against the
process.
[BZ #6794]
Following Joseph comments about bug 6794, here is a proposed fix. It turned out
to be a large fix mainly because I had to move some file along to follow libm
files/names conventions.
Basically I have added wrappers (w_ilogb.c, w_ilogbf.c, w_ilogbl.c) that now calls
the symbol '__ieee754_ilogb'. The wrappers checks for '__ieee754_ilogb' output and
set the errno and raise exceptions as expected.
The '__ieee754_ilogb' is implemented in sysdeps. I have moved the 's_ilogb[f|l]' files
to e_ilogb[f|l] and renamed the '__ilogb[f|l]' to '__ieee754_ilogb[f|l]'.
I also found out a bug in i386 and x86-64 assembly coded ilogb implementation where
it raises a FE_DIVBYZERO when argument is '0.0'. I corrected this issue as well.
Finally I added the errno and FE_INVALID tests for 0.0, NaN and +-InF argument. Tested
on i386, x86-64, ppc32 and ppc64.
It may sometimes be desirable to make the dynamic linker only pick up
libraries from the library path and rpath and not look at the
ld.so.cache that ldconfig generates. An example of such a use case is
the glibc testsuite where the dynamic linker must not be influenced by
any external paths or caches.
This change adds a new option --inhibit-ldcache that when used, tells
the dynamic linker to not use ld.so.cache even if it is available.
* sysdeps/generic/dl-osinfo.h (_dl_setup_stack_chk_guard): Fix
masking out of the most significant byte of random value used.
* sysdeps/unix/sysv/linux/dl-osinfo.h (_dl_setup_stack_chk_guard):
Fix coding style in previous change.
* sysdeps/generic/dl-osinfo.h (_dl_setup_stack_chk_guard): Ensure
default stack guard is set in last bytes.
* sysdeps/unix/sysv/linux/dl-osinfo.h (_dl_setup_stack_chk_guard): Same.
It's already so marked in dl-sysdep.c. Failure to so mark
in the header file leads the compiler to believe that the
variable should be addressable via the .sdata section.
Signed-off-by: Richard Henderson <rth@twiddle.net>
If a binary gets invoked by passing it as argument to ld.so the stack
still holds the auxiliary vector of ld.so when entering the _start
routine of the executable. So the invocation via ld.so is not fully
transparent to the executable. This causes problems if the executable
wants to scan the auxv itself.
Some symbols have to be identified process-wide by their name. This is
particularly important for some C++ features (e.g., class local static data
and static variables in inline functions). This cannot completely be
implemented with ELF functionality so far. The STB_GNU_UNIQUE binding
helps by ensuring the dynamic linker will always use the same definition for
all symbols with the same name and this binding.
* elf/dl-load.c (_dl_map_object_from_fd): Only call audit hooks
if we are not loading a new audit library.
* elf/dl-reloc (_dl_relocate_object): Third parameter is now a bitmask.
Only use profiling trampoline for auditing if we are not relocating
an audit library.
* elf/dl-open.c (dl_open_worker): Adjust _dl_relocate_object call.
* elf/rtld.c: Likewise.
* sysdeps/generic/ldsodefs.h: Adjust _dl_relocate_object prototype.