The most common use case of math functions is with default rounding
mode, i.e. rounding to nearest. Setting and restoring rounding mode
is an unnecessary overhead for this, so I've added support for a
context, which does the set/restore only if the FP status needs a
change. The code is written such that only x86 uses these. Other
architectures should be unaffected by it, but would definitely benefit
if the set/restore has as much overhead relative to the rest of the
code, as the x86 bits do.
Here's a summary of the performance improvement due to these
improvements; I've only mentioned functions that use the set/restore
and have benchmark inputs for x86_64:
Before:
cos(): ITERS:4.69335e+08: TOTAL:28884.6Mcy, MAX:4080.28cy, MIN:57.562cy, 16248.6 calls/Mcy
exp(): ITERS:4.47604e+08: TOTAL:28796.2Mcy, MAX:207.721cy, MIN:62.385cy, 15543.9 calls/Mcy
pow(): ITERS:1.63485e+08: TOTAL:28879.9Mcy, MAX:362.255cy, MIN:172.469cy, 5660.86 calls/Mcy
sin(): ITERS:3.89578e+08: TOTAL:28900Mcy, MAX:704.859cy, MIN:47.583cy, 13480.2 calls/Mcy
tan(): ITERS:7.0971e+07: TOTAL:28902.2Mcy, MAX:1357.79cy, MIN:388.58cy, 2455.55 calls/Mcy
After:
cos(): ITERS:6.0014e+08: TOTAL:28875.9Mcy, MAX:364.283cy, MIN:45.716cy, 20783.4 calls/Mcy
exp(): ITERS:5.48578e+08: TOTAL:28764.9Mcy, MAX:191.617cy, MIN:51.011cy, 19071.1 calls/Mcy
pow(): ITERS:1.70013e+08: TOTAL:28873.6Mcy, MAX:689.522cy, MIN:163.989cy, 5888.18 calls/Mcy
sin(): ITERS:4.64079e+08: TOTAL:28891.5Mcy, MAX:6959.3cy, MIN:36.189cy, 16062.8 calls/Mcy
tan(): ITERS:7.2354e+07: TOTAL:28898.9Mcy, MAX:1295.57cy, MIN:380.698cy, 2503.7 calls/Mcy
So the improvements are:
cos: 27.9089%
exp: 22.6919%
pow: 4.01564%
sin: 19.1585%
tan: 1.96086%
The downside of the change is that it will have an adverse performance
impact on non-default rounding modes, but I think the tradeoff is
justified.
__clock_gettime and other __clock_* functions could result in an extra
PLT reference within libc.so if it actually gets used. None of the
code currently uses them, which is why this probably went unnoticed.
Since we want to use this in installed headers, move it to the installed
sys/cdefs.h. This requires a slight tweaking of the name (add trailing
underscores).
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
These prototypes are duplicated in many places. Add a dedicated
header for holding prototypes for program-specific functions to
avoid that.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This feature is specifically for the C++ compiler to offload calling
thread_local object destructors on thread program exit, to glibc.
This is to overcome the possible complication of destructors of
thread_local objects getting called after the DSO in which they're
defined is unloaded by the dynamic linker. The DSO is marked as
'unloadable' if it has a constructed thread_local object and marked as
'unloadable' again when all the constructed thread_local objects
defined in it are destroyed.
These internal knobs are not exposed as part of the public ABI, so mark
them hidden to avoid generating relocations against them.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
We can't assume sock_cloexec and pipe2 are bound together as the former
defines are found in glibc only while the latter are a combo of kernel
headers and glibc. So if we do a runtime detection of SOCK_CLOEXEC, but
pipe2() is a stub inside of glibc, we hit a problem. For example:
main()
{
getgrnam("portage");
if (!popen("ls", "r"))
perror("popen()");
}
getgrnam() will detect that the kernel supports SOCK_CLOEXEC and then set
both __have_sock_cloexec and __have_pipe2 to true. But if glibc was built
against older kernel headers where __NR_pipe2 does not exist, glibc will
have a ENOSYS stub for it. So popen() will always fail as glibc assumes
pipe2() works.
While this isn't too much of an issue for some arches as they added the
functionality to the kernel at the same time, not all arches are that
lucky.
Since the code already has dedicated names for each feature, delete the
defines wiring these three features together and make each one a proper
dedicated knob.
We've been carrying this in Gentoo since glibc-2.9.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
[BZ #13579] Do not free l_initfini and allow it to be reused
on subsequent dl_open calls for the same library. This fixes
the invalid memory access in do_lookup_x when the previously
free'd l_initfini was accessed through l_searchlist when a
library had been opened for the second time.