Add a comment pointing to the SystemTap wiki page that documents the
format of the arguments. Also add a pointer to the SystemTap and
gdb sources which seem to be the best place to get the architecture
specific details.
ChangeLog:
2014-02-11 Will Newton <will.newton@linaro.org>
* include/stap-probe.h: Add comment about probe argument
format.
This patch adds a feature test macro _DEFAULT_SOURCE to enable the
default set of header declarations.
The intention is: if _DEFAULT_SOURCE is not used there is no change to
the set of __USE_* macros glibc defines; if it's used on its own, and
without compiler options such as -std=c99 that define __STRICT_ANSI__,
again, there is no change; if it's used together with the macros it
approximately (i.e., apart from __USE_POSIX_IMPLICITLY) implies
(-D_BSD_SOURCE -D_SVID_SOURCE -D_POSIX_C_SOURCE=200809L), again, there
is no change. Otherwise, it causes the relevant features to be
enabled, even if __STRICT_ANSI__, or another feature test macro, would
cause them to be disabled.
This macro deliberately bundles the POSIX.1-2008 (non-X/Open)
functionality with the BSD/SVID/"misc" functionality, rather than
defining a macro that gives just the latter, as many of the header
cleanups resulting from removing _BSD_SOURCE and _SVID_SOURCE support
are only possible when BSD/SVID/"misc" is always bundled with
POSIX.1-2008.
Tested x86_64.
* include/features.h: Update comment documenting feature test
macros. Mention _DEFAULT_SOURCE in comment.
[_GNU_SOURCE] (_DEFAULT_SOURCE): Undefine and redefine.
[_DEFAULT_SOURCE]: Undefine and redefine _DEFAULT_SOURCE,
_BSD_SOURCE and _SVID_SOURCE.
[!__STRICT_ANSI__ && !_ISOC99_SOURCE && !_POSIX_SOURCE &&
!_POSIX_C_SOURCE && !_XOPEN_SOURCE && !_BSD_SOURCE &&
!_SVID_SOURCE]: Likewise.
[_DEFAULT_SOURCE && !_POSIX_SOURCE && !_POSIX_C_SOURCE]
(__USE_POSIX_IMPLICITLY): Define.
[_DEFAULT_SOURCE && !_POSIX_SOURCE && !_POSIX_C_SOURCE]
(_POSIX_SOURCE): Undefine and redefine.
[_DEFAULT_SOURCE && !_POSIX_SOURCE && !_POSIX_C_SOURCE]
(_POSIX_C_SOURCE): Likewise.
* manual/creature.texi (_DEFAULT_SOURCE): Document.
(Feature Test Macros): Update documentation of default features.
Joseph pointed out in the bug report (and in an earlier thread) that
systemtap probes cause build time warnings like the following:
../sysdeps/ieee754/dbl-64/e_atan2.c:602:4: warning: the address of
'p' will always evaluate as 'true' [-Waddress]
due to the fact that we're now passing non-weak variables to
LIBC_PROBE in the libm probes. This happens only on configurations
that do not enable systemtap. The macro definition of LIBC_PROBE in
this case only acts as a sanity checker to ensure that the number
parameters passed to LIBC_PROBE is equal to the argument count
parameter passed before it. This can be done in a much simpler manner
by just adding a macro definition for each number of arguments. I am
assuming here that we don't really want to bother with supporting
LIBC_PROBE with an indeterminate number of arguments and if there is a
need for a probe to have more data than what is currently supported (4
arguments), one could simply add an additional macro here.
Rather than open coding the masks, add helper macros to do the magic.
This makes code easier to read.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Check wheter the compiler has the option -fno-tree-loop-distribute-patterns
to inhibit loop transformation to library calls and uses it on memset
and memmove default implementation to avoid recursive calls.
The most common use case of math functions is with default rounding
mode, i.e. rounding to nearest. Setting and restoring rounding mode
is an unnecessary overhead for this, so I've added support for a
context, which does the set/restore only if the FP status needs a
change. The code is written such that only x86 uses these. Other
architectures should be unaffected by it, but would definitely benefit
if the set/restore has as much overhead relative to the rest of the
code, as the x86 bits do.
Here's a summary of the performance improvement due to these
improvements; I've only mentioned functions that use the set/restore
and have benchmark inputs for x86_64:
Before:
cos(): ITERS:4.69335e+08: TOTAL:28884.6Mcy, MAX:4080.28cy, MIN:57.562cy, 16248.6 calls/Mcy
exp(): ITERS:4.47604e+08: TOTAL:28796.2Mcy, MAX:207.721cy, MIN:62.385cy, 15543.9 calls/Mcy
pow(): ITERS:1.63485e+08: TOTAL:28879.9Mcy, MAX:362.255cy, MIN:172.469cy, 5660.86 calls/Mcy
sin(): ITERS:3.89578e+08: TOTAL:28900Mcy, MAX:704.859cy, MIN:47.583cy, 13480.2 calls/Mcy
tan(): ITERS:7.0971e+07: TOTAL:28902.2Mcy, MAX:1357.79cy, MIN:388.58cy, 2455.55 calls/Mcy
After:
cos(): ITERS:6.0014e+08: TOTAL:28875.9Mcy, MAX:364.283cy, MIN:45.716cy, 20783.4 calls/Mcy
exp(): ITERS:5.48578e+08: TOTAL:28764.9Mcy, MAX:191.617cy, MIN:51.011cy, 19071.1 calls/Mcy
pow(): ITERS:1.70013e+08: TOTAL:28873.6Mcy, MAX:689.522cy, MIN:163.989cy, 5888.18 calls/Mcy
sin(): ITERS:4.64079e+08: TOTAL:28891.5Mcy, MAX:6959.3cy, MIN:36.189cy, 16062.8 calls/Mcy
tan(): ITERS:7.2354e+07: TOTAL:28898.9Mcy, MAX:1295.57cy, MIN:380.698cy, 2503.7 calls/Mcy
So the improvements are:
cos: 27.9089%
exp: 22.6919%
pow: 4.01564%
sin: 19.1585%
tan: 1.96086%
The downside of the change is that it will have an adverse performance
impact on non-default rounding modes, but I think the tradeoff is
justified.
__clock_gettime and other __clock_* functions could result in an extra
PLT reference within libc.so if it actually gets used. None of the
code currently uses them, which is why this probably went unnoticed.
Since we want to use this in installed headers, move it to the installed
sys/cdefs.h. This requires a slight tweaking of the name (add trailing
underscores).
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
These prototypes are duplicated in many places. Add a dedicated
header for holding prototypes for program-specific functions to
avoid that.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This feature is specifically for the C++ compiler to offload calling
thread_local object destructors on thread program exit, to glibc.
This is to overcome the possible complication of destructors of
thread_local objects getting called after the DSO in which they're
defined is unloaded by the dynamic linker. The DSO is marked as
'unloadable' if it has a constructed thread_local object and marked as
'unloadable' again when all the constructed thread_local objects
defined in it are destroyed.