Resolves: #15424
The compiler would optimize the benchmark function call out of the
loop and call it only once, resulting in blazingly fast times for some
benchmarks (notably atan, sin and cos). Mark the inputs as volatile
so that the code is forced to read again from the input for each
iteration.
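Here is a minimal sketch of the problem and the fix. It uses a hypothetical `approx_sin`, declared `const`, in place of the real libm function, and the actual benchmark sources differ. A `const` function with an argument that never changes can legally be hoisted out of the loop. Reading the argument through a `volatile` object forces a fresh load on each iteration, so the call really runs every time.

```c
/* Hypothetical stand-in for a libm routine such as sin.  The const
   attribute says the result depends only on X, which is exactly what
   allows a compiler to hoist a call with an unchanging argument.  */
__attribute__ ((const, noinline)) static double
approx_sin (double x)
{
  return x - x * x * x / 6.0;
}

/* volatile forces a fresh load of the input on every iteration, so the
   call cannot be hoisted out of the loop and executes ITERS times.  */
static volatile double bench_input = 0.5;

double
run_benchmark (unsigned long iters)
{
  double result = 0.0;
  for (unsigned long i = 0; i < iters; i++)
    result = approx_sin (bench_input);
  return result;
}
```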
[BZ #15442] This adds support for the inverse interpretation of the
quiet bit of IEEE 754 floating-point NaN data that some processors
use. This includes in particular MIPS architecture processors; the
payload used for the canonical qNaN encoding is updated accordingly
so as not to interfere with the quiet bit.
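The sketch below shows how the two interpretations classify a double NaN. The function names are hypothetical. The two canonical encodings are the commonly used defaults: 0x7ff8000000000000 for the IEEE 754-2008 convention and 0x7ff7ffffffffffff for legacy MIPS. Treat them as assumptions, not as a quote from the patch. Under the inverse convention the quiet bit must be clear. The payload then has to be nonzero, because an all-zero fraction would encode infinity. That is why the payload of the canonical qNaN changes.

```c
#include <stdint.h>

#define DBL_EXP_MASK  UINT64_C (0x7ff0000000000000)
#define DBL_FRAC_MASK UINT64_C (0x000fffffffffffff)
/* Most significant fraction bit: the "quiet bit".  */
#define DBL_QUIET_BIT (UINT64_C (1) << 51)

/* Nonzero if BITS encodes a NaN: all-ones exponent, nonzero fraction.  */
static int
is_nan_bits (uint64_t bits)
{
  return (bits & DBL_EXP_MASK) == DBL_EXP_MASK && (bits & DBL_FRAC_MASK) != 0;
}

/* Nonzero if BITS is a quiet NaN.  INVERTED selects the inverse
   interpretation (quiet bit set means signaling), as on legacy MIPS.  */
int
is_quiet_nan_bits (uint64_t bits, int inverted)
{
  if (!is_nan_bits (bits))
    return 0;
  int quiet_bit_set = (bits & DBL_QUIET_BIT) != 0;
  return inverted ? !quiet_bit_set : quiet_bit_set;
}

/* Canonical qNaN under each interpretation.  With the inverse meaning
   the quiet bit is clear, so every other fraction bit is set to keep
   the payload nonzero without touching the quiet bit.  */
uint64_t
canonical_qnan_bits (int inverted)
{
  return inverted ? UINT64_C (0x7ff7ffffffffffff)
                  : UINT64_C (0x7ff8000000000000);
}
```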
The EXTRACT_WORDS64 and INSERT_WORDS64 macros use movd for a 64-bit
operation. GCC happens to turn this into movq, but LLVM does not.
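Here is a sketch of what the macros look like when they spell out movq. It is modeled on the description above, not copied from the glibc header. The memcpy fallback for non-x86_64 targets is an addition so that the example stays portable.

```c
#include <stdint.h>
#include <string.h>

#if defined __x86_64__
/* Use movq explicitly: a 64-bit move between an XMM register and a
   general register or memory, which every assembler accepts.  */
# define MOVQ "movq"
# define EXTRACT_WORDS64(i, d)                                   \
  do {                                                           \
    int64_t i_;                                                  \
    __asm__ (MOVQ " %1, %0" : "=rm" (i_) : "x" ((double) (d)));  \
    (i) = i_;                                                    \
  } while (0)
# define INSERT_WORDS64(d, i)                                    \
  do {                                                           \
    int64_t i_ = (i);                                            \
    double d_;                                                   \
    __asm__ (MOVQ " %1, %0" : "=x" (d_) : "rm" (i_));            \
    (d) = d_;                                                    \
  } while (0)
#else
/* Portable fallback: copy the object representation.  */
# define EXTRACT_WORDS64(i, d)                                   \
  do { double d_ = (d); int64_t i_;                              \
       memcpy (&i_, &d_, sizeof i_); (i) = i_; } while (0)
# define INSERT_WORDS64(d, i)                                    \
  do { int64_t i_ = (i); double d_;                              \
       memcpy (&d_, &i_, sizeof d_); (d) = d_; } while (0)
#endif
```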
2013-05-15 Peter Collingbourne <pcc@google.com>
* sysdeps/x86_64/fpu/math_private.h (MOVQ): New macro.
(EXTRACT_WORDS64): Use where appropriate.
(INSERT_WORDS64): Likewise.
While these instructions accept memory operands, only one operand
may be a memory operand. Giving two operands xm constraints gives
the compiler the option of using memory for both operands, which
would result in invalid assembly code. Using x for all operands is
more appropriate, as most x86_64 calling conventions will pass the
arguments in registers anyway.
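The constraint rule is easiest to try with an instruction that every x86_64 processor has, because FMA4 exists only on some AMD processors. This hypothetical `add_sd` therefore uses SSE2 addsd. The FMA4 form described above is quoted only in a comment, and only approximately. Constraining every operand with `x` keeps them all in XMM registers, so the compiler can never choose memory for more than one of them.

```c
/* Hypothetical illustration using SSE2 addsd.  The FMA4 code follows the
   same pattern, roughly:
     asm ("vfmaddsd %3, %2, %1, %0"
          : "=x" (x) : "x" (x), "x" (y), "x" (z));  */
double
add_sd (double x, double y)
{
#if defined __x86_64__
  /* Both operands constrained to XMM registers ("x"), never memory.  */
  __asm__ ("addsd %1, %0" : "+x" (x) : "x" (y));
  return x;
#else
  return x + y;   /* portable fallback on other targets */
#endif
}
```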
2013-05-15 Peter Collingbourne <pcc@google.com>
* sysdeps/x86_64/fpu/multiarch/s_fma.c (__fma_fma4): Replace xm
constraints with x constraints.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c (__fmaf_fma4): Likewise.
It is impossible to create an alias of a common symbol (as
compat_symbol does), because common symbols do not have a section or
an offset until linked. GNU as tolerates aliases of common symbols by
simply creating another common symbol, but other assemblers (notably
LLVM's integrated assembler) are less tolerant.
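Here is a minimal sketch of why the initializer helps, using hypothetical names and assuming GCC or Clang on an ELF target. An explicitly initialized variable is an ordinary defined symbol in .bss, even under -fcommon. It therefore has a section and an offset, and an alias can refer to them.

```c
/* Hypothetical names.  The explicit initializer keeps compat_var out of
   the common section, so it gets a real section and offset.  */
int compat_var = 0;

/* A second name for the same section and offset: the kind of alias that
   compat_symbol needs to create.  */
extern int compat_alias __attribute__ ((alias ("compat_var")));
```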
2013-05-15 Peter Collingbourne <pcc@google.com>
* malloc/obstack.c (_obstack_compat): Add initializer.
Loading of the vDSO pseudo-hwcap from the type 2 GNU note is
a rather arcane and poorly documented process. Given that I had
a chance to review this code today, I thought I would add all
of the things I had to look up to verify the validity of the
process.
With a single .note.GNU, the vDSO can register up to 64 flags,
though in practice you are limited to 64 - _DL_FIRST_EXTRA
bits, which on x86 is 12 bits.
The only use of this that I know of is in the Xen support
in Linux, where bit 1 is used to indicate "nosegneg".
The Linux source carries the comment "We use bit 1 to avoid bugs
in some versions of glibc when bit 0 is used; the choice is
otherwise arbitrary.", but I found no reference to such a glibc
bug anywhere. The code as-is should support bit zero, so we
still have that free for future use.
The kernel, glibc, and ld.so.cache must coordinate to ensure
that bit values don't go too high and are used consistently.
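The bit budget above can be illustrated with a short sketch. It assumes that the note's mask is shifted up by _DL_FIRST_EXTRA into the 64-bit hwcap word. The value 52 for x86 is inferred from the "12 bits" figure (64 - 52 = 12), not quoted from the headers, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Number of hwcap bits left for the vDSO note once the first FIRST_EXTRA
   bits are taken by the architecture (hypothetical helper).  */
unsigned int
vdso_hwcap_bits_available (unsigned int first_extra)
{
  return 64 - first_extra;
}

/* Shift the note's mask into place.  Any bit that would land past bit 63
   is silently lost by the 64-bit shift, which is why the kernel, glibc
   and ld.so.cache must keep bit numbers small.  FIRST_EXTRA < 64.  */
uint64_t
vdso_mask_to_hwcap (uint32_t mask, unsigned int first_extra)
{
  return (uint64_t) mask << first_extra;
}
```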
---
2013-05-13 Carlos O'Donell <carlos@redhat.com>
* elf/dl-hwcaps.c (_dl_important_hwcaps): Comment vDSO hwcap loading.
* elf/ldconfig.c (is_hwcap_platform): Comment each hwcap check.
(main): Comment "tls" pseudo-hwcap.
HP_TIMING uses native timestamping instructions if available, thus
greatly reducing the overhead of recording start and end times for
function calls. For architectures that don't have HP_TIMING
available, we fall back to clock_gettime. One may also
override this by invoking the benchmark as follows:
make USE_CLOCK_GETTIME=1 bench
and get the benchmark results using clock_gettime. One has to do
`make bench-clean` to ensure that the benchmark programs are rebuilt.
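A rough sketch of the selection follows. A native timestamp counter (rdtsc on x86_64) is used when available, and clock_gettime is the fallback. The `timing_now` name is hypothetical, and the C-level `USE_CLOCK_GETTIME` macro only mirrors the make variable; it is not the actual benchtests code.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

#ifndef USE_CLOCK_GETTIME
# define USE_CLOCK_GETTIME 0
#endif

#if !USE_CLOCK_GETTIME && defined __x86_64__
# include <x86intrin.h>
/* HP_TIMING-style: raw timestamp counter ticks, very cheap to read.  */
static inline uint64_t
timing_now (void)
{
  return __rdtsc ();
}
#else
/* Fallback: nanoseconds from clock_gettime, with more overhead per call.  */
static inline uint64_t
timing_now (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}
#endif
```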
The algorithm for scanning dependencies upon dlclose is
less than immediately obvious. This patch adds two bits
of comments that explain why you start the dependency
search at l_initfini[1], and why you need to restart
the search.
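Below is a simplified model of the marking loop, intended only to show the two points the comments explain. It is hypothetical and uses plain index arrays instead of link_map, l_idx and friends. First, entry 0 of each l_initfini list is the object itself, so the scan starts at index 1. Second, marking an object that the loop has already passed forces a restart from that object.

```c
#include <stdlib.h>

/* Objects are numbered 0..N-1.  INITFINI[i] is a -1-terminated list whose
   entry 0 is object i itself, followed by its dependencies.  USED[i] is
   nonzero on entry for objects that must stay loaded; on return it is
   also nonzero for everything they depend on.  Returns 0, or -1 on
   allocation failure.  */
int
mark_used (int n, const int *const initfini[], char used[])
{
  char *done = calloc ((size_t) n, 1);
  if (done == NULL)
    return -1;

  int done_index = -1;
  while (++done_index < n)
    {
      if (done[done_index] || !used[done_index])
        continue;
      done[done_index] = 1;

      /* Entry 0 is the object itself; dependencies start at index 1.  */
      for (const int *lp = &initfini[done_index][1]; *lp != -1; ++lp)
        if (!used[*lp])
          {
            used[*lp] = 1;
            /* We already walked past this object, so its own
               dependencies were never scanned: restart from it.  */
            if (*lp - 1 < done_index)
              done_index = *lp - 1;
          }
    }

  free (done);
  return 0;
}
```

In the test below, object 1 marks object 0, which comes earlier in the list. Object 2 is kept only because of the restart.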
---
2013-05-09 Carlos O'Donell <carlos@redhat.com>
* elf/dl-close.c (_dl_close_worker): Add comments.
Rewrite the first paragraph to talk about users not humans,
and to use correct English.
Clarify that it is the mapping of messages to IDs that
impacts the design of the message translation API.
---
2013-05-07 Carlos O'Donell <carlos@redhat.com>
* manual/message.texi (Message Translation): Talk about users.
Message to key mapping impacts design.
The PowerPC kernel now provides a vDSO implementation of the time syscall
(commit fcb41a2030abe0eb716ef0798035ef9562097f42). This patch changes
the time syscall wrapper to use the vDSO when available. It also changes
the default non-vDSO time on PowerPC to use sysdeps/posix/time.c
(since gettimeofday is a vDSO call).
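Here is a sketch of a time() built on gettimeofday, written in the spirit of the generic POSIX implementation mentioned above. The function name is hypothetical, and the real file may differ in detail.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* time() in terms of gettimeofday, which is itself a vDSO call on
   PowerPC.  Returns (time_t) -1 on failure.  */
time_t
time_via_gettimeofday (time_t *t)
{
  struct timeval tv;
  time_t result;

  if (gettimeofday (&tv, NULL) != 0)
    result = (time_t) -1;
  else
    result = (time_t) tv.tv_sec;

  if (t != NULL)
    *t = result;
  return result;
}
```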
* sysdeps/gnu/netinet/tcp.h (TCP_TIMESTAMP): New value, from
Linux 3.9.
* sysdeps/unix/sysv/linux/bits/socket.h (PF_VSOCK, AF_VSOCK):
Add.
(PF_MAX): Adjust for VSOCK change.
We add yesstr and nostr to three more locales.
We ignore the issue of capitalization of the first
character in yesstr and nostr. All locales will need
to be revisited to apply this policy change uniformly.
---
2013-05-02 Carlos O'Donell <carlos@redhat.com>
[BZ #15264]
* localedata/locales/en_CA (LC_MESSAGES): Define yesstr and nostr.
* localedata/locales/es_AR (LC_MESSAGES): Copy es_ES.
* localedata/locales/es_ES (LC_MESSAGES): Define yesstr and nostr.
Use the __gnu_inline__ attribute in _FORTIFY_SOURCE's __extern_always_inline
macro whenever the compiler supports it. Previously this macro only included
the __gnu_inline__ attribute in C++ mode for gcc >= 4.3. However,
__gnu_inline__ semantics are always desired for the __extern_always_inline
functions, and are available in g++ 4.2 (and some releases of g++ 4.1, and
also in Clang, which claims to be g++ 4.2).
This change stops g++-4.2 from emitting weak definitions for the fortify
wrapper functions if they can't be inlined, and also improves Clang
compatibility.
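A hypothetical re-creation of the selection logic follows; the real macro lives in glibc's <sys/cdefs.h>, so use this only as an illustration. With gnu_inline, an `extern inline` body is used only for inlining and no out-of-line copy is emitted, which is why the weak definitions disappear. The `static __inline` fallback is an assumption.

```c
#if defined __GNUC_STDC_INLINE__ || defined __GNUC_GNU_INLINE__      \
    || (defined __cplusplus && defined __GNUC__                     \
        && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 2)))
/* gnu_inline semantics: the body serves only for inlining; no weak
   out-of-line definition is emitted in this translation unit.  */
# define MY_EXTERN_ALWAYS_INLINE \
  extern __inline __attribute__ ((__always_inline__, __gnu_inline__))
#else
/* Fallback for compilers without gnu_inline support.  */
# define MY_EXTERN_ALWAYS_INLINE static __inline
#endif

/* A fortify-style wrapper, inlined at every call site.  */
MY_EXTERN_ALWAYS_INLINE int
checked_length (int n)
{
  return n < 0 ? 0 : n;
}
```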
Some math functions have distinct performance characteristics in
specific domains of inputs, where some inputs return via a fast path
while other inputs require multiple precision calculations, sometimes
at several different precision levels. Previously, the only way to
benchmark different domains was to have a separate source file and
benchmark definition for each, resulting in separate programs.
This clutters up the benchmark, so this change allows these domains to
be consolidated into the same input file. To do this, the input file
format is now enhanced to allow comments with a preceding # and
directives with two # at the beginning of a line. A directive that
looks like:
## name: foo
tells the benchmark generation script that what follows is a different
domain of inputs. The value of the 'name' directive (in this case,
foo) is used in the output. The two input domains are then executed
sequentially and their results collated separately. With the above
directive, there would be two lines in the result that look like:
func(): ....
func(foo): ...
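The input-file rules above can be sketched in C, even though the generator is a script. This classifier is hypothetical. It recognizes `##` directives (extracting the value of `name:`), `#` comments, and everything else as input, and assumes nothing further about the format.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum line_kind { LINE_INPUT, LINE_COMMENT, LINE_DIRECTIVE };

/* Classify one line of a benchmark input file.  For a "## name: foo"
   directive, copy the domain name into NAME (NAME_SIZE bytes).  */
enum line_kind
classify_line (const char *line, char *name, size_t name_size)
{
  if (strncmp (line, "##", 2) == 0)
    {
      const char *p = line + 2;
      while (isspace ((unsigned char) *p))
        p++;
      if (strncmp (p, "name:", 5) == 0 && name_size > 0)
        {
          p += 5;
          while (isspace ((unsigned char) *p))
            p++;
          size_t len = strcspn (p, " \t\r\n");
          if (len >= name_size)
            len = name_size - 1;
          memcpy (name, p, len);
          name[len] = '\0';
        }
      return LINE_DIRECTIVE;
    }
  if (line[0] == '#')
    return LINE_COMMENT;
  return LINE_INPUT;
}
```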
The idea to run benchmarks for a constant number of iterations is
problematic. While the benchmarks may run for 10 seconds on x86_64,
they could run for about 30 seconds on powerpc and, worse, over 3
minutes on arm. Besides that, adding a new benchmark is cumbersome
since one needs to find out the number of iterations needed for a
sufficient runtime.
A better idea would be to run each benchmark for a specific amount of
time. This patch does just that. The run time defaults to 10 seconds
and it is configurable on the command line:
make BENCH_DURATION=5 bench
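Here is a minimal sketch of a duration-based driver. The 10-second default and BENCH_DURATION come from the text above. The batch size, the function names and the trivial sample workload are assumptions made for illustration. Reading the clock once per batch rather than once per call keeps the cost of timing out of the measurement.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

static double
now_seconds (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}

/* Hypothetical workload so the sketch is self-contained.  */
static volatile uint64_t workload_sink;
void
sample_workload (void)
{
  workload_sink++;
}

/* Call FN in batches until DURATION seconds have elapsed; return the
   number of calls made.  */
uint64_t
run_for_duration (void (*fn) (void), double duration)
{
  const uint64_t batch = 1000;
  uint64_t calls = 0;
  double start = now_seconds ();
  do
    {
      for (uint64_t i = 0; i < batch; i++)
        fn ();
      calls += batch;
    }
  while (now_seconds () - start < duration);
  return calls;
}
```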
This patch fixes commit 3c0265394d
by correctly setting the minimum architecture for the modf PPC
optimization to power5+ instead of power5 (since only on power5+ are
round/ceil inlined as assembly instructions).
Use the most accurate hex literals possible for the answers to the
cos and sincos tests that vary according to the error in the rounding
of PI/2.
---
2013-04-24 Carlos O'Donell <carlos@redhat.com>
* math/libm-test.inc (cos_test): Use accurate hex constants.
(sincos_test): Likewise.
Resolves: #14888.
This only really manifests itself when there are no spaces between
format specifiers, which is not allowed by POSIX, but is allowed by
the glibc implementation.
Kay Sievers reported that coreutils' stat tool has a problem with
s390's statfs[64] definition:
> The definition of struct statfs::f_type needs a fix. s390 is the only
> architecture in the kernel that uses an int and expects magic
> constants larger than INT_MAX to fit into.
>
> A fix is needed to make Fedora boot on s390, it currently fails to do
> so. Userspace does not want to add code to paper-over this issue.
[...]
> Even coreutils cannot handle it:
> #define RAMFS_MAGIC 0x858458f6
> # stat -f -c%t /
> ffffffff858458f6
>
> #define BTRFS_SUPER_MAGIC 0x9123683E
> # stat -f -c%t /mnt
> ffffffff9123683e
The bug is caused by an implicit sign extension within the stat tool:
out_uint_x (pformat, prefix_len, statfsbuf->f_type);
where the format finally will be "%lx".
A similar problem can be found in the 'tail' tool.
s390 is the only architecture which has an int type f_type member in
struct statfs[64]. Other architectures have either unsigned ints or
long values, so that the problem doesn't occur there.
Therefore change the type of the f_type member to unsigned int, so
that assigning it to a long value zero-extends it instead of
sign-extending it.
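The difference between the two types can be shown directly. These hypothetical helpers model assigning f_type to an unsigned long, as happens before it is printed with "%lx". They assume an LP64 target such as s390x. Converting 0x858458f6 to int is implementation-defined; GCC wraps it to a negative value.

```c
/* int f_type: widening sign-extends, giving ffffffff858458f6.  */
unsigned long
widen_int_f_type (int f_type)
{
  return (unsigned long) (long) f_type;
}

/* unsigned int f_type: widening zero-extends, giving 858458f6.  */
unsigned long
widen_uint_f_type (unsigned int f_type)
{
  return f_type;
}
```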
Reported-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
We no longer support configuring for i386, nor do we
elide such a configuration to i686. Configuring with
i386-* is a failure, and we provide an example of
how to fix that.
---
2013-04-17 Carlos O'Donell <carlos@redhat.com>
* configure.in: Remove i386 configure warning. Remove i386 case.
* configure: Regenerate.
* sysdeps/i386/configure.in: Raise error if config_machine is i386.
Add example to error message.
* sysdeps/i386/configure: Regenerate.
Appending benchmark program output on every run could go wrong if a
benchmark run was cancelled, leaving a partially written file. That
file would be used again on the next run, so new results would be
appended to old results.
It would have been possible to remove the file before every benchmark
run, but it is easier to just write the output to bench.out-tmp only
once.
Benchmark programs are generated using parameters from the Makefile,
so it is necessary to rebuild them whenever the parameters in the
Makefile are updated. Hence, add a dependency for the generated C
source on the Makefile so that it gets regenerated when the Makefile
is updated.
The value of PI is never exactly PI in any floating point representation,
and the value of PI/2 is never exactly PI/2. It is wrong to expect
cos(M_PI_2l) to return 0; instead, it returns a non-zero answer because
M_PI_2l doesn't round to exactly PI/2 in the type used.
The correct expected answer is therefore computed as follows:
* Take PI or PI/2.
* Round to the floating point representation.
* Take the rounded value and compute an infinite precision cos or sin.
* Use the rounded result of the infinite precision cos or sin as the
answer to the test.
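For double, the steps above can be worked through explicitly. Write fl(x) for x rounded to IEEE 754 binary64 and δ for the rounding error of PI/2. The digits below are the standard binary64 values.

```latex
\begin{aligned}
\mathrm{fl}\!\left(\tfrac{\pi}{2}\right) &= 1.5707963267948965579989817342720925807952880859375 \\
\delta = \tfrac{\pi}{2} - \mathrm{fl}\!\left(\tfrac{\pi}{2}\right) &\approx 6.123233995736766 \times 10^{-17} \\
\cos\!\left(\mathrm{fl}\!\left(\tfrac{\pi}{2}\right)\right)
  = \cos\!\left(\tfrac{\pi}{2} - \delta\right) = \sin\delta
  &= \delta - \tfrac{\delta^{3}}{6} + \cdots \approx 6.123233995736766 \times 10^{-17}
\end{aligned}
```

The δ³ term is far below the precision of double, so the expected result is essentially δ itself, not 0. Each floating-point type has its own δ, which is why the fix has to be made per type.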
I used printf to do the type rounding, and Wolfram Alpha to do the
infinite precision cos calculations.
The following changes bring x86-64 and x86 to 1/2 ulp for two tests.
This shows that the x86 cos implementation is quite good, and that
our tests were flawed.
Unfortunately, given that the rounding errors are type-dependent, we
need to fix this for each type. No regressions on x86-64 or x86.
---
2013-04-11 Carlos O'Donell <carlos@redhat.com>
* math/libm-test.inc (cos_test): Fix PI/2 test.
(sincos_test): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate.
* sysdeps/i386/fpu/libm-test-ulps: Regenerate.
run-via-rtld-prefix checks whether the program to be run is a static
test and skips the prefix if it is. This is fine, except that it
assumes that the program to be run is the second word of $^, which is
true only for tests.
This change creates rtld-prefix, which is simply the dynamic linker
prefix with the necessary arguments, and uses it in the non-test
targets.
Fixes #15346.
The POSIX description of getdate allows for extra spaces in the
getdate input string. __getdate_r uses strptime internally, which
works fine with extra spaces between format strings (and hence within
an input string) but not with leading and trailing spaces. So we trim
off the leading and trailing spaces before we pass it on to strptime.
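Here is a sketch of the trimming step, written as a hypothetical helper rather than glibc's internal code. It strips leading and trailing white space and then requires strptime to consume the whole remaining string, much as getdate requires the template to match the complete input.

```c
#define _XOPEN_SOURCE 700
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Trim INPUT, then parse it with strptime.  Returns 0 if FORMAT matched
   the entire trimmed string, -1 otherwise.  */
int
trimmed_strptime (const char *input, const char *format, struct tm *tm)
{
  while (isspace ((unsigned char) *input))
    input++;
  size_t len = strlen (input);
  while (len > 0 && isspace ((unsigned char) input[len - 1]))
    len--;

  char *copy = malloc (len + 1);
  if (copy == NULL)
    return -1;
  memcpy (copy, input, len);
  copy[len] = '\0';

  const char *end = strptime (copy, format, tm);
  int ok = end != NULL && *end == '\0';   /* nothing left unparsed */
  free (copy);
  return ok ? 0 : -1;
}
```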
Document the use of the convenience testrun.sh script for
running the libm test.
---
2013-04-06 Carlos O'Donell <carlos@redhat.com>
* math/README.libm-test (How can I generate "libm-test-ulps"?):
Use testrun.sh to run libm tests.
The seen array was doubled in size recently, but the memset to clear
the array was not adjusted. We adjust the memset to always be correct
regardless of the size of seen.
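The idea behind the fix can be illustrated with a small sketch. It assumes for illustration that seen is a variable-length array whose size was doubled; the actual declaration in dl-open.c may differ. `sizeof (seen)` always reflects the real size, even for a VLA, whereas a hand-written count such as nloaded would silently leave the second half uninitialized.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: returns how many bytes of a freshly cleared
   'seen' array are zero.  NLOADED must be nonzero.  */
size_t
count_cleared (size_t nloaded)
{
  char seen[nloaded * 2];               /* doubled in size */
  memset (seen, 1, sizeof (seen));      /* simulate stale contents */
  memset (seen, '\0', sizeof (seen));   /* clears the whole array */

  size_t zeros = 0;
  for (size_t i = 0; i < sizeof (seen); ++i)
    zeros += seen[i] == 0;
  return zeros;
}
```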
---
2013-04-06 Carlos O'Donell <carlos@redhat.com>
[BZ #15309]
* elf/dl-open.c (dl_open_worker): memset all of seen array.
Define yesstr/nostr in fi_FI (as "Kyllä" and "Ei").
Fixes part of BZ#15264.
---
2013-04-06 Marko Myllynen <myllynen@redhat.com>
[BZ #15264]
* locales/fi_FI (LC_MESSAGES): Define yesstr and nostr.
The wiki "Regeneration" page has this to say about updating ULPs.
"The libm-test-ulps files are semiautomatically updated. To
update an ulps baseline, run each of the failing tests (test-float,
test-double, etc.) with -u; this will generate a file called ULPs;
concatenate each of those files with the existing libm-test-ulps
file, after removing any entries for particularly huge numbers of
ulps that you do not want to mark as expected. Then run
gen-libm-test.pl -n -u FILE where FILE is the concatenated file
produced in the previous step. This generates a file called
NewUlps which is the new sorted version of libm-test-ulps."
The same information is listed in math/README.libm-test, and it
describes a lot of manual work that you often want to repeat over and
over while working on a particular test.
The `regen-ulps' convenience target does this automatically for
developers.
We strictly assume the source tree is read-only and add a
new --output-dir option to libm-test.inc to allow for writing
out ULPs to $(objpfx).
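The output-path rule from the ChangeLog can be sketched as follows: output_dir is used as a prefix when it is non-NULL, and "" is used otherwise. The helper name is hypothetical. The sketch assumes the prefix already ends in a slash, as $(objpfx) does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the path of the ULPs output file.  OUTPUT_DIR comes from
   --output-dir and may be NULL.  The caller frees the result.  */
char *
ulps_output_path (const char *output_dir)
{
  const char *prefix = output_dir != NULL ? output_dir : "";
  const char *file = "ULPs";
  size_t len = strlen (prefix) + strlen (file) + 1;
  char *path = malloc (len);
  if (path != NULL)
    snprintf (path, len, "%s%s", prefix, file);
  return path;
}
```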
When run, the new target does the following:
* Starts with the baseline ULPs file.
* Runs each of the libm math tests with -u.
* Adds new changes seen with -u to the baseline.
* Sorts and prepares the test output with gen-libm-test.pl.
* Leaves math/NewUlps in your build tree to copy to your source
tree, clean up, and check in.
The math test documentation in math/README.libm-test is updated to
document the new Makefile target.
---
2013-04-06 Carlos O'Donell <carlos@redhat.com>
* Makefile.in (regen-ulps): New target.
* math/Makefile [ifneq (no,$(PERL))]: Declare regen-ulps with .PHONY.
[ifneq (no,$(PERL))] (run-regen-ulps): New variable.
[ifneq (no,$(PERL))] (regen-ulps): New target.
[ifeq (no,$(PERL))] (regen-ulps): New target.
* math/libm-test.inc (ulps_file_name): Define.
(output_dir): New variable.
(options): Add "output-dir" option.
(parse_opt): Handle 'o' case.
(main): If output_dir is non-NULL, use it as a prefix;
otherwise use "".
* math/README.libm-test: Update `How can I generate "libm-test-ulps"?'
This change does two things:
* Treats a target i386-* as if it were i686.
* Fails configure if the user is generating code
for i386.
We no longer support i386 code-generation because the i386
lacks the atomic operations we need in glibc.
You can still configure for i386-*, but you get i686 code.
You can't build with -march=i386, -mtune=i386 or a compiler
that defaults to i386 code-generation.
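Here is an example of the kind of atomic operation in question. The helper is hypothetical, and `__sync_val_compare_and_swap` is the builtin the new configure test checks. On i486 and later, GCC can inline the builtin as a lock cmpxchg instruction. The i386 has no cmpxchg, so with -march=i386 the builtin cannot be inlined, and that is what the configure check detects.

```c
/* Atomically change *FLAG from 0 to 1.  Returns nonzero only for the
   caller that observed 0, i.e. the one that claimed the flag.  */
int
try_claim (int *flag)
{
  return __sync_val_compare_and_swap (flag, 0, 1) == 0;
}
```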
I've added two i386 entries in the master todo list to discuss
merging and renaming:
http://sourceware.org/glibc/wiki/Development_Todo/Master#i386
The failure modes are fail-safe here. You compile for i386,
get i686 code, try to run it on an i386, and it fails. The configure
log has a warning saying we elided to i686. There is no situation
that I can see where we run into any serious problems.
The patch improves the current state in that we get fewer
confused users and we build successfully in more default
configurations.
The next enhancement would be to add -march=i?86
as suggested in #c20 of BZ#10062 for any i?86-* builds, which
would solve the problem of a 32-bit compiler that defaults to
i386 code-gen combined with glibc configured for an i686-* target.
That combination previously failed at build time, and now fails at
configure time (it requires adding -march=i686).
Updated NEWS with BZ #10060 and #10062.
No regressions.
---
2013-04-06 Carlos O'Donell <carlos@redhat.com>
[BZ #10060, #10062]
* aclocal.m4 (LIBC_COMPILER_BUILTIN_INLINED): New macro.
* sysdeps/i386/configure.in: Use LIBC_COMPILER_BUILTIN_INLINED and
fail configure if __sync_val_compare_and_swap is not inlined.
* sysdeps/i386/configure: Regenerate.
* configure.in: Build for i686 when configured for i386.
* configure: Regenerate.
* README: Remove i386 reference.
Write output from the currently running benchmark into a temporary
file and move files around only once the current run is complete.
That way we don't lose data from the last two runs due to an
incomplete run.
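Here is a minimal sketch of the write-then-move scheme. The source mentions bench.out-tmp; the `-tmp` and `.old` suffixes and the function name used here are assumptions. Results go to a temporary file first. Only after that file is complete is the previous result rotated and the new one moved into place, so an interrupted run never damages the last two results.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Publish TEXT as the new contents of PATH, keeping the previous
   contents in PATH.old.  Returns 0 on success, -1 on failure.  */
int
publish_results (const char *path, const char *text)
{
  size_t len = strlen (path);
  char *tmp = malloc (len + sizeof "-tmp");
  char *old = malloc (len + sizeof ".old");
  FILE *f = NULL;
  int write_failed;
  int ret = -1;

  if (tmp == NULL || old == NULL)
    goto out;
  sprintf (tmp, "%s-tmp", path);
  sprintf (old, "%s.old", path);

  /* "w" truncates, so a stale partial file is never appended to.  */
  f = fopen (tmp, "w");
  if (f == NULL)
    goto out;
  write_failed = fputs (text, f) == EOF;
  if (fclose (f) != 0 || write_failed)
    {
      remove (tmp);
      goto out;
    }

  /* The run is complete: only now rotate and move files around.  */
  if (rename (path, old) != 0 && errno != ENOENT)
    goto out;
  if (rename (tmp, path) != 0)
    goto out;
  ret = 0;

 out:
  free (tmp);
  free (old);
  return ret;
}
```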
Fix BZ #15305.
On kernel versions earlier than 2.6.29, the Linux kernel exported a
sysctl called restrict_chown for xfs, which could be used to allow
chown to users other than the owner. 2.6.29 removed this support,
causing open_not_cancel_2 to fail and thus modify errno. The fix
is to save and restore errno so that the caller sees it as unmodified.
Additionally, since the code to check the sysctl is not useful on
newer kernels, we add an ifdef so that in future the code block gets
removed completely.
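The save-and-restore pattern is shown below in a hypothetical probe of a boolean sysctl file. The restrict_chown file was assumed to live under /proc/sys/fs/xfs/, but here the path is simply a parameter. Whether the open or read succeeds or fails, the caller's errno is left exactly as it was.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 or 0 for the flag in PATH, or -1 if it cannot be read
   (for example on kernels that removed it).  errno is preserved.  */
int
probe_sysctl_flag (const char *path)
{
  int saved_errno = errno;
  int result = -1;

  int fd = open (path, O_RDONLY);
  if (fd >= 0)
    {
      char c;
      if (read (fd, &c, 1) == 1)
        result = c != '0';
      close (fd);
    }

  errno = saved_errno;   /* a failed open or read must not leak out */
  return result;
}
```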
Separate benchmarks for the fast and slow implementations of pow and
exp since measuring both together doesn't make sense. Adjust the
iterations for pow and exp accordingly so that they run long enough
for the measurements to be meaningful.