This patch adds some more directives to the benchmark inputs file,
moving functionality from the Makefile and making the code generation
script a bit cleaner. The function argument and return types that
were earlier added as variables in the makefile and passed to the
script via command line arguments are now the 'args' and 'ret'
directive respectively. 'args' should be a colon separated list of
argument types (skipped if the function doesn't accept any arguments)
and 'ret' should be the return type.
Additionally, an 'includes' directive may have a comma separated list
of headers to include in the source. For example, the pow input file
now looks like this:
42.0, 42.0
1.0000000000000020, 1.5
I did this to unclutter the benchtests Makefile a bit and eventually
eliminate dependency of the tests on the Makefile and have tests
depend on their respective include files only.
Add some comments and call free on all potentially allocated pointers.
Also remove duplicate check for NULL pointer.
ChangeLog:
2013-10-04 Will Newton <will.newton@linaro.org>
* malloc/tst-valloc.c: Add comments.
(do_test): Add comments and call free on all potentially
allocated pointers. Remove duplicate check for NULL pointer.
Add space after cast.
Add some comments and call free on all potentially allocated pointers.
Also remove duplicate check for NULL pointer.
ChangeLog:
2013-10-04 Will Newton <will.newton@linaro.org>
* malloc/tst-pvalloc.c: Add comments.
(do_test): Add comments and call free on all potentially
allocated pointers. Remove duplicate check for NULL pointer.
Add space after cast.
Add some comments and call free on all potentially allocated pointers.
ChangeLog:
2013-10-04 Will Newton <will.newton@linaro.org>
* malloc/tst-posix_memalign.c: Add comments.
(do_test): Add comments and call free on all potentially
allocated pointers. Add space after cast.
* sysdeps/powerpc/powerpc32/dl-machine.c (__process_machine_rela):
Use stdint types in rather than __attribute__((mode())).
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00096.html
This adds the basic configury bits for powerpc64le and powerpcle.
* configure.in: Map powerpc64le and powerpcle to base_machine/machine.
* configure: Regenerate.
* nptl/shlib-versions: Powerpc*le starts at 2.18.
* shlib-versions: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00095.html
I found this useful at one stage when I was seeing a huge number of
memrchr failures all of test number 10.
* string/tester.c (test_memrchr): Increment reported test cycle.
http://sourceware.org/ml/libc-alpha/2013-08/msg00094.html
Using plain %s here runs the risk of segfaulting when displaying the
string. src and dst aren't zero terminated strings.
* string/test-memcpy.c (do_one_test): When reporting errors, print
string address and don't overrun end of string.
http://sourceware.org/ml/libc-alpha/2013-08/msg00105.html
Like strnlen, memchr and memrchr had a number of defects fixed by this
patch as well as adding little-endian support. The first one I
noticed was that the entry to the main loop needlessly checked for
"are we done yet?" when we know the size is large enough that we can't
be done. The second defect I noticed was that the main loop count was
wrong, which in turn meant that the small loop needed to handle an
extra word. Thirdly, there is nothing to say that the string can't
wrap around zero, except of course that we'd normally hit a segfault
on trying to read from address zero. Fixing that simplified a number
of places:
- /* Are we done already? */
- addi r9,r8,8
- cmpld r9,r7
- bge L(null)
becomes
+ cmpld r8,r7
+ beqlr
However, the exit gets an extra test because I test for being on the
last word then if so whether the byte offset is less than the end.
Overall, the change is a win.
Lastly, memrchr used the wrong cache hint.
* sysdeps/powerpc/powerpc64/power7/memchr.S: Replace rlwimi with
insrdi. Make better use of reg selection to speed exit slightly.
Schedule entry path a little better. Remove useless "are we done"
checks on entry to main loop. Handle wrapping around zero address.
Correct main loop count. Handle single left-over word from main
loop inline rather than by using loop_small. Remove extra word
case in loop_small caused by wrong loop count. Add little-endian
support.
* sysdeps/powerpc/powerpc32/power7/memchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. Use proper
cache hint.
* sysdeps/powerpc/powerpc32/power7/memrchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Add little-endian
support. Avoid rlwimi.
* sysdeps/powerpc/powerpc32/power7/rawmemchr.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00104.html
One of the things I noticed when looking at power7 timing is that rlwimi
is cracked and the two resulting insns have a register dependency.
That makes it a little slower than the equivalent rldimi.
* sysdeps/powerpc/powerpc64/memset.S: Replace rlwimi with
insrdi. Formatting.
* sysdeps/powerpc/powerpc64/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memset.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00103.html
LIttle-endian support for memcpy. I spent some time cleaning up the
64-bit power7 memcpy, in order to avoid the extra alignment traps
power7 takes for little-endian. It probably would have been better
to copy the linux kernel version of memcpy.
* sysdeps/powerpc/powerpc32/power4/memcpy.S: Add little endian support.
* sysdeps/powerpc/powerpc32/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/mempcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. Make better
use of regs. Use power7 mtocrf. Tidy function tails.
http://sourceware.org/ml/libc-alpha/2013-08/msg00102.html
This is a rather large patch due to formatting and renaming. The
formatting changes were to make it possible to compare power7 and
power4 versions of memcmp. Using different register defines came
about while I was wrestling with the code, trying to find spare
registers at one stage. I found it much simpler if we refer to a reg
by the same name throughout a function, so it's better if short-term
multiple use regs like rTMP are referred to using their register
number. I made the cr field usage changes when attempting to reload
rWORDn regs in the exit path to byte swap before comparing when
little-endian. That proved a bad idea due to the pipelining involved
in the main loop; Offsets to reload the regs were different first
time around the loop.. Anyway, I left the cr field usage changes in
place for consistency.
Aside from these more-or-less cosmetic changes, I fixed a number of
places where an early exit path restores regs unnecessarily, removed
some dead code, and optimised one or two exits.
* sysdeps/powerpc/powerpc64/power7/memcmp.S: Add little-endian support.
Formatting. Consistently use rXXX register defines or rN defines.
Use early exit labels that avoid restoring unused non-volatile regs.
Make cr field use more consistent with rWORDn compares. Rename
regs used as shift registers for unaligned loop, using rN defines
for short lifetime/multiple use regs.
* sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcmp.S: Likewise. Exit with
addi 1,1,64 to pop stack frame. Simplify return value code.
* sysdeps/powerpc/powerpc32/power4/memcmp.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
Adds little-endian support to optimised strchr assembly. I've also
tweaked the big-endian code a little. In power7/strchr.S there's a
check in the tail of the function that we didn't match 0 before
finding a c match, done by comparing leading zero counts. It's just
as valid, and quicker, to compare the raw output from cmpb.
Another little tweak is to use rldimi/insrdi in place of rlwimi for
the power7 strchr functions. Since rlwimi is cracked, it is a few
cycles slower. rldimi can be used on the 32-bit power7 functions
too.
* sysdeps/powerpc/powerpc64/power7/strchr.S (strchr): Add little-endian
support. Correct typos, formatting. Optimize tail. Use insrdi
rather than rlwimi.
* sysdeps/powerpc/powerpc32/power7/strchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strchrnul.S (__strchrnul): Add
little-endian support. Correct typos.
* sysdeps/powerpc/powerpc32/power7/strchrnul.S: Likewise. Use insrdi
rather than rlwimi.
* sysdeps/powerpc/powerpc64/strchr.S (rTMP4, rTMP5): Define. Use
in loop and entry code to keep "and." results.
(strchr): Add little-endian support. Comment. Move cntlzd
earlier in tail.
* sysdeps/powerpc/powerpc32/strchr.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00100.html
The strcpy changes for little-endian are quite straight-forward, just
a matter of rotating the last word differently.
I'll note that the powerpc64 version of stpcpy is just begging to be
converted to use 64-bit loads and stores..
* sysdeps/powerpc/powerpc64/strcpy.S: Add little-endian support:
* sysdeps/powerpc/powerpc32/strcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/stpcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/stpcpy.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00099.html
More little-endian support. I leave the main strcmp loops unchanged,
(well, except for renumbering rTMP to something other than r0 since
it's needed in an addi insn) and modify the tail for little-endian.
I noticed some of the big-endian tail code was a little untidy so have
cleaned that up too.
* sysdeps/powerpc/powerpc64/strcmp.S (rTMP2): Define as r0.
(rTMP): Define as r11.
(strcmp): Add little-endian support. Optimise tail.
* sysdeps/powerpc/powerpc32/strcmp.S: Similarly.
* sysdeps/powerpc/powerpc64/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/strncmp.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00098.html
The existing strnlen code has a number of defects, so this patch is more
than just adding little-endian support. The changes here are similar to
those for memchr.
* sysdeps/powerpc/powerpc64/power7/strnlen.S (strnlen): Add
little-endian support. Remove unnecessary "are we done" tests.
Handle "s" wrapping around zero and extremely large "size".
Correct main loop count. Handle single left-over word from main
loop inline rather than by using small_loop. Correct comments.
Delete "zero" tail, use "end_max" instead.
* sysdeps/powerpc/powerpc32/power7/strnlen.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00097.html
This is the first of nine patches adding little-endian support to the
existing optimised string and memory functions. I did spend some
time with a power7 simulator looking at cycle by cycle behaviour for
memchr, but most of these patches have not been run on cpu simulators
to check that we are going as fast as possible. I'm sure PowerPC can
do better. However, the little-endian support mostly leaves main
loops unchanged, so I'm banking on previous authors having done a
good job on big-endian.. As with most code you stare at long enough,
I found some improvements for big-endian too.
Little-endian support for strlen. Like most of the string functions,
I leave the main word or multiple-word loops substantially unchanged,
just needing to modify the tail.
Removing the branch in the power7 functions is just a tidy. .align
produces a branch anyway. Modifying regs in the non-power7 functions
is to suit the new little-endian tail.
* sysdeps/powerpc/powerpc64/power7/strlen.S (strlen): Add little-endian
support. Don't branch over align.
* sysdeps/powerpc/powerpc32/power7/strlen.S: Likewise.
* sysdeps/powerpc/powerpc64/strlen.S (strlen): Add little-endian support.
Rearrange tmp reg use to suit. Comment.
* sysdeps/powerpc/powerpc32/strlen.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00093.html
This copies the sparc version of sigstack.h, which gives powerpc
#define MINSIGSTKSZ 4096
#define SIGSTKSZ 16384
Before the VSX changes, struct rt_sigframe size was 1920 plus 128 for
__SIGNAL_FRAMESIZE giving ppc64 exactly the default MINSIGSTKSZ of
2048.
After VSX, ucontext increased by 256 bytes. Oops, we're over
MINSIGSTKSZ, so powerpc has been using the wrong value for quite a
while. Add another ucontext for TM and rt_sigframe is now at 3872,
giving actual MINSIGSTKSZ of 4000.
The glibc testcase that I was looking at was tst-cancel21, which
allocates 2*SIGSTKSZ (not because the test is trying to be
conservative, but because the test actually has nested signal stack
frames). We blew the allocation by 48 bytes when using current
mainline gcc to compile glibc (le ppc64).
The required stack depth in _dl_lookup_symbol_x from the top of the
next signal frame was 10944 bytes. I guess you'd want to add 288 to
that, implying an actual SIGSTKSZ of 11232.
* sysdeps/unix/sysv/linux/powerpc/bits/sigstack.h: New file.
http://sourceware.org/ml/libc-alpha/2013-08/msg00092.html
Use conditional form of branch and link to avoid destroying the cpu
link stack used to predict blr return addresses.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/makecontext.S: Use
conditional form of branch and link when obtaining pc.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/makecontext.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00091.html
More LE support, correcting word accesses to _dl_hwcap.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/getcontext-common.S: Use
HIWORD/LOWORD.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/setcontext-common.S: Ditto.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S: Ditto.
http://sourceware.org/ml/libc-alpha/2013-08/msg00090.html
This patch fixes symbol versioning in setjmp/longjmp. The existing
code uses raw versions, which results in wrong symbol versioning when
you want to build glibc with a base version of 2.19 for LE.
Note that the merging the 64-bit and 32-bit versions in novmx-lonjmp.c
and pt-longjmp.c doesn't result in GLIBC_2.0 versions for 64-bit, due
to the base in shlib_versions.
* sysdeps/powerpc/longjmp.c: Use proper symbol versioning macros.
* sysdeps/powerpc/novmx-longjmp.c: Likewise.
* sysdeps/powerpc/powerpc32/bsd-_setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/bsd-setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/mcount.c: Likewise.
* sysdeps/powerpc/powerpc32/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/setjmp.S: Likewise.
* nptl/sysdeps/unix/sysv/linux/powerpc/pt-longjmp.c: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00089.html
Little-endian fixes for setjmp/longjmp. When writing these I noticed
the setjmp code corrupts the non volatile VMX registers when using an
unaligned buffer. Anton fixed this, and also simplified it quite a
bit.
The current code uses boilerplate for the case where we want to store
16 bytes to an unaligned address. For that we have to do a
read/modify/write of two aligned 16 byte quantities. In our case we
are storing a bunch of back to back data (consective VMX registers),
and only the start and end of the region need the read/modify/write.
[BZ #15723]
* sysdeps/powerpc/jmpbuf-offsets.h: Comment fix.
* sysdeps/powerpc/powerpc32/fpu/__longjmp-common.S: Correct
_dl_hwcap access for little-endian.
* sysdeps/powerpc/powerpc32/fpu/setjmp-common.S: Likewise. Don't
destroy vmx regs when saving unaligned.
* sysdeps/powerpc/powerpc64/__longjmp-common.S: Correct CR load.
* sysdeps/powerpc/powerpc64/setjmp-common.S: Likewise CR save. Don't
destroy vmx regs when saving unaligned.
http://sourceware.org/ml/libc-alpha/2013-08/msg00088.html
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Increase alignment of
constants to usual value for .cst8 section, and remove redundant
high address load.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: Use float
constant for 0x1p52. Load little-endian words of double from
correct stack offsets.
http://sourceware.org/ml/libc-alpha/2013-07/msg00201.html
These two functions oddly test x+1>0 when a double x is >= 0.0, and
similarly when x is negative. I don't see the point of that since the
test should always be true. I also don't see any need to convert x+1
to integer rather than simply using xr+1. Note that the standard
allows these functions to return any value when the input is outside
the range of long long, but it's not too hard to prevent xr+1
overflowing so that's what I've done.
(With rounding mode FE_UPWARD, x+1 can be a lot more than what you
might naively expect, but perhaps that situation was covered by the
x - xrf < 1.0 test.)
* sysdeps/powerpc/fpu/s_llround.c (__llround): Rewrite.
* sysdeps/powerpc/fpu/s_llroundf.c (__llroundf): Rewrite.
http://sourceware.org/ml/libc-alpha/2013-07/msg00200.html
This works around the fact that vsx is disabled in current
little-endian gcc. Also, float constants take 4 bytes in memory
vs. 16 bytes for vector constants, and we don't need to write one lot
of masks for double (register format) and another for float (mem
format).
* sysdeps/powerpc/fpu/s_float_bitwise.h (__float_and_test28): Don't
use vector int constants.
(__float_and_test24, __float_and8, __float_get_exp): Likewise.
http://sourceware.org/ml/libc-alpha/2013-07/msg00197.html
A rewrite to make this code correct for little-endian.
* sysdeps/ieee754/ldbl-128ibm/e_sqrtl.c (mynumber): Replace
union 32-bit int array member with 64-bit int array.
(t515, tm256): Double rather than long double.
(__ieee754_sqrtl): Rewrite using 64-bit arithmetic.
http://sourceware.org/ml/libc-alpha/2013-08/msg00085.html
Rid ourselves of ieee854.
* sysdeps/ieee754/ldbl-128ibm/ieee754.h (union ieee854_long_double):
Delete.
(IEEE854_LONG_DOUBLE_BIAS): Delete.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h: Don't include ieee854
version of math_ldbl.h.
http://sourceware.org/ml/libc-alpha/2013-08/msg00084.html
Another batch of ieee854 macros and union replacement. These four
files also have bugs fixed with this patch. The fact that the two
doubles in an IBM long double may have different signs means that
negation and absolute value operations can't just twiddle one sign bit
as you can with ieee864 style extended double. fmodl, remainderl,
erfl and erfcl all had errors of this type. erfl also returned +1 for
large magnitude negative input where it should return -1. The hypotl
error is innocuous since the value adjusted twice is only used as a
flag. The e_hypotl.c tests for large "a" and small "b" are mutually
exclusive because we've already exited when x/y > 2**120. That allows
some further small simplifications.
[BZ #15734], [BZ #15735]
* sysdeps/ieee754/ldbl-128ibm/e_fmodl.c (__ieee754_fmodl): Rewrite
all uses of ieee875 long double macros and unions. Simplify test
for 0.0L. Correct |x|<|y| and |x|=|y| test. Use
ldbl_extract_mantissa value for ix,iy exponents. Properly
normalize after ldbl_extract_mantissa, and don't add hidden bit
already handled. Don't treat low word of ieee854 mantissa like
low word of IBM long double and mask off bit when testing for
zero.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c (__ieee754_hypotl): Rewrite
all uses of ieee875 long double macros and unions. Simplify tests
for 0.0L and inf. Correct double adjustment of k. Delete dead code
adjusting ha,hb. Simplify code setting kld. Delete two600 and
two1022, instead use their values. Recognise that tests for large
"a" and small "b" are mutually exclusive. Rename vars. Comment.
* sysdeps/ieee754/ldbl-128ibm/e_remainderl.c (__ieee754_remainderl):
Rewrite all uses of ieee875 long double macros and unions. Simplify
test for 0.0L and nan. Correct negation.
* sysdeps/ieee754/ldbl-128ibm/s_erfl.c (__erfl): Rewrite all uses of
ieee875 long double macros and unions. Correct output for large
magnitude x. Correct absolute value calculation.
(__erfcl): Likewise.
* math/libm-test.inc: Add tests for errors discovered in IBM long
double versions of fmodl, remainderl, erfl and erfcl.
http://sourceware.org/ml/libc-alpha/2013-08/msg00083.html
Further replacement of ieee854 macros and unions. These files also
have some optimisations for comparison against 0.0L, infinity and nan.
Since the ABI specifies that the high double of an IBM long double
pair is the value rounded to double, a high double of 0.0 means the
low double must also be 0.0. The ABI also says that infinity and
nan are encoded in the high double, with the low double unspecified.
This means that tests for 0.0L, +/-Infinity and +/-NaN need only check
the high double.
* sysdeps/ieee754/ldbl-128ibm/e_atan2l.c (__ieee754_atan2l): Rewrite
all uses of ieee854 long double macros and unions. Simplify tests
for long doubles that are fully specified by the high double.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_ilogbl.c (__ieee754_ilogbl): Likewise.
Remove dead code too.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_log10l.c (__ieee754_log10l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_logl.c (__ieee754_logl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
Remove dead code too.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c (__kernel_tanl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_isinf_nsl.c (__isinf_nsl): Likewise.
Simplify.
* sysdeps/ieee754/ldbl-128ibm/s_isinfl.c (___isinfl): Likewise.
Simplify.
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (__log1pl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_modfl.c (__modfl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c (__nextafterl): Likewise.
Comment on variable precision.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalblnl.c (__scalblnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (__scalbnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c (__tanhl): Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Adjust tan_towardzero ulps.
http://sourceware.org/ml/libc-alpha/2013-08/msg00081.html
This is the first of a series of patches to ban ieee854_long_double
and the ieee854_long_double macros when using IBM long double. union
ieee854_long_double just isn't correct for IBM long double, especially
when little-endian, and pretending it is OK has allowed a number of
bugs to remain undetected in sysdeps/ieee754/ldbl-128ibm/.
This changes the few places in generic code that use it.
* stdio-common/printf_size.c (__printf_size): Don't use
union ieee854_long_double in fpnum union.
* stdio-common/printf_fphex.c (__printf_fphex): Likewise. Use
signbit macro to retrieve sign from long double.
* stdio-common/printf_fp.c (___printf_fp): Use signbit macro to
retrieve sign from long double.
* sysdeps/ieee754/ldbl-128ibm/printf_fphex.c: Adjust for fpnum change.
* sysdeps/ieee754/ldbl-128/printf_fphex.c: Likewise.
* sysdeps/ieee754/ldbl-96/printf_fphex.c: Likewise.
* sysdeps/x86_64/fpu/printf_fphex.c: Likewise.
* math/test-misc.c (main): Don't use union ieee854_long_double.
ports/
* sysdeps/ia64/fpu/printf_fphex.c: Adjust for fpnum change.
http://sourceware.org/ml/libc-alpha/2013-06/msg00919.html
I discovered a number of places where denormals and other corner cases
were being handled wrongly.
- printf_fphex.c: Testing for the low double exponent being zero is
unnecessary. If the difference in exponents is less than 53 then the
high double exponent must be nearing the low end of its range, and the
low double exponent hit rock bottom.
- ldbl2mpn.c: A denormal (ie. exponent of zero) value is treated as
if the exponent was one, so shift mantissa left by one. Code handling
normalisation of the low double mantissa lacked a test for shift count
greater than bits in type being shifted, and lacked anything to handle
the case where the difference in exponents is less than 53 as in
printf_fphex.c.
- math_ldbl.h (ldbl_extract_mantissa): Same as above, but worse, with
code testing for exponent > 1 for some reason, probably a typo for >= 1.
- math_ldbl.h (ldbl_insert_mantissa): Round the high double as per
mpn2ldbl.c (hi is odd or explicit mantissas non-zero) so that the
number we return won't change when applying ldbl_canonicalize().
Add missing overflow checks and normalisation of high mantissa.
Correct misleading comment: "The hidden bit of the lo mantissa is
zero" is not always true as can be seen from the code rounding the hi
mantissa. Also by inspection, lzcount can never be less than zero so
remove that test. Lastly, masking bitfields to their widths can be
left to the compiler.
- mpn2ldbl.c: The overflow checks here on rounding of high double were
just plain wrong. Incrementing the exponent must be accompanied by a
shift right of the mantissa to keep the value unchanged. Above notes
for ldbl_insert_mantissa are also relevant.
[BZ #15680]
* sysdeps/ieee754/ldbl-128ibm/e_rem_pio2l.c: Comment fix.
* sysdeps/ieee754/ldbl-128ibm/printf_fphex.c
(PRINT_FPHEX_LONG_DOUBLE): Tidy code by moving -53 into ediff
calculation. Remove unnecessary test for denormal exponent.
* sysdeps/ieee754/ldbl-128ibm/ldbl2mpn.c (__mpn_extract_long_double):
Correct handling of denormals. Avoid undefined shift behaviour.
Correct normalisation of low mantissa when low double is denormal.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h
(ldbl_extract_mantissa): Likewise. Comment. Use uint64_t* for hi64.
(ldbl_insert_mantissa): Make both hi64 and lo64 parms uint64_t.
Correct normalisation of low mantissa. Test for overflow of high
mantissa and normalise.
(ldbl_nearbyint): Use more readable constant for two52.
* sysdeps/ieee754/ldbl-128ibm/mpn2ldbl.c
(__mpn_construct_long_double): Fix test for overflow of high
mantissa and correct normalisation. Avoid undefined shift.
http://sourceware.org/ml/libc-alpha/2013-07/msg00001.html
This patch starts the process of supporting powerpc64 little-endian
long double in glibc. IBM long double is an array of two ieee
doubles, so making union ibm_extended_long_double reflect this fact is
the correct way to access fields of the doubles.
* sysdeps/ieee754/ldbl-128ibm/ieee754.h
(union ibm_extended_long_double): Define as an array of ieee754_double.
(IBM_EXTENDED_LONG_DOUBLE_BIAS): Delete.
* sysdeps/ieee754/ldbl-128ibm/printf_fphex.c: Update all references
to ibm_extended_long_double and IBM_EXTENDED_LONG_DOUBLE_BIAS.
* sysdeps/ieee754/ldbl-128ibm/e_exp10l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/ldbl2mpn.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h: Likewise.
* sysdeps/ieee754/ldbl-128ibm/mpn2ldbl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/strtold_l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise.
[BZ #15915] As described in the bug, the pattern rule for lib%.so files
in Makerules includes linkobj/libc.so as a dependency. However, the
explicit rule for linkobj/libc.so is in the top-level Makefile.
Thus, the subdirectory makefiles that include Makerules end up with an
erroneous makefile pattern rule for linkobj/libc.so that includes
itself as a dependency. The result is make warnings whenever rules
for other .so files are resolved -- and, on occasion, actual makefile
failures when a race condition causes the implicit rule to actually be
used.
This patch moves the explicit rules for linkobj/libc.so into Makerules
to clear up this problem. It also elaborates a couple of comments
that I'd initially found confusing.
Colin Watson reported that some versions of gcc warn about
attribute leaf used on a static function, since it has no
effect on anything but external functions.
* posix/glob.c (next_brace_sub, prefix_array, collated_compare):
Use __THROWNL rather than __THROW on static functions.
Signed-off-by: Eric Blake <eblake@redhat.com>
sysdeps/unix/make-syscalls.sh and sysdeps/unix/Makefile use GNU Bash's
${parameter/pattern/string} parameter expansion. Non-Bash shells (e.g.
dash or BusyBox ash when built with CONFIG_ASH_BASH_COMPAT disabled)
don't support this expansion syntax. So glibc will fail to build when
$(SHELL) expands to a path that isn't provided by Bash.
An example build failure:
for dir in [...]; do \
test -f $dir/syscalls.list && \
{ sysdirs='[...]' \
asm_CPP='gcc -c -I[...] -D_LIBC_REENTRANT -include include/libc-symbols.h -DASSEMBLER -g -Wa,--noexecstack -E -x assembler-with-cpp' \
/bin/sh sysdeps/unix/make-syscalls.sh $dir || exit 1; }; \
test $dir = sysdeps/unix && break; \
done > [build-dir]/sysd-syscallsT
sysdeps/unix/make-syscalls.sh: line 273: syntax error: bad substitution
This patch simply replaces the three instances of the Bash-only syntax
in these files with an echo and sed command substitution.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This define was removed from the rest of the tree eight years ago.
ChangeLog:
2013-09-24 Will Newton <will.newton@linaro.org>
* sysdeps/mach/hurd/i386/tls.h (TLS_INIT_TP_EXPENSIVE): Remove
macro.
strcoll is implemented using a cache for indices and weights of
collation sequences in the strings so that subsequent passes do not
have to search through collation data again. For very large string
inputs, the cache size computation could overflow. In such a case,
use the fallback function that does not cache indices and weights of
collation sequences.
Fixes CVE-2012-4412.
strcoll currently falls back to alloca if malloc fails, resulting in a
possible stack overflow. This patch implements sequence traversal and
comparison without caching indices and rules.
Fixes CVE-2012-4424.
Statically built binaries use __pointer_chk_guard_local,
while dynamically built binaries use __pointer_chk_guard.
Provide the right definition depending on the test case
we are building.
The pointer guard used for pointer mangling was not initialized for
static applications resulting in the security feature being disabled.
The pointer guard is now correctly initialized to a random value for
static applications. Existing static applications need to be
recompiled to take advantage of the fix.
The test tst-ptrguard1-static and tst-ptrguard1 add regression
coverage to ensure the pointer guards are sufficiently random
and initialized to a default value.
for ChangeLog
* malloc/arena.c (new_heap): New memory_heap_new probe.
(grow_heap): New memory_heap_more probe.
(shrink_heap): New memory_heap_less probe.
(heap_trim): New memory_heap_free probe.
* malloc/malloc.c (sysmalloc): New memory_sbrk_more probe.
(systrim): New memory_sbrk_less probe.
* manual/probes.texi: Document them.
It has been a long practice for software using IEEE 754 floating-point
arithmetic run on MIPS processors to use an encoding of Not-a-Number
(NaN) data different to one used by software run on other processors.
And as of IEEE 754-2008 revision [1] this encoding does not follow one
recommended in the standard, as specified in section 6.2.1, where it
is stated that quiet NaNs should have the first bit (d1) of their
significand set to 1 while signalling NaNs should have that bit set to
0, but MIPS software interprets the two bits in the opposite manner.
As from revision 3.50 [2][3] the MIPS Architecture provides for
processors that support the IEEE 754-2008 preferred NaN encoding format.
As the two formats (further referred to as "legacy NaN" and "2008 NaN")
are incompatible to each other, tools have to provide support for the
two formats to help people avoid using incompatible binary modules.
The change is comprised of two functional groups of features, both of
which are required for correct support.
1. Dynamic linker support.
To enforce the NaN encoding requirement in dynamic linking a new ELF
file header flag has been defined. This flag is set for 2008-NaN
shared modules and executables and clear for legacy-NaN ones. The
dynamic linker silently ignores any incompatible modules it
encounters in dependency processing.
To avoid unnecessary processing of incompatible modules in the
presence of a shared module cache, a set of new cache flags has been
defined to mark 2008-NaN modules for the three ABIs supported.
Changes to sysdeps/unix/sysv/linux/mips/readelflib.c have been made
following an earlier code quality suggestion made here:
http://sourceware.org/ml/libc-ports/2009-03/msg00036.html
and are therefore a little bit more extensive than the minimum
required.
Finally a new name has been defined for the dynamic linker so that
2008-NaN and legacy-NaN binaries can coexist on a single system that
supports dual-mode operation and that a legacy dynamic linker that
does not support verifying the 2008-NaN ELF file header flag is not
chosen to interpret a 2008-NaN binary by accident.
2. Floating environment support.
IEEE 754-2008 features are controlled in the Floating-Point Control
and Status (FCSR) register and updates are needed to floating
environment support so that the 2008-NaN flag is set correctly and
the kernel default, inferred from the 2008-NaN ELF file header flag
at the time an executable is loaded, respected.
As the NaN encoding format is a property of GCC code generation that is
both a user-selected GCC configuration default and can be overridden
with GCC options, code that needs to know what NaN encoding standard it
has been configured for checks for the __mips_nan2008 macro that is
defined internally by GCC whenever the 2008-NaN mode has been selected.
This mode is determined at the glibc configuration time and therefore a
few consistency checks have been added to catch cases where compilation
flags have been overridden by the user.
The 2008 NaN set of features relies on kernel support as the in-kernel
floating-point emulator needs to be aware of the NaN encoding used even
on hard-float processors and configure the FPU context according to the
value of the 2008 NaN ELF file header flag of the executable being
started. As at this time work on kernel support is still in progress
and the relevant changes have not made their way yet to linux.org master
repository.
Therefore the minimum version supported has been artificially set to
10.0.0 so that 2008-NaN code is not accidentally run on a Linux kernel
that does not suppport it. It is anticipated that the version is
adjusted later on to the actual initial linux.org kernel version to
support this feature. Legacy NaN encoding support is unaffected, older
kernel versions remain supported.
[1] "IEEE Standard for Floating-Point Arithmetic", IEEE Computer
Society, IEEE Std 754-2008, 29 August 2008
[2] "MIPS Architecture For Programmers, Volume I-A: Introduction to the
MIPS32 Architecture", MIPS Technologies, Inc., Document Number:
MD00082, Revision 3.50, September 20, 2012
[3] "MIPS Architecture For Programmers, Volume I-A: Introduction to the
MIPS64 Architecture", MIPS Technologies, Inc., Document Number:
MD00083, Revision 3.50, September 20, 2012
The TIMING_INIT macro currently sets the number of loop iterations
to 1000, which limits usefulness. Make the argument a clock
resolution value and multiply by 1000 in bench-skeleton.c instead
to allow easier reuse.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
* benchtests/bench-timing.h (TIMING_INIT): Rename ITERS
parameter to RES. Remove hardcoded 1000 value.
* benchtests/bench-skeleton.c (main): Pass RES parameter
to TIMING_INIT and multiply result by 1000.
A large bytes parameter to memalign could cause an integer overflow
and corrupt allocator internals. Check the overflow does not occur
before continuing with the allocation.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
[BZ #15857]
* malloc/malloc.c (__libc_memalign): Check the value of bytes
does not overflow.
A large bytes parameter to valloc could cause an integer overflow
and corrupt allocator internals. Check the overflow does not occur
before continuing with the allocation.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
[BZ #15856]
* malloc/malloc.c (__libc_valloc): Check the value of bytes
does not overflow.
A large bytes parameter to pvalloc could cause an integer overflow
and corrupt allocator internals. Check the overflow does not occur
before continuing with the allocation.
ChangeLog:
2013-09-11 Will Newton <will.newton@linaro.org>
[BZ #15855]
* malloc/malloc.c (__libc_pvalloc): Check the value of bytes
does not overflow.
The end of the "Parsing of Floats" subsection currently reads:
The GNU C Library also provides '_l' versions of these functions,
which take an additional argument, the locale to use in conversion.
*Note Parsing of Integers::.
Split the final note as it is unrelated to the above comment and
reference it with "See also" instead.
The pt-chown binary is discussed in the "Running make install" section
without clarification of the needed configure option. Clarify this
and simplfy the discription which is already covered in the "Configuring
and compiling" section.
Long ago static startup did not parse the auxiliary vector and therefore
could not get at any `AT_FPUCW' tag to check whether upon FPU context
allocation the kernel would use a FPU control word setting different to
that provided by the `__fpu_control' variable. Static startup therefore
always initialized the FPU control word, forcing immediate FPU context
allocation even for binaries that otherwise never used the FPU.
As from GIT commit f8f900ecb9 static
startup supports parsing the auxiliary vector, so now it can avoid
explicit initialization of the FPU control word, just as can dynamic
startup, in the usual case where the setting written to the FPU control
word would be the same as the kernel uses. This defers FPU context
allocation until the binary itself actually pokes at the FPU.
Note that the `AT_FPUCW' tag is usually absent from the auxiliary vector
in which case _FPU_DEFAULT is assumed to be the kernel default.
The current tests don't test the functionality of realloc in detail.
Add a new test for realloc that exercises some of the corner cases
that are not otherwise tested.
ChangeLog:
2013-09-09 Will Newton <will.newton@linaro.org>
* malloc/Makefile: Add tst-realloc to tests.
* malloc/tst-realloc.c: New file.
The benchmark for memcpy got disabled accidentally. Re-enable it.
ChangeLog:
2013-09-06 Will Newton <will.newton@linaro.org>
* benchtests/Makefile (string-bench): Add memcpy.
This change synchronizes the glibc headers with the Linux kernel
headers and arranges to coordinate the definition of structures
already defined the Linux kernel UAPI headers.
It is now safe to include glibc's netinet/in.h or Linux's linux/in6.h
in any order in a userspace application and you will get the same
ABI. The ABI is guaranteed by UAPI and glibc.
Since fanotify_init requires CAP_SYS_ADMIN in order to work (which usually
means running as root), we need to handle that error case too.
Reported-by: Andreas Jaeger <aj@suse.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Switch the string benchmarks to using bench-timing.h instead
of hp-timing.h directly. This allows the string benchmarks to
be run usefully on architectures such as ARM that do not have
support for hp-timing.h.
In order to do this the tests have been changed from timing each
individual call and picking the lowest execution time recorded to
timing a number of calls and taking the mean execution time.
ChangeLog:
2013-09-04 Will Newton <will.newton@linaro.org>
* benchtests/bench-timing.h (TIMING_PRINT_MEAN): New macro.
* benchtests/bench-string.h: Include bench-timing.h instead
of including hp-timing.h directly. (INNER_LOOP_ITERS): New
define. (HP_TIMING_BEST): Delete macro. (test_init): Remove
call to HP_TIMING_DIFF_INIT.
* benchtests/bench-memccpy.c: Use bench-timing.h macros
instead of hp-timing.h macros.
* benchtests/bench-memchr.c: Likewise.
* benchtests/bench-memcmp.c: Likewise.
* benchtests/bench-memcpy.c: Likewise.
* benchtests/bench-memmem.c: Likewise.
* benchtests/bench-memmove.c: Likewise.
* benchtests/bench-memset.c: Likewise.
* benchtests/bench-rawmemchr.c: Likewise.
* benchtests/bench-strcasecmp.c: Likewise.
* benchtests/bench-strcasestr.c: Likewise.
* benchtests/bench-strcat.c: Likewise.
* benchtests/bench-strchr.c: Likewise.
* benchtests/bench-strcmp.c: Likewise.
* benchtests/bench-strcpy.c: Likewise.
* benchtests/bench-strcpy_chk.c: Likewise.
* benchtests/bench-strlen.c: Likewise.
* benchtests/bench-strncasecmp.c: Likewise.
* benchtests/bench-strncat.c: Likewise.
* benchtests/bench-strncmp.c: Likewise.
* benchtests/bench-strncpy.c: Likewise.
* benchtests/bench-strnlen.c: Likewise.
* benchtests/bench-strpbrk.c: Likewise.
* benchtests/bench-strrchr.c: Likewise.
* benchtests/bench-strspn.c: Likewise.
* benchtests/bench-strstr.c: Likewise.
LDFLAGS puts the library too early in the command line if --as-needed
is being used. Use LDLIBS instead.
ChangeLog:
2013-09-04 Will Newton <will.newton@linaro.org>
* benchtests/Makefile: Use LDLIBS instead of LDFLAGS.
Another example of all the 64bit arches getting the definition via a
common file, but the 32bit ones all adding it by themselves and hppa
was missed.
I'm not entirely sure about the usage of GLIBC_2.19 symbols here.
We'd like to backport this so people can use it, but it means we'd
be releasing a glibc-2.17/glibc-2.18 with a GLIBC_2.19 symbol in it.
But maybe it won't be a big deal since you'd only get that 2.19 ref
if you actually used the symbol ?
There hasn't been a glibc release where hppa worked w/out a bunch of
patches, so in reality there's only two distros that matter -- Gentoo
and Debian.
Reported-by: Jeroen Roovers <jer@gentoo.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
These have helped me find and fix type conversion issues in QEMU's MIPS
hardware emulation. While certainly glibc is not the best place for such
tests, they're just an enhancement of tests already present.
Since the dlopen funcs might invoke a constructor that calls a func
that is in the same compilation unit as the caller, we cannot mark
them as leaf funcs.
Similarly, dlclose might invoke a destructor that calls a func that
is in the same compilation unit as the caller.
URL: https://sourceware.org/bugzilla/show_bug.cgi?id=15897
Reportedy-by: Fabrice Bauzac <libnoon@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This patch fixes backtrace for PPC32 and PPC64 to correctly handle
signal trampolines. The 'debug/tst-backtrace6.c' also check for
SA_SIGINFO handling, where is triggers another vDSO symbols for PPC32.
This patch fixes dlfcn/tststatic5 for PowerPC where pagesize
variable was not properly initialized in certain cases. This patch
is based on other architecture code.
The helper binary pt_chown tricked into granting access to another
user's pseudo-terminal.
Pre-conditions for the attack:
* Attacker with local user account
* Kernel with FUSE support
* "user_allow_other" in /etc/fuse.conf
* Victim with allocated slave in /dev/pts
Using the setuid installed pt_chown and a weak check on whether a file
descriptor is a tty, an attacker could fake a pty check using FUSE and
trick pt_chown to grant ownership of a pty descriptor that the current
user does not own. It cannot access /dev/pts/ptmx however.
In most modern distributions pt_chown is not needed because devpts
is enabled by default. The fix for this CVE is to disable building
and using pt_chown by default. We still provide a configure option
to enable hte use of pt_chown but distributions do so at their own
risk.
The generated header is compiled with `-ffreestanding' to avoid any
circular dependencies against the installed implementation headers.
Such a dependency would require the implementation header to be
installed before the generated header could be built (See bug 15711).
In current practice the generated header dependencies do not include
any of the implementation headers removed by the use of `-ffreestanding'.
---
2013-07-15 Carlos O'Donell <carlos@redhat.com>
[BZ #15711]
* sysdeps/unix/sysv/linux/Makefile ($(objpfx)bits/syscall%h):
Avoid system header dependency with -ffreestanding.
($(objpfx)bits/syscall%d): Likewise.
This change creates a link map in static executables to serve as the
global search list for dlopen. It fixes a problem with the inability
to access the global symbol object and a crash on an attempt to map a
DSO into the global scope. Some code that has become dead after the
addition of this link map is removed too and test cases are provided.
Many Linux arches require fixed mmaps to be aligned higher than pagesize,
so use the SHMLBA define as it represents this quantity exactly.
This fixes spurious errors seen on those arches like:
cannot map archive header: Invalid argument
URL: http://sourceware.org/bugzilla/show_bug.cgi?id=10283
Reported-by: CHIKAMA Masaki <masaki.chikama@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Rather than open coding the masks, add helper macros to do the magic.
This makes code easier to read.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>