* sysdeps/ieee754/ldbl-opt/wordsize-64/s_ceil.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_finite.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_floor.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_frexp.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_isinf.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_isnan.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_llround.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_logb.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_lround.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_modf.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_nearbyint.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_remquo.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_rint.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_round.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_scalbln.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_scalbn.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_trunc.c: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/Implies: Add
ieee754/ldbl-opt/wordsize-64.
* sysdeps/powerpc/powerpc64/Implies: Add
ieee754/dbl-64/wordsize-64.
Adapts __ppc_get_timebase to the upcoming GCC 4.8 that provides
__builtin_ppc_get_timebase. Building applicationns with previous
versions of GCC will continue to use the internal implementation.
Initially based on the versions found in wcsmbs/* ; these files have
been changed by hand unrolling, and adding some additional variables
to allow some read-ahead to occur, which then relieves some of the
wait-for-increment/wait-for-load/wait-for-compare-results pressure
that was slowing down every iteration through the while-loop.
For 64-bit Power7, These changes give an approx 20% throughput boost
for the wcschr and wcsrchr functions; and approx 40% boost for the
wcscpy function. 32-bit improvements appear to be slightly better
with ~ %30 and ~ %45 respectively. Results for Power6 closely match
those for power7.
Assorted tweaking, twisting and tuning to squeeze a few additional cycles
out of the memchr code. Changes include bypassing the shift pairs
(sld,srd) when they are not required, and unrolling the small_loop that
handles short and trailing strings.
Per scrollpipe data measuring aligned strings for 64-bit, these changes
save between five and eight cycles (9-13% overall) for short strings (<32),
Longer aligned strings see slight improvement of 1-3% due to bypassing the
shifts and the instruction rearranging.