This patch fixes the missing "__memcpy_ppc" symbol for memmove-ppc64
object in static builds. Since memcpy ifunc is not enabled in static
mode, the specialized symbols are not provided. The patch changed the
it to just "__memcpy" instead.
The ldbl-128ibm implementation of asinhl uses cut-offs of 0x1p28 and
0x1p-29 to determine when to use simpler formulas that avoid possible
overflow / underflow. Both those cut-offs are inappropriate for this
format, resulting in large errors. This patch changes the code to use
more appropriate cut-offs of 0x1p56 and 0x1p-56, adding tests around
the cut-offs for various floating-point formats.
Tested for powerpc. Also tested for x86_64 and x86 and updated ulps.
[BZ #18020]
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c (__asinhl): Use 2**56 and
2**-56 not 2**28 and 2**-29 as thresholds for simpler formulas.
* math/auto-libm-test-in: Add more tests of asinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Similarly to what we did for in6_addr, we need a macro
to guard in6_pktinfo and ip6_mtuinfo too.
Cc: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
The ldbl-128ibm implementation of acoshl uses a cut-off of 0x1p28 to
determine when to use log(x) + log(2) as a formula. That cut-off is
too small for this format, resulting in large errors. This patch
changes it to a more appropriate cut-off of 0x1p56, adding tests
around the cut-offs for various floating-point formats.
Tested for powerpc. Also tested for x86_64 and x86 and updated ulps.
[BZ #18019]
* sysdeps/ieee754/ldbl-128ibm/e_acoshl.c (__ieee754_acoshl): Use
2**56 not 2**28 as threshold for log (2x) formula.
* math/auto-libm-test-in: Add more tests of acosh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Various x86 / x86_64 versions of scalb / scalbf / scalbl produce
spurious "invalid" exceptions for (qNaN, -Inf) arguments, because this
is wrongly handled like (+/-Inf, -Inf) which *should* raise such an
exception. (In fact the NaN case of the code determining whether to
quietly return a zero or a NaN for second argument -Inf was
accidentally dead since the code had been made to return a NaN with
exception.) This patch fixes the code to do the proper test for an
infinity as distinct from a NaN.
(Since the existing code does nothing to distinguish qNaNs and sNaNs
here, this patch doesn't either. If in future we systematically
implement proper sNaN semantics following TS 18661-1:2014, there will
be lots of bugs to address - Thomas found lots of issues with his
patch <https://sourceware.org/ml/libc-ports/2013-04/msg00008.html> to
add SNaN tests (which never went in and would now require significant
reworking).)
Tested for x86_64 and x86. Committed.
[BZ #16783]
* sysdeps/i386/fpu/e_scalb.S (__ieee754_scalb): Do not handle
arguments (NaN, -Inf) the same as (+/-Inf, -Inf).
* sysdeps/i386/fpu/e_scalbf.S (__ieee754_scalbf): Likewise.
* sysdeps/i386/fpu/e_scalbl.S (__ieee754_scalbl): Likewise.
* sysdeps/x86_64/fpu/e_scalbl.S (__ieee754_scalbl): Likewise.
* math/libm-test.inc (scalb_test_data): Add more tests.
Both open and openat load their last argument 'mode' lazily, using
va_arg() only if O_CREAT is found in oflag. This is wrong, mode is also
necessary if O_TMPFILE is in oflag.
By chance on x86_64, the problem wasn't evident when using O_TMPFILE
with open, as the 3rd argument of open, even when not loaded with
va_arg, is left untouched in RDX, where the syscall expects it.
However, openat was not so lucky, and O_TMPFILE couldn't be used: mode
is the 4th argument, in RCX, but the syscall expects its 4th argument in
a different register than the glibc wrapper, in R10.
Introduce a macro __OPEN_NEEDS_MODE (oflag) to test if either O_CREAT or
O_TMPFILE is set in oflag.
Tested on Linux x86_64.
[BZ #17523]
* io/fcntl.h (__OPEN_NEEDS_MODE): New macro.
* io/bits/fcntl2.h (open): Use it.
(openat): Likewise.
* io/open.c (__libc_open): Likewise.
* io/open64.c (__libc_open64): Likewise.
* io/open64_2.c (__open64_2): Likewise.
* io/open_2.c (__open_2): Likewise.
* io/openat.c (__openat): Likewise.
* io/openat64.c (__openat64): Likewise.
* io/openat64_2.c (__openat64_2): Likewise.
* io/openat_2.c (__openat_2): Likewise.
* sysdeps/mach/hurd/open.c (__libc_open): Likewise.
* sysdeps/mach/hurd/openat.c (__openat): Likewise.
* sysdeps/posix/open64.c (__libc_open64): Likewise.
* sysdeps/unix/sysv/linux/dl-openat64.c (openat64): Likewise.
* sysdeps/unix/sysv/linux/generic/open.c (__libc_open): Likewise.
(__open_nocancel): Likewise.
* sysdeps/unix/sysv/linux/generic/open64.c (__libc_open64): Likewise.
* sysdeps/unix/sysv/linux/open64.c (__libc_open64): Likewise.
* sysdeps/unix/sysv/linux/openat.c (__OPENAT): Likewise.
* sysdeps/unix/sysv/linux/s390/lowlevellock.h: Include
<sysdeps/nptl/lowlevellock.h> and remove macros and
functions that are now defined there.
(SYS_futex): Remove.
(lll_compare_and_swap): Remove.
* sysdeps/s390/bits/atomic.h (atomic_exchange_acq): Define.
This patch fixes bug 15319, missing underflows from atan / atan2 when
the result of atan is very close to its small argument (or that of
atan2 is very close to the ratio of its arguments, which may be an
exact division).
The usual approach of doing an underflowing computation if the
computed result is subnormal is followed. For 32-bit x86, there are
extra complications: the inline __ieee754_atan2 in bits/mathinline.h
needs to be disabled for float and double because other libm functions
using it generally rely on getting proper underflow exceptions from
it, while the out-of-line functions have to remove excess range and
precision from the underflowing result so as to return an exact 0 in
the case where errno should be set for underflow to 0. (The failures
I saw without that are similar to those Carlos reported for other
functions, where I haven't seen a response to
<https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>
confirming if my diagnosis is correct. Arguably all libm functions
with float and double returns should remove excess range and
precision, but that's a separate matter.)
The x86_64 long double case reported in a comment in bug 15319 is not
a bug (it's an argument of LDBL_MIN, and x86_64 is an after-rounding
architecture so the correct IEEE result is not to raise underflow in
the given rounding mode, in addition to treating the result as an
exact LDBL_MIN being within the newly clarified documentation of
accuracy goals). I'm presuming that the fpatan instruction can be
trusted to raise appropriate exceptions when the (long double) result
underflows (after rounding) and so no changes are needed for x86 /
x86_64 long double functions here; empirically this is the case for
the cases covered in the testsuite, on my system.
Tested for x86_64, x86, powerpc and mips64. Only 32-bit x86 needs
ulps updates (for the changes to inlines meaning some functions no
longer get excess precision from their __ieee754_atan2* calls).
[BZ #15319]
* sysdeps/i386/fpu/e_atan2.S (dbl_min): New object.
(MO): New macro.
(__ieee754_atan2): For results with small absolute value, force
underflow exception and remove excess range and precision from
return value.
* sysdeps/i386/fpu/e_atan2f.S (flt_min): New object.
(MO): New macro.
(__ieee754_atan2f): For results with small absolute value, force
underflow exception and remove excess range and precision from
return value.
* sysdeps/i386/fpu/s_atan.S (dbl_min): New object.
(MO): New macro.
(__atan): For results with small absolute value, force underflow
exception and remove excess range and precision from return value.
* sysdeps/i386/fpu/s_atanf.S (flt_min): New object.
(MO): New macro.
(__atanf): For results with small absolute value, force underflow
exception and remove excess range and precision from return value.
* sysdeps/ieee754/dbl-64/e_atan2.c: Include <float.h> and
<math.h>.
(__ieee754_atan2): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/dbl-64/s_atan.c: Include <float.h> and
<math_private.h>.
(atan): Force underflow exception for results with small absolute
value.
* sysdeps/ieee754/flt-32/s_atanf.c: Include <float.h>.
(__atanf): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_atanl.c: Include <float.h> and
<math.h>.
(__atanl): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: Include <float.h>.
(__atanl): Force underflow exception for results with small
absolute value.
* sysdeps/x86/fpu/bits/mathinline.h
[!__SSE2_MATH__ && !__x86_64__ && __LIBC_INTERNAL_MATH_INLINES]
(__ieee754_atan2): Only define inline for long double.
* sysdeps/x86_64/fpu/multiarch/e_atan2.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Include <math.h>.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 15319. Add more tests of atan2.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (casin_test_data): Do not mark underflow
exceptions as possibly missing for bug 15319.
(casinh_test_data): Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Update.
posix_spawn (a standard POSIX function) brings in a use of getrlimit64
(not a standard POSIX function). This patch fixes this by using
__getrlimit64 and making getrlimit64 a weak alias.
This is more complicated than some such changes because of files that
define getrlimit64 in their own way using symbol versioning after
including the main sysdeps/unix/sysv/linux/getrlimit64.c with a
getrlimit macro defined. There are various existing patterns for such
cases in glibc; the one I've used here is that a getrlimit64 macro
disables the weak_alias / libc_hidden_weak calls, leaving it to the
including file to define the getrlimit64 name in whatever way is
appropriate.
Tested for x86_64 and x86 that installed stripped shared libraries are
unchanged by this patch.
[BZ #17991]
* include/sys/resource.h (__getrlimit64): Declare. Use
libc_hidden_proto.
* resource/getrlimit64.c (getrlimit64): Rename to __getrlimit64
and define as weak alias of __getrlimit64. Use libc_hidden_weak.
* sysdeps/posix/spawni.c (__spawni): Call __getrlimit64 instead of
getrlimit64.
* sysdeps/unix/sysv/linux/getrlimit64.c (getrlimit64): Rename to
__getrlimit64.
[!getrlimit64] (getrlimit64): Define as weak alias of
__getrlimit64. Use libc_hidden_weak.
* sysdeps/unix/sysv/linux/i386/getrlimit64.c (getrlimit64): Define
using __getrlimit64 not __new_getrlimit64.
(__GI_getrlimit64): Likewise.
* sysdeps/unix/sysv/linux/mips/getrlimit64.c (getrlimit64):
Likewise.
(__GI_getrlimit64): Likewise.
(__old_getrlimit64): Use __getrlimit64 not __new_getrlimit64.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/syscalls.list
(getrlimit): Add __getrlimit64 alias.
* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (getrlimit):
Likewise.
* conform/Makefile (test-xfail-XOPEN2K/spawn.h/linknamespace):
Remove variable.
(test-xfail-POSIX2008/spawn.h/linknamespace): Likewise.
(test-xfail-XOPEN2K8/spawn.h/linknamespace): Likewise.
ia64 seems to use the same implementation of low-level locks as the
generic Linux lowlevellock.h. The futex syscalls are somewhat
different, but Roland thought it shouldn't matter. Note that the futex
calls are on the slow path always (except for PI mutexes).
Removing the custom low-level lock implementation will make further
refactoring easier, for example adding proper error checking to futex
operations.
Various remquo implementations produce a zero remainder with the wrong
sign (a zero remainder should always have the sign of the first
argument, as specified in IEEE 754) in round-downward mode, resulting
from the sign of 0 - 0. This patch checks for zero results and fixes
their sign accordingly.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #17987]
* sysdeps/ieee754/dbl-64/s_remquo.c (__remquo): Ensure sign of
zero result does not depend on the sign resulting from
subtraction.
* sysdeps/ieee754/dbl-64/wordsize-64/s_remquo.c (__remquo):
Likewise.
* sysdeps/ieee754/flt-32/s_remquof.c (__remquof): Likewise.
* sysdeps/ieee754/ldbl-128/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-96/s_remquol.c (__remquol): Likewise.
* math/libm-test.inc (remquo_test_data): Add more tests.
Various remquo implementations, when computing the last three bits of
the quotient, have spurious overflows when 4 times the second argument
to remquo overflows. These overflows can in turn cause bad results in
rounding modes where that overflow results in a finite value. This
patch adds tests to avoid the problem multiplications in cases where
they would overflow, similar to those that control an earlier
multiplication by 8.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #17978]
* sysdeps/ieee754/dbl-64/s_remquo.c (__remquo): Do not form
products 4 * y and 2 * y where those would overflow.
* sysdeps/ieee754/dbl-64/wordsize-64/s_remquo.c (__remquo):
Likewise.
* sysdeps/ieee754/flt-32/s_remquof.c (__remquof): Likewise.
* sysdeps/ieee754/ldbl-128/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-96/s_remquol.c (__remquol): Likewise.
* math/libm-test.inc (remquo_test_data): Add more tests.
I see an error
../sysdeps/mips/memcpy.S:209:68: error: "_ABIO64" is not defined [-Werror=undef]
#if defined(_MIPS_SIM) && ((_MIPS_SIM == _ABIO32) || (_MIPS_SIM == _ABIO64))
^
cc1: some warnings being treated as errors
in MIPS builds. This patch arranges for _ABIO64 to be defined with
the same value as GCC uses when building for O64 (the ABI itself isn't
supported by glibc, but defining the macro seems the simplest way of
avoiding the error in code that may be shared with other C libraries).
* sysdeps/mips/sgidefs.h [!_ABIO64] (_ABIO64): New macro.
I see an error
../sysdeps/mips/strcmp.S:25:7: error: "_COMPILING_NEWLIB" is not defined [-Werror=undef]
#elif _COMPILING_NEWLIB
^
cc1: some warnings being treated as errors
in MIPS builds. (This is with GCC 4.9; it's possible that the DR#412
change in GCC 5 - see
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60570> - means that
-Wundef diagnostics no longer occur for #elif conditions where a
previous group's condition was true, just as with other errors there.)
This patch duly adjusts the conditionals to test whether
_COMPILING_NEWLIB is defined.
* sysdeps/mips/memcpy.S [_COMPILING_NEWLIB]: Change condition to
[defined _COMPILING_NEWLIB].
* sysdeps/mips/memset.S [_COMPILING_NEWLIB]: Likewise.
* sysdeps/mips/strcmp.S [_COMPILING_NEWLIB]: Likewise.
I see an error
In file included from ../sysdeps/mips/include/sys/asm.h:20:0,
from ../sysdeps/mips/start.S:39:
../sysdeps/mips/sys/asm.h:421:5: error: "__mips_isa_rev" is not defined [-Werror=undef]
#if __mips_isa_rev < 6
^
cc1: some warnings being treated as errors
in MIPS builds. As sys/asm.h is an installed header, it seems better
to test for !defined __mips_isa_rev here, instead of defining it to 0
as done in sysdeps/unix/mips/sysdep.h, to avoid perturbing any code
outside glibc that tests whether __mips_isa_rev is defined; this patch
does so.
* sysdeps/mips/sys/asm.h [__mips_isa_rev < 6]: Change condition to
[!defined __mips_isa_rev || __mips_isa_rev < 6].
Remove IA64 PAGE_SIZE related macros as PAGE_SIZE is not defined.
Also remove macros that are only used for BFD's trad-core support
which is not relavant for IA64 according to the thread starting
here:
https://sourceware.org/ml/libc-ports/2013-11/msg00028.html
This patch is neither built nor tested but is equivalent to a MIPS
patch for the same fix.
The dbl-64/wordsize-64 remquo implementation follows similar logic to
various other implementations, but where that logic computes some
absolute values, it wrongly uses a previously computed bit-pattern for
the absolute value of the first argument, where actually it needs the
absolute value of the first argument mod 8 times the second. This
patch fixes it to compute the correct absolute value.
The integer quotient result of remquo is only specified mod 8
(including its sign); architecture-specific versions may well vary in
what results they give for higher bits of that result (and indeed bug
17569 gives an example correct result from __builtin_remquo giving 9
for that result, where the particular glibc implementation used in
that bug report would give 1 after this fix). Thus, this patch adapts
the tests of remquo to test that result only mod 8, to allow for such
variation when tests with higher quotient are included.
Tested for x86_64 and x86.
[BZ #17569]
* sysdeps/ieee754/dbl-64/wordsize-64/s_remquo.c (__remquo):
Compute absolute value of x as modified by fmod, not original
value of x.
* math/libm-test.inc (RUN_TEST_ffI_f1): Rename to
RUN_TEST_ffI_f1_mod8. Check extra return value mod 8.
(RUN_TEST_LOOP_ffI_f1): Rename to RUN_TEST_LOOP_ffI_f1_mod8. Call
RUN_TEST_ffI_f1_mod8.
(remquo_test_data): Add more tests.
Similarly to sqrt in
<https://sourceware.org/ml/libc-alpha/2015-02/msg00353.html>, the
powerpc sqrtf implementation for when _ARCH_PPCSQ is not defined also
relies on a * b + c being contracted into a fused multiply-add.
Although this contraction is not explicitly disabled for e_sqrtf.c, it
still seems appropriate to make the file explicit about its
requirements by using __builtin_fmaf; this patch does so.
Furthermore, it turns out that doing so fixes the observed inaccuracy
and missing exceptions (that is, that without explicit __builtin_fmaf
usage, it was not being compiled as intended).
Tested for powerpc32 (hard float).
[BZ #17967]
* sysdeps/powerpc/fpu/e_sqrtf.c (__slow_ieee754_sqrtf): Use
__builtin_fmaf instead of relying on contraction of a * b + c.
As Adhemerval noted in
<https://sourceware.org/ml/libc-alpha/2015-01/msg00451.html>, the
powerpc sqrt implementation for when _ARCH_PPCSQ is not defined is
inaccurate in some cases.
The problem is that this code relies on fused multiply-add, and relies
on the compiler contracting a * b + c to get a fused operation. But
sysdeps/ieee754/dbl-64/Makefile disables contraction for e_sqrt.c,
because the implementation in that directory relies on *not* having
contracted operations.
While it would be possible to arrange makefiles so that an earlier
sysdeps directory can disable the setting in
sysdeps/ieee754/dbl-64/Makefile, it seems a lot cleaner to make the
dependence on fused operations explicit in the .c file. GCC 4.6
introduced support for __builtin_fma on powerpc and other
architectures with such instructions, so we can rely on that; this
patch duly makes the code use __builtin_fma for all such fused
operations.
Tested for powerpc32 (hard float).
2015-02-12 Joseph Myers <joseph@codesourcery.com>
[BZ #17964]
* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrt): Use
__builtin_fma instead of relying on contraction of a * b + c.
This patch fixes the remaining part of bug 16560, spurious underflows
from exp2 of arguments close to 0 (when the result is close to 1, so
should not underflow), by just using 1+x instead of a more complicated
calculation when the argument is sufficiently small.
Tested for x86_64, x86 and mips64.
[BZ #16560]
* math/e_exp2l.c [LDBL_MANT_DIG == 106] (LDBL_EPSILON): Undefine
and redefine.
(__ieee754_exp2l): Do not multiply small fractional parts by
M_LN2l.
* sysdeps/i386/fpu/e_exp2l.S (__ieee754_exp2l): Just add 1 to
small argument.
* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Likewise.
* sysdeps/x86_64/fpu/e_exp2l.S (__ieee754_exp2l): Likewise.
* math/auto-libm-test-in: Add more tests of exp2.
* math/auto-libm-test-out: Regenerated.
This patch optimizes strncpy for power7 for unaligned source or
destination address. The source or destination address is aligned
to doubleword and data is shifted based on the alignment and
added with the previous loaded data to be written as a doubleword.
For each load, cmpb instruction is used for faster null check.
The new optimization shows 10 to 70% of performance improvement
for longer string though it does not show big difference on string
size less than 16 due to additional checks.Hence this new algorithm
is restricted to string greater than 16.
This patch makes sincos set errno to EDOM when passed an infinity,
similarly to sin and cos.
Tested for x86_64, x86, powerpc and mips64. I don't know if the
architecture-specific implementations for ia64 and m68k might need
corresponding fixes.
2015-02-11 Joseph Myers <joseph@codesourcery.com>
[BZ #15467]
* sysdeps/ieee754/dbl-64/s_sincos.c: Include <errno.h>.
(__sincos): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/flt-32/s_sincosf.c: Include <errno.h>.
(SINCOSF_FUNC): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-128/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-128ibm/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-96/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* math/libm-test.inc (sincos_test_data): Test errno setting.
As noted in
<https://sourceware.org/ml/libc-alpha/2014-10/msg00369.html>, soft-fp
sysdeps subdirectories (and more generally, subdirectories where
sysdeps/foo/Implies contains foo/bar) are unnecessary and should be
eliminated. This patch does so for MIPS.
Tested for MIPS64 (all three ABIs, soft-float) that installed stripped
shared libraries are unchanged by this patch.
* sysdeps/mips/soft-fp/sfp-machine.h: Move to ....
* sysdeps/mips/mips32/sfp-machine.h: ... here.
* sysdeps/mips/mips64/soft-fp/Makefile: Move to ....
* sysdeps/mips/mips64/Makefile: ... here.
* sysdeps/mips/mips64/soft-fp/e_sqrtl.c: Move to ....
* sysdeps/mips/mips64/e_sqrtl.c: ... here.
* sysdeps/mips/mips64/soft-fp/sfp-machine.h: Move to ....
* sysdeps/mips/mips64/sfp-machine.h: ... here.
* sysdeps/mips/mips32/Implies: Remove mips/soft-fp.
* sysdeps/mips/mips64/n32/Implies: Remove mips/mips64/soft-fp.
* sysdeps/mips/mips64/n64/Implies: Likewise.
Current minimum binutils supported (2.22) has ".machine altivec" support
as default, so there is no need to add a configure check for such
functionality. This patches removes the configure checks for it.
This patch cleanup some multiarch code related to memmmove
optimization. Initial IFUNC support added specialized wordcopy
symbols which turned in local IFUNC calls used by memmove default
implementation. The patch removes the internal IFUNC for wordcopy
symbols and uses local branches in the memmmove optimization instead.
This patch cleanup some multiarch code related to memmmove
optimization. Initial IFUNC support added specialized wordcopy
symbols which turned in local IFUNC calls used by memmove default
implementation.
This change by removing then and used the optimized memmove instead
for supported chips.
This patch simplify the default bcopy symbol for powerpc64 by just using
memmove instead of implementing using the default bcopy. Since the
symbol is deprecated, it trades speed by code size.