This patch changes power7 memcpy to use VSX instructions only when
memory is aligned to quardword. It is to avoid unaligned kernel traps
on non-cacheable memory (for instance, memory-mapped I/O).
This patch adds an optimized memmove optimization for POWER7/powerpc64.
Basically the idea is to use the memcpy for POWER7 on non-overlapped
memory regions and a optimized backward memcpy for memory regions
that overlap (similar to the idea of string/memmove.c).
The backward memcpy algorithm used is similar the one use for memcpy for
POWER7, with adjustments done for alignment. The difference is memory
is always aligned to 16 bytes before using VSX/altivec instructions.
This patch removes the powerpc specific logic in memmove and instead
include default implementation with MEMCPY_OK_FOR_FWD_MEMMOVE defined.
This lead in a increase performance, since the constraints to use
memcpy in powerpc code are too restrictive and memcpy can be used for
any forward memmove.
Merge most of the gnulib implementation of memchr. The changes that
remain are:
- copyright header
- bp-sym.h removed
- reg_char removed
- allow MEMCHR to be redefined
- non-conforming whitespace changes
The merged code fixes a number of -Wundef warnings and also introduces
an optimized algorithm. I haven't detected any performance difference
in the new code which I believe is down to the quite specific
circumstances required to hit it. However the new code is approximately
half the size of the old code on AArch64 (which uses generic memchr).
ChangeLog:
2014-07-04 Will Newton <will.newton@linaro.org>
* string/memchr.c: Merge from gnulib.
[_LIBC]: Remove conditionals.
(__ptr_t): Remove define.
(LONG_MAX_32_BITS): Likewise.
(LONG_MAX): Likewise.
(MEMCHR): Use ANSI prototype and optimize algorithm.
The original implementation was written for EV5, which does not
record inexact in the status register for /SU (but no /I) insns.
But EV6 does record the inexact status; the lack of /I simply
means that the exception is suppressed.
Adding feholdexcept becomes the bulk of the overhead, so we might
as well use the default implementation.
Two bugs in these implementations: First is that the add of 0.5
was not done in chopped rounding mode (easily fixable). Second
is that the method generates incorrect inexact exceptions for
small integral values (not easily fixable).
This test case is very, especially on targets using soft-float or QEMU
(where soft-float is used internally), and appears to be the only such
outlier. Therefore rather than requiring to have TIMEOUTFACTOR set
large enough globally, bump up the local scaling factor instead.
* stdlib/tst-strtod-overflow.c (TIMEOUT): Bump up to 30.
c4c4124473 added the _Noreturn macro for
pre-C11 compilers, but it now throws a new Wundef warning during `make
check` for __STDC_VERSION__ which gcc does not define by default. The
following patch fixes this in line with other uses of __STDC_VERSION__
in the file.
The PAGE_COPY_THRESHOLD macro is meant to be overridden by
architecture-specific pagecopy.h, but it is currently done only by
mach; all other architectures use the default. Check to see if the
macro is defined in addition to whether it is set to a non-zero value.
This patch adds an ifunc power7 strcat symbol that uses the logic on
sysdeps/powerpc/strcat.c but call power7 strlen/strcpy symbols instead
of default ones.
Merge the latest version of the obstack.c and obstack.h files
from gnulib. The majority of this change is coding style and
cosmetic comment changes but it also fixes a -Wundef warning
in the build as a side effect.
2014-07-02 Will Newton <will.newton@linaro.org>
* malloc/obstack.c: Merge from gnulib master.
[HAVE_CONFIG_H]: Remove conditional code.
[!_LIBC]: Include config.h.
[!ELIDE_CODE]: Don't include inttypes.h, include
stdint.h unconditionally.
(print_and_abort): Mark as _Noreturn.
(_obstack_allocated_p): Mark as __attribute_pure__.
(obstack_free): Rename to __obstack_free.
[!__attribute__]: Remove conditional code.
* malloc/obstack.h: Merge from gnulib master.
[__cplusplus]: Move conditional down.
[!__attribute_pure__]: Define __attribute_pure__ here
if it is not already defined.
(_obstack_memory_used): Mark as __attribute_pure__.
[!__obstack_free]: Define as obstack_free.
[__GNUC__]: Remove check for ancient NeXT gcc.
This commit removes the aio_cancel and aio_cancel64 symbols at
GLIBC_2.3 from the ABI baseline. The ABI baseline is now complete
for hppa and considered stable.
The following ABI baselines were tested against several old releases
of debian and gentoo. Several problems were discovered and fixed as
part of developing the ABI baselines.
Firstly, libBrokenLocale on gentoo exports __ctype_get_mb_cur_max
as @@GLIBC_2.0, but it should be @@GLIBC_2.2 since that's the minimum
version defined in shlib-versions for hppa. I don't know when this
broke, but master properly parses hppa's shlib-versions which clearly
lists libBrokenLocale as defaulting to GLIBC_2.2. Therefore I'm
accepting GLBIC_2.2 as the correct version for this symbol and setting
the baseline to that, despite the fact that the present distribution
is wrong. I don't expect that any new applications should be using
libBrokenLocale, so it should match the oldest behaviour which is to
export a GLIBC_2.2 symbol. For example in debian's 2.7 has it at
version GLIBC_2.2.
Secondly, aio_cancel and aio_cancel64 previously had a compat symbol
at version @GLIBC_2.1 with a new symbol at @@GLIBC_2.3[1]. During the
Linuxthreads to NPTL transition the file aio_cancel.c was lost for hppa
and that resulted in just @@GLIBC_2.1 versions of these symbols being
exported. The @@GLIBC_2.1 version works correctly and uses the right
value of ECANCELLED. Therefore if I were to fix this today it might
break correctly working applications using aio_cancel*@GLIBC_2.1 by
causing those to use the old aio_cancel that used the older value
of ECANCELLED. Thus the best option is to accept that the ABI changed
and ignore older applications in favour of newer applications. The
best thing to do is cleanup the version files (included in the patch).
The rest of the ABI was as expected (ignoring __p_type_syms size
change in 2008).