Now that MEMCPY_OK_FOR_FWD_MEMMOVE should be define on memcopy.h there
is no need to specialized powerpc memmove implementation. This patch
moves the define set to powerpc memcopy and cleanup its definition on
powerpc code.
This patch changes power7 memcpy to use VSX instructions only when
memory is aligned to quardword. It is to avoid unaligned kernel traps
on non-cacheable memory (for instance, memory-mapped I/O).
This patch adds an optimized memmove optimization for POWER7/powerpc64.
Basically the idea is to use the memcpy for POWER7 on non-overlapped
memory regions and a optimized backward memcpy for memory regions
that overlap (similar to the idea of string/memmove.c).
The backward memcpy algorithm used is similar the one use for memcpy for
POWER7, with adjustments done for alignment. The difference is memory
is always aligned to 16 bytes before using VSX/altivec instructions.
This patch removes the powerpc specific logic in memmove and instead
include default implementation with MEMCPY_OK_FOR_FWD_MEMMOVE defined.
This lead in a increase performance, since the constraints to use
memcpy in powerpc code are too restrictive and memcpy can be used for
any forward memmove.
The PAGE_COPY_THRESHOLD macro is meant to be overridden by
architecture-specific pagecopy.h, but it is currently done only by
mach; all other architectures use the default. Check to see if the
macro is defined in addition to whether it is set to a non-zero value.
This patch adds an ifunc power7 strcat symbol that uses the logic on
sysdeps/powerpc/strcat.c but call power7 strlen/strcpy symbols instead
of default ones.
shlib-versions files can contain ABI lines that map triplets to a
canonical ABI name. This name was once used for various purposes
where test baseline files for different ABIs went in a single
directory; now these purposes use sysdeps files, generation of headers
which have per-ABI variants uses abi-variants and related Makefile
variables and the shlib-versions ABI names are unused. This patch
duly removes those lines and associated build system support for them.
Tested for x86_64 (both a full testsuite run and confirming the
installed shared libraries are unchanged by the patch).
* Makeconfig ($(common-objpfx)soversions.mk): Do not generate
abi-name definition.
* scripts/soversions.awk: Do not handle or generate ABI lines.
* shlib-versions: Remove ABI entries.
* sysdeps/powerpc/nofpu/shlib-versions: Remove file.
* sysdeps/x86_64/x32/shlib-versions: Remove ABI entry.
This patch defines ELF_MACHINE_NO_RELA on all architectures. Tested
only on x86_64 to verify that the sources before and after are
identical except for two instructions that pass the current line
number in dl-machine.h to assert_fail.
This patch makes non-ex-ports architectures set base_machine and
machine based on the original configured machine value in preconfigure
fragments, like ex-ports architectures, rather than in the toplevel
configure.ac.
Tested x86 that the disassembly of installed shared libraries is
unchanged by the patch.
* configure.ac (base_machine): Do not set specially for particular
machines here.
* configure: Regenerated.
* sysdeps/powerpc/preconfigure: Move machine and base_machine
settings from configure.ac.
* sysdeps/i386/preconfigure: New file.
* sysdeps/s390/preconfigure: Likewise.
* sysdeps/sh/preconfigure: Likewise.
* sysdeps/sparc/preconfigure: Likewise.
Linux commit dd58a092c4202f2bd490adab7285b3ff77f8e467 added the
PPC_FEATURE2_VEC_CRYPTO auvx capability to indicate whether to
hardware supports vector crypto hardware instructions. This patch
adds its definition to powerpc hwcap bits.
This patch makes files using __ASSUME_* macros include
<kernel-features.h> explicitly, rather than relying on some other
header (such as tls.h, lowlevellock.h or pthreadP.h) to include it
implicitly. (I omitted cases where I've already posted or am testing
the patch that stops the file from needing __ASSUME_* at all.) This
accords with the general principle of making source files include the
headers for anything they use, and also helps make it safe to remove
<kernel-features.h> includes from any file that doesn't use
__ASSUME_* (some of those may be stray includes left behind after
increasing the minimum kernel version, others may never have been
needed or may have become obsolete after some other change).
Tested x86_64 that the disassembly of installed shared libraries is
unchanged by this patch.
* nptl/pthread_cond_wait.c: Include <kernel-features.h>.
* nptl/pthread_rwlock_timedrdlock.c: Likewise.
* nptl/pthread_rwlock_timedwrlock.c: Likewise.
* nptl/sysdeps/unix/sysv/linux/lowlevelrobustlock.c: Likewise.
* nscd/nscd.c: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym: Likewise.
* sysdeps/powerpc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sh/nptl/tcb-offsets.sym: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym: Likewise.
Optimization is achieved on 8 byte aligned strings with double word
comparison using cmpb instruction. On unaligned strings loop unrolling
is applied for Power7 gain.
This patch fixes the optimized ppc64/power7 strncat strlen call for
static build without ifunc enabled. The strlen symbol to call in such
situation is just strlen, instead of __GI_strlen (since the __GI_
alias is just created for shared objects).
This patch fixes a similar issue to
736c304a1a, where for PPC32 if the symbol
is defined as hidden (memchr) then compiler will create a local branc
(symbol@local) and the linker will not create a required PLT call to
make the ifunc work. It changes the default hidden symbol (__GI_memchr)
to default memchr symbol for powerpc32 (__memchr_ppc32).
This patch fixes the __copysignf optimized macro meant to internal libm
usage when used with constant value. Without the explicit cast to
float, if it is used with const double value (for instance, on
s_casinhf.c) double constants will be used and it may lead to precision
issues in some algorithms.
It fixes the following failures on PPC64/POWER7:
Failure: Test: Real part of: cacos_downward (inf + 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf - 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf + 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf - 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf + 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf - 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf + 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf - 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
The optimization is achieved by following techniques:
> data alignment [gain from aligned memory access on read/write]
> POWER7 gains performance with loop unrolling/unwinding
[gain by reduction of branch penalty].
> zero padding done by calling optimized memset
This patch changes de default symbol redirection for internal call of
memcpy, memset, memchr, and strlen to the IFUNC resolved ones. The
performance improvement is noticeable in algorithms that uses these
symbols extensible, like the regex functions.
This patch optimizes the FPSCR update on exception and rounding change
functions by just updating its value if new value if different from
current one. It also optimizes fedisableexcept and feenableexcept by
removing an unecessary FPSCR read.
This patch fixes some powerpc32 and powerpc64 builds with
--disable-multi-arch option along with different --with-cpu=powerN.
It cleanups the Implies directories by removing the multiarch
folder for non multiarch config and also fixing two assembly
implementations: powerpc64/power7/strncat.S that is calling the
wrong strlen; and power8/fpu/s_isnan.S that misses the hidden_def and
weak_alias directives.
This patch fixes the powerpc32 optimized nearbyint/nearbyintf bogus
results for FE_DOWNWARD rounding mode. This is due wrong instructions
sequence used in the rounding calculation (two subtractions instead of
adition and a subtraction).
Fixes BZ#16815.
This patch add an optimized strpbrk for POWER7 by using a different
algorithm than default implementation: it constructs a table based on
the 'accept' argument and use this table to check for any occurance on
the input string. The idea is similar as x86_64 uses.
For PowerPC some tunings were added, such as unroll loops and memory
clear using VSX instructions.
This patch add a optimized strcspn for POWER7 by using a different
algorithm than default implementation: it constructs a table based on
the 'accept' argument and use this table to check for any occurance
on the input string. The idea is similar as x86_64 uses.
For PowerPC some tunings were added, such as unroll loops and align
stack memory to table to 16 bytes (so VSX clean can ran without
alignment issues).
The roundl assembly implementation
(sysdeps/powerpc/powerpc64/fpu/s_roundl.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_roundl.c instead fixes the failing math.
This fixes 16707.
The nearbyintl assembly implementation
(sysdeps/powerpc/powerpc64/fpu/s_nearbyintl.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c instead fixes the failing
math.
Fixes BZ#16706.
The ceill assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_ceill.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_ceill.c instead fixes the failing math.
Fixes BZ#16701.
This patch fixes an issue for powerpc32-fpu static build which fails
with an 'bzero' undefined reference. This patch adds bzero ifunc selector
for static builds and fixes the '__bzero_ppc' reference to default
memset symbol (since static memset build does not provide ifunc
selector).
Fixes BZ#16689.
This patch fixes an issue for powerpc64[le] static build where __bzero
is definied in multiple places (memset-ppc64.o and bzero.o). It is now
defined only in bzero.o and memset-ppc64.o only defined __bzero_ppc for
both dynamic and static library.
Fixes BZ#16683.
The optimization is achieved by following techniques:
> hashing of needle.
> hashing avoids scanning of duplicate entries in needle across the string.
> initializing the hash table with Vector instructions (VSX) by quadword access.
> unrolling when scanning for character in string across hash table.
The optimization is achieved by following techniques:
1. Doubleword aligned memory access and compares using
cmpb instruction.
2. Loop unrolling for byte load/store.
3. CPU pre-fetch to avoid cache miss.
This patch fix the optimized powerpc-fpu modf/modff implementation
when using in non-default rounding mode where the zero sign is not
as expected. It fixes the libm testsuite tests
modf_downward (0) == 0.00000000000000000000e+00
modf_downward (20) == 0.00000000000000000000e+00
modf_downward (21) == 0.00000000000000000000e+00
Where the sign returned was negative.
As recently discussed
<https://sourceware.org/ml/libc-alpha/2014-02/msg00670.html>, it
doesn't seem particularly useful for libm-test-ulps files to contain
huge amounts of data on ulps for individual tests; just the global
maximum observed ulps for each function, together with the
verification of exceptions, errno and special results such as
infinities and NaNs for each test, suffices to verify that a
function's behavior on the given test inputs is within the expected
accuracy. Removing this data reduces source tree churn caused by
updates to these files when libm tests are added, and reduces the
frequency with which testsuite additions actually need libm-test-ulps
changes at all.
Accordingly, this patch removes that data, so that individual tests
get checked against the global bounds for the given function and only
generate an error if those are exceeded. Tested x86_64 (including
verifying that if an ulps value is artificially reduced, the tests do
indeed fail as they should and "make regen-ulps" generates the
expected changes).
* math/libm-test.inc (struct ulp_data): Don't refer to ulps for
individual tests in comment.
(libm-test-ulps.h): Don't refer to test_ulps in #include comment.
(prev_max_error): New variable.
(prev_real_max_error): Likewise.
(prev_imag_max_error): Likewise.
(compare_ulp_data): Don't refer to test names in comment.
(find_test_ulps): Remove function.
(find_function_ulps): Likewise.
(find_complex_function_ulps): Likewise.
(init_max_error): Take function name as argument. Look up ulps
for that function.
(print_ulps): Remove function.
(print_max_error): Use prev_max_error instead of calling
find_function_ulps.
(print_complex_max_error): Use prev_real_max_error and
prev_imag_max_error instead of calling find_complex_function_ulps.
(check_float_internal): Take max_ulp parameter instead of calling
find_test_ulps. Don't call print_ulps.
(check_float): Update call to check_float_internal.
(check_complex): Update calls to check_float_internal.
(START): Pass argument to init_max_error.
* math/gen-libm-test.pl (%results): Don't include "kind"
information.
(parse_ulps): Don't handle ulps of individual tests.
(print_ulps_file): Likewise.
(output_ulps): Likewise.
* math/README.libm-test: Update.
* manual/libm-err-tab.pl (parse_ulps): Don't handle ulps of
individual tests.
* sysdeps/aarch64/libm-test-ulps: Remove individual test ulps.
* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
* sysdeps/arm/libm-test-ulps: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/ia64/fpu/libm-test-ulps: Likewise.
* sysdeps/m68k/coldfire/fpu/libm-test-ulps: Likewise.
* sysdeps/m68k/m680x0/fpu/libm-test-ulps: Likewise.
* sysdeps/microblaze/libm-test-ulps: Likewise.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
* sysdeps/s390/fpu/libm-test-ulps: Likewise.
* sysdeps/sh/libm-test-ulps: Likewise.
* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
* sysdeps/tile/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* sysdeps/hppa/fpu/libm-test-ulps: Remove individual test ulps.
This patch optimizes strrchr() for ppc64. It uses aligned memory
access along with cmpb instruction and CPU prefetch to avoid
cache misses for speed improvement.
This patch add a optimized llround/llroundf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
This patch add a optimized llrint/llrintf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
This patch add a optimized finite/finitef implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
This patch add a optimized isinf/isinff implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
This patch add a optimized isnan/isnanf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
elf/tst-auxv.c includes misc/sys/auxv.h, which ends up not actually
being included due to the guard overlap, and getauxval becomes an
implicit declaration and implicit pointer conversion which means, at
best, the test isn't actually testing what it thinks it is and, at
worst, it'll crash and burn on platforms where implict pointer
conversion is a Very Bad Thing.
* sysdeps/powerpc/bits/hwcap.h: Allow _SYSDEPS_SYSDEP_H guard as a
synonym for _SYS_AUXV_H to allow direct inclusion.
* sysdeps/sparc/bits/hwcap.h: Likewise.
* sysdeps/powerpc/sysdep.h: Define _SYSDEPS_SYSDEP_H instead of
_SYS_AUXV_H so we can include sysdep.h and sys/auxv.h together.
* sysdeps/sparc/sysdep.h: Likewise.
IEEE 754-2008 defines two ways in which tiny results can be detected,
"before rounding" (based on the infinite-precision result) and "after
rounding" (based on the result when rounded to normal precision as if
the exponent range were unbounded). All binary operations on an
architecture must use the same choice of how tininess is detected.
soft-fp has so far implemented only before-rounding tininess
detection. This patch adds support for after-rounding tininess
detection. A new macro _FP_TININESS_AFTER_ROUNDING is added that
sfp-machine.h must define (soft-fp is meant to be self-contained so
the existing tininess.h files aren't used here, though the information
going in sfp-machine.h has been taken from them). The soft-fp macros
dealing with raising underflow exceptions then handle the cases where
the choice matters specially, rounding a copy of the input to the
appropriate precision to see if a value that's tiny before rounding
isn't tiny after rounding.
Tested for mips64 using GCC trunk (which now uses soft-fp on MIPS, so
supporting exceptions and rounding modes for long double where not
previously supported - this is the immediate motivation for doing this
patch now) together with (a) a patch to sysdeps/mips/math-tests.h to
enable exceptions / rounding modes tests for long double for GCC 4.9
and later, and (b) corresponding changes applied to libgcc's soft-fp
and sfp-machine.h files. In the libgcc context this is also tested on
x86_64 (also an after-rounding architecture) with testcases for
__float128 that I intend to add to the GCC testsuite when updating
soft-fp there.
(To be clear: this patch does not fix any glibc bugs that were
user-visible in past releases, since after-rounding architectures
didn't use soft-fp in any affected case with support for
floating-point exceptions - so there is no corresponding Bugzilla bug.
Rather, it works together with the GCC changes to use soft-fp on MIPS
to allow previously absent long double functionality to work properly,
and allows soft-fp to be used in glibc on after-rounding architectures
in cases where it couldn't previously be used.)
* soft-fp/op-common.h (_FP_DECL): Mark exponent as possibly
unused.
(_FP_PACK_SEMIRAW): Determine tininess based on rounding shifted
value if _FP_TININESS_AFTER_ROUNDING and unrounded value is in
subnormal range.
(_FP_PACK_CANONICAL): Determine tininess based on rounding to
normal precision if _FP_TININESS_AFTER_ROUNDING and unrounded
value has largest subnormal exponent.
* soft-fp/soft-fp.h [FP_NO_EXCEPTIONS]
(_FP_TININESS_AFTER_ROUNDING): Undefine and redefine to 0.
* sysdeps/aarch64/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): New macro.
* sysdeps/alpha/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/arm/soft-fp/sfp-machine.h (_FP_TININESS_AFTER_ROUNDING):
Likewise.
* sysdeps/mips/mips64/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/mips/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/powerpc/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/sh/soft-fp/sfp-machine.h (_FP_TININESS_AFTER_ROUNDING):
Likewise.
* sysdeps/sparc/sparc32/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/sparc/sparc64/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/tile/sfp-machine.h (_FP_TININESS_AFTER_ROUNDING):
Likewise.
This patch creates implicit rules to match the abifiles if
abilist-pattern is defined in the architecture Makefile. This allows
machine specific Makefiles to define different abifiles names
(for instance *-le.abilist for powerpc64le).
The truncl assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_truncl.S)
returns wrong results for some inputs where first double is a exact integer
and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_truncl.c instead it fixes tgammal
issues regarding wrong result sign.
This patch fixes bug 16390, incorrect signs of zero results from
ldbl-128ibm atan2l, soft-float only. The problem is a longstanding
GCC bug with fabsl not being correct for signed zero for soft float,
and the fix is using -fno-builtin-fabsl as a workaround, as already
done for various other source files. Tested powerpc-nofpu.
* sysdeps/powerpc/nofpu/Makefile [$(subdir) = math]
(CFLAGS-e_atan2l.c): Use -fno-builtin-fabsl.
sysdeps/powerpc/powerpc32/libgcc-compat.S makes certain symbols that
glibc once accidentally reexported from libgcc into compat symbols.
Where the exports were purely accidental, this is the right thing to
do. However, for powerpc-nofpu the soft-fp symbols are deliberately
exported from libc, given public versions in
sysdeps/powerpc/nofpu/Versions and used by libm in preference to the
libgcc versions that do not support the software exceptions and
rounding modes. The libc versions should also be usable by user
programs, though normally libgcc gets linked in first (meaning,
effectively, that the <fenv.h> functions are broken as regards their
expected effects on user arithmetic).
A longstanding todo item is to remove the functions in question from
libgcc (when built with recent enough glibc) - that is, remove them
from static libgcc and make them compat symbols in shared libgcc - so
that this works properly (this is one of the items mentioned at
<http://gcc.gnu.org/wiki/Software_floating_point> - parts of that page
are obviously out of date, but this item still applies). Doing this
requires first that the functions are actually available from libc for
new links, not just as compat symbols.
This patch stops the symbols in question being compat symbols for
powerpc-nofpu. The nofpu Versions entries for them are removed (the
symbols never were exported at GLIBC_2.3.2, only GLIBC_2.0, because
the compat symbols took precedence).
Tested powerpc-nofpu. The symbols are no longer compat symbols and
libm.so now properly gets undefined references to them (resolved to
libc.so) instead of the libgcc copies getting linked into libm as
before.
* sysdeps/powerpc/powerpc32/libgcc-compat.S
[_SOFT_FLOAT || __NO_FPRS__] (__fixdfdi_v_glibc20): Do not define
as a macro and a compat symbol.
[_SOFT_FLOAT || __NO_FPRS__] (__fixsfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__fixunsdfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__fixunssfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__floatdidf_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__floaddisf_v_glibc20): Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixdfdi): Do
not use .hidden.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixsfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixunsdfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixunssfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__floaddidf):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__floaddisf):
Likewise.
* sysdeps/powerpc/nofpu/Versions (libc): Remove __fixdfdi,
__fixsfdi, __fixunsdfdi, __fixunssfdi, __floatdidf and __floatdisf
from GLIBC_2.3.2.
For PPC64, all the wrappers at sysdeps are superfluous: they are
basically the same implementation from math/w_sqrt.c with the
'#ifdef _IEEE_LIBM'. And the power4 version just force the 'fsqrt'
instruction utilization with an inline assembly, which is already
handled by math_private.h __ieee754_sqrt implementation.
This patch add optimized __mpn_addmul, __mpn_addsub, __mpn_lshift, and
__mpn_mul_1 implementations for PowerPC64. They are originally from GMP
with adjustments for GLIBC.
This patch add static probes for setjmp/longjmp in the way gdb expects,fixing
the gdb.base/longjmp.exp gdb testcases.
It changes the symbol_name and use macros to to avoid change the probe names
and ending up adding more logic on GDB (since with the expected name
GDB work seamlessly).
The ELFv2 ABI changes the calling convention by passing and returning
structures in registers in more cases than the old ABI:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01145.htmlhttp://gcc.gnu.org/ml/gcc-patches/2013-11/msg01147.html
For the most part, this does not affect glibc, since glibc assembler
files do not use structure parameters / return values. However, one
place is affected: the LD_AUDIT interface provides a structure to
the audit routine that contains all registers holding function
argument and return values for the intercepted PLT call.
Since the new ABI now sometimes uses registers to return values
that were never used for this purpose in the old ABI, this structure
has to be extended. To force audit routines to be modified for the
new ABI if necessary, the patch defines v2 variants of the la_ppc64
types and routines.
In addition, the patch contains two unrelated changes to the
PLT trampoline routines: it fixes a bug where FPR return values
were stored in the wrong place, and it removes the unnecessary
save/restore of CR.
This updates glibc for the changes in the ELFv2 relating to the
stack frame layout. These are described in more detail here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01149.htmlhttp://gcc.gnu.org/ml/gcc-patches/2013-11/msg01146.html
Specifically, the "compiler and linker doublewords" were removed,
which has the effect that the save slot for the TOC register is
now at offset 24 rather than 40 to the stack pointer.
In addition, a function may now no longer necessarily assume that
its caller has set up a 64-byte register save area its use.
To address the first change, the patch goes through all assembler
files and replaces immediate offsets in instructions accessing the
ABI-defined stack slots by symbolic offsets. Those already were
defined in ucontext_i.sym and used in some of the context routines,
but that doesn't really seem like the right place for those defines.
The patch instead defines those symbolic offsets in sysdeps.h,
in two variants for the old and new ABI, and uses them systematically
in all assembler files, not just the context routines.
The second change only affected a few assembler files that used
the save area to temporarily store some registers. In those
cases where this happens within a leaf function, this patch
changes the code to store those registers to the "red zone"
below the stack pointer. Otherwise, the functions already allocate
a stack frame, and the patch changes them to add extra space in
these frames as temporary space for the ELFv2 ABI.
This is a follow-on to the previous patch to support the ELFv2 ABI in the
dynamic loader, split off into its own patch since it is just an optional
optimization.
In the ELFv2 ABI, most functions define both a global and a local entry
point; the local entry requires r2 to be already set up by the caller
to point to the callee's TOC; while the global entry does not require
the caller to know about the callee's TOC, but it needs to set up r12
to the callee's entry point address.
Now, when setting up a PLT slot, the dynamic linker will usually need
to enter the target function's global entry point. However, if the
linker can prove that the target function is in the same DSO as the
PLT slot itself, and the whole DSO only uses a single TOC (which the
linker will let ld.so know via a DT_PPC64_OPT entry), then it is
possible to actually enter the local entry point address into the
PLT slot, for a slight improvement in performance.
Note that this uncovered a problem on the first call via _dl_runtime_resolve,
because that routine neglected to restore the caller's TOC before calling
the target function for the first time, since it assumed that function
would always reload its own TOC anyway ...
This patch adds support for the ELFv2 ABI feature to remove function
descriptors. See this GCC patch for in-depth discussion:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01141.html
This mostly involves two types of changes: updating assembler source
files to the new logic, and updating the dynamic loader.
After the refactoring in the previous patch, most of the assembler source
changes can be handled simply by providing ELFv2 versions of the
macros in sysdep.h. One somewhat non-obvious change is in __GI__setjmp:
this used to "fall through" to the immediately following __setjmp ENTRY
point. This is no longer safe in the ELFv2 since ENTRY defines both
a global and a local entry point, and you cannot simply fall through
to a global entry point as it requires r12 to be set up.
Also, makecontext needs to be updated to set up registers according to
the new ABI for calling into the context's start routine.
The dynamic linker changes mostly consist of removing special code
to handle function descriptors. We also need to support the new PLT
and glink format used by the the ELFv2 linker, see:
https://sourceware.org/ml/binutils/2013-10/msg00376.html
In addition, the dynamic linker now verifies that the dynamic libraries
it loads match its own ABI.
The hack in VDSO_IFUNC_RET to "synthesize" a function descriptor
for vDSO routines is also no longer necessary for ELFv2.
This is the first patch to support the new ELFv2 ABI in glibc.
As preparation, this patch simply refactors some of the powerpc64 assembler
code to move all code related to creating function descriptors (.opd section)
or using function descriptors (function pointer call) into a central place
in sysdep.h.
Note that most locations creating .opd entries were already using macros
in sysdep.h, this patch simply extends this to the remaining places.
No relevant change in generated code expected.
This patch updates glibc in accordance with the binutils patch checked in here:
https://sourceware.org/ml/binutils/2013-10/msg00372.html
This changes the various R_PPC64_..._HI and _HA relocations to report
32-bit overflows. The motivation is that existing uses of @h / @ha
are to build up 32-bit offsets (for the "medium model" TOC access
that GCC now defaults to), and we'd really like to see failures at
link / load time rather than silent truncations.
For those rare cases where a modifier is needed to build up a 64-bit
constant, new relocations _HIGH / _HIGHA are supported.
The patch also fixes a bug in overflow checking for the R_PPC64_ADDR30
and R_PPC64_ADDR32 relocations.
This patch helps some math functions performance by adding the libc_fexxx
variant of inline functions to handle both FPU round and exception set/restore
and by using them on the libc_fexxx_ctx functions. It is based on already coded
fexxx family functions for PPC with fpu.
Here is the summary of performance improvements due this patch (measured on a
POWER7 machine):
Before:
cos(): ITERS:9.5895e+07: TOTAL:5116.03Mcy, MAX:77.6cy, MIN:49.792cy, 18744 calls/Mcy
exp(): ITERS:2.827e+07: TOTAL:5187.15Mcy, MAX:494.018cy, MIN:38.422cy, 5450.01 calls/Mcy
pow(): ITERS:6.1705e+07: TOTAL:5144.26Mcy, MAX:171.95cy, MIN:29.935cy, 11994.9 calls/Mcy
sin(): ITERS:8.6898e+07: TOTAL:5117.06Mcy, MAX:83.841cy, MIN:46.582cy, 16982 calls/Mcy
tan(): ITERS:2.9473e+07: TOTAL:5115.39Mcy, MAX:191.017cy, MIN:172.352cy, 5761.63 calls/Mcy
After:
cos(): ITERS:2.05265e+08: TOTAL:5111.37Mcy, MAX:78.754cy, MIN:24.196cy, 40158.5 calls/Mcy
exp(): ITERS:3.341e+07: TOTAL:5170.84Mcy, MAX:476.317cy, MIN:15.574cy, 6461.23 calls/Mcy
pow(): ITERS:7.6153e+07: TOTAL:5129.1Mcy, MAX:147.5cy, MIN:30.916cy, 14847.2 calls/Mcy
sin(): ITERS:1.58816e+08: TOTAL:5115.11Mcy, MAX:1490.39cy, MIN:22.341cy, 31048.4 calls/Mcy
tan(): ITERS:3.4964e+07: TOTAL:5114.18Mcy, MAX:177.422cy, MIN:146.115cy, 6836.68 calls/Mcy
Autoconf has been deprecating configure.in for quite a long time.
Rename all our configure.in and preconfigure.in files to .ac.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This patch intends to unify both strcpy and stpcpy implementationsi
for PPC64 and PPC64/POWER7. The idead default powerpc64 implementation
is to provide both doubleword and word aligned memory access.
For PPC64/POWER7 is also provide doubleword and word memory access,
remove the branch hints, use the cmpb instruction for compare
doubleword/words, and add an optimization for inputs of same alignment.
* sysdeps/powerpc/powerpc32/dl-machine.c (__process_machine_rela):
Use stdint types in rather than __attribute__((mode())).
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00105.html
Like strnlen, memchr and memrchr had a number of defects fixed by this
patch as well as adding little-endian support. The first one I
noticed was that the entry to the main loop needlessly checked for
"are we done yet?" when we know the size is large enough that we can't
be done. The second defect I noticed was that the main loop count was
wrong, which in turn meant that the small loop needed to handle an
extra word. Thirdly, there is nothing to say that the string can't
wrap around zero, except of course that we'd normally hit a segfault
on trying to read from address zero. Fixing that simplified a number
of places:
- /* Are we done already? */
- addi r9,r8,8
- cmpld r9,r7
- bge L(null)
becomes
+ cmpld r8,r7
+ beqlr
However, the exit gets an extra test because I test for being on the
last word then if so whether the byte offset is less than the end.
Overall, the change is a win.
Lastly, memrchr used the wrong cache hint.
* sysdeps/powerpc/powerpc64/power7/memchr.S: Replace rlwimi with
insrdi. Make better use of reg selection to speed exit slightly.
Schedule entry path a little better. Remove useless "are we done"
checks on entry to main loop. Handle wrapping around zero address.
Correct main loop count. Handle single left-over word from main
loop inline rather than by using loop_small. Remove extra word
case in loop_small caused by wrong loop count. Add little-endian
support.
* sysdeps/powerpc/powerpc32/power7/memchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. Use proper
cache hint.
* sysdeps/powerpc/powerpc32/power7/memrchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Add little-endian
support. Avoid rlwimi.
* sysdeps/powerpc/powerpc32/power7/rawmemchr.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00104.html
One of the things I noticed when looking at power7 timing is that rlwimi
is cracked and the two resulting insns have a register dependency.
That makes it a little slower than the equivalent rldimi.
* sysdeps/powerpc/powerpc64/memset.S: Replace rlwimi with
insrdi. Formatting.
* sysdeps/powerpc/powerpc64/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memset.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00103.html
LIttle-endian support for memcpy. I spent some time cleaning up the
64-bit power7 memcpy, in order to avoid the extra alignment traps
power7 takes for little-endian. It probably would have been better
to copy the linux kernel version of memcpy.
* sysdeps/powerpc/powerpc32/power4/memcpy.S: Add little endian support.
* sysdeps/powerpc/powerpc32/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/mempcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. Make better
use of regs. Use power7 mtocrf. Tidy function tails.
http://sourceware.org/ml/libc-alpha/2013-08/msg00102.html
This is a rather large patch due to formatting and renaming. The
formatting changes were to make it possible to compare power7 and
power4 versions of memcmp. Using different register defines came
about while I was wrestling with the code, trying to find spare
registers at one stage. I found it much simpler if we refer to a reg
by the same name throughout a function, so it's better if short-term
multiple use regs like rTMP are referred to using their register
number. I made the cr field usage changes when attempting to reload
rWORDn regs in the exit path to byte swap before comparing when
little-endian. That proved a bad idea due to the pipelining involved
in the main loop; Offsets to reload the regs were different first
time around the loop.. Anyway, I left the cr field usage changes in
place for consistency.
Aside from these more-or-less cosmetic changes, I fixed a number of
places where an early exit path restores regs unnecessarily, removed
some dead code, and optimised one or two exits.
* sysdeps/powerpc/powerpc64/power7/memcmp.S: Add little-endian support.
Formatting. Consistently use rXXX register defines or rN defines.
Use early exit labels that avoid restoring unused non-volatile regs.
Make cr field use more consistent with rWORDn compares. Rename
regs used as shift registers for unaligned loop, using rN defines
for short lifetime/multiple use regs.
* sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcmp.S: Likewise. Exit with
addi 1,1,64 to pop stack frame. Simplify return value code.
* sysdeps/powerpc/powerpc32/power4/memcmp.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
Adds little-endian support to optimised strchr assembly. I've also
tweaked the big-endian code a little. In power7/strchr.S there's a
check in the tail of the function that we didn't match 0 before
finding a c match, done by comparing leading zero counts. It's just
as valid, and quicker, to compare the raw output from cmpb.
Another little tweak is to use rldimi/insrdi in place of rlwimi for
the power7 strchr functions. Since rlwimi is cracked, it is a few
cycles slower. rldimi can be used on the 32-bit power7 functions
too.
* sysdeps/powerpc/powerpc64/power7/strchr.S (strchr): Add little-endian
support. Correct typos, formatting. Optimize tail. Use insrdi
rather than rlwimi.
* sysdeps/powerpc/powerpc32/power7/strchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strchrnul.S (__strchrnul): Add
little-endian support. Correct typos.
* sysdeps/powerpc/powerpc32/power7/strchrnul.S: Likewise. Use insrdi
rather than rlwimi.
* sysdeps/powerpc/powerpc64/strchr.S (rTMP4, rTMP5): Define. Use
in loop and entry code to keep "and." results.
(strchr): Add little-endian support. Comment. Move cntlzd
earlier in tail.
* sysdeps/powerpc/powerpc32/strchr.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00100.html
The strcpy changes for little-endian are quite straight-forward, just
a matter of rotating the last word differently.
I'll note that the powerpc64 version of stpcpy is just begging to be
converted to use 64-bit loads and stores..
* sysdeps/powerpc/powerpc64/strcpy.S: Add little-endian support:
* sysdeps/powerpc/powerpc32/strcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/stpcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/stpcpy.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00099.html
More little-endian support. I leave the main strcmp loops unchanged,
(well, except for renumbering rTMP to something other than r0 since
it's needed in an addi insn) and modify the tail for little-endian.
I noticed some of the big-endian tail code was a little untidy so have
cleaned that up too.
* sysdeps/powerpc/powerpc64/strcmp.S (rTMP2): Define as r0.
(rTMP): Define as r11.
(strcmp): Add little-endian support. Optimise tail.
* sysdeps/powerpc/powerpc32/strcmp.S: Similarly.
* sysdeps/powerpc/powerpc64/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/strncmp.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00098.html
The existing strnlen code has a number of defects, so this patch is more
than just adding little-endian support. The changes here are similar to
those for memchr.
* sysdeps/powerpc/powerpc64/power7/strnlen.S (strnlen): Add
little-endian support. Remove unnecessary "are we done" tests.
Handle "s" wrapping around zero and extremely large "size".
Correct main loop count. Handle single left-over word from main
loop inline rather than by using small_loop. Correct comments.
Delete "zero" tail, use "end_max" instead.
* sysdeps/powerpc/powerpc32/power7/strnlen.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00097.html
This is the first of nine patches adding little-endian support to the
existing optimised string and memory functions. I did spend some
time with a power7 simulator looking at cycle by cycle behaviour for
memchr, but most of these patches have not been run on cpu simulators
to check that we are going as fast as possible. I'm sure PowerPC can
do better. However, the little-endian support mostly leaves main
loops unchanged, so I'm banking on previous authors having done a
good job on big-endian.. As with most code you stare at long enough,
I found some improvements for big-endian too.
Little-endian support for strlen. Like most of the string functions,
I leave the main word or multiple-word loops substantially unchanged,
just needing to modify the tail.
Removing the branch in the power7 functions is just a tidy. .align
produces a branch anyway. Modifying regs in the non-power7 functions
is to suit the new little-endian tail.
* sysdeps/powerpc/powerpc64/power7/strlen.S (strlen): Add little-endian
support. Don't branch over align.
* sysdeps/powerpc/powerpc32/power7/strlen.S: Likewise.
* sysdeps/powerpc/powerpc64/strlen.S (strlen): Add little-endian support.
Rearrange tmp reg use to suit. Comment.
* sysdeps/powerpc/powerpc32/strlen.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00090.html
This patch fixes symbol versioning in setjmp/longjmp. The existing
code uses raw versions, which results in wrong symbol versioning when
you want to build glibc with a base version of 2.19 for LE.
Note that the merging the 64-bit and 32-bit versions in novmx-lonjmp.c
and pt-longjmp.c doesn't result in GLIBC_2.0 versions for 64-bit, due
to the base in shlib_versions.
* sysdeps/powerpc/longjmp.c: Use proper symbol versioning macros.
* sysdeps/powerpc/novmx-longjmp.c: Likewise.
* sysdeps/powerpc/powerpc32/bsd-_setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/bsd-setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/mcount.c: Likewise.
* sysdeps/powerpc/powerpc32/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/setjmp.S: Likewise.
* nptl/sysdeps/unix/sysv/linux/powerpc/pt-longjmp.c: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00089.html
Little-endian fixes for setjmp/longjmp. When writing these I noticed
the setjmp code corrupts the non volatile VMX registers when using an
unaligned buffer. Anton fixed this, and also simplified it quite a
bit.
The current code uses boilerplate for the case where we want to store
16 bytes to an unaligned address. For that we have to do a
read/modify/write of two aligned 16 byte quantities. In our case we
are storing a bunch of back to back data (consective VMX registers),
and only the start and end of the region need the read/modify/write.
[BZ #15723]
* sysdeps/powerpc/jmpbuf-offsets.h: Comment fix.
* sysdeps/powerpc/powerpc32/fpu/__longjmp-common.S: Correct
_dl_hwcap access for little-endian.
* sysdeps/powerpc/powerpc32/fpu/setjmp-common.S: Likewise. Don't
destroy vmx regs when saving unaligned.
* sysdeps/powerpc/powerpc64/__longjmp-common.S: Correct CR load.
* sysdeps/powerpc/powerpc64/setjmp-common.S: Likewise CR save. Don't
destroy vmx regs when saving unaligned.
http://sourceware.org/ml/libc-alpha/2013-08/msg00088.html
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Increase alignment of
constants to usual value for .cst8 section, and remove redundant
high address load.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: Use float
constant for 0x1p52. Load little-endian words of double from
correct stack offsets.
http://sourceware.org/ml/libc-alpha/2013-07/msg00201.html
These two functions oddly test x+1>0 when a double x is >= 0.0, and
similarly when x is negative. I don't see the point of that since the
test should always be true. I also don't see any need to convert x+1
to integer rather than simply using xr+1. Note that the standard
allows these functions to return any value when the input is outside
the range of long long, but it's not too hard to prevent xr+1
overflowing so that's what I've done.
(With rounding mode FE_UPWARD, x+1 can be a lot more than what you
might naively expect, but perhaps that situation was covered by the
x - xrf < 1.0 test.)
* sysdeps/powerpc/fpu/s_llround.c (__llround): Rewrite.
* sysdeps/powerpc/fpu/s_llroundf.c (__llroundf): Rewrite.
http://sourceware.org/ml/libc-alpha/2013-07/msg00200.html
This works around the fact that vsx is disabled in current
little-endian gcc. Also, float constants take 4 bytes in memory
vs. 16 bytes for vector constants, and we don't need to write one lot
of masks for double (register format) and another for float (mem
format).
* sysdeps/powerpc/fpu/s_float_bitwise.h (__float_and_test28): Don't
use vector int constants.
(__float_and_test24, __float_and8, __float_get_exp): Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00083.html
Further replacement of ieee854 macros and unions. These files also
have some optimisations for comparison against 0.0L, infinity and nan.
Since the ABI specifies that the high double of an IBM long double
pair is the value rounded to double, a high double of 0.0 means the
low double must also be 0.0. The ABI also says that infinity and
nan are encoded in the high double, with the low double unspecified.
This means that tests for 0.0L, +/-Infinity and +/-NaN need only check
the high double.
* sysdeps/ieee754/ldbl-128ibm/e_atan2l.c (__ieee754_atan2l): Rewrite
all uses of ieee854 long double macros and unions. Simplify tests
for long doubles that are fully specified by the high double.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_ilogbl.c (__ieee754_ilogbl): Likewise.
Remove dead code too.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_log10l.c (__ieee754_log10l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_logl.c (__ieee754_logl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
Remove dead code too.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c (__kernel_tanl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_isinf_nsl.c (__isinf_nsl): Likewise.
Simplify.
* sysdeps/ieee754/ldbl-128ibm/s_isinfl.c (___isinfl): Likewise.
Simplify.
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (__log1pl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_modfl.c (__modfl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c (__nextafterl): Likewise.
Comment on variable precision.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalblnl.c (__scalblnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (__scalbnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c (__tanhl): Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Adjust tan_towardzero ulps.
The pointer guard used for pointer mangling was not initialized for
static applications resulting in the security feature being disabled.
The pointer guard is now correctly initialized to a random value for
static applications. Existing static applications need to be
recompiled to take advantage of the fix.
The test tst-ptrguard1-static and tst-ptrguard1 add regression
coverage to ensure the pointer guards are sufficiently random
and initialized to a default value.
This patch fixes backtrace for PPC32 and PPC64 to correctly handle
signal trampolines. The 'debug/tst-backtrace6.c' also check for
SA_SIGINFO handling, where is triggers another vDSO symbols for PPC32.
Resolves: #15465
The program name may be unavailable if the user application tampers
with argc and argv[]. Some parts of the dynamic linker caters for
this while others don't, so this patch consolidates the check and
fallback into a single macro and updates all users.
This patch fix the 3c0265394d commits
by correctly setting minimum architecture for modf PPC optimization
to power5+ instead of power5 (since only on power5+ round/ceil will
be inline to inline assembly).
The branch prediction hints is actually hurts performance in this case.
The assembly implementation make two assumptions: 1. 'fabs (x) < 2^52'
is unlikely and 2. 'x > 0.0' is unlike (if 1. is true). Since it a
general floating point function, expected input is not bounded and then
it is better to let the hardware handle the branches.
The mantissa of mp_no is intended to take only integral values. This
is a relatively good choice for powerpc due to its 4 fpus, but not for
other architectures, which suffer due to this choice. This change
makes the default mantissa a long integer and allows powerpc to
override it. Additionally, some operations have been optimized for
integer manipulation, resulting in a significant improvement in
performance.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_ceil.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_finite.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_floor.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_frexp.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_isinf.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_isnan.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_llround.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_logb.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_lround.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_modf.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_nearbyint.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_remquo.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_rint.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_round.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_scalbln.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_scalbn.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_trunc.c: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/Implies: Add
ieee754/ldbl-opt/wordsize-64.
* sysdeps/powerpc/powerpc64/Implies: Add
ieee754/dbl-64/wordsize-64.
Adapts __ppc_get_timebase to the upcoming GCC 4.8 that provides
__builtin_ppc_get_timebase. Building applicationns with previous
versions of GCC will continue to use the internal implementation.
Initially based on the versions found in wcsmbs/* ; these files have
been changed by hand unrolling, and adding some additional variables
to allow some read-ahead to occur, which then relieves some of the
wait-for-increment/wait-for-load/wait-for-compare-results pressure
that was slowing down every iteration through the while-loop.
For 64-bit Power7, These changes give an approx 20% throughput boost
for the wcschr and wcsrchr functions; and approx 40% boost for the
wcscpy function. 32-bit improvements appear to be slightly better
with ~ %30 and ~ %45 respectively. Results for Power6 closely match
those for power7.
Assorted tweaking, twisting and tuning to squeeze a few additional cycles
out of the memchr code. Changes include bypassing the shift pairs
(sld,srd) when they are not required, and unrolling the small_loop that
handles short and trailing strings.
Per scrollpipe data measuring aligned strings for 64-bit, these changes
save between five and eight cycles (9-13% overall) for short strings (<32),
Longer aligned strings see slight improvement of 1-3% due to bypassing the
shifts and the instruction rearranging.
[BZ #13743]
A new class of installed headers has been documented for low-level
platform-specific functionality. PowerPC added the first instance with a
function to provide time base register access (__ppc_get_timebase). This
is required for applications that measure time at high frequencies with
high precision that can't afford a syscall.
Adjustments for libm ulps added with commit d8b82cad1b,
495fd99f3a, and 5ba3cc691c.
I also adjusted some exp10 ulps definition that was higher than needed.
In the past the "-ftree-loop-linear" switch provided a measurable
improvement in performance for certain functions. At some point it
was assigned as the responsibility of Graphite in GCC. It has been
found that even with Graphite enabled these flags no longer perform
any appreciable improvement over the baseline.
Graphite now has some open bugs which need to be fixed in order for it
to provide measurable performance improvements but it lacks active
development. As a result some compiler distributors may disable
Graphite. If Graphite is disabled then building GLIBC will fail if
the "-ftree-loop-linear" switch is used.
This patch removes the use of "-ftree-loop-linear" as unnecessary.
This patch provides optimized logb (1.2x on PPC32 and 2.5x on PPC64),
logbf (1.1x on PPC32 and 2.2x on PPC64), and logbl (1.3x on PPC32 and
50% on PPC64) for the POWER7 processor.
* sysdeps/powerpc/elf/ifunc-sel.h: Moved to ...
* sysdeps/powerpc/ifunc-sel.h: ... here.
* sysdeps/powerpc/elf/rtld-global-offsets.sym: Moved to ...
* sysdeps/powerpc/rtld-global-offsets.sym: ... here.
* sysdeps/powerpc/powerpc32/crti.S: New file.
* sysdeps/powerpc/powerpc32/crtn.S: New file.
* sysdeps/powerpc/powerpc64/crti.S: New file.
* sysdeps/powerpc/powerpc64/crtn.S: New file.
This patch tries to organize the implies files for ppc, since there are
a number of processors and most of them are compatible with each other
(backwards compatible).
Having in mind that we start the search for processor-specific files in
the sysdeps/unix/sysv/linux tree
(sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]/fpu to be
exact), we would like to grab any linux-specific code from that tree
prior to going through the other tree (sysdeps/powerpc/...).
For that, i removed the Implies files that were originally inside the
fpu directories and placed then in the non-fpu directories (still inside
the unix/sysv/linux tree). If no processor-specific/linux-specific files
could be found, we "imply" the other tree's (sysdeps/powerpc/...) fpu
directory for that specific processor AND also the non-fpu directory for
that same tree.
If, again, no processor-specific code is found, we read another Implies
file that will point to the most compatible processor that we should
grab code from, and so on, until we reach the power4 processor.
So, in summary, the Implies files will live inside these directories
now:
* sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]
* sysdeps/powerpc/powerpc[32|64]/[processor]
Practical example of the order we will use to pick power6-specific code
with the new structure.
sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/power6/fpu ->
sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/power6 ->
sysdeps/powerpc/powerpc[32|64]/power6/fpu ->
sysdeps/powerpc/powerpc[32|64]/power6 ->
sysdeps/powerpc/powerpc[32|64]/power5+/fpu ->
sysdeps/powerpc/powerpc[32|64]/power5+ ->
sysdeps/powerpc/powerpc[32|64]/power5/fpu ->
sysdeps/powerpc/powerpc[32|64]/power5 ->
sysdeps/powerpc/powerpc[32|64]/power4/fpu ->
sysdeps/powerpc/powerpc[32|64]/power4 (from here, it'll go to the
generic path as usual)
I've just committed STT_GNU_IFUNC ppc/ppc64 support into prelink,
and this patch is needed on the glibc side. Without it ld.so segfaults,
as in dl-conflict.c sym_map is always NULL. While dl-machine.h could use
RESOLVE_CONFLICT_FIND_MAP macro to compute it, it doesn't make sense,
because with prelink we know it is already properly relocated (all relative
relocations are applied by prelink).
2009-01-11 Ryan S. Arnold <rsa@us.ibm.com>
[BZ #9726]
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c (_SET_DI_FPSCR,
_SET_SI_FPSCR): Clobber fp0 to prevent erroneous test-case passes.
2009-01-08 Ryan S. Arnold <rsa@us.ibm.com>
[BZ #9726]
* sysdeps/unix/sysv/linux/powerpc/powerpc32/setcontext-common.S
(__CONTEXT_FUNC_NAME): Fix mtfsf to use fp31 instead of fp0.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S
(__CONTEXT_FUNC_NAME): Fix mtfsf to use fp31 instead of fp0.
2008-11-13 Ryan S. Arnold <rsa@us.ibm.com>
[BZ #6411]
* sysdeps/powerpc/fpu/Makefile: Added test case tst-setcontext-fpscr.
* sysdeps/powerpc/fpu/feholdexcpt.c (_FPU_MASK_ALL): Define to replace
magic numbers.
* sysdeps/powerpc/fpu/fenv_libc.h (fesetenv_register): Dynamically
choose mtfsf insn based on PPC_FEATURE_HAS_DFP.
(relax_fenv_state): Same as above.
(FPSCR_29): Reserve bit in ISA 2.05.
(FPSCR_NI): Provide define for compat.
* sysdeps/powerpc/fpu/fesetenv.c (_FPU_MASK_ALL): Define to replace
magic numbers.
* sysdeps/powerpc/fpu/feupdateenv.c (_FPU_MASK_ALL): Define to replace
magic numbers.
* sysdeps/powerpc/fpu/tst-setcontext-fpscr.c: New file. Test case to
test setcontext and swapcontext with dynamic 64-bit FPSCR detection.
* sysdeps/powerpc/powerpc32/fpu/__longjmp-common.S (__longjmp): Adjust
access to hwcap to account for hwcap size increase to uint64_t.
* sysdeps/powerpc/powerpc32/fpu/setjmp-common.S (__sigsetjmp ):
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/getcontext-common.S
(*setcontext): Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/power6/fpu/setcontext.S:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/power6/fpu/swapcontext.S:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/setcontext-common.S
(*setcontext): dynamically select mtfsf insn based on
PPC_FEATURE_HAS_DFP. Adjust access to hwcap to account for hwcap size
increase to uint64_t.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S
(*swapcontext): dynamically select mtfsf insn based on
PPC_FEATURE_HAS_DFP. Adjust access to hwcap to account for hwcap size
increase to uint64_t.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/power6/fpu/setcontext.S:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/power6/fpu/swapcontext.S:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/setcontext.S
(*setcontext): dynamically select mtfsf insn based on
PPC_FEATURE_HAS_DFP.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/swapcontext.S
(*swapcontext): dynamically select mtfsf insn based on
PPC_FEATURE_HAS_DFP.
2008-08-14 Ryan S. Arnold <rsa@us.ibm.com>
[BZ #6845]
* sysdeps/powerpc/fpu/bits/mathinline.h (__signbitl): Copy new
__signbitl definition and __LONG_DOUBLE_128__ guard from:
* sysdeps/unix/sysv/linux/powerpc/bits/mathinline.h: Remove as
redundant. Functions which call floating point assembler operations
should go into a sysdeps powerpc/fpu directory.
2008-08-01 Steven Munroe <sjmunroe@us.ibm.com>
Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
[BZ #6817]
* sysdeps/powerpc/dl-procinfo.c (_dl_powerpc_cap_flags):
Added the members 'vsx' and 'arch_2_06'.
(_dl_powerpc_platforms): Add the member 'power7'.
* sysdeps/powerpc/dl-procinfo.h: Modify _DL_HWCAP_FIRST
to reflect the changes required by VSX and ISA 2.06.
Modify _DL_PLATFORMS_COUNT to reflect the addition of
'power7'.
Defined PPC_PLATFORM_POWER7.
(_dl_string_platform): Add support for POWER7.
* sysdeps/powerpc/sysdep.h: Define bit masks for VSX
capability and ISA 2.06.
__fe_nomask_env.
* sysdeps/powerpc/fpu/fe_nomask.c: Add libm_hidden_def.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/fe_nomask.c: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/fpu/fe_nomask.c: Likewise.
* sysdeps/powerpc/bits/fenv.h: Make safe for C++.
* sysdeps/unix/sysv/linux/powerpc/bits/mathinline.h: New file.
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Rename
function from fegetexcept and make old name weak alias.
* include/fenv.h: Declare __fegetexcept.
* sysdeps/powerpc/fpu/fedisblxcpt.c: Use __fegetexcept instead of
fegetexcept.
* sysdeps/powerpc/fpu/feenablxcpt.c: Likewise.
* sysdeps/powerpc/fpu/fraiseexcpt.c (__feraiseexcept): Avoid call
to fetestexcept.
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (__log1pl): Use __frexpl
instead of frexpl to avoid local PLT.
* math/s_significandl.c (__significandl): Use __ilogbl instead of
ilogbl to avoid local PLT.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Use __ldexpl
instead of ldexpl to avoid local PLT.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c (__ieee754_expl): Use
__roundl not roundl to avoid local PLT.
* sysdeps/ieee754/ldbl-128/e_j0l.c: Use function names which avoid
local PLTs. Use __sincosl instead of separate sinl and cosl
calls.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_lround.S (__lround): Fixed erroneous
result when x is +/-nextafter(+/-0.5,-/+1) i.e. all 1's in the
mantissa.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S (__llround):
Likewise. Also account for when x is an odd number between 2^52
and 2^53-1.
* sysdeps/powerpc/powerpc64/fpu/s_llround.S (__llround): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llroundf.S (__llroundf): Likewise.
* math/libm-test.inc (lround_test, llround_test): Added test cases to
detect aforementioned erroneous conditions.
2008-01-24 Steven Munroe <sjmunroe@us.ibm.com>
[BZ #5741]
* sysdeps/powerpc/powerpc64/dl-machine.h (PPC_DCBT, PPC_DCBF):
Define additonal Data Cache Block instruction macros.
(elf_machine_fixup_plt): Add dcbt for opd and plt entries.
Replace dcbst with dcbf and sync with sync/isync.
* sysdeps/powerpc/powerpc32/power4/hp-timing.h: New file.
* sysdeps/powerpc/powerpc64/hp-timing.h [_ARCH_PWR4] (HP_TIMING_NOW):
For ISA 2.01 and later replace mftb with mfspr 268.
* sysdeps/i386/i686/memcpy.S: Optimize copying of equally aligned
buffers.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: New file.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: New file.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: New file.
* sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: New file.
* sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: New file.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: New file.
* misc/bits/syslog.h (syslog): When __va_arg_pack is defined,
implement as __extern_always_inline function.
(vsyslog): Define as __extern_always_inline function unconditionally.
* libio/bits/stdio2.h (sprintf, snprintf, printf, fprintf):
When __va_arg_pack is defined, implement as __extern_always_inline
functions.
(vsprintf, vsnprintf, vprintf, vfprintf): Define as
__extern_always_inline functions unconditionally.
* libio/bits/stdio.h (vprintf): Ifdef out the inline when
bits/stdio2.h will be included.
* wcsmbs/bits/wchar2.h (__swprintf_alias): New redirect.
(swprintf, wprintf, fwprintf): When __va_arg_pack is defined,
implement as __extern_always_inline functions.
(vswprintf, vwprintf, vfwprintf): Define as
__extern_always_inline functions unconditionally.
* debug/tst-chk1.c (do_test): Enable remaining tests for C++.
2007-09-03 Jakub Jelinek <jakub@redhat.com>
* misc/sys/cdefs.h (__extern_inline, __extern_always_inline): Only
define in C++ for GCC 4.3+, in C++ always use __gnu_inline__
attribute.
* include/features.h (__USE_EXTERN_INLINES): Define only when
__extern_inline is defined.
* stdlib/stdlib.h: Include bits/stdlib.h when __extern_always_inline
is defined instead of when not __cplusplus.
* misc/sys/syslog.h: Include bits/syslog.h when __extern_always_inline
is defined instead of when not __cplusplus.
* socket/sys/socket.h: Include bits/socket2.h when
__extern_always_inline is defined instead of when not __cplusplus.
* libio/stdio.h: Include bits/stdio2.h when __extern_always_inline
is defined instead of when not __cplusplus.
* posix/unistd.h: Include bits/unistd.h when __extern_always_inline
is defined instead of when not __cplusplus.
* string/string.h: Include bits/string3.h when __extern_always_inline
is defined instead of when not __cplusplus.
* wcsmbs/wchar.h: Include bits/wchar2.h when __extern_always_inline
is defined instead of when not __cplusplus.
(btowc, wctob): Don't guard the inlines with ifndef __cplusplus.
* io/fcntl.h: Don't include bits/fcntl2.h if __extern_always_inline
is not defined.
* misc/bits/syslog-ldbl.h: Guard *_chk stuff with
defined __extern_always_inline instead of !defined __cplusplus.
* libio/bits/stdio-ldbl.h: Likewise.
* wcsmbs/bits/wchar-ldbl.h: Likewise.
* misc/bits/syslog.h (syslog): Don't define for C++.
(vsyslog): Use __extern_always_inline function for C++ instead of
a macro.
* libio/bits/stdio.h (__STDIO_INLINE): Define to __extern_inline
whenever that macro is defined.
(vprintf): Don't provide the inline for C++.
(fread_unlocked, fwrite_unlocked): Don't define the macros for C++.
* libio/bits/stdio2.h (sprintf, snprintf, printf, fprintf): Don't
define the macros for C++.
(vsprintf, vsnprintf, vprintf, vfprintf): Define as
__extern_always_inline functions for C++.
* io/sys/stat.h (stat, lstat, fstat, fstatat, mknod, mknodat,
stat64, lstat64, fstat64, fstatat64): Don't define if not
__USE_EXTERN_INLINES.
* wcsmbs/bits/wchar2.h: Fix #error message.
(swprintf, wprintf, fwprintf): Don't define the macros for C++.
(vswprintf, vwprintf, vfwprintf): Define using
__extern_always_inline functions for C++.
* string/bits/string3.h: Don't #undef macros if __cplusplus.
(memcpy, memmove, mempcpy, memset, bcopy, bzero, strcpy, stpcpy,
strncpy, strcat, strncat): Define as __extern_always_inline
functions instead of macros for C++.
* math/bits/cmathcalls.h: Guard __extern_inline routines with
defined __extern_inline.
* sysdeps/alpha/fpu/bits/mathinline.h (__MATH_INLINE): Define
to __extern_inline whenever that macro is defined.
* sysdeps/ia64/fpu/bits/mathinline.h (__MATH_INLINE): Likewise.
* sysdeps/i386/fpu/bits/mathinline.h (__MATH_INLINE): Likewise.
* sysdeps/i386/i486/bits/string.h (__STRING_INLINE): Likewise.
* sysdeps/s390/bits/string.h (__STRING_INLINE): Likewise.
* sysdeps/s390/fpu/bits/mathinline.h (__MATH_INLINE): Likewise.
* sysdeps/powerpc/fpu/bits/mathinline.h (__MATH_INLINE): Likewise.
* sysdeps/x86_64/fpu/bits/mathinline.h (__MATH_INLINE): Likewise.
* sysdeps/sparc/fpu/bits/mathinline.h (__MATH_INLINE): Likewise.
* sysdeps/unix/sysv/linux/sys/sysmacros.h (gnu_dev_major,
gnu_dev_minor, gnu_dev_makedev): Remove __extern_inline from
prototypes. Only provide __extern_inline routines if
__USE_EXTERN_INLINES.
* debug/Makefile: Add rules to build and run tst-{,lfs}chk{4,5,6}
tests.
* debug/tst-chk1.c (do_prepare, do_test): Allow compilation as C++.
For now avoid some *printf tests in C++. Skip all testing
if __USE_FORTIFY_LEVEL is defined, but __extern_always_inline macro
is not.
* debug/tst-chk4.cc: New file.
* debug/tst-chk5.cc: New file.
* debug/tst-chk6.cc: New file.
* debug/tst-lfschk4.cc: New file.
* debug/tst-lfschk5.cc: New file.
* debug/tst-lfschk6.cc: New file.
* include/wchar.h (__vfwprintf_chk, __vswprintf_chk): Avoid
prototypes in C++.
* include/stdio.h (__sprintf_chk, __snprintf_chk, __vsprintf_chk,
__vsnprintf_chk, __printf_chk, __fprintf_chk, __vprintf_chk,
__vfprintf_chk, __fgets_unlocked_chk, __fgets_chk): Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/Implies: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/Implies: New file.
* sysdeps/powerpc/powerpc32/power6/fpu/Implies: New file.
* sysdeps/powerpc/powerpc32/power6x/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64/970/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64/power5/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64/power6/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64/power6x/fpu/Implies: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/970/fpu/Implies: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/power4/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/power5/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/power5+/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/power6/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/power6x/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/970/fpu/Implies: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/power4/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/power5/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/power5+/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/power6/fpu/Implies:
New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/power6x/fpu/Implies:
New file.
2007-05-31 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llrint.S: Move.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrint.S: To here.
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llrintf.S: Move.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llrintf.S: To here.
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llround.S: Move.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: To here.
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llroundf.S: Move.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llroundf.S: To here.
2007-05-22 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power5+/fpu/s_round.S
(LONG_DOUBLE_COMPAT): Specify correct version, GLIBC_2_1.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_trunc.S
(LONG_DOUBLE_COMPAT): Specify correct version, GLIBC_2_1.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_round.S
(LONG_DOUBLE_COMPAT): Specify correct version, GLIBC_2_1.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_trunc.S
(LONG_DOUBLE_COMPAT): Specify correct version, GLIBC_2_1.
2007-05-21 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power4/fpu/slowexp.c: New file.
* sysdeps/powerpc/powerpc32/power4/fpu/w_sqrt.c: New file.
* sysdeps/powerpc/powerpc64/power4/fpu/slowexp.c: New file.
* sysdeps/powerpc/powerpc64/power4/fpu/w_sqrt.c: New file.
2007-03-15 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llrint.S
[LONG_DOUBLE_COMPAT]: Add compat_symbol for llrintl@@GLIBC_2_1.
2006-02-13 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrint.S: New File
* sysdeps/powerpc/powerpc32/power6/fpu/s_llrintf.S: New File
* sysdeps/powerpc/powerpc32/power6/fpu/s_llround.S: New File
* sysdeps/powerpc/powerpc32/power6/fpu/s_llroundf.S: New File
2006-10-20 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power4/fpu/slowpow.c: New file.
* sysdeps/powerpc/powerpc64/power4/fpu/slowpow.c: New file.
2006-10-03 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llround.S: New file.
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llroundf.S: New file.
* sysdeps/powerpc/powerpc32/powerpc64/fpu/Makefile: Moved.
* sysdeps/powerpc/powerpc32/powerpc64/fpu/mpa.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/Makefile: To here.
* sysdeps/powerpc/powerpc32/power4/fpu/mpa.c: Likewise.
2006-09-29 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power6x/fpu/s_lrint.S: New file.
* sysdeps/powerpc/powerpc32/power6x/fpu/s_lround.S: New file.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: New file.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llround.S: New file.
2006-09-28 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power5+/fpu/s_llround.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_llroundf.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_lround.S: New file.
* sysdeps/powerpc/powerpc32/power6x/fpu/Implies: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_llround.S: New file.
* sysdeps/powerpc/powerpc64/power6x/fpu/Implies: New file.
2006-08-31 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/powerpc64/fpu/Makefile: New file.
* sysdeps/powerpc/powerpc32/powerpc64/fpu/mpa.c: New file.
* sysdeps/powerpc/powerpc64/power4/fpu/Makefile: New file.
* sysdeps/powerpc/powerpc64/power4/fpu/mpa.c: New file.
2006-06-15 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_round.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_roundf.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_trunc.S: New file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_truncf.S: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_round.S: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_roundf.S: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_trunc.S: New file.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_truncf.S: New file.
2006-03-20 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llrint.S: New file.
* sysdeps/powerpc/powerpc32/powerpc64/fpu/s_llrintf.S: New file.
2007-06-01 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power6/memset.S: New file.
* sysdeps/powerpc/powerpc64/power6/memset.S: New file.
2007-05-31 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/970/Implies: New file.
* sysdeps/powerpc/powerpc32/power5/Implies: New file.
* sysdeps/powerpc/powerpc32/power5+/Implies: New file.
* sysdeps/powerpc/powerpc32/power6/Implies: New file.
* sysdeps/powerpc/powerpc32/power6x/Implies: New file.
* sysdeps/powerpc/powerpc64/970/Implies: New file.
* sysdeps/powerpc/powerpc64/power5/Implies: New file.
* sysdeps/powerpc/powerpc64/power5+/Implies: New file.
* sysdeps/powerpc/powerpc64/power6/Implies: New file.
* sysdeps/powerpc/powerpc64/power6x/Implies: New file.
2007-05-21 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power4/memset.S: New file
2007-03-13 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc64/memcpy.S: Improve aligned loop to minimize
branch miss-predicts. Ensure that cache line crossing does not impact
dispatch grouping.
2006-12-13 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc64/power4/memcopy.h: Replace with include
"../../powerpc32/power4/memcopy.h".
* sysdeps/powerpc/powerpc64/power4/wordcopy.c: Replace with include
"../../powerpc32/power4/wordcopy.c".
2006-10-03 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/powerpc64/Makefile: Moved.
* sysdeps/powerpc/powerpc32/powerpc64/memcopy.h: Likewise.
* sysdeps/powerpc/powerpc32/powerpc64/wordcopy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/Makefile: To here.
* sysdeps/powerpc/powerpc32/power4/memcopy.h: Likewise.
* sysdeps/powerpc/powerpc32/power4/wordcopy.c: Likewise.
2006-09-10 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power6/memcpy.S: New file.
2006-08-31 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power6/wordcopy.c: New file.
* sysdeps/powerpc/powerpc32/powerpc64/Makefile: New file.
* sysdeps/powerpc/powerpc32/powerpc64/memcopy.h: New file.
* sysdeps/powerpc/powerpc32/powerpc64/wordcopy.c: New file.
* sysdeps/powerpc/powerpc64/power4/Makefile: New file.
* sysdeps/powerpc/powerpc64/power4/memcopy.h: New file.
* sysdeps/powerpc/powerpc64/power4/wordcopy.c: New file.
* sysdeps/powerpc/powerpc64/power6/wordcopy.c: New file.
2006-07-06 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc64/power6/memcpy.S: New file.
2006-03-20 Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/powerpc32/power4/memcmp.S: New file.
* sysdeps/powerpc/powerpc32/power4/memcpy.S: New file.
* sysdeps/powerpc/powerpc32/power4/memset.S: New file.
* sysdeps/powerpc/powerpc32/power4/strncmp.S: New file.
* sysdeps/powerpc/powerpc64/power4/memcmp.S: New file.
* sysdeps/powerpc/powerpc64/power4/memcpy.S: New file.
* sysdeps/powerpc/powerpc64/power4/strncmp.S: New file.
Peter Bergner <bergner@us.ibm.com>
* sysdeps/powerpc/bits/fenv.h: Declare __fe_mask_env extern.
Define FE_NOMASK_ENV as FE_EANBLED_ENV. Define FE_MASK_ENV.
* sysdeps/powerpc/fpu/Makefile: Add fe_mask to libm-support.
* sysdeps/powerpc/fpu/fe_mask.c: New file.
* sysdeps/powerpc/fpu/fe_nomask.c: Correct comment.
* sysdeps/powerpc/fpu/fedisblxcpt.c (fedisableexcept):
Call __fe_mask_env() if all FP exceptions disabled.
* sysdeps/powerpc/fpu/feholdexcpt.c (feholdexcept): Copy high 32-bits
from old FPSCR to new fenv to propagate DFP rounding modes.
Call __fe_mask_env() if FP exceptions previously enabled.
* sysdeps/powerpc/fpu/fesetenv.c (fesetenv): Change mask to merge
exceptions from env. Use __fe_nomask_env() or __fe_mask_env() when
transitioning from all exceptions disabled to any exception enabled
or visa versa.
* sysdeps/powerpc/fpu/feupdateenv.c (__feupdateenv): Change mask to
merge exceptions from env. Call __fe_nomask_env or __fe_mask_env
when transitioning from all exceptions disabled to any exception
enabled or visa versa.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fe_nomask.c: Moved to...
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/fe_nomask.c: ...here.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/fe_nomask.c: Moved to...
* sysdeps/unix/sysv/linux/powerpc/powerpc64/fpu/fe_nomask.c: ...here.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/fe_mask.c: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/fpu/fe_mask.c: New file.
rather than r->r_brk.
2006-11-08 Jakub Jelinek <jakub@redhat.com>
* elf/dl-load.c (decompose_rpath): Return bool rather than void.
If l->l_name is on inhibit_rpath list, set sps->dirs to -1 and
return false, otherwise return true.
(cache_rpath): Return decompose_rpath return value.
2006-11-07 Jakub Jelinek <jakub@redhat.com>
* include/libc-symbols.h (declare_symbol): Rename to...
(declare_symbol_alias): ... this. Add ORIGINAL argument, imply
strong_alias (ORIGINAL, SYMBOL) in asm to make sure it preceedes
.size directive.
* sysdeps/gnu/errlist-compat.awk: Adjust for declare_symbol_alias
changes.
* sysdeps/gnu/siglist.c: Likewise.
2006-09-25 Jakub Jelinek <jakub@redhat.com>
[BZ #3252]
* sysdeps/unix/sysv/linux/powerpc/fchownat.c (fchownat): Handle only
fchownat syscall and __ASSUME_LCHOWN_SYSCALL case inline, call
__{,l}chown to handle the rest.
* sysdeps/unix/sysv/linux/i386/fchownat.c (fchownat): Handle only
fchownat syscall and __ASSUME_32BITUIDS case inline, call
__{,l}chown to handle the rest.
* sysdeps/unix/sysv/linux/sparc/sparc32/fchownat.c: Include
i386/fchownat.c.
* sysdeps/unix/sysv/linux/s390/s390-32/fchownat.c: Likewise.
* sysdeps/unix/sysv/linux/sh/fchownat.c: Likewise.
[BZ #3253]
* posix/glob.c (glob_in_dir): Don't alloca one struct globlink at a
time, rather allocate increasingly bigger arrays of pointers, if
possible with alloca, if too large with malloc.
2006-09-24 Jakub Jelinek <jakub@redhat.com>
* sysdeps/powerpc/fpu/libm-test-ulps: Updated.
* sysdeps/ieee754/ldbl-128/s_lrintl.c (__lrintl): Fix 2 typos.
Steven Munroe <sjmunroe@us.ibm.com>
* sysdeps/powerpc/dl-procinfo.c (_dl_powerpc_cap_flags): Add 4 new cap
names to the beginning. Rename "cell" to "cellbe".
(_dl_powerpc_platforms): New.
* sysdeps/powerpc/dl-procinfo.h (_DL_HWCAP_FIRST): Decrease.
(HWCAP_IMPORTANT): Remove power{4,5,5+} and cell.
(_DL_PLATFORMS_COUNT, _DL_FIRST_PLATFORM): Define.
(_DL_HWCAP_PLATFORM): Define to new mask.
(_dl_platform_string, _dl_string_platform): New functions.
* sysdeps/powerpc/sysdep.h (PPC_FEATURE_BOOKE, PPC_FEATURE_SMT,
PPC_FEATURE_ICACHE_SNOOP, PPC_FEATURE_ARCH_2_05): Define.
2006-04-03 Steven Munroe <sjmunroe@us.ibm.com>
[BZ #2505]
* sysdeps/powerpc/powerpc32/bits/atomic.h [_ARCH_PWR4]:
Define atomic_read_barrier and __ARCH_REL_INSTR using lwsync.
* math/gen-libm-test.pl (parse_args): Take function name for pretty
output as an argument.
(generate_testfile): Pass it the name given in the START macro.
[BZ #2466]
* math/libm-test.inc (llrint_test, llround_test): Fix last change to
protect large-precision cases with [LDBL_MANT_DIG > 100].
(llrint_test_tonearest, llrint_test_towardzero): Likewise.
(llrint_test_downward, llrint_test_upward): Likewise.
2006-03-15 Steven Munroe <sjmunroe@us.ibm.com>
Alan Modra <amodra@bigpond.net.au>
[BZ #2466]
* math/libm-test.inc (llrint_test, llround_test) [TEST_LDOUBLE]:
Add new test values.
(llrint_test_tonearest, llrint_test_towardzero, llrint_test_downward,
llrint_test_upward): New functions.
(main): Call them.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c (__llrintl): Handle
rounding that spans doubles in IBM long double format.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c (__llroundl): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llrintl.S: Removed.
* sysdeps/powerpc/powerpc64/fpu/s_llroundl.S: Removed.
* sysdeps/powerpc/powerpc64/fpu/s_lrintl.S: Removed.
* sysdeps/powerpc/powerpc64/fpu/s_lroundl.S: Removed.
2006-03-16 Roland McGrath <roland@redhat.com>