Optimization is achieved on 8 byte aligned strings with double word
comparison using cmpb instruction. On unaligned strings loop unrolling
is applied for Power7 gain.
This patch fixes the optimized ppc64/power7 strncat strlen call for
static build without ifunc enabled. The strlen symbol to call in such
situation is just strlen, instead of __GI_strlen (since the __GI_
alias is just created for shared objects).
This patch fixes a similar issue to
736c304a1a, where for PPC32 if the symbol
is defined as hidden (memchr) then compiler will create a local branc
(symbol@local) and the linker will not create a required PLT call to
make the ifunc work. It changes the default hidden symbol (__GI_memchr)
to default memchr symbol for powerpc32 (__memchr_ppc32).
This patch fixes the __copysignf optimized macro meant to internal libm
usage when used with constant value. Without the explicit cast to
float, if it is used with const double value (for instance, on
s_casinhf.c) double constants will be used and it may lead to precision
issues in some algorithms.
It fixes the following failures on PPC64/POWER7:
Failure: Test: Real part of: cacos_downward (inf + 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf - 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf + 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf - 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf + 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf - 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf + 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf - 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
The optimization is achieved by following techniques:
> data alignment [gain from aligned memory access on read/write]
> POWER7 gains performance with loop unrolling/unwinding
[gain by reduction of branch penalty].
> zero padding done by calling optimized memset
This patch changes de default symbol redirection for internal call of
memcpy, memset, memchr, and strlen to the IFUNC resolved ones. The
performance improvement is noticeable in algorithms that uses these
symbols extensible, like the regex functions.
This patch optimizes the FPSCR update on exception and rounding change
functions by just updating its value if new value if different from
current one. It also optimizes fedisableexcept and feenableexcept by
removing an unecessary FPSCR read.
This patch fixes some powerpc32 and powerpc64 builds with
--disable-multi-arch option along with different --with-cpu=powerN.
It cleanups the Implies directories by removing the multiarch
folder for non multiarch config and also fixing two assembly
implementations: powerpc64/power7/strncat.S that is calling the
wrong strlen; and power8/fpu/s_isnan.S that misses the hidden_def and
weak_alias directives.
This patch fixes the powerpc32 optimized nearbyint/nearbyintf bogus
results for FE_DOWNWARD rounding mode. This is due wrong instructions
sequence used in the rounding calculation (two subtractions instead of
adition and a subtraction).
Fixes BZ#16815.
This patch add an optimized strpbrk for POWER7 by using a different
algorithm than default implementation: it constructs a table based on
the 'accept' argument and use this table to check for any occurance on
the input string. The idea is similar as x86_64 uses.
For PowerPC some tunings were added, such as unroll loops and memory
clear using VSX instructions.
This patch add a optimized strcspn for POWER7 by using a different
algorithm than default implementation: it constructs a table based on
the 'accept' argument and use this table to check for any occurance
on the input string. The idea is similar as x86_64 uses.
For PowerPC some tunings were added, such as unroll loops and align
stack memory to table to 16 bytes (so VSX clean can ran without
alignment issues).
The roundl assembly implementation
(sysdeps/powerpc/powerpc64/fpu/s_roundl.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_roundl.c instead fixes the failing math.
This fixes 16707.
The nearbyintl assembly implementation
(sysdeps/powerpc/powerpc64/fpu/s_nearbyintl.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c instead fixes the failing
math.
Fixes BZ#16706.
The ceill assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_ceill.S)
returns wrong results for some inputs where first double is a exact
integer and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_ceill.c instead fixes the failing math.
Fixes BZ#16701.
This patch fixes an issue for powerpc32-fpu static build which fails
with an 'bzero' undefined reference. This patch adds bzero ifunc selector
for static builds and fixes the '__bzero_ppc' reference to default
memset symbol (since static memset build does not provide ifunc
selector).
Fixes BZ#16689.
This patch fixes an issue for powerpc64[le] static build where __bzero
is definied in multiple places (memset-ppc64.o and bzero.o). It is now
defined only in bzero.o and memset-ppc64.o only defined __bzero_ppc for
both dynamic and static library.
Fixes BZ#16683.
The optimization is achieved by following techniques:
> hashing of needle.
> hashing avoids scanning of duplicate entries in needle across the string.
> initializing the hash table with Vector instructions (VSX) by quadword access.
> unrolling when scanning for character in string across hash table.
The optimization is achieved by following techniques:
1. Doubleword aligned memory access and compares using
cmpb instruction.
2. Loop unrolling for byte load/store.
3. CPU pre-fetch to avoid cache miss.
This patch fix the optimized powerpc-fpu modf/modff implementation
when using in non-default rounding mode where the zero sign is not
as expected. It fixes the libm testsuite tests
modf_downward (0) == 0.00000000000000000000e+00
modf_downward (20) == 0.00000000000000000000e+00
modf_downward (21) == 0.00000000000000000000e+00
Where the sign returned was negative.
As recently discussed
<https://sourceware.org/ml/libc-alpha/2014-02/msg00670.html>, it
doesn't seem particularly useful for libm-test-ulps files to contain
huge amounts of data on ulps for individual tests; just the global
maximum observed ulps for each function, together with the
verification of exceptions, errno and special results such as
infinities and NaNs for each test, suffices to verify that a
function's behavior on the given test inputs is within the expected
accuracy. Removing this data reduces source tree churn caused by
updates to these files when libm tests are added, and reduces the
frequency with which testsuite additions actually need libm-test-ulps
changes at all.
Accordingly, this patch removes that data, so that individual tests
get checked against the global bounds for the given function and only
generate an error if those are exceeded. Tested x86_64 (including
verifying that if an ulps value is artificially reduced, the tests do
indeed fail as they should and "make regen-ulps" generates the
expected changes).
* math/libm-test.inc (struct ulp_data): Don't refer to ulps for
individual tests in comment.
(libm-test-ulps.h): Don't refer to test_ulps in #include comment.
(prev_max_error): New variable.
(prev_real_max_error): Likewise.
(prev_imag_max_error): Likewise.
(compare_ulp_data): Don't refer to test names in comment.
(find_test_ulps): Remove function.
(find_function_ulps): Likewise.
(find_complex_function_ulps): Likewise.
(init_max_error): Take function name as argument. Look up ulps
for that function.
(print_ulps): Remove function.
(print_max_error): Use prev_max_error instead of calling
find_function_ulps.
(print_complex_max_error): Use prev_real_max_error and
prev_imag_max_error instead of calling find_complex_function_ulps.
(check_float_internal): Take max_ulp parameter instead of calling
find_test_ulps. Don't call print_ulps.
(check_float): Update call to check_float_internal.
(check_complex): Update calls to check_float_internal.
(START): Pass argument to init_max_error.
* math/gen-libm-test.pl (%results): Don't include "kind"
information.
(parse_ulps): Don't handle ulps of individual tests.
(print_ulps_file): Likewise.
(output_ulps): Likewise.
* math/README.libm-test: Update.
* manual/libm-err-tab.pl (parse_ulps): Don't handle ulps of
individual tests.
* sysdeps/aarch64/libm-test-ulps: Remove individual test ulps.
* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
* sysdeps/arm/libm-test-ulps: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/ia64/fpu/libm-test-ulps: Likewise.
* sysdeps/m68k/coldfire/fpu/libm-test-ulps: Likewise.
* sysdeps/m68k/m680x0/fpu/libm-test-ulps: Likewise.
* sysdeps/microblaze/libm-test-ulps: Likewise.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
* sysdeps/s390/fpu/libm-test-ulps: Likewise.
* sysdeps/sh/libm-test-ulps: Likewise.
* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
* sysdeps/tile/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* sysdeps/hppa/fpu/libm-test-ulps: Remove individual test ulps.
This patch optimizes strrchr() for ppc64. It uses aligned memory
access along with cmpb instruction and CPU prefetch to avoid
cache misses for speed improvement.
This patch add a optimized llround/llroundf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
This patch add a optimized llrint/llrintf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
This patch add a optimized finite/finitef implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
This patch add a optimized isinf/isinff implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
This patch add a optimized isnan/isnanf implementation for POWER8
using the new Move From VSR Doubleword instruction to gains some
cycles from FP to GRP register move.
elf/tst-auxv.c includes misc/sys/auxv.h, which ends up not actually
being included due to the guard overlap, and getauxval becomes an
implicit declaration and implicit pointer conversion which means, at
best, the test isn't actually testing what it thinks it is and, at
worst, it'll crash and burn on platforms where implict pointer
conversion is a Very Bad Thing.
* sysdeps/powerpc/bits/hwcap.h: Allow _SYSDEPS_SYSDEP_H guard as a
synonym for _SYS_AUXV_H to allow direct inclusion.
* sysdeps/sparc/bits/hwcap.h: Likewise.
* sysdeps/powerpc/sysdep.h: Define _SYSDEPS_SYSDEP_H instead of
_SYS_AUXV_H so we can include sysdep.h and sys/auxv.h together.
* sysdeps/sparc/sysdep.h: Likewise.
IEEE 754-2008 defines two ways in which tiny results can be detected,
"before rounding" (based on the infinite-precision result) and "after
rounding" (based on the result when rounded to normal precision as if
the exponent range were unbounded). All binary operations on an
architecture must use the same choice of how tininess is detected.
soft-fp has so far implemented only before-rounding tininess
detection. This patch adds support for after-rounding tininess
detection. A new macro _FP_TININESS_AFTER_ROUNDING is added that
sfp-machine.h must define (soft-fp is meant to be self-contained so
the existing tininess.h files aren't used here, though the information
going in sfp-machine.h has been taken from them). The soft-fp macros
dealing with raising underflow exceptions then handle the cases where
the choice matters specially, rounding a copy of the input to the
appropriate precision to see if a value that's tiny before rounding
isn't tiny after rounding.
Tested for mips64 using GCC trunk (which now uses soft-fp on MIPS, so
supporting exceptions and rounding modes for long double where not
previously supported - this is the immediate motivation for doing this
patch now) together with (a) a patch to sysdeps/mips/math-tests.h to
enable exceptions / rounding modes tests for long double for GCC 4.9
and later, and (b) corresponding changes applied to libgcc's soft-fp
and sfp-machine.h files. In the libgcc context this is also tested on
x86_64 (also an after-rounding architecture) with testcases for
__float128 that I intend to add to the GCC testsuite when updating
soft-fp there.
(To be clear: this patch does not fix any glibc bugs that were
user-visible in past releases, since after-rounding architectures
didn't use soft-fp in any affected case with support for
floating-point exceptions - so there is no corresponding Bugzilla bug.
Rather, it works together with the GCC changes to use soft-fp on MIPS
to allow previously absent long double functionality to work properly,
and allows soft-fp to be used in glibc on after-rounding architectures
in cases where it couldn't previously be used.)
* soft-fp/op-common.h (_FP_DECL): Mark exponent as possibly
unused.
(_FP_PACK_SEMIRAW): Determine tininess based on rounding shifted
value if _FP_TININESS_AFTER_ROUNDING and unrounded value is in
subnormal range.
(_FP_PACK_CANONICAL): Determine tininess based on rounding to
normal precision if _FP_TININESS_AFTER_ROUNDING and unrounded
value has largest subnormal exponent.
* soft-fp/soft-fp.h [FP_NO_EXCEPTIONS]
(_FP_TININESS_AFTER_ROUNDING): Undefine and redefine to 0.
* sysdeps/aarch64/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): New macro.
* sysdeps/alpha/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/arm/soft-fp/sfp-machine.h (_FP_TININESS_AFTER_ROUNDING):
Likewise.
* sysdeps/mips/mips64/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/mips/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/powerpc/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/sh/soft-fp/sfp-machine.h (_FP_TININESS_AFTER_ROUNDING):
Likewise.
* sysdeps/sparc/sparc32/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/sparc/sparc64/soft-fp/sfp-machine.h
(_FP_TININESS_AFTER_ROUNDING): Likewise.
* sysdeps/tile/sfp-machine.h (_FP_TININESS_AFTER_ROUNDING):
Likewise.
This patch creates implicit rules to match the abifiles if
abilist-pattern is defined in the architecture Makefile. This allows
machine specific Makefiles to define different abifiles names
(for instance *-le.abilist for powerpc64le).
The truncl assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_truncl.S)
returns wrong results for some inputs where first double is a exact integer
and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_truncl.c instead it fixes tgammal
issues regarding wrong result sign.
This patch fixes bug 16390, incorrect signs of zero results from
ldbl-128ibm atan2l, soft-float only. The problem is a longstanding
GCC bug with fabsl not being correct for signed zero for soft float,
and the fix is using -fno-builtin-fabsl as a workaround, as already
done for various other source files. Tested powerpc-nofpu.
* sysdeps/powerpc/nofpu/Makefile [$(subdir) = math]
(CFLAGS-e_atan2l.c): Use -fno-builtin-fabsl.
sysdeps/powerpc/powerpc32/libgcc-compat.S makes certain symbols that
glibc once accidentally reexported from libgcc into compat symbols.
Where the exports were purely accidental, this is the right thing to
do. However, for powerpc-nofpu the soft-fp symbols are deliberately
exported from libc, given public versions in
sysdeps/powerpc/nofpu/Versions and used by libm in preference to the
libgcc versions that do not support the software exceptions and
rounding modes. The libc versions should also be usable by user
programs, though normally libgcc gets linked in first (meaning,
effectively, that the <fenv.h> functions are broken as regards their
expected effects on user arithmetic).
A longstanding todo item is to remove the functions in question from
libgcc (when built with recent enough glibc) - that is, remove them
from static libgcc and make them compat symbols in shared libgcc - so
that this works properly (this is one of the items mentioned at
<http://gcc.gnu.org/wiki/Software_floating_point> - parts of that page
are obviously out of date, but this item still applies). Doing this
requires first that the functions are actually available from libc for
new links, not just as compat symbols.
This patch stops the symbols in question being compat symbols for
powerpc-nofpu. The nofpu Versions entries for them are removed (the
symbols never were exported at GLIBC_2.3.2, only GLIBC_2.0, because
the compat symbols took precedence).
Tested powerpc-nofpu. The symbols are no longer compat symbols and
libm.so now properly gets undefined references to them (resolved to
libc.so) instead of the libgcc copies getting linked into libm as
before.
* sysdeps/powerpc/powerpc32/libgcc-compat.S
[_SOFT_FLOAT || __NO_FPRS__] (__fixdfdi_v_glibc20): Do not define
as a macro and a compat symbol.
[_SOFT_FLOAT || __NO_FPRS__] (__fixsfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__fixunsdfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__fixunssfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__floatdidf_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__floaddisf_v_glibc20): Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixdfdi): Do
not use .hidden.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixsfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixunsdfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixunssfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__floaddidf):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__floaddisf):
Likewise.
* sysdeps/powerpc/nofpu/Versions (libc): Remove __fixdfdi,
__fixsfdi, __fixunsdfdi, __fixunssfdi, __floatdidf and __floatdisf
from GLIBC_2.3.2.
For PPC64, all the wrappers at sysdeps are superfluous: they are
basically the same implementation from math/w_sqrt.c with the
'#ifdef _IEEE_LIBM'. And the power4 version just force the 'fsqrt'
instruction utilization with an inline assembly, which is already
handled by math_private.h __ieee754_sqrt implementation.
This patch add optimized __mpn_addmul, __mpn_addsub, __mpn_lshift, and
__mpn_mul_1 implementations for PowerPC64. They are originally from GMP
with adjustments for GLIBC.
This patch add static probes for setjmp/longjmp in the way gdb expects,fixing
the gdb.base/longjmp.exp gdb testcases.
It changes the symbol_name and use macros to to avoid change the probe names
and ending up adding more logic on GDB (since with the expected name
GDB work seamlessly).
The ELFv2 ABI changes the calling convention by passing and returning
structures in registers in more cases than the old ABI:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01145.htmlhttp://gcc.gnu.org/ml/gcc-patches/2013-11/msg01147.html
For the most part, this does not affect glibc, since glibc assembler
files do not use structure parameters / return values. However, one
place is affected: the LD_AUDIT interface provides a structure to
the audit routine that contains all registers holding function
argument and return values for the intercepted PLT call.
Since the new ABI now sometimes uses registers to return values
that were never used for this purpose in the old ABI, this structure
has to be extended. To force audit routines to be modified for the
new ABI if necessary, the patch defines v2 variants of the la_ppc64
types and routines.
In addition, the patch contains two unrelated changes to the
PLT trampoline routines: it fixes a bug where FPR return values
were stored in the wrong place, and it removes the unnecessary
save/restore of CR.
This updates glibc for the changes in the ELFv2 relating to the
stack frame layout. These are described in more detail here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01149.htmlhttp://gcc.gnu.org/ml/gcc-patches/2013-11/msg01146.html
Specifically, the "compiler and linker doublewords" were removed,
which has the effect that the save slot for the TOC register is
now at offset 24 rather than 40 to the stack pointer.
In addition, a function may now no longer necessarily assume that
its caller has set up a 64-byte register save area its use.
To address the first change, the patch goes through all assembler
files and replaces immediate offsets in instructions accessing the
ABI-defined stack slots by symbolic offsets. Those already were
defined in ucontext_i.sym and used in some of the context routines,
but that doesn't really seem like the right place for those defines.
The patch instead defines those symbolic offsets in sysdeps.h,
in two variants for the old and new ABI, and uses them systematically
in all assembler files, not just the context routines.
The second change only affected a few assembler files that used
the save area to temporarily store some registers. In those
cases where this happens within a leaf function, this patch
changes the code to store those registers to the "red zone"
below the stack pointer. Otherwise, the functions already allocate
a stack frame, and the patch changes them to add extra space in
these frames as temporary space for the ELFv2 ABI.
This is a follow-on to the previous patch to support the ELFv2 ABI in the
dynamic loader, split off into its own patch since it is just an optional
optimization.
In the ELFv2 ABI, most functions define both a global and a local entry
point; the local entry requires r2 to be already set up by the caller
to point to the callee's TOC; while the global entry does not require
the caller to know about the callee's TOC, but it needs to set up r12
to the callee's entry point address.
Now, when setting up a PLT slot, the dynamic linker will usually need
to enter the target function's global entry point. However, if the
linker can prove that the target function is in the same DSO as the
PLT slot itself, and the whole DSO only uses a single TOC (which the
linker will let ld.so know via a DT_PPC64_OPT entry), then it is
possible to actually enter the local entry point address into the
PLT slot, for a slight improvement in performance.
Note that this uncovered a problem on the first call via _dl_runtime_resolve,
because that routine neglected to restore the caller's TOC before calling
the target function for the first time, since it assumed that function
would always reload its own TOC anyway ...
This patch adds support for the ELFv2 ABI feature to remove function
descriptors. See this GCC patch for in-depth discussion:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01141.html
This mostly involves two types of changes: updating assembler source
files to the new logic, and updating the dynamic loader.
After the refactoring in the previous patch, most of the assembler source
changes can be handled simply by providing ELFv2 versions of the
macros in sysdep.h. One somewhat non-obvious change is in __GI__setjmp:
this used to "fall through" to the immediately following __setjmp ENTRY
point. This is no longer safe in the ELFv2 since ENTRY defines both
a global and a local entry point, and you cannot simply fall through
to a global entry point as it requires r12 to be set up.
Also, makecontext needs to be updated to set up registers according to
the new ABI for calling into the context's start routine.
The dynamic linker changes mostly consist of removing special code
to handle function descriptors. We also need to support the new PLT
and glink format used by the the ELFv2 linker, see:
https://sourceware.org/ml/binutils/2013-10/msg00376.html
In addition, the dynamic linker now verifies that the dynamic libraries
it loads match its own ABI.
The hack in VDSO_IFUNC_RET to "synthesize" a function descriptor
for vDSO routines is also no longer necessary for ELFv2.
This is the first patch to support the new ELFv2 ABI in glibc.
As preparation, this patch simply refactors some of the powerpc64 assembler
code to move all code related to creating function descriptors (.opd section)
or using function descriptors (function pointer call) into a central place
in sysdep.h.
Note that most locations creating .opd entries were already using macros
in sysdep.h, this patch simply extends this to the remaining places.
No relevant change in generated code expected.
This patch updates glibc in accordance with the binutils patch checked in here:
https://sourceware.org/ml/binutils/2013-10/msg00372.html
This changes the various R_PPC64_..._HI and _HA relocations to report
32-bit overflows. The motivation is that existing uses of @h / @ha
are to build up 32-bit offsets (for the "medium model" TOC access
that GCC now defaults to), and we'd really like to see failures at
link / load time rather than silent truncations.
For those rare cases where a modifier is needed to build up a 64-bit
constant, new relocations _HIGH / _HIGHA are supported.
The patch also fixes a bug in overflow checking for the R_PPC64_ADDR30
and R_PPC64_ADDR32 relocations.
This patch helps some math functions performance by adding the libc_fexxx
variant of inline functions to handle both FPU round and exception set/restore
and by using them on the libc_fexxx_ctx functions. It is based on already coded
fexxx family functions for PPC with fpu.
Here is the summary of performance improvements due this patch (measured on a
POWER7 machine):
Before:
cos(): ITERS:9.5895e+07: TOTAL:5116.03Mcy, MAX:77.6cy, MIN:49.792cy, 18744 calls/Mcy
exp(): ITERS:2.827e+07: TOTAL:5187.15Mcy, MAX:494.018cy, MIN:38.422cy, 5450.01 calls/Mcy
pow(): ITERS:6.1705e+07: TOTAL:5144.26Mcy, MAX:171.95cy, MIN:29.935cy, 11994.9 calls/Mcy
sin(): ITERS:8.6898e+07: TOTAL:5117.06Mcy, MAX:83.841cy, MIN:46.582cy, 16982 calls/Mcy
tan(): ITERS:2.9473e+07: TOTAL:5115.39Mcy, MAX:191.017cy, MIN:172.352cy, 5761.63 calls/Mcy
After:
cos(): ITERS:2.05265e+08: TOTAL:5111.37Mcy, MAX:78.754cy, MIN:24.196cy, 40158.5 calls/Mcy
exp(): ITERS:3.341e+07: TOTAL:5170.84Mcy, MAX:476.317cy, MIN:15.574cy, 6461.23 calls/Mcy
pow(): ITERS:7.6153e+07: TOTAL:5129.1Mcy, MAX:147.5cy, MIN:30.916cy, 14847.2 calls/Mcy
sin(): ITERS:1.58816e+08: TOTAL:5115.11Mcy, MAX:1490.39cy, MIN:22.341cy, 31048.4 calls/Mcy
tan(): ITERS:3.4964e+07: TOTAL:5114.18Mcy, MAX:177.422cy, MIN:146.115cy, 6836.68 calls/Mcy