Commit Graph

33 Commits

Author SHA1 Message Date
Rajalakshmi Srinivasaraghavan
30e4cc5413 powerpc: Fix return code of strcasecmp for unaligned inputs
If the input values are unaligned and there are null characters in the
memory before the starting address of the input values, strcasecmp
returns an incorrect return code. Fixed by masking out the bits that
are not part of the string.
2016-07-05 21:20:41 +05:30
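The masking idea in 30e4cc5413 can be illustrated with a scalar sketch. This is not the POWER8 vector code from the commit: the block size, the function names and the byte-wise approach are all assumptions made for illustration. It only shows why a null byte located before the start of the string must be hidden when a whole aligned block is examined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16  /* hypothetical block width, chosen for illustration */

/* Index of the first NUL byte in block[0..len), or -1 if there is none.  */
int first_nul (const unsigned char *block, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (block[i] == 0)
      return (int) i;
  return -1;
}

/* Same search over one aligned block, but the bytes before `offset`
   (which are not part of the string) are forced to non-zero first, so a
   stray NUL in front of the string cannot be mistaken for its end.  */
int first_nul_masked (const unsigned char *block, size_t offset)
{
  unsigned char tmp[BLOCK_SIZE];

  assert (offset < BLOCK_SIZE);
  memcpy (tmp, block, BLOCK_SIZE);
  memset (tmp, 0xff, offset);
  return first_nul (tmp, BLOCK_SIZE);
}
```

With a block whose byte 1 is NUL and a string that starts at offset 4, the unmasked search stops at the stray byte while the masked one finds the real terminator.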
Anton Blanchard
aa95fc13f5 powerpc: Add a POWER8-optimized version of sinf()
This uses the implementation of sinf() in sysdeps/x86_64/fpu/s_sinf.S
as inspiration.
2016-06-30 16:08:49 -03:00
Tulio Magno Quites Machado Filho
35da2541c3 powerpc: Add a POWER8-optimized version of expf()
This implementation is based on the one already used at
sysdeps/x86_64/fpu/e_expf.S.

This implementation improves the performance by ~14% on average in synthetic
benchmarks at the cost of decreasing accuracy to 1 ULP.
2016-06-30 14:56:14 -03:00
raji
c8376f3e07 powerpc: strcasecmp/strncasecmp optimization for power8
This implementation utilizes vectors to improve performance
compared to the current byte-by-byte implementation for POWER7.
The performance improvement is up to 4x.  This patch is tested
on powerpc64 and powerpc64le.
2016-06-14 14:51:16 +05:30
Tulio Magno Quites Machado Filho
c24480ce3b powerpc: Fix --disable-multi-arch build on POWER8
Add missing symbols of stpncpy and strcasestr when multi-arch is
disabled.
Fix memset call from strncpy/stpncpy when multi-arch is disabled.
2016-06-06 16:03:29 -03:00
Gabriel F. T. Gomes
eb3b8a4924 powerpc: Fix operand prefixes
The file sysdeps/powerpc/sysdeps.h defines aliases for condition register
operands.  E.g.: 'cr7' means condition register 7.  On the one hand, this
increases readability, as it makes it easier for readers to know whether the
operand is a condition register, a general purpose register or an immediate.
On the other hand, this allows condition registers to be written as if they
were general purpose registers, and vice versa, thus reducing the readability
of the code.

This commit removes some of these unintentional misuses.

The changes have no effect on the final code.  Checked with objdump.
2016-05-04 09:14:52 -03:00
Gabriel F. T. Gomes
72c11b353e powerpc: Zero pad using memset in strncpy/stpncpy
Call __memset_power8 to pad, with zeros, the remaining bytes in the
dest string on __strncpy_power8 and __stpncpy_power8.  This improves
performance when n is larger than the input string, giving ~30% gain for
larger strings without impacting much shorter strings.
2016-04-29 10:05:33 -03:00
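The padding strategy from 72c11b353e, expressed as a portable C sketch. The function name is hypothetical and plain memcpy/memset stand in for the POWER8-specific routines; the real change is in assembly and calls __memset_power8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* strncpy-like copy: copy at most n bytes of src, then let memset write
   all of the trailing zero padding in one call.  */
char *strncpy_memset_pad (char *dest, const char *src, size_t n)
{
  size_t len = 0;

  assert (dest != NULL && src != NULL);

  /* Length of src, capped at n (never reads past the terminator).  */
  while (len < n && src[len] != '\0')
    len++;

  memcpy (dest, src, len);
  if (len < n)
    memset (dest + len, 0, n - len);
  return dest;
}
```

The gain described in the commit comes from the padding step: when n is much larger than the source string, most of the work is zero filling, which an optimized memset handles in bulk.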
Paul E. Murphy
8f1b841e45 powerpc: Add optimized strcspn for P8
A few minor adjustments to the P8 strspn gives us
an almost equally optimized P8 strcspn.
2016-04-25 09:11:02 -05:00
Rajalakshmi Srinivasaraghavan
e413b14e18 powerpc: strcasestr optimization for power8
This patch optimizes the strcasestr function for power >= 8 systems.  It
compares 16 bytes at a time using vector instructions, for an average
improvement of ~40%.  This patch is tested on powerpc64 and powerpc64le.
2016-04-22 19:23:13 +05:30
Carlos Eduardo Seo
1b045ee53e powerpc: Optimization for strlen for POWER8.
This implementation takes advantage of vectorization to improve performance of
the loop over the current strlen implementation for POWER7.
2016-04-15 17:19:19 -03:00
Paul E. Murphy
25dba0ad05 powerpc: Add optimized P8 strspn
This utilizes vectors and bitmasks.  For a small needle and a large
haystack, the performance improvement is up to 8x.  For short
strings (0-4B), the cost of computing the bitmask dominates,
and it is a tad slower.
2016-04-07 15:51:28 -05:00
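A scalar sketch of the bitmask idea mentioned in 25dba0ad05. The commit's implementation is vectorized assembly; the function name and the 4 x 64-bit bitmap layout below are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* strspn-like scan using a 256-bit membership bitmap, one bit per
   possible byte value.  */
size_t strspn_bitmap (const char *s, const char *accept)
{
  uint64_t map[4] = { 0, 0, 0, 0 };
  const unsigned char *a = (const unsigned char *) accept;
  const unsigned char *p = (const unsigned char *) s;

  assert (s != NULL && accept != NULL);

  /* Build the bitmap once; this is the fixed cost that dominates for
     very short inputs.  */
  for (; *a != 0; a++)
    map[*a >> 6] |= (uint64_t) 1 << (*a & 63);

  /* The NUL byte is never set in the bitmap, so the scan also stops at
     the end of s.  */
  while ((map[*p >> 6] >> (*p & 63)) & 1)
    p++;

  return (size_t) (p - (const unsigned char *) s);
}
```

Inverting the membership test in the scan loop (and stopping explicitly at the terminator) would give a strcspn-style function, which is in line with the "minor adjustments" mentioned in 8f1b841e45.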
Joseph Myers
f7a9f785e5 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
Gabriel F. T. Gomes
b0f81637d5 PowerPC: Add comments to optimized strncpy
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Added comments to some
	assembly instructions.
2015-10-01 17:36:55 -03:00
Gabriel F. T. Gomes
850713336e PowerPC: Fix operand prefixes
The file sysdeps/powerpc/sysdeps.h defines aliases for register operands,
which add the letter 'r' as a prefix to a register name.  E.g.: register 20
can be written as 'r20', instead of '20'.  On the one hand, this increases
readability, as it makes it easier for readers to know whether the operand is a
register or an immediate.  On the other hand, this allows immediate
operands to be written as if they were registers, and vice versa, thus
reducing the readability of the code.

This commit removes some of these unintentional misuses.

This commit also increases readability of the code by adding the prefix 'cr' to
some uses of the condition register.

Both changes have no effect on the final code.  Checked with objdump.

	* sysdeps/powerpc/powerpc64/power8/strncpy.S: Remove or add register
	prefix from operands.
2015-10-01 17:36:46 -03:00
Adhemerval Zanella
bea5801360 powerpc: Fix powerpc64 build failure with binutils 2.22
GLIBC memset optimization for POWER8 uses the '.machine power8'
directive, which is only supported officially on binutils 2.24+.  This
causes a build failure on older binutils.

Since .machine power8 is only required to correctly assemble the
'mtvsrd' instruction, which is already handled by the MTVSRD_V1_R4
macro, there is no real need to use it.

The patch replaces power8 with power7 in the .machine directive.

It fixes BZ#17869.
2015-01-24 08:40:04 -05:00
Adhemerval Zanella
d3b00f468b powerpc: Optimized strncmp for POWER8/PPC64
This patch adds an optimized POWER8 strncmp.  The implementation focuses
on speeding up unaligned cases, following the ideas of the power8 strcmp.

The algorithm first checks the initial 16 bytes, then aligns the first
function source and uses unaligned loads on the second argument only.
Additional checks for page boundaries are done for unaligned cases
(where the source alignments differ).
2015-01-13 14:35:40 -05:00
Adhemerval Zanella
8bedcb5f03 powerpc: Optimized strcmp for POWER8/PPC64
This patch adds an optimized POWER8 strcmp using unaligned accesses.
The algorithm first checks the initial 16 bytes, then aligns the first
function source and uses unaligned loads on the second argument only.
Additional checks for page boundaries are done for unaligned cases.
Adhemerval Zanella
f06a4faf8a powerpc: Optimized st{r,p}ncpy for POWER8/PPC64
This patch adds an optimized POWER8 st{r,p}ncpy using unaligned accesses.
It shows 10%-80% improvement over the optimized POWER7 one that uses
only aligned accesses, especially on unaligned inputs.

The algorithm first reads and checks 16 bytes (if the inputs do not cross a
4K page boundary).  Then it realigns the source to 16 bytes and issues a
16-byte read and compare loop to speed up null byte checks for large strings.
Also, unlike the POWER7 optimization, the null padding is done inline in the
implementation using possibly unaligned accesses, instead of relying on
a memset call.  A special case is added for page cross reads.
2015-01-13 11:28:44 -05:00
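The page-cross condition that guards the initial 16-byte read in f06a4faf8a can be sketched as a small predicate. The function name is hypothetical, and the 4K page size is taken from the commit message rather than queried from the system; the actual check is done in assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u  /* page size named in the commit message */

/* True if a `width`-byte load starting at `addr` would touch the next
   4K page, in which case an unaligned read could fault even though the
   string itself ends before the boundary.  */
bool load_crosses_page (uintptr_t addr, unsigned int width)
{
  assert (width > 0 && width <= PAGE_SIZE_4K);
  return (addr & (PAGE_SIZE_4K - 1)) > PAGE_SIZE_4K - width;
}
```

A 16-byte read at page offset 4080 ends exactly on the last byte of the page and is safe; one at offset 4081 is not, which is the case the commit handles separately.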
Adhemerval Zanella
96d6fd6c40 powerpc: Optimized st{r,p}cpy for POWER8/PPC64
This patch adds an optimized POWER8 strcpy using unaligned accesses.
For strings up to 16 bytes the implementation first calculates the
string size, like strlen, and issues a memcpy.  For larger strings, the
source is first aligned to 16 bytes and then tested in a loop that
reads 16 bytes and combines the cmpb results for speedup.  A special case is
added for page cross reads.

It shows 30%-60% improvement over the optimized POWER7 one that uses
only aligned accesses.
2015-01-13 11:28:30 -05:00
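A C emulation of the "combine the cmpb results" step from 96d6fd6c40. The cmpb instruction compares two registers byte by byte and writes 0xFF into each result byte where they match, 0x00 otherwise. The helper names below are hypothetical, and the loop-based emulation only models the instruction's result, not its cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Emulates cmpb: each result byte is 0xFF where the corresponding bytes
   of a and b are equal, and 0x00 where they differ.  */
uint64_t cmpb_emulated (uint64_t a, uint64_t b)
{
  uint64_t result = 0;

  for (int i = 0; i < 8; i++)
    {
      uint64_t mask = (uint64_t) 0xff << (8 * i);
      if ((a & mask) == (b & mask))
        result |= mask;
    }
  return result;
}

/* Checks a 16-byte chunk for a NUL byte: compare both doublewords
   against zero, OR the two results, and test the combined value once.  */
bool chunk16_has_nul (const unsigned char *p)
{
  uint64_t lo, hi;

  assert (p != NULL);
  memcpy (&lo, p, 8);
  memcpy (&hi, p + 8, 8);
  return (cmpb_emulated (lo, 0) | cmpb_emulated (hi, 0)) != 0;
}
```

Combining the two results before branching means the loop needs only one test per 16 bytes; locating the exact NUL position can be deferred until the loop exits.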
Joseph Myers
b168057aaa Update copyright dates with scripts/update-copyrights. 2015-01-02 16:29:47 +00:00
Siddhesh Poyarekar
a109996ef9 Remove IS_IN_libm
Replace with IS_IN (libm). Generated code unchanged on x86_64.

        * include/math.h: Use IS_IN instead of IS_IN_libm.
        * sysdeps/alpha/fpu/s_copysign.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_finitel.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_frexpl.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_isinfl.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_isnanl.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_modfl.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c: Likewise.
        * sysdeps/ieee754/ldbl-128ibm/s_signbitl.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/s_copysignl.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/s_finitel.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/s_frexpl.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/s_isinfl.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/s_isnanl.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/s_modfl.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/s_scalbnl.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/s_signbitl.c: Likewise.
        * sysdeps/ieee754/ldbl-64-128/w_scalblnl.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_copysign.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_finite.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_frexp.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_isinf.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_isnan.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_ldexp.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_ldexpl.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_modf.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_scalbln.c: Likewise.
        * sysdeps/ieee754/ldbl-opt/s_scalbn.c: Likewise.
        * sysdeps/powerpc/power5+/fpu/s_modf.c: Likewise.
        * sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Likewise.
        * sysdeps/powerpc/powerpc32/fpu/s_copysignl.S: Likewise.
        * sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c: Likewise.
        * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise.
        * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise.
        * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise.
        * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_modf.c: Likewise.
        * sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise.
        * sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise.
        * sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise.
        * sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/s_modf.c: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/s_copysignl.S: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise.
        * sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise.
        * sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise.
        * sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise.
        * sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise.
        * sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise.
        * sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise.
        * sysdeps/sparc/sparc32/fpu/s_signbitl.S: Likewise.
        * sysdeps/sparc/sparc32/sparcv9/fpu/s_isnan.S: Likewise.
        * sysdeps/unix/sysv/linux/alpha/fraiseexcpt.S: Likewise.
2014-11-24 11:41:47 +05:30
Adhemerval Zanella
7110166d4f powerpc: Simplify encoding of POWER8 instruction 2014-11-05 08:01:09 -05:00
Adhemerval Zanella
5e4df2848d powerpc: Fix encoding of POWER8 instruction
This patch adds a binary encoding for 'mtvsrd' instruction to avoid
build failures when assembler does not support POWER8.
2014-11-03 07:26:33 -05:00
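To show what a hand-written binary encoding of 'mtvsrd' involves (5e4df2848d), here is a sketch that builds the 32-bit instruction word from its fields. The field layout (XX1-form) and the opcode values 31 and 179 reflect my reading of Power ISA 2.07 and should be checked against the ISA document before being relied on; the function name is hypothetical and this is not the macro used in glibc.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the instruction word for "mtvsrd XT,RA", assuming the XX1-form
   layout: opcode(6) | T(5) | RA(5) | reserved(5) | XO(10) | TX(1).
   `vsr` is the 6-bit VSR number; its high bit goes into TX.  */
uint32_t encode_mtvsrd (unsigned int vsr, unsigned int gpr)
{
  const uint32_t primary_opcode = 31;    /* assumption from Power ISA 2.07 */
  const uint32_t extended_opcode = 179;  /* assumption from Power ISA 2.07 */
  uint32_t t, tx;

  assert (vsr < 64 && gpr < 32);
  t = vsr & 31;
  tx = (vsr >> 5) & 1;
  return (primary_opcode << 26) | (t << 21) | ((uint32_t) gpr << 16)
         | (extended_opcode << 1) | tx;
}
```

Under these assumptions, `encode_mtvsrd (1, 4)` (vs1 from r4) yields 0x7C240166, a value that can be emitted with a `.long` directive by assemblers that do not know the mnemonic.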
Adhemerval Zanella
71ae86478e PowerPC: memset optimization for POWER8/PPC64
This patch adds an optimized memset implementation for POWER8.  For
sizes from 0 to 255 bytes, a word/doubleword algorithm similar to
POWER7 optimized one is used.

For sizes larger than 255, two strategies are used:

1. If the constant is different from 0, the memory is written with
   altivec vector instructions;

2. If the constant is 0, dcbz instructions are used.  The loop is unrolled
   to clear 512 bytes at a time.

Using vector instructions increases throughput considerably, doubling
performance for sizes larger than 1024.  The unrolled dcbz loop
also shows a performance improvement, doubling throughput for sizes
larger than 8192 bytes.
2014-09-10 07:39:46 -04:00
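The size and value based strategy selection described in 71ae86478e can be summarized as a small classifier. The enum and function names are hypothetical; the sketch only models which path is taken, not the stores themselves.

```c
#include <assert.h>
#include <stddef.h>

enum memset_path
{
  PATH_SCALAR,  /* 0..255 bytes: word/doubleword stores */
  PATH_VECTOR,  /* > 255 bytes, non-zero fill: vector stores */
  PATH_DCBZ     /* > 255 bytes, zero fill: dcbz cache-block clears */
};

/* Picks the strategy for memset (dest, c, n).  As in memset itself, only
   the low byte of c is used as the fill value.  */
enum memset_path choose_memset_path (int c, size_t n)
{
  if (n <= 255)
    return PATH_SCALAR;
  return ((unsigned char) c != 0) ? PATH_VECTOR : PATH_DCBZ;
}
```

Note that a fill value such as 256 still selects the dcbz path, because its low byte is zero.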
Adhemerval Zanella
de21c33c06 PowerPC: Fix --disable-multi-arch builds
This patch fixes some powerpc32 and powerpc64 builds with
--disable-multi-arch option along with different --with-cpu=powerN.
It cleans up the Implies directories by removing the multiarch
folder for non-multiarch configs and also fixes two assembly
implementations: powerpc64/power7/strncat.S, which was calling the
wrong strlen; and power8/fpu/s_isnan.S, which was missing the hidden_def and
weak_alias directives.
2014-04-09 06:22:53 -05:00
Adhemerval Zanella
757d9dd5c3 PowerPC: Fix little endian encoding for mfvsrd
This patch fixes the MFVSRD_R3_V1 macro that encodes 'mfvsrd  r3,vs1'
(to support old binutils) for little endian.
2014-03-31 08:00:38 -05:00
Adhemerval Zanella
fe13a20c37 PowerPC: llround/llroundf POWER8 optimization
This patch adds an optimized llround/llroundf implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
2014-02-27 12:58:33 -06:00
Adhemerval Zanella
1ad8950a3e PowerPC: llrint/llrintf POWER8 optimization
This patch adds an optimized llrint/llrintf implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
2014-02-27 12:58:33 -06:00
Adhemerval Zanella
cac626d60a PowerPC: Optimized finite/finitef for POWER8
This patch adds an optimized finite/finitef implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
2014-02-27 12:58:33 -06:00
Adhemerval Zanella
4393fc119c PowerPC: Optimized isinf/isinff for POWER8
This patch adds an optimized isinf/isinff implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
2014-02-27 12:58:33 -06:00
Adhemerval Zanella
487972aea5 PowerPC: Optimized isnan/isnanf for POWER8
This patch adds an optimized isnan/isnanf implementation for POWER8
using the new Move From VSR Doubleword instruction to gain some
cycles on the FP to GPR register move.
2014-02-27 12:58:32 -06:00
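The classification commits above (487972aea5 and its siblings) move the floating-point bits into a general purpose register, presumably so the test itself can be done with integer operations. A portable sketch of that integer-domain test for isnan follows; it assumes IEEE 754 binary64 doubles, uses memcpy where the assembly would use the register move, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Helper for building test values from a raw bit pattern.  */
double double_from_bits (uint64_t bits)
{
  double x;

  memcpy (&x, &bits, sizeof x);
  return x;
}

/* isnan on the integer image of a double: clear the sign bit, then a NaN
   is any value strictly above the bit pattern of +infinity.  */
int isnan_bits (double x)
{
  uint64_t bits;

  assert (sizeof (double) == sizeof (uint64_t));
  memcpy (&bits, &x, sizeof bits);
  bits &= UINT64_C (0x7fffffffffffffff);
  return bits > UINT64_C (0x7ff0000000000000);
}
```

Comparing for equality with 0x7ff0000000000000 instead would give an isinf-style test, and comparing for "less than" a finite-style test.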
Adhemerval Zanella
5e6a4d4b9e PowerPC: Adjust multiarch Implies for PowerPC64
This patch adds Implies files in the multiarch folder for POWER chips so
multiarch is enabled when building with --with-cpu and a powerN
option.
2013-12-13 14:29:27 -05:00
Ryan S. Arnold
2f063a6e84 PowerPC: Enable POWER8 platform sans hwcap bits. 2013-06-24 15:33:32 -05:00