glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-25 12:11:10 +00:00

Author	SHA1	Message	Date
Joseph Myers	b22be8c368	Add fesetexcept: powerpc. This patch adds PowerPC versions of fesetexcept. * sysdeps/powerpc/fpu/fesetexcept.c: New file. * sysdeps/powerpc/nofpu/fesetexcept.c: Likewise. * sysdeps/powerpc/powerpc32/e500/nofpu/fesetexcept.c: Likewise.	2016-08-16 16:22:12 +00:00
Joseph Myers	3f0eedddbe	Add comment from sysdeps/powerpc/fpu/fraiseexcpt.c to fsetexcptflg.c. * sysdeps/powerpc/fpu/fsetexcptflg.c (__fesetexceptflag): Add comment from fraiseexcpt.c.	2016-08-12 17:49:07 +00:00
Joseph Myers	f792117921	Fix powerpc fesetexceptflag clearing FE_INVALID (bug 20455). As shown by the test math/test-fexcept, the powerpc fesetexceptflag implementation fails to clear a previously set FE_INVALID flag, when that flag is clear in the saved exceptions and FE_INVALID is included in the mask of flags to restore, because it fails to mask out the sub-exceptions of FE_INVALID from the FPSCR state. This patch fixes the masking logic accordingly. Tested for powerpc. [BZ #20455] * sysdeps/powerpc/fpu/fsetexcptflg.c (__fesetexceptflag): Mask out all FE_INVALID sub-exceptions from FPSCR when FE_INVALID specified to be restored.	2016-08-10 21:47:35 +00:00
Joseph Myers	5220a1aa8d	Add tests for fegetexceptflag, fesetexceptflag. I noticed that there was no meaningful test coverage for fegetexceptflag and fesetexceptflag (one test ensures that calls to them compile and link, but nothing to verify they work correctly). This patch adds tests for these functions. fesetexceptflag is meant to set the relevant exception flag bits to the saved state without causing enabled traps to be taken. On some architectures, it is not possible to set exception flag bits without causing enabled traps to occur. Such architectures need to define EXCEPTION_SET_FORCES_TRAP to 1 in their math-tests.h, as is done in this patch for powerpc. x86 avoids needing to define this because the traps resulting from setting exception bits don't occur until the next floating-point operation or fwait instruction. Tested for x86_64, x86 and powerpc. Note that test-fexcept fails for powerpc because of a pre-existing bug in fesetexceptflag for powerpc, which I'll fix separately. * math/test-fexcept-traps.c: New file. * math/test-fexcept.c: Likewise. * math/Makefile (tests): Add test-fexcept and test-fexcept-traps. * sysdeps/generic/math-tests.h (EXCEPTION_SET_FORCES_TRAP): New macro. * sysdeps/powerpc/math-tests.h [!__NO_FPRS__] (EXCEPTION_SET_FORCES_TRAP): Likewise.	2016-08-10 21:01:08 +00:00
Aurelien Jarno	30f926d3b3	powerpc: fix ifunc-sel.h asm constraints and clobber list As pointed out on the mailing list, the inline assembly code in sysdeps/powerpc/ifunc-sel.h doesn't have a list of clobbered registers and uses wrong constraints. This patch fixes that. I verified it doesn't introduce any change in the generated code. Changelog: * sysdeps/powerpc/ifunc-sel.h (ifunc_sel): Add "11", "12", "cr0" to the clobber list. Use "i" constraint instead of "X". (ifunc_one): Add "12" to the clobber list. Use "i" constraint instead of "X".	2016-08-03 00:22:44 +02:00
Aurelien Jarno	ee71e5b6dd	powerpc: fix ifunc-sel.h with GCC 6 On 32-bit PowerPC, GCC 6 always saves the PIC register on the stack in the prologue and adjusts the stack in the epilogue. It is therefore no longer possible to just exit the function in the inline asm code, since doing so corrupts the stack pointer. This causes the following tests to fail when using GCC 6: FAIL: elf/ifuncmain1 FAIL: elf/ifuncmain1pic FAIL: elf/ifuncmain1picstatic FAIL: elf/ifuncmain1pie FAIL: elf/ifuncmain1staticpic FAIL: elf/ifuncmain1staticpie FAIL: elf/ifuncmain1vis FAIL: elf/ifuncmain1vispic FAIL: elf/ifuncmain1vispie FAIL: elf/ifuncmain2pic FAIL: elf/ifuncmain2picstatic FAIL: elf/ifuncmain3 FAIL: elf/ifuncmain4picstatic FAIL: elf/ifuncmain5 FAIL: elf/ifuncmain5picstatic FAIL: elf/ifuncmain5staticpic The solution is to replace the beqlr instructions by a beq to the end of the inline asm code. This fixes all the above failures. ChangeLog: * sysdeps/powerpc/ifunc-sel.h (ifunc_sel): Replace beqlr instructions by beq instructions jumping to the end of the function.	2016-08-03 00:22:44 +02:00
Aurelien Jarno	6bcc7ced4f	ppc: Fix modf (sNaN) for pre-POWER5+ CPU (bug 20240). Commit `a6a4395d` fixed modf implementation by compiling s_modf.c and s_modff.c with -fsignaling-nans. However these files are also included from the pre-POWER5+ implementation, and thus these files should also be compiled with -fsignaling-nans. Changelog: [BZ #20240] * sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile (CFLAGS-s_modf-ppc32.c): New variable. (CFLAGS-s_modff-ppc32.c): Likewise. * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (CFLAGS-s_modf-ppc64.c): Likewise. (CFLAGS-s_modff-ppc64.c): Likewise.	2016-07-08 11:24:34 +02:00
Rajalakshmi Srinivasaraghavan	30e4cc5413	powerpc: Fix return code of strcasecmp for unaligned inputs If the input values are unaligned and there are null characters in the memory before the starting address of the input values, strcasecmp gives an incorrect return code. This is fixed by masking out the bits that are not part of the string.	2016-07-05 21:20:41 +05:30
Anton Blanchard	aa95fc13f5	powerpc: Add a POWER8-optimized version of sinf() This uses the implementation of sinf() in sysdeps/x86_64/fpu/s_sinf.S as inspiration.	2016-06-30 16:08:49 -03:00
Tulio Magno Quites Machado Filho	35da2541c3	powerpc: Add a POWER8-optimized version of expf() This implementation is based on the one already used at sysdeps/x86_64/fpu/e_expf.S. This implementation improves the performance by ~14% on average in synthetic benchmarks at the cost of decreasing accuracy to 1 ULP.	2016-06-30 14:56:14 -03:00
Torvald Riegel	76a0b73e81	Remove atomic_compare_and_exchange_bool_rel. atomic_compare_and_exchange_bool_rel and catomic_compare_and_exchange_bool_rel are removed and replaced with the new C11-like atomic_compare_exchange_weak_release. The concurrent code in nscd/cache.c has not been reviewed yet, so this patch does not add detailed comments. * nscd/cache.c (cache_add): Use new C11-like atomic operation instead of atomic_compare_and_exchange_bool_rel. * nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise. * include/atomic.h (atomic_compare_and_exchange_bool_rel, catomic_compare_and_exchange_bool_rel): Remove. * sysdeps/aarch64/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/alpha/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/arm/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/mips/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise. * sysdeps/tile/atomic-machine.h (atomic_compare_and_exchange_bool_rel): Likewise.	2016-06-24 23:04:40 +03:00
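For readers unfamiliar with the C11-style operation, the following sketch uses the standard `<stdatomic.h>` analogue, `atomic_compare_exchange_weak_explicit` with release ordering on success. It is not glibc's internal `atomic_compare_exchange_weak_release` macro, and the helper name `publish_if_unchanged` is hypothetical. The retry loop is there because the weak form is allowed to fail spuriously.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: store newval into *slot only if *slot still holds
   expected.  Success uses release ordering, failure uses relaxed.  */
bool publish_if_unchanged(_Atomic int *slot, int expected, int newval)
{
    int seen = expected;

    /* On failure, seen is overwritten with the value actually observed.
       A spurious failure leaves seen == expected, so we simply retry.  */
    while (!atomic_compare_exchange_weak_explicit(slot, &seen, newval,
                                                  memory_order_release,
                                                  memory_order_relaxed)) {
        if (seen != expected)
            return false;   /* someone else changed the value */
    }
    return true;
}
```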
Florian Weimer	aca1daef29	elf: Consolidate machine-agnostic DTV definitions in <dl-dtv.h> Identical definitions of dtv_t and TLS_DTV_UNALLOCATED were repeated for all architectures using DTVs.	2016-06-20 14:31:40 +02:00
Joseph Myers	f4015c8a86	Use generic fdim on more architectures (bug 6796, bug 20255, bug 20256). Some architectures have their own versions of fdim functions, which are missing errno setting (bug 6796) and may also return sNaN instead of qNaN for sNaN input, in the case of the x86 / x86_64 long double versions (bug 20256). These versions are not actually doing anything that a compiler couldn't generate, just straightforward comparisons / arithmetic (and, in the x86 / x86_64 case, testing for NaNs with fxam, which isn't actually needed once you use an unordered comparison and let the NaNs pass through the same subtraction as non-NaN inputs). This patch removes the x86 / x86_64 / powerpc versions, so that those architectures use the generic C versions, which correctly handle setting errno and deal properly with sNaN inputs. This seems better than dealing with setting errno in lots of .S versions. The i386 versions also return results with excess range and precision, which is not appropriate for a function exactly defined by reference to IEEE operations. For errno setting to work correctly on overflow, it's necessary to remove excess range with math_narrow_eval, which this patch duly does in the float and double versions so that the tests can reliably pass on x86. For float, this avoids any double rounding issues as the long double precision is more than twice that of float. For double, double rounding issues will need to be addressed separately, so this patch does not fully fix bug 20255. Tested for x86_64, x86 and powerpc. [BZ #6796] [BZ #20255] [BZ #20256] * math/s_fdim.c: Include <math_private.h>. (__fdim): Use math_narrow_eval on result. * math/s_fdimf.c: Include <math_private.h>. (__fdimf): Use math_narrow_eval on result. * sysdeps/i386/fpu/s_fdim.S: Remove file. * sysdeps/i386/fpu/s_fdimf.S: Likewise. * sysdeps/i386/fpu/s_fdiml.S: Likewise. * sysdeps/i386/i686/fpu/s_fdim.S: Likewise. * sysdeps/i386/i686/fpu/s_fdimf.S: Likewise. 
* sysdeps/i386/i686/fpu/s_fdiml.S: Likewise. * sysdeps/powerpc/fpu/s_fdim.c: Likewise. * sysdeps/powerpc/fpu/s_fdimf.c: Likewise. * sysdeps/powerpc/powerpc32/fpu/s_fdim.c: Likewise. * sysdeps/powerpc/powerpc64/fpu/s_fdim.c: Likewise. * sysdeps/x86_64/fpu/s_fdiml.S: Likewise. * math/libm-test.inc (fdim_test_data): Expect errno setting on overflow. Add sNaN tests.	2016-06-14 16:04:19 +00:00
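The approach described in this entry (an unordered comparison, NaNs passing through the ordinary subtraction, errno set on overflow) can be sketched in plain C as below. This is an illustration under assumptions, not the text of math/s_fdim.c: the name `fdim_sketch` is hypothetical, and the `math_narrow_eval` step that removes excess range on x86 is omitted.

```c
#include <errno.h>
#include <math.h>

/* Hypothetical sketch of a generic fdim for double.  */
double fdim_sketch(double x, double y)
{
    /* islessequal is a quiet comparison: it is false when either
       argument is a NaN and does not raise "invalid" for quiet NaNs.  */
    if (islessequal(x, y))
        return 0.0;

    /* NaN inputs reach the subtraction, which propagates them (and
       quiets a signaling NaN).  */
    double r = x - y;

    /* Overflow: finite inputs produced an infinite result.  */
    if (isinf(r) && !isinf(x) && !isinf(y))
        errno = ERANGE;
    return r;
}
```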
raji	c8376f3e07	powerpc: strcasecmp/strncasecmp optimization for power8 This implementation utilizes vectors to improve performance compared to the current byte-by-byte implementation for POWER7. The performance improvement is up to 4x. This patch is tested on powerpc64 and powerpc64le.	2016-06-14 14:51:16 +05:30
Tulio Magno Quites Machado Filho	c24480ce3b	powerpc: Fix --disable-multi-arch build on POWER8 Add missing symbols of stpncpy and strcasestr when multi-arch is disabled. Fix memset call from strncpy/stpncpy when multi-arch is disabled.	2016-06-06 16:03:29 -03:00
Joseph Myers	f6ef0657e4	Fix powerpc64 ceil, rint etc. on sNaN input (bug 20160). The powerpc64 versions of ceil, floor, round, trunc, rint, nearbyint and their float versions return sNaN for sNaN input when they should return qNaN. This patch fixes them to add a NaN argument to itself to quiet sNaNs before returning. Tested for powerpc64. [BZ #20160] * sysdeps/powerpc/powerpc64/fpu/s_ceil.S (__ceil): Add NaN argument to itself before returning the result. * sysdeps/powerpc/powerpc64/fpu/s_ceilf.S (__ceilf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floor.S (__floor): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floorf.S (__floorf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S (__nearbyint): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S (__nearbyintf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rint.S (__rint): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_rintf.S (__rintf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_round.S (__round): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_roundf.S (__roundf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_trunc.S (__trunc): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_truncf.S (__truncf): Likewise.	2016-05-27 17:47:54 +00:00
Joseph Myers	debf7618f6	Fix powerpc32 ceil, rint etc. on sNaN input (bug 20160). The powerpc32 versions of ceil, floor, round, trunc, rint, nearbyint and their float versions return sNaN for sNaN input when they should return qNaN. This patch fixes them to add a NaN argument to itself to quiet sNaNs before returning. The powerpc64 versions, which have the same bug, will be addressed separately. Tested for powerpc32. [BZ #20160] * sysdeps/powerpc/powerpc32/fpu/s_ceil.S (__ceil): Add NaN argument to itself before returning the result. * sysdeps/powerpc/powerpc32/fpu/s_ceilf.S (__ceilf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_floor.S (__floor): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_floorf.S (__floorf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_nearbyint.S (__nearbyint): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_nearbyintf.S (__nearbyintf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_rint.S (__rint): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_rintf.S (__rintf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_round.S (__round): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_roundf.S (__roundf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_trunc.S (__trunc): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_truncf.S (__truncf): Likewise.	2016-05-27 17:31:21 +00:00
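The "add the NaN to itself" technique used in the two sNaN fixes above can be illustrated in C. The sketch assumes IEEE 754 binary64 with the common convention that bit 51 is the quiet bit (legacy MIPS and PA-RISC encodings differ); the names `QUIET_BIT` and `quiet_bits` are hypothetical. The `volatile` is only there to keep the compiler from folding the addition away.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: bit 51 of a binary64 NaN is the quiet bit.  */
#define QUIET_BIT (UINT64_C(1) << 51)

/* Hypothetical helper: takes the bit pattern of a NaN, adds the NaN to
   itself, and returns the bit pattern of the result.  */
uint64_t quiet_bits(uint64_t nan_bits)
{
    double d;
    memcpy(&d, &nan_bits, sizeof d);

    volatile double v = d;      /* force a real run-time addition */
    double r = v + v;           /* arithmetic on an sNaN yields a qNaN */

    uint64_t out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Passing `0x7ff0000000000001` (a signaling NaN under the stated assumption) should return a pattern with `QUIET_BIT` set; the addition also raises "invalid", which is the expected behaviour for an sNaN operand.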
Joseph Myers	24e9ae1bc2	Avoid "invalid" exceptions from powerpc fabsl (sNaN) (bug 20157). The powerpc implementations of fabsl for ldbl-128ibm (both powerpc32 and powerpc64) wrongly raise the "invalid" exception for sNaN arguments. fabs functions should be quiet for all inputs including signaling NaNs. The problem is the use of a comparison instruction fcmpu to determine if the high part of the argument is negative and so the low part needs to be negated; such instructions raise "invalid" for sNaNs. There is a pure integer implementation of fabsl in sysdeps/ieee754/ldbl-128ibm/s_fabsl.c. However, it's not necessary to use it to avoid such exceptions. The fsel instruction does not raise exceptions for sNaNs, and can be used in place of the original comparison. (Note that if the high part is zero or a NaN, it does not matter whether the low part is negated; the choice of whether the low part of a zero is +0 or -0 does not affect the value, and the low part of a NaN does not affect the value / payload either.) The condition in GCC for fsel to be available is TARGET_PPC_GFXOPT, corresponding to the _ARCH_PPCGR predefined macro. fsel is available on all 64-bit processors supported by GCC. A few 32-bit processors supported by GCC do not have TARGET_PPC_GFXOPT despite having hard float support. To support those processors, integer code (similar to that in copysignl) is included for the !_ARCH_PPCGR case for powerpc32. Tested for powerpc32 (configurations with and without _ARCH_PPCGR) and powerpc64. [BZ #20157] * sysdeps/powerpc/powerpc32/fpu/s_fabsl.S (__fabsl): Use fsel to determine whether to negate low half if [_ARCH_PPCGR], and integer comparison otherwise. * sysdeps/powerpc/powerpc64/fpu/s_fabsl.S (__fabsl): Use fsel to determine whether to negate low half.	2016-05-27 15:29:31 +00:00
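A rough C model of the idea: an IBM long double is a pair of doubles, and fabsl must flip the low part whenever the high part is negative, without using a comparison that could raise "invalid". The sketch below uses `signbit` in place of the fsel instruction, which is closer in spirit to the integer path mentioned for `!_ARCH_PPCGR`; the struct and function names are hypothetical.

```c
#include <math.h>

/* Hypothetical model of the ldbl-128ibm format: value = hi + lo.  */
struct ibm_ldbl {
    double hi;
    double lo;
};

struct ibm_ldbl fabs_pair(struct ibm_ldbl x)
{
    /* signbit only inspects the sign bit, so unlike `x.hi < 0` it does
       not raise "invalid" when x.hi is a signaling NaN.  */
    if (signbit(x.hi)) {
        x.hi = -x.hi;
        x.lo = -x.lo;
    }
    return x;
}
```

As the commit message notes, it does not matter whether the low part is negated when the high part is a zero or a NaN, so treating -0 as negative here is harmless.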
Joseph Myers	b4d80349bb	Do not raise "inexact" from powerpc64 ceil, floor, trunc (bug 15479). Continuing fixes for ceil, floor and trunc functions not to raise the "inexact" exception, this patch fixes the versions used on older powerpc64 processors. As was done with the round implementations some time ago, the save of floating-point state is moved after the first floating-point operation on the input to ensure that any "invalid" exception from signaling NaN input is included in the saved state, and then the whole state gets restored rather than just the rounding mode. This has no effect on configurations using the power5+ code, since such processors can do these operations with a single instruction (and those instructions do not set "inexact", so are correct for TS 18661-1 semantics). Tested for powerpc64. [BZ #15479] * sysdeps/powerpc/powerpc64/fpu/s_ceil.S (__ceil): Move save of floating-point state after first floating-point operation on input. Restore full floating-point state instead of just rounding mode. * sysdeps/powerpc/powerpc64/fpu/s_ceilf.S (__ceilf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floor.S (__floor): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_floorf.S (__floorf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_trunc.S (__trunc): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_truncf.S (__truncf): Likewise.	2016-05-25 17:42:22 +00:00
Joseph Myers	1f921a93e4	Do not raise "inexact" from powerpc32 ceil, floor, trunc (bug 15479). Continuing fixes for ceil, floor and trunc functions not to raise the "inexact" exception, this patch fixes the versions used on older powerpc32 processors. As was done with the round implementations some time ago, the save of floating-point state is moved after the first floating-point operation on the input to ensure that any "invalid" exception from signaling NaN input is included in the saved state, and then the whole state gets restored rather than just the rounding mode. This has no effect on configurations using the power5+ code, since such processors can do these operations with a single instruction (and those instructions do not set "inexact", so are correct for TS 18661-1 semantics). Tested for powerpc32. [BZ #15479] * sysdeps/powerpc/powerpc32/fpu/s_ceil.S (__ceil): Move save of floating-point state after first floating-point operation on input. Restore full floating-point state instead of just rounding mode. * sysdeps/powerpc/powerpc32/fpu/s_ceilf.S (__ceilf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_floor.S (__floor): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_floorf.S (__floorf): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_trunc.S (__trunc): Likewise. * sysdeps/powerpc/powerpc32/fpu/s_truncf.S (__truncf): Likewise.	2016-05-25 16:53:23 +00:00
Gabriel F. T. Gomes	eb3b8a4924	powerpc: Fix operand prefixes The file sysdeps/powerpc/sysdeps.h defines aliases for condition register operands. E.g.: 'cr7' means condition register 7. On the one hand, this increases readability, as it makes it easier for readers to know whether the operand is a condition register, a general purpose register or an immediate. On the other hand, this permits that condition registers be written as if they were general purpose, and vice-versa, thus reducing the readability of the code. This commit removes some of these unintentional misuses. The changes have no effect on the final code. Checked with objdump.	2016-05-04 09:14:52 -03:00
Gabriel F. T. Gomes	72c11b353e	powerpc: Zero pad using memset in strncpy/stpncpy Call __memset_power8 to pad, with zeros, the remaining bytes in the dest string on __strncpy_power8 and __stpncpy_power8. This improves performance when n is larger than the input string, giving ~30% gain for larger strings without impacting much shorter strings.	2016-04-29 10:05:33 -03:00
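In portable C, the same split (copy the string bytes, then hand the padding to memset) looks roughly like this. It is a sketch of the idea only, not the POWER8 assembly; `strncpy_sketch` is a hypothetical name.

```c
#include <string.h>

/* Hypothetical sketch: strncpy semantics with the zero padding
   delegated to memset.  */
char *strncpy_sketch(char *dest, const char *src, size_t n)
{
    /* Length of src, capped at n.  */
    const char *nul = memchr(src, '\0', n);
    size_t len = nul ? (size_t)(nul - src) : n;

    memcpy(dest, src, len);
    /* Pad the remaining n - len bytes with zeros in a single call.  */
    memset(dest + len, 0, n - len);
    return dest;
}
```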
Paul E. Murphy	8f1b841e45	powerpc: Add optimized strcspn for P8 A few minor adjustments to the P8 strspn gives us an almost equally optimized P8 strcspn.	2016-04-25 09:11:02 -05:00
Rajalakshmi Srinivasaraghavan	e413b14e18	powerpc: strcasestr optimization for power8 This patch optimizes the strcasestr function for power >= 8 systems. It compares 16 bytes at a time using vector instructions, and the average improvement is ~40%. This patch is tested on powerpc64 and powerpc64le.	2016-04-22 19:23:13 +05:30
Carlos Eduardo Seo	1b045ee53e	powerpc: Optimization for strlen for POWER8. This implementation takes advantage of vectorization to improve performance of the loop over the current strlen implementation for POWER7.	2016-04-15 17:19:19 -03:00
Paul E. Murphy	25dba0ad05	powerpc: Add optimized P8 strspn This utilizes vectors and bitmasks. For small needle, large haystack, the performance improvement is up to 8x. For short strings (0-4B), the cost of computing the bitmask dominates, and it is a tad slower.	2016-04-07 15:51:28 -05:00
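The bitmask half of this technique can be shown with a scalar sketch: build a 256-bit membership map from the accept set once, then test each byte of the string with a shift and mask. The vectorised part of the P8 implementation is not reproduced here, and `strspn_bitmap` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch of strspn using a 256-bit membership map.  */
size_t strspn_bitmap(const char *s, const char *accept)
{
    uint64_t map[4] = { 0, 0, 0, 0 };

    /* One bit per possible byte value; building this is the fixed cost
       that dominates for very short inputs.  */
    for (const unsigned char *a = (const unsigned char *)accept; *a; a++)
        map[*a >> 6] |= UINT64_C(1) << (*a & 63);

    /* The bit for NUL is never set, so the scan stops at the terminator.  */
    const unsigned char *p = (const unsigned char *)s;
    while ((map[*p >> 6] >> (*p & 63)) & 1)
        p++;
    return (size_t)(p - (const unsigned char *)s);
}
```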
Adhemerval Zanella	528ffb3a04	Remove powerpc64 strspn, strcspn, and strpbrk implementation This patch removes the powerpc64 optimized strspn, strcspn, and strpbrk assembly implementations now that the default C one implements the same strategy. On internal glibc benchtests the current implementations show similar performance with -O2. Tested on powerpc64le (POWER8). * sysdeps/powerpc/powerpc64/strcspn.S: Remove file. * sysdeps/powerpc/powerpc64/strpbrk.S: Remove file. * sysdeps/powerpc/powerpc64/strspn.S: Remove file.	2016-04-01 10:44:45 -03:00
Rajalakshmi Srinivasaraghavan	869d7180dd	powerpc: Rearrange cfi_offset calls This patch rearranges cfi_offset() calls after the last store so as to avoid extra DW_CFA_advance opcodes in unwind information.	2016-03-11 11:31:58 -03:00
Joseph Myers	613c92b3b5	Fix ldbl-128ibm nearbyintl in non-default rounding modes (bug 19790). The ldbl-128ibm implementation of nearbyintl uses logic that only works in round-to-nearest mode. This contrasts with rintl, which works in all rounding modes. Now, arguably nearbyintl could simply be aliased to rintl, given that spurious "inexact" is generally allowed for ldbl-128ibm, even for the underlying arithmetic operations. But given that the only point of nearbyintl is to avoid "inexact", this patch follows the more conservative approach of adding conditionals to the rintl implementation to make it suitable for use to implement nearbyintl, then builds it for nearbyintl with USE_AS_NEARBYINTL defined. The test test-nearbyint-except-2 shows up issues when traps on "inexact" are enabled, which turn out to be problems with the powerpc fenv_private.h implementation (two functions that should disable exception traps potentially failing to do so in some cases); this patch duly fixes that as well (I don't see any other existing cases where this would be user-visible; there isn't much use of _NOEX, hold* etc. in libm that requires exceptions to be discarded and not trapped on). Tested for powerpc. [BZ #19790] * sysdeps/ieee754/ldbl-128ibm/s_rintl.c [USE_AS_NEARBYINTL] (rintl): Define as macro. [USE_AS_NEARBYINTL] (__rintl): Likewise. (__rintl) [USE_AS_NEARBYINTL]: Use SET_RESTORE_ROUND_NOEX instead of fesetround. Ensure results are evaluated before end of scope. * sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c: Define USE_AS_NEARBYINTL and include s_rintl.c. * sysdeps/powerpc/fpu/fenv_private.h (libc_feholdsetround_ppc): Disable exception traps in new environment. (libc_feholdsetround_ppc_ctx): Likewise.	2016-03-09 00:30:59 +00:00
Gabriel F. T. Gomes	183a34dc4a	powerpc: Remove uses of operand modifier (%s) in inline asm The operand modifier %s on powerpc is an undocumented internal implementation detail of GCC. Besides that, the GCC community wants to remove it. This patch rewrites the expressions that use this modifier with logically equivalent expressions that don't require it. Explanation for the substitution: The %s modifier takes an immediate operand and prints 32 minus that immediate. Thus, in the previous code, the expression resulted in: 32 - __builtin_ffs(e) where e was guaranteed to have exactly a single bit set, by the following expressions: ((e & (e-1)) == 0) : e has at most one bit set. (e != 0) : e is not zero, thus it has at least one bit set. Since we guarantee that there is exactly one bit set, the following statement is true: 32 - __builtin_ffs(e) == __builtin_clz(e) Thus, we can replace __builtin_ffs with __builtin_clz and remove the %s operand modifier.	2016-03-08 15:30:28 -03:00
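The identity in this explanation is easy to check exhaustively for single-bit values. The sketch below assumes a 32-bit `unsigned int` and the GCC/Clang built-ins `__builtin_ffs` and `__builtin_clz`; the helper names are hypothetical.

```c
/* Returns 1 if e has exactly one bit set.  */
int has_single_bit(unsigned int e)
{
    return e != 0 && (e & (e - 1)) == 0;
}

/* Checks 32 - ffs(e) == clz(e) for every single-bit 32-bit value.
   Returns 1 if the identity holds for all of them.  */
int identity_holds(void)
{
    for (int k = 0; k < 32; k++) {
        unsigned int e = 1u << k;
        if (!has_single_bit(e))
            return 0;
        /* ffs counts from 1 at the least significant bit, clz counts
           leading zeros from the most significant bit.  */
        if (32 - __builtin_ffs((int)e) != __builtin_clz(e))
            return 0;
    }
    return 1;
}
```

For a bit at position k (counting from 0), ffs gives k + 1 and clz gives 31 - k, so both sides equal 31 - k.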
Carlos Eduardo Seo	911569d02d	powerpc: Fix dl-procinfo HWCAP HWCAP-related code should have been updated when the 32 bits of HWCAP were used. This patch updates the code in dl-procinfo.h to loop through all the 32 bits in HWCAP and updates _dl_powerpc_cap_flags accordingly.	2016-03-08 15:30:06 -03:00
Rajalakshmi Srinivasaraghavan	ebf1264f61	powerpc: Regenerate libm-test-ulps	2016-02-04 16:40:54 -02:00
Andreas Schwab	4fb66fac3a	Remove unused variables They are flagged by -Wunused-const-variable.	2016-01-27 09:30:16 +01:00
Joseph Myers	2e3d0de31f	Fix ulps regeneration for -finite tests. On running tests after from-scratch ulps regeneration, I found that some libm tests failed with ulps in excess of those recorded in the from-scratch regeneration, which should never happen unless those ulps exceed the limit on ulps that can go in libm-test-ulps files. Failure: Test: atan2_upward (inf, -inf) Result: is: 2.35619498e+00 0x1.2d97ccp+1 should be: 2.35619450e+00 0x1.2d97c8p+1 difference: 4.76837159e-07 0x1.000000p-21 ulp : 2.0000 max.ulp : 1.0000 Maximal error of `atan2_upward' is : 2 ulp accepted: 1 ulp Failure: Test: carg_upward (-inf + inf i) Result: is: 2.35619498e+00 0x1.2d97ccp+1 should be: 2.35619450e+00 0x1.2d97c8p+1 difference: 4.76837159e-07 0x1.000000p-21 ulp : 2.0000 max.ulp : 1.0000 Maximal error of `carg_upward' is : 2 ulp accepted: 1 ulp The problem comes from the addition of tests for the finite-math-only versions of libm functions. Those tests share ulps with the default function variants. make regen-ulps runs the default tests before the finite-math-only tests, concatenating the resulting ulps before feeding them to gen-libm-test.pl to generate a new libm-test-ulps file. But gen-libm-test.pl always takes the last ulps value given for any (function, type) pair. So, if the largest ulps for a function come from non-finite inputs, a from-scratch regeneration loses those ulps. This patch fixes gen-libm-test.pl, in the case where there are multiple ulps values for a (function, type) pair - which can only happen as part of a regeneration - to take the largest ulps value rather than the last one. Tested for ARM / MIPS / powerpc-nofpu. math/gen-libm-test.pl (parse_ulps): Do not reduce already-recorded ulps. * sysdeps/arm/libm-test-ulps: Regenerated. * sysdeps/mips/mips32/libm-test-ulps: Likewise. * sysdeps/mips/mips64/libm-test-ulps: Likewise. * sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.	2016-01-19 21:42:58 +00:00
Joseph Myers	844c75aa06	Regenerate powerpc-nofpu libm-test-ulps. * sysdeps/powerpc/nofpu/libm-test-ulps: Regenerated.	2016-01-18 23:02:03 +00:00
Tulio Magno Quites Machado Filho	42bf1c8971	powerpc: Enforce compiler barriers on hardware transactions Work around a GCC behavior with hardware transactional memory built-ins. GCC doesn't treat the PowerPC transactional built-ins as compiler barriers, moving instructions past the transaction boundaries and altering their atomicity.	2016-01-08 17:47:33 -02:00
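A compiler barrier of the kind this workaround relies on is usually written in GCC as an empty `asm` statement with a "memory" clobber. The sketch below shows only where such barriers would sit; the two stub functions stand in for the PowerPC transactional built-ins (for example `__builtin_tbegin` and `__builtin_tend`), and every name in it is hypothetical rather than taken from the patch.

```c
/* Empty asm with a "memory" clobber: the compiler may not move memory
   accesses across it (GCC/Clang syntax).  */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int shared_counter;

/* Stand-ins for the real transaction begin/end built-ins.  */
static inline void txn_begin_stub(void) { }
static inline void txn_end_stub(void) { }

int increment_in_region(void)
{
    txn_begin_stub();
    COMPILER_BARRIER();   /* keep the access below from moving above begin */
    shared_counter++;
    COMPILER_BARRIER();   /* keep the access above from moving below end */
    txn_end_stub();
    return shared_counter;
}
```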
Carlos Eduardo Seo	d2de9ef7ad	powerpc: Add hwcap2 bits for POWER9. Added hwcap2 bit masks for Power ISA 3.0 and VSX IEEE binary float 128-bit features.	2016-01-08 11:19:40 -02:00
Joseph Myers	f7a9f785e5	Update copyright dates with scripts/update-copyrights.	2016-01-04 16:05:18 +00:00
Carlos Eduardo Seo	c676e65939	powerpc: Export __parse_hwcap_and_convert_at_platform to libc.a. Commit `67385a01d2` added a new feature for powerpc, where we store HWCAP/Platform bits in the TCB. In the dynamic linking case, we use the versioned symbol '__parse_hwcap_and_convert_at_platform' to verify if this feature is available. However, the same symbol was not exported to libc.a, making it not possible for GCC to check for it prior to link time.	2015-12-22 15:41:19 -02:00
Carlos Eduardo Seo	b1f19b8ef1	powerpc: Add basic support for POWER9 sans hwcap. This patch adds the minimum changes for supporting the POWER9 processor.	2015-12-22 14:45:55 -02:00
Adhemerval Zanella	661a29a518	powerpc: Regenerate libm-test-ulps * sysdeps/powerpc/fpu/libm-test-ulps: Regenerated.	2015-12-22 11:11:01 -02:00
Adhemerval Zanella	2094350c9c	Fix POWER7 logb results for negative subnormals (bug 19375) The optimized POWER7 logb implementation does not use the absolute value of the word extracted from the input to apply the leading 0-bits builtin (to ignore the float sign). This patch fixes it by clearing the sign bit in the resulting word. It fixes the subnormal test failures when running on POWER7 or newer chips. Tested on powerpc64le (POWER8). [BZ #19375] * sysdeps/powerpc/power7/fpu/s_logb.c (__logb): Fix return for negative subnormals.	2015-12-17 14:34:33 -02:00
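Why the sign bit matters can be seen in a small sketch. For a nonzero subnormal binary64 value, the exponent follows from the position of the highest set mantissa bit, which a count-leading-zeros built-in finds directly; if the sign bit is left in place, the count is always zero. This is an illustration under assumptions (IEEE 754 binary64, GCC/Clang `__builtin_clzll`, hypothetical function names), not the code in s_logb.c.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper to build a double from its bit pattern.  */
double from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Exponent of a nonzero subnormal double (precondition: x is subnormal
   and not zero, since clz of 0 is undefined).  */
int subnormal_logb(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);

    /* Clear the sign bit so negative inputs are handled like positive ones.  */
    bits &= ~(UINT64_C(1) << 63);

    /* bits is now the mantissa m and the value is m * 2^-1074, so the
       exponent is floor(log2(m)) - 1074.  */
    return (63 - __builtin_clzll(bits)) - 1074;
}
```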
Carlos Eduardo Seo	67385a01d2	powerpc: Add hwcap/hwcap2/platform data to TCB. This patch adds a new feature for powerpc. In order to get faster access to the HWCAP/HWCAP2 bits and platform number (i.e. for implementing __builtin_cpu_is () / __builtin_cpu_supports () in GCC) without the overhead of reading from the auxiliary vector, we now reserve space for them in the TCB. This is an ABI change for GLIBC 2.23. A new versioned symbol '__parse_hwcap_and_convert_at_platform' is available to get the data from the auxiliary vector and parse it, and store it for later use in the TLS initialization code. This function is called very early (in _dl_sysdep_start () via DL_PLATFORM_INFO for the dynamic linking case, and in __libc_start_main () for the static linking case) to make sure the data is available at the time of TLS initialization. * sysdeps/powerpc/Makefile (sysdep-dl-routines): Add hwcapinfo. (sysdep_routines): Likewise. (sysdep-rtld-routines): Likewise. [$(subdir) = nptl](tests): Add test-get_hwcap and test-get_hwcap-static [$(subdir) = nptl](tests-static): test-get_hwcap-static * sysdeps/powerpc/Versions: Added new __parse_hwcap_and_convert_at_platform symbol to GLIBC-2.23. * sysdeps/powerpc/hwcapinfo.c: New file. (__tcb_parse_hwcap_and_convert_at_platform): New function to initialize and parse hwcap, hwcap2 and platform number information. * sysdeps/powerpc/hwcapinfo.h: New file. Creates global variables to store HWCAP+HWCAP2 and platform number. * sysdeps/powerpc/nptl/tcb-offsets.sym: Added new offsets for HWCAP+HWCAP2 and platform number in the TCB. * sysdeps/powerpc/nptl/tls.h: New functionality. Stores the HWCAP, HWCAP2 and platform number in the TCB. (dtv): Added new fields for HWCAP+HWCAP2 and platform number. (TLS_INIT_TP): Included calls to add the hwcap and at_platform values in the TCB in TP initialization. (TLS_DEFINE_INIT_TP): Likewise. (THREAD_GET_HWCAP): New macro. (THREAD_SET_HWCAP): Likewise. (THREAD_GET_AT_PLATFORM): Likewise. 
(THREAD_SET_AT_PLATFORM): Likewise. * sysdeps/powerpc/powerpc32/dl-machine.h: (dl_platform_init): New function that calls __parse_hwcap_and_convert_at_platform for the dynamic linking case for powerpc32. * sysdeps/powerpc/powerpc64/dl-machine.h: Likewise, for powerpc64. * sysdeps/powerpc/test-get_hwcap-static.c: New file. Testcase for this functionality, static linking case. * sysdeps/powerpc/test-get_hwcap.c: New file. Likewise, dynamic linking case. * sysdeps/unix/sysv/linux/powerpc/libc-start.c: Added call to __parse_hwcap_and_convert_at_platform for the static linking case. * sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist: Included the new __parse_hwcap_and_convert_at_platform symbol in the ABI list for GLIBC 2.23. * sysdeps/unix/sysv/linux/powerpc/powerpc64/ld-le.abilist: Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/ld.abilist: Likewise.	2015-12-03 13:56:13 -02:00
Paul Murphy	9695cb3e65	powerpc: Spinlock optimization and cleanup This patch optimizes the powerpc spinlock implementation by: * Using the correct EH hint bit on the larx for supported ISA. For lock acquisition, the thread that acquired the lock with a successful stcx does not want to give away the write ownership on the cacheline. The idea is to make the load reservation "sticky" about retaining write authority to the line. That way, the store that must inevitably come to release the lock can succeed quickly and not contend with other threads issuing lwarx. If another thread does a store to the line (false sharing), the winning thread must give up write authority. The proper value of EH for the larx for a lock acquisition is 1. * Increasing contended lock performance by up to 40%, with no measurable impact on uncontended locks on P8. Thanks to Adhemerval Zanella who did most of the work. I've run some tests, and addressed some minor feedback. * sysdeps/powerpc/nptl/pthread_spin_lock.c (pthread_spin_lock): Add lwarx hint, and use macro for acquire instruction. * sysdeps/powerpc/nptl/pthread_spin_trylock.c (pthread_spin_trylock): Likewise. * sysdep/unix/sysv/linux/powerpc/pthread_spin_unlock.c: Move to ... * sysdeps/powerpc/nptl/pthread_spin_unlock.c: ... here, and update to use new atomic macros.	2015-11-19 18:04:30 -02:00
Joseph Myers	21378ae0d3	Fix powerpc round, roundf spurious "inexact" (bug 19238). The powerpc hard-float round and roundf functions, both 32-bit and 64-bit, raise spurious "inexact" exceptions for integer arguments from adding 0.5 and rounding to integer toward zero. Since these functions already save and restore the rounding mode, it's natural to make them restore the full floating-point state instead to fix this bug, which this patch does. The save of the state is moved after the first floating-point operation on the input so that any "invalid" exceptions from signaling NaN inputs are properly preserved. As a consequence of this approach to the fix, "inexact" for noninteger arguments (disallowed by TS 18661-1 but not by C99/C11, see bug 15479) is also avoided for these implementations; this is not a general fix for bug 15479 since plenty of other implementations of various functions still raise spurious "inexact" for noninteger arguments. This issue and fix do not apply to builds using power5+ versions of round and roundf, which use the frin instruction and avoid "inexact" exceptions that way. This patch should get hard-float powerpc32 and powerpc64 (default function implementations) back to a state where test-float and test-double will pass after ulps regeneration. Tested for powerpc32 and powerpc64. [BZ #15479] [BZ #19238] * sysdeps/powerpc/powerpc32/fpu/s_round.S (__round): Save floating-point state after first operation on input. Restore full state rather than just rounding mode. * sysdeps/powerpc/powerpc32/fpu/s_roundf.S (__roundf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_round.S (__round): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_roundf.S (__roundf): Likewise.	2015-11-12 19:00:06 +00:00
Joseph Myers	32b71ad358	Fix powerpc64 lround, lroundf, llround, llroundf spurious "inexact" exceptions (bug 19235). Similar to bug 19134 for powerpc32, the powerpc64 implementations of lround, lroundf, llround, llroundf can raise spurious "inexact" exceptions for integer arguments from adding 0.5 then converting to integer (this does not apply to the power5+ version for double, which uses the frin instruction which is defined never to raise "inexact"; I don't know why power5+ doesn't use that version for float as well). This patch fixes the bug in a similar way to the powerpc32 bug, by testing for integers (adding and subtracting 2^52 and comparing with the value before that addition and subtraction) and not adding 0.5 in that case. The powerpc maintainers may wish to look at making power5+ / power6x / power8 use frin for float lround / llround as well as for double, unless there's some reason I've missed that this isn't beneficial. Tested for powerpc64. [BZ #19235] * sysdeps/powerpc/powerpc64/fpu/s_llround.S (__llround): Do not add 0.5 to integer arguments. * sysdeps/powerpc/powerpc64/fpu/s_llroundf.S (__llroundf): Likewise. (.LC2): New object.	2015-11-12 16:24:00 +00:00
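The integer test described in the commit above (add and subtract 2^52, then compare with the original value) can be sketched in portable C. This is only an illustration of the idea, not the powerpc64 assembly from the patch; the names `is_integral_2p52` and `integral_self_check` are hypothetical, and the `volatile` temporary is an assumption made here to keep a compiler from folding the arithmetic away.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper, not part of glibc: returns 1 if x is a finite
   integer value, 0 otherwise (including NaN and infinities).  */
static int is_integral_2p52(double x)
{
    const double two52 = 4503599627370496.0;   /* 2^52 */

    if (x != x)
        return 0;                               /* NaN */
    double ax = x < 0 ? -x : x;
    if (ax >= two52)
        return ax <= DBL_MAX;                   /* large finite doubles are integers */

    /* Doubles in [2^52, 2^53) are spaced exactly 1 apart, so the sum is
       rounded to an integer; subtracting 2^52 again is exact.  */
    volatile double t = ax + two52;
    t -= two52;
    return t == ax;
}

/* Small self-check of the helper on a few representative inputs.  */
static void integral_self_check(void)
{
    assert(is_integral_2p52(3.0));
    assert(is_integral_2p52(-7.0));
    assert(!is_integral_2p52(2.5));
    assert(!is_integral_2p52(0.1));
}
```

A caller that knows the argument is already an integer can convert it directly and skip the `+ 0.5` step, which is the part that raised the spurious "inexact" exception.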
Joseph Myers	71d1b0166b	Fix powerpc nearbyint wrongly clearing "inexact" and leaving traps disabled (bug 19228). Similar to bug 15491 recently fixed for x86_64 / x86, the powerpc (both powerpc32 and powerpc64) hard-float implementations of nearbyintf and nearbyint wrongly clear an "inexact" exception that was raised before the function was called; this shows up as failure of the test math/test-nearbyint-except added when that bug was fixed. They also wrongly leave traps on "inexact" disabled if they were enabled before the function was called. This patch fixes the bugs similar to how the x86 bug was fixed: saving and restoring the whole floating-point state, both to restore the original "inexact" flag state and to restore the original state of whether traps on "inexact" were enabled. Because there's a convenient point in the powerpc implementations to save state after any sNaN arguments will have raised "invalid" but before "inexact" traps need to be disabled, no special handling for "invalid" is needed as in the x86 version. Tested for powerpc64 and powerpc32, where it fixes the math/test-nearbyint-except failure as well as fixing the new test math/test-nearbyint-except-2 added by this patch. Also tested for x86_64 and x86 that the new test passes. If powerpc experts see a more efficient way of doing this (e.g. instruction positioning that's better for pipelines on typical processors) then of course followups optimizing the fix are welcome. [BZ #19228] * sysdeps/powerpc/powerpc32/fpu/s_nearbyint.S (__nearbyint): Save and restore full floating-point state. * sysdeps/powerpc/powerpc32/fpu/s_nearbyintf.S (__nearbyintf): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S (__nearbyint): Likewise. * sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S (__nearbyintf): Likewise. * math/test-nearbyint-except-2.c: New file. * math/Makefile (tests): Add test-nearbyint-except-2.	2015-11-11 00:06:09 +00:00
Carlos Eduardo Seo	352988a4a6	powerpc: Provide __tls_get_addr () in static libc Since '--no-tls-optimize' is available for Power in ld, we need to provide __tls_get_addr () in static libc in order to avoid undefined references to this symbol when that flag is used. * sysdeps/powerpc/libc-tls.c: New file. Provides __tls_get_addr () in static libc.	2015-10-28 11:42:23 -02:00
Paul Murphy	72f1463df8	powerpc: Fix usage of elision transient failure adapt param The skip_lock_out_of_tbegin_retries adaptive parameter was not being used correctly, nor as described. This prevents a fallback for all users of the lock if a transient abort occurs within the accepted number of retries. [BZ #19174] * sysdeps/powerpc/nptl/elide.h (__elide_lock): Fix usage of .skip_lock_out_of_tbegin_retries. * sysdeps/unix/sysv/linux/powerpc/elision-lock.c (__lll_lock_elision): Likewise, and respect a value of try_tbegin <= 0.	2015-10-27 17:27:41 -02:00
Tulio Magno Quites Machado Filho	6ec52bf634	PowerPC: Fix a race condition when eliding a lock The previous code used to evaluate the preprocessor token is_lock_free to a variable before starting a transaction. This behavior can cause an error if another thread got the lock (without using a transaction) between the evaluation of the token and the beginning of the transaction. This bug can be triggered with the following order of events: 1. The lock accessed by is_lock_free is free. 2. Thread T1 evaluates is_lock_free and stores into register R1 that the lock is free. 3. Thread T2 acquires the same lock used in is_lock_free. 4. T1 begins the transaction, creating a memory barrier where is_lock_free is false, but R1 is true. 5. T1 reads R1 and doesn't abort the transaction. 6. T1 calls ELIDE_UNLOCK, which reads false from is_lock_free and decides to unlock a lock acquired by T2, leading to undefined behavior. This patch delays the evaluation of is_lock_free to inside a transaction by moving this part of the code to the macro ELIDE_LOCK. [BZ #18743] * sysdeps/powerpc/nptl/elide.h (__elide_lock): Move most of this code to... (ELIDE_LOCK): ...here. (__get_new_count): New function with part of the code from __elide_lock that updates the value of adapt_count after a transaction abort. (__elided_trylock): Moved this code to... (ELIDE_TRYLOCK): ...here.	2015-10-19 16:58:03 -02:00
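The order of events listed in the commit above can be replayed in a single-threaded simulation. Everything in this sketch is hypothetical and stands in for the real pieces: `lock_word` plays the role of the lock read by `is_lock_free`, `between_steps` simulates thread T2 acquiring the lock, and `tx_begin_stub` replaces the hardware transaction begin. It does not model transactional conflict detection; it only shows why the check has to be evaluated after the transaction starts.

```c
#include <assert.h>
#include <stddef.h>

static int lock_word;                 /* 0 = free, 1 = held */
static void (*between_steps)(void);   /* hypothetical hook standing in for T2 */

static void t2_takes_lock(void) { lock_word = 1; }

/* Hypothetical stub: a real implementation would start a transaction.  */
static int tx_begin_stub(void) { return 1; }

/* Old ordering: the check is evaluated before the transaction begins.
   Returns 1 if T1 proceeds as though the lock were elided.  */
static int elide_stale(void)
{
    int was_free = (lock_word == 0);  /* step 2: value cached in "R1" */
    if (between_steps)
        between_steps();              /* step 3: T2 acquires the lock */
    if (tx_begin_stub())              /* step 4 */
        return was_free;              /* step 5: stale value is trusted */
    return 0;
}

/* New ordering: the check is evaluated inside the transaction.  */
static int elide_fresh(void)
{
    if (between_steps)
        between_steps();
    if (tx_begin_stub())
        return lock_word == 0;        /* sees that T2 holds the lock */
    return 0;
}

static void elide_self_check(void)
{
    between_steps = t2_takes_lock;
    lock_word = 0;
    assert(elide_stale() == 1);       /* wrongly elides a lock held by T2 */
    lock_word = 0;
    assert(elide_fresh() == 0);       /* falls back instead of eliding */
    between_steps = NULL;
    lock_word = 0;
    assert(elide_fresh() == 1);       /* uncontended case still elides */
}
```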

1055 Commits