glibc/sysdeps/ieee754/ldbl-128/x2y2m1l.c

/* Compute x^2 + y^2 - 1, without large cancellation error.
   Copyright (C) 2012-2016 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#include <math.h>
#include <math_private.h>
#include <mul_splitl.h>
#include <stdlib.h>


/* Calculate X + Y exactly and store the result in *HI + *LO.  It is
   given that |X| >= |Y| and the values are small enough that no
   overflow occurs.  */

static inline void
add_split (_Float128 *hi, _Float128 *lo, _Float128 x, _Float128 y)
{
  /* Apply Dekker's algorithm.  */
  *hi = x + y;
  *lo = (x - *hi) + y;
}

/* Compare absolute values of floating-point values pointed to by P
   and Q for qsort.  */

static int
compare (const void *p, const void *q)
{
  _Float128 pld = fabsl (*(const _Float128 *) p);
  _Float128 qld = fabsl (*(const _Float128 *) q);
  if (pld < qld)
    return -1;
  else if (pld == qld)
    return 0;
  else
    return 1;
}

/* Return X^2 + Y^2 - 1, computed without large cancellation error.
   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
   0.5.  */

_Float128
__x2y2m1l (_Float128 x, _Float128 y)
{
  _Float128 vals[5];
  SET_RESTORE_ROUNDL (FE_TONEAREST);
  mul_splitl (&vals[1], &vals[0], x, x);
  mul_splitl (&vals[3], &vals[2], y, y);
  vals[4] = -1;
  qsort (vals, 5, sizeof (_Float128), compare);
  /* Add up the values so that each element of VALS has absolute value
     at most equal to the last set bit of the next nonzero
     element.  */
  for (size_t i = 0; i <= 3; i++)
    {
      add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
      qsort (vals + i + 1, 4 - i, sizeof (_Float128), compare);
    }
  /* Now any error from this addition will be small.  */
  return vals[4] + vals[3] + vals[2] + vals[1] + vals[0];
}