Fix clog, clog10 inaccuracy (bug 19016).

For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large
errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids
cancellation error and then using log1p.

However, the thresholds for using that approach still result in log
being used on arguments as large as sqrt(13/16) > 0.9, leading to
significant errors, in some cases above the 9 ulp maximum allowed in
glibc libm.  This patch arranges for the log1p approach to be used
in all cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the
existing allowance for cases where one of X and Y is very small),
adjusting the __x2y2m1 functions to work with the wider range of
inputs.  This way, log only gets used on arguments below sqrt(1/2) (or
substantially above 1), where the error involved is much smaller.
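
The threshold change can be illustrated with a small sketch.  It is not
the glibc code: the two predicates below only restate the conditions
visible in the removed and added lines of the clog hunks, and the names
old_uses_x2y2m1 and new_uses_x2y2m1 are hypothetical.  The sketch assumes
absx >= absy and scale == 0, as the __x2y2m1 comment requires.

```c
#include <stdbool.h>

/* Condition on the removed lines: use the log1p path only when
   absx >= 0.75 or absy >= 0.5.  Assumes absx >= absy and scale == 0.  */
static bool
old_uses_x2y2m1 (double absx, double absy)
{
  return absx < 1.0 && (absx >= 0.75 || absy >= 0.5);
}

/* Condition on the added lines: use the log1p path whenever
   absx >= 0.5 and X^2 + Y^2 >= 0.5.  Same assumptions as above.  */
static bool
new_uses_x2y2m1 (double absx, double absy)
{
  return absx < 1.0 && absx >= 0.5 && absx * absx + absy * absy >= 0.5;
}
```

For the added test input (0xb.ffffp-4, 0x7.ffffp-4), just below
(0.75, 0.5), the old predicate is false, so log was applied to a
modulus just under sqrt(13/16); the new predicate is true, so log1p is
used instead.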

Tested for x86_64, x86, mips64 and powerpc.  For the ulps regeneration
I removed the existing clog and clog10 ulps before regenerating to
allow any reduced ulps to appear.  Tests added include those found by
random test generation to produce large ulps either before or after
the patch, and some found by trying inputs close to the (0.75, 0.5)
threshold where the potential errors from using log are largest.

	[BZ #19016]
	* sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
	allow more cases with X^2 + Y^2 >= 0.5.
	* sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise.  Add -1 as
	normal element in sum instead of special-casing based on values of
	arguments.
	* sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
	* sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise.  Add
	-1 as normal element in sum instead of special-casing based on
	values of arguments.
	* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
	* sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
	(__x2y2m1): Update comment.
	* sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise.  Add -1
	as normal element in sum instead of special-casing based on values
	of arguments.
	* math/s_clog.c (__clog): Handle more cases using log1p without
	hypot.
	* math/s_clog10.c (__clog10): Likewise.
	* math/s_clog10f.c (__clog10f): Likewise.
	* math/s_clog10l.c (__clog10l): Likewise.
	* math/s_clogf.c (__clogf): Likewise.
	* math/s_clogl.c (__clogl): Likewise.
	* math/auto-libm-test-in: Add more tests of clog and clog10.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Joseph Myers 2015-09-28 22:11:22 +00:00
parent 60cf80f09d
commit a5721ebc68
19 changed files with 6759 additions and 156 deletions

ChangeLog

@@ -1,3 +1,33 @@
+2015-09-28  Joseph Myers  <joseph@codesourcery.com>
+	[BZ #19016]
+	* sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
+	allow more cases with X^2 + Y^2 >= 0.5.
+	* sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise.  Add -1 as
+	normal element in sum instead of special-casing based on values of
+	arguments.
+	* sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
+	* sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise.  Add
+	-1 as normal element in sum instead of special-casing based on
+	values of arguments.
+	* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
+	* sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
+	(__x2y2m1): Update comment.
+	* sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise.  Add -1
+	as normal element in sum instead of special-casing based on values
+	of arguments.
+	* math/s_clog.c (__clog): Handle more cases using log1p without
+	hypot.
+	* math/s_clog10.c (__clog10): Likewise.
+	* math/s_clog10f.c (__clog10f): Likewise.
+	* math/s_clog10l.c (__clog10l): Likewise.
+	* math/s_clogf.c (__clogf): Likewise.
+	* math/s_clogl.c (__clogl): Likewise.
+	* math/auto-libm-test-in: Add more tests of clog and clog10.
+	* math/auto-libm-test-out: Regenerated.
+	* sysdeps/i386/fpu/libm-test-ulps: Update.
+	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
 2015-09-28  Martin Sebor  <msebor@redhat.com>
 	[BZ #18969]

NEWS

@@ -16,7 +16,8 @@ Version 2.23
   18618, 18647, 18661, 18674, 18675, 18681, 18757, 18778, 18781, 18787,
   18789, 18790, 18795, 18796, 18803, 18820, 18823, 18824, 18825, 18857,
   18863, 18870, 18872, 18873, 18875, 18887, 18921, 18951, 18952, 18956,
-  18961, 18966, 18967, 18969, 18970, 18977, 18980, 18981, 18985, 19003.
+  18961, 18966, 18967, 18969, 18970, 18977, 18980, 18981, 18985, 19003,
+  19016.
 * The obsolete header <regexp.h> has been removed.  Programs that require
   this header must be updated to use <regex.h> instead.

math/auto-libm-test-in

@@ -660,6 +660,51 @@ clog -0xa.7ac41a0b417cb8fp-4 -0x6.c5a32eaeedd4p-4
 clog 0x3.c16p-136 0x8p-152
 clog -0x1.0a69de710590dp+0 -0x7.bc7e121e2b0d1088p-4
+clog -0x2.7bdep-4 0x5.ab7a4p-4
+clog -0xb.e1d3d0ff44358p-4 -0x7.54785e1b143f8p-4
+clog 0x3.ba473p+0 0x7.eea9ap-4
+clog 0x9.d02220baee4ep+36 0x2.b9a29cp+0
+clog -0x5.1a5cf8p-4 -0xb.73012p-4
+clog -0xa.ff292a609dbb8p-4 0x6.f73d4cp-4
+clog -0x5.1a5cfc2301114p-4 -0xb.730118p-4
+clog 0xb.ffffcp-4 0x7.ffff1p-4
+clog 0xb.ffffp-4 0x7.ffffap-4
+clog 0xb.ffffp-4 0x7.fffff8p-4
+clog 0xb.ffffp-4 0x7.ffffp-4
+clog 0xb.fffffp-4 0x7.ffff68p-4
+clog 0xb.fffffp-4 0x7.ffffp-4
+clog 0xb.ffff8p-4 0x7.ffffcp-4
+clog 0xb.ffffp-4 0x7.ffffcp-4
+clog 0xb.ffffp-4 0x7.ffffb8p-4
+clog 0xb.ffffp-4 0x7.ffff7p-4
+clog 0xb.ffffp-4 0x7.ffff5p-4
+clog 0xb.fffffffffff7p-4 0x7.fffff8p-4
+clog 0xb.fffffffffff08p-4 0x7.fffffffffffdp-4
+clog 0xb.fffffffffff08p-4 0x7.fffffffffff9p-4
+clog 0xb.fffffffffffp-4 0x7.fffffffffffdcp-4
+clog 0xb.fffffp-4 0x7.ffffffffffff4p-4
+clog 0xb.fffffffffffp-4 0x7.fffffffffffecp-4
+clog 0xb.fffffffffff8p-4 0x7.fffff8p-4
+clog 0x8p-152 -0x1.10233ap+0
+clog 0xa.03634p-4 -0x4.7bb918p-20
+clog -0x5.e23d2p-4 0x8.525df889c21ap-4
+clog 0x9.8ce58p-4 -0x8p-152
+clog 0x8p-152 0x9.2af75p-4
+clog 0x9.97a15de8e59d8p-4 -0
+clog -0x4.74556ec92eb4746p-4 0x1.1e7aa1d936f6efe6p+0
+clog 0x9.97a15de8e59d8p-4 -0
+clog -0x9.7f1d7p-64 0x9.db37dp-4
+clog -0x8.5efc4p-4 -0x5.40310cp-4
+clog -0x9.0b459p-4 0
+clog -0x6.a9419e9b30e68p-4 -0x6.262c7p-4
+clog 0x5.2767cdfdfbf2p-4 0x7.69ee98p-4
+clog -0x9.f5563cb3227d8p-4 0
+clog -0x9.5a284p-4 0x6.899578p-8
+clog 0xa.3e62bp-4 0x1.18c03p-100
+clog 0 -0x9.22a99p-4
+clog 0 0x9.7915bp-4
+clog 0x3.00d1ap-12 0x1.23ff6ap+0
 clog 0x1.fffffep+127 0x1.fffffep+127
 clog 0x1.fffffep+127 1.0
 clog 0x1p-149 0x1p-149
@@ -808,6 +853,51 @@ clog10 -0xa.7ac41a0b417cb8fp-4 -0x6.c5a32eaeedd4p-4
 clog10 0x3.c16p-136 0x8p-152
 clog10 -0x1.0a69de710590dp+0 -0x7.bc7e121e2b0d1088p-4
+clog10 -0x2.7bdep-4 0x5.ab7a4p-4
+clog10 -0xb.e1d3d0ff44358p-4 -0x7.54785e1b143f8p-4
+clog10 0x3.ba473p+0 0x7.eea9ap-4
+clog10 0x9.d02220baee4ep+36 0x2.b9a29cp+0
+clog10 -0x5.1a5cf8p-4 -0xb.73012p-4
+clog10 -0xa.ff292a609dbb8p-4 0x6.f73d4cp-4
+clog10 -0x5.1a5cfc2301114p-4 -0xb.730118p-4
+clog10 0xb.ffffcp-4 0x7.ffff1p-4
+clog10 0xb.ffffp-4 0x7.ffffap-4
+clog10 0xb.ffffp-4 0x7.fffff8p-4
+clog10 0xb.ffffp-4 0x7.ffffp-4
+clog10 0xb.fffffp-4 0x7.ffff68p-4
+clog10 0xb.fffffp-4 0x7.ffffp-4
+clog10 0xb.ffff8p-4 0x7.ffffcp-4
+clog10 0xb.ffffp-4 0x7.ffffcp-4
+clog10 0xb.ffffp-4 0x7.ffffb8p-4
+clog10 0xb.ffffp-4 0x7.ffff7p-4
+clog10 0xb.ffffp-4 0x7.ffff5p-4
+clog10 0xb.fffffffffff7p-4 0x7.fffff8p-4
+clog10 0xb.fffffffffff08p-4 0x7.fffffffffffdp-4
+clog10 0xb.fffffffffff08p-4 0x7.fffffffffff9p-4
+clog10 0xb.fffffffffffp-4 0x7.fffffffffffdcp-4
+clog10 0xb.fffffp-4 0x7.ffffffffffff4p-4
+clog10 0xb.fffffffffffp-4 0x7.fffffffffffecp-4
+clog10 0xb.fffffffffff8p-4 0x7.fffff8p-4
+clog10 0x8p-152 -0x1.10233ap+0
+clog10 0xa.03634p-4 -0x4.7bb918p-20
+clog10 -0x5.e23d2p-4 0x8.525df889c21ap-4
+clog10 0x9.8ce58p-4 -0x8p-152
+clog10 0x8p-152 0x9.2af75p-4
+clog10 0x9.97a15de8e59d8p-4 -0
+clog10 -0x4.74556ec92eb4746p-4 0x1.1e7aa1d936f6efe6p+0
+clog10 0x9.97a15de8e59d8p-4 -0
+clog10 -0x9.7f1d7p-64 0x9.db37dp-4
+clog10 -0x8.5efc4p-4 -0x5.40310cp-4
+clog10 -0x9.0b459p-4 0
+clog10 -0x6.a9419e9b30e68p-4 -0x6.262c7p-4
+clog10 0x5.2767cdfdfbf2p-4 0x7.69ee98p-4
+clog10 -0x9.f5563cb3227d8p-4 0
+clog10 -0x9.5a284p-4 0x6.899578p-8
+clog10 0xa.3e62bp-4 0x1.18c03p-100
+clog10 0 -0x9.22a99p-4
+clog10 0 0x9.7915bp-4
+clog10 0x3.00d1ap-12 0x1.23ff6ap+0
 clog10 0x1.fffffep+127 0x1.fffffep+127
 clog10 0x1.fffffep+127 1.0
 clog10 0x1p-149 0x1p-149

math/auto-libm-test-out

File diff suppressed because it is too large

math/s_clog.c

@@ -76,14 +76,17 @@ __clog (__complex__ double x)
           __real__ result = __log1p (d2m1) / 2.0;
         }
       else if (absx < 1.0
-               && absx >= 0.75
+               && absx >= 0.5
                && absy < DBL_EPSILON / 2.0
                && scale == 0)
         {
           double d2m1 = (absx - 1.0) * (absx + 1.0);
           __real__ result = __log1p (d2m1) / 2.0;
         }
-      else if (absx < 1.0 && (absx >= 0.75 || absy >= 0.5) && scale == 0)
+      else if (absx < 1.0
+               && absx >= 0.5
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5)
         {
           double d2m1 = __x2y2m1 (absx, absy);
           __real__ result = __log1p (d2m1) / 2.0;
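
The branch for tiny absy computes X^2 - 1 as (absx - 1.0) * (absx + 1.0)
rather than absx * absx - 1.0.  A minimal sketch of why the factored form
loses less: for 0.5 <= x < 1 the difference x - 1.0 is exact (Sterbenz's
lemma), so only the addition and the product round.  The function name
x2m1_factored is hypothetical and is not part of the patch.

```c
/* X^2 - 1 in the factored form used by the tiny-|Y| branch.
   For 0.5 <= x < 1, x - 1.0 is computed exactly, so the result
   carries only the rounding of x + 1.0 and of the product.  */
static double
x2m1_factored (double x)
{
  return (x - 1.0) * (x + 1.0);
}
```

For x = 1 - 2^-30 the factored form returns -2^-29 + 2^-60, which is the
exact value of x^2 - 1; squaring first would round x * x to 1 - 2^-29
and drop the 2^-60 term before the subtraction.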

math/s_clog10.c

@@ -82,14 +82,17 @@ __clog10 (__complex__ double x)
           __real__ result = __log1p (d2m1) * (M_LOG10E / 2.0);
         }
       else if (absx < 1.0
-               && absx >= 0.75
+               && absx >= 0.5
                && absy < DBL_EPSILON / 2.0
                && scale == 0)
         {
           double d2m1 = (absx - 1.0) * (absx + 1.0);
           __real__ result = __log1p (d2m1) * (M_LOG10E / 2.0);
         }
-      else if (absx < 1.0 && (absx >= 0.75 || absy >= 0.5) && scale == 0)
+      else if (absx < 1.0
+               && absx >= 0.5
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5)
         {
           double d2m1 = __x2y2m1 (absx, absy);
           __real__ result = __log1p (d2m1) * (M_LOG10E / 2.0);

math/s_clog10f.c

@@ -82,14 +82,17 @@ __clog10f (__complex__ float x)
           __real__ result = __log1pf (d2m1) * ((float) M_LOG10E / 2.0f);
         }
       else if (absx < 1.0f
-               && absx >= 0.75f
+               && absx >= 0.5f
                && absy < FLT_EPSILON / 2.0f
                && scale == 0)
         {
           float d2m1 = (absx - 1.0f) * (absx + 1.0f);
           __real__ result = __log1pf (d2m1) * ((float) M_LOG10E / 2.0f);
         }
-      else if (absx < 1.0f && (absx >= 0.75f || absy >= 0.5f) && scale == 0)
+      else if (absx < 1.0f
+               && absx >= 0.5f
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5f)
         {
           float d2m1 = __x2y2m1f (absx, absy);
           __real__ result = __log1pf (d2m1) * ((float) M_LOG10E / 2.0f);

math/s_clog10l.c

@@ -89,14 +89,17 @@ __clog10l (__complex__ long double x)
           __real__ result = __log1pl (d2m1) * (M_LOG10El / 2.0L);
         }
       else if (absx < 1.0L
-               && absx >= 0.75L
+               && absx >= 0.5L
                && absy < LDBL_EPSILON / 2.0L
                && scale == 0)
         {
           long double d2m1 = (absx - 1.0L) * (absx + 1.0L);
           __real__ result = __log1pl (d2m1) * (M_LOG10El / 2.0L);
         }
-      else if (absx < 1.0L && (absx >= 0.75L || absy >= 0.5L) && scale == 0)
+      else if (absx < 1.0L
+               && absx >= 0.5L
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5L)
         {
           long double d2m1 = __x2y2m1l (absx, absy);
           __real__ result = __log1pl (d2m1) * (M_LOG10El / 2.0L);

math/s_clogf.c

@@ -76,14 +76,17 @@ __clogf (__complex__ float x)
           __real__ result = __log1pf (d2m1) / 2.0f;
         }
       else if (absx < 1.0f
-               && absx >= 0.75f
+               && absx >= 0.5f
                && absy < FLT_EPSILON / 2.0f
                && scale == 0)
         {
           float d2m1 = (absx - 1.0f) * (absx + 1.0f);
           __real__ result = __log1pf (d2m1) / 2.0f;
         }
-      else if (absx < 1.0f && (absx >= 0.75f || absy >= 0.5f) && scale == 0)
+      else if (absx < 1.0f
+               && absx >= 0.5f
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5f)
         {
           float d2m1 = __x2y2m1f (absx, absy);
           __real__ result = __log1pf (d2m1) / 2.0f;

math/s_clogl.c

@@ -83,14 +83,17 @@ __clogl (__complex__ long double x)
           __real__ result = __log1pl (d2m1) / 2.0L;
         }
       else if (absx < 1.0L
-               && absx >= 0.75L
+               && absx >= 0.5L
                && absy < LDBL_EPSILON / 2.0L
                && scale == 0)
         {
           long double d2m1 = (absx - 1.0L) * (absx + 1.0L);
           __real__ result = __log1pl (d2m1) / 2.0L;
         }
-      else if (absx < 1.0L && (absx >= 0.75L || absy >= 0.5L) && scale == 0)
+      else if (absx < 1.0L
+               && absx >= 0.5L
+               && scale == 0
+               && absx * absx + absy * absy >= 0.5L)
         {
           long double d2m1 = __x2y2m1l (absx, absy);
           __real__ result = __log1pl (d2m1) / 2.0L;

sysdeps/generic/math_private.h

@@ -365,8 +365,8 @@ extern double __slowpow (double __x, double __y, double __z);
 extern void __docos (double __x, double __dx, double __v[]);
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 extern float __x2y2m1f (float x, float y);
 extern double __x2y2m1 (double x, double y);
 extern long double __x2y2m1l (long double x, long double y);

sysdeps/i386/fpu/libm-test-ulps

@@ -836,12 +836,12 @@ ildouble: 3
 ldouble: 3
 Function: Real part of "clog":
-double: 3
-float: 2
-idouble: 3
-ifloat: 2
-ildouble: 4
-ldouble: 4
+double: 2
+float: 1
+idouble: 2
+ifloat: 1
+ildouble: 3
+ldouble: 3
 Function: Imaginary part of "clog":
 double: 1
@@ -864,10 +864,10 @@ ildouble: 2
 ldouble: 2
 Function: Real part of "clog10_downward":
-double: 5
-float: 4
-idouble: 5
-ifloat: 4
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
 ildouble: 8
 ldouble: 8
@@ -876,14 +876,14 @@ double: 1
 float: 1
 idouble: 1
 ifloat: 1
-ildouble: 2
-ldouble: 2
+ildouble: 3
+ldouble: 3
 Function: Real part of "clog10_towardzero":
-double: 5
-float: 4
-idouble: 5
-ifloat: 4
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
 ildouble: 8
 ldouble: 8
@@ -896,12 +896,12 @@ ildouble: 3
 ldouble: 3
 Function: Real part of "clog10_upward":
-double: 5
-float: 5
-idouble: 5
-ifloat: 5
-ildouble: 6
-ldouble: 6
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
+ildouble: 7
+ldouble: 7
 Function: Imaginary part of "clog10_upward":
 double: 1
@@ -912,12 +912,12 @@ ildouble: 3
 ldouble: 3
 Function: Real part of "clog_downward":
-double: 5
-float: 5
-idouble: 5
-ifloat: 5
-ildouble: 7
-ldouble: 7
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
+ildouble: 5
+ldouble: 5
 Function: Imaginary part of "clog_downward":
 double: 1
@@ -928,12 +928,12 @@ ildouble: 1
 ldouble: 1
 Function: Real part of "clog_towardzero":
-double: 5
-float: 5
-idouble: 5
-ifloat: 5
-ildouble: 8
-ldouble: 8
+double: 3
+float: 3
+idouble: 3
+ifloat: 3
+ildouble: 5
+ldouble: 5
 Function: Imaginary part of "clog_towardzero":
 double: 1
@@ -944,12 +944,12 @@ ildouble: 1
 ldouble: 1
 Function: Real part of "clog_upward":
-double: 5
-float: 5
-idouble: 5
-ifloat: 5
-ildouble: 6
-ldouble: 6
+double: 2
+float: 3
+idouble: 2
+ifloat: 3
+ildouble: 4
+ldouble: 4
 Function: Imaginary part of "clog_upward":
 double: 1

sysdeps/ieee754/dbl-64/x2y2m1.c

@@ -80,32 +80,26 @@ compare (const void *p, const void *q)
 }
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 double
 __x2y2m1 (double x, double y)
 {
-  double vals[4];
+  double vals[5];
   SET_RESTORE_ROUND (FE_TONEAREST);
   mul_split (&vals[1], &vals[0], x, x);
   mul_split (&vals[3], &vals[2], y, y);
-  if (x >= 0.75)
-    vals[1] -= 1.0;
-  else
-    {
-      vals[1] -= 0.5;
-      vals[3] -= 0.5;
-    }
-  qsort (vals, 4, sizeof (double), compare);
+  vals[4] = -1.0;
+  qsort (vals, 5, sizeof (double), compare);
   /* Add up the values so that each element of VALS has absolute value
      at most equal to the last set bit of the next nonzero
      element.  */
-  for (size_t i = 0; i <= 2; i++)
+  for (size_t i = 0; i <= 3; i++)
     {
       add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
-      qsort (vals + i + 1, 3 - i, sizeof (double), compare);
+      qsort (vals + i + 1, 4 - i, sizeof (double), compare);
     }
   /* Now any error from this addition will be small.  */
-  return vals[3] + vals[2] + vals[1] + vals[0];
+  return vals[4] + vals[3] + vals[2] + vals[1] + vals[0];
 }
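
The idea of treating -1 as one more term of an error-free sum can be
sketched in a self-contained form.  This is not the glibc algorithm: it
swaps the sort-and-add_split loop for a plain compensated sum, and it
uses Veltkamp/Dekker splitting for the exact products as an assumption,
since the mul_split and add_split helpers are not shown in this diff.
All names below (split, two_prod, two_sum, x2y2m1_sketch) are
hypothetical.  It assumes round-to-nearest double arithmetic with no
excess precision, and 1 > x >= y as in the comment above.

```c
/* Veltkamp split: a == *hi + *lo, each half short enough that
   products of halves are exact.  */
static void
split (double a, double *hi, double *lo)
{
  double t = a * 134217729.0;	/* 2^27 + 1 */
  *hi = t - (t - a);
  *lo = a - *hi;
}

/* Dekker product: a * b == *hi + *lo exactly, barring overflow
   and underflow.  */
static void
two_prod (double a, double b, double *hi, double *lo)
{
  double ah, al, bh, bl;
  split (a, &ah, &al);
  split (b, &bh, &bl);
  *hi = a * b;
  *lo = ((ah * bh - *hi) + ah * bl + al * bh) + al * bl;
}

/* Knuth two-sum: a + b == *hi + *lo exactly.  */
static void
two_sum (double a, double b, double *hi, double *lo)
{
  double s = a + b;
  double bv = s - a;
  *lo = (a - (s - bv)) + (b - bv);
  *hi = s;
}

/* Sketch of X^2 + Y^2 - 1 with -1 added as an ordinary term.  The
   large terms are combined with exact error tracking; the small
   error terms are added last.  */
double
x2y2m1_sketch (double x, double y)
{
  double xh, xl, yh, yl, s1, e1, s2, e2;
  two_prod (x, x, &xh, &xl);
  two_prod (y, y, &yh, &yl);
  two_sum (-1.0, xh, &s1, &e1);
  two_sum (s1, yh, &s2, &e2);
  return s2 + (((e1 + e2) + xl) + yl);
}
```

With x = 1 - 2^-30 and y = 2^-30 the sketch returns -2^-29 + 2^-59, the
exact value of x^2 + y^2 - 1.  Because no branch depends on the size of
x or y, the same code covers the wider X^2 + Y^2 >= 0.5 range.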

sysdeps/ieee754/dbl-64/x2y2m1f.c

@@ -21,8 +21,8 @@
 #include <float.h>
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 float
 __x2y2m1f (float x, float y)

sysdeps/ieee754/ldbl-128/x2y2m1l.c

@@ -80,32 +80,26 @@ compare (const void *p, const void *q)
 }
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 long double
 __x2y2m1l (long double x, long double y)
 {
-  long double vals[4];
+  long double vals[5];
   SET_RESTORE_ROUNDL (FE_TONEAREST);
   mul_split (&vals[1], &vals[0], x, x);
   mul_split (&vals[3], &vals[2], y, y);
-  if (x >= 0.75L)
-    vals[1] -= 1.0L;
-  else
-    {
-      vals[1] -= 0.5L;
-      vals[3] -= 0.5L;
-    }
-  qsort (vals, 4, sizeof (long double), compare);
+  vals[4] = -1.0L;
+  qsort (vals, 5, sizeof (long double), compare);
   /* Add up the values so that each element of VALS has absolute value
      at most equal to the last set bit of the next nonzero
      element.  */
-  for (size_t i = 0; i <= 2; i++)
+  for (size_t i = 0; i <= 3; i++)
     {
       add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
-      qsort (vals + i + 1, 3 - i, sizeof (long double), compare);
+      qsort (vals + i + 1, 4 - i, sizeof (long double), compare);
     }
   /* Now any error from this addition will be small.  */
-  return vals[3] + vals[2] + vals[1] + vals[0];
+  return vals[4] + vals[3] + vals[2] + vals[1] + vals[0];
 }

sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c

@@ -80,13 +80,13 @@ compare (const void *p, const void *q)
 }
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 long double
 __x2y2m1l (long double x, long double y)
 {
-  double vals[12];
+  double vals[13];
   SET_RESTORE_ROUND (FE_TONEAREST);
   union ibm_extended_long_double xu, yu;
   xu.ld = x;
@@ -105,25 +105,19 @@ __x2y2m1l (long double x, long double y)
   vals[8] *= 2.0;
   vals[9] *= 2.0;
   mul_split (&vals[11], &vals[10], yu.d[1].d, yu.d[1].d);
-  if (xu.d[0].d >= 0.75)
-    vals[1] -= 1.0;
-  else
-    {
-      vals[1] -= 0.5;
-      vals[7] -= 0.5;
-    }
-  qsort (vals, 12, sizeof (double), compare);
+  vals[12] = -1.0;
+  qsort (vals, 13, sizeof (double), compare);
   /* Add up the values so that each element of VALS has absolute value
      at most equal to the last set bit of the next nonzero
      element.  */
-  for (size_t i = 0; i <= 10; i++)
+  for (size_t i = 0; i <= 11; i++)
     {
       add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
-      qsort (vals + i + 1, 11 - i, sizeof (double), compare);
+      qsort (vals + i + 1, 12 - i, sizeof (double), compare);
     }
   /* Now any error from this addition will be small.  */
-  long double retval = (long double) vals[11];
-  for (size_t i = 10; i != (size_t) -1; i--)
+  long double retval = (long double) vals[12];
+  for (size_t i = 11; i != (size_t) -1; i--)
     retval += (long double) vals[i];
   return retval;
 }

sysdeps/ieee754/ldbl-96/x2y2m1.c

@@ -27,8 +27,8 @@
 #else
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 double
 __x2y2m1 (double x, double y)

sysdeps/ieee754/ldbl-96/x2y2m1l.c

@@ -80,32 +80,26 @@ compare (const void *p, const void *q)
 }
 /* Return X^2 + Y^2 - 1, computed without large cancellation error.
-   It is given that 1 > X >= Y >= epsilon / 2, and that either X >=
-   0.75 or Y >= 0.5.  */
+   It is given that 1 > X >= Y >= epsilon / 2, and that X^2 + Y^2 >=
+   0.5.  */
 long double
 __x2y2m1l (long double x, long double y)
 {
-  long double vals[4];
+  long double vals[5];
   SET_RESTORE_ROUNDL (FE_TONEAREST);
   mul_split (&vals[1], &vals[0], x, x);
   mul_split (&vals[3], &vals[2], y, y);
-  if (x >= 0.75L)
-    vals[1] -= 1.0L;
-  else
-    {
-      vals[1] -= 0.5L;
-      vals[3] -= 0.5L;
-    }
-  qsort (vals, 4, sizeof (long double), compare);
+  vals[4] = -1.0L;
+  qsort (vals, 5, sizeof (long double), compare);
   /* Add up the values so that each element of VALS has absolute value
      at most equal to the last set bit of the next nonzero
      element.  */
-  for (size_t i = 0; i <= 2; i++)
+  for (size_t i = 0; i <= 3; i++)
     {
       add_split (&vals[i + 1], &vals[i], vals[i + 1], vals[i]);
-      qsort (vals + i + 1, 3 - i, sizeof (long double), compare);
+      qsort (vals + i + 1, 4 - i, sizeof (long double), compare);
     }
   /* Now any error from this addition will be small.  */
-  return vals[3] + vals[2] + vals[1] + vals[0];
+  return vals[4] + vals[3] + vals[2] + vals[1] + vals[0];
 }

sysdeps/x86_64/fpu/libm-test-ulps

@@ -869,11 +869,11 @@ ldouble: 3
 Function: Real part of "clog":
 double: 3
-float: 2
+float: 3
 idouble: 3
-ifloat: 2
-ildouble: 4
-ldouble: 4
+ifloat: 3
+ildouble: 3
+ldouble: 3
 Function: Imaginary part of "clog":
 float: 1
@@ -883,9 +883,9 @@ ldouble: 1
 Function: Real part of "clog10":
 double: 3
-float: 3
+float: 4
 idouble: 3
-ifloat: 3
+ifloat: 4
 ildouble: 4
 ldouble: 4
@@ -898,10 +898,10 @@ ildouble: 2
 ldouble: 2
 Function: Real part of "clog10_downward":
-double: 6
-float: 6
-idouble: 6
-ifloat: 6
+double: 5
+float: 4
+idouble: 5
+ifloat: 4
 ildouble: 8
 ldouble: 8
@@ -910,14 +910,14 @@ double: 2
 float: 4
 idouble: 2
 ifloat: 4
-ildouble: 2
-ldouble: 2
+ildouble: 3
+ldouble: 3
 Function: Real part of "clog10_towardzero":
 double: 5
-float: 4
+float: 5
 idouble: 5
-ifloat: 4
+ifloat: 5
 ildouble: 8
 ldouble: 8
@@ -930,28 +930,28 @@ ildouble: 3
 ldouble: 3
 Function: Real part of "clog10_upward":
-double: 8
+double: 6
 float: 5
-idouble: 8
+idouble: 6
 ifloat: 5
-ildouble: 6
-ldouble: 6
+ildouble: 7
+ldouble: 7
 Function: Imaginary part of "clog10_upward":
 double: 2
-float: 3
+float: 4
 idouble: 2
-ifloat: 3
+ifloat: 4
 ildouble: 3
 ldouble: 3
 Function: Real part of "clog_downward":
-double: 7
-float: 5
-idouble: 7
-ifloat: 5
-ildouble: 7
-ldouble: 7
+double: 4
+float: 3
+idouble: 4
+ifloat: 3
+ildouble: 5
+ldouble: 5
 Function: Imaginary part of "clog_downward":
 double: 1
@@ -962,28 +962,28 @@ ildouble: 1
 ldouble: 1
 Function: Real part of "clog_towardzero":
-double: 7
-float: 5
-idouble: 7
-ifloat: 5
-ildouble: 8
-ldouble: 8
+double: 4
+float: 4
+idouble: 4
+ifloat: 4
+ildouble: 5
+ldouble: 5
 Function: Imaginary part of "clog_towardzero":
 double: 1
-float: 2
+float: 3
 idouble: 1
-ifloat: 2
+ifloat: 3
 ildouble: 1
 ldouble: 1
 Function: Real part of "clog_upward":
-double: 8
-float: 5
-idouble: 8
-ifloat: 5
-ildouble: 6
-ldouble: 6
+double: 4
+float: 3
+idouble: 4
+ifloat: 3
+ildouble: 4
+ldouble: 4
 Function: Imaginary part of "clog_upward":
 double: 1