@node Representation Limits, System Configuration Limits, System Information, Top
@chapter Representation Limits

This chapter contains information about constants and parameters that
characterize the representation of the various integer and
floating-point types supported by the GNU C library.

@menu
* Integer Representation Limits::  Determining maximum and minimum
                                    representation values of
                                    various integer subtypes.
* Floating-Point Limits::          Parameters which characterize
                                    supported floating-point
                                    representations on a particular
                                    system.
@end menu

@node Integer Representation Limits, Floating-Point Limits, , Representation Limits
@section Integer Representation Limits
@cindex integer representation limits
@cindex representation limits, integer
@cindex limits, integer representation

Sometimes it is necessary for programs to know about the internal
representation of various integer subtypes. For example, if you want
your program to be careful not to overflow an @code{int} counter
variable, you need to know the largest value that an @code{int} can
represent. These kinds of parameters can vary from
compiler to compiler and machine to machine. Another typical use of
this kind of parameter is in conditionalizing data structure definitions
with @samp{#if} to select the most appropriate integer subtype that
can represent the required range of values.

Macros representing the minimum and maximum limits of the integer types
are defined in the header file @file{limits.h}. The values of these
macros are all integer constant expressions.
@pindex limits.h

@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_BIT
This is the number of bits in a @code{char}, usually eight.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int SCHAR_MIN
This is the minimum value that can be represented by a @code{signed char}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int SCHAR_MAX
This is the maximum value that can be represented by a @code{signed char}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int UCHAR_MAX
This is the maximum value that can be represented by an @code{unsigned char}.
(The minimum value of an @code{unsigned char} is zero.)
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_MIN
This is the minimum value that can be represented by a @code{char}.
It's equal to @code{SCHAR_MIN} if @code{char} is signed, or zero
otherwise.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int CHAR_MAX
This is the maximum value that can be represented by a @code{char}.
It's equal to @code{SCHAR_MAX} if @code{char} is signed, or
@code{UCHAR_MAX} otherwise.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int SHRT_MIN
This is the minimum value that can be represented by a @code{signed
short int}. On most machines that the GNU C library runs on,
@code{short} integers are 16-bit quantities.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int SHRT_MAX
This is the maximum value that can be represented by a @code{signed
short int}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int USHRT_MAX
This is the maximum value that can be represented by an @code{unsigned
short int}. (The minimum value of an @code{unsigned short int} is zero.)
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int INT_MIN
This is the minimum value that can be represented by a @code{signed
int}. On most machines that the GNU C system runs on, an @code{int} is
a 32-bit quantity.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro int INT_MAX
This is the maximum value that can be represented by a @code{signed
int}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro {unsigned int} UINT_MAX
This is the maximum value that can be represented by an @code{unsigned
int}. (The minimum value of an @code{unsigned int} is zero.)
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro {long int} LONG_MIN
This is the minimum value that can be represented by a @code{signed long
int}. On 32-bit machines that the GNU C system runs on, @code{long}
integers are typically 32-bit quantities, the same size as @code{int};
on most 64-bit GNU systems they are 64-bit quantities.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro {long int} LONG_MAX
This is the maximum value that can be represented by a @code{signed long
int}.
@end deftypevr

@comment limits.h
@comment ANSI
@deftypevr Macro {unsigned long int} ULONG_MAX
This is the maximum value that can be represented by an @code{unsigned
long int}. (The minimum value of an @code{unsigned long int} is zero.)
@end deftypevr

@strong{Incomplete:} There should be corresponding limits for the GNU
C Compiler's @code{long long} type, too. (But they are not now present
in the header file.)

The header file @file{limits.h} also defines some additional constants
that parameterize various operating system and file system limits. These
constants are described in @ref{System Parameters} and @ref{File System
Parameters}.
@pindex limits.h

@node Floating-Point Limits, , Integer Representation Limits, Representation Limits
@section Floating-Point Limits
@cindex floating-point number representation
@cindex representation, floating-point number
@cindex limits, floating-point representation

Because floating-point numbers are represented internally as approximate
quantities, algorithms for manipulating floating-point data often need
to be parameterized in terms of the accuracy of the representation.
Some of the functions in the C library itself need this information; for
example, the algorithms for printing and reading floating-point numbers
(@pxref{I/O on Streams}) and for calculating trigonometric and
irrational functions (@pxref{Mathematics}) use information about the
underlying floating-point representation to avoid round-off error and
loss of accuracy. User programs that implement numerical analysis
techniques also often need to be parameterized in this way in order to
minimize or compute error bounds.

The specific representation of floating-point numbers varies from
machine to machine. The GNU C library defines a set of parameters which
characterize each of the supported floating-point representations on a
particular system.

@menu
* Floating-Point Representation::  Definitions of terminology.
* Floating-Point Parameters::      Descriptions of the library
                                    facilities.
* IEEE Floating Point::            An example of a common
                                    representation.
@end menu

@node Floating-Point Representation, Floating-Point Parameters, , Floating-Point Limits
@subsection Floating-Point Representation

This section introduces the terminology used to characterize the
representation of floating-point numbers.

You are probably already familiar with most of these concepts in terms
of scientific or exponential notation for floating-point numbers. For
example, the number @code{123456.0} could be expressed in exponential
notation as @code{1.23456e+05}, a shorthand notation indicating that the
mantissa @code{1.23456} is multiplied by the base @code{10} raised to
the power @code{5}.

More formally, the internal representation of a floating-point number
can be characterized in terms of the following parameters:

@itemize @bullet
@item
The @dfn{sign} is either @code{-1} or @code{1}.
@cindex sign (of floating-point number)

@item
The @dfn{base} or @dfn{radix} for exponentiation; an integer greater
than @code{1}. This is a constant for the particular representation.
@cindex base (of floating-point number)
@cindex radix (of floating-point number)

@item
The @dfn{exponent} to which the base is raised. The upper and lower
bounds of the exponent value are constants for the particular
representation.
@cindex exponent (of floating-point number)

Sometimes, in the actual bits representing the floating-point number,
the exponent is @dfn{biased} by adding a constant to it, to make it
always be represented as an unsigned quantity. This is only important
if you have some reason to pick apart the bit fields making up the
floating-point number by hand, which is something for which the GNU
library provides no support. So this is ignored in the discussion that
follows.
@cindex bias (of floating-point number exponent)

@item
The value of the @dfn{mantissa} or @dfn{significand}, which is an
unsigned integer.
@cindex mantissa (of floating-point number)
@cindex significand (of floating-point number)

@item
The @dfn{precision} of the mantissa. If the base of the representation
is @var{b}, then the precision is the number of base-@var{b} digits in
the mantissa. This is a constant for the particular representation.

Many floating-point representations have an implicit @dfn{hidden bit} in
the mantissa. Any such hidden bits are counted in the precision.
Again, the GNU library provides no facilities for dealing with such low-level
aspects of the representation.
@cindex precision (of floating-point number)
@cindex hidden bit (of floating-point number mantissa)
@end itemize

The mantissa of a floating-point number actually represents an implicit
fraction whose denominator is the base raised to the power of the
precision. Since the largest representable mantissa is one less than
this denominator, the value of the fraction is always strictly less than
@code{1}. The mathematical value of a floating-point number is then the
product of this fraction, the sign, and the base raised to the exponent.

If the floating-point number is @dfn{normalized}, the mantissa is also
greater than or equal to the base raised to the power of one less
than the precision (unless the number represents a floating-point zero,
in which case the mantissa is zero). The fractional quantity is
therefore greater than or equal to @code{1/@var{b}}, where @var{b} is
the base.
@cindex normalized floating-point number

@node Floating-Point Parameters, IEEE Floating Point, Floating-Point Representation, Floating-Point Limits
@subsection Floating-Point Parameters

@strong{Incomplete:} This section needs some more concrete examples
of what these parameters mean and how to use them in a program.

These macro definitions can be accessed by including the header file
@file{float.h} in your program.
@pindex float.h

Macro names starting with @samp{FLT_} refer to the @code{float} type,
while names beginning with @samp{DBL_} refer to the @code{double} type
and names beginning with @samp{LDBL_} refer to the @code{long double}
type. (In implementations that do not support @code{long double} as
a distinct data type, the values for those constants are the same
as the corresponding constants for the @code{double} type.)@refill
@cindex @code{float} representation limits
@cindex @code{double} representation limits
@cindex @code{long double} representation limits

Of these macros, only @code{FLT_RADIX} is guaranteed to be a constant
expression. The other macros listed here cannot be reliably used in
places that require constant expressions, such as @samp{#if}
preprocessing directives or array size specifications.

Although the ANSI C standard specifies minimum and maximum values for
most of these parameters, the GNU C implementation uses whatever
floating-point representations are supported by the underlying hardware.
So whether GNU C actually satisfies the ANSI C requirements depends on
what machine it is running on.

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_ROUNDS
This value characterizes the rounding mode for floating-point addition.
The following values indicate standard rounding modes:

@table @code
@item -1
The mode is indeterminable.
@item 0
Rounding is towards zero.
@item 1
Rounding is to the nearest number.
@item 2
Rounding is towards positive infinity.
@item 3
Rounding is towards negative infinity.
@end table

@noindent
Any other value represents a machine-dependent nonstandard rounding
mode.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_RADIX
This is the value of the base, or radix, of exponent representation.
This is guaranteed to be a constant expression, unlike the other macros
described in this section.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{float} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{double} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MANT_DIG
This is the number of base-@code{FLT_RADIX} digits in the floating-point
mantissa for the @code{long double} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_DIG
This is the number of decimal digits of precision for the @code{float}
data type. Technically, if @var{p} and @var{b} are the precision and
base (respectively) for the representation, then the decimal precision
@var{q} is the maximum number of decimal digits such that any floating
point number with @var{q} base 10 digits can be rounded to a floating
point number with @var{p} base @var{b} digits and back again, without
change to the @var{q} decimal digits.

The value of this macro is guaranteed to be at least @code{6}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{double} data
type. The value of this macro is guaranteed to be at least @code{10}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_DIG
This is similar to @code{FLT_DIG}, but is for the @code{long double}
data type. The value of this macro is guaranteed to be at least
@code{10}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MIN_EXP
This is the minimum negative integer such that the mathematical value
@code{FLT_RADIX} raised to one less than this power can be represented
as a normalized floating-point number of type @code{float}. In terms of
the actual implementation, this is just the smallest value that can be
represented in the exponent field of the number.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{double} data
type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MIN_EXP
This is similar to @code{FLT_MIN_EXP}, but is for the @code{long double}
data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MIN_10_EXP
This is the minimum negative integer such that the mathematical value
@code{10} raised to this power can be represented as a
normalized floating-point number of type @code{float}. This is
guaranteed to be no greater than @code{-37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{double}
data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MIN_10_EXP
This is similar to @code{FLT_MIN_10_EXP}, but is for the @code{long
double} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MAX_EXP
This is the maximum integer such that the mathematical value
@code{FLT_RADIX} raised to one less than this power can be represented
as a floating-point number of type @code{float}. In terms of the actual
implementation, this is just the largest value that can be represented
in the exponent field of the number.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{double} data
type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MAX_EXP
This is similar to @code{FLT_MAX_EXP}, but is for the @code{long double}
data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int FLT_MAX_10_EXP
This is the maximum integer such that the mathematical value
@code{10} raised to this power is within the range of finite
floating-point numbers of type @code{float}. This is
guaranteed to be at least @code{37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int DBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{double}
data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro int LDBL_MAX_10_EXP
This is similar to @code{FLT_MAX_10_EXP}, but is for the @code{long
double} data type.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro float FLT_MAX
The value of this macro is the maximum representable floating-point
number of type @code{float}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro double DBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{double}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_MAX
The value of this macro is the maximum representable floating-point
number of type @code{long double}, and is guaranteed to be at least
@code{1E+37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro float FLT_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{float}, and is
guaranteed to be no more than @code{1E-37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro double DBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{double}, and
is guaranteed to be no more than @code{1E-37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_MIN
The value of this macro is the minimum normalized positive
floating-point number that is representable by type @code{long double},
and is guaranteed to be no more than @code{1E-37}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro float FLT_EPSILON
This is the difference between @code{1} and the smallest floating-point
number of type @code{float} that is greater than @code{1}, so that
@code{1.0 + FLT_EPSILON != 1.0} is true. It's guaranteed to
be no greater than @code{1E-5}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro double DBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{double}
type. It's guaranteed to be no greater than @code{1E-9}.
@end deftypevr

@comment float.h
@comment ANSI
@deftypevr Macro {long double} LDBL_EPSILON
This is similar to @code{FLT_EPSILON}, but is for the @code{long double}
type. It's guaranteed to be no greater than @code{1E-9}.
@end deftypevr

@node IEEE Floating Point, , Floating-Point Parameters, Floating-Point Limits
@subsection IEEE Floating Point
@cindex IEEE floating-point representation
@cindex floating-point, IEEE
@cindex IEEE Std 754

Here is an example showing how these parameters work for a common
floating-point representation, specified by the @cite{IEEE Standard for
Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)}. Nearly
all computers today use this format.

The IEEE single-precision float representation uses a base of 2. There
is a sign bit, a mantissa with 23 bits plus one hidden bit (so the total
precision is 24 base-2 digits), and an 8-bit exponent that can represent
values in the range -125 to 128, inclusive.

So, for an implementation that uses this representation for the
@code{float} data type, appropriate values for the corresponding
parameters are:

@example
FLT_RADIX                             2
FLT_MANT_DIG                         24
FLT_DIG                               6
FLT_MIN_EXP                        -125
FLT_MIN_10_EXP                      -37
FLT_MAX_EXP                         128
FLT_MAX_10_EXP                      +38
FLT_MIN                 1.17549435E-38F
FLT_MAX                 3.40282347E+38F
FLT_EPSILON             1.19209290E-07F
@end example

Here are the values for the @code{double} data type:

@example
DBL_MANT_DIG                         53
DBL_DIG                              15
DBL_MIN_EXP                       -1021
DBL_MIN_10_EXP                     -307
DBL_MAX_EXP                        1024
DBL_MAX_10_EXP                      308
DBL_MAX         1.7976931348623157E+308
DBL_MIN         2.2250738585072014E-308
DBL_EPSILON     2.2204460492503131E-016
@end example
