mirror of
https://sourceware.org/git/glibc.git
synced 2024-11-24 14:00:30 +00:00
30aa57851a
2006-09-12 Jakub Jelinek <jakub@redhat.com> [BZ #2526] * README.libm: Fix a thinko in sqrt algorithm description. [BZ #3143] * manual/string.texi (argz_delete): Fix prototype. Patch by <alpt@freaknet.org>. 2006-08-26 Joseph Myers <joseph@codesourcery.com> [BZ #3138] * io/test-lfs.c (do_prepare): Give name_len type size_t. * io/tst-fcntl.c (do_prepare): Likewise. * posix/tst-exec.c (do_prepare): Likewise. * posix/tst-preadwrite.c (do_prepare): Likewise. * posix/tst-spawn.c (do_prepare): Likewise. * posix/tst-truncate.c (do_prepare): Likewise. * rt/tst-aio.c (do_prepare): Likewise. * rt/tst-aio64.c (do_prepare): Likewise. * stdlib/test-canon2.c (do_prepare): Give test_dir_len type size_t.
2675 lines
105 KiB
Plaintext
@node String and Array Utilities, Character Set Handling, Character Handling, Top
|
|
@c %MENU% Utilities for copying and comparing strings and arrays
|
|
@chapter String and Array Utilities
|
|
|
|
Operations on strings (or arrays of characters) are an important part of
|
|
many programs. The GNU C library provides an extensive set of string
|
|
utility functions, including functions for copying, concatenating,
|
|
comparing, and searching strings. Many of these functions can also
|
|
operate on arbitrary regions of storage; for example, the @code{memcpy}
|
|
function can be used to copy the contents of any kind of array.
|
|
|
|
It's fairly common for beginning C programmers to ``reinvent the wheel''
|
|
by duplicating this functionality in their own code, but it pays to
|
|
become familiar with the library functions and to make use of them,
|
|
since this offers benefits in maintenance, efficiency, and portability.
|
|
|
|
For instance, you could easily compare one string to another in two
|
|
lines of C code, but if you use the built-in @code{strcmp} function,
|
|
you're less likely to make a mistake. And, since these library
|
|
functions are typically highly optimized, your program may run faster
|
|
too.
|
|
|
|
@menu
* Representation of Strings::   Introduction to basic concepts.
* String/Array Conventions::    Whether to use a string function or an
                                 arbitrary array function.
* String Length::               Determining the length of a string.
* Copying and Concatenation::   Functions to copy the contents of strings
                                 and arrays.
* String/Array Comparison::     Functions for byte-wise and character-wise
                                 comparison.
* Collation Functions::         Functions for collating strings.
* Search Functions::            Searching for a specific element or substring.
* Finding Tokens in a String::  Splitting a string into tokens by looking
                                 for delimiters.
* strfry::                      Function for flash-cooking a string.
* Trivial Encryption::          Obscuring data.
* Encode Binary Data::          Encoding and Decoding of Binary Data.
* Argz and Envz Vectors::       Null-separated string vectors.
@end menu
|
|
|
|
@node Representation of Strings
|
|
@section Representation of Strings
|
|
@cindex string, representation of
|
|
|
|
This section is a quick summary of string concepts for beginning C
|
|
programmers. It describes how character strings are represented in C
|
|
and some common pitfalls. If you are already familiar with this
|
|
material, you can skip this section.
|
|
|
|
@cindex string
|
|
@cindex multibyte character string
|
|
A @dfn{string} is an array of @code{char} objects. But string-valued
|
|
variables are usually declared to be pointers of type @code{char *}.
|
|
Such variables do not include space for the text of a string; that has
|
|
to be stored somewhere else---in an array variable, a string constant,
|
|
or dynamically allocated memory (@pxref{Memory Allocation}). It's up to
|
|
you to store the address of the chosen memory space into the pointer
|
|
variable. Alternatively you can store a @dfn{null pointer} in the
|
|
pointer variable. The null pointer does not point anywhere, so
|
|
attempting to reference the string it points to gets an error.
|
|
|
|
@cindex wide character string
|
|
The term ``string'' normally refers to multibyte character strings as
opposed to wide character strings. Wide character strings are arrays of
type @code{wchar_t}; as with multibyte character strings, they are
usually handled through pointers, here of type @code{wchar_t *}.
|
|
|
|
@cindex null character
|
|
@cindex null wide character
|
|
By convention, a @dfn{null character}, @code{'\0'}, marks the end of a
|
|
multibyte character string and the @dfn{null wide character},
|
|
@code{L'\0'}, marks the end of a wide character string. For example, in
|
|
testing to see whether the @code{char *} variable @var{p} points to a
|
|
null character marking the end of a string, you can write
|
|
@code{!*@var{p}} or @code{*@var{p} == '\0'}.
|
|
|
|
A null character is quite different conceptually from a null pointer,
|
|
although both are represented by the integer @code{0}.
|
|
|
|
@cindex string literal
|
|
@dfn{String literals} appear in C program source as strings of
characters between double-quote characters (@samp{"}). A wide string
literal is written the same way, except that the initial double-quote
character is immediately preceded by a capital @samp{L} (ell) character
(as in @code{L"foo"}). In @w{ISO C}, string literals can also be formed
by @dfn{string concatenation}: @code{"a" "b"} is the same as
@code{"ab"}. For wide character strings one can either use
@code{L"a" L"b"} or @code{L"a" "b"}. Modification of string literals is
not allowed by the GNU C compiler, because literals are placed in
read-only storage.
|
|
|
|
Character arrays that are declared @code{const} cannot be modified
|
|
either. It's generally good style to declare non-modifiable string
|
|
pointers to be of type @code{const char *}, since this often allows the
|
|
C compiler to detect accidental modifications as well as providing some
|
|
amount of documentation about what your program intends to do with the
|
|
string.
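As a minimal sketch of the two points above, the following code joins adjacent literals at compile time and uses @code{const char *} for text that must not be modified. The names @code{greeting} and @code{capitalized_greeting} are made up for this illustration and are not library functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to a string literal.
   The two adjacent literals are concatenated by the compiler.  */
const char *
greeting (void)
{
  return "hello, " "world";
}

/* Hypothetical helper: a literal must not be written to, so the
   text is first copied into the caller's writable BUFFER of SIZE
   bytes and only the copy is changed.  */
void
capitalized_greeting (char *buffer, size_t size)
{
  if (size == 0)
    return;
  strncpy (buffer, greeting (), size - 1);
  buffer[size - 1] = '\0';
  if (buffer[0] == 'h')
    buffer[0] = 'H';
}
```

Because @code{greeting} returns @code{const char *}, an attempt to assign through its result is rejected at compile time instead of failing when the program runs.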
|
|
|
|
The amount of memory allocated for the character array may extend past
|
|
the null character that normally marks the end of the string. In this
|
|
document, the term @dfn{allocated size} is always used to refer to the
|
|
total amount of memory allocated for the string, while the term
|
|
@dfn{length} refers to the number of characters up to (but not
|
|
including) the terminating null character.
|
|
@cindex length of string
|
|
@cindex allocation size of string
|
|
@cindex size of string
|
|
@cindex string length
|
|
@cindex string allocation
|
|
|
|
A notorious source of program bugs is trying to put more characters in a
|
|
string than fit in its allocated size. When writing code that extends
|
|
strings or moves characters into a pre-allocated array, you should be
|
|
very careful to keep track of the length of the text and make explicit
|
|
checks for overflowing the array. Many of the library functions
|
|
@emph{do not} do this for you! Remember also that you need to allocate
|
|
an extra byte to hold the null character that marks the end of the
|
|
string.
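The explicit check described above might look like the following sketch. The function @code{append_checked} is a hypothetical helper written for this illustration; it assumes that @code{dst} already holds a null-terminated string that fits in @code{dst_size} bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append SRC to the string in DST, whose
   allocated size is DST_SIZE bytes.  Returns 0 on success, or -1
   (leaving DST unchanged) if the result would not fit.  */
int
append_checked (char *dst, size_t dst_size, const char *src)
{
  size_t dst_len = strlen (dst);
  size_t src_len = strlen (src);

  /* One extra byte is needed for the terminating null character,
     so SRC fits only if it is strictly shorter than the room left.  */
  if (src_len >= dst_size - dst_len)
    return -1;

  /* Copy the text of SRC together with its null character.  */
  memcpy (dst + dst_len, src, src_len + 1);
  return 0;
}
```

The comparison is written so that the subtraction cannot wrap around as long as the assumption about @code{dst} holds.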
|
|
|
|
@cindex single-byte string
|
|
@cindex multibyte string
|
|
Originally strings were sequences of bytes where each byte represents a
|
|
single character. This is still true today if the strings are encoded
|
|
using a single-byte character encoding. Things are different if the
|
|
strings are encoded using a multibyte encoding (for more information on
|
|
encodings see @ref{Extended Char Intro}). There is no difference in
|
|
the programming interface for these two kinds of strings; the programmer
|
|
has to be aware of this and interpret the byte sequences accordingly.
|
|
|
|
But since there is no separate interface taking care of these
differences, the byte-based string functions are sometimes hard to use.
Since the count parameters of these functions specify bytes, a call to
@code{strncpy} could cut a multibyte character in the middle and put an
|
|
incomplete (and therefore unusable) byte sequence in the target buffer.
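The following sketch illustrates the problem under the assumption that the text is encoded in UTF-8, where the character @samp{é} occupies the two bytes @code{0xc3 0xa9}. The helper @code{copy_bytes} is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most N bytes of SRC into DST and
   terminate the result.  DST must have room for N + 1 bytes.
   The cut is made by byte count, with no regard for character
   boundaries.  Returns the length in bytes of the result.  */
size_t
copy_bytes (char *dst, const char *src, size_t n)
{
  strncpy (dst, src, n);
  dst[n] = '\0';
  return strlen (dst);
}
```

Calling @code{copy_bytes (out, "\xc3\xa9", 1)} leaves the single byte @code{0xc3} in @code{out}. That byte is only the first half of a two-byte sequence and does not represent any character.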
|
|
|
|
@cindex wide character string
|
|
To avoid these problems, later versions of the @w{ISO C} standard
introduce a second set of functions which operate on @dfn{wide
characters} (@pxref{Extended Char Intro}). These functions don't have
the problems the single-byte versions have, since every wide character is
a legal, interpretable value. This does not mean that cutting wide
character strings at arbitrary points is without problems. It normally
is safe for alphabet-based languages (except for non-normalized text), but
languages based on syllables still have the problem that more than one
wide character is necessary to complete a logical unit. This is a
higher level problem which the @w{C library} functions are not designed
to solve. But it is at least good that no invalid byte sequences can be
created. Also, higher level functions can operate much more easily on
wide characters than on multibyte characters, so a general piece of
advice is to use wide characters internally whenever text is more than
simply copied.
|
|
|
|
The remainder of this chapter will discuss the functions for handling
|
|
wide character strings in parallel with the discussion of the multibyte
|
|
character strings since there is almost always an exact equivalent
|
|
available.
|
|
|
|
@node String/Array Conventions
|
|
@section String and Array Conventions
|
|
|
|
This chapter describes both functions that work on arbitrary arrays or
|
|
blocks of memory, and functions that are specific to null-terminated
|
|
arrays of characters and wide characters.
|
|
|
|
Functions that operate on arbitrary blocks of memory have names
|
|
beginning with @samp{mem} and @samp{wmem} (such as @code{memcpy} and
|
|
@code{wmemcpy}) and invariably take an argument which specifies the size
|
|
(in bytes and wide characters respectively) of the block of memory to
|
|
operate on. The array arguments and return values for these functions
|
|
have type @code{void *} or @code{wchar_t *}. As a matter of style, the
|
|
elements of the arrays used with the @samp{mem} functions are referred
|
|
to as ``bytes''. You can pass any kind of pointer to these functions,
|
|
and the @code{sizeof} operator is useful in computing the value for the
|
|
size argument. Parameters to the @samp{wmem} functions must be of type
|
|
@code{wchar_t *}. These functions are not really usable with anything
|
|
but arrays of this type.
|
|
|
|
In contrast, functions that operate specifically on strings and wide
|
|
character strings have names beginning with @samp{str} and @samp{wcs}
|
|
respectively (such as @code{strcpy} and @code{wcscpy}) and look for a
|
|
null character to terminate the string instead of requiring an explicit
|
|
size argument to be passed. (Some of these functions accept a specified
|
|
maximum length, but they also check for premature termination with a
|
|
null character.) The array arguments and return values for these
|
|
functions have type @code{char *} and @code{wchar_t *} respectively, and
|
|
the array elements are referred to as ``characters'' and ``wide
|
|
characters''.
|
|
|
|
In many cases, there are both @samp{mem} and @samp{str}/@samp{wcs}
|
|
versions of a function. The one that is more appropriate to use depends
|
|
on the exact situation. When your program is manipulating arbitrary
|
|
arrays or blocks of storage, then you should always use the @samp{mem}
|
|
functions. On the other hand, when you are manipulating null-terminated
|
|
strings it is usually more convenient to use the @samp{str}/@samp{wcs}
|
|
functions, unless you already know the length of the string in advance.
|
|
The @samp{wmem} functions should be used for wide character arrays with
|
|
known size.
|
|
|
|
@cindex wint_t
|
|
@cindex parameter promotion
|
|
Some of the memory and string functions take single characters as
|
|
arguments. Since a value of type @code{char} is automatically promoted
to a value of type @code{int} when used as a parameter, the functions
are declared with @code{int} as the type of the parameter in question.
For the wide character functions the situation is similar: the
parameter type for a single wide character is @code{wint_t} and not
@code{wchar_t}. For many implementations this would not be necessary,
since @code{wchar_t} is large enough not to be promoted automatically,
but since the @w{ISO C} standard does not require such a choice of
types, the @code{wint_t} type is used.
|
|
|
|
@node String Length
|
|
@section String Length
|
|
|
|
You can get the length of a string using the @code{strlen} function.
|
|
This function is declared in the header file @file{string.h}.
|
|
@pindex string.h
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun size_t strlen (const char *@var{s})
|
|
The @code{strlen} function returns the length of the null-terminated
|
|
string @var{s} in bytes. (In other words, it returns the offset of the
|
|
terminating null character within the array.)
|
|
|
|
For example,
|
|
@smallexample
strlen ("hello, world")
    @result{} 12
@end smallexample
|
|
|
|
When applied to a character array, the @code{strlen} function returns
|
|
the length of the string stored there, not its allocated size. You can
|
|
get the allocated size of the character array that holds a string using
|
|
the @code{sizeof} operator:
|
|
|
|
@smallexample
char string[32] = "hello, world";
sizeof (string)
    @result{} 32
strlen (string)
    @result{} 12
@end smallexample
|
|
|
|
But beware, this will not work unless @var{string} is the character
|
|
array itself, not a pointer to it. For example:
|
|
|
|
@smallexample
char string[32] = "hello, world";
char *ptr = string;
sizeof (string)
    @result{} 32
sizeof (ptr)
    @result{} 4  /* @r{(on a machine with 4 byte pointers)} */
@end smallexample
|
|
|
|
This is an easy mistake to make when you are working with functions that
|
|
take string arguments; those arguments are always pointers, not arrays.
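A minimal sketch of this pitfall and of the usual remedy follows. Both function names are hypothetical. The second function assumes that the caller passes the allocated size of the array explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: S arrives as a pointer, so sizeof reports
   the size of the pointer, not of the caller's array.  */
size_t
size_seen_by_callee (const char *s)
{
  return sizeof (s);
}

/* Hypothetical helper: the allocated size is passed explicitly.
   Returns the number of bytes still unused in the array, not
   counting the one reserved for the null character.  */
size_t
room_left (const char *s, size_t allocated_size)
{
  return allocated_size - strlen (s) - 1;
}
```

A caller that owns the array can write @code{room_left (string, sizeof (string))}, because at that point @code{string} still names the array itself.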
|
|
|
|
Note also that for multibyte encoded strings the return value does not
have to correspond to the number of characters in the string. To get
this value, either convert the string to wide characters and use
@code{wcslen}, or use something like the following code:
|
|
|
|
@smallexample
/* @r{The input is in @code{string}.}
   @r{The length is expected in @code{n}.} */
@{
  mbstate_t t;
  const char *scopy = string;
  /* In initial state. */
  memset (&t, '\0', sizeof (t));
  /* Determine number of characters. */
  n = mbsrtowcs (NULL, &scopy, strlen (scopy), &t);
@}
@end smallexample
|
|
|
|
This is cumbersome to do, so if the number of characters (as opposed to
bytes) is needed often, it is better to work with wide characters.
|
|
@end deftypefun
|
|
|
|
The wide character equivalent is declared in @file{wchar.h}.
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun size_t wcslen (const wchar_t *@var{ws})
|
|
The @code{wcslen} function is the wide character equivalent to
|
|
@code{strlen}. The return value is the number of wide characters in the
|
|
wide character string pointed to by @var{ws} (this is also the offset of
|
|
the terminating null wide character of @var{ws}).
|
|
|
|
Since there are no multi wide character sequences making up one
|
|
character the return value is not only the offset in the array, it is
|
|
also the number of wide characters.
|
|
|
|
This function was introduced in @w{Amendment 1} to @w{ISO C90}.
|
|
@end deftypefun
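Because @code{wcslen} counts wide characters rather than bytes, the result has to be scaled by @code{sizeof (wchar_t)} when it is used to compute storage. The helper below is a hypothetical sketch of that calculation.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes needed to store the wide
   character string WS, including its terminating null wide
   character.  */
size_t
wide_storage_size (const wchar_t *ws)
{
  return (wcslen (ws) + 1) * sizeof (wchar_t);
}
```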
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun size_t strnlen (const char *@var{s}, size_t @var{maxlen})
|
|
The @code{strnlen} function returns the length of the string @var{s} in
|
|
bytes if this length is smaller than @var{maxlen} bytes. Otherwise it
|
|
returns @var{maxlen}. Therefore this function is equivalent to
|
|
@code{(strlen (@var{s}) < @var{maxlen} ? strlen (@var{s}) : @var{maxlen})} but it
|
|
is more efficient and works even if the string @var{s} is not
|
|
null-terminated.
|
|
|
|
@smallexample
char string[32] = "hello, world";
strnlen (string, 32)
    @result{} 12
strnlen (string, 5)
    @result{} 5
@end smallexample
|
|
|
|
This function is a GNU extension and is declared in @file{string.h}.
|
|
@end deftypefun
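One situation where @code{strnlen} is convenient is a fixed-width field that is null-terminated only when its contents are shorter than the field. The structure and function below are hypothetical and serve only as a sketch.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-width record field; it holds a terminating
   null character only when the name is shorter than the field.  */
struct record
{
  char name[8];
};

/* Length of the name in bytes; never reads past the field.  */
size_t
name_length (const struct record *r)
{
  return strnlen (r->name, sizeof r->name);
}
```

Using @code{strlen} here would read beyond the array whenever the name fills the whole field.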
|
|
|
|
@comment wchar.h
|
|
@comment GNU
|
|
@deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
|
|
@code{wcsnlen} is the wide character equivalent to @code{strnlen}. The
|
|
@var{maxlen} parameter specifies the maximum number of wide characters.
|
|
|
|
This function is a GNU extension and is declared in @file{wchar.h}.
|
|
@end deftypefun
|
|
|
|
@node Copying and Concatenation
|
|
@section Copying and Concatenation
|
|
|
|
You can use the functions described in this section to copy the contents
|
|
of strings and arrays, or to append the contents of one string to
|
|
another. The @samp{str} and @samp{mem} functions are declared in the
|
|
header file @file{string.h} while the @samp{wcs} and @samp{wmem}
|
|
functions are declared in the file @file{wchar.h}.
|
|
@pindex string.h
|
|
@pindex wchar.h
|
|
@cindex copying strings and arrays
|
|
@cindex string copy functions
|
|
@cindex array copy functions
|
|
@cindex concatenating strings
|
|
@cindex string concatenation functions
|
|
|
|
A helpful way to remember the ordering of the arguments to the functions
|
|
in this section is that it corresponds to an assignment expression, with
|
|
the destination array specified to the left of the source array. Most
of these functions return the address of the destination array.
|
|
|
|
Most of these functions do not work properly if the source and
|
|
destination arrays overlap. For example, if the beginning of the
|
|
destination array overlaps the end of the source array, the original
|
|
contents of that part of the source array may get overwritten before it
|
|
is copied. Even worse, in the case of the string functions, the null
|
|
character marking the end of the string may be lost, and the copy
|
|
function might get stuck in a loop trashing all the memory allocated to
|
|
your program.
|
|
|
|
All functions that have problems copying between overlapping arrays are
|
|
explicitly identified in this manual. In addition to functions in this
|
|
section, there are a few others like @code{sprintf} (@pxref{Formatted
|
|
Output Functions}) and @code{scanf} (@pxref{Formatted Input
|
|
Functions}).
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {void *} memcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
|
|
The @code{memcpy} function copies @var{size} bytes from the object
|
|
beginning at @var{from} into the object beginning at @var{to}. The
|
|
behavior of this function is undefined if the two arrays @var{to} and
|
|
@var{from} overlap; use @code{memmove} instead if overlapping is possible.
|
|
|
|
The value returned by @code{memcpy} is the value of @var{to}.
|
|
|
|
Here is an example of how you might use @code{memcpy} to copy the
|
|
contents of an array:
|
|
|
|
@smallexample
struct foo *oldarray, *newarray;
int arraysize;
@dots{}
memcpy (newarray, oldarray, arraysize * sizeof (struct foo));
@end smallexample
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wmemcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
|
|
The @code{wmemcpy} function copies @var{size} wide characters from the object
|
|
beginning at @var{wfrom} into the object beginning at @var{wto}. The
|
|
behavior of this function is undefined if the two arrays @var{wto} and
|
|
@var{wfrom} overlap; use @code{wmemmove} instead if overlapping is possible.
|
|
|
|
The following is a possible implementation of @code{wmemcpy} but there
|
|
are more optimizations possible.
|
|
|
|
@smallexample
wchar_t *
wmemcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
         size_t size)
@{
  return (wchar_t *) memcpy (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample
|
|
|
|
The value returned by @code{wmemcpy} is the value of @var{wto}.
|
|
|
|
This function was introduced in @w{Amendment 1} to @w{ISO C90}.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {void *} mempcpy (void *restrict @var{to}, const void *restrict @var{from}, size_t @var{size})
|
|
The @code{mempcpy} function is nearly identical to the @code{memcpy}
|
|
function. It copies @var{size} bytes from the object beginning at
|
|
@var{from} into the object pointed to by @var{to}. But instead of
|
|
returning the value of @var{to} it returns a pointer to the byte
|
|
following the last written byte in the object beginning at @var{to}.
|
|
I.e., the value is @code{((void *) ((char *) @var{to} + @var{size}))}.
|
|
|
|
This function is useful in situations where a number of objects shall be
|
|
copied to consecutive memory positions.
|
|
|
|
@smallexample
void *
combine (void *o1, size_t s1, void *o2, size_t s2)
@{
  void *result = malloc (s1 + s2);
  if (result != NULL)
    mempcpy (mempcpy (result, o1, s1), o2, s2);
  return result;
@}
@end smallexample
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment GNU
|
|
@deftypefun {wchar_t *} wmempcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
|
|
The @code{wmempcpy} function is nearly identical to the @code{wmemcpy}
|
|
function. It copies @var{size} wide characters from the object
|
|
beginning at @var{wfrom} into the object pointed to by @var{wto}. But
|
|
instead of returning the value of @var{wto} it returns a pointer to the
|
|
wide character following the last written wide character in the object
|
|
beginning at @var{wto}. I.e., the value is @code{@var{wto} + @var{size}}.
|
|
|
|
This function is useful in situations where a number of objects shall be
|
|
copied to consecutive memory positions.
|
|
|
|
The following is a possible implementation of @code{wmempcpy} but there
|
|
are more optimizations possible.
|
|
|
|
@smallexample
wchar_t *
wmempcpy (wchar_t *restrict wto, const wchar_t *restrict wfrom,
          size_t size)
@{
  return (wchar_t *) mempcpy (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {void *} memmove (void *@var{to}, const void *@var{from}, size_t @var{size})
|
|
@code{memmove} copies the @var{size} bytes at @var{from} into the
|
|
@var{size} bytes at @var{to}, even if those two blocks of space
|
|
overlap. In the case of overlap, @code{memmove} is careful to copy the
|
|
original values of the bytes in the block at @var{from}, including those
|
|
bytes which also belong to the block at @var{to}.
|
|
|
|
The value returned by @code{memmove} is the value of @var{to}.
|
|
@end deftypefun
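A typical case of overlapping blocks is shifting text within a single buffer. The helper @code{drop_prefix} below is a hypothetical sketch of that use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove the first N bytes of the string S in
   place and return S.  Source and destination overlap, so memmove
   is required; memcpy would have undefined behavior here.  */
char *
drop_prefix (char *s, size_t n)
{
  size_t len = strlen (s);
  if (n > len)
    n = len;
  /* Move the remaining bytes and the terminating null character.  */
  return memmove (s, s + n, len - n + 1);
}
```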
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wmemmove (wchar_t *@var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
|
|
@code{wmemmove} copies the @var{size} wide characters at @var{wfrom}
|
|
into the @var{size} wide characters at @var{wto}, even if those two
|
|
blocks of space overlap. In the case of overlap, @code{wmemmove} is
|
|
careful to copy the original values of the wide characters in the block
|
|
at @var{wfrom}, including those wide characters which also belong to the
|
|
block at @var{wto}.
|
|
|
|
The following is a possible implementation of @code{wmemmove} but there
are more optimizations possible.

@smallexample
wchar_t *
wmemmove (wchar_t *wto, const wchar_t *wfrom, size_t size)
@{
  return (wchar_t *) memmove (wto, wfrom, size * sizeof (wchar_t));
@}
@end smallexample
|
|
|
|
The value returned by @code{wmemmove} is the value of @var{wto}.
|
|
|
|
This function was introduced in @w{Amendment 1} to @w{ISO C90}.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment SVID
|
|
@deftypefun {void *} memccpy (void *restrict @var{to}, const void *restrict @var{from}, int @var{c}, size_t @var{size})
|
|
This function copies no more than @var{size} bytes from @var{from} to
|
|
@var{to}, stopping if a byte matching @var{c} is found. The return
|
|
value is a pointer into @var{to} one byte past where @var{c} was copied,
|
|
or a null pointer if no byte matching @var{c} appeared in the first
|
|
@var{size} bytes of @var{from}.
|
|
@end deftypefun
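As a sketch of how @code{memccpy} might be used, the hypothetical helper below copies the text that precedes the first @samp{:} of a string. Since @code{memccpy} does not stop at a null character by itself, the helper limits the count to the length of the source. It assumes that @code{dst_size} is at least 1.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the text of SRC up to the first ':'
   into DST (DST_SIZE bytes, at least 1) and null-terminate it.
   Returns 1 if a ':' was found within the bytes copied, and 0 if
   the copy stopped because DST was full or SRC ended.  */
int
copy_field (char *dst, size_t dst_size, const char *src)
{
  /* Never read past the end of SRC or write past the end of DST.  */
  size_t limit = strlen (src);
  if (limit > dst_size - 1)
    limit = dst_size - 1;

  char *end = memccpy (dst, src, ':', limit);
  if (end != NULL)
    {
      /* END points one past the copied ':'; overwrite the ':'.  */
      end[-1] = '\0';
      return 1;
    }
  dst[limit] = '\0';
  return 0;
}
```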
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {void *} memset (void *@var{block}, int @var{c}, size_t @var{size})
|
|
This function copies the value of @var{c} (converted to an
|
|
@code{unsigned char}) into each of the first @var{size} bytes of the
|
|
object beginning at @var{block}. It returns the value of @var{block}.
|
|
@end deftypefun
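Two common uses of @code{memset} are clearing an object and filling a character array with one value. The structure and helpers below are hypothetical and only sketch these uses.

```c
#include <string.h>

/* Hypothetical structure used only for this illustration.  */
struct counters
{
  unsigned long hits;
  unsigned long misses;
  char label[16];
};

void
reset_counters (struct counters *c)
{
  /* Set every byte of the object to zero.  */
  memset (c, 0, sizeof *c);
}

void
fill_label (struct counters *c, char ch)
{
  /* Fill all but the last byte, then store the null character.  */
  memset (c->label, ch, sizeof c->label - 1);
  c->label[sizeof c->label - 1] = '\0';
}
```

Note that the size argument counts bytes, so @code{sizeof} applied to the object itself gives the right value.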
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wmemset (wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
|
|
This function copies the value of @var{wc} into each of the first
|
|
@var{size} wide characters of the object beginning at @var{block}. It
|
|
returns the value of @var{block}.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strcpy (char *restrict @var{to}, const char *restrict @var{from})
|
|
This copies characters from the string @var{from} (up to and including
|
|
the terminating null character) into the string @var{to}. Like
|
|
@code{memcpy}, this function has undefined results if the strings
|
|
overlap. The return value is the value of @var{to}.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcscpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
|
|
This copies wide characters from the string @var{wfrom} (up to and
|
|
including the terminating null wide character) into the string
|
|
@var{wto}. Like @code{wmemcpy}, this function has undefined results if
|
|
the strings overlap. The return value is the value of @var{wto}.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
|
|
This function is similar to @code{strcpy} but always copies exactly
|
|
@var{size} characters into @var{to}.
|
|
|
|
If the length of @var{from} is more than @var{size}, then @code{strncpy}
|
|
copies just the first @var{size} characters. Note that in this case
|
|
there is no null terminator written into @var{to}.
|
|
|
|
If the length of @var{from} is less than @var{size}, then @code{strncpy}
|
|
copies all of @var{from}, followed by enough null characters to add up
|
|
to @var{size} characters in all. This behavior is rarely useful, but it
|
|
is specified by the @w{ISO C} standard.
|
|
|
|
The behavior of @code{strncpy} is undefined if the strings overlap.
|
|
|
|
Using @code{strncpy} as opposed to @code{strcpy} is a way to avoid bugs
|
|
relating to writing past the end of the allocated space for @var{to}.
|
|
However, it can also make your program much slower in one common case:
|
|
copying a string which is probably small into a potentially large buffer.
|
|
In this case, @var{size} may be large, and when it is, @code{strncpy} will
|
|
waste a considerable amount of time copying null characters.
|
|
@end deftypefun
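Since @code{strncpy} writes no null terminator when the source is too long, callers usually add one themselves. The hypothetical helper below sketches this pattern and assumes that @code{dst_size} is at least 1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy SRC into DST (DST_SIZE bytes, at least
   1), truncating if necessary, and always null-terminate.  Returns
   1 if the whole string fit and 0 if it was truncated.  */
int
copy_truncating (char *dst, size_t dst_size, const char *src)
{
  strncpy (dst, src, dst_size - 1);
  /* strncpy writes no terminator when SRC is too long, so add one.  */
  dst[dst_size - 1] = '\0';
  return strlen (src) < dst_size;
}
```

When the source is short, the call still writes @code{dst_size - 1} bytes, because the remainder of the destination is filled with null characters.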
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcsncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
|
|
This function is similar to @code{wcscpy} but always copies exactly
|
|
@var{size} wide characters into @var{wto}.
|
|
|
|
If the length of @var{wfrom} is more than @var{size}, then
|
|
@code{wcsncpy} copies just the first @var{size} wide characters. Note
|
|
that in this case there is no null terminator written into @var{wto}.
|
|
|
|
If the length of @var{wfrom} is less than @var{size}, then
|
|
@code{wcsncpy} copies all of @var{wfrom}, followed by enough null wide
|
|
characters to add up to @var{size} wide characters in all. This
|
|
behavior is rarely useful, but it is specified by the @w{ISO C}
|
|
standard.
|
|
|
|
The behavior of @code{wcsncpy} is undefined if the strings overlap.
|
|
|
|
Using @code{wcsncpy} as opposed to @code{wcscpy} is a way to avoid bugs
|
|
relating to writing past the end of the allocated space for @var{wto}.
|
|
However, it can also make your program much slower in one common case:
|
|
copying a string which is probably small into a potentially large buffer.
|
|
In this case, @var{size} may be large, and when it is, @code{wcsncpy} will
|
|
waste a considerable amount of time copying null wide characters.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment SVID
|
|
@deftypefun {char *} strdup (const char *@var{s})
|
|
This function copies the null-terminated string @var{s} into a newly
|
|
allocated string. The string is allocated using @code{malloc}; see
|
|
@ref{Unconstrained Allocation}. If @code{malloc} cannot allocate space
|
|
for the new string, @code{strdup} returns a null pointer. Otherwise it
|
|
returns a pointer to the new string.
|
|
@end deftypefun
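A copy made with @code{strdup} is useful when a string has to be modified without touching the original. The helper @code{upper_copy} is a hypothetical sketch; the caller is assumed to release the result with @code{free}.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated upper-case copy of
   S, or a null pointer if memory is exhausted.  */
char *
upper_copy (const char *s)
{
  char *copy = strdup (s);
  if (copy == NULL)
    return NULL;
  for (char *p = copy; *p != '\0'; p++)
    *p = toupper ((unsigned char) *p);
  return copy;
}
```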
|
|
|
|
@comment wchar.h
|
|
@comment GNU
|
|
@deftypefun {wchar_t *} wcsdup (const wchar_t *@var{ws})
|
|
This function copies the null-terminated wide character string @var{ws}
|
|
into a newly allocated string. The string is allocated using
|
|
@code{malloc}; see @ref{Unconstrained Allocation}. If @code{malloc}
|
|
cannot allocate space for the new string, @code{wcsdup} returns a null
|
|
pointer. Otherwise it returns a pointer to the new wide character
|
|
string.
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {char *} strndup (const char *@var{s}, size_t @var{size})
|
|
This function is similar to @code{strdup} but always copies at most
|
|
@var{size} characters into the newly allocated string.
|
|
|
|
If the length of @var{s} is more than @var{size}, then @code{strndup}
|
|
copies just the first @var{size} characters and adds a closing null
|
|
terminator. Otherwise all characters are copied and the string is
|
|
terminated.
|
|
|
|
This function differs from @code{strncpy} in that it always
|
|
terminates the destination string.
|
|
|
|
@code{strndup} is a GNU extension.
|
|
@end deftypefun
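Because the result is always terminated, @code{strndup} is a convenient way to extract a prefix of a string. The helper @code{directory_part} is a hypothetical sketch, not a replacement for a real path-handling function; for a path such as @file{/file} it returns an empty string.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the part of
   PATH that precedes the last '/', or a copy of "." if PATH has no
   '/'.  Returns a null pointer if memory is exhausted.  */
char *
directory_part (const char *path)
{
  const char *slash = strrchr (path, '/');
  if (slash == NULL)
    return strdup (".");
  /* Copy only the bytes before the slash; strndup terminates it.  */
  return strndup (path, slash - path);
}
```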
|
|
|
|
@comment string.h
|
|
@comment Unknown origin
|
|
@deftypefun {char *} stpcpy (char *restrict @var{to}, const char *restrict @var{from})
|
|
This function is like @code{strcpy}, except that it returns a pointer to
|
|
the end of the string @var{to} (that is, the address of the terminating
|
|
null character @code{to + strlen (from)}) rather than the beginning.
|
|
|
|
For example, this program uses @code{stpcpy} to concatenate @samp{foo}
|
|
and @samp{bar} to produce @samp{foobar}, which it then prints.
|
|
|
|
@smallexample
|
|
@include stpcpy.c.texi
|
|
@end smallexample
|
|
|
|
This function is not part of the ISO or POSIX standards, and is not
|
|
customary on Unix systems, but we did not invent it either. Perhaps it
|
|
comes from MS-DOG.
|
|
|
|
Its behavior is undefined if the strings overlap. The function is
|
|
declared in @file{string.h}.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment GNU
|
|
@deftypefun {wchar_t *} wcpcpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
|
|
This function is like @code{wcscpy}, except that it returns a pointer to
|
|
the end of the string @var{wto} (that is, the address of the terminating
|
|
null wide character @code{wto + wcslen (wfrom)}) rather than the beginning.
|
|
|
|
This function is not part of ISO or POSIX but was found useful while
|
|
developing the GNU C Library itself.
|
|
|
|
The behavior of @code{wcpcpy} is undefined if the strings overlap.
|
|
|
|
@code{wcpcpy} is a GNU extension and is declared in @file{wchar.h}.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {char *} stpncpy (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
|
|
This function is similar to @code{stpcpy} but always copies exactly
|
|
@var{size} characters into @var{to}.
|
|
|
|
If the length of @var{from} is more than @var{size}, then @code{stpncpy}
|
|
copies just the first @var{size} characters and returns a pointer to the
|
|
character directly following the one which was copied last. Note that in
|
|
this case there is no null terminator written into @var{to}.
|
|
|
|
If the length of @var{from} is less than @var{size}, then @code{stpncpy}
|
|
copies all of @var{from}, followed by enough null characters to add up
|
|
to @var{size} characters in all.  This behavior is rarely useful, but it
is provided for contexts that rely on the corresponding behavior of
@code{strncpy}.  @code{stpncpy} returns a pointer to the
|
|
@emph{first} written null character.
|
|
|
|
This function is not part of ISO or POSIX but was found useful while
|
|
developing the GNU C Library itself.
|
|
|
|
Its behavior is undefined if the strings overlap. The function is
|
|
declared in @file{string.h}.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment GNU
|
|
@deftypefun {wchar_t *} wcpncpy (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
|
|
This function is similar to @code{wcpcpy} but always copies exactly
@var{size} wide characters into @var{wto}.
|
|
|
|
If the length of @var{wfrom} is more than @var{size}, then
|
|
@code{wcpncpy} copies just the first @var{size} wide characters and
|
|
returns a pointer to the wide character directly following the last
wide character that was copied.  Note that in this case
|
|
there is no null terminator written into @var{wto}.
|
|
|
|
If the length of @var{wfrom} is less than @var{size}, then @code{wcpncpy}
|
|
copies all of @var{wfrom}, followed by enough null wide characters to add up
to @var{size} wide characters in all.  This behavior is rarely useful, but it
is provided for contexts that rely on the corresponding behavior of
@code{wcsncpy}.  @code{wcpncpy} returns a pointer to the
|
|
@emph{first} written null character.
|
|
|
|
This function is not part of ISO or POSIX but was found useful while
|
|
developing the GNU C Library itself.
|
|
|
|
Its behavior is undefined if the strings overlap.
|
|
|
|
@code{wcpncpy} is a GNU extension and is declared in @file{wchar.h}.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefn {Macro} {char *} strdupa (const char *@var{s})
|
|
This macro is similar to @code{strdup} but allocates the new string
|
|
using @code{alloca} instead of @code{malloc} (@pxref{Variable Size
|
|
Automatic}). This means of course the returned string has the same
|
|
limitations as any block of memory allocated using @code{alloca}.
|
|
|
|
For obvious reasons @code{strdupa} is implemented only as a macro;
|
|
you cannot get the address of this function. Despite this limitation
|
|
it is a useful function. The following code shows a situation where
|
|
using @code{malloc} would be a lot more expensive.
|
|
|
|
@smallexample
|
|
@include strdupa.c.texi
|
|
@end smallexample
|
|
|
|
Please note that calling @code{strtok} using @var{path} directly is
|
|
invalid. It is also not allowed to call @code{strdupa} in the argument
|
|
list of @code{strtok} since @code{strdupa} uses @code{alloca}
|
|
(@pxref{Variable Size Automatic}), which can interfere with the parameter
|
|
passing.
|
|
|
|
This macro is only available if GNU CC is used.
|
|
@end deftypefn
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefn {Macro} {char *} strndupa (const char *@var{s}, size_t @var{size})
|
|
This macro is similar to @code{strndup} but like @code{strdupa} it
allocates the new string using @code{alloca}
(@pxref{Variable Size Automatic}).  The same advantages and limitations
of @code{strdupa} are valid for @code{strndupa}, too.
|
|
|
|
This function is implemented only as a macro, just like @code{strdupa}.
|
|
Just as @code{strdupa} this macro also must not be used inside the
|
|
parameter list in a function call.
|
|
|
|
@code{strndupa} is only available if GNU CC is used.
|
|
@end deftypefn
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strcat (char *restrict @var{to}, const char *restrict @var{from})
|
|
The @code{strcat} function is similar to @code{strcpy}, except that the
|
|
characters from @var{from} are concatenated or appended to the end of
|
|
@var{to}, instead of overwriting it. That is, the first character from
|
|
@var{from} overwrites the null character marking the end of @var{to}.
|
|
|
|
An equivalent definition for @code{strcat} would be:
|
|
|
|
@smallexample
|
|
char *
|
|
strcat (char *restrict to, const char *restrict from)
|
|
@{
|
|
strcpy (to + strlen (to), from);
|
|
return to;
|
|
@}
|
|
@end smallexample
|
|
|
|
This function has undefined results if the strings overlap.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcscat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom})
|
|
The @code{wcscat} function is similar to @code{wcscpy}, except that the
|
|
characters from @var{wfrom} are concatenated or appended to the end of
|
|
@var{wto}, instead of overwriting it. That is, the first character from
|
|
@var{wfrom} overwrites the null character marking the end of @var{wto}.
|
|
|
|
An equivalent definition for @code{wcscat} would be:
|
|
|
|
@smallexample
|
|
wchar_t *
|
|
wcscat (wchar_t *wto, const wchar_t *wfrom)
|
|
@{
|
|
wcscpy (wto + wcslen (wto), wfrom);
|
|
return wto;
|
|
@}
|
|
@end smallexample
|
|
|
|
This function has undefined results if the strings overlap.
|
|
@end deftypefun
|
|
|
|
Programmers using the @code{strcat} or @code{wcscat} function (or the
|
|
following @code{strncat} or @code{wcsncat} functions for that matter)
|
|
can easily be recognized as lazy and reckless. In almost all situations
|
|
the lengths of the participating strings are known (they had better be,
since how else can one ensure that the allocated size of the buffer is
sufficient?).  Or at least, one could know them if one keeps track of the
|
|
results of the various function calls. But then it is very inefficient
|
|
to use @code{strcat}/@code{wcscat}. A lot of time is wasted finding the
|
|
end of the destination string so that the actual copying can start.
|
|
This is a common example:
|
|
|
|
@cindex __va_copy
|
|
@cindex va_copy
|
|
@smallexample
|
|
/* @r{This function concatenates arbitrarily many strings. The last}
|
|
@r{parameter must be @code{NULL}.} */
|
|
char *
|
|
concat (const char *str, @dots{})
|
|
@{
|
|
va_list ap, ap2;
|
|
size_t total = 1;
|
|
const char *s;
|
|
char *result;
|
|
|
|
va_start (ap, str);
|
|
/* @r{Actually @code{va_copy}, but this is the name more gcc versions}
|
|
@r{understand.} */
|
|
__va_copy (ap2, ap);
|
|
|
|
/* @r{Determine how much space we need.} */
|
|
for (s = str; s != NULL; s = va_arg (ap, const char *))
|
|
total += strlen (s);
|
|
|
|
va_end (ap);
|
|
|
|
result = (char *) malloc (total);
|
|
if (result != NULL)
|
|
@{
|
|
result[0] = '\0';
|
|
|
|
/* @r{Copy the strings.} */
|
|
for (s = str; s != NULL; s = va_arg (ap2, const char *))
|
|
strcat (result, s);
|
|
@}
|
|
|
|
va_end (ap2);
|
|
|
|
return result;
|
|
@}
|
|
@end smallexample
|
|
|
|
This looks quite simple, especially the second loop where the strings
|
|
are actually copied. But these innocent lines hide a major performance
|
|
penalty. Just imagine that ten strings of 100 bytes each have to be
|
|
concatenated. For the second string we search the already stored 100
|
|
bytes for the end of the string so that we can append the next string.
|
|
For all strings in total, the comparisons necessary to find the end of
the intermediate results sum up to about 4500 (0 + 100 + @dots{} + 900)!
If we combine the copying with the memory allocation, we can write this
function more efficiently:
|
|
|
|
@smallexample
char *
concat (const char *str, @dots{})
@{
  va_list ap;
  size_t allocated = 100;
  char *result = (char *) malloc (allocated);

  if (result != NULL)
    @{
      char *newp;
      char *wp;
      const char *s;

      va_start (ap, str);

      wp = result;
      for (s = str; s != NULL; s = va_arg (ap, const char *))
        @{
          size_t len = strlen (s);

          /* @r{Resize the allocated memory if necessary.}  */
          if (wp + len + 1 > result + allocated)
            @{
              allocated = (allocated + len) * 2;
              newp = (char *) realloc (result, allocated);
              if (newp == NULL)
                @{
                  free (result);
                  va_end (ap);
                  return NULL;
                @}
              wp = newp + (wp - result);
              result = newp;
            @}

          wp = mempcpy (wp, s, len);
        @}

      /* @r{Terminate the result string.}  */
      *wp++ = '\0';

      /* @r{Resize memory to the optimal size.}  */
      newp = realloc (result, wp - result);
      if (newp != NULL)
        result = newp;

      va_end (ap);
    @}

  return result;
@}
@end smallexample
|
|
|
|
With a bit more knowledge about the input strings one could fine-tune
|
|
the memory allocation. The difference we are pointing to here is that
|
|
we don't use @code{strcat} anymore. We always keep track of the length
|
|
of the current intermediate result so we can save ourselves the search for the
end of the string and use @code{mempcpy}.  Please note that we also
don't use @code{stpcpy}, which might seem more natural since we are dealing
with strings.  But this is not necessary since we already know the
|
|
length of the string and therefore can use the faster memory copying
|
|
function. The example would work for wide characters the same way.
|
|
|
|
Whenever a programmer feels the need to use @code{strcat} she or he
|
|
should think twice and check whether the code can
be rewritten to take advantage of already calculated results.  Again: it
|
|
is almost always unnecessary to use @code{strcat}.
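
As a smaller illustration of the same idea, the following sketch joins
two path components using lengths that have to be computed anyway for the
allocation.  The function name @code{join_path} and the @samp{/}
separator are assumptions made for this example only.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated "DIR/FILE", or NULL
   if allocation fails.  The lengths computed for the allocation are
   reused for the copies, so nothing is scanned twice.  */
char *
join_path (const char *dir, const char *file)
{
  size_t dlen = strlen (dir);
  size_t flen = strlen (file);
  char *result = malloc (dlen + 1 + flen + 1);

  if (result == NULL)
    return NULL;

  memcpy (result, dir, dlen);
  result[dlen] = '/';
  /* Copy FILE including its terminating null character.  */
  memcpy (result + dlen + 1, file, flen + 1);
  return result;
}
```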
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strncat (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
|
|
This function is like @code{strcat} except that not more than @var{size}
|
|
characters from @var{from} are appended to the end of @var{to}. A
|
|
single null character is also always appended to @var{to}, so the total
|
|
allocated size of @var{to} must be at least @code{@var{size} + 1} bytes
|
|
longer than its initial length.
|
|
|
|
The @code{strncat} function could be implemented like this:
|
|
|
|
@smallexample
|
|
@group
|
|
char *
|
|
strncat (char *to, const char *from, size_t size)
|
|
@{
|
|
to[strlen (to) + size] = '\0';
|
|
strncpy (to + strlen (to), from, size);
|
|
return to;
|
|
@}
|
|
@end group
|
|
@end smallexample
|
|
|
|
The behavior of @code{strncat} is undefined if the strings overlap.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcsncat (wchar_t *restrict @var{wto}, const wchar_t *restrict @var{wfrom}, size_t @var{size})
|
|
This function is like @code{wcscat} except that not more than @var{size}
wide characters from @var{wfrom} are appended to the end of @var{wto}.  A
single null wide character is also always appended to @var{wto}, so the total
allocated size of @var{wto} must be at least @code{@var{size} + 1} wide
characters longer than its initial length.
|
|
|
|
The @code{wcsncat} function could be implemented like this:
|
|
|
|
@smallexample
|
|
@group
|
|
wchar_t *
|
|
wcsncat (wchar_t *restrict wto, const wchar_t *restrict wfrom,
|
|
size_t size)
|
|
@{
|
|
  wto[wcslen (wto) + size] = L'\0';
|
|
wcsncpy (wto + wcslen (wto), wfrom, size);
|
|
return wto;
|
|
@}
|
|
@end group
|
|
@end smallexample
|
|
|
|
The behavior of @code{wcsncat} is undefined if the strings overlap.
|
|
@end deftypefun
|
|
|
|
Here is an example showing the use of @code{strncpy} and @code{strncat}
|
|
(the wide character version is equivalent). Notice how, in the call to
|
|
@code{strncat}, the @var{size} parameter is computed to avoid
|
|
overflowing the character array @code{buffer}.
|
|
|
|
@smallexample
|
|
@include strncat.c.texi
|
|
@end smallexample
|
|
|
|
@noindent
|
|
The output produced by this program looks like:
|
|
|
|
@smallexample
|
|
hello
|
|
hello, wo
|
|
@end smallexample
|
|
|
|
@comment string.h
|
|
@comment BSD
|
|
@deftypefun void bcopy (const void *@var{from}, void *@var{to}, size_t @var{size})
|
|
This is a partially obsolete alternative for @code{memmove}, derived from
|
|
BSD. Note that it is not quite equivalent to @code{memmove}, because the
|
|
arguments are not in the same order and there is no return value.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment BSD
|
|
@deftypefun void bzero (void *@var{block}, size_t @var{size})
|
|
This is a partially obsolete alternative for @code{memset}, derived from
|
|
BSD. Note that it is not as general as @code{memset}, because the only
|
|
value it can store is zero.
|
|
@end deftypefun
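
For code that has to be moved away from these BSD functions, the mapping
onto the ISO C functions can be sketched as follows.  The wrapper names
@code{compat_bcopy} and @code{compat_bzero} are hypothetical; note the
swapped order of the source and destination arguments.

```c
#include <stddef.h>
#include <string.h>

/* Same argument order as bcopy: source first, then destination.  */
void
compat_bcopy (const void *from, void *to, size_t size)
{
  memmove (to, from, size);
}

/* Same interface as bzero: the stored value is always zero.  */
void
compat_bzero (void *block, size_t size)
{
  memset (block, 0, size);
}
```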
|
|
|
|
@node String/Array Comparison
|
|
@section String/Array Comparison
|
|
@cindex comparing strings and arrays
|
|
@cindex string comparison functions
|
|
@cindex array comparison functions
|
|
@cindex predicates on strings
|
|
@cindex predicates on arrays
|
|
|
|
You can use the functions in this section to perform comparisons on the
|
|
contents of strings and arrays. As well as checking for equality, these
|
|
functions can also be used as the ordering functions for sorting
|
|
operations. @xref{Searching and Sorting}, for an example of this.
|
|
|
|
Unlike most comparison operations in C, the string comparison functions
|
|
return a nonzero value if the strings are @emph{not} equivalent rather
|
|
than if they are. The sign of the value indicates the relative ordering
|
|
of the first characters in the strings that are not equivalent: a
|
|
negative value indicates that the first string is ``less'' than the
|
|
second, while a positive value indicates that the first string is
|
|
``greater''.
|
|
|
|
The most common use of these functions is to check only for equality.
|
|
This is canonically done with an expression like @w{@samp{! strcmp (s1, s2)}}.
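
A minimal sketch of this idiom is a linear search through a table of
known words.  The function name @code{keyword_index} and the table
contents are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of WORD in a small hypothetical keyword table,
   or -1 if it is not present.  */
int
keyword_index (const char *word)
{
  static const char *const keywords[] = { "if", "else", "while" };
  size_t i;

  for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
    if (! strcmp (word, keywords[i]))   /* Zero means "equal".  */
      return (int) i;
  return -1;
}
```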
|
|
|
|
All of these functions are declared in the header file @file{string.h}.
|
|
@pindex string.h
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun int memcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
|
|
The function @code{memcmp} compares the @var{size} bytes of memory
|
|
beginning at @var{a1} against the @var{size} bytes of memory beginning
|
|
at @var{a2}. The value returned has the same sign as the difference
|
|
between the first differing pair of bytes (interpreted as @code{unsigned
|
|
char} objects, then promoted to @code{int}).
|
|
|
|
If the contents of the two blocks are equal, @code{memcmp} returns
|
|
@code{0}.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
|
|
The function @code{wmemcmp} compares the @var{size} wide characters
|
|
beginning at @var{a1} against the @var{size} wide characters beginning
|
|
at @var{a2}. The value returned is smaller than or larger than zero
|
|
depending on whether the first differing wide character in @var{a1} is
|
|
smaller or larger than the corresponding character in @var{a2}.
|
|
|
|
If the contents of the two blocks are equal, @code{wmemcmp} returns
|
|
@code{0}.
|
|
@end deftypefun
|
|
|
|
On arbitrary arrays, the @code{memcmp} function is mostly useful for
|
|
testing equality. It usually isn't meaningful to do byte-wise ordering
|
|
comparisons on arrays of things other than bytes. For example, a
|
|
byte-wise comparison on the bytes that make up floating-point numbers
|
|
isn't likely to tell you anything about the relationship between the
|
|
values of the floating-point numbers.
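
A typical equality test on raw bytes is checking a file signature.  The
following sketch assumes the caller has already read the first bytes of a
file into a buffer; the function name @code{has_png_signature} is
hypothetical, and the eight bytes are the standard PNG file signature.

```c
#include <stddef.h>
#include <string.h>

/* Return nonzero if the LEN bytes at BUF start with the PNG file
   signature.  Only equality is tested; the sign of the memcmp
   result is not used.  */
int
has_png_signature (const unsigned char *buf, size_t len)
{
  static const unsigned char signature[8]
    = { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n' };

  return len >= sizeof signature
         && memcmp (buf, signature, sizeof signature) == 0;
}
```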
|
|
|
|
@code{wmemcmp} is really only useful to compare arrays of type
|
|
@code{wchar_t} since the function looks at @code{sizeof (wchar_t)} bytes
|
|
at a time and this number of bytes is system dependent.
|
|
|
|
You should also be careful about using @code{memcmp} to compare objects
|
|
that can contain ``holes'', such as the padding inserted into structure
|
|
objects to enforce alignment requirements, extra space at the end of
|
|
unions, and extra characters at the ends of strings whose length is less
|
|
than their allocated size. The contents of these ``holes'' are
|
|
indeterminate and may cause strange behavior when performing byte-wise
|
|
comparisons. For more predictable results, perform an explicit
|
|
component-wise comparison.
|
|
|
|
For example, given a structure type definition like:
|
|
|
|
@smallexample
|
|
struct foo
|
|
@{
|
|
unsigned char tag;
|
|
union
|
|
@{
|
|
double f;
|
|
long i;
|
|
char *p;
|
|
@} value;
|
|
@};
|
|
@end smallexample
|
|
|
|
@noindent
|
|
you are better off writing a specialized comparison function to compare
|
|
@code{struct foo} objects instead of comparing them with @code{memcmp}.
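
Such a function might look like the following sketch.  It assumes that
@code{tag} records which union member is in use; the @code{TAG_}
constants and the name @code{foo_equal} are hypothetical and not implied
by the structure definition itself.

```c
#include <string.h>

/* Hypothetical meaning of the tag: which union member is valid.  */
enum { TAG_DOUBLE, TAG_LONG, TAG_STRING };

struct foo
{
  unsigned char tag;
  union
  {
    double f;
    long i;
    char *p;
  } value;
};

/* Return nonzero if A and B hold the same value.  Only the active
   member is examined, so padding bytes never take part.  */
int
foo_equal (const struct foo *a, const struct foo *b)
{
  if (a->tag != b->tag)
    return 0;

  switch (a->tag)
    {
    case TAG_DOUBLE:
      return a->value.f == b->value.f;
    case TAG_LONG:
      return a->value.i == b->value.i;
    case TAG_STRING:
      return strcmp (a->value.p, b->value.p) == 0;
    default:
      return 0;
    }
}
```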
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun int strcmp (const char *@var{s1}, const char *@var{s2})
|
|
The @code{strcmp} function compares the string @var{s1} against
|
|
@var{s2}, returning a value that has the same sign as the difference
|
|
between the first differing pair of characters (interpreted as
|
|
@code{unsigned char} objects, then promoted to @code{int}).
|
|
|
|
If the two strings are equal, @code{strcmp} returns @code{0}.
|
|
|
|
A consequence of the ordering used by @code{strcmp} is that if @var{s1}
|
|
is an initial substring of @var{s2}, then @var{s1} is considered to be
|
|
``less than'' @var{s2}.
|
|
|
|
@code{strcmp} does not take sorting conventions of the language the
|
|
strings are written in into account. To get that one has to use
|
|
@code{strcoll}.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun int wcscmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
|
|
|
|
The @code{wcscmp} function compares the wide character string @var{ws1}
|
|
against @var{ws2}. The value returned is smaller than or larger than zero
|
|
depending on whether the first differing wide character in @var{ws1} is
|
|
smaller or larger than the corresponding character in @var{ws2}.
|
|
|
|
If the two strings are equal, @code{wcscmp} returns @code{0}.
|
|
|
|
A consequence of the ordering used by @code{wcscmp} is that if @var{ws1}
|
|
is an initial substring of @var{ws2}, then @var{ws1} is considered to be
|
|
``less than'' @var{ws2}.
|
|
|
|
@code{wcscmp} does not take sorting conventions of the language the
|
|
strings are written in into account. To get that one has to use
|
|
@code{wcscoll}.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment BSD
|
|
@deftypefun int strcasecmp (const char *@var{s1}, const char *@var{s2})
|
|
This function is like @code{strcmp}, except that differences in case are
|
|
ignored. How uppercase and lowercase characters are related is
|
|
determined by the currently selected locale. In the standard @code{"C"}
|
|
locale the characters @"A and @"a do not match but in a locale which
|
|
regards these characters as parts of the alphabet they do match.
|
|
|
|
@noindent
|
|
@code{strcasecmp} is derived from BSD.
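
As a small sketch, a case-insensitive test of user input might look like
this.  The helper name @code{is_yes} is hypothetical; the example
includes @file{strings.h}, which is the header POSIX specifies for this
function.

```c
#include <stdbool.h>
#include <strings.h>

/* Hypothetical helper: accept "yes" in any mixture of upper and
   lower case, according to the current locale.  */
bool
is_yes (const char *answer)
{
  return strcasecmp (answer, "yes") == 0;
}
```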
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment GNU
|
|
@deftypefun int wcscasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
|
|
This function is like @code{wcscmp}, except that differences in case are
|
|
ignored. How uppercase and lowercase characters are related is
|
|
determined by the currently selected locale. In the standard @code{"C"}
|
|
locale the characters @"A and @"a do not match but in a locale which
|
|
regards these characters as parts of the alphabet they do match.
|
|
|
|
@noindent
|
|
@code{wcscasecmp} is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun int strncmp (const char *@var{s1}, const char *@var{s2}, size_t @var{size})
|
|
This function is similar to @code{strcmp}, except that no more than
@var{size} characters are compared.  In other words, if the two
strings are the same in their first @var{size} characters, the
|
|
return value is zero.
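
A common use is testing whether one string begins with another.  This is
only a sketch, and the helper name @code{has_prefix} is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if S begins with PREFIX.  Comparison stops after
   strlen (PREFIX) characters, or earlier if S ends first.  */
bool
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```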
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
|
|
This function is similar to @code{wcscmp}, except that no more than
|
|
@var{size} wide characters are compared. In other words, if the two
|
|
strings are the same in their first @var{size} wide characters, the
|
|
return value is zero.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment BSD
|
|
@deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
|
|
This function is like @code{strncmp}, except that differences in case
|
|
are ignored. Like @code{strcasecmp}, it is locale dependent how
|
|
uppercase and lowercase characters are related.
|
|
|
|
@noindent
|
|
@code{strncasecmp} is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment GNU
|
|
@deftypefun int wcsncasecmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{n})
|
|
This function is like @code{wcsncmp}, except that differences in case
|
|
are ignored. Like @code{wcscasecmp}, it is locale dependent how
|
|
uppercase and lowercase characters are related.
|
|
|
|
@noindent
|
|
@code{wcsncasecmp} is a GNU extension.
|
|
@end deftypefun
|
|
|
|
Here are some examples showing the use of @code{strcmp} and
|
|
@code{strncmp} (equivalent examples can be constructed for the wide
|
|
character functions). These examples assume the use of the ASCII
|
|
character set. (If some other character set---say, EBCDIC---is used
|
|
instead, then the glyphs are associated with different numeric codes,
|
|
and the return values and ordering may differ.)
|
|
|
|
@smallexample
|
|
strcmp ("hello", "hello")
|
|
@result{} 0 /* @r{These two strings are the same.} */
|
|
strcmp ("hello", "Hello")
|
|
@result{} 32 /* @r{Comparisons are case-sensitive.} */
|
|
strcmp ("hello", "world")
|
|
@result{} -15 /* @r{The character @code{'h'} comes before @code{'w'}.} */
|
|
strcmp ("hello", "hello, world")
|
|
@result{} -44 /* @r{Comparing a null character against a comma.} */
|
|
strncmp ("hello", "hello, world", 5)
|
|
@result{} 0 /* @r{The initial 5 characters are the same.} */
|
|
strncmp ("hello, world", "hello, stupid world!!!", 5)
|
|
@result{} 0 /* @r{The initial 5 characters are the same.} */
|
|
@end smallexample
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun int strverscmp (const char *@var{s1}, const char *@var{s2})
|
|
The @code{strverscmp} function compares the string @var{s1} against
|
|
@var{s2}, considering them as holding indices/version numbers.  The return
value follows the same conventions as found in the @code{strcmp}
|
|
function. In fact, if @var{s1} and @var{s2} contain no digits,
|
|
@code{strverscmp} behaves like @code{strcmp}.
|
|
|
|
Basically, we compare strings normally (character by character), until
|
|
we find a digit in each string; then we enter a special comparison
|
|
mode, where each sequence of digits is taken as a whole. If we reach the
|
|
end of these two parts without noticing a difference, we return to the
|
|
standard comparison mode.  There are two types of numeric parts:
``integral'' and ``fractional'' (those that begin with a @samp{0}).  The types
|
|
of the numeric parts affect the way we sort them:
|
|
|
|
@itemize @bullet
|
|
@item
|
|
integral/integral: we compare values as you would expect.
|
|
|
|
@item
|
|
fractional/integral: the fractional part is less than the integral one.
|
|
Again, no surprise.
|
|
|
|
@item
|
|
fractional/fractional: things become a bit more complex.
|
|
If the common prefix contains only leading zeroes, the longest part is less
|
|
than the other one; else the comparison behaves normally.
|
|
@end itemize
|
|
|
|
@smallexample
|
|
strverscmp ("no digit", "no digit")
|
|
@result{} 0 /* @r{same behavior as strcmp.} */
|
|
strverscmp ("item#99", "item#100")
|
|
@result{} <0 /* @r{same prefix, but 99 < 100.} */
|
|
strverscmp ("alpha1", "alpha001")
|
|
@result{} >0 /* @r{fractional part inferior to integral one.} */
|
|
strverscmp ("part1_f012", "part1_f01")
|
|
@result{} >0 /* @r{two fractional parts.} */
|
|
strverscmp ("foo.009", "foo.0")
|
|
@result{} <0 /* @r{idem, but with leading zeroes only.} */
|
|
@end smallexample
|
|
|
|
This function is especially useful when dealing with filename sorting,
|
|
because filenames frequently hold indices/version numbers.
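
As a sketch of that use, @code{strverscmp} can serve as the ordering
function for @code{qsort}.  The names @code{compare_versions} and
@code{sort_file_names} are hypothetical; defining @code{_GNU_SOURCE}
before including any header is assumed to be necessary to obtain the
declaration.

```c
#define _GNU_SOURCE  /* strverscmp is a GNU extension.  */
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort: each argument points to an
   element of the array, that is, to a pointer to a string.  */
static int
compare_versions (const void *v1, const void *v2)
{
  const char *const *p1 = v1;
  const char *const *p2 = v2;

  return strverscmp (*p1, *p2);
}

/* Sort COUNT file names so that embedded numbers are ordered by
   value rather than character by character.  */
void
sort_file_names (const char **names, size_t count)
{
  qsort (names, count, sizeof names[0], compare_versions);
}
```

With plain @code{strcmp}, @samp{file10} would sort before @samp{file2};
with @code{strverscmp} it sorts after it.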
|
|
|
|
@code{strverscmp} is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment BSD
|
|
@deftypefun int bcmp (const void *@var{a1}, const void *@var{a2}, size_t @var{size})
|
|
This is an obsolete alias for @code{memcmp}, derived from BSD.
|
|
@end deftypefun
|
|
|
|
@node Collation Functions
|
|
@section Collation Functions
|
|
|
|
@cindex collating strings
|
|
@cindex string collation functions
|
|
|
|
In some locales, the conventions for lexicographic ordering differ from
|
|
the strict numeric ordering of character codes. For example, in Spanish
|
|
most glyphs with diacritical marks such as accents are not considered
|
|
distinct letters for the purposes of collation. On the other hand, the
|
|
two-character sequence @samp{ll} is treated as a single letter that is
|
|
collated immediately after @samp{l}.
|
|
|
|
You can use the functions @code{strcoll} and @code{strxfrm} (declared in
|
|
the header file @file{string.h}) and @code{wcscoll} and @code{wcsxfrm}
(declared in the header file @file{wchar.h}) to compare strings using a
|
|
collation ordering appropriate for the current locale. The locale used
|
|
by these functions in particular can be specified by setting the locale
|
|
for the @code{LC_COLLATE} category; see @ref{Locales}.
|
|
@pindex string.h
|
|
@pindex wchar.h
|
|
|
|
In the standard C locale, the collation sequence for @code{strcoll} is
|
|
the same as that for @code{strcmp}. Similarly, @code{wcscoll} and
|
|
@code{wcscmp} are the same in this situation.
|
|
|
|
Effectively, the way these functions work is by applying a mapping to
|
|
transform the characters in a string to a byte sequence that represents
|
|
the string's position in the collating sequence of the current locale.
|
|
Comparing two such byte sequences in a simple fashion is equivalent to
|
|
comparing the strings with the locale's collating sequence.
|
|
|
|
The functions @code{strcoll} and @code{wcscoll} perform this translation
|
|
implicitly, in order to do one comparison. By contrast, @code{strxfrm}
|
|
and @code{wcsxfrm} perform the mapping explicitly. If you are making
|
|
multiple comparisons using the same string or set of strings, it is
|
|
likely to be more efficient to use @code{strxfrm} or @code{wcsxfrm} to
|
|
transform all the strings just once, and subsequently compare the
|
|
transformed strings with @code{strcmp} or @code{wcscmp}.
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun int strcoll (const char *@var{s1}, const char *@var{s2})
|
|
The @code{strcoll} function is similar to @code{strcmp} but uses the
|
|
collating sequence of the current locale for collation (the
|
|
@code{LC_COLLATE} locale).
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun int wcscoll (const wchar_t *@var{ws1}, const wchar_t *@var{ws2})
|
|
The @code{wcscoll} function is similar to @code{wcscmp} but uses the
|
|
collating sequence of the current locale for collation (the
|
|
@code{LC_COLLATE} locale).
|
|
@end deftypefun
|
|
|
|
Here is an example of sorting an array of strings, using @code{strcoll}
|
|
to compare them. The actual sort algorithm is not written here; it
|
|
comes from @code{qsort} (@pxref{Array Sort Function}). The job of the
|
|
code shown here is to say how to compare the strings while sorting them.
|
|
(Later on in this section, we will show a way to do this more
|
|
efficiently using @code{strxfrm}.)
|
|
|
|
@smallexample
/* @r{This is the comparison function used with @code{qsort}.} */

int
compare_elements (const void *v1, const void *v2)
@{
  char * const *p1 = v1;
  char * const *p2 = v2;

  return strcoll (*p1, *p2);
@}

/* @r{This is the entry point---the function to sort}
   @r{strings using the locale's collating sequence.} */

void
sort_strings (char **array, int nstrings)
@{
  /* @r{Sort @code{array} by comparing the strings.} */
  qsort (array, nstrings,
         sizeof (char *), compare_elements);
@}
@end smallexample
|
|
|
|
@cindex converting string to collation order
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun size_t strxfrm (char *restrict @var{to}, const char *restrict @var{from}, size_t @var{size})
|
|
The function @code{strxfrm} transforms the string @var{from} using the
|
|
collation transformation determined by the locale currently selected for
|
|
collation, and stores the transformed string in the array @var{to}. Up
|
|
to @var{size} characters (including a terminating null character) are
|
|
stored.
|
|
|
|
The behavior is undefined if the strings @var{to} and @var{from}
|
|
overlap; see @ref{Copying and Concatenation}.
|
|
|
|
The return value is the length of the entire transformed string. This
|
|
value is not affected by the value of @var{size}, but if it is greater
than or equal to @var{size}, it means that the transformed string did not
|
|
entirely fit in the array @var{to}. In this case, only as much of the
|
|
string as actually fits was stored. To get the whole transformed
|
|
string, call @code{strxfrm} again with a bigger output array.
|
|
|
|
The transformed string may be longer than the original string, and it
|
|
may also be shorter.
|
|
|
|
If @var{size} is zero, no characters are stored in @var{to}. In this
|
|
case, @code{strxfrm} simply returns the number of characters that would
|
|
be the length of the transformed string. This is useful for determining
|
|
what size the allocated array should be. It does not matter what
|
|
@var{to} is if @var{size} is zero; @var{to} may even be a null pointer.
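
The size query described above can be sketched as a two-step helper.
The name @code{xfrm_dup} is hypothetical, and the caller is assumed to
free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated transformed copy
   of S for the current LC_COLLATE locale, or NULL if allocation
   fails.  */
char *
xfrm_dup (const char *s)
{
  /* With a size of zero nothing is stored, so a null pointer is
     allowed; the result is the length without the terminator.  */
  size_t needed = strxfrm (NULL, s, 0) + 1;
  char *buf = malloc (needed);

  if (buf != NULL)
    strxfrm (buf, s, needed);
  return buf;
}
```

Two strings transformed this way can then be compared with
@code{strcmp}, giving the same ordering as @code{strcoll} on the
originals.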
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun size_t wcsxfrm (wchar_t *restrict @var{wto}, const wchar_t *@var{wfrom}, size_t @var{size})
|
|
The function @code{wcsxfrm} transforms wide character string @var{wfrom}
|
|
using the collation transformation determined by the locale currently
|
|
selected for collation, and stores the transformed string in the array
|
|
@var{wto}. Up to @var{size} wide characters (including a terminating null
|
|
character) are stored.
|
|
|
|
The behavior is undefined if the strings @var{wto} and @var{wfrom}
|
|
overlap; see @ref{Copying and Concatenation}.
|
|
|
|
The return value is the length of the entire transformed wide character
|
|
string. This value is not affected by the value of @var{size}, but if
|
|
it is greater than or equal to @var{size}, it means that the transformed
|
|
wide character string did not entirely fit in the array @var{wto}. In
|
|
this case, only as much of the wide character string as actually fits
|
|
was stored. To get the whole transformed wide character string, call
|
|
@code{wcsxfrm} again with a bigger output array.
|
|
|
|
The transformed wide character string may be longer than the original
|
|
wide character string, and it may also be shorter.
|
|
|
|
If @var{size} is zero, no wide characters are stored in @var{wto}.  In this
|
|
case, @code{wcsxfrm} simply returns the number of wide characters that
|
|
would be the length of the transformed wide character string. This is
|
|
useful for determining what size the allocated array should be (remember
|
|
to multiply by @code{sizeof (wchar_t)}).  It does not matter what
|
|
@var{wto} is if @var{size} is zero; @var{wto} may even be a null pointer.
|
|
@end deftypefun
|
|
|
|
Here is an example of how you can use @code{strxfrm} when
|
|
you plan to do many comparisons. It does the same thing as the previous
|
|
example, but much faster, because it has to transform each string only
|
|
once, no matter how many times it is compared with other strings. Even
|
|
the time needed to allocate and free storage is much less than the time
|
|
we save, when there are many strings.
|
|
|
|
@smallexample
|
|
struct sorter @{ char *input; char *transformed; @};
|
|
|
|
/* @r{This is the comparison function used with @code{qsort}}
|
|
@r{to sort an array of @code{struct sorter}.} */
|
|
|
|
int
|
|
compare_elements (const void *v1, const void *v2)
@{
  const struct sorter *p1 = v1;
  const struct sorter *p2 = v2;

  return strcmp (p1->transformed, p2->transformed);
|
|
@}
|
|
|
|
/* @r{This is the entry point---the function to sort}
|
|
@r{strings using the locale's collating sequence.} */
|
|
|
|
void
|
|
sort_strings_fast (char **array, int nstrings)
|
|
@{
|
|
struct sorter temp_array[nstrings];
|
|
int i;
|
|
|
|
/* @r{Set up @code{temp_array}. Each element contains}
|
|
@r{one input string and its transformed string.} */
|
|
for (i = 0; i < nstrings; i++)
|
|
@{
|
|
size_t length = strlen (array[i]) * 2;
|
|
char *transformed;
|
|
size_t transformed_length;
|
|
|
|
temp_array[i].input = array[i];
|
|
|
|
/* @r{First try a buffer perhaps big enough.} */
|
|
transformed = (char *) xmalloc (length);
|
|
|
|
/* @r{Transform @code{array[i]}.} */
|
|
transformed_length = strxfrm (transformed, array[i], length);
|
|
|
|
/* @r{If the buffer was not large enough, resize it}
|
|
@r{and try again.} */
|
|
if (transformed_length >= length)
|
|
@{
|
|
/* @r{Allocate the needed space. +1 for terminating}
|
|
@r{@code{NUL} character.} */
|
|
transformed = (char *) xrealloc (transformed,
|
|
transformed_length + 1);
|
|
|
|
/* @r{The return value is not interesting because we know}
|
|
@r{how long the transformed string is.} */
|
|
(void) strxfrm (transformed, array[i],
|
|
transformed_length + 1);
|
|
@}
|
|
|
|
temp_array[i].transformed = transformed;
|
|
@}
|
|
|
|
/* @r{Sort @code{temp_array} by comparing transformed strings.} */
|
|
  qsort (temp_array, nstrings,
         sizeof (struct sorter), compare_elements);
|
|
|
|
/* @r{Put the elements back in the permanent array}
|
|
@r{in their sorted order.} */
|
|
for (i = 0; i < nstrings; i++)
|
|
array[i] = temp_array[i].input;
|
|
|
|
/* @r{Free the strings we allocated.} */
|
|
for (i = 0; i < nstrings; i++)
|
|
free (temp_array[i].transformed);
|
|
@}
|
|
@end smallexample
|
|
|
|
The interesting part of this code for the wide character version would
|
|
look like this:
|
|
|
|
@smallexample
|
|
void
|
|
sort_strings_fast (wchar_t **array, int nstrings)
|
|
@{
|
|
@dots{}
|
|
/* @r{Transform @code{array[i]}.} */
|
|
transformed_length = wcsxfrm (transformed, array[i], length);
|
|
|
|
/* @r{If the buffer was not large enough, resize it}
|
|
@r{and try again.} */
|
|
if (transformed_length >= length)
|
|
@{
|
|
/* @r{Allocate the needed space. +1 for terminating}
|
|
@r{@code{NUL} character.} */
|
|
transformed = (wchar_t *) xrealloc (transformed,
|
|
(transformed_length + 1)
|
|
* sizeof (wchar_t));
|
|
|
|
/* @r{The return value is not interesting because we know}
|
|
@r{how long the transformed string is.} */
|
|
(void) wcsxfrm (transformed, array[i],
|
|
transformed_length + 1);
|
|
@}
|
|
@dots{}
|
|
@end smallexample
|
|
|
|
@noindent
|
|
Note the additional multiplication by @code{sizeof (wchar_t)} in the
@code{xrealloc} call.
|
|
|
|
@strong{Compatibility Note:} The string collation functions are a new
|
|
feature of @w{ISO C90}. Older C dialects have no equivalent feature.
|
|
The wide character versions were introduced in @w{Amendment 1} to @w{ISO
|
|
C90}.
|
|
|
|
@node Search Functions
|
|
@section Search Functions
|
|
|
|
This section describes library functions which perform various kinds
|
|
of searching operations on strings and arrays. These functions are
|
|
declared in the header file @file{string.h}.
|
|
@pindex string.h
|
|
@cindex search functions (for strings)
|
|
@cindex string search functions
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {void *} memchr (const void *@var{block}, int @var{c}, size_t @var{size})
|
|
This function finds the first occurrence of the byte @var{c} (converted
|
|
to an @code{unsigned char}) in the initial @var{size} bytes of the
|
|
object beginning at @var{block}. The return value is a pointer to the
|
|
located byte, or a null pointer if no match was found.
|
|
@end deftypefun
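
Because @code{memchr} takes an explicit size, it can search data that
contains embedded null bytes. The following sketch shows one way to turn
its result into an offset; the name @code{byte_offset} is made up for
this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the first byte equal to C
   in the SIZE bytes at BLOCK, or -1 if there is none.  */
ptrdiff_t
byte_offset (const void *block, int c, size_t size)
{
  const unsigned char *start = block;
  const unsigned char *hit = memchr (block, c, size);

  return hit == NULL ? -1 : hit - start;
}
```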
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wmemchr (const wchar_t *@var{block}, wchar_t @var{wc}, size_t @var{size})
|
|
This function finds the first occurrence of the wide character @var{wc}
|
|
in the initial @var{size} wide characters of the object beginning at
|
|
@var{block}. The return value is a pointer to the located wide
|
|
character, or a null pointer if no match was found.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {void *} rawmemchr (const void *@var{block}, int @var{c})
|
|
Often the @code{memchr} function is used with the knowledge that the
|
|
byte @var{c} is available in the memory block specified by the
|
|
parameters. But this means that the @var{size} parameter is not really
|
|
needed and that the tests performed with it at runtime (to check whether
|
|
the end of the block is reached) are not needed.
|
|
|
|
The @code{rawmemchr} function exists for just this situation, which is
|
|
surprisingly frequent. The interface is similar to @code{memchr} except
|
|
that the @var{size} parameter is missing. The function will look beyond
|
|
the end of the block pointed to by @var{block} in case the programmer
|
|
made an error in assuming that the byte @var{c} is present in the block.
|
|
In this case the result is unspecified. Otherwise the return value is a
|
|
pointer to the located byte.
|
|
|
|
This function is of special interest when looking for the end of a
|
|
string. Since all strings are terminated by a null byte a call like
|
|
|
|
@smallexample
|
|
rawmemchr (str, '\0')
|
|
@end smallexample
|
|
|
|
@noindent
|
|
will never go beyond the end of the string.
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {void *} memrchr (const void *@var{block}, int @var{c}, size_t @var{size})
|
|
The function @code{memrchr} is like @code{memchr}, except that it searches
|
|
backwards from the end of the block defined by @var{block} and @var{size}
|
|
(instead of forwards from the front).
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strchr (const char *@var{string}, int @var{c})
|
|
The @code{strchr} function finds the first occurrence of the character
|
|
@var{c} (converted to a @code{char}) in the null-terminated string
|
|
beginning at @var{string}. The return value is a pointer to the located
|
|
character, or a null pointer if no match was found.
|
|
|
|
For example,
|
|
@smallexample
|
|
strchr ("hello, world", 'l')
|
|
@result{} "llo, world"
|
|
strchr ("hello, world", '?')
|
|
@result{} NULL
|
|
@end smallexample
|
|
|
|
The terminating null character is considered to be part of the string,
|
|
so you can use this function to get a pointer to the end of a string by
|
|
specifying a null character as the value of the @var{c} argument. It
|
|
would be better (but less portable) to use @code{strchrnul} in this
|
|
case, though.
|
|
@end deftypefun
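
Since @code{strchr} returns a pointer into the string, the search can be
resumed just past each match. The sketch below counts occurrences that
way; @code{count_char} is a hypothetical name used only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count how often the character C occurs in
   STRING.  C must not be the null character, because strchr would
   then match the terminator and the next search would start past
   the end of the string.  */
size_t
count_char (const char *string, int c)
{
  size_t count = 0;
  const char *p = strchr (string, c);

  while (p != NULL)
    {
      count++;
      /* Resume the search just past the previous match.  */
      p = strchr (p + 1, c);
    }
  return count;
}
```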
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcschr (const wchar_t *@var{wstring}, wchar_t @var{wc})
|
|
The @code{wcschr} function finds the first occurrence of the wide
|
|
character @var{wc} in the null-terminated wide character string
|
|
beginning at @var{wstring}. The return value is a pointer to the
|
|
located wide character, or a null pointer if no match was found.
|
|
|
|
The terminating null character is considered to be part of the wide
|
|
character string, so you can use this function to get a pointer to the end
of a wide character string by specifying a null wide character as the
|
|
value of the @var{wc} argument. It would be better (but less portable)
|
|
to use @code{wcschrnul} in this case, though.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {char *} strchrnul (const char *@var{string}, int @var{c})
|
|
@code{strchrnul} is the same as @code{strchr} except that if it does
|
|
not find the character, it returns a pointer to the string's terminating
|
|
null character rather than a null pointer.
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment GNU
|
|
@deftypefun {wchar_t *} wcschrnul (const wchar_t *@var{wstring}, wchar_t @var{wc})
|
|
@code{wcschrnul} is the same as @code{wcschr} except that if it does not
|
|
find the wide character, it returns a pointer to the wide character string's
|
|
terminating null wide character rather than a null pointer.
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
|
|
|
|
One useful, but unusual, use of the @code{strchr}
|
|
function is when one wants to have a pointer pointing to the NUL byte
|
|
terminating a string. This is often written in this way:
|
|
|
|
@smallexample
|
|
s += strlen (s);
|
|
@end smallexample
|
|
|
|
@noindent
|
|
This is almost optimal, but the addition operation duplicates a bit of
|
|
the work already done in the @code{strlen} function. A better solution
|
|
is this:
|
|
|
|
@smallexample
|
|
s = strchr (s, '\0');
|
|
@end smallexample
|
|
|
|
There is no restriction on the second parameter of @code{strchr} so it
|
|
could very well also be the NUL character. Those readers thinking very
|
|
hard about this might now point out that the @code{strchr} function is
|
|
more expensive than the @code{strlen} function since we have two termination
|
|
criteria. This is right. But in the GNU C library the implementation of
|
|
@code{strchr} is optimized in a special way so that @code{strchr}
|
|
actually is faster.
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strrchr (const char *@var{string}, int @var{c})
|
|
The function @code{strrchr} is like @code{strchr}, except that it searches
|
|
backwards from the end of the string @var{string} (instead of forwards
|
|
from the front).
|
|
|
|
For example,
|
|
@smallexample
|
|
strrchr ("hello, world", 'l')
|
|
@result{} "ld"
|
|
@end smallexample
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcsrchr (const wchar_t *@var{wstring}, wchar_t @var{c})
|
|
The function @code{wcsrchr} is like @code{wcschr}, except that it searches
|
|
backwards from the end of the string @var{wstring} (instead of forwards
|
|
from the front).
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strstr (const char *@var{haystack}, const char *@var{needle})
|
|
This is like @code{strchr}, except that it searches @var{haystack} for a
|
|
substring @var{needle} rather than just a single character. It
|
|
returns a pointer into the string @var{haystack} that is the first
|
|
character of the substring, or a null pointer if no match was found. If
|
|
@var{needle} is an empty string, the function returns @var{haystack}.
|
|
|
|
For example,
|
|
@smallexample
|
|
strstr ("hello, world", "l")
|
|
@result{} "llo, world"
|
|
strstr ("hello, world", "wo")
|
|
@result{} "world"
|
|
@end smallexample
|
|
@end deftypefun
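
As with @code{strchr}, the pointer returned by @code{strstr} can be used
to continue the search. The following sketch counts non-overlapping
occurrences of a substring; @code{count_substring} is a hypothetical
helper, and its treatment of an empty @var{needle} is a choice made for
this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-overlapping occurrences of
   NEEDLE in HAYSTACK.  */
size_t
count_substring (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  size_t count = 0;
  const char *p;

  /* strstr matches an empty needle at every position; report no
     matches in that case so the loop below always advances.  */
  if (needle_len == 0)
    return 0;

  for (p = strstr (haystack, needle); p != NULL;
       p = strstr (p + needle_len, needle))
    count++;
  return count;
}
```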
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcsstr (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
|
|
This is like @code{wcschr}, except that it searches @var{haystack} for a
|
|
substring @var{needle} rather than just a single wide character. It
|
|
returns a pointer into the string @var{haystack} that is the first wide
|
|
character of the substring, or a null pointer if no match was found. If
|
|
@var{needle} is an empty string, the function returns @var{haystack}.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment XPG
|
|
@deftypefun {wchar_t *} wcswcs (const wchar_t *@var{haystack}, const wchar_t *@var{needle})
|
|
@code{wcswcs} is a deprecated alias for @code{wcsstr}. This is the
|
|
name originally used in the X/Open Portability Guide before the
|
|
@w{Amendment 1} to @w{ISO C90} was published.
|
|
@end deftypefun
|
|
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {char *} strcasestr (const char *@var{haystack}, const char *@var{needle})
|
|
This is like @code{strstr}, except that it ignores case in searching for
|
|
the substring. Like @code{strcasecmp}, it is locale dependent how
|
|
uppercase and lowercase characters are related.
|
|
|
|
|
|
For example,
|
|
@smallexample
|
|
strcasestr ("hello, world", "L")
|
|
@result{} "llo, world"
|
|
strcasestr ("hello, World", "wo")
|
|
@result{} "World"
|
|
@end smallexample
|
|
@end deftypefun
|
|
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {void *} memmem (const void *@var{haystack}, size_t @var{haystack-len},@*const void *@var{needle}, size_t @var{needle-len})
|
|
This is like @code{strstr}, but @var{needle} and @var{haystack} are byte
|
|
arrays rather than null-terminated strings. @var{needle-len} is the
|
|
length of @var{needle} and @var{haystack-len} is the length of
|
|
@var{haystack}.@refill
|
|
|
|
This function is a GNU extension.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun size_t strspn (const char *@var{string}, const char *@var{skipset})
|
|
The @code{strspn} (``string span'') function returns the length of the
|
|
initial substring of @var{string} that consists entirely of characters that
|
|
are members of the set specified by the string @var{skipset}. The order
|
|
of the characters in @var{skipset} is not important.
|
|
|
|
For example,
|
|
@smallexample
|
|
strspn ("hello, world", "abcdefghijklmnopqrstuvwxyz")
|
|
@result{} 5
|
|
@end smallexample
|
|
|
|
Note that ``character'' is here used in the sense of byte. In a string
|
|
using a multibyte character encoding, an (abstract) character consisting of
more than one byte is not treated as an entity. Each byte is treated
|
|
separately. The function is not locale-dependent.
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun size_t wcsspn (const wchar_t *@var{wstring}, const wchar_t *@var{skipset})
|
|
The @code{wcsspn} (``wide character string span'') function returns the
|
|
length of the initial substring of @var{wstring} that consists entirely
|
|
of wide characters that are members of the set specified by the string
|
|
@var{skipset}. The order of the wide characters in @var{skipset} is not
|
|
important.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun size_t strcspn (const char *@var{string}, const char *@var{stopset})
|
|
The @code{strcspn} (``string complement span'') function returns the length
|
|
of the initial substring of @var{string} that consists entirely of characters
|
|
that are @emph{not} members of the set specified by the string @var{stopset}.
|
|
(In other words, it returns the offset of the first character in @var{string}
|
|
that is a member of the set @var{stopset}.)
|
|
|
|
For example,
|
|
@smallexample
|
|
strcspn ("hello, world", " \t\n,.;!?")
|
|
@result{} 5
|
|
@end smallexample
|
|
|
|
Note that ``character'' is here used in the sense of byte. In a string
|
|
using a multibyte character encoding, an (abstract) character consisting of
more than one byte is not treated as an entity. Each byte is treated
|
|
separately. The function is not locale-dependent.
|
|
@end deftypefun
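
The functions @code{strspn} and @code{strcspn} complement each other:
the first skips over a run of delimiters and the second measures the run
of non-delimiters that follows. A minimal sketch, using the hypothetical
name @code{first_word}, locates a word without modifying the string:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first word in STRING, where words are
   separated by the bytes in DELIMS.  Stores the length of the word in
   *LEN and returns a pointer to its first byte.  The length is zero
   if STRING contains no word.  STRING is not modified.  */
const char *
first_word (const char *string, const char *delims, size_t *len)
{
  /* Skip the leading delimiters.  */
  const char *start = string + strspn (string, delims);

  /* The word runs up to the next delimiter or the terminator.  */
  *len = strcspn (start, delims);
  return start;
}
```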
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun size_t wcscspn (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
|
|
The @code{wcscspn} (``wide character string complement span'') function
|
|
returns the length of the initial substring of @var{wstring} that
|
|
consists entirely of wide characters that are @emph{not} members of the
|
|
set specified by the string @var{stopset}. (In other words, it returns
|
|
the offset of the first wide character in @var{wstring} that is a member of
|
|
the set @var{stopset}.)
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strpbrk (const char *@var{string}, const char *@var{stopset})
|
|
The @code{strpbrk} (``string pointer break'') function is related to
|
|
@code{strcspn}, except that it returns a pointer to the first character
|
|
in @var{string} that is a member of the set @var{stopset} instead of the
|
|
length of the initial substring. It returns a null pointer if no such
|
|
character from @var{stopset} is found.
|
|
|
|
@c @group Invalid outside the example.
|
|
For example,
|
|
|
|
@smallexample
|
|
strpbrk ("hello, world", " \t\n,.;!?")
|
|
@result{} ", world"
|
|
@end smallexample
|
|
@c @end group
|
|
|
|
Note that ``character'' is here used in the sense of byte. In a string
|
|
using a multibyte character encoding, an (abstract) character consisting of
more than one byte is not treated as an entity. Each byte is treated
|
|
separately. The function is not locale-dependent.
|
|
@end deftypefun
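
Because @code{strpbrk} returns a pointer rather than a length, it is
convenient when the matching byte is to be changed in place. The sketch
below replaces every member of @var{stopset} in a writable string;
@code{replace_any} is a hypothetical name used for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite every byte of STRING that occurs in
   STOPSET with REPLACEMENT and return the number of bytes changed.
   STRING must be writable.  */
size_t
replace_any (char *string, const char *stopset, char replacement)
{
  size_t changed = 0;
  char *p;

  for (p = strpbrk (string, stopset); p != NULL;
       p = strpbrk (p + 1, stopset))
    {
      *p = replacement;
      changed++;
    }
  return changed;
}
```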
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcspbrk (const wchar_t *@var{wstring}, const wchar_t *@var{stopset})
|
|
The @code{wcspbrk} (``wide character string pointer break'') function is
|
|
related to @code{wcscspn}, except that it returns a pointer to the first
|
|
wide character in @var{wstring} that is a member of the set
|
|
@var{stopset} instead of the length of the initial substring. It
|
|
returns a null pointer if no wide character from @var{stopset} is found.
|
|
@end deftypefun
|
|
|
|
|
|
@subsection Compatibility String Search Functions
|
|
|
|
@comment string.h
|
|
@comment BSD
|
|
@deftypefun {char *} index (const char *@var{string}, int @var{c})
|
|
@code{index} is another name for @code{strchr}; they are exactly the same.
|
|
New code should always use @code{strchr} since this name is defined in
|
|
@w{ISO C} while @code{index} is a BSD invention which never was available
|
|
on @w{System V} derived systems.
|
|
@end deftypefun
|
|
|
|
@comment string.h
|
|
@comment BSD
|
|
@deftypefun {char *} rindex (const char *@var{string}, int @var{c})
|
|
@code{rindex} is another name for @code{strrchr}; they are exactly the same.
|
|
New code should always use @code{strrchr} since this name is defined in
|
|
@w{ISO C} while @code{rindex} is a BSD invention which never was available
|
|
on @w{System V} derived systems.
|
|
@end deftypefun
|
|
|
|
@node Finding Tokens in a String
|
|
@section Finding Tokens in a String
|
|
|
|
@cindex tokenizing strings
|
|
@cindex breaking a string into tokens
|
|
@cindex parsing tokens from a string
|
|
It's fairly common for programs to have a need to do some simple kinds
|
|
of lexical analysis and parsing, such as splitting a command string up
|
|
into tokens. You can do this with the @code{strtok} function, declared
|
|
in the header file @file{string.h}.
|
|
@pindex string.h
|
|
|
|
@comment string.h
|
|
@comment ISO
|
|
@deftypefun {char *} strtok (char *restrict @var{newstring}, const char *restrict @var{delimiters})
|
|
A string can be split into tokens by making a series of calls to the
|
|
function @code{strtok}.
|
|
|
|
The string to be split up is passed as the @var{newstring} argument on
|
|
the first call only. The @code{strtok} function uses this to set up
|
|
some internal state information. Subsequent calls to get additional
|
|
tokens from the same string are indicated by passing a null pointer as
|
|
the @var{newstring} argument. Calling @code{strtok} with another
|
|
non-null @var{newstring} argument reinitializes the state information.
|
|
It is guaranteed that no other library function ever calls @code{strtok}
|
|
behind your back (which would mess up this internal state information).
|
|
|
|
The @var{delimiters} argument is a string that specifies a set of delimiters
|
|
that may surround the token being extracted. All the initial characters
|
|
that are members of this set are discarded. The first character that is
|
|
@emph{not} a member of this set of delimiters marks the beginning of the
|
|
next token. The end of the token is found by looking for the next
|
|
character that is a member of the delimiter set. This character in the
|
|
original string @var{newstring} is overwritten by a null character, and the
|
|
pointer to the beginning of the token in @var{newstring} is returned.
|
|
|
|
On the next call to @code{strtok}, the searching begins at the next
|
|
character beyond the one that marked the end of the previous token.
|
|
Note that the set of delimiters @var{delimiters} does not have to be the
|
|
same on every call in a series of calls to @code{strtok}.
|
|
|
|
If the end of the string @var{newstring} is reached, or if the remainder of
|
|
the string consists only of delimiter characters, @code{strtok} returns
|
|
a null pointer.
|
|
|
|
Note that ``character'' is here used in the sense of byte. In a string
|
|
using a multibyte character encoding, an (abstract) character consisting of
more than one byte is not treated as an entity. Each byte is treated
|
|
separately. The function is not locale-dependent.
|
|
|
|
@end deftypefun
|
|
|
|
@comment wchar.h
|
|
@comment ISO
|
|
@deftypefun {wchar_t *} wcstok (wchar_t *@var{newstring}, const wchar_t *@var{delimiters})
|
|
A string can be split into tokens by making a series of calls to the
|
|
function @code{wcstok}.
|
|
|
|
The string to be split up is passed as the @var{newstring} argument on
|
|
the first call only. The @code{wcstok} function uses this to set up
|
|
some internal state information. Subsequent calls to get additional
|
|
tokens from the same wide character string are indicated by passing a
|
|
null pointer as the @var{newstring} argument. Calling @code{wcstok}
|
|
with another non-null @var{newstring} argument reinitializes the state
|
|
information. It is guaranteed that no other library function ever calls
|
|
@code{wcstok} behind your back (which would mess up this internal state
|
|
information).
|
|
|
|
The @var{delimiters} argument is a wide character string that specifies
|
|
a set of delimiters that may surround the token being extracted. All
|
|
the initial wide characters that are members of this set are discarded.
|
|
The first wide character that is @emph{not} a member of this set of
|
|
delimiters marks the beginning of the next token. The end of the token
|
|
is found by looking for the next wide character that is a member of the
|
|
delimiter set. This wide character in the original wide character
|
|
string @var{newstring} is overwritten by a null wide character, and the
|
|
pointer to the beginning of the token in @var{newstring} is returned.
|
|
|
|
On the next call to @code{wcstok}, the searching begins at the next
|
|
wide character beyond the one that marked the end of the previous token.
|
|
Note that the set of delimiters @var{delimiters} does not have to be the
|
|
same on every call in a series of calls to @code{wcstok}.
|
|
|
|
If the end of the wide character string @var{newstring} is reached, or
|
|
if the remainder of the string consists only of delimiter wide characters,
|
|
@code{wcstok} returns a null pointer.
|
|
|
|
Note that each wide character is treated as a unit here; unlike
@code{strtok}, this function does not operate on individual bytes. The
function is not locale-dependent.
|
|
@end deftypefun
|
|
|
|
@strong{Warning:} Since @code{strtok} and @code{wcstok} alter the string
|
|
they are parsing, you should always copy the string to a temporary buffer
|
|
before parsing it with @code{strtok}/@code{wcstok} (@pxref{Copying and
|
|
Concatenation}). If you allow @code{strtok} or @code{wcstok} to modify
|
|
a string that came from another part of your program, you are asking for
|
|
trouble; that string might be used for other purposes after
|
|
@code{strtok} or @code{wcstok} has modified it, and it would not have
|
|
the expected value.
|
|
|
|
The string that you are operating on might even be a constant. Then
|
|
when @code{strtok} or @code{wcstok} tries to modify it, your program
|
|
will get a fatal signal for writing in read-only memory. @xref{Program
|
|
Error Signals}. Even if the operation of @code{strtok} or @code{wcstok}
|
|
would not require a modification of the string (e.g., if there is
|
|
exactly one token) the string can (and in the GNU libc case will) be
|
|
modified.
|
|
|
|
This is a special case of a general principle: if a part of a program
|
|
does not have as its purpose the modification of a certain data
|
|
structure, then it is error-prone to modify the data structure
|
|
temporarily.
|
|
|
|
The functions @code{strtok} and @code{wcstok} are not reentrant.
|
|
@xref{Nonreentrancy}, for a discussion of where and why reentrancy is
|
|
important.
|
|
|
|
Here is a simple example showing the use of @code{strtok}.
|
|
|
|
@comment Yes, this example has been tested.
|
|
@smallexample
|
|
#include <string.h>
|
|
#include <stddef.h>
|
|
|
|
@dots{}
|
|
|
|
const char string[] = "words separated by spaces -- and, punctuation!";
|
|
const char delimiters[] = " .,;:!-";
|
|
char *token, *cp;
|
|
|
|
@dots{}
|
|
|
|
cp = strdupa (string); /* Make writable copy. */
|
|
token = strtok (cp, delimiters); /* token => "words" */
|
|
token = strtok (NULL, delimiters); /* token => "separated" */
|
|
token = strtok (NULL, delimiters); /* token => "by" */
|
|
token = strtok (NULL, delimiters); /* token => "spaces" */
|
|
token = strtok (NULL, delimiters); /* token => "and" */
|
|
token = strtok (NULL, delimiters); /* token => "punctuation" */
|
|
token = strtok (NULL, delimiters); /* token => NULL */
|
|
@end smallexample
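
In practice the calls are usually made in a loop that ends when
@code{strtok} returns a null pointer. The following sketch collects the
token pointers into an array; @code{split_tokens} and its parameters are
hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the writable string BUFFER at the bytes
   in DELIMITERS and store up to MAX token pointers in TOKENS.
   Returns the number of tokens stored.  BUFFER is modified in place,
   and the pointers in TOKENS point into it.  */
size_t
split_tokens (char *buffer, const char *delimiters,
              char **tokens, size_t max)
{
  size_t n = 0;
  char *token;

  /* Pass the string on the first call and a null pointer afterwards.  */
  for (token = strtok (buffer, delimiters);
       token != NULL && n < max;
       token = strtok (NULL, delimiters))
    tokens[n++] = token;
  return n;
}
```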
|
|
|
|
The GNU C library contains two more functions for tokenizing a string
|
|
which overcome the limitation of non-reentrancy. They are only
|
|
available for multibyte character strings.
|
|
|
|
@comment string.h
|
|
@comment POSIX
|
|
@deftypefun {char *} strtok_r (char *@var{newstring}, const char *@var{delimiters}, char **@var{save_ptr})
|
|
Just like @code{strtok}, this function splits the string into several
|
|
tokens which can be accessed by successive calls to @code{strtok_r}.
|
|
The difference is that the information about the next token is stored in
|
|
the space pointed to by the third argument, @var{save_ptr}, which is a
|
|
pointer to a string pointer. Calling @code{strtok_r} with a null
|
|
pointer for @var{newstring} and leaving @var{save_ptr} between the calls
|
|
unchanged does the job without hindering reentrancy.
|
|
|
|
This function is defined in POSIX.1 and can be found on many systems
|
|
which support multi-threading.
|
|
@end deftypefun
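
Keeping the state in a caller-supplied pointer makes it possible to
tokenize one string while another tokenization is still in progress,
which @code{strtok} cannot do. A minimal sketch, assuming the input holds
newline-separated lines of blank-separated words (the function name
@code{count_words_by_line} is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the words in TEXT, which holds lines
   separated by newlines, each made of words separated by blanks.
   TEXT is modified in place.  */
size_t
count_words_by_line (char *text)
{
  size_t words = 0;
  char *line_state;
  char *word_state;
  char *line;
  char *word;

  for (line = strtok_r (text, "\n", &line_state);
       line != NULL;
       line = strtok_r (NULL, "\n", &line_state))
    /* The inner loop keeps its own state, so it does not disturb
       the position of the outer loop.  */
    for (word = strtok_r (line, " \t", &word_state);
         word != NULL;
         word = strtok_r (NULL, " \t", &word_state))
      words++;
  return words;
}
```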
|
|
|
|
@comment string.h
|
|
@comment BSD
|
|
@deftypefun {char *} strsep (char **@var{string_ptr}, const char *@var{delimiter})
|
|
This function has functionality similar to @code{strtok_r}, with the
|
|
@var{newstring} argument replaced by the @var{save_ptr} argument. The
|
|
initialization of the moving pointer has to be done by the user.
|
|
Successive calls to @code{strsep} move the pointer along the tokens
|
|
separated by @var{delimiter}, returning the address of the next token
|
|
and updating @var{string_ptr} to point to the beginning of the next
|
|
token.
|
|
|
|
One difference between @code{strsep} and @code{strtok_r} is that if the
|
|
input string contains more than one character from @var{delimiter} in a
|
|
row @code{strsep} returns an empty string for each pair of characters
|
|
from @var{delimiter}. This means that a program normally should test
|
|
for @code{strsep} returning an empty string before processing it.
|
|
|
|
This function was introduced in 4.3BSD and therefore is widely available.
|
|
@end deftypefun
|
|
|
|
Here is what the above example looks like when @code{strsep} is used.
|
|
|
|
@comment Yes, this example has been tested.
|
|
@smallexample
|
|
#include <string.h>
|
|
#include <stddef.h>
|
|
|
|
@dots{}
|
|
|
|
const char string[] = "words separated by spaces -- and, punctuation!";
|
|
const char delimiters[] = " .,;:!-";
|
|
char *running;
|
|
char *token;
|
|
|
|
@dots{}
|
|
|
|
running = strdupa (string);
|
|
token = strsep (&running, delimiters); /* token => "words" */
|
|
token = strsep (&running, delimiters); /* token => "separated" */
|
|
token = strsep (&running, delimiters); /* token => "by" */
|
|
token = strsep (&running, delimiters); /* token => "spaces" */
|
|
token = strsep (&running, delimiters); /* token => "" */
|
|
token = strsep (&running, delimiters); /* token => "" */
|
|
token = strsep (&running, delimiters); /* token => "" */
|
|
token = strsep (&running, delimiters); /* token => "and" */
|
|
token = strsep (&running, delimiters); /* token => "" */
|
|
token = strsep (&running, delimiters); /* token => "punctuation" */
|
|
token = strsep (&running, delimiters); /* token => "" */
|
|
token = strsep (&running, delimiters); /* token => NULL */
|
|
@end smallexample
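
A loop over @code{strsep} usually tests for the empty strings shown
above and skips them. The sketch below does so while counting the
non-empty fields; @code{count_fields} is a hypothetical name, and the
feature test macro is an assumption about how the declaration is made
visible on the system at hand.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-empty fields of the writable
   string BUFFER, where fields are separated by the bytes in
   DELIMITERS.  BUFFER is modified in place.  */
size_t
count_fields (char *buffer, const char *delimiters)
{
  size_t n = 0;
  char *running = buffer;
  char *token;

  while ((token = strsep (&running, delimiters)) != NULL)
    if (*token != '\0')         /* Skip empty fields.  */
      n++;
  return n;
}
```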
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {char *} basename (const char *@var{filename})
|
|
The GNU version of the @code{basename} function returns the last
|
|
component of the path in @var{filename}. This function is the preferred
|
|
usage, since it does not modify the argument, @var{filename}, and
|
|
respects trailing slashes. The prototype for @code{basename} can be
|
|
found in @file{string.h}. Note that this function is overridden by the XPG
|
|
version, if @file{libgen.h} is included.
|
|
|
|
Example of using GNU @code{basename}:
|
|
|
|
@smallexample
|
|
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
|
|
|
|
int
|
|
main (int argc, char *argv[])
|
|
@{
|
|
char *prog = basename (argv[0]);
|
|
|
|
if (argc < 2)
|
|
@{
|
|
fprintf (stderr, "Usage %s <arg>\n", prog);
|
|
exit (1);
|
|
@}
|
|
|
|
@dots{}
|
|
@}
|
|
@end smallexample
|
|
|
|
@strong{Portability Note:} This function may produce different results
|
|
on different systems.
|
|
|
|
@end deftypefun
|
|
|
|
@comment libgen.h
|
|
@comment XPG
|
|
@deftypefun {char *} basename (char *@var{path})
|
|
This is the standard XPG defined @code{basename}. It is similar in
|
|
spirit to the GNU version, but may modify the @var{path} by removing
|
|
trailing '/' characters. If the @var{path} is made up entirely of '/'
|
|
characters, then "/" will be returned. Also, if @var{path} is
|
|
@code{NULL} or an empty string, then "." is returned. The prototype for
|
|
the XPG version can be found in @file{libgen.h}.
|
|
|
|
Example of using XPG @code{basename}:
|
|
|
|
@smallexample
|
|
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
|
|
|
|
int
|
|
main (int argc, char *argv[])
|
|
@{
|
|
char *prog;
|
|
char *path = strdupa (argv[0]);
|
|
|
|
prog = basename (path);
|
|
|
|
if (argc < 2)
|
|
@{
|
|
fprintf (stderr, "Usage %s <arg>\n", prog);
|
|
exit (1);
|
|
@}
|
|
|
|
@dots{}
|
|
|
|
@}
|
|
@end smallexample
|
|
@end deftypefun
|
|
|
|
@comment libgen.h
|
|
@comment XPG
|
|
@deftypefun {char *} dirname (char *@var{path})
|
|
The @code{dirname} function is the complement to the XPG version of
|
|
@code{basename}. It returns the parent directory of the file specified
|
|
by @var{path}. If @var{path} is @code{NULL}, an empty string, or
|
|
contains no '/' characters, then "." is returned. The prototype for this
|
|
function can be found in @file{libgen.h}.
|
|
@end deftypefun
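
Since the XPG @code{basename} and @code{dirname} may both modify their
argument, a caller that needs both parts of a path has to give each
function its own copy. One possible sketch follows; the name
@code{split_path} and its buffer-based interface are assumptions made for
this example.

```c
#include <libgen.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the directory part of PATH into DIR and
   its final component into BASE, each a buffer of SIZE bytes.
   Returns 0 on success, or -1 if the buffers are too small.  */
int
split_path (const char *path, char *dir, char *base, size_t size)
{
  char *result;

  /* The result may be "." or "/", so at least two bytes are needed.  */
  if (size < 2 || strlen (path) >= size)
    return -1;

  /* Each function gets its own scratch copy.  The result may point
     into that copy or to static storage, so move it into place.  */
  strcpy (base, path);
  result = basename (base);
  memmove (base, result, strlen (result) + 1);

  strcpy (dir, path);
  result = dirname (dir);
  memmove (dir, result, strlen (result) + 1);
  return 0;
}
```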
|
|
|
|
@node strfry
|
|
@section strfry
|
|
|
|
The function below addresses the perennial programming quandary: ``How do
|
|
I take good data in string form and painlessly turn it into garbage?''
|
|
This is actually a fairly simple task for C programmers who do not use
|
|
the GNU C library string functions, but for programs based on the GNU C
|
|
library, the @code{strfry} function is the preferred method for
|
|
destroying string data.
|
|
|
|
The prototype for this function is in @file{string.h}.
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {char *} strfry (char *@var{string})
|
|
|
|
@code{strfry} creates a pseudorandom anagram of a string, replacing the
|
|
input with the anagram in place. For each position in the string,
|
|
@code{strfry} swaps it with a position in the string selected at random
|
|
(from a uniform distribution). The two positions may be the same.
|
|
|
|
The return value of @code{strfry} is always @var{string}.
|
|
|
|
@strong{Portability Note:} This function is unique to the GNU C library.
|
|
|
|
@end deftypefun
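
The swapping procedure described above can be sketched in portable C.
This is only an illustration of the documented behavior, not the
library's implementation: the names @code{shuffle_sketch} and
@code{is_anagram} are hypothetical, and the sketch uses @code{rand} with
a modulo operation, which is only approximately uniform.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the documented behavior: swap each position
   of STRING with a randomly chosen position, in place.  */
char *
shuffle_sketch (char *string)
{
  size_t len = strlen (string);
  size_t i;

  for (i = 0; i < len; i++)
    {
      /* The modulo introduces a slight bias; acceptable for a sketch.  */
      size_t j = (size_t) rand () % len;
      char tmp = string[i];
      string[i] = string[j];
      string[j] = tmp;
    }
  return string;
}

/* Hypothetical helper: return 1 if A and B contain exactly the same
   characters, in any order, and 0 otherwise.  */
int
is_anagram (const char *a, const char *b)
{
  long counts[256] = { 0 };
  size_t i;

  for (; *a != '\0'; a++)
    counts[(unsigned char) *a]++;
  for (; *b != '\0'; b++)
    counts[(unsigned char) *b]--;
  for (i = 0; i < 256; i++)
    if (counts[i] != 0)
      return 0;
  return 1;
}
```

Whatever order the characters end up in, the result is always an
anagram of the input, and the string's length never changes.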
|
|
|
|
|
|
@node Trivial Encryption
|
|
@section Trivial Encryption
|
|
@cindex encryption
|
|
|
|
|
|
The @code{memfrob} function converts an array of data to something
|
|
unrecognizable and back again. It is not encryption in its usual sense
|
|
since it is easy for someone to convert the encrypted data back to clear
|
|
text. The transformation is analogous to Usenet's ``Rot13'' encryption
|
|
method for obscuring offensive jokes from sensitive eyes and such.
|
|
Unlike Rot13, @code{memfrob} works on arbitrary binary data, not just
|
|
text.
|
|
@cindex Rot13
|
|
|
|
For true encryption, see @ref{Cryptographic Functions}.
|
|
|
|
This function is declared in @file{string.h}.
|
|
@pindex string.h
|
|
|
|
@comment string.h
|
|
@comment GNU
|
|
@deftypefun {void *} memfrob (void *@var{mem}, size_t @var{length})
|
|
|
|
@code{memfrob} transforms (frobnicates) each byte of the data structure
|
|
at @var{mem}, which is @var{length} bytes long, by bitwise exclusive
|
|
ORing it with binary 00101010 (hexadecimal 2A). It does the transformation in place and
|
|
its return value is always @var{mem}.
|
|
|
|
Note that applying @code{memfrob} a second time on the same data structure
|
|
returns it to its original state.
|
|
|
|
This is a good function for hiding information from someone who doesn't
|
|
want to see it or doesn't want to see it very much. To really prevent
|
|
people from retrieving the information, use stronger encryption such as
|
|
that described in @ref{Cryptographic Functions}.
|
|
|
|
@strong{Portability Note:} This function is unique to the GNU C library.
|
|
|
|
@end deftypefun
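
The transformation itself is small enough to sketch in portable C. The
function name @code{xor42_sketch} is hypothetical and the code is only
an illustration of the documented behavior, not the library's
implementation. It shows why a second pass restores the data: exclusive
OR with the same constant is its own inverse.

```c
#include <stddef.h>

/* Hypothetical sketch of the documented transformation: exclusive-OR
   each of the LENGTH bytes at MEM with binary 00101010, in place.  */
void *
xor42_sketch (void *mem, size_t length)
{
  unsigned char *p = mem;
  size_t i;

  for (i = 0; i < length; i++)
    p[i] ^= 0x2A;
  return mem;
}
```

Applying the function once to a buffer obscures it; applying it again
with the same length returns every byte to its original value.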
|
|
|
|
@node Encode Binary Data
|
|
@section Encode Binary Data
|
|
|
|
To store or transfer binary data in environments which only support text,
|
|
one has to encode the binary data by mapping the input bytes to
|
|
characters in the range allowed for storing or transferring. SVID
|
|
systems (and nowadays XPG compliant systems) provide minimal support for
|
|
this task.
|
|
|
|
@comment stdlib.h
|
|
@comment XPG
|
|
@deftypefun {char *} l64a (long int @var{n})
|
|
This function encodes a 32-bit input value using characters from the
|
|
basic character set. It returns a pointer to a 7-character buffer which
|
|
contains an encoded version of @var{n}. To encode a series of bytes the
|
|
user must copy the returned string to a destination buffer. It returns
|
|
the empty string if @var{n} is zero, which is somewhat bizarre but
|
|
mandated by the standard.@*
|
|
@strong{Warning:} Since a static buffer is used, this function should not
|
|
be used in multi-threaded programs. There is no thread-safe alternative
|
|
to this function in the C library.@*
|
|
@strong{Compatibility Note:} The XPG standard states that the return
|
|
value of @code{l64a} is undefined if @var{n} is negative. In the GNU
|
|
implementation, @code{l64a} treats its argument as unsigned, so it will
|
|
return a sensible encoding for any nonzero @var{n}; however, portable
|
|
programs should not rely on this.
|
|
|
|
To encode a large buffer @code{l64a} must be called in a loop, once for
|
|
each 32-bit word of the buffer. For example, one could do something
|
|
like this:
|
|
|
|
@smallexample
|
|
char *
encode (const void *buf, size_t len)
@{
  /* @r{We know in advance how long the buffer has to be.} */
  unsigned char *in = (unsigned char *) buf;
  char *out = malloc (6 + ((len + 3) / 4) * 6 + 1);
  char *cp = out, *p;

  /* @r{Give up if the output buffer could not be allocated.} */
  if (out == NULL)
    return NULL;

  /* @r{Encode the length.} */
  /* @r{Using `htonl' is necessary so that the data can be}
     @r{decoded even on machines with different byte order.}
     @r{`l64a' can return a string shorter than 6 bytes, so }
     @r{we pad it with encoding of 0 (}'.'@r{) at the end by }
     @r{hand.} */

  p = stpcpy (cp, l64a (htonl (len)));
  cp = mempcpy (p, "......", 6 - (p - cp));

  while (len > 3)
    @{
      unsigned long int n = *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      n = (n << 8) | *in++;
      len -= 4;
      p = stpcpy (cp, l64a (htonl (n)));
      cp = mempcpy (p, "......", 6 - (p - cp));
    @}
  if (len > 0)
    @{
      unsigned long int n = *in++;
      if (--len > 0)
        @{
          n = (n << 8) | *in++;
          if (--len > 0)
            n = (n << 8) | *in;
        @}
      cp = stpcpy (cp, l64a (htonl (n)));
    @}
  *cp = '\0';
  return out;
@}
|
|
@end smallexample
|
|
|
|
It is strange that the library does not provide the complete
|
|
functionality needed but so be it.
|
|
|
|
@end deftypefun
|
|
|
|
To decode data produced with @code{l64a} the following function should be
|
|
used.
|
|
|
|
@comment stdlib.h
|
|
@comment XPG
|
|
@deftypefun {long int} a64l (const char *@var{string})
|
|
The parameter @var{string} should contain a string which was produced by
|
|
a call to @code{l64a}. The function processes at most 6 characters of
|
|
this string, and decodes the characters it finds according to the table
|
|
below. It stops decoding when it finds a character not in the table,
|
|
rather like @code{atoi}; if you have a buffer which has been broken into
|
|
lines, you must be careful to skip over the end-of-line characters.
|
|
|
|
The decoded number is returned as a @code{long int} value.
|
|
@end deftypefun
|
|
|
|
The @code{l64a} and @code{a64l} functions use a base 64 encoding, in
|
|
which each character of an encoded string represents six bits of an
|
|
input word. These symbols are used for the base 64 digits:
|
|
|
|
@multitable {xxxxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx} {xxx}
|
|
@item @tab 0 @tab 1 @tab 2 @tab 3 @tab 4 @tab 5 @tab 6 @tab 7
|
|
@item 0 @tab @code{.} @tab @code{/} @tab @code{0} @tab @code{1}
|
|
@tab @code{2} @tab @code{3} @tab @code{4} @tab @code{5}
|
|
@item 8 @tab @code{6} @tab @code{7} @tab @code{8} @tab @code{9}
|
|
@tab @code{A} @tab @code{B} @tab @code{C} @tab @code{D}
|
|
@item 16 @tab @code{E} @tab @code{F} @tab @code{G} @tab @code{H}
|
|
@tab @code{I} @tab @code{J} @tab @code{K} @tab @code{L}
|
|
@item 24 @tab @code{M} @tab @code{N} @tab @code{O} @tab @code{P}
|
|
@tab @code{Q} @tab @code{R} @tab @code{S} @tab @code{T}
|
|
@item 32 @tab @code{U} @tab @code{V} @tab @code{W} @tab @code{X}
|
|
@tab @code{Y} @tab @code{Z} @tab @code{a} @tab @code{b}
|
|
@item 40 @tab @code{c} @tab @code{d} @tab @code{e} @tab @code{f}
|
|
@tab @code{g} @tab @code{h} @tab @code{i} @tab @code{j}
|
|
@item 48 @tab @code{k} @tab @code{l} @tab @code{m} @tab @code{n}
|
|
@tab @code{o} @tab @code{p} @tab @code{q} @tab @code{r}
|
|
@item 56 @tab @code{s} @tab @code{t} @tab @code{u} @tab @code{v}
|
|
@tab @code{w} @tab @code{x} @tab @code{y} @tab @code{z}
|
|
@end multitable
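
As a small worked example of the table, the sketch below encodes one
word with @code{l64a}, pads the result to six digits with the digit for
zero (@code{.}), and leaves it ready for @code{a64l}. The helper name
@code{encode_word} is an assumption made for this illustration. Note
that the least significant six bits come first in the encoded string.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: store the six-digit base 64 encoding of N in
   OUT, padding short results with '.', the digit for zero.  */
void
encode_word (long int n, char out[7])
{
  size_t len;

  /* l64a returns a pointer to a static buffer, so copy it at once.  */
  strcpy (out, l64a (n));
  len = strlen (out);

  /* l64a drops the high-order zero digits; put them back.  */
  memset (out + len, '.', 6 - len);
  out[6] = '\0';
}
```

With this sketch the value 64 is expected to encode as @code{./....}:
the low six bits are zero (@code{.}), the next six bits hold the value
one (@code{/}), and the remaining digits are padding. Passing that
string to @code{a64l} yields 64 again.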
|
|
|
|
This encoding scheme is not standard. There are some other encoding
|
|
methods which are much more widely used (UU encoding, MIME encoding).
|
|
Generally, it is better to use one of these encodings.
|
|
|
|
@node Argz and Envz Vectors
|
|
@section Argz and Envz Vectors
|
|
|
|
@cindex argz vectors (string vectors)
|
|
@cindex string vectors, null-character separated
|
|
@cindex argument vectors, null-character separated
|
|
@dfn{Argz vectors} are vectors of strings in a contiguous block of
|
|
memory, each element separated from its neighbors by null characters
|
|
(@code{'\0'}).
|
|
|
|
@cindex envz vectors (environment vectors)
|
|
@cindex environment vectors, null-character separated
|
|
@dfn{Envz vectors} are an extension of argz vectors where each element is a
|
|
name-value pair, separated by a @code{'='} character (as in a Unix
|
|
environment).
|
|
|
|
@menu
|
|
* Argz Functions:: Operations on argz vectors.
|
|
* Envz Functions:: Additional operations on environment vectors.
|
|
@end menu
|
|
|
|
@node Argz Functions, Envz Functions, , Argz and Envz Vectors
|
|
@subsection Argz Functions
|
|
|
|
Each argz vector is represented by a pointer to the first element, of
|
|
type @code{char *}, and a size, of type @code{size_t}, both of which can
|
|
be initialized to @code{0} to represent an empty argz vector. All argz
|
|
functions accept either a pointer and a size argument, or pointers to
|
|
them, if they will be modified.
|
|
|
|
The argz functions use @code{malloc}/@code{realloc} to allocate/grow
|
|
argz vectors, and so any argz vector created using these functions may
|
|
be freed by using @code{free}; conversely, any argz function that may
|
|
grow a string expects that string to have been allocated using
|
|
@code{malloc} (those argz functions that only examine their arguments or
|
|
modify them in place will work on any sort of memory).
|
|
@xref{Unconstrained Allocation}.
|
|
|
|
All argz functions that do memory allocation have a return type of
|
|
@code{error_t}, and return @code{0} for success, and @code{ENOMEM} if an
|
|
allocation error occurs.
|
|
|
|
@pindex argz.h
|
|
These functions are declared in the standard include file @file{argz.h}.
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {error_t} argz_create (char *const @var{argv}[], char **@var{argz}, size_t *@var{argz_len})
|
|
The @code{argz_create} function converts the Unix-style argument vector
|
|
@var{argv} (a vector of pointers to normal C strings, terminated by
|
|
@code{(char *)0}; @pxref{Program Arguments}) into an argz vector with
|
|
the same elements, which is returned in @var{argz} and @var{argz_len}.
|
|
@end deftypefun
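
A minimal sketch of calling @code{argz_create} follows. The wrapper
name @code{argv_to_argz} is hypothetical. It illustrates the resulting
layout: the strings of @var{argv} are stored back to back, each
followed by its terminating null character.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <stdlib.h>

/* Hypothetical helper: convert ARGV into a newly allocated argz
   vector and store its size in *ARGZ_LEN.  Returns NULL if the
   allocation fails (or if ARGV is empty, since an empty argz vector
   is represented by a null pointer).  Free the result with free.  */
char *
argv_to_argz (char *const argv[], size_t *argz_len)
{
  char *argz = NULL;

  *argz_len = 0;
  if (argz_create (argv, &argz, argz_len) != 0)
    return NULL;
  return argz;
}
```

For the argument vector holding @code{"ls"} and @code{"-l"}, the
resulting argz vector is six bytes long: two three-byte elements,
counting the null characters.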
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {error_t} argz_create_sep (const char *@var{string}, int @var{sep}, char **@var{argz}, size_t *@var{argz_len})
|
|
The @code{argz_create_sep} function converts the null-terminated string
|
|
@var{string} into an argz vector (returned in @var{argz} and
|
|
@var{argz_len}) by splitting it into elements at every occurrence of the
|
|
character @var{sep}.
|
|
@end deftypefun
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {size_t} argz_count (const char *@var{argz}, size_t @var{argz_len})
|
|
Returns the number of elements in the argz vector @var{argz} and
|
|
@var{argz_len}.
|
|
@end deftypefun
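
As an illustration, @code{argz_create_sep} and @code{argz_count} can be
combined to count the fields of a separated string. The helper name
@code{count_fields} is hypothetical; this is a sketch, not a library
facility.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <stdlib.h>

/* Hypothetical helper: return the number of fields in TEXT when it
   is split at the character SEP, or (size_t) -1 if memory for the
   temporary argz vector cannot be allocated.  */
size_t
count_fields (const char *text, int sep)
{
  char *argz = NULL;
  size_t argz_len = 0;
  size_t count;

  if (argz_create_sep (text, sep, &argz, &argz_len) != 0)
    return (size_t) -1;
  count = argz_count (argz, argz_len);
  free (argz);
  return count;
}
```

For a search path such as @code{/bin:/usr/bin:/usr/local/bin} split at
@code{':'}, the sketch reports three fields.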
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {void} argz_extract (char *@var{argz}, size_t @var{argz_len}, char **@var{argv})
|
|
The @code{argz_extract} function converts the argz vector @var{argz} and
|
|
@var{argz_len} into a Unix-style argument vector stored in @var{argv},
|
|
by putting pointers to every element in @var{argz} into successive
|
|
positions in @var{argv}, followed by a terminator of @code{0}.
|
|
The @var{argv} array must be pre-allocated with enough space to hold all the
|
|
elements in @var{argz} plus the terminating @code{(char *)0}
|
|
(@code{(argz_count (@var{argz}, @var{argz_len}) + 1) * sizeof (char *)}
|
|
bytes should be enough). Note that the string pointers stored into
|
|
@var{argv} point into @var{argz}---they are not copies---and so
|
|
@var{argz} must be copied if it will be changed while @var{argv} is
|
|
still active. This function is useful for passing the elements in
|
|
@var{argz} to an exec function (@pxref{Executing a File}).
|
|
@end deftypefun
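
The pre-allocation rule for @code{argz_extract} can be sketched as
follows. The helper name @code{argz_to_argv} is hypothetical. Only the
pointer array is allocated here; the strings themselves still live
inside the argz vector.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated, null-terminated
   array of pointers to the elements of ARGZ, or NULL if the
   allocation fails.  The caller frees the array only; the strings
   it points to belong to ARGZ.  */
char **
argz_to_argv (char *argz, size_t argz_len)
{
  size_t count = argz_count (argz, argz_len);
  char **argv = malloc ((count + 1) * sizeof (char *));

  if (argv != NULL)
    argz_extract (argz, argz_len, argv);
  return argv;
}
```

Since the pointers refer into the argz vector, the vector must stay
allocated and unchanged for as long as the array is in use.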
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {void} argz_stringify (char *@var{argz}, size_t @var{len}, int @var{sep})
|
|
The @code{argz_stringify} function converts @var{argz} into a normal string with
|
|
the elements separated by the character @var{sep}, by replacing each
|
|
@code{'\0'} inside @var{argz} (except the last one, which terminates the
|
|
string) with @var{sep}. This is handy for printing @var{argz} in a
|
|
readable manner.
|
|
@end deftypefun
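
One possible use of @code{argz_stringify}, sketched under the
assumption of a hypothetical helper named @code{rejoin}, is to split a
string at one separator and join the fields again with another.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated string holding the
   fields of TEXT (split at FROM) joined by the character TO, or NULL
   if memory cannot be allocated.  */
char *
rejoin (const char *text, int from, int to)
{
  char *argz = NULL;
  size_t argz_len = 0;

  if (argz_create_sep (text, from, &argz, &argz_len) != 0)
    return NULL;

  /* An empty vector is a null pointer; return an empty string.  */
  if (argz == NULL)
    return calloc (1, 1);

  /* After this call ARGZ is an ordinary null-terminated string.  */
  argz_stringify (argz, argz_len, to);
  return argz;
}
```

The conversion happens in place, so no second buffer is needed; the
vector's final null character becomes the string terminator.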
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {error_t} argz_add (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str})
|
|
The @code{argz_add} function adds the string @var{str} to the end of the
|
|
argz vector @code{*@var{argz}}, and updates @code{*@var{argz}} and
|
|
@code{*@var{argz_len}} accordingly.
|
|
@end deftypefun
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {error_t} argz_add_sep (char **@var{argz}, size_t *@var{argz_len}, const char *@var{str}, int @var{delim})
|
|
The @code{argz_add_sep} function is similar to @code{argz_add}, but
|
|
@var{str} is split into separate elements in the result at occurrences of
|
|
the character @var{delim}. This is useful, for instance, for
|
|
adding the components of a Unix search path to an argz vector, by using
|
|
a value of @code{':'} for @var{delim}.
|
|
@end deftypefun
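
The sketch below combines @code{argz_add} and @code{argz_add_sep}. The
helper name @code{build_vector} is hypothetical. It starts from an
empty vector (a null pointer and a size of zero), adds one string as a
single element, and then adds the components of a search path.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <stdlib.h>

/* Hypothetical helper: create in *ARGZ and *ARGZ_LEN a vector that
   holds PROGRAM followed by the components of the colon-separated
   SEARCH_PATH.  Returns 0 on success or ENOMEM on failure, in which
   case the vector is left empty.  */
error_t
build_vector (const char *program, const char *search_path,
              char **argz, size_t *argz_len)
{
  error_t err;

  *argz = NULL;
  *argz_len = 0;

  err = argz_add (argz, argz_len, program);
  if (err == 0)
    err = argz_add_sep (argz, argz_len, search_path, ':');
  if (err != 0)
    {
      free (*argz);
      *argz = NULL;
      *argz_len = 0;
    }
  return err;
}
```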
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {error_t} argz_append (char **@var{argz}, size_t *@var{argz_len}, const char *@var{buf}, size_t @var{buf_len})
|
|
The @code{argz_append} function appends @var{buf_len} bytes starting at
|
|
@var{buf} to the argz vector @code{*@var{argz}}, reallocating
|
|
@code{*@var{argz}} to accommodate it, and adding @var{buf_len} to
|
|
@code{*@var{argz_len}}.
|
|
@end deftypefun
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {void} argz_delete (char **@var{argz}, size_t *@var{argz_len}, char *@var{entry})
|
|
If @var{entry} points to the beginning of one of the elements in the
|
|
argz vector @code{*@var{argz}}, the @code{argz_delete} function will
|
|
remove this entry and reallocate @code{*@var{argz}}, modifying
|
|
@code{*@var{argz}} and @code{*@var{argz_len}} accordingly. Note that as
|
|
destructive argz functions usually reallocate their argz argument,
|
|
pointers into argz vectors such as @var{entry} will then become invalid.
|
|
@end deftypefun
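
A minimal sketch of deleting an element by value follows; the helper
name @code{remove_element} is hypothetical. It stops iterating as soon
as it has deleted an entry, because the pointer used for the iteration
is no longer valid after the call to @code{argz_delete}.

```c
#define _GNU_SOURCE
#include <argz.h>
#include <string.h>

/* Hypothetical helper: delete the first element of the vector that
   equals NAME.  Returns 1 if an element was removed, 0 if none
   matched.  */
int
remove_element (char **argz, size_t *argz_len, const char *name)
{
  char *entry = NULL;

  while ((entry = argz_next (*argz, *argz_len, entry)) != NULL)
    if (strcmp (entry, name) == 0)
      {
        argz_delete (argz, argz_len, entry);
        /* ENTRY must not be used again after this point.  */
        return 1;
      }
  return 0;
}
```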
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {error_t} argz_insert (char **@var{argz}, size_t *@var{argz_len}, char *@var{before}, const char *@var{entry})
|
|
The @code{argz_insert} function inserts the string @var{entry} into the
|
|
argz vector @code{*@var{argz}} at a point just before the existing
|
|
element pointed to by @var{before}, reallocating @code{*@var{argz}} and
|
|
updating @code{*@var{argz}} and @code{*@var{argz_len}}. If @var{before}
|
|
is @code{0}, @var{entry} is added to the end instead (as if by
|
|
@code{argz_add}). Since the first element is in fact the same as
|
|
@code{*@var{argz}}, passing in @code{*@var{argz}} as the value of
|
|
@var{before} will result in @var{entry} being inserted at the beginning.
|
|
@end deftypefun
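
Inserting at the beginning, as described above, can be sketched with a
hypothetical helper named @code{prepend_element}:

```c
#define _GNU_SOURCE
#include <argz.h>

/* Hypothetical helper: insert ENTRY as the new first element of the
   vector.  Returns 0 on success or an error code on failure.  */
error_t
prepend_element (char **argz, size_t *argz_len, const char *entry)
{
  /* The first element starts at *ARGZ, so using *ARGZ as the BEFORE
     argument places ENTRY at the front.  If the vector is empty,
     *ARGZ is a null pointer and ENTRY is simply appended.  */
  return argz_insert (argz, argz_len, *argz, entry);
}
```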
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun {char *} argz_next (char *@var{argz}, size_t @var{argz_len}, const char *@var{entry})
|
|
The @code{argz_next} function provides a convenient way of iterating
|
|
over the elements in the argz vector @var{argz}. It returns a pointer
|
|
to the next element in @var{argz} after the element @var{entry}, or
|
|
@code{0} if there are no elements following @var{entry}. If @var{entry}
|
|
is @code{0}, the first element of @var{argz} is returned.
|
|
|
|
This behavior suggests two styles of iteration:
|
|
|
|
@smallexample
|
|
char *entry = 0;
|
|
while ((entry = argz_next (@var{argz}, @var{argz_len}, entry)))
|
|
@var{action};
|
|
@end smallexample
|
|
|
|
(the double parentheses are necessary to make some C compilers shut up
|
|
about what they consider a questionable @code{while}-test) and:
|
|
|
|
@smallexample
|
|
char *entry;
|
|
for (entry = @var{argz};
|
|
entry;
|
|
entry = argz_next (@var{argz}, @var{argz_len}, entry))
|
|
@var{action};
|
|
@end smallexample
|
|
|
|
Note that the latter depends on @var{argz} having a value of @code{0} if
|
|
it is empty (rather than a pointer to an empty block of memory); this
|
|
invariant is maintained for argz vectors created by the functions here.
|
|
@end deftypefun
|
|
|
|
@comment argz.h
|
|
@comment GNU
|
|
@deftypefun error_t argz_replace (@w{char **@var{argz}, size_t *@var{argz_len}}, @w{const char *@var{str}, const char *@var{with}}, @w{unsigned *@var{replace_count}})
|
|
Replace any occurrences of the string @var{str} in @var{argz} with
|
|
@var{with}, reallocating @var{argz} as necessary. If
|
|
@var{replace_count} is non-zero, @code{*@var{replace_count}} will be
|
|
incremented by the number of replacements performed.
|
|
@end deftypefun
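
Because @code{argz_replace} only increments the counter, the caller has
to initialize it. The sketch below, built around a hypothetical helper
named @code{replace_all}, shows one way to do that.

```c
#define _GNU_SOURCE
#include <argz.h>

/* Hypothetical helper: replace OLD_TEXT with NEW_TEXT throughout the
   vector.  Returns the count reported by argz_replace, or -1 if the
   call fails.  */
int
replace_all (char **argz, size_t *argz_len,
             const char *old_text, const char *new_text)
{
  /* Start from zero: argz_replace adds to the existing value.  */
  unsigned int count = 0;

  if (argz_replace (argz, argz_len, old_text, new_text, &count) != 0)
    return -1;
  return (int) count;
}
```

For example, replacing @code{.c} with @code{.o} in a vector holding
@code{foo.c} and @code{bar.c} turns the elements into @code{foo.o} and
@code{bar.o}.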
|
|
|
|
@node Envz Functions, , Argz Functions, Argz and Envz Vectors
|
|
@subsection Envz Functions
|
|
|
|
Envz vectors are just argz vectors with additional constraints on the form
|
|
of each element; as such, argz functions can also be used on them, where it
|
|
makes sense.
|
|
|
|
Each element in an envz vector is a name-value pair, separated by a @code{'='}
|
|
character; if multiple @code{'='} characters are present in an element, those
|
|
after the first are considered part of the value, and treated like all other
|
|
non-@code{'\0'} characters.
|
|
|
|
If @emph{no} @code{'='} characters are present in an element, that element is
|
|
considered the name of a ``null'' entry, as distinct from an entry with an
|
|
empty value: @code{envz_get} will return @code{0} if given the name of a null
|
|
entry, whereas an entry with an empty value would result in a value of
|
|
@code{""}; @code{envz_entry} will still find such entries, however. Null
|
|
entries can be removed with the @code{envz_strip} function.
|
|
|
|
As with argz functions, envz functions that may allocate memory (and thus
|
|
fail) have a return type of @code{error_t}, and return either @code{0} or
|
|
@code{ENOMEM}.
|
|
|
|
@pindex envz.h
|
|
These functions are declared in the standard include file @file{envz.h}.
|
|
|
|
@comment envz.h
|
|
@comment GNU
|
|
@deftypefun {char *} envz_entry (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
|
|
The @code{envz_entry} function finds the entry in @var{envz} with the name
|
|
@var{name}, and returns a pointer to the whole entry---that is, the argz
|
|
element which begins with @var{name} followed by a @code{'='} character. If
|
|
there is no entry with that name, @code{0} is returned.
|
|
@end deftypefun
|
|
|
|
@comment envz.h
|
|
@comment GNU
|
|
@deftypefun {char *} envz_get (const char *@var{envz}, size_t @var{envz_len}, const char *@var{name})
|
|
The @code{envz_get} function finds the entry in @var{envz} with the name
|
|
@var{name} (like @code{envz_entry}), and returns a pointer to the value
|
|
portion of that entry (following the @code{'='}). If there is no entry with
|
|
that name (or only a null entry), @code{0} is returned.
|
|
@end deftypefun
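
The difference between a missing name, a null entry, and an entry with
a (possibly empty) value can be sketched by combining @code{envz_entry}
and @code{envz_get}. The helper name @code{classify_name} and its
return codes are assumptions made for this illustration.

```c
#define _GNU_SOURCE
#include <envz.h>
#include <stddef.h>

/* Hypothetical helper: classify NAME within the envz vector.
   Returns 0 if there is no entry called NAME, 1 if there is only a
   null entry (no '=' character), and 2 if the entry has a value,
   which may be the empty string.  */
int
classify_name (const char *envz, size_t envz_len, const char *name)
{
  if (envz_entry (envz, envz_len, name) == NULL)
    return 0;
  if (envz_get (envz, envz_len, name) == NULL)
    return 1;
  return 2;
}
```

In a vector with the elements @code{HOME=/root}, @code{EMPTY=}, and
@code{FLAG}, the names @code{HOME} and @code{EMPTY} both have values,
while @code{FLAG} is a null entry.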
|
|
|
|
@comment envz.h
|
|
@comment GNU
|
|
@deftypefun {error_t} envz_add (char **@var{envz}, size_t *@var{envz_len}, const char *@var{name}, const char *@var{value})
|
|
The @code{envz_add} function adds an entry to @code{*@var{envz}}
|
|
(updating @code{*@var{envz}} and @code{*@var{envz_len}}) with the name
|
|
@var{name}, and value @var{value}. If an entry with the same name
|
|
already exists in @var{envz}, it is removed first. If @var{value} is
|
|
@code{0}, then the new entry will be the special null type of entry
|
|
(mentioned above).
|
|
@end deftypefun
|
|
|
|
@comment envz.h
|
|
@comment GNU
|
|
@deftypefun {error_t} envz_merge (char **@var{envz}, size_t *@var{envz_len}, const char *@var{envz2}, size_t @var{envz2_len}, int @var{override})
|
|
The @code{envz_merge} function adds each entry in @var{envz2} to @var{envz},
|
|
as if with @code{envz_add}, updating @code{*@var{envz}} and
|
|
@code{*@var{envz_len}}. If @var{override} is true, then values in @var{envz2}
|
|
will supersede those with the same name in @var{envz}, otherwise not.
|
|
|
|
Null entries are treated just like other entries in this respect, so a null
|
|
entry in @var{envz} can prevent an entry of the same name in @var{envz2} from
|
|
being added to @var{envz}, if @var{override} is false.
|
|
@end deftypefun
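
A common pattern, sketched here with a hypothetical helper named
@code{set_with_defaults}, is to record an explicit setting with
@code{envz_add} and then fill in defaults with @code{envz_merge},
passing a false @var{override} so that the explicit setting wins.

```c
#define _GNU_SOURCE
#include <envz.h>

/* Hypothetical helper: set NAME to VALUE in the vector, then add
   every entry of DEFAULTS whose name is not already present.
   Returns 0 on success or ENOMEM on failure.  */
error_t
set_with_defaults (char **envz, size_t *envz_len,
                   const char *name, const char *value,
                   const char *defaults, size_t defaults_len)
{
  error_t err = envz_add (envz, envz_len, name, value);

  if (err == 0)
    /* OVERRIDE is 0, so entries already in *ENVZ are kept.  */
    err = envz_merge (envz, envz_len, defaults, defaults_len, 0);
  return err;
}
```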
|
|
|
|
@comment envz.h
|
|
@comment GNU
|
|
@deftypefun {void} envz_strip (char **@var{envz}, size_t *@var{envz_len})
|
|
The @code{envz_strip} function removes any null entries from @var{envz},
|
|
updating @code{*@var{envz}} and @code{*@var{envz_len}}.
|
|
@end deftypefun
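
A minimal sketch of @code{envz_strip} follows; the helper name
@code{strip_and_count} is hypothetical. It removes the null entries
and reports how many entries remain, using @code{argz_count} since an
envz vector is also an argz vector.

```c
#define _GNU_SOURCE
#include <envz.h>

/* Hypothetical helper: remove all null entries from the vector and
   return the number of entries left.  */
size_t
strip_and_count (char **envz, size_t *envz_len)
{
  envz_strip (envz, envz_len);
  return argz_count (*envz, *envz_len);
}
```

Given the elements @code{A=1}, @code{FLAG}, and @code{B=2}, the null
entry @code{FLAG} is dropped and two entries remain.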
|