mbstowcs: Document, test, and fix null pointer dst semantics (Bug 25219)

The function mbstowcs, by an XSI extension to POSIX, accepts a null
pointer for the destination wchar_t array.  This behaviour lets you use
the function to compute the length of the required wchar_t array: it
performs the conversion without storing the result and returns the
number of wide characters that would be required (excluding the
terminating null wide character).
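As an illustration only (this is not code from the commit), a minimal sketch of the two-pass pattern the XSI extension enables.  It assumes the caller has already selected a locale with setlocale, and the helper name to_wide_alloc is hypothetical.  The first mbstowcs call, with a null destination, sizes the buffer; the second call performs the conversion into it.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string S to a newly
   allocated wide string.  Returns NULL on an invalid multibyte
   sequence or allocation failure.  Assumes the locale is already set.  */
wchar_t *
to_wide_alloc (const char *s)
{
  /* First pass: with a null destination nothing is stored and the
     length argument is ignored (XSI); the result is the number of wide
     characters needed, not counting the terminating L'\0'.  */
  size_t n = mbstowcs (NULL, s, 0);
  if (n == (size_t) -1)
    return NULL;

  wchar_t *ws = malloc ((n + 1) * sizeof (wchar_t));
  if (ws == NULL)
    return NULL;

  /* Second pass: N + 1 elements leaves room for the terminator.  */
  size_t stored = mbstowcs (ws, s, n + 1);
  /* Same input, same locale: both passes must agree.  */
  assert (stored == n);
  (void) stored;
  return ws;
}
```

A caller would write `wchar_t *w = to_wide_alloc ("12345");` and `free (w)` when done.  Sizing first avoids guessing a worst-case buffer length, at the cost of converting the input twice.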

We remove the __write_only__ access markup for the first argument
because it is not accurate: the destination may be a null pointer, in
which case the length argument does not apply to it.  Keeping the markup
would also prevent the new test case from compiling with
-Werror=nonnull.

We add a new test case for mbstowcs which exercises the null pointer
destination behaviour that we have now explicitly documented.

The functions mbsrtowcs and mbsnrtowcs behave similarly, and mbsrtowcs
is documented as doing this in C11, even if the standard does not
explicitly call out this specific use case.  We add a note to each of
mbsrtowcs and mbsnrtowcs to say that they support a null pointer for
the destination.
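For illustration, a minimal sketch of using the null destination form of mbsrtowcs and mbsnrtowcs purely to count wide characters.  The helpers count_wide and count_wide_n are hypothetical, not glibc API.  It assumes a glibc-style environment where mbsnrtowcs (POSIX.1-2008 and GNU, not ISO C) is declared in wchar.h.  Each helper uses a copy of the source pointer and a fresh mbstate_t, so the caller never depends on how a counting call leaves *src or the state object.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1 /* Assumption: glibc; exposes mbsnrtowcs.  */
#endif
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of wide characters in the NUL-terminated
   multibyte string S, or (size_t) -1 on an invalid sequence.  */
size_t
count_wide (const char *s)
{
  assert (s != NULL);
  mbstate_t state;
  memset (&state, 0, sizeof state); /* Initial shift state.  */
  const char *p = s;                /* Caller's pointer stays untouched.  */
  /* DST is a null pointer: nothing is stored and LEN is not used.  */
  return mbsrtowcs (NULL, &p, 0, &state);
}

/* Hypothetical helper: the same count for the first NMC bytes of BUF,
   which need not be NUL-terminated; a NUL byte within those NMC bytes
   still ends the conversion.  */
size_t
count_wide_n (const char *buf, size_t nmc)
{
  assert (buf != NULL);
  mbstate_t state;
  memset (&state, 0, sizeof state);
  const char *p = buf;
  return mbsnrtowcs (NULL, &p, nmc, 0, &state);
}
```

count_wide_n is the piece-by-piece variant: it lets a caller size the output for one chunk of a larger buffer without first copying that chunk into a NUL-terminated string.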

The wcsrtombs function behaves similarly in the opposite direction: a
null destination pointer can be used to compute how many bytes are
needed to convert the wide character input.  We document this case
too, but leave wcsnrtombs as a reference to wcsrtombs, so the reader
must still consult wcsrtombs for the details of the semantics.
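A comparable sketch for the wide-to-multibyte direction, again assuming a locale already selected by the caller; the helper to_multibyte_alloc is hypothetical.  wcsrtombs with a null destination sizes the byte buffer, then a second call fills it.  Both passes restart from the initial shift state and from the beginning of the input, which keeps the sketch independent of how the counting pass leaves *src and the state object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the wide string WS to a newly allocated
   multibyte string.  Returns NULL if some wide character cannot be
   represented in the current locale, or on allocation failure.  */
char *
to_multibyte_alloc (const wchar_t *ws)
{
  mbstate_t state;
  const wchar_t *p = ws;

  /* First pass: null DST, so only the byte count is computed
     (the terminating NUL byte is not included).  */
  memset (&state, 0, sizeof state);
  size_t n = wcsrtombs (NULL, &p, 0, &state);
  if (n == (size_t) -1)
    return NULL;

  char *s = malloc (n + 1);
  if (s == NULL)
    return NULL;

  /* Second pass from the initial state and the start of WS;
     N + 1 bytes leaves room for the terminating NUL.  */
  memset (&state, 0, sizeof state);
  p = ws;
  size_t stored = wcsrtombs (s, &p, n + 1, &state);
  assert (stored == n);
  (void) stored;
  return s;
}
```

The same shape applies to wcsnrtombs when the wide input is a counted buffer rather than a NUL-terminated string.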
Carlos O'Donell 2020-05-21 17:50:53 -04:00
parent 9e2dc874e6
commit 61af4bbb2a
4 changed files with 71 additions and 6 deletions


@@ -1026,6 +1026,10 @@ stores in the pointer pointed to by @var{src} either a null pointer (if
 the NUL byte in the input string was reached) or the address of the byte
 following the last converted multibyte character.

+Like @code{mbstowcs} the @var{dst} parameter may be a null pointer and
+the function can be used to count the number of wide characters that
+would be required.
+
 @pindex wchar.h
 @code{mbsrtowcs} was introduced in @w{Amendment 1} to @w{ISO C90} and is
 declared in @file{wchar.h}.
@@ -1101,10 +1105,11 @@ successfully converted.

 Except in the case of an encoding error the return value of the
 @code{wcsrtombs} function is the number of bytes in all the multibyte
-character sequences stored in @var{dst}.  Before returning, the state in
-the object pointed to by @var{ps} (or the internal object in case
-@var{ps} is a null pointer) is updated to reflect the state after the
-last conversion.  The state is the initial shift state in case the
+character sequences which were or would have been (if @var{dst} was
+not a null) stored in @var{dst}.  Before returning, the state in the
+object pointed to by @var{ps} (or the internal object in case @var{ps}
+is a null pointer) is updated to reflect the state after the last
+conversion.  The state is the initial shift state in case the
 terminating NUL wide character was converted.

 @pindex wchar.h
@@ -1131,6 +1136,10 @@ string @code{*@var{src}} need not be NUL-terminated.  But if a NUL byte
 is found within the @var{nmc} first bytes of the string, the conversion
 stops there.

+Like @code{mbstowcs} the @var{dst} parameter may be a null pointer and
+the function can be used to count the number of wide characters that
+would be required.
+
 This function is a GNU extension.  It is meant to work around the
 problems mentioned above.  Now it is possible to convert a buffer with
 multibyte character text piece by piece without having to care about
@@ -1465,6 +1474,12 @@ mbstowcs_alloc (const char *string)
 @}
 @end smallexample

+If @var{wstring} is a null pointer then no output is written and the
+conversion proceeds as above, and the result is returned.  In practice
+such behaviour is useful for calculating the exact number of wide
+characters required to convert @var{string}.  This behaviour of
+accepting a null pointer for @var{wstring} is an @w{XPG4.2} extension
+that is not specified in @w{ISO C} and is optional in @w{POSIX}.
 @end deftypefun

 @deftypefun size_t wcstombs (char *@var{string}, const wchar_t *@var{wstring}, size_t @var{size})


@@ -932,7 +932,7 @@ extern int wctomb (char *__s, wchar_t __wchar) __THROW;
 /* Convert a multibyte string to a wide char string.  */
 extern size_t mbstowcs (wchar_t *__restrict __pwcs,
                         const char *__restrict __s, size_t __n) __THROW
-    __attr_access ((__write_only__, 1, 3)) __attr_access ((__read_only__, 2));
+    __attr_access ((__read_only__, 2));
 /* Convert a wide char string to multibyte string.  */
 extern size_t wcstombs (char *__restrict __s,
                         const wchar_t *__restrict __pwcs, size_t __n)


@@ -52,7 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
          tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
          tst-wcstod-round test-char-types tst-fgetwc-after-eof \
          tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
-         $(addprefix test-,$(strop-tests))
+         $(addprefix test-,$(strop-tests)) tst-mbstowcs

 include ../Rules

wcsmbs/tst-mbstowcs.c

@@ -0,0 +1,50 @@
+/* Test basic mbstowcs including wstring == NULL (Bug 25219).
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <string.h>
+#include <support/check.h>
+
+static int
+do_test (void)
+{
+  char string[] = { '1', '2', '3' , '4', '5', '\0' };
+  size_t len = strlen (string);
+  wchar_t wstring[] = { L'1', L'2', L'3', L'4', L'5', L'\0' };
+#define NUM_WCHAR 6
+  wchar_t wout[NUM_WCHAR];
+  size_t result;
+
+  /* The input ASCII string in the C/POSIX locale must convert
+     to the matching WSTRING.  */
+  result = mbstowcs (wout, string, NUM_WCHAR);
+  TEST_VERIFY (result == (NUM_WCHAR - 1));
+  TEST_COMPARE_BLOB (wstring, sizeof (wchar_t) * (NUM_WCHAR - 1),
+                     wout, sizeof (wchar_t) * result);
+
+  /* The input ASCII string in the C/POSIX locale must be the
+     same length when using mbstowcs to compute the length of
+     the string required in the conversion.  Using mbstowcs
+     in this way is an XSI extension to POSIX.  */
+  result = mbstowcs (NULL, string, len);
+  TEST_VERIFY (result == len);
+
+  return 0;
+}
+
+#include <support/test-driver.c>