The GNU implementation of wcrtomb assumes that there are at least
MB_CUR_MAX bytes available in the destination buffer passed to wcrtomb
as the first argument. This is not compatible with the POSIX
definition, which only requires enough space for the input wide
character.
This does not break much in practice because when users supply buffers
smaller than MB_CUR_MAX (e.g. in ncurses), they compute and dynamically
allocate the buffer, which results in enough spare space (thanks to
usable_size in malloc and padding in alloca) that no actual buffer
overflow occurs. However, when the code is built with _FORTIFY_SOURCE,
it runs into the hard check against MB_CUR_MAX in __wcrtomb_chk and
hence fails. This was not evident until now because wcrtomb calls on
dynamically allocated buffers were not fortified; _FORTIFY_SOURCE=3
lifts that limitation, so such code now fails.
To fix this problem, introduce an internal buffer that is MB_LEN_MAX
long and use that to perform the conversion and then copy the resultant
bytes into the destination buffer. Also move the fortification check
into the main implementation, which checks the result after conversion
and aborts if the resultant byte count is greater than the destination
buffer size.
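As a rough illustration of the approach described above (not the actual
glibc code), the conversion can go through a scratch buffer of
MB_LEN_MAX bytes and be copied out only if it fits. The helper name
checked_wcrtomb and its dstlen parameter are hypothetical; in glibc the
destination size comes from the fortification machinery instead.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not the glibc implementation: convert WC into a
   scratch buffer of MB_LEN_MAX bytes, then copy the result into DST
   only if it fits in DSTLEN bytes.  */
size_t
checked_wcrtomb (char *dst, size_t dstlen, wchar_t wc, mbstate_t *ps)
{
  char tmp[MB_LEN_MAX];
  size_t n = wcrtomb (tmp, wc, ps);

  if (n == (size_t) -1)
    return n;           /* Encoding error; wcrtomb has set errno.  */
  if (n > dstlen)
    abort ();           /* Stands in for the fortification failure.  */

  memcpy (dst, tmp, n);
  return n;
}
```

With this shape, a destination that is exactly as large as the
converted character is accepted even when it is smaller than
MB_CUR_MAX.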
One complication is that applications that assume the MB_CUR_MAX
limitation to be gone may not be able to run safely on older glibcs if
they use static destination buffers smaller than MB_CUR_MAX; dynamic
allocations will always have enough spare space that no actual overruns
will occur. One way to fix this would be to bump the symbol version to
prevent them from running on older glibcs, but that seems too strict a
constraint. Instead, since these users will only have made this
decision on reading the manual, I have put a note in the manual warning
them about the pitfalls of having static buffers smaller than
MB_CUR_MAX and running them on older glibc.
Benchmarking:
The wcrtomb microbenchmark shows significant increases in maximum
execution time for all locales, ranging from 10x for ar_SA.UTF-8 to
1.5x-2x for nearly everything else. The mean execution time however saw
practically no impact, with some results even being quicker, indicating
that cache locality has a much bigger role in the overhead.
Given that the additional copy uses a temporary buffer inside wcrtomb,
it's likely that a hot path will end up putting that buffer (which is
responsible for the additional overhead) in a similar place on stack,
giving the necessary cache locality to negate the overhead. However in
situations where wcrtomb ends up getting called at wildly different
spots on the call stack (or is on different call stacks, e.g. with
threads or different execution contexts) and is still a hotspot, the
performance lag will be visible.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Cover key corner cases (e.g., whether errno is set) that are well
settled in glibc, fix some examples to avoid integer overflow, and
update some other dated examples (code needed for K&R C, e.g.).
* manual/charset.texi (Non-reentrant String Conversion):
* manual/filesys.texi (Symbolic Links):
* manual/memory.texi (Allocating Cleared Space):
* manual/socket.texi (Host Names):
* manual/string.texi (Concatenating Strings):
* manual/users.texi (Setting Groups):
Use reallocarray instead of realloc, to avoid integer overflow issues.
* manual/filesys.texi (Scanning Directory Content):
* manual/memory.texi (The GNU Allocator, Hooks for Malloc):
* manual/tunables.texi:
Use code font for 'malloc' instead of roman font.
(Symbolic Links): Don't assume readlink return value fits in 'int'.
* manual/memory.texi (Memory Allocation and C, Basic Allocation)
(Malloc Examples, Alloca Example):
* manual/stdio.texi (Formatted Output Functions):
* manual/string.texi (Concatenating Strings, Collation Functions):
Omit pointer casts that are needed only in ancient K&R C.
* manual/memory.texi (Basic Allocation):
Say that malloc sets errno on failure.
Say "convert" rather than "cast", since casts are no longer needed.
* manual/memory.texi (Basic Allocation):
* manual/string.texi (Concatenating Strings):
In examples, use C99 declarations after statements for brevity.
* manual/memory.texi (Malloc Examples): Add portability notes for
malloc (0), errno setting, and PTRDIFF_MAX.
(Changing Block Size): Say that realloc (p, 0) acts like
(p ? (free (p), NULL) : malloc (0)).
Add xreallocarray example, since other examples can use it.
Add portability notes for realloc (0, 0), realloc (p, 0),
PTRDIFF_MAX, and improve notes for reallocating to the same size.
(Allocating Cleared Space): Reword now-confusing discussion
about replacement, and xref "Replacing malloc".
* manual/stdio.texi (Formatted Output Functions):
Don't assume message size fits in 'int'.
* manual/string.texi (Concatenating Strings):
Fix undefined behavior involving arithmetic on a freed pointer.
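For readers unfamiliar with the reallocarray idiom mentioned in the
entries above, here is a minimal sketch of growing an array without a
hand-written overflow check. The helper grow_ints is hypothetical and
is not taken from the manual; it assumes a glibc that provides
reallocarray (2.26 or later).

```c
#define _DEFAULT_SOURCE         /* Exposes reallocarray on glibc.  */
#include <stdlib.h>

/* Hypothetical helper: resize *ITEMS to hold NEW_COUNT ints.
   Returns 0 on success and -1 on failure, in which case *ITEMS is
   left untouched and still valid.  */
int
grow_ints (int **items, size_t new_count)
{
  /* Avoid a zero-size request, whose behavior varies by platform.  */
  if (new_count == 0)
    new_count = 1;

  /* reallocarray fails with ENOMEM if NEW_COUNT * sizeof (int)
     would overflow, instead of silently allocating too little.  */
  int *p = reallocarray (*items, new_count, sizeof **items);
  if (p == NULL)
    return -1;

  *items = p;
  return 0;
}
```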
The function mbstowcs, by an XSI extension to POSIX, accepts a null
pointer for the destination wchar_t array. This API behaviour allows
you to use the function to compute the length of the required wchar_t
array: it performs the conversion without storing the result and
returns the number of wide characters required.
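A minimal sketch of that two-pass use, assuming the current locale can
decode the input; the helper name mbs_to_wcs_dup is hypothetical:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return a newly allocated wide-character copy
   of S, or NULL on conversion or allocation failure.  */
wchar_t *
mbs_to_wcs_dup (const char *s)
{
  /* First pass: null destination, so only the length is computed.  */
  size_t n = mbstowcs (NULL, s, 0);
  if (n == (size_t) -1)
    return NULL;

  /* One extra element for the terminating null wide character.  */
  wchar_t *w = calloc (n + 1, sizeof *w);
  if (w == NULL)
    return NULL;

  /* Second pass: store the converted characters and the terminator.  */
  mbstowcs (w, s, n + 1);
  return w;
}
```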
We remove the __write_only__ markup for the first argument because it
is inaccurate: the destination may be a null pointer, in which case the
length argument may not apply. Without this change, the new test case
cannot be compiled with -Werror=nonnull.
We add a new test case for mbstowcs that exercises the null destination
pointer behaviour, which is now explicitly documented.
The mbsrtowcs and mbsnrtowcs behave similarly, and mbsrtowcs is
documented as doing this in C11, even if the standard doesn't come out
and call out this specific use case. We add one note to each of
mbsrtowcs and mbsnrtowcs to call out that they support a null pointer
for the destination.
The wcsrtombs function behaves similarly but in the opposite direction:
it allows you to use a null destination pointer to compute how many
bytes you would need to convert the wide character input. We document
this particular case also, but leave wcsnrtombs as a reference to
wcsrtombs, so the reader must still read the details of the semantics
for wcsrtombs.
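The same sizing pattern in the wide-to-multibyte direction might look
like the following sketch; the helper name wcs_to_mbs_dup is
hypothetical:

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return a newly allocated multibyte copy of WS,
   or NULL on conversion or allocation failure.  */
char *
wcs_to_mbs_dup (const wchar_t *ws)
{
  mbstate_t st;
  const wchar_t *src = ws;

  /* First pass: null destination, so only the byte count (excluding
     the terminating null byte) is computed.  */
  memset (&st, 0, sizeof st);
  size_t n = wcsrtombs (NULL, &src, 0, &st);
  if (n == (size_t) -1)
    return NULL;

  char *out = malloc (n + 1);
  if (out == NULL)
    return NULL;

  /* Second pass: restart from the initial state and store the bytes.  */
  src = ws;
  memset (&st, 0, sizeof st);
  wcsrtombs (out, &src, n + 1, &st);
  return out;
}
```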
In the "Extended Char Intro" the example incorrectly uses a function
called wgetc which doesn't exist. The example is corrected to use
getwc, which is correct for the use in this case.
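For reference, a typical getwc read loop looks roughly like the sketch
below. It is not the manual's example; the helper count_wide_chars is
hypothetical and only shows the result being held in a wint_t so that
WEOF stays distinguishable from valid characters.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count the wide characters remaining on FP.  */
size_t
count_wide_chars (FILE *fp)
{
  size_t count = 0;
  wint_t wc;            /* wint_t rather than wchar_t, to hold WEOF.  */

  while ((wc = getwc (fp)) != WEOF)
    count++;
  return count;
}
```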
Reported-by: Toomas Rosin <toomas@rosin.ee>
The example did not work because the null byte was not converted, and
mbrtowc was called with a zero-length input string. This results in a
(size_t) -2 return value, so the function always returns NULL.
The size computation for the heap allocation of the result was
incorrect because it did not deal with integer overflow.
Error checking was missing, and the allocated memory was not freed on
error paths. All error returns now set errno. (Note that there is an
assumption that free does not clobber errno.)
The slightly unportable comparison against (size_t) -2 to catch both
(size_t) -1 and (size_t) -2 return values is gone as well.
A null wide character needs to be stored in the result explicitly, to
terminate it.
The description in the manual is updated to deal with these finer
points. The (size_t) -2 behavior (consuming the input bytes) matches
what is specified in ISO C11.
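The points above can be sketched as follows. This is an illustration
written for this note, not the text of the manual's example; the
function name mbs_to_upper_wcs is hypothetical, and errno is saved
around free rather than relying on free preserving it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical sketch: convert S to a newly allocated upper-case wide
   string.  Returns NULL with errno set on failure.  */
wchar_t *
mbs_to_upper_wcs (const char *s)
{
  size_t len = strlen (s) + 1;          /* Include the null byte.  */
  if (len > SIZE_MAX / sizeof (wchar_t))
    {
      errno = ENOMEM;                   /* Size computation overflows.  */
      return NULL;
    }
  wchar_t *result = malloc (len * sizeof (wchar_t));
  if (result == NULL)
    return NULL;                        /* glibc malloc has set errno.  */

  wchar_t *wcp = result;
  mbstate_t state;
  memset (&state, 0, sizeof state);

  for (;;)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, s, len, &state);
      if (n == 0)
        {
          *wcp = L'\0';                 /* Terminate explicitly.  */
          return result;
        }
      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* -2 means a truncated character; report it as EILSEQ.  */
          int saved = n == (size_t) -2 ? EILSEQ : errno;
          free (result);
          errno = saved;
          return NULL;
        }
      *wcp++ = towupper (wc);
      s += n;
      len -= n;
    }
}
```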
The Summary is now generated from @standards, and syntax-checking is
performed. If invalid @standards syntax is detected, summary.pl will
fail, reporting all errors. Failure and error reporting is disabled
for now, however, since much of the manual is still incomplete
wrt. header and standards annotations.
Note that the sorting order of the Summary has changed; summary.pl
respects the locale, like summary.awk did, but the use of LC_ALL=C is
introduced in the Makefile. Other notable deviations are improved
detection of the annotated elements' names, which are used for
sorting, and improved detection of the @node used to reference into
the manual. The most noticeable difference in the rendered Summary is
that entries may now contain multiple lines, one for each header and
standard combination.
summary.pl accepts a `--help' option, which details the expected
syntax of @standards. If errors are reported, the user is directed to
this feature for further information.
* manual/Makefile: Generate summary.texi with summary.pl.
Force use of the C locale. Update Perl dependency comment.
* manual/header.texi: Update reference to summary.awk.
* manual/macros.texi: Refer authors to `summary.pl --help'.
* manual/summary.awk: Remove file.
* manual/summary.pl: New file. Generate summary.texi, and
check for @standards-related syntax errors.
* manual/argp.texi: Convert header and standards @comments to
@standards.
* manual/arith.texi: Likewise.
* manual/charset.texi: Likewise.
* manual/conf.texi: Likewise.
* manual/creature.texi: Likewise.
* manual/crypt.texi: Likewise.
* manual/ctype.texi: Likewise.
* manual/debug.texi: Likewise.
* manual/errno.texi: Likewise.
* manual/filesys.texi: Likewise.
* manual/getopt.texi: Likewise.
* manual/job.texi: Likewise.
* manual/lang.texi: Likewise.
* manual/llio.texi: Likewise.
* manual/locale.texi: Likewise.
* manual/math.texi: Likewise.
* manual/memory.texi: Likewise.
* manual/message.texi: Likewise.
* manual/pattern.texi: Likewise.
* manual/pipe.texi: Likewise.
* manual/process.texi: Likewise.
* manual/resource.texi: Likewise.
* manual/search.texi: Likewise.
* manual/setjmp.texi: Likewise.
* manual/signal.texi: Likewise.
* manual/socket.texi: Likewise.
* manual/startup.texi: Likewise.
* manual/stdio.texi: Likewise.
* manual/string.texi: Likewise.
* manual/sysinfo.texi: Likewise.
* manual/syslog.texi: Likewise.
* manual/terminal.texi: Likewise.
* manual/threads.texi: Likewise.
* manual/time.texi: Likewise.
* manual/users.texi: Likewise.
2001-09-29 Jes Sorensen <jes@trained-monkey.org>
* sysdeps/unix/sysv/linux/ia64/bits/sigcontext.h (struct sigcontext):
Add sc_loadrs and sc_rbs_base to match current kernel.
2001-09-27 Jakub Jelinek <jakub@redhat.com>
* sysdeps/sparc/sparc64/fpu/libm-test-ulps: Update.
* sysdeps/ieee754/ldbl-128/s_erfl.c (__erfcl): Fix erfc(-inf).
2001-09-27 Jakub Jelinek <jakub@redhat.com>
* elf/dl-open.c (dl_open_worker): If l_opencount of freshly loaded
object has been bumped because of relocation dependency, avoid
duplicates in l_scope.
(show_scope): Fix typos.
* elf/Makefile: Add rules to build and run reldep6.
* elf/reldep6.c: New file.
* elf/reldep6mod0.c: New file.
* elf/reldep6mod1.c: New file.
* elf/reldep6mod2.c: New file.
* elf/reldep6mod3.c: New file.
* elf/reldep6mod4.c: New file.
2001-09-26 Jakub Jelinek <jakub@redhat.com>
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Call
sparc64_fixup_plt.
(sparc64_fixup_plt): Moved from elf_machine_fixup_plt. Optimize
near jumps and 0xfffff800XXXXXXXX target addresses, no thread safety
for non-lazy binding. Fix .plt[32768+] handling.
(elf_machine_plt_value): Don't add addend.
(elf_machine_rela): Call sparc64_fixup_plt instead of
elf_machine_fixup_plt.
(elf_machine_runtime_setup, TRAMPOLINE_TEMPLATE): Optimize for
dynamic linker at 0xfffff800XXXXXXXX.
* sysdeps/sparc/sparc32/fpu/libm-test-ulps: Update.
2000-12-22 Ben Collins <bcollins@debian.org>
* manual/charset.texi: Fix typo in description of WCHAR_MAX.
* manual/argp.texi: Document argp_domain as part of struct argp.
2000-12-23 Ben Collins <bcollins@debian.org>
* manual/charset.texi (Extended Char Intro): Fix typo in ISO 6937
description.
* manual/stdio.texi (Dynamic Output): Document the return value of
asprintf. Also make the asprintf/snprintf examples a little
better (check for some error returns).
* manual/message.texi (Using gettextized software): Fix typo.
* manual/charset.texi (Converting a Character): Fix mbstouwcs
program to compile.
Patch by Martin Buchholz <martin@xemacs.org>.
1999-11-25 H.J. Lu <hjl@gnu.org>
* stdlib/exit.c (exit): Run functions only if
__exit_funcs->idx > 0.
1999-11-25 Ulrich Drepper <drepper@cygnus.com>
* manual/charset.texi (iconv Examples): Add iconv call to flush
state. Reported by Andrew Clausen <clausen@alphalink.com.au>.
1999-11-25 Andreas Jaeger <aj@suse.de>
* manual/install.texi (Running make install): Better describe
update from libc5.
Patch by Michael Deutschmann <michael@talamasca.wkpowerlink.com>.
1999-11-25 Andreas Jaeger <aj@suse.de>
* include/sys/mman.h: Remove K&R compatibility.
1999-11-15 Andreas Jaeger <aj@suse.de>
* misc/sys/mman.h: Use __REDIRECT for mmap, correct prototype to
use __off64_t.
1999-11-25 Ulrich Drepper <drepper@cygnus.com>
* iconv/iconv_prog.c (process_block): For stateful charsets write
out byte sequence to get to initial state at the end of the file.
which was reported to not work (which proved to be wrong).
1999-07-08 Andreas Schwab <schwab@suse.de>
* libio/iofopncook.c (fopencookie): Set _fileno to -2.
* libio/libioP.h (_IO_file_is_open): Only check for -1, not all
negative numbers.
* libio/fileops.c (_IO_new_file_close_it): Set _fileno to -1, not
EOF.
* libio/oldfileops.c (_IO_old_file_close_it): Likewise.
1999-07-08 Andreas Schwab <schwab@suse.de>
* stdio-common/vfprintf.c (buffered_vfprintf): Initialize _mode.
1999-07-08 Andreas Schwab <schwab@suse.de>
* libio/fileno.c: Return -1 instead of EOF and set errno if the
stream is not a real file stream.
1999-07-08 Andreas Schwab <schwab@suse.de>
* manual/charset.texi: Fix typos.