According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in the range of a UTF-16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
This patch therefore fixes the behaviour when converting from UTF-32 to internal
and from internal to UTF-8.
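The surrogate check described above can be sketched as follows. This is a
minimal, standalone illustration and not the code in gconv_simple.c or
utf-32.c: the names is_utf16_surrogate and encode_utf8 are hypothetical, and
the upper limit of 0x10ffff is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if C lies in the UTF-16 surrogate range 0xd800..0xdfff.  */
int
is_utf16_surrogate (uint32_t c)
{
  return c >= 0xd800 && c <= 0xdfff;
}

/* Encode C as UTF-8 into OUT, which must hold at least 4 bytes.
   Returns the number of bytes written, or 0 if C is a surrogate or
   lies above 0x10ffff (the caller would report illegal input).  */
size_t
encode_utf8 (uint32_t c, unsigned char *out)
{
  if (is_utf16_surrogate (c) || c > 0x10ffff)
    return 0;

  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xc0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3f));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (unsigned char) (0xe0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
      out[2] = (unsigned char) (0x80 | (c & 0x3f));
      return 3;
    }
  out[0] = (unsigned char) (0xf0 | (c >> 18));
  out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3f));
  out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3f));
  out[3] = (unsigned char) (0x80 | (c & 0x3f));
  return 4;
}
```

The same predicate would apply on the UTF-32 input side: a 32-bit value for
which is_utf16_surrogate returns nonzero is rejected instead of being copied
to the internal representation.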
Furthermore, the conversion from UTF-16 to internal does not report an error if
the input stream consists of two low-surrogate values. If a uint16_t value is in
the range 0xd800 .. 0xdfff, the next uint16_t value is checked to see whether it
is in the range of a low surrogate (0xdc00 .. 0xdfff). These two uint16_t values
are then interpreted as a high- and low-surrogate pair. However, there is no test
that the first uint16_t value is really in the range of a high surrogate
(0xd800 .. 0xdbff). If both uint16_t values are in the range of a low
surrogate, they are treated as a valid high- and low-surrogate pair.
This patch adds the missing test.
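The pair validation described above can be sketched as follows. This is an
illustration under stated assumptions and not the BODY macro from utf-16.c:
the function decode_utf16 and its return convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the LEN uint16_t units at IN and store it
   in *OUT.  Returns the number of units consumed (1 or 2), 0 if more
   input is needed, or -1 if the input is illegal.  */
int
decode_utf16 (const uint16_t *in, size_t len, uint32_t *out)
{
  if (len == 0)
    return 0;

  uint16_t u1 = in[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Not a surrogate: the unit is the code point itself.  */
      *out = u1;
      return 1;
    }

  /* The first unit has to be a high surrogate (0xd800..0xdbff).
     A leading low surrogate is an error.  */
  if (u1 > 0xdbff)
    return -1;

  if (len < 2)
    return 0;

  /* The second unit has to be a low surrogate (0xdc00..0xdfff).  */
  uint16_t u2 = in[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return -1;

  *out = 0x10000 + (((uint32_t) (u1 - 0xd800) << 10)
                    | (uint32_t) (u2 - 0xdc00));
  return 2;
}
```

With this check in place, the input 0xdc00 0xdc00 is rejected at the first
unit, while 0xd83d 0xde00 decodes to 0x1f600.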
This patch also adds a new testcase, which checks UTF conversions with input
values in the range of UTF-16 surrogates. The test converts from UTF-xx to
INTERNAL, from INTERNAL to UTF-xx, and directly from UTF-xx to UTF-yy. The latter
conversion is needed because s390 has iconv modules which convert from/to UTF in
one step.
The new testcase was tested on s390, power and intel machines.
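One way such a check could look from user space is sketched below. This is not
the contents of bug-iconv12.c: the helper try_convert is hypothetical, and the
sketch assumes that the charset names "UTF-8" and "UTF-16LE" are available to
iconv_open on the system.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Convert the INLEN bytes at IN from charset FROM to charset TO.
   Returns 0 on success, the errno value set by iconv on failure, or -1
   if the converter cannot be opened.  */
int
try_convert (const char *to, const char *from, const char *in, size_t inlen)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  char outbuf[64];
  char *inptr = (char *) in;	/* iconv does not modify the input.  */
  char *outptr = outbuf;
  size_t inleft = inlen;
  size_t outleft = sizeof outbuf;

  errno = 0;
  size_t r = iconv (cd, &inptr, &inleft, &outptr, &outleft);
  /* Save errno before iconv_close can change it.  */
  int err = (r == (size_t) -1) ? errno : 0;

  iconv_close (cd);
  return err;
}
```

On a library with the checks from this patch, feeding the UTF-16LE bytes
00 dc 00 dc (two low surrogates) to try_convert ("UTF-8", "UTF-16LE", ...)
should yield EILSEQ.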
ChangeLog:
[BZ #19727]
* iconvdata/utf-16.c (BODY): Report an error if first word is not a
valid high surrogate.
* iconvdata/utf-32.c (BODY): Report an error if the value is in the range
of a UTF-16 surrogate.
* iconv/gconv_simple.c (BODY): Likewise.
* iconvdata/bug-iconv12.c: New file.
* iconvdata/Makefile (tests): Add bug-iconv12.
rename test
* iconv/gconv_simple.c (internal_ucs4le_loop_unaligned): Return
__GCONV_EMPTY_INPUT only if input is really empty. Otherwise
__GCONV_INCOMPLETE_INPUT.
(ucs4le_internal_loop): Likewise.
(ucs4le_internal_loop_unaligned): Likewise.
* iconvdata/unicode.c (PREPARE_LOOP): Likewise.
* iconvdata/utf-16.c (PREPARE_LOOP): Likewise.
* iconvdata/utf-32.c (PREPARE_LOOP): Likewise.
* iconv/loop.c (LOOPFCT): First test for empty input then for full
output buffer.
2002-05-26 Bruno Haible <bruno@clisp.org>
* iconv/loop.c (STANDARD_FROM_LOOP_ERR_HANDLER): New macro.
(STANDARD_TO_LOOP_ERR_HANDLER): Renamed from STANDARD_ERR_HANDLER.
All callers changed.
* iconv/gconv_simple.c (ascii_internal_loop): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(utf8_internal_loop): Likewise.
(ucs2_internal_loop): Likewise.
(internal_ucs2_loop): Perform error handling like in
STANDARD_FROM_LOOP_ERR_HANDLER.
* iconvdata/unicode.c (BODY for TO_LOOP): Perform error handling like
in STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
* iconvdata/utf-16.c (BODY for TO_LOOP): Perform error handling like
in STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
* iconvdata/utf-32.c (BODY for TO_LOOP): Perform error handling like
in STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
* iconvdata/big5.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
* iconvdata/iso-2022-jp.c (BODY for FROM_LOOP): Likewise.
* iconvdata/8bit-gap.c (BODY for FROM_LOOP): Likewise.
* iconvdata/8bit-generic.c (BODY for FROM_LOOP): Likewise.
* iconvdata/ansi_x3.110.c (BODY for FROM_LOOP): Likewise.
* iconvdata/armscii-8.c (BODY for FROM_LOOP): Likewise.
* iconvdata/cp1255.c (BODY for FROM_LOOP): Likewise.
* iconvdata/cp1258.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-cn.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-jisx0213.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-jp.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-kr.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-tw.c (BODY for FROM_LOOP): Likewise.
* iconvdata/big5hkscs.c (BODY for FROM_LOOP): Likewise.
* iconvdata/gb18030.c (BODY for FROM_LOOP): Likewise.
* iconvdata/gbk.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso-2022-cn-ext.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso-2022-cn.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso-2022-jp-3.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso-2022-kr.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso646.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso_6937-2.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso_6937.c (BODY for FROM_LOOP): Likewise.
* iconvdata/johab.c (BODY for FROM_LOOP): Likewise.
* iconvdata/shift_jisx0213.c (BODY for FROM_LOOP): Likewise.
* iconvdata/sjis.c (BODY for FROM_LOOP): Likewise.
* iconvdata/t.61.c (BODY for FROM_LOOP): Likewise.
* iconvdata/uhc.c (BODY for FROM_LOOP): Likewise.
* iconvdata/utf-7.c (BODY for FROM_LOOP): Likewise.
* iconvdata/gbbig5.c (BODY for FROM_LOOP): Likewise. When ignoring
an error, still set result = __GCONV_ILLEGAL_INPUT.
(BODY for TO_LOOP): Likewise.
* iconvdata/ibm930.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm932.c: Include <dlfcn.h> and <stdint.h>.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm933.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm935.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm937.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm939.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm943.c: Include <dlfcn.h> and <stdint.h>.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/gbgbk.c (BODY for FROM_LOOP): Update.
* iconvdata/iso8859-1.c (BODY for TO_LOOP): Update.
* iconvdata/tcvn5712-1.c (BODY for TO_LOOP): Update.
2002-06-28 Kaz Kojima <kkojima@rr.iij4u.or.jp>
* sysdeps/sh/dl-machine.h (elf_machine_load_address): Use local
labels in assembler instructions.
2001-07-06 Paul Eggert <eggert@twinsun.com>
* manual/argp.texi: Remove ignored LGPL copyright notice; it's
not appropriate for documentation anyway.
* manual/libc-texinfo.sh: "Library General Public License" ->
"Lesser General Public License".
2001-07-06 Andreas Jaeger <aj@suse.de>
* All files under GPL/LGPL version 2: Place under LGPL version
2.1.
2001-04-12 Bruno Haible <haible@clisp.cons.org>
* iconvdata/TESTS2: New file.
* iconvdata/run-iconv-test.sh: Also run tests from TESTS2.
* iconvdata/testdata/alfabeta..UTF-8: New file.
* iconvdata/testdata/alfabeta..UTF-16.BE: New file.
* iconvdata/testdata/alfabeta..UTF-16.LE: New file.
* iconvdata/testdata/alfabeta..UTF-32.BE: New file.
* iconvdata/testdata/alfabeta..UTF-32.LE: New file.
2001-04-11 Bruno Haible <haible@clisp.cons.org>
* iconvdata/utf-32.c: New file.
* iconvdata/gconv-modules: Add entries for UTF-32, UTF-32LE, UTF-32BE.
* iconvdata/Makefile (modules): Add UTF-32.
(distribute): Add utf-32.c.
2001-04-11 Bruno Haible <haible@clisp.cons.org>
* iconvdata/utf-16.c (PREPARE_LOOP): Initialize 'swap' after possibly
changing it in the state. After incrementing 'inptr', store it back.
* iconvdata/unicode.c (PREPARE_LOOP): After incrementing 'inptr',
store it back.
2001-04-11 Bruno Haible <haible@clisp.cons.org>
* iconvdata/utf-16.c (gconv_init): Use MAX_NEEDED_FROM, not
MIN_NEEDED_FROM.
2000-11-02 Ulrich Drepper <drepper@redhat.com>
* iconvdata/utf-16.c (PREPARE_LOOP): Correct typo preventing BOM from
being written.
* manual/socket.texi (Local Namespace Concepts): Don't mention what
permissions are necessary to connect to a socket.
Reported by Peter Eisentraut <peter_e@gmx.net>.
* sysdeps/generic/backtracesyms.c (__backtrace_symbols): Fix
computation of total for 64-bit machines.
Patch by Byron Stanoszek <gandalf@winds.org>.
* manual/arith.texi (Rounding): Correct description of fesetround
return value. Patch by Conrado Badenas <Conrado.Badenas@uv.es>.
2000-09-18 Ulrich Drepper <drepper@redhat.com>
* version.h (VERSION): Bump to 2.1.94.
* malloc/mtrace.c (mtrace): Mark stream as close on exec.
2000-09-17 Bruno Haible <haible@clisp.cons.org>
* iconvdata/utf-16.c (BODY for TO_LOOP): Reject UCS-4 input in the
range 0xD800..0xDFFF.
* iconvdata/unicode.c (BODY for TO_LOOP): Likewise.
(BODY for FROM_LOOP): Likewise.
* iconv/gconv_simple.c (ucs2_internal_loop): Likewise.
(internal_ucs2_loop): Likewise.
(ucs2reverse_internal_loop): Likewise.
(internal_ucs2reverse_loop): Likewise.
2000-09-17 Bruno Haible <haible@clisp.cons.org>
* iconvdata/utf-16.c (gconv_init): Add missing slashes to encoding
names.
2000-09-17 Bruno Haible <haible@clisp.cons.org>
* iconvdata/tst-table-from.c (main): Fix test for error on stdout.
* iconvdata/tst-table-to.c (main): Likewise.
2000-09-17 Bruno Haible <haible@clisp.cons.org>
* iconvdata/iso-ir-165.c (__isoir165_from_tab): Renamed from
__isoir165_tab.
* iconvdata/cns11643.h (__cns11643l1_to_ucs4_tab): New declaration.
* iconvdata/iso-2022-cn-ext.c: Include "cns11643.h".
(GB7590_set, GB13132_set, CNS11643_3_set, CNS11643_4_set,
CNS11643_5_set, CNS11643_6_set, CNS11643_7_set): Change enum values.
(BODY for FROM_LOOP): Fix buffer overrun. Treat CNS11643 plane 3.
Return __GCONV_INCOMPLETE_INPUT instead of __GCONV_EMPTY_INPUT.
(BODY for TO_LOOP): Fix usage of `set' vs. `used'. Fix typo that
caused GB2312 to be used instead of ISO-IR-165. Treat CNS11643
plane 3. Fix shift sequences. Output announcement for SS2 and SS3
encodings when needed. When outputting an announcement, don't clear
most other announcements.
2000-09-17 Bruno Haible <haible@clisp.cons.org>
* iconvdata/iso-2022-cn.c (BODY for FROM_LOOP): Fix buffer overrun.
(BODY for TO_LOOP): Fix usage of `set' vs. `used'.
2000-09-14 Bruno Haible <haible@clisp.cons.org>
* intl/Versions: Add bind_textdomain_codeset.
2000-06-19 Ulrich Drepper <drepper@redhat.com>
* iconv/gconv.h (__gconv_trans_fct): Add new parameter.
General namespace cleanup.
(struct __gconv_trans_data): Add next field.
(struct __gconv_step_data): Make __trans a pointer.
* iconv/gconv_conf.c: Split out code to find gconv directories from
__gconv_read_conf in new functions.
* iconv/gconv_int.h: Define new data structure and declare new
functions for handling of gconv directory list.
* iconv/gconv_open.c: Allow more than one error handling step being
used. Call function to load error handling module if it is none
of the builtin transformations.
* iconv/gconv_close.c: Add code to free transliteration data.
* iconv/gconv_trans.c: Add functions to load and unload modules
implementing transliteration etc.
* iconv/skeleton.c: Call all context functions now that more than
one module is allowed.
* iconv/loop.c (STANDARD_ERR_HANDLING): New macro.
* iconv/gconv_simple.c: Use STANDARD_ERR_HANDLING macro for places
where the full error handling using transliteration is needed.
* iconvdata/8bit-gap.c: Likewise.
* iconvdata/8bit-generic.c: Likewise.
* iconvdata/ansi_x3.110.c: Likewise.
* iconvdata/big5.c: Likewise.
* iconvdata/big5hkscs.c: Likewise.
* iconvdata/euc-cn.c: Likewise.
* iconvdata/euc-jp.c: Likewise.
* iconvdata/euc-kr.c: Likewise.
* iconvdata/euc-tw.c: Likewise.
* iconvdata/gbgbk.c: Likewise.
* iconvdata/gbk.c: Likewise.
* iconvdata/iso-2022-cn.c: Likewise.
* iconvdata/iso-2022-jp.c: Likewise.
* iconvdata/iso-2022-kr.c: Likewise.
* iconvdata/iso646.c: Likewise.
* iconvdata/iso8859-1.c: Likewise.
* iconvdata/iso_6937-2.c: Likewise.
* iconvdata/iso_6937.c: Likewise.
* iconvdata/johab.c: Likewise.
* iconvdata/sjis.c: Likewise.
* iconvdata/t.61.c: Likewise.
* iconvdata/uhc.c: Likewise.
* iconvdata/unicode.c: Likewise.
* iconvdata/utf-16.c: Likewise.
* libio/iofwide.c: Reset __trans member of __gconv_trans_data
structure correctly after last change.
* wcsmbs/btowc.c: Likewise.
* wcsmbs/mbrtowc.c: Likewise.
* wcsmbs/mbsnrtowcs.c: Likewise.
* wcsmbs/mbsrtowcs.c: Likewise.
* wcsmbs/wcrtomb.c: Likewise.
* wcsmbs/wcsnrtombs.c: Likewise.
* wcsmbs/wcsrtombs.c: Likewise.
* wcsmbs/wctob.c: Likewise.
* localedata/Makefile: Set -Wno-format for some files since gcc does
not know all the format specifiers.
2000-06-18 Ulrich Drepper <drepper@redhat.com>
* locale/loadlocale.c (_nl_unload_locale): Remove a bit of
unneeded code.
* locale/lc-time.c (_nl_init_era_entries): Likewise.
2000-06-16 Ulrich Drepper <drepper@redhat.com>
* iconv/gconv_int.h (norm_add_slashes): Optionally add given suffix.
* iconv/gconv_open.c: Remove error handling specification from `from'
character set name.
* intl/loadmsgcat.c (_nl_load_domain): Call norm_add_slashes with
new parameter to always enable transliteration.
* locale/localeinfo.h (LIMAGIC): Bump number because of incompatible
change.
(struct locale_data): Add new members use_translit and options.
* locale/findlocale.c (_nl_find_locale): Set use_translit flag if
character set name contained modifier TRANSLIT.
* locale/loadlocale.c (_nl_load_locale): Initialize new use_translit
and options fields.
(_nl_unload_locale): Free options string if necessary.
* wcsmbs/wcsmbsload.c (__wcsmbs_load_conv): Enable translation if
the locale names suggested this.
* locale/C-address.c: Add two new initializers to adjust data
structure for new format.
* locale/C-collate.c: Likewise.
* locale/C-ctype.c: Likewise.
* locale/C-identification.c: Likewise.
* locale/C-measurement.c: Likewise.
* locale/C-messages.c: Likewise.
* locale/C-monetary.c: Likewise.
* locale/C-name.c: Likewise.
* locale/C-numeric.c: Likewise.
* locale/C-paper.c: Likewise.
* locale/C-telephone.c: Likewise.
* locale/C-time.c: Likewise.
* locale/setlocale.c: Add some more __builtin_expect.
2000-04-09 Ulrich Drepper <drepper@redhat.com>
Implement handling of restartable conversion functions according to
ISO C.
* iconv/gconv.h (__gconv_fct): Add additional parameter.
* iconv/gconv_int.h (__BUILTIN_TRANS): Likewise.
* iconv/gconv.c: Pass additional parameter to conversion function.
* iconv/gconv_simple.c (internal_ucs4_loop_single): New function.
(internal_ucs4le_loop_single): New function.
(__gconv_transform_ascii_internal): Define ONE_DIRECTION.
(__gconv_transform_internal_ascii): Likewise.
(__gconv_transform_internal_utf8): Likewise.
(__gconv_transform_utf8_internal): Likewise.
(__gconv_transform_ucs2_internal): Likewise.
(__gconv_transform_internal_ucs2): Likewise.
(__gconv_transform_ucs2reverse_internal): Likewise.
(__gconv_transform_internal_ucs2reverse): Likewise.
(internal_ucs4le_loop_unaligned): Before returning
__GCONV_INCOMPLETE_INPUT check that the remaining bytes really form
a valid character. Otherwise return __GCONV_ILLEGAL_INPUT.
(__gconv_transform_utf8_internal): Define STORE_REST and UNPACK_BYTES.
* iconv/loop.c: Fit in definition of function to convert one character
for processing of left-over bytes from the state object.
* iconv/skeleton.c (gconv): Rename inbuf to inptrp and inbufend to
inend to match names in loop functions.
(RESET_INPUT_BUFFER): Change appropriately.
(gconv): If needed, call function to process bytes from the state
object. Similar at the end: store left over bytes if input is
incomplete.
Take extra argument and add new argument to all calls of the
conversion function.
* iconvdata/iso-2022-cn.c: Adjust numeric values used to store
information in the state object to not conflict with length count.
* iconvdata/iso-2022-jp.c: Likewise.
* iconvdata/iso-2022-kr.c: Likewise.
* iconvdata/unicode.c: Adjust for change in parameters of
skeleton function.
* iconvdata/utf-16.c: Likewise.
* libio/iofwide.c: Add new parameter to all calls of conversion
function.
* wcsmbs/btowc.c: Likewise.
* wcsmbs/mbrtowc.c: Likewise.
* wcsmbs/mbsnrtowcs.c: Likewise.
* wcsmbs/mbsrtowcs.c: Likewise.
* wcsmbs/wcrtomb.c: Likewise.
* wcsmbs/wcsnrtombs.c: Likewise.
* wcsmbs/wcsrtombs.c: Likewise.
* wcsmbs/wctob.c: Likewise.
* iconvdata/gbgbk.c: Always define MAX_NEEDED_OUTPUT and
MAX_NEEDED_INPUT.
2000-03-28 Ulrich Drepper <drepper@redhat.com>
* iconvdata/TESTS: Use UCS-2BE instead of UCS2.
* iconv/loop.c: Define get16, get32, put16, and put32 macros to
allow as well reading from/writing to unaligned addresses on machines
which don't support this in hardware. Use FCTNAME macro to define
function name. Include the file a second time for platforms which
need special unaligned handling.
* iconv/skeleton.c: Define get16u, get32u, put16u, and put32u macros
to access potentially unaligned addresses. These macros are intended
to be used only outside the loops.
(unaligned): New definition. In case the machine can handle unaligned
access define as zero. Otherwise as a variable which is initialized
as nonzero in case the buffer passed in at runtime is unaligned with
respect to the character set encoding involved.
Call aligned or unaligned loop functions according to unaligned
variable.
* iconvdata/8bit-gap.c: Use get16, get32, put16, and put32 instead
of direct casting pointer to potentially handle unaligned memory
accesses.
* iconvdata/8bit-generic.c: Likewise.
* iconvdata/ansi_x3.110.c: Likewise.
* iconvdata/big5.c: Likewise.
* iconvdata/euc-cn.c: Likewise.
* iconvdata/euc-jp.c: Likewise.
* iconvdata/euc-kr.c: Likewise.
* iconvdata/euc-tw.c: Likewise.
* iconvdata/gbk.c: Likewise.
* iconvdata/iso-2022-cn.c: Likewise.
* iconvdata/iso-2022-jp.c: Likewise.
* iconvdata/iso-2022-kr.c: Likewise.
* iconvdata/iso646.c: Likewise.
* iconvdata/iso_6937-2.c: Likewise.
* iconvdata/iso_6937.c: Likewise.
* iconvdata/johab.c: Likewise.
* iconvdata/sjis.c: Likewise.
* iconvdata/t.61.c: Likewise.
* iconvdata/uhc.c: Likewise.
* iconvdata/unicode.c: Likewise.
* iconvdata/utf-16.c: Likewise.
* locale/programs/simple-hash.c: Little optimizations. Remove K&R
prototypes.
* malloc/Versions [libc] (GLIBC_2.2): Add mcheck_check_all.
* malloc/mcheck.c (mcheck_check_all): Renamed from check_all and made
public.
* malloc/mcheck.h (mcheck_check_all): Declare.
* stdio-common/Makefile (tests): Add tst-obprintf.
2000-02-13 Ulrich Drepper <drepper@redhat.com>
* iconvdata/Makefile (modules): Add UTF-16.
(distribute): Add utf-16.c.
* iconvdata/gconv-modules: Add entries for UTF-16, UTF-16BE, and
UTF-16LE.
* iconvdata/utf-16.c: New file.
* iconv/gconv_builtin.h: Remove UTF-16 entries here.
* iconv/gconv_simple.c: Remove conversion functions to and from UTF-16.
* iconv/skeleton.c: Increment __invocation_counter after every call
to the loops.