In several converters, a __GCONV_ILLEGAL_INPUT result gets overwritten
with __GCONV_FULL_OUTPUT. As a result, iconv (the function) returns
E2BIG instead of EILSEQ. The iconv program does not see the original
EILSEQ failure, does not recognize the invalid input, and may
incorrectly exit successfully.
To address this, a new __flags bit is used to indicate a sticky input
error state. All __GCONV_ILLEGAL_INPUT results are replaced with a
function call that sets this new __GCONV_ENCOUNTERED_ILLEGAL_INPUT and
returns __GCONV_ILLEGAL_INPUT. The iconv program checks for
__GCONV_ENCOUNTERED_ILLEGAL_INPUT and overrides the exit status.
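The sticky-flag idea can be sketched as follows. This is a minimal
illustration, not the glibc code: struct step_data, the RES_* codes,
FLAG_ENCOUNTERED_ILLEGAL_INPUT and all three functions are hypothetical
stand-ins for the __flags bit and the helper described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the __GCONV_* result codes.  */
enum { RES_OK, RES_FULL_OUTPUT, RES_ILLEGAL_INPUT };

/* Hypothetical flag bit; the real one is a bit in __flags.  */
#define FLAG_ENCOUNTERED_ILLEGAL_INPUT 0x4

struct step_data
{
  int flags;
};

/* Record the error in the sticky flag and return the usual result
   code, so a converter can write "result = mark_illegal_input (data)".  */
static int
mark_illegal_input (struct step_data *data)
{
  assert (data != NULL);
  data->flags |= FLAG_ENCOUNTERED_ILLEGAL_INPUT;
  return RES_ILLEGAL_INPUT;
}

/* Simulate a converter in which later code overwrites the result.  */
static int
convert_with_overwrite (struct step_data *data)
{
  int result = mark_illegal_input (data);
  /* A full output buffer is reported, hiding the earlier error code.  */
  result = RES_FULL_OUTPUT;
  return result;
}

/* What a caller such as the iconv program could check afterwards.  */
static int
saw_illegal_input (const struct step_data *data)
{
  return (data->flags & FLAG_ENCOUNTERED_ILLEGAL_INPUT) != 0;
}
```

Even though convert_with_overwrite returns RES_FULL_OUTPUT, the flag
still records that invalid input was seen.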
The converter changes introducing __gconv_mark_illegal_input are
mostly mechanical, except for the res variable initialization in
iconvdata/iso-2022-jp.c: this error gets overwritten with __GCONV_OK
and other results in the following code. If res ==
__GCONV_ILLEGAL_INPUT afterwards, STANDARD_TO_LOOP_ERR_HANDLER below
will handle it.
The __gconv_mark_illegal_input changes do not alter the errno value
set by the iconv function. This is simpler to implement than
reviewing each __GCONV_FULL_OUTPUT result and adjusting it not to
override a previous __GCONV_ILLEGAL_INPUT result. Doing it that way
would also change some E2BIG errors into EILSEQ errors, so it would
have to be done conditionally (under a flag set by the iconv program
only), to avoid confusing buffer management in other applications.
Reviewed-by: DJ Delorie <dj@redhat.com>
All the changes are in comments or '#error' messages.
Applying this commit results in bit-identical rebuild of iconvdata/*.so
Reviewed-by: Florian Weimer <fw@deneb.enyo.de>
Use put/get macros and __builtin_bswap32 instead. This allows removing
the unaligned routines; the compiler will generate unaligned accesses
if the ABI allows it.
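A minimal sketch of this approach, assuming GCC or Clang for
__builtin_bswap32. The helper names get32, put32 and swap_ucs4 are
hypothetical and are not the macros used in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned address.  memcpy lets
   the compiler emit a plain load where the ABI permits unaligned
   access, and a safe byte-wise sequence where it does not.  */
static uint32_t
get32 (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Write a 32-bit value to a possibly unaligned address.  */
static void
put32 (unsigned char *p, uint32_t v)
{
  memcpy (p, &v, sizeof v);
}

/* Convert one UCS-4 code unit to the opposite byte order.  */
static void
swap_ucs4 (unsigned char *out, const unsigned char *in)
{
  put32 (out, __builtin_bswap32 (get32 (in)));
}
```

Because both the load and the store go through memcpy, a single
routine covers aligned and unaligned buffers.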
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
On 32-bit machines this has no effect. On 64-bit machines
{u}int_fast{16|32} are defined as {u}int64_t, which is often not
ideal. Particularly on x86_64, this change both saves code size and
may save instruction cost.
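The size difference can be observed with a small helper. This is only
an illustration; fast32_extra_bytes is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Report how many bytes wider the "fast" type is than the exact-width
   type.  Where uint_fast32_t is a 64-bit type the result is 4; where
   both types are 32 bits wide it is 0.  */
static size_t
fast32_extra_bytes (void)
{
  return sizeof (uint_fast32_t) - sizeof (uint32_t);
}
```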
Full xcheck passes on x86_64.
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
Previously, in UCS4 conversion routines we limited the number of
characters we examine to the minimum of the number of characters in the
input and the number of characters in the output. This is not the
correct behavior when __GCONV_IGNORE_ERRORS is set, as we do not consume
an output character when we skip a code unit. Instead, track the input
and output pointers and terminate the loop when either reaches its
limit.
This resolves assertion failures when resetting the input buffer in a step of
iconv, which assumes that the input will be fully consumed given sufficient
output space.
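The pointer-tracking loop can be sketched like this. It is a
simplified model under stated assumptions, not the glibc loop: the
RES_* codes and copy_ucs4 are hypothetical names, and the only error
modelled is a value above 0x7fffffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the __GCONV_* result codes.  */
enum { RES_EMPTY_INPUT, RES_FULL_OUTPUT, RES_ILLEGAL_INPUT };

/* Copy code units from *INP to *OUTP, rejecting values above
   0x7fffffff.  With IGNORE_ERRORS set, an invalid unit is skipped: it
   consumes input but no output, so the iteration count cannot be
   precomputed as min (input units, output units).  */
static int
copy_ucs4 (const uint32_t **inp, const uint32_t *inend,
           uint32_t **outp, uint32_t *outend, int ignore_errors)
{
  const uint32_t *in = *inp;
  uint32_t *out = *outp;
  int result = RES_EMPTY_INPUT;

  assert (in <= inend && out <= outend);
  while (in != inend)
    {
      if (*in > 0x7fffffff)
        {
          if (!ignore_errors)
            {
              result = RES_ILLEGAL_INPUT;
              break;
            }
          ++in;                 /* Skip: no output slot is used.  */
          continue;
        }
      if (out == outend)
        {
          result = RES_FULL_OUTPUT;
          break;
        }
      *out++ = *in++;
    }

  *inp = in;
  *outp = out;
  return result;
}
```

With three input units, one of them invalid, and room for two output
units, this loop consumes all input when errors are ignored. A loop
bounded by the minimum of the two counts would stop after two input
units and leave the last one unconsumed.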
According to the latest Unicode standard, a conversion from/to UTF-xx has
to report an error if the character value is in the range of a UTF-16 surrogate
(0xd800..0xdfff). See https://sourceware.org/ml/libc-help/2015-12/msg00015.html.
This patch therefore fixes this behaviour when converting from utf32 to internal
and from internal to utf8.
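The range check itself is small. A sketch, with hypothetical helper
names; the upper bound 0x10ffff is the Unicode code space limit and is
not part of this patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if C lies in the UTF-16 surrogate range 0xd800..0xdfff.  */
static bool
is_utf16_surrogate (uint32_t c)
{
  return c >= 0xd800 && c <= 0xdfff;
}

/* True if C may appear as a character value in a UTF-32 or UTF-8
   stream: inside the Unicode code space and not a surrogate.  */
static bool
is_valid_scalar (uint32_t c)
{
  return c <= 0x10ffff && !is_utf16_surrogate (c);
}
```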
Furthermore, the conversion from utf16 to internal does not report an error if the
input stream consists of two low-surrogate values. If a uint16_t value is in the
range 0xd800 .. 0xdfff, the next uint16_t value is checked to see whether it is in
the range of a low surrogate (0xdc00 .. 0xdfff). These two uint16_t values are
then interpreted as a high- and low-surrogate pair. But there is no test
whether the first uint16_t value is really in the range of a high surrogate
(0xd800 .. 0xdbff). If both uint16_t values are in the range of a low
surrogate, they are treated as a valid high- and low-surrogate pair.
This patch adds this test.
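A sketch of surrogate-pair decoding with the added test in place.
decode_utf16 is a hypothetical helper that works on host-order code
units; it is not the BODY macro from iconvdata/utf-16.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one character from the N code units at P into *OUT.
   Returns the number of units consumed, or 0 if the input is illegal
   or incomplete.  */
static size_t
decode_utf16 (const uint16_t *p, size_t n, uint32_t *out)
{
  assert (p != NULL && out != NULL);
  if (n == 0)
    return 0;

  uint16_t u1 = p[0];
  if (u1 < 0xd800 || u1 > 0xdfff)
    {
      /* Not a surrogate: the unit is the character itself.  */
      *out = u1;
      return 1;
    }

  /* The added test: the first unit must be a high surrogate
     (0xd800..0xdbff), not merely any surrogate.  */
  if (u1 > 0xdbff)
    return 0;

  if (n < 2)
    return 0;

  uint16_t u2 = p[1];
  if (u2 < 0xdc00 || u2 > 0xdfff)
    return 0;

  *out = ((uint32_t) (u1 - 0xd800) << 10) + (u2 - 0xdc00) + 0x10000;
  return 2;
}
```

Two consecutive low surrogates such as 0xdc00 0xdc00 are now rejected
at the first unit instead of being combined into a bogus character.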
This patch also adds a new testcase, which checks UTF conversions with input
values in the range of UTF16 surrogates. The test converts from UTF-xx to INTERNAL,
INTERNAL to UTF-xx and directly between UTF-xx and UTF-yy. The latter conversion
is needed because s390 has iconv modules which convert from/to UTF in one step.
The new testcase was tested on s390, power and intel machines.
ChangeLog:
[BZ #19727]
* iconvdata/utf-16.c (BODY): Report an error if first word is not a
valid high surrogate.
* iconvdata/utf-32.c (BODY): Report an error if the value is in range
of an utf16 surrogate.
* iconv/gconv_simple.c (BODY): Likewise.
* iconvdata/bug-iconv12.c: New file.
* iconvdata/Makefile (tests): Add bug-iconv12.
rename test
When converting from UCS4LE to INTERNAL, the input value is checked against the
maximum allowed value and, if it is too large, the iconv() call sets errno to
EILSEQ. In this case the inbuf argument of the iconv() call should point to the
invalid character, but it points to the beginning of inbuf.
Thus this patch updates the pointers inptrp and outptrp before returning in
this error case.
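The fix can be pictured with a simplified loop. This is a sketch under
stated assumptions: ucs4le_to_internal and the RES_* codes are
hypothetical, and the output buffer is assumed to be large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the __GCONV_* result codes.  */
enum { RES_EMPTY_INPUT, RES_ILLEGAL_INPUT };

/* Convert little-endian 4-byte units to host-order values.  The
   caller must provide enough output space (an assumption made to keep
   the sketch short).  */
static int
ucs4le_to_internal (const unsigned char **inptrp,
                    const unsigned char *inend, uint32_t **outptrp)
{
  const unsigned char *inptr = *inptrp;
  uint32_t *outptr = *outptrp;

  assert (inptr <= inend);
  while (inend - inptr >= 4)
    {
      uint32_t ch = ((uint32_t) inptr[0]
                     | ((uint32_t) inptr[1] << 8)
                     | ((uint32_t) inptr[2] << 16)
                     | ((uint32_t) inptr[3] << 24));

      if (ch > 0x7fffffff)
        {
          /* Publish the progress made so far before returning, so the
             caller's input pointer designates the invalid character
             rather than the start of the buffer.  */
          *inptrp = inptr;
          *outptrp = outptr;
          return RES_ILLEGAL_INPUT;
        }

      *outptr++ = ch;
      inptr += 4;
    }

  *inptrp = inptr;
  *outptrp = outptr;
  return RES_EMPTY_INPUT;
}
```

Without the two assignments in the error branch, the caller would see
its pointers unchanged and could not tell which character was invalid.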
This patch also adds a new testcase for this issue.
The new test was tested on s390, power and intel machines.
ChangeLog:
[BZ #19726]
* iconv/gconv_simple.c (ucs4le_internal_loop): Update inptrp and
outptrp in case of an illegal input.
* iconv/tst-iconv6.c: New file.
* iconv/Makefile (tests): Add tst-iconv6.
This patch defines _STRING_ARCH_unaligned to 0 in the default bits/string.h
header to avoid undefined-macro compiler warnings on platforms that do not
define it. It also adjusts code where tests checked whether the macro
existed or not.
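The pattern looks roughly like this. It is an illustration only:
USE_UNALIGNED_PATH and uses_unaligned_path are hypothetical, and the
fallback definition here merely mimics what the default header
provides.

```c
#include <assert.h>

/* Fallback in the style of the default header: a platform that
   supports unaligned access is assumed to have defined the macro to 1
   before this point.  */
#ifndef _STRING_ARCH_unaligned
# define _STRING_ARCH_unaligned 0
#endif

/* Because the macro is now always defined, test its value with #if
   rather than its existence with #ifdef.  This does not trigger an
   undefined-macro warning, and a definition of 0 correctly selects
   the aligned path.  */
#if _STRING_ARCH_unaligned
# define USE_UNALIGNED_PATH 1
#else
# define USE_UNALIGNED_PATH 0
#endif

static int
uses_unaligned_path (void)
{
  return USE_UNALIGNED_PATH;
}
```

An #ifdef test would take the unaligned path even when the macro is
defined to 0, which is why the existence checks had to be adjusted.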
* sysdeps/generic/strchrnul.c: Add cast to avoid warning.
* libio/iofwide.c: Add casts to avoid warnings.
* stdio-common/printf-prs.c (parse_printf_format): Introduce new
variable f to avoid warnings.
* sysdeps/unix/sysv/linux/x86_64/makecontext.c (__makecontext):
Fix a few casts to avoid warnings.
* iconv/gconv_simple.c (internal_utf8_loop): Make start unsigned
to avoid warning.
* iconv/gconv_db.c: Don't define lock as static. Rename to
__gconv_lock and export from the file.
* iconv/gconv_int.h: Declare __gconv_lock.
* libio/iofclose.c [_LIBC] (_IO_new_fclose): Lock gconv lock
before __gconv_release_step calls.
Patch by Shunichi Sagawa <s-sagawa@jp.fujitsu.com>.
* iconv/gconv_simple.c (internal_ucs4_loop): Fix typo in last change.
* iconv/gconv_db.c
* iconv/gconv_simple.c (STORE_REST): Explicitly store the total
expected size into state.
(UNPACK_BYTES): Do the reverse.
* wcsmbs/tst-mbrtowc.c (utf8_test_1): Add test for the bug.
Reported by Al Viro <aviro@redhat.com>.
2002-11-30 Bruno Haible <bruno@clisp.org>
* iconv/gconv.h (__gconv_btowc_fct): New typedef.
(struct __gconv_step): New field __btowc_fct.
* wcsmbs/btowc.c (__btowc): Use the __btowc_fct shortcut if possible.
* iconv/gconv_int.h (__BUILTIN_TRANSFORM): Renamed from
__BUILTIN_TRANS.
(__gconv_btwoc_ascii): New declaration.
* iconv/gconv_simple.c (BUILTIN_TRANSFORMATION): Add BtowcFct argument.
(__gconv_btwoc_ascii): New function.
* iconv/gconv_builtin.h: Add BtowcFct argument to all
BUILTIN_TRANSFORMATION invocations.
* iconv/gconv_conf.c (BUILTIN_TRANSFORMATION): Add BtowcFct argument.
* iconv/iconvconfig.c (BUILTIN_TRANSFORMATION): Likewise.
* iconv/gconv_builtin.c (map): New field btowc_fct.
(BUILTIN_TRANSFORMATION): Add BtowcFct argument. Use it to initialize
btowc_fct field.
(__gconv_get_builtin_trans): Initialize __btowc_fct field.
* iconv/gconv_cache.c (find_module): Initialize __btowc_fct field.
* iconv/gconv_db.c (gen_steps, increment_counter): Likewise.
* wcsmbs/wcsmbsload.c (to_wc, to_mb): Likewise.
* iconv/skeleton.c: Document STORE_REST and FROM_ONEBYTE.
(gconv_init): Initialize __btowc_fct field.
Undefine EXTRA_LOOP_ARGS and FROM_ONEBYTE at the end.
* iconv/loop.c: Document ONEBYTE_BODY.
(gconv_btowc, FROM_ONEBYTE): Define if ONEBYTE_BODY is defined.
Undefine ONEBYTE_BODY at the end.
* iconvdata/8bit-generic.c (ONEBYTE_BODY): New macro.
* iconvdata/8bit-gap.c (NONNUL): New macro.
(BODY for FROM_LOOP): Use it.
(ONEBYTE_BODY): New macro.
* iconvdata/isiri-3342.c (HAS_HOLES): Set to 1.
(NONNUL): New macro.
* iconvdata/ansi_x3.110.c (ONEBYTE_BODY): New macro.
* iconvdata/armscii-8.c (ONEBYTE_BODY): New macro.
* iconvdata/cp1255.c (ONEBYTE_BODY): New macro.
* iconvdata/cp1258.c (ONEBYTE_BODY): New macro.
* iconvdata/tcvn5712-1.c (ONEBYTE_BODY): New macro.
* iconvdata/big5.c (ONEBYTE_BODY): New macro.
* iconvdata/big5hkscs.c (ONEBYTE_BODY): New macro.
* iconvdata/euc-cn.c (ONEBYTE_BODY): New macro.
* iconvdata/euc-jp.c (ONEBYTE_BODY): New macro.
* iconvdata/euc-jisx0213.c (ONEBYTE_BODY): New macro.
* iconvdata/euc-kr.c (ONEBYTE_BODY): New macro.
* iconvdata/euc-tw.c (ONEBYTE_BODY): New macro.
* iconvdata/gbk.c (ONEBYTE_BODY): New macro.
* iconvdata/gb18030.c (ONEBYTE_BODY): New macro.
* iconvdata/ibm932.c: Include <stdbool.h>.
(TRUE, FALSE): Remove macros.
(BODY for FROM_LOOP): Remove unused variable rp1.
(ONEBYTE_BODY): New macro.
(BODY for TO_LOOP): Use bool.
* iconvdata/ibm932.h (__ibm932sb_to_ucs4_idx): Remove array.
* iconvdata/ibm943.c: Include <stdbool.h>.
(TRUE, FALSE): Remove macros.
(BODY for FROM_LOOP): Remove unused variable rp1.
(ONEBYTE_BODY): New macro.
(BODY for TO_LOOP): Use bool.
* iconvdata/ibm943.h (__ibm943sb_to_ucs4_idx): Remove array.
* iconvdata/iso8859-1.c (ONEBYTE_BODY): New macro.
* iconvdata/iso_6937-2.c (ONEBYTE_BODY): New macro.
* iconvdata/iso_6937.c (ONEBYTE_BODY): New macro.
* iconvdata/johab.c (ONEBYTE_BODY): New macro.
* iconvdata/sjis.c (ONEBYTE_BODY): New macro.
* iconvdata/shift_jisx0213.c (ONEBYTE_BODY): New macro.
* iconvdata/t.61.c (ONEBYTE_BODY): New macro.
* iconvdata/uhc.c (ONEBYTE_BODY): New macro.
* iconvdata/gbbig5.c: Tweak comment.
* iconv/gconv_simple.c (internal_ucs4le_loop_unaligned): Return
__GCONV_EMPTY_INPUT only if input is really empty. Otherwise
__GCONV_INCOMPLETE_INPUT.
(ucs4le_internal_loop): Likewise.
(ucs4le_internal_loop_unaligned): Likewise.
* iconvdata/unicode.c (PREPARE_LOOP): Likewise.
* iconvdata/utf-16.c (PREPARE_LOOP): Likewise.
* iconvdata/utf-32.c (PREPARE_LOOP): Likewise.
* iconv/loop.c (LOOPFCT): First test for empty input then for full
output buffer.
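
The distinction between empty and incomplete input made in the entries
above can be sketched as follows. consume_ucs4_units and the RES_*
codes are hypothetical names used for illustration only.

```c
#include <assert.h>

/* Hypothetical stand-ins for __GCONV_EMPTY_INPUT and
   __GCONV_INCOMPLETE_INPUT.  */
enum { RES_EMPTY_INPUT, RES_INCOMPLETE_INPUT };

/* Consume all complete 4-byte units, then classify what is left:
   nothing at all is empty input, 1 to 3 trailing bytes are an
   incomplete character.  */
static int
consume_ucs4_units (const unsigned char **inptrp,
                    const unsigned char *inend)
{
  const unsigned char *inptr = *inptrp;

  assert (inptr <= inend);
  while (inend - inptr >= 4)
    inptr += 4;

  *inptrp = inptr;
  return inptr == inend ? RES_EMPTY_INPUT : RES_INCOMPLETE_INPUT;
}
```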
2002-05-26 Bruno Haible <bruno@clisp.org>
* iconv/loop.c (STANDARD_FROM_LOOP_ERR_HANDLER): New macro.
(STANDARD_TO_LOOP_ERR_HANDLER): Renamed from STANDARD_ERR_HANDLER.
All callers changed.
* iconv/gconv_simple.c (ascii_internal_loop): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(utf8_internal_loop): Likewise.
(ucs2_internal_loop): Likewise.
(internal_ucs2_loop): Perform error handling like in
STANDARD_FROM_LOOP_ERR_HANDLER.
* iconvdata/unicode.c (BODY for TO_LOOP): Perform error handling like
in STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
* iconvdata/utf-16.c (BODY for TO_LOOP): Perform error handling like
in STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
* iconvdata/utf-32.c (BODY for TO_LOOP): Perform error handling like
in STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
* iconvdata/big5.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
* iconvdata/iso-2022-jp.c (BODY for FROM_LOOP): Likewise.
* iconvdata/8bit-gap.c (BODY for FROM_LOOP): Likewise.
* iconvdata/8bit-generic.c (BODY for FROM_LOOP): Likewise.
* iconvdata/ansi_x3.110.c (BODY for FROM_LOOP): Likewise.
* iconvdata/armscii-8.c (BODY for FROM_LOOP): Likewise.
* iconvdata/cp1255.c (BODY for FROM_LOOP): Likewise.
* iconvdata/cp1258.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-cn.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-jisx0213.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-jp.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-kr.c (BODY for FROM_LOOP): Likewise.
* iconvdata/euc-tw.c (BODY for FROM_LOOP): Likewise.
* iconvdata/big5hkscs.c (BODY for FROM_LOOP): Likewise.
* iconvdata/gb18030.c (BODY for FROM_LOOP): Likewise.
* iconvdata/gbk.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso-2022-cn-ext.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso-2022-cn.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso-2022-jp-3.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso-2022-kr.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso646.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso_6937-2.c (BODY for FROM_LOOP): Likewise.
* iconvdata/iso_6937.c (BODY for FROM_LOOP): Likewise.
* iconvdata/johab.c (BODY for FROM_LOOP): Likewise.
* iconvdata/shift_jisx0213.c (BODY for FROM_LOOP): Likewise.
* iconvdata/sjis.c (BODY for FROM_LOOP): Likewise.
* iconvdata/t.61.c (BODY for FROM_LOOP): Likewise.
* iconvdata/uhc.c (BODY for FROM_LOOP): Likewise.
* iconvdata/utf-7.c (BODY for FROM_LOOP): Likewise.
* iconvdata/gbbig5.c (BODY for FROM_LOOP): Likewise. When ignoring
an error, still set result = __GCONV_ILLEGAL_INPUT.
(BODY for TO_LOOP): Likewise.
* iconvdata/ibm930.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm932.c: Include <dlfcn.h> and <stdint.h>.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm933.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm935.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm937.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm939.c (BODY for FROM_LOOP): For error handling use
STANDARD_FROM_LOOP_ERR_HANDLER.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/ibm943.c: Include <dlfcn.h> and <stdint.h>.
(BODY for FROM_LOOP): Use STANDARD_FROM_LOOP_ERR_HANDLER for error
handling.
(BODY for TO_LOOP): Here use STANDARD_TO_LOOP_ERR_HANDLER.
* iconvdata/gbgbk.c (BODY for FROM_LOOP): Update.
* iconvdata/iso8859-1.c (BODY for TO_LOOP): Update.
* iconvdata/tcvn5712-1.c (BODY for TO_LOOP): Update.
2002-06-28 Kaz Kojima <kkojima@rr.iij4u.or.jp>
* sysdeps/sh/dl-machine.h (elf_machine_load_address): Use local
labels in assembler instructions.