The consolidation of configuration parsing broke behaviour with
--prefix, where the prefix bled into the modules cache. Accept a
prefix which, when non-NULL, is prepended to the path when looking for
configuration files but only the original directory is added to the
modules cache.
This has no effect on the codegen of gconv_conf since it passes NULL.
Reported-by: Patrick McCarty <patrick.mccarty@intel.com>
Reported-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
We add a new C.UTF-8 locale. This locale is not builtin to glibc, but
is provided as a distinct locale. The locale provides full support for
UTF-8 and this includes full code point sorting via STRCMP-based
collation (strcmp or wcscmp).
The collation uses a new keyword 'codepoint_collation' which drops all
collation rules and generates an empty zero rules collation to enable
STRCMP usage in collation. This ensures that we get full code point
sorting for C.UTF-8 with a minimal 1406 bytes of overhead (LC_COLLATE
structure information and ASCII collating tables).
The new locale is added to SUPPORTED. Minimal test data for specific
code points (minus those not supported by collate-test) is provided in
C.UTF-8.in, and this verifies code point sorting is working reasonably
across the range. The locale was tested manually with the full set of
code points without failure.
The locale is harmonized with locales already shipping in various
downstream distributions. A new tst-iconv9 test is added which verifies
the C.UTF-8 locale is generally usable.
Testing for fnmatch, regexec, and recomp is provided by extending
bug-regex1, bugregex19, bug-regex4, bug-regex6, transbug, tst-fnmatch,
tst-regcomp-truncated, and tst-regex to use C.UTF-8.
Tested on x86_64 or i686 without regression.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dchttps://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Current GCC mainline produces -Wstringop-overflow errors building some
iconv converters, as discussed at
<https://gcc.gnu.org/pipermail/gcc/2021-July/236943.html>. Add an
__builtin_unreachable call as suggested so that GCC can see the case
that would involve a buffer overflow is unreachable; because the
unreachability depends on valid conversion state being passed into the
function from previous conversion steps, it's not something the
compiler can reasonably deduce on its own.
Tested with build-many-glibcs.py that, together with
<https://sourceware.org/pipermail/libc-alpha/2021-August/130244.html>,
it restores the glibc build for powerpc-linux-gnu.
The allocated `conf` would leak if we have to skip over the file due
to the underlying filesystem not supporting dt_type.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
Build of iconvconfig failed with CFLAGS=-Os since __feof_unlocked is
not a public symbol. Replace with feof_unlocked (defined to
__feof_unlocked when IS_IN (libc)) to fix this.
Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
It was noticed on big-endian systems that msgfmt would fail with the
following error:
msgfmt: gconv_builtin.c:70: __gconv_get_builtin_trans: Assertion `cnt < sizeof (map) / sizeof (map[0])' failed.
Aborted (core dumped)
This is only seen on installed systems because it was due to a
corrupted gconv-modules.cache. iconvconfig had the following issues
(it was specifically freeing fulldir that caused this issue, but other
cleanups are also needed) that this patch fixes.
- Add prefix only if dir starts with '/'
- Use asprintf instead of mempcpy so that the directory string is NULL
terminated
- Make a copy of the directory reference in new_module so that fulldir
can be freed within the same scope in handle_dir.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
On filesystems that do not support dt_type, a regular file shows up as
DT_UNKNOWN. Fall back to using lstat64 to read file properties in
such cases.
Reviewed-by: DJ Delorie <dj@redhat.com>
Drop local copy of gconv file parsing and use the one in
gconv_parseconfdir.h instead. Now there is a single implementation of
configuration file parsing.
Reviewed-by: DJ Delorie <dj@redhat.com>
Split configuration file processing into a separate header file and
include it. Macroize all calls that need to go through internal
interfaces so that iconvconfig can also use them.
Reviewed-by: DJ Delorie <dj@redhat.com>
The modules and nmodules parameters passed to add_modules, add_alias,
etc. are not used and are hence unnecessary. Remove them so that
their signatures match the functions in iconvconfig.
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
The alloca sizes ought to be constrained to PATH_MAX, but replace them
with dynamic allocation to be safe. A static PATH_MAX array would
have worked too but Hurd does not have PATH_MAX and the code path is
not hot enough to micro-optimise this allocation. Revisit if any of
those realities change.
Reviewed-by: DJ Delorie <dj@redhat.com>
For the legacy ABI with supports 32-bit time_t it calls the 64-bit
time directly, since the LFS symbols calls the 64-bit time_t ones
internally.
Checked on i686-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Read configuration files with names ending in .conf in
GCONV_PATH/gconv-modules.d to mirror configuration flexibility in
iconvconfig into the iconv program and function.
Reviewed-by: DJ Delorie <dj@redhat.com>
In addition to GCONV_PATH/gconv-modules, also read module
configuration from *.conf files in GCONV_PATH/gconv-modules.d. This
allows a single gconv directory to have multiple sets of gconv modules
but at the same time, a single modules cache.
With this feature, one could separate the glibc supported gconv
modules into a minimal essential set (ISO-8859-*, UTF, etc.) from the
remaining modules. In future, these could be further segregated into
langpack-associated sets with their own
gconv-modules.d/someconfig.conf.
Reviewed-by: DJ Delorie <dj@redhat.com>
Split out configuration file handling code from handle_dir into its
own function so that it can be reused for multiple configuration
files.
Reviewed-by: DJ Delorie <dj@redhat.com>
When glibc is configured with --enable-hardcoded-path-in-tests,
"make xcheck" failed with
...
env GCONV_PATH=/export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/iconvdata LOCPATH=/export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/localedata LC_ALL=C /export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/iconv/iconvconfig --output=$tmp --nostdlib /usr/lib64/gconv;
...
/export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/iconv/iconvconfig: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /export/build/gnu/tools-build/glibc-cet-gitlab/build-x86_64-linux/iconv/iconvconfig)
...
FAIL: iconv/test-iconvconfig
Since $(objpfx)iconvconfig is an installed program, run it with
$(run-program-prefix).
I've updated copyright dates in glibc for 2021. This is the patch for
the changes not generated by scripts/update-copyrights and subsequent
build / regeneration of generated files. As well as the usual annual
updates, mainly dates in --version output (minus csu/version.c which
previously had to be handled manually but is now successfully updated
by update-copyrights), there is a small change to the copyright notice
in NEWS which should let NEWS get updated automatically next year.
Please remember to include 2021 in the dates for any new files added
in future (which means updating any existing uncommitted patches you
have that add new files to use the new copyright dates in them).
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
Previously, in UCS4 conversion routines we limit the number of
characters we examine to the minimum of the number of characters in the
input and the number of characters in the output. This is not the
correct behavior when __GCONV_IGNORE_ERRORS is set, as we do not consume
an output character when we skip a code unit. Instead, track the input
and output pointers and terminate the loop when either reaches its
limit.
This resolves assertion failures when resetting the input buffer in a step of
iconv, which assumes that the input will be fully consumed given sufficient
output space.
The IBM1364, IBM1371, IBM1388, IBM1390 and IBM1399 character sets
share converter logic (iconvdata/ibm1364.c) which would reject
redundant shift sequences when processing input in these character
sets. This led to a hang in the iconv program (CVE-2020-27618).
This commit adjusts the converter to ignore redundant shift sequences
and adds test cases for iconv_prog hangs that would be triggered upon
their rejection. This brings the implementation in line with other
converters that also ignore redundant shift sequences (e.g. IBM930
etc., fixed in commit 692de4b396).
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Commit 91927b7c76 (Rewrite iconv option parsing [BZ #19519]) did not
handle cases where the output codeset for translations (via the `gettext'
family of functions) might have a caller specified encoding suffix such as
TRANSLIT or IGNORE. This led to a regression where translations did not
work when the codeset had a suffix.
This commit fixes the above issue by parsing any suffixes passed to
__dcigettext and adds two new test-cases to intl/tst-codeset.c to
verify correct behaviour. The iconv-internal function __gconv_create_spec
and the static iconv-internal function gconv_destroy_spec are now visible
internally within glibc and used in intl/dcigettext.c.
It replaces the internal usage of __{f,l}xstat{at}{64} with the
__{f,l}stat{at}{64}. It should not change the generate code since
sys/stat.h explicit defines redirections to internal calls back to
xstat* symbols.
Checked with a build for all affected ABIs. I also check on
x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Lukasz Majewski <lukma@denx.de>
This commit replaces string manipulation during `iconv_open' and iconv_prog
option parsing with a structured, flag based conversion specification. In
doing so, it alters the internal `__gconv_open' interface and accordingly
adjusts its uses.
This change fixes several hangs in the iconv program and therefore includes
a new test to exercise iconv_prog options that originally led to these hangs.
It also includes a new regression test for option handling in the iconv
function.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
On s390x, I've recognize various -Werror=stringop-overflow messages
in iconv/loop.c and iconv/skeleton.c if build with gcc10 -O3.
With this commit gcc knows the size and do not raise those errors anymore.
I've updated copyright dates in glibc for 2020. This is the patch for
the changes not generated by scripts/update-copyrights and subsequent
build / regeneration of generated files. As well as the usual annual
updates, mainly dates in --version output (minus libc.texinfo which
previously had to be handled manually but is now successfully updated
by update-copyrights), there is a fix to
sysdeps/unix/sysv/linux/powerpc/bits/termios-c_lflag.h where a typo in
the copyright notice meant it failed to be updated automatically.
Please remember to include 2020 in the dates for any new files added
in future (which means updating any existing uncommitted patches you
have that add new files to use the new copyright dates in them).
The changes introduce a memory leak for gconv steps arrays whose
first element is an internal conversion, which has a fixed
reference count which is not decremented. As a result, after the
change in commit 50ce3eae5b, the steps
array is never freed, resulting in an unbounded memory leak.
This reverts commit 50ce3eae5b
("gconv: Check reference count in __gconv_release_cache
[BZ #24677]") and commit 7e740ab2e7
("libio: Fix gconv-related memory leak [BZ #24583]"). It
reintroduces bug 24583. (Bug 24677 was just a regression caused by
the second commit.)
This fixes a regression introduced in commit
7e740ab2e7 ("libio: Fix gconv-related
memory leak [BZ #24583]").
__gconv_release_cache is only ever called with heap-allocated
arrays which contain at least one member. The statically allocated
ASCII steps are filtered out by __wcsmbs_close_conv.
Commit ba7b4d294b ("Complete the
removal of __gconv_translit_find") added a declaration of the
GLIBC_PRIVATE function, __gconv_transliterate, to the installed
header <gconv.h>. It should have been added to the internal
<gconv_int.h> header.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
When -Werror=parentheses is in use, iconvconfig.c builds fail with:
iconvconfig.c: In function ‘write_output’:
iconvconfig.c:1084:34: error: suggest parentheses around ‘+’ inside ‘>>’ [-Werror=parentheses]
hash_size = next_prime (nnames + nnames >> 1);
~~~~~~~^~~~~~~~
This patch adds parentheses to the expression. Not where suggested by
the compiler warning, but where it produces the expected result, i.e.:
where it has the effect of multiplying nnames by 1.5.
Likewise for elem_size in ld-collate.c.
Tested for powerpc64le.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Two cases of "int * 1.4" may result in imprecise results, which
in at least one case resulted in i686 and x86-64 producing
different locale files. This replaced that floating point multiply
with integer operations. While the hash table margin is increased
from 40% to 50%, testing shows only 2% increase in overall size
of the locale archive.
https://bugzilla.redhat.com/show_bug.cgi?id=1311954
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
__gconv_read_conf is only ever called once during the program's lifetime.
This means that __gconv_path_elem is always uninitialized when the function
begins executing. __gconv_get_path has an assert to ensure that this
expected runtime behaviour is always exhibited. Given this, checking for a
NULL value before calling __gconv_get_path is unnecessary. This commit
drops the condition and calls __gconv_get_path unconditionally.
In iconv/gconv_conf.c, __gconv_get_path unnecessarily obtains a lock when
populating the array pointed to by __gconv_path_elem. The locking is not
necessary because all calls to __gconv_read_conf (which in turn calls
__gconv_get_path) are serialized using __libc_once.
This patch:
- removes all locking in __gconv_get_path;
- replaces all explicitly serialized __gconv_read_conf calls with calls to
__gconv_load_conf, a new wrapper that is serialized internally;
- adds a new test, iconv/tst-iconv_mt.c, to exercise iconv initialization,
usage, and cleanup in a multi-threaded program;
- indents __gconv_get_path correctly, removing tab characters (which makes
the patch look a little bigger than it really is).
After removing the unnecessary locking, it was confirmed that the test case
fails if the relevant __libc_once is removed. Additionally, four localedata
and iconvdata tests also fail. This gives confidence that the testsuite
sufficiently guards against some regressions relating to multi-threading
with iconv.
Tested on x86_64 and i686.
The variables __gconv_path_elem, __gconv_max_path_elem_len and function
__gconv_get_path declared in, as well as the type path_elem and macro
GCONV_NCHAR_GOAL defined in gconv_int.h are all used in only one iconv
compilation unit each. In addition, the extern declaration of the variable
__gconv_nmodules refers to a variable that does not exist any more.
Considering this, these symbols do not need to be exposed via a header file.
This patch removes the extern declarations from the header file and moves
the definitions to the compilation units where they are used.
Building glibc for s390 with -Os (32-bit only, with GCC 7) fails with:
In file included from ../sysdeps/s390/multiarch/8bit-generic.c:370:0,
from ebcdic-at-de.c:28:
../iconv/loop.c: In function '__to_generic_vx':
../iconv/loop.c:264:22: error: 'ch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (((Character) >> 7) == (0xe0000 >> 7)) \
^~
In file included from ebcdic-at-de.c:28:0:
../sysdeps/s390/multiarch/8bit-generic.c:340:15: note: 'ch' was declared here
uint32_t ch; \
^
../iconv/loop.c:325:7: note: in expansion of macro 'BODY'
BODY
^~~~
It's fairly easy to see, looking at the (long) expansion of the BODY
macro, that this is a false positive and the relevant variable 'ch' is
always initialized before use, in one of two possible places. As
such, disabling the warning for -Os with the DIAG_* macros is the
natural approach to fix this build failure. However, because of the
location at which the warning is reported, the disabling needs to go
in iconv/loop.c, around the definition of UNICODE_TAG_HANDLER (not
inside the definition), as that macro definition is where the
uninitialized use is reported, whereas the code that needs to be
reasoned about to see that the warning is a false positive is in the
definition of BODY elsewhere.
Thus, the patch adds such disabling in iconv/loop.c, with a comment
pointing to the s390-specific code and a comment in the s390-specific
code pointing to the generic file to alert people to the possible need
to update one place when changing the other. It would be possible if
desired to use #ifdef __s390__ around the disabling, though in general
we try to avoid that sort of thing in generic files. (Or some
extremely specialized macros for "disable -Wmaybe-uninitialized in
this particular place" could be specified, defined to 0 in a lot of
different files that include iconv/loop.c and to 1 in that particular
s390 file.)
Tested that this fixed -Os compilation for s390-linux-gnu with
build-many-glibcs.py.
* iconv/loop.c (UNICODE_TAG_HANDLER): Disable
-Wmaybe-uninitialized for -Os.
* sysdeps/s390/multiarch/8bit-generic.c (BODY): Add comment about
this disabling.
Continuing the fixes for linknamespace and localplt test failures with
-Os that arise from functions not being inlined in that case, this
patch fixes such failures for feof_unlocked.
The usual approach is followed of adding __feof_unlocked (inlined when
feof_unlocked is), making calls use it when required for namespace
reasons, and using libc_hidden_proto / libc_hidden_weak for the
feof_unlocked weak alias when only localplt but not namespace issues
are involved. In the case of getaddrinfo.c, use of __feof_unlocked
needs to be conditional since that code is also used in nscd (where
__feof_unlocked is not available).
Tested for x86_64 (both without -Os to make sure that case continues
to work, and with -Os to make sure all the relevant linknamespace and
localplt test failures are resolved). Because of other such failures
that remain after this patch, neither of the bugs can yet be closed.
[BZ #15105]
[BZ #19463]
* libio/feof_u.c (feof_unlocked): Rename to __feof_unlocked and
define as weak alias of __feof_unlocked. Use libc_hidden_weak.
* include/stdio.h (feof_unlocked): Use libc_hidden_proto.
(__feof_unlocked): New declaration, and inline function if
[__USE_EXTERN_INLINES].
* iconv/gconv_conf.c (read_conf_file): Call __feof_unlocked
instead of feof_unlocked.
* intl/localealias.c [_LIBC] (FEOF): Likewise.
* nss/nsswitch.c (nss_parse_file): Likewise.
* sysdeps/unix/sysv/linux/readonly-area.c (__readonly_area):
Likewise.
* time/getdate.c (__getdate_r): Likewise.
* sysdeps/posix/getaddrinfo.c [IS_IN (libc)] (feof_unlocked):
Define as macro to call __feof_unlocked.