The commit does the following things:
* Move non-transliteration Unicode generated data to i18n_ctype.
* Copy the i18n_ctype data into i18n and add transliteration.
In the future, any locale which needs Unicode LC_CTYPE data can
also just use `copy i18n_ctype` and get the base character classes
and maps without transliteration.
Tested by compiling all the locales and my prototype C.UTF-8 which
uses it.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
[BZ #22070]
* localedata/unicode-gen/utf8_gen.py: Set the width for
characters with Prepended_Concatenation_Mark property to 1
* localedata/charmaps/UTF-8: Updated using the improved script.
Writing ranges of neighbouring characters with the same with like this
<U000E0100>...<U000E01EF> 0
in charmaps/UTF-8 is more efficient than writing many single character lines
like:
<U000E0100> 0
<U000E0101> 0
...
[BZ #21750]
* unicode-gen/utf8_gen.py: Write all ranges of neighbouring characters
with the same width using the range notation in charmaps/UTF-8.
[BZ #21750]
* unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
* unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
* unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
* unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
[BZ #19852]
[BZ #21750]
* unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
UnicodeData lines so the latter have precedence; remove hack
to group output by EastAsianWidth ranges.
* Unicode 10.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 10.0.0, using
generator scripts contributed by Mike FABIAN (Red Hat).
* Unicode 9.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 9.0.0, using
generator scripts contributed by Mike FABIAN (Red Hat).
This patch makes the automation of Unicode LC_CTYPE generation also
support generating the modified LC_CTYPE used for Turkish (where case
conversions of 'i' and 'I' differ from ASCII conventions), so allowing
that to be more readily kept in sync for future Unicode updates. The
patch includes the locale update generated by the scripts.
Tested for x86_64.
[BZ #18491]
* unicode-gen/unicode_utils.py (to_upper_turkish): New function.
(to_lower_turkish): Likewise.
* unicode-gen/gen_unicode_ctype.py (output_tables): Support
producing output with Turkish case conversions.
(--turkish): New command-line option.
* unicode-gen/Makefile (GENERATED): Add tr_TR.
(tr_TR): New rule.
* locales/tr_TR: Regenerate LC_CTYPE.
Update __STDC_ISO_10646__ to 201505L for Unicode 8.0.0.
Update character encoding, ctype, and transliteration tables.
New scripts autogenerate transliteration tables.
for ChangeLog
* include/stdc-predef.h (__STDC_ISO_10646__): Update to
201304L, for Unicode 7.
for localedata/ChangeLog
* unicode-gen/ctype_compatibility.py: Use date ranges in
copyright notice.
* unicode-gen/ctype_compatibility_test_cases.py: Likewise.
* unicode-gen/gen_unicode_ctype.py: Likewise.
* unicode-gen/utf8_compatibility.py: Likewise.
* unicode-gen/utf8_gen.py: Likewise. Use upper case for
global variables, use tuples for global constant arrays. From
Mike FABIAN. Suggested by Mike Frysinger <vapier@gentoo.org>.
for localedata/ChangeLog
[BZ #17588]
[BZ #13064]
[BZ #14094]
[BZ #17998]
* unicode-gen/Makefile: New.
* unicode-gen/unicode-license.txt: New, from Unicode.
* unicode-gen/UnicodeData.txt: New, from Unicode.
* unicode-gen/DerivedCoreProperties.txt: New, from Unicode.
* unicode-gen/EastAsianWidth.txt: New, from Unicode.
* unicode-gen/gen_unicode_ctype.py: New generator, from Mike
FABIAN <mfabian@redhat.com>.
* unicode-gen/ctype_compatibility.py: New verifier, from
Pravin Satpute <psatpute@redhat.com> and Mike FABIAN.
* unicode-gen/ctype_compatibility_test_cases.py: New verifier
module, from Mike FABIAN.
* unicode-gen/utf8_gen.py: New generator, from Pravin Satpute
and Mike FABIAN.
* unicode-gen/utf8_compatibility.py: New verifier, from Pravin
Satpute and Mike FABIAN.
* charmaps/UTF-8: Update.
* locales/i18n: Update.
* gen-unicode-ctype.c: Remove.
* tst-ctype-de_DE.ISO-8859-1.in: Adjust, islower now returns
true for ordinal indicators.