This patch extends the automated Unicode LC_CTYPE generation to also
produce the modified LC_CTYPE used for Turkish (where case
conversions of 'i' and 'I' differ from ASCII conventions), so that
it can more readily be kept in sync for future Unicode updates. The
patch includes the locale update generated by the scripts.
Tested for x86_64.
[BZ #18491]
* unicode-gen/unicode_utils.py (to_upper_turkish): New function.
(to_lower_turkish): Likewise.
* unicode-gen/gen_unicode_ctype.py (output_tables): Support
producing output with Turkish case conversions.
(--turkish): New command-line option.
* unicode-gen/Makefile (GENERATED): Add tr_TR.
(tr_TR): New rule.
* locales/tr_TR: Regenerate LC_CTYPE.
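The Turkish case conversions described in this patch can be sketched
roughly as follows. This is a minimal illustration, not the code from
unicode_utils.py: the helper _simple_case is hypothetical and uses
Python's built-in str.upper/str.lower as a stand-in for a
UnicodeData.txt lookup.

```python
def _simple_case(code_point, convert):
    """Hypothetical stand-in for a UnicodeData.txt case lookup."""
    converted = convert(chr(code_point))
    # Keep only one-to-one mappings; anything else is left unchanged
    # (an assumption of this sketch).
    return ord(converted) if len(converted) == 1 else code_point


def to_upper_turkish(code_point):
    """Upper-case a code point using Turkish conventions."""
    if code_point == 0x0069:   # 'i' LATIN SMALL LETTER I
        return 0x0130          # LATIN CAPITAL LETTER I WITH DOT ABOVE
    return _simple_case(code_point, str.upper)


def to_lower_turkish(code_point):
    """Lower-case a code point using Turkish conventions."""
    if code_point == 0x0049:   # 'I' LATIN CAPITAL LETTER I
        return 0x0131          # LATIN SMALL LETTER DOTLESS I
    return _simple_case(code_point, str.lower)
```

Only the two ASCII letters are special-cased; every other code point
falls through to the default conversion.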
Update __STDC_ISO_10646__ to 201505L for Unicode 8.0.0.
Update character encoding, ctype, and transliteration tables.
New scripts autogenerate transliteration tables.
- Remove duplicate transliterations for U+0152 and U+0153 from
C-translit.h.in.
- Change Ø U+00D8 LATIN CAPITAL LETTER O WITH STROKE → O
(instead of → OE)
- Change ø U+00F8 LATIN SMALL LETTER O WITH STROKE → o
(instead of → oe)
- Add ₹ U+20B9 INDIAN RUPEE SIGN → INR
- Add ₫ U+20AB DONG SIGN → Dong (in addition to "₫ → Đồng")
- Add many others from
http://unicode.org/cldr/trac/browser/trunk/common/transforms/Latin-ASCII.xml
- Add some more currency signs suggested by Marko Myllynen
- Add another patch with more characters by Marko Myllynen
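As an illustration of what such table entries do, here is a minimal
sketch of a table-driven transliterator. It contains only the two
currency entries named in the list above; the function name, the dict
layout, and the "?" fallback are assumptions made for this example
and do not reflect how the generated tables are consumed.

```python
# Two entries taken from the list above; a real table is far larger.
TRANSLIT = {
    "\u20b9": "INR",    # INDIAN RUPEE SIGN
    "\u20ab": "Dong",   # DONG SIGN
}


def transliterate(text, fallback="?"):
    """Replace non-ASCII characters using TRANSLIT (hypothetical helper)."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)                      # ASCII passes through
        else:
            out.append(TRANSLIT.get(ch, fallback))
    return "".join(out)
```

For instance, transliterate("100 \u20b9") would yield "100 INR".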
The previous (11th) version of the Hungarian spelling rules (released
in 1984) said that the separator had to be a dot, e.g. 10.35 meaning
10 o'clock 35 minutes. glibc correctly implements this.
The brand new (12th) version, in effect since September 1, 2015, adapts
to the common use of the colon (especially in the digital world) and
allows either separator, without even expressing a preference.
For computer systems, the colon is far more typical and probably
easier to recognize. The dot is typically used in printed materials.
It also avoids an almost ambiguous situation where a space makes a
difference, e.g. "10.15-ig" means "until 10 o'clock 15 minutes"
whereas "10. 15-ig" means "until 15th of October". So I believe using
the colon as the separator is not only more frequent in the computer
world, but is also easier and quicker to recognize for the brain that
it's about hour:minute rather than month and day. And luckily it's now
equally correct according to the official rules.
11th edition: http://helyesiras.mta.hu/helyesiras/default/akh11
12th edition: http://helyesiras.mta.hu/helyesiras/default/akh12
In both editions it's the very last (299th and 300th, respectively) rule.
Microsoft also uses and recommends a colon since at least May 2011:
http://download.microsoft.com/download/e/6/1/e61266b2-d8b4-4fe0-a553-f01dc3976675/hun-hun-StyleGuide.pdf
The time format is different in common language and in the language of
IT. In common texts we usually do not abbreviate, so the full forms are
used: “7 óra 10 perckor csörgött a telefon”. However, the short format,
consisting of numerals only, can also be used. In this case a period
must be used between the two numbers and there must not be a space
between them: “találkozzunk 10.45-kor”.
However, in software mostly the short format is used, and the numbers
are separated by a colon. An obvious example is the clock in the bottom
right corner of your screen, thus 18:31.
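The difference between the two separators can be shown with a small
sketch. The format strings below are illustrative only and are not
copied from the hu_HU locale source; the constant and function names
are hypothetical.

```python
import datetime

DOT_FORMAT = "%H.%M"     # separator required by the 11th edition
COLON_FORMAT = "%H:%M"   # separator also allowed by the 12th edition


def format_time(value, fmt=COLON_FORMAT):
    """Format a datetime.time with the chosen hour/minute separator."""
    return value.strftime(fmt)
```

With these definitions, 10 o'clock 35 minutes is rendered as "10:35"
by default and as "10.35" when DOT_FORMAT is passed.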
lang_lib (which reflects ISO 639-2/B (bibliographic) codes) and
lang_term (which reflects ISO 639-2/T (terminology) codes) should be
identical except for those languages for which ISO 639-2 specifies
separate bibliographic/terminology values.
I used this Library of Congress page as the source:
http://www.loc.gov/standards/iso639-2/php/code_list.php
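The rule above can be expressed as a small consistency check. This is
only a sketch: the table lists a handful of the languages for which
ISO 639-2 defines separate codes (not the complete set), and the
function names are hypothetical.

```python
# Terminology (T) code -> bibliographic (B) code, for a few of the
# languages where ISO 639-2 defines both. Deliberately incomplete.
B_CODE_FOR_T = {
    "deu": "ger",   # German
    "fra": "fre",   # French
    "ces": "cze",   # Czech
    "nld": "dut",   # Dutch
    "ell": "gre",   # Greek
    "bod": "tib",   # Tibetan
    "cym": "wel",   # Welsh
}


def expected_lang_lib(lang_term):
    """Return the lang_lib value implied by a lang_term value."""
    return B_CODE_FOR_T.get(lang_term, lang_term)


def codes_consistent(lang_term, lang_lib):
    """True if lang_lib matches what lang_term implies."""
    return lang_lib == expected_lang_lib(lang_term)
```

A locale with lang_term "deu" would be expected to carry lang_lib
"ger", while one with lang_term "hun" would carry "hun" in both
fields.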
As discussed in the thread starting at
https://sourceware.org/ml/libc-alpha/2015-06/msg00098.html
it looks like the best option is to remove locale timezone information
from locales which currently provide it (in incomplete or incorrect
fashion) rather than to start duplicating tzdata info in glibc.
Repertoire maps and character mnemonics were used early in the glibc
i18n/l10n effort but were quickly deprecated in favor of Unicode code
points. According to ChangeLog, the in-tree repertoire maps were
removed 2000-07-07 but some stray references remain even today. The
patch below removes them.
After the rename, localedef complains and the build fails with:
LC_ADDRESS: field `lang_ab' must not be defined
Earlier the locale names were similar to the lang_ab definitions 'tu'
or 'bh', but after the rename they are not.
The Bhili [1] and Tulu [2] languages do not have ISO 639-1 codes. This
patch moves the locale files to the correct codes and also fixes
iso-639.def.
1. http://www-01.sil.org/iso639-3/documentation.asp?id=bhb
2. http://www-01.sil.org/iso639-3/documentation.asp?id=tcy
localedata/ChangeLog:
2015-07-02 Pravin Satpute <psatpute@redhat.com>
[BZ #17475]
* locales/tu_IN: Renamed to tcy_IN.
* locales/bh_IN: Renamed to bhb_IN.
Changelog:
2015-03-05 Pravin Satpute <psatpute@redhat.com>
[BZ #17475]
* locale/iso-639.def: Update Bhili and Tulu language codes as
per iso639-3.
In the introduction to the official orthography rules for the Ukrainian
language (http://spelling.ulif.org.ua/peredmova.htm) there's a note
that only the apostrophe does not affect the order of words when sorting.
As can be seen from the official alphabet, the soft sign
(U+044C/U+042C) has its own fixed position and thus affects the order.
Likewise, the letters "е" and "є" (CYR-IE: U+0435/U+0415 and UKR-IE:
U+0454/U+0404) have their own positions and should be ordered separately
when sorting.
This also corresponds to official Unicode collation chart for these
letters: http://unicode.org/charts/collation/chart_Cyrillic.html
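The ordering rules described above can be illustrated with a small
sort key. This is a sketch only and is unrelated to how LC_COLLATE is
actually implemented: the names are hypothetical, and treating three
apostrophe variants as ignorable is an assumption of the example.

```python
# The 33 letters of the Ukrainian alphabet in their official order.
UKRAINIAN_ALPHABET = "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"
RANK = {letter: i for i, letter in enumerate(UKRAINIAN_ALPHABET)}

# Apostrophe variants treated as ignorable (assumption of this sketch).
APOSTROPHES = {"'", "\u2019", "\u02bc"}


def sort_key(word):
    """Rank each letter by alphabet position, skipping apostrophes."""
    return [
        # Characters outside the alphabet sort after it, by code point.
        RANK.get(ch, len(UKRAINIAN_ALPHABET) + ord(ch))
        for ch in word.lower()
        if ch not in APOSTROPHES
    ]
```

Sorting with key=sort_key places "є" directly after "е" and gives the
soft sign its own place between "щ" and "ю", whereas a plain code
point sort would put "є" after "я".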
Both bo_CN and bo_IN were not compiling. The following fix
gets them into a usable state again giving a clean build
result for `make localedata/install-locales`.
for localedata/ChangeLog
[BZ #17588]
[BZ #13064]
[BZ #14094]
[BZ #17998]
* unicode-gen/Makefile: New.
* unicode-gen/unicode-license.txt: New, from Unicode.
* unicode-gen/UnicodeData.txt: New, from Unicode.
* unicode-gen/DerivedCoreProperties.txt: New, from Unicode.
* unicode-gen/EastAsianWidth.txt: New, from Unicode.
* unicode-gen/gen_unicode_ctype.py: New generator, from Mike
FABIAN <mfabian@redhat.com>.
* unicode-gen/ctype_compatibility.py: New verifier, from
Pravin Satpute <psatpute@redhat.com> and Mike FABIAN.
* unicode-gen/ctype_compatibility_test_cases.py: New verifier
module, from Mike FABIAN.
* unicode-gen/utf8_gen.py: New generator, from Pravin Satpute
and Mike FABIAN.
* unicode-gen/utf8_compatibility.py: New verifier, from Pravin
Satpute and Mike FABIAN.
* charmaps/UTF-8: Update.
* locales/i18n: Update.
* gen-unicode-ctype.c: Remove.
* tst-ctype-de_DE.ISO-8859-1.in: Adjust, islower now returns
true for ordinal indicators.
We add yesstr and nostr to three more locales.
We ignore the issue of capitalization of the first
character in yesstr and nostr. All locales will need
to be revisited to make this uniform policy change.
---
2013-05-02 Carlos O'Donell <carlos@redhat.com>
[BZ #15264]
* localedata/locales/en_CA (LC_MESSAGES): Define yesstr and nostr.
* localedata/locales/es_AR (LC_MESSAGES): Copy es_ES.
* localedata/locales/es_ES (LC_MESSAGES): Define yesstr and nostr.
Define yesstr/nostr in fi_FI (as "Kyllä" and "Ei").
Fixes part of BZ#15264.
---
2013-04-06 Marko Myllynen <myllynen@redhat.com>
[BZ #15264]
* locales/fi_FI (LC_MESSAGES): Define yesstr and nostr.
2012-11-21 Chris Leonard <cjl@sugarlabs.org>
[BZ #14863]
* SUPPORTED: Add niu_NU and niu_NZ.
* locales/niu_NU: Add Niuean (Vagahau Niue) locale for Niue,
contributed by Chris Leonard <cjl@sugarlabs.org> and Emani
Fakaotimanava-Lui <emani@niue.nu>.
* locales/niu_NZ: Add Niuean (Vagahau Niue) locale for New
Zealand, contributed by Chris Leonard <cjl@sugarlabs.org> and Emani
Fakaotimanava-Lui <emani@niue.nu>.
[BZ #14368]
* locales/szl_PL: New Silesian Language Locale for Poland.
Contributed by Przemyslaw Buczkowski <przemub@yahoo.pl>.
* localedata/SUPPORTED (SUPPORTED-LOCALES): Add szl_PL.
[BZ #14828]
* locales/ayc_PE: Add Aymara locale for Peru
contributed by Chris Leonard <cjl@sugarlabs.org> and
Amos Batto <amosbatto@yahoo.com>.
* SUPPORTED (SUPPORTED-LOCALES): Add ayc_PE.