* localedata/locales/iso14651_t1_common: Add some convenient collation
symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
rules similar to those in CLDR.
* localedata/locales/iso14651_t1_common: The new version of this
file downloaded from ISO contained several syntax errors which
are fixed by this patch.
[BZ #14095] - Review / update collation data from Unicode / ISO 14651
File downloaded from:
http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt
Updating this file alone is not enough, there are problems in the new
file which need to be fixed and the collation rules for many locales
need to be adapted. This is done by the following patches.
This update also fixes the problem that many characters are treated as
identical when sorting because they were not yet in the old
iso14651_t1_common file, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1336308
- Infinite (∞) and empty set (∅) are treated as if they were the same character by sort and uniq
[BZ #14095]
* localedata/locales/iso14651_t1_common: Update file to
latest version from ISO (ISO14651_2016_TABLE1_en.txt).
LC_TIME in these 4 locales is identical, using “copy "es_BO"” makes
that more obvious.
[BZ #22646]
* localedata/locales/es_CL (LC_TIME): copy "es_BO".
* localedata/locales/es_CU (LC_TIME): copy "es_BO".
* localedata/locales/es_EC (LC_TIME): copy "es_BO".
[BZ #10871]
* localedata/locales/ru_RU (mon): Rename to...
(alt_mon): This.
(abmon): Rename to...
(ab_alt_mon): This.
(mon): Import from CLDR (genitive case).
(abmon): Copy from the old content except the 5th month which is
now in the genitive case, even when abbreviated.
* localedata/locales/ru_UA: Likewise.
* time/tst-strptime.c (day_tests): Add an actual example of
a difference between %b and %Ob in Russian.
Primary month names are in a genitive case now, alternative month names
are in a nominative case.
The alternative digits hack is no longer needed and has been removed.
[BZ #10871]
* localedata/locales/uk_UA (mon): Renamed to...
(alt_mon): This.
(alt_digits): "0" removed and then renamed to...
(mon): This.
(date_fmt): Definition changed not to use the alternative
digits hack.
[BZ #10871]
* localedata/locales/pl_PL: Alternative month names added,
primary month names are genitive now.
* time/tst-strptime.c (day_tests): Actually use a genitive case
of a month name in Polish language.
Some languages (Slavic, Baltic, etc.) require a genitive case of the
month name when formatting a full date (with the day number) while
they require a nominative case when referring to the month standalone.
This requirement cannot be fulfilled without providing two forms for
each month name. From now it is specified that nl_langinfo(MON_1)
series (up to MON_12) and strftime("%B") generate the month names in
the grammatical form used when the month is a part of a complete date.
If the grammatical form used when the month is named by itself is needed,
the new values nl_langinfo(ALTMON_1) (up to ALTMON_12) and
strftime("%OB") are supported. This new feature is optional so the
languages which do not need it or do not yet provide the updated
locales simply do not use it and their behaviour is unchanged.
[BZ #10871]
* locale/C-time.c (_nl_C_LC_TIME): Add alternative month names,
define them as the same as primary full month names explicitly.
* locale/categories.def (LC_TIME): Add alt_mon and wide-alt_mon.
* locale/langinfo.h (__ALTMON_1, __ALTMON_2, __ALTMON_3, __ALTMON_4,
__ALTMON_5, __ALTMON_6, __ALTMON_7, __ALTMON_8, __ALTMON_9, __ALTMON_10,
__ALTMON_11, __ALTMON_12, _NL_WALTMON_1, _NL_WALTMON_2, _NL_WALTMON_3,
_NL_WALTMON_4, _NL_WALTMON_5, _NL_WALTMON_6, _NL_WALTMON_7,
_NL_WALTMON_8, _NL_WALTMON_9, _NL_WALTMON_10, _NL_WALTMON_11,
_NL_WALTMON_12): New enum constants.
[__USE_GNU] (ALTMON_1, ALTMON_2, ALTMON_3, ALTMON_4, ALTMON_5, ALTMON_6,
ALTMON_7, ALTMON_8, ALTMON_9, ALTMON_10, ALTMON_11, ALTMON_12): New
macros.
* locale/programs/ld-time.c (struct locale_time_t): Add alt_mon,
walt_mon, and alt_mon_defined members.
(time_output): Output alt_mon and walt_mon members.
(time_read): Read them, initialize them as copies of mon and wmon
respectively if they are missing, initialize alt_mon_defined.
* locale/programs/locfile-kw.gperf (alt_mon): Define.
* locale/programs/locfile-kw.h: Regenerate.
* locale/programs/locfile-token.h (tok_alt_mon): New enum constant.
* localedata/tst-langinfo.c (map): Add tests for the new constants
ALTMON_1 .. ALTMON_12.
* time/Makefile [$(run-built-tests) = yes] (LOCALES): Add fr_FR.UTF-8
and pl_PL.UTF-8.
* time/strftime_l.c (f_altmonth): New macro.
(__strftime_internal): Handle %OB format.
* time/strptime_l.c [_LIBC] (alt_month_name): New macro.
(__strptime_internal): Handle %OB format.
* time/tst-strptime.c (day_tests): Add tests to parse different forms
of month names including the new %OB format specifier.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reported-by: Robert Pluim <rpluim@gmail.com>
* localedata/locales/gu_IN (LC_IDENTIFICATION): Fix an obvious typo
in date: "2004-14-09" should be "2004-09-14".
* localedata/locales/lo_LA: Fix an obvious typo in date in the header:
"2003-15-09" should be "2003-09-15".
* localedata/locales/bho_NP (LC_IDENTIFICATION): Fix an obvious typo
in date: "2017-24-07" should be "2017-07-24".
* localedata/locales/mai_IN: Likewise.
* localedata/locales/mai_NP: Likewise.
The current date format prefixes one-digit days with a space, resulting
in ugly two spaces:
$ LC_ALL=hu_HU.UTF-8 date
2018. jan. 1., hétfő, 21:25:35 CET
^^
The official orthography rules doesn't contain an explicit rule about
this (which already gives no sane reason for double space), and an
implicit example of "1848. március 9." under bullet point 296 at
http://helyesiras.mta.hu/helyesiras/default/akh12 contains a single
space only. It's sure not convincing on an HTML page, but I confirm
that the official book edition (e.g.
https://www.libri.hu/en/konyv/a-magyar-helyesiras-szabalyai-32.html)
also contains a single space there.
[BZ #22657]
* localedata/locales/hu_HU (d_t_fmt): Avoid a leading space
before the day number which may produce a double space.
(date_fmt): Likewise.
[BZ #22524]
* localedata/Makefile: Add lt_LT.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/lt_LT.UTF-8.in: New file for testing the collation.
* localedata/locales/lt_LT (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22515]
* localedata/Makefile: Add hsb_DE.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/hsb_DE.UTF-8.in: New file for testing the collation.
* localedata/locales/hsb_DE (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22517]
* localedata/Makefile: Add et_EE.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/et_EE.UTF-8.in: New file for testing the collation.
* localedata/locales/et_EE (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22527]
* localedata/locales/tr_TR (LC_COLLATE): Base collation rules
on iso14651_t1. A test file localedata/tr_TR.UTF-8.in is already
available, this rewrite of the collation rules does reproduce
the test file in the same order.
[BZ #10580]
* localedata/locales/hr_HR (LC_TIME): Use two letters for the
digraphs in the month and day names. Using single code points for
digraphs is deprecated. While there are dedicated Unicode
codepoints, for the digraphs, these are included for backwards
compatibility and modern texts use a sequence of Basic Latin
characters. See: https://www.unicode.org/faq/ligature_digraph.html
This makes the month and day names agree exactly with CLDR now,
CLDR does not use the single code points for the digraphs either.
According to CLDR, collation rules for Serbian and Bosnian
should be the same as for Croatian.
[BZ #22534]
* localedata/Makefile: Add sr_RS.UTF-8 and bs_BA.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/bs_BA.UTF-8.in: New file (same as hr_HR.UTF-8.in).
* localedata/sr_RS.UTF-8.in: New file (same as hr_HR.UTF-8.in).
* localedata/locales/bs_BA (LC_COLLATE): Use “copy "hr_HR"”.
* localedata/locales/sr_RS (LC_COLLATE): Use “copy "hr_HR"”.
[BZ #10580]
* localedata/locales/hr_HR (LC_COLLATE): Base collation rules on
iso14651_t1.
* localedata/locales/hr_HR (LC_TIME): Sync month and day names with
CLDR (except use ligatures for the digraphs, CLDR does not use
the ligatures), add first_workday, some fixes in the date and time
formats.
* localedata/locales/hr_HR (LC_CTYPE): Add transliteration rules
for Đ and đ.
* localedata/locales/hr_HR (LC_MONETARY): Change currency_symbol to
lower case. p_cs_precedes and n_cs_precedes should be 0 instead of 1.
Add int_p_cs_precedes and int_n_cs_precedes.
* localedata/locales/hr_HR (LC_NUMERIC): Change thousands_sep to
"<U202F>" (NARROW NO-BREAK SPACE) and grouping to 3;3 (Agrees with
LC_MONETARY now).
* localedata/locales/hr_HR (LC_TELEPHONE): Add tel_dom_fmt.
* localedata/locales/hr_HR (LC_NAME): Add name_mr, name_mrs, and
name_miss.
* localedata/locales/hr_HR (LC_ADDRESS): Add country_post, country_isbn,
and lang_lib. Change postal_fmt.
change
[BZ #17750]
* Makefile: add fr_CA.UTF-8 to test-input and LOCALES.
* localedata/fr_CA.UTF-8.in: New file with test data for backward
accents sorting.
* localedata/fr_FR.UTF-8.in: Fix test data for forward accents
sorting.
* localedata/locales/cs_CZ (LC_COLLATE): Remove “define DIACRIT_FORWARD”
* localedata/locales/de_DE (LC_COLLATE): Likewise.
* localedata/locales/hu_HU (LC_COLLATE): Likewise.
* localedata/locales/lb_LU (LC_COLLATE): Likewise.
* localedata/locales/yuw_PG (LC_COLLATE): Likewise.
* localedata/locales/fr_CA (LC_COLLATE): Add “define DIACRIT_BACKWARD”
* localedata/locales/iso14651_t1_common: Use “ifdef DIACRIT_FORWARD”
instead of “ifdef DIACRIT_BACKWARD”.
The only locale which currently needs backward accents sorting is fr_CA.
Therefore, forward accents sorting should be the default.
Before this patch, backwards accent sorting was the default and all
locales except fr_CA had to use
define DIACRIT_FORWARD
before
copy "iso14651_t1"
Most locales didn’t do that and thus got the inappropriate backwards accents sorting
by accident. Now only the fr_CA locale needs to use
define DIACRIT_BACKWARD
before
copy "iso14651_t1"
Original patch slightly modified by: Mike FABIAN <mfabian@redhat.com>
The LOCALES variable in the localedata had two instances of cs_CZ
which generated the following warning:
../gen-locales.mk:11: target '/opt/build/localedata/cs_CZ.UTF-8/LC_CTYPE' given more than once in the same rule
Dropped the duplicate entry.
[BZ #22336]
* localedata/locales/cs_CZ (LC_COLLATE): Use “copy "iso14651_t1"”
and implement the collation rules for cs from CLDR on top of that.
* Makefile: Add cs_CZ.UTF-8 to test-input and to the list
of locales to be built for testing.
* cs_CZ.UTF-8.in: New file with test data to test the Czech sorting.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #22469]
* localedata/locales/pl_PL (LC_COLLATE): Use “copy "iso14651_t1"”
and implement the collation rules for pl from CLDR on top of that.
* Makefile: Add pl_PL.UTF-8 to test-input and to the list
of locales to be built for testing.
* pl_PL.UTF-8.in: New file with test data to test the Polish sorting.
[BZ #15537]
* localedata/locales/lv_LV (LC_COLLATE): Fix collation by
using “copy "iso14651_t1"” and then implementing the
collation rules for lv from CLDR on top of that.
* Makefile: Add lv_LV.UTF-8 to test-input and to the list
of locales to be built for testing.
* lv_LV.UTF-8.in: New file with test data to test the Latvian
sorting.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Update all sourceware links to https. The website redirects
everything to https anyway so let the web server do a bit less work.
The only reference that remains unchanged is the one in the old
ChangeLog, since it didn't seem worth changing it.
* NEWS: Update sourceware link to https.
* configure.ac: Likewise.
* crypt/md5test-giant.c: Likewise.
* dlfcn/bug-atexit1.c: Likewise.
* dlfcn/bug-atexit2.c: Likewise.
* localedata/README: Likewise.
* malloc/tst-mallocfork.c: Likewise.
* manual/install.texi: Likewise.
* nptl/tst-pthread-getattr.c: Likewise.
* stdio-common/tst-fgets.c: Likewise.
* stdio-common/tst-fwrite.c: Likewise.
* sunrpc/Makefile: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy_impl.S: Likewise.
* wcsmbs/tst-mbrtowc2.c: Likewise.
* configure: Regenerate.
* INSTALL: Regenerate.
Following the previous work by Carlos O'Donell the category of LC_CTYPE
is correctly set to "i18n:2012" rather than "unicode:2014" and the
i18n_ctype file is once again regenerated from scratch to make sure it
does not contain any manual additions except the copyright message.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* localedata/unicode-gen/gen_unicode_ctype.py (output_head):
category of LC_CTYPE set to "i18n:2012".
* localedata/locales/i18n_ctype: Regenerate.
[BZ #19485]
* localedata/locales/csb_PL (LC_TIME): Fix “abmon” for March
and use a better translation for March in “mon”.
* localedata/locales/csb_PL: Use more ASCII to improve the
readability of the source.
[BZ #13953]
* localedata/locales/km_KH: Use ASCII as much
as possible for better readability of the source and
remove useless comments.
* localedata/locales/km_KH (LC_TIME): Remove era stuff, it
was commented out and apparently wrong anyway because it was
using Lao characters. If Buddhist era should be used
for km_KH, a native speaker should write the correct formaat
for Khmer.
* localedata/locales/km_KH (LC_TIME): Add first_weekday 1
(According to CLDR, the first weekday for Cambodia is Sunday).
* localedata/locales/km_KH (LC_NAME): Remove name_mr and name_mrs
(These were using Lao characters which must be wrong. If we get
the correct data from a native speaker, we could add it back, until
then it is better not to have name_mr and name_mrs at all than
having it wrong).
[BZ #15260]
* localedata/locales/doi_IN (LC_MESSAGES): Match only for the
first letters of yesstr and nostr in yesexpr and noexpr,
not for the full words.
* localedata/locales/hne_IN (LC_MESSAGES): Likewise.
* localedata/locales/kok_IN (LC_MESSAGES): Likewise.
* localedata/locales/mr_IN (LC_MESSAGES): Likewise.
* localedata/locales/sat_IN (LC_MESSAGES): Likewise.
* localedata/locales/km_KH (LC_MESSAGES): Match also for the
first letters of yesstr and nostr in yesexpr and noexpr,
until now only English was matched in yesexpr and noexpr.
* localedata/locales/tl_PH (LC_MESSAGES): Use “copy "fil_PH"”
instead of “copy "en_US"”. CLDR has yesstr and nostr data for
fil but not for tl. As tl and fil are very similar, using fil
is probably better than using English.
Pablo was l10n/i18n coordinator back in the old days but MandrakeSoft is
dead now
* localedata/locales/br_FR (LC_IDENTIFICATON): Add
Thierry Vignaud <thierry.vignaud@gmail.com> as the contact
for the br_FR locale.
"Ket" is the the most used negative answer, as it's the negative answer
to a positively phrased question
It's used as it or with the verb ("Ne ran ket", ...)
As such, "Ket" is used in most translations.
"Nann" is less used as it's the negative answer to a negatively phrased
question
See https://en.wikipedia.org/wiki/Yes_and_no for explanations about
languages with 3 or 4 form systems.
We still keep "Nn" for short answers as:
- new learners are used to "Non" in french
- and they often misuses "Nann"
- for compatibility with english
[BZ #21706]
* localedata/locales/br_FR (LC_MESSAGES): Fix nostr.
From localedef --help:
Output control:
...
--no-warnings=<warnings> Comma-separated list of warnings to disable;
supported warnings are: ascii, intcurrsym
...
--warnings=<warnings> Comma-separated list of warnings to enable;
supported warnings are: ascii, intcurrsym
Locales using SHIFT_JIS and SHIFT_JISX0213 character maps are not ASCII
compatible. In order to build locales using these character maps, and
have localedef exit with a status of 0, we add new option to localedef
to disable or enable specific warnings. The options are --no-warnings
and --warnings, to disable and enable specific warnings respectively.
The options take a comma-separated list of warning names. The warning
names are taken directly from the generated warning. When a warning
that can be disabled is issued it will print something like this: foo is
not defined [--no-warnings=foo]
For the initial implementation we add two controllable warnings; first
'ascii' which is used by the localedata installation makefile target to
install SHIFT_JIS and SHIFT_JISX0213-using locales without error; second
'intcurrsym' which allows a program to use a non-standard international
currency symbol without triggering a warning. The 'intcurrsym' is
useful in the future if country codes are added that are not in our
current ISO 4217 list, and the user wants to avoid the warning. Having
at least two warnings to control gives an example for how the changes
can be extended to more warnings if required in the future.
These changes allow ja_JP.SHIFT_JIS and ja_JP.SHIFT_JISX0213 to be
compiled without warnings using --no-warnings=ascii. The
localedata/Makefile $(INSTALL-SUPPORTED-LOCALES) target is adjusted to
automatically add `--no-warnings=ascii` for such charmaps, and likewise
localedata/gen-locale.sh is adjusted with similar logic.
v2: Bring verbose, be_quiet, and all warning control booleans into
record-status.c, and compile this object file to be used by locale,
iconv, and localedef. Any users include record-status.h.
v3: Fix an instance of boolean coercion in set_warning().
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
The localedata collation test data is encoded in a particular
character set. We rename the test data to match the full locale
name with encoding, and adjust the Makefile and sort-test.sh
script. This allows us to have a future C.UTF-8 test that is
disambiguated from the built-in C locale.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
After the transition to generating a distinct file for Unicode ctype
information e.g. i18n_ctype, the check target was left with the wrong
target name. This patch fixes the check target and regenerates the
files with more information than previously used, filling in the the
LC_IDENTIFICATION data.
Tested on x86_64 by regenerating from Unicode source files, and
running checks. Tested by subsequently rebuilding all locales.
No regressions in testsuite.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Reported-by: Rafal Luzynski <digitalfreak@lingonborough.com>
* localedata/locales/hi_IN (LC_MESSAGES): In yesexpr and noexpr,
also check for the first characters of yesstr and nostr.
* localedata/locales/kn_IN (LC_MESSAGES): Likewise.
* localedata/locales/ks_IN@devanagari (LC_MESSAGES): Likewise.
* localedata/locales/chr_US (LC_MESSAGES): In yesexpr and noexpr,
match also for the contents of yesstr and nostr. As the first letter
of yesstr and nostr is equal, checking only for the first letter
is not enough.
* localedata/locales/ug_CN (LC_MESSAGES): Fix noexpr and yesexpr
by including the first letters of nostr and yesexpr in the regexp.
Also make it more readable by using ASCII where possible.
* localedata/locales/te_IN (LC_MESSAGES): Fix noexpr by including
the first letter of nostr in the regexp. It agrees with CLDR now.
Also make it more readable by using ASCII where possible.
* localedata/locales/km_KH (LC_MESSAGES): Fix yestr and nostr.
The yesstr and nostr apparently came from CLDR. And CLDR has a bug there:
these strings contain a U+17D6 (which somewhat looks like a colon)
instead of a real colon to separate the full words for “yes”
and “no” from the single letter responses.
* localedata/locales/ka_GE (LC_MESSAGES): Fix yesexp to make
it agree with CLDR (include the first letter of yesstr).
Also make it more readable by using ASCII where possible.
* localedata/locales/mr_IN (LC_MESSAGES): Fix yesstr and nostr
and improve yesexpr and noexpr. The yesstr and nostr apparently
came from CLDR. And CLDR has a bug there: these strings contain
a U+0903 (which looks like a colon) instead of a real colon
to separate the full words for “yes” and “no” from the single
letter responses.
Using all characters of the full words for yes and no in yesexpr and noexpr
makes no sense here, especially not because the words for yes and no
share one character.
* localedata/locales/bn_BD (LC_MESSAGES): Use only the first
letters of the full yesstr and nostr in yesexpr and noexpr.
* localedata/locales/an_ES (LC_MESSAGES): Add yesstr and nostr.
* localedata/locales/an_ES (LC_ADDRESS): Add lang_term and lang_lib.
* localedata/locales/an_ES: Make source more readable by using ASCII
where possible.
* localedata/locales/tpi_PG (LC_MESSAGES): Fix yesexpr and noexpr
by adding the generic +1 and -0 as in all other locales.
* localedata/locales/tpi_PG (LC_TIME): Fix some typos in the month and
day names and make it more readable by using ASCII where possible.
[BZ #16777]
* localedata/locales/pl_PL (LC_MONETARY): Use U+202F as mon_thousands_sep
and improve readability by using more ASCII.
* localedata/locales/pl_PL (LC_NUMERIC): Use U+202F as thousands_sep
and improve readability by using more ASCII.
The Valencian (meridional Catalan) locale is basically a copy of the
Catalan locale. The point of having a separate locale is only for PO
translations. This locale is already provided by several distributions
and is already supported by various projects like LibreOffice, Mozilla,
Gnome, KDE.
Aurelien Jarno <aurelien@aurel32.net>
[BZ #2522]
* localedata/locales/ca_ES@valencia: New file.
* localedata/SUPPORTED: Add ca_ES@valencia/UTF-8.
CLDR uses this pattern as well.
[BZ #22019]
* localedata/locales/el_GR: Set n_cs_precedes to 0.
* localedata/locales/el_CY: copy "el_GR" because it is identical.
* stdlib/tst-strfmon_l.c: adapt test case.
The error and warning handling in localedef, locale, and iconv
is a bit of a mess.
We use ugly constructs like this:
WITH_CUR_LOCALE (error (1, errno, gettext ("\
cannot read character map directory `%s'"), directory));
to issue errors, and read error_message_count directly from the
error API to detect errors. The problem with that is that the
code also uses error to print warnings, and informative messages.
All of this leads to problems where just having warnings will
produce an exit status as-if errors had been seen.
To fix this situation I have adopted the following high-level
changes:
* All errors are counted distinctly.
* All warnings are counted distinctly.
* All informative messages are not counted.
* Increasing verbosity cannot generate *more* errors, and
it previously did for errors conditional on verbose,
this is now fixed.
* Increasing verbosity *can* generate *more* warnings.
* Making the output quiet cannot generate *fewer* errors,
and it previously did for errors conditional on be_quiet,
this is now fixed.
* Each of error, warning, and informative message has it's
own function to call defined in record-status.h, and they
are: record_error, record_warning, and record_verbose.
* The record_error function always records an error, but
conditional on be_quiet may not print it.
* The record_warning function always records a warning,
but conditional on be_quiet may not print it.
* The record_verbose function only prints the verbose
message if verbose is true and be_quiet is false.
This has allowed the following fix:
* Previously any warnings were being treated as errors
because they incremented error_message_count, but now
we properly return an exit status of 1 if there are
warnings but output was generated.
All of this allows localedef to correctly decide if errors,
or warnings were present, and produce the correct exit code.
The locale and iconv programs now also use record-status.h
and we have removed the WITH_CUR_LOCALE hack, and instead
have internal push_locale/pop_locale functions centralized
in the record routines.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
The commit does the following things:
* Move non-transliteration Unicode generated data to i18n_ctype.
* Copy the i18n_ctype data into i18n and add transliteration.
In the future, any locale which needs Unicode LC_CTYPE data can
also just use `copy i18n_ctype` and get the base character classes
and maps without transliteration.
Tested by compiling all the locales and my prototype C.UTF-8 which
uses it.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
This code page is identical to code page 850 except that X'D5'
has been changed from LI61 (dotless i) to SC20 (euro symbol).
The code points from /x01 to /x1f in the /localedata/charmaps/IBM858
file have the same mapping as those in localedata/charmaps/ANSI_X3.4-1968.
That means they disagree with with
ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00858.txt
in that range.
For example, localedata/charmaps/IBM858 and localedata/charmaps/ANSI_X3.4-1968 have:
“<U0001> /x01 START OF HEADING (SOH)”
whereas CP00858.txt has:
“01 SS000000 Smiling Face”
That means that CP00858.txt is not really ASCII-compatible and to make
it ASCII-compatible we deviate fro CP00858.txt in the code points from /x01
to /x1f.
[BZ #21084]
* benchtests/strcoll-inputs/filelist#en_US.UTF-8: Add IBM858 and ibm858.c.
* iconvdata/Makefile: Add IBM858.
* iconvdata/gconv-modules: Add IBM858.
* iconvdata/ibm858.c: New file.
* iconvdata/tst-tables.sh: Add IBM858
* localedata/charmaps/IBM858: New file.
“Bengali” still remained in some comments in the bn_BD locale file,
in iso-639.def and in a test input file. Change it there as well.
“Bangla” is now used as the English name for this language in CLDR.
[BZ #14925]
* libio/tst-widetext.input: Change “Bengali” to “Bangla”.
* locale/iso-639.def: Change “Bengali” to “Bangla”.
* localedata/locales/bn_BD: “Bengali” was still used in some
comments. Change it to “Bangla”.
[BZ #22070]
* localedata/unicode-gen/utf8_gen.py: Set the width for
characters with Prepended_Concatenation_Mark property to 1
* localedata/charmaps/UTF-8: Updated using the improved script.
Writing ranges of neighbouring characters with the same with like this
<U000E0100>...<U000E01EF> 0
in charmaps/UTF-8 is more efficient than writing many single character lines
like:
<U000E0100> 0
<U000E0101> 0
...
[BZ #21750]
* unicode-gen/utf8_gen.py: Write all ranges of neighbouring characters
with the same width using the range notation in charmaps/UTF-8.
[BZ #15332]
* locales/es_CU (LC_MONETARY): use “,” for mon_decimal_point
and “.” for mon_thousands_sep (to agree with CLDR)
* locales/es_CU (LC_NUMERIC): Likewise.
[BZ #22038]
* locales/so_DJ (LC_TIME): Fix abday, abmon and
make t_fmt in the comment agree with the value of t_fmt.
* locales/so_ET (LC_TIME): Fix abday (From Axa to Axd)
* locales/so_KE (LC_TIME): Fix abday (From Axa to Axd)
* locales/so_SO (LC_TIME): Fix abday (From Axa to Axd)
[BZ #13805]
* locales/ru_RU (LC_MONETARY): Use “,” for mon_decimal_point
(to agree with CLDR).
* locales/ru_RU (LC_NUMERIC): Write mon_decimal_point in ASCII
for readability.
* locales/os_RU (LC_MONETARY): Copy from ru_RU,
makes it agree with CLDR.
Add locale for “Morisyen” which is also called “Mauritian Creole”
and is spoken in Mauritius.
[BZ #21971]
* localedata/SUPPORTED: Add mfe_MU/UTF-8.
* localedata/locales/mfe_MU: New File.
[BZ #21971]
* locale/iso-639.def: add Morisyen.
[BZ #21750]
* unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
* unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
* unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
* unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
[BZ #19852]
[BZ #21750]
* unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
UnicodeData lines so the latter have precedence; remove hack
to group output by EastAsianWidth ranges.
[BZ #14925]
* locales/bn_BD (LC_IDENTIFICATION): Change language name in
“title” and “language” from Bengali to Bangla.
* locales/bn_IN (LC_IDENTIFICATION): Likewise.
The custom stuff which was in LC_CTYPE of the km_KH locale seems
to be a very incomplete subset of what one gets by using
“copy "i18n"”. I cannot find anything special there which is not
in “copy "i18n"”, only lots of stuff which is missing.
[BZ #20008]
* locales/km_KH (LC_CTYPE): Use “copy "i18n"”.
[BZ #20482]
* locales/de_AT (LC_TIME): Use 2 letter abbreviations in abday.
* locales/de_BE (LC_TIME): Use 2 letter abbreviations in abday.
* locales/de_CH (LC_TIME): Use 2 letter abbreviations in abday.
* locales/de_DE (LC_TIME): Use readable ASCII in abday.
* locales/de_IT (LC_TIME): Use readable ASCII in abday.
* locales/de_LU (LC_TIME): Use 2 letter abbreviations in abday.
See also [BZ #20756].
U+202F NARROW NO-BREAK SPACE: a narrow form of a no-break space,
typically the width of a thin space or a mid space.
U+2009 THIN SPACE.
Many languages use small gap as thousands separator.
Thousands separator should not be a plain space, but a narrow space.
And additionally, it is not allowed to wrap line in the middle of the
number.
Locale data were created in a deep age of 8-bit encodings, so most of
them use space (incorrect: it allows wrapping the line in the middle
of the number), or NBSP (better, but typographically incorrect: space
between groups is too wide).
Now UNICODE is widely supported, so we should leave legacy characters
in favor of correct UNICODE character.
UNICODE has a dedicated character for this purpose:
NNBSP
U+202F NARROW NO-BREAK SPACE: a narrow form of a no-break space,
typically the width of a thin space or a mid space
The NNBSP exists since Unicode 3.0.
Use of NNBSP will prevent line wrapping in the midle of number and
improve readability of numbers.
[BZ #20756]
* locales/aa_DJ (LC_MONETARY): Replace space by NNBSP as thousands separator.
* locales/az_AZ (LC_MONETARY): Likewise.
* locales/be_BY (LC_MONETARY): Likewise.
* locales/be_BY@latin (LC_MONETARY): Likewise.
* locales/bg_BG (LC_MONETARY): Likewise.
* locales/bs_BA (LC_MONETARY): Likewise.
* locales/ce_RU (LC_MONETARY): Likewise.
* locales/crh_UA (LC_MONETARY): Likewise.
* locales/cs_CZ (LC_MONETARY): Likewise.
* locales/cs_CZ (LC_NUMERIC): Likewise.
* locales/cv_RU (LC_MONETARY): Likewise.
* locales/de_AT (LC_MONETARY): Likewise.
* locales/eo (LC_MONETARY): Likewise.
* locales/es_CR (LC_MONETARY): Likewise.
* locales/es_CR (LC_NUMERIC): Likewise.
* locales/es_CU (LC_MONETARY): Likewise.
* locales/et_EE (LC_MONETARY): Likewise.
* locales/et_EE (LC_NUMERIC): Likewise.
* locales/fi_FI (LC_MONETARY): Likewise.
* locales/fi_FI (LC_NUMERIC): Likewise.
* locales/fr_CA (LC_MONETARY): Likewise.
* locales/fr_FR (LC_MONETARY): Likewise.
* locales/fr_FR (LC_NUMERIC): Likewise.
* locales/fr_LU (LC_MONETARY): Likewise.
* locales/fr_LU (LC_NUMERIC): Likewise.
* locales/hr_HR (LC_MONETARY): Likewise.
* locales/ht_HT (LC_NUMERIC): Likewise.
* locales/kk_KZ (LC_MONETARY): Likewise.
* locales/kk_KZ (LC_NUMERIC): Likewise.
* locales/ky_KG (LC_MONETARY): Likewise.
* locales/ky_KG (LC_NUMERIC): Likewise.
* locales/lv_LV (LC_MONETARY): Likewise.
* locales/lv_LV (LC_NUMERIC): Likewise.
* locales/mg_MG (LC_MONETARY): Likewise.
* locales/mhr_RU (LC_MONETARY): Likewise.
* locales/mk_MK (LC_MONETARY): Likewise.
* locales/mk_MK (LC_NUMERIC): Likewise.
* locales/mn_MN (LC_MONETARY): Likewise.
* locales/nb_NO (LC_MONETARY): Likewise.
* locales/nb_NO (LC_NUMERIC): Likewise.
* locales/nl_AW (LC_MONETARY): Likewise.
* locales/nl_NL (LC_MONETARY): Likewise.
* locales/nn_NO (LC_MONETARY): Likewise.
* locales/os_RU (LC_MONETARY): Likewise.
* locales/pap_AW (LC_MONETARY): Likewise.
* locales/pap_CW (LC_MONETARY): Likewise.
* locales/ru_RU (LC_MONETARY): Likewise.
* locales/ru_RU (LC_NUMERIC): Likewise.
* locales/ru_UA (LC_MONETARY): Likewise.
* locales/sk_SK (LC_MONETARY): Likewise.
* locales/sk_SK (LC_NUMERIC): Likewise.
* locales/sl_SI (LC_MONETARY): Likewise.
* locales/sl_SI (LC_NUMERIC): Likewise.
* locales/sq_MK (LC_MONETARY): Likewise.
* locales/sv_SE (LC_MONETARY): Likewise.
* locales/sv_SE (LC_NUMERIC): Likewise.
* locales/tg_TJ (LC_MONETARY): Likewise.
* locales/tt_RU (LC_MONETARY): Likewise.
* locales/tt_RU@iqtelif (LC_MONETARY): Likewise.
* locales/uk_UA (LC_MONETARY): Likewise.
* locales/uk_UA (LC_NUMERIC): Likewise.
* locales/unm_US (LC_MONETARY): Likewise.
* locales/unm_US (LC_NUMERIC): Likewise.
* locales/wo_SN (LC_MONETARY): Likewise.
[BZ #17563]
[BZ #16905]
* locales/cmn_TW (LC_COLLATE): Use cns11643_stroke file for sorting.
* locales/cmn_TW (LC_TIME): Improve time and date formats.
* locales/cmn_TW (LC_MESSAGES): Add yesstr and nostr.
* locales/cns11643_stroke: New file, stroke count collation for
traditional Chinese.
These comments are useless and only confusing. The encodings used to
create binary locales from source locales are listed in the
localedata/SUPPORTED file. The source files itself are ASCII or UTF-8
encoded where non-ASCII UTF-8 is currently only used in comments. If
all locale source files are UTF-8 anyway, there is no need to specify
that in a special comment.
New locale is added for the Seychelles which is a member of the African
Union. English is an offical language for the Seychelles.
[BZ #21854]
* locales/en_SC: New file.
* localedata/SUPPORTED : Add en_SC/UTF-8.
For the locales doi_IN, kok_IN, and sat_IN, the words for
“yes” and “no” were apparently in yesexpr and noexpr.
Copy them from there to add yesstr and nostr.
Also make yesexpr and noexpr more readable by using
the POSIX portable character set.
* locales/doi_IN (LC_MESSAGES): Add yesstr and nostr.
* locales/kok_IN (LC_MESSAGES): Add yesstr and nostr.
* locales/sat_IN (LC_MESSAGES): Add yesstr and nostr.
This reverts commit 8f75515080
Revert “Fix yesexpr in en_DK locale”.
* locales/en_DK (LC_MESSAGES): Restore original yesexpr, noexpr,
yesstr, nostr. Convert them to ASCII and add a comment why
we want to have them like this.
And make the expressions more readable by using the POSIX portable character set
instead of Unicode code points.
* locales/agr_PE (LC_MESSAGES): drop .* from yesexpr and noexpr
* locales/az_IR (LC_MESSAGES): Improve yesexpr and noexpr.
* locales/az_IR (LC_ADDRESS): Fix typo in comment and
use the individual iso-639-3 code for South Azerbaijani
"azb" in lang_term.
* locales/az_IR (LC_NAME): Improve readability of name_fmt in source.
After the recent import of month names from CLDRv31 (bug 21217,
commit c853f14) an import of abbreviated month names is also needed
to make sure they match the full forms.
In case of kok_IN CLDR does not provide the abbreviated month names
explicitly but uses full month names in such cases so abmon section
has been copied from mon.
* localedata/locales/as_IN (abmon): Update from CLDR.
* localedata/locales/bn_BD (abmon): Likewise.
* localedata/locales/bn_IN (abmon): Likewise.
* localedata/locales/gu_IN (abmon): Likewise.
* localedata/locales/hi_IN (abmon): Likewise.
* localedata/locales/kn_IN (abmon): Likewise.
* localedata/locales/ml_IN (abmon): Likewise.
* localedata/locales/mr_IN (abmon): Likewise.
* localedata/locales/ne_NP (abmon): Likewise.
* localedata/locales/or_IN (abmon): Likewise.
* localedata/locales/pa_IN (abmon): Likewise.
* localedata/locales/ta_IN (abmon): Likewise.
* localedata/locales/te_IN (abmon): Likewise.
* localedata/locales/kok_IN (abmon): Likewise but copied from mon.
Maithili which is an official language not only in India but in Nepal as well.
https://en.wikipedia.org/wiki/Maithili_language
Reference is taken form mai_IN.
[BZ #21835]
* localedata/locales/mai_NP: New file.
* localedata/SUPPORTED: Add mai_NP/UTF-8.
After the recent update of int_select the comment needed an update, too.
While at this, all comments in LC_TELEPHONE were moved above their
respective values because this looks better. Some minor typos fixed.
[BZ #21783]
* localedata/locales/lg_UG (LC_TELEPHONE): Move all comments
above the values, correct some of them.
The commit to add the Fiji Hindi locale mentioned
Bug 21207 - ce_RU: update weekdays from CLDR
which was wrong, correct is:
Bug 21694 - Current Glibc Locale Does Not Support Tok-Pisin and Fiji Hindi Locale
During Hindi Locale review I found many fields are incorrect
[BZ #21729]
* locales/hi_IN (LC_NAME): Fix name_mr, name_mrs, name_miss, name_ms
Signed-off-by: Akhilesh Kumar <akhilesh.k@samsung.com>
During Locale verification I observed that
yesstr and nostr are missing for Xhosa language locale
for South Africa
[BZ #21724]
* locales/xh_ZA (LC_MESSAGES): add yesstr and nostr
Signed-off-by: Akhilesh Kumar <akhilesh.k@samsung.com>
During Locale verification I observed that
Incorrect Full Weekday names for ks_IN@devanagari
Reference is taken from
http://www.mkraina.com/PDF/3-Self-authored%20Works%20(English)/15.pdf
And kashmiri devanagari travel book and other sources
[BZ #21721]
* locales/ks_IN@devanagari: Full weekday name Fix.
Signed-off-by: Akhilesh Kumar <akhilesh.k@samsung.com>
After the recent import of month names from CLDRv31 (bug 21217,
commit c853f14) more imports are also needed, mostly abbreviated month
names.
This patch also updates May (full month name) in ps_AF which was
skipped in the previous patch.
Incidentally, this import fixes bug 17225 (ar_SY) and partially
bug 19066 (ar_SA).
CLDR currently has a bug in the full month name for October for ar_IQ, see
http://unicode.org/cldr/trac/ticket/10460
* localedata/locales/ar_DZ (abmon): Full import from CLDR, abmon
is no longer abbreviated.
* localedata/locales/ar_IQ (abmon): Likewise.
* localedata/locales/ar_MA (abmon): Likewise.
* localedata/locales/ar_TN (abmon): Likewise.
* localedata/locales/ps_AF (abmon): Likewise.
* localedata/locales/ug_CN (abmon): Likewise.
* localedata/locales/ar_SA (abmon): Likewise, partially
fixes bug 19066.
* localedata/locales/ks_IN (abmon): A copy of mon.
* localedata/locales/ur_IN (abmon): Oct reworded "اكتوبر" to
"اکتوبر" (same change as mon).
* localedata/locales/ur_PK (abmon): Same changes as mon applied.
* localedata/locales/ps_AF (mon): May reworded "می" to "مۍ".
[BZ #17225]
* localedata/locales/ar_SY (abmon): May reworded "نوار" to
"أيار", this closes bug 17225.
* localedata/locales/ar_JO (abmon): Likewise.
* localedata/locales/ar_LB (abmon): Likewise.
[BZ #21711]
During Locale verification I observed that
yesstr and nostr are missing for Pashto [LC_MESSAGES] Locale
For Afghanistan reference google translate and Pashto travel book.
[BZ #21706]
During Locale verification i observed that
yesstr and nostr are missing for Breton [LC_MESSAGES] locale
Signed-off-by: Akhilesh Kumar <akhilesh.k@samsung.com>
After the recent import of month names from CLDR (bug 21217) more
imports are also needed, mostly abbreviated month names.
* localedata/locales/br_FR (abmon): Reworded "Eve " to "Mezh".
* localedata/locales/fy_NL (abmon): Reworded "Maa" (March) to
"Mrt" and "Maa" (May) to "Mai".
* localedata/locales/lg_UG (abmon): Reworded "Jun" to "Juu".
* localedata/locales/ln_CD (abmon): "yan", "fbl", "msi",
and so on.
* localedata/locales/mn_MN (abmon): "1-р сар", "2-р сар",
"3-р сар", and so on.
* localedata/locales/vi_VN (abmon): Reworded "Th01" to "Thg 1",
"Th02" to "Thg 2" and so on.
* localedata/locales/yo_NG (abday): "Àìkú", "Ajé", "Ìsẹ́gun",
and so on, also comment updated to match the new content.
(day): "Ọjọ́ Àìkú", "Ọjọ́ Ajé", "Ọjọ́ Ìsẹ́gun", and so on.
(abmon): "Ṣẹ́rẹ́", "Èrèlè", "Ẹrẹ̀nà", and so on.
(mon): Comment updated to match the actual content.
(d_t_fmt): Changed "%A" to "%a" and "%B" to "%b".
* localedata/locales/zu_ZA (abmon): "Jan", "Feb", "Mas",
and so on, also comment updated to match the new content.
(mon): comment updated to match the actual content.
This updates a bunch of locales based on CLDR v29 data:
az_AZ: changing Azərbaycanca to azərbaycan dili
be_BY: changing беларуская мова to беларуская
bem_ZM: changing iciBemba to Ichibemba
bg_BG: changing български език to български
bo_CN: changing པོད་སྐད་ to བོད་སྐད་
bo_IN: changing པོད་སྐད་ to བོད་སྐད་
br_FR: changing Brezhoneg to brezhoneg
brx_IN: lang_name: setting to बड़ो
ce_RU: changing нохчийн мотт to нохчийн
cs_CZ: changing Čeština to čeština
dz_BT: changing (རྫོང་ཁ to རྫོང་ཁ
el_CY: changing ελληνικά to Ελληνικά
el_GR: changing ελληνικά to Ελληνικά
es_AR: changing Español to español
es_BO: changing Español to español
es_CL: changing Español to español
es_CO: changing Español to español
es_CR: changing Español to español
es_CU: changing Español to español
es_DO: changing Español to español
es_EC: changing Español to español
es_ES: changing Español to español
es_GT: changing Español to español
es_HN: changing Español to español
es_MX: changing Español to español
es_NI: changing Español to español
es_PA: changing Español to español
es_PE: changing Español to español
es_PR: changing Español to español
es_PY: changing Español to español
es_SV: changing Español to español
es_US: changing Español to español
es_UY: changing Español to español
es_VE: changing Español to español
et_EE: changing eesti keel to eesti
eu_ES: changing Euskara to euskara
fr_BE: changing Français to français
fr_CA: changing Français to français
fr_CH: changing Français to français
fr_FR: changing Français to français
fr_LU: changing Français to français
fur_IT: changing Furlan to furlan
fy_NL: changing Frysk to West-Frysk
gl_ES: changing Galego to galego
gv_GB: changing y Ghaelg to Gaelg
he_IL: lang_name: setting to עברית
hsb_DE: changing Hornjoserbšćina to hornjoserbšćina
hy_AM: changing Հայերեն to հայերեն
id_ID: changing Bahasa Indonesia to Bahasa Indonesia
it_CH: changing Italiano to italiano
it_IT: changing Italiano to italiano
kl_GL: changing Kalaallisut to kalaallisut
km_KH: changing ភាសាខ្មែរ to ខ្មែរ
ko_KR: changing 한국말 to 한국어
ks_IN: changing kạ̄šur to کٲشُر
kw_GB: changing Kernowek to kernewek
ky_KG: changing Кыргызча to кыргызча
lg_UG: changing Oluganda to Luganda
lt_LT: changing lietuvių kalba to lietuvių
lv_LV: changing latviešu valoda to latviešu
mk_MK: changing македонск/и јазик to македонски
mn_MN: changing Монгол хэл to монгол
nb_NO: changing Bokmål to norsk bokmål
nn_NO: changing Nynorsk to nynorsk
os_RU: lang_name: setting to ирон
ru_RU: lang_name: setting to русский
ru_UA: lang_name: setting to русский
se_NO: changing Davvisámegiella to davvisámegiella
sk_SK: lang_name: setting to slovenčina
ta_IN: lang_name: setting to தமிழ்
ta_LK: lang_name: setting to தமிழ்
tk_TM: changing Türkmençe to türkmençe
tr_CY: changing Turkish to Türkçe
tr_TR: changing Turkish to Türkçe
ur_IN: lang_name: setting to {اردو}
ur_PK: lang_name: setting to {اردو}
vi_VN: changing Việt ngữ to Tiếng Việt
yo_NG: changing Yorùbá to Èdè Yorùbá
zu_ZA: changing IsiZulu to isiZulu
Most of these are simple case changes, but they match the CLDR db.
A search for a few of the others suggests they're also correct.
[BZ #21217]
* localedata/locales/ln_CD (mon): Months imported from CLDR, all changed:
"Yanwáli" to "sánzá ya yambo", "Febwáli" to "sánzá ya míbalé" and so on.
* locales/mn_MN (mon): reworded "Хулгана сарын" to "Нэгдүгээр сар",
"Үхэр сарын" to "Хоёрдугаар сар", and so on.
* locales/vi_VN (mon): reworded "Tháng một" to "Tháng 1", "Tháng hai"
to "Tháng 2", "Tháng ba" to "Tháng 3", and so on.
* locales/yo_NG (mon): reworded "Jánúárì" to "Oṣù Ṣẹ́rẹ́", "Fẹ́búárì"
to "Oṣù Èrèlè", "Máàṣì" to "Oṣù Ẹrẹ̀nà", and so on.
* locales/zu_ZA (mon): reworded "uMasingana" to "Januwari",
"uNhlolanja" to "Februwari", "uNdasa" to "Mashi", and so on.
[BZ #21217]
* localedata/locales/be_BY (mon, abmon): Reworded "Травень" ("travyen'")
and abbreviated "Тра" ("tra") to "Май" ("may").
* localedata/locales/be_BY@latin (mon, abmon): Likewise, "Travień"
and abbreviated "Tra" reworded to "Maj".
* localedata/locales/br_FR (day, abmon, mon, d_t_fmt): Use the proper
Unicode apostrophe (Cʼhwevrer, mercʼher, Dʼar).
* localedata/locales/es_PE (mon, abmon): Reworded "septiembre" to
"setiembre" and abbreviated "sep" to "set".
* localedata/locales/es_UY: Likewise.
* localedata/locales/fil_PH (mon): Reworded "Septiyembre" to "Setyembre"
and "Nobiyembre" to "Nobyembre".
* localedata/locales/fur_IT (mon, abmon): Reworded "Decembar" to
"Dicembar" and abbreviated "Dec" to "Dic".
* localedata/locales/fy_NL (mon): Reworded "Janaris" to "Jannewaris".
* localedata/locales/ha_NG (mon, abmon): Reworded "Fabrairu" to
"Faburairu" and "Afrilu" to "Afirilu", also abbreviated "Afr" to "Afi".
* localedata/locales/ig_NG (mon, abmon): All months begin with
uppercase. Reworded: "febụrụwarị" to "Febrụwarị", "epreel" to "Eprel",
"ọgọstụ" to "Ọgọọst", "nọvemba" to "Novemba", "nọv" to "Nov".
* localedata/locales/kw_GB (mon): Months imported from CLDR, many
changes: all "Mys" to "mis", "Mys Whevrel" to "mis Hwevrer", "Mys Merth"
to "mis Meurth", "Mys Evan" to "mis Metheven", and so on.
(abmon): "Whe>" to "Hwe", "Mer" to "Meu", "Evn" to "Met".
* localedata/locales/lg_UG (mon): Reworded "Julaai" to "Julaayi".
* localedata/locales/ln_CD (mon): Months imported from CLDR, all changed:
"Yanwáli" to "sánzá ya yambo", "Febwáli" to "sánzá ya míbalé" and so on.
* localedata/locales/mg_MG (mon, abmon): All months now begin with
uppercase.
* localedata/locales/se_NO (mon): Months imported from CLDR, all
suffixes changed from "mánu" to "mánnu".
* localedata/locales/sr_RS@latin (mon): Reworded "juni" to "jun" and
"juli" to "jul".
* localedata/locales/uz_UZ (mon): Reworded "Sentyabr" to "Sentabr"
and "Oktyabr" to "Oktabr".
* localedata/locales/uz_UZ@cyrillic (mon): Most of the names reworded,
no longer end with a soft sign.
* localedata/locales/wae_CH (mon): reworded "Bráchet" to "Bráčet",
"Öigschte" to "Öigšte", "Herbschtmánet" to "Herbštmánet",
"Chrischtmánet" to "Chrištmánet".
* Unicode 10.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 10.0.0, using
generator scripts contributed by Mike FABIAN (Red Hat).
<locale.h> is specified to define locale_t in POSIX.1-2008, and so are
all of the headers that define functions that take locale_t arguments.
Under _GNU_SOURCE, the additional headers that define such functions
have also always defined locale_t. Therefore, there is no need to use
__locale_t in public function prototypes, nor in any internal code.
* ctype/ctype-c99_l.c, ctype/ctype.h, ctype/ctype_l.c
* include/monetary.h, include/stdlib.h, include/time.h
* include/wchar.h, locale/duplocale.c, locale/freelocale.c
* locale/global-locale.c, locale/langinfo.h, locale/locale.h
* locale/localeinfo.h, locale/newlocale.c
* locale/nl_langinfo_l.c, locale/uselocale.c
* localedata/bug-usesetlocale.c, localedata/tst-xlocale2.c
* stdio-common/vfscanf.c, stdlib/monetary.h, stdlib/stdlib.h
* stdlib/strfmon_l.c, stdlib/strtod_l.c, stdlib/strtof_l.c
* stdlib/strtol.c, stdlib/strtol_l.c, stdlib/strtold_l.c
* stdlib/strtoll_l.c, stdlib/strtoul_l.c, stdlib/strtoull_l.c
* string/strcasecmp.c, string/strcoll_l.c, string/string.h
* string/strings.h, string/strncase.c, string/strxfrm_l.c
* sysdeps/ieee754/float128/strtof128_l.c
* sysdeps/ieee754/float128/wcstof128.c
* sysdeps/ieee754/float128/wcstof128_l.c
* sysdeps/ieee754/ldbl-128ibm/strtold_l.c
* sysdeps/ieee754/ldbl-64-128/strtold_l.c
* sysdeps/ieee754/ldbl-opt/nldbl-compat.c
* sysdeps/ieee754/ldbl-opt/nldbl-strfmon_l.c
* sysdeps/ieee754/ldbl-opt/nldbl-strtold_l.c
* sysdeps/ieee754/ldbl-opt/nldbl-wcstold_l.c
* sysdeps/powerpc/powerpc32/power7/strcasecmp.S
* sysdeps/powerpc/powerpc64/power7/strcasecmp.S
* sysdeps/x86_64/strcasecmp_l-nonascii.c
* sysdeps/x86_64/strncase_l-nonascii.c, time/strftime_l.c
* time/strptime_l.c, time/time.h, wcsmbs/mbsrtowcs_l.c
* wcsmbs/wchar.h, wcsmbs/wcscasecmp.c, wcsmbs/wcsncase.c
* wcsmbs/wcstod.c, wcsmbs/wcstod_l.c, wcsmbs/wcstof.c
* wcsmbs/wcstof_l.c, wcsmbs/wcstol_l.c, wcsmbs/wcstold.c
* wcsmbs/wcstold_l.c, wcsmbs/wcstoll_l.c, wcsmbs/wcstoul_l.c
* wcsmbs/wcstoull_l.c, wctype/iswctype_l.c
* wctype/towctrans_l.c, wctype/wcfuncs_l.c
* wctype/wctrans_l.c, wctype/wctype.h, wctype/wctype_l.c:
Change all uses of __locale_t to locale_t.
[BZ #21207]
* locales/ce_RU (day): Updated (imported) from CLDR. Uppercase letters
left unchanged.
* locales/ce_RU (abday): Minor updates to match (day): Latin uppercase
"I" replaced with Cyrillic "Ӏ" ("Palochka", Unicode: U04C0). Trailing
spaces removed.
Despite the fact that el_GR is ISO-8859-7:2003 which contains the euro
symobl, it is not possible to know this apriori to selecting the el_GR
locale. Therefore you don't know if el_GR can possibly have the 2003
ammendments which include the euro symbol. This is resolved by creating
an el_GR@euro locale similar to all the other @euro locales for non-UTF8
charsets.
Fix the incorrect sorting order of a digraph and its geminated variant,
regression introduced by a faulty fix to bug 13547 in commit
b008d4c856.
Fix two inconsistencies in sorting unusual capitalization of digraphs
(bug #18587).
Enable DIACRIT_FORWARD to work around bug #17750.
Sort foreign accents after the Hungarian ones.
Add extensive unittests containing all the examples from The Rules of
Hungarian Orthography and many more, including explanatory comments.
* Unicode 9.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 9.0.0, using
generator scripts contributed by Mike FABIAN (Red Hat).
Both regexes end with a "*." which means the previous match can be
omitted, and then the . allows them to match any input at all.
This means tools like coreutils' `rm -i` will always delete things
when prompted because the yesexpr regex matches all inputs (even
the negative ones).
Microsoft long ago added a mapping for 0x80 to the Euro sign to their
CP936. While GBK 1.0 doesn't include this mapping, it is compatible,
and Microsoft and glibc alias the two codepages. We could split them
apart so GBK wouldn't include the mapping, but that seems like a lot
of work for little gain.
The standard currently in effect (LST ISO 8601:1997) mandates the use
of hyphens (as opposed to full stops, currently) in date formats. It
also matches current CLDR data (v29), Wikipedia's & Wikia's settings,
and Microsoft's Lithuanian Style Guide.
According to "Requirements of information technology in Estonian
language and cultural environment" the monetary symbol should be
written after the amount number:
https://www.evs.ee/products/evs-8-2008
The date format in en_CA/LC_TIME specifies the date format as "%d/%m/%y".
However, it should be "%Y-%m-%d". This is the standard date format in
Canada as specified by the Canadian Standards Association in CSA Z234.5:1989,
which adopts the ISO 8601 standard.
Here's the web page from the National Research Council of Canada
citing ISO 8601 as the standard date/time format in Canada:
http://www.nrc-cnrc.gc.ca/eng/services/time/faq/#Q8
International Standard ISO 8601 specifies numeric representations
of date and time. The recommended full format is of the form
2001-12-31 23:59:28.73 UTC. The intent of this standard is to avoid
confusion in international communications which can arise with the
many different national notations. This format has the advantage
that it permits dates to be readily sorted in chronological order
by computer systems.
Windows 8+ and OS X also switched to this format.
Fix the postal_fmt and country_name entries to continue on the following
line without indentation.
localedata/Changelog:
* locales/de_LI (postal_fmt): Fix indentation.
(country_name): Likewise.
The Principality of Liechtenstein currently does not have a corresponding
locale. Given the links with Switzerland, the best is to base the locale
on the de_CH one (German is the official language) and only change the
country related categories: LC_ADDRESS. and LC_TELEPHONE.
localedata/Changelog:
* locales/de_LI: New locale.
* SUPPORTED: Add de_LI.
Some of the newer symbols we're using are missing translit entries which
causes troubles when generating the locales with older encodings.
tr_TR: ₺ -> "TL"
uz_UZ: ʻ -> "'"
common:
֏ -> "AMD"
₪ -> "ILS"
₱ -> "PHP"
₸ -> "KZT"
₾ -> "GEL"
The current test code doesn't check the return value of malloc.
This should rarely (if ever) cause a problem, but rather than add
some return value checks, just statically allocate the buffer on
the stack. This will never fail (or if it does, we've got much
bigger problems that don't matter to the test).
The yes/no strings should be based on the dictionary words. That means
they are capitalized based on the dictionary rather than position in the
sentence (e.g. the first word).
bo_CN: nostr: changing མེན to མིན།
bo_CN: yesstr: changing ཨིན to ཡིན།
dz_BT: nostr: changing མེན to མེན་
dz_BT: yesstr: changing ཨིན to ཨིན་
en_CA: yesstr: changing Yes to yes
en_CA: nostr: changing No to no
en_US: yesstr: changing Yes to yes
en_US: nostr: changing No to no
es_ES: nostr: changing No to no
es_ES: yesstr: changing Si to sí
fi_FI: nostr: changing Ei to ei
fi_FI: yesstr: changing Kyllä to kyllä
ig_NG: yesstr: changing Ee to Eye
ko_KR: nostr: changing 아니오 to 아니요
ky_KG: nostr: changing Жок to жок
ky_KG: yesstr: changing Ооба to ооба
ms_MY: nostr: changing Tidak to tidak
ms_MY: yesstr: changing Ya to ya
te_IN: nostr: changing కాదు to వద్దు
te_IN: yesstr: changing అవను to అవును
ur_PK: nostr: changing نهيں to نہیں
ur_PK: yesstr: changing بلكل to ہاں
uz_UZ: nostr: changing Yo'q to yo‘q
uz_UZ: yesstr: changing Ha to ha
uz_UZ@cyrillic: nostr: changing Йўқ to йўқ
uz_UZ@cyrillic: yesstr: changing Ҳа to ҳа
wae_CH: nostr: changing Nei to nei
wae_CH: yesstr: changing Ja to ja
yo_NG: nostr: changing Bẹ́ẹ̀ kọ́ to Bẹ́ẹ̀kọ́
yo_NG: yesstr: changing Bẹ́ẹ̀ ni to Bẹ́ẹ̀ni
Some of the translations were just wrong.
el_GR: nostr: changing no to όχι
el_GR: yesstr: changing yes to ναι
km_KH: nostr: changing no:NO:n:N to ទេ៖ n
km_KH: yesstr: changing yes:YES:y:Y to បាទ/ចាស៖ y
ug_CN: nostr: changing No to ياق
ug_CN: yesstr: changing Yes to ھەئە
Add missing translations for a number of locales:
af_ZA: nostr: setting to nee
af_ZA: yesstr: setting to ja
am_ET: nostr: setting to አይ
am_ET: yesstr: setting to አዎን
ast_ES: nostr: setting to non
ast_ES: yesstr: setting to sí
be_BY: nostr: setting to не
be_BY: yesstr: setting to так
bem_ZM: nostr: setting to Awe
bem_ZM: yesstr: setting to Ee
bg_BG: nostr: setting to не
bg_BG: yesstr: setting to да
brx_IN: nostr: setting to नहीं
brx_IN: yesstr: setting to हाँ
bs_BA: nostr: setting to ne
bs_BA: yesstr: setting to da
ca_ES: nostr: setting to no
ca_ES: yesstr: setting to sí
da_DK: nostr: setting to nej
da_DK: yesstr: setting to ja
de_DE: nostr: setting to nein
de_DE: yesstr: setting to ja
en_DK: nostr: setting to yes
en_DK: yesstr: setting to no
et_EE: nostr: setting to ei
et_EE: yesstr: setting to jah
eu_ES: nostr: setting to ez
eu_ES: yesstr: setting to bai
fa_IR: nostr: setting to نه
fa_IR: yesstr: setting to بله
ff_SN: nostr: setting to Alaa
ff_SN: yesstr: setting to Eey
fo_FO: nostr: setting to nei
fo_FO: yesstr: setting to já
fr_BE: nostr: setting to non
fr_BE: yesstr: setting to oui
fr_CH: nostr: setting to non
fr_CH: yesstr: setting to oui
fr_FR: nostr: setting to non
fr_FR: yesstr: setting to oui
fr_LU: nostr: setting to non
fr_LU: yesstr: setting to oui
fur_IT: nostr: setting to no
fur_IT: yesstr: setting to sì
fy_DE: nostr: setting to nee
fy_DE: yesstr: setting to ja
ga_IE: nostr: setting to níl
ga_IE: yesstr: setting to tá
gd_GB: nostr: setting to chan eil
gd_GB: yesstr: setting to tha
gl_ES: nostr: setting to non
gl_ES: yesstr: setting to si
gu_IN: nostr: setting to નહીં
gu_IN: yesstr: setting to હા
he_IL: nostr: setting to לא
he_IL: yesstr: setting to כן
hi_IN: nostr: setting to नहीं
hi_IN: yesstr: setting to हाँ
hr_HR: nostr: setting to ne
hr_HR: yesstr: setting to da
hu_HU: nostr: setting to nem
hu_HU: yesstr: setting to igen
id_ID: nostr: setting to tidak
id_ID: yesstr: setting to ya
is_IS: nostr: setting to nei
is_IS: yesstr: setting to já
it_CH: nostr: setting to no
it_CH: yesstr: setting to sì
it_IT: nostr: setting to no
it_IT: yesstr: setting to sì
ka_GE: nostr: setting to არა
ka_GE: yesstr: setting to კი
kk_KZ: nostr: setting to жоқ
kk_KZ: yesstr: setting to иә
kl_GL: nostr: setting to naagga
kl_GL: yesstr: setting to aap
kn_IN: nostr: setting to ಇಲ್ಲ
kn_IN: yesstr: setting to ಹೌದು
ko_KR: yesstr: setting to 예
lb_LU: nostr: setting to nee
lb_LU: yesstr: setting to jo
lg_UG: nostr: setting to Nedda
lg_UG: yesstr: setting to Ye
lt_LT: nostr: setting to ne
lt_LT: yesstr: setting to taip
lv_LV: nostr: setting to nē
lv_LV: yesstr: setting to jā
mg_MG: nostr: setting to Tsia
mg_MG: yesstr: setting to Eny
mn_MN: nostr: setting to үгүй
mn_MN: yesstr: setting to тийм
mr_IN: nostr: setting to नाहीःना
mr_IN: yesstr: setting to होयःहो
mt_MT: nostr: setting to le
mt_MT: yesstr: setting to iva
nb_NO: nostr: setting to nei
nb_NO: yesstr: setting to ja
ne_NP: nostr: setting to होइन
ne_NP: yesstr: setting to हो
nl_NL: nostr: setting to nee
nl_NL: yesstr: setting to ja
nn_NO: nostr: setting to nei
nn_NO: yesstr: setting to ja
or_IN: nostr: setting to ନା
or_IN: yesstr: setting to ହଁ
os_RU: nostr: setting to нӕйы
os_RU: yesstr: setting to уойы
pa_IN: nostr: setting to ਨਹੀਂ
pa_IN: yesstr: setting to ਹਾਂ
pl_PL: nostr: setting to nie
pl_PL: yesstr: setting to tak
pt_BR: nostr: setting to não
pt_BR: yesstr: setting to sim
pt_PT: nostr: setting to não
pt_PT: yesstr: setting to sim
ro_RO: nostr: setting to nu
ro_RO: yesstr: setting to da
ru_RU: nostr: setting to нет
ru_RU: yesstr: setting to да
ru_UA: nostr: setting to нет
ru_UA: yesstr: setting to да
se_NO: nostr: setting to ii
se_NO: yesstr: setting to jo
sl_SI: nostr: setting to ne
sl_SI: yesstr: setting to da
so_DJ: nostr: setting to maya
so_DJ: yesstr: setting to haa
so_SO: nostr: setting to maya
so_SO: yesstr: setting to haa
sq_AL: nostr: setting to jo
sq_AL: yesstr: setting to po
sr_RS@latin: nostr: setting to ne
sr_RS@latin: yesstr: setting to da
sr_RS: nostr: setting to не
sr_RS: yesstr: setting to да
sv_SE: nostr: setting to nej
sv_SE: yesstr: setting to ja
sw_KE: nostr: setting to Hapana
sw_KE: yesstr: setting to Ndiyo
yue_HK: nostr: setting to 唔係
yue_HK: yesstr: setting to 係
zu_ZA: nostr: setting to cha
zu_ZA: yesstr: setting to yebo
The vast majority of languages include yY/nN in their yes/no regexes.
Standardize the few that were missing them.
ms_MY: noexpr: add nN
nan_TW@latin: yesexpr: add yY
nan_TW@latin: noexpr: add nN
se_NO: noexpr: add nN
This also highlighted a few that were incorrectly using yY/nN because
they clashed with their localized messages:
uz_UZ: yesexpr: change ^[+1YyHh] to ^[+1ҲҳHh]
uz_UZ: noexpr: change ^[-0JjNn] to ^[-0ЙйNnYyJj]
uz_UZ@cyrillic: yesexpr: change ^[+1ҲҳYy] to ^[+1ҲҳHh]
uz_UZ@cyrillic: noexpr: change ^[-0ЙйNn] to [-0ЙйNnYyJj]
yo_NG: move nN (short for Bẹ́ẹ̀ni) from noexpr to yesexpr
A handful of regexes were allowing +1 for yesexpr and -0 for noexpr,
and it's the i18n definition. Standardize all locales by allowing
these language-independent values in them.
Example change for en_US goes from ^[yY] to ^[+1yY], and from ^[nN]
to ^[-0nN].