Unicode 13.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 13.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Total added characters in newly generated CHARMAP: 5930
Total added characters in newly generated WIDTH: 5536
Confirmed by CLDR and a native speaker: "abril" is more often used even
if "abrial" is also correct. Both nominative (alt_mon) and genitive (mon)
cases are updated.
It is not specified what should be the content of d_t_fmt and date_fmt
but in the built-in C locale those fields have only one difference:
date_fmt contains "%Z" (the current time zone) while d_t_fmt does not.
For most of the locales this commit does the following operation:
copy d_t_fmt to date_fmt, and then remove "%Z" from d_t_fmt.
If "%Z" was originally missing from d_t_fmt add it to date_fmt.
It also corrects comments where necessary.
Exceptions:
* In bo_CN, dz_BT, and km_KH "%Z" has not been added to date_fmt because
it was too difficult. In these locales date_fmt has been set to the
copy of d_t_fmt.
* In en_DK "%Z" has not been removed from d_t_fmt in order to preserve
the conformance with the standard mentioned in the comment.
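The general rule above can be sketched as follows. This is an illustration
in Python only, not the procedure used for this commit: derive_formats is a
hypothetical helper, and appending " %Z" at the end of date_fmt is an
assumption, since the files were edited by hand and the position of the time
zone may differ per locale.

```python
import re

def derive_formats(d_t_fmt):
    """Return (new_d_t_fmt, date_fmt) following the rule described above."""
    if "%Z" in d_t_fmt:
        # date_fmt keeps the time zone; d_t_fmt loses it.
        date_fmt = d_t_fmt
        new_d_t_fmt = re.sub(r"\s*%Z", "", d_t_fmt).strip()
    else:
        # Assumption: the time zone is simply appended at the end.
        date_fmt = d_t_fmt + " %Z"
        new_d_t_fmt = d_t_fmt
    return new_d_t_fmt, date_fmt
```

A literal "%%Z" is not handled by this sketch.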
The command to identify and initially edit the locales that need the
update was:
for i in `grep -lw d_t_fmt *`
do
  if ! grep -qw date_fmt $i ; then
    awk '/d_t_fmt/ { print $0; gsub("d_t_fmt", "date_fmt"); } //{ print $0 }' < $i > $i.next
    mv $i.next $i
  fi
done
and then each file was further edited manually.
Currently d_t_fmt formats time as "plkst. %H un %M". A quick Google
search says that "plkst." means "o’clock" and "un" means "and".
Also this format does not display seconds.
CLDR does not mention anything like that. We have no reason to use
anything different than "%H:%M:%S".
Replacing incorrect abbreviated weekday names "Пнд", "Вто", "Срд"...
with correct ones "Пн", "Вт", "Ср"... makes the LC_TIME sections in
those two locales almost identical. The only remaining difference
was that ab_alt_mon elements in ru_UA were lowercase while in ru_RU
they had the first letter uppercase; a native speaker pointed out the
latter as the better choice. This commit unifies LC_TIME
between ru_RU and ru_UA.
This commit adds previously missing transliterations for several code points
in the Unicode blocks "Miscellaneous Mathematical Symbols-A/B" -
transliterated to their approximate ASCII representations. It also adds a
corresponding iconv transliteration test.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The testroot does not have a gunzip command, so the charmap files
should not be installed gzipped, otherwise they cannot be used (and thus
tested). With this patch, installing with INSTALL_UNCOMPRESSED=yes
installs uncompressed charmaps instead.
Note that we must purge the $(symbolic_link_list) as it contains
references to $(DESTDIR), which we change during the testroot
installation.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Sync these values with CLDR and langtable as much as possible. Add
missing values.
If possible, take the values from CLDR, if CLDR does not have it,
take it from langtable. The values from langtable which are not from
CLDR are from Wikipedia or native speakers.
The first day of the week in China (Mainland) should be Monday according
to the national standard GB/T 7408-2005. References:
* https://www.doc88.com/p-1166696540287.html
* https://unicode-org.atlassian.net/browse/CLDR-11510
[BZ #24682]
* localedata/locales/bo_CN (first_weekday): Add, set to 2 (Monday).
* localedata/locales/ug_CN (first_weekday): Likewise.
* localedata/locales/zh_CN (first_weekday): Likewise.
This commit updates month and weekday names (full and abbreviated)
from CLDR 35.1 with the following exceptions.
It was not clear why the full name of February in aa_DJ and aa_ER was
"Kudo" while the abbreviated version is "Nah" but some additional
sources [1] [2] as well as the content of aa_ER and aa_ER@saaho
suggest it should be "Naharsi Kudo". This commit consequently sets
the translation of February to "Naharsi Kudo" in aa_DJ and aa_ET.
aa_ER@saaho is not supported by CLDR but since the month names were
identical to aa_ER before this commit, the same values have been copied
from aa_ER.
Links:
[1] https://fr.wiktionary.org/wiki/naharsi_kudo
[2] http://www.mcit.gov.et/web/guest/-/localization-standard-for-afaraf
[BZ #21897]
* localedata/locales/aa_DJ (abday): Update from CLDR, all words
begin with an uppercase letter now.
(abmon): Likewise.
(mon): Update from CLDR, reword February from "Kudo" to
"Naharsi Kudo", April from "Agda Baxisso" to "Agda Baxis",
and August from "Liiqen" to "Leqeeni".
* localedata/locales/aa_ER (mon): Update from CLDR, reword
April from "Agda Baxisso" to "Agda Baxis" and August from
"Leqeeni" to "Liiqen".
* localedata/locales/aa_ER@saaho (mon): Likewise.
* localedata/locales/aa_ET (abmon): Update from CLDR, reword
abbreviated February from "Kud" to "Nah".
(mon): Update from CLDR, reword February from "Kudo" to
"Naharsi Kudo" and April from "Agda Baxisso" to "Agda Baxis".
These values were removed by the commit 0a410e76f5.
[BZ #24200]
* localedata/locales/ga_IE (first_weekday): Add, set to 2 (Monday).
* localedata/locales/en_IE (first_weekday): Likewise.
The Unicode sequences in the format <Uxxxx> should be used instead of
non-ASCII characters.
Reported by Piotr Drąg:
https://sourceware.org/bugzilla/show_bug.cgi?id=24652#c8
[BZ #24652]
* localedata/locales/szl_PL (day): Use the correct Unicode
sequences instead of non-ASCII characters.
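As an illustration of the <Uxxxx> notation, the following Python sketch
rewrites non-ASCII characters in a string. The helper to_ucs_notation is
hypothetical and not part of glibc; the 8-digit form for code points above
U+FFFF is an assumption based on how such code points appear elsewhere in
the locale sources.

```python
def to_ucs_notation(text):
    """Replace every non-ASCII character with its <Uxxxx> sequence."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII is kept as-is
        elif cp <= 0xFFFF:
            out.append("<U%04X>" % cp)     # four hex digits
        else:
            out.append("<U%08X>" % cp)     # assumed 8-digit form
    return "".join(out)

print(to_ucs_notation("Dr\u0105g"))  # prints Dr<U0105>g
```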
This commit also provides the correct month names in both nominative
and genitive case for Silesian language, as required by the fix for
the bug 10871.
[BZ #24652]
* localedata/locales/szl_PL (abday): Spelling corrections.
(day): Likewise.
(abmon): Likewise.
(mon): Rename to...
(alt_mon): This, then apply spelling corrections.
(mon): New entry, month names in the genitive case.
According to CLDR 35.1 and the bug report the thousands grouping
separator should be always "." (a single dot) and digits should be
grouped by 3.
[BZ #23831]
* localedata/locales/nl_AW (mon_thousands_sep): Set to ".".
* localedata/locales/nl_NL (mon_thousands_sep): Likewise.
(thousands_sep): Likewise.
(grouping): Set to 3;3.
Follow the same changes as made in the commit 02d8b5ab1c because the
respective entries in nl_NL and nl_AW had been the same before the change
so they should be the same after. CLDR does not provide complete data
for nl_AW, it says it is missing and displays a copy of nl_NL.
[BZ #24614]
* localedata/locales/nl_AW (n_sep_by_space): Set to 2 (a space
between the currency symbol and the minus sign).
(n_sign_posn): Set to 4 (the minus sign after the currency symbol).
According to CLDR 35.1 and the bug report the correct monetary format
for negative amounts should be "EUR -1 234,56" while previously it was
"EUR 1 234,56-".
This patch does not change the thousands (grouping) separator.
[BZ #24614]
* localedata/Makefile (LOCALES): Add nl_NL.UTF-8.
* localedata/locales/nl_NL (n_sep_by_space): Set to 2 (a space
between the currency symbol and the minus sign).
(n_sign_posn): Set to 4 (the minus sign after the currency symbol).
* localedata/tst-strfmon1.c (tests): Add test data for nl_NL.UTF-8.
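The effect of the two fields can be sketched with a simplified model in
Python. This is not glibc's strfmon: format_negative is a hypothetical
helper that handles only a currency symbol preceding the value and only the
two n_sign_posn values needed here. The "old" values 1 and 2 are an
assumption, chosen because they reproduce the previous output.

```python
def format_negative(symbol, amount, n_sep_by_space, n_sign_posn):
    """Format a negative amount whose currency symbol precedes the value."""
    if n_sign_posn == 4:
        # The sign immediately follows the currency symbol.
        if n_sep_by_space == 2:
            return "%s -%s" % (symbol, amount)  # space between symbol and sign
        if n_sep_by_space == 1:
            return "%s- %s" % (symbol, amount)  # space before the value
        return "%s-%s" % (symbol, amount)
    if n_sign_posn == 2:
        # The sign follows the quantity and the currency symbol.
        sep = " " if n_sep_by_space == 1 else ""
        return "%s%s%s-" % (symbol, sep, amount)
    raise ValueError("n_sign_posn value not covered by this sketch")

print(format_negative("EUR", "1 234,56", 2, 4))  # new: EUR -1 234,56
print(format_negative("EUR", "1 234,56", 1, 2))  # old: EUR 1 234,56-
```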
This commit fixes some errors and converts all month names to lowercase.
The content is synchronized with CLDR-35.1 now but trailing dots are
removed from abmon values in order to maintain consistency with the
previous values and with many other locales which do the same.
[BZ #24369]
* localedata/locales/tt_RU (mon): Update from CLDR-35.1, fix errors.
(abmon): Likewise, but remove the trailing dots.
Unicode 12.1.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 12.1.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Some info about the number of characters added or changed:
Total added characters in newly generated CHARMAP: 1
added: <U32FF> /xe3/x8b/xbf SQUARE ERA NAME REIWA
Total added characters in newly generated WIDTH: 1
added: <U32FF> 2 : eaw=W category=So bidi=L name=SQUARE ERA NAME REIWA
graph: Added 1 characters in new ctype which were not in old ctype
graph: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
print: Added 1 characters in new ctype which were not in old ctype
print: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
punct: Added 1 characters in new ctype which were not in old ctype
punct: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
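The byte sequence in the CHARMAP line above is the UTF-8 encoding of the
code point. A small Python sketch (charmap_bytes is a hypothetical helper,
not part of the generator scripts) reproduces it:

```python
def charmap_bytes(code_point):
    """Format the UTF-8 encoding of a code point in the /xNN style."""
    encoded = chr(code_point).encode("utf-8")
    return "".join("/x%02x" % byte for byte in encoded)

print("<U%04X> %s" % (0x32FF, charmap_bytes(0x32FF)))  # <U32FF> /xe3/x8b/xbf
```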
The Japanese era name will be changed on May 1, 2019. The Japanese
government made a preliminary announcement on April 1, 2019.
The glibc ja_JP locale must be updated to include the new era name for
strftime's alternative year format support.
Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
ChangeLog:
[BZ #22964]
* localedata/locales/ja_JP (LC_TIME): Add entry for the new Japanese
era.
* time/tst-strftime2.c (dates): Add 2019-04-30 and 2019-05-01.
(mkreftable): Add rules for the new Japanese era and the new dates.
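The two test dates sit on either side of the era boundary. The following
Python sketch shows the expected era year for each; it is an illustration
only and does not use the locale data. The romanized names and the start
date of the previous era (Heisei, 1989-01-08) are not taken from this
commit, and era_year is a hypothetical helper.

```python
from datetime import date

# (start date, romanized era name); only the two most recent eras.
ERAS = [
    (date(2019, 5, 1), "Reiwa"),
    (date(1989, 1, 8), "Heisei"),
]

def era_year(d):
    """Return (era name, year within the era) for a date."""
    for start, name in ERAS:
        if d >= start:
            # The first (gan-nen) year of an era is year 1.
            return name, d.year - start.year + 1
    raise ValueError("date precedes the eras listed in this sketch")

print(era_year(date(2019, 4, 30)))
print(era_year(date(2019, 5, 1)))
```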
This commit fixes some errors and converts all weekday names to lowercase.
The content is synchronized with CLDR-34 now, but trailing dots are removed
from abday values in order to maintain consistency with the previous values
and with many other locales which do the same.
[BZ #24296]
* localedata/locales/tt_RU (day): Update from CLDR-34, fix errors.
(abday): Likewise, but remove the trailing dots.
The Minguo calendar is the official calendar system of Taiwan and is very
widely used there. This commit adds support for it to glibc.
Some background information: the government website (www.gov.tw) uses it,
and popular public services like Taiwan HSR also use this calendar system.
Link to Wikipedia: https://en.wikipedia.org/wiki/Minguo_calendar
[BZ #24293]
* localedata/locales/zh_TW (era): Add, support Minguo calendar.
* localedata/locales/cmn_TW (era): Likewise.
* localedata/locales/hak_TW (era): Likewise.
* localedata/locales/lzh_TW (era): Likewise.
* localedata/locales/nan_TW (era): Likewise.
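For reference, the Minguo year is counted from 1912 as year 1. A minimal
Python sketch of the conversion (minguo_year is a hypothetical helper, and
the handling of years before 1912 is not modelled):

```python
def minguo_year(gregorian_year):
    """Convert a Gregorian year to a Minguo calendar year (1912 is year 1)."""
    if gregorian_year < 1912:
        raise ValueError("years before 1912 are not covered by this sketch")
    return gregorian_year - 1911

print(minguo_year(2019))  # 108
```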
Unicode 12.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 12.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Some info about the number of characters added or changed:
Total added characters in newly generated CHARMAP: 554
Total added characters in newly generated WIDTH: 106
alpha: Missing 8 characters of old ctype in new ctype
(These are combining marks, apparently they were removed from alpha
on purpose)
alpha: Added 295 characters in new ctype which were not in old ctype
combining: Missing 2 characters of old ctype in new ctype
(U+1CF2 VEDIC SIGN ARDHAVISARGA and U+1CF3 VEDIC SIGN ROTATED ARDHAVISARGA,
these are now "Alphabetic" in Unicode 12.0.0)
combining: Added 37 characters in new ctype which were not in old ctype
combining_level3: Missing 2 characters of old ctype in new ctype
(U+1CF2 VEDIC SIGN ARDHAVISARGA and U+1CF3 VEDIC SIGN ROTATED ARDHAVISARGA,
these are now "Alphabetic" in Unicode 12.0.0)
combining_level3: Added 26 characters in new ctype which were not in old ctype
graph: Added 554 characters in new ctype which were not in old ctype
lower: Added 6 characters in new ctype which were not in old ctype
print: Added 554 characters in new ctype which were not in old ctype
punct: Missing 29 characters of old ctype in new ctype
(These characters have all become "Alphabetic" in Unicode 12.0.0.
Therefore, they are not in "punct" anymore (see: is_punct() in unicode_utils.py))
punct: Added 296 characters in new ctype which were not in old ctype
tolower: Added 7 characters in new ctype which were not in old ctype
totitle: Added 7 characters in new ctype which were not in old ctype
toupper: Added 7 characters in new ctype which were not in old ctype
upper: Added 7 characters in new ctype which were not in old ctype
[BZ #24307]
* localedata/unicode-gen/Makefile (UNICODE_VERSION): Set to 12.0.0.
* localedata/unicode-gen/DerivedCoreProperties.txt: Update to Unicode 12.0.0.
* localedata/unicode-gen/EastAsianWidth.txt: Likewise.
* localedata/unicode-gen/PropList.txt: Likewise.
* localedata/unicode-gen/UnicodeData.txt: Likewise.
* localedata/unicode-gen/ctype_compatibility_test_cases.py: U+108D became
"Alphabetic" in Unicode 12.0.0. Adapt test case.
* localedata/charmaps/UTF-8: Regenerate.
* localedata/locales/i18n_ctype: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/translit_circle: Likewise.
* localedata/locales/translit_cjk_compat: Likewise.
* localedata/locales/translit_combining: Likewise.
* localedata/locales/translit_compat: Likewise.
* localedata/locales/translit_font: Likewise.
* localedata/locales/translit_fraction: Likewise.
The offset in era-string format for Taisho gan-nen (1912) is currently
defined as 2, but it should be 1, so fix it. "Gan-nen" means the 1st
(origin) year; Taisho started on July 30, 1912.
Reported-by: Morimitsu, Junji <junji.morimitsu@hpe.com>
Reviewed-by: Rafal Luzynski <digitalfreak@lingonborough.com>
ChangeLog:
[BZ #24162]
* localedata/locales/ja_JP (LC_TIME): Change the offset for Taisho
gan-nen from 2 to 1. Problem reported by Morimitsu, Junji.
The en_US locale uses a 12h am/pm format in both d_fmt and d_t_fmt, which
is correct, but does not define date_fmt. This causes the default value
to be used, which is in 24h format.
This patch adds the date_fmt entry to the en_US locale with the same
value as d_t_fmt as the latter already includes the timezone.
ChangeLog:
[BZ #24046]
* localedata/locales/en_US (date_fmt): Add, set to
"%a %d %b %Y %r %Z".
It has been discovered that some locales use the 12-hour time formats but
do not use any AM/PM indicator thus making the time ambiguous. This
commit adds "%p" wherever it was missing. In some cases it has been
identified that a locale should use 24-hour time format rather than
12-hour. All time formats come from CLDR but this commit introduces as
few changes as possible (for example, it tries not to change the time zone
display). For the locales which are not supported by CLDR the consistency
with similar locales (which means the same language or the same country)
has been preserved: if the time formats were the same before the change
then they are still the same after the change.
The time format updates can be roughly summarized as follows:
* Most of the locales of Djibouti, Eritrea, and Ethiopia now use
"%l:%M:%S %p".
* Most of the locales of India and some surrounding countries (Bangladesh,
Nepal etc.) now use "%I:%M:%S %p %Z".
* Most of the Arabic locales now use "%Z %I:%M:%S %p".
* Ge'ez language (Eritrea and Ethiopia) now uses "%l:%M:%S፡%p" (note the
consistent use of Ethiopic wordspace character).
* Tamil (India) now uses "%p %I:%M:%S %Z".
* Chinese (Hong Kong) t_fmt now uses "%p %I<U6642>%M<U5206>%S<U79D2> %Z".
* Additionally, the following locales have been switched from 12-hour time
formats to 24-hour, according to CLDR: Arabic (Morocco), Maltese, Somali
(Kenya), and Tamil (Sri Lanka).
* Finally, the Bulgarian, Czech, and Slovak locales correctly used the
24-hour time format, but their t_fmt_ampm field incorrectly contained a
12-hour time format; it is now replaced with an empty string.
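The ambiguity that motivates these changes can be shown with a short Python
sketch. It assumes the default "C" LC_TIME setting of the interpreter, in
which "%p" expands to a non-empty AM/PM string; the dates are arbitrary.

```python
from datetime import datetime

early = datetime(2018, 10, 1, 2, 30, 0)    # 02:30 at night
late = datetime(2018, 10, 1, 14, 30, 0)    # 14:30 in the afternoon

def is_ambiguous(fmt):
    """True if both times are rendered as the same string."""
    return early.strftime(fmt) == late.strftime(fmt)

print(is_ambiguous("%I:%M:%S"))     # 12-hour clock without an indicator
print(is_ambiguous("%I:%M:%S %p"))  # 12-hour clock with AM/PM
print(is_ambiguous("%H:%M:%S"))     # 24-hour clock
```

Only the first format renders both times identically.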
[BZ #10496]
* localedata/locales/aa_DJ (t_fmt): Set to "%l:%M:%S %p".
(t_fmt_ampm): Likewise.
* localedata/locales/aa_ER (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/aa_ER@saaho (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/aa_ET (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/am_ET (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/byn_ER (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/om_ET (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/sid_ET (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/so_DJ (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/so_ET (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/so_SO (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/ti_ER (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/ti_ET (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/tig_ER (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/wal_ET (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/anp_IN (t_fmt): Set to "%I:%M:%S %p %Z".
* localedata/locales/ar_IN (t_fmt): Likewise.
* localedata/locales/bhb_IN (t_fmt): Likewise.
* localedata/locales/bho_IN (t_fmt): Likewise.
* localedata/locales/bi_VU (t_fmt): Likewise.
* localedata/locales/bn_BD (t_fmt): Likewise.
* localedata/locales/bn_IN (t_fmt): Likewise.
* localedata/locales/brx_IN (t_fmt): Likewise.
* localedata/locales/doi_IN (t_fmt): Likewise.
* localedata/locales/en_HK (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/en_IN (t_fmt): Likewise.
* localedata/locales/en_PH (t_fmt): Likewise.
* localedata/locales/gu_IN (t_fmt): Likewise.
* localedata/locales/hi_IN (t_fmt): Likewise.
* localedata/locales/hif_FJ (t_fmt): Likewise.
* localedata/locales/hne_IN (t_fmt): Likewise.
* localedata/locales/kn_IN (t_fmt): Likewise.
* localedata/locales/kok_IN (t_fmt): Likewise.
* localedata/locales/ks_IN (t_fmt): Likewise.
* localedata/locales/ks_IN@devanagari (t_fmt): Likewise.
* localedata/locales/mag_IN (t_fmt): Likewise.
* localedata/locales/mai_IN (t_fmt): Likewise.
* localedata/locales/mjw_IN (t_fmt): Likewise.
* localedata/locales/ml_IN (t_fmt): Likewise.
* localedata/locales/mni_IN (t_fmt): Likewise.
* localedata/locales/mr_IN (t_fmt): Likewise.
* localedata/locales/ms_MY (t_fmt): Likewise.
* localedata/locales/pa_IN (t_fmt): Likewise.
* localedata/locales/raj_IN (t_fmt): Likewise.
* localedata/locales/sa_IN (t_fmt): Likewise.
* localedata/locales/sat_IN (t_fmt): Likewise.
* localedata/locales/sd_IN (t_fmt): Likewise.
* localedata/locales/sd_IN@devanagari (t_fmt): Likewise.
* localedata/locales/tcy_IN (t_fmt): Likewise.
* localedata/locales/the_NP (t_fmt): Likewise.
* localedata/locales/to_TO (t_fmt): Likewise.
* localedata/locales/ur_IN (t_fmt): Likewise.
* localedata/locales/hif_FJ (d_t_fmt): Set to
"%A %d %b %Y %I:%M:%S %p".
(date_fmt): Add, set to "%A %d %b %Y %I:%M:%S %p %Z".
* localedata/locales/ar_AE (t_fmt): Set to "%Z %I:%M:%S %p".
* localedata/locales/ar_BH (t_fmt): Likewise.
* localedata/locales/ar_DZ (t_fmt): Likewise.
* localedata/locales/ar_EG (t_fmt): Likewise.
* localedata/locales/ar_IQ (t_fmt): Likewise.
* localedata/locales/ar_JO (t_fmt): Likewise.
* localedata/locales/ar_KW (t_fmt): Likewise.
* localedata/locales/ar_LB (t_fmt): Likewise.
* localedata/locales/ar_LY (t_fmt): Likewise.
* localedata/locales/ar_OM (t_fmt): Likewise.
* localedata/locales/ar_QA (t_fmt): Likewise.
* localedata/locales/ar_SD (t_fmt): Likewise.
* localedata/locales/ar_SS (t_fmt): Likewise.
* localedata/locales/ar_SY (t_fmt): Likewise.
* localedata/locales/ar_TN (t_fmt): Likewise.
* localedata/locales/ar_YE (t_fmt): Likewise.
* localedata/locales/gez_ER (t_fmt): Set to "%l:%M:%S<U1361>%p".
(t_fmt_ampm): Likewise.
* localedata/locales/gez_ET (t_fmt): Likewise.
(t_fmt_ampm): Likewise.
* localedata/locales/ta_IN (t_fmt): Set to "%p %I:%M:%S %Z".
(t_fmt_ampm): Likewise.
(d_t_fmt): Set to "%A %d %B %Y %p %I:%M:%S %Z".
* localedata/locales/zh_HK (t_fmt):
Set to "%p %I<U6642>%M<U5206>%S<U79D2> %Z".
* localedata/locales/ar_MA (t_fmt_ampm): Set to "" (empty string)
because this locale does not use the 12-hour clock.
(t_fmt): Set to "%Z %H:%M:%S".
(d_t_fmt): Set to "%d %b, %Y %Z %H:%M:%S".
* localedata/locales/mt_MT (t_fmt_ampm): Set to "" (empty string)
because this locale does not use the 12-hour clock.
(t_fmt): Set to "%H:%M:%S %Z".
(d_t_fmt): Set to "%A, %d ta %b, %Y %H:%M:%S %Z".
* localedata/locales/so_KE (t_fmt_ampm): Set to "" (empty string)
because this locale does not use the 12-hour clock.
(t_fmt): Set to "%T".
(d_t_fmt): Set to "%A, %B %e, %Y %X %Z".
(date_fmt): Set to "%A, %B %e, %X %Z %Y".
* localedata/locales/ta_LK (t_fmt_ampm): Set to "" (empty string)
because this locale does not use the 12-hour clock.
(t_fmt): Set to "%H:%M:%S %Z".
(d_t_fmt): Set to "%A %d %B %Y %H:%M:%S %Z".
* localedata/locales/bg_BG (t_fmt_ampm): Set to "" (empty string)
because this locale does not use the 12-hour clock.
* localedata/locales/cs_CZ (t_fmt_ampm): Likewise.
* localedata/locales/sk_SK (t_fmt_ampm): Likewise.
Albanian locale uses the 12-hour clock but some time formats did not
use any AM/PM indicator making the time ambiguous. This commit adds
"%p" wherever it was missing.
It also sets the correct date format because the old "%Y-%b-%d" produced
rather weird results like "2018-Sht-28".
All time formats come from CLDR but as few changes have been introduced
by this commit as possible. Some articles from MSDN and other available
online sources have also been taken into account.
[BZ #10496]
[BZ #23724]
* localedata/locales/sq_AL (t_fmt): Set to "%I:%M:%S.%p %Z".
(t_fmt_ampm): Likewise.
(d_t_fmt): Set to "%a %-d %b %Y %I:%M:%S.%p".
(date_fmt): Add, set to "%a %-d %b %Y %I:%M:%S.%p %Z".
(d_fmt): Set to "%-d.%-m.%y".
Downstream distributions need consistent sets of hardlinks in
order for rpm to operate effectively. This means that even if
locales are built with a high level of parallelism, the
resulting files need to have consistent hardlink counts. The only
way to achieve this is with a post-install hardlink pass using a
program like 'hardlink' (shipped in Fedora).
If the downstream distro wants to post-process the hardlinks then
the time spent in localedef looking up sibling directories and
processing hardlinks is wasted effort.
To optimize the build and install pass we add a --no-hard-links
option to localedef to avoid doing the hardlink optimization for
size.
Tested on x86_64 with 'make localedata/install-locale-files'
before and after. Without the patch we have files with 100+
hardlink counts. After the patch and running with --no-hard-links
all link counts are 1. This patch also alters the convenience
target 'make localedata/install-locale-files' to use the new
option.
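One way to check the link counts after installation is sketched below in
Python. This is not part of the patch or of the test procedure described
above; max_link_count is a hypothetical helper and the directory to scan is
left to the caller.

```python
import os

def max_link_count(root):
    """Return the highest hardlink count of any regular file under root."""
    highest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            highest = max(highest, info.st_nlink)
    return highest
```

With --no-hard-links the result for the installed locale tree is expected
to be 1.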
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Month names as provided by Oqaasileriffik, the official Greenlandic
language regulator. They have recently reached the consensus regarding
the orthography of the month names.
Date formats updated to match the correct Greenlandic order which is MDY.
[BZ #23740]
* localedata/locales/kl_GL (mon): Update, the relative case.
(alt_mon): Add, fill with month names in the nominative case.
(d_t_fmt): Set to "%a %b %d %Y %T %Z".
(d_fmt): Set to "%b %d %Y".
Although CLDR says otherwise, it is confirmed by Oqaasileriffik, the
official Greenlandic language regulator, that this change is correct.
[BZ #20209]
* localedata/locales/kl_GL (abday): Fix spelling of Sun (Sunday),
should be "sap" rather than "sab".
(day): Fix spelling of Sunday, should be "sapaat" rather than
"sabaat".
Synchronize some values with CLDR and apply some suggestions from Bugzilla.
[BZ #10425]
* localedata/locales/it_IT (d_t_fmt): Use "%a %-d %b %Y, %T".
(date_fmt): Use "%a %-d %b %Y, %T, %Z".
* localedata/locales/it_CH (d_t_fmt): Use "%a %-d %b %Y, %T"
which is the same as in it_IT.
(d_fmt): Use "%d.%m.%Y" which is the same as in de_CH.
(date_fmt): Use "%a %-d %b %Y, %T, %Z" which is the same as in it_IT.
CLDR and many other sources say that it_IT (Italian) should use a dot
(".") as a thousands separator and a comma (",") as a decimal separator.
For it_CH and de_CH CLDR says that they should use the Right Single
Quotation Mark ("’") as a thousands separator and a dot (".") as a
decimal separator. Consequently, the same rules are copied to all other
locales in Switzerland.
These rules apply to both LC_MONETARY and LC_NUMERIC.
[BZ #10797]
* localedata/locales/de_CH (mon_thousands_sep): Use "<U2019>" (Right
Single Quotation Mark).
(thousands_sep): Likewise.
* localedata/locales/it_CH (LC_NUMERIC): Use “copy "de_CH"”.
* localedata/locales/it_IT (thousands_sep): Use ".".
(grouping): Use "3;3".
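The resulting number formats can be illustrated with a short Python sketch.
It does not read the locale data; format_number is a hypothetical helper
that groups digits by 3 and takes the separators as arguments.

```python
def format_number(value, thousands_sep, decimal_point, decimals=2):
    """Format a number with digits grouped by 3 and the given separators."""
    text = "%.*f" % (decimals, abs(value))
    integer, _, fraction = text.partition(".")
    groups = []
    while len(integer) > 3:
        groups.insert(0, integer[-3:])   # peel off the last three digits
        integer = integer[:-3]
    groups.insert(0, integer)
    result = thousands_sep.join(groups)
    if fraction:
        result += decimal_point + fraction
    return ("-" if value < 0 else "") + result

print(format_number(1234567.89, ".", ","))       # it_IT style
print(format_number(1234567.89, "\u2019", "."))  # de_CH / it_CH style
```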
This commit also fixes d_fmt in bn_BD which is identical to bn_IN,
in ne_NP which is identical to ne_IN (not supported by Glibc but supported
by CLDR), and in ta_LK which is identical to ta_IN.
For those locales which are supported by CLDR data is imported from
CLDR v33. For others it is copied from those locales which were identical
before this commit.
[BZ #17426]
* localedata/locales/anp_IN (d_fmt): Use "%-d//%-m//%y".
* localedata/locales/ar_IN (d_fmt): Likewise.
* localedata/locales/bhb_IN (d_fmt): Likewise.
* localedata/locales/bho_IN (d_fmt): Likewise.
* localedata/locales/bn_BD (d_fmt): Likewise.
* localedata/locales/bn_IN (d_fmt): Likewise.
* localedata/locales/doi_IN (d_fmt): Likewise.
* localedata/locales/gu_IN (d_fmt): Likewise.
* localedata/locales/hi_IN (d_fmt): Likewise.
* localedata/locales/hne_IN (d_fmt): Likewise.
* localedata/locales/kn_IN (d_fmt): Likewise.
* localedata/locales/mag_IN (d_fmt): Likewise.
* localedata/locales/mai_IN (d_fmt): Likewise.
* localedata/locales/mjw_IN (d_fmt): Likewise.
* localedata/locales/ml_IN (d_fmt): Likewise.
* localedata/locales/mni_IN (d_fmt): Likewise.
* localedata/locales/mr_IN (d_fmt): Likewise.
* localedata/locales/pa_IN (d_fmt): Likewise.
* localedata/locales/raj_IN (d_fmt): Likewise.
* localedata/locales/sat_IN (d_fmt): Likewise.
* localedata/locales/sd_IN (d_fmt): Likewise.
* localedata/locales/sd_IN@devanagari (d_fmt): Likewise.
* localedata/locales/ta_IN (d_fmt): Likewise.
* localedata/locales/ta_LK (d_fmt): Likewise.
* localedata/locales/tcy_IN (d_fmt): Likewise.
* localedata/locales/ur_IN (d_fmt): Likewise.
* localedata/locales/brx_IN (d_fmt): Use "%-m//%-d//%y".
* localedata/locales/ks_IN (d_fmt): Likewise.
* localedata/locales/ks_IN@devanagari (d_fmt): Likewise.
* localedata/locales/kok_IN (d_fmt): Use "%-d-%-m-%y".
* localedata/locales/ne_NP (d_fmt): Use "%y//%-m//%-d".
* localedata/locales/sa_IN (d_fmt): Use "%-d-%m-%y".
* localedata/locales/te_IN (d_fmt): Use "%d-%m-%y".
The convenience install target 'install-locale-files' is created
to allow distributions to install all of the SUPPORTED locales as
files instead of into the locale-archive.
You invoke the new convenience target like this:
make localedata/install-locale-files DESTDIR=<prefix>
In commit 9479b6d5e0 we updated all of
the collation data to harmonize with the new version of ISO 14651
which is derived from Unicode 9.0.0. This collation update brought
with it some changes to locales which some users found undesirable.
In particular, it altered the meaning of the
locale-dependent-range regular expression, namely [a-z] and [A-Z], and
for en_US it caused uppercase letters to be matched by [a-z] for the
first time. The matching of uppercase letters by [a-z] is something
which is already known to users of other locales which have this
property, but this change could cause significant problems to en_US
and other similar locales that had never had this change before.
Whether this behaviour is desirable or not is contentious and GNU Awk
has this to say on the topic:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
The rationale of the POSIX standard, under "RE Bracket Expression",
adds:
http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html
"The current standard leaves unspecified the behavior of a range
expression outside the POSIX locale. ... As noted above, efforts were
made to resolve the differences, but no solution has been found that
would be specific enough to allow for portable software while not
invalidating existing implementations."
In glibc we implement the requirement of ISO POSIX-2:1993 and use
collation element order (CEO) to construct the range expression, the
API internally is __collseq_table_lookup(). The fact that we use CEO
and also have 4-level weights on each collation rule means that we can
in practice reorder the collation rules in iso14651_t1_common (the new
data) to provide consistent range expression resolution *and* the
weights should maintain the expected total order. Therefore this
patch does three things:
* Reorder the collation rules for the LATIN script in
iso14651_t1_common to deinterlace uppercase and lowercase letters in
the collation element orders.
* Add new test data en_US.UTF-8.in for sort-test.sh which exercises
strcoll* and strxfrm* and ensures the ISO 14651 collation remains.
* Add back tests to tst-fnmatch.input and tst-regexloc.c which
exercise that [a-z] does not match A or Z.
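Why the order of collation elements matters for range expressions can be
shown with a toy model in Python. It is not the glibc implementation: the
two orders below are assumptions (strict "a A b B ..." interleaving before
the fix, all lowercase followed by all uppercase after it), and in_range is
a hypothetical helper that treats a range as everything between its two
endpoints in the given order.

```python
import string

lower, upper = string.ascii_lowercase, string.ascii_uppercase

# Assumed interleaved order: a A b B ... z Z
INTERLEAVED = [c for pair in zip(lower, upper) for c in pair]
# Assumed deinterlaced order: a b ... z A B ... Z
DEINTERLACED = list(lower) + list(upper)

def in_range(order, first, last, ch):
    """True if ch lies between first and last in the given element order."""
    pos = order.index
    return pos(first) <= pos(ch) <= pos(last)

print(in_range(INTERLEAVED, "a", "z", "A"))   # [a-z] matches "A"
print(in_range(DEINTERLACED, "a", "z", "A"))  # [a-z] no longer matches "A"
```

In this model the interleaved order also lets [A-Z] match "z", which
corresponds to the second fnmatch case that regresses without the change.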
The reordering of the ISO 14651 data is done in an entirely mechanical
fashion using the following program attached to the bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=23393#c28
It is up for discussion if the iso14651_t1_common data should be
refined further to have 3 very tight collation element ranges that
include only a-z, A-Z, and 0-9, which would implement the solution
sought after in:
https://sourceware.org/bugzilla/show_bug.cgi?id=23393#c12
and implemented here:
https://www.sourceware.org/ml/libc-alpha/2018-07/msg00854.html
No regressions on x86_64.
Verified that removal of the iso14651_t1_common change causes tst-fnmatch
to regress with:
422: fnmatch ("[a-z]", "A", 0) = 0 (FAIL, expected FNM_NOMATCH) ***
...
425: fnmatch ("[A-Z]", "z", 0) = 0 (FAIL, expected FNM_NOMATCH) ***
Multiple updates for Occitan language including alternative month names,
update abday and abmon, fix typos in day, fix d_fmt, correct LC_NAME,
and use “copy "ca_ES"” as LC_COLLATE.
[BZ #23140]
* localedata/locales/oc_FR (mon): Rename to...
(alt_mon): This, then update October (typo fix).
(mon): New content (genitive case, month names preceded by
"de" or "d’").
[BZ #23422]
* localedata/locales/oc_FR (abday): Update all items.
(day): Update Wednesday and Saturday (typo fixes).
(abmon): Update all items, except May.
(d_fmt): Update "%d.%m.%Y" -> "%d/%m/%Y".
(LC_IDENTIFICATION): Bump the revision number and date.
Keep the "category" entries in alphabetic order.
(LC_ADDRESS): Remove no longer needed comment.
(LC_COLLATE): Use “copy "ca_ES"”.
(LC_NAME): Set the correct values of "name_fmt", "name_mr", and
"name_mrs".
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Fix a syntax error in the collation rules of the Lower Sorbian language.
A collation test is added in order to catch bugs like this early.
Reported-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
[BZ #23208]
* localedata/Makefile (test-input): Add dsb_DE.UTF-8.
(LOCALES): Likewise.
* localedata/dsb_DE.UTF-8.in: New file.
* localedata/locales/dsb_DE (LC_COLLATE): Fix syntax error.
Some files still referred to the old Unicode version 10.0.0.
* localedata/charmaps/UTF-8: Use correct Unicode version 11.0.0 in comment.
* localedata/locales/i18n_ctype: Use correct Unicode version in comments
and headers.
* localedata/unicode-gen/utf8_gen.py: Add option to specify Unicode version
* localedata/unicode-gen/Makefile: Use option to specify Unicode version
for utf8_gen.py
There is a glibc optimization which allows for locale categories
to be removed during static compilation. There have been various
bugs for this support over the years, with bug 16915 being the
most recent. The solution there was to emit a reference to all the
categories to avoid any being removed. This fix, although it's in
the generic __nl_langinfo_l function, doesn't appear to be enough
to fix the case for a statically linked program that uses newlocale
and nl_langinfo_l. This commit doesn't fix the problem, but it does
add an XFAIL'd test case so that a fix can be applied against it
and the XFAIL removed. It's not entirely clear that the problem is
the same as that which was seen in bug 16915.
Unicode 11.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 11.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Some info about the number of characters added:
Total added characters in newly generated CHARMAP: 684
Total added characters in newly generated WIDTH: 119
alpha: Added 380 characters in new ctype which were not in old ctype
combining: Added 56 characters in new ctype which were not in old ctype
combining_level3: Added 37 characters in new ctype which were not in old ctype
graph: Added 684 characters in new ctype which were not in old ctype
lower: Added 82 characters in new ctype which were not in old ctype
print: Added 684 characters in new ctype which were not in old ctype
punct: Added 304 characters in new ctype which were not in old ctype
tolower: Added 79 characters in new ctype which were not in old ctype
totitle: Added 33 characters in new ctype which were not in old ctype
toupper: Added 79 characters in new ctype which were not in old ctype
upper: Added 79 characters in new ctype which were not in old ctype
No characters were removed.
[BZ #23308]
* unicode-gen/Makefile (UNICODE_VERSION): Set to 11.0.0.
* localedata/unicode-gen/DerivedCoreProperties.txt: Update to Unicode 11.0.0.
* localedata/unicode-gen/EastAsianWidth.txt: likewise.
* localedata/unicode-gen/PropList.txt: likewise.
* localedata/unicode-gen/UnicodeData.txt: likewise.
* localedata/charmaps/UTF-8: Regenerate.
* localedata/locales/i18n_ctype: likewise.
* localedata/locales/tr_TR: likewise.
* localedata/locales/translit_circle: likewise.
* localedata/locales/translit_cjk_compat: likewise.
* localedata/locales/translit_combining: likewise.
* localedata/locales/translit_compat: likewise.
* localedata/locales/translit_font: likewise.
* localedata/locales/translit_fraction: likewise.
This locale already contained correct data in the mon array. It has been
updated from CLDR so that the month names start with lowercase letters.
alt_mon is a new import from CLDR. The change has been discussed
off-list with a native speaker.
[BZ #23140]
* localedata/locales/hy_AM (mon): Synchronize with CLDR (lowercase,
genitive case).
(alt_mon): New entry, import from CLDR (nominative case).
The Kashubian language is not supported by CLDR; the data has been copied
from Wikipedia and documents released by RJK (the official Kashubian
Language Council) and checked with a native speaker.
Note that this language also needs the ab_alt_mon feature because of the
month May: nominative "môj", genitive "maja"; abbreviated nominative "môj",
abbreviated genitive "maj".
[BZ #23140]
* localedata/locales/csb_PL (mon): Rename to...
(alt_mon): This.
(abmon): Rename to...
(ab_alt_mon): This.
(mon): Add with proper genitive forms, copy from Wikipedia.
(abmon): Likewise.
Thank you Michal Ostrowski for the feedback.
[BZ #19485]
* localedata/locales/csb_PL (mon): Fix typos:
"łżëkwiôt" -> "łżëkwiat" (April); "lëpinc" -> "lëpińc" (July).
(yesstr): Add, value is "jo".
(nostr): Add, value is "nié".
As a follow-up to the fix for bug 10871, these three languages now support
two grammatical cases of the month names.
This commit does not resolve the bug because more languages are still
to be committed.
[BZ #23140]
* localedata/locales/gd_GB (mon): Rename to...
(alt_mon): This.
(mon): Import from CLDR (genitive case).
* localedata/locales/hsb_DE (mon): Rename to...
(alt_mon): This.
(mon): Import from CLDR (genitive case).
* localedata/locales/wa_BE (mon): Rename to...
(alt_mon): This.
(mon): Add, fill with the proper genitive forms, but CLDR data
is incomplete; completed according to the comments in this file.
(d_t_fmt): Do not use "di" before the month name, no longer needed.
* localedata/locales/wa_BE (country_name): Reword
"Beljike" -> "Beldjike".
[BZ #23152]
* localedata/locales/gd_GB (abmon): Fix typo in May:
"Mhàrt" -> "Cèit". Adjust the comment according to the change.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
As spotted by the GNOME translation team, the Greek language has a clearly
visible difference between the abbreviated nominative and the abbreviated
genitive case for some month names. Examples:
May:
abbreviated nominative: "Μάι" -> abbreviated genitive: "Μαΐ"
July:
abbreviated nominative: "Ιούλ" -> abbreviated genitive: "Ιουλ"
and more month names with similar differences.
Original discussion: https://bugzilla.gnome.org/show_bug.cgi?id=793645#c21
[BZ #22937]
* localedata/locales/el_CY (abmon): Rename to...
(ab_alt_mon): This.
(abmon): Import from CLDR (abbreviated genitive case).
* localedata/locales/el_GR (abmon): Rename to...
(ab_alt_mon): This.
(abmon): Import from CLDR (abbreviated genitive case).
A GNOME translator asked to use the same abbreviated month names
as provided by CLDR. This sounds reasonable. See the discussion:
https://bugzilla.gnome.org/show_bug.cgi?id=793645#c27
[BZ #22932]
* localedata/locales/lt_LT (abmon): Synchronize with CLDR.
See this bug https://sourceware.org/bugzilla/show_bug.cgi?id=22898
These lines don’t yet work because of a glibc bug, not because of
problems in the locale data. No matter what sorting rules one uses,
these characters cannot be sorted at all at the moment.
As soon as that bug is fixed, these lines should be added back to the
test file.
* localedata/cmn_TW.UTF-8.in: Remove the lines which cannot
be sorted correctly at the moment because of a bug.
Without this change, adding collation test files like
localedata/gez_ER.UTF-8@abegede.in does not work for locales whose names
contain @ modifiers.
* gen-locales.mk: Make test files which contain @ modifiers in their
name work.
* localedata/gen-locale.sh: Likewise.
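As an illustration only, the naming scheme of the collation test files can
be taken apart as sketched below. The helper split_test_name is hypothetical
and is not part of gen-locales.mk or gen-locale.sh; it merely assumes the
"locale.charset@modifier.in" pattern visible in the file names mentioned in
this log.

```python
def split_test_name(filename):
    """Split a test file name into (locale, charset, modifier).

    Hypothetical helper: assumes the pattern locale.charset[@modifier].in.
    """
    stem = filename[:-len(".in")] if filename.endswith(".in") else filename
    # The @ modifier, if any, follows the charset.
    base, _, modifier = stem.partition("@")
    # The charset starts after the first dot of the remaining name.
    locale, _, charset = base.partition(".")
    return locale, charset, modifier


with_modifier = split_test_name("gez_ER.UTF-8@abegede.in")
without_modifier = split_test_name("da_DK.ISO-8859-1.in")
```

A name without a modifier yields an empty third element, so both kinds of
names can go through the same code path.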
* localedata/da_DK.ISO-8859-1.in: In the new iso14651_t1_common file
downloaded from ISO, the collation order of @-. and space has changed.
Therefore, this test file needed to be adapted.
* localedata/fr_CA.UTF-8.in: Likewise.
* localedata/fr_FR.UTF-8.in: Likewise.
* localedata/uk_UA.UTF-8.in: Likewise.
* localedata/cs_CZ.UTF-8.in: Adapt this test file to the collation
order of ȥ in the new iso14651_t1_common file.
* localedata/pl_PL.UTF-8.in: Likewise.
Entries for characters which have “IGNORE” on all 4 levels like:
<U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
are changed into:
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)
i.e. putting the code point of the character into the fourth level
instead of “IGNORE”. Without that change, all such characters
would compare equal which would make a wcscoll test case fail.
It is better to have a clearly defined sort order even for characters
like this so it is good to use the code point as a tie-break.
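The effect of the tie-break can be sketched with a deliberately simplified
model of four-level collation keys. This is not how glibc builds its keys
internally; the names IGNORE, sort_key, old_table and new_table are made up
for the illustration.

```python
IGNORE = None  # stands in for the IGNORE keyword of the collation table


def sort_key(table, text):
    """Build a key that compares level 1 over the whole string, then level 2, ...

    Weights marked IGNORE contribute nothing at their level.
    """
    key = []
    for level in range(4):
        key.append(tuple(
            table[ch][level] for ch in text if table[ch][level] is not IGNORE
        ))
    return tuple(key)


# Old style: IGNORE on all four levels.
old_table = {ch: (IGNORE, IGNORE, IGNORE, IGNORE) for ch in "\u0001\u0002"}
# New style: the code point is used as the fourth-level weight.
new_table = {ch: (IGNORE, IGNORE, IGNORE, ord(ch)) for ch in "\u0001\u0002"}

old_equal = sort_key(old_table, "\u0001") == sort_key(old_table, "\u0002")
new_ordered = sort_key(new_table, "\u0001") < sort_key(new_table, "\u0002")
```

In the old table both characters produce the same empty key, while in the
new table the fourth level decides the order.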
* localedata/locales/iso14651_t1_common: Use the code point of a
character in the fourth collation level instead of IGNORE for all
entries which have IGNORE on all 4 levels.
* localedata/locales/iso14651_t1_common: Add some convenient collation
symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
rules similar to those in CLDR.
* localedata/locales/iso14651_t1_common: The new version of this
file downloaded from ISO contained several syntax errors which
are fixed by this patch.
[BZ #14095] - Review / update collation data from Unicode / ISO 14651
File downloaded from:
http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt
Updating this file alone is not enough, there are problems in the new
file which need to be fixed and the collation rules for many locales
need to be adapted. This is done by the following patches.
This update also fixes the problem that many characters are treated as
identical when sorting because they were not yet in the old
iso14651_t1_common file, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1336308
- Infinite (∞) and empty set (∅) are treated as if they were the same character by sort and uniq
[BZ #14095]
* localedata/locales/iso14651_t1_common: Update file to
latest version from ISO (ISO14651_2016_TABLE1_en.txt).
LC_TIME in these 4 locales is identical, using “copy "es_BO"” makes
that more obvious.
[BZ #22646]
* localedata/locales/es_CL (LC_TIME): copy "es_BO".
* localedata/locales/es_CU (LC_TIME): copy "es_BO".
* localedata/locales/es_EC (LC_TIME): copy "es_BO".
[BZ #10871]
* localedata/locales/ru_RU (mon): Rename to...
(alt_mon): This.
(abmon): Rename to...
(ab_alt_mon): This.
(mon): Import from CLDR (genitive case).
(abmon): Copy from the old content except the 5th month which is
now in the genitive case, even when abbreviated.
* localedata/locales/ru_UA: Likewise.
* time/tst-strptime.c (day_tests): Add an actual example of
a difference between %b and %Ob in Russian.
Primary month names are in a genitive case now, alternative month names
are in a nominative case.
The alternative digits hack is no longer needed and has been removed.
[BZ #10871]
* localedata/locales/uk_UA (mon): Renamed to...
(alt_mon): This.
(alt_digits): "0" removed and then renamed to...
(mon): This.
(date_fmt): Definition changed not to use the alternative
digits hack.
[BZ #10871]
* localedata/locales/pl_PL: Alternative month names added,
primary month names are genitive now.
* time/tst-strptime.c (day_tests): Actually use a genitive case
of a month name in Polish language.
Some languages (Slavic, Baltic, etc.) require a genitive case of the
month name when formatting a full date (with the day number) while
they require a nominative case when referring to the month standalone.
This requirement cannot be fulfilled without providing two forms for
each month name. From now on it is specified that nl_langinfo(MON_1)
series (up to MON_12) and strftime("%B") generate the month names in
the grammatical form used when the month is a part of a complete date.
If the grammatical form used when the month is named by itself is needed,
the new values nl_langinfo(ALTMON_1) (up to ALTMON_12) and
strftime("%OB") are supported. This new feature is optional so the
languages which do not need it or do not yet provide the updated
locales simply do not use it and their behaviour is unchanged.
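The selection between the two forms can be pictured with a small model.
It does not call strftime or nl_langinfo; month_name, MON and ALT_MON are
hypothetical names, the date layout "3 maja" is only illustrative, and the
Kashubian forms of May are the ones quoted in the csb_PL entry of this log.
The fallback mirrors the rule that alt_mon is a copy of mon when a locale
does not define it.

```python
MON = {5: "maja"}          # genitive form, the %B / MON_5 role
ALT_MON = {5: "m\u00f4j"}  # nominative form "môj", the %OB / ALTMON_5 role


def month_name(month, standalone, mon, alt_mon=None):
    """Pick the month name form; alt_mon falls back to mon when missing."""
    if alt_mon is None:
        alt_mon = mon
    return (alt_mon if standalone else mon)[month]


in_date = "3 " + month_name(5, False, MON, ALT_MON)  # month as part of a date
alone = month_name(5, True, MON, ALT_MON)            # month named by itself
unchanged = month_name(5, True, MON)                 # locale without alt_mon
```

A locale that provides only mon behaves exactly as before, which is the
optional nature of the feature described above.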
[BZ #10871]
* locale/C-time.c (_nl_C_LC_TIME): Add alternative month names,
define them as the same as primary full month names explicitly.
* locale/categories.def (LC_TIME): Add alt_mon and wide-alt_mon.
* locale/langinfo.h (__ALTMON_1, __ALTMON_2, __ALTMON_3, __ALTMON_4,
__ALTMON_5, __ALTMON_6, __ALTMON_7, __ALTMON_8, __ALTMON_9, __ALTMON_10,
__ALTMON_11, __ALTMON_12, _NL_WALTMON_1, _NL_WALTMON_2, _NL_WALTMON_3,
_NL_WALTMON_4, _NL_WALTMON_5, _NL_WALTMON_6, _NL_WALTMON_7,
_NL_WALTMON_8, _NL_WALTMON_9, _NL_WALTMON_10, _NL_WALTMON_11,
_NL_WALTMON_12): New enum constants.
[__USE_GNU] (ALTMON_1, ALTMON_2, ALTMON_3, ALTMON_4, ALTMON_5, ALTMON_6,
ALTMON_7, ALTMON_8, ALTMON_9, ALTMON_10, ALTMON_11, ALTMON_12): New
macros.
* locale/programs/ld-time.c (struct locale_time_t): Add alt_mon,
walt_mon, and alt_mon_defined members.
(time_output): Output alt_mon and walt_mon members.
(time_read): Read them, initialize them as copies of mon and wmon
respectively if they are missing, initialize alt_mon_defined.
* locale/programs/locfile-kw.gperf (alt_mon): Define.
* locale/programs/locfile-kw.h: Regenerate.
* locale/programs/locfile-token.h (tok_alt_mon): New enum constant.
* localedata/tst-langinfo.c (map): Add tests for the new constants
ALTMON_1 .. ALTMON_12.
* time/Makefile [$(run-built-tests) = yes] (LOCALES): Add fr_FR.UTF-8
and pl_PL.UTF-8.
* time/strftime_l.c (f_altmonth): New macro.
(__strftime_internal): Handle %OB format.
* time/strptime_l.c [_LIBC] (alt_month_name): New macro.
(__strptime_internal): Handle %OB format.
* time/tst-strptime.c (day_tests): Add tests to parse different forms
of month names including the new %OB format specifier.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reported-by: Robert Pluim <rpluim@gmail.com>
* localedata/locales/gu_IN (LC_IDENTIFICATION): Fix an obvious typo
in date: "2004-14-09" should be "2004-09-14".
* localedata/locales/lo_LA: Fix an obvious typo in date in the header:
"2003-15-09" should be "2003-09-15".
* localedata/locales/bho_NP (LC_IDENTIFICATION): Fix an obvious typo
in date: "2017-24-07" should be "2017-07-24".
* localedata/locales/mai_IN: Likewise.
* localedata/locales/mai_NP: Likewise.
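Typos of this kind (day and month swapped) can be caught mechanically.
The following is a minimal sketch, assuming the dates are meant to be in
YYYY-MM-DD form; is_iso_date is a hypothetical helper and not part of any
glibc check.

```python
from datetime import date


def is_iso_date(text):
    """Return True if text is a valid YYYY-MM-DD calendar date."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


checked = {d: is_iso_date(d) for d in ("2004-14-09", "2004-09-14")}
```

Such a check only finds swaps where the day is greater than 12; a date like
"2004-03-04" is valid either way.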
The current date format prefixes one-digit days with a space, resulting
in an ugly double space:
$ LC_ALL=hu_HU.UTF-8 date
2018. jan.  1., hétfő, 21:25:35 CET
          ^^
The official orthography rules don't contain an explicit rule about
this (so there is no sound reason for a double space), and an
implicit example of "1848. március 9." under bullet point 296 at
http://helyesiras.mta.hu/helyesiras/default/akh12 contains a single
space only. An HTML page alone is admittedly not convincing, but I confirm
that the official book edition (e.g.
https://www.libri.hu/en/konyv/a-magyar-helyesiras-szabalyai-32.html)
also contains a single space there.
[BZ #22657]
* localedata/locales/hu_HU (d_t_fmt): Avoid a leading space
before the day number which may produce a double space.
(date_fmt): Likewise.
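The double space comes from padding the day number to two columns after a
literal space. A minimal sketch of the two behaviours, not using strftime
itself; format_day is a hypothetical helper and the surrounding text is
shortened from the example output above.

```python
def format_day(day, pad):
    """Format a day of the month, optionally space-padded to width 2."""
    return f"{day:2d}" if pad else str(day)


padded = "2018. jan. " + format_day(1, pad=True) + "."
unpadded = "2018. jan. " + format_day(1, pad=False) + "."
two_digit = "2018. jan. " + format_day(21, pad=True) + "."
```

For two-digit days both variants give the same text, so only days 1 to 9
are affected.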
[BZ #22524]
* localedata/Makefile: Add lt_LT.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/lt_LT.UTF-8.in: New file for testing the collation.
* localedata/locales/lt_LT (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22515]
* localedata/Makefile: Add hsb_DE.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/hsb_DE.UTF-8.in: New file for testing the collation.
* localedata/locales/hsb_DE (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22517]
* localedata/Makefile: Add et_EE.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/et_EE.UTF-8.in: New file for testing the collation.
* localedata/locales/et_EE (LC_COLLATE): Use “copy "iso14651_t1"”
and build the collation rules upon that.
[BZ #22527]
* localedata/locales/tr_TR (LC_COLLATE): Base collation rules
on iso14651_t1. A test file localedata/tr_TR.UTF-8.in is already
available, this rewrite of the collation rules does reproduce
the test file in the same order.
[BZ #10580]
* localedata/locales/hr_HR (LC_TIME): Use two letters for the
digraphs in the month and day names. Using single code points for
digraphs is deprecated. While there are dedicated Unicode
code points for the digraphs, these are included for backwards
compatibility and modern texts use a sequence of Basic Latin
characters. See: https://www.unicode.org/faq/ligature_digraph.html
This makes the month and day names agree exactly with CLDR now,
CLDR does not use the single code points for the digraphs either.
According to CLDR, collation rules for Serbian and Bosnian
should be the same as for Croatian.
[BZ #22534]
* localedata/Makefile: Add sr_RS.UTF-8 and bs_BA.UTF-8 to test-input
and to the list of locales to be built for testing.
* localedata/bs_BA.UTF-8.in: New file (same as hr_HR.UTF-8.in).
* localedata/sr_RS.UTF-8.in: New file (same as hr_HR.UTF-8.in).
* localedata/locales/bs_BA (LC_COLLATE): Use “copy "hr_HR"”.
* localedata/locales/sr_RS (LC_COLLATE): Use “copy "hr_HR"”.
[BZ #10580]
* localedata/locales/hr_HR (LC_COLLATE): Base collation rules on
iso14651_t1.
* localedata/locales/hr_HR (LC_TIME): Sync month and day names with
CLDR (except use ligatures for the digraphs, CLDR does not use
the ligatures), add first_workday, some fixes in the date and time
formats.
* localedata/locales/hr_HR (LC_CTYPE): Add transliteration rules
for Đ and đ.
* localedata/locales/hr_HR (LC_MONETARY): Change currency_symbol to
lower case. p_cs_precedes and n_cs_precedes should be 0 instead of 1.
Add int_p_cs_precedes and int_n_cs_precedes.
* localedata/locales/hr_HR (LC_NUMERIC): Change thousands_sep to
"<U202F>" (NARROW NO-BREAK SPACE) and grouping to 3;3 (Agrees with
LC_MONETARY now).
* localedata/locales/hr_HR (LC_TELEPHONE): Add tel_dom_fmt.
* localedata/locales/hr_HR (LC_NAME): Add name_mr, name_mrs, and
name_miss.
* localedata/locales/hr_HR (LC_ADDRESS): Add country_post, country_isbn,
and lang_lib. Change postal_fmt.
[BZ #17750]
* Makefile: add fr_CA.UTF-8 to test-input and LOCALES.
* localedata/fr_CA.UTF-8.in: New file with test data for backward
accents sorting.
* localedata/fr_FR.UTF-8.in: Fix test data for forward accents
sorting.
* localedata/locales/cs_CZ (LC_COLLATE): Remove “define DIACRIT_FORWARD”
* localedata/locales/de_DE (LC_COLLATE): Likewise.
* localedata/locales/hu_HU (LC_COLLATE): Likewise.
* localedata/locales/lb_LU (LC_COLLATE): Likewise.
* localedata/locales/yuw_PG (LC_COLLATE): Likewise.
* localedata/locales/fr_CA (LC_COLLATE): Add “define DIACRIT_BACKWARD”
* localedata/locales/iso14651_t1_common: Use “ifdef DIACRIT_BACKWARD”
instead of “ifdef DIACRIT_FORWARD”.
The only locale which currently needs backward accents sorting is fr_CA.
Therefore, forward accents sorting should be the default.
Before this patch, backwards accent sorting was the default and all
locales except fr_CA had to use
define DIACRIT_FORWARD
before
copy "iso14651_t1"
Most locales didn’t do that and thus got the inappropriate backwards accents sorting
by accident. Now only the fr_CA locale needs to use
define DIACRIT_BACKWARD
before
copy "iso14651_t1"
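The difference between forward and backward accent sorting can be sketched
with a simplified two-level key: base letters first, then accents compared
either from the start or from the end of the word. This is only a model
built on Unicode decomposition, not the iso14651_t1 weights, and accent_key
is a hypothetical name.

```python
import unicodedata


def accent_key(word, backward=False):
    """Two-level key: base letters, then the accents attached to each letter."""
    decomposed = unicodedata.normalize("NFD", word)
    primary = []
    secondary = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            secondary[-1] += ch  # attach the accent to the preceding base letter
        else:
            primary.append(ch)
            secondary.append("")
    if backward:
        secondary.reverse()  # compare accents starting from the end of the word
    return (tuple(primary), tuple(secondary))


# côté, cote, coté, côte
words = ["c\u00f4t\u00e9", "cote", "cot\u00e9", "c\u00f4te"]
forward = sorted(words, key=accent_key)
backward = sorted(words, key=lambda w: accent_key(w, backward=True))
```

With forward sorting the result is cote, coté, côte, côté; with backward
sorting it is cote, côte, coté, côté, because the accent closest to the end
of the word is compared first.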
Original patch slightly modified by: Mike FABIAN <mfabian@redhat.com>
The LOCALES variable in the localedata Makefile had two instances of cs_CZ
which generated the following warning:
../gen-locales.mk:11: target '/opt/build/localedata/cs_CZ.UTF-8/LC_CTYPE' given more than once in the same rule
Dropped the duplicate entry.
[BZ #22336]
* localedata/locales/cs_CZ (LC_COLLATE): Use “copy "iso14651_t1"”
and implement the collation rules for cs from CLDR on top of that.
* Makefile: Add cs_CZ.UTF-8 to test-input and to the list
of locales to be built for testing.
* cs_CZ.UTF-8.in: New file with test data to test the Czech sorting.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #22469]
* localedata/locales/pl_PL (LC_COLLATE): Use “copy "iso14651_t1"”
and implement the collation rules for pl from CLDR on top of that.
* Makefile: Add pl_PL.UTF-8 to test-input and to the list
of locales to be built for testing.
* pl_PL.UTF-8.in: New file with test data to test the Polish sorting.
[BZ #15537]
* localedata/locales/lv_LV (LC_COLLATE): Fix collation by
using “copy "iso14651_t1"” and then implementing the
collation rules for lv from CLDR on top of that.
* Makefile: Add lv_LV.UTF-8 to test-input and to the list
of locales to be built for testing.
* lv_LV.UTF-8.in: New file with test data to test the Latvian
sorting.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Update all sourceware links to https. The website redirects
everything to https anyway so let the web server do a bit less work.
The only reference that remains unchanged is the one in the old
ChangeLog, since it didn't seem worth changing it.
* NEWS: Update sourceware link to https.
* configure.ac: Likewise.
* crypt/md5test-giant.c: Likewise.
* dlfcn/bug-atexit1.c: Likewise.
* dlfcn/bug-atexit2.c: Likewise.
* localedata/README: Likewise.
* malloc/tst-mallocfork.c: Likewise.
* manual/install.texi: Likewise.
* nptl/tst-pthread-getattr.c: Likewise.
* stdio-common/tst-fgets.c: Likewise.
* stdio-common/tst-fwrite.c: Likewise.
* sunrpc/Makefile: Likewise.
* sysdeps/arm/armv7/multiarch/memcpy_impl.S: Likewise.
* wcsmbs/tst-mbrtowc2.c: Likewise.
* configure: Regenerate.
* INSTALL: Regenerate.
Following the previous work by Carlos O'Donell the category of LC_CTYPE
is correctly set to "i18n:2012" rather than "unicode:2014" and the
i18n_ctype file is once again regenerated from scratch to make sure it
does not contain any manual additions except the copyright message.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* localedata/unicode-gen/gen_unicode_ctype.py (output_head):
category of LC_CTYPE set to "i18n:2012".
* localedata/locales/i18n_ctype: Regenerate.
[BZ #19485]
* localedata/locales/csb_PL (LC_TIME): Fix “abmon” for March
and use a better translation for March in “mon”.
* localedata/locales/csb_PL: Use more ASCII to improve the
readability of the source.
[BZ #13953]
* localedata/locales/km_KH: Use ASCII as much
as possible for better readability of the source and
remove useless comments.
* localedata/locales/km_KH (LC_TIME): Remove era stuff, it
was commented out and apparently wrong anyway because it was
using Lao characters. If the Buddhist era should be used
for km_KH, a native speaker should write the correct format
for Khmer.
* localedata/locales/km_KH (LC_TIME): Add first_weekday 1
(According to CLDR, the first weekday for Cambodia is Sunday).
* localedata/locales/km_KH (LC_NAME): Remove name_mr and name_mrs
(These were using Lao characters which must be wrong. If we get
the correct data from a native speaker, we could add it back, until
then it is better not to have name_mr and name_mrs at all than
having it wrong).
[BZ #15260]
* localedata/locales/doi_IN (LC_MESSAGES): Match only for the
first letters of yesstr and nostr in yesexpr and noexpr,
not for the full words.
* localedata/locales/hne_IN (LC_MESSAGES): Likewise.
* localedata/locales/kok_IN (LC_MESSAGES): Likewise.
* localedata/locales/mr_IN (LC_MESSAGES): Likewise.
* localedata/locales/sat_IN (LC_MESSAGES): Likewise.
* localedata/locales/km_KH (LC_MESSAGES): Match also for the
first letters of yesstr and nostr in yesexpr and noexpr,
until now only English was matched in yesexpr and noexpr.
* localedata/locales/tl_PH (LC_MESSAGES): Use “copy "fil_PH"”
instead of “copy "en_US"”. CLDR has yesstr and nostr data for
fil but not for tl. As tl and fil are very similar, using fil
is probably better than using English.
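What "matching only the first letters" means can be sketched as follows.
The helper first_letter_expr is hypothetical, the Kashubian words "jo" and
"nié" are the yesstr and nostr values from the csb_PL entry of this log, and
keeping the English letters and the +/1 and -/0 alternatives in the
expressions is an assumption made for the illustration.

```python
import re


def first_letter_expr(extra, word):
    """Build a regex matching answers that start with the first letter of word.

    extra lists further accepted first characters (assumed, e.g. English ones).
    """
    letters = {word[0].lower(), word[0].upper()}
    letters.update(extra)
    return "^[" + "".join(re.escape(ch) for ch in sorted(letters)) + "]"


yesexpr = first_letter_expr("+1yY", "jo")
noexpr = first_letter_expr("-0nN", "ni\u00e9")  # "nié"

answers = {a: bool(re.match(yesexpr, a)) for a in ("jo", "j", "Yes", "ni\u00e9")}
```

Because only the first character is tested, abbreviated answers such as "j"
are accepted as well as the full word.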
Pablo was l10n/i18n coordinator back in the old days, but MandrakeSoft is
dead now.
* localedata/locales/br_FR (LC_IDENTIFICATION): Add
Thierry Vignaud <thierry.vignaud@gmail.com> as the contact
for the br_FR locale.
"Ket" is the most used negative answer, as it is the negative answer
to a positively phrased question.
It is used on its own or with the verb ("Ne ran ket", ...).
As such, "Ket" is used in most translations.
"Nann" is less used, as it is the negative answer to a negatively phrased
question.
See https://en.wikipedia.org/wiki/Yes_and_no for explanations about
languages with 3 or 4 form systems.
We still keep "Nn" for short answers because:
- new learners are used to "Non" in French
- and they often misuse "Nann"
- for compatibility with English
[BZ #21706]
* localedata/locales/br_FR (LC_MESSAGES): Fix nostr.
From localedef --help:
Output control:
...
--no-warnings=<warnings> Comma-separated list of warnings to disable;
supported warnings are: ascii, intcurrsym
...
--warnings=<warnings> Comma-separated list of warnings to enable;
supported warnings are: ascii, intcurrsym
Locales using SHIFT_JIS and SHIFT_JISX0213 character maps are not ASCII
compatible. In order to build locales using these character maps, and
have localedef exit with a status of 0, we add new options to localedef
to disable or enable specific warnings. The options are --no-warnings
and --warnings, to disable and enable specific warnings respectively.
The options take a comma-separated list of warning names. The warning
names are taken directly from the generated warning. When a warning
that can be disabled is issued it will print something like this: foo is
not defined [--no-warnings=foo]
For the initial implementation we add two controllable warnings; first
'ascii' which is used by the localedata installation makefile target to
install SHIFT_JIS and SHIFT_JISX0213-using locales without error; second
'intcurrsym' which allows a program to use a non-standard international
currency symbol without triggering a warning. The 'intcurrsym' is
useful in the future if country codes are added that are not in our
current ISO 4217 list, and the user wants to avoid the warning. Having
at least two warnings to control gives an example for how the changes
can be extended to more warnings if required in the future.
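The option semantics described above can be modelled in a few lines. This
is a sketch and not the localedef implementation: apply_warning_option is a
hypothetical name, the initial state (both warnings enabled) is an
assumption, and so is the reaction to an unknown name, which this log does
not describe. Only the two warning names come from the --help text.

```python
# Warning names taken from the localedef --help text quoted in this log.
KNOWN_WARNINGS = ("ascii", "intcurrsym")


def apply_warning_option(state, value, enabled):
    """Model --warnings=<list> (enabled=True) or --no-warnings=<list>."""
    for name in value.split(","):
        if name not in KNOWN_WARNINGS:
            # Assumption: the real tool's handling of unknown names may differ.
            raise ValueError(f"unknown warning: {name}")
        state[name] = enabled
    return state


# Assumption: both warnings start out enabled.
state = {name: True for name in KNOWN_WARNINGS}
apply_warning_option(state, "ascii", enabled=False)  # like --no-warnings=ascii
```

After this call only the 'ascii' warning is disabled, which corresponds to
the way the SHIFT_JIS locales are meant to be built.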
These changes allow ja_JP.SHIFT_JIS and ja_JP.SHIFT_JISX0213 to be
compiled without warnings using --no-warnings=ascii. The
localedata/Makefile $(INSTALL-SUPPORTED-LOCALES) target is adjusted to
automatically add `--no-warnings=ascii` for such charmaps, and likewise
localedata/gen-locale.sh is adjusted with similar logic.
v2: Bring verbose, be_quiet, and all warning control booleans into
record-status.c, and compile this object file to be used by locale,
iconv, and localedef. Any users include record-status.h.
v3: Fix an instance of boolean coercion in set_warning().
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
The localedata collation test data is encoded in a particular
character set. We rename the test data to match the full locale
name with encoding, and adjust the Makefile and sort-test.sh
script. This allows us to have a future C.UTF-8 test that is
disambiguated from the built-in C locale.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
After the transition to generating a distinct file for Unicode ctype
information e.g. i18n_ctype, the check target was left with the wrong
target name. This patch fixes the check target and regenerates the
files with more information than previously used, filling in the
LC_IDENTIFICATION data.
Tested on x86_64 by regenerating from Unicode source files, and
running checks. Tested by subsequently rebuilding all locales.
No regressions in testsuite.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Reported-by: Rafal Luzynski <digitalfreak@lingonborough.com>
* localedata/locales/hi_IN (LC_MESSAGES): In yesexpr and noexpr,
also check for the first characters of yesstr and nostr.
* localedata/locales/kn_IN (LC_MESSAGES): Likewise.
* localedata/locales/ks_IN@devanagari (LC_MESSAGES): Likewise.