This updates IBM256, IBM277, IBM278, IBM280, IBM284, IBM297, IBM424
in the same way that IBM273 was updated for bug 23290.
IBM256 and IBM424 still have holes after this change, so HAS_HOLES
is not updated.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The 13th edition of Svenska Akademiens ordlista lists 'W' as a
distinct letter that sorts after 'V'. We adjust the sv_SE locale
(and tests) to match this updated and "reformed" language change.
This harmonizes us with CLDR 1.5.0 (2007) for sv_SE sorting of
the letter 'W'.
No regressions on x86_64, and locale sorting tests all pass.
Co-authored-by: Carlos O'Donell <carlos@redhat.com>
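The intended ordering can be sketched as follows. This is only an
illustration of "W is its own letter after V"; it is not how the locale's
LC_COLLATE rules are written, and the alphabet string, helper name, and
sample words are assumptions made for the example.

```python
# Swedish alphabet with W as a distinct letter between V and X,
# followed by the three extra letters at the end.
ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {letter: index for index, letter in enumerate(ALPHABET)}


def sort_key(word):
    """Return a list of per-letter ranks (hypothetical helper).

    Only handles words made of the letters in ALPHABET.
    """
    return [RANK[letter] for letter in word.lower()]


words = ["åka", "wasa", "zon", "vy"]
print(sorted(words, key=sort_key))
```

Here "wasa" sorts after "vy" and before "zon", and "åka" comes last.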
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
In 2000, when date_fmt was originally added as an extension, the
en_US locale did not have a date_fmt specifier and so used the
default, which resulted in the abbreviated month name coming
before the day of the month (as expected in the US and other
locales). In commit 7395f3a0ef the
date_fmt was added to en_US with a 12H time to better align with
US user expectations. Unfortunately the abbreviated month name
and day were inverted during that transition, and that was seen
as a regression and reported against Fedora 32:
https://bugzilla.redhat.com/show_bug.cgi?id=1830623
The progression of date_fmt looks like this:
"%a %b %e %H:%M:%S %Z %Y" <- Originally (2000)
"%a %d %b %Y %I:%M:%S %p %Z" <- glibc 2.29 (2019)
"%a %b %e %r %Z %Y" <- glibc 2.32 (2020) [this commit]
Note: "%r" is "%I:%M:%S %p" in en_US and so shorter to write.
Likewise the year is in the wrong place in commit
7395f3a0ef and this is corrected in
this patch.
For reference d_t_fmt:
"%a %d %b %Y %r %Z" <- d_t_fmt (1997)
Yes, d_t_fmt and date_fmt are *not* the same; this is just the
history of this locale. This commit does not change d_t_fmt to
better align with date_fmt. No users have requested we change
d_t_fmt or given any justification for such a change.
The only goals of this change are to place the abbreviated month
name before the day of the month as it has been printed since
2000, and place the year at the end. This minimizes the change
from commit 7395f3a0ef and makes
good on changing only from 24H clock to 12H clock.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
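To make the three date_fmt variants above easier to compare, here is a
minimal sketch that expands them for one fixed timestamp. It is not
glibc's strftime: the render helper, the hard-coded English names, the
sample date, and the "EDT" zone string are all assumptions for the
example, and only the conversions that appear in this message are handled.

```python
from datetime import datetime

ABDAY = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
ABMON = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def render(fmt, when, tzname):
    """Expand a small subset of strftime conversions (hypothetical helper)."""
    hour12 = when.hour % 12 or 12
    table = {
        "a": ABDAY[when.weekday()],
        "b": ABMON[when.month - 1],
        "d": f"{when.day:02d}",   # zero-padded day
        "e": f"{when.day:2d}",    # space-padded day
        "H": f"{when.hour:02d}",
        "I": f"{hour12:02d}",
        "M": f"{when.minute:02d}",
        "S": f"{when.second:02d}",
        "p": "AM" if when.hour < 12 else "PM",
        "Y": f"{when.year:04d}",
        "Z": tzname,
    }
    # As noted above, "%r" is "%I:%M:%S %p" in en_US.
    table["r"] = f"{table['I']}:{table['M']}:{table['S']} {table['p']}"
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            out.append(table[fmt[i + 1]])  # KeyError on unsupported conversions
            i += 2
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


when = datetime(2020, 5, 4, 15, 7, 9)
for fmt in ("%a %b %e %H:%M:%S %Z %Y",
            "%a %d %b %Y %I:%M:%S %p %Z",
            "%a %b %e %r %Z %Y"):
    print(render(fmt, when, "EDT"))
```

The first and third lines put the month before the day and the year last;
they differ only in the 24H versus 12H clock.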
The new tst-localedef-hardlinks test verifies that compiling two
locales (with the default output directory), once with
--no-hard-links and once without the option, results in the
expected behaviour. When --no-hard-links is used, the link
count on LC_CTYPE is 1, indicating that even though the two
locales are identical (though built from differently named source
files into different output directories) localedef did not carry
out the hard link optimization. When --no-hard-links is omitted,
the localedef hard link optimization is correctly carried out, and
for 2 compiled locales the link count for LC_CTYPE is 2.
Reviewed-by: DJ Delorie <dj@redhat.com>
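The link-count check described above can be illustrated without
localedef. The sketch below is not the test itself: it writes two
identical files into a temporary directory, either as separate copies or
as one hard-linked file, and reports st_nlink for both. The directory
names, the payload, and the link_counts helper are assumptions for the
example, and the temporary directory must be on a file system that
supports hard links.

```python
import os
import tempfile


def link_counts(use_hard_links):
    """Return the link counts of two identical LC_CTYPE stand-ins."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "locale-a", "LC_CTYPE")
        second = os.path.join(root, "locale-b", "LC_CTYPE")
        os.makedirs(os.path.dirname(first))
        os.makedirs(os.path.dirname(second))
        payload = b"identical compiled category data"
        with open(first, "wb") as handle:
            handle.write(payload)
        if use_hard_links:
            # Share one inode between both names.
            os.link(first, second)
        else:
            # Write an independent copy, as with --no-hard-links.
            with open(second, "wb") as handle:
                handle.write(payload)
        return os.stat(first).st_nlink, os.stat(second).st_nlink


print(link_counts(use_hard_links=False))
print(link_counts(use_hard_links=True))
```

Independent copies each report a link count of 1, while the hard-linked
pair reports 2 for both names.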
Unicode 13.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 13.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Total added characters in newly generated CHARMAP: 5930
Total added characters in newly generated WIDTH: 5536
Confirmed by CLDR and a native speaker: "abril" is more often used even
if "abrial" is also correct. Both nominative (alt_mon) and genitive (mon)
cases are updated.
It is not specified what should be the content of d_t_fmt and date_fmt
but in the built-in C locale those fields have only one difference:
date_fmt contains "%Z" (the current time zone) while d_t_fmt does not.
For most of the locales this commit does the following operation:
copy d_t_fmt to date_fmt, and then remove "%Z" from d_t_fmt.
If "%Z" was originally missing from d_t_fmt add it to date_fmt.
It also corrects comments where necessary.
Exceptions:
* In bo_CN, dz_BT, and km_KH "%Z" has not been added to date_fmt because
it was too difficult. In these locales date_fmt has been set to a
copy of d_t_fmt.
* In en_DK "%Z" has not been removed from d_t_fmt in order to preserve
the conformance with the standard mentioned in the comment.
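The general rule above (before the exceptions) can be sketched as a small
function. This is only an approximation of the manual edits: the
split_formats name is hypothetical, and appending " %Z" at the end of
date_fmt when it was missing is an assumption, since the actual position
was chosen per locale by hand.

```python
import re


def split_formats(d_t_fmt):
    """Return (new d_t_fmt, date_fmt) following the rule described above."""
    if "%Z" in d_t_fmt:
        date_fmt = d_t_fmt
    else:
        # Assumption: put the time zone at the end when it was missing.
        date_fmt = d_t_fmt + " %Z"
    # Remove "%Z" together with the whitespace in front of it.
    new_d_t_fmt = re.sub(r"\s*%Z", "", d_t_fmt).strip()
    return new_d_t_fmt, date_fmt


print(split_formats("%a %d %b %Y %r %Z"))
print(split_formats("%a %d %b %Y %T"))
```

The sketch does not treat a literal "%%Z" specially, which is one reason
the real files still needed manual review.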
The command to identify and initially edit the locales that need the
update was:
for i in `grep -lw d_t_fmt *`
do
  if ! grep -qw date_fmt $i ; then
    awk '/d_t_fmt/ { print $0; gsub("d_t_fmt", "date_fmt"); } //{ print $0 }' < $i > $i.next
    mv $i.next $i
  fi
done
and then each file was further edited manually.
Currently d_t_fmt formats time as "plkst. %H un %M". A quick Google
search says that "plkst." means "o’clock" and "un" means "and".
Also this format does not display seconds.
CLDR does not mention anything like that. We have no reason to use
anything different than "%H:%M:%S".
Replacing incorrect abbreviated weekday names "Пнд", "Вто", "Срд"...
with correct ones "Пн", "Вт", "Ср"... makes the LC_TIME sections in
those two locales almost identical. The only remaining difference
was that ab_alt_mon elements in ru_UA were lowercase while in ru_RU
they had the first letter uppercase; the latter was pointed out as
the better choice by a native speaker. This commit unifies LC_TIME
between ru_RU and ru_UA.
This commit adds previously missing transliterations for several code points
in the Unicode blocks "Miscellaneous Mathematical Symbols-A/B" -
transliterated to their approximate ASCII representations. It also adds a
corresponding iconv transliteration test.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The testroot does not have a gunzip command, so the charmap files
should not be installed gzipped else they cannot be used (and thus
tested). With this patch, installing with INSTALL_UNCOMPRESSED=yes
installs uncompressed charmaps instead.
Note that we must purge the $(symbolic_link_list) as it contains
references to $(DESTDIR), which we change during the testroot
installation.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Sync these values with CLDR and langtable as much as possible. Add
missing values.
If possible, take the values from CLDR, if CLDR does not have it,
take it from langtable. The values from langtable which are not from
CLDR are from Wikipedia or native speakers.
The first day of the week in China (Mainland) should be Monday according
to the national standard GB/T 7408-2005. References:
* https://www.doc88.com/p-1166696540287.html
* https://unicode-org.atlassian.net/browse/CLDR-11510
[BZ #24682]
* localedata/locales/bo_CN (first_weekday): Add, set to 2 (Monday).
* localedata/locales/ug_CN (first_weekday): Likewise.
* localedata/locales/zh_CN (first_weekday): Likewise.
This commit updates month and weekday names (full and abbreviated)
from CLDR 35.1 with the following exceptions.
It was not clear why the full name of February in aa_DJ and aa_ET was
"Kudo" while the abbreviated version is "Nah" but some additional
sources [1] [2] as well as the content of aa_ER and aa_ER@saaho
suggest it should be "Naharsi Kudo". This commit consequently sets
the translation of February to "Naharsi Kudo" in aa_DJ and aa_ET.
aa_ER@saaho is not supported by CLDR but since the month names were
identical to aa_ER before this commit, the same values have been copied
from aa_ER.
Links:
[1] https://fr.wiktionary.org/wiki/naharsi_kudo
[2] http://www.mcit.gov.et/web/guest/-/localization-standard-for-afaraf
[BZ #21897]
* localedata/locales/aa_DJ (abday): Update from CLDR, all words
begin with an uppercase letter now.
(abmon): Likewise.
(mon): Update from CLDR, reword February from "Kudo" to
"Naharsi Kudo", April from "Agda Baxisso" to "Agda Baxis",
and August from "Leqeeni" to "Liiqen".
* localedata/locales/aa_ER (mon): Update from CLDR, reword
April from "Agda Baxisso" to "Agda Baxis" and August from
"Leqeeni" to "Liiqen".
* localedata/locales/aa_ER@saaho (mon): Likewise.
* localedata/locales/aa_ET (abmon): Update from CLDR, reword
abbreviated February from "Kud" to "Nah".
(mon): Update from CLDR, reword February from "Kudo" to
"Naharsi Kudo" and April from "Agda Baxisso" to "Agda Baxis".
These values were removed by the commit 0a410e76f5.
[BZ #24200]
* localedata/locales/ga_IE (first_weekday): Add, set to 2 (Monday).
* localedata/locales/en_IE (first_weekday): Likewise.
The Unicode sequences in the format <Uxxxx> should be used instead of
non-ASCII characters.
Reported by Piotr Drąg:
https://sourceware.org/bugzilla/show_bug.cgi?id=24652#c8
[BZ #24652]
* localedata/locales/szl_PL (day): Use the correct Unicode
sequences instead of non-ASCII characters.
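A conversion of this kind could be sketched as below. The helper name is
hypothetical, and the 8-digit form for code points above U+FFFF is an
assumption based on how other locale sources write them; ASCII characters
are passed through unchanged.

```python
def to_ucs_notation(text):
    """Replace every non-ASCII character with its <Uxxxx> sequence."""
    parts = []
    for char in text:
        code_point = ord(char)
        if code_point < 0x80:
            parts.append(char)
        elif code_point <= 0xFFFF:
            parts.append(f"<U{code_point:04X}>")
        else:
            # Assumed notation for characters outside the BMP.
            parts.append(f"<U{code_point:08X}>")
    return "".join(parts)


print(to_ucs_notation("ą"))
```

For example, "ą" (U+0105) becomes "<U0105>".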
This commit also provides the correct month names in both nominative
and genitive case for the Silesian language, as required by the fix
for bug 10871.
[BZ #24652]
* localedata/locales/szl_PL (abday): Spelling corrections.
(day): Likewise.
(abmon): Likewise.
(mon): Rename to...
(alt_mon): This, then apply spelling corrections.
(mon): New entry, month names in the genitive case.
According to CLDR 35.1 and the bug report, the thousands grouping
separator should always be "." (a single dot) and digits should be
grouped by 3.
[BZ #23831]
* localedata/locales/nl_AW (mon_thousands_sep): Set to ".".
* localedata/locales/nl_NL (mon_thousands_sep): Likewise.
(thousands_sep): Likewise.
(grouping): Set to 3;3.
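As an illustration of what a "." separator with grouping 3;3 produces,
here is a minimal sketch of digit grouping. It is not the libc
implementation: the group_digits helper is hypothetical, it assumes the
last group size repeats, and it does not handle the special value -1.

```python
def group_digits(digits, separator, grouping):
    """Group a digit string from the right, e.g. grouping "3;3"."""
    sizes = [int(part) for part in grouping.split(";")]
    index = 0
    size = sizes[index]
    groups = []
    rest = digits
    while len(rest) > size:
        groups.append(rest[-size:])
        rest = rest[:-size]
        # Move to the next size if there is one; otherwise repeat the last.
        if index + 1 < len(sizes):
            index += 1
            size = sizes[index]
    groups.append(rest)
    return separator.join(reversed(groups))


print(group_digits("1234567", ".", "3;3"))
```

With these settings "1234567" is rendered as "1.234.567".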
Follow the same changes as made in the commit 02d8b5ab1c because the
respective entries in nl_NL and nl_AW had been the same before the change
so they should be the same after. CLDR does not provide complete data
for nl_AW, it says it is missing and displays a copy of nl_NL.
[BZ #24614]
* localedata/locales/nl_AW (n_sep_by_space): Set to 2 (a space
between the currency symbol and the minus sign).
(n_sign_posn): Set to 4 (the minus sign after the currency symbol).
According to CLDR 35.1 and the bug report the correct monetary format
for negative amounts should be "EUR -1 234,56" while previously it was
"EUR 1 234,56-".
This patch does not change the thousands (grouping) separator.
[BZ #24614]
* localedata/Makefile (LOCALES): Add nl_NL.UTF-8.
* localedata/locales/nl_NL (n_sep_by_space): Set to 2 (a space
between the currency symbol and the minus sign).
(n_sign_posn): Set to 4 (the minus sign after the currency symbol).
* localedata/tst-strfmon1.c (tests): Add test data for nl_NL.UTF-8.
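The effect of the two fields on the negative-amount layout described
above can be sketched as follows. This is a simplified model, not
strfmon: it assumes the currency symbol precedes the value, covers only
n_sign_posn 2 and 4, and the format_negative helper is hypothetical. The
"old" call uses n_sign_posn 2 with n_sep_by_space 1, which is an
assumption chosen to reproduce the previous output quoted above.

```python
def format_negative(symbol, quantity, sign_posn, sep_by_space, sign="-"):
    """Lay out a negative amount with the symbol in front of the value."""
    if sign_posn == 2 and sep_by_space in (0, 1):
        # Sign follows the quantity; the space (if any) is after the symbol.
        gap = " " if sep_by_space == 1 else ""
        return f"{symbol}{gap}{quantity}{sign}"
    if sign_posn == 4:
        # Sign immediately follows the currency symbol.
        if sep_by_space == 0:
            return f"{symbol}{sign}{quantity}"
        if sep_by_space == 1:
            return f"{symbol}{sign} {quantity}"
        if sep_by_space == 2:
            return f"{symbol} {sign}{quantity}"
    raise ValueError("combination not covered by this sketch")


print(format_negative("EUR", "1 234,56", sign_posn=2, sep_by_space=1))
print(format_negative("EUR", "1 234,56", sign_posn=4, sep_by_space=2))
```

The first call gives the previous "EUR 1 234,56-" layout and the second
gives "EUR -1 234,56".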
This commit fixes some errors and converts all month names to lowercase.
The content is synchronized with CLDR-35.1 now but trailing dots are
removed from abmon values in order to maintain consistency with the
previous values and with many other locales which do the same.
[BZ #24369]
* localedata/locales/tt_RU (mon): Update from CLDR-35.1, fix errors.
(abmon): Likewise, but remove the trailing dots.
Unicode 12.1.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 12.1.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Some info about the number of characters added or changed:
Total added characters in newly generated CHARMAP: 1
added: <U32FF> /xe3/x8b/xbf SQUARE ERA NAME REIWA
Total added characters in newly generated WIDTH: 1
added: <U32FF> 2 : eaw=W category=So bidi=L name=SQUARE ERA NAME REIWA
graph: Added 1 characters in new ctype which were not in old ctype
graph: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
print: Added 1 characters in new ctype which were not in old ctype
print: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
punct: Added 1 characters in new ctype which were not in old ctype
punct: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
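For reference, the byte sequence in the CHARMAP line above is simply the
UTF-8 encoding of the code point. The sketch below rebuilds that line; the
charmap_entry helper is hypothetical and uses single spaces rather than
the column alignment of the generated file.

```python
def charmap_entry(code_point, name):
    """Build a CHARMAP-style line for one BMP code point."""
    encoded = chr(code_point).encode("utf-8")
    byte_text = "".join(f"/x{byte:02x}" for byte in encoded)
    return f"<U{code_point:04X}> {byte_text} {name}"


print(charmap_entry(0x32FF, "SQUARE ERA NAME REIWA"))
```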
The Japanese era name will be changed on May 1, 2019. The Japanese
government made a preliminary announcement on April 1, 2019.
The glibc ja_JP locale must be updated to include the new era name for
strftime's alternative year format support.
Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
ChangeLog:
[BZ #22964]
* localedata/locales/ja_JP (LC_TIME): Add entry for the new Japanese
era.
* time/tst-strftime2.c (dates): Add 2019-04-30 and 2019-05-01.
(mkreftable): Add rules for the new Japanese era and the new dates.
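The two dates added to the test sit on either side of the era boundary. A
minimal sketch of the expected era and era year for them follows. It is
not the test's mkreftable logic: the table, the romanized era names, and
the era_year helper are assumptions for the example, and only the two
most recent eras are listed.

```python
from datetime import date

# Newest era first; each entry is (first day of the era, romanized name).
ERAS = [
    (date(2019, 5, 1), "Reiwa"),
    (date(1989, 1, 8), "Heisei"),
]


def era_year(day):
    """Return (era name, year within the era) for a Gregorian date."""
    for start, name in ERAS:
        if day >= start:
            return name, day.year - start.year + 1
    raise ValueError("date is before the eras listed in this sketch")


print(era_year(date(2019, 4, 30)))
print(era_year(date(2019, 5, 1)))
```

2019-04-30 is still Heisei 31, while 2019-05-01 is the first year of the
new era.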
This commit fixes some errors and converts all weekday names to lowercase.
The content is synchronized with CLDR-34 now, but trailing dots are removed
from abday values in order to maintain consistency with the previous values
and with many other locales which do the same.
[BZ #24296]
* localedata/locales/tt_RU (day): Update from CLDR-34, fix errors.
(abday): Likewise, but remove the trailing dots.
The Minguo calendar is the official calendar system of Taiwan and is
very widely used there. This commit adds support for it to glibc.
Some background information: The government website (www.gov.tw) uses it,
and popular public services like Taiwan HSR also use this calendar system.
Link to Wikipedia: https://en.wikipedia.org/wiki/Minguo_calendar
[BZ #24293]
* localedata/locales/zh_TW (era): Add, support Minguo calendar.
* localedata/locales/cmn_TW (era): Likewise.
* localedata/locales/hak_TW (era): Likewise.
* localedata/locales/lzh_TW (era): Likewise.
* localedata/locales/nan_TW (era): Likewise.
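For readers unfamiliar with the calendar, the year arithmetic is simple:
Minguo year 1 corresponds to 1912. A minimal sketch, with a hypothetical
helper name and without the handling of years before 1912 that a full era
definition needs:

```python
def minguo_year(gregorian_year):
    """Convert a Gregorian year (1912 or later) to a Minguo year."""
    if gregorian_year < 1912:
        raise ValueError("years before 1912 are not covered by this sketch")
    return gregorian_year - 1911


print(minguo_year(2019))
```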
Unicode 12.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 12.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Some info about the number of characters added or changed:
Total added characters in newly generated CHARMAP: 554
Total added characters in newly generated WIDTH: 106
alpha: Missing 8 characters of old ctype in new ctype
(These are combining marks, apparently they were removed from alpha
on purpose)
alpha: Added 295 characters in new ctype which were not in old ctype
combining: Missing 2 characters of old ctype in new ctype
(U+1CF2 VEDIC SIGN ARDHAVISARGA and U+1CF3 VEDIC SIGN ROTATED ARDHAVISARGA,
these are now "Alphabetic" in Unicode 12.0.0)
combining: Added 37 characters in new ctype which were not in old ctype
combining_level3: Missing 2 characters of old ctype in new ctype
(U+1CF2 VEDIC SIGN ARDHAVISARGA and U+1CF3 VEDIC SIGN ROTATED ARDHAVISARGA,
these are now "Alphabetic" in Unicode 12.0.0)
combining_level3: Added 26 characters in new ctype which were not in old ctype
graph: Added 554 characters in new ctype which were not in old ctype
lower: Added 6 characters in new ctype which were not in old ctype
print: Added 554 characters in new ctype which were not in old ctype
punct: Missing 29 characters of old ctype in new ctype
(These characters have all become "Alphabetic" in Unicode 12.0.0.
Therefore, they are not in "punct" anymore (see: is_punct() in unicode_utils.py))
punct: Added 296 characters in new ctype which were not in old ctype
tolower: Added 7 characters in new ctype which were not in old ctype
totitle: Added 7 characters in new ctype which were not in old ctype
toupper: Added 7 characters in new ctype which were not in old ctype
upper: Added 7 characters in new ctype which were not in old ctype
[BZ #24307]
* localedata/unicode-gen/Makefile (UNICODE_VERSION): Set to 12.0.0.
* localedata/unicode-gen/DerivedCoreProperties.txt: Update to Unicode 12.0.0.
* localedata/unicode-gen/EastAsianWidth.txt: Likewise.
* localedata/unicode-gen/PropList.txt: Likewise.
* localedata/unicode-gen/UnicodeData.txt: Likewise.
* localedata/unicode-gen/ctype_compatibility_test_cases.py: U+108D became
"Alphabetic" in Unicode 12.0.0. Adapt test case.
* localedata/charmaps/UTF-8: Regenerate.
* localedata/locales/i18n_ctype: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/translit_circle: Likewise.
* localedata/locales/translit_cjk_compat: Likewise.
* localedata/locales/translit_combining: Likewise.
* localedata/locales/translit_compat: Likewise.
* localedata/locales/translit_font: Likewise.
* localedata/locales/translit_fraction: Likewise.
The offset in the era-string format for Taisho gan-nen (1912) is
currently defined as 2, but it should be 1, so fix it. "Gan-nen" means
the 1st (origin) year; Taisho started on July 30, 1912.
Reported-by: Morimitsu, Junji <junji.morimitsu@hpe.com>
Reviewed-by: Rafal Luzynski <digitalfreak@lingonborough.com>
ChangeLog:
[BZ #24162]
* localedata/locales/ja_JP (LC_TIME): Change the offset for Taisho
gan-nen from 2 to 1. Problem reported by Morimitsu, Junji.
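The role of the offset can be sketched as follows. An era entry has the
colon-separated form direction:offset:start:end:name:format, and for a
"+" era the year number is the offset plus the number of years since the
start date's year. The entry strings below are simplified stand-ins, not
the actual ja_JP lines, and the helper name is hypothetical.

```python
def era_year_from_entry(entry, gregorian_year):
    """Return (era name, era year) for a "+" direction era entry."""
    direction, offset, start, _end, name, _fmt = entry.split(":", 5)
    if direction != "+":
        raise ValueError("only '+' eras are handled in this sketch")
    start_year = int(start.split("/")[0])
    return name, int(offset) + (gregorian_year - start_year)


# Simplified, illustrative entries for the first Taisho year.
wrong = "+:2:1912/07/30:1912/12/31:Taisho:%EC"
fixed = "+:1:1912/07/30:1912/12/31:Taisho:%EC"
print(era_year_from_entry(wrong, 1912))
print(era_year_from_entry(fixed, 1912))
```

With offset 2 the year 1912 would be numbered 2; with offset 1 it is the
first year, as intended.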
The en_US locale uses a 12h am/pm format in both d_fmt and d_t_fmt, which
is correct, but does not define date_fmt. This causes the default value
to be used, which is in 24h format.
This patch adds the date_fmt entry to the en_US locale with the same
value as d_t_fmt as the latter already includes the timezone.
ChangeLog:
[BZ #24046]
* localedata/locales/en_US (date_fmt): Add, set to
"%a %d %b %Y %r %Z".