Commit Graph

1002 Commits

Author SHA1 Message Date
Mike FABIAN
cdce63a767 localedata: uz_UZ and uz_UZ@cyrillic: Fix decimal point and thousands separator
Resolves: BZ # 31204
2024-01-02 16:36:43 +01:00
Mike FABIAN
fce5528fcb localedata: yo_NG: remove redundant comments
See: https://sourceware.org/pipermail/libc-alpha/2023-December/153538.html
2023-12-26 13:27:07 +01:00
Mike FABIAN
6b3ace3a1d localedata: convert en_AU, en_NZ, mi_NZ, niu_NZ to UTF-8
2023-12-26 10:05:50 +01:00
Mike FABIAN
89d727efd7 localedata: First day of the week in AU is Monday, LC_TIME in en_NZ is identical to LC_TIME in en_AU then
Resolves: BZ # 24877
2023-12-26 09:59:10 +01:00
Mike FABIAN
e65ca11515 localedata: convert yo_NG to UTF-8, check that language name in Yoruba agrees with CLDR
Related: BZ # 24878
2023-12-25 21:04:38 +01:00
Mike FABIAN
1e70252508 localedata: id_ID: change first weekday to Sunday
Resolves: BZ # 30412

See: https://sourceware.org/bugzilla/show_bug.cgi?id=30412#c7

CLDR also has ID in the list of territories which have Sunday as the
first day of the week.
2023-12-19 11:23:19 +01:00
RushingAlien
12ab77e893 id_ID: Update Time Locales
Hello! I am Indonesian; I was born and raised in Indonesia and still live
there.

This patch brings a few changes to the time locales of id_ID, which
include:
- Defining am_pm and time_fmt_ampm
- Changing time_fmt and d_t_fmt to use the 24-hour format
- Changing first_weekday to Monday
This is a squashed version of what was previously a 5-patch set.

Here are the reasons for and details of the changes:

Change 1 part 1

id_ID: Define `am_pm` string

The current locale does not define the am_pm string, so AM and PM are
not shown in the 12-hour time format. This change defines the string
by changing it from an empty string to "AM";"PM".

output of `date +%r`:
before commit: 01:23
after commit: 01:23 PM

Change 1 part 2

id_ID: Define time_fmt_ampm, change from an empty string

Currently, time_fmt_ampm is set to an empty string, which prevents some
programs from displaying time in the 12-hour format, for
example, glib: https://gitlab.gnome.org/GNOME/glib/-/issues/2967.
This commit changes it from an empty string to "%I:%M:%S %p".

Change 2 part 1

id_ID: Use 24-hour format for time_fmt

The standard, formal Indonesian time format is the 24-hour format instead
of the 12-hour format. This commit changes the id_ID locale's
time_fmt to match.

Change 2 part 2

id_ID: Use 24-hour format for d_t_fmt.

The standard, formal Indonesian time format is the 24-hour format instead
of the 12-hour format. This commit changes the id_ID locale's
d_t_fmt to match.
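
For readers new to these keywords: in strftime-style formats, `%p` expands to the locale's am_pm string and `%I` gives the 12-hour clock, while `%H` gives the 24-hour clock. The following is a minimal sketch that assumes GNU `date` (for `-d`). It uses the C locale so the output does not depend on id_ID being installed, and the helper name `render_1323` is illustrative only.

```shell
#!/bin/sh
# Render 13:23:00 UTC on the commit date with a given strftime-style format.
# LC_ALL=C keeps the output independent of installed locales;
# in the C locale am_pm is "AM";"PM", so %p yields "PM" here.
render_1323() {
    LC_ALL=C date -u -d '2023-12-18 13:23:00' "+$1"
}

render_1323 '%I:%M:%S %p'   # 12-hour, the time_fmt_ampm value described in Change 1
render_1323 '%H:%M:%S'      # 24-hour, as time_fmt now uses
```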

Change 3

id_ID: Change first_weekday to Monday

The Indonesian calendar starts the week on Monday, so let's comply.
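
As I understand glibc's LC_TIME conventions, `first_weekday` counts from the date given in the second field of the `week` keyword, with 1 meaning that date's own weekday. Many glibc locales use 19971130, which is a Sunday, so `first_weekday 2` means Monday. This commit does not show the `week` line of id_ID, so the sketch below takes the date as a parameter. It assumes GNU `date`, and the helper `first_day_of_week` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: $1 = date from the `week` keyword (yyyymmdd),
# $2 = first_weekday (1 = that date's own weekday). Prints the day name.
first_day_of_week() {
    base=$(date -u -d "$1" +%w)      # 0 = Sunday ... 6 = Saturday
    idx=$(( (base + $2 - 1) % 7 ))
    set -- Sunday Monday Tuesday Wednesday Thursday Friday Saturday
    shift "$idx"
    echo "$1"
}

first_day_of_week 19971130 1   # Sunday
first_day_of_week 19971130 2   # Monday
```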

Message-ID: <20230821035530.9075-1-rushing27alien@gmail.com>
Resolves: BZ # 30412
Reviewed-by: Mike Fabian <mfabian@redhat.com>
2023-12-18 09:57:33 +01:00
Mike FABIAN
73d92c4b73 localedata: Convert el_GR and el_CY locales to UTF-8
2023-12-15 21:08:44 +01:00
Mike FABIAN
14a94f2e35 localedata: el_GR: Greece now uses the 24h format for time
Resolves: BZ # 23012
2023-12-15 21:08:44 +01:00
Mike FABIAN
958478889c localedata: Convert day names in nn_NO locale to UTF-8
2023-12-07 08:28:25 +01:00
Mike FABIAN
ff25f355af localedata: Remove trailing whitespace in weekday names in nn_NO locale
Resolves: BZ # 25868
2023-12-07 08:28:25 +01:00
Mike FABIAN
dae3cf4134 localedata: Convert oc_FR locale to UTF-8
2023-11-16 23:58:17 +01:00
Mike FABIAN
70246b8495 localedata: Add information for Occitan
Resolves: BZ # 28787
2023-11-16 23:58:17 +01:00
Mike FABIAN
3fddfe3c5d New Zealand locales (en_NZ & mi_NZ) first day of week should be Monday
Resolves: BZ #29486
2023-11-16 13:59:00 +01:00
Mike FABIAN
aceda10bd5 Adapt collation in th_TH locale to use the iso14651_t1_common file and sync the collation with CLDR
I made it agree as much as possible with the rules from CLDR (see:
https://github.com/unicode-org/cldr/blob/main/common/collation/th.xml).

It seems to be impossible to follow the CLDR rules

  &[before 1]๚<ฯ # should be "variable"

and

  &๛<ๆ # should be "variable"

exactly though. These ask for a primary difference in punctuation
characters whose primary weight should be "IGNORE". But using a
secondary difference instead still sorts the test data correctly and
the previously used collation in th_TH used tertiary differences for
these characters.

There was old localedata/th_TH.in test data in TIS-620 encoding which
was not used (it was not in the localedata/Makefile). I converted this
to UTF-8 and moved it to localedata/th_TH.UTF-8.in and added it to
localedata/Makefile.

The existing collation rules in the th_TH locale did not sort that
test file completely correctly; I think my new collation rules based on
iso14651_t1 are better.
2023-09-21 10:34:35 +02:00
Mike FABIAN
bb5bbc2070 Update to Unicode 15.1.0 [BZ #30854]
Unicode 15.1.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 15.1.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

    Total removed characters in newly generated CHARMAP: 0
    Total changed characters in newly generated CHARMAP: 0
    Total added characters in newly generated CHARMAP: 627
    Total removed characters in newly generated WIDTH: 0
    Total changed characters in newly generated WIDTH: 0
    Total added characters in newly generated WIDTH: 627

    alpha: Added 622 characters in new ctype which were not in old ctype
    graph: Added 627 characters in new ctype which were not in old ctype
    print: Added 627 characters in new ctype which were not in old ctype
    punct: Added 5 characters in new ctype which were not in old ctype
        The five characters added to punct are:
        2FFC;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM RIGHT;So;0;ON;;;;;N;;;;;
        2FFD;IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER RIGHT;So;0;ON;;;;;N;;;;;
        2FFE;IDEOGRAPHIC DESCRIPTION CHARACTER HORIZONTAL REFLECTION;So;0;ON;;;;;N;;;;;
        2FFF;IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION;So;0;ON;;;;;N;;;;;
        31EF;IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION;So;0;ON;;;;;N;;;;;

    The Unicode announcement blog entry says "[...] adds 627
    characters, [...] additions include 622 CJK unified ideographs in
    a new block, [...]", so that looks OK. The Unicode
    blog mentions "six completely new emoji" but they don't appear here as
    they are all sequences and not single code points.

Resolves: BZ #30854

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-09-16 08:37:03 +02:00
Colin Leroy-Mira
dfe8c44588 localedata: Translit common emojis to smileys [BZ #30649]
Add common emojis to the translit-able characters (mostly
faces and hearts), and translit them to old-fashioned
smileys.

Signed-off-by: Colin Leroy-Mira <colin@colino.net>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-08-29 09:31:23 +02:00
Florian Weimer
4dc6b2dfb0 localedata: de_DE should not use Fräulein
This honorific fell out of use quite some time ago.
2023-02-27 16:54:22 +01:00
Mike FABIAN
7fe6734d28 Update to Unicode 15.0.0 [BZ #29604]
Unicode 15.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 15.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

    Total added characters in newly generated CHARMAP: 4489
    Total removed characters in newly generated WIDTH: 0
    Total changed characters in newly generated WIDTH: 0
    Total added characters in newly generated WIDTH: 4257

    alpha: Added 4389 characters in new ctype which were not in old ctype
    combining: Added 42 characters in new ctype which were not in old ctype
    combining_level3: Added 34 characters in new ctype which were not in old ctype
    graph: Added 4489 characters in new ctype which were not in old ctype
    lower: Added 73 characters in new ctype which were not in old ctype
    print: Added 4489 characters in new ctype which were not in old ctype
    punct: Missing 5 characters of old ctype in new ctype
        punct: Missing: ఄ 0xc04 TELUGU SIGN COMBINING ANUSVARA ABOVE
        punct: Missing: ྂ 0xf82 TIBETAN SIGN NYI ZLA NAA DA
        punct: Missing: ྃ 0xf83 TIBETAN SIGN SNA LDAN
        punct: Missing: 𑂀 0x11080 KAITHI SIGN CANDRABINDU
        punct: Missing: 𑂁 0x11081 KAITHI SIGN ANUSVARA
            That’s OK, because these are now Alphabetic in DerivedCoreProperties.txt
    punct: Added 105 characters in new ctype which were not in old ctype

Resolves: BZ #29604
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-10-06 08:58:33 +02:00
Florian Weimer
1d78299911 localedata: Convert French language locales (fr_*) to UTF-8
2022-08-17 11:07:00 +02:00
Florian Weimer
01441ae333 de_DE: Convert to UTF-8
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05 09:07:02 +02:00
Emil Soleyman-Zomalan
3e29dc5233 Add locale for syr_SY
2022-04-21 13:05:40 +02:00
Ilyahoo Proshel
189906b687 Add rif_MA locale [BZ #27781]
Resolves: BZ #27781
2022-04-07 14:59:41 +02:00
Carlos O'Donell
7e0ad15c0f localedata: Adjust C.UTF-8 to align with C/POSIX.
We have had one downstream report from Canonical [1] that
an rrdtool test was broken by the differences in LC_TIME
that we had in the non-builtin C locale (C.UTF-8). If one
application has an issue there are going to be others, and
so with this commit we review and fix all the issues that
cause the builtin C locale to be different from C.UTF-8,
which includes:
* mon_decimal_point should be empty e.g. ""
 - Depends on mon_decimal_point_wc fix.
* negative_sign should be empty e.g. ""
* week should be aligned with the builtin C/POSIX locale
* d_fmt corrected with escaped slashes e.g. "%m//%d//%y"
* yesstr and nostr should be empty e.g. ""
* country_ab2 and country_ab3 should be empty e.g. ""
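
One quick way to inspect several of the values listed above is glibc's `locale -k`, which prints each keyword as `name="value"`. This is only a spot check in the spirit of the new tst-c-utf8-consistency test, not that test. It assumes the glibc `locale` utility is present, and it compares against C.UTF-8 only when `locale -a` reports that locale as installed. `locale -k` shows the compiled d_fmt. If the locale source declares `escape_char /`, as the doubled slashes imply, then `"%m//%d//%y"` compiles to `%m/%d/%y`. The helper name `keywords` is illustrative.

```shell
#!/bin/sh
# Print some of the keywords named in the list above for the locale in $1.
keywords() {
    LC_ALL=$1 locale -k mon_decimal_point negative_sign d_fmt
}

keywords C
# C.UTF-8 may be listed under a normalized name such as "C.utf8".
if locale -a | grep -qiE '^c\.utf-?8$'; then
    if [ "$(keywords C.UTF-8)" = "$(keywords C)" ]; then
        echo "C.UTF-8 matches C for these keywords"
    else
        echo "C.UTF-8 differs from C for these keywords"
    fi
fi
```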

We bump LC_IDENTIFICATION version and adjust the date to
indicate the change in the locale.

A new tst-c-utf8-consistency test is added to ensure
consistency between C/POSIX and C.UTF-8.

Tested on x86_64 and i686 without regression.

[1] https://sourceware.org/pipermail/libc-alpha/2022-January/135703.html

Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-02-01 11:12:36 -05:00
Maxim Kuvyrkov
c16dc431c8 Update copyright header in recently merged ab_GE locale
The ab_GE locale was committed under the DCO, and the header
proposed in [1] suits it better.

[1] https://sourceware.org/pipermail/libc-alpha/2021-September/130692.html

Signed-off-by: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
Signed-off-by: Nart Tlisha <daniel.abzakh@gmail.com>
2021-12-17 18:22:21 +00:00
Nart Tlisha
a16c5ab139 localedata: add new locale ab_GE
Add the Abkhazian language in the Georgia territory

ab_GE was only recently added to CLDR; it should be available
in CLDR v41: https://github.com/unicode-org/cldr/pull/1402

The Abkhazian language has been added to Gnome for localization

The locale has been tested on Ubuntu 20.04, Mint 20.2 and Fedora 35 Beta

Signed-off-by: Nart Tlisha <daniel.abzakh@gmail.com>
Reviewed-by: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
2021-12-16 14:37:14 +00:00
Mike FABIAN
b517256015 Update to Unicode 14.0.0 [BZ #28390]
Unicode 14.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 14.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

Total added characters in newly generated CHARMAP: 838
Total removed characters in newly generated WIDTH: 1
    (Characters not in WIDTH get width 1 by default, i.e. these have width 1 now.)
    removed: <U1734> 0 : eaw=N category=Mc bidi=L   name=HANUNOO SIGN PAMUDPOD
    That seems intentional, the character had category Mn (Mark, nonspacing) before
    and now has Mc (Mark, spacing combining)
Total changed characters in newly generated WIDTH: 0
Total added characters in newly generated WIDTH: 175
2021-10-04 08:54:27 +02:00
Carlos O'Donell
466f2be6c0 Add generic C.UTF-8 locale (Bug 17318)
We add a new C.UTF-8 locale. This locale is not built into glibc, but
is provided as a distinct locale. The locale provides full support for
UTF-8 and this includes full code point sorting via STRCMP-based
collation (strcmp or wcscmp).

The collation uses a new keyword 'codepoint_collation' which drops all
collation rules and generates an empty zero rules collation to enable
STRCMP usage in collation. This ensures that we get full code point
sorting for C.UTF-8 with a minimal 1406 bytes of overhead (LC_COLLATE
structure information and ASCII collating tables).
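
STRCMP-based collation yields code point order because of a property of UTF-8 itself: comparing valid UTF-8 strings byte by byte orders them exactly as their code point sequences. The following is a small check that uses `LC_ALL=C sort` as a stand-in for strcmp. It illustrates the encoding property, not glibc's collation code.

```shell
#!/bin/sh
# U+007A, U+00E9, U+20AC, U+1F600 as raw UTF-8 bytes, in code point order
# (octal escapes keep this portable across printf implementations).
ascending=$(printf 'z\n\303\251\n\342\202\254\n\360\237\230\200\n')
# The same four strings, deliberately out of order.
shuffled=$(printf '\360\237\230\200\nz\n\342\202\254\n\303\251\n')
# LC_ALL=C makes sort compare raw bytes, like strcmp().
sorted=$(printf '%s\n' "$shuffled" | LC_ALL=C sort)
[ "$sorted" = "$ascending" ] && echo "byte order == code point order"
```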

The new locale is added to SUPPORTED. Minimal test data for specific
code points (minus those not supported by collate-test) is provided in
C.UTF-8.in, and this verifies code point sorting is working reasonably
across the range. The locale was tested manually with the full set of
code points without failure.

The locale is harmonized with locales already shipping in various
downstream distributions. A new tst-iconv9 test is added which verifies
the C.UTF-8 locale is generally usable.

Testing for fnmatch, regexec, and regcomp is provided by extending
bug-regex1, bug-regex19, bug-regex4, bug-regex6, transbug, tst-fnmatch,
tst-regcomp-truncated, and tst-regex to use C.UTF-8.

Tested on x86_64 or i686 without regression.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-09-06 11:30:28 -04:00
Siddhesh Poyarekar
30891f35fa Remove "Contributed by" lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date.  Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.

Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions.  These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.

The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively.  These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:

https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-09-03 22:06:44 +05:30
Sebastian Rasmussen
ebde2baeb5 Update sv_SE to treat 'W' as a distinct character (Bug 25036)
The 13th edition of Svenska Akademiens ordlista lists 'W' as a
distinct letter that sorts after 'V'. We adjust the sv_SE locale
(and tests) to match this updated and "reformed" language change.
This harmonizes us with CLDR 1.5.0 (2007) for sv_SE sorting of
the letter 'W'.

No regressions on x86_64, and locale sorting tests all pass.

Co-authored-by: Carlos O'Donell <carlos@redhat.com>
2021-04-06 12:34:02 -04:00
Marc Aurèle La France
c6e2ca2c3f POSIX locale: Fix typo in comment
2021-01-09 12:14:44 +01:00
Carlos O'Donell
8cde977077 en_US: Minimize changes to date_fmt (Bug 25923)
In 2000, when date_fmt was originally added as an extension, the
en_US locale did not have a date_fmt specifier and so used the
default, which resulted in the abbreviated month name coming
before the day of the month (as expected in the US and other
locales).  In commit 7395f3a0ef the
date_fmt was added to en_US with a 12H time to better align with
US user expectations.  Unfortunately the abbreviated month name
and day were inverted during that transition, and that was seen
as a regression and reported against Fedora 32:
https://bugzilla.redhat.com/show_bug.cgi?id=1830623

The progression of date_fmt looks like this:
"%a %b %e %H:%M:%S %Z %Y"    <- Originally (2000)
"%a %d %b %Y %I:%M:%S %p %Z" <- glibc 2.29 (2019)
"%a %b %e %r %Z %Y"          <- glibc 2.32 (2020) [this commit]

Note: "%r" is "%I:%M:%S %p" in en_US and so shorter to write.
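
GNU `date` can show the three date_fmt values side by side. This sketch assumes GNU coreutils and uses the C locale, whose `%r` is also `%I:%M:%S %p`. The output should therefore match en_US for these formats without requiring en_US.UTF-8 to be installed. The `-u` option makes `%Z` print `UTC`. The helper name `show` is illustrative.

```shell
#!/bin/sh
# Render one fixed instant (UTC) with a given date_fmt string.
show() {
    LC_ALL=C date -u -d '2020-07-16 17:17:10' "+$1"
}

show '%a %b %e %H:%M:%S %Z %Y'     # 2000: Thu Jul 16 17:17:10 UTC 2020
show '%a %d %b %Y %I:%M:%S %p %Z'  # 2.29: Thu 16 Jul 2020 05:17:10 PM UTC
show '%a %b %e %r %Z %Y'           # 2.32: Thu Jul 16 05:17:10 PM UTC 2020
```

The 2.29 string puts the day before the month and the year before the time. The 2.32 string restores the original month-day order and year position, and changes only the clock.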

Likewise the year is in the wrong place in commit
7395f3a0ef and this is corrected in
this patch.

For reference d_t_fmt:
"%a %d %b %Y %r %Z"          <- d_t_fmt    (1997)

Yes, d_t_fmt and date_fmt are *not* the same, this is just the
history of this locale. This commit does not change d_t_fmt to
better align with date_fmt. No users have requested we change
d_t_fmt or given any justification for such a change.

The only goals of this change are to place the abbreviated month
name before the day of the month as it has been printed since
2000, and place the year at the end. This minimizes the change
from commit 7395f3a0ef and makes
good on changing only from 24H clock to 12H clock.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2020-07-16 17:17:10 -04:00
Mike FABIAN
6e540caa21 Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-06-26 09:54:43 +02:00
Florian Weimer
3404def00a ckb_IQ, or_IN locales: Add missing reorder-end keywords
This suppresses a non-fatal error during locale building.

Reviewed-by: Rafał Lużyński <digitalfreak@lingonborough.com>
2020-05-08 10:52:22 +02:00
Mike FABIAN
8645f62469 Bug 25819: Update to Unicode 13.0.0
Unicode 13.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 13.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).

Total added characters in newly generated CHARMAP: 5930
Total added characters in newly generated WIDTH: 5536
2020-04-21 18:17:23 +02:00
kokoye2007
8a1d13d0c7 Updates to the shn_MM locale [BZ #25532]
2020-04-08 12:22:36 +02:00
Rafał Lużyński
10b2cdc3b3 oc_FR locale: Fix spelling of April (bug 25639)
Confirmed by CLDR and a native speaker: "abril" is more often used even
if "abrial" is also correct.  Both nominative (alt_mon) and genitive (mon)
cases are updated.
2020-04-07 00:20:53 +02:00
Rafał Lużyński
649fdf039b oc_FR locale: Fix spelling of Thursday (bug 25639)
As reported by a native speaker:

Thursday: "dijóus" -> "dijòus" (also confirmed by CLDR)
2020-03-19 00:19:07 +01:00
Mike FABIAN
eb948facd8 Fix typo in the name for Wednesday in Kurdish [BZ #9809]
2020-02-11 10:18:45 +01:00
Mike FABIAN
cdeae33d71 Update or_IN collation [BZ #22525]
- Add a test file or_IN.UTF-8.in.
- Make the collation agree with CLDR.
2020-02-03 10:19:20 +01:00
Mike FABIAN
ae199e7d64 Fix ckb_IQ [BZ #9809]
Add ckb_IQ to SUPPORTED file.
Add ckb_IQ.UTF-8.in collation test file.
Mention new ckb_IQ locale in NEWS.
2020-02-03 10:19:20 +01:00
Jwtiyar Nariman
4267522f5e Add new locale: ckb_IQ (Kurdish/Sorani spoken in Iraq) [BZ #9809]
2020-02-03 10:19:20 +01:00
Rafał Lużyński
135540285c sl_SI locale: Use "." as the thousands separator (bug 25233)
This is correct according to CLDR [1] and Florian Weimer's quick
research. [2]

[1] https://st.unicode.org/cldr-apps/v#/sl/Symbols/
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=25233#c0

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2020-01-08 00:13:48 +01:00
Rafał Lużyński
75ba929987 Multiple locales: Add date_fmt (bug 24054)
It is not specified what the contents of d_t_fmt and date_fmt should be,
but in the built-in C locale those fields differ in only one respect:
date_fmt contains "%Z" (the current time zone) while d_t_fmt does not.

For most of the locales this commit does the following operation:
copy d_t_fmt to date_fmt, and then remove "%Z" from d_t_fmt.
If "%Z" was originally missing from d_t_fmt add it to date_fmt.
It also corrects comments where necessary.

Exceptions:

* In bo_CN, dz_BT, and km_KH "%Z" has not been added to date_fmt because
  it was too difficult.  In these locales date_fmt has been set to the
  copy of d_t_fmt.
* In en_DK "%Z" has not been removed from d_t_fmt in order to preserve
  the conformance with the standard mentioned in the comment.

The command to identify and initially edit the locales that need the
update was:

    for i in `grep -lw d_t_fmt *`
    do
        if ! grep -qw date_fmt $i ; then
            awk '/d_t_fmt/ { print $0; gsub("d_t_fmt", "date_fmt"); } //{ print $0 }' < $i > $i.next
            mv $i.next $i
        fi
    done

and then each file was further edited manually.
2020-01-02 11:45:45 +01:00
Rafał Lużyński
d99b500e3d lv_LV locale: Correct the time part of d_t_fmt (bug 25324)
Currently d_t_fmt formats time as "plkst. %H un %M".  A quick Google
search says that "plkst." means "o’clock" and "un" means "and".
Also this format does not display seconds.

CLDR does not mention anything like that.  We have no reason to use
anything different than "%H:%M:%S".
2019-12-30 11:48:20 +01:00
Rafał Lużyński
20a740b2b2 km_KH locale: Use "%M" instead of "m" in d_t_fmt (bug 25323)
A quick analysis suggests that the original author meant "%M" (minutes
format specifier) instead of "m" which is just a literal "m" letter.
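
Any strftime-based tool makes the difference easy to see. The following is a sketch assuming GNU `date` and the C locale. The helper name `fmt_at` is illustrative.

```shell
#!/bin/sh
# "m" without a leading % is copied literally; "%M" is the minutes field.
fmt_at() {
    LC_ALL=C date -u -d '2019-12-30 11:48:19' "+$1"
}

fmt_at '%H:m'    # prints 11:m
fmt_at '%H:%M'   # prints 11:48
```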
2019-12-30 11:48:19 +01:00
Rafał Lużyński
b8c210bcc7 mnw_MM, my_MM, and shn_MM locales: Do not use %Op
The "O" modifier does nothing when used with "%p", so it is better not
to use it at all; replace "%Op" with "%p".
2019-12-23 23:49:22 +01:00
Rafał Lużyński
c372d2e863 ru_UA locale: use copy "ru_RU" in LC_TIME (bug 25044)
Replacing incorrect abbreviated weekday names "Пнд", "Вто", "Срд"...
with correct ones "Пн", "Вт", "Ср"... makes the LC_TIME sections in
those two locales almost identical.  The only remaining difference
was that ab_alt_mon elements in ru_UA were lowercase while in ru_RU
they had the first letter uppercase; the latter was pointed out as
the better choice by a native speaker.  This commit unifies LC_TIME
between ru_RU and ru_UA.
2019-11-26 11:54:29 +01:00
Talachan Mon
c5fbd7c3ea Add new locale: mnw_MM (Mon language spoken in Myanmar) [BZ #25139]
2019-11-06 08:15:16 +01:00
Arjun Shankar
513aaa0d78 Add Transliterations for Unicode Misc. Mathematical Symbols-A/B [BZ #23132]
This commit adds previously missing transliterations for several code points
in the Unicode blocks "Miscellaneous Mathematical Symbols-A/B" -
transliterated to their approximate ASCII representations.  It also adds a
corresponding iconv transliteration test.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2019-10-25 19:45:55 +02:00