A handful of regexes were already allowing +1 for yesexpr and -0 for
noexpr, which is what the i18n definition does. Standardize all locales
by allowing these language-independent values in them.
For example, en_US changes from ^[yY] to ^[+1yY], and from ^[nN]
to ^[-0nN].
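As a quick sanity check, the en_US patterns above can be tried with
Python's re module. This is only a sketch: glibc evaluates yesexpr and
noexpr as POSIX extended regular expressions, and the helper names
is_yes/is_no are made up for illustration. For bracket expressions this
simple, the two regex dialects should agree.

```python
import re

# Patterns copied from the en_US example above.
YESEXPR = re.compile(r"^[+1yY]")
NOEXPR = re.compile(r"^[-0nN]")


def is_yes(answer):
    """Return True if the answer starts with one of '+', '1', 'y', 'Y'."""
    return YESEXPR.match(answer) is not None


def is_no(answer):
    """Return True if the answer starts with one of '-', '0', 'n', 'N'."""
    return NOEXPR.match(answer) is not None


for reply in ("yes", "+1", "no", "-0"):
    print(reply, is_yes(reply), is_no(reply))
```

Note that the bracket expression matches a single leading character, so
"1" alone counts as yes and "0" alone counts as no.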
I've spot checked a number of these, including some that were definitely
wrong (like ff_SN). This change also fixes all open week-related bugs.
Since ff_SN is the only one that changes its base date, I also made
sure that its ordering of day translations was correct. It looks like
another case Petr brought up where the week field was not actually
checked against the day arrays.
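The base dates that appear in the week field can be checked with a few
lines of Python. This is a minimal sketch; weekday_name is a
hypothetical helper, not something in the tree. It shows that 19971130
falls on a Sunday, while the old ff_SN value of 19971129 falls on a
Saturday.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_name(yyyymmdd):
    """Return the English weekday name for a date written as YYYYMMDD."""
    year, rest = divmod(yyyymmdd, 10000)
    month, day = divmod(rest, 100)
    return DAY_NAMES[date(year, month, day).weekday()]


print(weekday_name(19971130))  # base date used by almost every locale
print(weekday_name(19971129))  # old ff_SN base date
```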
I also took the opportunity to drop first_weekday/first_workday when
the value aligned with the defaults (1 & 2 respectively). This didn't
impact too many locales in practice because the majority omitted them
already.
A few locales were defining some values incorrectly for their region:
ak_GH: week: changing [7, 19971130, 7] to [7, 19971130, 1]
ak_GH: first_weekday: changing 1 to 2
ayc_PE: week: changing [7, 19971130, 7] to [7, 19971130, 1]
bem_ZM: week: changing [7, 19971130, 4] to [7, 19971130, 1]
bem_ZM: first_weekday: changing 1 to 2
en_IE: first_weekday: changing 2 to 1
en_US: week: changing [7, 19971130, 7] to [7, 19971130, 1]
es_CO: first_weekday: changing 2 to 1
es_ES: week: changing [7, 19971130, 5] to [7, 19971130, 4]
ff_SN: week: changing [7, 19971129, 1] to [7, 19971130, 1]
ff_SN: first_weekday: changing 1 to 2
ga_IE: first_weekday: changing 2 to 1
ht_HT: week: changing [7, 19971130, 7] to [7, 19971130, 1]
ht_HT: first_weekday: changing 1 to 2
mk_MK: week: changing [7, 19971130, 4] to [7, 19971130, 1]
mt_MT: first_weekday: changing 2 to 1
quz_PE: week: changing [7, 19971130, 7] to [7, 19971130, 1]
sr_ME: week: changing [7, 19971130, 4] to [7, 19971130, 1]
sr_RS: week: changing [7, 19971130, 4] to [7, 19971130, 1]
sr_RS@latin: week: changing [7, 19971130, 4] to [7, 19971130, 1]
sw_KE: week: changing [7, 19971130, 4] to [7, 19971130, 1]
sw_KE: first_weekday: changing 2 to 1
uk_UA: week: changing [7, 19971130, 4] to [7, 19971130, 1]
unm_US: week: changing [7, 19971130, 4] to [7, 19971130, 1]
Some locales were copying locales that had the wrong week settings, so
that content had to be duplicated so the values could be adjusted:
el_CY: week: setting to [7, 19971130, 1]
en_AG: week: setting to [7, 19971130, 1]
en_AG: first_weekday: changing 2 to 1
en_ZM: week: setting to [7, 19971130, 1]
es_CU: week: setting to [7, 19971130, 1]
nl_AW: week: setting to [7, 19971130, 1]
sw_TZ: first_weekday: setting to 2
ta_LK: first_weekday: setting to 2
The majority of locales were omitting the week field, thus getting the
default [7, 19971130, 0 (localedef) / 7 (ISO standard)]. Unfortunately,
neither of those is used by any locale, so we end up having to define
the field just to set the ndays field. In practice, this rarely matters
due to its usage, and the first two fields match the defaults.
aa_DJ: setting to [7, 19971130, 1]
aa_ER: setting to [7, 19971130, 1]
aa_ER@saaho: setting to [7, 19971130, 1]
aa_ET: setting to [7, 19971130, 1]
af_ZA: setting to [7, 19971130, 1]
am_ET: setting to [7, 19971130, 1]
an_ES: setting to [7, 19971130, 4]
anp_IN: setting to [7, 19971130, 1]
ar_AE: setting to [7, 19971130, 1]
ar_BH: setting to [7, 19971130, 1]
ar_DZ: setting to [7, 19971130, 1]
ar_EG: setting to [7, 19971130, 1]
ar_IN: setting to [7, 19971130, 1]
ar_IQ: setting to [7, 19971130, 1]
ar_JO: setting to [7, 19971130, 1]
ar_KW: setting to [7, 19971130, 1]
ar_LB: setting to [7, 19971130, 1]
ar_LY: setting to [7, 19971130, 1]
ar_MA: setting to [7, 19971130, 1]
ar_OM: setting to [7, 19971130, 1]
ar_QA: setting to [7, 19971130, 1]
ar_SA: setting to [7, 19971130, 1]
ar_SD: setting to [7, 19971130, 1]
ar_SS: setting to [7, 19971130, 1]
ar_SY: setting to [7, 19971130, 1]
ar_TN: setting to [7, 19971130, 1]
ar_YE: setting to [7, 19971130, 1]
as_IN: setting to [7, 19971130, 1]
ast_ES: setting to [7, 19971130, 4]
az_AZ: setting to [7, 19971130, 1]
be_BY: setting to [7, 19971130, 1]
be_BY@latin: setting to [7, 19971130, 1]
ber_DZ: setting to [7, 19971130, 1]
ber_MA: setting to [7, 19971130, 1]
bg_BG: setting to [7, 19971130, 4]
bhb_IN: setting to [7, 19971130, 1]
bho_IN: setting to [7, 19971130, 1]
bn_BD: setting to [7, 19971130, 1]
bn_IN: setting to [7, 19971130, 1]
bo_CN: setting to [7, 19971130, 1]
br_FR: setting to [7, 19971130, 4]
brx_IN: setting to [7, 19971130, 1]
bs_BA: setting to [7, 19971130, 1]
byn_ER: setting to [7, 19971130, 1]
ca_AD: setting to [7, 19971130, 4]
ca_ES: setting to [7, 19971130, 4]
ca_ES@euro: setting to [7, 19971130, 4]
ca_FR: setting to [7, 19971130, 4]
ca_IT: setting to [7, 19971130, 4]
ce_RU: setting to [7, 19971130, 1]
cmn_TW: setting to [7, 19971130, 1]
crh_UA: setting to [7, 19971130, 1]
cv_RU: setting to [7, 19971130, 1]
cy_GB: setting to [7, 19971130, 4]
de_BE: setting to [7, 19971130, 4]
de_LU: setting to [7, 19971130, 4]
doi_IN: setting to [7, 19971130, 1]
dv_MV: setting to [7, 19971130, 1]
dz_BT: setting to [7, 19971130, 1]
el_GR: setting to [7, 19971130, 4]
el_GR@euro: setting to [7, 19971130, 4]
en_AU: setting to [7, 19971130, 1]
en_BW: setting to [7, 19971130, 1]
en_CA: setting to [7, 19971130, 1]
en_HK: setting to [7, 19971130, 1]
en_IE: setting to [7, 19971130, 4]
en_IN: setting to [7, 19971130, 1]
en_NG: setting to [7, 19971130, 1]
en_NZ: setting to [7, 19971130, 1]
en_PH: setting to [7, 19971130, 1]
en_SG: setting to [7, 19971130, 1]
en_ZA: setting to [7, 19971130, 1]
en_ZW: setting to [7, 19971130, 1]
es_AR: setting to [7, 19971130, 1]
es_BO: setting to [7, 19971130, 1]
es_CL: setting to [7, 19971130, 1]
es_CO: setting to [7, 19971130, 1]
es_CR: setting to [7, 19971130, 1]
es_DO: setting to [7, 19971130, 1]
es_EC: setting to [7, 19971130, 1]
es_ES@euro: setting to [7, 19971130, 4]
es_GT: setting to [7, 19971130, 1]
es_HN: setting to [7, 19971130, 1]
es_MX: setting to [7, 19971130, 1]
es_NI: setting to [7, 19971130, 1]
es_PA: setting to [7, 19971130, 1]
es_PE: setting to [7, 19971130, 1]
es_PR: setting to [7, 19971130, 1]
es_PY: setting to [7, 19971130, 1]
es_SV: setting to [7, 19971130, 1]
es_US: setting to [7, 19971130, 1]
es_UY: setting to [7, 19971130, 1]
es_VE: setting to [7, 19971130, 1]
eu_ES: setting to [7, 19971130, 4]
fa_IR: setting to [7, 19971130, 1]
fil_PH: setting to [7, 19971130, 1]
fo_FO: setting to [7, 19971130, 4]
fr_CA: setting to [7, 19971130, 1]
fr_CH: setting to [7, 19971130, 4]
fr_LU: setting to [7, 19971130, 4]
fy_NL: setting to [7, 19971130, 4]
ga_IE: setting to [7, 19971130, 4]
gd_GB: setting to [7, 19971130, 4]
gez_ER: setting to [7, 19971130, 1]
gez_ET: setting to [7, 19971130, 1]
gl_ES: setting to [7, 19971130, 4]
gu_IN: setting to [7, 19971130, 1]
gv_GB: setting to [7, 19971130, 4]
hak_TW: setting to [7, 19971130, 1]
ha_NG: setting to [7, 19971130, 1]
he_IL: setting to [7, 19971130, 1]
hi_IN: setting to [7, 19971130, 1]
hne_IN: setting to [7, 19971130, 1]
hr_HR: setting to [7, 19971130, 1]
hy_AM: setting to [7, 19971130, 1]
id_ID: setting to [7, 19971130, 1]
ig_NG: setting to [7, 19971130, 1]
ik_CA: setting to [7, 19971130, 1]
is_IS: setting to [7, 19971130, 4]
it_CH: setting to [7, 19971130, 4]
it_IT: setting to [7, 19971130, 4]
it_IT@euro: setting to [7, 19971130, 4]
iu_CA: setting to [7, 19971130, 1]
ja_JP: setting to [7, 19971130, 1]
ka_GE: setting to [7, 19971130, 1]
kk_KZ: setting to [7, 19971130, 1]
kl_GL: setting to [7, 19971130, 1]
km_KH: setting to [7, 19971130, 1]
kn_IN: setting to [7, 19971130, 1]
kok_IN: setting to [7, 19971130, 1]
ko_KR: setting to [7, 19971130, 1]
ks_IN: setting to [7, 19971130, 1]
ks_IN@devanagari: setting to [7, 19971130, 1]
ku_TR: setting to [7, 19971130, 1]
kw_GB: setting to [7, 19971130, 4]
ky_KG: setting to [7, 19971130, 1]
lg_UG: setting to [7, 19971130, 1]
lij_IT: setting to [7, 19971130, 4]
lo_LA: setting to [7, 19971130, 1]
lt_LT: setting to [7, 19971130, 4]
lv_LV: setting to [7, 19971130, 1]
lzh_TW: setting to [7, 19971130, 1]
mag_IN: setting to [7, 19971130, 1]
mai_IN: setting to [7, 19971130, 1]
mg_MG: setting to [7, 19971130, 1]
mhr_RU: setting to [7, 19971130, 1]
mi_NZ: setting to [7, 19971130, 1]
ml_IN: setting to [7, 19971130, 1]
mni_IN: setting to [7, 19971130, 1]
mn_MN: setting to [7, 19971130, 1]
mr_IN: setting to [7, 19971130, 1]
ms_MY: setting to [7, 19971130, 1]
mt_MT: setting to [7, 19971130, 1]
my_MM: setting to [7, 19971130, 1]
nan_TW: setting to [7, 19971130, 1]
nan_TW@latin: setting to [7, 19971130, 1]
ne_NP: setting to [7, 19971130, 1]
nhn_MX: setting to [7, 19971130, 1]
niu_NU: setting to [7, 19971130, 1]
niu_NZ: setting to [7, 19971130, 1]
nl_BE: setting to [7, 19971130, 4]
nl_BE@euro: setting to [7, 19971130, 4]
nr_ZA: setting to [7, 19971130, 1]
nso_ZA: setting to [7, 19971130, 1]
oc_FR: setting to [7, 19971130, 4]
om_ET: setting to [7, 19971130, 1]
om_KE: setting to [7, 19971130, 1]
or_IN: setting to [7, 19971130, 1]
os_RU: setting to [7, 19971130, 1]
pa_IN: setting to [7, 19971130, 1]
pap_AW: setting to [7, 19971130, 1]
pap_CW: setting to [7, 19971130, 1]
pa_PK: setting to [7, 19971130, 1]
ps_AF: setting to [7, 19971130, 1]
pt_BR: setting to [7, 19971130, 1]
pt_PT: setting to [7, 19971130, 4]
pt_PT@euro: setting to [7, 19971130, 4]
raj_IN: setting to [7, 19971130, 1]
ro_RO: setting to [7, 19971130, 1]
ru_RU: setting to [7, 19971130, 1]
ru_UA: setting to [7, 19971130, 1]
rw_RW: setting to [7, 19971130, 1]
sa_IN: setting to [7, 19971130, 1]
sat_IN: setting to [7, 19971130, 1]
sd_IN: setting to [7, 19971130, 1]
sd_IN@devanagari: setting to [7, 19971130, 1]
se_NO: setting to [7, 19971130, 4]
shs_CA: setting to [7, 19971130, 1]
sid_ET: setting to [7, 19971130, 1]
si_LK: setting to [7, 19971130, 1]
sl_SI: setting to [7, 19971130, 1]
so_DJ: setting to [7, 19971130, 1]
so_ET: setting to [7, 19971130, 1]
so_KE: setting to [7, 19971130, 1]
so_SO: setting to [7, 19971130, 1]
sq_AL: setting to [7, 19971130, 1]
ss_ZA: setting to [7, 19971130, 1]
st_ZA: setting to [7, 19971130, 1]
sv_FI: setting to [7, 19971130, 4]
sv_SE: setting to [7, 19971130, 4]
ta_IN: setting to [7, 19971130, 1]
tcy_IN: setting to [7, 19971130, 1]
te_IN: setting to [7, 19971130, 1]
tg_TJ: setting to [7, 19971130, 1]
the_NP: setting to [7, 19971130, 1]
th_TH: setting to [7, 19971130, 1]
ti_ER: setting to [7, 19971130, 1]
ti_ET: setting to [7, 19971130, 1]
tig_ER: setting to [7, 19971130, 1]
tk_TM: setting to [7, 19971130, 1]
tl_PH: setting to [7, 19971130, 1]
tn_ZA: setting to [7, 19971130, 1]
tr_CY: setting to [7, 19971130, 1]
tr_TR: setting to [7, 19971130, 1]
ts_ZA: setting to [7, 19971130, 1]
tt_RU: setting to [7, 19971130, 1]
tt_RU@iqtelif: setting to [7, 19971130, 1]
ug_CN: setting to [7, 19971130, 1]
ur_IN: setting to [7, 19971130, 1]
ur_PK: setting to [7, 19971130, 1]
uz_UZ: setting to [7, 19971130, 1]
uz_UZ@cyrillic: setting to [7, 19971130, 1]
ve_ZA: setting to [7, 19971130, 1]
vi_VN: setting to [7, 19971130, 1]
wa_BE: setting to [7, 19971130, 4]
wal_ET: setting to [7, 19971130, 1]
wo_SN: setting to [7, 19971130, 1]
xh_ZA: setting to [7, 19971130, 1]
yi_US: setting to [7, 19971130, 1]
yo_NG: setting to [7, 19971130, 1]
yue_HK: setting to [7, 19971130, 1]
zh_CN: setting to [7, 19971130, 1]
zh_HK: setting to [7, 19971130, 1]
zh_SG: setting to [7, 19971130, 1]
zh_TW: setting to [7, 19971130, 1]
zu_ZA: setting to [7, 19971130, 1]
Finally, set first_weekday in all the locales that were omitting it
and wanted something other than the default of 1.
aa_DJ: setting to 7
aa_ER: setting to 2
aa_ER@saaho: setting to 2
ar_AE: setting to 7
ar_BH: setting to 7
ar_DZ: setting to 7
ar_EG: setting to 7
ar_IQ: setting to 7
ar_JO: setting to 7
ar_KW: setting to 7
ar_LB: setting to 2
ar_LY: setting to 7
ar_MA: setting to 7
ar_OM: setting to 7
ar_QA: setting to 7
ar_SD: setting to 7
ar_SS: setting to 2
ar_SY: setting to 7
az_AZ: setting to 2
be_BY: setting to 2
be_BY@latin: setting to 2
ber_DZ: setting to 7
ber_MA: setting to 7
bn_BD: setting to 6
bs_BA: setting to 2
byn_ER: setting to 2
dv_MV: setting to 6
en_NG: setting to 2
es_BO: setting to 2
es_CL: setting to 2
es_EC: setting to 2
es_UY: setting to 2
fo_FO: setting to 2
fr_CH: setting to 2
gd_GB: setting to 2
gez_ER: setting to 2
ha_NG: setting to 2
hr_HR: setting to 2
hy_AM: setting to 2
ig_NG: setting to 2
is_IS: setting to 2
it_CH: setting to 2
ka_GE: setting to 2
kk_KZ: setting to 2
kl_GL: setting to 2
ku_TR: setting to 2
ky_KG: setting to 2
lg_UG: setting to 2
mg_MG: setting to 2
mn_MN: setting to 2
ms_MY: setting to 2
niu_NU: setting to 2
pap_AW: setting to 2
pap_CW: setting to 2
pt_PT: setting to 2
pt_PT@euro: setting to 2
rw_RW: setting to 2
se_NO: setting to 2
si_LK: setting to 2
so_DJ: setting to 7
so_SO: setting to 2
sq_AL: setting to 2
tg_TJ: setting to 2
ti_ER: setting to 2
tig_ER: setting to 2
tk_TM: setting to 2
tt_RU: setting to 2
tt_RU@iqtelif: setting to 2
uz_UZ: setting to 2
uz_UZ@cyrillic: setting to 2
vi_VN: setting to 2
wo_SN: setting to 2
yo_NG: setting to 2
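To read the first_weekday values above, it helps to translate them into
day names. The sketch below assumes that first_weekday counts from the
weekday of the base date in the week field, with 1 meaning the base day
itself; that interpretation and the helper name first_weekday_name are
assumptions made for illustration.

```python
from datetime import date, timedelta

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def first_weekday_name(first_weekday, base=19971130):
    """Map a 1-based first_weekday value to a day name.

    Assumption: 1 refers to the weekday of the base date (YYYYMMDD),
    2 to the day after it, and so on.
    """
    year, rest = divmod(base, 10000)
    month, day = divmod(rest, 100)
    target = date(year, month, day) + timedelta(days=first_weekday - 1)
    return DAY_NAMES[target.weekday()]


for value in (1, 2, 6, 7):
    print(value, first_weekday_name(value))
```

Under that assumption, and with the common base date of 19971130, the
values 2, 6, and 7 used in the list correspond to Monday, Friday, and
Saturday.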
This updates a few locales based on CLDR v29 data. I've verified most by
hand; the rest I know to be correct.
For int_curr_symbol, it should be 3 characters followed by a space:
ar_SS: changing SDG to SSP
bem_ZM: changing ZMK to ZMW
dz_BT: changing BTN to BTN # Just changing " " to "<U0020>".
en_ZW: changing ZWD to USD
es_SV: changing SVC to USD
lv_LV: changing LVL to EUR
ne_NP: changing INR to NPR
pap_AW: changing ANG to AWG
the_NP: changing INR to NPR
Some of these require updates to iso-4217.def.
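The "3 characters followed by a space" rule for int_curr_symbol is easy
to check mechanically. The following is a rough sketch, not the check
localedef performs; is_valid_int_curr_symbol is a hypothetical name, and
it assumes the three characters are an uppercase ISO 4217 alphabetic
code.

```python
import re

# Three uppercase ASCII letters (ISO 4217 style) and one trailing space.
_INT_CURR_SYMBOL = re.compile(r"[A-Z]{3} ")


def is_valid_int_curr_symbol(value):
    """Return True if value looks like 'USD ' (code plus one space)."""
    return _INT_CURR_SYMBOL.fullmatch(value) is not None


for candidate in ("SSP ", "SSP", "ssp ", "USD  "):
    print(repr(candidate), is_valid_int_curr_symbol(candidate))
```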
For currency_symbol, it should be the standard/localized symbol name:
aa_DJ: changing $ to Fdj
ar_SA: changing ريال to ر.س
ar_SS: changing ج.س. to £
az_AZ: changing man. to ₼
bg_BG: changing лв to лв.
ce_RU: changing руб to ₽
crh_UA: changing gr to ₴
cv_RU: changing t to ₽
de_CH: changing Fr. to CHF
dz_BT: changing དངུལ་ཀྲམ་ to Nu.
en_BW: changing Pu to P
en_DK: changing ¤ to kr.
en_PH: changing Php to ₱
en_ZW: changing Z$ to $
es_BO: changing $b to Bs
es_DO: changing $ to RD$
es_HN: changing L. to L
es_PA: changing B/ to B/.
es_SV: changing ₡ to $
fil_PH: changing PhP to ₱
he_IL: changing שח to ₪
hy_AM: changing Դ to ֏
ka_GE: changing ლ to ₾
kk_KZ: changing тг to ₸
ko_KR: changing ₩ to ₩
lg_UG: changing /- to USh
lv_LV: changing Ls to €
mg_MG: changing AR to Ar
mhr_RU: changing ТЕҤ to ₽
my_MM: changing Ks to K
os_RU: changing сом to ₽
pap_AW: changing f to ƒ
pap_CW: changing f to ƒ
ps_AF: changing افغانۍ to ؋
rw_RW: changing Frw to FRw
ru_RU: changing руб to ₽
ru_UA: changing гр to ₴
sd_IN@devanagari: changing रु to ₹
se_NO: changing ru to kr
si_LK: changing ₨ to රු
so_SO: changing $ to S
sq_AL: changing Lek to L
ti_ER: changing $ to Nfk
ti_ET: changing $ to Br
tl_PH: changing PhP to ₱
tr_TR: changing TL to ₺
tt_RU: changing руб to ₽
tt_RU@iqtelif: changing sum to ₽
uz_UZ: changing so'm to soʻm
Note: Some of the characters might not render as they're still quite new
in the Unicode database.
The ISO 30112 standard defines the valid values for the category
keyword as only a few options:
posix:1993
i18n:2004
i18n:2012
The vast majority of locales had changed the "i18n" string to the
name of their own locale (e.g. "ak_GH:2013") as well as tweaking the
date (presumably thinking it should be the date of submission).
Convert all of them to "i18n:2012" for consistency. A follow-up
change will update localedef to actually check/validate the field.
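A check of the category keyword against the values listed above could
look roughly like this. It is only a sketch of the idea in Python, not
the validation that localedef will get; the names VALID_CATEGORY_VALUES
and is_valid_category are invented for illustration.

```python
# The values listed above as valid for the category keyword.
VALID_CATEGORY_VALUES = ("posix:1993", "i18n:2004", "i18n:2012")


def is_valid_category(value):
    """Return True if value is one of the listed category strings."""
    return value in VALID_CATEGORY_VALUES


for value in ("i18n:2012", "posix:1993", "ak_GH:2013"):
    print(value, is_valid_category(value))
```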
These entries have been checked mostly against Wikipedia, but also using
the sources it cites (like the UN and other treaty sources).
Fix incorrect values:
en_BW: changing RB to BW
kl_GL: changing GRO to KN
km_KH: changing LAO to KH
my_MM: changing BA to MYA
oc_FR: changing F to F
tr_CY: changing TR to CY
wae_CH: changing DH to CH
Add missing entries:
aa_DJ: changing to DJI
ak_GH: changing to GH
ar_OM: changing to OM
ar_SS: changing to SUD
ar_YE: changing to YAR
bo_CN: changing to CHN
cmn_TW: changing to RC
dv_MV: changing to MV
dz_BT: changing to BHT
en_AG: changing to AG
es_HN: changing to HN
es_PR: changing to PR
hak_TW: changing to RC
lzh_TW: changing to RC
nan_TW: changing to RC
nan_TW@latin: changing to RC
nl_AW: changing to AUA
pap_AW: changing to AUA
so_DJ: changing to DJI
the_NP: changing to NEP
ug_CN: changing to CHN
yue_HK: changing to HK
zh_CN: changing to CHN
zh_HK: changing to HK
zh_TW: changing to RC
There are only two page sizes that locales use: US-Letter and A4.
For the former, move to copying the en_US locale, while for the
latter, move to copying the i18n locale. This lets us clean up
all the stray comments like FIXME.
There should be no functional differences here.
There are only two measurement systems that locales use: US and metric.
For the former, move to copying the en_US locale, while for the latter,
move to copying the i18n locale. This lets us clean up all the stray
comments like FIXME.
There should be no functional differences here.
There's no real value in populating this field when it's the same as the
default POSIX setting, so drop it from most locales so it's clear what's
going on.
This updates a bunch of locales based on CLDR v28 data:
ar_SS: int_prefix: changing 249 to 211
bn_BD: int_prefix: changing 88 to 880
dz_BT: int_prefix: changing 66 to 975
en_HK: int_prefix: changing to 852
en_PH: int_prefix: changing to 63
en_SG: int_prefix: changing to 65
es_DO: int_prefix: changing 1809 to 1
es_PA: int_prefix: changing 502 to 507
es_PR: int_prefix: changing 1787 to 1
km_KH: int_prefix: changing 856 to 855
mt_MT: int_prefix: changing to 356
ne_NP: int_prefix: changing 91 to 977
pap_AW: int_prefix: changing 599 to 297
the_NP: int_prefix: changing 91 to 977
tk_TM: int_prefix: changing to 993
uz_UZ: int_prefix: changing 27 to 998
zh_SG: int_prefix: changing to 65
I've also checked these against https://countrycode.org/.
Note: the Dominican Republic (DO) and Puerto Rico (PR) updates are
correct: they both use +1. Historically, DO had a single area code of
809 and PR of 787, which is why they were listed as such, but they
have since expanded into 829 and 939 respectively, so using the
four-digit value is definitely incorrect now.
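To illustrate why the four-digit prefix breaks, consider joining
int_prefix with a national number. The helper below is hypothetical and
deliberately simplistic (real formatting is driven by the locale's
telephone format fields), and the subscriber number is a made-up
placeholder.

```python
def international_number(int_prefix, national_number):
    """Hypothetical helper: prepend '+' and the calling code."""
    return "+" + int_prefix + " " + national_number


# A Dominican Republic number in the newer 829 area code.
national = "829 555 0100"

print(international_number("1", national))     # new int_prefix
print(international_number("1809", national))  # old int_prefix
```

With the old value, the 809 area code is baked into the prefix, so a
number in area code 829 ends up with both codes in it.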