Commit Graph

33 Commits

Author SHA1 Message Date
Mike FABIAN
1597385481 Adapt collation in several locales to the new iso14651_t1_common file
[BZ #22550] - es_ES locale (and other es_* locales): collation should
treat ñ as a primary different character, sync the collation
for Spanish with CLDR
[BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan)

	* localedata/Makefile: Add new test files.
	* localedata/lv_LV.UTF-8.in: Adapt test file to new collation order.
	* localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation order.
	* localedata/uk_UA.UTF-8.in: Adapt test file to new collation order.
	* localedata/am_ET.UTF-8.in: New test file.
	* localedata/az_AZ.UTF-8.in: Likewise.
	* localedata/be_BY.UTF-8.in: Likewise.
	* localedata/ber_DZ.UTF-8.in: Likewise.
	* localedata/ber_MA.UTF-8.in: Likewise.
	* localedata/bg_BG.UTF-8.in: Likewise.
	* localedata/br_FR.UTF-8.in: Likewise.
	* localedata/cmn_TW.UTF-8.in: Likewise.
	* localedata/crh_UA.UTF-8.in: Likewise.
	* localedata/csb_PL.UTF-8.in: Likewise.
	* localedata/cv_RU.UTF-8.in: Likewise.
	* localedata/cy_GB.UTF-8.in: Likewise.
	* localedata/dz_BT.UTF-8.in: Likewise.
	* localedata/eo.UTF-8.in: Likewise.
	* localedata/es_ES.UTF-8.in: Likewise.
	* localedata/fa_IR.UTF-8.in: Likewise.
	* localedata/fi_FI.UTF-8.in: Likewise.
	* localedata/fil_PH.UTF-8.in: Likewise.
	* localedata/fur_IT.UTF-8.in: Likewise.
	* localedata/gez_ER.UTF-8@abegede.in: Likewise.
	* localedata/ha_NG.UTF-8.in: Likewise.
	* localedata/ig_NG.UTF-8.in: Likewise.
	* localedata/ik_CA.UTF-8.in: Likewise.
	* localedata/kk_KZ.UTF-8.in: Likewise.
	* localedata/ku_TR.UTF-8.in: Likewise.
	* localedata/ky_KG.UTF-8.in: Likewise.
	* localedata/ln_CD.UTF-8.in: Likewise.
	* localedata/mi_NZ.UTF-8.in: Likewise.
	* localedata/ml_IN.UTF-8.in: Likewise.
	* localedata/mn_MN.UTF-8.in: Likewise.
	* localedata/mr_IN.UTF-8.in: Likewise.
	* localedata/mt_MT.UTF-8.in: Likewise.
	* localedata/nb_NO.UTF-8.in: Likewise.
	* localedata/om_KE.UTF-8.in: Likewise.
	* localedata/os_RU.UTF-8.in: Likewise.
	* localedata/ps_AF.UTF-8.in: Likewise.
	* localedata/ro_RO.UTF-8.in: Likewise.
	* localedata/ru_RU.UTF-8.in: Likewise.
	* localedata/sc_IT.UTF-8.in: Likewise.
	* localedata/se_NO.UTF-8.in: Likewise.
	* localedata/sq_AL.UTF-8.in: Likewise.
	* localedata/sv_SE.UTF-8.in: Likewise.
	* localedata/szl_PL.UTF-8.in: Likewise.
	* localedata/tg_TJ.UTF-8.in: Likewise.
	* localedata/tk_TM.UTF-8.in: Likewise.
	* localedata/tt_RU.UTF-8.in: Likewise.
	* localedata/tt_RU.UTF-8@iqtelif.in: Likewise.
	* localedata/ug_CN.UTF-8.in: Likewise.
	* localedata/uz_UZ.UTF-8.in: Likewise.
	* localedata/vi_VN.UTF-8.in: Likewise.
	* localedata/yi_US.UTF-8.in: Likewise.
	* localedata/yo_NG.UTF-8.in: Likewise.
	* localedata/zh_CN.UTF-8.in: Likewise.
	* localedata/locales/am_ET: Adapt collation rules to new iso14651_t1_common
        file and fix bugs in the collation.
	* localedata/locales/az_AZ: Likewise.
	* localedata/locales/be_BY: Likewise.
	* localedata/locales/ber_DZ: Likewise.
	* localedata/locales/ber_MA: Likewise.
	* localedata/locales/bg_BG: Likewise.
	* localedata/locales/br_FR: Likewise.
	* localedata/locales/br_FR@euro: Likewise.
	* localedata/locales/ca_ES: Likewise.
	* localedata/locales/cns11643_stroke: Likewise.
	* localedata/locales/crh_UA: Likewise.
	* localedata/locales/cs_CZ: Likewise.
	* localedata/locales/csb_PL: Likewise.
	* localedata/locales/cv_RU: Likewise.
	* localedata/locales/cy_GB: Likewise.
	* localedata/locales/da_DK: Likewise.
	* localedata/locales/dz_BT: Likewise.
	* localedata/locales/en_CA: Likewise.
	* localedata/locales/eo: Likewise.
	* localedata/locales/es_CU: Likewise.
	* localedata/locales/es_EC: Likewise.
	* localedata/locales/es_ES: Likewise.
	* localedata/locales/es_US: Likewise.
	* localedata/locales/et_EE: Likewise.
	* localedata/locales/fa_IR: Likewise.
	* localedata/locales/fi_FI: Likewise.
	* localedata/locales/fil_PH: Likewise.
	* localedata/locales/fur_IT: Likewise.
	* localedata/locales/gez_ER@abegede: Likewise.
	* localedata/locales/ha_NG: Likewise.
	* localedata/locales/hr_HR: Likewise.
	* localedata/locales/hsb_DE: Likewise.
	* localedata/locales/hu_HU: Likewise.
	* localedata/locales/ig_NG: Likewise.
	* localedata/locales/ik_CA: Likewise.
	* localedata/locales/is_IS: Likewise.
	* localedata/locales/iso14651_t1_pinyin: Likewise.
	* localedata/locales/kk_KZ: Likewise.
	* localedata/locales/ku_TR: Likewise.
	* localedata/locales/ky_KG: Likewise.
	* localedata/locales/ln_CD: Likewise.
	* localedata/locales/lt_LT: Likewise.
	* localedata/locales/lv_LV: Likewise.
	* localedata/locales/mi_NZ: Likewise.
	* localedata/locales/ml_IN: Likewise.
	* localedata/locales/mn_MN: Likewise.
	* localedata/locales/mr_IN: Likewise.
	* localedata/locales/mt_MT: Likewise.
	* localedata/locales/nb_NO: Likewise.
	* localedata/locales/om_KE: Likewise.
	* localedata/locales/os_RU: Likewise.
	* localedata/locales/pl_PL: Likewise.
	* localedata/locales/ps_AF: Likewise.
	* localedata/locales/ro_RO: Likewise.
	* localedata/locales/ru_RU: Likewise.
	* localedata/locales/ru_UA: Likewise.
	* localedata/locales/sc_IT: Likewise.
	* localedata/locales/se_NO: Likewise.
	* localedata/locales/si_LK: Likewise.
	* localedata/locales/sq_AL: Likewise.
	* localedata/locales/sv_FI: Likewise.
	* localedata/locales/sv_FI@euro: Likewise.
	* localedata/locales/sv_SE: Likewise.
	* localedata/locales/szl_PL: Likewise.
	* localedata/locales/tg_TJ: Likewise.
	* localedata/locales/ti_ER: Likewise.
	* localedata/locales/tk_TM: Likewise.
	* localedata/locales/tl_PH: Likewise.
	* localedata/locales/tr_TR: Likewise.
	* localedata/locales/tt_RU: Likewise.
	* localedata/locales/tt_RU@iqtelif: Likewise.
	* localedata/locales/ug_CN: Likewise.
	* localedata/locales/uk_UA: Likewise.
	* localedata/locales/uz_UZ: Likewise.
	* localedata/locales/uz_UZ@cyrillic: Likewise.
	* localedata/locales/vi_VN: Likewise.
	* localedata/locales/yi_US: Likewise.
	* localedata/locales/yo_NG: Likewise.
2018-02-27 17:47:50 +01:00
Mike FABIAN
df74ef786f Add sections for various scripts to the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Add sections for various
	scripts to the iso14651_t1_common file.
2018-02-27 16:52:54 +01:00
Mike FABIAN
d5adfbadd4 iso14651_t1_common: make the fourth level the codepoint for characters which are ignorable on all 4 levels
Entries for characters which have “IGNORE” on all 4 levels like:

 <U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)

are changed into:

 <U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)

i.e. putting the code point of the character into the fourth level
instead of “IGNORE”. Without that change, all such characters
would compare equal which would make a wcscoll test case fail.
It is better to have a clearly defined sort order even for characters
like this so it is good to use the code point as a tie-break.

	* localedata/locales/iso14651_t1_common: Use the code point of a
        character in the fourth collation level instead of IGNORE for all
        entries which have IGNORE on all 4 levels.
2018-02-27 16:50:30 +01:00
Mike FABIAN
5f5a961091 Add convenience symbols like <AFTER-A>, <BEFORE-A> to iso14651_t1_common
* localedata/locales/iso14651_t1_common: Add some convenient collation
	symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
	rules similar to those in CLDR.
2018-02-27 16:47:22 +01:00
Mike FABIAN
8a97e9002f Fixing syntax errors after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: The new version of this
	file downloaded from ISO contained several syntax errors which
	are fixed by this patch.
2018-02-27 16:45:30 +01:00
Mike FABIAN
bbdd2fba7d iso14651_t1_common: <U\([0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]\)> → <U000\1>
* localedata/locales/iso14651_t1_common: replace all <U.....>
	with <U000.....> because glibc understands only 4 digit or 8 digit
2018-02-27 16:44:03 +01:00
Mike FABIAN
1569e551af Necessary changes after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Necessary changes
	to make the file downloaded from ISO usable by glibc.
2018-02-27 16:42:14 +01:00
Mike FABIAN
9479b6d5e0 Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095]
[BZ #14095] - Review / update collation data from Unicode / ISO 14651

File downloaded from:
http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt

Updating this file alone is not enough, there are problems in the new
file which need to be fixed and the collation rules for many locales
need to be adapted. This is done by the following patches.

This update also fixes the problem that many characters are treated as
identical when sorting because they were not yet in the old
iso14651_t1_common file, see:

https://bugzilla.redhat.com/show_bug.cgi?id=1336308
- Infinite (∞) and empty set (∅) are treated as if they were the same character by sort and uniq

	[BZ #14095]
	* localedata/locales/iso14651_t1_common: Update file to
	latest version from ISO (ISO14651_2016_TABLE1_en.txt).
2018-02-27 16:36:31 +01:00
Alexandre Oliva
8da25eec0a Collation fix: make forward accent sorting the default [BZ #17750]
[BZ #17750]
	* Makefile: add fr_CA.UTF-8 to test-input and LOCALES.
	* localedata/fr_CA.UTF-8.in: New file with test data for backward
	accents sorting.
	* localedata/fr_FR.UTF-8.in: Fix test data for forward accents
	sorting.
	* localedata/locales/cs_CZ (LC_COLLATE): Remove “define DIACRIT_FORWARD”
	* localedata/locales/de_DE (LC_COLLATE): Likewise.
	* localedata/locales/hu_HU (LC_COLLATE): Likewise.
	* localedata/locales/lb_LU (LC_COLLATE): Likewise.
	* localedata/locales/yuw_PG (LC_COLLATE): Likewise.
	* localedata/locales/fr_CA (LC_COLLATE): Add “define DIACRIT_BACKWARD”
	* localedata/locales/iso14651_t1_common: Use “ifdef DIACRIT_FORWARD”
	instead of “ifdef DIACRIT_BACKWARD”.

The only locale which currently needs backward accents sorting is fr_CA.
Therefore, forward accents sorting should be the default.

Before this patch, backwards accent sorting was the default and all
locales except fr_CA had to use

    define DIACRIT_FORWARD

before

    copy "iso14651_t1"

Most locales didn’t do that and thus got the inappropriate backwards accents sorting
by accident. Now only the fr_CA locale needs to use

    define DIACRIT_BACKWARD

before

    copy "iso14651_t1"

Original patch slightly modified by: Mike FABIAN <mfabian@redhat.com>
2017-11-29 11:56:46 +01:00
Santhosh Thottingal
b05eca0e1d Correct collation rules for Malayalam.
[BZ #19922]
	* locales/iso14651_t1_common: Add collation rules for U+07DA to U+07DF.

	[BZ #19919]
	* locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.
2017-06-11 10:08:37 -04:00
Mike Frysinger
a4cea54b12 localedata: standardize copyright/license information [BZ #11213]
Use the language from the FSF in all locale files to disclaim any
license/copyright on locale data.

See https://sourceware.org/ml/libc-locales/2013-q1/msg00048.html
2016-03-21 02:29:56 -04:00
Ulrich Drepper
b426c80f5f Fix whitespaces 2011-05-15 11:37:52 -04:00
Ulrich Drepper
08ba84136f Move Dzonghka collation rules to common collation rules file 2011-05-15 11:36:07 -04:00
Pravin Satpute
1e5e9ec825 Fix sorting of malayalam letter 'na'. 2010-02-03 03:50:01 -08:00
Ulrich Drepper
6b4f51823c Fix whitespaces. 2010-02-03 03:36:52 -08:00
Pravin Satpute
3e8a75d1b9 Move Tamil collation data to common source file. 2010-02-03 03:32:06 -08:00
Keith Stribley
3c2c4bf6f7 Implement Burmese language locale for Myanmar. 2009-10-30 08:14:02 -07:00
Ulrich Drepper
115a532734 * localedata/locales/bn_BD: Remove comment about missing collation
rules.
	* localedata/locales/iso14651_t1_common: Add Bengali collation rules.
	Patch by Pravin Satpute <psatpute@redhat.com>.
2009-05-04 21:20:20 +00:00
Ulrich Drepper
eee6b14327 [BZ #9759]
* dirent/dirent.h: Adjust prototypes of scandir, scandir64, alphasort,
	alphasort64, versionsort, and versionsort64 to POSIX 2008.
	* dirent/alphasort.c: Adjust implementation to type change.
	* dirent/alphasort64.c: Likewise.
	* dirent/scandir.c: Likewise.
	* dirent/versionsort.c: Likewise.
	* dirent/versionsort64.c: Likewise.
	* sysdeps/wordsize-64/alphasort.c: Add hack to hide alphasort64
	declaration.
	* sysdeps/wordsize-64/versionsort.c: Add hack to hide versionsort64
	declaration.
2009-03-15 21:33:19 +00:00
Ulrich Drepper
638633961d * locales/iso14651_t1_common: Add rules for sorting Malayalam.
Patch by Santhosh Thottingal <santhosh.thottingal@gmail.com>.
2009-02-11 15:42:53 +00:00
Ulrich Drepper
06057297c4 * locales/iso14651_t1_common: Fix sorting of U+0AB3.
Patch by Pravin Satpute <psatpute@redhat.com>.
2008-12-31 14:58:14 +00:00
Ulrich Drepper
6daf1a2fb1 [BZ #6867]
* sysdeps/powerpc/elf/rtld-global-offsets.sym: Fix typo.
2008-10-31 19:03:31 +00:00
Ulrich Drepper
46026b5589 * locales/iso14651_t1_common: Add Kannada collation support.
Patch by Pravin Satpute <psatpute@redhat.com>.
2008-07-11 17:05:42 +00:00
Ulrich Drepper
99ae13c825 * locales/iso14651_t1_common: Add support for Gurumukhi script.
Patch by Pravin Satpute <psatpute@redhat.com>.
2008-06-24 16:59:47 +00:00
Ulrich Drepper
e564d29d8e Remove U0C0D entry added for Telugu. 2008-05-21 15:13:02 +00:00
Ulrich Drepper
74e1338588 * string/strcasestr.c (CMP_FUNC): Use __strncasecmp, not strncasecmp. 2008-05-16 18:19:18 +00:00
Ulrich Drepper
2f9a1be867 [BZ #6442]
* string/endian.h: Add macros for fixed-size endian conversion.
	* bits/byteswap.h: Allow inclusion from <endian.h>.
	* sysdeps/i386/bits/byteswap.h: Likewise.
	* sysdeps/ia64/bits/byteswap.h: Likewise.
	* sysdeps/s390/bits/byteswap.h: Likewise.
	* sysdeps/x86_64/bits/byteswap.h: Likewise.
	* string/Makefile (tests): Add tst-endian.
	* string/tst-endian.c: New file.
2008-05-15 02:54:33 +00:00
Ulrich Drepper
23c37224d3 Fix first weight for U+1E60, U+1E62, U+1E64, U+1E66, and U+1E68. 2008-04-07 23:53:20 +00:00
Ulrich Drepper
4e0b2dbe54 * locales/iso14651_t1_common: Add support for Gujarati script.
Patch by Pravin Satpute <psatpute@redhat.com>.
2008-03-31 14:15:28 +00:00
Ulrich Drepper
85ac24138b * locales/iso14651_t1_common: Add support for Devanagari script.
* locales/mr_IN: Adjust Devanagari sorting for mr_IN.
	Patch by Pravin Satpute <psatpute@redhat.com>.
2008-03-24 05:08:33 +00:00
Ulrich Drepper
3a054d7ab0 * locale/programs/locfile-token.h: Remove tok_elif, add tok_elifdef
and tok_elifndef.
	* locale/programs/locfile-kw.gperf: Likewise.
	* locale/programs/ld-collate.c: Implement primitive preprocessor.
2007-10-11 02:36:04 +00:00
Ulrich Drepper
592a95ee7c * po/pt_BR.po: Fix typo. 2007-09-30 16:57:15 +00:00
Ulrich Drepper
762422d1bd * locale/programs/ld-collate.c (collate_read): Allow order_start
after copy.
2007-04-28 06:51:26 +00:00