glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-22 21:10:07 +00:00

Author	SHA1	Message	Date
Carlos O'Donell	7cd7d36f1f	Keep expected behaviour for [a-z] and [A-z] (Bug 23393). In commit `9479b6d5e0` we updated all of the collation data to harmonize with the new version of ISO 14651 which is derived from Unicode 9.0.0. This collation update brought with it some changes to locales which were not desirable by some users, in particular it altered the meaning of the locale-dependent-range regular expression, namely [a-z] and [A-Z], and for en_US it caused uppercase letters to be matched by [a-z] for the first time. The matching of uppercase letters by [a-z] is something which is already known to users of other locales which have this property, but this change could cause significant problems to en_US and other similar locales that had never had this change before. Whether this behaviour is desirable or not is contentious and GNU Awk has this to say on the topic: https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html While the POSIX standard also has this further to say: "RE Bracket Expression": http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap09.html "The current standard leaves unspecified the behavior of a range expression outside the POSIX locale. ... As noted above, efforts were made to resolve the differences, but no solution has been found that would be specific enough to allow for portable software while not invalidating existing implementations." In glibc we implement the requirement of ISO POSIX-2:1993 and use collation element order (CEO) to construct the range expression, the API internally is __collseq_table_lookup(). The fact that we use CEO and also have 4-level weights on each collation rule means that we can in practice reorder the collation rules in iso14651_t1_common (the new data) to provide consistent range expression resolution and the weights should maintain the expected total order. 
Therefore this patch does three things: * Reorder the collation rules for the LATIN script in iso14651_t1_common to deinterlace uppercase and lowercase letters in the collation element orders. * Adds new test data en_US.UTF-8.in for sort-test.sh which exercises strcoll* and strxfrm* and ensures the ISO 14651 collation remains. * Add back tests to tst-fnmatch.input and tst-regexloc.c which exercise that [a-z] does not match A or Z. The reordering of the ISO 14651 data is done in an entirely mechanical fashion using the following program attached to the bug: https://sourceware.org/bugzilla/show_bug.cgi?id=23393#c28 It is up for discussion if the iso14651_t1_common data should be refined further to have 3 very tight collation element ranges that include only a-z, A-Z, and 0-9, which would implement the solution sought after in: https://sourceware.org/bugzilla/show_bug.cgi?id=23393#c12 and implemented here: https://www.sourceware.org/ml/libc-alpha/2018-07/msg00854.html No regressions on x86_64. Verified that removal of the iso14651_t1_common change causes tst-fnmatch to regress with: 422: fnmatch ("[a-z]", "A", 0) = 0 (FAIL, expected FNM_NOMATCH) * ... 425: fnmatch ("[A-Z]", "z", 0) = 0 (FAIL, expected FNM_NOMATCH) *	2018-07-25 17:00:45 -04:00
Mike FABIAN	1597385481	Adapt collation in several locales to the new iso14651_t1_common file [BZ #22550] - es_ES locale (and other es_* locales): collation should treat ñ as a primary different character, sync the collation for Spanish with CLDR [BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan) * localedata/Makefile: Add new test files. * localedata/lv_LV.UTF-8.in: Adapt test file to new collation order. * localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation order. * localedata/uk_UA.UTF-8.in: Adapt test file to new collation order. * localedata/am_ET.UTF-8.in: New test file. * localedata/az_AZ.UTF-8.in: Likewise. * localedata/be_BY.UTF-8.in: Likewise. * localedata/ber_DZ.UTF-8.in: Likewise. * localedata/ber_MA.UTF-8.in: Likewise. * localedata/bg_BG.UTF-8.in: Likewise. * localedata/br_FR.UTF-8.in: Likewise. * localedata/cmn_TW.UTF-8.in: Likewise. * localedata/crh_UA.UTF-8.in: Likewise. * localedata/csb_PL.UTF-8.in: Likewise. * localedata/cv_RU.UTF-8.in: Likewise. * localedata/cy_GB.UTF-8.in: Likewise. * localedata/dz_BT.UTF-8.in: Likewise. * localedata/eo.UTF-8.in: Likewise. * localedata/es_ES.UTF-8.in: Likewise. * localedata/fa_IR.UTF-8.in: Likewise. * localedata/fi_FI.UTF-8.in: Likewise. * localedata/fil_PH.UTF-8.in: Likewise. * localedata/fur_IT.UTF-8.in: Likewise. * localedata/gez_ER.UTF-8@abegede.in: Likewise. * localedata/ha_NG.UTF-8.in: Likewise. * localedata/ig_NG.UTF-8.in: Likewise. * localedata/ik_CA.UTF-8.in: Likewise. * localedata/kk_KZ.UTF-8.in: Likewise. * localedata/ku_TR.UTF-8.in: Likewise. * localedata/ky_KG.UTF-8.in: Likewise. * localedata/ln_CD.UTF-8.in: Likewise. * localedata/mi_NZ.UTF-8.in: Likewise. * localedata/ml_IN.UTF-8.in: Likewise. * localedata/mn_MN.UTF-8.in: Likewise. * localedata/mr_IN.UTF-8.in: Likewise. * localedata/mt_MT.UTF-8.in: Likewise. * localedata/nb_NO.UTF-8.in: Likewise. * localedata/om_KE.UTF-8.in: Likewise. * localedata/os_RU.UTF-8.in: Likewise. * localedata/ps_AF.UTF-8.in: Likewise. 
* localedata/ro_RO.UTF-8.in: Likewise. * localedata/ru_RU.UTF-8.in: Likewise. * localedata/sc_IT.UTF-8.in: Likewise. * localedata/se_NO.UTF-8.in: Likewise. * localedata/sq_AL.UTF-8.in: Likewise. * localedata/sv_SE.UTF-8.in: Likewise. * localedata/szl_PL.UTF-8.in: Likewise. * localedata/tg_TJ.UTF-8.in: Likewise. * localedata/tk_TM.UTF-8.in: Likewise. * localedata/tt_RU.UTF-8.in: Likewise. * localedata/tt_RU.UTF-8@iqtelif.in: Likewise. * localedata/ug_CN.UTF-8.in: Likewise. * localedata/uz_UZ.UTF-8.in: Likewise. * localedata/vi_VN.UTF-8.in: Likewise. * localedata/yi_US.UTF-8.in: Likewise. * localedata/yo_NG.UTF-8.in: Likewise. * localedata/zh_CN.UTF-8.in: Likewise. * localedata/locales/am_ET: Adapt collation rules to new iso14651_t1_common file and fix bugs in the collation. * localedata/locales/az_AZ: Likewise. * localedata/locales/be_BY: Likewise. * localedata/locales/ber_DZ: Likewise. * localedata/locales/ber_MA: Likewise. * localedata/locales/bg_BG: Likewise. * localedata/locales/br_FR: Likewise. * localedata/locales/br_FR@euro: Likewise. * localedata/locales/ca_ES: Likewise. * localedata/locales/cns11643_stroke: Likewise. * localedata/locales/crh_UA: Likewise. * localedata/locales/cs_CZ: Likewise. * localedata/locales/csb_PL: Likewise. * localedata/locales/cv_RU: Likewise. * localedata/locales/cy_GB: Likewise. * localedata/locales/da_DK: Likewise. * localedata/locales/dz_BT: Likewise. * localedata/locales/en_CA: Likewise. * localedata/locales/eo: Likewise. * localedata/locales/es_CU: Likewise. * localedata/locales/es_EC: Likewise. * localedata/locales/es_ES: Likewise. * localedata/locales/es_US: Likewise. * localedata/locales/et_EE: Likewise. * localedata/locales/fa_IR: Likewise. * localedata/locales/fi_FI: Likewise. * localedata/locales/fil_PH: Likewise. * localedata/locales/fur_IT: Likewise. * localedata/locales/gez_ER@abegede: Likewise. * localedata/locales/ha_NG: Likewise. * localedata/locales/hr_HR: Likewise. * localedata/locales/hsb_DE: Likewise. 
* localedata/locales/hu_HU: Likewise. * localedata/locales/ig_NG: Likewise. * localedata/locales/ik_CA: Likewise. * localedata/locales/is_IS: Likewise. * localedata/locales/iso14651_t1_pinyin: Likewise. * localedata/locales/kk_KZ: Likewise. * localedata/locales/ku_TR: Likewise. * localedata/locales/ky_KG: Likewise. * localedata/locales/ln_CD: Likewise. * localedata/locales/lt_LT: Likewise. * localedata/locales/lv_LV: Likewise. * localedata/locales/mi_NZ: Likewise. * localedata/locales/ml_IN: Likewise. * localedata/locales/mn_MN: Likewise. * localedata/locales/mr_IN: Likewise. * localedata/locales/mt_MT: Likewise. * localedata/locales/nb_NO: Likewise. * localedata/locales/om_KE: Likewise. * localedata/locales/os_RU: Likewise. * localedata/locales/pl_PL: Likewise. * localedata/locales/ps_AF: Likewise. * localedata/locales/ro_RO: Likewise. * localedata/locales/ru_RU: Likewise. * localedata/locales/ru_UA: Likewise. * localedata/locales/sc_IT: Likewise. * localedata/locales/se_NO: Likewise. * localedata/locales/si_LK: Likewise. * localedata/locales/sq_AL: Likewise. * localedata/locales/sv_FI: Likewise. * localedata/locales/sv_FI@euro: Likewise. * localedata/locales/sv_SE: Likewise. * localedata/locales/szl_PL: Likewise. * localedata/locales/tg_TJ: Likewise. * localedata/locales/ti_ER: Likewise. * localedata/locales/tk_TM: Likewise. * localedata/locales/tl_PH: Likewise. * localedata/locales/tr_TR: Likewise. * localedata/locales/tt_RU: Likewise. * localedata/locales/tt_RU@iqtelif: Likewise. * localedata/locales/ug_CN: Likewise. * localedata/locales/uk_UA: Likewise. * localedata/locales/uz_UZ: Likewise. * localedata/locales/uz_UZ@cyrillic: Likewise. * localedata/locales/vi_VN: Likewise. * localedata/locales/yi_US: Likewise. * localedata/locales/yo_NG: Likewise.	2018-02-27 17:47:50 +01:00
Mike FABIAN	df74ef786f	Add sections for various scripts to the iso14651_t1_common file * localedata/locales/iso14651_t1_common: Add sections for various scripts to the iso14651_t1_common file.	2018-02-27 16:52:54 +01:00
Mike FABIAN	d5adfbadd4	iso14651_t1_common: make the fourth level the codepoint for characters which are ignorable on all 4 levels Entries for characters which have “IGNORE” on all 4 levels like: <U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429) are changed into: <U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429) i.e. putting the code point of the character into the fourth level instead of “IGNORE”. Without that change, all such characters would compare equal which would make a wcscoll test case fail. It is better to have a clearly defined sort order even for characters like this so it is good to use the code point as a tie-break. * localedata/locales/iso14651_t1_common: Use the code point of a character in the fourth collation level instead of IGNORE for all entries which have IGNORE on all 4 levels.	2018-02-27 16:50:30 +01:00
Mike FABIAN	5f5a961091	Add convenience symbols like <AFTER-A>, <BEFORE-A> to iso14651_t1_common * localedata/locales/iso14651_t1_common: Add some convenient collation symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using rules similar to those in CLDR.	2018-02-27 16:47:22 +01:00
Mike FABIAN	8a97e9002f	Fixing syntax errors after updating the iso14651_t1_common file * localedata/locales/iso14651_t1_common: The new version of this file downloaded from ISO contained several syntax errors which are fixed by this patch.	2018-02-27 16:45:30 +01:00
Mike FABIAN	bbdd2fba7d	iso14651_t1_common: <U\([0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]\)> → <U000\1> * localedata/locales/iso14651_t1_common: replace all <U.....> with <U000.....> because glibc understands only 4 digit or 8 digit code points	2018-02-27 16:44:03 +01:00
Mike FABIAN	1569e551af	Necessary changes after updating the iso14651_t1_common file * localedata/locales/iso14651_t1_common: Necessary changes to make the file downloaded from ISO usable by glibc.	2018-02-27 16:42:14 +01:00
Mike FABIAN	9479b6d5e0	Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095] [BZ #14095] - Review / update collation data from Unicode / ISO 14651 File downloaded from: http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt Updating this file alone is not enough, there are problems in the new file which need to be fixed and the collation rules for many locales need to be adapted. This is done by the following patches. This update also fixes the problem that many characters are treated as identical when sorting because they were not yet in the old iso14651_t1_common file, see: https://bugzilla.redhat.com/show_bug.cgi?id=1336308 - Infinite (∞) and empty set (∅) are treated as if they were the same character by sort and uniq [BZ #14095] * localedata/locales/iso14651_t1_common: Update file to latest version from ISO (ISO14651_2016_TABLE1_en.txt).	2018-02-27 16:36:31 +01:00
Alexandre Oliva	8da25eec0a	Collation fix: make forward accent sorting the default [BZ #17750] [BZ #17750] * Makefile: add fr_CA.UTF-8 to test-input and LOCALES. * localedata/fr_CA.UTF-8.in: New file with test data for backward accents sorting. * localedata/fr_FR.UTF-8.in: Fix test data for forward accents sorting. * localedata/locales/cs_CZ (LC_COLLATE): Remove “define DIACRIT_FORWARD” * localedata/locales/de_DE (LC_COLLATE): Likewise. * localedata/locales/hu_HU (LC_COLLATE): Likewise. * localedata/locales/lb_LU (LC_COLLATE): Likewise. * localedata/locales/yuw_PG (LC_COLLATE): Likewise. * localedata/locales/fr_CA (LC_COLLATE): Add “define DIACRIT_BACKWARD” * localedata/locales/iso14651_t1_common: Use “ifdef DIACRIT_FORWARD” instead of “ifdef DIACRIT_BACKWARD”. The only locale which currently needs backward accents sorting is fr_CA. Therefore, forward accents sorting should be the default. Before this patch, backwards accent sorting was the default and all locales except fr_CA had to use define DIACRIT_FORWARD before copy "iso14651_t1" Most locales didn’t do that and thus got the inappropriate backwards accents sorting by accident. Now only the fr_CA locale needs to use define DIACRIT_BACKWARD before copy "iso14651_t1" Original patch slightly modified by: Mike FABIAN <mfabian@redhat.com>	2017-11-29 11:56:46 +01:00
Santhosh Thottingal	b05eca0e1d	Correct collation rules for Malayalam. [BZ #19922] * locales/iso14651_t1_common: Add collation rules for U+0D7A to U+0D7F. [BZ #19919] * locales/iso14651_t1_common: Correct collation of U+0D36 and U+0D37.	2017-06-11 10:08:37 -04:00
Mike Frysinger	a4cea54b12	localedata: standardize copyright/license information [BZ #11213] Use the language from the FSF in all locale files to disclaim any license/copyright on locale data. See https://sourceware.org/ml/libc-locales/2013-q1/msg00048.html	2016-03-21 02:29:56 -04:00
Ulrich Drepper	b426c80f5f	Fix whitespaces	2011-05-15 11:37:52 -04:00
Ulrich Drepper	08ba84136f	Move Dzongkha collation rules to common collation rules file	2011-05-15 11:36:07 -04:00
Pravin Satpute	1e5e9ec825	Fix sorting of malayalam letter 'na'.	2010-02-03 03:50:01 -08:00
Ulrich Drepper	6b4f51823c	Fix whitespaces.	2010-02-03 03:36:52 -08:00
Pravin Satpute	3e8a75d1b9	Move Tamil collation data to common source file.	2010-02-03 03:32:06 -08:00
Keith Stribley	3c2c4bf6f7	Implement Burmese language locale for Myanmar.	2009-10-30 08:14:02 -07:00
Ulrich Drepper	115a532734	* localedata/locales/bn_BD: Remove comment about missing collation rules. * localedata/locales/iso14651_t1_common: Add Bengali collation rules. Patch by Pravin Satpute <psatpute@redhat.com>.	2009-05-04 21:20:20 +00:00
Ulrich Drepper	eee6b14327	[BZ #9759] * dirent/dirent.h: Adjust prototypes of scandir, scandir64, alphasort, alphasort64, versionsort, and versionsort64 to POSIX 2008. * dirent/alphasort.c: Adjust implementation to type change. * dirent/alphasort64.c: Likewise. * dirent/scandir.c: Likewise. * dirent/versionsort.c: Likewise. * dirent/versionsort64.c: Likewise. * sysdeps/wordsize-64/alphasort.c: Add hack to hide alphasort64 declaration. * sysdeps/wordsize-64/versionsort.c: Add hack to hide versionsort64 declaration.	2009-03-15 21:33:19 +00:00
Ulrich Drepper	638633961d	* locales/iso14651_t1_common: Add rules for sorting Malayalam. Patch by Santhosh Thottingal <santhosh.thottingal@gmail.com>.	2009-02-11 15:42:53 +00:00
Ulrich Drepper	06057297c4	* locales/iso14651_t1_common: Fix sorting of U+0AB3. Patch by Pravin Satpute <psatpute@redhat.com>.	2008-12-31 14:58:14 +00:00
Ulrich Drepper	6daf1a2fb1	[BZ #6867] * sysdeps/powerpc/elf/rtld-global-offsets.sym: Fix typo.	2008-10-31 19:03:31 +00:00
Ulrich Drepper	46026b5589	* locales/iso14651_t1_common: Add Kannada collation support. Patch by Pravin Satpute <psatpute@redhat.com>.	2008-07-11 17:05:42 +00:00
Ulrich Drepper	99ae13c825	* locales/iso14651_t1_common: Add support for Gurmukhi script. Patch by Pravin Satpute <psatpute@redhat.com>.	2008-06-24 16:59:47 +00:00
Ulrich Drepper	e564d29d8e	Remove U0C0D entry added for Telugu.	2008-05-21 15:13:02 +00:00
Ulrich Drepper	74e1338588	* string/strcasestr.c (CMP_FUNC): Use __strncasecmp, not strncasecmp.	2008-05-16 18:19:18 +00:00
Ulrich Drepper	2f9a1be867	[BZ #6442] * string/endian.h: Add macros for fixed-size endian conversion. * bits/byteswap.h: Allow inclusion from <endian.h>. * sysdeps/i386/bits/byteswap.h: Likewise. * sysdeps/ia64/bits/byteswap.h: Likewise. * sysdeps/s390/bits/byteswap.h: Likewise. * sysdeps/x86_64/bits/byteswap.h: Likewise. * string/Makefile (tests): Add tst-endian. * string/tst-endian.c: New file.	2008-05-15 02:54:33 +00:00
Ulrich Drepper	23c37224d3	Fix first weight for U+1E60, U+1E62, U+1E64, U+1E66, and U+1E68.	2008-04-07 23:53:20 +00:00
Ulrich Drepper	4e0b2dbe54	* locales/iso14651_t1_common: Add support for Gujarati script. Patch by Pravin Satpute <psatpute@redhat.com>.	2008-03-31 14:15:28 +00:00
Ulrich Drepper	85ac24138b	* locales/iso14651_t1_common: Add support for Devanagari script. * locales/mr_IN: Adjust Devanagari sorting for mr_IN. Patch by Pravin Satpute <psatpute@redhat.com>.	2008-03-24 05:08:33 +00:00
Ulrich Drepper	3a054d7ab0	* locale/programs/locfile-token.h: Remove tok_elif, add tok_elifdef and tok_elifndef. * locale/programs/locfile-kw.gperf: Likewise. * locale/programs/ld-collate.c: Implement primitive preprocessor.	2007-10-11 02:36:04 +00:00
Ulrich Drepper	592a95ee7c	* po/pt_BR.po: Fix typo.	2007-09-30 16:57:15 +00:00
Ulrich Drepper	762422d1bd	* locale/programs/ld-collate.c (collate_read): Allow order_start after copy.	2007-04-28 06:51:26 +00:00

34 Commits