qt5base-lts

AuroraMiddleware/qt5base-lts

Fork 0

Commit Graph

Author	SHA1	Message	Date
Edward Welbourne	81cf23c7a7	Take CLDR's distinguished attributes into account When doing XPATH searches, child nodes that have distinguished attributes that were not asked for should be skipped. This is part of the LDML spec and matters when resolving locale inheritance. Scan the LDML DTD (previously only scanned for the CLDR version) to find which attributes of which tags are ignorable - all others are distinguished - and take the result into account when performing XPATH searches. The XPath we were using for currency formats wasn't excluding currencyFormatLength elements with type="short" and patterns specific to thousands (and larger multiples); this is fixed by taking distinguished attributes into account. However, the XPATH also wasn't specifying the always distinguished attribute type="standard" that was, in practice, used for nearly all locales that weren't (wrongly) using short-forms for thousands; so type="standard" is now made explicit, so as to minimize the diff. This leaves only twenty-one locales with a negative currency formats. A later commit shall switch to using accounting by default (it falls back via an alias to standard, in any case), thereby restoring the two mentioned below that were using it by accident, but the present change gives the minimal diff here. Thousands-specific formats replaced with sensible ones: * zh_Hant_{HK,MO} (Traditional Mandarin, Hong Kong and Macau) * eo_001 (Esperanto) * fr_CA (Canadian French) * ha_* (Hausa, when not written in Arabic) * es_{GT,MX,US} (Spanish - Guatemala, Mexico, USA) * sw_KE (Swahili, Kenya) * yi_001 (Yiddish) * mfe_MU (Morisyen, Mauritius) * lag_TZ (Langi, Tanzania) * mgh_MZ (Makhuwa Meetto, Mozambique) * wae_CH (Walser, Switzerland) * kkj_CM (Kako, Cameroon) * lkt_US (Lakota, USA) * pa_Arab_PK (Punjabi, in Arabic script, as used in Pakistan; uses arabext number system, whose currency falls back to latn's, for which pa_Arab over-rides the thousands-format). Format changed from an over-ridden type="accounting" to standard (so these lost a negative-specific form) in: * en_SI (English, Slovenia) * es_DO (Spanish, Dominican Republic; same) For some locales we were picking up over-rides of narrow or short list formats, or formats for or-lists or unit-lists rather than and-lists, in place of the standard list format, that these locales don't over-ride, provided by a parent locale. This changed list formats for: * en_CA, en_IN (dropped "Oxford" comma before "and") * qu_* (Quechua; dropped "utaq", presumably meaning "and") * ur_IN (Urdu, India; was using unit-list formats) [ChangeLog][QtCore][QLocale] Data used for currency formats in several locales and list patterns in some locales have changed due to now parsing the CLDR data more faithfully. Fixes: QTBUG-81344 Change-Id: I6b95c6c37db92df167153767c1b103becfb0ac98 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>	2020-04-02 19:43:28 +01:00
Edward Welbourne	be3dfd7a71	Rework cldr2qlocalexml.py's reading of CLDR data Move the code out to a CldrReader class in cldr.py, expand CldrAccess with facilities that needs, expand ldml.py to include support for more features, finally making xpathlite.py redundant. This initial commit aims, though, to be bug-for-bug compatible with xpathlite in its reading of the CLDR data. It turns out we've been using draftier data than we were aware of (which might not be a bad thing). The xpathlite code appeared to check for draft attributes, but these only appear on leaf nodes and most data were fetched by finding a parent and then scanning its children without the draft check; only am/pm data was actually being excluded based on draft values. (We allowed contributed, for am/pm, in addition to approved, which is all the xpathlite code allows otherwise.) There are also some less equivocal bugs; I'll deal with these in later commits. Simplified number-system data look-ups; the old get_number_in_system() was taking care of old LDML versions' placement of the number system attribute; this is no longer needed. (It was also being used for a currency value to which it was not appropriate, which is now handled separately; this is one of the bugs mentioned above.) Ditched a fall-back to nativeZeroDigit, which no longer exists in CLDR. Change the command-line to take the root of the CLDR data tree, rather than its common/main/ sub-directory. Support naming the file to which to write output, as a second command-line argument, instead of always writing to stdout (which remains the default) and leaving whoever runs the script to redirect stdout. Support (internally for now, while adding TODOs to give main() more command-line options) separating the stderr output into its more and less interesting parts; for now, continue producing both, but suppress the least interesting entirely. Task-number: QTBUG-81344 Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71 Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>	2020-04-02 19:43:18 +01:00
Edward Welbourne	c834dbc6fb	Move cldr2qtimezone.py's CLDR-reading to a CldrAccess class This begins the process of replacing xpathlite.py, adding low-level DOM-access classes to ldml.py and the CldrAccess class to cldr.py Moved a format comment from cldr2qtimezone.py's doc-string to the method of CldrAccess that does the actual reading. Task-number: QTBUG-81344 Change-Id: I46ae3f402f8207ced6d30a1de5cedaeef47b2bcf Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>	2020-04-02 19:43:13 +01:00

Author

SHA1

Message

Date

Edward Welbourne

81cf23c7a7

Take CLDR's distinguished attributes into account

When doing XPATH searches, child nodes that have distinguished
attributes that were not asked for should be skipped. This is part of
the LDML spec and matters when resolving locale inheritance. Scan the
LDML DTD (previously only scanned for the CLDR version) to find which
attributes of which tags are ignorable - all others are distinguished
- and take the result into account when performing XPATH searches.

The XPath we were using for currency formats wasn't excluding
currencyFormatLength elements with type="short" and patterns specific
to thousands (and larger multiples); this is fixed by taking
distinguished attributes into account. However, the XPATH also wasn't
specifying the always distinguished attribute type="standard" that
was, in practice, used for nearly all locales that weren't (wrongly)
using short-forms for thousands; so type="standard" is now made
explicit, so as to minimize the diff.

This leaves only twenty-one locales with a negative currency formats.
A later commit shall switch to using accounting by default (it falls
back via an alias to standard, in any case), thereby restoring the two
mentioned below that were using it by accident, but the present change
gives the minimal diff here.

Thousands-specific formats replaced with sensible ones:
* zh_Hant_{HK,MO} (Traditional Mandarin, Hong Kong and Macau)
* eo_001 (Esperanto)
* fr_CA (Canadian French)
* ha_* (Hausa, when not written in Arabic)
* es_{GT,MX,US} (Spanish - Guatemala, Mexico, USA)
* sw_KE (Swahili, Kenya)
* yi_001 (Yiddish)
* mfe_MU (Morisyen, Mauritius)
* lag_TZ (Langi, Tanzania)
* mgh_MZ (Makhuwa Meetto, Mozambique)
* wae_CH (Walser, Switzerland)
* kkj_CM (Kako, Cameroon)
* lkt_US (Lakota, USA)
* pa_Arab_PK (Punjabi, in Arabic script, as used in Pakistan; uses
  arabext number system, whose currency falls back to latn's, for
  which pa_Arab over-rides the thousands-format).

Format changed from an over-ridden type="accounting" to standard (so
these lost a negative-specific form) in:
* en_SI (English, Slovenia)
* es_DO (Spanish, Dominican Republic; same)

For some locales we were picking up over-rides of narrow or short list
formats, or formats for or-lists or unit-lists rather than and-lists,
in place of the standard list format, that these locales don't
over-ride, provided by a parent locale. This changed list formats for:
* en_CA, en_IN (dropped "Oxford" comma before "and")
* qu_* (Quechua; dropped "utaq", presumably meaning "and")
* ur_IN (Urdu, India; was using unit-list formats)

[ChangeLog][QtCore][QLocale] Data used for currency formats in several
locales and list patterns in some locales have changed due to now
parsing the CLDR data more faithfully.

Fixes: QTBUG-81344
Change-Id: I6b95c6c37db92df167153767c1b103becfb0ac98
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>

2020-04-02 19:43:28 +01:00

Edward Welbourne

be3dfd7a71

Rework cldr2qlocalexml.py's reading of CLDR data

Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.

It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values.  (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.

Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.

Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.

Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.

Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>

2020-04-02 19:43:18 +01:00

Edward Welbourne

c834dbc6fb

Move cldr2qtimezone.py's CLDR-reading to a CldrAccess class

This begins the process of replacing xpathlite.py, adding low-level
DOM-access classes to ldml.py and the CldrAccess class to cldr.py

Moved a format comment from cldr2qtimezone.py's doc-string to the
method of CldrAccess that does the actual reading.

Task-number: QTBUG-81344
Change-Id: I46ae3f402f8207ced6d30a1de5cedaeef47b2bcf
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>

2020-04-02 19:43:13 +01:00

3 Commits