Previously, if we found one element with required attributes, we would
search into it and ignore any later elements also with those required
attributes. This meant that, if the first didn't contain the child
elements we were looking for, we'd fail to find what we sought, if it
was in a later matching element (e.g. with some ignored attributes).
We would then go on to look for a match in a later file, where there
might have been a match we should have found in the earlier file.
Check all matches, rather than only the first match in each file. Do
the search in each file "in parallel" to save reparsing the XPath.
This clears the search code of rather hard-to-follow break/else
handling in loops; and currently makes no change to the generated
data.
Change-Id: I86b010e65b9a1fc1b79e5fdd45a5aeff1ed5d5d5
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
In particular, this changed the US currency formats for negative
amounts to be parenthesised versions of the positive amount forms,
rather than having a minus sign after the $ sign. Test updated.
[ChangeLog][QtCore][QLocale] Currency formats are now based on CLDR's
accounting formats, where they were previously mostly based (more or
less by accident) on standard formats. In particular, this now means
negative currency formats are specified, where available, where they
(mostly) were not previously.
Task-number: QTBUG-79902
Change-Id: Ie0c07515ece8bd518a74a6956bf97ca85e9894eb
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
When doing XPATH searches, child nodes that have distinguished
attributes that were not asked for should be skipped. This is part of
the LDML spec and matters when resolving locale inheritance. Scan the
LDML DTD (previously only scanned for the CLDR version) to find which
attributes of which tags are ignorable - all others are distinguished
- and take the result into account when performing XPATH searches.
The XPath we were using for currency formats wasn't excluding
currencyFormatLength elements with type="short" and patterns specific
to thousands (and larger multiples); this is fixed by taking
distinguished attributes into account. However, the XPATH also wasn't
specifying the always distinguished attribute type="standard" that
was, in practice, used for nearly all locales that weren't (wrongly)
using short-forms for thousands; so type="standard" is now made
explicit, so as to minimize the diff.
This leaves only twenty-one locales with a negative currency formats.
A later commit shall switch to using accounting by default (it falls
back via an alias to standard, in any case), thereby restoring the two
mentioned below that were using it by accident, but the present change
gives the minimal diff here.
Thousands-specific formats replaced with sensible ones:
* zh_Hant_{HK,MO} (Traditional Mandarin, Hong Kong and Macau)
* eo_001 (Esperanto)
* fr_CA (Canadian French)
* ha_* (Hausa, when not written in Arabic)
* es_{GT,MX,US} (Spanish - Guatemala, Mexico, USA)
* sw_KE (Swahili, Kenya)
* yi_001 (Yiddish)
* mfe_MU (Morisyen, Mauritius)
* lag_TZ (Langi, Tanzania)
* mgh_MZ (Makhuwa Meetto, Mozambique)
* wae_CH (Walser, Switzerland)
* kkj_CM (Kako, Cameroon)
* lkt_US (Lakota, USA)
* pa_Arab_PK (Punjabi, in Arabic script, as used in Pakistan; uses
arabext number system, whose currency falls back to latn's, for
which pa_Arab over-rides the thousands-format).
Format changed from an over-ridden type="accounting" to standard (so
these lost a negative-specific form) in:
* en_SI (English, Slovenia)
* es_DO (Spanish, Dominican Republic; same)
For some locales we were picking up over-rides of narrow or short list
formats, or formats for or-lists or unit-lists rather than and-lists,
in place of the standard list format, that these locales don't
over-ride, provided by a parent locale. This changed list formats for:
* en_CA, en_IN (dropped "Oxford" comma before "and")
* qu_* (Quechua; dropped "utaq", presumably meaning "and")
* ur_IN (Urdu, India; was using unit-list formats)
[ChangeLog][QtCore][QLocale] Data used for currency formats in several
locales and list patterns in some locales have changed due to now
parsing the CLDR data more faithfully.
Fixes: QTBUG-81344
Change-Id: I6b95c6c37db92df167153767c1b103becfb0ac98
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
CLDR's currency formats do have number system variation, so take it
into account. (The old xpathlite code clearly intended to do this, but
failed at it due to looking for the wrong component of an XPATH to
fix.) This changes the currency formats in use for
* all Dutch locales (because nl.xml lists a currency format for arab
before the one for latn, and they differ),
* Punjabi, Urdu - specifically pa_Guru_IN, ur_Arab_PK (both like
Dutch, arabext before latn; which is correct for pa_Arab_PK and
ur_Arab_IN),
* Sindi (whose over-ride of latn currency format we were using, where
we should be using arab's format, supplied by root's default),
* Tatar (which specifies a generic currency format, which we were
using, before one specific to latn, which we now use),
* Tongan (same as Dutch),
* Konkani (like Dutch, deva before latn) and
* several North African Arabic locales (whose default number system is
latn, rather than arab, but previously used arab's formats).
Task-number: QTBUG-79902
Change-Id: I18d8ec16bfd3a516d1bcd2f63bc7f7f15179a3f4
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
This begins the process of replacing xpathlite.py, adding low-level
DOM-access classes to ldml.py and the CldrAccess class to cldr.py
Moved a format comment from cldr2qtimezone.py's doc-string to the
method of CldrAccess that does the actual reading.
Task-number: QTBUG-81344
Change-Id: I46ae3f402f8207ced6d30a1de5cedaeef47b2bcf
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>