Commit Graph

77 Commits

Author SHA1 Message Date
Ievgenii Meshcheriakov
b02d17c5c0 Convert CLDR scripts to Python 3
The conversion is mostly done using the 2to3 script with manual cleanup
afterwards.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I4d33b04e7269c55a83ff2deb876a23a78a89f39d
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-15 17:06:53 +02:00
Edward Welbourne
a37c0ef55f Convert python comparison function to key function
Instead of implementing all the intricacies of a cmp for the python
sort-function, support for which is due to be dropped at Python 3 in
any case, implement a much simpler key function that achieves the same
result.

In the process, eliminate the ugly kludge of setting an attribute on a
function to, in effect, communicate with it via a global. Instead,
instantiate a class, that wraps the value previously given to the
attribute and whose instance provides the key-function.

Thanks to Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io> for
pointing out that a key function is the way of the future - and
sorted() is a nicer way to sort.

Pick-to: 6.2
Change-Id: Icf1ed5597fedf420d054fbc860e3e7fc6615875c
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2021-07-14 20:59:00 +02:00
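Note on a37c0ef55f: a minimal sketch of the pattern this commit describes, a class whose instances wrap the data formerly passed via a function attribute and serve as the key function. The class name `DefaultFirstKey`, the shape of the `defaults` mapping and the sample keys are assumptions for illustration, not the code in qlocalexml2cpp.py.

```python
class DefaultFirstKey:
    """Callable sort key (hypothetical); wraps the default-territory data."""

    def __init__(self, defaults):
        # defaults: maps (language, script) to that pair's default territory.
        self.defaults = defaults

    def __call__(self, key):
        language, script, territory = key
        is_default = self.defaults.get((language, script)) == territory
        # False sorts before True, so the default territory comes first
        # within each (language, script) group.
        return language, script, not is_default, territory


defaults = {(1, 0): 20}  # made-up numeric IDs
keys = [(1, 0, 30), (1, 0, 20), (1, 0, 10)]
ordered = sorted(keys, key=DefaultFirstKey(defaults))
```

Because the key is a plain tuple, the resulting order is transitive by construction, which a hand-written cmp does not guarantee.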
Edward Welbourne
7dec56c6a5 Make locale ordering transitive
The ordering function used to sort the locale data generated for
QLocale attempted to sort the default territory for a given language
and script before other territories, but was too tangled for it to be
obvious this is what it was doing. The result turned out to be
non-transitive. Replace with code that implements the same preference
but only applies it where the result is compatible with transitivity.

This leads to a shuffling of the order of the Serbian-language
locales, which sorts the Cyrillic ones before the Latin ones. This is
consistent with my reading of the CLDR data, which fills in Cyrillic
and Serbia for Serbian; Serbian/Cyrillic/Serbia did previously sort
before all other Serbian variants.

Thanks to Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io> for
discovering the non-transitivity.

Pick-to: 6.2
Change-Id: I0ce9f78e620e714f980f32b85b7100ed0f92ad74
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2021-07-14 20:59:00 +02:00
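Note on 7dec56c6a5: non-transitivity of a comparison function can be detected by brute force over triples. The sketch below is only an illustration of that property; the helper names and the toy comparisons are assumptions and have nothing to do with the actual locale ordering.

```python
from itertools import permutations


def transitivity_violations(items, cmp):
    """Return triples (a, b, c) with a < b and b < c but not a < c."""
    return [(a, b, c) for a, b, c in permutations(items, 3)
            if cmp(a, b) < 0 and cmp(b, c) < 0 and not cmp(a, c) < 0]


def cyclic_cmp(a, b):
    """Deliberately broken ordering on 0, 1, 2: 0 < 1 < 2 < 0."""
    if a == b:
        return 0
    return -1 if (a + 1) % 3 == b else 1


def numeric_cmp(a, b):
    """Ordinary numeric comparison, which is transitive."""
    return (a > b) - (a < b)


broken = transitivity_violations(range(3), cyclic_cmp)
sound = transitivity_violations(range(5), numeric_cmp)
```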
Ievgenii Meshcheriakov
d804d21e8f cldr.py: Avoid raising StopIteration from generators
The behavior of StopIteration in generators was changed in Python 3
(see https://www.python.org/dev/peps/pep-0479/). Not raising that
exception makes it easier to port the code to Python 3.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Iac6e3f6f1e1e8ef3a1a0d89b19d2ac2d186434f5
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-09 13:42:21 +02:00
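Note on d804d21e8f: under PEP 479, a StopIteration raised inside a generator body is turned into a RuntimeError, so a generator should finish with a plain `return`. A minimal sketch, with a made-up generator that is not taken from cldr.py:

```python
def first_words(lines):
    """Yield the first word of each line, stopping at the first blank line."""
    for line in lines:
        words = line.split()
        if not words:
            # End the generator with return; "raise StopIteration" here
            # would surface as RuntimeError under PEP 479.
            return
        yield words[0]


result = list(first_words(["alpha beta", "gamma", "", "delta"]))
```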
Ievgenii Meshcheriakov
2300146085 locale_database: Don't attempt to access property 'message' of IOError
IOError does not have property 'message' in Python 3. Instead of
attempting to access it, just use the string representation of
the exception object. This produces the error message possibly combined
with additional arguments in both Python 2 and Python 3.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Icb198a409e7f80b832e474d8390b770fdeacc6c2
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-07 19:32:52 +02:00
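Note on 2300146085: a small sketch of reporting an I/O failure through the exception's string form rather than a `message` attribute. The function name, the message wording and the file name are assumptions for illustration.

```python
import os
import tempfile


def describe_open_failure(path):
    """Return an error string if path cannot be read, else None."""
    try:
        with open(path) as handle:
            handle.read()
    except IOError as e:  # IOError is an alias of OSError in Python 3
        # Format the exception itself; e.message does not exist in Python 3.
        return 'Failed to open {}: {}'.format(path, e)
    return None


missing = os.path.join(tempfile.mkdtemp(), 'no-such-file.xml')
report = describe_open_failure(missing)
```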
Ievgenii Meshcheriakov
6deb28f35a qlocalexml2cpp.py: Remove undefined name inside error processing code
Name 'stem' is undefined inside CalendarDataWriter.write(). The error
was reported by flake8.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Ib816b40d0bde2afd3112da76deee0ce39985693a
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-06 20:19:22 +02:00
Ievgenii Meshcheriakov
1887c4ecc1 locale_database: Sort lists of unused tags before printing
This way the output is easier to compare between versions.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: If4053c574c4ad200a179b06276bd889f2cb9e1c6
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-06 15:17:15 +02:00
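Note on 1887c4ecc1: sets have no guaranteed iteration order, so sorting before printing keeps the report stable between runs. A sketch under assumed names (`report_unused`, its parameters and the message text are hypothetical):

```python
def report_unused(kind, known, used, write):
    """Write the known-but-unused tags in sorted order; return them."""
    unused = sorted(set(known) - set(used))
    if unused:
        write('Unused {} tags: {}\n'.format(kind, ', '.join(unused)))
    return unused


lines = []
report_unused('script', {'Latn', 'Cyrl', 'Arab', 'Deva'}, {'Latn'},
              lines.append)
```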
Ievgenii Meshcheriakov
0b2646c495 locale_data: Add new line at the end of script output
Output of cldr2qlocalexml.py looks weird without the final new line.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I5d675e475c57cdc8101887c39052007ba0a19857
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-06 14:21:39 +02:00
Ievgenii Meshcheriakov
f100d412b4 dateconverter.py: Remove shebang and executable attribute
This is not a script that can be run independently.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I82a93b9ab37ae759b789058d48e94298ecd29b6f
Reviewed-by: Friedemann Kleint <Friedemann.Kleint@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-05 18:24:23 +02:00
Edward Welbourne
1a49d7d1e0 Report unused enum members after CLDR data scan
We should at least know when members of QLocale's enums aren't adding
any value, and it may make sense to deprecate the unused ones.

Change-Id: Icf202f81d2a35904c13ccdc202d41985bcb3f2e6
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2021-06-07 17:14:14 +02:00
Edward Welbourne
e51831260a Nomenclature change: s/countr/territor/g in locale scripts
Change the nomenclature used in the scripts and the QLocaleXML data
format to use "territory" and "territories" in place of "country" and
"countries". Does not change the generated source files.

Change-Id: I4b208d8d01ad2bfc70d289fa6551f7e0355df5ef
Reviewed-by: JiDe Zhang <zhangjide@uniontech.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2021-05-26 18:00:01 +02:00
Edward Welbourne
21e0ef3ccf Rename util/locale_database/enumdata.py's various *_list to *_map
These variables provide mappings, not lists, so name them non-deceptively.

Change-Id: Idf15e78ad73790bc86dd8b9d4f248d1c4f73993c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2021-05-26 18:00:01 +02:00
Edward Welbourne
181424d9b5 QLocaleXmlWriter.enumData(): move enumdata import to method from caller
The only reason cldr.py imported enumdata was so as to pass what it
imported to writer.enumData(); that method might as well do the import
itself.

Change-Id: Ie77dcd29058f926b8cca4deef35837f30505859f
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2021-05-26 18:00:01 +02:00
Edward Welbourne
07ed2b054a Remove unused functions from enumdata.py
It's now a data-only module. The callers of its code-to-ID functions
have, for some time now, been rearranging its mappings to get at data
efficiently.

Change-Id: Ia16dcaa767203cdf3b81a96bd51793491ad41563
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2021-05-18 20:57:25 +02:00
JiDe Zhang
50a7eb8cf7 Add the "Territory" enumerated type for QLocale
The use of "Country" is misleading as some entries in the enumeration
are not countries (eg, HongKong), for all that most are. The Unicode
Consortium's Common Locale Data Repository (CLDR, from which QLocale's
data is taken) calls these territories, so introduce territory-based
names and prepare to deprecate the country-based ones in due course.

[ChangeLog][QtCore][QLocale] QLocale now has Territory as an alias for
its Country enumeration, and associated territory-based names to match
its country-named methods, to better match the usage in relevant
standards. The country-based names shall in due course be deprecated
in favor of the territory-based names.

Fixes: QTBUG-91686
Change-Id: Ia1ae1ad7323867016186fb775c9600cd5113aa42
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2021-04-15 20:17:49 +08:00
Edward Welbourne
05e67fbcab Update to CLDR v38.1, adding Yukon Standard Time
No change to QLocale's data, one addition to the Windows time-zone
data. What was formerly "Us Mountain Standard time / Canada" is now
Yukon Standard Time.

Fixes: QTBUG-89784
Pick-to: 6.0 5.15
Change-Id: I4c9a23620e74ea379be8a4c5ba0896d35fe9b594
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2021-01-27 15:00:57 +01:00
Edward Welbourne
b051b18490 Add a note explaining what a macrolanguage is
The comments in enumdata.py indicating macrolanguages meant nothing to
me, until I stumbled on a reference that led me to ISO 639's usage of
the term. Add a minimal explanation to save such confusion for others.

Change-Id: Ia1d849d93a1d94c04c8c461debdecf879e9a7db5
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-11-24 18:54:45 +01:00
Edward Welbourne
17701a95f8 QLocale: simplify currency display name lookup
We were extracting several candidate display names from CLDR for each
currency, joining them with semicolons, storing in a table, then using
only the first entry from the list - where we should probably have
used the first non-empty entry in any case.

So instead extract the first non-empty candidate name from CLDR and
store that simply, saving the need for semicolon-joining or parsing
out the first entry from the thus-joined list. This significantly
reduces the size of the currency name data table.

Change-Id: I201d0528348d5fcb9eceb5df86211b9c77de3485
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-11-17 19:55:04 +01:00
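Note on 17701a95f8: picking the first non-empty candidate, instead of joining all candidates with semicolons and later parsing out the first, can be sketched as follows. The helper name and the sample names are assumptions.

```python
def first_non_empty(candidates, default=''):
    """Return the first truthy entry of candidates, or default if none."""
    return next((name for name in candidates if name), default)


chosen = first_non_empty(['', 'US Dollar', 'US dollars'])
```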
Edward Welbourne
a9e4bf7eef Implement binary search in QLocale's likely sub-tag lookup
Follow through on a comment from 2012: sort the likely subtag array
(in the CLDR update script) and use bsearch to find entries in it.

This simplifies QLocaleXmlReader.likelyMap() slightly, moving the
detection of last entry to LocaleDataWriter.likelySubtags(), but
requires collecting all likely sub-tag mapping pairs (rather than just
passing them through from read to write via generators) in order to
sort them.

Change-Id: Ieb6875ccde1ddbd475ae68c0766a666ec32b7005
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-11-08 13:01:33 +01:00
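Note on a9e4bf7eef: the C++ side uses bsearch on a sorted array; the same idea in Python, using the standard bisect module, looks roughly like this. The three sample mappings and the tuple layout are illustrative assumptions, not the generated table.

```python
from bisect import bisect_left

# Collect all pairs first (a generator cannot be sorted in passing),
# then sort them so that a binary search can find entries.
pairs = sorted([
    (('zh', '', ''), ('zh', 'Hans', 'CN')),
    (('en', '', ''), ('en', 'Latn', 'US')),
    (('sr', '', ''), ('sr', 'Cyrl', 'RS')),
])
keys = [key for key, value in pairs]


def likely(key):
    """Binary-search the sorted table; return the mapped tags or None."""
    index = bisect_left(keys, key)
    if index < len(keys) and keys[index] == key:
        return pairs[index][1]
    return None
```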
Edward Welbourne
4ab6358039 Reorder locale enums alphabetically
Binary-incompatible change: change the numeric values of QLocale's
Language, Script and Country enums, as encouraged by a comment in the
generator script enumdata.py and clarify documentation around that.

In the process (since I was changing almost every line anyway),
convert the dictionary values from (mutable) lists of length two to
tuples, since they are (and should be) immutable data.

Change-Id: I26222bce45b9f5074b1d81ed70015a75ac34adcd
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-11-08 13:01:18 +01:00
Edward Welbourne
73ceb71576 Use newer names for various languages, territories and scripts
Our enumdata.py namings of countries had fallen somewhat out of sync
with CLDR's names. In the process, support including hyphenation in
the unsquashed name, along with spacing. Distinguish, in comments,
between older renamings and those first seen in Qt6.

Change-Id: I91ec444bf35222ab6a9332e389ace19cca0e4fdf
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-11-08 13:01:12 +01:00
Edward Welbourne
ed853a66f8 Simplify QLocaleXmlWriter::enumData()
Move the repeated List suffix to the __enumTable() helper, where half
the parameter's uses were having to snip it off anyway.

Change-Id: Ia396e87e59ceeb81fc4b0890a86934dc67da10cb
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-11-08 13:01:06 +01:00
Edward Welbourne
d11bf5fc24 Check our enumdata.py tables are consistent with CLDR
Compare the code->name mappings we're using to the ones CLDR's
common/main/en.xml provides; report discrepancies. Tolerate tags
missing from en.xml if they're known to the locale-inheritance
machinery.

Change-Id: Ibe96c18bf55984a35de3b3644f3586a9f30720b2
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-11-08 03:14:00 +01:00
Edward Welbourne
3a1bc4bad5 Purge deprecated language and country codes from QLocale
Requires subsequent re-numbering of the enum tables to eliminate gaps,
before locale data can be regenerated. However, it will work with the
present locale data, since it merely loses the means to use some names
for which the available data was just the name and code. This implies
a transient issue of recognising some codes for which there is no
actual enum member; but relevant code will work as before, finding
nothing but the code and its name. This shall be resolved by a coming
BiC change to resort the language, country and script codes, changing
the numbering (almost) completely.

[ChangeLog][QtCore][QLocale] Various obsolete language and country
codes have been removed. Some lacked locale data, others were obsolete
aliases. All have been deprecated in 5.15.

Task-number: QTBUG-84669
Change-Id: I45fc76a5f2f6c3b0ea3c1bb61e917da984183783
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2020-10-29 10:44:38 +00:00
Edward Welbourne
cb23d50f38 Update CLDR to v37, adding Nigerian Pidgin as a new language
Routine update by running scripts, ignoring clang-format's extensive
grumbles. Added notes to util/locale_database/'s README, on the need
for that, and enumdata.py, on when to add entries. As usual, several
new locales are also added, for existing languages, territories and
scripts.

[ChangeLog][QtCore][QLocale] Updated to new version of CLDR (the
Unicode Consortium's Common Locale Data Repository) v37.

Fixes: QTBUG-84669
Pick-to: 5.15
Change-Id: Ib76848bf4bd1219180faf46820077e8d8049a4e3
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-10-26 15:28:59 +02:00
Edward Welbourne
8b0e068847 Mark QLocale's Language, Country and Script enums as ushort
The code pervasively presumes their values can be held in a ushort, so
make sure the compiler knows we expect that to work (and doesn't
complain about narrowing when we do convert them to ushort).

Change-Id: Idde7be6cceee8a6dae333c5b1d5a0120fec32e4a
Reviewed-by: Andrei Golubev <andrei.golubev@qt.io>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-10-12 16:53:40 +02:00
Edward Welbourne
48ab30e02a Update util/locale_database/'s README and timezone script instructions
The script told me the wrong path to pass as first argument, so
correct that; and the README didn't mention the need to run it.
CLDR v37 makes no change to the actual generated data, though.
Tweaked wording of a comment in the script.

Task-number: QTBUG-84669
Change-Id: I56b510c666f414d9719cef650aeec6192c4fde6e
Reviewed-by: Andrei Golubev <andrei.golubev@qt.io>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-10-09 14:39:55 +02:00
Allan Sandfeld Jensen
564b59d903 Another round of replacing 0 with nullptr
This time based on grepping to also include documentation, tests and
examples previously missed by the automatic tool.

Change-Id: Ied1703f4bcc470fbc275f759ed5b7c588a5c4e9f
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Friedemann Kleint <Friedemann.Kleint@qt.io>
2020-10-07 23:02:47 +02:00
Edward Welbourne
d853816307 Fix handling of Suzhou numbering system
This only arises when the system locale tells us to use its zero as
our zero digit, since no CLDR locale uses it by default. Adapt an
MS-specific QLocale::system() test to use Suzhou numbering, so as to
test this.

While updating the locale-restoration code to also restore the digits
being set in that test, add restore code for the long time format,
where previously only the short time format was restored. Add a
comment to make it less likely one of those shall be missed in future.

Fixes: QTBUG-85409
Change-Id: I343324bb563ee0e455dfe77d4825bf8c3082ca30
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-07-17 12:19:01 +02:00
Edward Welbourne
bb6a73260e Support digit-grouping correctly
Read three more values from CLDR and add a byte to the bit-fields at
the end of QLocaleData, indicating the three group sizes. This adds
three new parameters to various low-level formatting functions. At the
same time, rename ThousandsGroup to GroupDigits, more faithfully
expressing what this (internal) option means.

This replaces commit 27d1391280 with a
fuller implementation that handles digit-grouping in any of the ways
that CLDR supports. The formerly "Indian" formatting now also applies
to at least some locales for Bangladesh, Bhutan and Sri Lanka.

Fixed Costa Rica currency formatting test that wrongly put a separator
after the first digit; the locale (in common with several Spanish
locales) requires at least two digits before the first separator.

[ChangeLog][QtCore][Important Behavior Changes] Some locales require
more than one digit before the first grouping separator; others use
group sizes other than three. The latter was partially supported (only
for India) at 5.15 but is now systematically supported; the former is
now also supported.

Task-number: QTBUG-24301
Fixes: QTBUG-81050
Change-Id: I4ea4e331f3254d1f34801cddf51f3c65d3815573
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-07-14 14:52:08 +02:00
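Note on bb6a73260e: a sketch of grouping driven by three sizes. The parameter names (`first` for the minimum number of digits before the first separator, `least` for the least-significant group, `higher` for all other groups) and the function itself are assumptions for illustration; this is not QLocale's implementation.

```python
def group_digits(digits, sep=',', first=1, higher=3, least=3):
    """Insert sep into a string of digits according to three group sizes."""
    # No grouping unless at least `first` digits precede the last group.
    if len(digits) < least + first:
        return digits
    groups = [digits[-least:]]
    rest = digits[:-least]
    while len(rest) > higher:
        groups.append(rest[-higher:])
        rest = rest[:-higher]
    groups.append(rest)
    return sep.join(reversed(groups))


western = group_digits('1234567')
indian = group_digits('1234567', higher=2)
ungrouped = group_digits('1234', first=2)
grouped = group_digits('12345', first=2)
```

With `first=2`, a four-digit number stays ungrouped while a five-digit one is grouped, which is the behaviour the Costa Rica test fix above refers to.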
Dimitrios Apostolou
1e546595e9 Remove unused imports
As found by LGTM.com.

Change-Id: I1704f10f9bab1b11ab22824aca0cfcdcb47fef2f
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2020-07-10 02:36:54 +02:00
Qt Forward Merge Bot
8823bb8d30 Merge remote-tracking branch 'origin/5.15' into dev
Conflicts:
	examples/opengl/doc/src/cube.qdoc
	src/corelib/global/qlibraryinfo.cpp
	src/corelib/text/qbytearray_p.h
	src/corelib/text/qlocale_data_p.h
	src/corelib/time/qhijricalendar_data_p.h
	src/corelib/time/qjalalicalendar_data_p.h
	src/corelib/time/qromancalendar_data_p.h
	src/network/ssl/qsslcertificate.h
	src/widgets/doc/src/graphicsview.qdoc
	src/widgets/widgets/qcombobox.cpp
	src/widgets/widgets/qcombobox.h
	tests/auto/corelib/tools/qscopeguard/tst_qscopeguard.cpp
	tests/auto/widgets/widgets/qcombobox/tst_qcombobox.cpp
	tests/benchmarks/corelib/io/qdiriterator/qdiriterator.pro
	tests/manual/diaglib/debugproxystyle.cpp
	tests/manual/diaglib/qwidgetdump.cpp
	tests/manual/diaglib/qwindowdump.cpp
	tests/manual/diaglib/textdump.cpp
	util/locale_database/cldr2qlocalexml.py
	util/locale_database/qlocalexml.py
	util/locale_database/qlocalexml2cpp.py

Resolutions of util/locale_database/ are based on:
https://codereview.qt-project.org/c/qt/qtbase/+/294250
and src/corelib/{text,time}/*_data_p.h were then regenerated by
running those scripts.

Updated CMakeLists.txt in each of
	tests/auto/corelib/serialization/qcborstreamreader/
	tests/auto/corelib/serialization/qcborvalue/
	tests/auto/gui/kernel/
and generated new ones in each of
	tests/auto/gui/kernel/qaddpostroutine/
	tests/auto/gui/kernel/qhighdpiscaling/
	tests/libfuzzer/corelib/text/qregularexpression/optimize/
	tests/libfuzzer/gui/painting/qcolorspace/fromiccprofile/
	tests/libfuzzer/gui/text/qtextdocument/sethtml/
	tests/libfuzzer/gui/text/qtextdocument/setmarkdown/
	tests/libfuzzer/gui/text/qtextlayout/beginlayout/
by running util/cmake/pro2cmake.py on their changed .pro files.

Changed target name in
	tests/auto/gui/kernel/qaction/qaction.pro
	tests/auto/gui/kernel/qaction/qactiongroup.pro
	tests/auto/gui/kernel/qshortcut/qshortcut.pro
to ensure unique target names for CMake

Changed tst_QComboBox::currentIndex to not test the
currentIndexChanged(QString), as that one does not exist in Qt 6
anymore.

Change-Id: I9a85705484855ae1dc874a81f49d27a50b0dcff7
2020-04-08 20:11:39 +02:00
Edward Welbourne
727afdf344 Fix parameter order in cldr2qlocalexml.py's usage()
Callers and definition were out of sync.

Change-Id: Icda26887cb64c61c7e373766f25559b0d450d112
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-06 14:29:32 +02:00
Edward Welbourne
cabd8f860b Ensure we use UTF-8 for the emitted QLocaleXML data file
Python helpfully uses a sensible locale when stdout is a tty but uses
the system (not the filesystem) default encoding, which may be ascii
and unable to encode some of the data we need to save. So brute force
kludge it to ensure emit.encoding is UTF-8 when writing the output
we'll read as UTF-8 anyway.

(This matches dev's commit 0ef79d94f6
for the reworked version of the script.)

Task-number: QTBUG-79902
Change-Id: I60ddc896a308c06e01fa87e8e18e112faa17d601
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:44:06 +01:00
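Note on cabd8f860b: the commit addresses Python 2 behaviour. In Python 3 the equivalent precaution is to name the encoding explicitly when opening the output, rather than relying on a platform default. A sketch with a hypothetical helper and file name:

```python
import os
import tempfile


def write_text_utf8(path, text):
    """Write text to path as UTF-8, whatever the platform default is."""
    with open(path, 'w', encoding='utf-8') as out:
        out.write(text)


path = os.path.join(tempfile.mkdtemp(), 'qlocale.xml')
write_text_utf8(path, '<language>српски</language>')
with open(path, 'rb') as raw:
    data = raw.read()
```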
Edward Welbourne
67c0e28789 Purge a stray space from calendar locale data
It was causing all lines after the first, in each calendar's
locale_data[], to be over-indented. This only changes spacing.

Change-Id: Ibfc4986548eecbfdba2902cc18f44a2af669bc6d
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2020-04-02 19:44:01 +01:00
Edward Welbourne
3dfdc9b97a Convert qlocalexml2cpp.py's last few %-formats to modern format() style
I've taken care of all the others in the course of other changes
already ...

Task-number: QTBUG-81344
Change-Id: I44e40a0d1c9f1e1a540a5f4cd252369fdc9b2698
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:50 +01:00
Edward Welbourne
963931550d Check all matches for each XPath when searching
Previously, if we found one element with required attributes, we would
search into it and ignore any later elements also with those required
attributes. This meant that, if the first didn't contain the child
elements we were looking for, we'd fail to find what we sought, if it
was in a later matching element (e.g. with some ignored attributes).
We would then go on to look for a match in a later file, where there
might have been a match we should have found in the earlier file.

Check all matches, rather than only the first match in each file.  Do
the search in each file "in parallel" to save reparsing the XPath.
This clears the search code of rather hard-to-follow break/else
handling in loops; and currently makes no change to the generated
data.

Change-Id: I86b010e65b9a1fc1b79e5fdd45a5aeff1ed5d5d5
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:43 +01:00
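Note on 963931550d: the difference between descending into the first match only and following every match can be sketched with the standard xml.etree.ElementTree module. That module is swapped in here for brevity, and the sample document and function names are assumptions; this is not the search code in ldml.py.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    '<ldml><numbers>'
    '<currencyFormats numberSystem="latn"/>'
    '<currencyFormats numberSystem="latn">'
    '<pattern>#,##0.00</pattern>'
    '</currencyFormats>'
    '</numbers></ldml>')


def find_first_only(root, tags):
    """Old behaviour: commit to the first matching child at each step."""
    node = root
    for tag in tags:
        node = node.find(tag)
        if node is None:
            return None
    return node


def find_all(root, tags):
    """New behaviour: keep every matching child at each step."""
    nodes = [root]
    for tag in tags:
        nodes = [child for node in nodes for child in node.findall(tag)]
    return nodes


path = ['numbers', 'currencyFormats', 'pattern']
missed = find_first_only(DOC, path)
found = find_all(DOC, path)
```

The first search dead-ends in the empty element; the second still reaches the pattern in the later sibling.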
Edward Welbourne
89bd12b9ad Change QLocale to use CLDR's accounting formats for currencies
In particular, this changed the US currency formats for negative
amounts to be parenthesised versions of the positive amount forms,
rather than having a minus sign after the $ sign. Test updated.

[ChangeLog][QtCore][QLocale] Currency formats are now based on CLDR's
accounting formats, where they were previously mostly based (more or
less by accident) on standard formats. In particular, this now means
negative currency formats are specified, where available, where they
(mostly) were not previously.

Task-number: QTBUG-79902
Change-Id: Ie0c07515ece8bd518a74a6956bf97ca85e9894eb
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 20:43:34 +02:00
Edward Welbourne
81cf23c7a7 Take CLDR's distinguished attributes into account
When doing XPATH searches, child nodes that have distinguished
attributes that were not asked for should be skipped. This is part of
the LDML spec and matters when resolving locale inheritance. Scan the
LDML DTD (previously only scanned for the CLDR version) to find which
attributes of which tags are ignorable - all others are distinguished
- and take the result into account when performing XPATH searches.

The XPath we were using for currency formats wasn't excluding
currencyFormatLength elements with type="short" and patterns specific
to thousands (and larger multiples); this is fixed by taking
distinguished attributes into account. However, the XPATH also wasn't
specifying the always distinguished attribute type="standard" that
was, in practice, used for nearly all locales that weren't (wrongly)
using short-forms for thousands; so type="standard" is now made
explicit, so as to minimize the diff.

This leaves only twenty-one locales with negative currency formats.
A later commit shall switch to using accounting by default (it falls
back via an alias to standard, in any case), thereby restoring the two
mentioned below that were using it by accident, but the present change
gives the minimal diff here.

Thousands-specific formats replaced with sensible ones:
* zh_Hant_{HK,MO} (Traditional Mandarin, Hong Kong and Macau)
* eo_001 (Esperanto)
* fr_CA (Canadian French)
* ha_* (Hausa, when not written in Arabic)
* es_{GT,MX,US} (Spanish - Guatemala, Mexico, USA)
* sw_KE (Swahili, Kenya)
* yi_001 (Yiddish)
* mfe_MU (Morisyen, Mauritius)
* lag_TZ (Langi, Tanzania)
* mgh_MZ (Makhuwa Meetto, Mozambique)
* wae_CH (Walser, Switzerland)
* kkj_CM (Kako, Cameroon)
* lkt_US (Lakota, USA)
* pa_Arab_PK (Punjabi, in Arabic script, as used in Pakistan; uses
  arabext number system, whose currency falls back to latn's, for
  which pa_Arab over-rides the thousands-format).

Format changed from an over-ridden type="accounting" to standard (so
these lost a negative-specific form) in:
* en_SI (English, Slovenia)
* es_DO (Spanish, Dominican Republic; same)

For some locales we were picking up over-rides of narrow or short list
formats, or formats for or-lists or unit-lists rather than and-lists,
in place of the standard list format, that these locales don't
over-ride, provided by a parent locale. This changed list formats for:
* en_CA, en_IN (dropped "Oxford" comma before "and")
* qu_* (Quechua; dropped "utaq", presumably meaning "and")
* ur_IN (Urdu, India; was using unit-list formats)

[ChangeLog][QtCore][QLocale] Data used for currency formats in several
locales and list patterns in some locales have changed due to now
parsing the CLDR data more faithfully.

Fixes: QTBUG-81344
Change-Id: I6b95c6c37db92df167153767c1b103becfb0ac98
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:28 +01:00
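Note on 81cf23c7a7: the rule "skip elements carrying distinguished attributes that were not asked for" can be sketched as a filter. The commit derives the ignorable attributes from the LDML DTD; the hard-coded `IGNORABLE` set below is only a stand-in assumption, as are the dict-based elements and function names.

```python
# Stand-in for the per-tag ignorable attributes read from the DTD.
IGNORABLE = {'draft', 'references'}


def distinguished(attributes):
    """Drop ignorable attributes; what remains must match the request."""
    return {k: v for k, v in attributes.items() if k not in IGNORABLE}


def select(elements, tag, wanted):
    """Keep elements of this tag whose distinguished attributes == wanted."""
    return [e for e in elements
            if e['tag'] == tag and distinguished(e['attrs']) == wanted]


elements = [
    {'tag': 'currencyFormatLength', 'attrs': {}},
    {'tag': 'currencyFormatLength', 'attrs': {'type': 'short'}},
    {'tag': 'currencyFormatLength', 'attrs': {'draft': 'contributed'}},
]
plain = select(elements, 'currencyFormatLength', {})
short = select(elements, 'currencyFormatLength', {'type': 'short'})
```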
Edward Welbourne
e5eb0aa428 Take number system into account in currency format look-up
CLDR's currency formats do have number system variation, so take it
into account. (The old xpathlite code clearly intended to do this, but
failed at it due to looking for the wrong component of an XPATH to
fix.) This changes the currency formats in use for
* all Dutch locales (because nl.xml lists a currency format for arab
  before the one for latn, and they differ),
* Punjabi, Urdu - specifically pa_Guru_IN, ur_Arab_PK (both like
  Dutch, arabext before latn; which is correct for pa_Arab_PK and
  ur_Arab_IN),
* Sindhi (whose over-ride of latn currency format we were using, where
  we should be using arab's format, supplied by root's default),
* Tatar (which specifies a generic currency format, which we were
  using, before one specific to latn, which we now use),
* Tongan (same as Dutch),
* Konkani (like Dutch, deva before latn) and
* several North African Arabic locales (whose default number system is
  latn, rather than arab, but previously used arab's formats).

Task-number: QTBUG-79902
Change-Id: I18d8ec16bfd3a516d1bcd2f63bc7f7f15179a3f4
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:23 +01:00
Edward Welbourne
be3dfd7a71 Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with the facilities that class needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.

It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values.  (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.

Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.

Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.

Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.

Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:18 +01:00
Edward Welbourne
c834dbc6fb Move cldr2qtimezone.py's CLDR-reading to a CldrAccess class
This begins the process of replacing xpathlite.py, adding low-level
DOM-access classes to ldml.py and the CldrAccess class to cldr.py

Moved a format comment from cldr2qtimezone.py's doc-string to the
method of CldrAccess that does the actual reading.

Task-number: QTBUG-81344
Change-Id: I46ae3f402f8207ced6d30a1de5cedaeef47b2bcf
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:13 +01:00
Edward Welbourne
9fab53a513 Rework qlocalexml2cpp.py to use writers based on Transcriber
This saves repetition of temporary-file manipulation code. In the
process, ensure that we tidy away temporary files on failure.

Moved a comment in qlocale.h to *outside* the re-written portion, to
save having to rewrite it every time. Added blank lines to separate
script data from country data in the generated output. Changed 0s in
one comment to zeros, to match another comment.

Isolated use of sys to the __main__ block.
Isolated use of enumdata to the new LocaleHeaderWriter class.
Modernised all the string-formatting I touched.

Task-number: QTBUG-81344
Change-Id: I5768e45d9a8ea23facc303b3dd8af8b3ccbf7ff2
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:56 +01:00
Edward Welbourne
5b1c33cc78 Rework cldr2qtimezone.py into more maintainable form
Broke out the updating of a source file to a ZoneIdWriter helper
class, which enables tidying away the temporary file if we fail.
Collected up the rest of the script into a main() that's now
called from a __name__ == '__main__' block.
Rationalized the imports.

Eliminated an inefficient lookup function by constructing a suitable
dict() before entering the loop that needed it.

Separated the "data you might need to update" tables from the code
that does the work, to make it easier for those adding support for new
zones to see what they're doing.

Removed the spurious $Revision$ from the output and reworded the
preamble of the generated file. (It would seem CLDR no longer uses an
RCS-based version-control system.) Generated output is otherwise
unchanged.

Task-number: QTBUG-81344
Change-Id: I7d9de8357ebcb599d154de9f862e25f7ade00390
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:50 +01:00
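Note on 5b1c33cc78: replacing a per-item linear lookup with a dict built once before the loop can be sketched as below. The table contents, the 1-based numbering and the function names are assumptions for illustration, not the script's data.

```python
# Hypothetical table of Windows zone names; positions give the IDs.
windows_ids = ['Afghanistan Standard Time', 'UTC', 'Yukon Standard Time']


def slow_index(name):
    """Linear scan on every call: the kind of helper that was removed."""
    for index, entry in enumerate(windows_ids, 1):
        if entry == name:
            return index
    return 0


# Build the mapping once, before entering the loop that needs it.
index_of = {name: index for index, name in enumerate(windows_ids, 1)}

wanted = ['UTC', 'Yukon Standard Time', 'No Such Zone']
fast = [index_of.get(name, 0) for name in wanted]
slow = [slow_index(name) for name in wanted]
```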
Edward Welbourne
bb4242341b Add tools to localetools to facilitate source file recreation
For now unused; later commits shall put them to use.
Transcriber -- base, takes care of tempfile and renaming.
SourceFileEditor -- handles copying parts before and after a common delimiter.

Task-number: QTBUG-81344
Change-Id: I28cf977d0a08825fbb873fb330da6823b88ad3ed
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:45 +01:00
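Note on bb4242341b: the tempfile-and-rename idea attributed to Transcriber can be sketched as a context manager. This is an assumed design under a hypothetical name (`SketchTranscriber`), not the class in localetools.py.

```python
import os
import tempfile


class SketchTranscriber:
    """Write to a temporary file; rename over the target only on success."""

    def __init__(self, path):
        self.path = path
        # Create the temporary file beside the target, so that the final
        # rename stays on one filesystem.
        directory = os.path.dirname(os.path.abspath(path))
        handle, self.temp_path = tempfile.mkstemp(dir=directory)
        self.writer = os.fdopen(handle, 'w', encoding='utf-8')

    def __enter__(self):
        return self.writer

    def __exit__(self, exc_type, exc, traceback):
        self.writer.close()
        if exc_type is None:
            os.replace(self.temp_path, self.path)
        else:
            # Tidy away the temporary file on failure.
            os.remove(self.temp_path)
        return False  # never swallow the exception
```

A caller would write `with SketchTranscriber(path) as out: out.write(...)`; if the body raises, the existing file at `path` is left untouched.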
Edward Welbourne
c3dea1ffca Move some shared code to a localetools module
The time-zone script was importing two functions from the locale data
generation script. Move them to a separate module, to which I'll
shortly add some more shared utilities. Cleaned up some imports in the
process.

Combined qlocalexml2cpp's and xpathlite's error classes into a new
Error class in the new module and made it a bit more like a proper
python error class.

Task-number: QTBUG-81344
Change-Id: Idbe0139ba9aaa2f823b8f7216dee1d2539c18b75
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:40 +01:00
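Note on c3dea1ffca: one plausible reading of "a bit more like a proper python error class" is a class that derives from Exception and passes its arguments to the base class. The sketch below is an assumption along those lines, not the Error class in localetools.py.

```python
class Error(Exception):
    """Shared error type for the scripts (illustrative sketch)."""

    def __init__(self, msg, *args):
        # Hand everything to Exception so that args, repr() and pickling
        # behave as they do for built-in exceptions.
        super().__init__(msg, *args)
        self.message = msg

    def __str__(self):
        return self.message


try:
    raise Error('Unknown locale tag', 'xx_YY')
except Error as e:
    caught = (str(e), e.args)
```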
Edward Welbourne
4d9f1a87de Move qlocalexml2cpp.py's XML-reading to QLocaleXmlReader
This new class mirrors the existing QLocaleXmlWriter and places the
two side-by-side in qlocalexml.py, rather than having the writing and
reading in separate places.

Made judicious use of transformed versions of mappings to save
repeated iteration of a mapping's entries to do lookups on first
entries of pair-values; several (id, name, code) data-sets are
sometimes indexed by id, sometimes by name.

Reworked the default_map, that the complicated compareLocaleKeys()
used in sorting locale keys, to map IDs instead of names; the function
also needed the locale_map so that it could convert IDs to names,
which we can skip by going directly with IDs.

Task-number: QTBUG-81344
Change-Id: Iff6a97f7f0755b56dda70d8a6796ec074c558910
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:34 +01:00
Edward Welbourne
a20697a394 Rework cldr2qlocalexml.py in terms of a QLocaleXmlWriter class
Delegate the output of XML to a helper class provided by qlocalexml.py
and restructure the driver script so that it can be imported without
running anything. It now has a minimal __name__ == '__main__' block
that calls a main() function. This, for the moment, requires a global
via which it shares the CLDR directory with various other functions;
that shall go away in a later commit.

Task-number: QTBUG-81344
Change-Id: Ica2d3ec09f2d38ba42fd930258cc765283f29a71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:28 +01:00
Simon Hausmann
ff922e7b87 Merge remote-tracking branch 'origin/5.15' into dev
Conflicts:
	src/corelib/kernel/qmetatype.cpp

Change-Id: I88eb0d3e9c9a38abf7241a51e370c655ae74e38a
2020-03-16 18:41:27 +01:00
Edward Welbourne
ebcd8e16db Deduplicate day-name data in QLocaleXML files
This is a follow-up to commit ebb0212133.
The day name data appeared twice in the XML files.
Skip the second copy, saving 8.8% of the intermediate file-size.
This makes no change to generated QLocale data.

Change-Id: Ic2cc543a2a85cbb1d2d47ebac7df4fa9ad6ee0a7
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-03-16 08:51:46 +01:00