Commit Graph

18 Commits

Author SHA1 Message Date
Edward Welbourne
8a762c6f0f Cope with CLDR data conflict decimal == group
The digit-grouping and fractional-part separators need to be distinct
for parsing to be able to distinguish between two thousand and one vs
two and one thousandth. Thakfully ldml.py asserted this, so caught the
glitch in CLDR v43's data where mn_Mong_MN over-rode mn's decimal, but
not group, and thereby clashed with group. Fortunately the over-ride
is marked as draft="contributed" so we can back out of the collision
and limit the selection to draft="approved" values (but only when
there *is* such a conflict, as plenty of locales have (compatible)
draft data), thereby ignoring the conflicting contribution.

Brought to the attention of cldr-users at:
https://groups.google.com/a/unicode.org/g/cldr-users/c/6kW9kC6fz3g
hopefully that'll lead to a saner resolution at v44.

Task-number: QTBUG-111550
Change-Id: I1332486e60481cb4494446c0c87d89d74bd317d4
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-01 15:36:20 +02:00
Edward Welbourne
615047e98f Ignore parentLocales nodes with component="..." attributes
From CLDR v43, "The parentLocale elements now have an optional
component attribute, with a value of segmentations or
collations. These should be used for inheritance for those respective
elements." Since we aren't extracting collation or segmentation data
for the present, omit these elements from the scan for parentLocale
information.

Task-number: QTBUG-111550
Change-Id: I42871929f539c1852471812801953f2fc8be0e8a
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-01 15:36:06 +02:00
Edward Welbourne
277f3345f2 Record a recent discovery: Suzhou isn't hanidec
Revise a comment in ldml.py about Suzhou "digits", since it turns out
they aren't the same as hanidec, which is far from contiguous.

Change-Id: Ia3947dbc5a927772026e55fe197c8ebce2540da2
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-02-28 20:14:18 +01:00
Lucie Gérard
05fc3aef53 Use SPDX license identifiers
Replace the current license disclaimer in files by
a SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.

Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
2022-05-16 16:37:38 +02:00
Ievgenii Meshcheriakov
41458fafa0 locale_database: Use f-strings in Python code
Replace most uses of str.format() and string arithmetic by f-strings.
This results in more compact code and the code is easier to read
when using an appropriate editor.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I3409f745b5d0324985cbd5690f5eda8d09b869ca
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-16 19:04:20 +02:00
Ievgenii Meshcheriakov
53382b7b07 locale_database: Don't use u prefix for strings in python files
This prefix is useless with Python 3.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Ic008d53fe506865759e9a5003f439f7ac107b9e6
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-15 17:06:53 +02:00
Ievgenii Meshcheriakov
b02d17c5c0 Convert CLDR scripts to Python 3
The convertion is moslty done using 2to3 script with manual cleanup
afterwards.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I4d33b04e7269c55a83ff2deb876a23a78a89f39d
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-15 17:06:53 +02:00
Edward Welbourne
e51831260a Nomenclature change: s/countr/territor/g in locale scripts
Change the nomenclature used in the scripts and the QLocaleXML data
format to use "territory" and "territories" in place of "country" and
"countries". Does not change the generated source files.

Change-Id: I4b208d8d01ad2bfc70d289fa6551f7e0355df5ef
Reviewed-by: JiDe Zhang <zhangjide@uniontech.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2021-05-26 18:00:01 +02:00
Edward Welbourne
17701a95f8 QLocale: simplify currency display name lookup
We were extracting several candidate display names from CLDR for each
currency, joining them with semicolons, storing in a table, then using
only the first entry from the list - where we should probably have
used the first non-empty entry in any case.

So instead extract the first non-empty candidate name from CLDR and
store that simply, saving the need for semicolon-joining or parsing
out the first entry from the thus-joined list. This significantly
reduces the size of the currency name data table.

Change-Id: I201d0528348d5fcb9eceb5df86211b9c77de3485
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-11-17 19:55:04 +01:00
Edward Welbourne
d853816307 Fix handling of Suzhou numbering system
This only arises when the system locale tells us to use its zero as
our zero digit, since no CLDR locale uses it by default. Adapt an
MS-specific QLocale::system() test to use Suzhou numbering, so as to
test this.

While updating the locale-restoration code to also restore the digits
being set in that test, add restore code for the long time format,
where previously only the short time format was restored. Add a
comment to make it less likely one of those shall be missed in future.

Fixes: QTBUG-85409
Change-Id: I343324bb563ee0e455dfe77d4825bf8c3082ca30
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-07-17 12:19:01 +02:00
Edward Welbourne
bb6a73260e Support digit-grouping correctly
Read three more values from CLDR and add a byte to the bit-fields at
the end of QLocaleData, indicating the three group sizes. This adds
three new parameters to various low-level formatting functions. At the
same time, rename ThousandsGroup to GroupDigits, more faithfully
expressing what this (internal) option means.

This replaces commit 27d1391280 with a
fuller implementation that handles digit-grouping in any of the ways
that CLDR supports. The formerly "Indian" formatting now also applies
to at least some locales for Bangladesh, Bhutan and Sri Lanka.

Fixed Costa Rica currency formatting test that wrongly put a separator
after the first digit; the locale (in common with several Spanish
locales) requires at least two digits before the first separator.

[ChangeLog][QtCore][Important Behavior Changes] Some locales require
more than one digit before the first grouping separator; others use
group sizes other than three. The latter was partially supported (only
for India) at 5.15 but is now systematically supported; the former is
now also supported.

Task-number: QTBUG-24301
Fixes: QTBUG-81050
Change-Id: I4ea4e331f3254d1f34801cddf51f3c65d3815573
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-07-14 14:52:08 +02:00
Qt Forward Merge Bot
8823bb8d30 Merge remote-tracking branch 'origin/5.15' into dev
Conflicts:
	examples/opengl/doc/src/cube.qdoc
	src/corelib/global/qlibraryinfo.cpp
	src/corelib/text/qbytearray_p.h
	src/corelib/text/qlocale_data_p.h
	src/corelib/time/qhijricalendar_data_p.h
	src/corelib/time/qjalalicalendar_data_p.h
	src/corelib/time/qromancalendar_data_p.h
	src/network/ssl/qsslcertificate.h
	src/widgets/doc/src/graphicsview.qdoc
	src/widgets/widgets/qcombobox.cpp
	src/widgets/widgets/qcombobox.h
	tests/auto/corelib/tools/qscopeguard/tst_qscopeguard.cpp
	tests/auto/widgets/widgets/qcombobox/tst_qcombobox.cpp
	tests/benchmarks/corelib/io/qdiriterator/qdiriterator.pro
	tests/manual/diaglib/debugproxystyle.cpp
	tests/manual/diaglib/qwidgetdump.cpp
	tests/manual/diaglib/qwindowdump.cpp
	tests/manual/diaglib/textdump.cpp
	util/locale_database/cldr2qlocalexml.py
	util/locale_database/qlocalexml.py
	util/locale_database/qlocalexml2cpp.py

Resolution of util/locale_database/ are based on:
https://codereview.qt-project.org/c/qt/qtbase/+/294250
and src/corelib/{text,time}/*_data_p.h were then regenerated by
running those scripts.

Updated CMakeLists.txt in each of
	tests/auto/corelib/serialization/qcborstreamreader/
	tests/auto/corelib/serialization/qcborvalue/
	tests/auto/gui/kernel/
and generated new ones in each of
	tests/auto/gui/kernel/qaddpostroutine/
	tests/auto/gui/kernel/qhighdpiscaling/
	tests/libfuzzer/corelib/text/qregularexpression/optimize/
	tests/libfuzzer/gui/painting/qcolorspace/fromiccprofile/
	tests/libfuzzer/gui/text/qtextdocument/sethtml/
	tests/libfuzzer/gui/text/qtextdocument/setmarkdown/
	tests/libfuzzer/gui/text/qtextlayout/beginlayout/
by running util/cmake/pro2cmake.py on their changed .pro files.

Changed target name in
	tests/auto/gui/kernel/qaction/qaction.pro
	tests/auto/gui/kernel/qaction/qactiongroup.pro
	tests/auto/gui/kernel/qshortcut/qshortcut.pro
to ensure unique target names for CMake

Changed tst_QComboBox::currentIndex to not test the
currentIndexChanged(QString), as that one does not exist in Qt 6
anymore.

Change-Id: I9a85705484855ae1dc874a81f49d27a50b0dcff7
2020-04-08 20:11:39 +02:00
Edward Welbourne
963931550d Check all matches for each XPath when searching
Previously, if we found one element with required attributes, we would
search into it and ignore any later elements also with those required
attributes. This meant that, if the first didn't contain the child
elements we were looking for, we'd fail to find what we sought, if it
was in a later matching element (e.g. with some ignored attributes).
We would then go on to look for a match in a later file, where there
might have been a match we should have found in the earlier file.

Check all matches, rather than only the first match in each file.  Do
the search in each file "in parallel" to save reparsing the XPath.
This clears the search code of rather hard-to-follow break/else
handling in loops; and currently makes no change to the generated
data.

Change-Id: I86b010e65b9a1fc1b79e5fdd45a5aeff1ed5d5d5
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:43 +01:00
Edward Welbourne
89bd12b9ad Change QLocale to use CLDR's accounting formats for currencies
In particular, this changed the US currency formats for negative
amounts to be parenthesised versions of the positive amount forms,
rather than having a minus sign after the $ sign. Test updated.

[ChangeLog][QtCore][QLocale] Currency formats are now based on CLDR's
accounting formats, where they were previously mostly based (more or
less by accident) on standard formats. In particular, this now means
negative currency formats are specified, where available, where they
(mostly) were not previously.

Task-number: QTBUG-79902
Change-Id: Ie0c07515ece8bd518a74a6956bf97ca85e9894eb
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 20:43:34 +02:00
Edward Welbourne
81cf23c7a7 Take CLDR's distinguished attributes into account
When doing XPATH searches, child nodes that have distinguished
attributes that were not asked for should be skipped. This is part of
the LDML spec and matters when resolving locale inheritance. Scan the
LDML DTD (previously only scanned for the CLDR version) to find which
attributes of which tags are ignorable - all others are distinguished
- and take the result into account when performing XPATH searches.

The XPath we were using for currency formats wasn't excluding
currencyFormatLength elements with type="short" and patterns specific
to thousands (and larger multiples); this is fixed by taking
distinguished attributes into account. However, the XPATH also wasn't
specifying the always distinguished attribute type="standard" that
was, in practice, used for nearly all locales that weren't (wrongly)
using short-forms for thousands; so type="standard" is now made
explicit, so as to minimize the diff.

This leaves only twenty-one locales with a negative currency formats.
A later commit shall switch to using accounting by default (it falls
back via an alias to standard, in any case), thereby restoring the two
mentioned below that were using it by accident, but the present change
gives the minimal diff here.

Thousands-specific formats replaced with sensible ones:
* zh_Hant_{HK,MO} (Traditional Mandarin, Hong Kong and Macau)
* eo_001 (Esperanto)
* fr_CA (Canadian French)
* ha_* (Hausa, when not written in Arabic)
* es_{GT,MX,US} (Spanish - Guatemala, Mexico, USA)
* sw_KE (Swahili, Kenya)
* yi_001 (Yiddish)
* mfe_MU (Morisyen, Mauritius)
* lag_TZ (Langi, Tanzania)
* mgh_MZ (Makhuwa Meetto, Mozambique)
* wae_CH (Walser, Switzerland)
* kkj_CM (Kako, Cameroon)
* lkt_US (Lakota, USA)
* pa_Arab_PK (Punjabi, in Arabic script, as used in Pakistan; uses
  arabext number system, whose currency falls back to latn's, for
  which pa_Arab over-rides the thousands-format).

Format changed from an over-ridden type="accounting" to standard (so
these lost a negative-specific form) in:
* en_SI (English, Slovenia)
* es_DO (Spanish, Dominican Republic; same)

For some locales we were picking up over-rides of narrow or short list
formats, or formats for or-lists or unit-lists rather than and-lists,
in place of the standard list format, that these locales don't
over-ride, provided by a parent locale. This changed list formats for:
* en_CA, en_IN (dropped "Oxford" comma before "and")
* qu_* (Quechua; dropped "utaq", presumably meaning "and")
* ur_IN (Urdu, India; was using unit-list formats)

[ChangeLog][QtCore][QLocale] Data used for currency formats in several
locales and list patterns in some locales have changed due to now
parsing the CLDR data more faithfully.

Fixes: QTBUG-81344
Change-Id: I6b95c6c37db92df167153767c1b103becfb0ac98
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:28 +01:00
Edward Welbourne
e5eb0aa428 Take number system into account in currency format look-up
CLDR's currency formats do have number system variation, so take it
into account. (The old xpathlite code clearly intended to do this, but
failed at it due to looking for the wrong component of an XPATH to
fix.) This changes the currency formats in use for
* all Dutch locales (because nl.xml lists a currency format for arab
  before the one for latn, and they differ),
* Punjabi, Urdu - specifically pa_Guru_IN, ur_Arab_PK (both like
  Dutch, arabext before latn; which is correct for pa_Arab_PK and
  ur_Arab_IN),
* Sindi (whose over-ride of latn currency format we were using, where
  we should be using arab's format, supplied by root's default),
* Tatar (which specifies a generic currency format, which we were
  using, before one specific to latn, which we now use),
* Tongan (same as Dutch),
* Konkani (like Dutch, deva before latn) and
* several North African Arabic locales (whose default number system is
  latn, rather than arab, but previously used arab's formats).

Task-number: QTBUG-79902
Change-Id: I18d8ec16bfd3a516d1bcd2f63bc7f7f15179a3f4
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:23 +01:00
Edward Welbourne
be3dfd7a71 Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.

It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values.  (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.

Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.

Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.

Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.

Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:18 +01:00
Edward Welbourne
c834dbc6fb Move cldr2qtimezone.py's CLDR-reading to a CldrAccess class
This begins the process of replacing xpathlite.py, adding low-level
DOM-access classes to ldml.py and the CldrAccess class to cldr.py

Moved a format comment from cldr2qtimezone.py's doc-string to the
method of CldrAccess that does the actual reading.

Task-number: QTBUG-81344
Change-Id: I46ae3f402f8207ced6d30a1de5cedaeef47b2bcf
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:13 +01:00