Commit Graph

46 Commits

Author SHA1 Message Date
Edward Welbourne
1ae24f8b50 Use CLDR's names in QLocale::*ToName() for language, script, territory
Various comments need to continue using the enumdata.py names, as they
associate data with particular enum members, but we can now correctly
use the en.xml versions of their names when we report them, rather
than the enum-friendly names we use in the code. Since this now means
the data may stray outside plain ASCII - it'll be UTF-8-encoded - this
implies replacing the QLatin1StringView()s of the code that formerly
read this data with QString::fromUtf8().

Fixes: QTBUG-94460
Change-Id: Id3b08875a46af58c0555c3e303b0e15a19441509
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2023-08-09 17:53:42 +02:00
Edward Welbourne
743ceb7cc2 Move enum-name-munging from LocaleHeaderWriter to QLocaleXmlReader
The former needed the latter's .dupes to do the job, so can now just
take a method as a tool to do the job instead, letting .dupes become
private. In the process refine the munging to free enumdata.py from
having to capitalize each word in its names. This will, in due course,
let us use more natural forms in various comments. This causes no
change to generted data.

Update enumdata.py's introduction doc, mainly to reflect this but also
fixing the out-of-date names (old *_list have long been *_map) and
adding some details to other paragraphs.

Task-number: QTBUG-94460
Change-Id: If195b2e94a53a495fc4f1f216bed07a910439fa7
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-09 17:53:26 +02:00
Edward Welbourne
c5ef6e1e68 Wrap QLocale's CLDR-derived string data tables to a 100-column margin
The string data tables have a mix of digit-length tokens, up to
four-digit hex; we can fit 12 of those per line within our margins.
Leave the one-row-per-locale tables as they are, though, despite long
lines.

Change-Id: I655fddecf24133c26d16187b7a5a8fbc25553e07
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-03 19:16:27 +02:00
Mate Barany
3dcd6b7ec9 Pack languageCodeList tighter
Pack some of the arrays that contain locale data more tightly. The
AlphaCode struct is a char[4] but always holds only [a-z]{,3} which
could be fit into 16 bits, halving the size of an AlphaCode struct.

With the new constructor the initialization of the AlphaCode struct
also changes - modify qlocalexml2cpp.py to reflect this change and
regenerate the languageCodeList.

Fixes: QTBUG-105050
Change-Id: I2b1e93ab7cc3f2d667bf67b45769b74a15211931
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2023-03-15 19:48:30 +01:00
Edward Welbourne
0de7e7a066 LocaleDB: Make passing qtbase root dir on command-lines optional
We can easily enough obtain the root of the present source tree using
the value of __file__, so might as well do so.

Change-Id: If14773ac1127278b6018a090c0b376437b9c6eec
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2022-11-01 19:14:14 +02:00
Yuhang Zhao
42fe677584 Core: make CLDR data constexpr
Task-number: QTBUG-100485
Pick-to: 6.3 6.2
Change-Id: Ib8c5160ca0994662a5fdc2293dc734c1bdcac4f2
Reviewed-by: Marc Mutz <marc.mutz@qt.io>
2022-05-27 09:33:28 +08:00
Lucie Gérard
05fc3aef53 Use SPDX license identifiers
Replace the current license disclaimer in files by
a SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.

Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
2022-05-16 16:37:38 +02:00
Ievgenii Meshcheriakov
4f53c703e4 QLocale: Extend support for language codes
This commit extends functionality for QLocale::codeToLanguage()
and QLocale::languageToCode() by adding an additional argument
that allows selection of the ISO 639 code-set to consider for
those operations.

The following ISO 639 codes are supported:
    * Part 1
    * Part 2 bibliographic
    * Part 2 terminological
    * Part 3

As a result of this change the codeToLanguage() overload without
the additional argument now returns a Language value if it matches
any know code. Previously a valid language was returned only if
the function argument matched the first code defined for that
language from the above list.

[ChangeLog][QtCore][QLocale] Added overloads for codeToLanguage()
and languageToCode() that support specifying which ISO 639 codes
to consider.

Fixes: QTBUG-98129
Change-Id: I4da8a89e2e68a673cf63a621359cded609873fa2
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-12-09 03:45:08 +01:00
Ievgenii Meshcheriakov
e9519d207e locale_database: Use context manager interface to update source files
Use context manager interface (with statement) to atomically update source
files. This ensures that all files are properly closed and the temporary
file is removed even in case of errors.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I18cd96f1d03e467dea6212f6576a41e65f414ce1
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2021-07-20 16:51:51 +02:00
Ievgenii Meshcheriakov
2d0c2c9f1c locale_database: Use pathlib to manipulate paths in Python code
pathlib's API is more modern and easier to use than os.path. It
also allows to distinguish between paths and other strings in type
annotations.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Ie6d9b4e35596f7f6befa4c9635f4a65ea3b20025
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-19 22:05:54 +02:00
Ievgenii Meshcheriakov
5ef5dce53b locale_database: Use argparse module to parse command line arguments
arparse is the standard way to parse command line arguments in Python.
It provides help and usage information for free and is easier to extend
than a custom argument parser.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I1e4c9cd914449e083d01932bc871ef10d26f0bc2
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-16 19:04:20 +02:00
Ievgenii Meshcheriakov
41458fafa0 locale_database: Use f-strings in Python code
Replace most uses of str.format() and string arithmetic by f-strings.
This results in more compact code and the code is easier to read
when using an appropriate editor.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I3409f745b5d0324985cbd5690f5eda8d09b869ca
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-16 19:04:20 +02:00
Ievgenii Meshcheriakov
2f4868548b locale_database: Use super() to call base class methods
This is the standard way to call base class methods in Python 3 and
it is shorter than the custom one used now.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Ifaff591a46e92148fbf514856109ff794a50c9f7
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-15 17:06:54 +02:00
Ievgenii Meshcheriakov
b02d17c5c0 Convert CLDR scripts to Python 3
The convertion is moslty done using 2to3 script with manual cleanup
afterwards.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: I4d33b04e7269c55a83ff2deb876a23a78a89f39d
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-15 17:06:53 +02:00
Edward Welbourne
a37c0ef55f Convert python comparison function to key function
Instead of implementing all the intricacies of a cmp for the python
sort-function, support for which is due to be dropped at Python 3 in
any case, implement a much simpler key function that achieves the same
result.

In the process, eliminate the ugly kludge of setting an attribute on a
function to, in effect, communicate with it via a global. Instead,
instantiate a class, that wraps the value previously given to the
attribute and whose instance provides the key-function.

Thanks to Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io> for
pointing out that a key function is the way of the future - and
sorted() is a nicer way to sort.

Pick-to: 6.2
Change-Id: Icf1ed5597fedf420d054fbc860e3e7fc6615875c
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2021-07-14 20:59:00 +02:00
Edward Welbourne
7dec56c6a5 Make locale ordering transitive
The ordering function used to sort the locale data generated for
QLocale attempted to sort the default territory for a given language
and script before other territories, but was too tangled for it to be
obvious this is what it was doing. The result turned out to be
non-transitive. Replace with code that implements the same preference
but only applies it where the result is compatible with transitivity.

This leads to a shuffling of the order of the Serbian-language
locales, which sorts the Cyrillic ones before the Latin ones. This is
consistent with my reading of the CLDR data, which fills in Cyrillic
and Serbia for Serbian; Serbian/Cyrillic/Serbia did previously sort
before all other Serbian variants.

Thanks to Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io> for
discovering the non-transitivity.

Pick-to: 6.2
Change-Id: I0ce9f78e620e714f980f32b85b7100ed0f92ad74
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2021-07-14 20:59:00 +02:00
Ievgenii Meshcheriakov
2300146085 locale_database: Don't attempt to access property 'message' of IOError
IOError does not have property 'message' in Python 3. Instead of
attempting to access it, just use the string representation of
the exception object. This produces the error message possibly combined
with additional arguments in both Python 2 and Python 3.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Icb198a409e7f80b832e474d8390b770fdeacc6c2
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-07 19:32:52 +02:00
Ievgenii Meshcheriakov
6deb28f35a qlocalexml2cpp.py: Remove undefined name inside error processing code
Name 'stem' is undefined inside CalendarDataWriter.write(). The error
was repoted by flake8.

Task-number: QTBUG-83488
Pick-to: 6.2
Change-Id: Ib816b40d0bde2afd3112da76deee0ce39985693a
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-07-06 20:19:22 +02:00
Edward Welbourne
e51831260a Nomenclature change: s/countr/territor/g in locale scripts
Change the nomenclature used in the scripts and the QLocaleXML data
format to use "territory" and "territories" in place of "country" and
"countries". Does not change the generated source files.

Change-Id: I4b208d8d01ad2bfc70d289fa6551f7e0355df5ef
Reviewed-by: JiDe Zhang <zhangjide@uniontech.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2021-05-26 18:00:01 +02:00
JiDe Zhang
50a7eb8cf7 Add the "Territory" enumerated type for QLocale
The use of "Country" is misleading as some entries in the enumeration
are not countries (eg, HongKong), for all that most are. The Unicode
Consortium's Common Locale Data Repository (CLDR, from which QLocale's
data is taken) calls these territories, so introduce territory-based
names and prepare to deprecate the country-based ones in due course.

[ChangeLog][QtCore][QLocale] QLocale now has Territory as an alias for
its Country enumeration, and associated territory-based names to match
its country-named methods, to better match the usage in relevant
standards. The country-based names shall in due course be deprecated
in favor of the territory-based names.

Fixes: QTBUG-91686
Change-Id: Ia1ae1ad7323867016186fb775c9600cd5113aa42
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2021-04-15 20:17:49 +08:00
Edward Welbourne
a9e4bf7eef Implement binary search in QLocale's likely sub-tag lookup
Follow through on a comment from 2012: sort the likely subtag array
(in the CLDR update script) and use bsearch to find entries in it.

This simplifies QLocaleXmlReader.likelyMap() slightly, moving the
detection of last entry to LocaleDataWriter.likelySubtags(), but
requires collecting all likely sub-tag mapping pairs (rather than just
passing them through from read to write via generators) in order to
sort them.

Change-Id: Ieb6875ccde1ddbd475ae68c0766a666ec32b7005
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-11-08 13:01:33 +01:00
Edward Welbourne
73ceb71576 Use newer names for various languages, territories and scripts
Our enumdata.py namings of countries had fallen somewhat out of sync
with CLDR's names. In the process, support including hyphenation in
the unsquashed name, along with spacing. Distinguish, in comments,
between older renamings and those first seen in Qt6.

Change-Id: I91ec444bf35222ab6a9332e389ace19cca0e4fdf
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-11-08 13:01:12 +01:00
Edward Welbourne
8b0e068847 Mark QLocale's Language, Country and Script enums as ushort
The code pervasively presumes their values can be held in a ushort, so
make sure the compiler knows we expect that to work (and doesn't
complain about narrowing when we do convert them to ushort).

Change-Id: Idde7be6cceee8a6dae333c5b1d5a0120fec32e4a
Reviewed-by: Andrei Golubev <andrei.golubev@qt.io>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-10-12 16:53:40 +02:00
Edward Welbourne
bb6a73260e Support digit-grouping correctly
Read three more values from CLDR and add a byte to the bit-fields at
the end of QLocaleData, indicating the three group sizes. This adds
three new parameters to various low-level formatting functions. At the
same time, rename ThousandsGroup to GroupDigits, more faithfully
expressing what this (internal) option means.

This replaces commit 27d1391280 with a
fuller implementation that handles digit-grouping in any of the ways
that CLDR supports. The formerly "Indian" formatting now also applies
to at least some locales for Bangladesh, Bhutan and Sri Lanka.

Fixed Costa Rica currency formatting test that wrongly put a separator
after the first digit; the locale (in common with several Spanish
locales) requires at least two digits before the first separator.

[ChangeLog][QtCore][Important Behavior Changes] Some locales require
more than one digit before the first grouping separator; others use
group sizes other than three. The latter was partially supported (only
for India) at 5.15 but is now systematically supported; the former is
now also supported.

Task-number: QTBUG-24301
Fixes: QTBUG-81050
Change-Id: I4ea4e331f3254d1f34801cddf51f3c65d3815573
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-07-14 14:52:08 +02:00
Dimitrios Apostolou
1e546595e9 Remove unused imports
As found by LGTM.com.

Change-Id: I1704f10f9bab1b11ab22824aca0cfcdcb47fef2f
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2020-07-10 02:36:54 +02:00
Qt Forward Merge Bot
8823bb8d30 Merge remote-tracking branch 'origin/5.15' into dev
Conflicts:
	examples/opengl/doc/src/cube.qdoc
	src/corelib/global/qlibraryinfo.cpp
	src/corelib/text/qbytearray_p.h
	src/corelib/text/qlocale_data_p.h
	src/corelib/time/qhijricalendar_data_p.h
	src/corelib/time/qjalalicalendar_data_p.h
	src/corelib/time/qromancalendar_data_p.h
	src/network/ssl/qsslcertificate.h
	src/widgets/doc/src/graphicsview.qdoc
	src/widgets/widgets/qcombobox.cpp
	src/widgets/widgets/qcombobox.h
	tests/auto/corelib/tools/qscopeguard/tst_qscopeguard.cpp
	tests/auto/widgets/widgets/qcombobox/tst_qcombobox.cpp
	tests/benchmarks/corelib/io/qdiriterator/qdiriterator.pro
	tests/manual/diaglib/debugproxystyle.cpp
	tests/manual/diaglib/qwidgetdump.cpp
	tests/manual/diaglib/qwindowdump.cpp
	tests/manual/diaglib/textdump.cpp
	util/locale_database/cldr2qlocalexml.py
	util/locale_database/qlocalexml.py
	util/locale_database/qlocalexml2cpp.py

Resolution of util/locale_database/ are based on:
https://codereview.qt-project.org/c/qt/qtbase/+/294250
and src/corelib/{text,time}/*_data_p.h were then regenerated by
running those scripts.

Updated CMakeLists.txt in each of
	tests/auto/corelib/serialization/qcborstreamreader/
	tests/auto/corelib/serialization/qcborvalue/
	tests/auto/gui/kernel/
and generated new ones in each of
	tests/auto/gui/kernel/qaddpostroutine/
	tests/auto/gui/kernel/qhighdpiscaling/
	tests/libfuzzer/corelib/text/qregularexpression/optimize/
	tests/libfuzzer/gui/painting/qcolorspace/fromiccprofile/
	tests/libfuzzer/gui/text/qtextdocument/sethtml/
	tests/libfuzzer/gui/text/qtextdocument/setmarkdown/
	tests/libfuzzer/gui/text/qtextlayout/beginlayout/
by running util/cmake/pro2cmake.py on their changed .pro files.

Changed target name in
	tests/auto/gui/kernel/qaction/qaction.pro
	tests/auto/gui/kernel/qaction/qactiongroup.pro
	tests/auto/gui/kernel/qshortcut/qshortcut.pro
to ensure unique target names for CMake

Changed tst_QComboBox::currentIndex to not test the
currentIndexChanged(QString), as that one does not exist in Qt 6
anymore.

Change-Id: I9a85705484855ae1dc874a81f49d27a50b0dcff7
2020-04-08 20:11:39 +02:00
Edward Welbourne
67c0e28789 Purge a stray space from calendar locale data
It was causing all lines after the first, in each calendar's
locale_data[], to be over-indented. This only changes spacing.

Change-Id: Ibfc4986548eecbfdba2902cc18f44a2af669bc6d
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2020-04-02 19:44:01 +01:00
Edward Welbourne
3dfdc9b97a Convert the qlocale2cpp's last few %-formats to modern format() style
I've taken care of all the others in the course of other changes
already ...

Task-number: QTBUG-81344
Change-Id: I44e40a0d1c9f1e1a540a5f4cd252369fdc9b2698
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:50 +01:00
Edward Welbourne
be3dfd7a71 Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.

It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values.  (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.

Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.

Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.

Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.

Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:18 +01:00
Edward Welbourne
9fab53a513 Rework qlocalexml2cpp.py to use writers based on Transcriber
This saves repetition of temporary-file manipulation code. In the
process, ensure that we tidy away temporary files on failure.

Moved a comment in qlocale.h to *outside* the re-written portion, to
save having to rewrite it every time. Added blank lines to separate
script data from country data in the generated output. Changed 0s in
one comment to zeros, to match another comment.

Isolated use of sys to the __main__ block.
Isolated use of enumdata to the new LocaleHeaderWriter class.
Modernised all the string-formatting I touched.

Task-number: QTBUG-81344
Change-Id: I5768e45d9a8ea23facc303b3dd8af8b3ccbf7ff2
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:56 +01:00
Edward Welbourne
c3dea1ffca Move some shared code to a localetools module
The time-zone script was importing two functions from the locale data
generation script. Move them to a separate module, to which I'll
shortly add some more shared utilities. Cleaned up some imports in the
process.

Combined qlocalexml2cpp's and xpathlit's error classes into a new
Error class in the new module and made it a bit more like a proper
python error class.

Task-number: QTBUG-81344
Change-Id: Idbe0139ba9aaa2f823b8f7216dee1d2539c18b75
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:40 +01:00
Edward Welbourne
4d9f1a87de Move qlocalexml2cpp.py's XML-reading to QLocaleXmlReader
This new class mirrors the existing QLocaleXmlWriter and places the
two side-by-side in qlocalexml.py, rather than having the writing and
reading in separate places.

Made judicious use of transformed versions of mappings to save
repeated iteration of a mapping's entries to do lookups on fist
entries of pair-values; several (id, name, code) data-sets are
sometimes indexed by id, sometimes by name.

Reworked the default_map, that the complicated compareLocaleKeys()
used in sorting locale keys, to map IDs instead of names; the function
also needed the locale_map so that it could convert IDs to names,
which we can skip by going directly with IDs.

Task-number: QTBUG-81344
Change-Id: Iff6a97f7f0755b56dda70d8a6796ec074c558910
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:34 +01:00
Lars Knoll
2a4b957789 Merge remote-tracking branch 'origin/5.15' into dev
Change-Id: I99ee6f8b4bdc372437ee60d1feab931487fe55c4
2020-03-04 14:39:18 +00:00
Edward Welbourne
84382bde5c Rename the localexml module to qlocalexml
It implements interaction with the QLocaleXML file format type, so
rename it to match.

Task-number: QTBUG-81344
Change-Id: I46302d4ac1038cdfc5929e73b554b6d793814c56
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-03-03 07:38:06 +01:00
Edward Welbourne
c8e00e4f73 Use char16_t in favor of ushort for locale data tables
Change-Id: I890dd2b52c1b786db1081744c8ca343baba93de4
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-02-17 14:55:42 +01:00
Edward Welbourne
ed2b110b6a Allow surrogate pairs for various "single character" locale data
Extract the character in its proper unicode form and encode it in a
new single_character_data table of locale data. Record each entry as
the range within that table that encodes it. Also added an assertion
in the generator script to check that the digits CLDR gives us are a
contiguous sequence in increasing order, as has been assumed by the
C++ code for some time. Lots of number-formatting code now has to take
account of how wide the digits are.

This leaves nowhere for updateSystemPrivate() to record values read
from sys_locale->query(), so we must always consult that function when
accessing these members of the systemData() object. Various internal
users of these single-character fields need the system-or-CLDR value
rather than the raw CLDR value, so move QLocalePrivate's methods to
supply them down to QLocaleData and ensure they check for system
values, where appropriate first.

This allows us to finally support the Chakma language and script, for
whose number system UTF-16 needs surrogate pairs.

Costs 10.8 kB in added data, much of it due to adding two new locales
that need surrogates to represent digits.

[ChangeLog][QtCore][QLocale] Various QLocale methods that returned
single QChar values now return QString values to accommodate those
locales which need a surrogate pair to represent the (single
character) return value.

Fixes: QTBUG-69324
Fixes: QTBUG-81053
Change-Id: I481722d6f5ee266164f09031679a851dfa6e7839
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-02-17 14:55:24 +01:00
Edward Welbourne
c08a31634f Separate offsets from sizes in QLocale's data
This enables us to make the sizes quint8 and benefit from the
resulting packing, making the locale data smaller. The sizes for long
month-name lists (which concatenate twelve names with semicolon as
separator) can overflow an 8-bit member, so use quint16 where needed.

Re-ordered the data in QLocaleData and QCalendarLocale. Now all
long-short(-narrow) families arise in that order; and any standalone
is grouped with the one of the same length. (This cost 20 bytes in the
date-format table, which optimises out more duplication if short is
before long, but the saving in the (smaller) time-format table more
than make up for it; and 20 bytes isn't worth the confusion that being
inconsistent in ordering might cause.)

At the same time, drop trailing semicolons from list entries (which
join various names with semicolon) as they're not needed: we know
where the end of the list is, because we know the size of the string
that results from concatenation. The code that parses such lists can
even correctly handle empty entries at the end.

Saves 26 kB of data in the compiled binaries.

Task-number: QTBUG-81053
Change-Id: If6ccc96a6910828817aa605d10fd814f567ae1e8
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-01-30 17:58:53 +01:00
Edward Welbourne
4e84a8b29f Deduplicate locale data tables
Some entries in tables were sub-strings (e.g. prefixes) of others.
Since we store start-index and length (with no need for terminators),
any entry that appears as a sub-string of an earlier entry can be
recorded without making a separate copy of its content, just by
recording where it appeared as a sub-string of an earlier entry.

(Sadly this doesn't apply to month- or day-names and their
short-forms: for those, we store ';'-joined lists.  Thus, although
each short-form is a prefix of its long-form, the short-form is stored
in a list with other short-forms; and this is not a prefix of the list
of matching long-forms.)

The savings are modest (780 bytes at present), but cost us nothing
except when running the python script that generates the data files
(it takes a little longer now), which usually only happens at a CLDR
update.

Change-Id: I05bdaa9283365707bac0190ae983b31f074dd6ed
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-01-30 17:58:15 +01:00
Edward Welbourne
47d94dab0f Minor tidy-up in qlocalexml2cpp.py
Split a long line.
Use pythonic chained comparison to save some repetition.
Comment on a field not currently in actual use.
Say "zeros" rather than "0s" in one comment to match another.
Added a .h suffix to the main locale data tempfile to match the naming
of the tempfiles used for calendar data.

Simplify generation of the blank line between Language and Script; and
include a matching blank between Script and Country.
This adds one blank line to qlocale.h

Removed a stray space that misaligned locale data lines.
This produces a space-only change in the generated *_data_p.h files.

Change-Id: I974a9e8923c3dfd2178855d2cf1d6a5074e130b3
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-01-30 17:56:35 +01:00
Edward Welbourne
6852ba815d Correct some references to corelib/tools/ to say corelib/text/
The Unicode data tables moved with QString and friends.
So did the locale data generated from CLDR.

This amends commit a9aa206b7b.

Change-Id: If12f0420b559dcb78993adc00e9f39751bca684a
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-10-25 11:44:27 +02:00
Soroush Rabiei
7026645712 Add support for the Islamic Civil calendar
This has its own locale data, extracted from CLDR. This data may
potentially be shared with other variants on the Islamic calendar, so
is handled by a separate base-class, QHijriCalendar, on which such
variants may base their implementations.

[ChangeLog][QtCore][QCalendar] Added support for the Islamic Civil
calendar, controlled by feature islamiccivilcalendar, with locale data
that can be shared with other implementations, controlled by feature
hijricalendar.

Fixes: QTBUG-56675
Change-Id: Idf32d3da7034baa8ec5e66ef847e59a8a2f31cbd
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-08-22 10:10:02 +00:00
Soroush Rabiei
e71bf9d5c7 Add support for the Jalali (Solar Hijri or Persian) calendar
This has its own locale data, extracted from CLDR.

[ChangeLog][QtCore][QCalendar] Added support for the Jalali (Persian
or Solar Hijri) calendar, controlled by feature jalalicalendar.

Fixes: QTBUG-58404
Change-Id: Id5c56a10db05a4fd612aafc01615273db81ec743
Reviewed-by: Paul Wicking <paul.wicking@qt.io>
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-08-21 22:18:48 +02:00
Soroush Rabiei
aa8393c94f Add support for calendars beside Gregorian
Add QCalendarBackend as a base class for calendar implementations and
QCalendar as a facade via which to access it.

QDate's implicit implementation of the Gregorian calendar becomes
QGregorianCalendar and QDate methods now support choice of calendar.

Convert QLocale's CLDR data for month names to a locale-data component
of each supported calendar and relevant QLocale methods now support
choice of calendar. Adapt Python scripts for locale data generation to
extract month name data from CLDR (keeping on version v35.1) into the
new calendar-locale files. The locale data for the Gregorian calendar
is held in a Roman calendar base, for sharing with other calendars.

Add tests for basic uses of the new API.

[ChangeLog][QtCore][QCalendar] Added QCalendar to support diverse
calendars, supported by implementing QCalendarBackend.

[ChangeLog][QtCore][QDate] Allow choice of calendar in various
operations, with Gregorian remaining the default.

Done-with: Lars Knoll <lars.knoll@qt.io>
Done-with: Edward Welbourne <edward.welbourne@qt.io>
Fixes: QTBUG-17110
Fixes: QTBUG-950
Change-Id: I9d6278f394269a183aee8156e990cec4d5198ab8
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-08-20 13:41:21 +02:00
Soroush Rabiei
c595878aa3 Extract a large format string as a module constant value
The template for the "This is a generated file" notice made a clumsy
intrusion in the code in which it appeared, so split it out as a
constant of the module and access it by name where it's used.

Change-Id: Ic4dfb8e873078c54410b191654d6c21d082c9016
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-08-08 18:04:05 +02:00
Edward Welbourne
a9aa206b7b Move text-related code out of corelib/tools/ to corelib/text/
This includes byte array, string, char, unicode, locale, collation and
regular expressions.

Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-07-10 17:05:30 +02:00
Edward Welbourne
248b6756da Rename util/locale_database/ to include the e that was missing
It was misnamed local_database, quite missing the point of its name.

Change-Id: I73a4fdf24f53daac12304de1f443636d89afacb2
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2019-05-20 20:42:10 +02:00