Commit Graph

86 Commits

Author SHA1 Message Date
Edward Welbourne
c834dbc6fb Move cldr2qtimezone.py's CLDR-reading to a CldrAccess class
This begins the process of replacing xpathlite.py, adding low-level
DOM-access classes to ldml.py and the CldrAccess class to cldr.py

Moved a format comment from cldr2qtimezone.py's doc-string to the
method of CldrAccess that does the actual reading.

Task-number: QTBUG-81344
Change-Id: I46ae3f402f8207ced6d30a1de5cedaeef47b2bcf
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:43:13 +01:00
Edward Welbourne
9fab53a513 Rework qlocalexml2cpp.py to use writers based on Transcriber
This saves repetition of temporary-file manipulation code. In the
process, ensure that we tidy away temporary files on failure.

Moved a comment in qlocale.h to *outside* the re-written portion, to
save having to rewrite it every time. Added blank lines to separate
script data from country data in the generated output. Changed 0s in
one comment to zeros, to match another comment.

Isolated use of sys to the __main__ block.
Isolated use of enumdata to the new LocaleHeaderWriter class.
Modernised all the string-formatting I touched.

Task-number: QTBUG-81344
Change-Id: I5768e45d9a8ea23facc303b3dd8af8b3ccbf7ff2
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:56 +01:00
Edward Welbourne
5b1c33cc78 Rework cldr2qtimezone.py into more maintainable form
Broke out the updating of a source file to a ZoneIdWriter helper
class, which enables tidying away the temporary file if we fail.
Collected up the rest of the script into a main() that's now
called from a __name__ == '__main__' block.
Rationalized the imports.

Eliminated an inefficient lookup function by constructing a suitable
dict() before entering the loop that needed it.

Separated the "data you might need to update" tables from the code
that does the work, to make it easier for those adding support for new
zones to see what they're doing.

Removed the spurious $Revision$ from the output and reworded the
premable of the generated file. (It would seem CLDR no longer uses an
RCS-based version-control system.) Generated output is otherwise
unchanged.

Task-number: QTBUG-81344
Change-Id: I7d9de8357ebcb599d154de9f862e25f7ade00390
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:50 +01:00
Edward Welbourne
bb4242341b Add tools to localetools to facilitate source file recreation
For now unused; later commits shall put them to use.
Transcriber -- base, takes care of tempfile and renaming.
SourceFileEditor -- handles copying parts before and after a common delimiter.

Task-number: QTBUG-81344
Change-Id: I28cf977d0a08825fbb873fb330da6823b88ad3ed
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:45 +01:00
Edward Welbourne
c3dea1ffca Move some shared code to a localetools module
The time-zone script was importing two functions from the locale data
generation script. Move them to a separate module, to which I'll
shortly add some more shared utilities. Cleaned up some imports in the
process.

Combined qlocalexml2cpp's and xpathlit's error classes into a new
Error class in the new module and made it a bit more like a proper
python error class.

Task-number: QTBUG-81344
Change-Id: Idbe0139ba9aaa2f823b8f7216dee1d2539c18b75
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:40 +01:00
Edward Welbourne
4d9f1a87de Move qlocalexml2cpp.py's XML-reading to QLocaleXmlReader
This new class mirrors the existing QLocaleXmlWriter and places the
two side-by-side in qlocalexml.py, rather than having the writing and
reading in separate places.

Made judicious use of transformed versions of mappings to save
repeated iteration of a mapping's entries to do lookups on fist
entries of pair-values; several (id, name, code) data-sets are
sometimes indexed by id, sometimes by name.

Reworked the default_map, that the complicated compareLocaleKeys()
used in sorting locale keys, to map IDs instead of names; the function
also needed the locale_map so that it could convert IDs to names,
which we can skip by going directly with IDs.

Task-number: QTBUG-81344
Change-Id: Iff6a97f7f0755b56dda70d8a6796ec074c558910
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:34 +01:00
Edward Welbourne
a20697a394 Rework cldr2qlocalexml.py in terms of a QLocaleXmlWriter class
Delegate the output of XML to a helper class provided by qlocalexml.py
and restructure the driver script so that it can be imported without
running anything. It now has a minimal __name__ == '__main__' block
that calls a main() function. This, for the moment, requires a global
via which it shares the CLDR directory with various other functions;
that shall go away in a later commit.

Task-number: QTBUG-81344
Change-Id: Ica2d3ec09f2d38ba42fd930258cc765283f29a71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-04-02 19:42:28 +01:00
Simon Hausmann
ff922e7b87 Merge remote-tracking branch 'origin/5.15' into dev
Conflicts:
	src/corelib/kernel/qmetatype.cpp

Change-Id: I88eb0d3e9c9a38abf7241a51e370c655ae74e38a
2020-03-16 18:41:27 +01:00
Edward Welbourne
ebcd8e16db Deduplicate day-name data in QLocaleXML files
This is a follow-up to commit ebb0212133.
The day name data appeared twice in the XML files.
Skip the second copy, saving 8.8% of the intermediate file-size.
This makes no change to generated QLocale data.

Change-Id: Ic2cc543a2a85cbb1d2d47ebac7df4fa9ad6ee0a7
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-03-16 08:51:46 +01:00
Lars Knoll
2a4b957789 Merge remote-tracking branch 'origin/5.15' into dev
Change-Id: I99ee6f8b4bdc372437ee60d1feab931487fe55c4
2020-03-04 14:39:18 +00:00
Edward Welbourne
84382bde5c Rename the localexml module to qlocalexml
It implements interaction with the QLocaleXML file format type, so
rename it to match.

Task-number: QTBUG-81344
Change-Id: I46302d4ac1038cdfc5929e73b554b6d793814c56
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-03-03 07:38:06 +01:00
Edward Welbourne
54413653d5 Rename the endonym members of the Locale type
All other members had camelCase names, but the endonyms had
prefix_endonym names, requiring munging where they were emitted to
XML. So just do that munging upstream in the attribute name of the
Locale objects. Makes no change to the data output by the scripts, not
even to the intermediate QLocaleXML file.

Task-number: QTBUG-81344
Change-Id: I01c15a822216281dc669b3e7ebda096d18b04f9b
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-03-03 07:38:06 +01:00
Edward Welbourne
c8e00e4f73 Use char16_t in favor of ushort for locale data tables
Change-Id: I890dd2b52c1b786db1081744c8ca343baba93de4
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-02-17 14:55:42 +01:00
Edward Welbourne
ed2b110b6a Allow surrogate pairs for various "single character" locale data
Extract the character in its proper unicode form and encode it in a
new single_character_data table of locale data. Record each entry as
the range within that table that encodes it. Also added an assertion
in the generator script to check that the digits CLDR gives us are a
contiguous sequence in increasing order, as has been assumed by the
C++ code for some time. Lots of number-formatting code now has to take
account of how wide the digits are.

This leaves nowhere for updateSystemPrivate() to record values read
from sys_locale->query(), so we must always consult that function when
accessing these members of the systemData() object. Various internal
users of these single-character fields need the system-or-CLDR value
rather than the raw CLDR value, so move QLocalePrivate's methods to
supply them down to QLocaleData and ensure they check for system
values, where appropriate first.

This allows us to finally support the Chakma language and script, for
whose number system UTF-16 needs surrogate pairs.

Costs 10.8 kB in added data, much of it due to adding two new locales
that need surrogates to represent digits.

[ChangeLog][QtCore][QLocale] Various QLocale methods that returned
single QChar values now return QString values to accommodate those
locales which need a surrogate pair to represent the (single
character) return value.

Fixes: QTBUG-69324
Fixes: QTBUG-81053
Change-Id: I481722d6f5ee266164f09031679a851dfa6e7839
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-02-17 14:55:24 +01:00
Liang Qi
6b2535ea15 Merge remote-tracking branch 'origin/5.15' into dev
Conflicts:
	examples/widgets/graphicsview/boxes/scene.h
	src/corelib/Qt5CoreMacros.cmake
	src/corelib/Qt6CoreMacros.cmake
	src/network/ssl/qsslsocket.cpp
	src/network/ssl/qsslsocket.h
	src/platformsupport/fontdatabases/windows/qwindowsfontenginedirectwrite.cpp
	src/testlib/CMakeLists.txt
	src/testlib/.prev_CMakeLists.txt
	tests/auto/corelib/tools/qscopeguard/tst_qscopeguard.cpp

Disabled building manual tests with CMake for now, because qmake
doesn't do it, and it confuses people.

Done-With: Alexandru Croitor <alexandru.croitor@qt.io>
Done-With: Volker Hilsheimer <volker.hilsheimer@qt.io>
Change-Id: I865ae347bd01f4e59f16d007b66d175a52f1f152
2020-02-13 18:31:40 +01:00
Edward Welbourne
71fa90a37c Enable system locale to skip digit-grouping if configured to do so
On macOS it's possible to configure the system locale to not do digit
grouping (separating "thousands", in most western locales); it then
returns an empty string when asked for the grouping character, which
QLocale's system-configuration then ignored, falling back on using the
base UI locale's grouping separator. This could lead to the same
separator being used for decimal and grouping, which should never
happen, least of all when configured to not group at all.

In order to notice when this happens, query() must take care to return
an empty QString (as a QVariant, which is then non-null) when it *has*
a value for the locale property, and that value is empty, as opposed
to a null QVariant when it doesn't find a configured value. The caller
can then distinguish the two cases.

Furthermore, the group and decimal separators need to be distinct, so
we need to take care to avoid cases where the system overrides one
with what the CLDR has given for the other and doesn't over-ride that
other.

Only presently implemented for macOS and MS-Win, since the (other)
Unix implementation of the system locale returns single QChar values
for the numeric tokens - see QTBUG-69324, QTBUG-81053.

Fixes: QTBUG-80459
Change-Id: Ic3fbb0fb86e974604a60781378b09abc13bab15d
Reviewed-by: Ulf Hermann <ulf.hermann@qt.io>
2020-02-03 15:34:02 +01:00
Edward Welbourne
c08a31634f Separate offsets from sizes in QLocale's data
This enables us to make the sizes quint8 and benefit from the
resulting packing, making the locale data smaller. The sizes for long
month-name lists (which concatenate twelve names with semicolon as
separator) can overflow an 8-bit member, so use quint16 where needed.

Re-ordered the data in QLocaleData and QCalendarLocale. Now all
long-short(-narrow) families arise in that order; and any standalone
is grouped with the one of the same length. (This cost 20 bytes in the
date-format table, which optimises out more duplication if short is
before long, but the saving in the (smaller) time-format table more
than make up for it; and 20 bytes isn't worth the confusion that being
inconsistent in ordering might cause.)

At the same time, drop trailing semicolons from list entries (which
join various names with semicolon) as they're not needed: we know
where the end of the list is, because we know the size of the string
that results from concatenation. The code that parses such lists can
even correctly handle empty entries at the end.

Saves 26 kB of data in the compiled binaries.

Task-number: QTBUG-81053
Change-Id: If6ccc96a6910828817aa605d10fd814f567ae1e8
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-01-30 17:58:53 +01:00
Edward Welbourne
4e84a8b29f Deduplicate locale data tables
Some entries in tables were sub-strings (e.g. prefixes) of others.
Since we store start-index and length (with no need for terminators),
any entry that appears as a sub-string of an earlier entry can be
recorded without making a separate copy of its content, just by
recording where it appeared as a sub-string of an earlier entry.

(Sadly this doesn't apply to month- or day-names and their
short-forms: for those, we store ';'-joined lists.  Thus, although
each short-form is a prefix of its long-form, the short-form is stored
in a list with other short-forms; and this is not a prefix of the list
of matching long-forms.)

The savings are modest (780 bytes at present), but cost us nothing
except when running the python script that generates the data files
(it takes a little longer now), which usually only happens at a CLDR
update.

Change-Id: I05bdaa9283365707bac0190ae983b31f074dd6ed
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-01-30 17:58:15 +01:00
Edward Welbourne
47d94dab0f Minor tidy-up in qlocalexml2cpp.py
Split a long line.
Use pythonic chained comparison to save some repetition.
Comment on a field not currently in actual use.
Say "zeros" rather than "0s" in one comment to match another.
Added a .h suffix to the main locale data tempfile to match the naming
of the tempfiles used for calendar data.

Simplify generation of the blank line between Language and Script; and
include a matching blank between Script and Country.
This adds one blank line to qlocale.h

Removed a stray space that misaligned locale data lines.
This produces a space-only change in the generated *_data_p.h files.

Change-Id: I974a9e8923c3dfd2178855d2cf1d6a5074e130b3
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-01-30 17:56:35 +01:00
Edward Welbourne
d5bb8d5150 Preserve the case of the exponent separator CLDR supplies
We have long (since 4.5.1) coerced it to lower-case, for no readily
apparent, much less documented, reason. CLDR says most locales use an
upper-case E for this - let's actually use what CLDR says we should
use.

The code that matches the exponent separator was doing so
case-insensitively in any case; that needed adaptation now that the
separator's case isn't pre-determined; and, in any case, should have
been done using case-folding rather than upper-casing. In the process,
removed some spurious checks for "'e' or 'E'" in the result, since the
exponent separator is always represented by 'e' (and an 'e' might also
be present for the separate reason of its use as a beyond-decimal
digit representing fourteen).

[ChangeLog][QtCore][QLocale] QLocale::exponential() now preserves the
case of the CLDR source, where previously it was lower-cased.

Change-Id: Ic9ac02136cff79cb9f136d72141b5dbf54d9e0a6
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-01-30 17:56:14 +01:00
Edward Welbourne
0ef79d94f6 Ensure we use UTF-8 for the emitted QLocaleXML data file
Python helpfully uses a sensible locale when stdout is a tty but uses
the system (not the filesystem) default encoding, which may be ascii
and unable to encode some of the data we need to save. So brute force
kludge it to ensure sys.stdout.encoding is UTF-8 when writing the
output we'll read as UTF-8 anyway.

Task-number: QTBUG-79902
Change-Id: I218dc0ec4c71a6b1b7181db55b018266d803bc58
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-01-29 15:23:16 +01:00
Edward Welbourne
43f64b4dc8 Update CLDR to v36
Released on October 4th.
Adds Windows names for two time zones, Qyzylorda and Volgograd.
Added languages Chickasaw (cic), Muscogee (mus) and Silesian (szl).

Norwegian number formatting has flipped back to using colon rather
than dot as time separator; it's flipped back and forth over the last
several CLDR releases.  The dot form is present as a variant, the
colon form was long given as the normal pattern, then went away; but
now it's back as a contributed draft and that's what we pick up.

The MS-Win time-zone ID script was iterating a dict, causing random
reshuffling when new entries are added. Fixed that by doing the
critical iteration in sorted order.

Omitted locales ccp_BD and ccp_IN due to QTBUG-69324.

Task-number: QTBUG-79418
Change-Id: I43869ee1810ecc1fe876523947ddcbcddf4e550a
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-10-25 11:44:48 +02:00
Edward Welbourne
6852ba815d Correct some references to corelib/tools/ to say corelib/text/
The Unicode data tables moved with QString and friends.
So did the locale data generated from CLDR.

This amends commit a9aa206b7b.

Change-Id: If12f0420b559dcb78993adc00e9f39751bca684a
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-10-25 11:44:27 +02:00
Soroush Rabiei
7026645712 Add support for the Islamic Civil calendar
This has its own locale data, extracted from CLDR. This data may
potentially be shared with other variants on the Islamic calendar, so
is handled by a separate base-class, QHijriCalendar, on which such
variants may base their implementations.

[ChangeLog][QtCore][QCalendar] Added support for the Islamic Civil
calendar, controlled by feature islamiccivilcalendar, with locale data
that can be shared with other implementations, controlled by feature
hijricalendar.

Fixes: QTBUG-56675
Change-Id: Idf32d3da7034baa8ec5e66ef847e59a8a2f31cbd
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-08-22 10:10:02 +00:00
Soroush Rabiei
e71bf9d5c7 Add support for the Jalali (Solar Hijri or Persian) calendar
This has its own locale data, extracted from CLDR.

[ChangeLog][QtCore][QCalendar] Added support for the Jalali (Persian
or Solar Hijri) calendar, controlled by feature jalalicalendar.

Fixes: QTBUG-58404
Change-Id: Id5c56a10db05a4fd612aafc01615273db81ec743
Reviewed-by: Paul Wicking <paul.wicking@qt.io>
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-08-21 22:18:48 +02:00
Soroush Rabiei
aa8393c94f Add support for calendars beside Gregorian
Add QCalendarBackend as a base class for calendar implementations and
QCalendar as a facade via which to access it.

QDate's implicit implementation of the Gregorian calendar becomes
QGregorianCalendar and QDate methods now support choice of calendar.

Convert QLocale's CLDR data for month names to a locale-data component
of each supported calendar and relevant QLocale methods now support
choice of calendar. Adapt Python scripts for locale data generation to
extract month name data from CLDR (keeping on version v35.1) into the
new calendar-locale files. The locale data for the Gregorian calendar
is held in a Roman calendar base, for sharing with other calendars.

Add tests for basic uses of the new API.

[ChangeLog][QtCore][QCalendar] Added QCalendar to support diverse
calendars, supported by implementing QCalendarBackend.

[ChangeLog][QtCore][QDate] Allow choice of calendar in various
operations, with Gregorian remaining the default.

Done-with: Lars Knoll <lars.knoll@qt.io>
Done-with: Edward Welbourne <edward.welbourne@qt.io>
Fixes: QTBUG-17110
Fixes: QTBUG-950
Change-Id: I9d6278f394269a183aee8156e990cec4d5198ab8
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-08-20 13:41:21 +02:00
Soroush Rabiei
c595878aa3 Extract a large format string as a module constant value
The template for the "This is a generated file" notice made a clumsy
intrusion in the code in which it appeared, so split it out as a
constant of the module and access it by name where it's used.

Change-Id: Ic4dfb8e873078c54410b191654d6c21d082c9016
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-08-08 18:04:05 +02:00
Edward Welbourne
a9aa206b7b Move text-related code out of corelib/tools/ to corelib/text/
This includes byte array, string, char, unicode, locale, collation and
regular expressions.

Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-07-10 17:05:30 +02:00
Edward Welbourne
8bfae093ed Add data for Windows Time-Zone IDs added in the last two years
We've not run util/locale_database/cldr2qtimezone.py for a while, so
CLDR has had time to add several more zones.  Catch up, inserting the
new entries in order.

Change-Id: I8625548b0f7775958230eccbd89b897d7afed9e9
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2019-07-01 17:48:53 +02:00
Edward Welbourne
13242673cf Tidy up in cldr2qtimezone.py and document the need to run it
It wasn't mentioned in cldr2qlocalexml.py's instructions, so I didn't
know to run it.  The data it used in an illustration was out of date.
Two tests could be combined with no loss.

Change-Id: I26e619e6210ea5b1258326fc4bc2b6aee9d6a999
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-07-01 17:48:32 +02:00
Edward Welbourne
bbd64f64b2 cldr2qtimezone.py: report all missing zones, rather than just the first
When scanning the CLDR data, the script raised an exception if it
didn't recognize a zone ID.  Instead, collect up such unrecognized IDs
in a list and report them all at the end, so that whoever runs this
can do them all in one go, rather than doing one, running the script,
doing the next, running the script, ad nauseam.

Change-Id: Ia659f1d1c7e1c1b4ccb87cc23828a0588a5bf958
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-07-01 17:48:16 +02:00
Edward Welbourne
2f19a3053e Use simpler data structures in cldr2qtimezone.py
Use tuples for the fixed data.  The numbering of rows in the data
tables isn't part of any public API, so we can change it freely; it is
thus unnecessary, as we can just enumerate a tuple of the data values
to generate sequential indices on the fly.  (Updates to the data shall
no longer need to renumber in order to insert entries.)

Restore ordering of the data tables, and remove wanton spacing from
inside parens, in the process.

Change-Id: I59956cfb6191fe729300b57070671b7e66bd0379
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2019-07-01 17:47:51 +02:00
Edward Welbourne
548513a4bd Separate out the time, zone, date code from corelib/tools/
We'll be adding calendar code here as well, and tools/ was getting
rather crowded, so it looks like time to move out a reasonably
coherent sub-bundle of it all.

Change-Id: I7e8030f38c31aa307f519dd918a43fc44baa6aa1
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-06-06 15:54:32 +02:00
Edward Welbourne
5b672693e7 Add locale support for Cebuano and Erzya languages (new in CLDR v35.1)
Change-Id: I5d0ee7bc27eeca1c046d442b0410128ea5abbdb3
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2019-05-20 20:42:11 +02:00
Edward Welbourne
b7d8169f02 Suggest name, when available, for unknown codes
When parsing the CLDR data, we only handle language, script and
territory (which we call country) codes if they are known to our
enumdata.py tables.  When reporting the rest as unknown, in the
content of an actual locale definition (not the likely subtag data),
check whether en.xml can resolve the code for us; if it can, report
the full name it provides, as a hint to whoever's running the script
that an update to enumdata.py may be in order.

Change-Id: I9ca1d6922a91d45bc436f4b622e5557261897d7f
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2019-05-20 20:42:11 +02:00
Edward Welbourne
248b6756da Rename util/locale_database/ to include the e that was missing
It was misnamed local_database, quite missing the point of its name.

Change-Id: I73a4fdf24f53daac12304de1f443636d89afacb2
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2019-05-20 20:42:10 +02:00