2021-07-05 15:45:26 +00:00
|
|
|
#!/usr/bin/env python3
|
2022-05-10 10:06:48 +00:00
|
|
|
# Copyright (C) 2021 The Qt Company Ltd.
|
|
|
|
# SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GPL-3.0-only WITH Qt-GPL-exception-1.0
|
2021-07-07 10:41:10 +00:00
|
|
|
"""Convert CLDR data to QLocaleXML
|
2017-05-23 13:24:35 +00:00
|
|
|
|
|
|
|
The CLDR data can be downloaded from CLDR_, which has a sub-directory
|
|
|
|
for each version; you need the ``core.zip`` file for your version of
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
choice (typically the latest). This script has had updates to cope up
|
2021-01-20 16:27:21 +00:00
|
|
|
to v38.1; for later versions, we may need adaptations. Unpack the
|
2017-05-23 13:24:35 +00:00
|
|
|
downloaded ``core.zip`` and check it has a common/main/ sub-directory:
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
pass the path of that root of the download to this script as its first
|
|
|
|
command-line argument. Pass the name of the file in which to write
|
|
|
|
output as the second argument; either omit it or use '-' to select the
|
|
|
|
standard output. This file is the input needed by
|
|
|
|
``./qlocalexml2cpp.py``
|
2017-05-23 13:24:35 +00:00
|
|
|
|
2018-02-16 13:59:17 +00:00
|
|
|
When you update the CLDR data, be sure to also update
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
src/corelib/text/qt_attribution.json's entry for unicode-cldr. Check
|
2021-05-04 11:20:32 +00:00
|
|
|
this script's output for unknown language, territory or script messages;
|
2018-08-15 12:09:38 +00:00
|
|
|
if any can be resolved, use their entry in common/main/en.xml to
|
|
|
|
append new entries to enumdata.py's lists and update documentation in
|
2019-05-27 17:13:54 +00:00
|
|
|
src/corelib/text/qlocale.qdoc, adding the new entries in alphabetic
|
2018-08-15 12:09:38 +00:00
|
|
|
order.
|
2018-02-16 13:59:17 +00:00
|
|
|
|
2019-05-28 16:19:38 +00:00
|
|
|
While updating the locale data, check also for updates to MS-Win's
|
|
|
|
time zone names; see cldr2qtimezone.py for details.
|
|
|
|
|
2021-07-07 10:41:10 +00:00
|
|
|
All the scripts mentioned support --help to tell you how to use them.
|
|
|
|
|
2021-11-05 15:04:34 +00:00
|
|
|
.. _CLDR: https://unicode.org/Public/cldr/
|
2017-05-23 13:24:35 +00:00
|
|
|
"""
|
2011-04-27 10:05:43 +00:00
|
|
|
|
2021-07-09 13:34:40 +00:00
|
|
|
from pathlib import Path
|
2020-03-17 16:39:43 +00:00
|
|
|
import sys
|
2021-07-07 10:41:10 +00:00
|
|
|
import argparse
|
2017-05-30 13:50:47 +00:00
|
|
|
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
from cldr import CldrReader
|
|
|
|
from qlocalexml import QLocaleXmlWriter
|
2020-02-19 14:17:16 +00:00
|
|
|
|
|
|
|
|
2021-07-07 10:41:10 +00:00
|
|
|
def main(out, err):
|
|
|
|
all_calendars = ['gregorian', 'persian', 'islamic'] # 'hebrew'
|
|
|
|
|
|
|
|
parser = argparse.ArgumentParser(
|
|
|
|
description='Generate QLocaleXML from CLDR data.',
|
|
|
|
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
|
|
|
|
parser.add_argument('cldr_path', help='path to the root of the CLDR tree')
|
|
|
|
parser.add_argument('out_file', help='output XML file name',
|
|
|
|
nargs='?', metavar='out-file.xml')
|
|
|
|
parser.add_argument('--calendars', help='select calendars to emit data for',
|
|
|
|
nargs='+', metavar='CALENDAR',
|
|
|
|
choices=all_calendars, default=all_calendars)
|
2020-02-19 14:17:16 +00:00
|
|
|
|
2021-07-07 10:41:10 +00:00
|
|
|
args = parser.parse_args()
|
2020-02-19 14:17:16 +00:00
|
|
|
|
2021-07-09 13:34:40 +00:00
|
|
|
root = Path(args.cldr_path)
|
|
|
|
root_xml_path = 'common/main/root.xml'
|
|
|
|
|
|
|
|
if not root.joinpath(root_xml_path).exists():
|
2021-07-07 10:41:10 +00:00
|
|
|
parser.error('First argument is the root of the CLDR tree: '
|
2021-07-09 13:34:40 +00:00
|
|
|
f'found no {root_xml_path} under {root}')
|
2020-02-19 14:17:16 +00:00
|
|
|
|
2021-07-07 10:41:10 +00:00
|
|
|
xml = args.out_file
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
if not xml or xml == '-':
|
|
|
|
emit = out
|
|
|
|
elif not xml.endswith('.xml'):
|
2021-07-07 10:41:10 +00:00
|
|
|
parser.error(f'Please use a .xml extension on your output file name, not {xml}')
|
2020-02-19 14:17:16 +00:00
|
|
|
else:
|
|
|
|
try:
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
emit = open(xml, 'w')
|
|
|
|
except IOError as e:
|
2021-07-07 10:41:10 +00:00
|
|
|
parser.error(f'Failed to open "{xml}" to write output to it')
|
2020-02-19 14:17:16 +00:00
|
|
|
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
# TODO - command line options to tune choice of grumble and whitter:
|
|
|
|
reader = CldrReader(root, err.write, err.write)
|
|
|
|
writer = QLocaleXmlWriter(emit.write)
|
2020-02-19 14:17:16 +00:00
|
|
|
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
writer.version(reader.root.cldrVersion)
|
2023-08-01 10:35:26 +00:00
|
|
|
writer.enumData(reader.root.englishNaming)
|
Rework cldr2qlocalexml.py's reading of CLDR data
Move the code out to a CldrReader class in cldr.py, expand CldrAccess
with facilities that needs, expand ldml.py to include support for more
features, finally making xpathlite.py redundant. This initial commit
aims, though, to be bug-for-bug compatible with xpathlite in its
reading of the CLDR data.
It turns out we've been using draftier data than we were aware of
(which might not be a bad thing). The xpathlite code appeared to check
for draft attributes, but these only appear on leaf nodes and most
data were fetched by finding a parent and then scanning its children
without the draft check; only am/pm data was actually being excluded
based on draft values. (We allowed contributed, for am/pm, in
addition to approved, which is all the xpathlite code allows
otherwise.) There are also some less equivocal bugs; I'll deal with
these in later commits.
Simplified number-system data look-ups; the old get_number_in_system()
was taking care of old LDML versions' placement of the number system
attribute; this is no longer needed. (It was also being used for a
currency value to which it was not appropriate, which is now handled
separately; this is one of the bugs mentioned above.) Ditched a
fall-back to nativeZeroDigit, which no longer exists in CLDR.
Change the command-line to take the root of the CLDR data tree, rather
than its common/main/ sub-directory. Support naming the file to which
to write output, as a second command-line argument, instead of always
writing to stdout (which remains the default) and leaving whoever runs
the script to redirect stdout.
Support (internally for now, while adding TODOs to give main() more
command-line options) separating the stderr output into its more and
less interesting parts; for now, continue producing both, but suppress
the least interesting entirely.
Task-number: QTBUG-81344
Change-Id: Ie611b47403a9452b51feaeeaaa0fbc8f7e84dc71
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2020-02-27 12:58:58 +00:00
|
|
|
writer.likelySubTags(reader.likelySubTags())
|
2021-07-07 10:41:10 +00:00
|
|
|
writer.locales(reader.readLocales(args.calendars), args.calendars)
|
2020-02-19 14:17:16 +00:00
|
|
|
|
2021-05-04 10:06:42 +00:00
|
|
|
writer.close(err.write)
|
2020-02-19 14:17:16 +00:00
|
|
|
return 0
|
|
|
|
|
|
|
|
if __name__ == '__main__':
|
2021-07-07 10:41:10 +00:00
|
|
|
sys.exit(main(sys.stdout, sys.stderr))
|