qt5base-lts/util/unicode
Edward Welbourne d38f635355 Clean up and update Unicode character data 3rd-party infrastructure
Document how to do an update, fix the bit-rot that had crept into
main.cpp since last it was compiled, correct the qt_attribution.json
to use the actual version number of UCD (its Revision number) instead
of the (admittedly correlated) Unicode release number.  Updated to
Release 22 (which came with Unicode 11.0.0) in the process; but this
doesn't change our actual qunicodetables.cpp (so is incidental).

Task-number: QTBUG-71281
Change-Id: Ieb7a6e1a4d49f639993f76ff82c8f12a572db3c3
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2018-11-11 22:09:27 +00:00
codecs/big5 Updated license headers 2016-01-21 18:55:18 +00:00
data Update Text segmentation and line break data to Unicode 10.0 2018-01-03 07:47:26 +00:00
x11 Initial import from the monolithic Qt. 2011-04-27 12:05:43 +02:00
.gitattributes Initial import from the monolithic Qt. 2011-04-27 12:05:43 +02:00
main.cpp Clean up and update Unicode character data 3rd-party infrastructure 2018-11-11 22:09:27 +00:00
README Clean up and update Unicode character data 3rd-party infrastructure 2018-11-11 22:09:27 +00:00
unicode.pro Initial import from the monolithic Qt. 2011-04-27 12:05:43 +02:00
writingSystems.sh Updated license headers 2016-01-21 18:55:18 +00:00

The unicode tool built here generates the Unicode data in src/corelib/tools.

To update:
* Find the data (UAX #44, UCD; not the XML version) at
  ftp://www.unicode.org/Public/zipped/$Version/
* Unpack the zip file and replace each file in data/ with its new
  version; the *BreakProperty.txt files are in auxiliary/. (These last
  are only in the zip, not in the web-space's unpacked versions.)
* If needed, add an entry to enum QChar::UnicodeVersion for the new
  Unicode version
* In that case, also update main.cpp's initAgeMap and DATA_VERSION_S*
  to match
* Build this project. Its binary, unicode, ignores command-line
  options and assumes it is being run from this directory. When run,
  it produces lots of output. Hopefully that doesn't matter.
* Assertions may trigger: if so, study code and understand what's more
  complicated about this update; talk to folk named in the git logs,
  maybe push a WIP to gerrit to solicit advice. Some bit-field may
  need to be expanded, for example. In some cases QChar may need
  additions to some of its enums.
* Build with the modified code, fix any compilation issues.
* That may have updated qtbase/src/corelib/tools/qunicodetables.cpp;
  if so, the update matters: commit the changes to data/ at the same
  time and update tools/qt_attribution.json to match. Use the UCD
  Revision number, rather than the Unicode standard number, as the
  Version, even though qunicodetables.cpp uses the latter.
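
The file-replacement step can be sketched as a small shell function.
This is only an illustration, not part of this project: the name
refresh_ucd_data is hypothetical, and it assumes that each file name in
data/ occurs only once in the unpacked zip. It searches the whole
unpacked tree, so files under auxiliary/ are found too.

```shell
# Hypothetical helper (not part of this project).
# Usage: refresh_ucd_data UNPACKED_DIR DATA_DIR
# For every regular file already in DATA_DIR, look for a file of the
# same name anywhere under UNPACKED_DIR (this covers auxiliary/) and
# copy it over the old one. Files with no match are reported on stderr
# and left untouched; the function then returns non-zero.
refresh_ucd_data() {
    unpacked=$1
    data=$2
    rc=0
    for old in "$data"/*; do
        [ -f "$old" ] || continue
        name=$(basename "$old")
        # Assumption: the name is unique in the zip, so the first match will do.
        new=$(find "$unpacked" -type f -name "$name" | head -n 1)
        if [ -n "$new" ]; then
            cp "$new" "$old"
        else
            echo "no replacement found for $name" >&2
            rc=1
        fi
    done
    return $rc
}
```

Run from this directory after unpacking the zip, for example as
"refresh_ucd_data /path/to/unpacked data", where the unpack path is a
placeholder. Review the result with git diff before building.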

The script writingSystems.sh generates a list of writing systems,
ostensibly as the basis for updating QFontDatabase::WritingSystem
enum; however, the Release 20 output of it contains many more writing
systems than are present in that enum, suggesting it has not been run
in a very long time. Further research needed.