Commit Graph

3 Commits

Author SHA1 Message Date
Ievgenii Meshcheriakov
c4e550703c Update UCD to Revision 30
This corresponds to Unicode version 15.0.0.

Added the following scripts:

    * Kawi
    * Nag Mundari

Full support of these scripts requires harfbuzz version 5.2.0,
this version adds support for Unicode 15.0:

    https://github.com/harfbuzz/harfbuzz/releases/tag/5.2.0

Fixes: QTBUG-106810
Change-Id: Ib06c526e49b0f01ef9f21123bcf875c6b19f2601
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2022-10-11 14:10:59 +00:00
Ievgenii Meshcheriakov
826fc8c9bd Update UCD to Revision 28
This corresponds to Unicode version 14.0.0.

Added the following scripts:

    * CyproMinoan
    * OldUyghur
    * Tangsa
    * Toto
    * Vithkuqi

Full support of these scripts requires harfbuzz version 3.0.0,
this version adds support for Unicode 14.0:

    https://github.com/harfbuzz/harfbuzz/releases/tag/3.0.0

With this release 10 test cases in tst_qurluts46 were fixed, one
additional test case is failing in tst_qtextboundaryfinder and
is commented out. In total 62 line break test cases and 44 word
break test cases are failing.

A comment in src/corelib/text/qt_attribution.json was updated to
include the URL of the page containing UCD version number.

Fixes: QTBUG-94359
Change-Id: Iefc9ff13f3df279f91cbdb1246d56f75b20ecb35
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-10-18 16:45:10 +00:00
Giuseppe D'Angelo
a794c5e287 Unicode: fix the extended grapheme cluster algorithm
UAX #29 in Unicode 11 changed the EGC algorithm to its current form.
Although Qt has upgraded the Unicode tables all the way up to
Unicode 13, the algorithm has never been adapted; in other words,
it has been working by chance for years. Luckily, MOST
of the cases were dealt with correctly, but emoji handling
actually manages to break it.

This commit:

* Adds parsing of emoji-data.txt into the unicode table generator.
  That is necessary to extract the Extended_Pictographic property,
  which is used by the EGC algorithm.

* Regenerates the tables.

* Removes some obsoleted grapheme cluster break properties, and
  adds the ones added in the meanwhile.

* Rewrites the EGC algorithm according to Unicode 13. This is
  done by simplifying a lot the lookup table. Some rules (GB11,
  GB12, GB13) can't be done by the table alone so some hand-rolled
  code is necessary in that case.

* Thanks to these fixes, the complete upstream GraphemeBreakTest
  now passes. Remove the "edited" version that ignored some rows
  (because they were failing).

Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
Pick-to: 6.1 6.0 5.15
Fixes: QTBUG-92822
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2021-04-16 20:31:39 +02:00