Commit Graph

26 Commits

Author SHA1 Message Date
Konstantin Ritt
2fbb69a093 QTBF: Fix issue with no splitting the words at "." (FULL STOP)
As of Unicode 5.1, some punctuation marks were mapped to MidLetter and MidNumLet
for better URL and abbreviations handling which caused "hi.there" to be treated
like if it were just a single word;
until we have the Unicode Text Segmentation tailoring mechanism, retain
the old behavior by remapping (some of) those characters back to their old values.

Change-Id: I49dea6064f2ea40a82fc0b1bc3c4f0b4e803919f
Reviewed-by: David Faure <david.faure@kdab.com>
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
2012-11-23 11:59:50 +01:00
Konstantin Ritt
2672c4fa91 Update the Unicode Data and Algorithms up to Unicode 6.2
Version 6.2 of the Unicode Standard is a special release
dedicated to the early publication of the newly encoded Turkish lira sign.
In addition, there are some significant changes to the Unicode algorithms
for text segmentation and line breaking to improve breaking for emoji symbols.

For more details, see http://www.unicode.org/versions/Unicode6.2.0/

Change-Id: I21cfd4f307e41b41a19d36cce87f7a44c2661bc2
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
2012-10-09 03:04:41 +02:00
Iikka Eklund
be15856f61 Change copyrights from Nokia to Digia
Change copyrights and license headers from Nokia to Digia

Change-Id: If1cc974286d29fd01ec6c19dd4719a67f4c3f00e
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Reviewed-by: Sergio Ahumada <sergio.ahumada@digia.com>
2012-09-22 19:20:11 +02:00
Konstantin Ritt
b57e2162ef QUnicodeTables: some internal API renamings
enums GraphemeBreak, WordBreak, and SentenceBreak has been renamed to
GraphemeBreakClass, WordBreakClass, and SentenceBreakClass respectively,
their values has been renamed to contain a '_' as logical enum-value separator
(just like many other nums in Qt, e.g. LineBreakClass);
*BreakFormat has been replaced with *Break_Extend (some format characters are
kind of subtype of the extender characters, not vice versa).

Change-Id: I9ddbcf8848da87409736c2d6d1798a62fa28cab8
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-06-22 09:47:59 +02:00
Konstantin Ritt
c1329fba13 Clean-up the Unicode tables generator code and the generated header
This fixes the blocks and memory consumption reports, the whitespace issues
and makes the code a bit cleaner.

Since I'm the only one who does change this code, such a no-op commit
could not hurt anyone or even git blame ;)

Change-Id: Ib069f925a3791c82e16c368c8392bcffbfd68c53
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2012-06-22 09:47:59 +02:00
Konstantin Ritt
d8c04d60db Make QUnicodeTables::script() support SMP code points
Instead of expanding the scripts table with script values for the code points
>= 0x10000, it has been merged with the properties table in order to
increase perfomance of the script itemization code (not affected yet).
(Stats: the properties table grew up in 97428-89800 = 7628 bytes;
        the old scripts table was of size 7680 bytes)

The outdated ScriptsInitial.txt and ScriptsCorrections.txt file has been removed
(they were just empty, the "corrigendum" script corrections should be applied
to Scripts.txt directly, *no customization allowed*!).

More script testcases has been added - at least one per supported script.

Task-number: QTBUG-6530

Change-Id: I40a9e76f681e2dd552fd4c61af0808d043962e79
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-06-14 05:22:13 +02:00
Konstantin Ritt
12e0319213 Line Breaking Algorithm: handle the Object Replacement Character
See http://www.unicode.org/reports/tr14/#CB
and http://www.unicode.org/reports/tr14/#LB20 for details

Change-Id: Ice0aa2b2ce81f6e39839a353240420436eddd754
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-06-10 15:58:13 +02:00
Konstantin Ritt
60e1892d83 Update the qunicodetables generator to deal with UCD 6.1 files
Change-Id: If22018ff83cfc6b9c984f689648da038fce11d84
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-06-10 15:57:49 +02:00
Konstantin Ritt
2b15c1b30f Move ScriptSentinel enum from header to .cpp
Change-Id: Ic74e8e2471e92aa2014735f6ab0bb4f3b88de206
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-05-25 21:48:44 +02:00
Konstantin Ritt
ba300f42bd QChar: add isSurrogate() and isNonCharacter() to the public API
+ QChar::LastValidCodePoint enum value that supercede the UNICODE_LAST_CODEPOINT macro
replace uses of hardcoded values with the new API; remove leftovers

Change-Id: I1395c9840b85fcb6b08e241b131794a98773c952
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2012-05-16 04:24:56 +02:00
Konstantin Ritt
3fe02eaa12 significant unicodetables generator performance optimization
since the entire range of a valid unicode code points is in use, QHash
is suboptimal and could be replaced with QList;
taking the value by ref and not inserting it back to the map + not calculating
the default value over and over gains us up to 60% performance boost!

Change-Id: I48c54a8e88472cf76c79c0aac44e65eeefa44861
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-05-11 10:38:25 +02:00
Konstantin Ritt
8c0048a377 add some useful methods to QUnicodeTables::
in order to reduce code duplication and prepare the ground for upcoming changes

Change-Id: I980244149f65384c9484bbec4682de8b7b848b08
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-05-10 11:34:25 +02:00
Konstantin Ritt
46b78113b2 add support for non-BMP ligatures
> http://www.unicode.org/versions/Unicode5.2.0/
D. Character Additions:
There are three new characters in the newly-encoded Kaithi script that will
require changes in implementations which make hard-coded assumptions about
composition during normalization. Most new characters added to the standard
with decompositions cannot be generated by the operations toNFC() or toNFKC),
but these three can. Implementers should check their code carefully
to ensure that it handles these three characters correctly.
U+1109A KAITHI LETTER DDDHA
U+1109C KAITHI LETTER RHA
U+110AB KAITHI LETTER VA

UCD 6.1 adds two more of them:
U+1112E CHAKMA VOWEL SIGN O
U+1112F CHAKMA VOWEL SIGN AU

Change-Id: I781a26848078d8b83a182b0fd4e681be2a6d9a27
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-05-04 15:24:52 +02:00
Konstantin Ritt
f948bb3c6c qunicodetables generator: improve the output and the generated code
better memory usage report;
an additional asserts with conditions the implementation is depends on;
a namespace for the internal static data;
styling fixes

Change-Id: Id4048ff6104c56b5f590f9ac6fbf7c0bce79ec47
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2012-04-24 21:45:00 +02:00
Konstantin Ritt
ba0d752c2d workaround issue where casing diff overflows signed short
there are two such codepoints were added in the Unicode 5.1:
  U+1D79  LATIN SMALL LETTER INSULAR G
  U+A77D  LATIN CAPITAL LETTER INSULAR G
two more of them were added in the Unicode 6.0:
  U+0265  LATIN SMALL LETTER TURNED H
  U+A78D  LATIN CAPITAL LETTER TURNED H
and two more were added in the Unicode 6.1:
  U+0266  LATIN SMALL LETTER H WITH HOOK
  U+A7AA  LATIN CAPITAL LETTER H WITH HOOK

we map them like special cases with length == 1
(note: all are in BMP which is checked explicitly in the generator)

Change-Id: I8a34164eb3ee2e575b7799cc12d4b96ad5bcd9c6
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-04-24 21:45:00 +02:00
Konstantin Ritt
50fefebc84 replace hardcoded values with a surrogate handling methods
Change-Id: Iba079953c46a29404232d2dacbe0c90170097d51
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-04-11 01:42:12 +02:00
Konstantin Ritt
3b778df102 minor improvement for NormalizationCorrections
let's don't hardcode the latests affected version value and simply use
the one parsed from NormalizationCorrections.txt

Change-Id: I37021e8238d77deada4c5ba7a2d160c87186b9dd
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-04-11 01:42:12 +02:00
Konstantin Ritt
9514138a5c optimize QString::toLower()/toUpper() for special cases, step 2
from now, QUnicodeTables::specialCaseMap[] starts with a placeholder; so,
if somethingCaseSpecial is true, then somethingCaseDiff is always greater than 0

Change-Id: Ibb1870512836eee71b1521564c0745096c05b2f9
Merge-request: 70
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
Reviewed-by: Olivier
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
2012-02-21 22:31:00 +01:00
Konstantin Ritt
5f04962132 optimize QString::toLower()/toUpper() for special cases, step 1
reorganize QUnicodeTables::specialCaseMap as follows:
specialCaseMap contains sequence entries in form { length, a, b, .. }

Change-Id: Iea1f80bc2f4dc1f505428dad981cde26daaa52c7
Merge-request: 70
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
Reviewed-by: Olivier
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
2012-02-21 22:31:00 +01:00
Jason McDonald
5635823e17 Remove "All rights reserved" line from license headers.
As in the past, to avoid rewriting various autotests that contain
line-number information, an extra blank line has been inserted at the
end of the license text to ensure that this commit does not change the
total number of lines in the license header.

Change-Id: I311e001373776812699d6efc045b5f742890c689
Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
2012-01-30 03:54:59 +01:00
Jason McDonald
629d6eda5c Update contact information in license headers.
Replace Nokia contact email address with Qt Project website.

Change-Id: I431bbbf76d7c27d8b502f87947675c116994c415
Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
2012-01-23 04:04:33 +01:00
Jason McDonald
1fdfc2abfe Update copyright year in license headers.
Change-Id: I02f2c620296fcd91d4967d58767ea33fc4e1e7dc
Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
2012-01-05 06:36:56 +01:00
Ritt Konstantin
42402a1672 replace 'const QChar &' with 'QChar ' for QChar and QString
Merge-request: 69
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>

Change-Id: I61f5a54b783252029fcad95677958fa6a2130d01
Reviewed-by: Olivier Goffart <ogoffart@kde.org>
2011-10-26 19:59:36 +02:00
Ritt Konstantin
d17c76feee drop an obsolete QChar::NoCategory enum value
there is no such category in the Unicode specs. the QChar::NoCategory
was a subject of bugs since it was introduced. int 4.6 it's meaning was
limited to mention ucs4 > UNICODE_LAST_CODEPOINT only (which is useless anyways)
in order to preserve the old (wrong) behavior.
fix it now for qtbase

Change-Id: I630534824e071090b39772881e747c1fdb758719
Reviewed-on: http://codereview.qt.nokia.com/1584
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2011-07-13 13:31:13 +02:00
Jyri Tahtela
f9f395c28b Update licenseheader text in source files for qtbase Qt module
Updated version of LGPL and FDL licenseheaders.
Apply release phase licenseheaders for all source files.

Reviewed-by: Trust Me
2011-05-24 12:34:08 +03:00
Qt by Nokia
38be0d1383 Initial import from the monolithic Qt.
This is the beginning of revision history for this module. If you
want to look at revision history older than this, please refer to the
Qt Git wiki for how to use Git history grafting. At the time of
writing, this wiki is located here:

http://qt.gitorious.org/qt/pages/GitIntroductionWithQt

If you have already performed the grafting and you don't see any
history beyond this commit, try running "git log" with the "--follow"
argument.

Branched from the monolithic repo, Qt master branch, at commit
896db169ea224deb96c59ce8af800d019de63f12
2011-04-27 12:05:43 +02:00