As of Unicode 5.1, some punctuation marks were mapped to MidLetter and MidNumLet
for better URL and abbreviations handling which caused "hi.there" to be treated
like if it were just a single word;
until we have the Unicode Text Segmentation tailoring mechanism, retain
the old behavior by remapping (some of) those characters back to their old values.
Change-Id: I49dea6064f2ea40a82fc0b1bc3c4f0b4e803919f
Reviewed-by: David Faure <david.faure@kdab.com>
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Add BoundaryReason::BreakOpportunity flag that will be returned by the
boundaryReasons() when the boundary finder is at the break opportunity
position that might be not an item boundary.
This is the same as (StartWord || EndWord) in Grapheme and Sentence modes;
in Word and Line modes, BreakOpportunity flag might occur between the words
or in between of Line boundaries (e.g. in conjunction with SoftHyphen flag).
In other words, the text boundaries are always break opportunities, but not vice versa.
StartWord and EndWord flags has been deprecated by new StartOfItem and EndOfItem
flags which are not about the word boundaries only. In line breaking,
StartOfItem and EndOfItem are set for the mandatory breaks only.
Change-Id: I79bf297e2b988f5976f30cff0c8ca616385f6552
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
that will be returned by boundaryReasons() when the boundary finder
is at the line end position (CR, LF, NewLine Function, End of Text, etc.).
The MandatoryBreak flag, if set, means the text should be wrapped at a given position.
Change-Id: I32d4f570935d2e015bfc5f18915396a15f009fde
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Version 6.2 of the Unicode Standard is a special release
dedicated to the early publication of the newly encoded Turkish lira sign.
In addition, there are some significant changes to the Unicode algorithms
for text segmentation and line breaking to improve breaking for emoji symbols.
For more details, see http://www.unicode.org/versions/Unicode6.2.0/
Change-Id: I21cfd4f307e41b41a19d36cce87f7a44c2661bc2
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
for the case when the boundary finder is assigned to an invalid one.
Change-Id: I5b60984ff3fd99972fcae21895684bd83b012780
Reviewed-by: Simon Hausmann <simon.hausmann@digia.com>
A simple heuristic is used to detect the word beginning and ending by
looking at the word break property value of surrounding characters.
This behaves better than the white-spaces based implementation used before
and makes it possible to tailor the default algorithm for complex scripts.
BIG FAT WARNING: The QCharAttributes buffer now has to have a length
of string length + 1 for the flags at end of text.
Task-Id: QTBUG-6498
Change-Id: I5589b191ffde6a50d2af0c14a00430d3852c67b4
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Change copyrights and license headers from Nokia to Digia
Change-Id: If1cc974286d29fd01ec6c19dd4719a67f4c3f00e
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Reviewed-by: Sergio Ahumada <sergio.ahumada@digia.com>
This function only makes sense on a developer build. In non
developer-builds you get:
tst_qtextboundaryfinder.cpp:116:13:
warning: ‘void generateDataFromFile(const QString&)’
defined but not used [-Wunused-function]
Change-Id: Id1bda2d27b00048f7401606959b566a59c05b38d
Reviewed-by: Toby Tomkins <tjtomkins@gmail.com>
Qt 5.0 beta requires changing the default to the 5.0 API, disabling
the deprecated code. However, tests should test (and often do) the
compatibility API too, so turn it back on.
Task-number: QTBUG-25053
Change-Id: I8129c3ef3cb58541c95a32d083850d9e7f768927
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
Just like for the QChar::ByteOrderMark, `ch == QChar::SoftHyphen`
is much more readable than `ch == 0x00ad // (soft-hyphen)`, etc.
Change-Id: I9c85f14cfd979037d35103c3259a435fd729b869
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
I didn't do this earlier since the current test data doesn't contain any SMP code points,
the Unicode 6.2 test data does - so, I can confirm this code really works.
Change-Id: Ieae35e8480a89e22d846fd038e79592fefbbf2ee
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
* The Line Breaking Algorithm implementation conformance tests has been added;
* The Grapheme, Word, and Sentence Breaking Algorithm implementation
conformance tests has been updated.
Change-Id: Ia1a6eef6272d580964cb23788ddf30dfd5f4a5a3
Note: the Line Break test data contains some extended cases we don't currently support;
just skip them for now.
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
SoftHyphen enum value was added to specify such a boundary reason
Change-Id: I4248909eed6ab8cbca419de4dcf9fe917620a158
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
The existing tests has been retained, the new ones has been added.
Change-Id: I12ae1b4e63dde46f3b14a7c1423c13d5881d4507
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
there are several reasons to do this:
* text breaking is not a shaper's job;
* since the text breaking rules are bound to a specific Unicode version,
updating Qt's internal unicode data would require updating the data in HB as well;
* makes porting to HurfBuzz-NG some easier
Change-Id: I0bbf8e8a343bc074696f4ddf2ae4e7fa32a61629
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
As in the past, to avoid rewriting various autotests that contain
line-number information, an extra blank line has been inserted at the
end of the license text to ensure that this commit does not change the
total number of lines in the license header.
Change-Id: I311e001373776812699d6efc045b5f742890c689
Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
These comments were mostly empty or inaccurate. Appropriate naming of
tests and appropriate placement of tests within the directory tree
provide more reliable indicators of what is being tested.
Change-Id: Ib6bf373d9e79917e4ab1417ee5c1264a2c2d7027
Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
In .pro files, removed wince/symbian-specific DEPLOYMENT cases and
replaced them with TESTDATA where appropriate.
In .cpp files, removed SRCDIR and relative paths to testdata and
replaced them with the QFINDTESTDATA macro where appropriate.
Modified test helper apps/libs to install themselves under the test
they relate to.
This change allows corelib tests to be correctly installed, along with
their testdata, via `make install'.
Change-Id: I5e202e2f3b577af7e39072d5c9fe13e0ca125304
Reviewed-by: Jason McDonald <jason.mcdonald@nokia.com>
Remove various disabled and/or non-helpful debugging code.
Any test diagnostics that are useful should be part of the regular test
output, as the CI system cannot switch on commented-out code when there
is a test failure. Diagnostics should also be informative -- simply
printing the value of a variable with no other information about what is
being printed (or why it is being printed) is not informative.
Change-Id: I21a6c2121be86001bb57e80f426507b6e619ee9e
Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>
qttest_p4.prf was added as a convenience for Qt's own autotests in Qt4.
It enables various crufty undocumented magic, of dubious value.
Stop using it, and explicitly enable the things from it which we want.
Change-Id: I7c1ffe9c8c294dbdc988e1582e580b1ed3f4593e
Reviewed-by: Jason McDonald <jason.mcdonald@nokia.com>
Symbian is not a supported platform for Qt5, so this code is no longer
required.
Change-Id: I1172e6a42d518490e63e9599bf10579df08259aa
Reviewed-on: http://codereview.qt-project.org/5657
Reviewed-by: Rohan McGovern <rohan.mcgovern@nokia.com>