C0 to DF take one continuation byte; E0 to EF take two.
It's invalid UTF-8 anyway, but at least this is what the test row meant:
overlong sequence with 3 bytes of what should have been two.
This updates the comment to match the character that we were actually
testing.
Change-Id: I85a8bd6da2c44f52b4e3fffd14a75df2600487aa
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Replace all QT_NO_PROCESS with QT_CONFIG(process), define it in
qconfig-bootstrapped.h, add QT_REQUIRE_CONFIG(process) to the qprocess
headers, exclude the sources from compilation when switched off, guard
header inclusions in places where compilation without QProcess seems
supported, drop some unused includes, and fix some tests that were
apparently designed to work with QT_NO_PROCESS but failed to.
Change-Id: Ieceea2504dea6fdf43b81c7c6b65c547b01b9714
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@qt.io>
The population of data rows was factored into a separate file, qutf8data.cpp, in
commit e20c4730. Merge commit 9bd03235 failed to track a conflicting change into
the new file, and brought the code back into the tst_utf8.cpp, where it has been
duplicating the utf8data data ever since.
Change-Id: I4282685b882448f927289468bd7ab340a21ea0b3
Reviewed-by: David Faure <david.faure@kdab.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
- port Q_FOREACH to C++11 range-for
- port uses of inefficient QLists to QVector
Fixes errors pointed out by my tree's static checks.
Change-Id: Ica50f44d862f635df06cb8f09ce506b9d30fdfc5
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
There's no sharing, and the use of QSharedPointer(T*)
triggers my tree's static analyzer.
Easiest fix is to port to QScopedPointer, which is the
correct smart pointer to begin with.
Change-Id: I105c1a334c3d6712a475600c8394b0bebc420677
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
I wrote a script to help find the files, but I reviewed the
contributions manually to be sure I wasn't claiming copyright for search
& replace, adding Q_DECL_NOTHROW or adding "We mean it" headers.
Change-Id: I7a9e11d7b64a4cc78e24ffff142b506368fc8842
Reviewed-by: Lars Knoll <lars.knoll@theqtcompany.com>
From Qt 5.7 -> tools & applications are lisenced under GPL v3 with some
exceptions, see
http://blog.qt.io/blog/2016/01/13/new-agreement-with-the-kde-free-qt-foundation/
Updated license headers to use new GPL-EXCEPT header instead of LGPL21 one
(in those files which will be under GPL 3 with exceptions)
Change-Id: I42a473ddc97101492a60b9287d90979d9eb35ae1
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Lars Knoll <lars.knoll@theqtcompany.com>
When the byte sequence for a BOM occurs in the middle of a utf8 stream,
it is a ZWNBSP.
When a ZWNBSP occurs in the middle of a utf8 character sequence, and the
SIMD conversion does some work (meaning: the length is at least 16
characters long), it would not recognize the fact some charactes were
already decoded. So the conversion would then strip the ZWNBSP out,
thinking it's a BOM.
The non-SIMD conversion did not have this problem: the very first
character conversion would already set the headerdone flag.
Change-Id: I39aacf607e2e068107106254021a8042d164f628
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Use character literals where applicable.
Change-Id: I1a026c320079ee5ca6f70be835d5a541deee2dd1
Reviewed-by: Simon Hausmann <simon.hausmann@theqtcompany.com>
The keyword no longer has a meaning for the new CI.
Change-Id: Ibcea4c7a82fb7f982cf4569fdff19f82066543d1
Reviewed-by: Simon Hausmann <simon.hausmann@theqtcompany.com>
- Replace Q[TRY]_VERIFY(pointer == 0) by Q[TRY]_VERIFY(!pointer).
- Replace Q[TRY]_VERIFY(smartPointer == 0) by
Q[TRY]_VERIFY(smartPointer.isNull()).
- Replace Q[TRY]_VERIFY(a == b) by Q[TRY]_COMPARE(a, b) and
add casts where necessary. The values will then be logged
should a test fail.
Tests from corelib/tools were omitted in this change.
Change-Id: I4c8786d33fcf429d11b2b624c7cd89c28cadb518
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
ICU doesn't support iso8859-16, so we need to fall back to
the Qt codec for this encoding.
Task-number: QTBUG-45053
Change-Id: I9754cf098c906fe8a75363a3d090029543cd0e35
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Properly QSKIP tests that use disabled QProcess and symlink
features instead of excluding them silently by #ifdef.
Other reason is that moc doesn't respect QT_NO_* defines
in class definition which causes build issues on some
platforms.
Change-Id: I041020f7452f7d36c7ec8a5866a4ba5eb23d1f94
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@theqtcompany.com>
Reviewed-by: Jason McDonald <macadder1@gmail.com>
Qt copyrights are now in The Qt Company, so we could update the source
code headers accordingly. In the same go we should also fix the links to
point to qt.io.
Outdated header.LGPL removed (use header.LGPL21 instead)
Old header.LGPL3 renamed to header.LGPL3-COMM to match actual licensing
combination. New header.LGPL-COMM taken in the use file which were
using old header.LGPL3 (src/plugins/platforms/android/extract.cpp)
Added new header.LGPL3 containing Commercial + LGPLv3 + GPLv2 license
combination
Change-Id: I6f49b819a8a20cc4f88b794a8f6726d975e8ffbe
Reviewed-by: Matti Paaso <matti.paaso@theqtcompany.com>
ICU would return a utf-16 (endian dependent) codec for unicode
which is very rarely what people want. In most cases, unicode is
encoded in utf8 these days, so return a utf8 codec for it.
Task-number: QTBUG-41998
Change-Id: I51ee758d520702b263a8b2011787eb1f3455ed96
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Replace "Apple Roman" by "Macintosh" which is registered by IANA:
http://tools.ietf.org/html/rfc1345.
Replace unsupported "GB18030-0" by "GB18030".
Remove "JIS X 0201" and "JIS X 0208" as they are supported as parts of
other Japanese encodings but not directly.
Add "HP-ROMAN8" which is supported by both Qt and ICU.
Also clean the codecs test.
Change-Id: Iaf8e8ff1900d3f92ea0e0df75c60fe1534de23ac
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Substitute this encoding by "macintosh" when Qt is built with ICU.
It is for compatibility with Qt 4.x.
Change-Id: I70c51cba7d473ac81e25862736cb71a2f6894055
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
When a UTF-8 sequences is too short, QUtf8Functions::fromUtf8 returns
EndOfString. If the decoder is stateful, we must save the state and then
restart it when more data is supplied.
The new stateful decoder (8dd47e34b9)
mishandled the Error case by advancing the src pointer by a negative
number, thus causing a buffer overflow (the issue of the task).
And it also did not handle the len == 0 case properly, though neither
did the older decoder.
Task-number: QTBUG-38939
Change-Id: Ie03d7c55a04e51ee838ccdb3a01e5b989d8e67aa
Reviewed-by: Kai Koehne <kai.koehne@digia.com>
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
The SSE2-based conversion only kicks in with strings with 8 characters
or more in length.
Change-Id: Ic1daad0845571be03547553cc001d83550f0c89c
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
Commit 125bb81bef wasn't enough. Let's
just make it simpler and use a regular function.
Change-Id: I9627dedaa87873b4b138224afd182e4fd9a55265
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
That commit made QString::toXxx (8-bit) functions use C++11 ref
qualifiers, so we need to match it here.
Change-Id: I45b50464d36f858d012b12e0cb511aae347ddb6f
Reviewed-by: Marc Mutz <marc.mutz@kdab.com>
Reviewed-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Like before, this is taken from the existing QUrl code and is optimized for
ASCII handling (for the same reasons). And like previously, make
QString::fromUtf8 use a stateless version of the codec, which is faster.
There's a small change in behavior in the decoding: we insert a U+FFFD for
each byte that cannot be decoded properly. Previously, it would "eat" all bad
high-bit bytes and replace them all with one single U+FFFD. Either behavior is
allowed by the UTF-8 specifications, even though this new behavior will cause
misalignment in the Bradley Kuhn sample UTF-8 text.
Change-Id: Ib1b1f0b4291293bab345acaf376e00204ed87565
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This should be using qputenv; putenv is deprecated on MSVC.
Change-Id: I7c27cf5f7955624fa3553b7a34ab11c6fae462b8
Reviewed-by: Oliver Wolff <oliver.wolff@digia.com>
Changed the processing of non-character code handling in the UTF8 codec.
Non-character codes are now accepted in QStrings, QUrls and QJson strings.
Unit tests were adapted accordingly.
For more info about non-character codes,
see: http://www.unicode.org/versions/corrigendum9.html
[ChangeLog][QtCore][QUtf8]
UTF-8 now accepts non-character unicode points; these are not replaced
by the replacement character anymore
[ChangeLog][QtCore][QUrl]
QUrl now fully accepts non-character unicode points; they are encoded as
percent characters; they can also be pretty decoded
[ChangeLog][QtCore][QJson]
The Writer and the Parser now fully accept non-character unicode points.
Change-Id: I77cf4f0e6210741eac8082912a0b6118eced4f77
Task-number: QTBUG-33229
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
the function is a trivial wrapper for the QTextCodec one, so there is
little point in explicitly testing it.
Change-Id: I0c4950e5a54b7ffff9ba73a001cedb517497a596
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Codecs registered by creating new QTextCodec instances should be listed
there.
Task-number: QTBUG-32500
Change-Id: I56c00e0d6bbfef55a6cbd571bcf9aa2cf333ef3a
Reviewed-by: David Faure <david.faure@kdab.com>
Return the codec if one was found by QTextCodec::codecForUtfText,
instead of returning the default (UTF-8).
Task-number: QTBUG-31293
Change-Id: I95e3260376c00537006b7fbfdc3df5850e1ba657
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
QTextCodec::codecForHtml currently fails to detect the charset for this
HTML:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=9,chrome=1">
<title>Test</title>
</head>
This patch makes the detection of charsets more flexible, allowing for
the use of the HTML 5 charset attribute as well more terminator characters
("'", and ">").
I also added a *_data function for the unit tests.
Task-number: QTBUG-5451
Change-Id: I69fe4a04582f0d845cbbe9140a86a950fb7dc861
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
Reviewed-by: Denis Dzyubenko <denis@ddenis.info>
Change copyrights and license headers from Nokia to Digia
Change-Id: If1cc974286d29fd01ec6c19dd4719a67f4c3f00e
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Reviewed-by: Sergio Ahumada <sergio.ahumada@digia.com>
availableMibs() unconditionally adds 2107 to the list of mibs. The patch
ensures that codecForMib() also knows about this special TSCII codec.
(Note that the autotest only really checks this code path if only this
test case is run. The other tests already fill the internal codec cache
otherwise).
Change-Id: Id987d7cecd5f5700cca75e9b85b37011f8e5c622
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Qt 5.0 beta requires changing the default to the 5.0 API, disabling
the deprecated code. However, tests should test (and often do) the
compatibility API too, so turn it back on.
Task-number: QTBUG-25053
Change-Id: I8129c3ef3cb58541c95a32d083850d9e7f768927
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
Use ICU to do code page conversion instead of the
builtin text codecs. With this QTextCodec simply
becomes a wrapper around ICU's ucnv_* methods.
We only keep our own codecs for UTF-*, ISO-8859-1,
ISO-8859-15 for performance reasons, and for TSCII
and iscii-* because they aren't supported by ICU.
Change-Id: I4fc49eba55cf772b9772c6dac606a47a44346a60
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
These tests have passed a parallel stress test on all three of Linux,
Mac, Windows. Mark them with CONFIG+=parallel_test to allow CI to run
them in parallel, saving time.
Change-Id: I19fd333c3c645a67374ca998f6c8530dd236b0f8
Reviewed-by: Toby Tomkins <toby.tomkins@nokia.com>
This operation should be a no-op anyway, since at this point in time,
the fromAscii and toAscii functions simply call their fromLatin1 and
toLatin1 counterparts.
Task-number: QTBUG-21872
Change-Id: I38f97ad379deafebef02c75d611343ca15640c8a
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: Jędrzej Nowacki <jedrzej.nowacki@nokia.com>
0xfdef-0xfdd0 is definitely 31 and not 15 :)
also fix all copy-pastes of this code (greping for '0xfdd0' helps ;)
Change-Id: I8f3bd4fd9d85f9de066f0f5df378b9188c12bd48
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Denis Dzyubenko <denis.dzyubenko@nokia.com>
This allows me to keep the UTF-8 invalid data in one safe place. I
won't need to copy & paste it.
Change-Id: Icb909d08b7f8d0e1ffbc28e01a0ba0c1fa9dccf0
Reviewed-by: David Faure <faure@kde.org>