The win32 API doesn't give us much choice. _Some_ code pages have
support for returning some error if we pass a specific flag, but not
all of them.
Anyway, since the code pages might not support all that UTF-16 provides,
we can't reasonably make it error out on characters that cannot be
converted.
So, the most reasonable thing we can handle is a unpaired high surrogate
at the end of a string, assume that the rest of the string was fine, and
that the low surrogate will be provided in the next call.
Pick-to: 6.6 6.5
Fixes: QTBUG-118185
Task-number: QTBUG-105105
Change-Id: I1f193c9d8e04bec769d885d32440c759d9dff0c2
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Both to store and to restore.
Without this a 3 or more octet sequence would cause errors or wrong
output. This can be seen with GB 18030.
Pick-to: 6.6 6.5
Fixes: QTBUG-118318
Task-number: QTBUG-105105
Change-Id: Id1f7f5f2fba4633b9f888add2186f4d8d21b7293
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
A few code pages do not support this flag[0]. It's also deprecated[1]
and is what Windows prefers to generate by default. So let's drop it.
[0] https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar
See note at the end for the dwFlags parameter.
[1] It's mentioned in the header files, but not online...
Pick-to: 6.6 6.5
Task-number: QTBUG-118185
Task-number: QTBUG-105105
Change-Id: I798c387170c73a953be874de139868543b2d775e
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Then we can easily test how fromLocal8Bit() and toLocal8Bit() behave
with different code-pages.
Pick-to: 6.6 6.5
Task-number: QTBUG-118318
Task-number: QTBUG-118185
Task-number: QTBUG-105105
Change-Id: Ib1cd3bccd27d598f4c80915557e332befcd96354
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
A text editor commonly wants to display a list of codecs that are
supported. With the introduction of the ICU based QStringConverter, that
list is no longer statically known. So provide the necessary
functionality.
Fixes: QTBUG-109104
Change-Id: I9ecf59aa6bcc6fe65c8872cab84affafec4fa362
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Add the boilerplate standalone test prelude to each test, so that they
can be opened with an IDE without the qt-cmake-standalone-test script,
but directly with qt-cmake or cmake.
Boilerplate was added using the following scripts:
https://git.qt.io/alcroito/cmake_refactor
Manual adjustments were made where the code was inserted in the wrong
location.
Task-number: QTBUG-93020
Change-Id: I28b6d3815c5f43d2c33ea65764f6f3f8f129eaf3
Reviewed-by: Amir Masoud Abdol <amir.abdol@qt.io>
Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
More and more code in Qt uses char16_t instead of QChar, even
QString::Data, so reduce the impedance mismatch with such code and
supply a char16_t overload in parallel to the existing QChar* one.
[ChangeLog][QtCore][QStringDecoder] Added appendToBuffer() overload for
char16_t*, complementing the existing overload taking QChar*.
Task-number: QTBUG-106198
Change-Id: I0cb8ab22c897c14b1318a676f5212cc0cf1b72b7
Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
The test needs investigation. Skip it for now so that we can enable
CI for WASM but leave a note to investigate it.
Task-number: QTBUG-109954
Change-Id: I9448312c2c16ec4f31279dcbe4e2213681cabe8a
Reviewed-by: Morten Johan Sørvig <morten.sorvig@qt.io>
With the methods that use helpers from qstring.cpp defined in the
latter.
Change-Id: I11d6b0bfb95efe34e56d33d2ecbfe8f4423a9e6c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
With the introduction of QAnyStringView, overloading based on UTF-8
and Latin-1 is becoming more common. Often, the two overloads can
share the processing backend, because we're only interested in the
US-ASCII subset of each.
But if they can't, we need a faster way to convert L1 into UTF-8 than
going via UTF-16. This is where the new private API comes in.
Eventually, we should have the converse operation, too, to complete
the set of direct conversions between the possible three
QAnyStringView encodings L1/U8/U16, but this direction is easier to
code (there are no error cases) and more immediately useful, so
provide L1->U8 alone for now.
Change-Id: I3f7e1a9c89979d0eb604cb9e42dedf3d514fca2c
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
CMakeLists.txt and .cmake files of significant size
(more than 2 lines according to our check in tst_license.pl)
now have the copyright and license header.
Existing copyright statements remain intact
Task-number: QTBUG-88621
Change-Id: I3b98cdc55ead806ec81ce09af9271f9b95af97fa
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
Now that QStringConverter can handle non UTF encodings through ICU,
add a way to get a decoder for arbitrary HTML code.
Opposed to QStringConverter::encodingForHtml(), this method will
try to create a valid string decoder also for non unicode codecs.
Pick-to: 6.4
Change-Id: I343584da1b114396c744f482d9b433c9cedcc511
Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This adds support for additional codecs to QStringConverter when ICU is
available.
We store the converter in the state (d[0]), and its canonical name in
d[1]. We need the name there, as in the clear function we close the
UConverter, and set the pointer to null. Consequently, the actual
conversion functions might need to re-open the converter again. The
advantage of this approach is that clear is used in the destructor of
State, and with this approach we properly clean up the state.
There is however a disadvantage: The clear function was so far also used
for resetting the state when QStringConverter::resetState . Discarding
the whole Uconverter for that is however rather costly. For that reason
we modify resetState to call a new function, State::reset. For existing
converters, it behaves the same as clear; for the ICU based converter,
we call the more efficient ucnv_reset. Code compiled against Qt 6.4 can
benefit from this more efficient version; code compiled against older Qt
versions will continue to work, as the conversion functions can just
recretate the converter from the name.
We can distinguish between ICU and non-ICU converters by checking if the
UsesIcu flag is set.
QStringConverter::name is changed to return the name stored in d[1]. The
interface of the ICU converter has a dummy name, so code using the old
name function from QT < 6.4 still returns something, namely a message
asking the user to recompile.
The function is moved out of line, as we need to check for the private
ICU feature, and want to avoid having that check in the public header.
As the QStringConverter ctor taking a name now can allocate memory, it
can no longer be noexcept. Removing the noexceptness is safe, as it was
only added after Qt 6.3.
Note that we cannot extend the API consuming or returning Encoding, as
we use Encoding values to index into an array of converter interfaces in
inline API.
Further API to support getting an ICU converter for HTML will be added
in a future commit.
Currently, the code depending on ICU is enabled at compile time if ICU
is found. However, in the future it could be moved into a plugin to
avoid a hard dependency on ICU in Core.
[ChangeLog][Corelib][Text] QStringConverter and API using it now
supports more text codecs if Qt is compiled with ICU support.
Fixes: QTBUG-103375
Change-Id: I7afb92fc68ef994179ebc7a3aa73beebb1386204
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Attempting to use an invalid QStringConverter would so far have resulted
in a crash, as we would dereference the null iface pointer.
Fix this by inserting adequate checks, and ensure that hasError returns
true if we attempt to en/decode with an invalid converter.
Pick-to: 6.2 6.3
Change-Id: Icf74bb88cd8c95685481cc0bd512da99b62f33e6
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Simplifies the test a little.
Pick-to: 6.3
Change-Id: I77c8221eb2824c369feffffd16f0a7fc44215aaf
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
The utf8.txt file was only 21 bytes and contained exactly two non-ASCII
characters. It wasn't very good.
This commit brings back the UTF-8 test rows that existed before commit
18ec53156e deleted tst_Utf8. There's a lot
of overlap with some of the other rows in this test, though.
Pick-to: 6.2 6.3
Change-Id: I77c8221eb2824c369feffffd16f094619b69faef
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
The QLocal8Bit implementation assumes that there's at most one
continuation byte -- that is, that all codecs are either Single or
Double Byte Character Sets (SBCS or DBCS). It appears to be the case for
all Windows default codepages, except for CP_UTF8, which is an opt-in
anyway.
Instead of fixing our codec, let's just use the optimized UTF-8
implementation.
[ChangeLog][Windows] Fixed support for using Qt applications with UTF-8
as the system codepage or by enabling that in the application's
manifest.
Discussed-on: https://lists.qt-project.org/pipermail/interest/2022-May/038241.html
Pick-to: 6.2 6.3
Change-Id: I77c8221eb2824c369feffffd16f0912550a98049
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Replace the current license disclaimer in files by
a SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.
Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
The test was failing because test data was not provided correctly.
That was fixed in 4aea86f5e8 but the
test was never unblacklisted.
Fixes: QTBUG-87418
Pick-to: 6.3 6.2
Change-Id: Ibef7dcfaf59ef50f90d6538a562d03af17f065e0
Reviewed-by: Assam Boudjelthia <assam.boudjelthia@qt.io>
Remove Integrity and Android specific code that explicitly adds
test data to the resource files. qt_internal_add_test functions
implicitly adds test data to resources for Android and Integrity
platforms by default.
Change-Id: Ia1d58755b47442e1953462e38606f70fec262368
Reviewed-by: Assam Boudjelthia <assam.boudjelthia@qt.io>
Reviewed-by: Alexandru Croitor <alexandru.croitor@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
- add test resources to binaries
- link Qt::Gui to tst_qpointer for static build case
Task-number: QTBUG-99123
Pick-to: 6.2 6.3
Change-Id: I311827b9c641eaf9537091b051c15f9fcbcb9f0c
Reviewed-by: Kimmo Ollila <kimmo.ollila@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
For now it only has a trivial test (empty) and an all-surrogate test
(Chakma digits) - but at least now all conversions to and from UTF-16
are tested. In particular, there were previously no UTF-32 tests.
Pick-to: 6.2
Change-Id: I9317928a88b9990530126db80e4756b880a364df
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Remove the qmake project files for most of Qt.
Leave the qmake project files for examples, because we still test those
in the CI to ensure qmake does not regress.
Also leave the qmake project files for utils and other minor parts that
lack CMake project files.
Task-number: QTBUG-88742
Change-Id: I6cdf059e6204816f617f9624f3ea9822703f73cc
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Kai Koehne <kai.koehne@qt.io>
Complete search and replace of QtTest and QtTest/QtTest with QTest, as
QtTest includes the whole module. Replace all such instances with
correct header includes. See Jira task for more discussion.
Fixes: QTBUG-88831
Change-Id: I981cfae18a1cabcabcabee376016b086d9d01f44
Pick-to: 6.0
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
QChar should not be convertible from any integral type except from
char16_t, short and possibly char (since it's a direct superset).
David provided the perfect example:
if (str == 123) { ~~~ }
compiles, with 123 implicitly converted to QChar (str == "123"
was meant instead). But similarly one can construct other
scenarios where QString(123) gets accidentally used (instead of
QString::number(123)), like QString s; s += 123;.
Add a macro to revert to the implicit constructors, for backwards
compatibility.
The breaks are mostly in tests that "abuse" of integers (arithmetic,
etc.). Maybe it's time for user-defined literals for QChar/QString,
but that is left for another commit.
[ChangeLog][Potentially Source-Incompatible Changes][QChar] QChar
constructors from integral types are now by default explicit.
It is recommended to use explicit conversions, QLatin1Char,
QChar::fromUcs4 instead of implicit conversions. The old behavior
can be restored by defining the QT_IMPLICIT_QCHAR_CONSTRUCTION
macro.
Change-Id: I6175f6ab9bcf1956f6f97ab0c9d9d5aaf777296d
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
Reviewed-by: Tor Arne Vestbø <tor.arne.vestbo@qt.io>
We want to re-enable Android tests in QTQAINFRA-3867. However,
many tests are failing already preventing that from happening.
QTBUG-87025 is currently keeping track (links) to all of those
failing tests.
The current proposal is to hide those failing tests, and enable
Android test running in COIN for other tests. After, that try
to fix them one by one, and at the same time we can make sure
no more failing tests go unnoticed.
Task-number: QTBUG-87025
Change-Id: Ic1fe9fdd167cbcfd99efce9a09c69c344a36bbe4
Reviewed-by: Alexandru Croitor <alexandru.croitor@qt.io>
Try to get rid of APIs that use raw 'const {char, QChar} *, length'
pairs. Instead, use QByteArrayView or QStringView.
As QStringConverter is a new class, simply change the API to what we'd like
to have. Also adjust hidden API in QStringBuilder and friends.
Change-Id: I897d47f63a7b965f5574a1e51da64147f9e981f6
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Modify special case locations to use the new API as well.
Clean up some stale .prev files that are not needed anymore.
Clean up some project files that are not used anymore.
Task-number: QTBUG-86815
Change-Id: I9947da921f98686023c6bb053dfcc101851276b5
Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Remove around 1000 compiler warnings about missing overrides
in our auto tests.
This significantly reduce the compiler warning noise in our auto
tests, so that one can actually better see the real problems
inbetween.
Change-Id: Id0c04dba43fcaf55d8cd2b5c6697358857c31bf9
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
QString's fromUtf16() prefers char16_t data over ushort.
Change-Id: Ib20c5afa09ceabb4e91fe434b6a057edb4739a53
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
The tests are testing deprecated functionality, which we
still want to test.
Change-Id: Iad6ed35800896170c17fe019c7a6ecda22398ac3
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
The functional style interface is nice, but does feel alien in some
contexts, so better also have explicit encode and decode methods.
Change-Id: Ic07ced15f65cdb3a7f1cf044041e341d2ef87f79
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
Move pre/and post condition handling out of the main loop
to make that one as fast as possible.
Remove special handling of a corner case when the input length
is zero, where the utf8 decoder did something else than all
other decoders.
Change-Id: I94992767ea15405b38f7953adadaa6ff98b20b6f
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Document QStringConverter, QStringDecoder and QStringEncoder.
In addition, do some touches to the API, renaming one enum value,
add a flags argument to one constructor and make some members private.
Change-Id: I8f99dc3d98fb8860cf6fa46301e34b7eb400511b
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This is a replacement for Qt::codecForHtml().
Change-Id: I31f03518fd9c70507cbd210a8bcf405b6a0106b1
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Add method that tries to determine the encoding of the data
from an initial byte order mark.
Change-Id: I348c51a3d4db9b434af53359b739a7e17acfc760
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Add static methods that allow converting between a name for an
encoding and the Encoding enum.
Change-Id: I12bc503cf757ea31d3ca8d5e1f1216efddcb16d4
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Add a constructor, that allows constructing a string converter by
name. This is required in some cases and also makes it possible to
(in the future) extend the API to 3rd party encodings.
Also add a name() accessor.
Change-Id: I606d6ce9405ee967f76197b803615e27c5b001cf
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Feed the data one by one to the encoder or decoder to
verify that the handling of incremental decoding is
correct.
Change-Id: I565e4f1872e00859026334f7662b6778772e159d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
IgnoreHeader was a rather badly defined enum, in addition the
utf8 and utf16 codecs where handling BOMs somewhat different
for stateless decoding.
Fix this by introducing explicit flags for writing a bom when
encoding and not skipping the initial bom when decoding.
Source compatibility for QTextCodec is done with a couple of
static constexpr variables.
Change-Id: I0b2d94f84c937cec1e0494c16ef448c00382691d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>