Replacement methods now exist in QRegExp, or in
QRegularExpression when porting to it.
Remove all autotests associated with the old methods.
Change-Id: I3ff1e0da4b53adb64d5a48a30aecd8b960f5e633
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This prepares for the removal of those methods from QString and
QStringList. The new methods in QRegExp are left as a porting aid.
Change-Id: Ieffa33a79caf53b83029e9b070c4eb5cadca1418
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
- don't force the deprecation sentence into a separate memory location
(gives the compiler more leeway in how to lay stuff out)
- split the switch (will be useful when extending)
- fix a spelling mistake in one of the messages
Change-Id: Ied137dc8eee7047177983660e1a6776a0bf46bde
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Move pre- and post-condition handling out of the main loop
to make that loop as fast as possible.
Remove special handling of a corner case when the input length
is zero, where the utf8 decoder behaved differently from all
other decoders.
Change-Id: I94992767ea15405b38f7953adadaa6ff98b20b6f
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Just so we can get this cleaned up as well and remove it from
Qt Core.
Change-Id: I2b5b821b039ce2c024ec3cb7338a1a9becdd2157
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
There's no real dependency on QTextCodec in those files anymore.
Change-Id: Ifaf19ab554fd108fa26095db4e2bd4a3e9ea427f
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
And refactor the code a bit to avoid converting to Unicode twice and
use the MIME database instead of Qt::mightBeRichText().
Change-Id: I56f9a732c8ad593e7f050eaad401be536bdf6f98
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: David Faure <david.faure@kdab.com>
This should all be utf-8 anyway, but for now simply replace the
text codec with a string converter.
Change-Id: If0a230776824598b6378bb402d692c941e371104
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Use QStringConverter instead to convert HTML to a QString. This limits
the number of supported encodings to UTF-based encodings and Latin1.
This is OK, as anything but utf8 is strongly discouraged by the HTML
spec anyway, and the encodings still supported after this change cover
~98% of all real-world HTML.
Change-Id: Ia610d327624b083c23d3c604aee70517a4a5eb6a
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
We already assumed that 8-bit data is utf8 encoded in all cases
except for HTML. Handle HTML through QStringDecoder now. This
removes support for encodings other than the UTF-based ones and Latin1.
This is OK, as HTML should nowadays always be encoded in utf8 as
well (anything else is strongly discouraged by the HTML spec). In
addition, utf-8 and latin1 together seem to cover ~98% of all HTML
data.
Change-Id: I7e7165edd38cfac395faf72681e5715b6d014c14
Reviewed-by: David Faure <david.faure@kdab.com>
Document QStringConverter, QStringDecoder and QStringEncoder.
In addition, make a few adjustments to the API: rename one enum value,
add a flags argument to one constructor and make some members private.
Change-Id: I8f99dc3d98fb8860cf6fa46301e34b7eb400511b
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Optimize the common pattern of "str += decode(data);"
and "bytearray += encode(string);"
Change-Id: I1da621fa1ad400f23c9718ecf8ae64c00d9d459c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Use QStringConverter instead. Also change the default
encoding of QTextStream to utf8.
Change-Id: I30682e75fe0462d1a937539f773640c83a2d82e1
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
As a first step add setEncoding/encoding() methods that use the
QStringConverter::Encoding enum, and port all uses of setCodec()/
codec() over to the new API.
Internally, QTextStream still uses QTextCodec; this will be ported
over to QStringConverter in a follow-up change.
Change-Id: Icd764cf47b449b57f4ebd010c2dad89e6717d6c0
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Use QStringDecoder to convert the data instead.
[ChangeLog][Important Behavior Changes] QXmlStreamWriter
always encodes XML in UTF-8, and QXmlStreamReader is limited to
XML files encoded in Unicode encodings (UTF-8, UTF-16 and UTF-32)
and latin1 (ISO-8859-1).
Change-Id: I10da612b951f4312ddaf63a89587697777dd8dc1
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
This is a replacement for Qt::codecForHtml().
Change-Id: I31f03518fd9c70507cbd210a8bcf405b6a0106b1
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Add method that tries to determine the encoding of the data
from an initial byte order mark.
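As an illustration of what detecting an encoding from an initial byte
order mark involves, here is a minimal sketch in plain C++. The enum
and function names are hypothetical and do not reflect the actual
QStringConverter API; only the BOM byte sequences are standard Unicode.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical encoding enum for this sketch only.
enum class BomEncoding { Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Returns the encoding indicated by a leading byte order mark, if any.
// The UTF-32 LE mark (FF FE 00 00) must be tested before the UTF-16 LE
// mark (FF FE), because it starts with the same two bytes.
std::optional<BomEncoding> encodingFromBom(std::string_view data)
{
    using namespace std::string_view_literals;
    auto startsWith = [&](std::string_view bom) {
        return data.substr(0, bom.size()) == bom;
    };
    if (startsWith("\xEF\xBB\xBF"sv))     return BomEncoding::Utf8;
    if (startsWith("\xFF\xFE\x00\x00"sv)) return BomEncoding::Utf32LE;
    if (startsWith("\x00\x00\xFE\xFF"sv)) return BomEncoding::Utf32BE;
    if (startsWith("\xFF\xFE"sv))         return BomEncoding::Utf16LE;
    if (startsWith("\xFE\xFF"sv))         return BomEncoding::Utf16BE;
    return std::nullopt; // no BOM: the caller has to fall back to a default
}
```

The `sv` literals matter here: they keep the embedded zero bytes of
the UTF-32 marks, which a plain `const char *` would cut off.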
Change-Id: I348c51a3d4db9b434af53359b739a7e17acfc760
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Add static methods that allow converting between a name for an
encoding and the Encoding enum.
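A minimal sketch of such a two-way mapping, assuming a small
hypothetical name table and ASCII case-insensitive matching. Both are
assumptions of this sketch, not necessarily how QStringConverter
matches encoding names.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical enum and name table, for illustration only.
enum class Encoding { Utf8, Utf16, Latin1 };

constexpr std::array<std::pair<Encoding, std::string_view>, 3> encodingNames{{
    {Encoding::Utf8, "UTF-8"},
    {Encoding::Utf16, "UTF-16"},
    {Encoding::Latin1, "ISO-8859-1"},
}};

// Compares two names ignoring ASCII case (an assumption of this sketch).
bool nameMatches(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i]))
            != std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Name -> enum; returns nullopt for unknown names.
std::optional<Encoding> encodingForName(std::string_view name)
{
    for (const auto &[enc, encName] : encodingNames)
        if (nameMatches(name, encName))
            return enc;
    return std::nullopt;
}

// Enum -> canonical name.
std::string_view nameForEncoding(Encoding e)
{
    for (const auto &[enc, encName] : encodingNames)
        if (enc == e)
            return encName;
    return {};
}
```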
Change-Id: I12bc503cf757ea31d3ca8d5e1f1216efddcb16d4
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Add a constructor that allows constructing a string converter by
name. This is required in some cases and also makes it possible to
(in the future) extend the API to 3rd party encodings.
Also add a name() accessor.
Change-Id: I606d6ce9405ee967f76197b803615e27c5b001cf
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Always encode INI files as utf-8 in Qt 6. This is mostly backwards
compatible, as old INI files would escape all non-ASCII characters.
[ChangeLog][Important behavioral changes] QSettings will now always
encode INI files as utf-8 (and the iniCodec/setIniCodec methods are
removed). This is a change from Qt 5 and earlier, where QSettings would
by default escape all non-ASCII characters. The behavior is equivalent to
what you got in Qt 5 by setting a utf-8 iniCodec on the settings object.
Settings files written in Qt 5 will still be readable in Qt 6 (unless
an iniCodec other than utf-8 was used), but to read INI files written
by Qt 6 in Qt 5 applications, setting the iniCodec to utf-8 is required.
Change-Id: Ic7dffcca17779bd5e3dae50d42ce633170289f6c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
As we want to move text codecs out of Qt Core, disentangle the
dependency by moving the global codec data into qtextcodec.*.
Change-Id: Id7498423c7c4f9f42fd00c450947305d2af8c4be
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
Feed the data one by one to the encoder or decoder to
verify that the handling of incremental decoding is
correct.
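The idea behind this test can be sketched with a hypothetical,
deliberately simplified UTF-8 decoder that carries a partially decoded
sequence in its state between calls. The check passes only if decoding
the whole buffer and decoding it one byte at a time give the same
result. This is not Qt's decoder or test code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical minimal incremental UTF-8 decoder. Error handling is
// simplified: unexpected bytes and cut-short sequences become U+FFFD.
struct Utf8Decoder {
    char32_t pending = 0; // code point bits collected so far
    int remaining = 0;    // continuation bytes still expected

    std::u32string decode(std::string_view chunk)
    {
        std::u32string out;
        for (char c : chunk) {
            const auto b = static_cast<unsigned char>(c);
            if (remaining > 0 && (b & 0xC0) == 0x80) { // continuation byte
                pending = (pending << 6) | (b & 0x3F);
                if (--remaining == 0)
                    out += pending;
                continue;
            }
            if (remaining > 0) { // previous sequence was cut short
                out += U'\uFFFD';
                remaining = 0;
            }
            if (b < 0x80) {
                out += char32_t(b);
            } else if ((b & 0xE0) == 0xC0) {
                pending = b & 0x1F; remaining = 1;
            } else if ((b & 0xF0) == 0xE0) {
                pending = b & 0x0F; remaining = 2;
            } else if ((b & 0xF8) == 0xF0) {
                pending = b & 0x07; remaining = 3;
            } else {
                out += U'\uFFFD';
            }
        }
        return out; // an incomplete trailing sequence stays in the state
    }
};

// Decodes `data` once in full and once byte by byte, using a fresh
// decoder for each run, and compares the results.
bool incrementalMatchesWhole(std::string_view data)
{
    Utf8Decoder whole;
    const std::u32string expected = whole.decode(data);

    Utf8Decoder byByte;
    std::u32string actual;
    for (std::size_t i = 0; i < data.size(); ++i)
        actual += byByte.decode(data.substr(i, 1));
    return actual == expected;
}
```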
Change-Id: I565e4f1872e00859026334f7662b6778772e159d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Clean up the implementation and improve performance by
handling the first char outside of the main loop.
Also avoid one copy of the data when using QStringConverter.
Change-Id: Ie698e62de1864352612a4dddc907cb139e7e6407
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Implement proper state handling, and avoid a copy when using
it through QStringConverter.
Change-Id: I201fe966601c424c337e452e359a2e71f76354ad
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Clean up the method, and refactor it so we can avoid one
copy of the data when using QStringConverter.
Make the conversion to Unicode faster by avoiding conditions in
the inner loop and doing a memcpy if the endianness matches.
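A sketch of the memcpy fast path, assuming UTF-16 input given as raw
bytes with a known byte order. The function names are hypothetical and
this is not Qt's internal implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// True if this machine stores the low byte of a char16_t first.
bool hostIsLittleEndian()
{
    const char16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Converts raw UTF-16 bytes to char16_t code units. When the data's
// byte order matches the host's, the whole buffer is copied with one
// memcpy; otherwise each unit is assembled from its two bytes according
// to the data's byte order. A trailing odd byte is ignored here.
std::u16string utf16BytesToUnits(std::string_view bytes, bool dataIsLittleEndian)
{
    const std::size_t count = bytes.size() / 2;
    std::u16string out(count, u'\0');
    if (dataIsLittleEndian == hostIsLittleEndian()) {
        std::memcpy(out.data(), bytes.data(), count * 2); // fast path
        return out;
    }
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned b0 = static_cast<unsigned char>(bytes[2 * i]);
        const unsigned b1 = static_cast<unsigned char>(bytes[2 * i + 1]);
        out[i] = dataIsLittleEndian ? char16_t(b0 | (b1 << 8))
                                    : char16_t((b0 << 8) | b1);
    }
    return out;
}
```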
Change-Id: I869daf861f886d69b67a1b223ac2238498b609ac
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
And optimize the method so we can avoid a copy of
the data.
Change-Id: Ic267150db80358dbc4010bb1db2af5c0eb97dc65
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Make sure that the conversion methods always get a valid state. This is
already the case when using the new QStringConverter API; ensure the
old QTextCodec API also passes in a valid state.
This helps simplify the logic inside those methods.
Change-Id: I1945e98cdefd46bf1427e11984337f1d62abcaa2
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
IgnoreHeader was a rather badly defined enum; in addition, the
utf8 and utf16 codecs were handling BOMs somewhat differently
for stateless decoding.
Fix this by introducing explicit flags for writing a BOM when
encoding and for not skipping the initial BOM when decoding.
Source compatibility for QTextCodec is done with a couple of
static constexpr variables.
Change-Id: I0b2d94f84c937cec1e0494c16ef448c00382691d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The way the new converters are structured allows us to
use them together with QStringBuilder. Like this, we
can avoid additional and unnecessary copies of the
data.
Change-Id: I168da3860537fe81a1eb62632e4d9a6680f86af1
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Assume Unix systems are utf-8 based nowadays. glib has been
assuming this for quite some time already, and all Linux and BSD
systems shipped in the last 10 years assume utf-8 for 8-bit strings.
UTF-8 has also been the encoding used by macOS and QNX for a very long time.
File systems where file names are not encoded in utf-8 can usually be
translated transparently to utf8 by specifying appropriate mount
options.
Change-Id: I1970496db24e59dee8efb79ba025355a3ce87387
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Remove support for setting a codec different from UTF-8
for writing XML files.
All XML readers today can handle UTF-8, and there is no
reason anymore to write a file in a different encoding.
Change-Id: If89fb2d2474a2b55644d9bed7473c11ad91033eb
Reviewed-by: Simon Hausmann <hausmann@gmail.com>
They are still not copyable, but can be moved.
Change-Id: Id66e35be4ecdaa781ecb9212d646d224b1767913
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Latin1 is the only non-Unicode encoding that is still being used
to some extent. Current web site statistics show that it is
being used in ~2% of all web sites. An additional 1% of web sites
use Windows1251 (which is almost the same as latin1).
As it's trivial to support this encoding, we keep it supported
in QStringConverter.
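Latin1 is trivial to support because its 256 byte values map directly
onto the Unicode code points U+0000 to U+00FF. A minimal sketch with
hypothetical function names; replacing unrepresentable characters with
'?' when encoding is a choice made for this sketch.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Decoding: each Latin1 byte is widened to the code unit of equal value.
std::u16string fromLatin1(std::string_view bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (char c : bytes)
        out += char16_t(static_cast<unsigned char>(c));
    return out;
}

// Encoding: code units above U+00FF have no Latin1 representation and
// are replaced with '?' in this sketch.
std::string toLatin1(std::u16string_view text)
{
    std::string out;
    out.reserve(text.size());
    for (char16_t u : text)
        out += u <= 0xFF ? static_cast<char>(u) : '?';
    return out;
}
```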
Change-Id: I0eff53a490b6c43d3e474107e7823be245d1715a
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Local8Bit is always UTF-8 except for Windows platforms.
Also add a Locale encoding to QStringConverter.
Change-Id: I8d729931fd4c1d7fc6857696b6442a44def3fd9d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Separate them from the qutfcodec, so that the codec
can later on be moved out of Qt Core.
Fix the QUtf methods to take qsizetype instead of int
for length arguments.
This also makes it possible to not build QTextCodec into
the bootstrap lib anymore.
Change-Id: I0b4f83139d61b19c651520a2f3a5012aa7e85cb8
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The new QStringEncoder and QStringDecoder classes
(with a common QStringConverter base class) are
there to replace QTextCodec in Qt 6.
It currently uses a trivial wrapper around the utf
encoding functionality.
Added some autotests, mostly copied from the text codec
tests.
Change-Id: Ib6eeee55fba918b9424be244cbda9dfd5096f7eb
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
When a QTextDocument is laid out, this is done in "chunks" to keep
programs responsive. During the layout process, blocks are split into
lines. When a full (re)layout is interrupted (e.g. because a
QSyntaxHighlighter changes some formats before the layout for all chunks is
completed), later chunks are skipped. This results in invalid data
(e.g., blocks not split into lines).
This change ensures that full layout runs of the root frame are
completed even after interruptions.
Fixes: QTBUG-20354
Change-Id: I041c73a532a5abe74d577ca49810191b5594dca2
Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
Recent versions of xdg-desktop-portal gained support for opening directories
through the portal, which means we no longer need to rely on opening the
dialog inside the sandbox and hoping we have permissions to user directories.
[ChangeLog][Linux] QFileDialog will open directories through the portal
if the required version of xdg-desktop-portal is running on the system.
Change-Id: Ifc9035e268f1cc8d9d6a93480e651e0d9e1e9929
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Also Extract Method isProhibitedOutput(), which, in order to preserve
history, is placed in the unnamed namespace and violates the
indentation rules. But this code is delicate enough to be left
undisturbed. At least for now.
By said Extract Method, we can static_assert on the invalidity of the
QStringIterator::next() argument, which we assume will be rejected, so
that we continue to reject malformed surrogate pairs.
Also fix a comment that suggested the function would actively remove
("strip") malformed content, when in fact it only detects it.
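A sketch of detection-only handling of malformed surrogate pairs,
assuming a next()-style decoder that returns an out-of-range sentinel
for unpaired surrogates. The names and the sentinel value are
hypothetical and do not describe QStringIterator's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sentinel for an unpaired surrogate. It lies outside the
// Unicode range, so it can never collide with a valid code point.
constexpr char32_t kInvalid = 0x110000;
static_assert(kInvalid > 0x10FFFF, "sentinel must not be a valid code point");

// Decodes the code point starting at text[i] and advances i past it.
// Precondition: i < text.size().
char32_t nextCodePoint(std::u16string_view text, std::size_t &i)
{
    const char16_t u = text[i++];
    if (u < 0xD800 || u > 0xDFFF)
        return u;                           // not a surrogate
    if (u <= 0xDBFF && i < text.size()) {   // high surrogate: need a low one
        const char16_t low = text[i];
        if (low >= 0xDC00 && low <= 0xDFFF) {
            ++i;
            return 0x10000 + ((char32_t(u) - 0xD800) << 10) + (low - 0xDC00);
        }
    }
    return kInvalid;                        // lone low or unpaired high
}

// Only detects malformed content; it does not remove ("strip") it.
bool containsMalformedSurrogates(std::u16string_view text)
{
    for (std::size_t i = 0; i < text.size();)
        if (nextCodePoint(text, i) == kInvalid)
            return true;
    return false;
}
```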
Change-Id: I4185cbac71fb147e2f2036dbaf052af20bd1003f
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The code did not limit the length of hex and octal escape sequences,
but used an int as the accumulator, which causes UB on overflow.
Due to the use of the QChar(int) constructor when appending escapeVal,
only the lowest 16 bits of the value were appended to the result
string. A test case encoding this behavior explicitly suggests this
is intended behavior.
It therefore suffices to use an unsigned 16-bit value as the
accumulator (unsigned, because that doesn't cause UB on overflow, 16
bits, because that's all we care for).
For future-proofing, use char16_t as the accumulator.
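A minimal sketch of the wrap-around accumulator, assuming a
hypothetical helper that receives just the hex digits of an escape
sequence. This is not the actual Qt parsing code.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: parses the hex digits of an escape such as the
// "41" in "\x41", with no limit on the number of digits.
char16_t parseHexEscape(std::string_view digits)
{
    char16_t acc = 0;
    for (char c : digits) {
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else break;                    // end of the escape sequence
        // acc is promoted to int here, but acc * 16 + d is at most
        // 0xFFFFF, so the int arithmetic cannot overflow. Converting the
        // result back to the unsigned char16_t keeps the lowest 16 bits,
        // which is well defined.
        acc = static_cast<char16_t>(acc * 16 + d);
    }
    return acc;
}
```

Because the value is reduced modulo 2^16 after every digit, the result
equals the lowest 16 bits of the full number, matching the old
truncating behavior without relying on signed overflow.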
Pick-to: 5.15
Change-Id: I07e7ebf1f312276b2bbcb08e4360c66a3b9522ca
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>