The name of is_utf8_encoded type seems to imply that QByteArray and
const char * contain UTF-8 encoded data. This is not always the case.
When used with QByteArray API, those types are handled as containing
ASCII-only. It's only when used with APIs of other classes (QString,
QLatin1String) they are handled as UTF-8. This renames the check to
is_bytearray_like_v and clarifies its usage. No need to handle
QUtf8StringView this way because it works just fine with the current
testcases.
While at it, also remove unused is_latin1_encoded trait.
Task-number: QTBUG-92021
Change-Id: I44e40cf3c0dd07e5b3cf1e47ff7a04f1c548aa97
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The testcase was disabled with no explanation in
2766322de3 but seems to work fine.
Change-Id: Ibc22a4ffb756604e22c1f2cf1165919c1d7f1212
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
When a needle has length 1 (because it's a QChar/char16_t, or because
it's a string-like of length 1) then an ad-hoc search algorithm is
used. This algorithm had a off-by-one, by not allowing to match at
the last position of a haystack (in case `from` was `haystack.size()`).
That is inconsistent with the general search of substring needles
(and what QByteArray does). Fix that case and amend wrong tests.
This in turn unveiled the fact that the algorithm was unable to cope
with 0-length haystacks (whops), so fix that as well. Drive-by, add a
similar fix for QByteArray.
Amends 6cee204d56.
Pick-to: 6.2
Change-Id: I6b3effc4ecd74bcbcd33dd2e550da2df7bf05ae3
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
When trying to fix 0-length matches at the end of a QString,
be83ff65c4 actually introduced a
regression due to how lastIndexOf interprets its `from` parameter.
The "established" (=legacy) interpretation of a negative `from` is that
it is supposed to indicate that we want the last match at offset `from +
size()`. With the default from of -1, that means we want a match
starting at most at position `size() - 1` inclusive, i.e. *at* the last
position in the string. The aforementioned commit changed that, by
allowing a match at position `size()` instead, and this behavioral
change broke code.
The problem the commit tried to fix was that empty matches *are* allowed
to happen at position size(): the last match of regexp // inside the
string "test" is indeed at position 4 (the regexp matches 5 times).
Changing the meaning of negative from to include that last position (in
general: to include position `from+size()+1` as the last valid matching
position, in case of a negative `from`) has unfortunately broken client
code. Therefore, we need to revert it. This patch does that, adapting
the tests as necessary (drive-by: a broken #undef is removed).
Reverting the patch however is not sufficient. What we are facing here
is an historical API mistake that forces the default `from` (-1) to
*skip* the truly last possible match; the mistake is that thre is simply
no way to pass a negative `from` and obtain that match. This means that
the revert will now cause code like this:
str.lastIndexOf(QRE("")); // `from` defaulted to -1
NOT to return str.size(), which is counter-intuitive and wrong. Other
APIs expose this inconsistency: for instance, using
QRegularExpressionIterator would actually yield a last match at position
str.size(). Similarly, using QString::count would return `str.size()+1`.
Note that, in general, it's still possible for clients to call
str.lastIndexOf(~~~, str.size())
to get the "truly last" match.
This patch also tries to fix this case ("have our cake and eat it").
First and foremost, a couple of bugs in QByteArray and QString code are
fixed (when dealing with 0-length needles).
Second, a lastIndexOf overload is added. One overload is the "legacy"
one, that will honor the pre-existing semantics of negative `from`. The
new overload does NOT take a `from` parameter at all, and will actually
match from the truly end (by simply calling `lastIndexOf(~~~, size())`
internally).
These overloads are offered for all the existing lastIndexOf()
overloads, not only the ones taking QRE.
This means that code simply using `lastIndexOf` without any `from`
parameter get the "correct" behavior for 0-length matches, and code that
specifies one gets the legacy behavior. Matches of length > 0 are not
affected anyways, as they can't match at position size().
[ChangeLog][Important Behavior Changes] A regression in the behavior of
the lastIndexOf() function on text-related containers and views
(QString, QStringView, QByteArray, QByteArrayView, QLatin1String) has
been fixed, and the behavior made consistent and more in line with
user expectations. When lastIndexOf() is invoked with a negative `from`
position, the last match has now to start at the last character in the
container/view (before, it was at the position *past* the last
character). This makes a difference when using lastIndexOf() with a
needle that has 0 length (for instance an empty string, a regular
expression that can match 0 characters, and so on); any other case is
unaffected. To retrieve the "truly last" match, one can pass a
positive `from` offset to lastIndexOf() (basically, pass `size()` as the
`from` parameter). To make calls such as `text.lastIndexOf(~~~);`, that
do not pass any `from` parameter, behave properly, a new lastIndexOf()
overload has been added to all the text containers/views. This overload
does not take a `from` parameter at all, and will search starting from
one character past the end of the text, therefore returning a correct
result when used with needles that may yield 0-length matches. Client
code may need to be recompiled in order to use this new overload.
Conversely, client code that needs to skip the "truly last" match now
needs to pass -1 as the `from` parameter instead of relying on the
default.
Change-Id: I5e92bdcf1a57c2c3cca97b6adccf0883d00a92e5
Fixes: QTBUG-94215
Pick-to: 6.2
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
- QString/QStringView overloads were designed to be compatible for all
possible argument types, so check that it stays that way
- QString/QAnyStringView overloads have several known ambiguities that
we cannot and don't want to fix, because it would make
QAnyStringView less versatile, but at one should at least be able to
overload QString and weak-QAnyStringView.
Change-Id: I5e5ae3c96060c93bfe070f6c19213328dae9c5f9
Reviewed-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Small change needed to make QString_char16 and QString_QChar return -1
in this case, but other combinations already returns -1.
[ChangeLog][QtCore][Behavior Change] QString::indexOf(QChar) and
QString::indexOf(char16_t) now treat a negative start-position, from,
bigger than the string's size as invalid. It previously
clipped such start-positions to the start of the string, inconsistently
with other QString indexOf overloads.
Change-Id: Ic56c8a558bf40a94845c649647db569892d4df02
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Without the overloads using QLatin1Char in QL1S member functions results
in the QL1Char being converted to QChar and QL1String being converted to
QString.
Change-Id: Ic19545539a207f025a6293f0b2d929de475dc166
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Remove the qmake project files for most of Qt.
Leave the qmake project files for examples, because we still test those
in the CI to ensure qmake does not regress.
Also leave the qmake project files for utils and other minor parts that
lack CMake project files.
Task-number: QTBUG-88742
Change-Id: I6cdf059e6204816f617f9624f3ea9822703f73cc
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Kai Koehne <kai.koehne@qt.io>
Coverage tests revealed that
QAnyStringView::compare(QAnyStringView, QAnyStringView, CaseSensitivity)
was not tested in our unit tests. This patch adds a test for this.
Pick-to: 6.0
Change-Id: Id8e0d8af87e7e7ab192fb7554a278ddbb890fb14
Reviewed-by: Andrei Golubev <andrei.golubev@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Also add the very same operators to the QBasicUtf8StringView
class to overcome the compiler issues seen on gcc.
Fixes: QTBUG-86481
Change-Id: I12484455ebd3b7b38d4ad67c38977d76f9b3ddfa
Reviewed-by: Andrei Golubev <andrei.golubev@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Modify special case locations to use the new API as well.
Clean up some stale .prev files that are not needed anymore.
Clean up some project files that are not used anymore.
Task-number: QTBUG-86815
Change-Id: I9947da921f98686023c6bb053dfcc101851276b5
Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
We need to add these two classes at the same time, because
QAnyStringView makes all QUtf8StringView relational operators moot. We
might want to add some later, esp. for UTF-8/UTf-8 comparisons, to
avoid the pessimization that we can't early-out on size() mismatch in
QAnyStringView equality operators, but that's an optimization, not a
correctness issue, and can be fixed in a source-compatible way even
after Qt 6 is released.
To deal with the char8_t problem in C++20, make QUtf8StringView a
class template out of which two UTF-8 views can be instantiated: the
Qt 7 version, which depends on C++20 char8_t as value_type, and the Qt
6 version where value_type is a char. Use inline namespaces to map the
QUtf8StringView identifier to one or the other, depending on the C++
version used to compile the user code. The inline namespace names must
needs be a bit ugly, as their inline'ness depends on __cpp_char8_t. If
we simply used q_v1/q_v2 we'd be blocking these names for Qt inline
namespaces forever, because it's likely that inline'ness of other
users of inline namespaces in Qt depends on things other than
__cpp_char8_t. While inline'ness of namespaces is, theoretically
speaking, a compile-time-only property, at least Clang warns about
mixed use of inline on a given namespace, so we need to bite the
bullet here. This is also the reason for the QT_BEGIN_..._NAMESPACE
macros: GCC is ok with the first declaration making a namespace
inline, while Clang warns upon re-opening an inline namespace as a
non-inline one.
[ChangeLog][QtCore][QUtf8StringView] New class.
[ChangeLog][QtCore][QAnyStringView] New class.
Change-Id: Ia7179760fca0e0b67d52f5accb0a62e389b17913
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
QString still has the overloads of relational operators taking
QByteArray. Add back QByteArray's relational operators taking
QString for symmetry. See also the comments of
d7ccd8cb45 for more details.
[ChangeLog][EDITORIAL] Remove the changelog about QString/QByteArray
operators being removed. They're back.
Change-Id: I22c95e727285cf8a5ef79b3a4f9d45cb66319252
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reduce the scopes so that also the result of 1-arg-sliced() can be
called 'sliced'.
Change-Id: Ie156f76838f8650d6926d3c198007aaf12f90734
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
QLatin1String::mid() etc were changed from narrow to wide contract,
but the narrow-contract replacements weren't added. This blocks using
the narrow-contract functions in QStringTokenizer.
As a drive-by, Q_REQUIRED_RESULT -> [[nodiscard]] and Q_DECL_CONSTEXPR
-> constexpr. Also centralize most Q_ASSERT()s in a single function,
verify(), in an attempt to reduce the amount of string data generated
from the asserts in assertive builds.
[ChangeLog][QtCore][QLatin1String] Added from(), sliced(), first(n),
last(n) functions.
[ChangeLog][QtCore][QLatin1String] size_type/size() is now qsizetype
(was: int). This makes QLatin1String(ptr, 0) ambiguous now between the
(ptr, ptr) and (ptr, qsizetype) constructors.
Change-Id: Ie195f66ae1974eb0752c058aa9f3b0853ed92477
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Export some private functions from QUtf8 to resolve
undefined symbols in Qt5Compat after moving QStringRef.
Task-number: QTBUG-84437
Change-Id: I9046dcb14ed520d8868a511d79da6e721e26f72b
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Both normal and relaxed constexpr are required by our new minimum of
C++17.
Change-Id: Ic028b88a2e7a6cb7d5925f3133b9d54859a81744
Reviewed-by: Sona Kurazyan <sona.kurazyan@qt.io>
After API discussions, agreement was that from(n) is a bad name
for the method. Let's go with sliced(n) instead.
Change-Id: I0338cc150148a5008c3ee72bd8fda96fb93e9c35
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
[ChangeLog][QtCore][QByteArray] Remove method overloads taking
QString as argument, all of which were equivalent to passing the
toUtf8() of the string instead.
Change-Id: I9251733a9b3711153b2faddbbc907672a7cba190
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
The recently-added slice() method has the problem that it's a noun
as well as a verb in the imperative. Like std::vector::empty, which
is both an adjective and a verb in the imperative, this may cause
confusion as to what the function does. Using the passive voice form
of slice(), sliced(), removes the confusion. While it can be read as
an adjective, too, that doesn't change the meaning compared to the
verb form.
Change-Id: If0aa01acb6cf5dd5eafa8226e3ea7f7a0c9da4f1
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
QString and QStringRef did bounds checking for left/right/mid, whereas
QStringView was asserting on out of bounds.
Relax the behavior for QStringView and do bounds checking on pos/n
as well. This removes a source of potentially hidden errors when porting
from QStringRef (or QString) to QStringView.
Unfortunately, one difference remains, where QByteArray::left/right()
behaves differently (and somewhat more sane) than QString and
QStringRef. We're keeping the difference here, as it has been around
for many years.
Mark left/right/mid as obsolete and to be replaced with the new
first/last/slice methods.
Change-Id: I18c203799ba78c928a4610a6038089f27696c22e
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
These methods are scheduled as a replacement for left/right/mid()
in Qt 6 with a consistent, narrow contract that does not allow
out of bounds indices, and therefore does permit faster
implementations.
Change-Id: Iabf22e8d4f3fef3c5e69a17f103e6cddebe420b1
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Previously it handled Latin-1, which made it incompatible with UTF-8,
which is now our preferred 8-bit encoding. For Qt6 it is limited to
ASCII. Adjusted tests to match. QLatin1String::compare() turned out
to be relying on qstrnicmp()'s Latin-1 handling.
Removed some spurious Q_UNLIKELY()s and tidied up code a little in the
process.
[ChangeLog][QtCore][Important Behavior Changes] Encoding-dependent
features of QByteArrray are now limited to ASCII, where previously
they worked for the whole of Latin-1. This affects case-insensitive
comparison, notably including qstricmp() and qstrnicmp(), and
case-transforming functions.
Fixes: QTBUG-84323
Change-Id: I2925d9908f8654599195a2860847b17083911b41
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
This class is designed as C++20-style generator / lazy sequence, and
the new return value of QString{,View}::tokenize().
It thus is more similar to a hand-coded loop around indexOf() than
QString::split(), which returns a container (the filling of which
allocates memory).
The template arguments of QStringTokenizer intricately depend on the
arguments with which it is constructed, so QStringTokenizer cannot be used
directly without C++17 CTAD. To work around this issue, add a factory
function, qTokenize().
LATER:
- ~Optimize QLatin1String needles (avoid repeated L1->UTF16 conversion)~
(out of scope for QStringTokenizer, should be solved in the respective
indexOf())
- Keep per-instantiation state:
* Boyer-Moore table
[ChangeLog][QtCore][QStringTokenizer] New class.
[ChangeLog][QtCore][qTokenize] New function.
Change-Id: I7a7a02e9175cdd3887778f29f2f91933329be759
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Make the API more symmetric with regards to both QString and QStringRef.
Change-Id: Ia67c53ba708f6c33874d1a127de8e2857ad9b5b8
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
Make the API more symmetric with regards to both QString and QStringRef.
Having this available helps making QStringView more of a drop-in
replacement for QStringRef. QStringRef is planned to get removed in Qt 6.
Change-Id: Ife036c0b55970078f42e1335442ff9ee5f4a2f0d
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
No surprises, as char16_t is transparently handled by QChar overloads.
Ok, one surprise: we seem to have QChar <> QByteArray relational
operators, but they don't work for char16_t. Probably members of
QChar, so LHS implicit conversions are disabled. Didn't investigate,
because it needs to be fixed at some point anyway, but that point is
not now.
Change-Id: I74e1c9bdd168e6480e18d7d86c1f13412e718a32
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
... to not fold QChar tests into QString ones.
This is needed for adding char16_t tests.
Change-Id: I2507d7d68a39ff96cf033eadde10e383dc976dda
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
In QByteArray, they were just not marked as such.
In QString and QStringRef, the implicit conversion from QChar to
QString would destroy it. Add a QChar overload, delegating to
QStringView.
Added docs for the new overloads, copying from the nearest neighbor so
as to not look out of place. All string classes use different wording
for these functions. A cleanup of this state of affairs is out of the
scope of this patch.
Change-Id: I0b7b1d037aa229bcaf29b793841a18caf977d66b
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
There were a few surprises:
- QByteArray::compare() are missing noexcept (will add)
- ibid., called with non-ascii content and CaseInsensitive fails
(this was discussed on the ML, with tentative agreement that
it's a feature, not a bug; waiting for QUtf8String(View) for a
fix, then).
- As was the case when we did this exercise with the relational
operators, QString(Ref)/QChar is not noexcept (will fix)
These have been QEXPECT_FAIL'ed.
Not much of the cartesian product is implemented at all, yet. These
have been #ifdef'ed with NOT_YET_IMPLEMENTED to see what's still
missing.
Change-Id: I7d9b21e292b98f980aacdc6248e88188f7472ba2
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>