Move the QRegExp porting docs into the QRegExp class documentation

It used to live in QRegularExpression, but as QRegExp gets removed from
Qt Core, the better place for it is to live in the QRegExp docs.

Also marked QRegExp as deprecated in the docs.

Change-Id: Id5b0e3040e4d46f5d806022b58fbd5b5efd58911
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
Lars Knoll 2020-04-03 20:57:06 +02:00
parent b6145bfcc5
commit 1b65098a20
4 changed files with 142 additions and 138 deletions


@ -232,3 +232,12 @@ s2 = QRegExp::escape("f(x)"); // s2 == "f\\(x\\)"
QRegExp rx("(" + QRegExp::escape(name) +
"|" + QRegExp::escape(alias) + ")");
//! [20]
{
//! [21]
QString p("a .*|pattern");
// re matches exactly the pattern string p
QRegularExpression re(QRegularExpression::anchoredPattern(p));
//! [21]
}


@ -284,15 +284,6 @@ if (!invalidRe.isValid()) {
//! [23]
}
{
//! [24]
QString p("a .*|pattern");
// re matches exactly the pattern string p
QRegularExpression re(QRegularExpression::anchoredPattern(p));
//! [24]
}
{
//! [26]
QString escaped = QRegularExpression::escape("a(x) = f(x) + g(x)");


@ -76,6 +76,7 @@ QT_BEGIN_NAMESPACE
/*!
\class QRegExp
\inmodule QtCore
\obsolete Use QRegularExpression instead
\reentrant
\brief The QRegExp class provides pattern matching using regular expressions.
@ -84,6 +85,10 @@ QT_BEGIN_NAMESPACE
\keyword regular expression
This class is deprecated in Qt 6. Please use QRegularExpression instead
for all new code. For guidelines on porting old code from QRegExp to
QRegularExpression, see {Porting to QRegularExpression}
A regular expression, or "regexp", is a pattern for matching
substrings in a text. This is useful in many contexts, e.g.,
@ -688,6 +693,133 @@ QT_BEGIN_NAMESPACE
\sa QString, QStringList, QSortFilterProxyModel,
{tools/regexp}{Regular Expression Example}
\section1 Porting to QRegularExpression
The QRegularExpression class introduced in Qt 5 is a big improvement upon
QRegExp, in terms of APIs offered, supported pattern syntax and speed of
execution. The biggest difference is that QRegularExpression simply holds a
regular expression, and it's \e{not} modified when a match is requested.
Instead, a QRegularExpressionMatch object is returned, in order to check
the result of a match and extract the captured substring. The same applies
with global matching and QRegularExpressionMatchIterator.
Other differences are outlined below.
\section2 Different pattern syntax
Porting a regular expression from QRegExp to QRegularExpression may require
changes to the pattern itself.
In certain scenarios, QRegExp was too lenient and accepted patterns that
are simply invalid when using QRegularExpression. These are somehow easy
to detect, because the QRegularExpression objects built with these patterns
are not valid (cf. QRegularExpression::isValid()).
In other cases, a pattern ported from QRegExp to QRegularExpression may
silently change semantics. Therefore, it is necessary to review the
patterns used. The most notable cases of silent incompatibility are:
\list
\li Curly braces are needed in order to use a hexadecimal escape like
\c{\xHHHH} with more than 2 digits. A pattern like \c{\x2022} needs to
be ported to \c{\x{2022}}, or it will match a space (\c{0x20}) followed
by the string \c{"22"}. In general, it is highly recommended to always use
curly braces with the \c{\x} escape, no matter the amount of digits
specified.
\li A 0-to-n quantification like \c{{,n}} needs to be ported to \c{{0,n}} to
preserve semantics. Otherwise, a pattern such as \c{\d{,3}} would
actually match a digit followed by the exact string \c{"{,3}"}.
\li QRegExp by default does Unicode-aware matching, while
QRegularExpression requires a separate option; see below for more details.
\endlist
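The \c{{,n}} rewrite described in the list above is mechanical, so it can be scripted as a first pass before reviewing patterns by hand. The following is a minimal sketch using only the C++ standard library; \c{expandOpenLowerBound} is a hypothetical helper name, not a Qt function, and it deliberately ignores escaped braces and character classes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical porting helper (not part of Qt): rewrites QRegExp-style
// "{,n}" quantifiers to the "{0,n}" form.
// Naive on purpose: it does not skip escaped braces ("\{") or braces inside
// character classes, so the result still needs a manual review.
std::string expandOpenLowerBound(const std::string &pattern)
{
    std::string out;
    out.reserve(pattern.size() + 4);
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        out += pattern[i];
        if (pattern[i] != '{' || i + 1 >= pattern.size() || pattern[i + 1] != ',')
            continue;
        // Look ahead for "digits}" after the comma.
        std::size_t j = i + 2;
        while (j < pattern.size() && std::isdigit(static_cast<unsigned char>(pattern[j])))
            ++j;
        if (j > i + 2 && j < pattern.size() && pattern[j] == '}')
            out += '0'; // "{,3}" becomes "{0,3}"
    }
    return out;
}
```

Patterns that already carry an explicit lower bound, such as \c{a{2,3}}, pass through unchanged.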
\section2 Porting from QRegExp::exactMatch()
QRegExp::exactMatch() in Qt 4 served two purposes: it exactly matched
a regular expression against a subject string, and it implemented partial
matching.
\section3 Porting from QRegExp's Exact Matching
Exact matching indicates whether the regular expression matches the entire
subject string. For example, the classes yield on the subject string \c{"abc123"}:
\table
\header \li \li QRegExp::exactMatch() \li QRegularExpressionMatch::hasMatch()
\row \li \c{"\\d+"} \li \b false \li \b true
\row \li \c{"[a-z]+\\d+"} \li \b true \li \b true
\endtable
Exact matching is not reflected in QRegularExpression. If you want
to be sure that the subject string matches the regular expression
exactly, you can wrap the pattern using the QRegularExpression::anchoredPattern()
function:
\snippet code/src_corelib_tools_qregexp.cpp 21
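The pattern in the snippet above contains an alternation, which is why wrapping the whole pattern matters more than simply concatenating anchors. The sketch below illustrates the difference with \c{std::regex} as a stand-in, so that it compiles without Qt. It is an assumption-laden illustration: \c{std::regex} has no \c{\A} and \c{\z} anchors, so \c{^} and \c{$} are used instead, and the helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers, for illustration only (not Qt API).

// Naive anchoring: "^" binds to the first alternative and "$" to the last.
std::string naiveAnchored(const std::string &p) { return "^" + p + "$"; }

// Grouped anchoring: the non-capturing group makes both anchors apply
// to the pattern as a whole.
std::string groupedAnchored(const std::string &p) { return "^(?:" + p + ")$"; }

// True if the regex built from `pattern` matches anywhere in `subject`.
bool found(const std::string &pattern, const std::string &subject)
{
    return std::regex_search(subject, std::regex(pattern));
}
```

With the pattern \c{"a .*|pattern"}, the naive form still matches the subject \c{"xx pattern"}, because only the second alternative is anchored at the end. The grouped form rejects it and accepts only subjects that match one alternative in full.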
\section3 Porting from QRegExp's Partial Matching
When using QRegExp::exactMatch(), if an exact match was not found, one
could still find out how much of the subject string was matched by the
regular expression by calling QRegExp::matchedLength(). If the returned length
was equal to the subject string's length, then one could conclude that a partial
match was found.
QRegularExpression supports partial matching explicitly by means of the
appropriate MatchType.
\section2 Global matching
Due to limitations of the QRegExp API it was impossible to implement global
matching correctly (that is, like Perl does). In particular, patterns that
can match 0 characters (like \c{"a*"}) are problematic.
QRegularExpression::globalMatch() implements Perl global match correctly, and
the returned iterator can be used to examine each result.
\section2 Unicode properties support
When using QRegExp, character classes such as \c{\w}, \c{\d}, etc. match
characters with the corresponding Unicode property: for instance, \c{\d}
matches any character with the Unicode Nd (decimal digit) property.
Those character classes only match ASCII characters by default when using
QRegularExpression: for instance, \c{\d} matches exactly a character in the
\c{0-9} ASCII range. It is possible to change this behavior by using the
UseUnicodePropertiesOption pattern option.
\section2 Wildcard matching
There is no direct way to do wildcard matching in QRegularExpression.
However, the wildcardToRegularExpression method is provided to translate
glob patterns into a Perl-compatible regular expression that can be used
for that purpose.
\section2 Other pattern syntaxes
QRegularExpression supports only Perl-compatible regular expressions.
\section2 Minimal matching
QRegExp::setMinimal() implemented minimal matching by simply reversing the
greediness of the quantifiers (QRegExp did not support lazy quantifiers,
like \c{*?}, \c{+?}, etc.). QRegularExpression instead does support greedy,
lazy and possessive quantifiers. The InvertedGreedinessOption
pattern option can be useful to emulate the effects of QRegExp::setMinimal():
if enabled, it inverts the greediness of quantifiers (greedy ones become
lazy and vice versa).
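To make the effect of inverted greediness concrete, the sketch below compares a greedy and a lazy quantifier on the same subject. It uses \c{std::regex} as a stand-in so that it compiles without Qt; \c{firstMatch} is a hypothetical helper, and the example only shows what swapping \c{.*} for \c{.*?} does to the match, which is the change the option applies to every quantifier in a pattern.

```cpp
#include <regex>
#include <string>

// Returns the first substring of `subject` matched by `pattern`,
// or an empty string if there is no match.
std::string firstMatch(const std::string &pattern, const std::string &subject)
{
    std::smatch m;
    return std::regex_search(subject, m, std::regex(pattern)) ? m.str(0) : std::string();
}
```

On the subject \c{"<b>x</b><b>y</b>"}, the greedy pattern \c{<b>.*</b>} matches the whole string, while the lazy pattern \c{<b>.*?</b>} stops at the first closing tag and matches only \c{"<b>x</b>"}.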
\section2 Caret modes
The AnchorAtOffsetMatchOption match option can be used to emulate the
QRegExp::CaretAtOffset behavior. There is no equivalent for the other
QRegExp::CaretMode modes.
*/
#if defined(Q_OS_VXWORKS) && defined(EOS)


@ -431,132 +431,6 @@ QT_BEGIN_NAMESPACE
This may change in a future version of Qt.
\section1 Notes for QRegExp Users
The QRegularExpression class introduced in Qt 5 is a big improvement upon
QRegExp, in terms of APIs offered, supported pattern syntax and speed of
execution. The biggest difference is that QRegularExpression simply holds a
regular expression, and it's \e{not} modified when a match is requested.
Instead, a QRegularExpressionMatch object is returned, in order to check
the result of a match and extract the captured substring. The same applies
with global matching and QRegularExpressionMatchIterator.
Other differences are outlined below.
\section2 Different pattern syntax
Porting a regular expression from QRegExp to QRegularExpression may require
changes to the pattern itself.
In certain scenarios, QRegExp was too lenient and accepted patterns that
are simply invalid when using QRegularExpression. These are somehow easy
to detect, because the QRegularExpression objects built with these patterns
are not valid (cf. isValid()).
In other cases, a pattern ported from QRegExp to QRegularExpression may
silently change semantics. Therefore, it is necessary to review the
patterns used. The most notable cases of silent incompatibility are:
\list
\li Curly braces are needed in order to use a hexadecimal escape like
\c{\xHHHH} with more than 2 digits. A pattern like \c{\x2022} needs to
be ported to \c{\x{2022}}, or it will match a space (\c{0x20}) followed
by the string \c{"22"}. In general, it is highly recommended to always use
curly braces with the \c{\x} escape, no matter the amount of digits
specified.
\li A 0-to-n quantification like \c{{,n}} needs to be ported to \c{{0,n}} to
preserve semantics. Otherwise, a pattern such as \c{\d{,3}} would
actually match a digit followed by the exact string \c{"{,3}"}.
\li QRegExp by default does Unicode-aware matching, while
QRegularExpression requires a separate option; see below for more details.
\endlist
\section2 Porting from QRegExp::exactMatch()
QRegExp::exactMatch() in Qt 4 served two purposes: it exactly matched
a regular expression against a subject string, and it implemented partial
matching.
\section3 Porting from QRegExp's Exact Matching
Exact matching indicates whether the regular expression matches the entire
subject string. For example, the classes yield on the subject string \c{"abc123"}:
\table
\header \li \li QRegExp::exactMatch() \li QRegularExpressionMatch::hasMatch()
\row \li \c{"\\d+"} \li \b false \li \b true
\row \li \c{"[a-z]+\\d+"} \li \b true \li \b true
\endtable
Exact matching is not reflected in QRegularExpression. If you want
to be sure that the subject string matches the regular expression
exactly, you can wrap the pattern using the anchoredPattern()
function:
\snippet code/src_corelib_tools_qregularexpression.cpp 24
\section3 Porting from QRegExp's Partial Matching
When using QRegExp::exactMatch(), if an exact match was not found, one
could still find out how much of the subject string was matched by the
regular expression by calling QRegExp::matchedLength(). If the returned length
was equal to the subject string's length, then one could conclude that a partial
match was found.
QRegularExpression supports partial matching explicitly by means of the
appropriate MatchType.
\section2 Global matching
Due to limitations of the QRegExp API it was impossible to implement global
matching correctly (that is, like Perl does). In particular, patterns that
can match 0 characters (like \c{"a*"}) are problematic.
QRegularExpression::globalMatch() implements Perl global match correctly, and
the returned iterator can be used to examine each result.
\section2 Unicode properties support
When using QRegExp, character classes such as \c{\w}, \c{\d}, etc. match
characters with the corresponding Unicode property: for instance, \c{\d}
matches any character with the Unicode Nd (decimal digit) property.
Those character classes only match ASCII characters by default when using
QRegularExpression: for instance, \c{\d} matches exactly a character in the
\c{0-9} ASCII range. It is possible to change this behaviour by using the
UseUnicodePropertiesOption pattern option.
\section2 Wildcard matching
There is no direct way to do wildcard matching in QRegularExpression.
However, the wildcardToRegularExpression method is provided to translate
glob patterns into a Perl-compatible regular expression that can be used
for that purpose.
\section2 Other pattern syntaxes
QRegularExpression supports only Perl-compatible regular expressions.
\section2 Minimal matching
QRegExp::setMinimal() implemented minimal matching by simply reversing the
greediness of the quantifiers (QRegExp did not support lazy quantifiers,
like \c{*?}, \c{+?}, etc.). QRegularExpression instead does support greedy,
lazy and possessive quantifiers. The InvertedGreedinessOption
pattern option can be useful to emulate the effects of QRegExp::setMinimal():
if enabled, it inverts the greediness of quantifiers (greedy ones become
lazy and vice versa).
\section2 Caret modes
The AnchorAtOffsetMatchOption match option can be used to emulate the
QRegExp::CaretAtOffset behaviour. There is no equivalent for the other
QRegExp::CaretMode modes.
\section1 Debugging Code that Uses QRegularExpression
QRegularExpression internally uses a just in time compiler (JIT) to
@ -1936,7 +1810,7 @@ QString QRegularExpression::escape(QStringView str)
result. To get a regular expression that is not anchored, pass
UnanchoredWildcardConversion as the conversion \a option.
-\warning Unlike QRegExp, this implementation follows closely the definition
+This implementation follows closely the definition
of wildcard for glob patterns:
\table
\row \li \b{c}
@ -2066,8 +1940,6 @@ QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, Wil
Returns the \a expression wrapped between the \c{\A} and \c{\z} anchors to
be used for exact matching.
\sa {Porting from QRegExp's Exact Matching}
*/
QString QRegularExpression::anchoredPattern(QStringView expression)
{