QRegularExpression: add support for non-filepath wildcards/globbing
A glob pattern has different semantics depending on whether it's used in "filepath mode" (FNM_PATHNAME) or not. QRegularExpression only implemented the former, but the latter is also useful, and possibly more "intuitive" for certain use cases (e.g. offering users a simplified version of regexps that however still need "*" to match a "/"). Add this support. The problems highlighted by QTBUG-111234 have not been addressed, I've just amended a bit of documentation relating backslashes. [ChangeLog][QtCore][QRegularExpression] Support for non-filepath wildcards has been added to wildcardToRegularExpression(). Fixes: QTBUG-110901 Task-number: QTBUG-104585 Task-number: QTBUG-111234 Change-Id: If9850616267980fa843bda996fcb4552b5375938 Reviewed-by: Eike Ziller <eike.ziller@qt.io> Reviewed-by: Samuel Gaist <samuel.gaist@idiap.ch> Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
This commit is contained in:
parent
b3a60e49cd
commit
4b197c3f52
@ -1856,22 +1856,31 @@ QString QRegularExpression::escape(QStringView str)
|
||||
\value UnanchoredWildcardConversion
|
||||
The conversion will not anchor the pattern. This allows for partial string matches of
|
||||
wildcard expressions.
|
||||
|
||||
\value NonPathWildcardConversion
|
||||
The conversion will \e{not} interpret the pattern as filepath globbing.
|
||||
This enum value has been introduced in Qt 6.6.
|
||||
|
||||
\sa QRegularExpression::wildcardToRegularExpression
|
||||
*/
|
||||
|
||||
/*!
|
||||
\since 5.15
|
||||
|
||||
Returns a regular expression representation of the given glob \a pattern.
|
||||
The transformation is targeting file path globbing, which means in particular
|
||||
that path separators receive special treatment. This implies that it is not
|
||||
just a basic translation from "*" to ".*".
|
||||
|
||||
There are two transformations possible, one that targets file path
|
||||
globbing, and another one which is more generic.
|
||||
|
||||
By default, the transformation is targeting file path globbing,
|
||||
which means in particular that path separators receive special
|
||||
treatment. This implies that it is not just a basic translation
|
||||
from "*" to ".*" and similar.
|
||||
|
||||
\snippet code/src_corelib_text_qregularexpression.cpp 31
|
||||
|
||||
By default, the returned regular expression is fully anchored. In other
|
||||
words, there is no need of calling anchoredPattern() again on the
|
||||
result. To get a regular expression that is not anchored, pass
|
||||
UnanchoredWildcardConversion as the conversion \a options.
|
||||
The more generic globbing transformation is available by passing
|
||||
\c NonPathWildcardConversion in the conversion \a options.
|
||||
|
||||
This implementation follows closely the definition
|
||||
of wildcard for glob patterns:
|
||||
@ -1880,10 +1889,12 @@ QString QRegularExpression::escape(QStringView str)
|
||||
\li Any character represents itself apart from those mentioned
|
||||
below. Thus \b{c} matches the character \e c.
|
||||
\row \li \b{?}
|
||||
\li Matches any single character. It is the same as
|
||||
\b{.} in full regexps.
|
||||
\li Matches any single character, except for a path separator
|
||||
(in case file path globbing has been selected). It is the
|
||||
same as b{.} in full regexps.
|
||||
\row \li \b{*}
|
||||
\li Matches zero or more of any characters. It is the
|
||||
\li Matches zero or more of any characters, except for path
|
||||
separators (in case file path globbing has been selected). It is the
|
||||
same as \b{.*} in full regexps.
|
||||
\row \li \b{[abc]}
|
||||
\li Matches one character given in the bracket.
|
||||
@ -1897,9 +1908,10 @@ QString QRegularExpression::escape(QStringView str)
|
||||
bracket. It is the same as \b{[^a-c]} in full regexp.
|
||||
\endtable
|
||||
|
||||
\note The backslash (\\) character is \e not an escape char in this context.
|
||||
In order to match one of the special characters, place it in square brackets
|
||||
(for example, \c{[?]}).
|
||||
\note For historical reasons, a backslash (\\) character is \e not
|
||||
an escape char in this context. In order to match one of the
|
||||
special characters, place it in square brackets (for example,
|
||||
\c{[?]}).
|
||||
|
||||
More information about the implementation can be found in:
|
||||
\list
|
||||
@ -1907,6 +1919,11 @@ QString QRegularExpression::escape(QStringView str)
|
||||
\li \c {man 7 glob}
|
||||
\endlist
|
||||
|
||||
By default, the returned regular expression is fully anchored. In other
|
||||
words, there is no need of calling anchoredPattern() again on the
|
||||
result. To get a regular expression that is not anchored, pass
|
||||
UnanchoredWildcardConversion in the conversion \a options.
|
||||
|
||||
\sa escape()
|
||||
*/
|
||||
QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, WildcardConversionOptions options)
|
||||
@ -1917,29 +1934,49 @@ QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, Wil
|
||||
qsizetype i = 0;
|
||||
const QChar *wc = pattern.data();
|
||||
|
||||
struct GlobSettings {
|
||||
char16_t nativePathSeparator;
|
||||
QStringView starEscape;
|
||||
QStringView questionMarkEscape;
|
||||
};
|
||||
|
||||
const GlobSettings settings = [options]() {
|
||||
if (options.testFlag(NonPathWildcardConversion)) {
|
||||
return GlobSettings{ u'\0', u".*", u"." };
|
||||
} else {
|
||||
#ifdef Q_OS_WIN
|
||||
const char16_t nativePathSeparator = u'\\';
|
||||
const auto starEscape = "[^/\\\\]*"_L1;
|
||||
const auto questionMarkEscape = "[^/\\\\]"_L1;
|
||||
return GlobSettings{ u'\\', u"[^/\\\\]*", u"[^/\\\\]" };
|
||||
#else
|
||||
const char16_t nativePathSeparator = u'/';
|
||||
const auto starEscape = "[^/]*"_L1;
|
||||
const auto questionMarkEscape = "[^/]"_L1;
|
||||
return GlobSettings{ u'/', u"[^/]*", u"[^/]" };
|
||||
#endif
|
||||
}
|
||||
}();
|
||||
|
||||
while (i < wclen) {
|
||||
const QChar c = wc[i++];
|
||||
switch (c.unicode()) {
|
||||
case '*':
|
||||
rx += starEscape;
|
||||
rx += settings.starEscape;
|
||||
break;
|
||||
case '?':
|
||||
rx += questionMarkEscape;
|
||||
rx += settings.questionMarkEscape;
|
||||
break;
|
||||
// When not using filepath globbing: \ is escaped, / is itself
|
||||
// When using filepath globbing:
|
||||
// * Unix: \ gets escaped. / is itself
|
||||
// * Windows: \ and / can match each other -- they become [/\\] in regexp
|
||||
case '\\':
|
||||
#ifdef Q_OS_WIN
|
||||
if (options.testFlag(NonPathWildcardConversion))
|
||||
rx += u"\\\\";
|
||||
else
|
||||
rx += u"[/\\\\]";
|
||||
break;
|
||||
case '/':
|
||||
rx += "[/\\\\]"_L1;
|
||||
if (options.testFlag(NonPathWildcardConversion))
|
||||
rx += u'/';
|
||||
else
|
||||
rx += u"[/\\\\]";
|
||||
break;
|
||||
#endif
|
||||
case '$':
|
||||
@ -1967,11 +2004,13 @@ QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, Wil
|
||||
rx += wc[i++];
|
||||
|
||||
while (i < wclen && wc[i] != u']') {
|
||||
if (!options.testFlag(NonPathWildcardConversion)) {
|
||||
// The '/' appearing in a character class invalidates the
|
||||
// regular expression parsing. It also concerns '\\' on
|
||||
// Windows OS types.
|
||||
if (wc[i] == u'/' || wc[i] == nativePathSeparator)
|
||||
if (wc[i] == u'/' || wc[i] == settings.nativePathSeparator)
|
||||
return rx;
|
||||
}
|
||||
if (wc[i] == u'\\')
|
||||
rx += u'\\';
|
||||
rx += wc[i++];
|
||||
|
@ -131,7 +131,8 @@ public:
|
||||
|
||||
enum WildcardConversionOption {
|
||||
DefaultWildcardConversion = 0x0,
|
||||
UnanchoredWildcardConversion = 0x1
|
||||
UnanchoredWildcardConversion = 0x1,
|
||||
NonPathWildcardConversion = 0x2,
|
||||
};
|
||||
Q_DECLARE_FLAGS(WildcardConversionOptions, WildcardConversionOption)
|
||||
|
||||
|
@ -2446,54 +2446,68 @@ void tst_QRegularExpression::wildcard_data()
|
||||
{
|
||||
QTest::addColumn<QString>("pattern");
|
||||
QTest::addColumn<QString>("string");
|
||||
QTest::addColumn<qsizetype>("foundIndex");
|
||||
QTest::addColumn<bool>("matchesPathGlob");
|
||||
QTest::addColumn<bool>("matchesNonPathGlob");
|
||||
|
||||
auto addRow = [](const char *pattern, const char *string, qsizetype foundIndex) {
|
||||
QTest::addRow("%s@%s", pattern, string) << pattern << string << foundIndex;
|
||||
auto addRow = [](const char *pattern, const char *string, bool matchesPathGlob, bool matchesNonPathGlob) {
|
||||
QTest::addRow("%s@%s", pattern, string) << pattern << string << matchesPathGlob << matchesNonPathGlob;
|
||||
};
|
||||
|
||||
addRow("*.html", "test.html", 0);
|
||||
addRow("*.html", "test.htm", -1);
|
||||
addRow("*bar*", "foobarbaz", 0);
|
||||
addRow("*", "Qt Rocks!", 0);
|
||||
addRow("*.h", "test.cpp", -1);
|
||||
addRow("*.???l", "test.html", 0);
|
||||
addRow("*?", "test.html", 0);
|
||||
addRow("*?ml", "test.html", 0);
|
||||
addRow("*[*]", "test.html", -1);
|
||||
addRow("*[?]","test.html", -1);
|
||||
addRow("*[?]ml","test.h?ml", 0);
|
||||
addRow("*[[]ml","test.h[ml", 0);
|
||||
addRow("*[]]ml","test.h]ml", 0);
|
||||
addRow("*.h[a-z]ml", "test.html", 0);
|
||||
addRow("*.h[A-Z]ml", "test.html", -1);
|
||||
addRow("*.h[A-Z]ml", "test.hTml", 0);
|
||||
addRow("*.h[!A-Z]ml", "test.hTml", -1);
|
||||
addRow("*.h[!A-Z]ml", "test.html", 0);
|
||||
addRow("*.h[!T]ml", "test.hTml", -1);
|
||||
addRow("*.h[!T]ml", "test.html", 0);
|
||||
addRow("*.h[!T]m[!L]", "test.htmL", -1);
|
||||
addRow("*.h[!T]m[!L]", "test.html", 0);
|
||||
addRow("*.h[][!]ml", "test.h]ml", 0);
|
||||
addRow("*.h[][!]ml", "test.h[ml", 0);
|
||||
addRow("*.h[][!]ml", "test.h!ml", 0);
|
||||
addRow("*.html", "test.html", true, true);
|
||||
addRow("*.html", "test.htm", false, false);
|
||||
addRow("*bar*", "foobarbaz", true, true);
|
||||
addRow("*", "Qt Rocks!", true, true);
|
||||
addRow("*.h", "test.cpp", false, false);
|
||||
addRow("*.???l", "test.html", true, true);
|
||||
addRow("*?", "test.html", true, true);
|
||||
addRow("*?ml", "test.html", true, true);
|
||||
addRow("*[*]", "test.html", false, false);
|
||||
addRow("*[?]","test.html", false, false);
|
||||
addRow("*[?]ml","test.h?ml", true, true);
|
||||
addRow("*[[]ml","test.h[ml", true, true);
|
||||
addRow("*[]]ml","test.h]ml", true, true);
|
||||
addRow("*.h[a-z]ml", "test.html", true, true);
|
||||
addRow("*.h[A-Z]ml", "test.html", false, false);
|
||||
addRow("*.h[A-Z]ml", "test.hTml", true, true);
|
||||
addRow("*.h[!A-Z]ml", "test.hTml", false, false);
|
||||
addRow("*.h[!A-Z]ml", "test.html", true, true);
|
||||
addRow("*.h[!T]ml", "test.hTml", false, false);
|
||||
addRow("*.h[!T]ml", "test.html", true, true);
|
||||
addRow("*.h[!T]m[!L]", "test.htmL", false, false);
|
||||
addRow("*.h[!T]m[!L]", "test.html", true, true);
|
||||
addRow("*.h[][!]ml", "test.h]ml", true, true);
|
||||
addRow("*.h[][!]ml", "test.h[ml", true, true);
|
||||
addRow("*.h[][!]ml", "test.h!ml", true, true);
|
||||
|
||||
addRow("foo/*/bar", "foo/baz/bar", 0);
|
||||
addRow("foo/(*)/bar", "foo/baz/bar", -1);
|
||||
addRow("foo/(*)/bar", "foo/(baz)/bar", 0);
|
||||
addRow("foo/?/bar", "foo/Q/bar", 0);
|
||||
addRow("foo/?/bar", "foo/Qt/bar", -1);
|
||||
addRow("foo/(?)/bar", "foo/Q/bar", -1);
|
||||
addRow("foo/(?)/bar", "foo/(Q)/bar", 0);
|
||||
addRow("foo/*/bar", "foo/baz/bar", true, true);
|
||||
addRow("foo/*/bar", "foo/fie/baz/bar", false, true);
|
||||
addRow("foo?bar", "foo/bar", false, true);
|
||||
addRow("foo/(*)/bar", "foo/baz/bar", false, false);
|
||||
addRow("foo/(*)/bar", "foo/(baz)/bar", true, true);
|
||||
addRow("foo/?/bar", "foo/Q/bar", true, true);
|
||||
addRow("foo/?/bar", "foo/Qt/bar", false, false);
|
||||
addRow("foo/(?)/bar", "foo/Q/bar", false, false);
|
||||
addRow("foo/(?)/bar", "foo/(Q)/bar", true, true);
|
||||
|
||||
addRow("foo*bar", "foo/fie/baz/bar", false, true);
|
||||
addRow("fie*bar", "foo/fie/baz/bar", false, false); // regexp is anchored
|
||||
|
||||
#ifdef Q_OS_WIN
|
||||
addRow("foo\\*\\bar", "foo\\baz\\bar", 0);
|
||||
addRow("foo\\(*)\\bar", "foo\\baz\\bar", -1);
|
||||
addRow("foo\\(*)\\bar", "foo\\(baz)\\bar", 0);
|
||||
addRow("foo\\?\\bar", "foo\\Q\\bar", 0);
|
||||
addRow("foo\\?\\bar", "foo\\Qt\\bar", -1);
|
||||
addRow("foo\\(?)\\bar", "foo\\Q\\bar", -1);
|
||||
addRow("foo\\(?)\\bar", "foo\\(Q)\\bar", 0);
|
||||
addRow("foo\\*\\bar", "foo\\baz\\bar", true, true);
|
||||
addRow("foo\\*\\bar", "foo/baz/bar", true, false);
|
||||
addRow("foo\\*\\bar", "foo/baz\\bar", true, false);
|
||||
addRow("foo\\*\\bar", "foo\\fie\\baz\\bar", false, true);
|
||||
addRow("foo\\*\\bar", "foo/fie/baz/bar", false, false);
|
||||
addRow("foo/*/bar", "foo\\baz\\bar", true, false);
|
||||
addRow("foo/*/bar", "foo/baz/bar", true, true);
|
||||
addRow("foo/*/bar", "foo\\fie\\baz\\bar", false, false);
|
||||
addRow("foo/*/bar", "foo/fie/baz/bar", false, true);
|
||||
addRow("foo\\(*)\\bar", "foo\\baz\\bar", false, false);
|
||||
addRow("foo\\(*)\\bar", "foo\\(baz)\\bar", true, true);
|
||||
addRow("foo\\?\\bar", "foo\\Q\\bar", true, true);
|
||||
addRow("foo\\?\\bar", "foo\\Qt\\bar", false, false);
|
||||
addRow("foo\\(?)\\bar", "foo\\Q\\bar", false, false);
|
||||
addRow("foo\\(?)\\bar", "foo\\(Q)\\bar", true, true);
|
||||
#endif
|
||||
}
|
||||
|
||||
@ -2501,12 +2515,17 @@ void tst_QRegularExpression::wildcard()
|
||||
{
|
||||
QFETCH(QString, pattern);
|
||||
QFETCH(QString, string);
|
||||
QFETCH(qsizetype, foundIndex);
|
||||
QFETCH(bool, matchesPathGlob);
|
||||
QFETCH(bool, matchesNonPathGlob);
|
||||
|
||||
{
|
||||
QRegularExpression re(QRegularExpression::wildcardToRegularExpression(pattern));
|
||||
QRegularExpressionMatch match = re.match(string);
|
||||
|
||||
QCOMPARE(match.capturedStart(), foundIndex);
|
||||
QCOMPARE(string.contains(re), matchesPathGlob);
|
||||
}
|
||||
{
|
||||
QRegularExpression re(QRegularExpression::wildcardToRegularExpression(pattern, QRegularExpression::NonPathWildcardConversion));
|
||||
QCOMPARE(string.contains(re), matchesNonPathGlob);
|
||||
}
|
||||
}
|
||||
|
||||
void tst_QRegularExpression::testInvalidWildcard_data()
|
||||
|
Loading…
Reference in New Issue
Block a user