QRegularExpression: add support for non-filepath wildcards/globbing

A glob pattern has different semantics depending on whether it's used in "filepath mode" (FNM_PATHNAME) or not. QRegularExpression only implemented the former, but the latter is also useful, and possibly more "intuitive" for certain use cases (e.g. offering users a simplified version of regexps that however still need "*" to match a "/"). Add this support.

The problems highlighted by QTBUG-111234 have not been addressed, I've just amended a bit of documentation relating backslashes.

[ChangeLog][QtCore][QRegularExpression] Support for non-filepath wildcards has been added to wildcardToRegularExpression().

Fixes: QTBUG-110901
Task-number: QTBUG-104585
Task-number: QTBUG-111234
Change-Id: If9850616267980fa843bda996fcb4552b5375938
Reviewed-by: Eike Ziller <eike.ziller@qt.io>
Reviewed-by: Samuel Gaist <samuel.gaist@idiap.ch>
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
parent b3a60e49cd
commit 4b197c3f52
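
To make the difference between the two modes concrete, here is a minimal standalone sketch. It is not the Qt implementation: the helper names globToRegexSketch and globMatchesSketch are hypothetical, std::regex stands in for QRegularExpression, only "*" and "?" are handled (no bracket expressions), and "/" is assumed to be the only path separator.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper (not Qt code): translate a glob into an ECMAScript
// regex. Only '*' and '?' are treated as wildcards; bracket expressions
// are deliberately left out to keep the sketch short.
std::string globToRegexSketch(std::string_view glob, bool pathMode)
{
    // In path mode the wildcards refuse to cross a '/' separator.
    const std::string star = pathMode ? "[^/]*" : ".*";
    const std::string questionMark = pathMode ? "[^/]" : ".";
    const std::string_view metaChars = R"(\^$.|+()[]{})";

    std::string rx;
    for (const char c : glob) {
        if (c == '*') {
            rx += star;
        } else if (c == '?') {
            rx += questionMark;
        } else {
            if (metaChars.find(c) != std::string_view::npos)
                rx += '\\'; // escape regex metacharacters so they match literally
            rx += c;
        }
    }
    return rx;
}

// std::regex_match requires the whole string to match, which mirrors the
// fully anchored result that the documentation describes as the default.
bool globMatchesSketch(std::string_view glob, const std::string &text, bool pathMode)
{
    const std::regex re(globToRegexSketch(glob, pathMode));
    return std::regex_match(text, re);
}
```

With this sketch, globMatchesSketch("foo*bar", "foo/fie/baz/bar", true) is false while the same call with pathMode set to false is true, which lines up with the "foo*bar" row added to the test data in this commit.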
@@ -1856,22 +1856,31 @@ QString QRegularExpression::escape(QStringView str)
     \value UnanchoredWildcardConversion
         The conversion will not anchor the pattern. This allows for partial string matches of
         wildcard expressions.
+
+    \value NonPathWildcardConversion
+        The conversion will \e{not} interpret the pattern as filepath globbing.
+        This enum value has been introduced in Qt 6.6.
+
+    \sa QRegularExpression::wildcardToRegularExpression
 */
 
 /*!
     \since 5.15
 
     Returns a regular expression representation of the given glob \a pattern.
-    The transformation is targeting file path globbing, which means in particular
-    that path separators receive special treatment. This implies that it is not
-    just a basic translation from "*" to ".*".
+
+    There are two transformations possible, one that targets file path
+    globbing, and another one which is more generic.
+
+    By default, the transformation is targeting file path globbing,
+    which means in particular that path separators receive special
+    treatment. This implies that it is not just a basic translation
+    from "*" to ".*" and similar.
 
     \snippet code/src_corelib_text_qregularexpression.cpp 31
 
-    By default, the returned regular expression is fully anchored. In other
-    words, there is no need of calling anchoredPattern() again on the
-    result. To get a regular expression that is not anchored, pass
-    UnanchoredWildcardConversion as the conversion \a options.
+    The more generic globbing transformation is available by passing
+    \c NonPathWildcardConversion in the conversion \a options.
 
     This implementation follows closely the definition
     of wildcard for glob patterns:
@@ -1880,10 +1889,12 @@ QString QRegularExpression::escape(QStringView str)
          \li Any character represents itself apart from those mentioned
          below. Thus \b{c} matches the character \e c.
     \row \li \b{?}
-         \li Matches any single character. It is the same as
-         \b{.} in full regexps.
+         \li Matches any single character, except for a path separator
+         (in case file path globbing has been selected). It is the
+         same as b{.} in full regexps.
     \row \li \b{*}
-         \li Matches zero or more of any characters. It is the
+         \li Matches zero or more of any characters, except for path
+         separators (in case file path globbing has been selected). It is the
          same as \b{.*} in full regexps.
     \row \li \b{[abc]}
          \li Matches one character given in the bracket.
@@ -1897,9 +1908,10 @@ QString QRegularExpression::escape(QStringView str)
          bracket. It is the same as \b{[^a-c]} in full regexp.
     \endtable
 
-    \note The backslash (\\) character is \e not an escape char in this context.
-    In order to match one of the special characters, place it in square brackets
-    (for example, \c{[?]}).
+    \note For historical reasons, a backslash (\\) character is \e not
+    an escape char in this context. In order to match one of the
+    special characters, place it in square brackets (for example,
+    \c{[?]}).
 
     More information about the implementation can be found in:
     \list
@@ -1907,6 +1919,11 @@ QString QRegularExpression::escape(QStringView str)
     \li \c {man 7 glob}
     \endlist
 
+    By default, the returned regular expression is fully anchored. In other
+    words, there is no need of calling anchoredPattern() again on the
+    result. To get a regular expression that is not anchored, pass
+    UnanchoredWildcardConversion in the conversion \a options.
+
     \sa escape()
 */
 QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, WildcardConversionOptions options)
@@ -1917,29 +1934,49 @@ QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, Wil
     qsizetype i = 0;
     const QChar *wc = pattern.data();
 
+    struct GlobSettings {
+        char16_t nativePathSeparator;
+        QStringView starEscape;
+        QStringView questionMarkEscape;
+    };
+
+    const GlobSettings settings = [options]() {
+        if (options.testFlag(NonPathWildcardConversion)) {
+            return GlobSettings{ u'\0', u".*", u"." };
+        } else {
 #ifdef Q_OS_WIN
-    const char16_t nativePathSeparator = u'\\';
-    const auto starEscape = "[^/\\\\]*"_L1;
-    const auto questionMarkEscape = "[^/\\\\]"_L1;
+            return GlobSettings{ u'\\', u"[^/\\\\]*", u"[^/\\\\]" };
 #else
-    const char16_t nativePathSeparator = u'/';
-    const auto starEscape = "[^/]*"_L1;
-    const auto questionMarkEscape = "[^/]"_L1;
+            return GlobSettings{ u'/', u"[^/]*", u"[^/]" };
 #endif
+        }
+    }();
 
     while (i < wclen) {
         const QChar c = wc[i++];
         switch (c.unicode()) {
         case '*':
-            rx += starEscape;
+            rx += settings.starEscape;
             break;
         case '?':
-            rx += questionMarkEscape;
+            rx += settings.questionMarkEscape;
             break;
+        // When not using filepath globbing: \ is escaped, / is itself
+        // When using filepath globbing:
+        // * Unix: \ gets escaped. / is itself
+        // * Windows: \ and / can match each other -- they become [/\\] in regexp
         case '\\':
 #ifdef Q_OS_WIN
+            if (options.testFlag(NonPathWildcardConversion))
+                rx += u"\\\\";
+            else
+                rx += u"[/\\\\]";
+            break;
         case '/':
-            rx += "[/\\\\]"_L1;
+            if (options.testFlag(NonPathWildcardConversion))
+                rx += u'/';
+            else
+                rx += u"[/\\\\]";
             break;
 #endif
         case '$':
@@ -1967,11 +2004,13 @@ QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, Wil
                     rx += wc[i++];
 
                 while (i < wclen && wc[i] != u']') {
-                    // The '/' appearing in a character class invalidates the
-                    // regular expression parsing. It also concerns '\\' on
-                    // Windows OS types.
-                    if (wc[i] == u'/' || wc[i] == nativePathSeparator)
-                        return rx;
+                    if (!options.testFlag(NonPathWildcardConversion)) {
+                        // The '/' appearing in a character class invalidates the
+                        // regular expression parsing. It also concerns '\\' on
+                        // Windows OS types.
+                        if (wc[i] == u'/' || wc[i] == settings.nativePathSeparator)
+                            return rx;
+                    }
                     if (wc[i] == u'\\')
                         rx += u'\\';
                     rx += wc[i++];
@@ -131,7 +131,8 @@ public:
 
     enum WildcardConversionOption {
         DefaultWildcardConversion = 0x0,
-        UnanchoredWildcardConversion = 0x1
+        UnanchoredWildcardConversion = 0x1,
+        NonPathWildcardConversion = 0x2,
     };
     Q_DECLARE_FLAGS(WildcardConversionOptions, WildcardConversionOption)
 
@@ -2446,54 +2446,68 @@ void tst_QRegularExpression::wildcard_data()
 {
     QTest::addColumn<QString>("pattern");
     QTest::addColumn<QString>("string");
-    QTest::addColumn<qsizetype>("foundIndex");
+    QTest::addColumn<bool>("matchesPathGlob");
+    QTest::addColumn<bool>("matchesNonPathGlob");
 
-    auto addRow = [](const char *pattern, const char *string, qsizetype foundIndex) {
-        QTest::addRow("%s@%s", pattern, string) << pattern << string << foundIndex;
+    auto addRow = [](const char *pattern, const char *string, bool matchesPathGlob, bool matchesNonPathGlob) {
+        QTest::addRow("%s@%s", pattern, string) << pattern << string << matchesPathGlob << matchesNonPathGlob;
     };
 
-    addRow("*.html", "test.html", 0);
-    addRow("*.html", "test.htm", -1);
-    addRow("*bar*", "foobarbaz", 0);
-    addRow("*", "Qt Rocks!", 0);
-    addRow("*.h", "test.cpp", -1);
-    addRow("*.???l", "test.html", 0);
-    addRow("*?", "test.html", 0);
-    addRow("*?ml", "test.html", 0);
-    addRow("*[*]", "test.html", -1);
-    addRow("*[?]","test.html", -1);
-    addRow("*[?]ml","test.h?ml", 0);
-    addRow("*[[]ml","test.h[ml", 0);
-    addRow("*[]]ml","test.h]ml", 0);
-    addRow("*.h[a-z]ml", "test.html", 0);
-    addRow("*.h[A-Z]ml", "test.html", -1);
-    addRow("*.h[A-Z]ml", "test.hTml", 0);
-    addRow("*.h[!A-Z]ml", "test.hTml", -1);
-    addRow("*.h[!A-Z]ml", "test.html", 0);
-    addRow("*.h[!T]ml", "test.hTml", -1);
-    addRow("*.h[!T]ml", "test.html", 0);
-    addRow("*.h[!T]m[!L]", "test.htmL", -1);
-    addRow("*.h[!T]m[!L]", "test.html", 0);
-    addRow("*.h[][!]ml", "test.h]ml", 0);
-    addRow("*.h[][!]ml", "test.h[ml", 0);
-    addRow("*.h[][!]ml", "test.h!ml", 0);
+    addRow("*.html", "test.html", true, true);
+    addRow("*.html", "test.htm", false, false);
+    addRow("*bar*", "foobarbaz", true, true);
+    addRow("*", "Qt Rocks!", true, true);
+    addRow("*.h", "test.cpp", false, false);
+    addRow("*.???l", "test.html", true, true);
+    addRow("*?", "test.html", true, true);
+    addRow("*?ml", "test.html", true, true);
+    addRow("*[*]", "test.html", false, false);
+    addRow("*[?]","test.html", false, false);
+    addRow("*[?]ml","test.h?ml", true, true);
+    addRow("*[[]ml","test.h[ml", true, true);
+    addRow("*[]]ml","test.h]ml", true, true);
+    addRow("*.h[a-z]ml", "test.html", true, true);
+    addRow("*.h[A-Z]ml", "test.html", false, false);
+    addRow("*.h[A-Z]ml", "test.hTml", true, true);
+    addRow("*.h[!A-Z]ml", "test.hTml", false, false);
+    addRow("*.h[!A-Z]ml", "test.html", true, true);
+    addRow("*.h[!T]ml", "test.hTml", false, false);
+    addRow("*.h[!T]ml", "test.html", true, true);
+    addRow("*.h[!T]m[!L]", "test.htmL", false, false);
+    addRow("*.h[!T]m[!L]", "test.html", true, true);
+    addRow("*.h[][!]ml", "test.h]ml", true, true);
+    addRow("*.h[][!]ml", "test.h[ml", true, true);
+    addRow("*.h[][!]ml", "test.h!ml", true, true);
 
-    addRow("foo/*/bar", "foo/baz/bar", 0);
-    addRow("foo/(*)/bar", "foo/baz/bar", -1);
-    addRow("foo/(*)/bar", "foo/(baz)/bar", 0);
-    addRow("foo/?/bar", "foo/Q/bar", 0);
-    addRow("foo/?/bar", "foo/Qt/bar", -1);
-    addRow("foo/(?)/bar", "foo/Q/bar", -1);
-    addRow("foo/(?)/bar", "foo/(Q)/bar", 0);
+    addRow("foo/*/bar", "foo/baz/bar", true, true);
+    addRow("foo/*/bar", "foo/fie/baz/bar", false, true);
+    addRow("foo?bar", "foo/bar", false, true);
+    addRow("foo/(*)/bar", "foo/baz/bar", false, false);
+    addRow("foo/(*)/bar", "foo/(baz)/bar", true, true);
+    addRow("foo/?/bar", "foo/Q/bar", true, true);
+    addRow("foo/?/bar", "foo/Qt/bar", false, false);
+    addRow("foo/(?)/bar", "foo/Q/bar", false, false);
+    addRow("foo/(?)/bar", "foo/(Q)/bar", true, true);
+
+    addRow("foo*bar", "foo/fie/baz/bar", false, true);
+    addRow("fie*bar", "foo/fie/baz/bar", false, false); // regexp is anchored
 
 #ifdef Q_OS_WIN
-    addRow("foo\\*\\bar", "foo\\baz\\bar", 0);
-    addRow("foo\\(*)\\bar", "foo\\baz\\bar", -1);
-    addRow("foo\\(*)\\bar", "foo\\(baz)\\bar", 0);
-    addRow("foo\\?\\bar", "foo\\Q\\bar", 0);
-    addRow("foo\\?\\bar", "foo\\Qt\\bar", -1);
-    addRow("foo\\(?)\\bar", "foo\\Q\\bar", -1);
-    addRow("foo\\(?)\\bar", "foo\\(Q)\\bar", 0);
+    addRow("foo\\*\\bar", "foo\\baz\\bar", true, true);
+    addRow("foo\\*\\bar", "foo/baz/bar", true, false);
+    addRow("foo\\*\\bar", "foo/baz\\bar", true, false);
+    addRow("foo\\*\\bar", "foo\\fie\\baz\\bar", false, true);
+    addRow("foo\\*\\bar", "foo/fie/baz/bar", false, false);
+    addRow("foo/*/bar", "foo\\baz\\bar", true, false);
+    addRow("foo/*/bar", "foo/baz/bar", true, true);
+    addRow("foo/*/bar", "foo\\fie\\baz\\bar", false, false);
+    addRow("foo/*/bar", "foo/fie/baz/bar", false, true);
+    addRow("foo\\(*)\\bar", "foo\\baz\\bar", false, false);
+    addRow("foo\\(*)\\bar", "foo\\(baz)\\bar", true, true);
+    addRow("foo\\?\\bar", "foo\\Q\\bar", true, true);
+    addRow("foo\\?\\bar", "foo\\Qt\\bar", false, false);
+    addRow("foo\\(?)\\bar", "foo\\Q\\bar", false, false);
+    addRow("foo\\(?)\\bar", "foo\\(Q)\\bar", true, true);
 #endif
 }
 
@@ -2501,12 +2515,17 @@ void tst_QRegularExpression::wildcard()
 {
     QFETCH(QString, pattern);
     QFETCH(QString, string);
-    QFETCH(qsizetype, foundIndex);
+    QFETCH(bool, matchesPathGlob);
+    QFETCH(bool, matchesNonPathGlob);
 
-    QRegularExpression re(QRegularExpression::wildcardToRegularExpression(pattern));
-    QRegularExpressionMatch match = re.match(string);
-
-    QCOMPARE(match.capturedStart(), foundIndex);
+    {
+        QRegularExpression re(QRegularExpression::wildcardToRegularExpression(pattern));
+        QCOMPARE(string.contains(re), matchesPathGlob);
+    }
+    {
+        QRegularExpression re(QRegularExpression::wildcardToRegularExpression(pattern, QRegularExpression::NonPathWildcardConversion));
+        QCOMPARE(string.contains(re), matchesNonPathGlob);
+    }
 }
 
 void tst_QRegularExpression::testInvalidWildcard_data()