QRegularExpression: add support for non-filepath wildcards/globbing

A glob pattern has different semantics depending on whether
it's used in "filepath mode" (FNM_PATHNAME) or not.
QRegularExpression only implemented the former, but the latter
is also useful, and possibly more "intuitive" for certain use
cases (e.g. offering users a simplified form of regexps in which
"*" must still match a "/").
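
To illustrate the difference between the two modes, here is a minimal
sketch. It is not Qt's implementation: it uses std::regex instead of
QRegularExpression, the names globToRegex and globMatches are
hypothetical, and it only translates "*" and "?" for the Unix separator
(bracket expressions and Windows separators are left out).

```cpp
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only (not part of Qt).
// Translates '*' and '?' and escapes every other regex metacharacter.
// Assumption: '/' is the only path separator; "[...]" is not supported.
std::string globToRegex(std::string_view glob, bool pathMode)
{
    // In filepath mode the wildcards must not cross a '/'.
    const std::string star = pathMode ? "[^/]*" : ".*";
    const std::string question = pathMode ? "[^/]" : ".";
    const std::string_view special = "\\^$.|+()[]{}";

    std::string rx;
    for (char c : glob) {
        if (c == '*') {
            rx += star;
        } else if (c == '?') {
            rx += question;
        } else {
            if (special.find(c) != std::string_view::npos)
                rx += '\\'; // match the character literally
            rx += c;
        }
    }
    return rx;
}

// std::regex_match only succeeds if the whole text matches,
// which mirrors a fully anchored pattern.
bool globMatches(std::string_view glob, const std::string &text, bool pathMode)
{
    return std::regex_match(text, std::regex(globToRegex(glob, pathMode)));
}
```

With this sketch, globMatches("foo/*/bar", "foo/fie/baz/bar", true) is
false because "*" stops at "/", while the same call with pathMode set to
false is true.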

Add this support. The problems highlighted by QTBUG-111234 have
not been addressed; I've only amended a bit of the documentation
relating to backslashes.

[ChangeLog][QtCore][QRegularExpression] Support for non-filepath
wildcards has been added to wildcardToRegularExpression().

Fixes: QTBUG-110901
Task-number: QTBUG-104585
Task-number: QTBUG-111234
Change-Id: If9850616267980fa843bda996fcb4552b5375938
Reviewed-by: Eike Ziller <eike.ziller@qt.io>
Reviewed-by: Samuel Gaist <samuel.gaist@idiap.ch>
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
Giuseppe D'Angelo 2023-02-16 21:20:50 +01:00
parent b3a60e49cd
commit 4b197c3f52
3 changed files with 134 additions and 75 deletions


@@ -1856,22 +1856,31 @@ QString QRegularExpression::escape(QStringView str)
     \value UnanchoredWildcardConversion
         The conversion will not anchor the pattern. This allows for partial string matches of
         wildcard expressions.
+
+    \value NonPathWildcardConversion
+        The conversion will \e{not} interpret the pattern as filepath globbing.
+        This enum value has been introduced in Qt 6.6.
+
+    \sa QRegularExpression::wildcardToRegularExpression
 */
 
 /*!
     \since 5.15
 
     Returns a regular expression representation of the given glob \a pattern.
-    The transformation is targeting file path globbing, which means in particular
-    that path separators receive special treatment. This implies that it is not
-    just a basic translation from "*" to ".*".
+
+    There are two transformations possible, one that targets file path
+    globbing, and another one which is more generic.
+
+    By default, the transformation is targeting file path globbing,
+    which means in particular that path separators receive special
+    treatment. This implies that it is not just a basic translation
+    from "*" to ".*" and similar.
 
     \snippet code/src_corelib_text_qregularexpression.cpp 31
 
-    By default, the returned regular expression is fully anchored. In other
-    words, there is no need of calling anchoredPattern() again on the
-    result. To get a regular expression that is not anchored, pass
-    UnanchoredWildcardConversion as the conversion \a options.
+    The more generic globbing transformation is available by passing
+    \c NonPathWildcardConversion in the conversion \a options.
 
     This implementation follows closely the definition
     of wildcard for glob patterns:
@@ -1880,10 +1889,12 @@ QString QRegularExpression::escape(QStringView str)
          \li Any character represents itself apart from those mentioned
          below. Thus \b{c} matches the character \e c.
     \row \li \b{?}
-         \li Matches any single character. It is the same as
-         \b{.} in full regexps.
+         \li Matches any single character, except for a path separator
+         (in case file path globbing has been selected). It is the
+         same as b{.} in full regexps.
     \row \li \b{*}
-         \li Matches zero or more of any characters. It is the
+         \li Matches zero or more of any characters, except for path
+         separators (in case file path globbing has been selected). It is the
          same as \b{.*} in full regexps.
     \row \li \b{[abc]}
          \li Matches one character given in the bracket.
@@ -1897,9 +1908,10 @@ QString QRegularExpression::escape(QStringView str)
          bracket. It is the same as \b{[^a-c]} in full regexp.
     \endtable
 
-    \note The backslash (\\) character is \e not an escape char in this context.
-    In order to match one of the special characters, place it in square brackets
-    (for example, \c{[?]}).
+    \note For historical reasons, a backslash (\\) character is \e not
+    an escape char in this context. In order to match one of the
+    special characters, place it in square brackets (for example,
+    \c{[?]}).
 
     More information about the implementation can be found in:
     \list
@@ -1907,6 +1919,11 @@ QString QRegularExpression::escape(QStringView str)
     \li \c {man 7 glob}
     \endlist
 
+    By default, the returned regular expression is fully anchored. In other
+    words, there is no need of calling anchoredPattern() again on the
+    result. To get a regular expression that is not anchored, pass
+    UnanchoredWildcardConversion in the conversion \a options.
+
     \sa escape()
 */
 QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, WildcardConversionOptions options)
@@ -1917,29 +1934,49 @@ QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, Wil
     qsizetype i = 0;
     const QChar *wc = pattern.data();
 
+    struct GlobSettings {
+        char16_t nativePathSeparator;
+        QStringView starEscape;
+        QStringView questionMarkEscape;
+    };
+
+    const GlobSettings settings = [options]() {
+        if (options.testFlag(NonPathWildcardConversion)) {
+            return GlobSettings{ u'\0', u".*", u"." };
+        } else {
 #ifdef Q_OS_WIN
-    const char16_t nativePathSeparator = u'\\';
-    const auto starEscape = "[^/\\\\]*"_L1;
-    const auto questionMarkEscape = "[^/\\\\]"_L1;
+            return GlobSettings{ u'\\', u"[^/\\\\]*", u"[^/\\\\]" };
 #else
-    const char16_t nativePathSeparator = u'/';
-    const auto starEscape = "[^/]*"_L1;
-    const auto questionMarkEscape = "[^/]"_L1;
+            return GlobSettings{ u'/', u"[^/]*", u"[^/]" };
 #endif
+        }
+    }();
 
     while (i < wclen) {
         const QChar c = wc[i++];
         switch (c.unicode()) {
         case '*':
-            rx += starEscape;
+            rx += settings.starEscape;
             break;
         case '?':
-            rx += questionMarkEscape;
+            rx += settings.questionMarkEscape;
             break;
+        // When not using filepath globbing: \ is escaped, / is itself
+        // When using filepath globbing:
+        // * Unix: \ gets escaped. / is itself
+        // * Windows: \ and / can match each other -- they become [/\\] in regexp
         case '\\':
 #ifdef Q_OS_WIN
+            if (options.testFlag(NonPathWildcardConversion))
+                rx += u"\\\\";
+            else
+                rx += u"[/\\\\]";
+            break;
         case '/':
-            rx += "[/\\\\]"_L1;
+            if (options.testFlag(NonPathWildcardConversion))
+                rx += u'/';
+            else
+                rx += u"[/\\\\]";
             break;
 #endif
         case '$':
@@ -1967,11 +2004,13 @@ QString QRegularExpression::wildcardToRegularExpression(QStringView pattern, Wil
                     rx += wc[i++];
 
                 while (i < wclen && wc[i] != u']') {
-                    // The '/' appearing in a character class invalidates the
-                    // regular expression parsing. It also concerns '\\' on
-                    // Windows OS types.
-                    if (wc[i] == u'/' || wc[i] == nativePathSeparator)
-                        return rx;
+                    if (!options.testFlag(NonPathWildcardConversion)) {
+                        // The '/' appearing in a character class invalidates the
+                        // regular expression parsing. It also concerns '\\' on
+                        // Windows OS types.
+                        if (wc[i] == u'/' || wc[i] == settings.nativePathSeparator)
+                            return rx;
+                    }
                     if (wc[i] == u'\\')
                         rx += u'\\';
                     rx += wc[i++];


@@ -131,7 +131,8 @@ public:
 
     enum WildcardConversionOption {
         DefaultWildcardConversion = 0x0,
-        UnanchoredWildcardConversion = 0x1
+        UnanchoredWildcardConversion = 0x1,
+        NonPathWildcardConversion = 0x2,
     };
     Q_DECLARE_FLAGS(WildcardConversionOptions, WildcardConversionOption)
 


@@ -2446,54 +2446,68 @@ void tst_QRegularExpression::wildcard_data()
 {
     QTest::addColumn<QString>("pattern");
     QTest::addColumn<QString>("string");
-    QTest::addColumn<qsizetype>("foundIndex");
+    QTest::addColumn<bool>("matchesPathGlob");
+    QTest::addColumn<bool>("matchesNonPathGlob");
 
-    auto addRow = [](const char *pattern, const char *string, qsizetype foundIndex) {
-        QTest::addRow("%s@%s", pattern, string) << pattern << string << foundIndex;
+    auto addRow = [](const char *pattern, const char *string, bool matchesPathGlob, bool matchesNonPathGlob) {
+        QTest::addRow("%s@%s", pattern, string) << pattern << string << matchesPathGlob << matchesNonPathGlob;
     };
 
-    addRow("*.html", "test.html", 0);
-    addRow("*.html", "test.htm", -1);
-    addRow("*bar*", "foobarbaz", 0);
-    addRow("*", "Qt Rocks!", 0);
-    addRow("*.h", "test.cpp", -1);
-    addRow("*.???l", "test.html", 0);
-    addRow("*?", "test.html", 0);
-    addRow("*?ml", "test.html", 0);
-    addRow("*[*]", "test.html", -1);
-    addRow("*[?]","test.html", -1);
-    addRow("*[?]ml","test.h?ml", 0);
-    addRow("*[[]ml","test.h[ml", 0);
-    addRow("*[]]ml","test.h]ml", 0);
-    addRow("*.h[a-z]ml", "test.html", 0);
-    addRow("*.h[A-Z]ml", "test.html", -1);
-    addRow("*.h[A-Z]ml", "test.hTml", 0);
-    addRow("*.h[!A-Z]ml", "test.hTml", -1);
-    addRow("*.h[!A-Z]ml", "test.html", 0);
-    addRow("*.h[!T]ml", "test.hTml", -1);
-    addRow("*.h[!T]ml", "test.html", 0);
-    addRow("*.h[!T]m[!L]", "test.htmL", -1);
-    addRow("*.h[!T]m[!L]", "test.html", 0);
-    addRow("*.h[][!]ml", "test.h]ml", 0);
-    addRow("*.h[][!]ml", "test.h[ml", 0);
-    addRow("*.h[][!]ml", "test.h!ml", 0);
+    addRow("*.html", "test.html", true, true);
+    addRow("*.html", "test.htm", false, false);
+    addRow("*bar*", "foobarbaz", true, true);
+    addRow("*", "Qt Rocks!", true, true);
+    addRow("*.h", "test.cpp", false, false);
+    addRow("*.???l", "test.html", true, true);
+    addRow("*?", "test.html", true, true);
+    addRow("*?ml", "test.html", true, true);
+    addRow("*[*]", "test.html", false, false);
+    addRow("*[?]","test.html", false, false);
+    addRow("*[?]ml","test.h?ml", true, true);
+    addRow("*[[]ml","test.h[ml", true, true);
+    addRow("*[]]ml","test.h]ml", true, true);
+    addRow("*.h[a-z]ml", "test.html", true, true);
+    addRow("*.h[A-Z]ml", "test.html", false, false);
+    addRow("*.h[A-Z]ml", "test.hTml", true, true);
+    addRow("*.h[!A-Z]ml", "test.hTml", false, false);
+    addRow("*.h[!A-Z]ml", "test.html", true, true);
+    addRow("*.h[!T]ml", "test.hTml", false, false);
+    addRow("*.h[!T]ml", "test.html", true, true);
+    addRow("*.h[!T]m[!L]", "test.htmL", false, false);
+    addRow("*.h[!T]m[!L]", "test.html", true, true);
+    addRow("*.h[][!]ml", "test.h]ml", true, true);
+    addRow("*.h[][!]ml", "test.h[ml", true, true);
+    addRow("*.h[][!]ml", "test.h!ml", true, true);
 
-    addRow("foo/*/bar", "foo/baz/bar", 0);
-    addRow("foo/(*)/bar", "foo/baz/bar", -1);
-    addRow("foo/(*)/bar", "foo/(baz)/bar", 0);
-    addRow("foo/?/bar", "foo/Q/bar", 0);
-    addRow("foo/?/bar", "foo/Qt/bar", -1);
-    addRow("foo/(?)/bar", "foo/Q/bar", -1);
-    addRow("foo/(?)/bar", "foo/(Q)/bar", 0);
+    addRow("foo/*/bar", "foo/baz/bar", true, true);
+    addRow("foo/*/bar", "foo/fie/baz/bar", false, true);
+    addRow("foo?bar", "foo/bar", false, true);
+    addRow("foo/(*)/bar", "foo/baz/bar", false, false);
+    addRow("foo/(*)/bar", "foo/(baz)/bar", true, true);
+    addRow("foo/?/bar", "foo/Q/bar", true, true);
+    addRow("foo/?/bar", "foo/Qt/bar", false, false);
+    addRow("foo/(?)/bar", "foo/Q/bar", false, false);
+    addRow("foo/(?)/bar", "foo/(Q)/bar", true, true);
+
+    addRow("foo*bar", "foo/fie/baz/bar", false, true);
+    addRow("fie*bar", "foo/fie/baz/bar", false, false); // regexp is anchored
 
 #ifdef Q_OS_WIN
-    addRow("foo\\*\\bar", "foo\\baz\\bar", 0);
-    addRow("foo\\(*)\\bar", "foo\\baz\\bar", -1);
-    addRow("foo\\(*)\\bar", "foo\\(baz)\\bar", 0);
-    addRow("foo\\?\\bar", "foo\\Q\\bar", 0);
-    addRow("foo\\?\\bar", "foo\\Qt\\bar", -1);
-    addRow("foo\\(?)\\bar", "foo\\Q\\bar", -1);
-    addRow("foo\\(?)\\bar", "foo\\(Q)\\bar", 0);
+    addRow("foo\\*\\bar", "foo\\baz\\bar", true, true);
+    addRow("foo\\*\\bar", "foo/baz/bar", true, false);
+    addRow("foo\\*\\bar", "foo/baz\\bar", true, false);
+    addRow("foo\\*\\bar", "foo\\fie\\baz\\bar", false, true);
+    addRow("foo\\*\\bar", "foo/fie/baz/bar", false, false);
+    addRow("foo/*/bar", "foo\\baz\\bar", true, false);
+    addRow("foo/*/bar", "foo/baz/bar", true, true);
+    addRow("foo/*/bar", "foo\\fie\\baz\\bar", false, false);
+    addRow("foo/*/bar", "foo/fie/baz/bar", false, true);
+    addRow("foo\\(*)\\bar", "foo\\baz\\bar", false, false);
+    addRow("foo\\(*)\\bar", "foo\\(baz)\\bar", true, true);
+    addRow("foo\\?\\bar", "foo\\Q\\bar", true, true);
+    addRow("foo\\?\\bar", "foo\\Qt\\bar", false, false);
+    addRow("foo\\(?)\\bar", "foo\\Q\\bar", false, false);
+    addRow("foo\\(?)\\bar", "foo\\(Q)\\bar", true, true);
 #endif
 }
 
@@ -2501,12 +2515,17 @@ void tst_QRegularExpression::wildcard()
 {
     QFETCH(QString, pattern);
     QFETCH(QString, string);
-    QFETCH(qsizetype, foundIndex);
+    QFETCH(bool, matchesPathGlob);
+    QFETCH(bool, matchesNonPathGlob);
 
-    QRegularExpression re(QRegularExpression::wildcardToRegularExpression(pattern));
-    QRegularExpressionMatch match = re.match(string);
-
-    QCOMPARE(match.capturedStart(), foundIndex);
+    {
+        QRegularExpression re(QRegularExpression::wildcardToRegularExpression(pattern));
+        QCOMPARE(string.contains(re), matchesPathGlob);
+    }
+    {
+        QRegularExpression re(QRegularExpression::wildcardToRegularExpression(pattern, QRegularExpression::NonPathWildcardConversion));
+        QCOMPARE(string.contains(re), matchesNonPathGlob);
+    }
 }
 
 void tst_QRegularExpression::testInvalidWildcard_data()