QRegularExpression: allow users to skip the UTF-16 check of the subject string
PCRE does not handle invalid UTF-16 sequences. For this reason we always check a subject string's UTF-16 validity before attempting any match over it (actually, we let PCRE do that). The only exception so far has been global matching: once the first match was done, we skipped redoing the check over and over again on the same string (PCRE actually checks the /entire/ string, not only the part it uses for matching).

Still, users had no way to skip this check even if they were 100% sure the string was valid UTF-16. This commit introduces a way for them to skip it.

Change-Id: Iea352c06f531aa2153863b3a1681acaab7ac375c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
parent 4532669285
commit fd80cad07e
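The message depends on what PCRE treats as invalid UTF-16. The following is a rough, standalone illustration and not PCRE's actual validation routine, which reports several distinct error codes. It shows the central rule: surrogate code units must appear as high/low pairs. The function name `isValidUtf16` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-16, i.e. every
// high surrogate (0xD800..0xDBFF) is immediately followed by a low surrogate
// (0xDC00..0xDFFF), and no low surrogate appears on its own.
bool isValidUtf16(const std::u16string &s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {        // high surrogate
            if (i + 1 == s.size())
                return false;                    // pair truncated at the end
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;                    // high not followed by low
            ++i;                                 // consume the low half too
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return false;                        // isolated low surrogate
        }
    }
    return true;
}
```

A check like this walks the whole subject. Repeating it before every match of a global match over the same string therefore redoes the same linear scan each time, and that repeated cost is what the commit lets callers opt out of.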
@@ -763,6 +763,13 @@ QT_BEGIN_NAMESPACE
         The match is constrained to start exactly at the offset passed to
         match() in order to be successful, even if the pattern string does not
         contain any metacharacter that anchors the match at that point.
+
+    \value DontCheckSubjectStringMatchOption
+        The subject string is not checked for UTF-16 validity before
+        attempting a match. Use this option with extreme caution, as
+        attempting to match an invalid string may crash the program and/or
+        constitute a security issue. This enum value has been introduced in
+        Qt 5.4.
 */
 
 // after how many usages we optimize the regexp
@@ -1221,7 +1228,8 @@ static int pcre16SafeExec(const pcre16 *code, const pcre16_extra *extra,
     options \a matchOptions and returns the QRegularExpressionMatchPrivate of
     the result. It also advances a match if a previous result is given as \a
     previous. The \a subject string goes a Unicode validity check if
-    \a checkSubjectString is CheckSubjectString (PCRE doesn't like illegal
+    \a checkSubjectString is CheckSubjectString and the match options don't
+    include DontCheckSubjectStringMatchOption (PCRE doesn't like illegal
     UTF-16 sequences).
 
     Advancing a match is a tricky algorithm. If the previous match matched a
@@ -1290,8 +1298,10 @@ QRegularExpressionMatchPrivate *QRegularExpressionPrivate::doMatch(const QString
     else if (matchType == QRegularExpression::PartialPreferFirstMatch)
         pcreOptions |= PCRE_PARTIAL_HARD;
 
-    if (checkSubjectStringOption == DontCheckSubjectString)
+    if (checkSubjectStringOption == DontCheckSubjectString
+            || matchOptions & QRegularExpression::DontCheckSubjectStringMatchOption) {
         pcreOptions |= PCRE_NO_UTF16_CHECK;
+    }
 
     bool previousMatchWasEmpty = false;
     if (previous && previous->hasMatch &&
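The new condition in doMatch() can be modelled outside Qt as a small pure function. In this sketch, the MatchOption values are taken from the header change in this commit. The `CheckSubjectStringOption` layout, the `pcreOptionsFor` helper, and `kNoUtf16Check` are hypothetical. `kNoUtf16Check` stands in for PCRE_NO_UTF16_CHECK, and its numeric value is illustrative only.

```cpp
#include <cstdint>

// MatchOption values as declared in this commit's header change.
enum MatchOption : std::uint32_t {
    NoMatchOption                     = 0x0000,
    AnchoredMatchOption               = 0x0001,
    DontCheckSubjectStringMatchOption = 0x0002,
};

// Modelled on the internal flag doMatch() receives (hypothetical layout).
enum CheckSubjectStringOption { CheckSubjectString, DontCheckSubjectString };

// Hypothetical stand-in for PCRE_NO_UTF16_CHECK; the value is illustrative.
constexpr std::uint32_t kNoUtf16Check = 1u << 13;

// Returns the extra PCRE option bits implied by the two inputs. The UTF-16
// check is skipped if the subject is already known to be valid (e.g. after
// the first round of a global match) or if the user opted out explicitly.
std::uint32_t pcreOptionsFor(CheckSubjectStringOption checkSubjectStringOption,
                             std::uint32_t matchOptions)
{
    std::uint32_t pcreOptions = 0;
    // Parenthesized for readability; in C++ `&` already binds tighter than `||`.
    if (checkSubjectStringOption == DontCheckSubjectString
            || (matchOptions & DontCheckSubjectStringMatchOption)) {
        pcreOptions |= kNoUtf16Check;
    }
    return pcreOptions;
}
```

The extra parentheses do not change behaviour relative to the diff. They only make explicit that the bitwise test is evaluated before the logical OR.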
@@ -110,7 +110,8 @@ public:
 
     enum MatchOption {
         NoMatchOption = 0x0000,
-        AnchoredMatchOption = 0x0001
+        AnchoredMatchOption = 0x0001,
+        DontCheckSubjectStringMatchOption = 0x0002
     };
     Q_DECLARE_FLAGS(MatchOptions, MatchOption)
 
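Each MatchOption value occupies its own bit (0x0001, 0x0002), so the new option can be OR-ed with AnchoredMatchOption and each flag can still be tested independently. The sketch below models this with a bare unsigned mask instead of Qt's QFlags. The `MatchOptions` alias to `std::uint32_t` and the `hasOption` helper are hypothetical stand-ins.

```cpp
#include <cstdint>

// MatchOption values as declared in the header change above.
enum MatchOption : std::uint32_t {
    NoMatchOption                     = 0x0000,
    AnchoredMatchOption               = 0x0001,
    DontCheckSubjectStringMatchOption = 0x0002,
};

// Hypothetical stand-in for QFlags<MatchOption> (what Q_DECLARE_FLAGS declares).
using MatchOptions = std::uint32_t;

// Hypothetical helper: true if the single-bit `flag` is set in `options`.
constexpr bool hasOption(MatchOptions options, MatchOption flag)
{
    return (options & flag) == flag;
}
```

With Qt itself, a combined value such as `AnchoredMatchOption | DontCheckSubjectStringMatchOption` is what would be passed as the last (`matchOptions`) argument of QRegularExpression::match() or globalMatch().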