Add the QUrl::FullyDecoded flag to the component formatting

This allows the QUrl component getters to return fully decoded data,
like they did in Qt 4. This is necessary for some use-cases where
components such as the user name, password or path are used outside the
context of a URL. In those contexts, the percent-encoded data makes no
sense, and the loss of data that only a URL could represent is
acceptable.

Also take the opportunity to expand the documentation of those getter
methods, explaining what the options argument does.

Discussed-on: http://lists.qt-project.org/pipermail/development/2012-May/003811.html
Change-Id: I89f743cde78c02f169c88314bff0768714341419
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: David Faure <faure@kde.org>
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
Thiago Macieira 2012-05-21 20:21:16 +02:00 committed by Qt by Nokia
parent 53d0624403
commit 1ca791faf5
5 changed files with 221 additions and 43 deletions


@ -242,11 +242,17 @@
URL in QString form, exactly equal to the result of
toEncoded()
\value MostDecoded Attempt to decode as much as possible. For individual
\value FullyDecoded Attempt to decode as much as possible. For individual
components of the URL, this decodes every percent
encoding sequence, control characters (U+0000 to U+001F)
and non-US-ASCII sequences that aren't valid UTF-8
sequences.
encoding sequence, including control characters (U+0000
to U+001F) and UTF-8 sequences found in percent-encoded form.
Note: if the component contains non-US-ASCII sequences
that aren't valid UTF-8 sequences, the behaviour is
undefined since QString cannot represent those values
(data will be lost!)
This mode should not be used in functions where more
than one URL component is returned (userInfo() and authority())
and it is not allowed in url() and toString().
The values of EncodeReserved and DecodeReserved should not be used together
in one call. The behaviour is undefined if that happens. They are provided
@ -254,6 +260,15 @@
to reserved characters is different on certain components and especially on
the full URL.
The FullyDecoded mode is similar to the behaviour of the functions
returning QString in Qt 4.x, including the fact that they will most likely
cause data loss if the component in question contains a non-UTF-8
percent-encoded sequence. Fortunately, those cases aren't common, so this
mode should be used when the component in question is used in a non-URL
context. For example, in an FTP client application, the path to the remote
file could be stored in a QUrl object, and the string to be transmitted to
the FTP server should be obtained using this flag.
\sa QUrl::FormattingOptions
*/
@ -280,17 +295,6 @@ inline static bool isHex(char c)
return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}
static inline char toHex(quint8 c)
{
return c > 9 ? c - 10 + 'A' : c + '0';
}
static inline quint8 fromHex(quint8 c)
{
c |= 0x20;
return c >= 'a' ? c - 'a' + 10 : c - '0';
}
static inline QString ftpScheme()
{
return QStringLiteral("ftp");
@ -1672,7 +1676,15 @@ void QUrl::setAuthority(const QString &authority, ParsingMode mode)
Returns the authority of the URL if it is defined; otherwise
an empty string is returned.
\sa setAuthority()
The \a options argument controls how to format the authority portion of the
URL. The value of QUrl::FullyDecoded should be avoided, since it may
produce an ambiguous return value (for example, if the username contains a
colon ':' or either the username or password contain an at-sign '@'). In
all other cases, this function returns an unambiguous value, which may
contain those characters still percent-encoded, plus some control
sequences not representable in decoded form in QString.
\sa setAuthority(), userInfo(), userName(), password(), host(), port()
*/
QString QUrl::authority(ComponentFormattingOptions options) const
{
@ -1725,6 +1737,15 @@ void QUrl::setUserInfo(const QString &userInfo, ParsingMode mode)
/*!
Returns the user info of the URL, or an empty string if the user
info is undefined.
The \a options argument controls how to format the user info component. The
value of QUrl::FullyDecoded should be avoided, since it may produce an
ambiguous return value (for example, if the username contains a colon ':').
In all other cases, this function returns an unambiguous value, which may
contain that character still percent-encoded, plus some control sequences
not representable in decoded form in QString.
\sa setUserInfo(), userName(), password(), authority()
*/
QString QUrl::userInfo(ComponentFormattingOptions options) const
{
@ -1775,7 +1796,18 @@ void QUrl::setUserName(const QString &userName, ParsingMode mode)
Returns the user name of the URL if it is defined; otherwise
an empty string is returned.
\sa setUserName(), encodedUserName()
The \a options argument controls how to format the user name component. All
values produce an unambiguous result. With QUrl::FullyDecoded, all
percent-encoded sequences are decoded; otherwise, the returned value may
contain some percent-encoded sequences for some control sequences not
representable in decoded form in QString.
Note that QUrl::FullyDecoded may cause data loss if those non-representable
sequences are present. It is recommended to use that value when the result
will be used in a non-URL context, such as setting in QAuthenticator or
negotiating a login.
\sa setUserName(), userInfo()
*/
QString QUrl::userName(ComponentFormattingOptions options) const
{
@ -1825,6 +1857,17 @@ void QUrl::setPassword(const QString &password, ParsingMode mode)
Returns the password of the URL if it is defined; otherwise
an empty string is returned.
The \a options argument controls how to format the password component. All
values produce an unambiguous result. With QUrl::FullyDecoded, all
percent-encoded sequences are decoded; otherwise, the returned value may
contain some percent-encoded sequences for some control sequences not
representable in decoded form in QString.
Note that QUrl::FullyDecoded may cause data loss if those non-representable
sequences are present. It is recommended to use that value when the result
will be used in a non-URL context, such as setting in QAuthenticator or
negotiating a login.
\sa setPassword()
*/
QString QUrl::password(ComponentFormattingOptions options) const
@ -1887,6 +1930,20 @@ void QUrl::setHost(const QString &host, ParsingMode mode)
/*!
Returns the host of the URL if it is defined; otherwise
an empty string is returned.
The \a options argument controls how the hostname will be formatted. The
QUrl::EncodeUnicode option will cause this function to return the hostname
in the ASCII-Compatible Encoding (ACE) form, which is suitable for use in
channels that are not 8-bit clean or that require the legacy hostname (such
as DNS requests or in HTTP request headers). If that flag is not present,
this function returns the International Domain Name (IDN) in Unicode form,
according to the list of permissible top-level domains (see
idnWhiteList()).
All other flags are ignored. Host names cannot contain control or percent
characters, so the returned value can be considered fully decoded.
\sa setHost(), idnWhiteList(), setIdnWhiteList(), authority()
*/
QString QUrl::host(ComponentFormattingOptions options) const
{
@ -1983,6 +2040,16 @@ void QUrl::setPath(const QString &path, ParsingMode mode)
/*!
Returns the path of the URL.
The \a options argument controls how to format the path component. All
values produce an unambiguous result. With QUrl::FullyDecoded, all
percent-encoded sequences are decoded; otherwise, the returned value may
contain some percent-encoded sequences for some control sequences not
representable in decoded form in QString.
Note that QUrl::FullyDecoded may cause data loss if those non-representable
sequences are present. It is recommended to use that value when the result
will be used in a non-URL context, such as sending to an FTP server.
\sa setPath()
*/
QString QUrl::path(ComponentFormattingOptions options) const
@ -1999,7 +2066,7 @@ QString QUrl::path(ComponentFormattingOptions options) const
Returns true if this URL contains a Query (i.e., if ? was seen on it).
\sa hasQueryItem(), encodedQuery()
\sa setQuery(), query(), hasFragment()
*/
bool QUrl::hasQuery() const
{
@ -2028,7 +2095,15 @@ bool QUrl::hasQuery() const
In DecodedMode, '%' stands for itself and encoded characters are not
possible.
\sa encodedQuery(), hasQuery()
Query strings often contain percent-encoded sequences, so use of
DecodedMode is discouraged. One special sequence to be aware of is that of
the plus character ('+'). QUrl does not convert spaces to plus characters,
even though HTML forms posted by web browsers do. In order to represent an
actual plus character in a query, the sequence "%2B" is usually used. This
function will leave "%2B" sequences untouched in TolerantMode or
StrictMode.
\sa query(), hasQuery()
*/
void QUrl::setQuery(const QString &query, ParsingMode mode)
{
@ -2045,6 +2120,17 @@ void QUrl::setQuery(const QString &query, ParsingMode mode)
d->sectionIsPresent &= ~QUrlPrivate::Query;
}
/*!
\overload
\since 5.0
Sets the query string of the URL to \a query.
This function reconstructs the query string from the QUrlQuery object and
sets it on this QUrl object. This function does not have parsing parameters
because the QUrlQuery contains data that is already parsed.
\sa query(), hasQuery()
*/
void QUrl::setQuery(const QUrlQuery &query)
{
detach();
@ -2058,7 +2144,21 @@ void QUrl::setQuery(const QUrlQuery &query)
}
/*!
Returns the query string of the URL in percent encoded form.
Returns the query string of the URL if there's a query string, or an empty
result if not. To determine if the parsed URL contained a query string, use
hasQuery().
The \a options argument controls how to format the query component. All
values produce an unambiguous result. With QUrl::FullyDecoded, all
percent-encoded sequences are decoded; otherwise, the returned value may
contain some percent-encoded sequences for some control sequences not
representable in decoded form in QString.
Note that use of QUrl::FullyDecoded in queries is discouraged, as queries
often contain data that is supposed to remain percent-encoded, including
the use of the "%2B" sequence to represent a plus character ('+').
\sa setQuery(), hasQuery()
*/
QString QUrl::query(ComponentFormattingOptions options) const
{
@ -2116,9 +2216,20 @@ void QUrl::setFragment(const QString &fragment, ParsingMode mode)
}
/*!
Returns the fragment of the URL.
Returns the fragment of the URL. To determine if the parsed URL contained a
fragment, use hasFragment().
\sa setFragment()
The \a options argument controls how to format the fragment component. All
values produce an unambiguous result. With QUrl::FullyDecoded, all
percent-encoded sequences are decoded; otherwise, the returned value may
contain some percent-encoded sequences for some control sequences not
representable in decoded form in QString.
Note that QUrl::FullyDecoded may cause data loss if those non-representable
sequences are present. It is recommended to use that value when the result
will be used in a non-URL context.
\sa setFragment(), hasFragment()
*/
QString QUrl::fragment(ComponentFormattingOptions options) const
{
@ -2258,8 +2369,9 @@ bool QUrl::isRelative() const
}
/*!
Returns a string representation of the URL.
The output can be customized by passing flags with \a options.
Returns a string representation of the URL. The output can be customized by
passing flags with \a options. The option QUrl::FullyDecoded is not
permitted in this function since it would generate ambiguous data.
The resulting QString can be passed back to a QUrl later on.
@ -2273,8 +2385,9 @@ QString QUrl::url(FormattingOptions options) const
}
/*!
Returns a string representation of the URL.
The output can be customized by passing flags with \a options.
Returns a string representation of the URL. The output can be customized by
passing flags with \a options. The option QUrl::FullyDecoded is not
permitted in this function since it would generate ambiguous data.
The default formatting option is \l{QUrl::FormattingOptions}{PrettyDecoded}.
@ -2283,6 +2396,10 @@ QString QUrl::url(FormattingOptions options) const
QString QUrl::toString(FormattingOptions options) const
{
if (!d) return QString();
if (options == QUrl::FullyDecoded) {
qWarning("QUrl: QUrl::FullyDecoded is not permitted when reconstructing the full URL");
options = QUrl::PrettyDecoded;
}
// return just the path if:
// - QUrl::PreferLocalFile is passed
@ -2690,7 +2807,7 @@ QString QUrl::toLocalFile() const
return QString();
QString tmp;
QString ourPath = path(QUrl::MostDecoded);
QString ourPath = path(QUrl::FullyDecoded);
// magic for shared drive on windows
if (!d->host.isEmpty()) {
@ -2704,21 +2821,6 @@ QString QUrl::toLocalFile() const
tmp.remove(0, 1);
#endif
}
// check if we need to do one more decoding pass
int pct = tmp.indexOf(QLatin1Char('%'));
while (pct != -1) {
Q_ASSERT(tmp.size() >= pct + 2);
ushort char1 = tmp.at(pct + 1).unicode();
ushort char2 = tmp.at(pct + 2).unicode();
Q_ASSERT(isHex(char1) && char1 < 0x80u);
Q_ASSERT(isHex(char2) && char2 < 0x80u);
tmp.replace(pct, 3, QChar(fromHex(char1) << 4 | fromHex(char2)));
// next iteration
pct = tmp.indexOf(QLatin1Char('%'), pct + 1);
}
return tmp;
}
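
The userInfo() and authority() documentation above warns that a fully decoded multi-component string can be ambiguous. The following is a minimal standalone sketch of that problem, not Qt code: percentDecode() and splitUserInfo() are hypothetical helpers written for illustration, and std::string stands in for QString. It uses the "hello%3Aworld" user name from the tests in this commit with an assumed password of "secret".

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Value of one hexadecimal digit; the caller guarantees c is a hex digit.
static int hexValue(char c)
{
    assert(std::isxdigit(static_cast<unsigned char>(c)));
    if (c >= '0' && c <= '9')
        return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Decode every well-formed %XX sequence; anything else is copied through.
static std::string percentDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(hexValue(in[i + 1]) << 4 | hexValue(in[i + 2]));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Split "user:password" at the first colon.
static std::pair<std::string, std::string> splitUserInfo(const std::string &userInfo)
{
    const std::size_t colon = userInfo.find(':');
    if (colon == std::string::npos)
        return {userInfo, std::string()};
    return {userInfo.substr(0, colon), userInfo.substr(colon + 1)};
}
```

Splitting "hello%3Aworld:secret" first and decoding each half afterwards recovers the user name "hello:world". Decoding first and splitting afterwards yields "hello" and "world:secret", which is the ambiguity the documentation describes and the reason the single-component getters are the safer place to request full decoding.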


@ -145,9 +145,10 @@ public:
EncodeDelimiters = 0x400000 | 0x800000,
EncodeReserved = 0x1000000,
DecodeReserved = 0x2000000,
// 0x4000000 used to indicate full-decode mode
FullyEncoded = EncodeSpaces | EncodeUnicode | EncodeDelimiters | EncodeReserved,
MostDecoded = PrettyDecoded | DecodeReserved
FullyDecoded = FullyEncoded | DecodeReserved | 0x4000000
};
Q_DECLARE_FLAGS(ComponentFormattingOptions, ComponentFormattingOption)
#ifdef qdoc
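
The enum hunk above defines FullyDecoded as a combination of existing bits plus 0x4000000, and the recoding code in this commit compares against it with ==. The sketch below mirrors that arrangement in plain C++ to show why an equality test singles out the mode. The values of EncodeSpaces and EncodeUnicode are assumptions, since they lie outside the hunk, and FullDecodeBit and isFullyDecoded() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
constexpr std::uint32_t EncodeSpaces     = 0x100000;  // assumed value, not shown in the hunk
constexpr std::uint32_t EncodeUnicode    = 0x200000;  // assumed value, not shown in the hunk
constexpr std::uint32_t EncodeDelimiters = 0x400000 | 0x800000;
constexpr std::uint32_t EncodeReserved   = 0x1000000;
constexpr std::uint32_t DecodeReserved   = 0x2000000;
constexpr std::uint32_t FullDecodeBit    = 0x4000000; // hypothetical name for the new bit

constexpr std::uint32_t FullyEncoded =
    EncodeSpaces | EncodeUnicode | EncodeDelimiters | EncodeReserved;
constexpr std::uint32_t FullyDecoded =
    FullyEncoded | DecodeReserved | FullDecodeBit;

// Equality test in the style of `encoding == QUrl::FullyDecoded`:
// only the exact combination selects the full-decode path.
constexpr bool isFullyDecoded(std::uint32_t options)
{
    return options == FullyDecoded;
}

static_assert(isFullyDecoded(FullyDecoded), "exact value matches");
static_assert(!isFullyDecoded(FullyEncoded | DecodeReserved), "missing the new bit");
static_assert(!isFullyDecoded(FullDecodeBit), "the new bit alone does not match");
} // namespace sketch
```

Under these assumptions, a caller that ORs extra options into FullyDecoded, or passes only some of its bits, would not take the full-decode path.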


@ -560,6 +560,43 @@ non_trivial:
return 0;
}
static int decode(QString &appendTo, const ushort *begin, const ushort *end)
{
const int origSize = appendTo.size();
const ushort *input = begin;
ushort *output = 0;
while (input != end) {
if (*input != '%') {
if (output)
*output++ = *input;
++input;
continue;
}
if (Q_UNLIKELY(!output)) {
// detach
appendTo.resize(origSize + (end - begin));
output = reinterpret_cast<ushort *>(appendTo.begin()) + origSize;
memcpy(output, begin, (input - begin) * sizeof(ushort));
output += input - begin;
}
++input;
Q_ASSERT(input <= end - 2); // we need two characters
Q_ASSERT(isHex(input[0]));
Q_ASSERT(isHex(input[1]));
*output++ = decodeNibble(input[0]) << 4 | decodeNibble(input[1]);
input += 2;
}
if (output) {
int len = output - reinterpret_cast<ushort *>(appendTo.begin());
appendTo.truncate(len);
return len - origSize;
}
return 0;
}
template <size_t N>
static void maskTable(uchar (&table)[N], const uchar (&mask)[N])
{
@ -583,6 +620,12 @@ static void maskTable(uchar (&table)[N], const uchar (&mask)[N])
\li QUrl::EncodeSpaces: if set, spaces will be encoded to "%20"; if unset, they will be " "
\li QUrl::EncodeUnicode: if set, characters above U+0080 will be encoded to their UTF-8
percent-encoded form; if unset, they will be decoded to UTF-16
\li QUrl::FullyDecoded: if set, this function will decode all percent-encoded sequences,
including that of the percent character. The resulting string
will not be percent-encoded anymore. Use with caution!
In this mode, the behaviour is undefined if the input string
contains any percent-encoding sequences above %80.
Also, the function will not correct bad % sequences.
\endlist
Other flags are ignored (including QUrl::EncodeReserved).
@ -599,6 +642,10 @@ qt_urlRecode(QString &appendTo, const QChar *begin, const QChar *end,
QUrl::ComponentFormattingOptions encoding, const ushort *tableModifications)
{
uchar actionTable[sizeof defaultActionTable];
if (encoding == QUrl::FullyDecoded) {
return decode(appendTo, reinterpret_cast<const ushort *>(begin), reinterpret_cast<const ushort *>(end));
}
if (!(encoding & QUrl::EncodeDelimiters) && encoding & QUrl::DecodeReserved) {
// reset the table
memset(actionTable, DecodeCharacter, sizeof actionTable);
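
For readers without a Qt build at hand, here is a rough standalone sketch of what the decode() helper in this file does, using std::u16string in place of QString. It is a simplification under stated assumptions rather than the committed code: decodeAppend(), isHexDigit() and nibble() are hypothetical names, the sketch always appends its input (the original returns 0 and appends nothing when no '%' is present), and a malformed '%' sequence is copied through instead of triggering an assertion.

```cpp
#include <cassert>
#include <string>

static bool isHexDigit(char16_t c)
{
    return (c >= u'0' && c <= u'9') || (c >= u'a' && c <= u'f')
        || (c >= u'A' && c <= u'F');
}

// Value of one hexadecimal digit; the caller guarantees c is a hex digit.
static int nibble(char16_t c)
{
    assert(isHexDigit(c));
    if (c <= u'9')
        return c - u'0';
    return (c | 0x20) - u'a' + 10; // fold upper case onto lower case
}

// Appends the decoded form of `input` to `appendTo` and returns the number
// of UTF-16 code units appended.
static std::size_t decodeAppend(std::u16string &appendTo, const std::u16string &input)
{
    const std::size_t origSize = appendTo.size();
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] == u'%' && i + 2 < input.size()
            && isHexDigit(input[i + 1]) && isHexDigit(input[i + 2])) {
            // Each %XX becomes one code unit in the range 0x00 to 0xFF.
            // No UTF-8 reassembly is attempted here.
            appendTo += static_cast<char16_t>(nibble(input[i + 1]) << 4 | nibble(input[i + 2]));
            i += 2;
        } else {
            appendTo += input[i];
        }
    }
    return appendTo.size() - origSize;
}
```

Decoding is a single pass: the input "25%2525" from the percent-fullydecoded test row becomes "25%25", and the '%' that was just produced is not decoded again.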


@ -3059,6 +3059,26 @@ void tst_QUrl::setComponents_data()
QTest::newRow("fragment-encode") << QUrl("http://example.com/foo#z")
<< int(Fragment) << "bar%23" << Decoded << true
<< PrettyDecoded << "bar%2523" << "http://example.com/foo#bar%2523";
// force decoding; note how the userinfo becomes ambiguous
QTest::newRow("userinfo-decode") << QUrl("http://example.com")
<< int(UserInfo) << "hello%3Aworld:}}>b9o%25kR(" << Tolerant << true
<< FullyDecoded << "hello:world:}}>b9o%kR("
<< "http://hello%3Aworld:%7D%7D%3Eb9o%25kR(@example.com";
QTest::newRow("username-decode") << QUrl("http://example.com")
<< int(UserName) << "hello%3Aworld%25" << Tolerant << true
<< FullyDecoded << "hello:world%" << "http://hello%3Aworld%25@example.com";
QTest::newRow("password-decode") << QUrl("http://example.com")
<< int(Password) << "}}>b9o%25kR(" << Tolerant << true
<< FullyDecoded << "}}>b9o%kR(" << "http://:%7D%7D%3Eb9o%25kR(@example.com";
QTest::newRow("path-decode") << QUrl("http://example.com/")
<< int(Path) << "/bar%25foo" << Tolerant << true
<< FullyDecoded << "/bar%foo" << "http://example.com/bar%25foo";
QTest::newRow("query-decode") << QUrl("http://example.com/foo?qq")
<< int(Query) << "bar%25foo" << Tolerant << true
<< FullyDecoded << "bar%foo" << "http://example.com/foo?bar%25foo";
QTest::newRow("fragment-decode") << QUrl("http://example.com/foo#qq")
<< int(Fragment) << "bar%25foo" << Tolerant << true
<< FullyDecoded << "bar%foo" << "http://example.com/foo#bar%25foo";
}
void tst_QUrl::setComponents()


@ -886,6 +886,14 @@ void tst_QUrlInternal::encodingRecode_data()
QTest::newRow("control-unreserved-0x" + code) << "Foo%00Bar%0D%0Abksp%7F" << mode << "Foo%00Bar%0D%0Abksp%7F";
}
// however, control characters and the percent *are* decoded in FullyDecoded mode
// this is the only exception
QTest::newRow("control-nul-fullydecoded") << "%00" << F(QUrl::FullyDecoded) << QStringLiteral("\0");
QTest::newRow("control-fullydecoded") << "%0D%0A%1F%1A%7F" << F(QUrl::FullyDecoded) << "\r\n\x1f\x1a\x7f";
QTest::newRow("percent-fullydecoded") << "25%2525" << F(QUrl::FullyDecoded) << "25%25";
QTest::newRow("control-unreserved-fullydecoded") << "Foo%00Bar%0D%0Abksp%7F" << F(QUrl::FullyDecoded)
<< QStringLiteral("Foo\0Bar\r\nbksp\x7F");
// gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
// sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
// / "*" / "+" / "," / ";" / "="