QUrl("http%3A%2F%2Fexample.com") has only a path of
"http%3A%2F%2Fexample.com". In Qt 5.0 and 5.1, the %3A would get decoded
to ':', which in turn makes the URL invalid (colon before first slash).
Found via discussion on the interest mailing list.
Change-Id: I7f4f242b330df280e635eb97cce123e742aa1b10
Reviewed-by: David Faure <david.faure@kdab.com>
This fixes the wrong value for path() and fileName() when a
path or file name actually contains a '%'.
userInfo() and authority() are not individual getters, they combine
two or more fields, so full decoding isn't possible (e.g. username
containing a ':').
[ChangeLog][Important Behavior Changes][QUrl and QUrlQuery] QUrl now
defaults to decoded mode in the getters and setters for userName,
password, host, topLevelDomain, path and fileName. This means a '%'
in one of those fields is now returned (or set) as '%' rather than "%25".
In the unlikely case where the former behavior was expected, pass PrettyDecoded
to the getter and TolerantMode to the setter.
Change-Id: Iaeecbde9c269882e79f08b29ff8c661157c41743
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
For some time, we've assumed that the URL specification had a mistake in
that it didn't allow the "#" character to appear decoded in the
fragment. We've gotten away with it so far.
However, it turns out that the CoreFoundation NSURL class doesn't like it.
So we have to be stricter.
[ChangeLog][Important Behavior Changes][QUrl and QUrlQuery] QUrl no
longer decodes %23 found in the fragment to "#" in the output of
toString(QUrl::FullyEncoded) or toEncoded()
Task-number: QTBUG-31945
Change-Id: If5e0fb37bae84710986c9ca89bd69ec98437cd63
Reviewed-by: David Faure (KDE) <faure@kde.org>
Those sections contain more than one component of a URL, separated by
delimiters. For that reason, QUrl::FullyDecoded and QUrl::DecodedMode do
not make sense, since they would cause the returned value to be
ambiguous and/or fail to parse again.
In fact, there was a comment in the test saying "look how it becomes
ambiguous".
Those modes are already forbidden in the setters and getters of the full
URL (setUrl(), url(), toString() and toEncoded()).
[ChangeLog][Important Behavior Changes][QUrl and QUrlQuery] QUrl no
longer supports QUrl::FullyDecoded mode in authority() and userInfo(),
nor QUrl::DecodedMode in setAuthority() and setUserInfo().
Change-Id: I538f7981a9f5a09f07d3879d31ccf6f0c8bfd940
Reviewed-by: David Faure (KDE) <faure@kde.org>
The longer explanation can be found in the comment in qurl.cpp. The
short version is as follows:
Up to now, we considered that every character could be replaced with
its percent-encoding equivalent and vice-versa, so long as the parsing
of the URL did not change. For example, x:/path+path and
x:/path%2Bpath were the same. However, to do this and yet be compliant
with most URL uses in the real world, we had to add exceptions:
- "/" and "%2F" were not the same in the path, despite the delimiter
being behind (rationale was the complex definition of path)
- "+" and "%2B" were not the same in the query, so we ended up not
transforming any sub-delim in the query at all
Now, we change our understanding based on the following line from
RFC 3986 section 2.2:
URIs that differ in the replacement of a reserved character with
its corresponding percent-encoded octet are not equivalent.
From now on, QUrl will not replace any sub-delim or gen-delim
("reserved character"), except where such a character could not exist
in the first place. This simplifies the code and removes all
exceptions.
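
As a rough illustration of the rule quoted from RFC 3986 (a sketch in
plain standard C++, not the code in qurl.cpp; the function names are
made up for this example), a decoder following it only turns %XX back
into a character when that character is unreserved, and leaves every
gen-delim and sub-delim exactly as written:

```cpp
#include <cstddef>
#include <string>

// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
static bool isUnreserved(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}

// Returns 0..15 for a hexadecimal digit, -1 otherwise.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes %XX only when the octet is unreserved.
// Reserved characters and malformed sequences are copied through as-is.
std::string decodeUnreservedOnly(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                const unsigned char octet =
                    static_cast<unsigned char>(hi * 16 + lo);
                if (isUnreserved(octet)) {
                    out += static_cast<char>(octet);
                    i += 2;          // skip the two hex digits
                    continue;
                }
            }
        }
        out += in[i];
    }
    return out;
}
```

With this sketch, "x:/path%2Bpath" stays "x:/path%2Bpath", while
"x:/%7Euser" becomes "x:/~user".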
As a side-effect, this has also changed the behaviour of the "{" and
"}" characters, which we previously allowed to remain decoded.
[ChangeLog][Important Behavior Changes][QUrl and QUrlQuery] QUrl no
longer considers all delimiter characters equivalent to their
percent-encoded forms. Now, both classes always keep all delimiters
exactly as they were in the original URL text.
[ChangeLog][Important Behavior Changes][QUrl and QUrlQuery] QUrl no
longer decodes %7B and %7D to "{" and "}" in the output of toString()
Task-number: QTBUG-31660
Change-Id: Iba0b5b31b269635ac2d0adb2bb0dfb74c139e08c
Reviewed-by: David Faure (KDE) <faure@kde.org>
We don't know what it might be used for. The RFC for URI says it's a
HEXDIG, and we already uppercase all other HEXDIGs (in
percent-encodings), so uppercase this one too.
Change-Id: I56d0a81315576dd98eaa2657c0307d79332543a5
Reviewed-by: David Faure (KDE) <faure@kde.org>
We have no idea what it might contain, but test it anyway to make sure
it works. It turns out there were a few bugs, which the unit tests have
now caught.
Change-Id: I0a6c868365feec31c2360b3c341c8ca6944f4352
Reviewed-by: David Faure (KDE) <faure@kde.org>
Registered names and IP addresses can only contain unreserved
characters (letters, digits, dots, hyphens, underscores) and the
colon, which is a gen-delim. For registered names and IPv4 addresses,
we can simply use the default config -- if anything remains
percent-encoded, it's not a valid hostname anyway.
For IPv6, we just need to decode the colon.
Change-Id: If8083d47f6e5375f760e7a6c59631c89e4da8378
Reviewed-by: David Faure (KDE) <faure@kde.org>
This is a bit like QDir::cleanPath(), but for URL paths.
The code is shared with QDir::cleanPath() by extracting the common parts
into a helper, qt_normalizePathSegments().
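
For readers unfamiliar with this kind of normalization, here is a
minimal sketch in plain standard C++ of resolving "." and ".." segments
in an absolute path. It is not qt_normalizePathSegments(); the function
name and the choice to handle only paths starting with '/' are
assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: resolves "." and ".." in an absolute,
// slash-separated path. Empty segments ("//") are kept as they are.
std::string removeDotSegments(const std::string &path)
{
    if (path.empty() || path[0] != '/')
        return path;                  // the sketch only covers absolute paths

    std::vector<std::string> segments;
    bool trailingSlash = false;
    std::size_t pos = 1;              // skip the leading '/'
    while (true) {
        const std::size_t next = path.find('/', pos);
        const bool last = (next == std::string::npos);
        const std::string segment =
            path.substr(pos, last ? std::string::npos : next - pos);
        if (segment == "." || segment == "..") {
            if (segment == ".." && !segments.empty())
                segments.pop_back();  // go up one level, never above the root
            trailingSlash = last;     // "/a/b/.." ends up as "/a/"
        } else {
            segments.push_back(segment);
        }
        if (last)
            break;
        pos = next + 1;
    }

    std::string out;
    for (const std::string &segment : segments) {
        out += '/';
        out += segment;
    }
    if (trailingSlash || segments.empty())
        out += '/';
    return out;
}
```

For example, "/a/b/../c" becomes "/a/c" and "/a/b/.." becomes "/a/".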
Change-Id: I7133c5e4aa2bf17fba98af13eb5371afba64197a
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This allows finding the parent directory URL using
url.adjusted(QUrl::RemoveFilename).
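
The effect on the path can be pictured with a small stand-alone sketch
in standard C++ (a hypothetical helper, not the QUrl implementation),
assuming the file name is everything after the last '/' and that the
trailing slash is kept:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for what RemoveFilename does to the path:
// keep everything up to and including the last '/'.
std::string removeFilename(const std::string &path)
{
    const std::size_t slash = path.rfind('/');
    if (slash == std::string::npos)
        return std::string();         // no directory part at all
    return path.substr(0, slash + 1);
}
```

Under these assumptions "/home/user/file.txt" yields "/home/user/", and
a path that already ends in a slash is returned unchanged.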
Change-Id: I1ca433ac67e4f93080de54a9b7ab2e538509ed04
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
qt_ACE_do(".co.uk") was returning an empty string because of the
leading dot. Allow leading dots from topLevelDomain, but not from
other calls.
Change-Id: I757d9960708e205d30554cd2bbcf618c8624792b
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This reverts commit e3fa266623b08e837cb4ccc7fe59da243d03dd27.
That commit applied a change at the wrong place in the code.
Change-Id: I21e3045a3af14ad2f90c5fe338815c35a2d27ae6
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: David Faure (KDE) <faure@kde.org>
Leading and double dots are bad, but trailing dots are fine. The ASCII
part of a hostname is supposed to be LDH (letters, digits, hyphen) only,
but we accept '_' (underscore) as an exception too.
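
The rules above can be sketched as a stand-alone check in standard C++.
This is only an illustration of the stated rules for the ASCII part:
the function name is hypothetical, rejecting the empty string is an
assumption of the sketch, and QUrl's real validation does more (IDN,
IP literals).

```cpp
#include <string>

// Hypothetical check: letters, digits, hyphen and underscore only,
// no leading dot, no double dot; a trailing dot is accepted.
bool looksLikeValidAsciiHostname(const std::string &host)
{
    if (host.empty() || host.front() == '.')
        return false;                 // leading dot is rejected
    char previous = '\0';
    for (const char c : host) {
        const bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool digit = (c >= '0' && c <= '9');
        if (c == '.') {
            if (previous == '.')
                return false;         // double dot is rejected
        } else if (!letter && !digit && c != '-' && c != '_') {
            return false;             // outside LDH plus underscore
        }
        previous = c;
    }
    return true;
}
```

So "example.com." and "my_host" pass, while ".example.com" and "a..b"
do not.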
Change-Id: I79957ddec4da78a0e2357fe50c8687db03e1c99e
Reviewed-by: David Faure (KDE) <faure@kde.org>
qt_ACE_do(".co.uk") was returning an empty string because of the
leading dot. This has always caused issues in KDE code too, where ACE
normalization needs the dot removed, and re-added afterwards.
Change-Id: Id9fcea0333cf55c14d755a86d4bf33a50f194429
Reviewed-by: Frederik Gladhorn <frederik.gladhorn@digia.com>
qt_nameprep is tested by tst_qurlinternal. We just need to be sure that
QUrl handles them correctly.
Change-Id: Ic563004870d2cf2fa7a31ce49fff7280d5ffb5f3
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Most notably, .com and .net now may contain non-ASCII characters.
The list has been generated from
http://www.mozilla.org/projects/security/tld-idn-policy-list.html
Change-Id: Idc3191dc782bc4173ccb19b4bc81f4f061ca7999
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The table is there to know which domains are allowed to set cookies
and which are not. There are more than 2000 new entries since the
list was last generated.
The split to 64K chunks was made because this is the hard limit for
strings in Visual Studio.
Change-Id: I511aec062af673555e9a69442c055f75bdcd1606
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
These tests were added to Qt 4 on commit
a17fc85b51a6bdcfa33dcff183d2b7efd667fb92
Task-number: QTBUG-28985
Change-Id: I3cf595384f14272197dcfb85943213c8f8ddeba0
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
This is a very common thing to do, e.g. in order to send urls via DBus.
Change-Id: I277902460ee1ad6780446e862e86b3c2eb8c5315
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
QUrl::fromUserInput("http://") was invalid, which doesn't make sense
since QUrl("http://") is valid. The same goes for "smb:", which is even
more clearly a valid URL from a user's point of view.
Change-Id: I371ac393d61b49499edf5adbbc2a90b426fe9e5d
Reviewed-by: Marco Martin <mart@kde.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
These cases weren't handled before.
The validateComponent function is copied from QUrlPrivate::parse, with
the added modification that it now needs to check the gen-delims for
the userinfo.
Change-Id: I055167b977199fa86b56a3a7259a7445585129c6
Reviewed-by: David Faure (KDE) <faure@kde.org>
... with respect to empty and null strings.
Change-Id: Ic107d5bcc8b659497a567b75a7244caceba5a715
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Keep the original QString that triggered the parsing error, instead of
just one QChar. This provides more powerful error messages, like:
Invalid IPv6 address; source was "http://[:::]"; scheme = "http", host = ""
(QUrl cannot keep invalid hostnames)
Invalid port or port number out of range; source was "http://example.com:abc"; scheme = "http", host = "example.com"
(QUrl cannot keep a non-numeric port number)
Invalid path (character '%' not permitted); source was "foo:/path%?"; scheme = "foo", path = "/path%25%1F"
(the tolerant parser runs first, so the faulty component is fixed)
This stores the error state in a special structure which is not
allocated under normal conditions, keeping the memory consumption
down. On 32-bit systems, QUrlPrivate does not increase in size; on
64-bit systems, it grows by 8 bytes.
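
The general pattern, an error record reached through a pointer that is
only allocated when something goes wrong, can be sketched in standard
C++ as follows. The type and member names are hypothetical and
QUrlPrivate is laid out differently; the sketch only shows why the
valid case costs a single pointer.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical error record, allocated only when parsing fails.
struct ParseError {
    std::string source;    // the full string that failed to parse
    std::string message;   // human-readable description
};

class UrlData {
public:
    bool hasError() const { return error_ != nullptr; }

    void setError(std::string source, std::string message)
    {
        // The allocation happens on the error path only.
        error_.reset(new ParseError{std::move(source), std::move(message)});
    }

    std::string errorString() const
    {
        if (!error_)
            return std::string();
        return error_->message + "; source was \"" + error_->source + "\"";
    }

private:
    std::unique_ptr<ParseError> error_;   // a single pointer when the URL is valid
};
```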
Change-Id: I93d798d43401dfeb9fca7b6eed7ea758da10136b
Reviewed-by: David Faure <faure@kde.org>
Make both invalid hostname messages start with "Invalid hostname". And
split the empty port error from the invalid port one.
Change-Id: I870d1ed6fb07ec494f553871a37ed167141ffc06
Reviewed-by: David Faure <faure@kde.org>
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
That's what we have QUrl::errorString() for. This will become evident
especially now that QUrl::toString() / toEncoded() return empty if
there are errors.
Change-Id: I64a84e9c6ee57c0fc38cc0c58f5286ddc1248d1f
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
Reviewed-by: David Faure <faure@kde.org>
These two errors can only happen if one calls setPath() explicitly. They
cannot happen for parsed URLs, which is why they are only caught with
isValid(). It's not possible to set the error condition in setPath()
either because they depend on the presence / absence of the authority
and scheme.
Also update all the unit tests that set a path not starting with a slash
and were just "freeloaders" on the previous behaviour.
Change-Id: Ice58cd4589a850452d7573a5b19667bbab2fb43e
Reviewed-by: David Faure <faure@kde.org>
Change copyrights and license headers from Nokia to Digia
Change-Id: If1cc974286d29fd01ec6c19dd4719a67f4c3f00e
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Reviewed-by: Sergio Ahumada <sergio.ahumada@digia.com>
This detected the same missing detach()s in QUrl::resolve.
Everything else works, no need for a mutex in Qt5's QUrl.
Change-Id: I0da51b7b0c6b810d314a26d4b638383cd17de12b
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The asymmetry is intentional: the getters can use toLatin1() because the
called functions, with a QUrl::FullyEncoded parameter, return ASCII
only. This gives a small performance improvement over the need to run
the UTF-8 encoder.
However, the data passed to setters could contain non-ASCII binary data,
in addition to the percent-encoded data. We can't use fromUtf8 because
it's binary and we can't use toPercentEncoded because it's already encoded.
Change-Id: I5ecdb49be5af51ac86fd9764eb3a6aa96385f512
Reviewed-by: David Faure <faure@kde.org>
Qt 5.0 beta requires changing the default to the 5.0 API, disabling
the deprecated code. However, tests should test (and often do) the
compatibility API too, so turn it back on.
Task-number: QTBUG-25053
Change-Id: I8129c3ef3cb58541c95a32d083850d9e7f768927
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
Ensure that the parsing mode is cascaded down from setAuthority and
setUrl so that the hostname parsing does not attempt to decode
percent-encoded hostnames when it shouldn't.
Take the opportunity to also remove the "Boolean Trap" from
QUrlPrivate::setHost.
Change-Id: Ia64754c4a4900182700b7af1382aea8410abc7e9
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
The URI RFC defines schemes as containing only a very restricted set
of characters, none of which require encoding, so don't even
try. Testing this behaviour in some web browsers indicates that they do
not accept percent-encoded schemes either.
Change-Id: I692dd20e1aac7e8a1bcb276cb5113b5802393d38
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
If the password is empty (but present), the userinfo component of the
URL should end in a colon (":"). QUrl already supported that and it
was tested (case "password-empty").
If the username is *also* empty but present, the userinfo component is
just the colon (":"). Fix support for that case by checking if we
stored the presence flag instead of checking the size of the
component.
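
A minimal model of the idea in standard C++ (hypothetical types, not
the QUrlPrivate members): each component carries a presence flag next
to its text, and the serializer tests the flag rather than the length.

```cpp
#include <string>

// Hypothetical model: "empty but present" differs from "absent".
struct UserInfo {
    bool hasUserName = false;
    bool hasPassword = false;
    std::string userName;
    std::string password;
};

std::string userInfoToString(const UserInfo &info)
{
    std::string out;
    if (info.hasUserName)
        out += info.userName;
    if (info.hasPassword) {     // test the flag, not password.size()
        out += ':';
        out += info.password;
    }
    return out;
}
```

With both flags set and both strings empty, the result is just ":".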
Change-Id: Ie224493a997dbf76b2e44dd6d55fd9674ac83c1c
Reviewed-by: David Faure <faure@kde.org>
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
QString::fromUtf8, without an explicit size, (currently) defaults to
stopping at the first NUL. That means we need to pass an explicit
size.
Also take the opportunity to test that QUrl::toPercentEncoding also
works with the same data.
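
The NUL pitfall is not specific to Qt. As an analogy in standard C++
(not the code touched by this change), constructing a std::string from
a bare pointer stops at the first NUL, while passing the size keeps the
whole buffer:

```cpp
#include <string>

// Seven bytes of payload with an embedded NUL, plus the terminator.
const char raw[] = "abc\0def";

// Pointer-only construction stops at the first NUL.
const std::string truncated(raw);

// An explicit size keeps all seven bytes, including the embedded NUL.
const std::string complete(raw, sizeof(raw) - 1);
```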
Change-Id: I79362d67afda624b01ca07b0315b611c4aa3fdda
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: David Faure <faure@kde.org>
When given an invalid url, the output shouldn't be a valid url.
KDE's kurltest detected this regression compared to Qt4, where
all invalid urls were empty in toString(). We don't want to go that far,
though, so that the user gets as much feedback as possible.
Change-Id: Ie53e6e1c0a1d4bb9e12b820220dfb7e2f7753959
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This allows the QUrl component getters to return fully decoded data,
like they did in Qt 4. This is necessary for some use-cases where
components like the user name, password or path are used outside the
context of a URL. In those contexts, the percent-encoded data makes no
sense, and the loss of data of what could be represented in a URL is
acceptable.
Also take the opportunity to expand the documentation of those getter
methods, explaining what the options argument does.
Discussed-on: http://lists.qt-project.org/pipermail/development/2012-May/003811.html
Change-Id: I89f743cde78c02f169c88314bff0768714341419
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: David Faure <faure@kde.org>
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>