Commit Graph

24 Commits

Author SHA1 Message Date
Kurt Pattyn
add2bf739a Allow non-character codes in utf8 strings
Changed the processing of non-character code handling in the UTF8 codec.
Non-character codes are now accepted in QStrings, QUrls and QJson strings.
Unit tests were adapted accordingly.
For more info about non-character codes,
see: http://www.unicode.org/versions/corrigendum9.html

[ChangeLog][QtCore][QUtf8]
UTF-8 now accepts non-character unicode points; these are not replaced
by the replacement character anymore

[ChangeLog][QtCore][QUrl]
QUrl now fully accepts non-character unicode points; they are encoded as
percent characters; they can also be pretty decoded

[ChangeLog][QtCore][QJson]
The Writer and the Parser now fully accept non-character unicode points.

Change-Id: I77cf4f0e6210741eac8082912a0b6118eced4f77
Task-number: QTBUG-33229
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2013-10-17 09:50:58 +02:00
Frederik Gladhorn
8c6755aeec Merge remote-tracking branch 'origin/stable' into dev
Conflicts:
	src/corelib/tools/qstring.cpp

Change-Id: Ifc6cd3a0f1cf14cc0fe6cf30afb0c7f40cfdbc3e
2013-09-16 14:52:40 +02:00
Marc Mutz
c8411a0281 tst_qurlinternal: fix a use of memcpy on overlapping memory
The old code smply copied 100 shorts from the pointer passed into the
ushortarray ctor, regardless of the actual bounds of the original array.

Fix by making the ctor take the array by deference, deducing the size
as a template parameter, and only copying that much.

Fixes asan trace:
==18660==ERROR: AddressSanitizer: memcpy-param-overlap: memory ranges [0x7fff3c56de00,0x7fff3c56dec8) and [0x7fff3c56dd60, 0x7fff3c56de28) overlap
    #0 0x457161 in memcpy asan_interceptors.cc:330
    #1 0x4c40fe in ushortarray::ushortarray(unsigned short*) qtbase/tests/auto/corelib/io/qurlinternal/tst_qurlinternal.cpp:62
    #2 0x4b0437 in ushortarray::ushortarray(unsigned short*) qtbase/tests/auto/corelib/io/qurlinternal/tst_qurlinternal.cpp:63
    #3 0x47b643 in tst_QUrlInternal::idna_testsuite_data() qtbase/tests/auto/corelib/io/qurlinternal/tst_qurlinternal.cpp:119
    ...

Change-Id: Ie497bc8d337bc680a562482ca71ace535797ffb3
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2013-09-14 23:41:06 +02:00
Thiago Macieira
7b964c77fa Make sure that QUrl::FullyDecoded mode uses U+FFFD for bad UTF-8
It's a good practice to always replace bad UTF-8 sequences with the
replacement character. It could be considered a security issue too.

Change-Id: I9e7d72e4c4102cdb8334449b5e7f882228a9048f
Reviewed-by: David Faure (KDE) <faure@kde.org>
2013-08-04 04:48:13 +02:00
Thiago Macieira
3d77406e27 Make the URL Recode function to fix bad input in FullyDecoded mode too
So far, this function hasn't been used for input coming in from the
user, so it wasn't necessary. But we may want to do it, or we may
already be doing it accidentally somewhere that isn't triggering the
failed assertions during unit testing.

So let's be on the safe side and allow it. And test it too.

Change-Id: Ib63addd8da468ad6908278d07a4829f1bdc26a07
Reviewed-by: David Faure (KDE) <faure@kde.org>
2013-07-20 05:06:34 +02:00
Thiago Macieira
4d93393a6d QUrl stringprep: fix handling of U+0080: it's prohibited
Edge case: a > that should have been >=. Without it, we never ran the
rest of the IDN nameprepping.

Change-Id: I2276d660de3a70d0c561bb18816820d9a0f47e77
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2013-06-08 05:06:57 +02:00
Thiago Macieira
736a052d93 QUrl stringprep: fix handling of prohibited characters
RFC 3454 says about prohibited characters (section 2, "Preparation
Overview"):

   3) Prohibit -- Check for any characters that are not allowed in the
      output.  If any are found, return an error.  This is described in
      section 5.

In other words, we mustn't simply strip the output of prohibited
characters. We must generate an error if they are present. We do that by
clearing the data.

We already had tests for prohibited output, but they were
indistinguishable from being stripped. So instead add some extra
characters so that we can tell whether the label was cleared.

Change-Id: I2d95217c27be5e2d54deed0036cb009e3b7f4886
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
2013-06-08 05:06:47 +02:00
Peter Hartmann
b20d15b58b QUrl: update top level domains that may contain non-ASCII characters
Most notably, .com and .net now may contain non-ASCII characters.
list has been generated from
http://www.mozilla.org/projects/security/tld-idn-policy-list.html

Change-Id: Idc3191dc782bc4173ccb19b4bc81f4f061ca7999
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2013-03-02 10:13:27 +01:00
Sergio Ahumada
48e0c4df23 Update copyright year in Digia's license headers
Change-Id: Ic804938fc352291d011800d21e549c10acac66fb
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
2013-01-18 09:07:35 +01:00
Iikka Eklund
be15856f61 Change copyrights from Nokia to Digia
Change copyrights and license headers from Nokia to Digia

Change-Id: If1cc974286d29fd01ec6c19dd4719a67f4c3f00e
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Reviewed-by: Sergio Ahumada <sergio.ahumada@digia.com>
2012-09-22 19:20:11 +02:00
Sergio Ahumada
34cd8fd566 tests: Don't omit the body of a test function with QT_BUILD_INTERNAL
Changing it outside of the test function definition to avoid running
empty/inapplicable test functions.

Change-Id: I713560cde7f715696984ed082d682900f5f1bcdd
Reviewed-by: Qt Doc Bot <qt_docbot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Caroline Chao <caroline.chao@nokia.com>
2012-09-14 06:24:38 +02:00
Thiago Macieira
ce9b010ec6 Fix decoding of QByteArray in the deprecated "encoded" setters in QUrl
The asymmetry is intentional: the getters can use toLatin1() because the
called functions, with a QUrl::FullyEncoded parameter, return ASCII
only. This gives a small performance improvement over the need to run
the UTF-8 encoder.

However, the data passed to setters could contain non-ASCII binary data,
in addition to the percent-encoded data. We can't use fromUtf8 because
it's binary and we can't use toPercentEncoded because it already encoded.

Change-Id: I5ecdb49be5af51ac86fd9764eb3a6aa96385f512
Reviewed-by: David Faure <faure@kde.org>
2012-08-20 21:59:32 +02:00
Thiago Macieira
60818231d8 tst_qurlinternal: use qurl_p.h instead of declaring the functions
Just in case someone (like me) changes the function signatures or adds
new functions.

Change-Id: I1025fea012d95ffe89acaf799aa58fd2b0babc80
Reviewed-by: David Faure <faure@kde.org>
2012-08-20 21:59:28 +02:00
Thiago Macieira
672b5b7ab6 Set the Qt API level to compatibility mode in all tests.
Qt 5.0 beta requires changing the default to the 5.0 API, disabling
the deprecated code. However, tests should test (and often do) the
compatibility API too, so turn it back on.

Task-number: QTBUG-25053
Change-Id: I8129c3ef3cb58541c95a32d083850d9e7f768927
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
2012-08-01 15:37:46 +02:00
David Faure
61162c6e87 Add missing subdirs (the new QUrl unit tests were not compiled and run)
Change-Id: I1b39d92b8a14d5aeca957180858e1980a534894b
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
2012-06-26 14:53:46 +02:00
Thiago Macieira
1ca791faf5 Add the QUrl::FullyDecoded flag to the component formatting
This allows the QUrl component getters to return fully decoded data,
like they did in Qt 4. This is necessary for some use-cases where the
component like the user name, password or path are used outside the
context of a URL. In those contexts, the percent-encoded data makes no
sense, and the loss of data of what could be represented in a URL is
acceptable.

Also take the opportunity to expand the documentation of those getter
methods, explaining what the options argument does.

Discussed-on: http://lists.qt-project.org/pipermail/development/2012-May/003811.html
Change-Id: I89f743cde78c02f169c88314bff0768714341419
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: David Faure <faure@kde.org>
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
2012-05-22 20:56:38 +02:00
Thiago Macieira
f0ec001242 Port away from QUrl::MostDecoded
Since we're about to introduce QUrl::FullyDecoded, this
QUrl::MostDecoded value would be confusing. Replace its uses with what
was intended at the point in question.

Change-Id: Iefd87bc33d37bace507c5cb0f206fa902e08e2df
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: David Faure <faure@kde.org>
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
2012-05-22 20:56:38 +02:00
Thiago Macieira
4793356ce7 Fix the idempotent recoding tests in tst_QUrlInternal
This was trying all the possibilities by brute force, but it turns out
that some combinations are not valid so they should not be
tested. What's more, it was using old values of the flags, so this was
actually testing nothing.

Change-Id: I6c2f5230d240fc23418df2d3a1ca905dbc47dd10
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: David Faure <faure@kde.org>
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
2012-05-22 20:56:38 +02:00
Thiago Macieira
1b7e9dba75 Change the component formatting enum values so the default is zero
By having the default value equal to zero, we follow the principle of
least surprise. For example, if we had
      url.path()
and we refactored to
      url.path(QUrl::DecodeSpaces)

Then instead of ensuring spaces are decoded, we make spaces the only
thing encoded (unicode, delimiters and reserved characters are
encoded).

Besides, modifying the default can only be used to encode something
that wasn't encoded previously, so having the enums as Encode makes
more sense.

As a side-effect, toEncoded() does not support any extra encoding
options.

Change-Id: I2624ec446e65c2d979e9ca2f81bd3db22b00bb13
Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
2012-04-11 23:32:26 +02:00
Thiago Macieira
7f20dce264 Move the #include "tst_qurlinternal.moc" up to workaround a bug
I don't know if the bug is in moc or in qmake. But it bails out trying
to parse the .cpp file after the
tst_QUrlInternal::nameprep_testsuite_data function. If the #include is
placed above, it works. If it's placed below, it doesn't.

Change-Id: Ide554aa5aa3f1999e29604ba6d25ccdb09f6ef28
Reviewed-by: Marius Storm-Olsen <marius.storm-olsen@nokia.com>
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
2012-03-30 01:19:59 +02:00
Thiago Macieira
b75aa795fe Refactor the URL recoder a little
Change it to operate on QChar pointers, which gains a little in
performance. This also avoids unnecessary detaching in the QString
source.

In addition, make the output be appended to an existing QString. This
will be useful later when we're reconstructing a URL from its
components.

Change-Id: I7e2f64028277637bd329af5f98001ace253a50c7
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-03-30 01:19:59 +02:00
Thiago Macieira
73e16b15a6 Remove the tolerant parsing function and make the recoder tolerant
The reason for this change is that the strict parser made little sense
to exist. What would the recoder do if it was passed an invalid
string?

I believe that the tolerant recoder is more efficient than the
correcting code followed by the strict recoder. This makes the recoder
more complex and probably a little less efficient, but it's better in
the common case (tolerant that doesn't need fixes) and in the worst
case (needs fixes).

Change-Id: I68a0c9fda6765de05914cbd6ba7d3cea560a7cd6
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-03-30 01:19:59 +02:00
Thiago Macieira
6028efa3ff Add the code that recodes URLs.
This one function is an all-in-one:
 - UTF-8 encoder
 - UTF-8 decoder
 - percent encoder
 - percent decoder

The next step is add the ability to modify the behaviour, by telling
the function what else it must encode or decode and what it should
leave untouched.

Change-Id: I997eccfd2f9ad8487305670b18d6c806f4cf6717
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
2012-03-30 01:19:59 +02:00
Thiago Macieira
4c7e950aad Mark QUrl::{to,from}Punycode as deprecated since 5.0
These functions are now aliases to {to,from}Ace, which are usually
what you want. The original functions from Qt 4.0 had the wrong
semantics and wrong name. The new ones from Qt 4.2 execute the ACE
processing from IDNA (specifically, the ToASCII and ToUnicode
operations described in the RFC).

But so as not to be without tests, export the tests in unit testing
environment and test the punycode roundtrip. Note that the
tst_QUrl::idna_test_suite test tests *only* the Punycode roundtrip,
not the nameprepping.

Change-Id: I9b95b4bd07b4425344a5c6ef5cce7cfcb9846d3e
Reviewed-by: João Abecasis <joao.abecasis@nokia.com>
Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
Reviewed-by: David Faure <faure@kde.org>
2012-03-30 01:19:59 +02:00