bcd1b7fe8e
Code units 0xD800 .. 0xDFFF are not UCS-4, so we can't happily return
them. Instead, if we encounter a stray surrogate, replace it with
0xFFFD, which is what Unicode recommends anyhow.

References:

§ 3.9 Unicode Encoding Forms

    D76: Unicode scalar value: Any Unicode code point except
    high-surrogate and low-surrogate code points. As a result of this
    definition, the set of Unicode scalar values consists of the
    ranges 0 to D7FF_16 and E000_16 to 10FFFF_16, inclusive.

    [...]

    UTF-32 encoding form: The Unicode encoding form that assigns each
    Unicode scalar value to a single unsigned 32-bit code unit with
    the same numeric value as the Unicode scalar value.

§ C.2 Encoding Forms in ISO/IEC 10646

    UCS-4. UCS-4 stands for “Universal Character Set coded in 4
    octets.” It is now treated simply as a synonym for UTF-32, and is
    considered the canonical form for representation of characters in
    10646.

§ 3.9 Unicode Encoding Forms (Best Practices for Using U+FFFD) and
§ 5.22 Best Practice for U+FFFD Substitution

    Whenever an unconvertible offset is reached during conversion of a
    code unit sequence:

    1. The maximal subpart at that offset should be replaced by a
       single U+FFFD.
    2. The conversion should proceed at the offset immediately after
       the maximal subpart.

    [...]

    Whenever an unconvertible offset is reached during conversion of a
    code unit sequence to Unicode:

    1. Find the longest code unit sequence that is the initial
       subsequence of some sequence that could be converted. If there
       is such a sequence, replace it with a single U+FFFD; otherwise
       replace a single code unit with a single U+FFFD.
    2. The conversion should proceed at the offset immediately after
       the subsequence which has been replaced.

[ChangeLog][QtCore][QString] QString::toUcs4 now does not return
invalid UCS-4 code units belonging to the surrogate range (U+D800 to
U+DFFF) when the QString contains malformed UTF-16 data. Instead,
U+FFFD is returned in place of the malformed subsequence.

Change-Id: I19d7af03e749fea680fd5d9635439bc9d56558a9
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
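
The substitution rule quoted in the commit message can be illustrated with a small standalone sketch. This is not Qt's implementation of QString::toUcs4: the function name `utf16ToUcs4` and the use of `std::u16string` / `std::u32string` are assumptions made so that the example compiles without Qt. For UTF-16 input, the "maximal subpart" of a malformed sequence is always a single unpaired surrogate, so each stray surrogate becomes exactly one U+FFFD and decoding resumes at the next code unit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): converts UTF-16 code units to
// UCS-4 / UTF-32, replacing every unpaired surrogate with U+FFFD.
std::u32string utf16ToUcs4(const std::u16string &in)
{
    constexpr char32_t ReplacementCharacter = 0xFFFD;

    std::u32string out;
    out.reserve(in.size());

    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t unit = in[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const bool isLow = unit >= 0xDC00 && unit <= 0xDFFF;

        if (!isHigh && !isLow) {
            // Not a surrogate: the code unit is already a scalar value.
            out.push_back(unit);
        } else if (isHigh && i + 1 < in.size()
                   && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Well-formed pair: combine into one supplementary code point.
            const char32_t high = unit - 0xD800;
            const char32_t low = in[i + 1] - 0xDC00;
            out.push_back(0x10000 + (high << 10) + low);
            ++i; // the low surrogate has been consumed as well
        } else {
            // Stray high or low surrogate: substitute U+FFFD and continue
            // with the very next code unit.
            out.push_back(ReplacementCharacter);
        }
    }
    return out;
}
```

Under these assumptions, the input `{ 'a', 0xD800, 'b' }` converts to `{ 'a', 0xFFFD, 'b' }`, while the well-formed pair `{ 0xD83D, 0xDE00 }` converts to the single code point `0x1F600`.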

qalgorithms
qarraydata
qbitarray
qbytearray
qbytearraymatcher
qbytedatabuffer
qcache
qchar
qcollator
qcommandlineparser
qcontiguouscache
qcryptographichash
qdate
qdatetime
qeasingcurve
qelapsedtimer
qexplicitlyshareddatapointer
qfreelist
qhash
qline
qlinkedlist
qlist
qlocale
qmap
qmargins
qmessageauthenticationcode
qpair
qpoint
qpointf
qqueue
qrect
qregexp
qregularexpression
qringbuffer
qscopedpointer
qscopedvaluerollback
qset
qsharedpointer
qsize
qsizef
qstl
qstring
qstring_no_cast_from_bytearray
qstringbuilder
qstringiterator
qstringlist
qstringmatcher
qstringref
qtextboundaryfinder
qtime
qtimeline
qtimezone
qvarlengtharray
qvector
tools.pro