Add an SSE4.2 even simpler version of toLatin1

Use the new PCMPESTRM instruction (Parallel CoMPare Explicit-length
STRings with result in a Mask) which is added in SSE4.2 for
facilitating string operations. The "compare ranges" mode allows us to
search for characters outside the Latin 1 range and then use the
SSE4.1 PBLENDVB instruction to replace those with question marks.

Unlike previous SSE compare instructions, the PCMPxSTRx family allows
us to operate on unsigned 16-bit values. This saves us another
parallel add.

Reviewed-By: Samuel Rødal
(cherry picked from commit 45d2d36c9dbcbce403c78838ea52acd1ab111b68)

Change-Id: I0f9d864f9d19c0f0da43ccb6918dc28295074496
Reviewed-on: http://codereview.qt-project.org/4468
Reviewed-by: Qt Sanity Bot <qt_sanity_bot@ovi.com>
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@nokia.com>
Reviewed-by: Samuel Rødal <samuel.rodal@nokia.com>
This commit is contained in:
Thiago Macieira 2010-12-22 19:52:48 +01:00 committed by Qt by Nokia
parent 526c851902
commit 83ba0d56f8
2 changed files with 27 additions and 0 deletions

View File

@ -3540,6 +3540,28 @@ static inline __m128i mergeQuestionMarks(__m128i chunk)
{
const __m128i questionMark = _mm_set1_epi16('?');
# ifdef __SSE4_2__
// compare the unsigned shorts for the range 0x0100-0xFFFF
// note on the use of _mm_cmpestrm:
// The MSDN documentation online (http://technet.microsoft.com/en-us/library/bb514080.aspx)
// says for range search the following:
// For each character c in a, determine whether b0 <= c <= b1 or b2 <= c <= b3
//
// However, all examples on the Internet, including from Intel
// (see http://software.intel.com/en-us/articles/xml-parsing-accelerator-with-intel-streaming-simd-extensions-4-intel-sse4/)
// put the range to be searched first
//
// Disassembly and instruction-level debugging with GCC and ICC show
// that they are doing the right thing. Inverting the arguments in the
// instruction does cause a bunch of test failures.
const int mode = _SIDD_UWORD_OPS | _SIDD_CMP_RANGES | _SIDD_UNIT_MASK;
const __m128i rangeMatch = _mm_cvtsi32_si128(0xffff0100);
const __m128i offLimitMask = _mm_cmpestrm(rangeMatch, 2, chunk, 8, mode);
// replace the non-Latin 1 characters in the chunk with question marks
chunk = _mm_blendv_epi8(chunk, questionMark, offLimitMask);
# else
// SSE has no compare instruction for unsigned comparison.
// The variables must be shiffted + 0x8000 to be compared
const __m128i signedBitOffset = _mm_set1_epi16(0x8000);
@ -3563,6 +3585,7 @@ static inline __m128i mergeQuestionMarks(__m128i chunk)
// merge offLimitQuestionMark and correctBytes to have the result
chunk = _mm_or_si128(correctBytes, offLimitQuestionMark);
# endif
# endif
return chunk;
}
#endif

View File

@ -3462,6 +3462,10 @@ void tst_QString::toLatin1Roundtrip_data()
static const ushort unicode6[] = { 0x180, 0x1ff, 0x8001, 0x8080, 0xfffc };
QTest::newRow("non-latin1b") << QByteArray("?????") << QString::fromUtf16(unicode6, 5) << questionmarks;
static const ushort unicode7[] = { 'H', 'e', 'l', 'l', 'o', 0x100, 0x17f, 0x180, 0x8080, 0xfffc };
static const ushort unicode7q[] = { 'H', 'e', 'l', 'l', 'o', '?', '?', '?', '?', '?' };
QTest::newRow("mixed") << QByteArray("Hello?????") << QString::fromUtf16(unicode7, 10) << QString::fromUtf16(unicode7q, 10);
}
void tst_QString::toLatin1Roundtrip()