UTF-8: always store the SIMD result, even if invalid

For ASCII content, this improves throughput in principle: the conditional
is no longer on the code path leading to the store, so the processor can
perform the store while it executes the movemask operation. In practice
the gain is mostly theoretical: benchmarking with mostly ASCII content
shows the algorithm running within 0.5% of the previous result, which is
within noise.

For non-ASCII content, the trade-off is the cost of a 16-byte store
(whose output may be completely overwritten) against the cost of the
removed loop, which copied one character at a time while shifting the
mask. Benchmarking shows a slight gain of a few percent.
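
As a rough illustration of the store-before-branch idea, the sketch below
packs 16 UTF-16 code units to bytes, stores all 16 bytes unconditionally,
and only then computes the non-ASCII mask. The name packAndStore and its
signature are made up for this example and are not part of the Qt sources;
it assumes an x86 target with SSE2.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only (not the Qt function).
// Packs 16 UTF-16 code units from src into 16 bytes at dst and returns a
// mask with one bit set per code unit that is not plain ASCII (or is NUL).
// A return value of 0 means all 16 bytes written to dst are valid.
static std::uint16_t packAndStore(unsigned char *dst, const std::uint16_t *src)
{
    __m128i data1 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
    __m128i data2 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + 8));

    // Saturating pack: values above 0xFF become 0xFF, values at or above
    // 0x8000 (negative as signed 16-bit) become 0x00.
    __m128i packed = _mm_packus_epi16(data1, data2);

    // Store first, with no dependency on the validity test below.
    _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), packed);

    // Signed compare: true only for bytes in the range 0x01..0x7F.
    __m128i ascii = _mm_cmpgt_epi8(packed, _mm_setzero_si128());
    return static_cast<std::uint16_t>(~_mm_movemask_epi8(ascii));
}
```

When the returned mask is non-zero, only the bytes below the lowest set
bit are meaningful; the caller is expected to overwrite the rest.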

Change-Id: I28ef0021dffc725a922c539cc5976db367f36e78
Reviewed-by: Allan Sandfeld Jensen <allan.jensen@digia.com>
Thiago Macieira 2014-02-21 16:26:32 -08:00 committed by The Qt Project
parent 8917179b47
commit d0c38291eb

@@ -74,25 +74,22 @@ static inline bool simdEncodeAscii(uchar *&dst, const ushort *&nextAscii, const
         __m128i packed = _mm_packus_epi16(data1, data2);
         __m128i nonAscii = _mm_cmpgt_epi8(packed, _mm_setzero_si128());
+        // store, even if there are non-ASCII characters here
+        _mm_storeu_si128((__m128i*)dst, packed);
         // n will contain 1 bit set per character in [data1, data2] that is non-ASCII (or NUL)
         ushort n = ~_mm_movemask_epi8(nonAscii);
         if (n) {
-            // copy the front part that is still ASCII
-            while (!(n & 1)) {
-                *dst++ = *src++;
-                n >>= 1;
-            }
             // find the next probable ASCII character
             // we don't want to load 32 bytes again in this loop if we know there are non-ASCII
             // characters still coming
-            n = _bit_scan_reverse(n);
-            nextAscii = src + n + 1;
+            nextAscii = src + _bit_scan_reverse(n) + 1;
+            n = _bit_scan_forward(n);
+            dst += n;
+            src += n;
             return false;
         }
-        // pack
-        _mm_storeu_si128((__m128i*)dst, packed);
     }
     return src == end;
 }
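
To make the mask arithmetic in the hunk concrete, here is a small sketch
of what the two bit scans yield for a non-zero mask n: the lowest set bit
gives the number of leading ASCII characters already written by the store,
and the highest set bit plus one gives the offset at which the next
all-ASCII run could begin. The struct and function names are hypothetical,
and the portable loops stand in for _bit_scan_forward and
_bit_scan_reverse so the example does not depend on a particular compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for illustration only.
struct BlockAdvance {
    int asciiPrefix;      // how far dst and src advance (lowest set bit)
    int nextAsciiOffset;  // offset from the block start for nextAscii
};

// Portable stand-in for _bit_scan_forward; n must be non-zero.
static int lowestSetBit(std::uint16_t n)
{
    int i = 0;
    while (!((n >> i) & 1))
        ++i;
    return i;
}

// Portable stand-in for _bit_scan_reverse; n must be non-zero.
static int highestSetBit(std::uint16_t n)
{
    int i = 15;
    while (!((n >> i) & 1))
        --i;
    return i;
}

// n has one bit set per non-ASCII (or NUL) character in a 16-character block.
static BlockAdvance advanceFor(std::uint16_t n)
{
    assert(n != 0);
    return { lowestSetBit(n), highestSetBit(n) + 1 };
}
```

For example, a mask with only bit 5 set means five ASCII characters were
stored validly and the scan for ASCII may resume at offset 6.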