LZ4_compress_HC_destSize() had a tendency
to discard its last match when this match overflowed specified dstBuffer limit.
The impact is generally moderate,
but occasionally huge,
typically when this last match is very large
(such as compressing a bunch of zeroes).
Issue #784 fixed for both Chain and Opt implementations.
Added a unit test suggested by @remittor checking this topic.
LZ4_decompress_safe_partial()
now also supports a scenario where
nb_bytes_to_generate is <= block_decompressed_size
And
nb_bytes_to_read is >= block_compressed_size.
Previously, the only supported scenario was
nb_bytes_to_read == block_compress_size.
Pay attention that,
if nb_bytes_to_read is > block_compressed_size,
then, necessarily, it requires that
nb_bytes_to_generate is <= block_decompress_size.
If both are larger, it will generate corrupted data.
which serves no more purpose.
The comment implies that the simple presence of this unused function was affecting performance,
and that's the reason why it was not removed earlier.
This is likely another side effect of instruction alignment.
It's obviously unreliable to rely on it in this way,
meaning that the impact will be different, positive of negative,
with any minor code change, and any minor compiler version change, even parameter change.
EndMark, the 4-bytes value indicating the end of frame,
must be `0x00000000`.
Previously, it was just mentioned as a `0-size` block.
But such definition could encompass uncompressed blocks of size 0,
with a header of value `0x80000000`.
But the intention was to also support uncompressed empty blocks.
They could be used as a keep-alive signal.
Note that compressed empty blocks are already supported,
it's just that they have a size 1 instead of 0 (for the `0` token).
Unfortunately, the decoder implementation was also wrong,
and would also interpret a `0x80000000` block header as an endMark.
This issue evaded detection so far simply because
this situation never happens, as LZ4Frame always issues
a clean 0x00000000 value as a endMark.
It also does not flush empty blocks.
This is fixed in this PR.
The decoder can now deal with empty uncompressed blocks,
and do not confuse them with EndMark.
The specification is also clarified.
Finally, FrameTest is updated to randomly insert empty blocks during fuzzing.
`LZ4_memcpy()` uses `__builtin_memcpy()` to ensure that clang/gcc
can inline the `memcpy()` calls in freestanding mode.
This is necessary for decompressing the Linux Kernel with LZ4.
Without an analogous patch decompression ran at 77 MB/s, and with
the patch it ran at 884 MB/s.
Similar work in the kernel:
https://patchwork.kernel.org/patch/11351499/
UBsan (+clang-10) complains about doing pointer arithmetic (adding 0)
to a nullpointer.
This patch is tested with clang-10+ubsan
Define 0-argument functions like foo(void) instead of foo(), in order
to avoid a warning with -Wold-style-definition. This makes it easier
to embed lz4.c in projects that compile with -Werror
-Wold-style-definition.