Clarifies and fix EndMark

EndMark, the 4-bytes value indicating the end of frame,
must be `0x00000000`.

Previously, it was just mentioned as a `0-size` block.
But such definition could encompass uncompressed blocks of size 0,
with a header of value `0x80000000`.

But the intention was to also support uncompressed empty blocks.
They could be used as a keep-alive signal.
Note that compressed empty blocks are already supported,
it's just that they have a size 1 instead of 0 (for the `0` token).

Unfortunately, the decoder implementation was also wrong,
and would also interpret a `0x80000000` block header as an endMark.

This issue evaded detection so far simply because
this situation never happens, as LZ4Frame always issues
a clean 0x00000000 value as a endMark.
It also does not flush empty blocks.

This is fixed in this PR.
The decoder can now deal with empty uncompressed blocks,
and do not confuse them with EndMark.
The specification is also clarified.
Finally, FrameTest is updated to randomly insert empty blocks during fuzzing.
This commit is contained in:
Yann Collet 2020-08-12 17:27:33 -07:00
parent f328e329b3
commit e9cfa49d0d
3 changed files with 55 additions and 30 deletions

View File

@ -16,7 +16,7 @@ Distribution of this document is unlimited.
### Version
1.6.1 (30/01/2018)
1.6.2 (12/08/2020)
Introduction
@ -75,7 +75,7 @@ __Frame Descriptor__
3 to 15 Bytes, to be detailed in its own paragraph,
as it is the most important part of the spec.
The combined __Magic Number__ and __Frame Descriptor__ fields are sometimes
The combined _Magic_Number_ and _Frame_Descriptor_ fields are sometimes
called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes.
__Data Blocks__
@ -85,14 +85,13 @@ Thats where compressed data is stored.
__EndMark__
The flow of blocks ends when the last data block has a size of “0”.
The size is expressed as a 32-bits value.
The flow of blocks ends when the last data block is announced with
an invalid size of “0”, expressed as a 32-bits value (exactly `0x00000000`).
__Content Checksum__
Content Checksum verify that the full content has been decoded correctly.
The content checksum is the result
of [xxh32() hash function](https://github.com/Cyan4973/xxHash)
_Content_Checksum_ verify that the full content has been decoded correctly.
The content checksum is the result of [xxHash-32 algorithm]
digesting the original (decoded) data as input, and a seed of zero.
Content checksum is only present when its associated flag
is set in the frame descriptor.
@ -101,7 +100,7 @@ that all blocks were fully transmitted in the correct order and without error,
and also that the encoding/decoding process itself generated no distortion.
Its usage is recommended.
The combined __EndMark__ and __Content Checksum__ fields might sometimes be
The combined _EndMark_ and _Content_Checksum_ fields might sometimes be
referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes.
__Frame Concatenation__
@ -261,16 +260,24 @@ __Block Size__
This field uses 4-bytes, format is little-endian.
The highest bit is “1” if data in the block is uncompressed.
If highest bit is set (`1`), the block is uncompressed.
The highest bit is “0” if data in the block is compressed by LZ4.
If highest bit is not set (`0`), the block is LZ4-compressed,
using the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).
All other bits give the size, in bytes, of the following data block.
All other bits give the size, in bytes, of the data section.
The size does not include the block checksum if present.
Block Size shall never be larger than Block Maximum Size.
Such a thing could potentially happen for non-compressible sources.
In such a case, such data block shall be passed using uncompressed format.
_Block_Size_ shall never be larger than _Block_Maximum_Size_.
Such an outcome could potentially happen for non-compressible sources.
In such a case, such data block must be passed using uncompressed format.
A value of `0x00000000` is invalid, and signifies an _EndMark_ instead.
Note that this is different from a value of `0x80000000` (highest bit set),
which is an uncompressed block of size 0 (empty),
which is valid, and therefore doesn't end a frame.
Note that, if _Block_checksum_ is enabled,
even an empty block must be followed by a 32-bit block checksum.
__Data__
@ -279,20 +286,22 @@ It might be compressed or not, depending on previous field indications.
When compressed, the data must respect the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).
Note that the block is not necessarily full.
Uncompressed size of data can be any size, up to "Block Maximum Size”,
Note that a block is not necessarily full.
Uncompressed size of data can be any size __up to__ _Block_Maximum_Size_,
so it may contain less data than the maximum block size.
__Block checksum__
Only present if the associated flag is set.
This is a 4-bytes checksum value, in little endian format,
calculated by using the xxHash-32 algorithm on the raw (undecoded) data block,
calculated by using the [xxHash-32 algorithm] on the __raw__ (undecoded) data block,
and a seed of zero.
The intention is to detect data corruption (storage or transmission errors)
before decoding.
Block checksum is cumulative with Content checksum.
_Block_checksum_ can be cumulative with _Content_checksum_.
[xxHash-32 algorithm]: https://github.com/Cyan4973/xxHash/blob/release/doc/xxhash_spec.md
Skippable Frames
@ -389,6 +398,8 @@ and trigger an error if it does not fit within acceptable range.
Version changes
---------------
1.6.2 : clarifies specification of _EndMark_
1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer"
1.6.0 : restored Dictionary ID field in Frame header

View File

@ -1483,14 +1483,16 @@ size_t LZ4F_decompress(LZ4F_dctx* dctx,
} /* if (dctx->dStage == dstage_storeBlockHeader) */
/* decode block header */
{ size_t const nextCBlockSize = LZ4F_readLE32(selectedIn) & 0x7FFFFFFFU;
{ U32 const blockHeader = LZ4F_readLE32(selectedIn);
size_t const nextCBlockSize = blockHeader & 0x7FFFFFFFU;
size_t const crcSize = dctx->frameInfo.blockChecksumFlag * BFSize;
if (nextCBlockSize==0) { /* frameEnd signal, no more block */
if (blockHeader==0) { /* frameEnd signal, no more block */
dctx->dStage = dstage_getSuffix;
break;
}
if (nextCBlockSize > dctx->maxBlockSize)
if (nextCBlockSize > dctx->maxBlockSize) {
return err0r(LZ4F_ERROR_maxBlockSize_invalid);
}
if (LZ4F_readLE32(selectedIn) & LZ4F_BLOCKUNCOMPRESSED_FLAG) {
/* next block is uncompressed */
dctx->tmpInTarget = nextCBlockSize;

View File

@ -995,13 +995,13 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
BYTE* op = (BYTE*)compressedBuffer;
BYTE* const oend = op + (neverFlush ? LZ4F_compressFrameBound(srcSize, prefsPtr) : compressedBufferSize); /* when flushes are possible, can't guarantee a max compressed size */
unsigned const maxBits = FUZ_highbit((U32)srcSize);
size_t cSegmentSize;
LZ4F_compressOptions_t cOptions;
memset(&cOptions, 0, sizeof(cOptions));
cSegmentSize = LZ4F_compressBegin(cCtx, op, (size_t)(oend-op), prefsPtr);
CHECK(LZ4F_isError(cSegmentSize), "Compression header failed (error %i)",
(int)cSegmentSize);
op += cSegmentSize;
{ size_t const fhSize = LZ4F_compressBegin(cCtx, op, (size_t)(oend-op), prefsPtr);
CHECK(LZ4F_isError(fhSize), "Compression header failed (error %i)",
(int)fhSize);
op += fhSize;
}
while (ip < iend) {
unsigned const nbBitsSeg = FUZ_rand(&randState) % maxBits;
size_t const sampleMax = (FUZ_rand(&randState) & ((1<<nbBitsSeg)-1)) + 1;
@ -1024,8 +1024,20 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
DISPLAYLEVEL(6,"flushing %u bytes \n", (unsigned)flushSize);
CHECK(LZ4F_isError(flushSize), "Compression failed (error %i)", (int)flushSize);
op += flushSize;
} }
}
if ((FUZ_rand(&randState) % 1024) == 3) {
/* add an empty block (requires uncompressed flag) */
op[0] = op[1] = op[2] = 0;
op[3] = 0x80; /* 0x80000000U in little-endian format */
op += 4;
if ((prefsPtr!= NULL) && prefsPtr->frameInfo.blockChecksumFlag) {
U32 const bc32 = XXH32(op, 0, 0);
op[0] = (BYTE)bc32; /* little endian format */
op[1] = (BYTE)(bc32>>8);
op[2] = (BYTE)(bc32>>16);
op[3] = (BYTE)(bc32>>24);
op += 4;
} } } }
} /* while (ip<iend) */
CHECK(op>=oend, "LZ4F_compressFrameBound overflow");
{ size_t const dstEndSafeSize = LZ4F_compressBound(0, prefsPtr);
int const tooSmallDstEnd = ((FUZ_rand(&randState) & 31) == 3);
@ -1086,8 +1098,8 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
DISPLAYLEVEL(6, "noisy decompression \n");
test_lz4f_decompression(compressedBuffer, cSize, srcStart, srcSize, crcOrig, &randState, dCtxNoise, seed, testNb);
/* note : we don't analyze result here : it probably failed, which is expected.
* We just check for potential out-of-bound reads and writes. */
LZ4F_resetDecompressionContext(dCtxNoise); /* context must be reset after an error */
* The sole purpose is to catch potential out-of-bound reads and writes. */
LZ4F_resetDecompressionContext(dCtxNoise); /* context must be reset after an error */
#endif
} /* for ( ; (testNb < nbTests) ; ) */