Merge pull request #899 from lz4/endMark

Clarifies and fix EndMark
Yann Collet 2020-08-14 15:48:21 -07:00 committed by GitHub
commit 9a6e93859d
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 55 additions and 30 deletions

@ -16,7 +16,7 @@ Distribution of this document is unlimited.
### Version ### Version
1.6.1 (30/01/2018) 1.6.2 (12/08/2020)
Introduction Introduction
@ -75,7 +75,7 @@ __Frame Descriptor__
3 to 15 Bytes, to be detailed in its own paragraph, 3 to 15 Bytes, to be detailed in its own paragraph,
as it is the most important part of the spec. as it is the most important part of the spec.
The combined __Magic Number__ and __Frame Descriptor__ fields are sometimes The combined _Magic_Number_ and _Frame_Descriptor_ fields are sometimes
called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes. called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes.
__Data Blocks__ __Data Blocks__
@ -85,14 +85,13 @@ Thats where compressed data is stored.
__EndMark__ __EndMark__
The flow of blocks ends when the last data block has a size of “0”. The flow of blocks ends when the last data block is followed by
The size is expressed as a 32-bits value. the 32-bit value `0x00000000`.
__Content Checksum__ __Content Checksum__
Content Checksum verify that the full content has been decoded correctly. _Content_Checksum_ verify that the full content has been decoded correctly.
The content checksum is the result The content checksum is the result of [xxHash-32 algorithm]
of [xxh32() hash function](https://github.com/Cyan4973/xxHash)
digesting the original (decoded) data as input, and a seed of zero. digesting the original (decoded) data as input, and a seed of zero.
Content checksum is only present when its associated flag Content checksum is only present when its associated flag
is set in the frame descriptor. is set in the frame descriptor.
@ -101,7 +100,7 @@ that all blocks were fully transmitted in the correct order and without error,
and also that the encoding/decoding process itself generated no distortion. and also that the encoding/decoding process itself generated no distortion.
Its usage is recommended. Its usage is recommended.
The combined __EndMark__ and __Content Checksum__ fields might sometimes be The combined _EndMark_ and _Content_Checksum_ fields might sometimes be
referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes. referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes.
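The 4-to-8-byte range quoted for the footer follows from a fixed 4-byte _EndMark_ plus an optional 4-byte _Content_Checksum_. A minimal sketch of that arithmetic, where `frame_footer_size` is a hypothetical helper name and not part of the LZ4 API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of the LZ4 Frame Footer.
 * contentChecksumFlag is assumed to be the 0/1 flag from the frame descriptor. */
static size_t frame_footer_size(int contentChecksumFlag)
{
    assert(contentChecksumFlag == 0 || contentChecksumFlag == 1);
    /* EndMark is always 4 bytes; the content checksum adds 4 more when enabled. */
    return (size_t)4 + (contentChecksumFlag ? 4 : 0);
}
```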
__Frame Concatenation__ __Frame Concatenation__
@ -261,16 +260,24 @@ __Block Size__
This field uses 4-bytes, format is little-endian. This field uses 4-bytes, format is little-endian.
The highest bit is “1” if data in the block is uncompressed. If the highest bit is set (`1`), the block is uncompressed.
The highest bit is “0” if data in the block is compressed by LZ4. If the highest bit is not set (`0`), the block is LZ4-compressed,
using the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).
All other bits give the size, in bytes, of the following data block. All other bits give the size, in bytes, of the data section.
The size does not include the block checksum if present. The size does not include the block checksum if present.
Block Size shall never be larger than Block Maximum Size. _Block_Size_ shall never be larger than _Block_Maximum_Size_.
Such a thing could potentially happen for non-compressible sources. Such an outcome could potentially happen for non-compressible sources.
In such a case, such data block shall be passed using uncompressed format. In such a case, such data block must be passed using uncompressed format.
A value of `0x00000000` is invalid, and signifies an _EndMark_ instead.
Note that this is different from a value of `0x80000000` (highest bit set),
which is an uncompressed block of size 0 (empty),
which is valid, and therefore doesn't end a frame.
Note that, if _Block_checksum_ is enabled,
even an empty block must be followed by a 32-bit block checksum.
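The distinction drawn in the added lines above, between `0x00000000` (an _EndMark_) and `0x80000000` (an empty uncompressed block), can be sketched as a small header classifier. The names `BlockKind`, `read_le32` and `classify_block_header` are hypothetical and only illustrate the bit layout described in the spec text; this is not the library's decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a 4-byte block header. */
typedef enum { BLOCK_ENDMARK, BLOCK_COMPRESSED, BLOCK_UNCOMPRESSED } BlockKind;

/* Read a 32-bit little-endian value, independent of host endianness. */
static uint32_t read_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify the header and report the size of the data section in *dataSize.
 * The whole 32-bit value is tested for the EndMark, not just the size bits,
 * so 0x80000000 is treated as an empty uncompressed block. */
static BlockKind classify_block_header(const uint8_t hdr[4], uint32_t* dataSize)
{
    uint32_t h;
    assert(hdr != NULL && dataSize != NULL);
    h = read_le32(hdr);
    *dataSize = h & 0x7FFFFFFFU;
    if (h == 0) return BLOCK_ENDMARK;
    return (h & 0x80000000U) ? BLOCK_UNCOMPRESSED : BLOCK_COMPRESSED;
}
```

A caller would still compare `*dataSize` against _Block_Maximum_Size_ and, if block checksums are enabled, read 4 more bytes after the data section, even when that section is empty.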
__Data__ __Data__
@ -279,20 +286,22 @@ It might be compressed or not, depending on previous field indications.
When compressed, the data must respect the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md). When compressed, the data must respect the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).
Note that the block is not necessarily full. Note that a block is not necessarily full.
Uncompressed size of data can be any size, up to "Block Maximum Size”, Uncompressed size of data can be any size __up to__ _Block_Maximum_Size_,
so it may contain less data than the maximum block size. so it may contain less data than the maximum block size.
__Block checksum__ __Block checksum__
Only present if the associated flag is set. Only present if the associated flag is set.
This is a 4-bytes checksum value, in little endian format, This is a 4-bytes checksum value, in little endian format,
calculated by using the xxHash-32 algorithm on the raw (undecoded) data block, calculated by using the [xxHash-32 algorithm] on the __raw__ (undecoded) data block,
and a seed of zero. and a seed of zero.
The intention is to detect data corruption (storage or transmission errors) The intention is to detect data corruption (storage or transmission errors)
before decoding. before decoding.
Block checksum is cumulative with Content checksum. _Block_checksum_ can be cumulative with _Content_checksum_.
[xxHash-32 algorithm]: https://github.com/Cyan4973/xxHash/blob/release/doc/xxhash_spec.md
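To make the block checksum rule concrete for the empty-block case, here is a self-contained sketch that writes an empty uncompressed block, followed by its checksum when the flag is set. `xxh32_sketch` is an illustrative re-implementation of the xxHash-32 algorithm written for this example (real code would call `XXH32()` from the xxHash library), and `write_empty_block` is a hypothetical helper, not an LZ4 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P1 0x9E3779B1U
#define P2 0x85EBCA77U
#define P3 0xC2B2AE3DU
#define P4 0x27D4EB2FU
#define P5 0x165667B1U

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

static uint32_t le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Illustrative xxHash-32 (assumption: stands in for the library's XXH32). */
static uint32_t xxh32_sketch(const uint8_t* p, size_t len, uint32_t seed)
{
    const uint8_t* end;
    uint32_t h;
    assert(p != NULL);
    end = p + len;
    if (len >= 16) {
        uint32_t v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
        do {  /* consume 16-byte stripes, one 32-bit lane per accumulator */
            v1 = rotl32(v1 + le32(p) * P2, 13) * P1; p += 4;
            v2 = rotl32(v2 + le32(p) * P2, 13) * P1; p += 4;
            v3 = rotl32(v3 + le32(p) * P2, 13) * P1; p += 4;
            v4 = rotl32(v4 + le32(p) * P2, 13) * P1; p += 4;
        } while ((size_t)(end - p) >= 16);
        h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
    } else {
        h = seed + P5;
    }
    h += (uint32_t)len;
    while ((size_t)(end - p) >= 4) { h = rotl32(h + le32(p) * P3, 17) * P4; p += 4; }
    while (p < end) { h = rotl32(h + (uint32_t)(*p) * P5, 11) * P1; p++; }
    h ^= h >> 15; h *= P2;   /* final avalanche */
    h ^= h >> 13; h *= P3;
    h ^= h >> 16;
    return h;
}

/* Hypothetical helper: emit an empty uncompressed block at op.
 * Returns the number of bytes written (4, or 8 with a block checksum). */
static size_t write_empty_block(uint8_t* op, int blockChecksumFlag)
{
    size_t n;
    assert(op != NULL);
    op[0] = op[1] = op[2] = 0;
    op[3] = 0x80;                       /* 0x80000000 in little-endian format */
    n = 4;
    if (blockChecksumFlag) {
        uint32_t const c = xxh32_sketch(op + n, 0, 0);  /* hash of zero bytes, seed 0 */
        op[n + 0] = (uint8_t)c;         /* checksum stored little-endian */
        op[n + 1] = (uint8_t)(c >> 8);
        op[n + 2] = (uint8_t)(c >> 16);
        op[n + 3] = (uint8_t)(c >> 24);
        n += 4;
    }
    return n;
}
```

The checksum covers the raw data section only, so for an empty block it is the xxHash-32 of zero bytes with seed zero.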
Skippable Frames Skippable Frames
@ -389,6 +398,8 @@ and trigger an error if it does not fit within acceptable range.
Version changes Version changes
--------------- ---------------
1.6.2 : clarifies specification of _EndMark_
1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer" 1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer"
1.6.0 : restored Dictionary ID field in Frame header 1.6.0 : restored Dictionary ID field in Frame header

@ -1483,14 +1483,16 @@ size_t LZ4F_decompress(LZ4F_dctx* dctx,
} /* if (dctx->dStage == dstage_storeBlockHeader) */ } /* if (dctx->dStage == dstage_storeBlockHeader) */
/* decode block header */ /* decode block header */
{ size_t const nextCBlockSize = LZ4F_readLE32(selectedIn) & 0x7FFFFFFFU; { U32 const blockHeader = LZ4F_readLE32(selectedIn);
size_t const nextCBlockSize = blockHeader & 0x7FFFFFFFU;
size_t const crcSize = dctx->frameInfo.blockChecksumFlag * BFSize; size_t const crcSize = dctx->frameInfo.blockChecksumFlag * BFSize;
if (nextCBlockSize==0) { /* frameEnd signal, no more block */ if (blockHeader==0) { /* frameEnd signal, no more block */
dctx->dStage = dstage_getSuffix; dctx->dStage = dstage_getSuffix;
break; break;
} }
if (nextCBlockSize > dctx->maxBlockSize) if (nextCBlockSize > dctx->maxBlockSize) {
return err0r(LZ4F_ERROR_maxBlockSize_invalid); return err0r(LZ4F_ERROR_maxBlockSize_invalid);
}
if (LZ4F_readLE32(selectedIn) & LZ4F_BLOCKUNCOMPRESSED_FLAG) { if (LZ4F_readLE32(selectedIn) & LZ4F_BLOCKUNCOMPRESSED_FLAG) {
/* next block is uncompressed */ /* next block is uncompressed */
dctx->tmpInTarget = nextCBlockSize; dctx->tmpInTarget = nextCBlockSize;

@ -995,13 +995,13 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
BYTE* op = (BYTE*)compressedBuffer; BYTE* op = (BYTE*)compressedBuffer;
BYTE* const oend = op + (neverFlush ? LZ4F_compressFrameBound(srcSize, prefsPtr) : compressedBufferSize); /* when flushes are possible, can't guarantee a max compressed size */ BYTE* const oend = op + (neverFlush ? LZ4F_compressFrameBound(srcSize, prefsPtr) : compressedBufferSize); /* when flushes are possible, can't guarantee a max compressed size */
unsigned const maxBits = FUZ_highbit((U32)srcSize); unsigned const maxBits = FUZ_highbit((U32)srcSize);
size_t cSegmentSize;
LZ4F_compressOptions_t cOptions; LZ4F_compressOptions_t cOptions;
memset(&cOptions, 0, sizeof(cOptions)); memset(&cOptions, 0, sizeof(cOptions));
cSegmentSize = LZ4F_compressBegin(cCtx, op, (size_t)(oend-op), prefsPtr); { size_t const fhSize = LZ4F_compressBegin(cCtx, op, (size_t)(oend-op), prefsPtr);
CHECK(LZ4F_isError(cSegmentSize), "Compression header failed (error %i)", CHECK(LZ4F_isError(fhSize), "Compression header failed (error %i)",
(int)cSegmentSize); (int)fhSize);
op += cSegmentSize; op += fhSize;
}
while (ip < iend) { while (ip < iend) {
unsigned const nbBitsSeg = FUZ_rand(&randState) % maxBits; unsigned const nbBitsSeg = FUZ_rand(&randState) % maxBits;
size_t const sampleMax = (FUZ_rand(&randState) & ((1<<nbBitsSeg)-1)) + 1; size_t const sampleMax = (FUZ_rand(&randState) & ((1<<nbBitsSeg)-1)) + 1;
@ -1024,8 +1024,20 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
DISPLAYLEVEL(6,"flushing %u bytes \n", (unsigned)flushSize); DISPLAYLEVEL(6,"flushing %u bytes \n", (unsigned)flushSize);
CHECK(LZ4F_isError(flushSize), "Compression failed (error %i)", (int)flushSize); CHECK(LZ4F_isError(flushSize), "Compression failed (error %i)", (int)flushSize);
op += flushSize; op += flushSize;
} } if ((FUZ_rand(&randState) % 1024) == 3) {
} /* add an empty block (requires uncompressed flag) */
op[0] = op[1] = op[2] = 0;
op[3] = 0x80; /* 0x80000000U in little-endian format */
op += 4;
if ((prefsPtr!= NULL) && prefsPtr->frameInfo.blockChecksumFlag) {
U32 const bc32 = XXH32(op, 0, 0);
op[0] = (BYTE)bc32; /* little endian format */
op[1] = (BYTE)(bc32>>8);
op[2] = (BYTE)(bc32>>16);
op[3] = (BYTE)(bc32>>24);
op += 4;
} } } }
} /* while (ip<iend) */
CHECK(op>=oend, "LZ4F_compressFrameBound overflow"); CHECK(op>=oend, "LZ4F_compressFrameBound overflow");
{ size_t const dstEndSafeSize = LZ4F_compressBound(0, prefsPtr); { size_t const dstEndSafeSize = LZ4F_compressBound(0, prefsPtr);
int const tooSmallDstEnd = ((FUZ_rand(&randState) & 31) == 3); int const tooSmallDstEnd = ((FUZ_rand(&randState) & 31) == 3);
@ -1086,8 +1098,8 @@ int fuzzerTests(U32 seed, unsigned nbTests, unsigned startTest, double compressi
DISPLAYLEVEL(6, "noisy decompression \n"); DISPLAYLEVEL(6, "noisy decompression \n");
test_lz4f_decompression(compressedBuffer, cSize, srcStart, srcSize, crcOrig, &randState, dCtxNoise, seed, testNb); test_lz4f_decompression(compressedBuffer, cSize, srcStart, srcSize, crcOrig, &randState, dCtxNoise, seed, testNb);
/* note : we don't analyze result here : it probably failed, which is expected. /* note : we don't analyze result here : it probably failed, which is expected.
* We just check for potential out-of-bound reads and writes. */ * The sole purpose is to catch potential out-of-bound reads and writes. */
LZ4F_resetDecompressionContext(dCtxNoise); /* context must be reset after an error */ LZ4F_resetDecompressionContext(dCtxNoise); /* context must be reset after an error */
#endif #endif
} /* for ( ; (testNb < nbTests) ; ) */ } /* for ( ; (testNb < nbTests) ; ) */