changed format of lz4 block doc
parent
d8bb7a5bc3
commit
7b463b63c3
@@ -60,7 +60,7 @@ Benchmark evaluates the compression of reference [Silesia Corpus](http://sun.aei
|
|||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
The LZ4 block compression format is detailed within [lz4_block_format.txt](lz4_block_format.txt).
|
The LZ4 block compression format is detailed within [lz4_Block_format](lz4_Block_format.md).
|
||||||
|
|
||||||
For streaming unknown amount of data, and compress files of any size, a frame format has been published, and can be consulted within the file [LZ4_Frame_Format.html](LZ4_Frame_Format.html).
|
For streaming unknown amount of data, and compress files of any size, a frame format has been published, and can be consulted within the file [LZ4_Frame_Format.html](LZ4_Frame_Format.html).
|
||||||
|
|
||||||
|
@@ -4,7 +4,6 @@ Last revised: 2015-03-26.
|
|||||||
Author : Yann Collet
|
Author : Yann Collet
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
This small specification intents to provide enough information
|
This small specification intents to provide enough information
|
||||||
to anyone willing to produce LZ4-compatible compressed data blocks
|
to anyone willing to produce LZ4-compatible compressed data blocks
|
||||||
using any programming language.
|
using any programming language.
|
||||||
@@ -26,7 +25,8 @@ on implementation details of the compressor, and vice versa.
|
|||||||
Compressed block format
|
Compressed block format
|
||||||
-----------------------
|
-----------------------
|
||||||
An LZ4 compressed block is composed of sequences.
|
An LZ4 compressed block is composed of sequences.
|
||||||
Schematically, a sequence is a suite of literals, followed by a match copy.
|
A sequence is a suite of literals (not-compressed bytes),
|
||||||
|
followed by a match copy.
|
||||||
|
|
||||||
Each sequence starts with a token.
|
Each sequence starts with a token.
|
||||||
The token is a one byte value, separated into two 4-bits fields.
|
The token is a one byte value, separated into two 4-bits fields.
|
||||||
@@ -35,14 +35,14 @@ Therefore each field ranges from 0 to 15.
|
|||||||
|
|
||||||
The first field uses the 4 high-bits of the token.
|
The first field uses the 4 high-bits of the token.
|
||||||
It provides the length of literals to follow.
|
It provides the length of literals to follow.
|
||||||
(Note : a literal is a not-compressed byte).
|
|
||||||
If the field value is 0, then there is no literal.
|
If the field value is 0, then there is no literal.
|
||||||
If it is 15, then we need to add some more bytes to indicate the full length.
|
If it is 15, then we need to add some more bytes to indicate the full length.
|
||||||
Each additionnal byte then represent a value from 0 to 255,
|
Each additional byte then represent a value from 0 to 255,
|
||||||
which is added to the previous value to produce a total length.
|
which is added to the previous value to produce a total length.
|
||||||
When the byte value is 255, another byte is output.
|
When the byte value is 255, another byte is output.
|
||||||
There can be any number of bytes following the token. There is no "size limit".
|
There can be any number of bytes following the token. There is no "size limit".
|
||||||
(Sidenote this is why a not-compressible input block is expanded by 0.4%).
|
(Side note : this is why a not-compressible input block is expanded by 0.4%).
|
||||||
|
|
||||||
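The literal-length rule described in this hunk can be sketched as follows. This is an illustration in Python, not code from the LZ4 sources; the helper names `encode_length` and `decode_length` are hypothetical.

```python
def encode_length(length):
    """Return (4-bit field value, extra bytes) for a literal length.

    Hypothetical helper: lengths below 15 fit in the field itself;
    otherwise the field is 15 and extra bytes carry the remainder.
    """
    if length < 15:
        return length, b""
    remaining = length - 15
    extra = bytearray()
    # A byte of 255 means "another byte follows".
    while remaining >= 255:
        extra.append(255)
        remaining -= 255
    # The final byte is always below 255 and ends the sequence.
    extra.append(remaining)
    return 15, bytes(extra)


def decode_length(field, data, pos=0):
    """Return (length, next position) from the field and the bytes after the token."""
    length = field
    if field == 15:
        while True:
            byte = data[pos]
            pos += 1
            length += byte
            if byte != 255:
                break
    return length, pos
```

With these helpers, a length of 48 encodes as the field value 15 followed by a single extra byte of 33 (48 - 15), and a length of 270 needs two extra bytes, 255 and 0. The field value itself would be read from the token as `token >> 4`, since the text places it in the 4 high bits.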
Example 1 : A length of 48 will be represented as :
|
Example 1 : A length of 48 will be represented as :
|
||||||
- 15 : value for the 4-bits High field
|
- 15 : value for the 4-bits High field
|
||||||
@@ -65,7 +65,8 @@ It's possible that there are zero literal.
|
|||||||
Following the literals is the match copy operation.
|
Following the literals is the match copy operation.
|
||||||
|
|
||||||
It starts by the offset.
|
It starts by the offset.
|
||||||
This is a 2 bytes value, in little endian format.
|
This is a 2 bytes value, in little endian format
|
||||||
|
(the 1st byte is the "low" byte, the 2nd one is the "high" byte).
|
||||||
|
|
||||||
The offset represents the position of the match to be copied from.
|
The offset represents the position of the match to be copied from.
|
||||||
1 means "current position - 1 byte".
|
1 means "current position - 1 byte".
|
||||||
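As an illustration of the offset description above, here is a minimal Python sketch. The function names `read_offset` and `match_source` are hypothetical, and rejecting an offset of 0 is an assumption of this sketch rather than something shown in this hunk.

```python
def read_offset(data, pos):
    """Read the 2-byte little-endian offset stored at data[pos:pos + 2].

    The first byte is the "low" byte, the second one is the "high" byte.
    """
    return data[pos] | (data[pos + 1] << 8)


def match_source(current_pos, offset):
    """Return the index in the already-decoded output where the match copy starts.

    An offset of 1 means "current position - 1 byte".
    """
    # Assumption of this sketch: an offset of 0 cannot point at earlier data.
    if offset == 0:
        raise ValueError("offset 0 is treated as invalid in this sketch")
    return current_pos - offset
```

For instance, the bytes `0x34 0x12` read as the offset `0x1234`, and an offset of 1 at output position 10 points at position 9.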
@@ -95,9 +96,12 @@ Parsing restrictions
|
|||||||
-----------------------
|
-----------------------
|
||||||
There are specific parsing rules to respect in order to remain compatible
|
There are specific parsing rules to respect in order to remain compatible
|
||||||
with assumptions made by the decoder :
|
with assumptions made by the decoder :
|
||||||
1) The last 5 bytes are always literals
|
|
||||||
2) The last match must start at least 12 bytes before end of block
|
1. The last 5 bytes are always literals
|
||||||
Consequently, a block with less than 13 bytes cannot be compressed.
|
2. The last match must start at least 12 bytes before end of block.
|
||||||
|
|
||||||
|
Consequently, a block with less than 13 bytes cannot be compressed.
|
||||||
|
|
||||||
These rules are in place to ensure that the decoder
|
These rules are in place to ensure that the decoder
|
||||||
will never read beyond the input buffer, nor write beyond the output buffer.
|
will never read beyond the input buffer, nor write beyond the output buffer.
|
||||||
|
|
||||||
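The two parsing rules above could be checked on the compressor side roughly as follows. This is a sketch under stated assumptions: positions are taken to be byte indices into the uncompressed block, and `match_allowed` and the constant names are hypothetical, not taken from the LZ4 sources.

```python
LAST_LITERALS = 5       # rule 1: the last 5 bytes are always literals
MATCH_START_MARGIN = 12  # rule 2: the last match starts at least 12 bytes before the end
MIN_COMPRESSIBLE = 13   # consequence: smaller blocks cannot be compressed


def match_allowed(block_size, match_start, match_length):
    """Return True if a match covering the given span of the uncompressed
    block respects both parsing rules (as interpreted in this sketch)."""
    if block_size < MIN_COMPRESSIBLE:
        return False
    starts_early_enough = match_start <= block_size - MATCH_START_MARGIN
    leaves_literals = match_start + match_length <= block_size - LAST_LITERALS
    return starts_early_enough and leaves_literals
```

A compressor following this sketch would emit any bytes that fail the check as literals instead of as a match.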
@@ -118,4 +122,3 @@ or full optimal parsing.
|
|||||||
All these trade-off offer distinctive speed/memory/compression advantages.
|
All these trade-off offer distinctive speed/memory/compression advantages.
|
||||||
Whatever the method used by the compressor, its result will be decodable
|
Whatever the method used by the compressor, its result will be decodable
|
||||||
by any LZ4 decoder if it follows the format specification described above.
|
by any LZ4 decoder if it follows the format specification described above.
|
||||||
|
|