LZ4 Frame Format Description
============================

### Notices

Copyright (c) 2013-2015 Yann Collet

Permission is granted to copy and distribute this document
for any  purpose and without charge,
including translations into other  languages
and incorporation into compilations,
provided that the copyright notice and this notice are preserved,
and that any substantive changes or deletions from the original
are clearly marked.
Distribution of this document is unlimited.

### Version

1.6.2 (12/08/2020)


Introduction
------------

The purpose of this document is to define a lossless compressed data format,
that is independent of CPU type, operating system,
file system and character set, suitable for
file compression, pipe and streaming compression
using the [LZ4 algorithm](http://www.lz4.org).

The data can be produced or consumed,
even for an arbitrarily long sequentially presented input data stream,
using only an a priori bounded amount of intermediate storage,
and hence can be used in data communications.
The format uses the LZ4 compression method,
and optional [xxHash-32 checksum method](https://github.com/Cyan4973/xxHash),
for detection of data corruption.

The data format defined by this specification
does not attempt to allow random access to compressed data.

This specification is intended for use by implementers of software
to compress data into LZ4 format and/or decompress data from LZ4 format.
The text of the specification assumes a basic background in programming
at the level of bits and other primitive data representations.

Unless otherwise indicated below,
a compliant compressor must produce data sets
that conform to the specifications presented here.
It doesn’t need to support all options though.

A compliant decompressor must be able to decompress
at least one working set of parameters
that conforms to the specifications presented here.
It may also ignore checksums.
Whenever it does not support a specific parameter within the compressed stream,
it must produce a non-ambiguous error code
and associated error message explaining which parameter is unsupported.


General Structure of LZ4 Frame format
-------------------------------------

| MagicNb | F. Descriptor | Block | (...) | EndMark | C. Checksum |
|:-------:|:-------------:| ----- | ----- | ------- | ----------- |
| 4 bytes |  3-15 bytes   |       |       | 4 bytes | 0-4 bytes   |

__Magic Number__

4 Bytes, Little endian format.
Value : 0x184D2204
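
As an illustration, the magic number test can be sketched as follows. This is a minimal sketch, not the reference implementation; the names `LZ4_FRAME_MAGIC` and `has_lz4_frame_magic` are hypothetical and chosen for this example only.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204  # value given in this specification


def has_lz4_frame_magic(buf: bytes) -> bool:
    """Return True if buf starts with the LZ4 frame magic number."""
    if len(buf) < 4:
        return False
    # "<I" reads an unsigned 32-bit value in little-endian order.
    (magic,) = struct.unpack_from("<I", buf, 0)
    return magic == LZ4_FRAME_MAGIC
```

Because the field is little endian, the bytes appear on disk as `04 22 4D 18`.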

__Frame Descriptor__

3 to 15 Bytes, to be detailed in its own paragraph,
as it is the most important part of the spec.

The combined _Magic_Number_ and _Frame_Descriptor_ fields are sometimes
called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes.

__Data Blocks__

To be detailed in its own paragraph.
That’s where compressed data is stored.

__EndMark__

The flow of blocks ends when the last data block is followed by
the 32-bit value `0x00000000`.

__Content Checksum__

_Content_Checksum_ verifies that the full content has been decoded correctly.
The content checksum is the result of [xxHash-32 algorithm]
digesting the original (decoded) data as input, and a seed of zero.
Content checksum is only present when its associated flag
is set in the frame descriptor.
The content checksum validates that all blocks were fully transmitted,
in the correct order and without error,
and that the encoding/decoding process itself introduced no distortion.
Its usage is recommended.

The combined _EndMark_ and _Content_Checksum_ fields might sometimes be
referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes.

__Frame Concatenation__

In some circumstances, it may be preferable to append multiple frames,
for example in order to add new data to an existing compressed file
without re-framing it.

In such a case, each frame has its own set of descriptor flags.
Each frame is considered independent.
The only relation between frames is their sequential order.

The ability to decode multiple concatenated frames
within a single stream or file
is left outside of this specification.
As an example, the reference lz4 command line utility behavior is
to decode all concatenated frames in their sequential order.


Frame Descriptor
----------------

| FLG     | BD      | (Content Size) | (Dictionary ID) | HC      |
| ------- | ------- |:--------------:|:---------------:| ------- |
| 1 byte  | 1 byte  |  0 - 8 bytes   |   0 - 4 bytes   | 1 byte  |

The descriptor uses a minimum of 3 bytes,
and up to 15 bytes depending on optional parameters.

__FLG byte__

|  BitNb  |  7-6  |   5   |    4     |  3   |    2     |    1     |   0  |
| ------- |-------|-------|----------|------|----------|----------|------|
|FieldName|Version|B.Indep|B.Checksum|C.Size|C.Checksum|*Reserved*|DictID|


__BD byte__

|  BitNb  |     7    |     6-5-4     |  3-2-1-0 |
| ------- | -------- | ------------- | -------- |
|FieldName|*Reserved*| Block MaxSize |*Reserved*|

In the tables, bit 7 is highest bit, while bit 0 is lowest.
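
The two bit layouts above can be decoded with shifts and masks. The following is a minimal sketch; `FrameFlags` and `decode_flg_bd` are hypothetical names introduced for this example, and the function does not validate the version or the reserved bits.

```python
from typing import NamedTuple


class FrameFlags(NamedTuple):
    version: int
    block_independence: bool
    block_checksum: bool
    content_size: bool
    content_checksum: bool
    dict_id: bool
    block_max_size_code: int


def decode_flg_bd(flg: int, bd: int) -> FrameFlags:
    """Split the FLG and BD bytes into their fields (bit 7 is the highest bit)."""
    return FrameFlags(
        version=(flg >> 6) & 0x03,            # FLG bits 7-6
        block_independence=bool(flg & 0x20),  # FLG bit 5
        block_checksum=bool(flg & 0x10),      # FLG bit 4
        content_size=bool(flg & 0x08),        # FLG bit 3
        content_checksum=bool(flg & 0x04),    # FLG bit 2
        dict_id=bool(flg & 0x01),             # FLG bit 0
        block_max_size_code=(bd >> 4) & 0x07, # BD bits 6-5-4
    )
```

For example, `decode_flg_bd(0x64, 0x40)` describes a version `01` frame with independent blocks, a content checksum, and a Block MaxSize code of 4.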

__Version Number__

2-bits field, must be set to `01`.
Any other value cannot be decoded by this version of the specification.
Other version numbers will use different flag layouts.

__Block Independence flag__

If this flag is set to “1”, blocks are independent.
If this flag is set to “0”, each block depends on previous ones
(up to LZ4 window size, which is 64 KB).
In such a case, it’s necessary to decode all blocks in sequence.

Block dependency improves compression ratio, especially for small blocks.
On the other hand, it makes random access or multi-threaded decoding impossible.

__Block checksum flag__

If this flag is set, each data block will be followed by a 4-bytes checksum,
calculated by using the xxHash-32 algorithm on the raw (compressed) data block.
The intention is to detect data corruption (storage or transmission errors)
immediately, before decoding.
Block checksum usage is optional.

__Content Size flag__

If this flag is set, the uncompressed size of data included within the frame
will be present as an 8 bytes unsigned little endian value, after the flags.
Content Size usage is optional.

__Content checksum flag__

If this flag is set, a 32-bits content checksum will be appended
after the EndMark.

__Dictionary ID flag__

If this flag is set, a 4-bytes Dict-ID field will be present,
after the descriptor flags and the Content Size.
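
Since the optional fields are governed by the Content Size and Dictionary ID flags, the descriptor length follows directly from the FLG byte. A minimal sketch, with `frame_descriptor_size` as a hypothetical helper name:

```python
def frame_descriptor_size(flg: int) -> int:
    """Return the Frame Descriptor length in bytes implied by the FLG byte."""
    size = 3          # FLG + BD + HC are always present
    if flg & 0x08:    # Content Size flag (bit 3)
        size += 8
    if flg & 0x01:    # Dictionary ID flag (bit 0)
        size += 4
    return size
```

The result ranges from 3 to 15 bytes, which matches the bounds given for the descriptor.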

__Block Maximum Size__

This information is useful to help the decoder allocate memory.
Size here refers to the original (uncompressed) data size.
Block Maximum Size is one value among the following table :

|  0  |  1  |  2  |  3  |   4   |   5    |  6   |  7   |
| --- | --- | --- | --- | ----- | ------ | ---- | ---- |
| N/A | N/A | N/A | N/A | 64 KB | 256 KB | 1 MB | 4 MB |

The decoder may refuse to allocate block sizes above any system-specific size.
Unused values may be used in a future revision of the spec.
A decoder conformant with the current version of the spec
is only able to decode block sizes defined in this spec.
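
The table can be expressed as a lookup on bits 6-5-4 of the BD byte. This is a sketch only; `BLOCK_MAX_SIZES` and `block_max_size` are hypothetical names, and raising `ValueError` for the unused codes is one possible way to reject them.

```python
# Block Maximum Size codes defined by this version of the specification.
BLOCK_MAX_SIZES = {
    4: 64 * 1024,        # 64 KB
    5: 256 * 1024,       # 256 KB
    6: 1024 * 1024,      # 1 MB
    7: 4 * 1024 * 1024,  # 4 MB
}


def block_max_size(bd: int) -> int:
    """Return the maximum uncompressed block size, in bytes, for a BD byte."""
    code = (bd >> 4) & 0x07
    if code not in BLOCK_MAX_SIZES:
        raise ValueError(f"unsupported Block Maximum Size code: {code}")
    return BLOCK_MAX_SIZES[code]
```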

__Reserved bits__

Value of reserved bits **must** be 0 (zero).
Reserved bits might be used in a future version of the specification,
typically enabling new optional features.
When this happens, a decoder respecting the current specification version
shall not be able to decode such a frame.
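
A decoder can enforce this rule with two masks derived from the FLG and BD layouts: bit 1 of FLG, and bits 7 and 3-2-1-0 of BD. The sketch below uses hypothetical names and only reports whether the reserved bits are clear.

```python
FLG_RESERVED_MASK = 0x02  # FLG bit 1
BD_RESERVED_MASK = 0x8F   # BD bit 7 and bits 3-2-1-0


def reserved_bits_are_zero(flg: int, bd: int) -> bool:
    """Return True when every reserved bit in FLG and BD is zero."""
    return (flg & FLG_RESERVED_MASK) == 0 and (bd & BD_RESERVED_MASK) == 0
```

A decoder following the current specification would refuse to decode a frame for which this check returns `False`.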

__Content Size__

This is the original (uncompressed) size.
This information is optional, and only present if the associated flag is set.
Content size is provided using unsigned 8 Bytes, for a maximum of 16 Exabytes.
Format is Little endian.
This value is informational, typically for display or memory allocation.
It can be skipped by a decoder, or used to validate content correctness.
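
Reading the field can be sketched as follows, assuming `descriptor` holds the Frame Descriptor bytes starting at the FLG byte. The function name `read_content_size` is hypothetical.

```python
import struct
from typing import Optional


def read_content_size(descriptor: bytes) -> Optional[int]:
    """Return the Content Size, or None when the Content Size flag is not set."""
    flg = descriptor[0]
    if not flg & 0x08:  # Content Size flag (FLG bit 3)
        return None
    # The field follows FLG and BD: 8 bytes, unsigned, little endian.
    (size,) = struct.unpack_from("<Q", descriptor, 2)
    return size
```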

__Dictionary ID__

Dict-ID is only present if the associated flag is set.
It's an unsigned 32-bits value, stored using little-endian convention.
A dictionary is useful to compress short input sequences.
The compressor can take advantage of the dictionary context
to encode the input in a more compact manner.
It works as a kind of “known prefix” which is used by
both the compressor and the decompressor to “warm-up” reference tables.

The decompressor can use Dict-ID identifier to determine
which dictionary must be used to correctly decode data.
The compressor and the decompressor must use exactly the same dictionary.
It's presumed that the 32-bits dictID uniquely identifies a dictionary.

Within a single frame, a single dictionary can be defined.
When the frame descriptor defines independent blocks,
each block will be initialized with the same dictionary.
If the frame descriptor defines linked blocks,
the dictionary will only be used once, at the beginning of the frame.

__Header Checksum__

One-byte checksum of combined descriptor fields, including optional ones.
The value is the second byte of `xxh32()` : ` (xxh32()>>8) & 0xFF `
using zero as a seed, and the full Frame Descriptor as an input
(including optional fields when they are present).
A wrong checksum indicates an error in the descriptor.
Header checksum is informational and can be skipped.
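
The computation can be sketched in pure Python. Two assumptions are made here: the hashed input is the descriptor bytes that precede the HC byte (FLG, BD and any optional fields), and, since that input is at most 14 bytes long, the sketch implements only the short-input path of xxHash-32 (inputs shorter than 16 bytes). The names `xxh32_short` and `header_checksum` are hypothetical; a real implementation would normally call an existing xxHash library.

```python
PRIME32_1 = 0x9E3779B1
PRIME32_2 = 0x85EBCA77
PRIME32_3 = 0xC2B2AE3D
PRIME32_4 = 0x27D4EB2F
PRIME32_5 = 0x165667B1
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def xxh32_short(data: bytes, seed: int = 0) -> int:
    """xxHash-32 restricted to inputs shorter than 16 bytes (no stripe loop)."""
    if len(data) >= 16:
        raise ValueError("this sketch only handles inputs shorter than 16 bytes")
    h = (seed + PRIME32_5 + len(data)) & MASK32
    i = 0
    while i + 4 <= len(data):  # consume 4-byte little-endian lanes
        lane = int.from_bytes(data[i:i + 4], "little")
        h = (_rotl32((h + lane * PRIME32_3) & MASK32, 17) * PRIME32_4) & MASK32
        i += 4
    while i < len(data):  # consume the remaining single bytes
        h = (_rotl32((h + data[i] * PRIME32_5) & MASK32, 11) * PRIME32_1) & MASK32
        i += 1
    # Final avalanche.
    h ^= h >> 15
    h = (h * PRIME32_2) & MASK32
    h ^= h >> 13
    h = (h * PRIME32_3) & MASK32
    h ^= h >> 16
    return h


def header_checksum(descriptor_without_hc: bytes) -> int:
    """Second byte of xxh32() over the descriptor, with a seed of zero."""
    return (xxh32_short(descriptor_without_hc, 0) >> 8) & 0xFF
```

With `FLG = 0x64` and `BD = 0x40` and no optional fields, `header_checksum(bytes([0x64, 0x40]))` evaluates to `0xA7`.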


Data Blocks
-----------

| Block Size |  data  | (Block Checksum) |
|:----------:| ------ |:----------------:|
|  4 bytes   |        |   0 - 4 bytes    |


__Block Size__

This field uses 4-bytes, format is little-endian.

If the highest bit is set (`1`), the block is uncompressed.

If the highest bit is not set (`0`), the block is LZ4-compressed,
using the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).

All other bits give the size, in bytes, of the data section.
The size does not include the block checksum if present.

_Block_Size_ shall never be larger than _Block_Maximum_Size_.
Compressing a non-compressible source could otherwise produce a larger block.
In such a case, the data block must be stored using the uncompressed format.

A value of `0x00000000` is invalid, and signifies an _EndMark_ instead.
Note that this is different from a value of `0x80000000` (highest bit set),
which is an uncompressed block of size 0 (empty),
which is valid, and therefore doesn't end a frame.
Note that, if _Block_checksum_ is enabled,
even an empty block must be followed by a 32-bit block checksum.
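
The rules for this field can be sketched as a small decoder. The function name `decode_block_size_field` and the string labels it returns are hypothetical choices for this example.

```python
import struct
from typing import Tuple


def decode_block_size_field(raw: bytes) -> Tuple[str, int]:
    """Interpret the 4-byte Block Size field.

    Returns (kind, size) where kind is "endmark", "compressed" or "uncompressed".
    """
    (value,) = struct.unpack("<I", raw)  # 4 bytes, little endian
    if value == 0x00000000:
        return ("endmark", 0)            # not a block: ends the flow of blocks
    size = value & 0x7FFFFFFF            # all bits except the highest
    if value & 0x80000000:               # highest bit set: uncompressed block
        return ("uncompressed", size)
    return ("compressed", size)
```

Note that `0x80000000` decodes as an empty uncompressed block, not as an _EndMark_.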

__Data__

Where the actual data to decode stands.
It might be compressed or not, depending on previous field indications.

When compressed, the data must respect the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).

Note that a block is not necessarily full.
Uncompressed size of data can be any size __up to__ _Block_Maximum_Size_,
so it may contain less data than the maximum block size.

__Block checksum__

Only present if the associated flag is set.
This is a 4-bytes checksum value, in little endian format,
calculated by using the [xxHash-32 algorithm] on the __raw__ (undecoded) data block,
and a seed of zero.
The intention is to detect data corruption (storage or transmission errors)
before decoding.

_Block_checksum_ and _Content_checksum_ can be used together.

[xxHash-32 algorithm]: https://github.com/Cyan4973/xxHash/blob/release/doc/xxhash_spec.md


Skippable Frames
----------------

| Magic Number | Frame Size | User Data |
|:------------:|:----------:| --------- |
|   4 bytes    |  4 bytes   |           |

Skippable frames allow the integration of user-defined data
into a flow of concatenated frames.
Their design is straightforward,
with the sole objective to allow the decoder to quickly skip
over user-defined data and continue decoding.

For the purpose of facilitating identification,
it is discouraged to start a flow of concatenated frames with a skippable frame.
If there is a need to start such a flow with some user data
encapsulated into a skippable frame,
it’s recommended to start with a zero-byte LZ4 frame
followed by a skippable frame.
This will make it easier for file type identifiers.


__Magic Number__

4 Bytes, Little endian format.
Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
All 16 values are valid to identify a skippable frame.

__Frame Size__

This is the size, in bytes, of the following User Data
(without including the magic number nor the size field itself).
4 Bytes, Little endian format, unsigned 32-bits.
This means User Data can’t be bigger than (2^32-1) Bytes.

__User Data__

User Data can be anything. Data will just be skipped by the decoder.
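
Skipping such a frame only requires reading its 8-byte header. A minimal sketch over an in-memory buffer, with hypothetical names `is_skippable_magic` and `skip_skippable_frame`:

```python
import struct


def is_skippable_magic(magic: int) -> bool:
    """True for any of the 16 values 0x184D2A50 to 0x184D2A5F."""
    return (magic & 0xFFFFFFF0) == 0x184D2A50


def skip_skippable_frame(buf: bytes, offset: int = 0) -> int:
    """Return the offset of the first byte after the skippable frame at offset."""
    # Magic Number and Frame Size are both 4 bytes, little endian.
    magic, frame_size = struct.unpack_from("<II", buf, offset)
    if not is_skippable_magic(magic):
        raise ValueError("not a skippable frame")
    end = offset + 8 + frame_size  # Frame Size excludes the 8 header bytes
    if end > len(buf):
        raise ValueError("truncated skippable frame")
    return end
```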


Legacy frame
------------

The Legacy frame format was defined in the initial versions of “LZ4Demo”.
Newer compressors should not use this format anymore, as it is too restrictive.

Main characteristics of the legacy format :

- Fixed block size : 8 MB.
- All blocks must be completely filled, except the last one.
- All blocks are always compressed, even when compression is detrimental.
- The last block is detected either because
  it is followed by the “EOF” (End of File) mark,
  or because it is followed by a known Frame Magic Number.
- No checksum
- Convention is Little endian

| MagicNb | B.CSize | CData | B.CSize | CData |  (...)  | EndMark |
| ------- | ------- | ----- | ------- | ----- | ------- | ------- |
| 4 bytes | 4 bytes | CSize | 4 bytes | CSize | x times |   EOF   |


__Magic Number__

4 Bytes, Little endian format.
Value : 0x184C2102

__Block Compressed Size__

This is the size, in bytes, of the following compressed data block.
4 Bytes, Little endian format.

__Data__

Where the actual compressed data stands.
Data is always compressed, even when compression is detrimental.

__EndMark__

End of legacy frame is implicit only.
It must be followed by a standard EOF (End Of File) signal,
whether it is a file or a stream.

Alternatively, if the frame is followed by a valid Frame Magic Number,
it is considered completed.
This policy makes it possible to concatenate legacy frames.

Any other value will be interpreted as a block size,
and trigger an error if it does not fit within acceptable range.
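
The end-of-frame rules can be sketched as a walk over an in-memory buffer. All names below are hypothetical. The "acceptable range" is not defined by this document; the sketch assumes an upper bound equal to the LZ4 worst-case compressed size of an 8 MB block (`n + n/255 + 16`), and it treats the LZ4 frame, skippable frame and legacy magic numbers as the known ones.

```python
import struct
from typing import List, Tuple

LEGACY_MAGIC = 0x184C2102
LZ4_FRAME_MAGIC = 0x184D2204
LEGACY_BLOCK_SIZE = 8 * 1024 * 1024
# Assumption: worst-case compressed size of one full legacy block.
MAX_COMPRESSED_SIZE = LEGACY_BLOCK_SIZE + LEGACY_BLOCK_SIZE // 255 + 16


def is_known_magic(value: int) -> bool:
    """True for the legacy, LZ4 frame and skippable frame magic numbers."""
    return (value in (LEGACY_MAGIC, LZ4_FRAME_MAGIC)
            or (value & 0xFFFFFFF0) == 0x184D2A50)


def walk_legacy_frame(buf: bytes, offset: int = 0) -> Tuple[List[int], int]:
    """Return (compressed block sizes, end offset) of the legacy frame at offset."""
    (magic,) = struct.unpack_from("<I", buf, offset)
    if magic != LEGACY_MAGIC:
        raise ValueError("not a legacy frame")
    pos = offset + 4
    sizes = []
    while pos < len(buf):                 # reaching the end of input ends the frame
        if pos + 4 > len(buf):
            raise ValueError("truncated block size field")
        (value,) = struct.unpack_from("<I", buf, pos)
        if is_known_magic(value):         # next frame starts here
            break
        if value > MAX_COMPRESSED_SIZE or pos + 4 + value > len(buf):
            raise ValueError("block size out of acceptable range")
        sizes.append(value)
        pos += 4 + value                  # skip the size field and the block data
    return sizes, pos
```

The returned offset is where a following frame, if any, would begin.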


Version changes
---------------

1.6.2 : clarifies specification of _EndMark_

1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer"

1.6.0 : restored Dictionary ID field in Frame header

1.5.1 : changed document format to MarkDown

1.5 : removed Dictionary ID from specification

1.4.1 : changed wording from “stream” to “frame”

1.4 : added skippable streams, re-added stream checksum

1.3 : modified header checksum

1.2 : reduced choice of “block size”, to postpone decision on “dynamic size of BlockSize Field”.

1.1 : optional fields are now part of the descriptor

1.0 : changed “block size” specification, adding a compressed/uncompressed flag

0.9 : reduced scale of “block maximum size” table

0.8 : removed : high compression flag

0.7 : removed : stream checksum

0.6 : settled : stream size uses 8 bytes, endian convention is little endian

0.5 : added copyright notice

0.4 : changed format to Google Doc compatible OpenDocument
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								LZ4 Frame Format Description
 								============================
-												[Doc] Fix markdown

											
										
										
											2017-05-25 17:03:05 +00:00
+								### Notices
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
 								Copyright (c) 2013-2015 Yann Collet
-												Remove whitespace from ends of lines

I'm trying to import LZ4 code into a project with strict linting
requirements. This will make that easier.

Signed-off-by: Julius Werner <jwerner@chromium.org>

											
										
										
											2016-02-12 22:34:07 +00:00
+								Permission is granted to copy and distribute this document
 								for any  purpose and without charge,
 								including translations into other  languages
 								and incorporation into compilations,
 								provided that the copyright notice and this notice are preserved,
 								and that any substantive changes or deletions from the original
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								are clearly marked.
 								Distribution of this document is unlimited.
-												[Doc] Fix markdown

											
										
										
											2017-05-25 17:03:05 +00:00
+								### Version
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
-												Clarifies and fix EndMark

EndMark, the 4-bytes value indicating the end of frame,
must be `0x00000000`.

Previously, it was just mentioned as a `0-size` block.
But such definition could encompass uncompressed blocks of size 0,
with a header of value `0x80000000`.

But the intention was to also support uncompressed empty blocks.
They could be used as a keep-alive signal.
Note that compressed empty blocks are already supported,
it's just that they have a size 1 instead of 0 (for the `0` token).

Unfortunately, the decoder implementation was also wrong,
and would also interpret a `0x80000000` block header as an endMark.

This issue evaded detection so far simply because
this situation never happens, as LZ4Frame always issues
a clean 0x00000000 value as a endMark.
It also does not flush empty blocks.

This is fixed in this PR.
The decoder can now deal with empty uncompressed blocks,
and do not confuse them with EndMark.
The specification is also clarified.
Finally, FrameTest is updated to randomly insert empty blocks during fuzzing.

											
										
										
											2020-08-13 00:27:33 +00:00
+.6.2 (12/08/2020)
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
 								Introduction
 								------------
-												Remove whitespace from ends of lines

I'm trying to import LZ4 code into a project with strict linting
requirements. This will make that easier.

Signed-off-by: Julius Werner <jwerner@chromium.org>

											
										
										
											2016-02-12 22:34:07 +00:00
+								The purpose of this document is to define a lossless compressed data format,
 								that is independent of CPU type, operating system,
 								file system and character set, suitable for
 								File compression, Pipe and streaming compression
-												updated hyperlink

											
										
										
											2016-01-15 15:41:10 +00:00
+								using the [LZ4 algorithm](http://www.lz4.org).
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
-												Remove whitespace from ends of lines

I'm trying to import LZ4 code into a project with strict linting
requirements. This will make that easier.

Signed-off-by: Julius Werner <jwerner@chromium.org>

											
										
										
											2016-02-12 22:34:07 +00:00
+								The data can be produced or consumed,
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								even for an arbitrarily long sequentially presented input data stream,
 								using only an a priori bounded amount of intermediate storage,
 								and hence can be used in data communications.
 								The format uses the LZ4 compression method,
 								and optional [xxHash-32 checksum method](https://github.com/Cyan4973/xxHash),
 								for detection of data corruption.
-												Remove whitespace from ends of lines

I'm trying to import LZ4 code into a project with strict linting
requirements. This will make that easier.

Signed-off-by: Julius Werner <jwerner@chromium.org>

											
										
										
											2016-02-12 22:34:07 +00:00
+								The data format defined by this specification
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								does not attempt to allow random access to compressed data.
 								This specification is intended for use by implementers of software
 								to compress data into LZ4 format and/or decompress data from LZ4 format.
 								The text of the specification assumes a basic background in programming
 								at the level of bits and other primitive data representations.
 								Unless otherwise indicated below,
 								a compliant compressor must produce data sets
 								that conform to the specifications presented here.
 								It doesn’t need to support all options though.
 								A compliant decompressor must be able to decompress
 								at least one working set of parameters
 								that conforms to the specifications presented here.
 								It may also ignore checksums.
 								Whenever it does not support a specific parameter within the compressed stream,
 								it must produce a non-ambiguous error code
 								and associated error message explaining which parameter is unsupported.
 								General Structure of LZ4 Frame format
 								-------------------------------------
 								| MagicNb | F. Descriptor | Block | (...) | EndMark | C. Checksum |
 								|:-------:|:-------------:| ----- | ----- | ------- | ----------- |
-												updated Frame specification

Restored DictID field in Frame header
Bumped specification version to v1.6.0

											
										
										
											2017-08-08 18:59:57 +00:00
+								| 4 bytes |  3-15 bytes   |       |       | 4 bytes | 0-4 bytes   |
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
 								__Magic Number__
 Bytes, Little endian format.
 								Value : 0x184D2204
 								__Frame Descriptor__
-												proposed a minor change to LZ4 Frame format specification

add new terms "LZ4 Frame Header" and "LZ4 Frame Footer"

											
										
										
											2018-01-31 01:53:00 +00:00
+to 15 Bytes, to be detailed in its own paragraph,
 								as it is the most important part of the spec.
-												Clarifies and fix EndMark

EndMark, the 4-bytes value indicating the end of frame,
must be `0x00000000`.

Previously, it was just mentioned as a `0-size` block.
But such definition could encompass uncompressed blocks of size 0,
with a header of value `0x80000000`.

But the intention was to also support uncompressed empty blocks.
They could be used as a keep-alive signal.
Note that compressed empty blocks are already supported,
it's just that they have a size 1 instead of 0 (for the `0` token).

Unfortunately, the decoder implementation was also wrong,
and would also interpret a `0x80000000` block header as an endMark.

This issue evaded detection so far simply because
this situation never happens, as LZ4Frame always issues
a clean 0x00000000 value as a endMark.
It also does not flush empty blocks.

This is fixed in this PR.
The decoder can now deal with empty uncompressed blocks,
and do not confuse them with EndMark.
The specification is also clarified.
Finally, FrameTest is updated to randomly insert empty blocks during fuzzing.

											
										
										
											2020-08-13 00:27:33 +00:00
+								The combined _Magic_Number_ and _Frame_Descriptor_ fields are sometimes
-												fix typos as suggested by @psteinb

											
										
										
											2018-01-31 17:54:30 +00:00
+								called ___LZ4 Frame Header___. Its size varies between 7 and 19 bytes.
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
 								__Data Blocks__
-												proposed a minor change to LZ4 Frame format specification

add new terms "LZ4 Frame Header" and "LZ4 Frame Footer"

											
										
										
											2018-01-31 01:53:00 +00:00
+								To be detailed in its own paragraph.
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								That’s where compressed data is stored.
 								__EndMark__
-												clarify endMark definition

											
										
										
											2020-08-14 22:03:03 +00:00
+								The flow of blocks ends when the last data block is followed by
 								the 32-bit value `0x00000000`.
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
 								__Content Checksum__
-												Clarifies and fix EndMark

EndMark, the 4-bytes value indicating the end of frame,
must be `0x00000000`.

Previously, it was just mentioned as a `0-size` block.
But such definition could encompass uncompressed blocks of size 0,
with a header of value `0x80000000`.

But the intention was to also support uncompressed empty blocks.
They could be used as a keep-alive signal.
Note that compressed empty blocks are already supported,
it's just that they have a size 1 instead of 0 (for the `0` token).

Unfortunately, the decoder implementation was also wrong,
and would also interpret a `0x80000000` block header as an endMark.

This issue evaded detection so far simply because
this situation never happens, as LZ4Frame always issues
a clean 0x00000000 value as a endMark.
It also does not flush empty blocks.

This is fixed in this PR.
The decoder can now deal with empty uncompressed blocks,
and do not confuse them with EndMark.
The specification is also clarified.
Finally, FrameTest is updated to randomly insert empty blocks during fuzzing.

											
										
										
											2020-08-13 00:27:33 +00:00
+								_Content_Checksum_ verify that the full content has been decoded correctly.
 								The content checksum is the result of [xxHash-32 algorithm]
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								digesting the original (decoded) data as input, and a seed of zero.
 								Content checksum is only present when its associated flag
-												Remove whitespace from ends of lines

I'm trying to import LZ4 code into a project with strict linting
requirements. This will make that easier.

Signed-off-by: Julius Werner <jwerner@chromium.org>

											
										
										
											2016-02-12 22:34:07 +00:00
+								is set in the frame descriptor.
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								Content Checksum validates the result,
 								that all blocks were fully transmitted in the correct order and without error,
 								and also that the encoding/decoding process itself generated no distortion.
 								Its usage is recommended.
-												Clarifies and fix EndMark

EndMark, the 4-bytes value indicating the end of frame,
must be `0x00000000`.

Previously, it was just mentioned as a `0-size` block.
But such definition could encompass uncompressed blocks of size 0,
with a header of value `0x80000000`.

But the intention was to also support uncompressed empty blocks.
They could be used as a keep-alive signal.
Note that compressed empty blocks are already supported,
it's just that they have a size 1 instead of 0 (for the `0` token).

Unfortunately, the decoder implementation was also wrong,
and would also interpret a `0x80000000` block header as an endMark.

This issue evaded detection so far simply because
this situation never happens, as LZ4Frame always issues
a clean 0x00000000 value as a endMark.
It also does not flush empty blocks.

This is fixed in this PR.
The decoder can now deal with empty uncompressed blocks,
and do not confuse them with EndMark.
The specification is also clarified.
Finally, FrameTest is updated to randomly insert empty blocks during fuzzing.

											
										
										
											2020-08-13 00:27:33 +00:00
+								The combined _EndMark_ and _Content_Checksum_ fields might sometimes be
-												fix typos as suggested by @psteinb

											
										
										
											2018-01-31 17:54:30 +00:00
+								referred to as ___LZ4 Frame Footer___. Its size varies between 4 and 8 bytes.
-												proposed a minor change to LZ4 Frame format specification

add new terms "LZ4 Frame Header" and "LZ4 Frame Footer"

											
										
										
											2018-01-31 01:53:00 +00:00
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								__Frame Concatenation__
 								In some circumstances, it may be preferable to append multiple frames,
 								for example in order to add new data to an existing compressed file
 								without re-framing it.
 								In such case, each frame has its own set of descriptor flags.
 								Each frame is considered independent.
 								The only relation between frames is their sequential order.
-												Remove whitespace from ends of lines

I'm trying to import LZ4 code into a project with strict linting
requirements. This will make that easier.

Signed-off-by: Julius Werner <jwerner@chromium.org>

											
										
										
											2016-02-12 22:34:07 +00:00
+								The ability to decode multiple concatenated frames
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								within a single stream or file
-												Remove whitespace from ends of lines

I'm trying to import LZ4 code into a project with strict linting
requirements. This will make that easier.

Signed-off-by: Julius Werner <jwerner@chromium.org>

											
										
										
											2016-02-12 22:34:07 +00:00
+								is left outside of this specification.
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								As an example, the reference lz4 command line utility behavior is
 								to decode all concatenated frames in their sequential order.
-												Remove whitespace from ends of lines

I'm trying to import LZ4 code into a project with strict linting
requirements. This will make that easier.

Signed-off-by: Julius Werner <jwerner@chromium.org>

											
										
										
											2016-02-12 22:34:07 +00:00
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
+								Frame Descriptor
 								----------------
-												updated Frame specification

Restored DictID field in Frame header
Bumped specification version to v1.6.0

											
										
										
											2017-08-08 18:59:57 +00:00
+								| FLG     | BD      | (Content Size) | (Dictionary ID) | HC      |
 								| ------- | ------- |:--------------:|:---------------:| ------- |
 								| 1 byte  | 1 byte  |  0 - 8 bytes   |   0 - 4 bytes   | 1 byte  |
-												Added : Frame documentation in MarkDown format
											
										
										
											2015-03-31 08:44:56 +00:00
 								The descriptor uses a minimum of 3 bytes,
-												updated Frame specification

Restored DictID field in Frame header
Bumped specification version to v1.6.0

											
										
										
											2017-08-08 18:59:57 +00:00
+								and up to 15 bytes depending on optional parameters.
__FLG byte__

|  BitNb  |  7-6  |   5   |    4     |  3   |    2     |    1     |   0  |
| ------- |-------|-------|----------|------|----------|----------|------|
|FieldName|Version|B.Indep|B.Checksum|C.Size|C.Checksum|*Reserved*|DictID|

__BD byte__

|  BitNb  |     7    |     6-5-4     |  3-2-1-0 |
| ------- | -------- | ------------- | -------- |
|FieldName|*Reserved*| Block MaxSize |*Reserved*|

In the tables, bit 7 is highest bit, while bit 0 is lowest.

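The two bit layouts above can be unpacked with plain shifts and masks. The following is only an illustrative sketch: `FrameFlags` and `parse_flg_bd` are hypothetical names, not part of any LZ4 API, and reserved bits are deliberately ignored here.

```python
from typing import NamedTuple


class FrameFlags(NamedTuple):
    version: int             # FLG bits 7-6
    block_independence: bool  # FLG bit 5
    block_checksum: bool      # FLG bit 4
    content_size: bool        # FLG bit 3
    content_checksum: bool    # FLG bit 2
    dict_id: bool             # FLG bit 0
    block_max_size_id: int    # BD bits 6-5-4


def parse_flg_bd(flg: int, bd: int) -> FrameFlags:
    """Split the FLG and BD bytes into their named fields."""
    return FrameFlags(
        version=(flg >> 6) & 0x03,
        block_independence=bool(flg & 0x20),
        block_checksum=bool(flg & 0x10),
        content_size=bool(flg & 0x08),
        content_checksum=bool(flg & 0x04),
        dict_id=bool(flg & 0x01),
        block_max_size_id=(bd >> 4) & 0x07,
    )
```

For instance, `parse_flg_bd(0x64, 0x70)` describes a version `01` frame with independent blocks, a content checksum, and Block MaxSize id 7.
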
__Version Number__

2-bits field, must be set to `01`.
Any other value cannot be decoded by this version of the specification.
Other version numbers will use different flag layouts.

__Block Independence flag__

If this flag is set to “1”, blocks are independent.
If this flag is set to “0”, each block depends on previous ones
(up to LZ4 window size, which is 64 KB).
In such a case, it’s necessary to decode all blocks in sequence.

Block dependency improves compression ratio, especially for small blocks.
On the other hand, it makes random access or multi-threaded decoding impossible.

__Block checksum flag__

If this flag is set, each data block will be followed by a 4-byte checksum,
calculated by using the xxHash-32 algorithm on the raw (compressed) data block.
The intention is to detect data corruption (storage or transmission errors)
immediately, before decoding.
Block checksum usage is optional.

__Content Size flag__

If this flag is set, the uncompressed size of data included within the frame
will be present as an 8-byte unsigned little-endian value, after the flags.
Content Size usage is optional.

__Content checksum flag__

If this flag is set, a 32-bit content checksum will be appended
after the EndMark.

__Dictionary ID flag__

If this flag is set, a 4-byte Dict-ID field will be present,
after the descriptor flags and the Content Size.

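Taken together, the Content Size and Dictionary ID flags fix the order of the optional fields: FLG, BD, then the 8-byte Content Size if present, then the 4-byte Dict-ID if present. A minimal sketch of reading them, assuming `descriptor` starts at the FLG byte (the helper name `read_optional_fields` is hypothetical):

```python
import struct


def read_optional_fields(descriptor: bytes):
    """Return (content_size, dict_id, offset_of_HC) for a Frame Descriptor.

    Fields that are absent are reported as None.
    """
    flg = descriptor[0]
    offset = 2  # skip FLG and BD
    content_size = None
    dict_id = None
    if flg & 0x08:  # C.Size flag: 8 bytes, unsigned, little endian
        (content_size,) = struct.unpack_from("<Q", descriptor, offset)
        offset += 8
    if flg & 0x01:  # DictID flag: 4 bytes, unsigned, little endian
        (dict_id,) = struct.unpack_from("<I", descriptor, offset)
        offset += 4
    return content_size, dict_id, offset
```

The returned offset points at the header checksum byte (HC), which closes the descriptor.
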
__Block Maximum Size__

This information is useful to help the decoder allocate memory.
Size here refers to the original (uncompressed) data size.
Block Maximum Size is one of the values in the following table :

|  0  |  1  |  2  |  3  |   4   |   5    |  6   |  7   |
| --- | --- | --- | --- | ----- | ------ | ---- | ---- |
| N/A | N/A | N/A | N/A | 64 KB | 256 KB | 1 MB | 4 MB |

The decoder may refuse to allocate block sizes above any system-specific size.
Unused values may be used in a future revision of the spec.
A decoder conformant with the current version of the spec
is only able to decode block sizes defined in this spec.

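The table lookup can be sketched as follows. This is an illustration only: `BLOCK_MAX_SIZES` and `block_max_size` are hypothetical names, and the byte counts assume 1 KB = 1024 bytes and 1 MB = 1024 KB.

```python
# Block MaxSize id (BD bits 6-5-4) -> maximum uncompressed block size in bytes.
# Assumes binary units: 1 KB = 1024 bytes, 1 MB = 1024 KB.
BLOCK_MAX_SIZES = {
    4: 64 * 1024,
    5: 256 * 1024,
    6: 1024 * 1024,
    7: 4 * 1024 * 1024,
}


def block_max_size(bd: int) -> int:
    """Return the Block Maximum Size announced by a BD byte."""
    size_id = (bd >> 4) & 0x07
    if size_id not in BLOCK_MAX_SIZES:
        # Ids 0-3 are not defined by this version of the specification.
        raise ValueError(f"unsupported Block MaxSize id: {size_id}")
    return BLOCK_MAX_SIZES[size_id]
```

A decoder could additionally compare the returned size against its own memory limit before allocating.
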
__Reserved bits__

Value of reserved bits **must** be 0 (zero).
Reserved bits might be used in a future version of the specification,
typically enabling new optional features.
When this happens, a decoder respecting the current specification version
shall not be able to decode such a frame.

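A decoder following the rules above would reject a descriptor whose version field or reserved bits are not as expected. A minimal sketch, with the hypothetical helper `check_descriptor_bits` and masks derived from the FLG and BD tables:

```python
def check_descriptor_bits(flg: int, bd: int) -> None:
    """Raise ValueError if the version or any reserved bit is unsupported."""
    if (flg >> 6) & 0x03 != 0b01:
        raise ValueError("unsupported version number")
    if flg & 0x02:   # FLG bit 1 is reserved
        raise ValueError("reserved FLG bit is set")
    if bd & 0x8F:    # BD bit 7 and bits 3-0 are reserved
        raise ValueError("reserved BD bit is set")
```

The function returns nothing on success, so it can be called right after reading the FLG and BD bytes.
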
__Content Size__

This is the original (uncompressed) size.
This information is optional, and only present if the associated flag is set.
Content size is provided as an unsigned 8-byte value, for a maximum of 16 Exabytes.
Format is Little endian.
This value is informational, typically for display or memory allocation.
It can be skipped by a decoder, or used to validate content correctness.

__Dictionary ID__

Dict-ID is only present if the associated flag is set.
It's an unsigned 32-bit value, stored using little-endian convention.
A dictionary is useful to compress short input sequences.
The compressor can take advantage of the dictionary context
to encode the input in a more compact manner.
It works as a kind of “known prefix” which is used by
both the compressor and the decompressor to “warm-up” reference tables.

The decompressor can use the Dict-ID identifier to determine
which dictionary must be used to correctly decode data.
The compressor and the decompressor must use exactly the same dictionary.
It's presumed that the 32-bit dictID uniquely identifies a dictionary.

Within a single frame, a single dictionary can be defined.
When the frame descriptor defines independent blocks,
each block will be initialized with the same dictionary.
If the frame descriptor defines linked blocks,
the dictionary will only be used once, at the beginning of the frame.

__Header Checksum__

One-byte checksum of combined descriptor fields, including optional ones.
The value is the second byte of `xxh32()` : ` (xxh32()>>8) & 0xFF `
using zero as a seed, and the full Frame Descriptor as an input
(including optional fields when they are present;
the checksum byte itself is naturally not part of the input).
A wrong checksum indicates an error in the descriptor.
Header checksum is informational and can be skipped.

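The formula above can be sketched in pure Python. The `xxh32` function below is an illustrative, unoptimized transcription of the xxHash-32 algorithm, not the reference implementation, and `header_checksum` is a hypothetical helper name. Its input is assumed to be the descriptor bytes from FLG up to, but excluding, the HC byte.

```python
import struct

_P1, _P2, _P3, _P4, _P5 = (
    0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1)
_MASK = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & _MASK


def xxh32(data: bytes, seed: int = 0) -> int:
    """Compute the 32-bit xxHash of data (illustrative sketch)."""
    n = len(data)
    i = 0
    if n >= 16:
        acc = [(seed + _P1 + _P2) & _MASK, (seed + _P2) & _MASK,
               seed & _MASK, (seed - _P1) & _MASK]
        while i + 16 <= n:  # consume 16-byte stripes, 4 lanes each
            lanes = struct.unpack_from("<4I", data, i)
            for k in range(4):
                a = (acc[k] + lanes[k] * _P2) & _MASK
                acc[k] = (_rotl(a, 13) * _P1) & _MASK
            i += 16
        h = (_rotl(acc[0], 1) + _rotl(acc[1], 7)
             + _rotl(acc[2], 12) + _rotl(acc[3], 18)) & _MASK
    else:
        h = (seed + _P5) & _MASK
    h = (h + n) & _MASK
    while i + 4 <= n:  # remaining 4-byte words
        (lane,) = struct.unpack_from("<I", data, i)
        h = (h + lane * _P3) & _MASK
        h = (_rotl(h, 17) * _P4) & _MASK
        i += 4
    while i < n:  # remaining single bytes
        h = (h + data[i] * _P5) & _MASK
        h = (_rotl(h, 11) * _P1) & _MASK
        i += 1
    # final avalanche
    h ^= h >> 15
    h = (h * _P2) & _MASK
    h ^= h >> 13
    h = (h * _P3) & _MASK
    h ^= h >> 16
    return h


def header_checksum(descriptor_without_hc: bytes) -> int:
    """Second byte of xxh32 over FLG, BD and optional fields, seed 0."""
    return (xxh32(descriptor_without_hc, 0) >> 8) & 0xFF
```

In practice a maintained xxHash binding would normally be preferred over a hand-written hash; the sketch only makes the byte selection `(xxh32() >> 8) & 0xFF` concrete.
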

Data Blocks
-----------

| Block Size |  data  | (Block Checksum) |
|:----------:| ------ |:----------------:|
|  4 bytes   |        |   0 - 4 bytes    |

__Block Size__

This field uses 4 bytes, format is little-endian.

If the highest bit is set (`1`), the block is uncompressed.

If the highest bit is not set (`0`), the block is LZ4-compressed,
using the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).

All other bits give the size, in bytes, of the data section.
The size does not include the block checksum if present.

_Block_Size_ shall never be larger than _Block_Maximum_Size_.
Such an outcome could potentially happen for non-compressible sources.
In such a case, the data block must be passed using uncompressed format.

A value of `0x00000000` is invalid, and signifies an _EndMark_ instead.
Note that this is different from a value of `0x80000000` (highest bit set),
which is an uncompressed block of size 0 (empty),
which is valid, and therefore doesn't end a frame.
Note that, if _Block_checksum_ is enabled,
even an empty block must be followed by a 32-bit block checksum.

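The distinction between an _EndMark_ and an empty uncompressed block is easy to get wrong, so here is a minimal sketch of decoding the 4-byte field. `BlockHeader` and `parse_block_header` are hypothetical names used for illustration.

```python
import struct
from typing import NamedTuple


class BlockHeader(NamedTuple):
    is_end_mark: bool      # True only for the exact value 0x00000000
    is_uncompressed: bool  # highest bit of the field
    data_size: int         # remaining 31 bits, excludes any block checksum


def parse_block_header(raw: bytes) -> BlockHeader:
    """Decode the little-endian Block Size field."""
    (value,) = struct.unpack("<I", raw[:4])
    if value == 0:
        return BlockHeader(True, False, 0)
    return BlockHeader(False, bool(value & 0x80000000), value & 0x7FFFFFFF)
```

Note that `0x80000000` yields an uncompressed block of size 0 with `is_end_mark` false, so the frame continues after it.
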
__Data__

This is where the actual data to decode stands.
It might be compressed or not, depending on previous field indications.

When compressed, the data must respect the [LZ4 block format specification](https://github.com/lz4/lz4/blob/master/doc/lz4_Block_format.md).

Note that a block is not necessarily full.
Uncompressed size of data can be any size __up to__ _Block_Maximum_Size_,
so it may contain less data than the maximum block size.

__Block checksum__

Only present if the associated flag is set.
This is a 4-byte checksum value, in little endian format,
calculated by using the [xxHash-32 algorithm] on the __raw__ (undecoded) data block,
and a seed of zero.
The intention is to detect data corruption (storage or transmission errors)
before decoding.

_Block_checksum_ can be cumulative with _Content_checksum_.

[xxHash-32 algorithm]: https://github.com/Cyan4973/xxHash/blob/release/doc/xxhash_spec.md

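Putting the three fields of a data block together, a reader could walk the block sequence of one frame as sketched below. This is an assumption-laden illustration: `iter_blocks` is a hypothetical name, `offset` is assumed to point just past the Frame Descriptor, and the stored checksum is only returned, not verified (verification would compare it with xxHash-32 of the raw `data`, seed zero).

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_blocks(buf: bytes, offset: int, has_block_checksum: bool
                ) -> Iterator[Tuple[bool, bytes, Optional[int]]]:
    """Yield (is_uncompressed, raw_data, stored_checksum) until the EndMark."""
    while True:
        (header,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        if header == 0:  # EndMark: exactly 0x00000000
            return
        size = header & 0x7FFFFFFF
        data = buf[offset:offset + size]
        if len(data) != size:
            raise ValueError("truncated data block")
        offset += size
        checksum = None
        if has_block_checksum:  # present even after an empty block
            (checksum,) = struct.unpack_from("<I", buf, offset)
            offset += 4
        yield bool(header & 0x80000000), data, checksum
```

Returning the raw bytes keeps the sketch independent of any particular LZ4 block decoder.
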

Skippable Frames
----------------

| Magic Number | Frame Size | User Data |
|:------------:|:----------:| --------- |
|   4 bytes    |  4 bytes   |           |

Skippable frames allow the integration of user-defined data
into a flow of concatenated frames.
Their design is straightforward,
with the sole objective of allowing the decoder to quickly skip
over user-defined data and continue decoding.

For the purpose of facilitating identification,
it is discouraged to start a flow of concatenated frames with a skippable frame.
If there is a need to start such a flow with some user data
encapsulated into a skippable frame,
it’s recommended to start with a zero-byte LZ4 frame
followed by a skippable frame.
This will make it easier for file type identifiers.

__Magic Number__

4 Bytes, Little endian format.
Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
All 16 values are valid to identify a skippable frame.

__Frame Size__

This is the size, in bytes, of the following User Data
(without including the magic number nor the size field itself).
4 Bytes, Little endian format, unsigned 32-bits.
This means User Data can’t be bigger than (2^32-1) Bytes.

__User Data__

User Data can be anything. Data will just be skipped by the decoder.

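Skipping such a frame only requires the two 4-byte fields. A minimal sketch, where `skip_skippable_frame` is a hypothetical helper that returns the offset of whatever follows the frame:

```python
import struct


def skip_skippable_frame(buf: bytes, offset: int = 0) -> int:
    """Return the offset just past the skippable frame starting at offset."""
    magic, size = struct.unpack_from("<II", buf, offset)
    if not 0x184D2A50 <= magic <= 0x184D2A5F:
        raise ValueError("not a skippable frame")
    end = offset + 8 + size  # 4-byte magic + 4-byte size + User Data
    if end > len(buf):
        raise ValueError("truncated skippable frame")
    return end
```

The User Data itself is never inspected, which matches the intent that a decoder simply steps over it.
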

Legacy frame
------------

The Legacy frame format was defined in the initial versions of “LZ4Demo”.
Newer compressors should not use this format anymore, as it is too restrictive.

Main characteristics of the legacy format :

- Fixed block size : 8 MB.
- All blocks must be completely filled, except the last one.
- All blocks are always compressed, even when compression is detrimental.
- The last block is detected either because
  it is followed by the “EOF” (End of File) mark,
  or because it is followed by a known Frame Magic Number.
- No checksum
- Convention is Little endian

| MagicNb | B.CSize | CData | B.CSize | CData |  (...)  | EndMark |
| ------- | ------- | ----- | ------- | ----- | ------- | ------- |
| 4 bytes | 4 bytes | CSize | 4 bytes | CSize | x times |   EOF   |

__Magic Number__

4 Bytes, Little endian format.
Value : 0x184C2102

__Block Compressed Size__

This is the size, in bytes, of the following compressed data block.
4 Bytes, Little endian format.

__Data__

Where the actual compressed data stands.
Data is always compressed, even when compression is detrimental.

__EndMark__

End of legacy frame is implicit only.
It must be followed by a standard EOF (End Of File) signal,
whether it is a file or a stream.

Alternatively, if the frame is followed by a valid Frame Magic Number,
it is considered completed.
This policy makes it possible to concatenate legacy frames.

Any other value will be interpreted as a block size,
and trigger an error if it does not fit within acceptable range.

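The implicit end-of-frame rule can be sketched as a loop that stops either at the end of input or at a known magic number. All names below are hypothetical. The LZ4 frame magic `0x184D2204` is taken from the general structure section of this specification, and the "acceptable range" check is simplified to a truncation check, which is an assumption of this sketch rather than a rule of the format.

```python
import struct
from typing import List, Tuple

LEGACY_MAGIC = 0x184C2102
LZ4_FRAME_MAGIC = 0x184D2204  # standard LZ4 frame magic number


def is_known_magic(value: int) -> bool:
    """True for legacy, standard, or skippable frame magic numbers."""
    return (value in (LEGACY_MAGIC, LZ4_FRAME_MAGIC)
            or 0x184D2A50 <= value <= 0x184D2A5F)


def split_legacy_frame(buf: bytes, offset: int = 0) -> Tuple[List[bytes], int]:
    """Return (compressed_blocks, offset_after_frame) for a legacy frame."""
    (magic,) = struct.unpack_from("<I", buf, offset)
    if magic != LEGACY_MAGIC:
        raise ValueError("not a legacy frame")
    offset += 4
    blocks = []
    while offset < len(buf):          # EOF ends the frame implicitly
        (value,) = struct.unpack_from("<I", buf, offset)
        if is_known_magic(value):     # next frame starts here
            break
        offset += 4
        block = buf[offset:offset + value]
        if len(block) != value:       # simplified range check (assumption)
            raise ValueError("block size out of range")
        blocks.append(block)
        offset += value
    return blocks, offset
```

Each returned block would then be handed to an LZ4 block decoder, since legacy blocks are always compressed.
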

Version changes
---------------

1.6.2 : clarifies specification of _EndMark_

1.6.1 : introduced terms "LZ4 Frame Header" and "LZ4 Frame Footer"

1.6.0 : restored Dictionary ID field in Frame header

1.5.1 : changed document format to MarkDown

1.5 : removed Dictionary ID from specification

1.4.1 : changed wording from “stream” to “frame”

1.4 : added skippable streams, re-added stream checksum

1.3 : modified header checksum

1.2 : reduced choice of “block size”, to postpone decision on “dynamic size of BlockSize Field”.

1.1 : optional fields are now part of the descriptor

1.0 : changed “block size” specification, adding a compressed/uncompressed flag

0.9 : reduced scale of “block maximum size” table

0.8 : removed : high compression flag

0.7 : removed : stream checksum

0.6 : settled : stream size uses 8 bytes, endian convention is little endian

0.5 : added copyright notice

0.4 : changed format to Google Doc compatible OpenDocument