EndMark, the 4-bytes value indicating the end of frame,
must be `0x00000000`.
Previously, it was just mentioned as a `0-size` block.
But such definition could encompass uncompressed blocks of size 0,
with a header of value `0x80000000`.
But the intention was to also support uncompressed empty blocks.
They could be used as a keep-alive signal.
Note that compressed empty blocks are already supported,
it's just that they have a size 1 instead of 0 (for the `0` token).
Unfortunately, the decoder implementation was also wrong,
and would also interpret a `0x80000000` block header as an endMark.
This issue evaded detection so far simply because
this situation never happens, as LZ4Frame always issues
a clean 0x00000000 value as a endMark.
It also does not flush empty blocks.
This is fixed in this PR.
The decoder can now deal with empty uncompressed blocks,
and do not confuse them with EndMark.
The specification is also clarified.
Finally, FrameTest is updated to randomly insert empty blocks during fuzzing.
make it possible to generate LZ4-compressed block
with a controlled maximum offset (necessarily <= 65535).
This could be useful for compatibility with decoders
using a very limited memory budget (<64 KB).
Answer #154
- promoted LZ4_resetStream_fast() to stable
- moved LZ4_resetStream() into deprecate, but without triggering a compiler warning
- update all sources to no longer rely on LZ4_resetStream()
note : LZ4_initStream() proposal is slightly different :
it's able to initialize any buffer, provided that it's large enough.
To this end, it accepts a void*, and returns an LZ4_stream_t*.
these functions are now unpublished in dll by default.
One needs to opt-in, using macro LZ4_PUBLISH_STATIC_FUNCTIONS.
used this opportunity to update a bunch of api comments in lz4.h
the initial intention was to update lz4f ring buffer strategy,
but lz4f doesn't use ring buffer.
Instead, it uses the destination buffer as much as possible,
and merely copies just what's required to preserve history
into its own buffer, at the end.
Pretty efficient.
This patch just clarifies a few comments and add some assert().
It's built on top of #528.
It also updates doc.
It occurred to me that the formula "The last 5 bytes are always
literals", on the list of "assumptions made by the decoder", is
remarkably ambiguous. Suppose the decoder is presented with 5 bytes.
Are they literals? It may seem that the decoder degenerates
to memcpy on short inputs. But of course the answer is no,
so the formula needs some clarification.
Parsing restrictions should be explained as well, otherwise they look
like arbitrary numbers. The 5-byte restriction has been mentioned
recently in connection with the shortcut in LZ4_decompress_generic,
so I add that. The second restriction is left to be explained
by the author.
I also took the liberty to explain that empty inputs "are either
unrepresentable or can be represented with a null byte". This wording
may actually have some merit: it leaves for the implementation,
as opposed to the spec, to decide whether the encoder can compress
empty inputs, and whether the decoder can produce an empty output
(which the implementation should further clarify).
Someone found it would be a great idea to define there a global variable under the very generic name "index".
Cause problem with shadow warnings, so no variable can be named "index" now ...
Also : automatically update API manual