It occurred to me that the formula "The last 5 bytes are always
literals", on the list of "assumptions made by the decoder", is
remarkably ambiguous. Suppose the decoder is presented with 5 bytes.
Are they literals? It may seem that the decoder degenerates
to memcpy on short inputs. But of course the answer is no,
so the formula needs some clarification.
Parsing restrictions should be explained as well, otherwise they look
like arbitrary numbers. The 5-byte restriction has been mentioned
recently in connection with the shortcut in LZ4_decompress_generic,
so I add that. The second restriction is left to be explained
by the author.
I also took the liberty to explain that empty inputs "are either
unrepresentable or can be represented with a null byte". This wording
may actually have some merit: it leaves for the implementation,
as opposed to the spec, to decide whether the encoder can compress
empty inputs, and whether the decoder can produce an empty output
(which the implementation should further clarify).
The bug is a read up to 2 bytes past the end of the buffer.
There are three cases for this bug, one for each test case added.
* An empty input causes `token = *ip++` to read one byte too far.
* A one byte input with `(token >> ML_BITS) == RUN_MASK` causes
one extra byte to be read without validation. This could be
combined with the first bug to cause 2 extra bytes to be read.
* The case pointed out in issue #508, where `ip == iend` at the
beginning of the loop after taking the shortcut.
Benchmarks show no regressions on clang or gcc-7 on both my mac
and devserver.
Fixes#508.
The notes about "security guarantee" and "malicious inputs" seemed
a bit non-technical to me, so I took the liberty to tone them down
and instead describe the actual risks in technical terms. Namely,
the function never writes past the end of the output buffer, so
a direct hostile takeover (resulting in arbitrary code execution
soon after the return from the function) is not possible. However,
the application can crash because of reads from unmapped pages.
I also took the liberty to describe what I believe is the only sensible
usage scenario for the function: "This function is only usable if the
originalSize of uncompressed data is known in advance," etc.
The simple change from
`matchIndex+MAX_DISTANCE < current`
towards
`current - matchIndex > MAX_DISTANCE`
is enough to generate a 10% performance drop under clang.
Quite massive.
(I missed as my eyes were concentrated on gcc performance at that time).
The second version is more robust, because it also survives a situation where
`matchIndex > current`
due to overflows.
The first version requires matchIndex to not overflow.
Hence were added `assert()` conditions.
The only case where this can happen is with dictCtx compression,
in the case where the dictionary context is not initialized before loading the dictionary.
So it's enough to always initialize the context while loading the dictionary.