The LZ4 block format specification
states that the last match must start
at a minimum distance of 12 bytes from the end of the block.
However, out of an abundance of caution,
the reference implementation would actually stop searching matches
at 13 bytes from the end of the block.
This patch fixes this small detail.
The new version is now able to properly compress a limit case
such as `aaaaaaaabaaa\n`
as reported by Gao Xiang (@hsiangkao).
Obviously, it doesn't change a lot of things.
This is just one additional match candidate per block, with a maximum match length of 7 (since last 5 bytes must remain literals).
With default policy, blocks are 4 MB long, so it doesn't happen too often
Compressing silesia.tar at default level 1 saves 5 bytes (100930101 -> 100930096).
At max level 12, it saves a grand 16 bytes (77389871 -> 77389855).
The impact is a bit more visible when blocks are smaller, hence more numerous.
For example, compressing silesia with blocks of 64 KB (using -12 -B4D) saves 543 bytes (77304583 -> 77304040).
So the smaller the packet size, the more visible the impact.
And it happens we have a ton of scenarios with little blocks using LZ4 compression ...
And a useless "hooray" sidenote :
the patch improves the LZ4 compression record of silesia (using -12 -B7D --no-frame-crc) by 16 bytes (77270672 -> 77270656)
and the record on enwik9 by 44 bytes (371680396 -> 371680352) (previously claimed by [smallz4](http://create.stephan-brumme.com/smallz4/) ).
For example, with clang:
lz4.c:XXX:36: error: 'register' storage class specifier is deprecated and incompatible with C++17 [-Werror,-Wdeprecated-register]
static unsigned LZ4_NbCommonBytes (register reg_t val)
^~~~~~~~~
static analyzer `cppcheck` complains about a shift-by-32 on a 32 bits value,
which is an undefined behavior.
However, the flagged code path is never triggered in 32-bits mode,
(actually, it's not even generated if DCE kicks in),
the shift-by-32 is necessarily performed on a 64-bits value.
While it doesn't change anything regarding lz4 code generation, for both 32 and 64 bits mode,
(can be checked by md5sum on the generated binary),
the shift has been rewritten in a way which should please this static analyzer,
since it now pretends to shift by 16 on 32-bits cpu (note : it doesn't matter since the code will not even be generated in this case).
Note : this is a blind fix, the new code has not been tested with cppcheck, because cppcheck only works on Windows.
Other static analyzer, such as scan-build, do not trigger this false positive.
FORCE_INLINE macro is defined based on the compiler used. When using
gcc, it will include "__attribute__((always_inline))" forcing gcc to
always inline all the functions marked as FORCE_INLINE. However, this
can cause a performance degradation of about 15%.
This patch allows to set the FORCE_INLINE macro from the compiler
command line to either "static" or "static inline" giving allowing it
to inline functions as needed when performing optimizations.
The bug would make the bt search read one byte in an invalid memory region,
and make a branch decision based on its value.
Impact was small (missed compression opportunity).
It only happens in -BD mode, with extDict-prefix overlapping matches.
The bt match search is supposed to work also in extDict mode.
In which case, the match ptr can point into Dict.
When the match was overlapping Dict<->Prefix,
match[matchLength] would end up outside of Dict, in an invalid memory area.
The correction ensures that in such a case,
match[matchLength] ends up at intended location, inside prefix.
This macro is susceptible to be triggered from user side
typically through compiler flag (-DLZ4_HEAPMODE=1).
In which case, it makes sense to prefix the macro
since we want to reduce potential side-effect on namespace.