0fd4df6ed3
This makes it easier to edit for maintenance and evolution (I plan to experiment with modifications to the Huffman decompression functions). The methodology followed seems broadly applicable to other BMI2 modules. Performance was tracked rigorously at each step: there is no noticeable loss (nor gain) in performance compared to the `#include` version. Note, however, that the 4X decoder variants tend to be extremely sensitive to code alignment. This source code resulted in pretty good performance with gcc 7.2 and 7.3, but future changes (even in other parts of the code) might trigger this alignment issue again. |
||
---|---|---|
huf_decompress.c | ||
zstd_decompress_impl.h | ||
zstd_decompress.c |