zstd/lib/compress
Yann Collet 0a0a212934 zstd_opt: changed cost formula
There was a flaw in the formula
which compared literal cost with match cost:
at a given position,
a non-empty run of literals becomes part of the next sequence,
whereas if the position ends a previous match and immediately starts another match,
the next sequence has a litlength of zero.
A litlength of zero has a non-zero cost.
It follows that the literal cost should be compared to match cost + cost of litlength==0.
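
To make the comparison concrete, here is a minimal sketch with a toy price model. It is not the code in zstd_opt.c: the `ToyPrices` struct, the `toy_*` functions, and the fixed prices are hypothetical names and values chosen for illustration, whereas real prices are derived from statistics. It also ignores the litlength cost that the literal run itself eventually pays in the next sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy price table, in arbitrary units.
 * These fields are illustrative only and do not mirror zstd's internal structures. */
typedef struct {
    unsigned literalPrice;     /* price of emitting one literal byte */
    unsigned matchPrice;       /* price of one match (offset + match length codes) */
    unsigned litLength0Price;  /* price of encoding litlength == 0 */
} ToyPrices;

/* Price of covering nbBytes with literals only. */
unsigned toy_literalsCost(const ToyPrices* p, unsigned nbBytes)
{
    assert(p != NULL);
    return nbBytes * p->literalPrice;
}

/* Price of a match that starts right where a previous match ended.
 * chargeLitLength0 == 0 : flawed comparison, the litlength==0 code is treated as free.
 * chargeLitLength0 != 0 : corrected comparison, the litlength==0 code is paid for. */
unsigned toy_matchCost(const ToyPrices* p, int chargeLitLength0)
{
    assert(p != NULL);
    return p->matchPrice + (chargeLitLength0 ? p->litLength0Price : 0u);
}

/* Returns 1 when the match is strictly cheaper than literals over the same bytes. */
int toy_preferMatch(const ToyPrices* p, unsigned nbBytes, int chargeLitLength0)
{
    return toy_matchCost(p, chargeLitLength0) < toy_literalsCost(p, nbBytes);
}
```

With toy prices of 8 per literal, 22 per match, and 3 for litlength==0, a 3-byte span costs 24 as literals. The flawed comparison selects the match (22 < 24), while the corrected one selects the literals (25 > 24), which illustrates the structural advantage matches used to get.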

Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for the best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.

The "proper" formula makes it possible to remove this heuristic.

Now, the problem is: in a head-to-head comparison, the new formula is sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).

It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (with faster variants in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
2017-11-28 14:07:03 -08:00
fse_compress.c [zstd] Backport kernel patch from @ColinIanKing 2017-09-25 16:18:23 -07:00
huf_compress.c Ensure dictionary Huff table can encode any symbol 2017-10-03 13:22:13 -07:00
zstd_compress_internal.h zstd_opt: changed cost formula 2017-11-28 14:07:03 -08:00
zstd_compress.c fix one UB pointer arithmetic in encoder 2017-11-17 13:24:51 -08:00
zstd_double_fast.c renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_double_fast.h renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_fast.c renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_fast.h renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_lazy.c strategy ZSTD_btopt pinned to (0) variant (faster one) 2017-11-20 11:53:20 -08:00
zstd_lazy.h renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_ldm.c Increase maximum window size 2017-09-26 14:00:01 -07:00
zstd_ldm.h renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_opt.c zstd_opt: changed cost formula 2017-11-28 14:07:03 -08:00
zstd_opt.h renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstdmt_compress.c Fixed Btree update 2017-11-16 12:18:56 -08:00
zstdmt_compress.h fixed some complex scenarios 2017-11-16 15:18:18 -08:00