zstd/lib/compress
Yann Collet 5235d8d6ba first implementation of delayed update for btlazy2
This is a pretty nice speed win.

The new strategy consists of stacking new candidates as if they belonged to a hash chain.
Then, only if there is a need to actually consult the chain, they are batch-updated,
before starting the match search itself.
This is supposed to be beneficial when skipping positions,
which happens a lot when using a lazy strategy.
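
The idea can be sketched with a toy model. This is not the code from this patch: every name below (`ToyIndex`, `toy_stack`, `toy_batchUpdate`) is hypothetical, and the real binary tree insertion is replaced by a simple counter (`sortWork`) so that only the "stack cheaply, sort on demand" behaviour is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_SIZE 256   /* hypothetical table size */
#define TOY_MAX_POS   1024  /* hypothetical window size */

typedef struct {
    uint32_t head[TOY_HASH_SIZE]; /* newest position per hash, stored +1 (0 = empty) */
    uint32_t prev[TOY_MAX_POS];   /* previous position with the same hash, stored +1 */
    uint8_t  sorted[TOY_MAX_POS]; /* 1 once the position went through the costly insertion */
    unsigned sortWork;            /* number of costly insertions performed so far */
} ToyIndex;

static void toy_init(ToyIndex* idx) { memset(idx, 0, sizeof(*idx)); }

/* Cheap O(1) step: push the position on top of its hash chain, unsorted. */
static void toy_stack(ToyIndex* idx, uint32_t pos, uint32_t hash)
{
    assert(pos < TOY_MAX_POS && hash < TOY_HASH_SIZE);
    idx->prev[pos] = idx->head[hash];
    idx->head[hash] = pos + 1;
    idx->sorted[pos] = 0;
}

/* Costly step, run only right before a search on this hash:
 * walk the stacked candidates, newest first, and "sort" at most
 * maxCandidates of them. Returns how many were processed.
 * Unsorted entries beyond the limit are simply abandoned in this toy. */
static unsigned toy_batchUpdate(ToyIndex* idx, uint32_t hash, unsigned maxCandidates)
{
    unsigned n = 0;
    uint32_t link;
    assert(hash < TOY_HASH_SIZE);
    link = idx->head[hash];
    while (link != 0 && n < maxCandidates) {
        uint32_t const pos = link - 1;
        if (idx->sorted[pos]) break;   /* reached the already-sorted part */
        idx->sorted[pos] = 1;          /* placeholder for the real tree insertion */
        idx->sortWork++;
        n++;
        link = idx->prev[pos];
    }
    return n;
}
```

Stacking ten positions costs no sorting work at all in this model; the work is only paid for the hashes that are actually searched, which is where skipped positions would save time.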

The baseline performance for btlazy2 on my laptop is :
15#calgary.tar       :   3265536 ->    955985 (3.416),  7.06 MB/s , 618.0 MB/s
15#enwik7            :  10000000 ->   3067341 (3.260),  4.65 MB/s , 521.2 MB/s
15#silesia.tar       : 211984896 ->  58095131 (3.649),  6.20 MB/s , 682.4 MB/s
(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)

After this patch, and keeping all parameters identical,
speed is increased by a pretty good margin (+30-50%),
but compression ratio suffers a bit :
15#calgary.tar       :   3265536 ->    958060 (3.408),  9.12 MB/s , 621.1 MB/s
15#enwik7            :  10000000 ->   3078318 (3.249),  6.37 MB/s , 525.1 MB/s
15#silesia.tar       : 211984896 ->  58444111 (3.627),  9.89 MB/s , 680.4 MB/s

That's because I kept `1<<searchLog` as a maximum number of candidates to update.
But for a hash chain, this represents the total number of candidates in the chain,
while for the binary tree, it represents the maximum depth of searches.
Keep in mind that a lot of candidates won't even be visited in the btree,
since they are filtered out by the binary sort.

As a consequence, in the new implementation,
the effective depth of the binary tree is substantially shorter.
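
A rough way to see the gap between "number of candidates" and "depth" is the sketch below. It assumes an idealized, perfectly balanced binary tree, which the real structure is not guaranteed to be, and the function names are hypothetical. It only shows that a budget of `1<<searchLog` nodes yields a far shallower tree than a budget of `1<<searchLog` levels.

```c
#include <assert.h>
#include <stdint.h>

/* Candidate budget as described in the text: 1 << searchLog. */
static uint32_t toy_candidateBudget(unsigned searchLog)
{
    assert(searchLog < 32);
    return (uint32_t)1 << searchLog;
}

/* Depth of a perfectly balanced binary tree holding n nodes:
 * the smallest d such that 2^d - 1 >= n. */
static unsigned toy_balancedDepth(uint32_t n)
{
    unsigned d = 0;
    while (d < 32 && (((uint64_t)1 << d) - 1) < n) d++;
    return d;
}
```

With a hypothetical `searchLog` of 4, the budget is 16 candidates: a chain walk visits up to 16 entries, while a balanced tree built from only 16 nodes is just 5 levels deep. Adding +1 to `searchLog` doubles the budget to 32 candidates.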

To compensate, it's enough to increase the `searchLog` value.
Here is the result after adding just +1 to `searchLog` (level 15 setting in this patch):
15#calgary.tar       :   3265536 ->    956311 (3.415),  8.32 MB/s , 611.4 MB/s
15#enwik7            :  10000000 ->   3067655 (3.260),  5.43 MB/s , 535.5 MB/s
15#silesia.tar       : 211984896 ->  58113144 (3.648),  8.35 MB/s , 679.3 MB/s

i.e., almost the same compression ratio as before,
but with a noticeable speed increase (+20-30%).
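
For reference, the per-file gain can be recomputed from the compression speeds listed above. The helper below is only an illustration (its name is hypothetical); it takes the baseline and new speeds in MB/s and returns the relative change in percent.

```c
#include <assert.h>

/* Relative speed change, in percent, between two throughput figures. */
static double toy_speedGainPercent(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

Applied to the baseline and the `searchLog`+1 figures, this gives roughly +18% for calgary.tar (7.06 to 8.32 MB/s), +17% for enwik7 (4.65 to 5.43 MB/s) and +35% for silesia.tar (6.20 to 8.35 MB/s).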

This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
2017-12-28 16:58:57 +01:00
..
fse_compress.c [zstd] Backport kernel patch from @ColinIanKing 2017-09-25 16:18:23 -07:00
huf_compress.c Ensure dictionary Huff table can encode any symbol 2017-10-03 13:22:13 -07:00
zstd_compress_internal.h fix #944 : ZSTDMT with large files and dictionary now works correctly 2017-12-12 18:04:58 -08:00
zstd_compress.c first implementation of delayed update for btlazy2 2017-12-28 16:58:57 +01:00
zstd_double_fast.c renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_double_fast.h renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_fast.c renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_fast.h renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_lazy.c first implementation of delayed update for btlazy2 2017-12-28 16:58:57 +01:00
zstd_lazy.h first implementation of delayed update for btlazy2 2017-12-28 16:58:57 +01:00
zstd_ldm.c Increase maximum window size 2017-09-26 14:00:01 -07:00
zstd_ldm.h renamed zstd_compress.h into zstd_compress_internal.h 2017-11-07 16:15:23 -08:00
zstd_opt.c first implementation of delayed update for btlazy2 2017-12-28 16:58:57 +01:00
zstd_opt.h first implementation of delayed update for btlazy2 2017-12-28 16:58:57 +01:00
zstdmt_compress.c Merge pull request #958 from facebook/continueCCtx 2017-12-20 00:12:50 +01:00
zstdmt_compress.h zstdmt via compress_generic: reduce opportunity to free/create mtctx 2017-12-16 12:48:13 -08:00