in this version, literal compression is always disabled for ZSTD_fast strategy.
Performance parity between ZSTD_compress_advanced() and ZSTD_compress_generic()
result of ZSTD_compress_advanced()
is different from ZSTD_compress_generic()
when using negative compression levels
because the disabling of huffman compression is not passed in parameters.
The (pretty old) code inside ZSTD_compress()
was making some pretty bold assumptions
on what's inside a CCtx and how to init it.
This is pretty fragile by design.
CCtx content evolve.
Knowledge of how to handle that should be concentrate in one place.
A side effect of this strategy
is that ZSTD_compress() wouldn't check for BMI2 capability,
and is therefore missing out some potential speed opportunity.
This patch makes ZSTD_compress() use
the same initialization and release functions
as the normal creator / destructor ones.
Measured on my laptop, with a custom version of bench
manually modified to use ZSTD_compress() (instead of the advanced API) :
This patch :
1#silesia.tar : 211984896 -> 73651053 (2.878), 312.2 MB/s , 723.8 MB/s
2#silesia.tar : 211984896 -> 70163650 (3.021), 226.2 MB/s , 649.8 MB/s
3#silesia.tar : 211984896 -> 66996749 (3.164), 169.4 MB/s , 636.7 MB/s
4#silesia.tar : 211984896 -> 65998319 (3.212), 136.7 MB/s , 619.2 MB/s
dev branch :
1#silesia.tar : 211984896 -> 73651053 (2.878), 291.7 MB/s , 727.5 MB/s
2#silesia.tar : 211984896 -> 70163650 (3.021), 216.2 MB/s , 655.7 MB/s
3#silesia.tar : 211984896 -> 66996749 (3.164), 162.2 MB/s , 633.1 MB/s
4#silesia.tar : 211984896 -> 65998319 (3.212), 130.6 MB/s , 618.6 MB/s
recently introduce into the new dictionary mode.
The bug could be reproduced with this command :
./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639
error was in function ZSTD_count_2segments() :
the beginning of the 2nd segment corresponds to prefixStart
and not the beginning of the current block (istart == src).
This would result in comparing the wrong byte.
recent experienced showed that
default distribution table for offset
can get it wrong pretty quickly with the nb of symbols,
while it remains a reasonable choice much longer for lengths symbols.
Changed the formula,
so that dynamic threshold is now 32 symbols for offsets.
It remains at 64 symbols for lengths.
Detection based on defaultNormLog
zstd rejects blocks which do not compress by at least a certain amount.
In which case, such block is simply emitted uncompressed (even if a little bit of compression could be achieved).
This is better for decompression speed, hence for energy.
The logic is controlled by ZSTD_minGain().
The rule is applied uniformly, at all compression levels.
This change makes btultra accepts blocks with poor compression ratios.
We presume that users of btultra mode prefers compression ratio over some decompress speed gains.
The threshold for minimum gain is lowered for btultra
from s>>6 (~1.5% minimum gain)
to s>>7 (~0.8% minimum gain).
This is a prudent change.
Not sure if it's large enough.
Work around bug in zstd decoder
Pull request #1144 exercised a new path in the zstd decoder that proved to
be buggy. Avoid the extremely rare bug by emitting an uncompressed block.
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.
* The cost of `set_basic` is computed using the cross-entropy cost
function `ZSTD_crossEntropyCost()`, using the normalized default count
and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
`ZSTD_entropyCost()`, together with the cost of writing the normalized
count `ZSTD_NCountCost()`.
for proper estimation of symbol's weights
when using dictionary compression.
Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
reported by @let-def.
It's actually a bug in ZSTD_compressBegin_usingCDict()
which would pass a wrong pledgedSrcSize value (0 instead of ZSTD_CONTENTSIZE_UNKNOWN)
resulting in wrong window size, resulting in downsized seqStore,
resulting in segfault when writing into the seqStore later in the process.
Added a test in fuzzer to cover this use case (fails before the patch).
The new advanced API basically set `requestedParams = appliedParams` when
using a dictionary. This halted all parameter adjustment, which can hurt
compression ratio if, for example, the window log is small for the first
call, but the rest of the files are large.
This patch fixes the bug, and checks that the `requestedParams` don't change
in the new advanced API when using a dictionary, and generally in the fuzzer.