The problem was already masked,
since tiny blocks are no longer accepted for statistics.
But in case it could still happen with not-so-tiny blocks,
a stricter control now ensures that
nothing was loaded prior to statistics collection.
* Add configs that test multithreading, LDM, and setting explicit
parameters.
* Update the `compress cctx` method to accept `ZSTD_parameters` (see the sketch after this list).
* Compile against the multithreaded `libzstd.a`.
* Update `results.csv` for the new configs.
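
For reference, a minimal sketch of compressing with explicit `ZSTD_parameters` (the helper name and the checksum tweak are illustrative, not the regression test's actual code):

```c
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_parameters is in the experimental API */
#include <zstd.h>

/* Illustrative only: compress with explicit ZSTD_parameters
 * instead of a plain compression level. */
static size_t compressWithParams(ZSTD_CCtx* cctx,
                                 void* dst, size_t dstCapacity,
                                 const void* src, size_t srcSize)
{
    /* Start from the parameters implied by level 3, then override explicitly. */
    ZSTD_parameters params = ZSTD_getParams(3, (unsigned long long)srcSize, 0);
    params.fParams.checksumFlag = 1;   /* example of an explicit override */
    return ZSTD_compress_advanced(cctx, dst, dstCapacity,
                                  src, srcSize,
                                  NULL, 0,   /* no dictionary */
                                  params);
}
```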
Unless you think there are more configs or methods I should test,
I think we now have a fairly wide set,
so I'll pause adding more for now.
Depending on initialization,
the first byte of a new frame was either invalidated or not.
As a consequence, one match opportunity was either available or not,
resulting in slightly different compressed sizes
(on average, 1 or 2 bytes once every 20 frames).
This impacted ratio comparisons between one-shot and streaming modes.
The fix makes the first byte of a new frame always a valid match,
so the compressed size is now always the same.
It also improves the compressed size by a negligible amount.
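
A sketch of how one might compare the two modes (error handling omitted; this illustrates the comparison, not a reproduction of the bug itself):

```c
#include <stdio.h>
#include <string.h>
#include <zstd.h>

/* Compress the same input in one-shot and streaming modes, then compare sizes.
 * Before the fix, the sizes could occasionally differ by a byte or two. */
int main(void)
{
    char src[4096];
    memset(src, 'a', sizeof(src));
    char dst1[8192], dst2[8192];

    /* one-shot mode, level 3 (the streaming default) */
    size_t const oneShotSize =
        ZSTD_compress(dst1, sizeof(dst1), src, sizeof(src), 3);

    /* streaming mode, single ZSTD_e_end call */
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    ZSTD_inBuffer  in  = { src,  sizeof(src),  0 };
    ZSTD_outBuffer out = { dst2, sizeof(dst2), 0 };
    ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    ZSTD_freeCCtx(cctx);

    printf("one-shot: %zu bytes, streaming: %zu bytes\n",
           oneShotSize, out.pos);
    return 0;
}
```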
Compare the input and output files by their inode numbers and
refuse to open the output file if it is the same file as the input.
This doesn't work when (de)compressing multiple files to a single
file, but that is a very uncommon use case, mostly one I use for
benchmarking.
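
A minimal sketch of the check (the helper name is hypothetical; the real CLI logic lives in the zstd programs sources):

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths resolve to the same file.
 * st_ino alone is not unique across filesystems, so st_dev is checked too.
 * Windows would need a different mechanism. */
static int isSameFile(const char* path1, const char* path2)
{
    struct stat st1, st2;
    if (stat(path1, &st1) != 0 || stat(path2, &st2) != 0)
        return 0;   /* at least one path doesn't exist: cannot be the same */
    return (st1.st_dev == st2.st_dev) && (st1.st_ino == st2.st_ino);
}
```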
Fixes #1422.
* Fix `ZSTD_estimateCCtxSize()` with negative levels.
* Fix `ZSTD_estimateCStreamSize()` with negative levels.
* Add a unit test to test for this error.
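
A minimal sketch of such a test (assuming the estimate functions are reached through the experimental/static API, as in `zstd.h`):

```c
#define ZSTD_STATIC_LINKING_ONLY   /* estimate functions are experimental API */
#include <zstd.h>
#include <assert.h>

int main(void)
{
    /* Negative (fast) levels must yield valid size estimates, not error codes. */
    size_t const cctxSize    = ZSTD_estimateCCtxSize(-1);
    size_t const cstreamSize = ZSTD_estimateCStreamSize(-1);
    assert(!ZSTD_isError(cctxSize));
    assert(!ZSTD_isError(cstreamSize));
    assert(cctxSize > 0 && cstreamSize > 0);
    return 0;
}
```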
When srcSize is small,
the number of symbols produced is likely too small to warrant dedicated probability tables,
in which case predefined distribution tables will be used instead.
There is a cheap heuristic in btultra initialization:
it presumes default distributions will be used when srcSize <= 1024.
btultra2 now uses the same threshold to shut down probability estimation,
since measured frequencies won't be used at the entropy stage;
relying on them to determine sequence cost is therefore misleading,
resulting in worse compression ratios.
This fixes the btultra2 performance issue on very small inputs.
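
A sketch of the decision, with names loosely modeled on the optimal parser's internals (illustrative, not the exact code in lib/compress):

```c
/* Hypothetical names: how the optimal parser might choose its cost model. */
typedef enum { price_dynamic, price_predef } priceType_e;

static priceType_e selectPriceType(size_t srcSize)
{
    /* Same 1024-byte threshold btultra uses to presume that
     * predefined distribution tables will win at the entropy stage. */
    return (srcSize <= 1024) ? price_predef : price_dynamic;
}
```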
Note that a more proper way would be
to determine which symbols are going to use predefined probabilities
and which are going to use dynamic ones.
But the current algorithm is unable to make such a per-symbol decision,
so this would require significant modifications.