When zstdmt cannot get a buffer and `ZSTD_e_end` is passed an empty
compression job can be created. Additionally, `mtctx->frameEnded` can be
set to 1, which could potentially cause problems like unterminated blocks.
The fix is to adjust to `ZSTD_e_flush` even when we can't get a buffer.
* Run compression twice and check the compressed data is byte-identical.
The compression loop had to be rewritten to ensure deteriminism. It is
guaranteed by always making maximal forward progress.
* When nbWorkers > 0, change the number of workers 1/8 of the time.
* Run in single-pass mode 1/4 of the time.
I've run a few hundred thousand iterations of zstreamtest and have seen
no deteriminism issues so far. Before the zstdmt fix that skips the
single-pass shortcut non-determinism showed up in a few hundred
iterations.
This commit leaves only the functions used by zstd_compress.c. All other
functions have been removed from the API. The ZSTDMT unit tests in
fuzzer.c and zstreamtest.c have been rewritten to use the ZSTD API. And
the --mt zstreamtest tests have been ripped out.
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.
Some of the modes currently share the same behavior. But they have
distinct modes because they are drastically different cases. E.g.
compression + reprocessing the dictionary and creating a cdict.
Additionally, when downsizing the hashLog and chainLog take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.
Adds a simple test to ensure that we aren't downsizing too far.
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).
Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.
Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.
Test: Added a test case that exposes the bug, and fixed the raw
content tests to not modify the `dictBuffer`, which makes all future
tests with the `dictBuffer` raw content, which doesn't seem intentional.
This is kind of hacky. And maybe this test doesn't need to be permanently as
exhaustive as it is now. But while we're actively developing the DDSS, we
should ensure it's compatible across many different modes.
* Fix bug introduced in PR #2271
* Fix long-standing bug that is impossible to trigger inside of zstd
* Add a fuzzer that makes sure the normalized count always round trips
correctly
for --patch-from origin
and --filelist list
Also : removed some constrained syntax tests,
as the new argument parsing syntax is more permissive.
For example :
zstd file -of dest
used to be disallowed.
It's now allowed, and understood as:
zstd file -o dest -f
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.
Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.
Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.
Fixes#2174.