Zstdmt uses prefixes to load the overlap between segments. Loading extra
positions makes compression non-deterministic, depending on the previous
job the context was used for. Since loading extra position takes extra
time as well, only do it when creating a `ZSTD_CDict`.
Fixes#1077.
this makes it possible to specify extremely large negative compression levels,
achieving the side effect as "no compression".
It will also be possible to define larger targetlength for ultra compression mode.
There is no adverse side effect due to removing this limit.
Setting `loadedDictEnd` was accidently removed from `ZSTD_loadDictionaryContent()`,
which means that dictionary compression will only be able to reference the parts of
the dictionary within the window. The spec allows us to reference the entire
dictionary so long as even one byte is in the window.
`ZSTD_enforceMaxDist()` incorrectly always allowed offsets up to `loadedDictEnd`
beyond the window, even once the dictionary was out of range.
When overflow protection kicked in, the check `current > loadedDictEnd + maxDist`
is incorrect if `loadedDictEnd` isn't reset back to zero. `current` could be reset
below the value, which would incorrectly allow references beyond the window. This
bug is present in `master`, but is very hard to trigger, since it requires both
dictionaries and data which triggers overflow correction.
* Expose the reference external sequences API for zstdmt.
Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
more than 1 MB are encountered.
The overflow protection is broken when the window log is `> (3U << 29)`, so 31.
It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`,
and the the assertion `current > newCurrent` fails. This happens when the same
context is used many times over, but with a large window log, like in zstdmt.
Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`.
The added test fails before the patch, and passes after.
access negative compression levels from command line
for both compression and benchmark modes.
also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.
added relevant cli tests.
negative compression level trade compression ratio for more compression speed.
They turn off huffman compression of literals,
and use row 0 as baseline with a stepSize = -cLevel.
added associated test in fuzzer
also added : new advanced parameter ZSTD_p_literalCompression
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
stores them in a table. It should work with any chunk size, but
is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
instead of encoding the literals directly, it passes them to a
secondary block compressor. The code to handle chunk sizes greater
than the block size is currently commented out, since it is unused.
The next PR will uncomment exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
`targetLength`. Also don't emit repcode matches in the LDM block
compressor. Enabling the LDM with the optimal parser now actually improves
the compression ratio.
* The compression ratio is very similar to before. It is very slightly
different, because the repcode handling is slightly different. If I remove
immediate repcode checking in both branches the compressed size is exactly
the same.
* The speed looks to be the same or better than before.
Up Next (in a separate PR)
--------------------------
Allow sequence generation to happen prior to compression, and produce more
than a block worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
as it's faster, due to one memory scan instead of two
(confirmed by microbenchmark).
Note : as ZSTD_reduceIndex() is rarely invoked,
it does not translate into a visible gain.
Consider it an exercise in auto-vectorization and micro-benchmarking.
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one mode thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
replaced by equivalent signal job->consumer == job->srcSize.
created additional functions
ZSTD_writeLastEmptyBlock()
and
ZSTDMT_writeLastEmptyBlock()
required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker.
It avoids creating a job with srcSize==0.