Summary:
Allocate a single input buffer large enough to house each job, as well as
enough space for the IO thread to write 2 extra buffers. One goes in the
`POOL` queue, and one to fill, and then block on a full `POOL` queue.
Since we can't overlap with the prefix, we allocate space for 3 extra
input buffers.
Test Plan:
* CI
* With and without ASAN/UBSAN run zstdmt with different number of threads
on two large binaries, and verify that their checksums match.
* Test on the tip of the zstdmt ldm integration.
Reviewers: cyan
Differential Revision: https://phabricator.intern.facebook.com/D7284007
Tasks: T25664120
Summary:
* Expose the reference external sequences API for zstdmt.
Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
more than 1 MB are encountered.
Expose reference external sequence API
* Expose the reference external sequences API for zstdmt.
* Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
more than 1 MB are encountered.
Test Plan:
* CI
* Test the zstdmt ldm integration stacked on top of this diff
Reviewers: cyan
Differential Revision: https://phabricator.intern.facebook.com/D7283968
Tasks: T25664120
* Expose the reference external sequences API for zstdmt.
Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
more than 1 MB are encountered.
The overflow protection is broken when the window log is `> (3U << 29)`, so 31.
It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`,
and the the assertion `current > newCurrent` fails. This happens when the same
context is used many times over, but with a large window log, like in zstdmt.
Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`.
The added test fails before the patch, and passes after.
* Replaced a non-breaking space and an en dash with a plain space and
a hyphen.
* This means the files are simple ASCII and less likely to run into
codepage issues.
- do not test level 0, as it is converted into level 3,
which feels strange when compressing multiple levels
- Use direct synchronous mode when a single worker is requested.
access negative compression levels from command line
for both compression and benchmark modes.
also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.
added relevant cli tests.
negative compression level trade compression ratio for more compression speed.
They turn off huffman compression of literals,
and use row 0 as baseline with a stepSize = -cLevel.
added associated test in fuzzer
also added : new advanced parameter ZSTD_p_literalCompression
zstd bench module can focus on decompression speed _only_.
This is useful when trying to measure performance
on large input data compressed using a high level
as compression time becomes problematic (too long).
This mode is triggered by command : zstd -b -d
Problem was : in such a mode,
measured decoding speed was > 10% slower
than in nominal mode (compression + decompression),
making decompression benchmark mode much less useful.
This patch fixes the issue.
It's not completely clear why, but
moving the `memcpy()` operation sooner in the pipeline fixed it.
I can still measure some difference, but it is in the < 2% range,
so it's much more tolerable.
also : it doesn't matter anymore in which order are selected
commands `-b` and `-d`.
The combination always triggers bench_decodeOnly mode.