* [ldm] Fix bug in overflow correction with large job size
* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)
* [test] Add test that exposes the bug
Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
* tests: Fix shellcheck warnings in playTests.sh
* tests: Do not use ../programs which is relative to tests dirs
This commit fixes error when running playTests.sh in Meson.
Mesonbuild runs out of tree, so ./datagen not in `zstd/tests` dir,
it lies in <mesonbuilddir>/tests. This leads to ../programs invalid.
* tests: Replace relative paths for zstd/tests dir
* playTests: Set shell options explicitly, not in shebang
* playTests: Replace echo -e with printf
* meson: Fix test-zstd
Use std=gnu99 to build and test just like `make test`.
* meson: Fix legacy test
* meson: Enable testing in CI
Run build under release mode for faster test time.
* meson: Increase timeout time for test-zstream
Pull request #1499 added a new test, which uses 'head -c'. The '-c'
option is non-portable (not in POSIX). Instead use 'dd'. Similar issue
has been resolved in the past (#1321).
fseek() doesn't indicate when it moves past the end of a file.
Consequently, if a file is truncated within its last block, the error would't be detected.
This PR adds a test scenario that induces this situation using a small compressed file of only one block in size.
This test is added to tests/playTests.sh
Check is implemented by ensuring that the filehandle position is equal to the filesize upon exit.
On Windows, the equivalent of `/dev/null` is `NUL`.
When tests are run under msys2/minGW,
the environment identifies itself as Windows,
hence the script uses `NUL` instead of `/dev/null`
but the environment will consider `NUL` to be a regular file name.
Consequently, `NUL` will be overwritten during tests,
triggering an error.
This patch uses flag `-f` to force such overwrite
passing the test.
Compare the input and output files by their inode number and
refuse to open the output file if the input file is the same.
This doesn't work when (de)compressing multiple files to a single
file, but that is a very uncommon use case, mostly used for
benchmarking by me.
Fixes#1422.
The `--no-progress` flag disables zstd's progress bars, but leaves
the summary.
I've added simple tests to `playTests.sh` to make sure the parsing
works.
Sometimes, it's necessary to test that a certain command fail, as expected.
Such failure is actually a success, and must not stop the flow of tests.
Several tests were prefixed with `!` to invert return code.
This does not work : it effectively makes the tests pass no matter what.
Use instead function die(), which is meant to trap successes, and transform them into errors.
tests/playTests.sh uses 'head -c' in a couple of tests to truncate the
last byte of a file. The '-c' option is non-portable (not in POSIX).
Instead use a wrapper around dd (truncateLastByte).
* Minor fix
* Run non-optimize FASTCOVER 5 times in benchmark
* Merge fastCover into dictBuilder
* Fix mixed declaration issue
* Add fastcover to symbol.c
* Add fastCover.c and cover.h to build
* Change fastCover.c to fastcover.c
* Update benchmark to run FASTCOVER in dictBuilder
* Undo spliting fastcover_param into cover_param and f
* Remove convert param functions
* Assign f to parameter
* Add zdict.h to Makefile in lib
* Add cover.h to BUCK
* Cast 1 to U64 before shifting
* Remove trimming of zero freq head and tail in selectSegment and rebenchmark
* Remove f as a separate parameter of tryParam
* Read 8 bytes when d is 6
* Add trimming off zero frequency head and tail
* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)
* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary
* Change nbDmer to always read 8 bytes even when d=6
* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER
* Update comments and benchmarking result
* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover
* Add dictType enum and fix bug about passing zParam when converting to coverParam
* Combine finalize and skip into a single parameter
* Update acceleration parameters and benchmark on 3 sample sets
* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets
* Initialize variables outside of for loop in benchmark.c
* Update benchmark result for hg-manifest
* Remove cover.h from install-includes
* Add explanation of f
* Set default compression level for trainFromBuffer to 3
* Add assertion of fastCoverParams in DiB_trainFromFiles
* Add checkTotalCompressedSize function + some minor fixes
* Add test for multithreading fastCovr
* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish
* Free segmentFreqs
* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment
* Add FASTCOVER_MEMMULT
* Minor fix
* Update benchmarking result
OpenBSD's port building infrastructure is able to build in a privilege
separated mode. It uses a privilege drop model. Regression tests fail in
this mode as xz/lzma is unable to set file group and errors out with:
xz: tmp.xz: Cannot set the file group: Operation not permitted
gmake[1]: *** [Makefile:307: zstd-playTests] Error 2
Actually it is not a xz/lzma error but a warning causing zstd's
regression test to fail. Proposed fix is to have xz/lzma not set the
exit status to 2 even if a condition worth a warning was detected (-Q
flag).
https://github.com/facebook/zstd/pull/1124 fixes an issue with GNU/Hurd
being unable to write to /dev/zero. Implemented fix is writing to
/dev/random instead.
On OpenBSD a regular user is unable to write to /dev/random because of
permissions set on this device. Result is failing a regression test.
Proposed solution should work for all platforms.
OpenBSD uses md5 instead of md5sum, and has no device called full.
With this patch, make check runs until #1088. With the assumption made
in the issue make check runs succesfully.
Summary:
Allocate a single input buffer large enough to house each job, as well as
enough space for the IO thread to write 2 extra buffers. One goes in the
`POOL` queue, and one to fill, and then block on a full `POOL` queue.
Since we can't overlap with the prefix, we allocate space for 3 extra
input buffers.
Test Plan:
* CI
* With and without ASAN/UBSAN run zstdmt with different number of threads
on two large binaries, and verify that their checksums match.
* Test on the tip of the zstdmt ldm integration.
Reviewers: cyan
Differential Revision: https://phabricator.intern.facebook.com/D7284007
Tasks: T25664120
access negative compression levels from command line
for both compression and benchmark modes.
also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.
added relevant cli tests.
added some test
also updated relevant doc
+ fixed a mistake in `lz4` symlink support :
lz4 utility doesn't remove source files by default (like zstd, but unlike gzip).
The symlink must behave the same.
we want the dictionary table to be fully sorted,
not just lazily filled.
Dictionary loading is a bit more intensive,
but it saves cpu cycles for match search during compression.
zstd streaming API was adding a null-block at end of frame for small input.
Reason is : on small input, a single block is enough.
ZSTD_CStream would size its input buffer to expect a single block of this size,
automatically triggering a flush on reaching this size.
Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`).
The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame.
The solution is to not flush automatically, which is btw the expected behavior.
It happens in this case because blocksize is defined with exactly the same size as input.
Just adding one-byte is enough to stop triggering the automatic flush.
I initially looked at another solution, solving the problem directly in the compression context.
But it felt awkward.
Now, the underlying compression API `ZSTD_compressContinue()` would take the decision the close a frame
on reaching its expected end (`pledgedSrcSize`).
This feels awkward, a responsability over-reach, beyond the definition of this API.
ZSTD_compressContinue() is clearly documented as a guaranteed flush,
with ZSTD_compressEnd() generating a guaranteed end.
I faced similar issue when trying to port a similar mechanism at the higher streaming layer.
Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller,
since it did not explicitly requested an end of frame.
The only sensible action remaining after that is to end the frame with no additional input.
This adds additional logic in the ZSTD_CStream state to check this condition.
Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?)
In the end, just enlarging input buffer by 1 byte feels the least intrusive change.
It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code.
The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.
- building cli from /tests preserves potential flags in MOREFLAGS (such as asan/usan)
- MT dictionary tests check for MT capability (MT is not enabled by default for zstd32)
* Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail
on my Mac.
* Maximum window size in 64-bit mode is 2GB, since that is the largest
power of 2 that works with the overflow prevention.
* Allow `--long=windowLog` to set the window log, along with
`--zstd=wlog=#`. These options also set the window size during
decompression, but don't override `--memory=#` if it is set.
* Present a helpful error message when the window size is too large during
decompression.
* The long range matcher defaults to a hash log 7 less than the window log,
which keeps it at 20 for window log 27.
* Keep the default long range matcher window size and the default maximum
window size at 27 for the API and CLI.
* Add tests that use the maximum window size and hash size for compression
and decompression.