params1 was swapped with params2.
This used to be a non-issue when testing for strict equality,
but now that some tests look for "sufficient size" `<=`, order matters.
The deep fuzzer tests caught a subtle bug that was probably there for a long time.
The impact of the bug is not a crash, or any other clear error signal,
rather, it reduces performance, by cutting data into smaller blocks.
Eventually, the following test would fail because it produces too many 1-byte blocks,
requiring more space than buffer can provide :
`./zstreamtest_asan --mt -s3514 -t1678312 -i1678314`
The root scenario is as follows :
- Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1`
- The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`.
`windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`.
The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes.
- Complete compression
- New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN`
trigger "continue mode"
- Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only,
but in fact, it also depends on `pledgedSrcSize`.
- The "old" blocksize (1) is still there,
next compression will use this value to cut input into blocks,
resulting in more blocks and worse performance than necessary performance.
Given the scenario, and its possible variants, I'm surprised it did not show up before.
But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger.
The above test is a special corner case, where performance is so impacted that it reaches an error case.
The fix works, but I'm not completely pleased.
I think the current code relies too much on implied relations between variables.
This will likely break again in the future when some related part of the code change.
Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3.
So a longer term fix will have to be considered after the release.
To do : create a reliable test case which triggers this scenario for CI tests.
`zstreamtest --newapi` (and `--opaqueapi`) create and destroy way too many threads
resulting in failure of tsan tests,
and potentially connected to the qemu flaky tests.
This is because, at each test, the nb of threads can be changed (random).
The `--no-big-tests` directive reduce this choice to 1/2 threads,
in order to limit memory usage, especially for qemu and 32-bits builds.
Unfortunately, swapping between 1 and 2 threads is enough to constantly create/destroy new mtctx.
This patch takes advantage of the following property :
via compress_generic, no internal mtctx is needed for nbThreads < 2.
As a consequence, when nbThreads == 2, the currently active mtctx is necessarily good.
This dramatically reduces the nb of thread creations when invoking `zstreamtest --newapi --no-big-tests`
(only when parent cctx itself is created, which is randomized to 1/256 tests).
Expected outcome :
- at a minimum : tsan tests shall now work continuously without exploding the thread counter
- at best : flaky qemu tests on `zstreamtest --newapi --no-big-tests` may stop being flaky, due to less stress from constant thread creation/destruction
Real world impact :
minimal, I don't expect users to constantly change `nbThreads` between each invocation.
If `nbThreads` remains stable, existing implementation re-uses existing mtctx.
Also : `zstreamtest --newapi` but without `--no-big-tests` doesn't benefit as much,
since this test can select a random `nbThreads` value between 1 and 4.
The current patch only reduces opportunity to free/create mtctx (for example : 2->1->2 doesn't need a new mtctx)
but doesn't completely eliminate it, since `nbThreads` can still change between 2/3/4.
A more complete solution could be to only use 2 out of 4 allocated threads, thus keeping the pool at a constant size.
This would require a larger change to `POOL_*` api though.
taking advantage of `btopt` improved speed to tune parameters.
Levels 16-19 are stronger than previous release, making the graph more favorable.
In theory, I should also update small-size tables,
but I got lazy on that one ...
zstd streaming API was adding a null-block at end of frame for small input.
Reason is : on small input, a single block is enough.
ZSTD_CStream would size its input buffer to expect a single block of this size,
automatically triggering a flush on reaching this size.
Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`).
The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame.
The solution is to not flush automatically, which is btw the expected behavior.
It happens in this case because blocksize is defined with exactly the same size as input.
Just adding one-byte is enough to stop triggering the automatic flush.
I initially looked at another solution, solving the problem directly in the compression context.
But it felt awkward.
Now, the underlying compression API `ZSTD_compressContinue()` would take the decision the close a frame
on reaching its expected end (`pledgedSrcSize`).
This feels awkward, a responsability over-reach, beyond the definition of this API.
ZSTD_compressContinue() is clearly documented as a guaranteed flush,
with ZSTD_compressEnd() generating a guaranteed end.
I faced similar issue when trying to port a similar mechanism at the higher streaming layer.
Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller,
since it did not explicitly requested an end of frame.
The only sensible action remaining after that is to end the frame with no additional input.
This adds additional logic in the ZSTD_CStream state to check this condition.
Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?)
In the end, just enlarging input buffer by 1 byte feels the least intrusive change.
It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code.
The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.
This patch restores capability for each file to receive adapted compression parameters depending on its size.
The bug breaking this feature was relatively silly :
setting a parameter with a value "0" is supposed to be a no-op.
Unfortunately, it would pin down compression parameters as if they were manually set,
preventing later automatic adaptation.
Unfortunately, I'm currently short of a test case that could check this situation and trigger an error.
Compression parameters selection between tableID 0,1,2,3 is largely internal,
leaving no trace to outside world, not even in frame header.
windowLog is now enforced from provided compression parameters,
instead of being copied blindly from `cdict`
where it could be smaller.
also :
- fix a minor bug in zstreamtest --mt : advanced parameters must be set before init
- changed advanced parameter name to ZSTDMT_jobSize
While the final result is still, technically, a frame,
the resulting frame expands initial data instead of compressing it.
This is because the streaming API creates a tiny 1-byte buffer for input,
because it believes input is empty (0-bytes),
because in the past, 0 used to mean "unknown" instead.
This patch fixes the issue.
Todo : add a test which traps the issue.
last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow.
It is no a useless operation, so it's removed.
No side-effect left when setting a compression parameter.
Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action.
Any action shall be taken at parameter application only (during init).
It makes it possible to just copy CCtxParams from external container to internal state,
and get rid of the more complex code which was trying to compensate for missing actions.
Fixed : multithreading to compress some small data with dictionary
Fixed : ZSTD_initCStream_usingCDict()
Improved streaming memory usage when pledgedSrcSize is known.
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not update, expecting ZSTD_updateTree() to fill the tree later on.
Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.
It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch add a few of them.
Additionnally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.
Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.
- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)
It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).
warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
added some traces and assert
related to hunting a potential ubsan error in 32-bits more
(it ends up being a compiler-side issue : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82802).
Modified one pointer arithmetic expression for a more conformant way.
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overriden by srcSize instead.
This wasn't applied before compression level is transformed into compression parameters.
As a consequence, small input missed compression parameters adaptation.
It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
ZSTD_compress() and friends would treat an empty input as an unknown size
when selecting parameters. Thus, they would drastically overallocate the
context. Tell ZSTD_getParams() that the source size is 1 when it is empty.
It isn't useful in any case to repeat default tables.
Saves a few bytes on Silesia, since we don't trigger the dictionary
heuristic.
Before: 211988480 => 73651998 bytes
After: 211988480 => 73651721 bytes
when determining compression parameters
to compress one file only.
For multiple files, it still "bets" that files are going to be small.
There was also a bug recently added in ZSTD_CCtx_loadDictionary_advanced()
making it incapable to use pledgedSrcSize to determine compression parameters.
In `ZSTD_compressBegin_advanced()`, `ZSTD_parameters` are used to set the
compression parameters, but the level didn't get set to `CLEVEL_CUSTOM`, so
`ZSTD_compressBlock()` used the wrong parameters when checking the source
size.