MT compression generates a single frame.
Multi-threading operates by breaking the frames into independent sections.
But from a decoder perspective, there is no difference :
it's just a suite of blocks.
Problem is, decoder preserves repCodes from previous block to start decoding next block.
This is also valid between sections, since they are no different than changing block.
Previous version would incorrectly initialize repcodes to their default value at the beginning of each section.
When using them, there was a mismatch between encoder (default values) and decoder (values from previous block).
This change ensures that repcodes won't be used at the beginning of a new section.
It works by setting them to 0.
This only works with regular (single segment) variants : extDict variants will fail !
Fortunately, sections beyond the 1st one belong to this category.
To be checked : btopt strategy.
This change was only validated from fast to btlazy2 strategies.
In some complex scenarios (free() without finishing compression),
it is possible that some resources are still into jobs
and not collected back into pools.
In which case, previous version of free() would miss them.
This would be equivalent to a leak.
New version ensures that it even foes after such resource.
It requires job consumers to properly mark resources as released,
by replacing entries by NULL after releasing back to the pool.
Obviously, it's not recommended to free() zstdmt context mid-term,
still that's now a supported scenario.
The same methodology is also used to ensure proper resource collection
after an error is detected.
Still to do :
- detect compression errors (not just allocation ones)
- properly manage resource when init() is called without finishing previous compression.
The main issue was to avoid a caller to continually loop on {flush,end}Stream()
when there was nothing ready to be flushed but still some compression work ongoing in a worker thread.
The continuous loop would have resulted in wasted energy.
The new version makes call to {flush,end}Stream blocking when there is nothing ready to be flushed.
Of course, if all worker threads have exhausted job, it will return zero (all flush completed).
Note : There are still some remaining issues to report error codes
and properly collect back resources into pools when an error is triggered.
When porting python-zstandard to use ZSTD_initCStream_usingCDict()
so compression dictionaries could be reused, an automated test
failed due to compressed content changing.
I tracked this down to ZSTD_initCStream_usingCDict() not
setting the dictID field of the ZSTD_CCtx attached to the
ZSTD_CStream instance.
I'm not 100% convinced this is the correct or full solution,
as I'm still seeing one automated test failing with this change.
In previous version, main function would return early when detecting a job error.
Late threads resources were therefore not collected back into pools.
New version just register the error, but continue the collecting process.
All buffers and context should be released back to pool before leaving main function.
Result from getBuffer and getCCtx could be NULL when allocation fails.
Now correctly checks : job creation stop and last job reports an allocation error.
releaseBuffer and releaseCCtx are now also compatible with NULL input.
Identified a new potential issue :
when early job fails, later jobs are not collected for resource retrieval.
Since the result of mt compression is a single frame,
changed naming, which implied the concatenation of multiple frames.
minor : ensures that content size is written in header
The new strategy involves cutting frame at block level.
The result is a single frame, preserving ZSTD_getDecompressedSize()
As a consequence, bench can now make a full round-trip,
since the result is compatible with ZSTD_decompress().
This strategy will not make it possible to decode the frame with multiple threads
since the exact cut between independent blocks is not known.
MT decoding needs further discussions.
use ZSTD_freeCCtxPool() to release the partially created pool.
avoids to duplicate logic.
Also : identified a new difficult corner case :
when freeing the Pool, all CCtx should be previously released back to the pool.
Otherwise, it means some CCtx are still in use.
There is currently no clear policy on what to do in such a case.
Note : it's supposed to never happen.
Since pool creation/usage is static, it has no external user,
which limits risks.
`ZDICT_finalizeDictionary()` had a flipped comparison.
I also allowed `dictBufferCapacity == dictContentSize`.
It might be the case that the user wants to fill the dictionary
completely up, and then let zstd take exactly the space it needs
for the entropy tables.
execSequence relied on pointer overflow to handle cases where
`sequence.matchLength < 8`. Instead of passing an `size_t` to
wildcopy, pass a `ptrdiff_t`.
Allows an adversary to write up to 3 bytes beyond the end of the buffer.
Occurs if the match overlaps the `extDict` and `currentPrefix`, and the
match length in the `currentPrefix` is less than `MINMATCH`, and
`op-(16-MINMATCH) >= oMatchEnd > op-16`.
When the overflow protection kicks in, it makes sure that ip - ctx->base
isn't too large. However, it didn't ensure that saved offsets are
still valid. This change ensures that any valid offsets (<= windowLog)
are still representable after the update.
The bug would shop up on line 1056, when `offset_1 > current + 1`, which
causes an underflow. This in turn, would cause a segfault on line 1063.
The input must necessarily be longer than 1 GB for this issue to occur.
Even then, it only occurs if one of the last 3 matches is larger than
the chain size and block size.