There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-null literal suite is going to be part of next sequence,
while if position ends a previous match, to immediately start another match,
next sequence will have a litlength of zero.
A litlength of zero has a non-null cost.
It follows that literals cost should be compared to match cost + litlength==0.
Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.
The "proper" formula makes it possible to remove this heuristic.
Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).
It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
adapt accuracy depending on value.
makes it possible to have higher accuracy for small value,
notably small compression speed.
This capability is expected to be useful while modifying optimal parser.
Currently, all files are joined by default,
they are compressed separately but benchmarked together,
providing a single final result.
Benchmarking files separately make it possible to accurately measure difference for each file.
This is expected to be useful while tuning optimal parser.
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)
It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).
warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overriden by srcSize instead.
This wasn't applied before compression level is transformed into compression parameters.
As a consequence, small input missed compression parameters adaptation.
It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
removed the other 2 code paths (single thread, and ZSTDMT ones)
keeping only the new advanced API, for easier code coverage.
It shall also fix identified issue with Visual Studio
which doesn't have ZSTD_NEWAPI defined.
UTIL_getFileSize() used to return zero on failure.
This made it impossible to distinguish a failure from a genuine empty file.
Both cases where coalesced.
Adding UTIL_FILESIZE_UNKNOWN constant has many consequences on user code,
since in many places, the `0` was assumed to mean "error".
This is no longer the case, and the error code must be actively checked.
for easier invocation.
- no longer expose frequency timer :
it's either useless, or stored internally in a static variable (init is only necessary once).
- UTIL_getTime() provides result by function return.
Doesn't speed optimize this buffer-to-buffer scenario yet.
Still internally defers to streaming implementation.
Also : fixed a long standing bug in ZSTDMT streaming API.
It now only uses compressionParameters as argument.
It produces many changes throughout user code,
though hopefully they tend to be simple :
just provide the cParams part from existing ZSTD_parameters.
Some programs might depend on ZSTD_createCDict_advanced() to pass frame parameters.
This change will force them to revisit this strategy and fix it,
since frame parameters are effectively silently ignored in current version.
- Add ZSTD_findDecompressedSize
- Traverses multiple frames to find total output size
- Add ZSTD_getFrameContentSize
- Gets the decompressed size of a single frame by reading header
- Deprecate ZSTD_getDecompressedSize
Also : fixed corner case, where nb of jobs completed becomes > jobQueueSize
which is possible when many flushes are issued
while there is not enough dst buffer to flush completed ones.
The new strategy involves cutting frame at block level.
The result is a single frame, preserving ZSTD_getDecompressedSize()
As a consequence, bench can now make a full round-trip,
since the result is compatible with ZSTD_decompress().
This strategy will not make it possible to decode the frame with multiple threads
since the exact cut between independent blocks is not known.
MT decoding needs further discussions.