Pathological samples may result in literal section being incompressible.
This case is now detected,
and literal distribution is replaced by one that can be written into the dictionary.
constants in zstd.h should not depend on MIN() macro which existence is not guaranteed.
Added a test to check the specific constants.
The test is a bit too specific.
But I have found no way to control a more generic "are all macro already defined" condition,
especially as this is a valid construction (the missing macro might be defined later, intentionnally).
in a new "custom memory allocator" paragraph
which is itself part of "memory management" category.
This makes it simpler to see the relation between the type and its usages.
This fixes the following crash:
$ touch exists
$ programs/zstd -r examples/ -o exists
zstd: exists already exists; not overwritten
Segmentation fault (core dumped)
* programs/fileio.c (FIO_compressMultipleFilenames):
Handle the case where we're not overwriting the destination.
Reported at https://bugzilla.redhat.com/1530049
It used to stop on reaching extDict, for simplification.
As a consequence, there was a small loss of performance each time the round buffer would restart from beginning.
It's not a large difference though, just several hundreds of bytes on silesia.
This patch fixes it.
now selected for levels 13, 14 and 15.
Also : dropped the requirement for monotonic memory budget increase of compression levels,,
which was required for ZSTD_estimateCCtxSize()
in order to ensure that a memory budget for level L is large enough for any level <= L.
This condition is now ensured at run time inside ZSTD_estimateCCtxSize().
now selected for levels 13, 14 and 15.
Also : dropped the requirement for monotonic memory budget increase of compression levels,,
which was required for ZSTD_estimateCCtxSize()
in order to ensure that a memory budget for level L is large enough for any level <= L.
This condition is now ensured at run time inside ZSTD_estimateCCtxSize().
we want the dictionary table to be fully sorted,
not just lazily filled.
Dictionary loading is a bit more intensive,
but it saves cpu cycles for match search during compression.
This is a pretty nice speed win.
The new strategy consists in stacking new candidates as if it was a hash chain.
Then, only if there is a need to actually consult the chain, they are batch-updated,
before starting the match search itself.
This is supposed to be beneficial when skipping positions,
which happens a lot when using lazy strategy.
The baseline performance for btlazy2 on my laptop is :
15#calgary.tar : 3265536 -> 955985 (3.416), 7.06 MB/s , 618.0 MB/s
15#enwik7 : 10000000 -> 3067341 (3.260), 4.65 MB/s , 521.2 MB/s
15#silesia.tar : 211984896 -> 58095131 (3.649), 6.20 MB/s , 682.4 MB/s
(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)
After this patch, and keeping all parameters identical,
speed is increased by a pretty good margin (+30-50%),
but compression ratio suffers a bit :
15#calgary.tar : 3265536 -> 958060 (3.408), 9.12 MB/s , 621.1 MB/s
15#enwik7 : 10000000 -> 3078318 (3.249), 6.37 MB/s , 525.1 MB/s
15#silesia.tar : 211984896 -> 58444111 (3.627), 9.89 MB/s , 680.4 MB/s
That's because I kept `1<<searchLog` as a maximum number of candidates to update.
But for a hash chain, this represents the total number of candidates in the chain,
while for the binary, it represents the maximum depth of searches.
Keep in mind that a lot of candidates won't even be visited in the btree,
since they are filtered out by the binary sort.
As a consequence, in the new implementation,
the effective depth of the binary tree is substantially shorter.
To compensate, it's enough to increase `searchLog` value.
Here is the result after adding just +1 to searchLog (level 15 setting in this patch):
15#calgary.tar : 3265536 -> 956311 (3.415), 8.32 MB/s , 611.4 MB/s
15#enwik7 : 10000000 -> 3067655 (3.260), 5.43 MB/s , 535.5 MB/s
15#silesia.tar : 211984896 -> 58113144 (3.648), 8.35 MB/s , 679.3 MB/s
aka, almost the same compression ratio as before,
but with a noticeable speed increase (+20-30%).
This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
the test is broken
previous version was too Windows-specific.
It needs to be either refactored to be cross-platform,
or be removed if it doesn't bring any value
(which is indeed unclear).
Recipe in /tests rebuild everything from source for each target.
zstd is still a "small" project, so it's not prohibitive,
yet, rebuilding same files over and over represents substantial redundant work.
This patch replaces *.c files from /lib by their corresponding *.o files.
They cannot be compiled and stored directly within /lib,
since /tests triggers additional debug capabilities unwelcome in release binary.
So the resulting *.o are stored directly within /tests.
It turns out, it's difficult to find several target using *exactly* the same rules.
Using only the default rules (debug enabled, multi-threading disabled, no legacy)
a surprisingly small amount of targets share their work.
It's because, in many cases there are additional modifications requested :
some targets are 32-bits, some enable multi-threading, some enable legacy support,
some disable asserts, some want different kind of sanitizer, etc.
I created 2 sets of object files : with and without multithreading.
Several targets share their work, saving compilation time when running `make all`.
Also, obviously, when modifying one source file, only this one needs rebuilding.
For targets requiring some different setting, build from source *.c remain the rule.
The new rules have been tested within `-j` parallel compilation, and work fine with it.