Merge branch 'dev' into initStatic_tests
commit 58227db405

README.md | 28
@@ -31,10 +31,10 @@ a list of known ports and bindings is provided on [Zstandard homepage](http://ww
 ## Benchmarks

 For reference, several fast compression algorithms were tested and compared
-on a server running Arch Linux (`Linux version 5.0.5-arch1-1`),
+on a server running Arch Linux (`Linux version 5.5.11-arch1-1`),
 with a Core i9-9900K CPU @ 5.0GHz,
 using [lzbench], an open-source in-memory benchmark by @inikep
-compiled with [gcc] 8.2.1,
+compiled with [gcc] 9.3.0,
 on the [Silesia compression corpus].

 [lzbench]: https://github.com/inikep/lzbench
@@ -43,18 +43,26 @@ on the [Silesia compression corpus].

 | Compressor name | Ratio | Compression| Decompress.|
 | --------------- | ------| -----------| ---------- |
-| **zstd 1.4.4 -1** | 2.884 | 520 MB/s | 1600 MB/s |
-| zlib 1.2.11 -1 | 2.743 | 110 MB/s | 440 MB/s |
-| brotli 1.0.7 -0 | 2.701 | 430 MB/s | 470 MB/s |
-| quicklz 1.5.0 -1 | 2.238 | 600 MB/s | 800 MB/s |
-| lzo1x 2.09 -1 | 2.106 | 680 MB/s | 950 MB/s |
-| lz4 1.8.3 | 2.101 | 800 MB/s | 4220 MB/s |
-| snappy 1.1.4 | 2.073 | 580 MB/s | 2020 MB/s |
-| lzf 3.6 -1 | 2.077 | 440 MB/s | 930 MB/s |
+| **zstd 1.4.5 -1** | 2.884 | 500 MB/s | 1660 MB/s |
+| zlib 1.2.11 -1 | 2.743 | 90 MB/s | 400 MB/s |
+| brotli 1.0.7 -0 | 2.703 | 400 MB/s | 450 MB/s |
+| **zstd 1.4.5 --fast=1** | 2.434 | 570 MB/s | 2200 MB/s |
+| **zstd 1.4.5 --fast=3** | 2.312 | 640 MB/s | 2300 MB/s |
+| quicklz 1.5.0 -1 | 2.238 | 560 MB/s | 710 MB/s |
+| **zstd 1.4.5 --fast=5** | 2.178 | 700 MB/s | 2420 MB/s |
+| lzo1x 2.10 -1 | 2.106 | 690 MB/s | 820 MB/s |
+| lz4 1.9.2 | 2.101 | 740 MB/s | 4530 MB/s |
+| **zstd 1.4.5 --fast=7** | 2.096 | 750 MB/s | 2480 MB/s |
+| lzf 3.6 -1 | 2.077 | 410 MB/s | 860 MB/s |
+| snappy 1.1.8 | 2.073 | 560 MB/s | 1790 MB/s |

 [zlib]: http://www.zlib.net/
 [LZ4]: http://www.lz4.org/

+The negative compression levels, specified with `--fast=#`,
+offer faster compression and decompression speed in exchange for some loss in
+compression ratio compared to level 1, as seen in the table above.
+
 Zstd can also offer stronger compression ratios at the cost of compression speed.
 Speed vs Compression trade-off is configurable by small increments.
 Decompression speed is preserved and remains roughly the same at all settings,
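The rows added in this hunk report `zstd --fast=N`, which the new paragraph calls negative compression levels. As a minimal sketch, assuming `--fast=N` selects compression level `-N` and that a bare `--fast` means `--fast=1` (the default listed in the CLI help further down in this commit), the option could be turned into a level as follows. `fast_option_to_level` is a hypothetical helper, not zstd's actual argument parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not zstd's own parser: turns a `--fast[=#]`
 * argument into the negative compression level it selects.
 * Returns false if `arg` is not a well-formed `--fast` option. */
static bool fast_option_to_level(const char *arg, int *level)
{
    static const char prefix[] = "--fast";
    size_t const prefixLen = sizeof(prefix) - 1;
    unsigned n = 0;

    assert(arg != NULL && level != NULL);
    if (strncmp(arg, prefix, prefixLen) != 0)
        return false;
    arg += prefixLen;

    if (*arg == '\0') {              /* bare `--fast`: help text lists default 1 */
        *level = -1;
        return true;
    }
    if (*arg++ != '=')
        return false;
    if (*arg < '0' || *arg > '9')    /* `--fast=` needs at least one digit */
        return false;
    while (*arg >= '0' && *arg <= '9') {
        if (n > 100000)              /* reject absurd values instead of overflowing */
            return false;
        n = n * 10 + (unsigned)(*arg - '0');
        arg++;
    }
    if (*arg != '\0' || n == 0)      /* trailing junk, or 0 (invalid in this sketch) */
        return false;
    *level = -(int)n;                /* assumption: `--fast=N` selects level -N */
    return true;
}
```

Under these assumptions `--fast=3` yields level -3. Rejecting `--fast=0` is a choice made for this sketch only.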
@@ -1144,13 +1144,26 @@ size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
 size_t const ldmSpace = ZSTD_ldm_getTableSize(params->ldmParams);
 size_t const ldmSeqSpace = ZSTD_cwksp_alloc_size(ZSTD_ldm_getMaxNbSeq(params->ldmParams, blockSize) * sizeof(rawSeq));

-size_t const neededSpace = entropySpace + blockStateSpace + tokenSpace +
-    matchStateSize + ldmSpace + ldmSeqSpace;
+/* estimateCCtxSize is for one-shot compression. So no buffers should
+ * be needed. However, we still allocate two 0-sized buffers, which can
+ * take space under ASAN. */
+size_t const bufferSpace = ZSTD_cwksp_alloc_size(0)
+    + ZSTD_cwksp_alloc_size(0);
+
 size_t const cctxSpace = ZSTD_cwksp_alloc_size(sizeof(ZSTD_CCtx));

-DEBUGLOG(5, "sizeof(ZSTD_CCtx) : %u", (U32)cctxSpace);
+size_t const neededSpace =
+    cctxSpace +
+    entropySpace +
+    blockStateSpace +
+    ldmSpace +
+    ldmSeqSpace +
+    matchStateSize +
+    tokenSpace +
+    bufferSpace;
+
 DEBUGLOG(5, "estimate workspace : %u", (U32)neededSpace);
-return cctxSpace + neededSpace;
+return neededSpace;
 }
 }
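The new comment explains that the two 0-sized buffers still cost space under ASAN. Because the old code already returned `cctxSpace + neededSpace`, the only change to the returned total is the added `bufferSpace`. The model below makes that concrete under assumptions of its own: `model_alloc_size` is a hypothetical stand-in for `ZSTD_cwksp_alloc_size` that pads every allocation with a 128-byte redzone on each side when `asan` is set, and `tables` lumps together the other terms of the sum.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed redzone size for this model only. */
#define MODEL_REDZONE ((size_t)128)

/* Hypothetical stand-in for a workspace allocation-size helper: when the
 * build uses AddressSanitizer (`asan` != 0), every allocation, even a
 * 0-byte one, is padded with a redzone on each side. */
static size_t model_alloc_size(size_t size, int asan)
{
    return asan ? size + 2 * MODEL_REDZONE : size;
}

/* Before this commit. `tables` stands for entropySpace + blockStateSpace +
 * tokenSpace + matchStateSize + ldmSpace + ldmSeqSpace. */
static size_t estimate_before(size_t cctxStruct, size_t tables, int asan)
{
    size_t const cctxSpace = model_alloc_size(cctxStruct, asan);
    return cctxSpace + tables;
}

/* After this commit: the two 0-sized buffers are counted as well. */
static size_t estimate_after(size_t cctxStruct, size_t tables, int asan)
{
    size_t const bufferSpace = model_alloc_size(0, asan) + model_alloc_size(0, asan);
    size_t const cctxSpace = model_alloc_size(cctxStruct, asan);
    assert(cctxSpace >= cctxStruct);   /* padding never shrinks a request */
    return cctxSpace + tables + bufferSpace;
}
```

In this model the two estimates agree without ASAN; with the assumed redzone, the new estimate is larger by four redzones (512 bytes).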
@@ -10,7 +10,7 @@ There are however other Makefile targets that create different variations of CLI
 - `zstd-decompress` : version of CLI which can only decompress zstd format


-#### Compilation variables
+### Compilation variables
 `zstd` scope can be altered by modifying the following `make` variables :

 - __HAVE_THREAD__ : multithreading is automatically enabled when `pthread` is detected.
@@ -61,6 +61,24 @@ There are however other Makefile targets that create different variations of CLI
 In which case, linking stage will fail if `lz4` library cannot be found.
 This is useful to prevent silent feature disabling.

+- __ZSTD_NOBENCH__ : `zstd` cli will be compiled without its integrated benchmark module.
+This can be useful to produce smaller binaries.
+In this case, the corresponding unit can also be excluded from compilation target.
+
+- __ZSTD_NODICT__ : `zstd` cli will be compiled without support for the integrated dictionary builder.
+This can be useful to produce smaller binaries.
+In this case, the corresponding unit can also be excluded from compilation target.
+
+- __ZSTD_NOCOMPRESS__ : `zstd` cli will be compiled without support for compression.
+The resulting binary will only be able to decompress files.
+This can be useful to produce smaller binaries.
+A corresponding `Makefile` target using this ability is `zstd-decompress`.
+
+- __ZSTD_NODECOMPRESS__ : `zstd` cli will be compiled without support for decompression.
+The resulting binary will only be able to compress files.
+This can be useful to produce smaller binaries.
+A corresponding `Makefile` target using this ability is `zstd-compress`.
+
 - __BACKTRACE__ : `zstd` can display a stack backtrace when execution
 generates a runtime exception. By default, this feature may be
 degraded/disabled on some platforms unless additional compiler directives are
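The new `ZSTD_NO*` entries describe features that are removed when the CLI is built. A minimal sketch of that compile-time pattern, assuming each variable ends up as a preprocessor macro of the same name (for example `-DZSTD_NOCOMPRESS`); the `HAS_*` macros and `mode_supported` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature gates: each evaluates to 0 when the matching
 * macro (e.g. -DZSTD_NOCOMPRESS) is defined at build time. */
#ifdef ZSTD_NOCOMPRESS
#  define HAS_COMPRESS 0
#else
#  define HAS_COMPRESS 1
#endif

#ifdef ZSTD_NODECOMPRESS
#  define HAS_DECOMPRESS 0
#else
#  define HAS_DECOMPRESS 1
#endif

#ifdef ZSTD_NOBENCH
#  define HAS_BENCH 0
#else
#  define HAS_BENCH 1
#endif

#ifdef ZSTD_NODICT
#  define HAS_DICT_BUILDER 0
#else
#  define HAS_DICT_BUILDER 1
#endif

/* Returns 1 if the operation named by `mode` was compiled in, 0 otherwise. */
static int mode_supported(const char *mode)
{
    assert(mode != NULL);
    if (strcmp(mode, "compress") == 0)   return HAS_COMPRESS;
    if (strcmp(mode, "decompress") == 0) return HAS_DECOMPRESS;
    if (strcmp(mode, "bench") == 0)      return HAS_BENCH;
    if (strcmp(mode, "train") == 0)      return HAS_DICT_BUILDER;
    return 0;                            /* unknown mode */
}
```

Built with `-DZSTD_NOCOMPRESS`, the same file makes `mode_supported("compress")` return 0, mirroring the decompress-only binary the README associates with the `zstd-decompress` target.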
@@ -69,11 +87,11 @@ There are however other Makefile targets that create different variations of CLI
 Example : `make zstd BACKTRACE=1`


-#### Aggregation of parameters
+### Aggregation of parameters
 CLI supports aggregation of parameters i.e. `-b1`, `-e18`, and `-i1` can be joined into `-b1e18i1`.


-#### Symlink shortcuts
+### Symlink shortcuts
 It's possible to invoke `zstd` through a symlink.
 When the name of the symlink has a specific value, it triggers an associated behavior.
 - `zstdmt` : compress using all cores available on local system.
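Aggregation means that a single argument can carry several short options. A minimal sketch of splitting `-b1e18i1` into `-b1`, `-e18` and `-i1`; it handles only these three numeric options, and `BenchArgs`, `read_number` and `parse_aggregated` are hypothetical names, not the real CLI parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the three benchmark options. */
typedef struct {
    int startLevel;  /* -b# */
    int endLevel;    /* -e# */
    int minSeconds;  /* -i# */
} BenchArgs;

/* Reads a decimal number at *p and advances *p past it.
 * Returns -1 if the number is unreasonably large. */
static int read_number(const char **p)
{
    int n = 0;
    while (**p >= '0' && **p <= '9') {
        if (n > 99999)
            return -1;
        n = n * 10 + (**p - '0');
        (*p)++;
    }
    return n;
}

/* Splits an aggregated argument such as "-b1e18i1" into -b1, -e18 and -i1.
 * Only the numeric options b, e and i are handled in this sketch. */
static bool parse_aggregated(const char *arg, BenchArgs *out)
{
    const char *p;

    assert(arg != NULL && out != NULL);
    if (arg[0] != '-' || arg[1] == '\0')
        return false;
    p = arg + 1;
    while (*p != '\0') {
        char const opt = *p++;
        int value;
        if (*p < '0' || *p > '9')
            return false;            /* each option here must carry a number */
        value = read_number(&p);
        if (value < 0)
            return false;
        switch (opt) {
        case 'b': out->startLevel = value; break;
        case 'e': out->endLevel = value;   break;
        case 'i': out->minSeconds = value; break;
        default:  return false;      /* unknown letter */
        }
    }
    return true;
}
```

The real CLI also aggregates options that take no number; this sketch deliberately rejects them to keep the example small.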
@@ -86,7 +104,7 @@ When the name of the symlink has a specific value, it triggers an associated beh
 - `ungz`, `unxz` and `unlzma` will do the same, and will also remove source file by default (use `--keep` to preserve).


-#### Dictionary builder in Command Line Interface
+### Dictionary builder in Command Line Interface
 Zstd offers a training mode, which can be used to tune the algorithm for a selected
 type of data, by providing it with a few samples. The result of the training is stored
 in a file selected with the `-o` option (default name is `dictionary`),
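Symlink shortcuts work because a program can inspect the name it was started under (`argv[0]`). The sketch below models only the two behaviors visible in this hunk: `zstdmt` selecting all cores, and `ungz`, `unxz` and `unlzma` removing the source by default unless `--keep` is given. `NameBehavior`, `behavior_from_name` and `should_remove_source` are hypothetical, and only POSIX-style `/` separators are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags derived from the name zstd was invoked under. */
typedef struct {
    bool useAllCores;            /* `zstdmt` */
    bool removeSourceByDefault;  /* `ungz`, `unxz`, `unlzma` */
} NameBehavior;

/* Returns the part of `path` after the last '/' (POSIX-style paths only). */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}

static NameBehavior behavior_from_name(const char *argv0)
{
    NameBehavior b = { false, false };
    const char *name;

    assert(argv0 != NULL);
    name = base_name(argv0);
    if (strcmp(name, "zstdmt") == 0) {
        b.useAllCores = true;
    } else if (strcmp(name, "ungz") == 0 || strcmp(name, "unxz") == 0
               || strcmp(name, "unlzma") == 0) {
        b.removeSourceByDefault = true;
    }
    return b;
}

/* `--keep` preserves the source even when the name implies removal. */
static bool should_remove_source(NameBehavior b, bool keepFlag)
{
    return b.removeSourceByDefault && !keepFlag;
}
```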
@@ -106,7 +124,7 @@ Usage of the dictionary builder and created dictionaries with CLI:
 3. Decompress with the dictionary: `zstd --decompress FILE.zst -D dictionaryName`


-#### Benchmark in Command Line Interface
+### Benchmark in Command Line Interface
 CLI includes in-memory compression benchmark module for zstd.
 The benchmark is conducted using given filenames. The files are read into memory and joined together.
 It makes benchmark more precise as it eliminates I/O overhead.
@@ -118,81 +136,84 @@ One can select compression levels starting from `-b` and ending with `-e`.
 The `-i` parameter selects minimal time used for each of tested levels.


-#### Usage of Command Line Interface
+### Usage of Command Line Interface
 The full list of options can be obtained with `-h` or `-H` parameter:
 ```
 Usage :
 zstd [args] [FILE(s)] [-o file]

 FILE : a filename
 with no FILE, or when FILE is - , read standard input
 Arguments :
 -# : # compression level (1-19, default: 3)
 -d : decompression
 -D file: use `file` as Dictionary
 -o file: result stored into `file` (only if 1 input file)
 -f : overwrite output without prompting and (de)compress links
 --rm : remove source file(s) after successful de/compression
 -k : preserve source file(s) (default)
 -h/-H : display help/long help and exit

 Advanced arguments :
 -V : display Version number and exit
 -v : verbose mode; specify multiple times to increase verbosity
 -q : suppress warnings; specify twice to suppress errors too
 -c : force write to standard output, even if it is the console
 -l : print information about zstd compressed files
 --exclude-compressed: only compress files that are not previously compressed
 --ultra : enable levels beyond 19, up to 22 (requires more memory)
 --long[=#]: enable long distance matching with given window log (default: 27)
 --fast[=#]: switch to very fast compression levels (default: 1)
 --adapt : dynamically adapt compression level to I/O conditions
 --stream-size=# : optimize compression parameters for streaming input of given number of bytes
 --size-hint=# optimize compression parameters for streaming input of approximately this size
 --target-compressed-block-size=# : make compressed block near targeted size
 -T# : spawns # compression threads (default: 1, 0==# cores)
 -B# : select size of each job (default: 0==automatic)
 --rsyncable : compress using a rsync-friendly method (-B sets block size)
 --no-dictID : don't write dictID into header (dictionary compression)
 --[no-]check : integrity check (default: enabled)
 --[no-]compress-literals : force (un)compressed literals
 -r : operate recursively on directories
 --output-dir-flat[=directory]: all resulting files stored into `directory`.
 --format=zstd : compress files to the .zst format (default)
 --format=gzip : compress files to the .gz format
 --test : test compressed file integrity
 --[no-]sparse : sparse mode (default: disabled)
 -M# : Set a memory usage limit for decompression
 --no-progress : do not display the progress bar
 -- : All arguments after "--" are treated as files

 Dictionary builder :
 --train ## : create a dictionary from a training set of files
 --train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]] : use the cover algorithm with optional args
 --train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#,shrink[=#]] : use the fast cover algorithm with optional args
 --train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9)
 -o file : `file` is dictionary name (default: dictionary)
 --maxdict=# : limit dictionary to specified size (default: 112640)
 --dictID=# : force dictionary ID to specified value (default: random)

 Benchmark arguments :
 -b# : benchmark file(s), using # compression level (default: 3)
 -e# : test all compression levels from -bX to # (default: 1)
 -i# : minimum evaluation time in seconds (default: 3s)
 -B# : cut file into independent blocks of size # (default: no block)
 --priority=rt : set process priority to real-time
 ```

-#### Restricted usage of Environment Variables
-Using environment variables to set parameters has security implications.
-Therefore, this avenue is intentionally restricted.
-Only `ZSTD_CLEVEL` is supported currently, for setting compression level.
-`ZSTD_CLEVEL` can be used to set the level between 1 and 19 (the "normal" range).
-If the value of `ZSTD_CLEVEL` is not a valid integer, it will be ignored with a warning message.
-`ZSTD_CLEVEL` just replaces the default compression level (`3`).
-It can be overridden by corresponding command line arguments.
+### Passing parameters through Environment Variables
+`ZSTD_CLEVEL` can be used to modify the default compression level of `zstd`
+(usually set to `3`) to another value between 1 and 19 (the "normal" range).
+This can be useful when `zstd` CLI is invoked in a way that doesn't allow passing arguments.
+One such scenario is `tar --zstd`.
+As `ZSTD_CLEVEL` only replaces the default compression level,
+it can then be overridden by corresponding command line arguments.

-#### Long distance matching mode
+There is no "generic" way to pass "any kind of parameter" to `zstd` in a pass-through manner.
+Using environment variables for this purpose has security implications.
+Therefore, this avenue is intentionally restricted and only supports `ZSTD_CLEVEL`.
+
+### Long distance matching mode
 The long distance matching mode, enabled with `--long`, is designed to improve
 the compression ratio for files with long matches at a large distance (up to the
 maximum window size, `128 MiB`) while still maintaining compression speed.
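The reworded section keeps the same rules as the text it replaces: `ZSTD_CLEVEL` replaces the default level `3`, must lie between 1 and 19, and an explicit command-line level still wins; the removed lines also note that a value that is not a valid integer is ignored with a warning. A minimal sketch of that precedence follows. Treating out-of-range values like invalid ones (warn and ignore) is an assumption, and the function names and warning wording are hypothetical. In a real program the first argument would come from `getenv("ZSTD_CLEVEL")`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_CLEVEL     3
#define MIN_NORMAL_CLEVEL  1
#define MAX_NORMAL_CLEVEL 19

/* Interprets the text of ZSTD_CLEVEL (NULL means "not set").
 * Returns the level if it is a valid integer in 1..19; otherwise
 * prints a warning and returns `fallback`. */
static int level_from_env_value(const char *value, int fallback)
{
    char *end = NULL;
    long parsed;

    assert(fallback >= MIN_NORMAL_CLEVEL && fallback <= MAX_NORMAL_CLEVEL);
    if (value == NULL)
        return fallback;
    errno = 0;
    parsed = strtol(value, &end, 10);
    if (end == value || *end != '\0' || errno == ERANGE) {
        fprintf(stderr, "warning: ignoring ZSTD_CLEVEL=%s (not a valid integer)\n", value);
        return fallback;
    }
    if (parsed < MIN_NORMAL_CLEVEL || parsed > MAX_NORMAL_CLEVEL) {
        fprintf(stderr, "warning: ignoring ZSTD_CLEVEL=%s (outside 1..19)\n", value);
        return fallback;
    }
    return (int)parsed;
}

/* ZSTD_CLEVEL only replaces the default; an explicit -# on the command line wins. */
static int resolve_level(const char *envValue, bool cliGiven, int cliLevel)
{
    int const base = level_from_env_value(envValue, DEFAULT_CLEVEL);
    return cliGiven ? cliLevel : base;
}
```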
@@ -216,12 +237,12 @@ Compression Speed vs Ratio | Decompression Speed

 | Method | Compression ratio | Compression speed | Decompression speed |
 |:-------|------------------:|-------------------------:|---------------------------:|
 | `zstd -1` | `5.065` | `284.8 MB/s` | `759.3 MB/s` |
 | `zstd -5` | `5.826` | `124.9 MB/s` | `674.0 MB/s` |
 | `zstd -10` | `6.504` | `29.5 MB/s` | `771.3 MB/s` |
 | `zstd -1 --long` | `17.426` | `220.6 MB/s` | `1638.4 MB/s` |
-| `zstd -5 --long` | `19.661` | `165.5 MB/s` | `1530.6 MB/s`|
-| `zstd -10 --long`| `21.949` | `75.6 MB/s` | `1632.6 MB/s`|
+| `zstd -5 --long` | `19.661` | `165.5 MB/s` | `1530.6 MB/s` |
+| `zstd -10 --long`| `21.949` | `75.6 MB/s` | `1632.6 MB/s` |

 On this file, the compression ratio improves significantly with minimal impact
 on compression speed, and the decompression speed doubles.
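The `128 MiB` maximum window in the long distance matching paragraph is consistent with the default window log of 27 listed for `--long[=#]` in the help text, since a window log `w` corresponds to a window of 2^w bytes. The arithmetic, written as a small C check with hypothetical helper names:

```c
#include <assert.h>

/* Window size in bytes for a window log: 2^windowLog. */
static unsigned long long window_size_from_log(unsigned windowLog)
{
    assert(windowLog < 64);          /* keep the shift well-defined */
    return 1ULL << windowLog;
}

/* Converts bytes to whole MiB (1 MiB = 2^20 bytes). */
static unsigned long long bytes_to_mib(unsigned long long bytes)
{
    return bytes >> 20;
}
```

With a window log of 27 this gives 134217728 bytes, that is 2^(27-20) = 128 MiB.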
@@ -243,13 +264,27 @@ The below table illustrates this on the [Silesia compression corpus].
 | `zstd -10` | `3.523` | `16.4 MB/s` | `489.2 MB/s` |
 | `zstd -10 --long`| `3.566` | `16.2 MB/s` | `415.7 MB/s` |

-#### zstdgrep
+
+### zstdgrep

 `zstdgrep` is a utility which makes it possible to `grep` directly a `.zst` compressed file.
 It's used the same way as normal `grep`, for example :
 `zstdgrep pattern file.zst`

 `zstdgrep` is _not_ compatible with dictionary compression.
+`zstdgrep` does not support the following grep options
+
+```
+--dereference-recursive (-R)
+--directories (-d)
+--exclude
+--exclude-from
+--exclude-dir
+--include
+--null (-Z),
+--null-data (-z)
+--recursive (-r)
+```

 To search into a file compressed with a dictionary,
 it's necessary to decompress it using `zstd` or `zstdcat`,