Merge branch 'dev' into initStatic_tests

This commit is contained in:
Yann Collet 2020-05-11 16:51:13 -07:00
commit 58227db405
3 changed files with 129 additions and 73 deletions

View File

@ -31,10 +31,10 @@ a list of known ports and bindings is provided on [Zstandard homepage](http://ww
## Benchmarks
For reference, several fast compression algorithms were tested and compared
on a server running Arch Linux (`Linux version 5.0.5-arch1-1`),
on a server running Arch Linux (`Linux version 5.5.11-arch1-1`),
with a Core i9-9900K CPU @ 5.0GHz,
using [lzbench], an open-source in-memory benchmark by @inikep
compiled with [gcc] 8.2.1,
compiled with [gcc] 9.3.0,
on the [Silesia compression corpus].
[lzbench]: https://github.com/inikep/lzbench
@ -43,18 +43,26 @@ on the [Silesia compression corpus].
| Compressor name | Ratio | Compression| Decompress.|
| --------------- | ------| -----------| ---------- |
| **zstd 1.4.4 -1** | 2.884 | 520 MB/s | 1600 MB/s |
| zlib 1.2.11 -1 | 2.743 | 110 MB/s | 440 MB/s |
| brotli 1.0.7 -0 | 2.701 | 430 MB/s | 470 MB/s |
| quicklz 1.5.0 -1 | 2.238 | 600 MB/s | 800 MB/s |
| lzo1x 2.09 -1 | 2.106 | 680 MB/s | 950 MB/s |
| lz4 1.8.3 | 2.101 | 800 MB/s | 4220 MB/s |
| snappy 1.1.4 | 2.073 | 580 MB/s | 2020 MB/s |
| lzf 3.6 -1 | 2.077 | 440 MB/s | 930 MB/s |
| **zstd 1.4.5 -1** | 2.884 | 500 MB/s | 1660 MB/s |
| zlib 1.2.11 -1 | 2.743 | 90 MB/s | 400 MB/s |
| brotli 1.0.7 -0 | 2.703 | 400 MB/s | 450 MB/s |
| **zstd 1.4.5 --fast=1** | 2.434 | 570 MB/s | 2200 MB/s |
| **zstd 1.4.5 --fast=3** | 2.312 | 640 MB/s | 2300 MB/s |
| quicklz 1.5.0 -1 | 2.238 | 560 MB/s | 710 MB/s |
| **zstd 1.4.5 --fast=5** | 2.178 | 700 MB/s | 2420 MB/s |
| lzo1x 2.10 -1 | 2.106 | 690 MB/s | 820 MB/s |
| lz4 1.9.2 | 2.101 | 740 MB/s | 4530 MB/s |
| **zstd 1.4.5 --fast=7** | 2.096 | 750 MB/s | 2480 MB/s |
| lzf 3.6 -1 | 2.077 | 410 MB/s | 860 MB/s |
| snappy 1.1.8 | 2.073 | 560 MB/s | 1790 MB/s |
[zlib]: http://www.zlib.net/
[LZ4]: http://www.lz4.org/
The negative compression levels, specified with `--fast=#`,
offer faster compression and decompression speed in exchange for some loss in
compression ratio compared to level 1, as seen in the table above.
Zstd can also offer stronger compression ratios at the cost of compression speed.
Speed vs Compression trade-off is configurable by small increments.
Decompression speed is preserved and remains roughly the same at all settings,

View File

@ -1144,13 +1144,26 @@ size_t ZSTD_estimateCCtxSize_usingCCtxParams(const ZSTD_CCtx_params* params)
size_t const ldmSpace = ZSTD_ldm_getTableSize(params->ldmParams);
size_t const ldmSeqSpace = ZSTD_cwksp_alloc_size(ZSTD_ldm_getMaxNbSeq(params->ldmParams, blockSize) * sizeof(rawSeq));
size_t const neededSpace = entropySpace + blockStateSpace + tokenSpace +
matchStateSize + ldmSpace + ldmSeqSpace;
/* estimateCCtxSize is for one-shot compression. So no buffers should
* be needed. However, we still allocate two 0-sized buffers, which can
* take space under ASAN. */
size_t const bufferSpace = ZSTD_cwksp_alloc_size(0)
+ ZSTD_cwksp_alloc_size(0);
size_t const cctxSpace = ZSTD_cwksp_alloc_size(sizeof(ZSTD_CCtx));
DEBUGLOG(5, "sizeof(ZSTD_CCtx) : %u", (U32)cctxSpace);
size_t const neededSpace =
cctxSpace +
entropySpace +
blockStateSpace +
ldmSpace +
ldmSeqSpace +
matchStateSize +
tokenSpace +
bufferSpace;
DEBUGLOG(5, "estimate workspace : %u", (U32)neededSpace);
return cctxSpace + neededSpace;
return neededSpace;
}
}

View File

@ -10,7 +10,7 @@ There are however other Makefile targets that create different variations of CLI
- `zstd-decompress` : version of CLI which can only decompress zstd format
#### Compilation variables
### Compilation variables
`zstd` scope can be altered by modifying the following `make` variables :
- __HAVE_THREAD__ : multithreading is automatically enabled when `pthread` is detected.
@ -61,6 +61,24 @@ There are however other Makefile targets that create different variations of CLI
In which case, linking stage will fail if `lz4` library cannot be found.
This is useful to prevent silent feature disabling.
- __ZSTD_NOBENCH__ : `zstd` cli will be compiled without its integrated benchmark module.
This can be useful to produce smaller binaries.
In this case, the corresponding unit can also be excluded from compilation target.
- __ZSTD_NODICT__ : `zstd` cli will be compiled without support for the integrated dictionary builder.
This can be useful to produce smaller binaries.
In this case, the corresponding unit can also be excluded from compilation target.
- __ZSTD_NOCOMPRESS__ : `zstd` cli will be compiled without support for compression.
The resulting binary will only be able to decompress files.
This can be useful to produce smaller binaries.
A corresponding `Makefile` target using this ability is `zstd-decompress`.
- __ZSTD_NODECOMPRESS__ : `zstd` cli will be compiled without support for decompression.
The resulting binary will only be able to compress files.
This can be useful to produce smaller binaries.
A corresponding `Makefile` target using this ability is `zstd-compress`.
- __BACKTRACE__ : `zstd` can display a stack backtrace when execution
generates a runtime exception. By default, this feature may be
degraded/disabled on some platforms unless additional compiler directives are
@ -69,11 +87,11 @@ There are however other Makefile targets that create different variations of CLI
Example : `make zstd BACKTRACE=1`
#### Aggregation of parameters
### Aggregation of parameters
CLI supports aggregation of parameters i.e. `-b1`, `-e18`, and `-i1` can be joined into `-b1e18i1`.
#### Symlink shortcuts
### Symlink shortcuts
It's possible to invoke `zstd` through a symlink.
When the name of the symlink has a specific value, it triggers an associated behavior.
- `zstdmt` : compress using all cores available on local system.
@ -86,7 +104,7 @@ When the name of the symlink has a specific value, it triggers an associated beh
- `ungz`, `unxz` and `unlzma` will do the same, and will also remove source file by default (use `--keep` to preserve).
#### Dictionary builder in Command Line Interface
### Dictionary builder in Command Line Interface
Zstd offers a training mode, which can be used to tune the algorithm for a selected
type of data, by providing it with a few samples. The result of the training is stored
in a file selected with the `-o` option (default name is `dictionary`),
@ -106,7 +124,7 @@ Usage of the dictionary builder and created dictionaries with CLI:
3. Decompress with the dictionary: `zstd --decompress FILE.zst -D dictionaryName`
#### Benchmark in Command Line Interface
### Benchmark in Command Line Interface
CLI includes in-memory compression benchmark module for zstd.
The benchmark is conducted using given filenames. The files are read into memory and joined together.
It makes benchmark more precise as it eliminates I/O overhead.
@ -118,81 +136,84 @@ One can select compression levels starting from `-b` and ending with `-e`.
The `-i` parameter selects minimal time used for each of tested levels.
#### Usage of Command Line Interface
### Usage of Command Line Interface
The full list of options can be obtained with `-h` or `-H` parameter:
```
Usage :
zstd [args] [FILE(s)] [-o file]
Usage :
zstd [args] [FILE(s)] [-o file]
FILE : a filename
FILE : a filename
with no FILE, or when FILE is - , read standard input
Arguments :
-# : # compression level (1-19, default: 3)
-d : decompression
-D file: use `file` as Dictionary
-o file: result stored into `file` (only if 1 input file)
-f : overwrite output without prompting and (de)compress links
--rm : remove source file(s) after successful de/compression
-k : preserve source file(s) (default)
-h/-H : display help/long help and exit
Arguments :
-# : # compression level (1-19, default: 3)
-d : decompression
-D file: use `file` as Dictionary
-o file: result stored into `file` (only if 1 input file)
-f : overwrite output without prompting and (de)compress links
--rm : remove source file(s) after successful de/compression
-k : preserve source file(s) (default)
-h/-H : display help/long help and exit
Advanced arguments :
-V : display Version number and exit
Advanced arguments :
-V : display Version number and exit
-v : verbose mode; specify multiple times to increase verbosity
-q : suppress warnings; specify twice to suppress errors too
-c : force write to standard output, even if it is the console
-l : print information about zstd compressed files
--exclude-compressed: only compress files that are not previously compressed
-l : print information about zstd compressed files
--exclude-compressed: only compress files that are not previously compressed
--ultra : enable levels beyond 19, up to 22 (requires more memory)
--long[=#]: enable long distance matching with given window log (default: 27)
--fast[=#]: switch to very fast compression levels (default: 1)
--adapt : dynamically adapt compression level to I/O conditions
--stream-size=# : optimize compression parameters for streaming input of given number of bytes
--adapt : dynamically adapt compression level to I/O conditions
--stream-size=# : optimize compression parameters for streaming input of given number of bytes
--size-hint=# optimize compression parameters for streaming input of approximately this size
--target-compressed-block-size=# : make compressed block near targeted size
-T# : spawns # compression threads (default: 1, 0==# cores)
-B# : select size of each job (default: 0==automatic)
--rsyncable : compress using a rsync-friendly method (-B sets block size)
--target-compressed-block-size=# : make compressed block near targeted size
-T# : spawns # compression threads (default: 1, 0==# cores)
-B# : select size of each job (default: 0==automatic)
--rsyncable : compress using a rsync-friendly method (-B sets block size)
--no-dictID : don't write dictID into header (dictionary compression)
--[no-]check : integrity check (default: enabled)
--[no-]compress-literals : force (un)compressed literals
-r : operate recursively on directories
--output-dir-flat[=directory]: all resulting files stored into `directory`.
--format=zstd : compress files to the .zst format (default)
--format=gzip : compress files to the .gz format
--test : test compressed file integrity
--[no-]check : integrity check (default: enabled)
--[no-]compress-literals : force (un)compressed literals
-r : operate recursively on directories
--output-dir-flat[=directory]: all resulting files stored into `directory`.
--format=zstd : compress files to the .zst format (default)
--format=gzip : compress files to the .gz format
--test : test compressed file integrity
--[no-]sparse : sparse mode (default: disabled)
-M# : Set a memory usage limit for decompression
--no-progress : do not display the progress bar
-- : All arguments after "--" are treated as files
-M# : Set a memory usage limit for decompression
--no-progress : do not display the progress bar
-- : All arguments after "--" are treated as files
Dictionary builder :
--train ## : create a dictionary from a training set of files
Dictionary builder :
--train ## : create a dictionary from a training set of files
--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]] : use the cover algorithm with optional args
--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#,shrink[=#]] : use the fast cover algorithm with optional args
--train-legacy[=s=#] : use the legacy algorithm with selectivity (default: 9)
-o file : `file` is dictionary name (default: dictionary)
--maxdict=# : limit dictionary to specified size (default: 112640)
-o file : `file` is dictionary name (default: dictionary)
--maxdict=# : limit dictionary to specified size (default: 112640)
--dictID=# : force dictionary ID to specified value (default: random)
Benchmark arguments :
-b# : benchmark file(s), using # compression level (default: 3)
Benchmark arguments :
-b# : benchmark file(s), using # compression level (default: 3)
-e# : test all compression levels from -bX to # (default: 1)
-i# : minimum evaluation time in seconds (default: 3s)
-i# : minimum evaluation time in seconds (default: 3s)
-B# : cut file into independent blocks of size # (default: no block)
--priority=rt : set process priority to real-time
--priority=rt : set process priority to real-time
```
#### Restricted usage of Environment Variables
Using environment variables to set parameters has security implications.
Therefore, this avenue is intentionally restricted.
Only `ZSTD_CLEVEL` is supported currently, for setting compression level.
`ZSTD_CLEVEL` can be used to set the level between 1 and 19 (the "normal" range).
If the value of `ZSTD_CLEVEL` is not a valid integer, it will be ignored with a warning message.
`ZSTD_CLEVEL` just replaces the default compression level (`3`).
It can be overridden by corresponding command line arguments.
### Passing parameters through Environment Variables
`ZSTD_CLEVEL` can be used to modify the default compression level of `zstd`
(usually set to `3`) to another value between 1 and 19 (the "normal" range).
This can be useful when `zstd` CLI is invoked in a way that doesn't allow passing arguments.
One such scenario is `tar --zstd`.
As `ZSTD_CLEVEL` only replaces the default compression level,
it can then be overridden by corresponding command line arguments.
#### Long distance matching mode
There is no "generic" way to pass "any kind of parameter" to `zstd` in a pass-through manner.
Using environment variables for this purpose has security implications.
Therefore, this avenue is intentionally restricted and only supports `ZSTD_CLEVEL`.
### Long distance matching mode
The long distance matching mode, enabled with `--long`, is designed to improve
the compression ratio for files with long matches at a large distance (up to the
maximum window size, `128 MiB`) while still maintaining compression speed.
@ -216,12 +237,12 @@ Compression Speed vs Ratio | Decompression Speed
| Method | Compression ratio | Compression speed | Decompression speed |
|:-------|------------------:|-------------------------:|---------------------------:|
| `zstd -1` | `5.065` | `284.8 MB/s` | `759.3 MB/s` |
| `zstd -1` | `5.065` | `284.8 MB/s` | `759.3 MB/s` |
| `zstd -5` | `5.826` | `124.9 MB/s` | `674.0 MB/s` |
| `zstd -10` | `6.504` | `29.5 MB/s` | `771.3 MB/s` |
| `zstd -1 --long` | `17.426` | `220.6 MB/s` | `1638.4 MB/s` |
| `zstd -5 --long` | `19.661` | `165.5 MB/s` | `1530.6 MB/s`|
| `zstd -10 --long`| `21.949` | `75.6 MB/s` | `1632.6 MB/s`|
| `zstd -5 --long` | `19.661` | `165.5 MB/s` | `1530.6 MB/s` |
| `zstd -10 --long`| `21.949` | `75.6 MB/s` | `1632.6 MB/s` |
On this file, the compression ratio improves significantly with minimal impact
on compression speed, and the decompression speed doubles.
@ -243,13 +264,27 @@ The below table illustrates this on the [Silesia compression corpus].
| `zstd -10` | `3.523` | `16.4 MB/s` | `489.2 MB/s` |
| `zstd -10 --long`| `3.566` | `16.2 MB/s` | `415.7 MB/s` |
#### zstdgrep
### zstdgrep
`zstdgrep` is a utility which makes it possible to `grep` directly a `.zst` compressed file.
It's used the same way as normal `grep`, for example :
`zstdgrep pattern file.zst`
`zstdgrep` is _not_ compatible with dictionary compression.
`zstdgrep` does not support the following grep options
```
--dereference-recursive (-R)
--directories (-d)
--exclude
--exclude-from
--exclude-dir
--include
--null (-Z),
--null-data (-z)
--recursive (-r)
```
To search into a file compressed with a dictionary,
it's necessary to decompress it using `zstd` or `zstdcat`,