zstd/contrib/pzstd
Chris Lamb 2dbe408a49 Make the build reproducible
Whilst working on the Reproducible Builds effort [0], we noticed
that zstd could not be built reproducibly.

This is due to the manual page encoding the number of CPUs from the
build machine and thus varies across builds.

This was originally filed in Debian as #897904 [1].

 [0] https://reproducible-builds.org/
 [1] https://bugs.debian.org/897904

Signed-off-by: Chris Lamb <lamby@debian.org>
2018-05-04 08:39:51 -07:00

Parallel Zstandard (PZstandard)

Parallel Zstandard is a Pigz-like tool for Zstandard. It provides Zstandard-format-compatible compression and decompression that can use multiple cores. It breaks the input into equal-sized chunks, compresses each chunk independently into a Zstandard frame, and then concatenates the frames to produce the final compressed output. Before each frame, PZstandard writes a 12-byte header, itself a skippable frame in the Zstandard format, that tells PZstandard the size of the compressed frame that follows. PZstandard supports parallel decompression of files compressed with PZstandard. When decompressing files compressed with plain Zstandard, which lack these headers, PZstandard does I/O in one thread and decompression in another.
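The framing described above can be sketched as follows. This is a minimal illustration, not PZstandard's code. It assumes the Zstandard skippable-frame layout (a 4-byte little-endian magic number, a 4-byte little-endian size of the user data, then the user data) and assumes that PZstandard's 4 bytes of user data hold the size of the next compressed frame as a little-endian 32-bit integer. The names `appendFramed` and `locateFrames` are hypothetical, and compression itself is left out: frames are treated as opaque bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed header layout (Zstandard skippable frame, 12 bytes total):
//   bytes 0-3   magic number 0x184D2A50, little-endian
//   bytes 4-7   size of the user data that follows (always 4 here), little-endian
//   bytes 8-11  user data: size of the next compressed frame, little-endian
// This sketch only writes and accepts the single magic value 0x184D2A50.
constexpr std::uint32_t kSkippableMagic = 0x184D2A50;
constexpr std::size_t kHeaderSize = 12;

void putLE32(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
  }
}

std::uint32_t getLE32(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Append a size header followed by one already-compressed frame.
void appendFramed(std::vector<std::uint8_t>& out,
                  const std::vector<std::uint8_t>& frame) {
  assert(frame.size() <= 0xFFFFFFFFu && "frame size must fit in 32 bits");
  putLE32(out, kSkippableMagic);
  putLE32(out, 4);  // the user data is a single 32-bit size field
  putLE32(out, static_cast<std::uint32_t>(frame.size()));
  out.insert(out.end(), frame.begin(), frame.end());
}

struct FrameSpan {
  std::size_t offset;  // where the compressed frame starts
  std::size_t size;    // its length in bytes
};

// Find every frame using only the headers; nothing is decompressed,
// so each span could be handed to a separate worker thread.
std::vector<FrameSpan> locateFrames(const std::vector<std::uint8_t>& in) {
  std::vector<FrameSpan> spans;
  std::size_t pos = 0;
  while (pos < in.size()) {
    if (in.size() - pos < kHeaderSize || getLE32(&in[pos]) != kSkippableMagic ||
        getLE32(&in[pos + 4]) != 4) {
      throw std::runtime_error("missing or malformed size header");
    }
    const std::size_t frameSize = getLE32(&in[pos + 8]);
    pos += kHeaderSize;
    if (in.size() - pos < frameSize) {
      throw std::runtime_error("truncated frame");
    }
    spans.push_back({pos, frameSize});
    pos += frameSize;
  }
  return spans;
}
```

Because every header states how long the following frame is, a reader can find all frame boundaries up front without decoding anything, which is what makes parallel decompression of PZstandard output possible.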

Usage

PZstandard supports the same command-line interface as Zstandard, but also provides the -p option to specify the number of threads. Dictionary mode is not currently supported.

Basic usage

pzstd input-file -o output-file -p num-threads -#          # Compression
pzstd -d input-file -o output-file -p num-threads          # Decompression

PZstandard also supports pipes, including FIFOs (named pipes)

cat input-file | pzstd -p num-threads -# -c > /dev/null

For more options

pzstd --help

If -p is not specified, PZstandard tries to pick a sensible default number of threads, which is displayed in pzstd --help. If this default is not suitable, you can define PZSTD_NUM_THREADS at compile time to the number of threads you prefer.
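The precedence described above could be resolved along these lines. This is a minimal sketch, not pzstd's actual code: it assumes, consistent with the commit note about CPU counts at the top of this page, that the runtime default is based on the number of CPUs, and `defaultNumThreads` is a hypothetical name. Only the PZSTD_NUM_THREADS macro comes from this README.

```cpp
#include <thread>

// Hypothetical helper sketching the described behaviour.
// A value fixed at compile time (e.g. -DPZSTD_NUM_THREADS=4) takes precedence;
// otherwise the number of hardware threads reported by the C++ runtime is used.
unsigned defaultNumThreads() {
#ifdef PZSTD_NUM_THREADS
  return PZSTD_NUM_THREADS;
#else
  const unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? 1 : n;  // hardware_concurrency() returns 0 when it cannot tell
#endif
}
```

Adding -DPZSTD_NUM_THREADS=4 to the compiler flags makes this function return 4 on any machine, which is the kind of fixed value that keeps build output independent of the build host.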

Benchmarks

As a reference, PZstandard and Pigz were compared on an Intel Core i7 @ 3.1 GHz, each using 4 threads, with the Silesia compression corpus.

[Figures: Compression Speed vs Ratio with 4 Threads; Decompression Speed with 4 Threads]

For each compression level, each of the following commands was run twice, and the minimum time was taken.

time pzstd -# -p 4    -c silesia.tar     > silesia.tar.zst
time pzstd -d -p 4    -c silesia.tar.zst > /dev/null

time pigz  -# -p 4 -k -c silesia.tar     > silesia.tar.gz
time pigz  -d -p 4 -k -c silesia.tar.gz  > /dev/null

PZstandard was tested at compression levels 1-19, and Pigz at compression levels 1-9. Pigz cannot decompress in parallel; it simply performs reading, decompression, and writing on separate threads.
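Both Pigz's decompression and PZstandard's handling of plain Zstandard files overlap I/O with decompression instead of splitting the decompression itself. A generic two-stage producer/consumer pipeline illustrates the idea: a reader thread pushes chunks into a small bounded queue, and the calling thread consumes them in order. This is a sketch under stated assumptions, not the implementation of either tool; `BoundedQueue`, `pipeline`, and the `process` callback (a stand-in for decompression) are hypothetical, error handling is omitted, and C++17 is required for `std::optional`.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal bounded FIFO; close() tells the consumer no more items will arrive.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  void push(T item) {
    std::unique_lock<std::mutex> lock(mu_);
    notFull_.wait(lock, [&] { return items_.size() < capacity_; });
    items_.push_back(std::move(item));
    notEmpty_.notify_one();
  }

  void close() {
    std::lock_guard<std::mutex> lock(mu_);
    closed_ = true;
    notEmpty_.notify_all();
  }

  // Blocks until an item is available; returns std::nullopt once closed and drained.
  std::optional<T> pop() {
    std::unique_lock<std::mutex> lock(mu_);
    notEmpty_.wait(lock, [&] { return !items_.empty() || closed_; });
    if (items_.empty()) return std::nullopt;
    T item = std::move(items_.front());
    items_.pop_front();
    notFull_.notify_one();
    return item;
  }

 private:
  std::mutex mu_;
  std::condition_variable notFull_, notEmpty_;
  std::deque<T> items_;
  std::size_t capacity_;
  bool closed_ = false;
};

// Stage 1 (simulated reading) runs on its own thread;
// stage 2 (`process`, standing in for decompression) runs on the caller's thread.
template <typename Process>
std::string pipeline(const std::vector<std::string>& inputChunks, Process process) {
  BoundedQueue<std::string> queue(2);
  std::thread reader([&] {
    for (const auto& chunk : inputChunks) queue.push(chunk);
    queue.close();
  });
  std::string output;
  while (auto chunk = queue.pop()) output += process(*chunk);
  reader.join();
  return output;
}
```

With a capacity of 2, the reader can run at most two chunks ahead of the consumer, which bounds memory use while still letting reading and processing overlap. Order is preserved because there is a single producer and a single consumer.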

Tests

Tests require that you have gtest installed. Set GTEST_INC and GTEST_LIB in the Makefile to specify the location of the gtest headers and libraries. Alternatively, run make googletest, which will clone googletest and build it. To run the tests, run make tests && make check.