zstd/contrib/adaptive-compression
Yann Collet ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

The vast majority of transformations consist of changing U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among the issues this patch solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller used a pointer to U32 instead.

These fixes are useful.

However, the bulk of the changes concerns %u formatting,
which requires the unsigned type,
but generally received U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be, typically, an `unsigned long`.
On a 64-bit system, this breaks the equivalence unsigned == U32.
Unfortunately, it also breaks a few static_assert() checks controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a no-op.
And then reverting all these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth enforcing the distinction U32 != unsigned in CI tests.
It would be another coding restriction, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributors will not see it),
and while it can matter in theory, the number of impacted platforms seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
.gitignore fixed contrib/adaptive-compression 2018-03-15 17:10:15 -07:00
adapt.c changed parameter names from ZSTD_p_* to ZSTD_c_* 2018-12-05 17:26:02 -08:00
datagencli.c fix confusion between unsigned <-> U32 2018-12-21 18:09:41 -08:00
Makefile Making changes to make it compile on my laptop 2018-10-11 15:51:57 -07:00
README.md Updating README.md 2017-08-15 17:48:23 -07:00
test-correctness.sh add tests for compression bounds, fix another warning 2017-07-28 15:55:02 -07:00
test-performance.sh updated tests to use different seeds when executing different tests 2017-07-14 16:29:29 -07:00

Summary

adapt is a new compression tool targeted at optimizing performance across network connections and pipelines. The tool aims to sense network speeds and adapt the compression level accordingly. In situations where the compression level does not appropriately match the network/pipe speed, compression may bottleneck the entire pipeline, or the files may not be compressed as much as they potentially could be, losing efficiency. It also becomes quite impractical to manually measure and set an optimal compression level (which could potentially change over time).

Using adapt

In order to build and use the tool, you can simply run make adapt in the adaptive-compression directory under contrib. This will generate an executable available for use. Another possible method of installation is running make install, which will create and install the binary as the command zstd-adaptive.

Similar to many other compression utilities, zstd-adaptive can be invoked by using the following format:

zstd-adaptive [options] [file(s)]

Supported options for the above format are described below.

zstd-adaptive also supports reading from stdin and writing to stdout, which is potentially more useful. By default, if no files are given, zstd-adaptive reads from and writes to standard I/O. Therefore, you can simply insert it within a pipeline like so:

cat FILE | zstd-adaptive | ssh "cat - > tmp.zst"

If a file is provided, it is also possible to force writing to stdout using the -c flag like so:

zstd-adaptive -c FILE | ssh "cat - > tmp.zst"

Several options described below can be used to control the behavior of zstd-adaptive. More specifically, the -l# and -u# flags set lower and upper bounds so that the compression level always stays within that range. The -i# flag can be used to change the initial compression level. If an initial compression level is not provided, it will be chosen to fall within the appropriate range (it becomes equal to the lower bound).

Options

-oFILE : write output to FILE

-i# : provide initial compression level (must be within the appropriate bounds)

-h : display help/information

-f : force the compression level to stay constant

-c : force write to stdout

-p : hide progress bar

-q : quiet mode -- do not show progress bar or other information

-l# : set a lower bound on the compression level (default is 1)

-u# : set an upper bound on the compression level (default is 22)

Benchmarking / Test results

Artificial Tests

These artificial tests were run by using the pv command line utility in order to limit pipe speeds (25 MB/s read and 5 MB/s write limits were chosen to mimic severe throughput constraints). A 40 GB backup file was sent through a pipeline, compressed, and written out to a file. Compression time, size, and ratio were computed. Data for zstd -15 was excluded from these tests because the test runs quite long.

25 MB/s read limit

| Compressor Name | Ratio | Compressed Size | Compression Time |
|-----------------|-------|-----------------|------------------|
| zstd -3         | 2.108 | 20.718 GB       | 29m 48.530s      |
| zstd-adaptive   | 2.230 | 19.581 GB       | 29m 48.798s      |

5 MB/s write limit

| Compressor Name | Ratio | Compressed Size | Compression Time |
|-----------------|-------|-----------------|------------------|
| zstd -3         | 2.108 | 20.718 GB       | 1h 10m 43.076s   |
| zstd-adaptive   | 2.249 | 19.412 GB       | 1h 06m 15.577s   |

The commands used for this test generally followed the form:

cat FILE | pv -L 25m -q | COMPRESSION | pv -q > tmp.zst # impose 25 MB/s read limit

cat FILE | pv -q | COMPRESSION | pv -L 5m -q > tmp.zst # impose 5 MB/s write limit

SSH Tests

The following tests were performed by piping a relatively large backup file (approximately 80 GB) through compression and over SSH to be stored on a server. The test data includes statistics for time and compressed size for zstd at several compression levels, as well as zstd-adaptive. The data highlights the potential advantages zstd-adaptive has over a low static compression level, and the negative impacts that an excessively high static compression level can have on pipe throughput.

| Compressor Name | Ratio | Compressed Size | Compression Time |
|-----------------|-------|-----------------|------------------|
| zstd -3         | 2.212 | 32.426 GB       | 1h 17m 59.756s   |
| zstd -15        | 2.374 | 30.213 GB       | 2h 56m 59.441s   |
| zstd-adaptive   | 2.315 | 30.993 GB       | 1h 18m 52.860s   |

The commands used for this test generally followed the form:

cat FILE | COMPRESSION | ssh dev "cat - > tmp.zst"