zstd/lib
Nick Terrell a419777eb1 Allow compressor to repeat Huffman tables
* Compressor saves most recently used Huffman table and reuses it
  if it produces better results.
* I attempted to preserve CPU usage profile.
  I intentionally left all of the existing heuristics in place.
  There is only a speed difference on the second block and later.
  When compressing large enough blocks (say >= 4 KiB) there is
  no significant difference in compression speed.
  Dictionary compression of one block is the same speed for blocks
  with literals <= 1 KiB, and after that the difference is not
  very significant.
* In the synthetic data, with blocks 10 KB or smaller, most blocks
  can't use repeated tables because the previous block did not
  contain a symbol that the current block contains.
  Once blocks are about 12 KB or more, most previous blocks have
  valid Huffman tables for the current block, and the compression
  ratio and decompression speed jumped.
* In silesia, blocks as small as 4 KB can frequently reuse the
  previous Huffman table (85%), but it isn't as profitable, and
  the previous Huffman table only gets used about 3% of the time.
* Microbenchmarks show that `HUF_validateCTable()` takes ~55 ns
  and `HUF_estimateCompressedSize()` takes ~35 ns.
  They are reasonably well optimized; the first versions took 90 ns
  and 120 ns respectively. `HUF_validateCTable()` could be twice as
  fast, if we cast the `HUF_CElt*` to a `U32*` and compare to 0.
  However, `U32` has an alignment of 4 instead of 2, so I think that
  might be undefined behavior.
* I ran `zstreamtest` compiled normally, with UASAN, and with MSAN,
  for 4 hours each.
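
The two checks described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `SketchCElt` is a hypothetical stand-in for `HUF_CElt`, every `sketch_*` name is invented for this example, and the reuse rule (compare the previous table's estimated size against a fresh table's estimated size plus the cost of writing its header) is an assumption about what "produces better results" means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for zstd's HUF_CElt: one entry per symbol. */
typedef struct {
    uint16_t val;    /* code value (not needed by this sketch) */
    uint8_t  nbBits; /* code length in bits; 0 means "symbol has no code" */
} SketchCElt;

/* Returns 1 if every symbol that occurs in the block (count[s] != 0)
 * has a code in `table`, and 0 otherwise. Branch-free loop body. */
int sketch_validate_table(const SketchCElt *table, const unsigned *count,
                          unsigned maxSymbolValue)
{
    int bad = 0;
    unsigned s;
    assert(table != NULL && count != NULL);
    for (s = 0; s <= maxSymbolValue; ++s) {
        bad |= (count[s] != 0) & (table[s].nbBits == 0);
    }
    return !bad;
}

/* Estimated size in bytes (rounded down) of the literals when encoded
 * with `table`: sum over symbols of code length times occurrence count. */
size_t sketch_estimate_size(const SketchCElt *table, const unsigned *count,
                            unsigned maxSymbolValue)
{
    size_t nbBits = 0;
    unsigned s;
    assert(table != NULL && count != NULL);
    for (s = 0; s <= maxSymbolValue; ++s) {
        nbBits += (size_t)table[s].nbBits * count[s];
    }
    return nbBits >> 3;
}

/* Assumed decision rule: reuse the previous table only if it covers the
 * block and is no worse than a fresh table plus that table's header. */
int sketch_should_reuse(const SketchCElt *prev, const SketchCElt *fresh,
                        size_t freshHeaderBytes, const unsigned *count,
                        unsigned maxSymbolValue)
{
    if (!sketch_validate_table(prev, count, maxSymbolValue)) return 0;
    return sketch_estimate_size(prev, count, maxSymbolValue)
        <= sketch_estimate_size(fresh, count, maxSymbolValue) + freshHeaderBytes;
}
```

Both helpers are a single pass over at most 256 symbols, which is consistent with the tens-of-nanoseconds timings quoted above. The validation step is also why small blocks often cannot reuse a table: one symbol missing from the previous block is enough to reject it.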

The worst case for the speed difference is a bunch of small blocks
in the same frame. I modified `bench.c` to compress the input in a
single frame but with blocks of the given block size, set by `-B`.
Benchmarks on level 1:

|  Program  | Block size |   Corpus  | Ratio | Compression MB/s | Decompression MB/s |
|-----------|------------|-----------|-------|------------------|--------------------|
| zstd.base |        256 | synthetic | 2.364 |            110.0 |              297.0 |
|      zstd |        256 | synthetic | 2.367 |            108.9 |              297.0 |
| zstd.base |        256 | silesia   | 2.204 |             93.8 |              415.7 |
|      zstd |        256 | silesia   | 2.204 |             93.4 |              415.7 |
| zstd.base |        512 | synthetic | 2.594 |            144.2 |              420.0 |
|      zstd |        512 | synthetic | 2.599 |            141.5 |              425.7 |
| zstd.base |        512 | silesia   | 2.358 |            118.4 |              432.6 |
|      zstd |        512 | silesia   | 2.358 |            119.8 |              432.6 |
| zstd.base |       1024 | synthetic | 2.790 |            192.3 |              594.1 |
|      zstd |       1024 | synthetic | 2.794 |            192.3 |              600.0 |
| zstd.base |       1024 | silesia   | 2.524 |            148.2 |              464.2 |
|      zstd |       1024 | silesia   | 2.525 |            148.2 |              467.6 |
| zstd.base |       4096 | synthetic | 3.023 |            300.0 |             1000.0 |
|      zstd |       4096 | synthetic | 3.024 |            300.0 |             1010.1 |
| zstd.base |       4096 | silesia   | 2.779 |            223.1 |              623.5 |
|      zstd |       4096 | silesia   | 2.779 |            223.1 |              636.0 |
| zstd.base |      16384 | synthetic | 3.131 |            350.0 |             1150.1 |
|      zstd |      16384 | synthetic | 3.152 |            350.0 |             1630.3 |
| zstd.base |      16384 | silesia   | 2.871 |            296.5 |              883.3 |
|      zstd |      16384 | silesia   | 2.872 |            294.4 |              898.3 |
2017-03-02 13:27:52 -08:00
| Name | Last commit | Date |
|------|-------------|------|
| common | Allow compressor to repeat Huffman tables | 2017-03-02 13:27:52 -08:00 |
| compress | Allow compressor to repeat Huffman tables | 2017-03-02 13:27:52 -08:00 |
| decompress | Merge pull request #579 from iburinoc/multiframe | 2017-03-01 11:02:04 -08:00 |
| deprecated | Fix deprecation warnings for clang with C++14 | 2017-02-08 17:38:17 -08:00 |
| dictBuilder | Fix deprecation warnings for clang with C++14 | 2017-02-08 17:38:17 -08:00 |
| dll | Merge branch 'dev' into multiframe | 2017-02-10 10:08:55 -08:00 |
| legacy | Change name to to findFrameCompressedSize and add skippable support | 2017-02-22 12:12:34 -08:00 |
| .gitignore | Added "dictionary decompression" example | 2016-07-07 14:08:00 +02:00 |
| BUCK | Add BUCK files for Nuclide support | 2017-01-27 10:43:12 -08:00 |
| libzstd.pc.in | updated pkg config file | 2016-11-30 11:06:58 -08:00 |
| Makefile | fixed Mac OS-X specific directory in $(RM) list | 2017-02-05 10:22:58 -08:00 |
| README.md | moved zbuff source files into lib/deprecated | 2016-12-05 19:28:19 -08:00 |
| zstd.h | zstdmt : fix : loading prefix from previous segments | 2017-02-23 23:42:12 -08:00 |

Zstandard library files

The lib directory contains several directories. Depending on the target use case, it is enough to include only the files from the relevant directories.

API

Zstandard's stable API is exposed within zstd.h, at the root of the lib directory.

Advanced API

Some additional APIs may be useful if you need advanced features :

  • common/error_public.h : transforms size_t function results into an enum, for precise error handling.
  • ZSTD_STATIC_LINKING_ONLY : if you define this macro before including zstd.h, it gives access to the advanced and experimental APIs. These APIs must never be used with a dynamic library! They are not "stable"; their definitions may change in the future. Only static linking is allowed.

Modular build

The common/ directory is required in all circumstances. You can choose to support compression only by adding just the files from the compress/ directory. In a similar way, you can build a decompressor-only library with the decompress/ directory.

Other optional functionalities provided are :

  • dictBuilder/ : source files to create dictionaries. The API can be consulted in dictBuilder/zdict.h. This module also depends on common/ and compress/ .

  • legacy/ : source code to decompress previous versions of zstd, starting from v0.1. This module also depends on common/ and decompress/ . Library compilation must include directive ZSTD_LEGACY_SUPPORT = 1 . The main API can be consulted in legacy/zstd_legacy.h. Advanced API from each version can be found in their relevant header file. For example, advanced API for version v0.4 is in legacy/zstd_v04.h .

Using MinGW+MSYS to create DLL

The DLL can be created using MinGW+MSYS with the make libzstd command. This command creates dll\libzstd.dll and the import library dll\libzstd.lib. The import library is only required with Visual C++. The header file zstd.h and the dynamic library dll\libzstd.dll are required to compile a project using gcc/MinGW. The dynamic library has to be added to the linking options. This means that if a project that uses ZSTD consists of a single test-dll.c file, it should be linked with dll\libzstd.dll. For example:

    gcc $(CFLAGS) -Iinclude/ test-dll.c -o test-dll dll\libzstd.dll

The compiled executable requires the ZSTD DLL, which is available at dll\libzstd.dll.

Obsolete streaming API

Streaming is now provided within zstd.h. The older streaming API is still available within deprecated/zbuff.h. It will be removed in a future version. Consider migrating code to the newer streaming API in zstd.h.

Miscellaneous

The other files are not source code. They are :

  • LICENSE : contains the BSD license text
  • Makefile : script to compile or install zstd library (static and dynamic)
  • libzstd.pc.in : for pkg-config (make install)
  • README.md : this file