zstd/lib/common
Nick Terrell a419777eb1 Allow compressor to repeat Huffman tables
* Compressor saves most recently used Huffman table and reuses it
  if it produces better results.
* I attempted to preserve CPU usage profile.
  I intentionally left all of the existing heuristics in place.
  There is only a speed difference on the second block and later.
  When compressing large enough blocks (say >= 4 KiB) there is
  no significant difference in compression speed.
  Dictionary compression of one block is the same speed for blocks
  with literals <= 1 KiB, and after that the difference is not
  very significant.
* In the synthetic data, with blocks 10 KB or smaller, most blocks
  can't use repeated tables because the previous block did not
  contain a symbol that the current block contains.
  Once blocks are about 12 KB or more, most previous blocks have
  valid Huffman tables for the current block, and the compression
  ratio and decompression speed jumped.
* In silesia blocks as small as 4KB can frequently reuse the
  previous Huffman table (85%), but it isn't as profitable, and
  the previous Huffman table only gets used about 3% of the time.
* Microbenchmarks show that `HUF_validateCTable()` takes ~55 ns
  and `HUF_estimateCompressedSize()` takes ~35 ns.
  They are decently well optimized, the first versions took 90 ns
  and 120 ns respectively. `HUF_validateCTable()` could be twice as
  fast, if we cast the `HUF_CElt*` to a `U32*` and compare to 0.
  However, `U32` has an alignment of 4 instead of 2, so I think that
  might be undefined behavior.
* I've ran `zstreamtest` compiled normally, with UASAN and with MSAN
  for 4 hours each.

The worst case for the speed difference is a bunch of small blocks
in the same frame. I modified `bench.c` to compress the input in a
single frame but with blocks of the given block size, set by `-B`.
Benchmarks on level 1:

|  Program  | Block size |   Corpus  | Ratio | Compression MB/s | Decompression MB/s |
|-----------|------------|-----------|-------|------------------|--------------------|
| zstd.base |        256 | synthetic | 2.364 |            110.0 |              297.0 |
|      zstd |        256 | synthetic | 2.367 |            108.9 |              297.0 |
| zstd.base |        256 | silesia   | 2.204 |             93.8 |              415.7 |
|      zstd |        256 | silesia   | 2.204 |             93.4 |              415.7 |
| zstd.base |        512 | synthetic | 2.594 |            144.2 |              420.0 |
|      zstd |        512 | synthetic | 2.599 |            141.5 |              425.7 |
| zstd.base |        512 | silesia   | 2.358 |            118.4 |              432.6 |
|      zstd |        512 | silesia   | 2.358 |            119.8 |              432.6 |
| zstd.base |       1024 | synthetic | 2.790 |            192.3 |              594.1 |
|      zstd |       1024 | synthetic | 2.794 |            192.3 |              600.0 |
| zstd.base |       1024 | silesia   | 2.524 |            148.2 |              464.2 |
|      zstd |       1024 | silesia   | 2.525 |            148.2 |              467.6 |
| zstd.base |       4096 | synthetic | 3.023 |            300.0 |             1000.0 |
|      zstd |       4096 | synthetic | 3.024 |            300.0 |             1010.1 |
| zstd.base |       4096 | silesia   | 2.779 |            223.1 |              623.5 |
|      zstd |       4096 | silesia   | 2.779 |            223.1 |              636.0 |
| zstd.base |      16384 | synthetic | 3.131 |            350.0 |             1150.1 |
|      zstd |      16384 | synthetic | 3.152 |            350.0 |             1630.3 |
| zstd.base |      16384 | silesia   | 2.871 |            296.5 |              883.3 |
|      zstd |      16384 | silesia   | 2.872 |            294.4 |              898.3 |
2017-03-02 13:27:52 -08:00
..
bitstream.h minor BMI version check 2016-12-13 15:21:06 +01:00
entropy_common.c updated fse version 2017-02-15 12:00:03 -08:00
error_private.c added ZSTD_error_frameParameter_windowTooLarge (#403) 2016-10-12 17:29:24 -07:00
error_private.h created error_private.c, so that a single list of error strings get included 2016-10-11 17:24:50 -07:00
fse_decompress.c reduced ZSTD_DDict memory usage 2017-02-27 00:27:30 -08:00
fse.h updated fse version 2017-02-15 12:00:03 -08:00
huf.h Allow compressor to repeat Huffman tables 2017-03-02 13:27:52 -08:00
mem.h spelling fixes in comments 2017-02-20 12:08:59 -08:00
pool.c Fix pool.c threading.h import 2017-01-26 15:29:10 -08:00
pool.h Add include guards and extern C 2017-01-27 16:00:19 -08:00
threading.c avoid empty translation unit warning without #pragma 2017-02-14 00:46:47 +01:00
threading.h spelling fixes in comments 2017-02-20 12:08:59 -08:00
xxhash.c fixing FORCE_INLINE for older compilers (#330) 2016-09-02 11:44:21 -07:00
xxhash.h added : xxhash namespace enforced from xxhash.h. 2016-08-10 08:16:51 +02:00
zstd_common.c Fix ZSTD_getErrorString and add tests 2017-02-08 17:28:49 -08:00
zstd_errors.h Export all API functions 2016-12-16 13:27:30 -08:00
zstd_internal.h fixed Multi-threaded compression 2017-01-19 10:32:55 -08:00