Commit Graph

206 Commits

Author SHA1 Message Date
Nick Terrell
a419777eb1 Allow compressor to repeat Huffman tables
* Compressor saves most recently used Huffman table and reuses it
  if it produces better results.
* I attempted to preserve CPU usage profile.
  I intentionally left all of the existing heuristics in place.
  There is only a speed difference on the second block and later.
  When compressing large enough blocks (say >= 4 KiB) there is
  no significant difference in compression speed.
  Dictionary compression of one block is the same speed for blocks
  with literals <= 1 KiB, and after that the difference is not
  very significant.
* In the synthetic data, with blocks 10 KB or smaller, most blocks
  can't use repeated tables because the previous block did not
  contain a symbol that the current block contains.
  Once blocks are about 12 KB or more, most previous blocks have
  valid Huffman tables for the current block, and the compression
  ratio and decompression speed jumped.
* In silesia blocks as small as 4KB can frequently reuse the
  previous Huffman table (85%), but it isn't as profitable, and
  the previous Huffman table only gets used about 3% of the time.
* Microbenchmarks show that `HUF_validateCTable()` takes ~55 ns
  and `HUF_estimateCompressedSize()` takes ~35 ns.
  They are decently well optimized, the first versions took 90 ns
  and 120 ns respectively. `HUF_validateCTable()` could be twice as
  fast, if we cast the `HUF_CElt*` to a `U32*` and compare to 0.
  However, `U32` has an alignment of 4 instead of 2, so I think that
  might be undefined behavior.
* I've ran `zstreamtest` compiled normally, with UASAN and with MSAN
  for 4 hours each.

The worst case for the speed difference is a bunch of small blocks
in the same frame. I modified `bench.c` to compress the input in a
single frame but with blocks of the given block size, set by `-B`.
Benchmarks on level 1:

|  Program  | Block size |   Corpus  | Ratio | Compression MB/s | Decompression MB/s |
|-----------|------------|-----------|-------|------------------|--------------------|
| zstd.base |        256 | synthetic | 2.364 |            110.0 |              297.0 |
|      zstd |        256 | synthetic | 2.367 |            108.9 |              297.0 |
| zstd.base |        256 | silesia   | 2.204 |             93.8 |              415.7 |
|      zstd |        256 | silesia   | 2.204 |             93.4 |              415.7 |
| zstd.base |        512 | synthetic | 2.594 |            144.2 |              420.0 |
|      zstd |        512 | synthetic | 2.599 |            141.5 |              425.7 |
| zstd.base |        512 | silesia   | 2.358 |            118.4 |              432.6 |
|      zstd |        512 | silesia   | 2.358 |            119.8 |              432.6 |
| zstd.base |       1024 | synthetic | 2.790 |            192.3 |              594.1 |
|      zstd |       1024 | synthetic | 2.794 |            192.3 |              600.0 |
| zstd.base |       1024 | silesia   | 2.524 |            148.2 |              464.2 |
|      zstd |       1024 | silesia   | 2.525 |            148.2 |              467.6 |
| zstd.base |       4096 | synthetic | 3.023 |            300.0 |             1000.0 |
|      zstd |       4096 | synthetic | 3.024 |            300.0 |             1010.1 |
| zstd.base |       4096 | silesia   | 2.779 |            223.1 |              623.5 |
|      zstd |       4096 | silesia   | 2.779 |            223.1 |              636.0 |
| zstd.base |      16384 | synthetic | 3.131 |            350.0 |             1150.1 |
|      zstd |      16384 | synthetic | 3.152 |            350.0 |             1630.3 |
| zstd.base |      16384 | silesia   | 2.871 |            296.5 |              883.3 |
|      zstd |      16384 | silesia   | 2.872 |            294.4 |              898.3 |
2017-03-02 13:27:52 -08:00
Yann Collet
0b9b894b2d reduced ZSTD_DDict memory usage
saved 128 KB
2017-02-27 00:27:30 -08:00
Anders Oleson
517577bf53 spelling fixes in comments
i.e. occurred labeled Huffman
2017-02-20 12:08:59 -08:00
Yann Collet
2252d29a5a Merge branch 'dev' of github.com:facebook/zstd into dev 2017-02-15 12:00:50 -08:00
Yann Collet
4596037042 updated fse version
feature minor refactoring (removing FSE_abs())
also : fix a few minor issues recently introduced in examples
2017-02-15 12:00:03 -08:00
Yann Collet
f0b9a8dddb Merge pull request #547 from inikep/dev11
Avoid fseek()'s 2GiB barrier with MacOS and *BSD
2017-02-14 12:29:00 -08:00
ds77
08e6a88a97 avoid empty translation unit warning without #pragma 2017-02-14 00:46:47 +01:00
Przemyslaw Skibinski
09c8e5390d __builtin_bswap requires gcc 4.3+ 2017-02-13 12:45:53 +01:00
Sean Purcell
e0b3265e87 Fix ZSTD_getErrorString and add tests 2017-02-08 17:28:49 -08:00
Yann Collet
cc3d1bc262 Merge pull request #525 from terrelln/covermt
Multithreaded COVER dictionary training
2017-01-30 10:15:33 -08:00
Nick Terrell
b42dd27ef5 Add include guards and extern C 2017-01-27 16:00:19 -08:00
Nick Terrell
e628eaf87a Fix pool.c threading.h import 2017-01-26 15:29:10 -08:00
cyan4973
2e3b659ae1 fixed minor warnings (Visual, conversion, doxygen) 2017-01-20 14:43:09 -08:00
cyan4973
5fba09fa41 updated util's time for Windows compatibility
Correctly measures time on Posix systems when running with
Multi-threading

Todo : check Windows measurement under multi-threading
2017-01-20 12:57:31 -08:00
Yann Collet
0f984d94c4 changed MT enabling macro to ZSTD_MULTITHREAD 2017-01-19 14:05:07 -08:00
Yann Collet
32dfae6f98 fixed Multi-threaded compression
MT compression generates a single frame.
Multi-threading operates by breaking the frames into independent sections.
But from a decoder perspective, there is no difference :
it's just a suite of blocks.

Problem is, decoder preserves repCodes from previous block to start decoding next block.
This is also valid between sections, since they are no different than changing block.

Previous version would incorrectly initialize repcodes to their default value at the beginning of each section.
When using them, there was a mismatch between encoder (default values) and decoder (values from previous block).

This change ensures that repcodes won't be used at the beginning of a new section.
It works by setting them to 0.
This only works with regular (single segment) variants : extDict variants will fail !
Fortunately, sections beyond the 1st one belong to this category.

To be checked : btopt strategy.
This change was only validated from fast to btlazy2 strategies.
2017-01-19 10:32:55 -08:00
Yann Collet
f1cb55192c fixed linux warnings 2017-01-02 01:11:55 +01:00
Nick Terrell
bb13387d7d Fix pool for threading.h 2016-12-31 19:10:47 -05:00
Nick Terrell
4204e03e77 Add threading.h condition variables 2016-12-31 19:10:29 -05:00
Yann Collet
3b9d434356 extended ZSTDMT code support for non-MT systems and WIN32 (preliminary) 2016-12-31 16:32:19 +01:00
Yann Collet
3b29dbd9e8 new zstdmt version using generic treadpool 2016-12-31 06:04:25 +01:00
Yann Collet
c6a6417458 bench correctly measures time for multi-threaded compression (posix only) 2016-12-31 03:31:26 +01:00
Nick Terrell
e777a5be6b Add a thread pool for ZSTDMT and COVER 2016-12-29 23:39:44 -08:00
Yann Collet
0819abe3c1 added ZSTD_createDDict_byReference() body 2016-12-21 19:25:15 +01:00
Nick Terrell
8de46ab51a Export all API functions 2016-12-16 13:27:30 -08:00
Yann Collet
5397a66b19 minor BMI version check 2016-12-13 15:21:06 +01:00
Nick Terrell
064a143520 Fix execSequence wildcopy undefined behavior
execSequence relied on pointer overflow to handle cases where
`sequence.matchLength < 8`.  Instead of passing an `size_t` to
wildcopy, pass a `ptrdiff_t`.
2016-12-12 19:01:23 -08:00
Yann Collet
825dffbc43 moved zbuff source files into lib/deprecated 2016-12-05 19:28:19 -08:00
Przemyslaw Skibinski
821bf1febc fixed Doxygen trailing comment 2016-12-02 16:13:41 +01:00
Yann Collet
b89af20353 reduced table sizes for HUF_readDTableX4 2016-12-01 18:24:59 -08:00
Yann Collet
a0d742b1e4 introduced HUF_buildCTable_wksp(), to reduce stack memory usage 2016-12-01 17:47:30 -08:00
Yann Collet
e928f7e16d introduced ext_wksp variants of count to reduce stack memory usage 2016-12-01 16:13:35 -08:00
Yann Collet
5e00b848a8 FSE_compress_wksp() uses less stack space 2016-11-30 16:46:13 -08:00
Yann Collet
d79a9a00d9 Introduced FSE_compress_wksp() and FSE_buildCTable_wksp() to reduce stack memory usage 2016-11-30 15:52:20 -08:00
Yann Collet
766431909f introduced FSE_decompress_wksp(), to use less stack space 2016-11-30 12:36:45 -08:00
Yann Collet
167c494748 Merge branch 'dev' of github.com:facebook/zstd into dev 2016-11-29 14:05:15 -08:00
Yann Collet
52e136ed3d long decoder compatible with round and separate buffers 2016-11-28 19:59:11 -08:00
Przemyslaw Skibinski
9ca65af810 zstd_opt.h: improved price function 2016-11-23 17:22:54 +01:00
Yann Collet
0d761dbe95 Merge pull request #453 from inikep/dev11
fullbench-dll
2016-11-16 15:45:30 -08:00
Yann Collet
52afb3993e zbuff API now generates deprecation warnings 2016-11-16 08:50:54 -08:00
Przemyslaw Skibinski
179555c1d1 working fullbench-dll 2016-11-15 18:05:46 +01:00
Yann Collet
2115724c22 Merge pull request #430 from terrelln/exec-sequences
ZSTD_execSequence() accepts match in last 7 bytes
2016-10-28 10:45:05 -07:00
Nick Terrell
eb7873a048 ZSTD_execSequence() accepts match in last 7 bytes
The zstd reference compressor will not emit a match in the last 7
bytes of a block.  The decompressor will also not accept a match
in the last 7 bytes.  This patch makes the decompressor accept a
match in the last 7 bytes.
2016-10-25 21:24:15 -07:00
Nick Terrell
d760529a05 Fix stack buffer overrun when weightTotal == 0
If `weightTotal == 0`, then `BIT_highbit32(weightTotal)` is
undefined behavior in the case that it calls `__builtin_clz()`.
If `tableLog == HUF_TABLELOG_ABSOLUTEMAX` then we will access one
byte beyond the end of the buffer.
2016-10-19 11:39:11 -07:00
Nick Terrell
ccfcc643da Check if dict is empty before reading first byte 2016-10-17 11:46:03 -07:00
Yann Collet
5d919e7ac3 added ZSTD_error_frameParameter_windowTooLarge (#403) 2016-10-12 17:29:24 -07:00
Yann Collet
ef2357d0d3 created error_private.c, so that a single list of error strings get included 2016-10-11 17:24:50 -07:00
Yann Collet
a17fd7312a changed error_public.h into zstd_errors.h 2016-10-11 16:41:09 -07:00
Yann Collet
18b51b99c0 sync fse 2016-10-11 08:21:09 -07:00
Yann Collet
51f4d566c2 small decompression speed boost for very small data 2016-09-22 15:57:28 +02:00