2873 Commits

Author SHA1 Message Date
Yann Collet
1af570bd05 Merge pull request #585 from terrelln/cover-leak
Fix COVER_optimizeTrainFromBuffer() resource leaks
2017-03-02 20:46:35 -08:00
Yann Collet
f44b55c18d Merge pull request #584 from terrelln/huff-repeat
Allow compressor to repeat Huffman tables
2017-03-02 17:20:11 -08:00
Yann Collet
e02409fdc3 update NEWS on @iburinoc's 32-bits version improvement 2017-03-02 17:14:57 -08:00
Yann Collet
fe5d27062e disable prefetch-decode for 32-bits target
This decoder variant is detrimental on the x86 architecture,
likely due to register pressure.

Note that the variant is disabled for all 32-bits targets.
It's unclear if it would help for different architectures,
such as ARM, MIPS or PowerPC.
2017-03-02 17:09:21 -08:00
Yann Collet
3a55d8be26 Merge pull request #582 from iburinoc/m32
Encode/decode offsets >= 32MB in 32-bits mode
2017-03-02 16:42:50 -08:00
Nick Terrell
d051cd5b43 Use workspace for count and CTable 2017-03-02 16:38:07 -08:00
Nick Terrell
976e325b2e Fix COVER_optimizeTrainFromBuffer() resource leaks
Thanks to @nemequ for reporting the resource leaks.
2017-03-02 15:54:39 -08:00
Sean Purcell
553f67e0c1 Remove 'generic' inline strategy
Seems to avoid performance loss for compression.
Same strategy tested on decompression side, did not appear to improve
speed.
2017-03-02 15:18:13 -08:00
Sean Purcell
3d95925a59 Merge remote-tracking branch 'origin/dev' into m32 2017-03-02 15:17:56 -08:00
Nick Terrell
a419777eb1 Allow compressor to repeat Huffman tables
* Compressor saves most recently used Huffman table and reuses it
  if it produces better results.
* I attempted to preserve CPU usage profile.
  I intentionally left all of the existing heuristics in place.
  There is only a speed difference on the second block and later.
  When compressing large enough blocks (say >= 4 KiB) there is
  no significant difference in compression speed.
  Dictionary compression of one block is the same speed for blocks
  with literals <= 1 KiB, and after that the difference is not
  very significant.
* In the synthetic data, with blocks 10 KB or smaller, most blocks
  can't use repeated tables because the previous block did not
  contain a symbol that the current block contains.
  Once blocks are about 12 KB or more, most previous blocks have
  valid Huffman tables for the current block, and the compression
  ratio and decompression speed jumped.
* In silesia, blocks as small as 4 KB can frequently reuse the
  previous Huffman table (85%), but it isn't as profitable, and
  the previous Huffman table only gets used about 3% of the time.
* Microbenchmarks show that `HUF_validateCTable()` takes ~55 ns
  and `HUF_estimateCompressedSize()` takes ~35 ns.
  They are decently well optimized; the first versions took 90 ns
  and 120 ns respectively. `HUF_validateCTable()` could be twice as
  fast, if we cast the `HUF_CElt*` to a `U32*` and compare to 0.
  However, `U32` has an alignment of 4 instead of 2, so I think that
  might be undefined behavior.
* I've run `zstreamtest` compiled normally, with UASAN and with MSAN
  for 4 hours each.
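
The reuse decision described in the bullets above can be pictured roughly as follows. This is a minimal sketch, not the actual zstd code: the element type `ToyCElt` and the helpers `toy_validate()`, `toy_estimate_size()` and `toy_should_repeat()` are hypothetical names, and charging the fresh table for its header size is an assumption about what "better results" means.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified code-table element: only the code length matters here. */
typedef struct { uint16_t val; uint8_t nbBits; } ToyCElt;

/* Returns 1 if every symbol present in the block (count > 0) has a code in the table. */
static int toy_validate(const ToyCElt *table, const unsigned *count, unsigned maxSymbol)
{
    int bad = 0;
    unsigned s;
    for (s = 0; s <= maxSymbol; ++s)
        bad |= (count[s] != 0) & (table[s].nbBits == 0);
    return !bad;
}

/* Estimated payload size in bytes when coding the block with this table (rounded down). */
static size_t toy_estimate_size(const ToyCElt *table, const unsigned *count, unsigned maxSymbol)
{
    size_t bits = 0;
    unsigned s;
    for (s = 0; s <= maxSymbol; ++s)
        bits += (size_t)table[s].nbBits * count[s];
    return bits >> 3;
}

/* Assumption of this sketch: reuse the previous table only if it covers the block
 * and is no larger than a fresh table plus the cost of transmitting that table. */
static int toy_should_repeat(const ToyCElt *prev, const ToyCElt *fresh,
                             size_t freshHeaderSize,
                             const unsigned *count, unsigned maxSymbol)
{
    if (!toy_validate(prev, count, maxSymbol)) return 0;
    return toy_estimate_size(prev, count, maxSymbol)
        <= toy_estimate_size(fresh, count, maxSymbol) + freshHeaderSize;
}
```

The validation step is what fails on small synthetic blocks: if the current block contains a symbol the previous table has no code for, the table cannot be repeated regardless of the size estimate.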

The worst case for the speed difference is a bunch of small blocks
in the same frame. I modified `bench.c` to compress the input in a
single frame but with blocks of the given block size, set by `-B`.
Benchmarks on level 1:

|  Program  | Block size |   Corpus  | Ratio | Compression MB/s | Decompression MB/s |
|-----------|------------|-----------|-------|------------------|--------------------|
| zstd.base |        256 | synthetic | 2.364 |            110.0 |              297.0 |
|      zstd |        256 | synthetic | 2.367 |            108.9 |              297.0 |
| zstd.base |        256 | silesia   | 2.204 |             93.8 |              415.7 |
|      zstd |        256 | silesia   | 2.204 |             93.4 |              415.7 |
| zstd.base |        512 | synthetic | 2.594 |            144.2 |              420.0 |
|      zstd |        512 | synthetic | 2.599 |            141.5 |              425.7 |
| zstd.base |        512 | silesia   | 2.358 |            118.4 |              432.6 |
|      zstd |        512 | silesia   | 2.358 |            119.8 |              432.6 |
| zstd.base |       1024 | synthetic | 2.790 |            192.3 |              594.1 |
|      zstd |       1024 | synthetic | 2.794 |            192.3 |              600.0 |
| zstd.base |       1024 | silesia   | 2.524 |            148.2 |              464.2 |
|      zstd |       1024 | silesia   | 2.525 |            148.2 |              467.6 |
| zstd.base |       4096 | synthetic | 3.023 |            300.0 |             1000.0 |
|      zstd |       4096 | synthetic | 3.024 |            300.0 |             1010.1 |
| zstd.base |       4096 | silesia   | 2.779 |            223.1 |              623.5 |
|      zstd |       4096 | silesia   | 2.779 |            223.1 |              636.0 |
| zstd.base |      16384 | synthetic | 3.131 |            350.0 |             1150.1 |
|      zstd |      16384 | synthetic | 3.152 |            350.0 |             1630.3 |
| zstd.base |      16384 | silesia   | 2.871 |            296.5 |              883.3 |
|      zstd |      16384 | silesia   | 2.872 |            294.4 |              898.3 |
2017-03-02 13:27:52 -08:00
Yann Collet
fdb0fd34b3 Merge pull request #583 from terrelln/set-dictid
Set dictID to 0 for content only dictionaries
2017-03-02 13:15:31 -08:00
Nick Terrell
3475b9b431 Set dictID to 0 for content only dictionaries 2017-03-02 12:33:02 -08:00
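A content-only dictionary carries no header, so it has no ID to read. The sketch below shows one way to tell the two kinds apart, assuming the zstd dictionary layout of a 4-byte little-endian magic number (0xEC30A437) followed by a 4-byte little-endian dictID. The names `TOY_DICT_MAGIC`, `toy_read_le32()` and `toy_dict_id()` are hypothetical and are not the code changed by this commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Magic number assumed to open a structured zstd dictionary. */
#define TOY_DICT_MAGIC 0xEC30A437u

/* Reads a 32-bit little-endian value without any alignment requirement. */
static uint32_t toy_read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Returns the dictID stored in a structured dictionary,
 * or 0 when the buffer is treated as a content-only dictionary. */
static uint32_t toy_dict_id(const void *dict, size_t dictSize)
{
    const unsigned char *p = (const unsigned char *)dict;
    if (dict == NULL || dictSize < 8) return 0;        /* too small to hold a header */
    if (toy_read_le32(p) != TOY_DICT_MAGIC) return 0;  /* no magic: content only */
    return toy_read_le32(p + 4);
}
```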
Yann Collet
78208bd8be fixed : build zstd cli after libzstd 2017-03-01 21:02:06 -08:00
Yann Collet
27526c7201 make : added target shortest
shortest only runs the fast part of playTests.sh.
cc @iburinoc
2017-03-01 17:02:49 -08:00
Yann Collet
c1c040eae1 added gzip tests
also : made sure zstd --format=gzip -V
would fail if gzip compatibility is not supported
2017-03-01 16:49:20 -08:00
Sean Purcell
d44703d145 Offsets >= 32MB in 32-bits mode 2017-03-01 16:27:56 -08:00
Yann Collet
76f0494089 xxhash can be included twice in any order
Previously, a plain include of xxhash.h followed by a second include
requesting the static definitions would fail to include them,
because the second include was simply skipped by the guard macro.

Now it works as intended :
the missing static part is included during the second include.
2017-03-01 13:29:29 -08:00
Yann Collet
4bcc69b761 solves warnings when compiling with global XXH_STATIC_LINKING_ONLY
XXH_STATIC_LINKING_ONLY protection macro is intended to be triggered just before the include.
The main idea is to keep this setting local :
user module shall explicitly understand and accept the static linking restriction
which becomes transparent when triggering the macro at project level.
Global definition also triggers redefinition warnings for user modules which do locally define the macro.

This new version compiles lib and cli without warning when the macro is set globally.
That's not a scenario to be recommended, since it trades a local effect for a global one,
but it was easy enough to provide from zstd side.
2017-03-01 11:33:25 -08:00
Yann Collet
31432cc57d Merge pull request #579 from iburinoc/multiframe
Check to ensure ddict isn't null before dereference
2017-03-01 11:02:04 -08:00
Yann Collet
51598510c0 Merge pull request #580 from facebook/speedStream
Improve streaming decompression speed
2017-03-01 10:59:51 -08:00
Yann Collet
43764cdb1d updated NEWS for 1.1.4
cmake, performance
2017-02-28 17:44:17 -08:00
Yann Collet
c896735b8d Merge pull request #575 from Majlen/cmake-improvement
Cmake improvement
2017-02-28 15:32:21 -08:00
Sean Purcell
a81d4fee58 Check to ensure ddict isn't null before dereference 2017-02-28 15:28:29 -08:00
Yann Collet
a5cbc02ed1 Merge pull request #578 from inikep/dev
decompression: --rm is silent when input is stdin
2017-02-28 15:21:28 -08:00
Przemyslaw Skibinski
5c1c80cbb6 travis.yml: fixed pull_request 2017-02-28 18:34:39 +01:00
Yann Collet
22d79762ef fixed multi frames 2017-02-28 02:12:42 -08:00
Milan Ševčík
4b62f41969 Added compile flags to pzstd
Definition NDEBUG from original Makefile
-Wno-shadow silences shadowing in initializers
2017-02-28 10:57:09 +01:00
Milan Ševčík
eeb080e601 -Wstrict-prototypes is not supported with C++ 2017-02-28 10:57:09 +01:00
Milan Ševčík
5a1cc5c22d Improve handling of library symlinks.
Previous method was failing to remove the symlinks when make clean was
invoked and wasn't portable.
2017-02-28 10:57:09 +01:00
Milan Ševčík
bf8a30ce0d Add zstdmt target in cmake 2017-02-28 10:57:09 +01:00
Milan Ševčík
59709d97d9 Support building contrib utils from cmake 2017-02-28 10:57:09 +01:00
Yann Collet
a33ae64204 fixed decoding skippable frames 2017-02-28 01:15:28 -08:00
Yann Collet
c0b1731bce added test for decompression with NULL dict and NULL DDict
previous version of ZSTD_decompressMultiFrame() would fail that test
2017-02-28 01:02:46 -08:00
Przemyslaw Skibinski
8e5032a965 cli : fix : --rm is silent when input is stdin (decompression) 2017-02-28 09:42:37 +01:00
Przemyslaw Skibinski
8b3560e196 update gzip tests 2017-02-28 09:41:23 +01:00
Yann Collet
d1760113ec Improved speed of ZSTD_decompressStream()
When ZSTD_decompressStream() detects
that there is enough space in dst
to complete decompression in a single pass,
it delegates to ZSTD_decompress(),
for an extra ~5% speed boost.
2017-02-28 00:14:28 -08:00
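The single-pass shortcut in d1760113ec can be pictured as a check made before entering the streaming state machine. This is a sketch under stated assumptions: `ToyFrameInfo` and `toy_can_decode_in_one_pass()` are hypothetical, the requirement that the whole compressed frame is already in the input buffer is an assumption, and the real condition inside ZSTD_decompressStream() may differ.

```c
#include <stddef.h>

/* Hypothetical summary of what a streaming decoder knows after parsing the frame header. */
typedef struct {
    unsigned long long contentSize; /* announced decompressed size; 0 means unknown in this sketch */
    size_t compressedSize;          /* size of the complete compressed frame; 0 means unknown */
} ToyFrameInfo;

/* Returns 1 when the whole frame is available in the input and the output buffer
 * can hold the entire result, so a one-shot decoder could be called directly. */
static int toy_can_decode_in_one_pass(const ToyFrameInfo *f,
                                      size_t srcAvailable, size_t dstCapacity)
{
    if (f->contentSize == 0 || f->compressedSize == 0) return 0; /* sizes unknown */
    if (srcAvailable < f->compressedSize) return 0;              /* input incomplete */
    return (unsigned long long)dstCapacity >= f->contentSize;    /* enough room in dst */
}
```

When the check fails, the decoder would simply continue along its normal streaming path, so the shortcut only affects speed, not results.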
Przemyslaw Skibinski
a3352d06bc updated .travis.yml (2) 2017-02-28 08:20:53 +01:00
Przemyslaw Skibinski
ca1d3d4232 updated .travis.yml 2017-02-28 08:16:49 +01:00
Yann Collet
1d7f30f9d4 Merge branch 'decompressStream' into dev 2017-02-27 20:55:22 -08:00
Yann Collet
a81c2e7e44 Merge pull request #573 from facebook/ddict
Improved DDict memory usage
2017-02-27 20:54:42 -08:00
Yann Collet
952d06fa9c fullbench : -i0 displays list of functions to bench 2017-02-27 17:58:02 -08:00
Yann Collet
67d86a74a5 added test case : --rm on stdin
must remain silent (instead of failing)
2017-02-27 16:09:20 -08:00
Yann Collet
ef569bf75f Merge branch 'dev' of github.com:facebook/zstd into dev 2017-02-27 15:58:38 -08:00
Yann Collet
dccd6b6f65 cli : fix : --rm is silent when input is stdin
Previously, the app would produce an error message and stop.
2017-02-27 15:57:50 -08:00
Yann Collet
ea7589ce07 Merge pull request #571 from inikep/dev11
gzip tests
2017-02-27 13:54:33 -08:00
Przemyslaw Skibinski
5d848527e6 use "./gzip" for gzip tests 2017-02-27 22:02:03 +01:00
Yann Collet
b78f211068 Merge pull request #569 from iburinoc/testcorpus
Fix some more ARM compile errors
2017-02-27 10:19:37 -08:00
Yann Collet
3ac85faf1f Merge pull request #572 from prashantkhandelwal/dev
Fix for a small typo
2017-02-27 10:19:08 -08:00
Przemyslaw Skibinski
862698f479 minor tweaks in FIO_decompressGzFrame 2017-02-27 13:21:05 +01:00
Prashant Khandelwal
013f8b4c27 Fix for a small Typo 2017-02-27 16:28:22 +05:30