6596 Commits

Author SHA1 Message Date
Nick Terrell
0827edeace [libzstd] Bump the library version to 1.4.0
Bumps the library version to 1.4.0 in preparation to stabilize the
advanced API.
2019-04-03 18:43:20 -07:00
Nick Terrell
72a3fbc0e4
Merge pull request #1562 from terrelln/2fast
[libzstd] Speed up single segment zstd_fast by 5%
2019-04-03 18:08:15 -07:00
Nick Terrell
56261001ea
Merge pull request #1567 from terrelln/examples2
[examples] Update streaming_decompression.c
2019-04-03 11:27:49 -07:00
Yann Collet
816a3f47c7
Merge pull request #1568 from terrelln/examples3
Update streaming_memory_usage.c and fix ZSTD_estimateCStreamSize_usingCCtxParams()
2019-04-03 09:07:13 -07:00
Nick Terrell
cdc8ae2e9b [examples] Update streaming_memory_usage.c
Update to use the new streaming API. Making progress on Issue #1548.

Tested that the checks don't fail.
Tested with window log 9-32. The lowest and highest fail as expected.
2019-04-02 19:20:57 -07:00
Nick Terrell
00679da22b [libzstd] Setting ZSTD_d_maxWindowLog to 0 means default 2019-04-02 19:20:52 -07:00
Nick Terrell
95624b77e4 [libzstd] Speed up single segment zstd_fast by 5%
This PR is based on top of PR #1563.

The optimization is to process two input pointers per loop.
It is based on ideas from [igzip] level 1, and talking to @gbtucker.
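
As a rough illustration of "two input pointers per loop", here is a minimal, self-contained sketch. It is not the zstd_fast code: the names `find_first_match_2x`, `hash4`, and `read32`, the 12-bit table, and the choice to stop at the first match are all assumptions made for the example. Each iteration loads and hashes two adjacent positions whose lookups do not depend on each other, which is presumably where the speedup comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LOG  12   /* hypothetical table size: 4096 entries */
#define MIN_MATCH 4    /* this sketch only looks for 4-byte matches */

static uint32_t read32(const uint8_t* p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* unaligned-safe 4-byte load */
    return v;
}

static uint32_t hash4(uint32_t v)
{
    return (v * 2654435761u) >> (32 - HASH_LOG);
}

/* Scans src for the first position whose 4 bytes also occur earlier.
 * Returns that position and stores the earlier one in *matchPos,
 * or returns size when nothing is found. Assumes size < 2^32 - 1.
 * A trailing position that cannot be paired is left unchecked. */
static size_t find_first_match_2x(const uint8_t* src, size_t size, size_t* matchPos)
{
    uint32_t table[1u << HASH_LOG];   /* stores position + 1; 0 means empty */
    size_t ip0 = 0;

    assert(src != NULL && matchPos != NULL);
    memset(table, 0, sizeof table);

    /* Each iteration handles two adjacent positions, ip0 and ip1. */
    while (ip0 + 1 + MIN_MATCH <= size) {
        size_t const ip1 = ip0 + 1;
        uint32_t const v0 = read32(src + ip0);
        uint32_t const v1 = read32(src + ip1);
        uint32_t const h0 = hash4(v0);
        uint32_t const h1 = hash4(v1);
        uint32_t c0, c1;

        /* Fetch the previous occupant of each slot, then record the new position. */
        c0 = table[h0];
        table[h0] = (uint32_t)ip0 + 1;
        c1 = table[h1];
        table[h1] = (uint32_t)ip1 + 1;

        /* A slot can hold a colliding position, so confirm by comparing bytes. */
        if (c0 != 0 && read32(src + (c0 - 1)) == v0) { *matchPos = c0 - 1; return ip0; }
        if (c1 != 0 && read32(src + (c1 - 1)) == v1) { *matchPos = c1 - 1; return ip1; }

        ip0 += 2;
    }
    return size;
}
```

A real compressor would go on to extend the match and emit a sequence; the sketch stops at detection to keep the loop structure visible.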

| Platform                | Silesia     | Enwik8 |
|-------------------------|-------------|--------|
| OSX clang-10            | +5.3%       | +5.4%  |
| i9 5 GHz gcc-8          | +6.6%       | +6.6%  |
| i9 5 GHz clang-7        | +8.0%       | +8.0%  |
| Skylake 2.4 GHz gcc-4.8 | +6.3%       | +7.9%  |
| Skylake 2.4 GHz clang-7 | +6.2%       | +7.5%  |

Testing on all Silesia files on my Intel i9-9900k with gcc-8

| Silesia File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| silesia.tar  | +0.17%       | +6.6%        |
| dickens      | +0.25%       | +7.0%        |
| mozilla      | +0.02%       | +6.8%        |
| mr           | -0.30%       | +10.9%       |
| nci          | +1.28%       | +4.5%        |
| ooffice      | -0.35%       | +10.7%       |
| osdb         | +0.75%       | +9.8%        |
| reymont      | +0.65%       | +4.6%        |
| samba        | +0.70%       | +5.9%        |
| sao          | -0.01%       | +14.0%       |
| webster      | +0.30%       | +5.5%        |
| xml          | +0.92%       | +5.3%        |
| x-ray        | -0.00%       | +1.4%        |

Same tests on Calgary. For brevity, I've only included files
where compression ratio regressed or was much better.

| Calgary File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| calgary.tar  | +0.30%       | +7.1%        |
| geo          | -0.14%       | +25.0%       |
| obj1         | -0.46%       | +15.2%       |
| obj2         | -0.18%       | +6.0%        |
| pic          | +1.80%       | +9.3%        |
| trans        | -0.35%       | +5.5%        |

We gain 0.1% of compression ratio on Silesia.
We gain 0.3% of compression ratio on enwik8.
I also tested on the GitHub and hg-commands datasets without a dictionary,
and we gain a small amount of compression ratio on each, as well as speed.

I tested the negative compression levels on Silesia on my
Intel i9-9900k with gcc-8:

| Level | Ratio Change | Speed Change |
|-------|--------------|--------------|
| -1    | +0.13%       | +6.4%        |
| -2    | +4.6%        | -1.5%        |
| -3    | +7.5%        | -4.8%        |
| -4    | +8.5%        | -6.9%        |
| -5    | +9.1%        | -9.1%        |

Roughly, the negative levels now scale half as quickly. E.g. the new
level -16 is roughly equivalent to the old level -8, but a bit quicker
and smaller. If you don't think this is the right trade-off, we can
change it to multiply the step size by 2, instead of adding 1. I think
this makes sense, because it gives a bit slower ratio decay.

[igzip]: https://github.com/01org/isa-l/tree/master/igzip
2019-04-02 19:02:50 -07:00
Nick Terrell
de58910b5a [examples] Update streaming_decompression.c
Update to use the new streaming API. Making progress on Issue #1548.

Tested that it can decompress files produced by `streaming_compression`.
Tested that it can decompress two frames concatenated together.
Tested that it fails on corrupted data.
2019-04-02 18:52:59 -07:00
Nick Terrell
882ceb86bc
Merge pull request #1566 from terrelln/examples
[examples] Update multiple_streaming_compression.c
2019-04-02 17:13:10 -07:00
Nick Terrell
56682a7709 Fix ZSTD_estimateCStreamSize_usingCCtxParams()
It wasn't using the ZSTD_CCtx_params correctly. It must actualize
the compression parameters by calling ZSTD_getCParamsFromCCtxParams()
to get the real window log.

Tested by updating the streaming memory usage example in the next
commit. The CHECK() failed before this patch, and passes after.

I also added a unit test to zstreamtest.c that failed before this
patch, and passes after.
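
To make the idea of "actualizing" parameters concrete, here is a small self-contained sketch. It does not use or reproduce the zstd API: `requested_params`, `default_window_log`, and both estimate functions are hypothetical, the per-level defaults are made up, and the naive variant shows only one plausible way an estimate can go wrong when it reads a raw, possibly unset field. The exact failure in the library is not described in this commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameter set in which a window log of 0 means "not set". */
typedef struct {
    int      level;
    unsigned windowLog;
} requested_params;

/* Made-up per-level defaults, for illustration only. */
static unsigned default_window_log(int level)
{
    return level < 10 ? 20u : 23u;
}

/* Resolve ("actualize") the request into the window log that will really be used. */
static unsigned resolve_window_log(const requested_params* p)
{
    assert(p != NULL);
    return p->windowLog != 0 ? p->windowLog : default_window_log(p->level);
}

/* Naive: reads the raw field, so an unset window log is treated as a 1-byte window. */
static size_t estimate_window_naive(const requested_params* p)
{
    assert(p != NULL);
    return (size_t)1 << p->windowLog;
}

/* Resolves the parameters first, then sizes the window from the real value. */
static size_t estimate_window_resolved(const requested_params* p)
{
    return (size_t)1 << resolve_window_log(p);
}
```

The two functions agree whenever the caller sets the window log explicitly and diverge only when it is left at 0, which is the kind of case an estimate based on unresolved parameters can get wrong.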
2019-04-01 18:02:52 -07:00
Nick Terrell
04325cbc2f Fix indentation 2019-04-01 17:33:49 -07:00
Nick Terrell
fb13d757af [examples] Update multiple_streaming_compression.c
Update to use the new streaming API. Making progress on Issue #1548.

Tested that multiple files could be compressed, and that the output
is the same as calling `streaming_compression` multiple times with
the same compression level, and that it can be decompressed.
2019-04-01 16:41:06 -07:00
Nick Terrell
425ce5547c
Merge pull request #1563 from terrelln/dms-sep
[libzstd] Split out zstd_fast dict match state function
2019-03-29 16:19:21 -06:00
Nick Terrell
f00407b640 Split out zstd_fast dict match state function 2019-03-29 10:39:16 -06:00
Nick Terrell
6625f3b390
Merge pull request #1561 from shakeelrao/fix-typo
Update comments in zstd.h and fileio.c
2019-03-28 23:42:16 -06:00
shakeelrao
dca73db30c fix srcSize typo and add new UTIL func to comment 2019-03-28 17:50:34 -07:00
Nick Terrell
dcc6c7e9ae
Merge pull request #1556 from terrelln/dictbuilder
[cover] Improvements for small or homogeneous data
2019-03-25 15:08:32 -07:00
Nick Terrell
440f390cba
Merge pull request #1557 from terrelln/examples
[examples] Update streaming_compression to the new API
2019-03-25 15:07:35 -07:00
Nick Terrell
7186a50775
Merge pull request #1559 from shakeelrao/reject-dict
[CLI] ensure dictionary and input file are different
2019-03-25 15:06:58 -07:00
shakeelrao
44f77b5c71 Add whitespace to test case 2019-03-24 03:42:11 -07:00
shakeelrao
b25d7eacf2 Rename test 2019-03-24 03:40:03 -07:00
shakeelrao
2b4491d81a Add CLI test to validate error 2019-03-24 00:47:13 -07:00
shakeelrao
5333e41ab3 Add NULL check for dict 2019-03-24 00:23:50 -07:00
shakeelrao
8ea219d8c6 Modify error msg 2019-03-23 21:59:30 -07:00
shakeelrao
1290933d19 Implement file check 2019-03-23 21:53:13 -07:00
shakeelrao
e5811e5520 Extract file comparison into utility func 2019-03-23 19:04:56 -07:00
Nick Terrell
f5cbee988b [examples] Update streaming_compression to the new API 2019-03-23 15:59:26 -07:00
Nick Terrell
d97605ad85
Merge pull request #1558 from nehaljwani/fix-version-soversion-libzstd
[libzstd] Specify soversion and version correctly for CMake build
2019-03-23 13:32:39 -07:00
Nehal J Wani
7ac2052dbc
[libzstd] Specify soversion and version correctly for CMake build
Fixes #1512
2019-03-23 17:37:37 +05:30
Nick Terrell
d0f5ba36fb [cover] Improvements for small or homogeneous data
* The algorithm would bail as soon as it found one epoch that
  contained no new segments. Change it so it now has to fail
  >= 10 times in a row (10 for fastcover, 10-100 for cover).
* The algorithm uses the `maxDict` size to decide the epoch size.
  When this size is absurdly large, it causes tiny epochs. Lower
  bound the epoch size at 10x the segment size, and warn the user
  that their training set is too small.
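
The two rules above can be sketched as follows. This is a simplified illustration, not the cover implementation: `compute_epochs`, `count_selected`, the `epochs_t` struct, and the way the epoch count is derived from `maxDictSize` are assumptions. Only the 10x lower bound and the threshold of 10 consecutive failures (the fastcover value) come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_EPOCH_FACTOR   10   /* epoch size is at least 10x the segment size */
#define MAX_ZERO_SCORE_RUN 10   /* give up after this many empty epochs in a row */

typedef struct {
    size_t num;    /* number of epochs */
    size_t size;   /* positions per epoch */
} epochs_t;

/* Derives the epoch layout, clamping the epoch size from below.
 * Assumption: the epoch count starts out as maxDictSize / segmentSize. */
static epochs_t compute_epochs(size_t maxDictSize, size_t nbPositions, size_t segmentSize)
{
    size_t const minEpochSize = MIN_EPOCH_FACTOR * segmentSize;
    epochs_t e;

    assert(segmentSize > 0 && nbPositions > 0);
    e.num = maxDictSize / segmentSize;
    if (e.num == 0) e.num = 1;
    e.size = nbPositions / e.num;
    if (e.size >= minEpochSize) return e;

    /* Training set is small relative to maxDictSize: a caller could warn here. */
    e.size = minEpochSize < nbPositions ? minEpochSize : nbPositions;
    e.num = nbPositions / e.size;
    return e;
}

/* Counts how many epochs contribute a segment. A score of 0 stands for
 * "this epoch contained no new segment". */
static size_t count_selected(const int* epochScores, size_t nbEpochs)
{
    size_t selected = 0;
    size_t zeroRun = 0;
    size_t i;

    assert(epochScores != NULL);
    for (i = 0; i < nbEpochs; i++) {
        if (epochScores[i] == 0) {
            /* Previously the first empty epoch ended the loop. */
            if (++zeroRun >= MAX_ZERO_SCORE_RUN) break;
            continue;
        }
        zeroRun = 0;
        selected++;
    }
    return selected;
}
```

With a 1 MiB `maxDictSize`, 1000 positions, and a segment size of 8, the unclamped epoch size would round down to 0; the clamp yields 12 epochs of 80 positions instead.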

Fixes #1554
2019-03-22 14:14:46 -07:00
Nick Terrell
0c7668cd06
Merge pull request #1555 from terrelln/load-dict
[lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
2019-03-21 17:52:57 -07:00
Nick Terrell
6b053b9f60 [lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
* After loading a dictionary only create the cdict once we've started the
  compression job. This allows the user to pass the dictionary before they
  set other settings, and is in line with the rest of the API.
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
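
The deferred creation described in the first bullet can be sketched like this. All names here (`lazy_ctx`, `digested_dict`, `lazy_ctx_begin`, and so on) are hypothetical stand-ins rather than zstd types or functions, and the default level of 3 is an assumption. The point is only the ordering: loading the dictionary stores a reference, and the digested form is built when the job starts, using whatever parameters are in effect at that moment.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a digested dictionary: remembers the level it was built for. */
typedef struct {
    const void* content;
    size_t      size;
    int         level;
} digested_dict;

typedef struct {
    const void*   rawDict;        /* as passed by the user, not yet digested */
    size_t        rawDictSize;
    int           level;
    digested_dict digested;
    int           digestedReady;
} lazy_ctx;

static void lazy_ctx_init(lazy_ctx* c)
{
    assert(c != NULL);
    c->rawDict = NULL;
    c->rawDictSize = 0;
    c->level = 3;                 /* hypothetical default level */
    c->digested.content = NULL;
    c->digested.size = 0;
    c->digested.level = 0;
    c->digestedReady = 0;
}

/* Only remembers the dictionary; nothing is built yet. */
static void lazy_ctx_load_dictionary(lazy_ctx* c, const void* dict, size_t size)
{
    assert(c != NULL);
    c->rawDict = dict;
    c->rawDictSize = size;
    c->digestedReady = 0;
}

static void lazy_ctx_set_level(lazy_ctx* c, int level)
{
    assert(c != NULL);
    c->level = level;
}

/* Starts a job: the dictionary is digested here, with the final parameters.
 * This sketch does not handle parameter changes after the job has started. */
static const digested_dict* lazy_ctx_begin(lazy_ctx* c)
{
    assert(c != NULL);
    if (c->rawDict == NULL) return NULL;
    if (!c->digestedReady) {
        c->digested.content = c->rawDict;
        c->digested.size = c->rawDictSize;
        c->digested.level = c->level;
        c->digestedReady = 1;
    }
    return &c->digested;
}
```

Had the dictionary been digested eagerly inside `lazy_ctx_load_dictionary`, a level set afterwards would not be reflected in it, which is the ordering constraint this change removes.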
2019-03-21 16:13:53 -07:00
Nick Terrell
041bc0bdb7
Merge pull request #1551 from terrelln/stream-wrap
Refactor old stream API to completely wrap the advanced API
2019-03-21 16:10:00 -07:00
Nick Terrell
20f9ff7e53 Update documentation to tell how to replace the old streaming API with the new one. 2019-03-21 16:08:58 -07:00
Nick Terrell
e55da9e963 Wrap the new advanced api completely 2019-03-21 10:54:40 -07:00
Nick Terrell
11e73576bb [regression] Add more streaming tests
* Test all of the `ZSTD_initCStream*()` variants.
* Fix a typo in the zstdcli method.
2019-03-21 10:54:18 -07:00
Nick Terrell
0dd3588acc
Merge pull request #1553 from shakeelrao/legacy-bound
Add legacy support to decompressBound
2019-03-19 10:53:12 -07:00
shakeelrao
186ded6d91 Fix typo in legacy documentation 2019-03-19 01:44:08 -07:00
shakeelrao
5740eb6769 Remove extraneous spacing in comments 2019-03-18 21:05:35 -07:00
shakeelrao
0a3fa6f909 Add legacy mode in documentation 2019-03-18 20:33:15 -07:00
shakeelrao
20aa1b455c Stylistic changes 2019-03-17 19:35:43 -07:00
shakeelrao
0033bb4785 Update documentation for ZSTD_frameSizeInfo 2019-03-17 17:41:27 -07:00
shakeelrao
19b75b6ecb Test new ZSTD_findFrameCompressedSize and update documentation 2019-03-15 18:04:19 -07:00
shakeelrao
8cd423a659 Reorder declaration in ZSTD_findFrameSizeInfoLegacy 2019-03-15 16:20:34 -07:00
shakeelrao
60796e76b0 Add legacy support to decompressBound 2019-03-15 16:10:37 -07:00
Nick Terrell
f52a7d8faa
Merge pull request #1547 from shakeelrao/fix-error
Fix incorrect error code in ZSTD_errorFrameSizeInfo
2019-03-15 10:57:49 -07:00
shakeelrao
4c0540da1c Add static linking to legacy tests 2019-03-15 05:13:55 -07:00
shakeelrao
91ffc8d256 Add test to validate patch 2019-03-15 03:59:03 -07:00
Nick Terrell
45139e9fd5
Merge pull request #1550 from terrelln/cparams-cdict
[libzstd] Allow compression parameters to be set with a cdict
2019-03-13 19:17:22 -07:00
Nick Terrell
18fbcddd0c [zstreamtest] Remove outdated test 2019-03-13 17:01:23 -07:00