Commit Graph

631 Commits

Author SHA1 Message Date
Yann Collet
a57b4df85f removed literalCompression directive
in this version, literal compression is always disabled for ZSTD_fast strategy.

Performance parity between ZSTD_compress_advanced() and ZSTD_compress_generic()
2018-06-07 15:24:12 -07:00
Yann Collet
8537bfd85c fuzzer: make negative compression level fail
result of ZSTD_compress_advanced()
is different from ZSTD_compress_generic()
when using negative compression levels
because the disabling of huffman compression is not passed in parameters.
2018-06-07 15:12:13 -07:00
Yann Collet
e3c42c739b clean ZSTD_compress() initialization
The (pretty old) code inside ZSTD_compress()
was making some pretty bold assumptions
on what's inside a CCtx and how to init it.

This is pretty fragile by design.
CCtx content evolve.
Knowledge of how to handle that should be concentrate in one place.

A side effect of this strategy
is that ZSTD_compress() wouldn't check for BMI2 capability,
and is therefore missing out some potential speed opportunity.

This patch makes ZSTD_compress() use
the same initialization and release functions
as the normal creator / destructor ones.

Measured on my laptop, with a custom version of bench
manually modified to use ZSTD_compress() (instead of the advanced API) :
This patch :
 1#silesia.tar       : 211984896 ->  73651053 (2.878), 312.2 MB/s , 723.8 MB/s
 2#silesia.tar       : 211984896 ->  70163650 (3.021), 226.2 MB/s , 649.8 MB/s
 3#silesia.tar       : 211984896 ->  66996749 (3.164), 169.4 MB/s , 636.7 MB/s
 4#silesia.tar       : 211984896 ->  65998319 (3.212), 136.7 MB/s , 619.2 MB/s
dev branch :
 1#silesia.tar       : 211984896 ->  73651053 (2.878), 291.7 MB/s , 727.5 MB/s
 2#silesia.tar       : 211984896 ->  70163650 (3.021), 216.2 MB/s , 655.7 MB/s
 3#silesia.tar       : 211984896 ->  66996749 (3.164), 162.2 MB/s , 633.1 MB/s
 4#silesia.tar       : 211984896 ->  65998319 (3.212), 130.6 MB/s , 618.6 MB/s
2018-06-07 14:05:25 -07:00
Yann Collet
2108decb41 Fixed a nasty corruption bug
recently introduce into the new dictionary mode.
The bug could be reproduced with this command :
./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639

error was in function ZSTD_count_2segments() :
the beginning of the 2nd segment corresponds to prefixStart
and not the beginning of the current block (istart == src).
This would result in comparing the wrong byte.
2018-06-01 18:54:34 -07:00
Yann Collet
7c33b48221
Merge pull request #1151 from felixhandte/zstd-dfast-in-place-dict-goto
ZSTD_dfast: Support Searching the Dictionary Context In-Place (Alternate `goto` Implementation)
2018-05-31 17:37:09 -07:00
W. Felix Handte
48deab92de Allow Different Dict Attachment Cut-Offs for Different Strategies 2018-05-31 17:37:44 -04:00
Yann Collet
bb6eaf6495
Merge pull request #1153 from facebook/dynThreshold
changed dynamic fse threshold for offset
2018-05-26 08:43:45 -07:00
Yann Collet
e916c365a1 fixed minor visual warning 2018-05-25 20:43:09 -07:00
Yann Collet
a7fdceeccd changed dynamic fse threshold for offset
recent experienced showed that
default distribution table for offset
can get it wrong pretty quickly with the nb of symbols,
while it remains a reasonable choice much longer for lengths symbols.

Changed the formula,
so that dynamic threshold is now 32 symbols for offsets.
It remains at 64 symbols for lengths.

Detection based on defaultNormLog
2018-05-25 17:41:16 -07:00
Yann Collet
4b3a36d5d8 Merge branch 'dev' into lowCompression 2018-05-25 15:45:03 -07:00
Yann Collet
5f177f1c53 btultra accepts blocks with poorer compression ratio
zstd rejects blocks which do not compress by at least a certain amount.
In which case, such block is simply emitted uncompressed (even if a little bit of compression could be achieved).
This is better for decompression speed, hence for energy.

The logic is controlled by ZSTD_minGain().
The rule is applied uniformly, at all compression levels.

This change makes btultra accepts blocks with poor compression ratios.
We presume that users of btultra mode prefers compression ratio over some decompress speed gains.

The threshold for minimum gain is lowered for btultra
from s>>6 (~1.5% minimum gain)
to s>>7 (~0.8% minimum gain).

This is a prudent change.
Not sure if it's large enough.
2018-05-25 15:19:52 -07:00
W. Felix Handte
c10d1b4011 Skeleton for In-Place Impl for ZSTD_dfast 2018-05-25 13:13:28 -04:00
Yann Collet
f6ad59ab5c Merge branch 'dev' into staticDictCost 2018-05-24 16:21:02 -07:00
Yann Collet
08c5be5db3
Merge pull request #1117 from felixhandte/zstd-fast-in-place-dict
ZSTD_fast: Support Searching the Dictionary Context In-Place
2018-05-23 19:32:25 -07:00
Nick Terrell
06b70179da
Work around bug in zstd decoder (#1147)
Work around bug in zstd decoder

Pull request #1144 exercised a new path in the zstd decoder that proved to
be buggy. Avoid the extremely rare bug by emitting an uncompressed block.
2018-05-23 18:02:30 -07:00
W. Felix Handte
d9c7e67125 Assert that Dict and Current Window are Adjacent in Index Space 2018-05-23 17:53:03 -04:00
W. Felix Handte
298d24fa57 Make loadedDictEnd an Index, not the Dict Len 2018-05-23 17:53:03 -04:00
W. Felix Handte
7ef85e0618 Fixes in re Comments 2018-05-23 17:53:03 -04:00
W. Felix Handte
582b7f85ed Don't Attach Empty Dict Contents
In weird corner cases, they produce unexpected results...
2018-05-23 17:53:03 -04:00
W. Felix Handte
a44ab3b475 Remove Out-of-Date Comment 2018-05-23 17:53:03 -04:00
W. Felix Handte
7e0402e738 Also Attach Dict When Source Size is Unknown 2018-05-23 17:53:03 -04:00
W. Felix Handte
3ba70cc759 Clear the Dictionary When Sliding the Window 2018-05-23 17:53:03 -04:00
W. Felix Handte
2d598e6fed Force Working Context Indices Greater than Dict Indices 2018-05-23 17:53:03 -04:00
W. Felix Handte
ae4fcf7816 Respond to PR Comments; Formatting/Style/Lint Fixes 2018-05-23 17:53:03 -04:00
W. Felix Handte
66bc1ca641 Change Cut-Off to 8 KB 2018-05-23 17:53:03 -04:00
W. Felix Handte
b67196f30d Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff 2018-05-23 17:53:03 -04:00
W. Felix Handte
265c2869d1 Split Wrapper Functions to Cause Inlining 2018-05-23 17:53:03 -04:00
W. Felix Handte
d18a405779 Refer to the Dictionary Match State In-Place (Sometimes) 2018-05-23 17:53:03 -04:00
Nick Terrell
e3959d5eba Fixes 2018-05-22 16:06:33 -07:00
Yann Collet
7a8b3496b4 Merge branch 'dev' into staticDictCost 2018-05-22 15:10:05 -07:00
Nick Terrell
49cf880513 Approximate FSE encoding costs for selection
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.

* The cost of `set_basic` is computed using the cross-entropy cost
  function `ZSTD_crossEntropyCost()`, using the normalized default count
  and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
  previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
  `ZSTD_entropyCost()`, together with the cost of writing the normalized
  count `ZSTD_NCountCost()`.
2018-05-22 14:33:22 -07:00
Yann Collet
5381369cb1 Merge branch 'dev' into tableLevels 2018-05-18 18:23:27 -07:00
Yann Collet
b0b3fb517d updated compression levels for blocks of 256KB 2018-05-18 17:17:12 -07:00
Yann Collet
a95e9e80d1 adding some debug functions to observe statistics 2018-05-18 14:09:42 -07:00
Yann Collet
a243020d37 slightly improved weight calculation
translating into a tiny compression ratio improvement
2018-05-17 11:19:44 -07:00
Yann Collet
63eeeaa1dd update table levels for blocks <= 16K
also : allow hlog to be slighly larger than windowlog,
as it's apparently good for both speed and compression ratio.
2018-05-16 16:13:37 -07:00
Yann Collet
f372ffc64d
Merge pull request #1127 from facebook/staticDictCost
Improved optimal parser with dictionary
2018-05-14 17:45:50 -07:00
Yann Collet
c9227ee16b update table for 128 KB blocks 2018-05-13 17:15:07 -07:00
Yann Collet
b4250489cf update compression levels for large inputs 2018-05-13 01:53:38 -07:00
Yann Collet
99ddca43a6 fixed wrong assertion
base can actually overflow
2018-05-10 19:48:09 -07:00
Yann Collet
1a26ec6e8d opt: init statistics from dictionary
instead of starting from fake "default" statistics.
2018-05-10 17:59:12 -07:00
Yann Collet
ac6105463a opt: minor improvements to log traces
slight improvement when using fractional-bit evaluation (opt:dictionay)
2018-05-09 15:46:11 -07:00
Yann Collet
4d5bd32a00 added traces to look at symbol costs
evaluation looks correct.
2018-05-09 12:00:12 -07:00
Yann Collet
338f738c24 pass entropy tables to optimal parser
for proper estimation of symbol's weights
when using dictionary compression.

Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
2018-05-08 15:37:06 -07:00
Yann Collet
a155061328 minor code refactor for readability
removed some useless operations from optimal parser
(should not change performance, too small a difference)
2018-05-08 12:32:44 -07:00
Yann Collet
ad4524d605 fix ZSTD_compressBlock() associated with CDict
reported by @let-def.

It's actually a bug in ZSTD_compressBegin_usingCDict()
which would pass a wrong pledgedSrcSize value (0 instead of ZSTD_CONTENTSIZE_UNKNOWN)
resulting in wrong window size, resulting in downsized seqStore,
resulting in segfault when writing into the seqStore later in the process.

Added a test in fuzzer to cover this use case (fails before the patch).
2018-05-07 12:54:13 -07:00
Nick Terrell
ca77822ddf Fix parameter adjustment with dictionary
The new advanced API basically set `requestedParams = appliedParams` when
using a dictionary. This halted all parameter adjustment, which can hurt
compression ratio if, for example, the window log is small for the first
call, but the rest of the files are large.

This patch fixes the bug, and checks that the `requestedParams` don't change
in the new advanced API when using a dictionary, and generally in the fuzzer.
2018-04-25 16:32:29 -07:00
Nick Terrell
c0987986e5 Only reset CDict in ZSTD_CCtx_resetParameters() 2018-04-13 11:26:40 -07:00
Nick Terrell
9f76eebd17 Add ZSTD_CCtx_resetParameters() function
* Fix docs for `ZSTD_CCtx_reset()`.
* Add `ZSTD_CCtx_resetParameters()`.

Fixes #1094.
2018-04-12 16:54:07 -07:00
Nick Terrell
3c3f59e68f
Enforce pledgeSrcSize whenever known (#1106)
The test fails before the patch and passes after.

Fixes #1095.
2018-04-12 16:02:03 -07:00