920 Commits

Author SHA1 Message Date
Yann Collet
809f2f9322 minor update of literal cost function
just assert() there is no negative cost evaluation for literals
2018-05-29 15:34:50 -07:00
Yann Collet
463a0fe38b simplified optimal parser
removed "cached" structure.
prices are now saved in the optimal table.

Primarily done for simplification.
Might improve speed by a little.
But actually, and surprisingly, also improves ratio in some circumstances.
2018-05-29 14:07:25 -07:00
Yann Collet
bb6eaf6495
Merge pull request #1153 from facebook/dynThreshold
changed dynamic fse threshold for offset
2018-05-26 08:43:45 -07:00
Yann Collet
e916c365a1 fixed minor visual warning 2018-05-25 20:43:09 -07:00
Yann Collet
a7fdceeccd changed dynamic fse threshold for offset
recent experience showed that
default distribution table for offset
can get it wrong pretty quickly with the nb of symbols,
while it remains a reasonable choice much longer for length symbols.

Changed the formula,
so that dynamic threshold is now 32 symbols for offsets.
It remains at 64 symbols for lengths.

Detection based on defaultNormLog
2018-05-25 17:41:16 -07:00
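The thresholds quoted in the commit above (32 for offsets, 64 for lengths) match the sizes of the default tables, whose accuracy logs in the zstd format are 5 for offsets and 6 for literal and match lengths. A minimal sketch, assuming the threshold is simply `1 << defaultNormLog`; the helper names are hypothetical and the actual formula in zstd may include other factors:

```c
#include <assert.h>
#include <stddef.h>

/* Default table accuracy logs from the zstd format:
 * 5 for offsets, 6 for literal lengths and match lengths. */
#define OF_DEFAULT_NORM_LOG 5u
#define LL_DEFAULT_NORM_LOG 6u

/* Hypothetical helper: number of symbols from which a dynamic table
 * is considered. ASSUMPTION: threshold == 1 << defaultNormLog. */
static size_t dynamic_fse_threshold(unsigned defaultNormLog)
{
    assert(defaultNormLog < 31);
    return (size_t)1 << defaultNormLog;
}

/* Hypothetical helper: returns 1 while the default distribution
 * is still an acceptable choice for nbSeq symbols. */
static int default_table_allowed(size_t nbSeq, unsigned defaultNormLog)
{
    return nbSeq < dynamic_fse_threshold(defaultNormLog);
}
```

With 40 symbols, this sketch still allows the default table for lengths but no longer for offsets.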
Yann Collet
4b3a36d5d8 Merge branch 'dev' into lowCompression 2018-05-25 15:45:03 -07:00
Yann Collet
5f177f1c53 btultra accepts blocks with poorer compression ratio
zstd rejects blocks which do not compress by at least a certain amount.
In that case, the block is simply emitted uncompressed (even if a little bit of compression could be achieved).
This is better for decompression speed, hence for energy.

The logic is controlled by ZSTD_minGain().
The rule is applied uniformly, at all compression levels.

This change makes btultra accept blocks with poor compression ratios.
We presume that users of btultra mode prefer compression ratio over some decompression speed gains.

The threshold for minimum gain is lowered for btultra
from s>>6 (~1.5% minimum gain)
to s>>7 (~0.8% minimum gain).

This is a prudent change.
Not sure if it's large enough.
2018-05-25 15:19:52 -07:00
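The shift-based thresholds in the commit above can be worked through on a concrete block size. This is a rough sketch, not the body of `ZSTD_minGain()`: the function names and the `isBtultra` flag are hypothetical, the strict comparison is an assumption, and the real function may add a constant term.

```c
#include <stddef.h>

/* Hypothetical stand-in for the minimum-gain rule:
 * srcSize >> 6 is 1/64 of the block (~1.5%),
 * srcSize >> 7 is 1/128 of the block (~0.8%). */
static size_t min_gain_sketch(size_t srcSize, int isBtultra)
{
    unsigned const minLog = isBtultra ? 7 : 6;
    return srcSize >> minLog;
}

/* Returns 1 when the compressed block saves more than the minimum
 * gain, 0 when the block should be emitted uncompressed instead.
 * ASSUMPTION: the gain must strictly exceed the threshold. */
static int keep_compressed(size_t srcSize, size_t cSize, int isBtultra)
{
    if (cSize >= srcSize) return 0;   /* no gain at all */
    return (srcSize - cSize) > min_gain_sketch(srcSize, isBtultra);
}
```

For a 128 KB block (131072 bytes), the threshold drops from 2048 to 1024 bytes, so a block compressed to 129500 bytes (1572 bytes saved) is rejected under the general rule but kept under the btultra rule.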
Yann Collet
e2c0e3d437 slightly nudge choices towards fewer sequences
also slightly improve some strange detrimental corner cases.
2018-05-25 14:52:21 -07:00
Yann Collet
f6ad59ab5c Merge branch 'dev' into staticDictCost 2018-05-24 16:21:02 -07:00
Yann Collet
b5ef32fea7 Merge branch 'dev' into fracFse 2018-05-24 14:09:49 -07:00
Yann Collet
776128d16f fix corner case when requiring cost of an FSE symbol
ensure that, when frequency[symbol]==0,
result is (tableLog + 1) bits
with both upper-bit and fractional-bit estimates.

Also : enable BIT_DEBUG in /tests
2018-05-24 13:59:11 -07:00
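The corner case in the commit above can be illustrated with a small cost estimator. A symbol with normalized frequency `f` in a table of size `1 << tableLog` costs about `tableLog - log2(f)` bits. The sketch below is not zstd's `FSE_bitCost()`: all names are hypothetical, and the fractional estimate uses a simple linear interpolation between powers of two, with costs expressed in 1/256th of a bit. Both estimates return `tableLog + 1` bits when the frequency is zero.

```c
#include <assert.h>

#define FRAC_BITS 8u   /* fixed-point accuracy: 1/256th of a bit */

/* Position of the highest set bit, i.e. floor(log2(v)); v must be != 0. */
static unsigned highbit32(unsigned v)
{
    unsigned r = 0;
    assert(v != 0);
    while (v >>= 1) r++;
    return r;
}

/* Upper-bit estimate: whole bits, rounded up. */
static unsigned cost_upper_bits(unsigned freq, unsigned tableLog)
{
    assert(tableLog <= 20 && freq <= (1u << tableLog));
    if (freq == 0) return tableLog + 1;          /* corner case */
    return tableLog - highbit32(freq);
}

/* Fractional estimate, in 1/256th of a bit. */
static unsigned cost_frac(unsigned freq, unsigned tableLog)
{
    unsigned hb, log2fp;
    assert(tableLog <= 20 && freq <= (1u << tableLog));
    if (freq == 0) return (tableLog + 1) << FRAC_BITS;   /* corner case */
    hb = highbit32(freq);
    /* log2(freq) ~= hb + (freq - 2^hb) / 2^hb, in fixed point */
    log2fp = (hb << FRAC_BITS) + (((freq << FRAC_BITS) >> hb) - (1u << FRAC_BITS));
    return (tableLog << FRAC_BITS) - log2fp;
}
```

For `tableLog = 5`, a frequency of 3 gives 4 bits with the upper-bit estimate and 3.5 bits (896/256) with the fractional one, while a frequency of 0 gives 6 bits with both.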
Yann Collet
08c5be5db3
Merge pull request #1117 from felixhandte/zstd-fast-in-place-dict
ZSTD_fast: Support Searching the Dictionary Context In-Place
2018-05-23 19:32:25 -07:00
Nick Terrell
06b70179da
Work around bug in zstd decoder (#1147)

Pull request #1144 exercised a new path in the zstd decoder that proved to
be buggy. Avoid the extremely rare bug by emitting an uncompressed block.
2018-05-23 18:02:30 -07:00
W. Felix Handte
d9c7e67125 Assert that Dict and Current Window are Adjacent in Index Space 2018-05-23 17:53:03 -04:00
W. Felix Handte
298d24fa57 Make loadedDictEnd an Index, not the Dict Len 2018-05-23 17:53:03 -04:00
W. Felix Handte
7ef85e0618 Fixes in re Comments 2018-05-23 17:53:03 -04:00
W. Felix Handte
582b7f85ed Don't Attach Empty Dict Contents
In weird corner cases, they produce unexpected results...
2018-05-23 17:53:03 -04:00
W. Felix Handte
9c92223468 Avoid Undefined Behavior in Match Ptr Calculation 2018-05-23 17:53:03 -04:00
W. Felix Handte
a44ab3b475 Remove Out-of-Date Comment 2018-05-23 17:53:03 -04:00
W. Felix Handte
95bdf20a87 Moar Renames 2018-05-23 17:53:03 -04:00
W. Felix Handte
7e0402e738 Also Attach Dict When Source Size is Unknown 2018-05-23 17:53:03 -04:00
W. Felix Handte
3ba70cc759 Clear the Dictionary When Sliding the Window 2018-05-23 17:53:03 -04:00
W. Felix Handte
b05ae9b608 Refine ip Initialization to Avoid ARM Weirdness 2018-05-23 17:53:03 -04:00
W. Felix Handte
1a7b34ef28 Use New Index Invariant to Simplify Conditionals 2018-05-23 17:53:03 -04:00
W. Felix Handte
2d598e6fed Force Working Context Indices Greater than Dict Indices 2018-05-23 17:53:03 -04:00
W. Felix Handte
d005e5daf4 Whitespace Fix 2018-05-23 17:53:03 -04:00
W. Felix Handte
154eb09419 Switch to Original Match Calc for noDict Repcode Check 2018-05-23 17:53:03 -04:00
W. Felix Handte
191fc74a51 Rename 'hasDict' to 'dictMode' 2018-05-23 17:53:03 -04:00
W. Felix Handte
ae4fcf7816 Respond to PR Comments; Formatting/Style/Lint Fixes 2018-05-23 17:53:03 -04:00
W. Felix Handte
ca26cecc7a Rename and Reformat 2018-05-23 17:53:03 -04:00
W. Felix Handte
66bc1ca641 Change Cut-Off to 8 KB 2018-05-23 17:53:03 -04:00
W. Felix Handte
c31ee3c7f8 Fix Rep Code Initialization 2018-05-23 17:53:03 -04:00
W. Felix Handte
b67196f30d Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff 2018-05-23 17:53:03 -04:00
W. Felix Handte
265c2869d1 Split Wrapper Functions to Cause Inlining 2018-05-23 17:53:03 -04:00
W. Felix Handte
6929964d65 Add bounds check in repcode tests 2018-05-23 17:53:03 -04:00
W. Felix Handte
70a537d1d7 Initial Repcode Check Support for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
W. Felix Handte
8d24ff0353 Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
W. Felix Handte
d18a405779 Refer to the Dictionary Match State In-Place (Sometimes) 2018-05-23 17:53:03 -04:00
Nick Terrell
e3959d5eba Fixes 2018-05-22 16:06:33 -07:00
Yann Collet
7a8b3496b4 Merge branch 'dev' into staticDictCost 2018-05-22 15:10:05 -07:00
Yann Collet
a8ddf1d370 disable 2-passes strategy 2018-05-22 15:06:36 -07:00
Nick Terrell
49cf880513 Approximate FSE encoding costs for selection
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.

* The cost of `set_basic` is computed using the cross-entropy cost
  function `ZSTD_crossEntropyCost()`, using the normalized default count
  and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
  previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
  `ZSTD_entropyCost()`, together with the cost of writing the normalized
  count `ZSTD_NCountCost()`.
2018-05-22 14:33:22 -07:00
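The selection described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the code of `ZSTD_crossEntropyCost()` or `ZSTD_entropyCost()`: every name is hypothetical, log2 is approximated by linear interpolation in 1/256th of a bit, and the cost of writing the normalized count is assumed to be added by the caller to the `set_compressed` cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MODE_BASIC, MODE_REPEAT, MODE_COMPRESSED } fse_mode_e;

/* log2(x) in 1/256th of a bit, interpolated between powers of two; x >= 1 */
static unsigned log2_fp8(unsigned x)
{
    unsigned hb = 0, v = x;
    assert(x >= 1);
    while (v >>= 1) hb++;
    return (hb << 8) + (unsigned)((((unsigned long long)x << 8) >> hb) - 256);
}

/* Bits needed to code `count` with its own distribution (entropy). */
static size_t entropy_cost_bits(const unsigned* count, size_t n)
{
    unsigned long long cost = 0;
    unsigned total = 0;
    size_t s;
    for (s = 0; s < n; s++) total += count[s];
    if (total == 0) return 0;
    for (s = 0; s < n; s++) {
        if (count[s] == 0) continue;
        cost += (unsigned long long)count[s] * (log2_fp8(total) - log2_fp8(count[s]));
    }
    return (size_t)(cost >> 8);
}

/* Bits needed to code `count` with a fixed table `norm` whose entries
 * sum to 1 << normLog. Returns SIZE_MAX when the table cannot
 * represent a symbol that is present. */
static size_t cross_entropy_cost_bits(const unsigned* count, const unsigned* norm,
                                      size_t n, unsigned normLog)
{
    unsigned long long cost = 0;
    size_t s;
    for (s = 0; s < n; s++) {
        if (count[s] == 0) continue;
        if (norm[s] == 0) return SIZE_MAX;
        cost += (unsigned long long)count[s] * ((normLog << 8) - log2_fp8(norm[s]));
    }
    return (size_t)(cost >> 8);
}

/* Pick the mode with the lowest estimated cost; ties keep the earlier mode. */
static fse_mode_e select_mode(size_t basicCost, size_t repeatCost, size_t compressedCost)
{
    fse_mode_e best = MODE_BASIC;
    size_t bestCost = basicCost;
    if (repeatCost < bestCost) { best = MODE_REPEAT; bestCost = repeatCost; }
    if (compressedCost < bestCost) best = MODE_COMPRESSED;
    return best;
}
```

For counts `{8, 4, 2, 2}`, the entropy estimate is 28 bits, while a uniform table `{4, 4, 4, 4}` with `normLog = 4` costs 32 bits. A dedicated table only wins if its header costs fewer than 4 bits, so with a 10-bit header the sketch keeps `MODE_BASIC`.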
Yann Collet
5381369cb1 Merge branch 'dev' into tableLevels 2018-05-18 18:23:27 -07:00
Yann Collet
b0b3fb517d updated compression levels for blocks of 256KB 2018-05-18 17:17:12 -07:00
Yann Collet
5cbef6e094 Merge branch 'dev' into staticDictCost 2018-05-18 16:03:06 -07:00
Yann Collet
a95e9e80d1 adding some debug functions to observe statistics 2018-05-18 14:09:42 -07:00
Yann Collet
af3da079d1 fixed minor conversion warning 2018-05-17 17:27:27 -07:00
Yann Collet
8572b4d09f fixed a pretty complex bug when combining ldm + btultra 2018-05-17 16:13:53 -07:00
Yann Collet
134388ba6b collect statistics for first block in ultra mode
this patch makes btultra do 2 passes on the first block,
the first one being dedicated to collecting statistics
so that the 2nd pass is more accurate.

It translates into a very small compression ratio gain :

enwik7, level 20:
blocks  4K : 2.142 -> 2.153
blocks 16K : 2.447 -> 2.457
blocks 64K : 2.716 -> 2.726

On the other hand, the CPU cost is doubled.

The trade-off looks bad.
Still, that's ultimately the price to pay to reach a better compression ratio.
So it's only enabled when setting btultra.
2018-05-17 12:24:30 -07:00
Yann Collet
a243020d37 slightly improved weight calculation
translating into a tiny compression ratio improvement
2018-05-17 11:19:44 -07:00