Commit Graph

88 Commits

Author SHA1 Message Date
W. Felix Handte
5da9bbc38e Conceivably Dedup ZSTD_noDict and ZSTD_dictMatchState _insertBt1 Impls
By reverting to the bool extDict flag, we call ZSTD_insertBt1 with the same
const args in both non-extDict dictModes.
2018-06-21 11:20:01 -04:00
W. Felix Handte
5d81f71e83 Consistency in Guarding DMS-Only Variable Initializations 2018-06-20 16:54:53 -04:00
W. Felix Handte
9c14eafe3d Also Use matchLow for HC3 Match 2018-06-20 15:51:14 -04:00
W. Felix Handte
0a6cf7cd1d Minor Changes 2018-06-20 15:27:23 -04:00
W. Felix Handte
ae1f3898a2 Remove Dead(!) HC3 DMS Lookup 2018-06-20 15:27:12 -04:00
W. Felix Handte
03c39c540b Fix Incorrect Param 2018-06-19 15:36:33 -04:00
W. Felix Handte
f0a13bcd68 Make Sure Position 0 Gets Into the Tree 2018-06-19 15:10:06 -04:00
W. Felix Handte
87fe4788a3 Fix Compression Ratio Regression #1 2018-06-19 13:01:21 -04:00
W. Felix Handte
4bb79f9c55 Misc Changes 2018-06-19 13:01:21 -04:00
W. Felix Handte
2091f34e9e Find Proper Matches 2018-06-19 13:01:21 -04:00
W. Felix Handte
64348a15f1 Misc Fixes 2018-06-19 13:01:21 -04:00
W. Felix Handte
ade8586ce6 Find mls == 3 Matches 2018-06-19 13:01:21 -04:00
W. Felix Handte
ce743312e2 Fix Typo 2018-06-19 13:01:21 -04:00
W. Felix Handte
a075864756 Switch != ZSTD_extDict to == ZSTD_noDict 2018-06-19 13:01:21 -04:00
W. Felix Handte
1e03377bde Implement RepCode Check 2018-06-19 13:01:21 -04:00
W. Felix Handte
ccbf067973 Add _dictMatchState Functions 2018-06-19 13:01:21 -04:00
W. Felix Handte
d5d8240967 Convert extDict Flag to dictMode Enum 2018-06-19 13:01:21 -04:00
Yann Collet
2d76defbfe grouped all histogram functions into hist.c
renamed functions with HIST_* prefix
2018-06-13 19:49:31 -04:00
Yann Collet
809f2f9322 minor update of literal cost function
just assert() there is no negative cost evaluation for literals
2018-05-29 15:34:50 -07:00
Yann Collet
463a0fe38b simplified optimal parser
removed "cached" structure.
prices are now saved in the optimal table.

Primarily done for simplification.
Might improve speed by a little.
But actually, and surprisingly, also improves ratio in some circumstances.
2018-05-29 14:07:25 -07:00
Yann Collet
e2c0e3d437 slightly nudge choices towards less sequences
also slightly improve some strange detrimental corner cases.
2018-05-25 14:52:21 -07:00
Yann Collet
f6ad59ab5c Merge branch 'dev' into staticDictCost 2018-05-24 16:21:02 -07:00
Nick Terrell
e3959d5eba Fixes 2018-05-22 16:06:33 -07:00
Yann Collet
a8ddf1d370 disable 2-passes strategy 2018-05-22 15:06:36 -07:00
Yann Collet
5cbef6e094 Merge branch 'dev' into staticDictCost 2018-05-18 16:03:06 -07:00
Yann Collet
a95e9e80d1 adding some debug functions to observe statistics 2018-05-18 14:09:42 -07:00
Yann Collet
af3da079d1 fixed minor conversion warning 2018-05-17 17:27:27 -07:00
Yann Collet
8572b4d09f fixed a pretty complex bug when combining ldm + btultra 2018-05-17 16:13:53 -07:00
Yann Collet
134388ba6b collect statistics for first block in ultra mode
this patch makes btultra do 2 passes on the first block,
the first one being dedicated to collecting statistics
so that the 2nd pass is more accurate.

It translates into a very small compression ratio gain :

enwik7, level 20:
blocks  4K : 2.142 -> 2.153
blocks 16K : 2.447 -> 2.457
blocks 64K : 2.716 -> 2.726

On the other hand, the cpu cost is doubled.

The trade off looks bad.
Though, that's ultimately a price to pay to reach better compression ratio.
So it's only enabled when setting btultra.
2018-05-17 12:24:30 -07:00
Yann Collet
a243020d37 slightly improved weight calculation
translating into a tiny compression ratio improvement
2018-05-17 11:19:44 -07:00
Yann Collet
18fc3d3cd5 introduced bit-fractional cost evaluation
this improves compression ratio by a *tiny* amount.
It also reduces speed by a small amount.

Consequently, bit-fractional evaluation is only turned on for btultra.
2018-05-16 14:53:35 -07:00
Nick Terrell
30d9c84b1a Fix failing Travis tests 2018-05-15 09:46:20 -07:00
Yann Collet
2c26df0e13 opt: removed static prices
after testing, it's actually always better to use dynamic prices
albeit initialised from dictionary.
2018-05-14 18:04:08 -07:00
Yann Collet
761758982e replaced FSE_count by FSE_count_simple
to reduce usage of stack memory.

Also : tweaked a few comments, as suggested by @terrelln
2018-05-11 16:03:37 -07:00
Yann Collet
09d0fa29ee minor adjusting of weights 2018-05-10 18:13:48 -07:00
Yann Collet
1a26ec6e8d opt: init statistics from dictionary
instead of starting from fake "default" statistics.
2018-05-10 17:59:12 -07:00
Yann Collet
74b1c75d64 btopt : minor adjustment of update frequencies 2018-05-10 16:32:36 -07:00
Yann Collet
ac6105463a opt: minor improvements to log traces
slight improvement when using fractional-bit evaluation (opt:dictionay)
2018-05-09 15:46:11 -07:00
Yann Collet
c39061cb7b fixed declaration-after-statement warning 2018-05-09 12:07:25 -07:00
Yann Collet
c0da0f5e9e switchable bit-approximation / fractional-bit accuracy modes
also : makes it possible to select nb of fractional bits.
2018-05-09 10:48:09 -07:00
Yann Collet
ba2ad9b6b9 implemented fractional bit cost evaluation
for FSE symbols.

While it seems to work, the gains are negligible compared to rough maxNbBits evaluation.
There are even a few losses sometimes, that still need to be explained.
Furthermode, there are still cases where btlazy2 does a better job than btopt,
which seems rather strange too.
2018-05-08 17:43:13 -07:00
Yann Collet
1aff63b114 opt: shift all costs by 8 bits (* 256)
making it possible to represent fractional bit costs.
2018-05-08 16:19:04 -07:00
Yann Collet
6a3c34aa58 opt: estimate cost of both Hufman and FSE symbols
For FSE symbols : provide an upper bound,
in nb of bits,
since cost function is not able to store fractional bit costs.
2018-05-08 16:11:21 -07:00
Yann Collet
338f738c24 pass entropy tables to optimal parser
for proper estimation of symbol's weights
when using dictionary compression.

Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
2018-05-08 15:37:06 -07:00
Yann Collet
a155061328 minor code refactor for readability
removed some useless operations from optimal parser
(should not change performance, too small a difference)
2018-05-08 12:32:44 -07:00
Nick Terrell
7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
Nick Terrell
ab3346af07 Fix hashLog3 size when copying cdict tables 2018-01-31 11:12:17 -08:00
Nick Terrell
887cd4e35e Split ZSTD_CCtx into smaller sub-structures 2018-01-16 11:17:50 -08:00
Yann Collet
00db4dbbb3 fixed minor argument property for Visual 2017-12-30 15:42:28 +01:00
Yann Collet
5235d8d6ba first implementation of delayed update for btlazy2
This is a pretty nice speed win.

The new strategy consists in stacking new candidates as if it was a hash chain.
Then, only if there is a need to actually consult the chain, they are batch-updated,
before starting the match search itself.
This is supposed to be beneficial when skipping positions,
which happens a lot when using lazy strategy.

The baseline performance for btlazy2 on my laptop is :
15#calgary.tar       :   3265536 ->    955985 (3.416),  7.06 MB/s , 618.0 MB/s
15#enwik7            :  10000000 ->   3067341 (3.260),  4.65 MB/s , 521.2 MB/s
15#silesia.tar       : 211984896 ->  58095131 (3.649),  6.20 MB/s , 682.4 MB/s
(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)

After this patch, and keeping all parameters identical,
speed is increased by a pretty good margin (+30-50%),
but compression ratio suffers a bit :
15#calgary.tar       :   3265536 ->    958060 (3.408),  9.12 MB/s , 621.1 MB/s
15#enwik7            :  10000000 ->   3078318 (3.249),  6.37 MB/s , 525.1 MB/s
15#silesia.tar       : 211984896 ->  58444111 (3.627),  9.89 MB/s , 680.4 MB/s

That's because I kept `1<<searchLog` as a maximum number of candidates to update.
But for a hash chain, this represents the total number of candidates in the chain,
while for the binary, it represents the maximum depth of searches.
Keep in mind that a lot of candidates won't even be visited in the btree,
since they are filtered out by the binary sort.

As a consequence, in the new implementation,
the effective depth of the binary tree is substantially shorter.

To compensate, it's enough to increase `searchLog` value.
Here is the result after adding just +1 to searchLog (level 15 setting in this patch):
15#calgary.tar       :   3265536 ->    956311 (3.415),  8.32 MB/s , 611.4 MB/s
15#enwik7            :  10000000 ->   3067655 (3.260),  5.43 MB/s , 535.5 MB/s
15#silesia.tar       : 211984896 ->  58113144 (3.648),  8.35 MB/s , 679.3 MB/s

aka, almost the same compression ratio as before,
but with a noticeable speed increase (+20-30%).

This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
2017-12-28 16:58:57 +01:00