Commit Graph

1668 Commits

Author SHA1 Message Date
senhuang42
1d221ecc03 Add support for representing last literals in the extracted seqs 2020-10-27 11:19:48 -04:00
senhuang42
9171f920cd Improve documentation of seqStore_t 2020-10-27 10:50:22 -04:00
senhuang42
96b0ff7886 Improve documentation regarding various operations in copyBlockSequences 2020-10-27 10:36:06 -04:00
senhuang42
3a11c7eb03 Modify ZSTD_copyBlockSequences to agree with new API 2020-10-27 10:31:40 -04:00
Yann Collet
a0ec50c2dc Merge pull request #2355 from senhuang42/change_ldm_mt_config
Reduce --long mode MT jobsize at higher levels
2020-10-16 13:35:50 -07:00
senhuang42
f49926edf4 Change cycleLog adjustment to +3 from +4 2020-10-15 09:56:05 -04:00
senhuang42
d0550bb18f Clarify argument names, fix DEBUGLOG() statements 2020-10-14 15:45:43 -04:00
senhuang42
3f99c9b38d Adjust match backwards count args 2020-10-14 15:23:03 -04:00
senhuang42
bf0d559449 Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments() 2020-10-14 12:58:06 -04:00
senhuang42
467e4383b0 Merge branch 'dev' of github.com:senhuang42/zstd into change_ldm_mt_config 2020-10-14 10:17:50 -04:00
Yann Collet
f5d5cd3b40 Merge pull request #2341 from senhuang42/ldm_optimized_for_opt_parser
Integrate long distance matches into optimal parser
2020-10-13 13:09:07 -07:00
Nick Terrell
7e6f91ed84 [minor] Improve docs and add an assert in response to review 2020-10-12 16:43:17 -07:00
senhuang42
354b5f1c0a Use cycleLog instead of chainLog to determine LDM jobLog 2020-10-12 16:09:59 -04:00
Nick Terrell
d5c688e8ae Fix ZSTD_adjustCParams_internal() to handle dictionary logic
Pass in the `ZSTD_cParamMode_e` to select how we define our cparams.
Based on the mode we either take the `dictSize` into account or we set
it to `0`. See the documentation for `ZSTD_cParamMode_e`.

Some of the modes currently share the same behavior, but they remain
distinct modes because they cover drastically different cases, e.g.
compression that reprocesses the dictionary versus creating a cdict.

Additionally, when downsizing the hashLog and chainLog, take the
(adjusted) dictionary size into account, since the size of the
dictionary gets added onto the window size.

Adds a simple test to ensure that we aren't downsizing too far.
2020-10-12 12:50:04 -07:00
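
The parameter adjustment described in d5c688e8ae can be sketched roughly as follows. This is a simplified model, not the zstd implementation: the mode names, the choice of which mode ignores the dictionary, the helper functions, and the `+ 1` margin on the cap are all assumptions made for illustration. It only shows the two ideas from the message: the mode decides whether `dictSize` counts, and the (adjusted) dictionary size is added to the window size before the hashLog is capped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the modes in the commit message; not the zstd enum. */
typedef enum {
    mode_reprocess_dict, /* dictionary content is re-inserted during compression */
    mode_attach_dict,    /* ASSUMPTION: this mode treats dictSize as 0 */
    mode_create_cdict    /* parameters are being chosen to build a cdict */
} cparam_mode;

/* Either take dictSize into account or set it to 0, depending on the mode. */
static size_t adjusted_dict_size(cparam_mode mode, size_t dictSize)
{
    return (mode == mode_attach_dict) ? 0 : dictSize;
}

/* Smallest log such that (1 << log) >= v. */
static unsigned ceil_log2(unsigned long long v)
{
    unsigned log = 0;
    while ((1ULL << log) < v) log++;
    return log;
}

/* Cap hashLog using window size plus the adjusted dictionary size.
 * ASSUMPTION: the cap is ceil_log2(window + dict) + 1. */
static unsigned downsize_hash_log(unsigned hashLog, unsigned windowLog,
                                  cparam_mode mode, size_t dictSize)
{
    unsigned long long total;
    unsigned cap;
    assert(windowLog <= 31);
    assert(dictSize <= ((size_t)1 << 31));
    total = (1ULL << windowLog) + adjusted_dict_size(mode, dictSize);
    cap = ceil_log2(total) + 1;
    return hashLog > cap ? cap : hashLog;
}
```

In this model, a 3000-byte dictionary with `windowLog = 10` raises the cap from 11 to 13, so a table sized for the window alone would have been downsized further than the dictionary content warrants.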
Nick Terrell
fadaab8c7c [minor improvement] Pass 0 as the content size in the DDS
The DDS structure can't be copied into the working tables like the DMS.
So it doesn't need to account for the source size when sizing its
parameters, just the dictionary size.
2020-10-12 12:47:21 -07:00
Nick Terrell
48ef15fb47 [minor improvement] Pass dictSize when selecting parameters
When selecting parameters in streaming compression with a dictionary use
the dictionary size to select the parameters.
2020-10-12 12:47:19 -07:00
Nick Terrell
012818df99 [refactor] Remove ZSTD_resetCStream_internal()
This function is only called in one place. It isn't a logical separation
of duties, and it was only obfuscating the code now, so inline it.
2020-10-12 12:46:10 -07:00
Nick Terrell
7083f79008 [bug] Fix dictContentType when reprocessing cdict
Conditions to trigger:
* CDict is loaded as raw content.
* CDict starts with the zstd dictionary magic number.
* The CDict is reprocessed (not attached or copied).
* The new API is used (streaming or `ZSTD_compress2()`).

Bug: The dictionary is loaded as a zstd dictionary, not a raw content
dictionary, because the dict content type is set to `ZSTD_dct_auto`.

Fix: Pass in the dictionary content type from cdict creation to the call
to `ZSTD_compress_insertDictionary()`.

Test: Added a test case that exposes the bug, and fixed the raw
content tests so they no longer modify the `dictBuffer`. Modifying it
turned every later test that uses the `dictBuffer` into a raw content
test, which doesn't seem intentional.
2020-10-12 12:46:10 -07:00
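
The bug in 7083f79008 can be illustrated with a small self-contained model. The enum and function names below are hypothetical and only mirror the idea of `ZSTD_dct_auto` versus a raw content type; the real loading path in zstd does considerably more (validation, entropy tables). The magic value `0xEC30A437` is the zstd dictionary magic number, stored little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DICT_MAGIC 0xEC30A437u /* zstd dictionary magic number */

/* Hypothetical model types; not the zstd API. */
typedef enum { dct_auto, dct_raw_content, dct_full_dict } dict_content_type;
typedef enum { loaded_as_raw, loaded_as_zstd_dict } load_result;

/* Read a 32-bit little-endian value byte by byte (portable across hosts). */
static uint32_t read_le32(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decide how a dictionary buffer is interpreted.
 * Simplification: dct_full_dict skips the validation a real loader would do. */
static load_result load_dictionary(const unsigned char* dict, size_t size,
                                   dict_content_type type)
{
    assert(dict != NULL || size == 0);
    if (type == dct_raw_content) return loaded_as_raw;
    if (type == dct_full_dict) return loaded_as_zstd_dict;
    /* dct_auto: sniff the magic number */
    if (size >= 4 && read_le32(dict) == DICT_MAGIC) return loaded_as_zstd_dict;
    return loaded_as_raw;
}
```

With a buffer that starts with the magic bytes `37 A4 30 EC`, passing `dct_auto` yields `loaded_as_zstd_dict`, while passing `dct_raw_content` yields `loaded_as_raw`. Dropping the content type chosen at cdict creation and falling back to auto-detection therefore changes how such a dictionary is interpreted, which is the behavior the fix removes.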
senhuang42
d6911b86be Require LDM matches to be strictly greater in length 2020-10-09 12:56:18 -04:00
Yann Collet
12541931fa Merge pull request #2328 from marxin/zstd-pool-api
Allow external creation of POOLs that can be shared.
2020-10-09 01:00:50 -07:00
Yann Collet
6fdb0cb8d9 Merge pull request #2303 from senhuang42/let_cdict_take_clevel_priority
For ZSTD_compressStream2(), let cdict take compression level priority
2020-10-09 00:48:30 -07:00
senhuang42
b9c8033cde Define kNullRawSeqStore for every file 2020-10-07 19:02:41 -04:00
senhuang42
a6165c1b28 Change matchState_t::ldmSeqStore to pointer 2020-10-07 14:13:57 -04:00
senhuang42
abce708a56 Move posInSequence correction to correct location 2020-10-07 13:56:25 -04:00
senhuang42
0c515590d8 Replace offCode of largest match if ldm's offCode is superior 2020-10-07 13:56:25 -04:00
senhuang42
0fac8e07e1 Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes 2020-10-07 13:56:25 -04:00
senhuang42
a5500cf2af Refactor separate ldm variables all into one struct 2020-10-07 13:56:25 -04:00
senhuang42
0731b94e7c Use kNullRawSeqStore constant in zstdmt_compress.c 2020-10-07 13:56:25 -04:00
senhuang42
0325d878f2 Remove bubbling down matches with longer offCode and same matchLen 2020-10-07 13:56:25 -04:00
senhuang42
031b7ec15f Disable LDM minMatch adjustment when using opt parser 2020-10-07 13:56:25 -04:00
senhuang42
ddf8a3f1b9 Enable inclusion of mid-flight LDMs in opt parser 2020-10-07 13:56:25 -04:00
senhuang42
88f72ed942 Correct incorrect offcode calculation 2020-10-07 13:56:25 -04:00
senhuang42
d8b43a4202 Add explicit conversion of size_t to U32 2020-10-07 13:56:25 -04:00
senhuang42
b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
senhuang42
c87d2e5866 Prefix new static ldm helpers with ZSTD_opt 2020-10-07 13:56:25 -04:00
senhuang42
429dec4f42 Add DEBUGLOG() calls in ldm helpers 2020-10-07 13:56:25 -04:00
senhuang42
10647924f1 Make function descriptions more accurate 2020-10-07 13:56:25 -04:00
senhuang42
1a687b3fcb Improve documentation of relevant structs 2020-10-07 13:56:25 -04:00
senhuang42
37617e23d7 Correct matchLength calculation and remove unnecessary functions 2020-10-07 13:56:25 -04:00
senhuang42
7dee62c287 Reset ldmSeqStore after initStats_ultra() pass for btultra2 2020-10-07 13:56:25 -04:00
senhuang42
0718aa70df Refactor existing functions to use posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
7348b40a87 Adjustments to ldm_calculateMatchRange() to calculate bounds correctly 2020-10-07 13:56:25 -04:00
senhuang42
a1ef2db5b2 Add ldm_calculateMatchRange() function 2020-10-07 13:56:25 -04:00
senhuang42
ef823e0299 Remove rawSeqStore.base and add rawSeqStore.posInSequence 2020-10-07 13:56:25 -04:00
senhuang42
4793ae3b84 Prevent duplicate LDMs from being inserted 2020-10-07 13:56:25 -04:00
senhuang42
65f9cfeeec Add extra bounds check to prevent heap access after free ASAN error 2020-10-07 13:56:25 -04:00
senhuang42
bff5785fd5 Address mixed variables C90 warning 2020-10-07 13:56:25 -04:00
senhuang42
724b94ed18 ldm_getNextMatch fixed return values 2020-10-07 13:56:25 -04:00
senhuang42
ea92fb3a68 Cleanups, add comments and explanations 2020-10-07 13:56:25 -04:00
senhuang42
78da2e1808 Fixed sifting algorithm 2020-10-07 13:56:25 -04:00