AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
W. Felix Handte	3b82a23a35	Second Repcode Check	2018-06-13 14:58:36 -04:00
W. Felix Handte	a2a24bebec	First Repcode Check	2018-06-13 14:58:36 -04:00
W. Felix Handte	f74c2cd673	Disallow Too-Long Repcodes When Using an Attached Dict	2018-06-13 14:58:36 -04:00
W. Felix Handte	c14db94450	Rename `base` -> `prefixLowest`	2018-06-13 14:58:36 -04:00
W. Felix Handte	5d90708a0a	Go Back to Separate Intermediate Functions for Different Dict Modes	2018-06-13 14:58:36 -04:00
W. Felix Handte	f84fc63a43	Further Templatize Intermediate Functions on dictMode	2018-06-13 14:58:36 -04:00
W. Felix Handte	529d3a5acd	Convert Existing U32 extDict Vars to ZSTD_dictMode Enums	2018-06-13 14:58:36 -04:00
W. Felix Handte	33e2240fac	Attach Dict When Using ZSTD_lazy Strategies	2018-06-13 14:58:36 -04:00
W. Felix Handte	90cfc799e5	Add _dictMatchState Stubs for ZSTD_lazy Functions	2018-06-13 14:58:36 -04:00
W. Felix Handte	a85ecb32bd	Add dictMode Param to ZSTD_compressBlock_lazy_generic	2018-06-13 14:58:36 -04:00
Yann Collet	750ee87a92	Merge pull request #1175 from ryandesign/macos Fix name of macOS	2018-06-13 11:32:06 -04:00
Yann Collet	b2632bcf6c	Merge pull request #1174 from duc0/document_default_level Expose ZSTD_CLEVEL_DEFAULT and update documentation	2018-06-12 12:09:01 -07:00
Duc Ngo	869e2718f6	Line break	2018-06-11 10:02:15 -07:00
Duc Ngo	e8ef725e13	Address comments	2018-06-11 10:01:35 -07:00
Ryan Schmidt	b567ce9d68	Fix name of macOS	2018-06-09 14:31:17 -05:00
Duc Ngo	e34c000e44	Expose ZSTD_CLEVEL_DEFAULT and update documentation	2018-06-08 11:33:44 -07:00
Yann Collet	3050733042	Merge branch 'dev' into negLevels	2018-06-07 15:51:35 -07:00
Yann Collet	c2c47e24e0	support targetlen==0 with strategy==ZSTD_fast to mean "normal compression", targetlen >= 1 now means "disable huffman compression of literals"	2018-06-07 15:49:01 -07:00
Yann Collet	a57b4df85f	removed literalCompression directive in this version, literal compression is always disabled for ZSTD_fast strategy. Performance parity between ZSTD_compress_advanced() and ZSTD_compress_generic()	2018-06-07 15:24:12 -07:00
Yann Collet	8537bfd85c	fuzzer: make negative compression level fail result of ZSTD_compress_advanced() is different from ZSTD_compress_generic() when using negative compression levels because the disabling of huffman compression is not passed in parameters.	2018-06-07 15:12:13 -07:00
Yann Collet	8ef75547ef	Merge pull request #1165 from facebook/ctxSizeDown Dynamic context downsize	2018-06-07 14:44:32 -07:00
Yann Collet	e3c42c739b	clean ZSTD_compress() initialization The (pretty old) code inside ZSTD_compress() was making some pretty bold assumptions on what's inside a CCtx and how to init it. This is pretty fragile by design. CCtx content evolve. Knowledge of how to handle that should be concentrate in one place. A side effect of this strategy is that ZSTD_compress() wouldn't check for BMI2 capability, and is therefore missing out some potential speed opportunity. This patch makes ZSTD_compress() use the same initialization and release functions as the normal creator / destructor ones. Measured on my laptop, with a custom version of bench manually modified to use ZSTD_compress() (instead of the advanced API) : This patch : 1#silesia.tar : 211984896 -> 73651053 (2.878), 312.2 MB/s , 723.8 MB/s 2#silesia.tar : 211984896 -> 70163650 (3.021), 226.2 MB/s , 649.8 MB/s 3#silesia.tar : 211984896 -> 66996749 (3.164), 169.4 MB/s , 636.7 MB/s 4#silesia.tar : 211984896 -> 65998319 (3.212), 136.7 MB/s , 619.2 MB/s dev branch : 1#silesia.tar : 211984896 -> 73651053 (2.878), 291.7 MB/s , 727.5 MB/s 2#silesia.tar : 211984896 -> 70163650 (3.021), 216.2 MB/s , 655.7 MB/s 3#silesia.tar : 211984896 -> 66996749 (3.164), 162.2 MB/s , 633.1 MB/s 4#silesia.tar : 211984896 -> 65998319 (3.212), 130.6 MB/s , 618.6 MB/s	2018-06-07 14:05:25 -07:00
Yann Collet	b27c7389e3	Merge pull request #1164 from GeorgeLu97/CustomMacros Partial Compilation Macros	2018-06-06 16:47:42 -07:00
Yann Collet	24319975b6	bumped version number to v1.3.5	2018-06-06 15:51:55 -07:00
Yann Collet	f1ea383f45	context can be sized down even with constant parameters when parameters are "equivalent", the context is re-used in continue mode, hence needed workspace size is not recalculated. This incidentally also evades the size-down check and action. This patch intercepts the "continue mode" so that the size-down check and action is actually triggered.	2018-06-06 15:04:12 -07:00
Yann Collet	e5e17d009f	changed member name to workSpaceOversizedDuration	2018-06-06 15:00:27 -07:00
Yann Collet	f7392f3dc9	added test case	2018-06-05 14:53:28 -07:00
George Lu	11d5bfdaa9	Revert "Partial compilation test?" This reverts commit `b2496ab606`.	2018-06-05 13:55:36 -07:00
George Lu	b2496ab606	Partial compilation test?	2018-06-05 13:24:00 -07:00
Yann Collet	3d523c741b	added workSpaceTooLarge and workSpaceWasteful also : slightly increased speed of test fuzzer.16	2018-06-05 11:42:48 -07:00
George Lu	b3ef314830	Fix Typos	2018-06-04 17:19:06 -07:00
Yann Collet	357c648c3f	changed a few variable names to unify naming convention	2018-06-04 17:10:50 -07:00
George Lu	609d72b0ca	Added Deprecated Dependencies	2018-06-04 14:33:21 -07:00
George Lu	9437021d2f	Remove old file declaration	2018-06-04 13:32:41 -07:00
George Lu	6a617d70ed	Documentation	2018-06-04 09:56:37 -07:00
George Lu	65de25a463	Created Macros	2018-06-04 09:56:29 -07:00
Yann Collet	2108decb41	Fixed a nasty corruption bug recently introduce into the new dictionary mode. The bug could be reproduced with this command : ./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639 error was in function ZSTD_count_2segments() : the beginning of the 2nd segment corresponds to prefixStart and not the beginning of the current block (istart == src). This would result in comparing the wrong byte.	2018-06-01 18:54:34 -07:00
Yann Collet	143fc9ff6c	Merge pull request #1157 from facebook/decompressedSize minor : improved zstd.h API code comment	2018-06-01 10:28:17 -07:00
Yann Collet	7c33b48221	Merge pull request #1151 from felixhandte/zstd-dfast-in-place-dict-goto ZSTD_dfast: Support Searching the Dictionary Context In-Place (Alternate `goto` Implementation)	2018-05-31 17:37:09 -07:00
W. Felix Handte	48deab92de	Allow Different Dict Attachment Cut-Offs for Different Strategies	2018-05-31 17:37:44 -04:00
W. Felix Handte	f86796639e	Remove Incorrect and Extraneous Repcode Bounds Check	2018-05-31 17:02:29 -04:00
Yann Collet	9b979d0e33	minor : improved API code comment Extend guarantee that ZSTD_getFrameContentSize() will delivering the decompressed size to any single-pass compression function. Answer #1156	2018-05-31 11:12:18 -07:00
Yann Collet	809f2f9322	minor update of literal cost function just assert() there is no negative cost evaluation for literals	2018-05-29 15:34:50 -07:00
Yann Collet	463a0fe38b	simplified optimal parser removed "cached" structure. prices are now saved in the optimal table. Primarily done for simplification. Might improve speed by a little. But actually, and surprisingly, also improves ratio in some circumstances.	2018-05-29 14:07:25 -07:00
Yann Collet	bb6eaf6495	Merge pull request #1153 from facebook/dynThreshold changed dynamic fse threshold for offset	2018-05-26 08:43:45 -07:00
Yann Collet	e916c365a1	fixed minor visual warning	2018-05-25 20:43:09 -07:00
Yann Collet	a7fdceeccd	changed dynamic fse threshold for offset recent experienced showed that default distribution table for offset can get it wrong pretty quickly with the nb of symbols, while it remains a reasonable choice much longer for lengths symbols. Changed the formula, so that dynamic threshold is now 32 symbols for offsets. It remains at 64 symbols for lengths. Detection based on defaultNormLog	2018-05-25 17:41:16 -07:00
Yann Collet	4b3a36d5d8	Merge branch 'dev' into lowCompression	2018-05-25 15:45:03 -07:00
Yann Collet	5f177f1c53	btultra accepts blocks with poorer compression ratio zstd rejects blocks which do not compress by at least a certain amount. In which case, such block is simply emitted uncompressed (even if a little bit of compression could be achieved). This is better for decompression speed, hence for energy. The logic is controlled by ZSTD_minGain(). The rule is applied uniformly, at all compression levels. This change makes btultra accepts blocks with poor compression ratios. We presume that users of btultra mode prefers compression ratio over some decompress speed gains. The threshold for minimum gain is lowered for btultra from s>>6 (~1.5% minimum gain) to s>>7 (~0.8% minimum gain). This is a prudent change. Not sure if it's large enough.	2018-05-25 15:19:52 -07:00
Yann Collet	e2c0e3d437	slightly nudge choices towards less sequences also slightly improve some strange detrimental corner cases.	2018-05-25 14:52:21 -07:00
W. Felix Handte	5b292b5685	Check Long + 1 Matches in Both Prefix and Dict in Bothe Short Match Paths	2018-05-25 13:13:57 -04:00
W. Felix Handte	88b733b380	Interleave Prefix and Dict Searches	2018-05-25 13:13:57 -04:00
W. Felix Handte	1850025156	Refactor ZSTD_dfast to Use `goto`s	2018-05-25 13:13:57 -04:00
W. Felix Handte	43606f9c83	... When I Said "HashTable", I Meant "ChainTable"	2018-05-25 13:13:28 -04:00
W. Felix Handte	ec7efe88f5	Fix Off-By-One Error	2018-05-25 13:13:28 -04:00
W. Felix Handte	2bfe43267e	Disallow Too-Long Repcodes When Using an Attached Dict	2018-05-25 13:13:28 -04:00
W. Felix Handte	b97ad3f457	Port Changes Made to ZSTD_fast to ZSTD_dfast	2018-05-25 13:13:28 -04:00
W. Felix Handte	2313cca1b7	Implement Second Repcode Check	2018-05-25 13:13:28 -04:00
W. Felix Handte	0998f10813	Implement First Repcode Check	2018-05-25 13:13:28 -04:00
W. Felix Handte	50c5b2bb90	Find Dict Hash Table Matches	2018-05-25 13:13:28 -04:00
W. Felix Handte	7a25f7ef5b	Existing Repcode Check Only Applies to noDict Case	2018-05-25 13:13:28 -04:00
W. Felix Handte	8b241da4df	Properly Initialize Repcode Values	2018-05-25 13:13:28 -04:00
W. Felix Handte	7097a03749	Add Necessary Dict Variables	2018-05-25 13:13:28 -04:00
W. Felix Handte	aacbbf4f9a	Rename 'lowest' to 'localLowest' to Prepare to Introduce Dict Indices	2018-05-25 13:13:28 -04:00
W. Felix Handte	c10d1b4011	Skeleton for In-Place Impl for ZSTD_dfast	2018-05-25 13:13:28 -04:00
Yann Collet	f6ad59ab5c	Merge branch 'dev' into staticDictCost	2018-05-24 16:21:02 -07:00
Yann Collet	b5ef32fea7	Merge branch 'dev' into fracFse	2018-05-24 14:09:49 -07:00
Yann Collet	776128d16f	fix corner case when requiring cost of an FSE symbol ensure that, when frequency[symbol]==0, result is (tableLog + 1) bits with both upper-bit and fractional-bit estimates. Also : enable BIT_DEBUG in /tests	2018-05-24 13:59:11 -07:00
Yann Collet	08c5be5db3	Merge pull request #1117 from felixhandte/zstd-fast-in-place-dict ZSTD_fast: Support Searching the Dictionary Context In-Place	2018-05-23 19:32:25 -07:00
Nick Terrell	06b70179da	Work around bug in zstd decoder (#1147 ) Work around bug in zstd decoder Pull request #1144 exercised a new path in the zstd decoder that proved to be buggy. Avoid the extremely rare bug by emitting an uncompressed block.	2018-05-23 18:02:30 -07:00
Nick Terrell	f2d0924b87	Variable declarations	2018-05-23 14:58:58 -07:00
W. Felix Handte	d9c7e67125	Assert that Dict and Current Window are Adjacent in Index Space	2018-05-23 17:53:03 -04:00
W. Felix Handte	298d24fa57	Make loadedDictEnd an Index, not the Dict Len	2018-05-23 17:53:03 -04:00
W. Felix Handte	7ef85e0618	Fixes in re Comments	2018-05-23 17:53:03 -04:00
W. Felix Handte	582b7f85ed	Don't Attach Empty Dict Contents In weird corner cases, they produce unexpected results...	2018-05-23 17:53:03 -04:00
W. Felix Handte	9c92223468	Avoid Undefined Behavior in Match Ptr Calculation	2018-05-23 17:53:03 -04:00
W. Felix Handte	a44ab3b475	Remove Out-of-Date Comment	2018-05-23 17:53:03 -04:00
W. Felix Handte	95bdf20a87	Moar Renames	2018-05-23 17:53:03 -04:00
W. Felix Handte	7e0402e738	Also Attach Dict When Source Size is Unknown	2018-05-23 17:53:03 -04:00
W. Felix Handte	3ba70cc759	Clear the Dictionary When Sliding the Window	2018-05-23 17:53:03 -04:00
W. Felix Handte	b05ae9b608	Refine ip Initialization to Avoid ARM Weirdness	2018-05-23 17:53:03 -04:00
W. Felix Handte	1a7b34ef28	Use New Index Invariant to Simplify Conditionals	2018-05-23 17:53:03 -04:00
W. Felix Handte	2d598e6fed	Force Working Context Indices Greater than Dict Indices	2018-05-23 17:53:03 -04:00
W. Felix Handte	d005e5daf4	Whitespace Fix	2018-05-23 17:53:03 -04:00
W. Felix Handte	154eb09419	Switch to Original Match Calc for noDict Repcode Check	2018-05-23 17:53:03 -04:00
W. Felix Handte	191fc74a51	Rename 'hasDict' to 'dictMode'	2018-05-23 17:53:03 -04:00
W. Felix Handte	ae4fcf7816	Respond to PR Comments; Formatting/Style/Lint Fixes	2018-05-23 17:53:03 -04:00
W. Felix Handte	ca26cecc7a	Rename and Reformat	2018-05-23 17:53:03 -04:00
W. Felix Handte	66bc1ca641	Change Cut-Off to 8 KB	2018-05-23 17:53:03 -04:00
W. Felix Handte	c31ee3c7f8	Fix Rep Code Initialization	2018-05-23 17:53:03 -04:00
W. Felix Handte	b67196f30d	Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff	2018-05-23 17:53:03 -04:00
W. Felix Handte	265c2869d1	Split Wrapper Functions to Cause Inlining	2018-05-23 17:53:03 -04:00
W. Felix Handte	6929964d65	Add bounds check in repcode tests	2018-05-23 17:53:03 -04:00
W. Felix Handte	70a537d1d7	Initial Repcode Check Support for Ext Dict Ctx	2018-05-23 17:53:03 -04:00
W. Felix Handte	8d24ff0353	Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx	2018-05-23 17:53:03 -04:00
W. Felix Handte	d18a405779	Refer to the Dictionary Match State In-Place (Sometimes)	2018-05-23 17:53:03 -04:00
Nick Terrell	c92dd11940	Error if reported size is too large in edge case	2018-05-23 14:47:20 -07:00
Nick Terrell	a97e9a627a	[zstd] Fix decompression edge case This edge case is only possible with the new optimal encoding selector, since before zstd would always choose `set_basic` for small numbers of sequences. Fix `FSE_readNCount()` to support buffers < 4 bytes. Credit to OSS-Fuzz	2018-05-23 12:16:00 -07:00
Nick Terrell	e3959d5eba	Fixes	2018-05-22 16:06:33 -07:00
Yann Collet	7a8b3496b4	Merge branch 'dev' into staticDictCost	2018-05-22 15:10:05 -07:00
Yann Collet	a8ddf1d370	disable 2-passes strategy	2018-05-22 15:06:36 -07:00
Nick Terrell	49cf880513	Approximate FSE encoding costs for selection Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and `set_repeat`, and select the one with the lowest cost. * The cost of `set_basic` is computed using the cross-entropy cost function `ZSTD_crossEntropyCost()`, using the normalized default count and the count. * The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the previous table to see if it is able to represent the distribution. * The cost of `set_compressed` is computed with the entropy cost function `ZSTD_entropyCost()`, together with the cost of writing the normalized count `ZSTD_NCountCost()`.	2018-05-22 14:33:22 -07:00
Yann Collet	27af35c110	Merge pull request #1143 from facebook/tableLevels Update table of compression levels	2018-05-19 14:40:37 -07:00
Yann Collet	5381369cb1	Merge branch 'dev' into tableLevels	2018-05-18 18:23:27 -07:00
Yann Collet	b0b3fb517d	updated compression levels for blocks of 256KB	2018-05-18 17:17:12 -07:00
Nick Terrell	7cbb8bbbbf	[cover] Small compression ratio improvement The cover algorithm selects one segment per epoch, and it selects the epoch size such that `epochs * segmentSize ~= dictSize`. Selecting less epochs gives the algorithm more candidates to choose from for each segment it selects, and then it will loop back to the first epoch when it hits the last one. The trade off is that now it takes longer to select each segment, since it has to look at more data before making a choice. I benchmarked on the following data sets using this command: ```sh $ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR \| wc -c ``` \| Data set \| k (approx) \| Before \| After \| % difference \| \|--------------\|------------\|----------\|----------\|--------------\| \| GitHub \| ~1000 \| 738138 \| 746610 \| +1.14% \| \| hg-changelog \| ~90 \| 4295156 \| 4285336 \| -0.23% \| \| hg-commands \| ~500 \| 1095580 \| 1079814 \| -1.44% \| \| hg-manifest \| ~400 \| 16559892 \| 16504346 \| -0.34% \| There is some noise in the measurements, since small changes to `k` can have large differences, which is why I'm using `steps=256`, to try to minimize the noise. However, the GitHub data set still has some noise. If I run the GitHub data set on my Mac, which presumably lists directory entries in a different order, so the dictionary builder sees the files in a different order, or I use `steps=1024` I see these results. \| Run \| Before \| After \| % difference \| \|------------\|--------\|--------\|--------------\| \| steps=1024 \| 738138 \| 734470 \| -0.50% \| \| MacBook \| 738451 \| 737132 \| -0.18% \| Question: Should we expose this as a parameter? I don't think it is necessary. Someone might want to turn it up to exchange a much longer dictionary building time in exchange for a slightly better dictionary. I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16` with a faster running time.	2018-05-18 16:15:27 -07:00
Yann Collet	5cbef6e094	Merge branch 'dev' into staticDictCost	2018-05-18 16:03:06 -07:00
Yann Collet	a95e9e80d1	adding some debug functions to observe statistics	2018-05-18 14:09:42 -07:00
fbrosson	291824f49d	__builtin_prefetch did probably not exist before gcc 3.1.	2018-05-18 18:40:11 +00:00
fbrosson	16bb8f1f9e	Drop colon in asm snippet to make old versions of gcc happy.	2018-05-18 17:05:36 +00:00
Yann Collet	af3da079d1	fixed minor conversion warning	2018-05-17 17:27:27 -07:00
Yann Collet	8572b4d09f	fixed a pretty complex bug when combining ldm + btultra	2018-05-17 16:13:53 -07:00
Yann Collet	134388ba6b	collect statistics for first block in ultra mode this patch makes btultra do 2 passes on the first block, the first one being dedicated to collecting statistics so that the 2nd pass is more accurate. It translates into a very small compression ratio gain : enwik7, level 20: blocks 4K : 2.142 -> 2.153 blocks 16K : 2.447 -> 2.457 blocks 64K : 2.716 -> 2.726 On the other hand, the cpu cost is doubled. The trade off looks bad. Though, that's ultimately a price to pay to reach better compression ratio. So it's only enabled when setting btultra.	2018-05-17 12:24:30 -07:00
Yann Collet	a243020d37	slightly improved weight calculation translating into a tiny compression ratio improvement	2018-05-17 11:19:44 -07:00
Yann Collet	63eeeaa1dd	update table levels for blocks <= 16K also : allow hlog to be slighly larger than windowlog, as it's apparently good for both speed and compression ratio.	2018-05-16 16:13:37 -07:00
Yann Collet	18fc3d3cd5	introduced bit-fractional cost evaluation this improves compression ratio by a tiny amount. It also reduces speed by a small amount. Consequently, bit-fractional evaluation is only turned on for btultra.	2018-05-16 14:53:35 -07:00
Yann Collet	9938b17d4c	Merge pull request #1135 from facebook/frameCSize decompress: changed error code when input is too large	2018-05-15 11:02:53 -07:00
Nick Terrell	30d9c84b1a	Fix failing Travis tests	2018-05-15 09:46:20 -07:00
Yann Collet	0b31304c8d	Merge branch 'dev' into staticDictCost	2018-05-14 18:09:26 -07:00
Yann Collet	2c26df0e13	opt: removed static prices after testing, it's actually always better to use dynamic prices albeit initialised from dictionary.	2018-05-14 18:04:08 -07:00
Yann Collet	f372ffc64d	Merge pull request #1127 from facebook/staticDictCost Improved optimal parser with dictionary	2018-05-14 17:45:50 -07:00
Yann Collet	d59cf02df0	decompress: changed error code when input is too large ZSTD_decompress() can decompress multiple frames sent as a single input. But the input size must be the exact sum of all compressed frames, no more. In the case of a mistake on srcSize, being larger than required, ZSTD_decompress() will try to decompress a new frame after current one, and fail. As a consequence, it will issue an error code, ERROR(prefix_unknown). While the error is technically correct (the decoder could not recognise the header of _next_ frame), it's confusing, as users will believe that the first header of the first frame is wrong, which is not the case (it's correct). It makes it more difficult to understand that the error is in the source size, which is too large. This patch changes the error code provided in such a scenario. If (at least) a first frame was successfully decoded, and then following bytes are garbage values, the decoder assumes the provided input size is wrong (too large), and issue the error code ERROR(srcSize_wrong).	2018-05-14 15:32:28 -07:00
Yann Collet	c9227ee16b	update table for 128 KB blocks	2018-05-13 17:15:07 -07:00
Yann Collet	b4250489cf	update compression levels for large inputs	2018-05-13 01:53:38 -07:00
Yann Collet	761758982e	replaced FSE_count by FSE_count_simple to reduce usage of stack memory. Also : tweaked a few comments, as suggested by @terrelln	2018-05-11 16:03:37 -07:00
Yann Collet	3193d692c2	minor patch, ensuring LIBDIR is created before installation follow-up from #1123	2018-05-11 11:31:48 -07:00
Yann Collet	99ddca43a6	fixed wrong assertion base can actually overflow	2018-05-10 19:48:09 -07:00
Yann Collet	0d7626672d	fixed c++ conversion warning	2018-05-10 18:17:21 -07:00
Yann Collet	09d0fa29ee	minor adjusting of weights	2018-05-10 18:13:48 -07:00
Yann Collet	1a26ec6e8d	opt: init statistics from dictionary instead of starting from fake "default" statistics.	2018-05-10 17:59:12 -07:00
Yann Collet	74b1c75d64	btopt : minor adjustment of update frequencies	2018-05-10 16:32:36 -07:00
Yann Collet	ac6105463a	opt: minor improvements to log traces slight improvement when using fractional-bit evaluation (opt:dictionay)	2018-05-09 15:46:11 -07:00
Yann Collet	c39061cb7b	fixed declaration-after-statement warning	2018-05-09 12:07:25 -07:00
Yann Collet	4d5bd32a00	added traces to look at symbol costs evaluation looks correct.	2018-05-09 12:00:12 -07:00
Yann Collet	c0da0f5e9e	switchable bit-approximation / fractional-bit accuracy modes also : makes it possible to select nb of fractional bits.	2018-05-09 10:48:09 -07:00
Yann Collet	ba2ad9b6b9	implemented fractional bit cost evaluation for FSE symbols. While it seems to work, the gains are negligible compared to rough maxNbBits evaluation. There are even a few losses sometimes, that still need to be explained. Furthermode, there are still cases where btlazy2 does a better job than btopt, which seems rather strange too.	2018-05-08 17:43:13 -07:00
Yann Collet	1aff63b114	opt: shift all costs by 8 bits (* 256) making it possible to represent fractional bit costs.	2018-05-08 16:19:04 -07:00
Yann Collet	6a3c34aa58	opt: estimate cost of both Hufman and FSE symbols For FSE symbols : provide an upper bound, in nb of bits, since cost function is not able to store fractional bit costs.	2018-05-08 16:11:21 -07:00
Yann Collet	338f738c24	pass entropy tables to optimal parser for proper estimation of symbol's weights when using dictionary compression. Note : using only huffman costs is not good enough, presumably because sequence symbol costs are incorrect.	2018-05-08 15:37:06 -07:00
Yann Collet	a155061328	minor code refactor for readability removed some useless operations from optimal parser (should not change performance, too small a difference)	2018-05-08 12:32:44 -07:00
Baruch Siach	9a0643b633	lib/Makefile: create include directory before headers installation Make sure that $(INCLUDEDIR) exists before copying the headers there. Otherwise, the contest of header files is copied over $(DESTDIR)$(INCLUDEDIR), making it a regular file. While at it, remove $(DESTDIR)$(INCLUDEDIR) from the list of directories to create in the install-pc target. The install-pc target does not need this directory.	2018-05-08 20:59:44 +03:00
Yann Collet	ad4524d605	fix ZSTD_compressBlock() associated with CDict reported by @let-def. It's actually a bug in ZSTD_compressBegin_usingCDict() which would pass a wrong pledgedSrcSize value (0 instead of ZSTD_CONTENTSIZE_UNKNOWN) resulting in wrong window size, resulting in downsized seqStore, resulting in segfault when writing into the seqStore later in the process. Added a test in fuzzer to cover this use case (fails before the patch).	2018-05-07 12:54:13 -07:00
Peter Seiderer	64bfdca5b9	Split library install target into pc, static, shared and include only target Signed-off-by: Peter Seiderer <ps.report@gmx.net>	2018-04-30 20:32:32 +02:00
Nick Terrell	ca77822ddf	Fix parameter adjustment with dictionary The new advanced API basically set `requestedParams = appliedParams` when using a dictionary. This halted all parameter adjustment, which can hurt compression ratio if, for example, the window log is small for the first call, but the rest of the files are large. This patch fixes the bug, and checks that the `requestedParams` don't change in the new advanced API when using a dictionary, and generally in the fuzzer.	2018-04-25 16:32:29 -07:00
Yann Collet	12f60b8c98	clarified documentation related to refPrefix()	2018-04-25 10:17:06 -07:00
Yann Collet	ace856a835	updated documentation of streaming compression api	2018-04-24 14:44:27 -07:00
taigacon	2c3ad05812	Fix the problem that enables DYNAMIC_BMI2 macro by mistake on ARM architecture with Clang (#1110 )	2018-04-23 15:41:50 -07:00
Nick Terrell	e8c9dc5cea	Fix documentation	2018-04-13 12:43:38 -07:00
Nick Terrell	c0987986e5	Only reset CDict in ZSTD_CCtx_resetParameters()	2018-04-13 11:26:40 -07:00
Nick Terrell	9f76eebd17	Add ZSTD_CCtx_resetParameters() function * Fix docs for `ZSTD_CCtx_reset()`. * Add `ZSTD_CCtx_resetParameters()`. Fixes #1094.	2018-04-12 16:54:07 -07:00
Nick Terrell	3c3f59e68f	Enforce pledgeSrcSize whenever known (#1106 ) The test fails before the patch and passes after. Fixes #1095.	2018-04-12 16:02:03 -07:00
Nick Terrell	280a236e9e	Add ZSTD_CCtx(Param)?_getParameter() function Closes #1096.	2018-04-12 11:50:12 -07:00
Yann Collet	04212178b5	doc : clarified advanced API usage sticky parameters only work with `ZSTD_compress_generic()`	2018-04-10 11:40:36 -07:00
Yann Collet	ad5ba6cdcf	updated comment on parameters that can be changed during compression	2018-04-09 17:39:07 -07:00
Yann Collet	1da629f2ad	Merge pull request #1104 from terrelln/fast-train Allow negative compression levels in training	2018-04-09 14:16:20 -07:00
Nick Terrell	569e2abccd	Allow negative compression levels in training * Set `dictCLevel` in `zstdcli.c`. * Only set to default level if the compression level `== 0`, not `<= 0`.	2018-04-09 12:12:03 -07:00
Yann Collet	4195b36dd7	Merge pull request #1100 from bket/stable_sort zstd requires a stable sort.	2018-04-05 11:39:27 -07:00
Yann Collet	f35b8ba9da	updated ZSTD_p_chainLog description	2018-04-05 11:05:11 -07:00
Björn Ketelaars	462aed6811	zstd requires a stable sort. On OpenBSD qsort() is not guaranteed to be stable, their mergesort() is. This fixes issue #1088. All the hard work has been done by @terrelln.	2018-04-05 07:59:16 +02:00
Yann Collet	55f67502f4	Merge pull request #1098 from terrelln/nd-mt Only load extra table positions for CDicts	2018-04-02 15:38:20 -07:00
Nick Terrell	295ab0dbfa	Only load extra table positions for CDicts Zstdmt uses prefixes to load the overlap between segments. Loading extra positions makes compression non-deterministic, depending on the previous job the context was used for. Since loading extra position takes extra time as well, only do it when creating a `ZSTD_CDict`. Fixes #1077.	2018-04-02 14:41:30 -07:00
Yann Collet	5b616fa269	Merge pull request #1090 from bket/openbsd Fix building zstd on OpenBSD.	2018-04-02 14:15:26 -07:00
Björn Ketelaars	9d3048346d	Fix building zstd on OpenBSD.	2018-03-31 10:46:20 +02:00
Yann Collet	8be984ec45	fixed comments as suggested by @terrelln	2018-03-30 20:09:27 -07:00
Yann Collet	e6e848bfe9	added ZSTD_getFrameHeader_advanced() makes it possible to request frame header from a magicless frame	2018-03-29 17:51:08 -06:00
Yann Collet	a6694838e1	added more code documentation for ZSTD_getFrameHeader()	2018-03-29 15:24:17 -06:00
René Rebe	21eb26d664	fixed legacy/zstd_v* with older gcc version, by guarding builtin_* like in other files	2018-03-25 20:35:15 +02:00
Yann Collet	ad15c1b724	added __has_attribute() define for non-clang compilers	2018-03-23 19:04:48 -07:00
Yann Collet	52ca7c6c56	make DYNAMIC_BMI2 support of clang conditional to __has_attribute() to support older clang versions such as 3.4	2018-03-23 18:45:42 -07:00
Yann Collet	29b021f9a0	Merge pull request #1067 from facebook/targetLength removed limit ZSTD_TARGETLENGTH_MAX	2018-03-22 10:38:33 -07:00
Nick Terrell	ad344033df	Fix broken assertion The `avgJobSize` must not be lower than 256 KB for single-pass mode. In `zstd.h` we say the minimum value for `ZSTD_p_jobSize` is 1 MB, so ensure that we always pick a size >= 1 MB. Found by libFuzzer fuzzer tests with large input limits.	2018-03-21 16:20:30 -07:00
Yann Collet	153bc1c004	removed limit ZSTD_TARGETLENGTH_MAX this makes it possible to specify extremely large negative compression levels, achieving the side effect as "no compression". It will also be possible to define larger targetlength for ultra compression mode. There is no adverse side effect due to removing this limit.	2018-03-21 15:50:05 -07:00
Yann Collet	a99c4a3621	Merge branch 'dev' into advancedDecompress	2018-03-21 06:08:28 -07:00
Yann Collet	87b0cf05bd	Merge pull request #1057 from facebook/lrmSettings LRM parameters	2018-03-21 05:59:39 -07:00
Yann Collet	d1bf609abf	Merge pull request #1059 from terrelln/mt-ldm Integrate ldm with zstdmt	2018-03-20 17:50:20 -07:00
Yann Collet	e0cb8d19c6	fixed legacy test case	2018-03-20 17:48:22 -07:00
Yann Collet	878728dc26	fixed several comments by @terrelln	2018-03-20 16:35:14 -07:00
Yann Collet	e1c52faace	Merge pull request #1060 from facebook/compressImpl merge bmi2 implementation of encodeSequence into zstd_compress.c	2018-03-20 16:19:42 -07:00
Yann Collet	6cda8c932c	added test with ZSTD_decompress_generic() + ZSTD_DCtx_refPrefix() also : clarified stage condition to accept new parameters, fixed initializers correspondingly.	2018-03-20 16:16:13 -07:00
Yann Collet	0dadb6b70d	implemented ZSTD_DCtx_refPrefix*()	2018-03-20 15:45:56 -07:00
Yann Collet	569b8ba4d9	implemented ZSTD_DCtx_refDDict()	2018-03-20 15:43:49 -07:00
Nick Terrell	a3b76a77ef	Quiet appveyor warnings	2018-03-20 15:34:40 -07:00
Yann Collet	6873fec658	changed dictMore for dictContentType which seems clearer to describe what the variable/argument is about.	2018-03-20 15:13:14 -07:00
Yann Collet	31b54b6eea	updated ZSTD_initStaticDDict() prototype can also specify dictContentType.	2018-03-20 14:52:02 -07:00
Nick Terrell	136b9e2392	Fix external sequence corner cases * Clear external sequences when we reset the `ZSTD_CCtx`. * Skip external sequences when a block is too small to compress.	2018-03-20 14:50:28 -07:00
Yann Collet	353117c5d7	implemented ZSTD_DCtx_loadDictionary*() this required updating ZSTD_createDDict_advanced() to accept a dictContentType parameter (raw, full, auto).	2018-03-20 13:40:29 -07:00
Yann Collet	451357f37f	Merge pull request #1058 from facebook/cctxParams updated CCtxParams API	2018-03-20 12:36:12 -07:00
Yann Collet	2ed5af0766	merge bmi2 implementation of encodeSequence into zstd_compress.c	2018-03-19 19:10:31 -07:00
Nick Terrell	d19f803a3b	Fix window size for 1 worker + flushing	2018-03-19 18:56:39 -07:00
Nick Terrell	24d9edbdd8	Set ldmParams to 0 when disabled	2018-03-19 18:23:54 -07:00
Nick Terrell	4b92574feb	Fix corner cases exposed by zstreamtest	2018-03-19 17:54:04 -07:00
Nick Terrell	94c77710a9	Integrate ldm with zstdmt Integrate ldm into zstdmt by running it in serial and in order in the first step of each job, in the same place as the hash gets updated. The input buffer is sized to fit the whole LDM window and 2 full buffers of slack. Input buffers cannot be reused until the LDM step is done with them. After the LDM step is finished, the jobs don't actually have access to the full window, only the overlap. Tested on a few different multi-GB files with and without sanitizers, and with different numbers of threads.	2018-03-19 16:29:03 -07:00
Nick Terrell	aa4dbd09a1	Pull job/overlap log logic into common function (#1055 ) Prepares for LDM integration by separating the job size and overlap logic into helper functions.	2018-03-19 15:56:36 -07:00
Yann Collet	c8b3d389fd	updated CCtxParams API to respect naming convention : ZSTD_CCtxParams_*()	2018-03-19 15:07:26 -07:00
Yann Collet	6f4d0778a5	make it possible to express compression parameters in any order	2018-03-19 14:41:23 -07:00
Nick Terrell	2253d01b27	Move XXH64_update() into worker threads * Computes the XXH hash in the worker threads. * Workers get a sequence number and wait until ther number shows up. On error, ensures that its sequence is finished, so future threads don't get blocked. * Sets up for ldm integration, which will go in the same spot.	2018-03-19 11:08:27 -07:00
Yann Collet	9618c0c804	make it possible to specify LDM parameters in any order	2018-03-19 11:07:04 -07:00
Yann Collet	ec0959e701	Merge branch 'dev' into mt-single	2018-03-18 01:06:31 -07:00
Nick Terrell	4af1fafeb8	Restore setting loadedDictEnd Setting `loadedDictEnd` was accidently removed from `ZSTD_loadDictionaryContent()`, which means that dictionary compression will only be able to reference the parts of the dictionary within the window. The spec allows us to reference the entire dictionary so long as even one byte is in the window. `ZSTD_enforceMaxDist()` incorrectly always allowed offsets up to `loadedDictEnd` beyond the window, even once the dictionary was out of range. When overflow protection kicked in, the check `current > loadedDictEnd + maxDist` is incorrect if `loadedDictEnd` isn't reset back to zero. `current` could be reset below the value, which would incorrectly allow references beyond the window. This bug is present in `master`, but is very hard to trigger, since it requires both dictionaries and data which triggers overflow correction.	2018-03-16 14:54:06 -07:00
Yann Collet	cbc71e40f6	moving LRM parameters out of experimental section into "normal" range, start pinned at 160.	2018-03-15 17:22:40 -07:00
Nick Terrell	f15a17e19f	Use a single buffer in zstdmt Summary: Allocate a single input buffer large enough to house each job, as well as enough space for the IO thread to write 2 extra buffers. One goes in the `POOL` queue, and one to fill, and then block on a full `POOL` queue. Since we can't overlap with the prefix, we allocate space for 3 extra input buffers. Test Plan: * CI * With and without ASAN/UBSAN run zstdmt with different number of threads on two large binaries, and verify that their checksums match. * Test on the tip of the zstdmt ldm integration. Reviewers: cyan Differential Revision: https://phabricator.intern.facebook.com/D7284007 Tasks: T25664120	2018-03-15 16:21:33 -07:00
Yann Collet	192542b63c	Merge pull request #1047 from facebook/hufCompress removed huf_compress_impl.h	2018-03-15 14:14:03 -07:00
Nick Terrell	a271399c97	Expose reference external sequence API Summary: * Expose the reference external sequences API for zstdmt. Allows external sequences of any length, which get split when necessary. * Reset the LDM window when the context is reset. * Store the maximum number of LDM sequences. * Sequence generation now returns the number of last literals. * Fix sequence generation to not throw out the last literals when blocks of more than 1 MB are encountered. Expose reference external sequence API * Expose the reference external sequences API for zstdmt. * Allows external sequences of any length, which get split when necessary. * Reset the LDM window when the context is reset. * Store the maximum number of LDM sequences. * Sequence generation now returns the number of last literals. * Fix sequence generation to not throw out the last literals when blocks of more than 1 MB are encountered. Test Plan: * CI * Test the zstdmt ldm integration stacked on top of this diff Reviewers: cyan Differential Revision: https://phabricator.intern.facebook.com/D7283968 Tasks: T25664120	2018-03-14 18:07:53 -07:00
Nick Terrell	1908c92c46	Merge remote-tracking branch 'upstream/dev' into extern-seq * upstream/dev: Fix overflow protection with wlog=31	2018-03-14 17:26:31 -07:00
Yann Collet	a909c293c6	Merge branch 'dev' into hufCompress	2018-03-14 16:11:25 -07:00
Nick Terrell	a9a6dcba63	Expose reference external sequence API * Expose the reference external sequences API for zstdmt. Allows external sequences of any length, which get split when necessary. * Reset the LDM window when the context is reset. * Store the maximum number of LDM sequences. * Sequence generation now returns the number of last literals. * Fix sequence generation to not throw out the last literals when blocks of more than 1 MB are encountered.	2018-03-14 12:29:31 -07:00
Nick Terrell	33fb966e56	Fix overflow protection with wlog=31 The overflow protection is broken when the window log is `> (3U << 29)`, so 31. It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`, and the the assertion `current > newCurrent` fails. This happens when the same context is used many times over, but with a large window log, like in zstdmt. Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`. The added test fails before the patch, and passes after.	2018-03-14 11:45:44 -07:00
Yann Collet	4c5cbac179	Merge pull request #1041 from facebook/fasterFast Negative compression levels	2018-03-13 21:32:46 -07:00
Yann Collet	50f763ec44	fixed several comments are underlined by @terrelln	2018-03-13 14:23:14 -07:00
Yann Collet	a95a88af57	removed huf_compress_impl.h re-imported all functions inside huf_compress.c for easier source editing. Also updated a bunch of code comments for clarification.	2018-03-13 14:14:05 -07:00
Yann Collet	bd7bb94361	Merge pull request #1044 from baldurk/remove-utf8-characters Remove non-ASCII characters in header file comments	2018-03-13 13:22:07 -07:00
Baldur Karlsson	430a2fec19	Remove non-ASCII characters in header file comments * Replaced a non-breaking space and an en dash with a plain space and a hyphen. * This means the files are simple ASCII and less likely to run into codepage issues.	2018-03-13 20:05:53 +00:00
Yann Collet	530eeb41a7	Merge pull request #1039 from facebook/zstd_decompress Removed zstd_decompress_impl.h	2018-03-12 18:21:46 -07:00
Yann Collet	2291b85a1e	changed ZSTD_p_literalCompression into ZSTD_p_compressLiterals prefer verb+object construction	2018-03-12 11:44:10 -07:00
Yann Collet	a57d43d4d4	updated documentation of targetLength	2018-03-12 11:35:01 -07:00
Yann Collet	6a9b41b731	create command --fast[=#] access negative compression levels from command line for both compression and benchmark modes. also : ensure proper propagation of parameters through ZSTD_compress_generic() interface. added relevant cli tests.	2018-03-11 20:01:23 -07:00
Yann Collet	a146ee04ae	added negative compression levels negative compression level trade compression ratio for more compression speed. They turn off huffman compression of literals, and use row 0 as baseline with a stepSize = -cLevel. added associated test in fuzzer also added : new advanced parameter ZSTD_p_literalCompression	2018-03-11 05:21:53 -07:00
Yann Collet	facc09aa03	minor compression level adaptation level 12 compresses slightly more and faster due to better btlazy2 mode	2018-03-11 03:06:52 -07:00
Yann Collet	fe321f9e2a	re-integrate ZSTD_decompressSequencesLong() into zstd_decompress.c removed zstd_decompress_impl.h	2018-03-09 19:48:06 -08:00
Yann Collet	89a2ebb971	incorporated ZSTD_decompressSequences() into zstd_decompress()	2018-03-09 19:35:57 -08:00
Yann Collet	cdb1f1433e	incorporated ZSTD_initFseState() inside zstd_decompress.c	2018-03-09 18:16:10 -08:00
Yann Collet	a166eae1ba	incorporate ZSTD_decodeSequenceLong() within zstd_decompress.c	2018-03-09 18:11:14 -08:00
Yann Collet	17626ba56e	restored ZSTD_decodeSequence() into zstd_decompress.c	2018-03-09 18:03:25 -08:00
Yann Collet	51169575a8	Merge pull request #1036 from terrelln/thread-void [threading] Cast unused arguments to void	2018-03-07 12:14:05 -08:00
Nick Terrell	7e103cdaf5	[threading] Cast unused arguments to void	2018-03-06 18:36:40 -08:00
Yann Collet	db147ea620	improved comments following @terrelln suggestions	2018-03-06 18:15:26 -08:00
Yann Collet	06ca9c7d7c	fixed 0-seq blocks in block-decompression mode	2018-03-06 01:50:19 -08:00
Yann Collet	9a91afe6ef	long offset mode : new default threshold for 32-bit	2018-03-05 16:41:08 -08:00
Yann Collet	7bd7a3ad43	long offset mode : new default threshold for 64-bits mode	2018-03-05 16:16:49 -08:00
Yann Collet	c0393a538f	fixed counting long distance weights	2018-03-05 15:12:10 -08:00
Yann Collet	41bd10446e	Merge branch 'dev' into longOffsetMode	2018-03-05 13:10:10 -08:00
Yann Collet	cb789d2df8	re-inserted offset evaluation	2018-03-05 13:08:59 -08:00
Yann Collet	b91ddf0ae6	Merge branch 'dev' into longOffsetMode	2018-03-05 11:59:54 -08:00
Yann Collet	d02b44cf55	DYNAMIC_BMI2 enabled for clang clang only claims compatibility with gcc 4.2. Consequently, recent patch which reserved DYNAMIC_BMI2 for gcc >= 4.8 also disabled it for clang. fix : __clang__ is now enough to enable DYNAMIC_BMI2 (associated with other existing conditions : x64/x64, !bmi2)	2018-03-04 16:05:59 -08:00
Yann Collet	45b09e7625	limit DYNAMIC_BMI2 to gcc >= 4.8 attribute bmi2 not supported by gcc 4.4	2018-03-01 15:02:18 -08:00
Yann Collet	b01552a07a	force inlining of HUF_decodeSymbol*() functions which was not done properly by gcc 4.8 resulting in major performance difference. ex : zstd -b1 silesia.tar before : dec 680 MB/s after : dec 710 MB/s (without bmi2) after : dec 770 MB/s (with DYNAMIC_BMI2)	2018-03-01 11:31:45 -08:00
Yann Collet	ccb7184a76	Merge pull request #1026 from terrelln/lrm-window LDM manages its own window round buffer	2018-02-27 17:09:10 -08:00
Nick Terrell	0a0e64c641	LDM manages its own window round buffer	2018-02-27 12:13:23 -08:00
Yann Collet	2c4d3f339a	Merge pull request #1025 from facebook/huf Huf	2018-02-27 09:57:01 -08:00
Yann Collet	33a3f18848	fixed wrong size test	2018-02-26 18:27:51 -08:00
Yann Collet	89741653ab	added error code workSpace_tooSmall	2018-02-26 15:11:50 -08:00
Yann Collet	6cdf690441	minor cleaning of huff0 Update code documentation, and properly names a few "magic constants". Also, HUF_compress_internal() gets a cleaner way to determine size of tables inside workspace.	2018-02-26 14:52:23 -08:00
Nick Terrell	6b88d592fd	Reduce ZSTD_CHAINLOG_MAX to 29 in 32-bit mode	2018-02-26 13:30:24 -08:00
Nick Terrell	7e5e226cbf	Split the window state into substructure	2018-02-26 13:29:57 -08:00
Yann Collet	50bc2ce95e	Merge pull request #1021 from terrelln/lrm-split Split block compresser out of long range matcher	2018-02-23 17:36:51 -08:00
Yann Collet	653383f74a	minor nit from Mac XCode	2018-02-22 15:44:26 -08:00
Nick Terrell	7e2bf4ebad	Remove long range matcher immediate repcode check The compression ratio gets about 0.01% worse on the files I tested, but the code is much simpler.	2018-02-22 15:18:47 -08:00
Nick Terrell	af866b3a58	Split block compresser out of long range matcher * `ZSTD_ldm_generateSequences()` generates the LDM sequences and stores them in a table. It should work with any chunk size, but is currently only called one block at a time. * `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and instead of encoding the literals directly, it passes them to a secondary block compressor. The code to handle chunk sizes greater than the block size is currently commented out, since it is unused. The next PR will uncomment exercise this code. * During optimal parsing, ensure LDM `minMatchLength` is at least `targetLength`. Also don't emit repcode matches in the LDM block compressor. Enabling the LDM with the optimal parser now actually improves the compression ratio. * The compression ratio is very similar to before. It is very slightly different, because the repcode handling is slightly different. If I remove immediate repcode checking in both branches the compressed size is exactly the same. * The speed looks to be the same or better than before. Up Next (in a separate PR) -------------------------- Allow sequence generation to happen prior to compression, and produce more than a block worth of sequences. Expose some API for zstdmt to consume. This will test out some currently untested code in `ZSTD_ldm_blockCompress()`.	2018-02-22 15:18:41 -08:00
Yann Collet	0fd4df6ed3	Implemented BMI2 functions directly within huf_decompress.c This makes it easier to edit for maintenance and evolutions (I plan to experiment modifications in huffman decompression functions). The methology followed seems broadly applicable to other BMI2 modules. Performance was tracked rigorously at each step, there is no noticeable loss (nor win) of performance compared to `#include` version. Note however that 4X decoder variants tend to be extremely sensitive to code alignment. This source code resulted in pretty good performance for gcc 7.2 and 7.3, but future changes (even in other parts of the code) might trigger the issue again.	2018-02-22 10:51:47 -08:00
Yann Collet	9c5a8040a9	fixed huf_compress workspace size	2018-02-21 11:34:49 -08:00
Yann Collet	010ba5f71f	Merge pull request #1017 from terrelln/c-bmi2 [compress] Support BMI2	2018-02-20 15:34:59 -08:00
Nick Terrell	6e128d3534	[BMI2] Add comments to the bmi2 variable in the contexts	2018-02-20 14:12:11 -08:00
Yann Collet	70163bf0d3	added clarification comments in zstd_errors.h answering some points in #1018	2018-02-20 12:54:49 -08:00
Yann Collet	7117ea8bec	Merge pull request #1011 from terrelln/bmi2 [decompress] Support BMI2	2018-02-15 11:40:34 -08:00
Nick Terrell	b58f01537e	[compress] Support BMI2	2018-02-14 19:20:32 -08:00
Nick Terrell	4319132312	[decompress] Support BMI2	2018-02-13 17:00:15 -08:00
Yann Collet	5cb1144872	fixed --single-thread was incorrectly set to -T0 (use as many cores as possible) previously	2018-02-13 14:56:35 -08:00
Yann Collet	2524cbd847	added code comment on how to generate default tables as suggested by @terrelln	2018-02-13 10:02:25 -08:00
Yann Collet	71c07966bb	added SEQSYMBOL_TABLE_SIZE() as suggested by @terrelln's comment	2018-02-12 16:52:15 -08:00
Yann Collet	5f7495371e	Merge branch 'dev' into fasterDec	2018-02-10 14:24:44 -08:00
Yann Collet	9945e60ac4	Merge branch 'dev' into flexibleLevel	2018-02-10 11:54:49 -08:00
Yann Collet	04a3f85ce7	fixed gcc warning on a switch code path	2018-02-09 16:16:27 -08:00
Yann Collet	af48f0b62b	fix : offset table pointer when using default table	2018-02-09 15:15:46 -08:00
Yann Collet	426944c3e3	fixed strict aliasing issue tuned threshold	2018-02-09 13:24:11 -08:00
Yann Collet	64ee732694	decide long-offset mode based on offcode statistics threshold vaguely estimated	2018-02-09 12:33:28 -08:00
Yann Collet	c72091556b	fixed minor nit as per @terrelln's comments	2018-02-09 09:46:08 -08:00
Yann Collet	4beaeaace5	Merge branch 'dev' into flexibleLevel	2018-02-09 09:15:05 -08:00
Yann Collet	6bfe50ad48	re-enabled ZSTD_decompressSequencesLong()	2018-02-09 09:14:25 -08:00
Yann Collet	1850597eaa	pre-calculated default decoding tables	2018-02-09 06:01:02 -08:00
Yann Collet	ab75df21ed	fixed mono-symbol distribution	2018-02-09 05:12:13 -08:00
Yann Collet	421a2716d8	fixed default fse distributions but would be better to pre-calculate tables, for speed	2018-02-09 04:50:58 -08:00
Yann Collet	95424409ea	addBits and baseline into FSE decoding table note : unfinished - need new default tables - need modify long mode	2018-02-09 04:25:15 -08:00
Yann Collet	de68c2ff10	Merged ZSTD_preserveUnsortedMark() into ZSTD_reduceIndex() as it's faster, due to one memory scan instead of two (confirmed by microbenchmark). Note : as ZSTD_reduceIndex() is rarely invoked, it does not translate into a visible gain. Consider it an exercise in auto-vectorization and micro-benchmarking.	2018-02-07 14:22:35 -08:00
Yann Collet	0170cf9a7a	minor : modified ZSTD_preserveUnsortedMark() to be more vectorization friendly	2018-02-05 11:46:02 -08:00
Yann Collet	94efb1749d	faster decoding in 32-bits mode for long offsets (tentative) On my laptop: Before: ./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S 3#silesia.tar : 211984896 -> 66683478 (3.179), 97.6 MB/s , 400.7 MB/s 3#enwik8 : 100000000 -> 35643153 (2.806), 76.5 MB/s , 303.2 MB/s After: ./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S 3#silesia.tar : 211984896 -> 66683478 (3.179), 97.4 MB/s , 435.0 MB/s 3#enwik8 : 100000000 -> 35643153 (2.806), 76.2 MB/s , 338.1 MB/s Mileage vary, depending on file, and cpu type. But a generic rule is : x86 benefits less from "long-offset mode" than x64, maybe due to register pressure. On "entropy", long-mode is _never_ a win for x86. On my laptop though, it may, depending on file and compression level (enwik8 benefits more from "long-mode" than silesia).	2018-02-04 01:49:31 -08:00
Yann Collet	5188749e1c	ensure compression parameters are updated when only compression level is changed	2018-02-02 16:31:20 -08:00
Yann Collet	4b525af53a	zstdmt: applies new parameters on the fly when invoked from ZSTD_compress_generic()	2018-02-02 15:58:13 -08:00
Yann Collet	90eca318a7	fileio: create dedicated function to generate zstd frames like other formats	2018-02-02 14:24:56 -08:00
Yann Collet	1291d9d7cf	Merge pull request #1006 from systemcrash/patch-2 Update README.md	2018-02-02 10:04:55 -08:00
Yann Collet	209df52ba2	Changed nbThreads for nbWorkers This makes it easier to explain that nbWorkers=0 --> single-threaded mode, while nbWorkers=1 --> asynchronous mode (one mode thread on top of the "main" caller thread). No need for an additional asynchronous mode flag. nbWorkers>=2 works the same as nbThreads>=2 previously.	2018-02-01 19:29:30 -08:00
Yann Collet	4b6a94f0cc	clarified comments on LDM parameters	2018-02-01 17:07:27 -08:00
Yann Collet	60fa90b6c0	zstdmt: added ability to change compression parameters during compression	2018-02-01 16:13:31 -08:00
Nick Terrell	48acaddff9	Test for incorrect pledgeSrcSize earlier	2018-02-01 12:04:05 -08:00
Yann Collet	727bb7f090	Merge pull request #1008 from terrelln/hlog3 Fix hashLog3 size when copying cdict tables	2018-01-31 12:49:07 -08:00
Nick Terrell	ab3346af07	Fix hashLog3 size when copying cdict tables	2018-01-31 11:12:17 -08:00
Yann Collet	823a28a1f4	Merge pull request #1000 from facebook/progressiveFlush Progressive flush	2018-01-30 22:49:47 -08:00
Yann Collet	a2ba629971	fixed function declaration ZSTD_getBlockSize()	2018-01-30 15:03:39 -08:00
Yann Collet	2cb0740b6b	zstdmt: changed naming convention to avoid confusion with blocks. also: - jobs are cut into chunks of 512KB now, to reduce nb of mutex calls. - fix function declaration ZSTD_getBlockSizeMax() - fix outdated comment	2018-01-30 14:43:36 -08:00
systemcrash	6b57387728	Update README.md spelling	2018-01-29 18:42:20 +01:00
Yann Collet	9f8ed23b5b	bumped version number to v1.3.4 also added a paragraph on using compression level with training mode as this is a recurrent question (see for example #1004)	2018-01-27 22:23:26 -08:00
Yann Collet	ba0cd8cf78	fixed minor conversion warning for C++ compilation mode	2018-01-26 18:18:42 -08:00
Yann Collet	caf9e96dc3	job mutex creation is checked	2018-01-26 18:09:25 -08:00
Yann Collet	9c40ae7ff1	zstdmt: there is now one mutex/cond per job	2018-01-26 17:55:08 -08:00
Yann Collet	77e36273de	zstdmt: minor code refactor for clarity	2018-01-26 17:08:58 -08:00
Yann Collet	27c5853c42	zstdmt: job table correctly cleaned after synchronous ZSTDMT_compress()	2018-01-26 14:35:54 -08:00
Yann Collet	0d426f6b83	zstdmt : refactor a few member names for clarity	2018-01-26 13:00:14 -08:00
Yann Collet	79b6e28b0a	zstdmt : flush() only lock to read shared job members Other job members are accessed directly. This avoids a full job copy, which would access everything, including a few members that are supposed to be used by worker only, uselessly requiring additional locks to avoid race conditions.	2018-01-26 12:15:43 -08:00
Yann Collet	d2b62b6fa5	minor : ZSTDMT_writeLastEmptyBlock() is a void function because it cannot fail	2018-01-26 11:06:34 -08:00
Yann Collet	fca13c6855	zstdmt : fixed memory leak writeLastEmptyBlock() must release srcBuffer as mtctx assumes it's done by job worker. minor : changed 2 job member names (src->srcBuffer, srcStart->prefixStart) for clarity	2018-01-26 10:44:09 -08:00
Yann Collet	8e128eaf05	zstdmt : refactor job members grouped by sharing properties	2018-01-26 10:20:38 -08:00
Yann Collet	777d3c1559	fixed minor declaration-after-statement warning	2018-01-25 17:45:18 -08:00
Yann Collet	a1d4041e69	zstdmt: removed job->jobCompleted replaced by equivalent signal job->consumer == job->srcSize. created additional functions ZSTD_writeLastEmptyBlock() and ZSTDMT_writeLastEmptyBlock() required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker. It avoids creating a job with srcSize==0.	2018-01-25 17:35:49 -08:00
Yann Collet	1272d8e760	zstdmt:: renamed mutex and cond to underline they are context-global	2018-01-25 14:52:34 -08:00
Yann Collet	5f349b129c	zstdmt : correctly set end of frame	2018-01-23 15:52:40 -08:00
Yann Collet	c1cc57f270	zstdmt : fix end condition (ZSTD_e_end) When ZSTD_e_end directive is provided, the question is not only "are internal buffers completely flushed", it is also "is current frame completed". In some rare cases, it was possible for internal buffers to be completely flushed, triggering a @return == 0, but frame was not completed as it needed a last null-size block to mark the end, resulting in an unfinished frame.	2018-01-23 15:19:11 -08:00
Yann Collet	de5e38a7a6	zstdmt: fixed minor race condition no real consequence, but pollute tsan tests : job->dstBuff is being modified inside worker, while main thread might read it accidentally because it copies whole job. But since it doesn't used dstBuff, there is no real consequence. Other potential solution : only copy useful data, instead of whole job	2018-01-23 14:03:07 -08:00
Yann Collet	ebd955e26a	zstdmt : fixed ending frame with 0-size block	2018-01-23 13:12:40 -08:00
Yann Collet	6711396d97	zstreamtest : fixed test 32 : multi-thread compression using ZSTD_compress_generic(,,ZSTD_e_end) Since it already provides ZSTD_e_end as directive, it should not be followed by ZSTDMT_endStream().	2018-01-19 22:20:53 -08:00
Yann Collet	a7ef3a219c	zstdmt : fixed last job size	2018-01-19 18:19:09 -08:00
Yann Collet	3ad7d4951c	zstdmt : finally vanquished an elusive and rare race condition	2018-01-19 17:35:08 -08:00
Yann Collet	940634a610	zstdmt : simplify job creation job will not be created when not enough room within job Table	2018-01-19 13:25:06 -08:00
Yann Collet	dc69623453	zstdmt: fixed corruption issue in ZSTDMT_endStream() when invoked directly.	2018-01-19 12:41:56 -08:00
Yann Collet	70f81d6030	zstdmt uses POOL_tryAdd() to call a new worker so that it's no longer a blocking call. This makes it possible to stream out data gradually, while waiting for a worker to become available.	2018-01-19 10:01:40 -08:00
Yann Collet	d19dc1903c	Merge pull request #995 from facebook/progressiveMT Progressive mt	2018-01-18 17:59:49 -08:00
Yann Collet	6f7280fb33	fixed frame checksum issue and race conditions	2018-01-18 16:20:26 -08:00
Yann Collet	997e4d0ccd	added POOL_tryAdd()	2018-01-18 14:39:51 -08:00
Yann Collet	4f43ef731d	Merge branch 'dev' into constCDict	2018-01-18 13:36:43 -08:00
Yann Collet	ef97d5a287	Merge branch 'progressiveMT' into progressiveFlush	2018-01-18 13:35:24 -08:00
Yann Collet	b6ab232f2d	Merge branch 'dev' into progressiveMT	2018-01-18 13:34:56 -08:00
Nick Terrell	9d96761520	Set repcodes for empty ZSTD_CDict When the dictionary is <= 8 bytes, no data is loaded from the dictionary. In this case the repcodes weren't set, because they were inserted after the size check. Fix this problem in general by first setting the cdict state to a clean state of an empty dictionary, then filling the state from there.	2018-01-18 13:28:30 -08:00
Yann Collet	c7190c69cc	fixes for @terrelln comments	2018-01-18 11:15:23 -08:00
Yann Collet	1b5d80d633	zstdmt: added ability to flush current job before it's completed however, zstdmt may still wait on next available worker, so it's not smooth yet.	2018-01-18 11:03:27 -08:00
Yann Collet	aa79c18e3f	fixed a few access contention passes thread sanitizer test	2018-01-17 17:18:19 -08:00
Yann Collet	394eec697b	Introduce ZSTD_getFrameProgression() Produces 3 statistics for ongoing frame compression : - ingested - consumed (effectively compressed) - produced Ingested can be larger than consumed due to buffering effect. For the time being, this patch mostly fixes the % ratio issue, since it computes consumed / produced, instead of ingested / produced. That being said, update is not "smooth", because on a slow enough setting, fileio spends most of its time waiting for a worker to complete its job. This could be improved thanks to more granular flushing i.e. start flushing before ongoing job is fully completed.	2018-01-17 16:39:02 -08:00
Yann Collet	f3b8f90b6d	changed initStatic?Dict() return type to const ZSTD_?Dict* ZSTD_create?Dict() is required to produce a ?Dict* return type because `free()` does not accept a `const type` argument. If it wasn't for this restriction, I would have preferred to create a `const ?Dict` object to emphasize the fact that, once created, a dictionary never changes (hence can be shared concurrently until the end of its lifetime). There is no such limitation with initStatic?Dict() : as stated in the doc, there is no corresponding free() function, since `workspace` is provided, hence allocated, externally, it can only be free() externally. Which means, ZSTD_initStatic?Dict() can return a `const ZSTD_?Dict*` pointer. Tested with `make all`, to catch initStatic's users, which, incidentally, also updated zstd.h documentation.	2018-01-17 14:08:48 -08:00
Yann Collet	b86865323a	Merge branch 'dev' into progressiveMT fixed minor conflict on cdict	2018-01-17 13:51:03 -08:00
Yann Collet	d14cc881b0	zstdmt : fixed very large window sizes would create too large buffers, since default job size == window size * 4. This would crash on 32-bit systems. Also : jobSize being a 32-bit unsigned, it cannot be >= 4 GB, so the formula was failing for large window sizes >= 1 GB. Fixed now : max job Size is 2 GB, whatever the window size.	2018-01-17 12:39:58 -08:00
Yann Collet	58dd7de640	zstdmt: fixed an endless loop on allocation failure this happened on 32-bits build when requiring a too large input buffer, typically on wlog=29, creating jobs of 2 GB size. also : zstd32 now compiles with multithread support enabled by default (can be disabled with HAVE_THREAD=0)	2018-01-17 12:10:15 -08:00
Nick Terrell	16bd0fd4df	Reduce size of ZSTD_CDict Shaves 492,076 B off of the `ZSTD_CDict`. The size of a `ZSTD_CDict` created from a 112,640 B dictionary is: \| Level \| Before (B) \| After (B) \| \|-------\|------------\|-----------\| \| 1 \| 648,448 \| 156,412 \| \| 3 \| 1,140,008 \| 647,932 \|	2018-01-17 11:50:49 -08:00
Yann Collet	cb57c107ff	zstdmt: minor variable renaming, for clarity	2018-01-17 11:39:07 -08:00
Yann Collet	1dba98d563	introduced parameter ZSTD_p_nonBlockingMode This new parameter makes it possible to call streaming ZSTDMT with a single thread set which is non blocking. It makes it possible for the main thread to do other tasks in parallel while the worker thread does compression. Typically, for zstd cli, it means it can do I/O stuff. Applied within fileio.c, this patch provides non-negligible gains during compression. Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9 With traditional single-thread blocking mode : real 0m9.557s user 0m8.861s sys 0m0.538s With new single-worker non blocking mode : real 0m7.938s user 0m8.049s sys 0m0.514s => 20% faster	2018-01-16 16:15:47 -08:00
Yann Collet	6025465e42	ZSTDMT : minor CCtx memory optimization can be useful when a compression job only has small amount of data to compress.	2018-01-16 15:34:41 -08:00
Yann Collet	2e23333094	ZSTDMT can now work in non-blocking mode with 1 thread it still fallbacks to single-thread blocking invocation when input is small (<1job) or when invoking ZSTDMT_compress(), which is blocking. Also : fixed a bug in new block-granular compression routine.	2018-01-16 15:28:43 -08:00
Yann Collet	8e83c5c910	Merge branch 'dev' into progressiveMT	2018-01-16 12:54:33 -08:00
Nick Terrell	aae267a2e1	Reorganize block state	2018-01-16 11:17:50 -08:00
Nick Terrell	887cd4e35e	Split ZSTD_CCtx into smaller sub-structures	2018-01-16 11:17:50 -08:00
Yann Collet	9477f6529d	Merge pull request #984 from terrelln/dict-load Load more dictionary positions into table if empty	2018-01-13 13:20:42 -08:00
Yann Collet	58ecf13e02	zstdmt : can compress at block granularity offering perspective of more accurate progression report.	2018-01-13 13:18:57 -08:00
Nick Terrell	9a211d1f05	Load more dictionary positions into table if empty If the hash table is empty load positions into the hash table that we would otherwise skip. \| Level \| Data Set \| Improvement \| \|-------\|--------------\|-------------\| \| 1 \| github \| 0.44% \| \| 1 \| hg-changelog \| 0.13% \| \| 1 \| hg-commands \| 1.28% \| \| 1 \| hg-manifest \| 0.70% \| \| 3 \| github \| 0.74% \| \| 3 \| hg-changelog \| 0.87% \| \| 3 \| hg-commands \| 1.74% \| \| 3 \| hg-manifest \| 0.23% \|	2018-01-12 16:17:22 -08:00
Yann Collet	863b2f8db4	Merge pull request #983 from terrelln/dict-wlog Increase windowLog from CDict based on the srcSize when known	2018-01-12 07:47:43 -08:00
Nick Terrell	b610b777d3	Increase windowLog from CDict based on the srcSize when known	2018-01-11 16:23:21 -08:00
Yann Collet	cacf47cbee	Merge branch 'dev' into dubtlazy and fixed conflicts	2018-01-11 13:25:08 -08:00
Yann Collet	04c00f9388	Merge pull request #982 from facebook/fix304 Fix for #304 and #977 : error during dictionary creation	2018-01-11 13:20:59 -08:00
Yann Collet	b9a14900ff	changed function name to ZSTD_DUBT_findBestMatch()	2018-01-11 12:38:31 -08:00
Yann Collet	752bae4a48	added warning message when pathological dataset is detected (note : cover_optimize needs -v to display the warning)	2018-01-11 11:29:28 -08:00
Yann Collet	e8093dde09	fixed #304 Pathological samples may result in literal section being incompressible. This case is now detected, and literal distribution is replaced by one that can be written into the dictionary.	2018-01-11 11:16:32 -08:00
Yann Collet	218e9fe0fc	added a test case for dictBuilder failure cyclic data set makes the entropy stage fails now, onto a fix for #304 ...	2018-01-11 09:42:38 -08:00
Yann Collet	ff795580f2	fixed bug #976 , reported by @indygreg constants in zstd.h should not depend on MIN() macro which existence is not guaranteed. Added a test to check the specific constants. The test is a bit too specific. But I have found no way to control a more generic "are all macro already defined" condition, especially as this is a valid construction (the missing macro might be defined later, intentionnally).	2018-01-10 20:33:45 -08:00
Yann Collet	292eeb672f	api doc : grouped all ZSTD_create*_advanced() functions together in a new "custom memory allocator" paragraph which is itself part of "memory management" category. This makes it simpler to see the relation between the type and its usages.	2018-01-10 09:07:47 -08:00
Yann Collet	3ea156368c	API doc : grouped ZSTD_initStatic*() together within "memory management" category.	2018-01-10 08:49:50 -08:00
Yann Collet	b17fb488b0	fixed msan test a pointer calculation was wrong in a corner case	2018-01-06 20:50:36 +01:00
Yann Collet	658d6b8588	Merge branch 'dev' into dubtlazy	2018-01-06 12:40:58 +01:00
Yann Collet	a927fae2a1	fixed ZSTD_reduceIndex() following suggestions from @terrelln. Also added some comments to present logic behind ZSTD_preserveUnsortedMark().	2018-01-06 12:31:26 +01:00
Yann Collet	2eff217136	updated /lib documentation	2017-12-31 15:50:00 +01:00
Yann Collet	00db4dbbb3	fixed minor argument property for Visual	2017-12-30 15:42:28 +01:00
Yann Collet	f597f55675	improved btlazy2 : list of unsorted candidates can reach extDict It used to stop on reaching extDict, for simplification. As a consequence, there was a small loss of performance each time the round buffer would restart from beginning. It's not a large difference though, just several hundreds of bytes on silesia. This patch fixes it.	2017-12-30 15:12:59 +01:00
Yann Collet	a68b76afef	updated compression level table for btlazy2 now selected for levels 13, 14 and 15. Also : dropped the requirement for monotonic memory budget increase of compression levels,, which was required for ZSTD_estimateCCtxSize() in order to ensure that a memory budget for level L is large enough for any level <= L. This condition is now ensured at run time inside ZSTD_estimateCCtxSize().	2017-12-30 11:40:35 +01:00
Yann Collet	eb52e2f45e	simplify ZSTD_preserveUnsortedMark() implementation since no compiler attempts to auto-vectorize it.	2017-12-30 11:13:52 +01:00
Yann Collet	d228b6b0d0	btlazy2 : optimization for dictionary compression we want the dictionary table to be fully sorted, not just lazily filled. Dictionary loading is a bit more intensive, but it saves cpu cycles for match search during compression.	2017-12-29 19:14:18 +01:00
Yann Collet	02f64ef955	btlazy2: fixed interaction between unsortedMark and reduceTable	2017-12-29 19:08:51 +01:00
Yann Collet	64482c2c97	fixed bug in dubt the chain of unsorted candidates could grow beyond lowLimit.	2017-12-29 17:04:37 +01:00
Yann Collet	f36da5b4d9	minor speed optimization : index overflow prevention new code supposed to be easier to auto-vectorize	2017-12-29 14:40:33 +01:00
Yann Collet	5235d8d6ba	first implementation of delayed update for btlazy2 This is a pretty nice speed win. The new strategy consists in stacking new candidates as if it was a hash chain. Then, only if there is a need to actually consult the chain, they are batch-updated, before starting the match search itself. This is supposed to be beneficial when skipping positions, which happens a lot when using lazy strategy. The baseline performance for btlazy2 on my laptop is : 15#calgary.tar : 3265536 -> 955985 (3.416), 7.06 MB/s , 618.0 MB/s 15#enwik7 : 10000000 -> 3067341 (3.260), 4.65 MB/s , 521.2 MB/s 15#silesia.tar : 211984896 -> 58095131 (3.649), 6.20 MB/s , 682.4 MB/s (only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt) After this patch, and keeping all parameters identical, speed is increased by a pretty good margin (+30-50%), but compression ratio suffers a bit : 15#calgary.tar : 3265536 -> 958060 (3.408), 9.12 MB/s , 621.1 MB/s 15#enwik7 : 10000000 -> 3078318 (3.249), 6.37 MB/s , 525.1 MB/s 15#silesia.tar : 211984896 -> 58444111 (3.627), 9.89 MB/s , 680.4 MB/s That's because I kept `1<<searchLog` as a maximum number of candidates to update. But for a hash chain, this represents the total number of candidates in the chain, while for the binary, it represents the maximum depth of searches. Keep in mind that a lot of candidates won't even be visited in the btree, since they are filtered out by the binary sort. As a consequence, in the new implementation, the effective depth of the binary tree is substantially shorter. To compensate, it's enough to increase `searchLog` value. Here is the result after adding just +1 to searchLog (level 15 setting in this patch): 15#calgary.tar : 3265536 -> 956311 (3.415), 8.32 MB/s , 611.4 MB/s 15#enwik7 : 10000000 -> 3067655 (3.260), 5.43 MB/s , 535.5 MB/s 15#silesia.tar : 211984896 -> 58113144 (3.648), 8.35 MB/s , 679.3 MB/s aka, almost the same compression ratio as before, but with a noticeable speed increase (+20-30%). This modification makes btlazy2 more competitive. A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.	2017-12-28 16:58:57 +01:00
Yann Collet	473362e922	Merge pull request #958 from facebook/continueCCtx fix a subtle issue in continue mode	2017-12-20 00:12:50 +01:00
Yann Collet	cafedcbbe4	ZSTD_resetCCtx_internal: fixed order of arguments params1 was swapped with params2. This used to be a non-issue when testing for strict equality, but now that some tests look for "sufficient size" `<=`, order matters.	2017-12-19 21:49:04 +01:00
Yann Collet	9096088f45	changed variable name for clarity, suggested by @terrelln	2017-12-19 21:20:46 +01:00
Yann Collet	f299fa39ac	fix a subtle issue in continue mode The deep fuzzer tests caught a subtle bug that was probably there for a long time. The impact of the bug is not a crash, or any other clear error signal, rather, it reduces performance, by cutting data into smaller blocks. Eventually, the following test would fail because it produces too many 1-byte blocks, requiring more space than buffer can provide : `./zstreamtest_asan --mt -s3514 -t1678312 -i1678314` The root scenario is as follows : - Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1` - The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`. `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`. The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes. - Complete compression - New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN` trigger "continue mode" - Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only, but in fact, it also depends on `pledgedSrcSize`. - The "old" blocksize (1) is still there, next compression will use this value to cut input into blocks, resulting in more blocks and worse performance than necessary performance. Given the scenario, and its possible variants, I'm surprised it did not show up before. But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger. The above test is a special corner case, where performance is so impacted that it reaches an error case. The fix works, but I'm not completely pleased. I think the current code relies too much on implied relations between variables. This will likely break again in the future when some related part of the code change. Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3. So a longer term fix will have to be considered after the release. To do : create a reliable test case which triggers this scenario for CI tests.	2017-12-19 09:43:03 +01:00
Yann Collet	5c2f2ebfdb	zstdmt via compress_generic: reduce opportunity to free/create mtctx `zstreamtest --newapi` (and `--opaqueapi`) create and destroy way too many threads resulting in failure of tsan tests, and potentially connected to the qemu flaky tests. This is because, at each test, the nb of threads can be changed (random). The `--no-big-tests` directive reduce this choice to 1/2 threads, in order to limit memory usage, especially for qemu and 32-bits builds. Unfortunately, swapping between 1 and 2 threads is enough to constantly create/destroy new mtctx. This patch takes advantage of the following property : via compress_generic, no internal mtctx is needed for nbThreads < 2. As a consequence, when nbThreads == 2, the currently active mtctx is necessarily good. This dramatically reduces the nb of thread creations when invoking `zstreamtest --newapi --no-big-tests` (only when parent cctx itself is created, which is randomized to 1/256 tests). Expected outcome : - at a minimum : tsan tests shall now work continuously without exploding the thread counter - at best : flaky qemu tests on `zstreamtest --newapi --no-big-tests` may stop being flaky, due to less stress from constant thread creation/destruction Real world impact : minimal, I don't expect users to constantly change `nbThreads` between each invocation. If `nbThreads` remains stable, existing implementation re-uses existing mtctx. Also : `zstreamtest --newapi` but without `--no-big-tests` doesn't benefit as much, since this test can select a random `nbThreads` value between 1 and 4. The current patch only reduces opportunity to free/create mtctx (for example : 2->1->2 doesn't need a new mtctx) but doesn't completely eliminate it, since `nbThreads` can still change between 2/3/4. A more complete solution could be to only use 2 out of 4 allocated threads, thus keeping the pool at a constant size. This would require a larger change to `POOL_*` api though.	2017-12-16 12:48:13 -08:00
Yann Collet	3cbfac1cdb	updated levels 15-20 taking advantage of `btopt` improved speed to tune parameters. Levels 16-19 are stronger than previous release, making the graph more favorable. In theory, I should also update small-size tables, but I got lazy on that one ...	2017-12-14 23:29:00 -08:00
Yann Collet	2cff66b62f	version bump to v1.3.3	2017-12-14 16:11:20 -08:00
Yann Collet	8c41a9cb1e	Merge pull request #951 from facebook/lastBlock saves 3-bytes on small input with streaming API	2017-12-14 15:39:50 -08:00
Yann Collet	a0ac8c895c	Merge pull request #950 from facebook/srcSizeAdaptation fix adaptation on srcSize	2017-12-14 14:48:31 -08:00
Yann Collet	281f06e01f	saves 3-bytes on small input with streaming API zstd streaming API was adding a null-block at end of frame for small input. Reason is : on small input, a single block is enough. ZSTD_CStream would size its input buffer to expect a single block of this size, automatically triggering a flush on reaching this size. Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`). The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame. The solution is to not flush automatically, which is btw the expected behavior. It happens in this case because blocksize is defined with exactly the same size as input. Just adding one-byte is enough to stop triggering the automatic flush. I initially looked at another solution, solving the problem directly in the compression context. But it felt awkward. Now, the underlying compression API `ZSTD_compressContinue()` would take the decision the close a frame on reaching its expected end (`pledgedSrcSize`). This feels awkward, a responsability over-reach, beyond the definition of this API. ZSTD_compressContinue() is clearly documented as a guaranteed flush, with ZSTD_compressEnd() generating a guaranteed end. I faced similar issue when trying to port a similar mechanism at the higher streaming layer. Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller, since it did not explicitly requested an end of frame. The only sensible action remaining after that is to end the frame with no additional input. This adds additional logic in the ZSTD_CStream state to check this condition. Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?) In the end, just enlarging input buffer by 1 byte feels the least intrusive change. It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code. The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.	2017-12-14 11:47:02 -08:00
Yann Collet	c005df136f	Merge pull request #947 from facebook/fix944 Fix #944	2017-12-14 10:01:52 -08:00
Yann Collet	2e97a6d464	fixed minor declaration-after-statement warning	2017-12-13 18:50:05 -08:00
Yann Collet	5432ef6921	fixes adaptation on srcSize This patch restores capability for each file to receive adapted compression parameters depending on its size. The bug breaking this feature was relatively silly : setting a parameter with a value "0" is supposed to be a no-op. Unfortunately, it would pin down compression parameters as if they were manually set, preventing later automatic adaptation. Unfortunately, I'm currently short of a test case that could check this situation and trigger an error. Compression parameters selection between tableID 0,1,2,3 is largely internal, leaving no trace to outside world, not even in frame header.	2017-12-13 17:45:26 -08:00
Yann Collet	d23eb9a098	zstreamtest : added missing CHECK_Z()	2017-12-13 15:35:49 -08:00
Nick Terrell	22727a7467	Fix cdict compressor repcodes	2017-12-13 11:31:20 -08:00
Yann Collet	e28305fcca	fix #944 : ZSTDMT with large files and dictionary now works correctly windowLog is now enforced from provided compression parameters, instead of being copied blindly from `cdict` where it could be smaller. also : - fix a minor bug in zstreamtest --mt : advanced parameters must be set before init - changed advanced parameter name to ZSTDMT_jobSize	2017-12-12 18:04:58 -08:00
Yann Collet	03832b7aa5	re-added test case messing with revert ... :(	2017-12-12 14:01:54 -08:00
Yann Collet	8a104fda05	Revert "Created a test case which reliably reproduces bug #944 " This reverts commit `5098d1fbe2`.	2017-12-12 12:51:49 -08:00
Yann Collet	5098d1fbe2	Created a test case which reliably reproduces bug #944 in zstreamtest.	2017-12-12 12:48:31 -08:00
Yann Collet	ac8e022806	Merge pull request #943 from facebook/fix942 Fix #942	2017-12-08 13:53:08 -05:00
Yann Collet	dfc697e967	comment clarification	2017-12-08 12:16:49 -05:00
Yann Collet	c029ee1f0b	ZSTD_initCStream_srcSize() considers "0" to mean "unknown" to not break existing programs relying on this behavior. Might be changed to mean "empty" in the future.	2017-12-07 17:13:10 -05:00
Yann Collet	3aa2b27a89	fix #942 : streaming interface does not compress after ZSTD_initCStream() While the final result is still, technically, a frame, the resulting frame expands initial data instead of compressing it. This is because the streaming API creates a tiny 1-byte buffer for input, because it believes input is empty (0-bytes), because in the past, 0 used to mean "unknown" instead. This patch fixes the issue. Todo : add a test which traps the issue.	2017-12-07 02:52:50 -05:00
Yann Collet	c173dbd6e7	no longer supported starting C++17	2017-12-04 18:00:53 -08:00
Yann Collet	7e05ef851a	Merge branch 'dev' into qemu32panic	2017-12-03 11:14:36 -08:00
Yann Collet	5e1f34b7e4	setParameter : no side-effect on setting a compression parameter last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow. It is no a useless operation, so it's removed. No side-effect left when setting a compression parameter.	2017-12-01 21:17:09 -08:00
Yann Collet	78290874a5	fixed Visual warning on minor interface discrepancy	2017-11-29 17:01:14 -08:00
Yann Collet	d3c59edac9	removed long-range-mode tests from `zstreamtest --no-big-tests`	2017-11-29 16:42:20 -08:00
Yann Collet	998a93b784	simplified ZSTD_CCtx_setParametersUsingCCtxParams() Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action. Any action shall be taken at parameter application only (during init). It makes it possible to just copy CCtxParams from external container to internal state, and get rid of the more complex code which was trying to compensate for missing actions.	2017-11-29 16:13:05 -08:00
Yann Collet	f98ee994c4	zstd_opt: added comments, as requested by @terrelln	2017-11-29 15:19:00 -08:00
Yann Collet	bc42bc3b1d	removed one invocation of SET_PRICE() macro	2017-11-28 16:08:56 -08:00
Yann Collet	0a0a212934	zstd_opt: changed cost formula There was a flaw in the formula which compared literal cost with match cost : at a given position, a non-null literal suite is going to be part of next sequence, while if position ends a previous match, to immediately start another match, next sequence will have a litlength of zero. A litlength of zero has a non-null cost. It follows that literals cost should be compared to match cost + litlength==0. Not doing so gave a structural advantage to matches, which would be selected more often. I believe that's what led to the creation of the strange heuristic which added a complex cost to matches. The heuristic was actually compensating. It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar). The problem with this heuristic is that it's hard to understand, and unfortunately, any future change in the parser would impact the way it should be calculated and its effects. The "proper" formula makes it possible to remove this heuristic. Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse. Note that all differences are small (< 0.01 ratio). In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7). I suspect that's because starting statistics are pretty poor (another area of improvement). However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...). It's a pity that zstd -22 gets worse on silesia.tar. That being said, I like that the new code gets rid of strange variables, which were introducing complexity for any future evolution (faster variants being in mind). Therefore, in spite of this detrimental side effect, I tend to be in favor of it.	2017-11-28 14:07:03 -08:00
Yann Collet	b71405dc51	removed a bunch of code related to cached literal price optState was used both to evaluate price and to cache cost of previously calculated literals. This created a strong dependency, forcing parser to request cost in a strict order. This limitation is forbids future parser with skipping capabilities. After this patch, caching literals price still exists, but is now explicit, in a stack structure.	2017-11-28 12:32:24 -08:00
Yann Collet	03f30d9dcb	separate rawLiterals, fullLiterals and match costs removed one SET_PRICE() macro invocation	2017-11-28 12:14:46 -08:00
Yann Collet	eee87cd6f2	btopt: minor refactor : removed one SET_PRICE() macro invocation direct assignment makes operation cleaner. Also allows some (very minor) optimization (non-measurable)	2017-11-27 17:18:57 -08:00
Yann Collet	e9d1987fd7	btopt: minor speed optimization matchPrice is always right at beginning	2017-11-27 17:01:51 -08:00
Yann Collet	bd88f633ac	zstreamtest : in `-T#s`, s considered a suffix meaning "seconds" avoid unintentionnally triggering `seedset`, so that seed gets automatically determined when not set.	2017-11-27 12:15:23 -08:00

... 6 7 8 9 10 ...

2693 Commits