AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Yann Collet	f299fa39ac	fix a subtle issue in continue mode The deep fuzzer tests caught a subtle bug that was probably there for a long time. The impact of the bug is not a crash, or any other clear error signal, rather, it reduces performance, by cutting data into smaller blocks. Eventually, the following test would fail because it produces too many 1-byte blocks, requiring more space than buffer can provide : `./zstreamtest_asan --mt -s3514 -t1678312 -i1678314` The root scenario is as follows : - Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1` - The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`. `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`. The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes. - Complete compression - New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN` trigger "continue mode" - Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only, but in fact, it also depends on `pledgedSrcSize`. - The "old" blocksize (1) is still there, next compression will use this value to cut input into blocks, resulting in more blocks and worse performance than necessary performance. Given the scenario, and its possible variants, I'm surprised it did not show up before. But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger. The above test is a special corner case, where performance is so impacted that it reaches an error case. The fix works, but I'm not completely pleased. I think the current code relies too much on implied relations between variables. This will likely break again in the future when some related part of the code change. Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3. So a longer term fix will have to be considered after the release. To do : create a reliable test case which triggers this scenario for CI tests.	2017-12-19 09:43:03 +01:00
Yann Collet	0a0a212934	zstd_opt: changed cost formula There was a flaw in the formula which compared literal cost with match cost : at a given position, a non-null literal suite is going to be part of next sequence, while if position ends a previous match, to immediately start another match, next sequence will have a litlength of zero. A litlength of zero has a non-null cost. It follows that literals cost should be compared to match cost + litlength==0. Not doing so gave a structural advantage to matches, which would be selected more often. I believe that's what led to the creation of the strange heuristic which added a complex cost to matches. The heuristic was actually compensating. It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar). The problem with this heuristic is that it's hard to understand, and unfortunately, any future change in the parser would impact the way it should be calculated and its effects. The "proper" formula makes it possible to remove this heuristic. Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse. Note that all differences are small (< 0.01 ratio). In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7). I suspect that's because starting statistics are pretty poor (another area of improvement). However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...). It's a pity that zstd -22 gets worse on silesia.tar. That being said, I like that the new code gets rid of strange variables, which were introducing complexity for any future evolution (faster variants being in mind). Therefore, in spite of this detrimental side effect, I tend to be in favor of it.	2017-11-28 14:07:03 -08:00
Yann Collet	cdade555ee	fixed one UB pointer arithmetic	2017-11-17 11:40:08 -08:00
Yann Collet	4202b2e8a6	merged rep search into btMatchSearch but there is a tree corruption somewhere ... bug hunt ongoing	2017-11-14 20:38:52 -08:00
Yann Collet	9a11f70dc3	merged repcode search into BT match search this version has same speed as branch `opt` which is itself 5-10% slower than branch `dev` (no identified reason) It does not compress exactly the same as `opt` or `dev`, maybe because it doesn't stop search after repcodes, leading to sometimes better compression, sometimes worse (by a small margin). warning : _extDict path does not work for the time being This means that benchmark module works, but file module will fail with large files (and high compression level). Objective is to fuse _extDict path into current one, in order to have a single parser to maintain.	2017-11-13 02:23:48 -08:00
Yann Collet	4191efa993	zstd_opt: ensure sufficient_len < ZSTD_OPT_NUM to simplify some tests	2017-11-08 11:24:00 -08:00
Yann Collet	8b6aecf2cb	moved a few structures from `zstd_internal.h` to `zstd_compress.h` which is a more precise scope	2017-11-07 16:03:14 -08:00
Yann Collet	54a827fff0	Merge branch 'dev' into newFormats Fixed conflicts in zstdmt_compress.c	2017-09-27 16:39:40 -07:00
Yann Collet	df4e9bba25	fixed constant errors for gcc in c99 mode C standard does not consider a `static const int` as a constant. This is a problem for initializer, and ZSTD_STATIC_ASSERT(). Replaced by macro values	2017-09-26 14:31:06 -07:00
Yann Collet	9f0b8dfbe9	Merge branch 'dev' into newFormats	2017-09-26 14:22:39 -07:00
Nick Terrell	c233bdbaee	Increase maximum window size * Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail on my Mac. * Maximum window size in 64-bit mode is 2GB, since that is the largest power of 2 that works with the overflow prevention. * Allow `--long=windowLog` to set the window log, along with `--zstd=wlog=#`. These options also set the window size during decompression, but don't override `--memory=#` if it is set. * Present a helpful error message when the window size is too large during decompression. * The long range matcher defaults to a hash log 7 less than the window log, which keeps it at 20 for window log 27. * Keep the default long range matcher window size and the default maximum window size at 27 for the API and CLI. * Add tests that use the maximum window size and hash size for compression and decompression.	2017-09-26 14:00:01 -07:00
Yann Collet	b8d4a3887f	introduced constant ZSTD_frameIdSize within zstd_internal.h This is the size of magic number. Avoids using `4` directly in source code, which is a bit less meaningful.	2017-09-25 15:26:18 -07:00
Nick Terrell	bbe77212ef	[libzstd] Increase MaxOff	2017-09-25 13:36:18 -07:00
Yann Collet	7c3dea42ce	added prototypes for advanced parameters for decompression API required to decode custom formats	2017-09-24 15:57:29 -07:00
Stella Lau	eb3327c10a	Merge branch 'dev' of https://github.com/facebook/zstd into ldm-mergeDev	2017-09-11 15:00:01 -07:00
Yann Collet	3128e03be6	updated license header to clarify dual-license meaning as "or"	2017-09-08 00:09:23 -07:00
Stella Lau	67d4a6161c	Add ldmBucketSizeLog param	2017-09-02 21:55:29 -07:00
Stella Lau	a1f04d518d	Move hashEveryLog to cctxParams and update cli	2017-09-01 15:05:47 -07:00
Stella Lau	767a0b3be1	Move ldm hashLog, bucketLog, and mml to cctxParams	2017-09-01 12:24:59 -07:00
Stella Lau	17d8e0bdcc	Merge remote-tracking branch 'upstream/longRangeMatcher' into ldm-integrate	2017-09-01 10:19:38 -07:00
Stella Lau	8081becadc	Add long distance matching as a CCtxParam	2017-09-01 09:18:58 -07:00
Yann Collet	d7ad99b2ab	Merge branch 'longRangeMatcher' into dev	2017-08-31 18:08:37 -07:00
Stella Lau	6a546efb8c	Add long distance matcher Move last literals section to ZSTD_block_internal	2017-08-31 12:53:19 -07:00
Stella Lau	ee65701720	Minor fixes; remove formatting only changes	2017-08-29 20:27:35 -07:00
Stella Lau	c7a18b7c21	Localize 'dictMode' from cctx to function param	2017-08-29 15:52:24 -07:00
Stella Lau	024098a47d	Fix parameter retrieval from cdict	2017-08-25 17:58:28 -07:00
Stella Lau	2adde898c8	Fix typo with ZSTDMT_parameter	2017-08-25 16:13:40 -07:00
Stella Lau	eb7bbab36a	Remove ZSTD_p_refDictContent and dictContentByRef	2017-08-25 11:11:45 -07:00
Nick Terrell	de6c6bce85	Fix zstd_internal.h for C++ mode	2017-08-24 18:09:50 -07:00
Stella Lau	5bc2c1e982	Add prototype support for customMem with cctxParams	2017-08-23 12:03:30 -07:00
Stella Lau	6f1a21c7e9	Remove formatting-only changes	2017-08-23 10:24:19 -07:00
Stella Lau	23fc0e41fa	Remove 'opaque' naming from internal functions	2017-08-22 14:24:47 -07:00
Stella Lau	8fd1636776	Remove unused functions	2017-08-22 13:33:58 -07:00
Stella Lau	e50ed1fa3a	Fix undefined behavior when srcSize==1	2017-08-22 11:55:42 -07:00
Stella Lau	5b956f4753	Comment out CCtx_param versions of CDict functions	2017-08-21 14:49:16 -07:00
Stella Lau	502031ca10	Use cctxParam version of createCDict internally	2017-08-21 11:00:44 -07:00
Stella Lau	91b30dbe84	Remove test parameter	2017-08-21 10:09:06 -07:00
Stella Lau	f181f33bdf	Disable tests and refactor	2017-08-21 01:59:08 -07:00
Stella Lau	023b24e6d4	Add cctx param tests	2017-08-20 22:55:07 -07:00
Stella Lau	d775519296	Add cctxParam versions of internal functions	2017-08-18 17:37:58 -07:00
Yann Collet	32fb407c9d	updated a bunch of headers for the new license	2017-08-18 16:52:05 -07:00
Stella Lau	399ae013d4	Add function to apply cctx params	2017-08-18 13:01:55 -07:00
Stella Lau	81d89d82a6	Move nbThreads to cctx params	2017-08-18 12:08:57 -07:00
Stella Lau	2300c58a6f	Move dictContentByRef to cctx params	2017-08-18 12:03:16 -07:00
Stella Lau	b6cb2ed8cb	Move dictMode to cctxParams	2017-08-18 11:43:31 -07:00
Stella Lau	c0221124d5	Add function to set opaque parameters	2017-08-17 19:30:22 -07:00
Stella Lau	699f11b4f7	Create opaque parameter structure	2017-08-17 17:33:46 -07:00
Nick Terrell	565e925eb7	[libzstd] Fix FORCE_INLINE macro	2017-08-14 21:12:05 -07:00
Nick Terrell	7a28b9e4a3	[libzstd] Pull optimal parser state out of seqStore_t	2017-07-17 15:29:11 -07:00
Nick Terrell	e198230645	[libzstd] Remove ZSTD_CCtx* argument of ZSTD_compressSequences()	2017-07-17 12:27:24 -07:00

1 2 3

119 Commits