AuroraMiddleware/zstd

Author	SHA1	Message	Date
Yann Collet	3cbfac1cdb	updated levels 15-20 taking advantage of `btopt` improved speed to tune parameters. Levels 16-19 are stronger than previous release, making the graph more favorable. In theory, I should also update small-size tables, but I got lazy on that one ...	2017-12-14 23:29:00 -08:00
Yann Collet	2cff66b62f	version bump to v1.3.3	2017-12-14 16:11:20 -08:00
Yann Collet	8c41a9cb1e	Merge pull request #951 from facebook/lastBlock saves 3-bytes on small input with streaming API	2017-12-14 15:39:50 -08:00
Yann Collet	a0ac8c895c	Merge pull request #950 from facebook/srcSizeAdaptation fix adaptation on srcSize	2017-12-14 14:48:31 -08:00
Yann Collet	281f06e01f	saves 3-bytes on small input with streaming API zstd streaming API was adding a null-block at end of frame for small input. Reason is : on small input, a single block is enough. ZSTD_CStream would size its input buffer to expect a single block of this size, automatically triggering a flush on reaching this size. Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`). The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame. The solution is to not flush automatically, which is btw the expected behavior. It happens in this case because blocksize is defined with exactly the same size as input. Just adding one-byte is enough to stop triggering the automatic flush. I initially looked at another solution, solving the problem directly in the compression context. But it felt awkward. Now, the underlying compression API `ZSTD_compressContinue()` would take the decision to close a frame on reaching its expected end (`pledgedSrcSize`). This feels awkward, a responsibility over-reach, beyond the definition of this API. ZSTD_compressContinue() is clearly documented as a guaranteed flush, with ZSTD_compressEnd() generating a guaranteed end. I faced a similar issue when trying to port a similar mechanism at the higher streaming layer. Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller, since it did not explicitly request an end of frame. The only sensible action remaining after that is to end the frame with no additional input. This adds additional logic in the ZSTD_CStream state to check this condition. Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?) In the end, just enlarging input buffer by 1 byte feels the least intrusive change. It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code. The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.	2017-12-14 11:47:02 -08:00
Yann Collet	c005df136f	Merge pull request #947 from facebook/fix944 Fix #944	2017-12-14 10:01:52 -08:00
Yann Collet	2e97a6d464	fixed minor declaration-after-statement warning	2017-12-13 18:50:05 -08:00
Yann Collet	5432ef6921	fixes adaptation on srcSize This patch restores capability for each file to receive adapted compression parameters depending on its size. The bug breaking this feature was relatively silly : setting a parameter with a value "0" is supposed to be a no-op. Unfortunately, it would pin down compression parameters as if they were manually set, preventing later automatic adaptation. Unfortunately, I'm currently short of a test case that could check this situation and trigger an error. Compression parameters selection between tableID 0,1,2,3 is largely internal, leaving no trace to outside world, not even in frame header.	2017-12-13 17:45:26 -08:00
Yann Collet	d23eb9a098	zstreamtest : added missing CHECK_Z()	2017-12-13 15:35:49 -08:00
Nick Terrell	22727a7467	Fix cdict compressor repcodes	2017-12-13 11:31:20 -08:00
Yann Collet	e28305fcca	fix #944 : ZSTDMT with large files and dictionary now works correctly windowLog is now enforced from provided compression parameters, instead of being copied blindly from `cdict` where it could be smaller. also : - fix a minor bug in zstreamtest --mt : advanced parameters must be set before init - changed advanced parameter name to ZSTDMT_jobSize	2017-12-12 18:04:58 -08:00
Yann Collet	03832b7aa5	re-added test case messing with revert ... :(	2017-12-12 14:01:54 -08:00
Yann Collet	8a104fda05	Revert "Created a test case which reliably reproduces bug #944 " This reverts commit `5098d1fbe2`.	2017-12-12 12:51:49 -08:00
Yann Collet	5098d1fbe2	Created a test case which reliably reproduces bug #944 in zstreamtest.	2017-12-12 12:48:31 -08:00
Yann Collet	ac8e022806	Merge pull request #943 from facebook/fix942 Fix #942	2017-12-08 13:53:08 -05:00
Yann Collet	dfc697e967	comment clarification	2017-12-08 12:16:49 -05:00
Yann Collet	c029ee1f0b	ZSTD_initCStream_srcSize() considers "0" to mean "unknown" to not break existing programs relying on this behavior. Might be changed to mean "empty" in the future.	2017-12-07 17:13:10 -05:00
Yann Collet	3aa2b27a89	fix #942 : streaming interface does not compress after ZSTD_initCStream() While the final result is still, technically, a frame, the resulting frame expands initial data instead of compressing it. This is because the streaming API creates a tiny 1-byte buffer for input, because it believes input is empty (0-bytes), because in the past, 0 used to mean "unknown" instead. This patch fixes the issue. Todo : add a test which traps the issue.	2017-12-07 02:52:50 -05:00
Yann Collet	c173dbd6e7	no longer supported starting C++17	2017-12-04 18:00:53 -08:00
Yann Collet	7e05ef851a	Merge branch 'dev' into qemu32panic	2017-12-03 11:14:36 -08:00
Yann Collet	5e1f34b7e4	setParameter : no side-effect on setting a compression parameter last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow. It is now a useless operation, so it's removed. No side-effect left when setting a compression parameter.	2017-12-01 21:17:09 -08:00
Yann Collet	78290874a5	fixed Visual warning on minor interface discrepancy	2017-11-29 17:01:14 -08:00
Yann Collet	d3c59edac9	removed long-range-mode tests from `zstreamtest --no-big-tests`	2017-11-29 16:42:20 -08:00
Yann Collet	998a93b784	simplified ZSTD_CCtx_setParametersUsingCCtxParams() Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action. Any action shall be taken at parameter application only (during init). It makes it possible to just copy CCtxParams from external container to internal state, and get rid of the more complex code which was trying to compensate for missing actions.	2017-11-29 16:13:05 -08:00
Yann Collet	f98ee994c4	zstd_opt: added comments, as requested by @terrelln	2017-11-29 15:19:00 -08:00
Yann Collet	bc42bc3b1d	removed one invocation of SET_PRICE() macro	2017-11-28 16:08:56 -08:00
Yann Collet	0a0a212934	zstd_opt: changed cost formula There was a flaw in the formula which compared literal cost with match cost : at a given position, a non-null literal suite is going to be part of next sequence, while if position ends a previous match, to immediately start another match, next sequence will have a litlength of zero. A litlength of zero has a non-null cost. It follows that literals cost should be compared to match cost + litlength==0. Not doing so gave a structural advantage to matches, which would be selected more often. I believe that's what led to the creation of the strange heuristic which added a complex cost to matches. The heuristic was actually compensating. It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar). The problem with this heuristic is that it's hard to understand, and unfortunately, any future change in the parser would impact the way it should be calculated and its effects. The "proper" formula makes it possible to remove this heuristic. Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse. Note that all differences are small (< 0.01 ratio). In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7). I suspect that's because starting statistics are pretty poor (another area of improvement). However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...). It's a pity that zstd -22 gets worse on silesia.tar. That being said, I like that the new code gets rid of strange variables, which were introducing complexity for any future evolution (faster variants being in mind). Therefore, in spite of this detrimental side effect, I tend to be in favor of it.	2017-11-28 14:07:03 -08:00
Yann Collet	b71405dc51	removed a bunch of code related to cached literal price optState was used both to evaluate price and to cache cost of previously calculated literals. This created a strong dependency, forcing parser to request cost in a strict order. This limitation forbids a future parser with skipping capabilities. After this patch, caching literals price still exists, but is now explicit, in a stack structure.	2017-11-28 12:32:24 -08:00
Yann Collet	03f30d9dcb	separate rawLiterals, fullLiterals and match costs removed one SET_PRICE() macro invocation	2017-11-28 12:14:46 -08:00
Yann Collet	eee87cd6f2	btopt: minor refactor : removed one SET_PRICE() macro invocation direct assignment makes operation cleaner. Also allows some (very minor) optimization (non-measurable)	2017-11-27 17:18:57 -08:00
Yann Collet	e9d1987fd7	btopt: minor speed optimization matchPrice is always right at beginning	2017-11-27 17:01:51 -08:00
Yann Collet	bd88f633ac	zstreamtest : in `-T#s`, s considered a suffix meaning "seconds" avoid unintentionally triggering `seedset`, so that seed gets automatically determined when not set.	2017-11-27 12:15:23 -08:00
Yann Collet	f8d5c478af	fixed comment, reported by @gyscos	2017-11-21 10:36:14 -08:00
Yann Collet	4154aec679	fixed comment, as suggested by @terrelln	2017-11-21 10:26:17 -08:00
Yann Collet	899f2a29f6	strategy ZSTD_btopt pinned to (0) variant (faster one)	2017-11-20 11:53:20 -08:00
Yann Collet	3f457264d1	slightly improved compression speed	2017-11-19 14:40:21 -08:00
Yann Collet	42c1e64270	slightly improved ratio at -22 merging of repcode search into btsearch introduced a small compression ratio regression at max level : 1.3.2 : 52728769 after repMerge patch : 52760789 (+32020) A few minor changes have produced this difference. They can be hard to spot. This patch buys back about half of the difference, by no longer inserting position at hc3 when a long match is found there. It feels strangely counter-intuitive, but works : after this patch : 52742555 (-18234)	2017-11-19 14:00:55 -08:00
Yann Collet	99435dbbab	minor : search early-out on sufficient_len for hc3 and rep very very small speed and ratio increases	2017-11-19 12:58:04 -08:00
Yann Collet	d100670045	btopt0 : a bit faster and weaker	2017-11-19 10:38:02 -08:00
Yann Collet	e6da37c430	created (hidden) new strategy btopt0 about ~+10% faster but losing ~0.01 compression ratio (note : amplitude varies a lot depending on files, but direction remains the same)	2017-11-19 10:21:21 -08:00
Yann Collet	e717a5b0dd	zstd_opt: minor speed optimization Calculate reference log2sums only once per series of sequences (as opposed to once per sequence) Also: improved code comments	2017-11-18 16:24:02 -08:00
Yann Collet	d11661c3ec	fix ZSTD_COMPRESSBOUND() macro It was using macro `KB`, which is not defined in `zstd.h`.	2017-11-18 11:16:39 -08:00
Yann Collet	a4a20a4b2f	fix un-initialized memory warning harmless, but cleaner	2017-11-17 15:51:52 -08:00
Yann Collet	23767e950a	fix one UB pointer arithmetic in encoder Instead of calculating distance between 2 memory objects, which is UB, we extract the offset from object 1, and transfer it into object 2.	2017-11-17 13:24:51 -08:00
Yann Collet	cdade555ee	fixed one UB pointer arithmetic	2017-11-17 11:40:08 -08:00
Yann Collet	11e58d9ba4	fixed minor warning warning: void function returning a value (even if the return value is void)	2017-11-16 15:21:30 -08:00
Yann Collet	15768cabb5	fixed some complex scenarios Fixed : multithreading to compress some small data with dictionary Fixed : ZSTD_initCStream_usingCDict() Improved streaming memory usage when pledgedSrcSize is known.	2017-11-16 15:18:18 -08:00
Yann Collet	05dffe43a7	Fixed Btree update ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate. With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree. Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched. Now, when current position is not inserted, zc->nextToUpdate is not updated, expecting ZSTD_updateTree() to fill the tree later on. Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate, so that it no longer depends on a future function to do this job. It took time to get there, as the issue started with a memory sanitizer error. The pb would have been easier to spot with a proper `assert()`. So this patch adds a few of them. Additionally, I discovered that `make test` does not enable `assert()` during CLI tests. This patch enables them. Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt. So this patch also fixes them. - Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed. Now, to avoid this issue, each type is independent. - ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately. - ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN - ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime). - ZSTDMT : nbThreads is automatically clamped on setting the value.	2017-11-16 12:18:56 -08:00
Yann Collet	dfc14579f5	removed wrong assertion	2017-11-15 15:35:56 -08:00
Yann Collet	c55e35b2fc	removed a few specialized traces	2017-11-15 15:04:53 -08:00
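
Note on commit 281f06e01f: the "3-bytes last block" mentioned there is an empty block consisting only of its header. The sketch below illustrates why it costs exactly three bytes, assuming the block header layout from the Zstandard format specification (a 24-bit little-endian value holding a 1-bit last-block flag, a 2-bit block type and a 21-bit block size). The helper name `encode_block_header` and the constant `RAW_BLOCK` are hypothetical and are not part of the zstd API.

```python
RAW_BLOCK = 0  # assumed Block_Type value for a raw (uncompressed) block


def encode_block_header(last_block: bool, block_type: int, block_size: int) -> bytes:
    """Pack the three header fields into 3 little-endian bytes (illustrative only)."""
    if not 0 <= block_type <= 3:
        raise ValueError("block_type must fit in 2 bits")
    if not 0 <= block_size < (1 << 21):
        raise ValueError("block_size must fit in 21 bits")
    # bit 0: last-block flag, bits 1-2: block type, bits 3-23: block size
    value = int(last_block) | (block_type << 1) | (block_size << 3)
    return value.to_bytes(3, "little")


# An empty raw block flagged as last: a header with no payload behind it.
empty_last_block = encode_block_header(True, RAW_BLOCK, 0)
print(len(empty_last_block), empty_last_block.hex())
```

Under these assumptions, closing a frame whose data block was already flushed costs one extra header of this kind, which is the overhead the commit avoids by enlarging the input buffer by one byte.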


1975 Commits