AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Yann Collet	77fc611abe	Merge branch 'benchfix' into dubtlazy	2017-12-29 19:16:58 +01:00
Yann Collet	d228b6b0d0	btlazy2 : optimization for dictionary compression we want the dictionary table to be fully sorted, not just lazily filled. Dictionary loading is a bit more intensive, but it saves cpu cycles for match search during compression.	2017-12-29 19:14:18 +01:00
Yann Collet	02f64ef955	btlazy2: fixed interaction between unsortedMark and reduceTable	2017-12-29 19:08:51 +01:00
Yann Collet	4c7f137bd2	add test case which reliably produces btlazy2 rescale overflow bug The unsorted_mark is handled like any index, which fails after a rescale.	2017-12-29 17:40:36 +01:00
Yann Collet	64482c2c97	fixed bug in dubt the chain of unsorted candidates could grow beyond lowLimit.	2017-12-29 17:04:37 +01:00
Yann Collet	f36da5b4d9	minor speed optimization : index overflow prevention new code supposed to be easier to auto-vectorize	2017-12-29 14:40:33 +01:00
Yann Collet	ffc335bccf	complete ignore list fuzz tests artifacts	2017-12-29 14:39:49 +01:00
Yann Collet	5235d8d6ba	first implementation of delayed update for btlazy2 This is a pretty nice speed win. The new strategy consists in stacking new candidates as if it was a hash chain. Then, only if there is a need to actually consult the chain, they are batch-updated, before starting the match search itself. This is supposed to be beneficial when skipping positions, which happens a lot when using lazy strategy. The baseline performance for btlazy2 on my laptop is : 15#calgary.tar : 3265536 -> 955985 (3.416), 7.06 MB/s , 618.0 MB/s 15#enwik7 : 10000000 -> 3067341 (3.260), 4.65 MB/s , 521.2 MB/s 15#silesia.tar : 211984896 -> 58095131 (3.649), 6.20 MB/s , 682.4 MB/s (only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt) After this patch, and keeping all parameters identical, speed is increased by a pretty good margin (+30-50%), but compression ratio suffers a bit : 15#calgary.tar : 3265536 -> 958060 (3.408), 9.12 MB/s , 621.1 MB/s 15#enwik7 : 10000000 -> 3078318 (3.249), 6.37 MB/s , 525.1 MB/s 15#silesia.tar : 211984896 -> 58444111 (3.627), 9.89 MB/s , 680.4 MB/s That's because I kept `1<<searchLog` as a maximum number of candidates to update. But for a hash chain, this represents the total number of candidates in the chain, while for the binary, it represents the maximum depth of searches. Keep in mind that a lot of candidates won't even be visited in the btree, since they are filtered out by the binary sort. As a consequence, in the new implementation, the effective depth of the binary tree is substantially shorter. To compensate, it's enough to increase `searchLog` value. 
Here is the result after adding just +1 to searchLog (level 15 setting in this patch): 15#calgary.tar : 3265536 -> 956311 (3.415), 8.32 MB/s , 611.4 MB/s 15#enwik7 : 10000000 -> 3067655 (3.260), 5.43 MB/s , 535.5 MB/s 15#silesia.tar : 211984896 -> 58113144 (3.648), 8.35 MB/s , 679.3 MB/s aka, almost the same compression ratio as before, but with a noticeable speed increase (+20-30%). This modification makes btlazy2 more competitive. A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.	2017-12-28 16:58:57 +01:00
Yann Collet	63aebdf9ea	windows ci : removed fullbench-dll from appveyor the test is broken previous version was too Windows-specific. It needs to be either refactored to be cross-platform, or be removed if it doesn't bring any value (which is indeed unclear).	2017-12-27 19:27:54 +01:00
Yann Collet	2126ca8a5f	%.o objects files in /tests Recipes in /tests rebuild everything from source for each target. zstd is still a "small" project, so it's not prohibitive, yet rebuilding the same files over and over represents substantial redundant work. This patch replaces .c files from /lib by their corresponding .o files. They cannot be compiled and stored directly within /lib, since /tests triggers additional debug capabilities unwelcome in a release binary. So the resulting .o are stored directly within /tests. It turns out, it's difficult to find several targets using exactly the same rules. Using only the default rules (debug enabled, multi-threading disabled, no legacy) a surprisingly small number of targets share their work. It's because, in many cases, there are additional modifications requested : some targets are 32-bits, some enable multi-threading, some enable legacy support, some disable asserts, some want a different kind of sanitizer, etc. I created 2 sets of object files : with and without multithreading. Several targets share their work, saving compilation time when running `make all`. Also, obviously, when modifying one source file, only this one needs rebuilding. For targets requiring some different setting, building from source *.c remains the rule. The new rules have been tested within `-j` parallel compilation, and work fine with it.	2017-12-27 17:58:27 +01:00
Yann Collet	c707c6e9f2	fix: bench can accept hlog custom parameter was ignored during initialization	2017-12-27 13:32:05 +01:00
Shawn Landden	6ff43c0051	get soversion right	2017-12-24 10:05:43 -08:00
Yann Collet	d55aea3c3b	Merge pull request #961 from shawnl/patch-2 fix unbounded range	2017-12-22 07:09:18 +01:00
Yann Collet	b7d0d96ac6	Merge pull request #960 from shawnl/dev meson: support different legacy levels.	2017-12-22 07:07:52 +01:00
Shawn Landden	914d983879	fix unbounded range I think you meant 8 MiB or smaller, instead of an unbounded (and illogical) range	2017-12-21 16:15:12 -08:00
Shawn Landden	daffe435c0	meson: support different legacy levels. Default to v0.4.0+	2017-12-21 15:47:38 -08:00
Yann Collet	8879802866	Merge pull request #959 from shawnl/dev meson: fix build	2017-12-20 11:20:07 +01:00
Shawn Landden	3ddfa42fe8	meson: fix build used absolute paths which are deprecated in meson, also missing some sources that got split also move source files each to their own line so future diffs are clearer.	2017-12-19 22:02:03 -08:00
Yann Collet	473362e922	Merge pull request #958 from facebook/continueCCtx fix a subtle issue in continue mode	2017-12-20 00:12:50 +01:00
Yann Collet	cafedcbbe4	ZSTD_resetCCtx_internal: fixed order of arguments params1 was swapped with params2. This used to be a non-issue when testing for strict equality, but now that some tests look for "sufficient size" `<=`, order matters.	2017-12-19 21:49:04 +01:00
Yann Collet	9096088f45	changed variable name for clarity, suggested by @terrelln	2017-12-19 21:20:46 +01:00
Yann Collet	574e75354b	fuzzer: ensure existence of CHECK_Z macro beyond OS-X systems	2017-12-19 11:24:14 +01:00
Yann Collet	d88c671663	added test case for "wrong blockSize in continue mode"	2017-12-19 10:16:09 +01:00
Yann Collet	f299fa39ac	fix a subtle issue in continue mode The deep fuzzer tests caught a subtle bug that was probably there for a long time. The impact of the bug is not a crash, or any other clear error signal, rather, it reduces performance, by cutting data into smaller blocks. Eventually, the following test would fail because it produces too many 1-byte blocks, requiring more space than the buffer can provide : `./zstreamtest_asan --mt -s3514 -t1678312 -i1678314` The root scenario is as follows : - Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1` - The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`. `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`. The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes. - Complete compression - New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN` triggers "continue mode" - Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only, but in fact, it also depends on `pledgedSrcSize`. - The "old" blocksize (1) is still there, next compression will use this value to cut input into blocks, resulting in more blocks and worse performance than necessary. Given the scenario, and its possible variants, I'm surprised it did not show up before. But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger. The above test is a special corner case, where performance is so impacted that it reaches an error case. The fix works, but I'm not completely pleased. I think the current code relies too much on implied relations between variables. This will likely break again in the future when some related part of the code changes. 
Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3. So a longer term fix will have to be considered after the release. To do : create a reliable test case which triggers this scenario for CI tests.	2017-12-19 09:43:03 +01:00
Yann Collet	aa0c09bdc9	Merge pull request #957 from facebook/nbThreads zstdmt via compress_generic: reduce opportunity to free/create mtctx	2017-12-18 16:04:14 -08:00
Yann Collet	5c2f2ebfdb	zstdmt via compress_generic: reduce opportunity to free/create mtctx `zstreamtest --newapi` (and `--opaqueapi`) create and destroy way too many threads, resulting in failure of tsan tests, and potentially connected to the qemu flaky tests. This is because, at each test, the nb of threads can be changed (random). The `--no-big-tests` directive reduces this choice to 1/2 threads, in order to limit memory usage, especially for qemu and 32-bits builds. Unfortunately, swapping between 1 and 2 threads is enough to constantly create/destroy new mtctx. This patch takes advantage of the following property : via compress_generic, no internal mtctx is needed for nbThreads < 2. As a consequence, when nbThreads == 2, the currently active mtctx is necessarily good. This dramatically reduces the nb of thread creations when invoking `zstreamtest --newapi --no-big-tests` (only when parent cctx itself is created, which is randomized to 1/256 tests). Expected outcome : - at a minimum : tsan tests shall now work continuously without exploding the thread counter - at best : flaky qemu tests on `zstreamtest --newapi --no-big-tests` may stop being flaky, due to less stress from constant thread creation/destruction Real world impact : minimal, I don't expect users to constantly change `nbThreads` between each invocation. If `nbThreads` remains stable, existing implementation re-uses existing mtctx. Also : `zstreamtest --newapi` but without `--no-big-tests` doesn't benefit as much, since this test can select a random `nbThreads` value between 1 and 4. The current patch only reduces opportunity to free/create mtctx (for example : 2->1->2 doesn't need a new mtctx) but doesn't completely eliminate it, since `nbThreads` can still change between 2/3/4. A more complete solution could be to only use 2 out of 4 allocated threads, thus keeping the pool at a constant size. This would require a larger change to `POOL_*` api though.	2017-12-16 12:48:13 -08:00
Yann Collet	569e06b91e	Merge pull request #955 from facebook/readme Readme	2017-12-15 13:53:37 -08:00
Yann Collet	64d1701c64	remove last paragraph	2017-12-15 13:28:34 -08:00
Yann Collet	78de28239f	minor readme formatting update	2017-12-15 13:26:39 -08:00
Yann Collet	f0b0789215	Merge pull request #953 from facebook/clevels update levels 15-20	2017-12-15 11:11:50 -08:00
Yann Collet	cc9e026866	Merge pull request #952 from terrelln/merge-end [fileio] Merge end loop for small optimization	2017-12-15 10:27:53 -08:00
Yann Collet	3cbfac1cdb	updated levels 15-20 taking advantage of `btopt` improved speed to tune parameters. Levels 16-19 are stronger than previous release, making the graph more favorable. In theory, I should also update small-size tables, but I got lazy on that one ...	2017-12-14 23:29:00 -08:00
Yann Collet	2cff66b62f	version bump to v1.3.3	2017-12-14 16:11:20 -08:00
Nick Terrell	f48d34edba	[fileio] Merge end loop for small optimization	2017-12-14 15:52:24 -08:00
Yann Collet	8c41a9cb1e	Merge pull request #951 from facebook/lastBlock saves 3-bytes on small input with streaming API	2017-12-14 15:39:50 -08:00
Yann Collet	a0ac8c895c	Merge pull request #950 from facebook/srcSizeAdaptation fix adaptation on srcSize	2017-12-14 14:48:31 -08:00
Yann Collet	a0e0985d38	added test on small file on top of test on small stream	2017-12-14 13:32:24 -08:00
Yann Collet	281f06e01f	saves 3-bytes on small input with streaming API zstd streaming API was adding a null-block at end of frame for small input. Reason is : on small input, a single block is enough. ZSTD_CStream would size its input buffer to expect a single block of this size, automatically triggering a flush on reaching this size. Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`). The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame. The solution is to not flush automatically, which is btw the expected behavior. It happens in this case because blocksize is defined with exactly the same size as input. Just adding one byte is enough to stop triggering the automatic flush. I initially looked at another solution, solving the problem directly in the compression context. But it felt awkward. Now, the underlying compression API `ZSTD_compressContinue()` would take the decision to close a frame on reaching its expected end (`pledgedSrcSize`). This feels awkward, a responsibility over-reach, beyond the definition of this API. ZSTD_compressContinue() is clearly documented as a guaranteed flush, with ZSTD_compressEnd() generating a guaranteed end. I faced a similar issue when trying to port a similar mechanism at the higher streaming layer. Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller, since it did not explicitly request an end of frame. The only sensible action remaining after that is to end the frame with no additional input. This adds additional logic in the ZSTD_CStream state to check this condition. Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?) In the end, just enlarging input buffer by 1 byte feels the least intrusive change. 
It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code. The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.	2017-12-14 11:47:02 -08:00
Yann Collet	5b2ce2c043	Merge pull request #946 from terrelln/r-o Allow -o with multiple files	2017-12-14 10:02:05 -08:00
Yann Collet	c005df136f	Merge pull request #947 from facebook/fix944 Fix #944	2017-12-14 10:01:52 -08:00
Yann Collet	2e97a6d464	fixed minor declaration-after-statement warning	2017-12-13 18:50:05 -08:00
Yann Collet	5432ef6921	fixes adaptation on srcSize This patch restores capability for each file to receive adapted compression parameters depending on its size. The bug breaking this feature was relatively silly : setting a parameter with a value "0" is supposed to be a no-op. Unfortunately, it would pin down compression parameters as if they were manually set, preventing later automatic adaptation. Unfortunately, I'm currently short of a test case that could check this situation and trigger an error. Compression parameters selection between tableID 0,1,2,3 is largely internal, leaving no trace to outside world, not even in frame header.	2017-12-13 17:45:26 -08:00
Nick Terrell	4680e85bdf	Allow -o with multiple files	2017-12-13 17:44:34 -08:00
Yann Collet	4d0dfafa7b	Merge pull request #949 from terrelln/rrm [fileio] Refuse to remove non-regular file	2017-12-13 17:36:39 -08:00
Yann Collet	d23eb9a098	zstreamtest : added missing CHECK_Z()	2017-12-13 15:35:49 -08:00
Nick Terrell	90d38f6a53	Merge pull request #945 from terrelln/dev Fix cdict compressor repcodes	2017-12-13 14:24:21 -08:00
Nick Terrell	82bc8fe0cc	[fileio] Refuse to remove non-regular file	2017-12-13 13:38:26 -08:00
Yann Collet	aa81aac2dd	Merge pull request #948 from terrelln/mb [fileio] Fix window size MB calculation	2017-12-13 12:17:05 -08:00
Yann Collet	311878dec3	Improved tests - building cli from /tests preserves potential flags in MOREFLAGS (such as asan/usan) - MT dictionary tests check for MT capability (MT is not enabled by default for zstd32)	2017-12-13 11:48:30 -08:00
Nick Terrell	22727a7467	Fix cdict compressor repcodes	2017-12-13 11:31:20 -08:00
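
Commit f299fa39ac above describes how a stale `blockSize` of 1 survives into "continue mode" and cuts the next input into 1-byte blocks. The following is a minimal sketch of that arithmetic, not the library's code: the `sketch_*` names are hypothetical, and the rule "window size is the smaller of `1 << windowLog` and `pledgedSrcSize`, block size is capped at 128 KB" is a simplified model of what the commit message says. The 3-byte block header is taken from the zstd frame format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest block size in zstd (128 KB). */
#define SKETCH_BLOCKSIZE_MAX ((size_t)1 << 17)
/* Each block in a zstd frame starts with a 3-byte header. */
#define SKETCH_BLOCK_HEADER_SIZE ((size_t)3)

/* Hypothetical helper: the window shrinks to the pledged source size when
 * that is smaller than 1 << window_log, and never drops below 1.
 * An unknown size is modelled as UINT64_MAX. */
static size_t sketch_window_size(unsigned window_log, uint64_t pledged_src_size)
{
    uint64_t max_window, w;
    assert(window_log < 32);  /* keep the sketch safe on 32-bit size_t */
    max_window = (uint64_t)1 << window_log;
    w = pledged_src_size < max_window ? pledged_src_size : max_window;
    return w < 1 ? 1 : (size_t)w;
}

/* Hypothetical helper: block size is the window size, capped at 128 KB. */
static size_t sketch_block_size(unsigned window_log, uint64_t pledged_src_size)
{
    size_t const w = sketch_window_size(window_log, pledged_src_size);
    return w < SKETCH_BLOCKSIZE_MAX ? w : SKETCH_BLOCKSIZE_MAX;
}

/* Number of blocks needed to cut src_size bytes with a given block size. */
static size_t sketch_block_count(size_t src_size, size_t block_size)
{
    assert(block_size > 0);
    return (src_size + block_size - 1) / block_size;
}

/* Bytes spent on block headers alone. */
static size_t sketch_header_overhead(size_t src_size, size_t block_size)
{
    return sketch_block_count(src_size, block_size) * SKETCH_BLOCK_HEADER_SIZE;
}
```

In this model, a first compression with `pledgedSrcSize=1` yields a block size of 1. If that value is reused for a later 1000-byte input, the input is cut into 1000 blocks and the headers alone cost 3000 bytes, whereas recomputing the block size for an unknown source size gives a single block.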

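Commit 281f06e01f above explains why enlarging the streaming input buffer by one byte saves a 3-byte null block on small inputs. The toy model below illustrates the reasoning under stated assumptions: it is not `ZSTD_CStream`, the `sketch_*` names are hypothetical, and it only counts blocks. A block is flushed whenever the buffer becomes full, and the "end" directive always emits one last block, even an empty one.

```c
#include <assert.h>
#include <stddef.h>

/* Each block in a zstd frame starts with a 3-byte header. */
#define SKETCH_STREAM_BLOCK_HEADER ((size_t)3)

/* Toy model of a streaming input buffer of in_buff_size bytes.
 * The caller feeds src_size bytes in chunks of at most chunk_size bytes,
 * then issues "end". Returns the number of blocks written to the frame. */
static size_t sketch_blocks_emitted(size_t in_buff_size, size_t src_size,
                                    size_t chunk_size)
{
    size_t filled = 0;
    size_t blocks = 0;
    size_t remaining = src_size;
    assert(in_buff_size > 0 && chunk_size > 0);
    while (remaining > 0) {
        size_t const room = in_buff_size - filled;
        size_t n = remaining < chunk_size ? remaining : chunk_size;
        if (n > room) n = room;
        filled += n;
        remaining -= n;
        if (filled == in_buff_size) {  /* buffer full: automatic flush */
            blocks++;
            filled = 0;
        }
    }
    return blocks + 1;  /* "end" emits the last block, possibly empty */
}

/* Bytes spent on block headers for the whole frame. */
static size_t sketch_stream_header_bytes(size_t in_buff_size, size_t src_size,
                                         size_t chunk_size)
{
    return sketch_blocks_emitted(in_buff_size, src_size, chunk_size)
         * SKETCH_STREAM_BLOCK_HEADER;
}
```

With a 10-byte input and a 10-byte buffer, the last byte fills the buffer and triggers a flush before "end" arrives, so the frame carries two blocks, the second one empty. With an 11-byte buffer the automatic flush never fires and the frame carries one block, which is the 3-byte saving the commit title refers to.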

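Commit 5c2f2ebfdb above reduces how often the multi-threading context (mtctx) is freed and re-created when `nbThreads` changes between tests. The sketch below compares two simplified policies by counting creations over a sequence of `nbThreads` values. Both policies are assumptions reconstructed from the commit message, and the `sketch_*` names are hypothetical; this is not the code in `compress_generic`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model of the previous behaviour: any change of nbThreads drops
 * the current mtctx, and a new one is created whenever nbThreads > 1. */
static size_t sketch_creations_before(const unsigned *nb_threads, size_t n)
{
    unsigned setting = 1;  /* last requested nbThreads, single-threaded at start */
    size_t creations = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        assert(nb_threads[i] >= 1);
        if (nb_threads[i] == setting) continue;  /* stable value: reuse */
        if (nb_threads[i] > 1) creations++;      /* old mtctx freed, new one built */
        setting = nb_threads[i];
    }
    return creations;
}

/* Assumed model of the patched behaviour: nbThreads < 2 needs no mtctx, so
 * the live one is simply kept; a new mtctx is built only when nbThreads >= 2
 * and differs from the thread count of the live mtctx. */
static size_t sketch_creations_after(const unsigned *nb_threads, size_t n)
{
    unsigned mtctx_threads = 0;  /* thread count of the live mtctx, 0 = none */
    size_t creations = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        assert(nb_threads[i] >= 1);
        if (nb_threads[i] < 2) continue;              /* keep the live mtctx */
        if (nb_threads[i] == mtctx_threads) continue; /* live mtctx still fits */
        creations++;
        mtctx_threads = nb_threads[i];
    }
    return creations;
}
```

For the sequence 2, 1, 2, 1, 2 the first model creates three contexts and the second creates one, which matches the "2->1->2 doesn't need a new mtctx" remark. For 2, 3, 4, 2 both models create four, which matches the note that changes between 2/3/4 are not eliminated.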
4667 Commits