AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
bimbashrestha	7b041b552e	Removing assert for rle that doesn't always hold	2019-08-26 12:26:53 -07:00
bimbashrestha	1f2bf77f2a	Using typedef U32 instead of int	2019-08-26 09:00:22 -07:00
bimbashrestha	ba46932492	Removing implicit conversion from const void* to const BYTE* and added constant for threshold	2019-08-26 08:51:34 -07:00
bimbashrestha	0e3ba02cf1	Fixing more test falure errors	2019-08-22 13:54:41 -07:00
bimbashrestha	4faf3a5911	Fixing ci-circle test failure issues	2019-08-22 13:46:15 -07:00
bimbashrestha	cba5350f88	Moving RLE logic to inside ZSTD_compressBlock_internal and adding assert	2019-08-22 12:12:44 -07:00
bimbashrestha	4c90d862e3	Generate RLE blocks in the encoder	2019-08-22 11:27:20 -07:00
Yann Collet	facbe8b2c2	factored the logic selecting lowest match index as suggested by @terrelln	2019-08-05 15:18:43 +02:00
Yann Collet	0b0b83e8f3	fix test 122 it's an unsupported scenario.	2019-08-03 16:51:26 +02:00
Yann Collet	98e7c344cd	fixed strategies btopt+	2019-08-02 14:42:53 +02:00
Yann Collet	b4257b04e7	fixed strategy btlazy2	2019-08-02 14:26:26 +02:00
Yann Collet	5cf1b24aca	fixed strategies greedy, lazy & lazy2 restore dictionary compression ratio	2019-08-02 14:21:39 +02:00
Yann Collet	98692c2838	fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3	2019-08-01 15:58:17 +02:00
Yann Collet	be3d2e2de8	Merge pull request #1679 from ephiepark/dev Restructure the source files	2019-07-19 15:29:07 -07:00
Ephraim Park	1dc98de279	Restructure the source files	2019-07-15 17:39:18 -07:00
Yann Collet	8fb08b68cc	Merge pull request #1681 from facebook/level3 updated double_fast complementary insertion	2019-07-12 16:16:06 -07:00
Nick Terrell	75cfe1dc69	[ldm] Fix bug in overflow correction with large job size (#1678 ) * [ldm] Fix bug in overflow correction with large job size * [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode) * [test] Add test that exposes the bug Sadly the test fails on our CI because it uses too much memory, so I had to comment it out.	2019-07-12 18:45:18 -04:00
Yann Collet	eaeb7f00b5	updated the _extDict variant of double fast	2019-07-12 14:17:17 -07:00
Yann Collet	e8a7f5d3ce	double-fast: changed the trade-off for a smaller positive change same number of complementary insertions, just organized differently (long at `ip-2`, short at `ip-1`).	2019-07-12 11:34:53 -07:00
mgrice	812e8f2a16	perf improvements for zstd decode (#1668 ) * perf improvements for zstd decode tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge) Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization. See how GCC autovectorizes the loop here: https://godbolt.org/z/apr0x0 Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on) After: https://godbolt.org/z/OwO4F8 Note that autovectorization still does not do a good job on the optimized version, so it's turned off\ via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both. silesia benchmark data - second triad of each file is with the original code: file orig compressedratio encode decode change 1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s 2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s 3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s 1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60% 2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01% 3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99% 1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s 2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s 3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s 1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30% 2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20% 3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50% 1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s 2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s 3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s 1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76% 2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54% 3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89% 1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s 2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s 3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s 1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47% 2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72% 3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29% 1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s 2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s 3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s 1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15% 2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98% 3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45% 1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s 2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s 3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s 1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41% 2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42% 3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68% 1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s 2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s 3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s 1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26% 2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36% 3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38% 1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s 2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s 3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s 1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45% 2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93% 3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04% 1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s 2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s 3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s 1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70% 2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85% 3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31% 1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s 2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s 3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s 1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89% 2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43% 3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33% 1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s 2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s 3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s 1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24% 2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99% 3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05% 1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s 2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s 3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s 1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52% 2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00% 3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37% 7.51% * makefile changed to only pass -fno-tree-vectorize to gcc * <Replace this line with a title. Use 1 line only, 67 chars or less> Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__) * fix for warning/error with subtraction of void* pointers * fix c90 conformance issue - ISO C90 forbids mixed declarations and code * Fix assert for negative diff, only when there is no overlap * fix overflow revealed in fuzzing tests * tweak for small speed increase	2019-07-11 18:31:07 -04:00
Yann Collet	d1327738c2	updated double_fast complementary insertion in a way which is more favorable to compression ratio, though very slightly slower (~-1%). More details in the PR.	2019-07-11 15:25:22 -07:00
Yann Collet	b01c1c679f	Merge pull request #1675 from ephiepark/dev Factor out the logic to build sequences	2019-07-10 13:32:31 -07:00
Yann Collet	096714d1b8	Merge pull request #1671 from ephiepark/dev Adding targetCBlockSize param	2019-07-03 17:47:44 -07:00
Ephraim Park	f57ac7b09e	Factor out the logic to build sequences	2019-07-03 15:42:38 -07:00
Ephraim Park	9007701670	Adding targetCBlockSize param	2019-07-03 15:41:52 -07:00
Nick Terrell	6c92ba774e	ZSTD_compressSequences_internal assert op <= oend (#1667 ) When we wrote one byte beyond the end of the buffer for RLE blocks back in 1.3.7, we would then have `op > oend`. That is a problem when we use `oend - op` for the size of the destination buffer, and allows further writes beyond the end of the buffer for the rest of the function. Lets assert that it doesn't happen.	2019-07-02 15:45:47 -07:00
Yann Collet	857e608b51	Merge pull request #1658 from facebook/memset memset() rather than reduceIndex()	2019-07-01 15:01:43 -07:00
Yann Collet	621adde3b2	changed naming to ZSTD_indexTooCloseToMax() Also : minor speed optimization : shortcut to ZSTD_reset_matchState() rather than the full reset process. It still needs to be completed with ZSTD_continueCCtx() for proper initialization. Also : changed position of LDM hash tables in the context, so that the "regular" hash tables can be at a predictable position, hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.	2019-06-24 14:39:29 -07:00
Yann Collet	45c9fbd6d9	prefer memset() rather than reduceIndex() when close to index range limit by disabling continue mode when index is close to limit.	2019-06-21 16:19:21 -07:00
Yann Collet	944e2e9e12	benchfn : added macro macro CONTROL() like assert() but cannot be disabled. proper separation of user contract errors (CONTROL()) and invariant verification (assert()).	2019-06-21 15:58:55 -07:00
Nick Terrell	674534a700	[zstd] Fix data corruption in niche use case * Extract the overflow correction into a helper function. * Load the dictionary `ZSTD_CHUNKSIZE_MAX = 512 MB` bytes at a time and overflow correct between each chunk. Data corruption could happen when all these conditions are true: * You are using multithreading mode * Your overlap size is >= 512 MB (implies window size >= 512 MB) * You are using a strategy >= ZSTD_btlazy * You are compressing more than 4 GB The problem is that when loading a large dictionary we don't do overflow correction. We can only load 512 MB at a time, and may need to do overflow correction before each chunk.	2019-06-21 15:47:31 -07:00
Nick Terrell	4156060ca4	[zstdmt] Update assert to use ZSTD_WINDOWLOG_MAX	2019-06-21 15:39:33 -07:00
Nick Terrell	95e2b430ea	[opt] Add asserts for corruption in ZSTD_updateTree()	2019-06-21 15:22:29 -07:00
Yann Collet	9af909bf35	Merge pull request #1624 from facebook/smallwlog Improves compression ratio for small windowLog	2019-06-14 17:28:21 -07:00
Nick Terrell	cdb9481e38	[libzstd] Optimize ZSTD_insertBt1() for repetitive data We would only skip at most 192 bytes at a time before this diff. This was added to optimize long matches and skip the middle of the match. However, it doesn't handle the case of repetitive data. This patch keeps the optimization, but also handles repetitive data by taking the max of the two return values. ``` > for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 \| command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done strategy=1 0.27 strategy=2 0.23 strategy=3 0.27 strategy=4 0.43 strategy=5 0.56 strategy=6 0.43 strategy=7 0.34 strategy=8 0.34 strategy=9 0.35 ``` At level 19 with multithreading the compressed size of `silesia.tar` regresses 300 bytes, and `enwik8` regresses 100 bytes. In single threaded mode `enwik8` is also within 100 bytes, and I didn't test `silesia.tar`. Fixes Issue #1634.	2019-06-05 20:34:00 -07:00
Yann Collet	80d6ccea79	removed UINT32_MAX apparently not guaranteed on all platforms, replaced by UINT_MAX.	2019-05-31 17:27:07 -07:00
Yann Collet	fce4df3ab7	fixed wrong assert in double_fast	2019-05-31 17:06:28 -07:00
Yann Collet	a968099038	minor code cleaning for new index invalidation strategy	2019-05-31 16:52:37 -07:00
Yann Collet	d605f482c7	make double_fast compatible with new index invalidation strategy	2019-05-31 16:50:04 -07:00
Yann Collet	a30febaeeb	Made fast strategy compatible with new offset validation strategy fast mode does the same thing as before : it pre-emptively invalidates any index that could lead to offset > maxDistance. It's supposed to help speed. But this logic is performed inside zstd_fast, so that other strategies can select a different behavior.	2019-05-31 16:34:55 -07:00
Yann Collet	58adb1059f	extended exact window size to greedy/lazy modes	2019-05-31 16:08:48 -07:00
Yann Collet	bc601bdc6d	first implementation of small window size for btopt noticeably improves compression ratio when window size is small (< 18). enwik7 level 19 windowLog `dev` `smallwlog` improvement 23 3.577 3.577 0.02% 22 3.536 3.538 0.06% 21 3.462 3.467 0.14% 20 3.364 3.377 0.39% 19 3.244 3.272 0.86% 18 3.110 3.166 1.80% 17 2.843 3.057 7.53% 16 2.724 2.943 8.04% 15 2.594 2.822 8.79% 14 2.456 2.686 9.36% 13 2.312 2.523 9.13% 12 2.162 2.361 9.20% 11 2.003 2.182 8.94%	2019-05-31 15:55:12 -07:00
Yann Collet	b13a9207f9	Merge pull request #1623 from facebook/fullbench fullbench minor improvements	2019-05-31 14:40:19 -07:00
Yann Collet	ed38b645db	fullbench: pass proper parameters in scenario 43	2019-05-29 15:26:06 -07:00
Yann Collet	9719fd616c	removed nextToUpdate3 from ZSTD_window it's now a local variable of ZSTD_compressBlock_opt()	2019-05-28 16:18:12 -07:00
Yann Collet	33dabc8c80	get bt matches : made it a bit clearer which parameters are input and output	2019-05-28 16:11:32 -07:00
Yann Collet	327cf6fac1	nextToUpdate3 does not need to be maintained outside of zstd_opt.c It's re-synchronized with nextToUpdate at beginning of each block. It only needs to be tracked from within zstd_opt block parser. Made the logic clear, so that no code tried to maintain this variable. An even better solution would be to make nextToUpdate3 an internal variable of ZSTD_compressBlock_opt_generic(). That would make it possible to remove it from ZSTD_matchState_t, thus restricting its visibility to only where it's actually useful. This would require deeper changes though, since the matchState is the natural structure to transport parameters into and inside the parser.	2019-05-28 15:26:52 -07:00
Yann Collet	6453f8158f	complementary code comments on variables used / impacted during maxDist check	2019-05-28 14:12:16 -07:00
Yann Collet	4baecdf72a	added comments to better understand enforceMaxDist()	2019-05-28 13:15:48 -07:00
Nick Terrell	a17fe4c9e5	[visual] Fix unreachable code warning	2019-04-16 11:32:35 -07:00

1 2 3 4 5 ...

1242 Commits