AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
W. Felix Handte	2abe0145b1	Improve Comments a Bit	2019-09-09 13:34:08 -04:00
W. Felix Handte	7a2416a863	Allocate CDict in Workspace (Rather than in Separate Allocation)	2019-09-09 13:34:08 -04:00
W. Felix Handte	65057cf009	Rewrite ZSTD_initStaticCCtx to Alloc CCtx in Workspace	2019-09-09 13:34:08 -04:00
W. Felix Handte	58b69ab15c	Only the CCtx Itself Needs to be Cleared during Static CCtx Init	2019-09-09 13:34:08 -04:00
W. Felix Handte	88c2fcd0ee	Align Alloc Pointer When Transitioning from Buffers to Aligned Allocs	2019-09-09 13:34:08 -04:00
W. Felix Handte	e936b73889	Remove Overly-Restrictive Assert	2019-09-09 13:34:08 -04:00
W. Felix Handte	75d574368b	When Loading Dict By Copy, Always Put it in the Workspace	2019-09-09 13:34:08 -04:00
W. Felix Handte	e69b67e33a	Alloc Tables Separately	2019-09-09 13:34:08 -04:00
W. Felix Handte	6177354b36	Begin Introducing Phases	2019-09-09 13:34:08 -04:00
W. Felix Handte	786f2266bb	TMP	2019-09-09 13:34:08 -04:00
W. Felix Handte	c25283cf00	Disambiguate 'workspace' and 'entropyWorkspace'	2019-09-09 13:34:08 -04:00
W. Felix Handte	ccaac852e8	Normalize Case 'workSpace' -> 'workspace'	2019-09-09 13:27:18 -04:00
Varun S Nair	9816560649	Fixing assert and DEBUGLOG due to ZSTD_CCtx_params parameter change to const pointer	2019-09-05 15:47:17 +05:30
Varun S Nair	771645471f	Passing ZSTD_CCtx_params by const pointer	2019-09-05 15:28:30 +05:30
Yann Collet	5198347382	Merge pull request #1744 from bimbashrestha/dev Generate RLE blocks in the encoder	2019-08-29 15:19:10 -07:00
Bimba Shrestha	c3e3c8bf32	Undoing the last commit (that was an accident)	2019-08-29 12:05:47 -07:00
bimbashrestha	4a1ca5e0a8	Adding method for extracting sequences.	2019-08-29 11:55:12 -07:00
bimbashrestha	e5704bbfdf	Added test for multiple blocks of zeros and fixed nit about comments	2019-08-28 08:32:34 -07:00
bimbashrestha	96201d9774	Added bool to cctx and fixed some comment nits	2019-08-26 15:30:41 -07:00
bimbashrestha	991cbc9024	Fixing mixed declaration compiler complaint	2019-08-26 15:00:50 -07:00
bimbashrestha	ce264ce53b	Forbidding emission of RLE when it's the first block	2019-08-26 14:54:29 -07:00
bimbashrestha	33b6446ca7	Removing accidental method call	2019-08-26 14:34:43 -07:00
bimbashrestha	7b041b552e	Removing assert for rle that doesn't always hold	2019-08-26 12:26:53 -07:00
bimbashrestha	1f2bf77f2a	Using typedef U32 instead of int	2019-08-26 09:00:22 -07:00
bimbashrestha	ba46932492	Removing implicit conversion from const void* to const BYTE* and added constant for threshold	2019-08-26 08:51:34 -07:00
bimbashrestha	0e3ba02cf1	Fixing more test failure errors	2019-08-22 13:54:41 -07:00
bimbashrestha	4faf3a5911	Fixing ci-circle test failure issues	2019-08-22 13:46:15 -07:00
bimbashrestha	cba5350f88	Moving RLE logic to inside ZSTD_compressBlock_internal and adding assert	2019-08-22 12:12:44 -07:00
Nick Magerko	493f95c7df	Fix merge conflicts	2019-08-22 11:51:41 -07:00
bimbashrestha	4c90d862e3	Generate RLE blocks in the encoder	2019-08-22 11:27:20 -07:00
Nick Magerko	c7a24d7a14	Define ZSTD_SRCSIZEHINT_MIN as 0	2019-08-20 13:06:15 -07:00
Nick Magerko	2d39b43906	Use int for srcSizeHint when sensible	2019-08-19 16:49:25 -07:00
Nick Magerko	edf2abf106	Fix fall-through case	2019-08-19 12:32:43 -07:00
Nick Magerko	dffbac5f89	Add --size-hint=# option	2019-08-19 11:38:49 -07:00
Yann Collet	782bfb858a	fixed very minor inefficiency (nbSeq==127). The nbSeq "short" (1-byte) format is compatible with any value < 128. However, the code cautiously accepted only values < 127. This is not an error, because the general 2-byte format is also compatible with small values < 128; hence the inefficiency never triggered any warning. Spotted by Intel's Smita Kumar.	2019-08-15 16:41:34 +02:00
Yann Collet	facbe8b2c2	factored the logic selecting lowest match index as suggested by @terrelln	2019-08-05 15:18:43 +02:00
Yann Collet	0b0b83e8f3	fix test 122: it's an unsupported scenario.	2019-08-03 16:51:26 +02:00
Yann Collet	98e7c344cd	fixed strategies btopt+	2019-08-02 14:42:53 +02:00
Yann Collet	b4257b04e7	fixed strategy btlazy2	2019-08-02 14:26:26 +02:00
Yann Collet	5cf1b24aca	fixed strategies greedy, lazy & lazy2: restore dictionary compression ratio	2019-08-02 14:21:39 +02:00
Yann Collet	98692c2838	fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3	2019-08-01 15:58:17 +02:00
Yann Collet	be3d2e2de8	Merge pull request #1679 from ephiepark/dev Restructure the source files	2019-07-19 15:29:07 -07:00
Ephraim Park	1dc98de279	Restructure the source files	2019-07-15 17:39:18 -07:00
Yann Collet	8fb08b68cc	Merge pull request #1681 from facebook/level3 updated double_fast complementary insertion	2019-07-12 16:16:06 -07:00
Nick Terrell	75cfe1dc69	[ldm] Fix bug in overflow correction with large job size (#1678)	2019-07-12 18:45:18 -04:00
    * [ldm] Fix bug in overflow correction with large job size
    * [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)
    * [test] Add test that exposes the bug
      Sadly, the test fails on our CI because it uses too much memory, so I had to comment it out.
Yann Collet	eaeb7f00b5	updated the _extDict variant of double fast	2019-07-12 14:17:17 -07:00
Yann Collet	e8a7f5d3ce	double-fast: changed the trade-off for a smaller positive change; same number of complementary insertions, just organized differently (long at `ip-2`, short at `ip-1`).	2019-07-12 11:34:53 -07:00
mgrice	812e8f2a16	perf improvements for zstd decode (#1668)	2019-07-11 18:31:07 -04:00
    TL;DR: 7.5% average decode speedup on the Silesia corpus at compression levels 1-3 (Sandy Bridge).

    Background: while investigating zstd performance differences between clang and gcc, I noticed that even though gcc vectorizes the loop in wildcopy, it does not do so as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of copy lengths: the loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c that invokes wildcopy handles the latter well, but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

    See how GCC autovectorizes the loop here: https://godbolt.org/z/apr0x0
    Here is the code after this diff has been applied (the left-hand side is the good one; the right-hand side has the vectorizer on): https://godbolt.org/z/OwO4F8

    Note that autovectorization still does not do a good job on the optimized version, so it is turned off via both an attribute and a flag. I found that neither the attribute nor the command-line flag alone was entirely successful in turning off vectorization, which is why both are used.

    Silesia benchmark data (sizes in bytes, speeds in MB/s; "new" is with this change, "orig" is the original code; "Dec change" is the new decode speed relative to the original):

    File     Lvl  Orig size  Compressed (ratio)  Enc new  Dec new  Enc orig  Dec orig  Dec change
    dickens  1    10192446   4268865 (2.388)     198.9    709.6    195.4     659.5     7.60%
    dickens  2    10192446   3876126 (2.630)     128.7    552.5    127       516.3     7.01%
    dickens  3    10192446   3682956 (2.767)     104.6    537      105       479.5     11.99%
    mozilla  1    51220480   20117517 (2.546)    285.4    734.9    283.4     697.9     5.30%
    mozilla  2    51220480   19067018 (2.686)    220.8    686.3    225.9     665       3.20%
    mozilla  3    51220480   18508283 (2.767)    152.2    669.4    154.5     640.6     4.50%
    mr       1    9970564    3840242 (2.596)     262.4    899.8    253.2     827.3     8.76%
    mr       2    9970564    3600976 (2.769)     181.2    717.9    177.4     655.4     9.54%
    mr       3    9970564    3563987 (2.798)     116.3    620      111.2     564.2     9.89%
    nci      1    33553445   2849306 (11.78)     575.2    1335.8   565.4     1220.2    9.47%
    nci      2    33553445   2890166 (11.61)     509.3    1238.1   508.2     1128.4    9.72%
    nci      3    33553445   2857408 (11.74)     431      1210.7   429.1     1097.7    10.29%
    ooffice  1    6152192    3590954 (1.713)     231.4    662.6    224.7     624.2     6.15%
    ooffice  2    6152192    3323931 (1.851)     162.8    592.6    155       564.5     4.98%
    ooffice  3    6152192    3145625 (1.956)     99.9     549.6    101.1     521.2     5.45%
    osdb     1    10085684   3739042 (2.697)     271.9    876.4    257.4     793.8     10.41%
    osdb     2    10085684   3493875 (2.887)     208.2    857      209.7     776.1     10.42%
    osdb     3    10085684   3515831 (2.869)     135.3    805.4    130.6     727.7     10.68%
    reymont  1    6627202    2152771 (3.078)     198.9    696.2    199.6     655.2     6.26%
    reymont  2    6627202    2071140 (3.200)     170      595.2    168.2     554.4     7.36%
    reymont  3    6627202    1953597 (3.392)     128.5    609.7    128.7     557.4     9.38%
    samba    1    21606400   5510994 (3.921)     338.1    1066     330.8     974       9.45%
    samba    2    21606400   5240208 (4.123)     258.7    992.3    257.9     919.4     7.93%
    samba    3    21606400   5003358 (4.318)     200.2    991.1    198.5     908.9     9.04%
    sao      1    7251944    6256401 (1.159)     194.6    602.2    198.7     580.7     3.70%
    sao      2    7251944    5808761 (1.248)     128.2    532.1    129.1     502.7     5.85%
    sao      3    7251944    5556318 (1.305)     73       509.4    74.6      493.1     3.31%
    webster  1    41458703   13692222 (3.028)    222.3    752      219.7     697       7.89%
    webster  2    41458703   12842646 (3.228)    157.6    532.2    153.9     495.4     7.43%
    webster  3    41458703   12191964 (3.400)    124      468.5    124.8     444.8     5.33%
    xml      1    5345280    696652 (7.673)      485      1333.9   473.1     1232.4    8.24%
    xml      2    5345280    681492 (7.843)      405.2    1237.5   398.6     1145.9    7.99%
    xml      3    5345280    639057 (8.364)      328.5    1281.3   327.1     1175      9.05%
    x-ray    1    8474240    6772557 (1.251)     521.3    762.6    502.8     736.7     3.52%
    x-ray    2    8474240    6684531 (1.268)     230.5    688.5    224.4     662       4.00%
    x-ray    3    8474240    6166679 (1.374)     68.7     478.8    67.3      437.8     9.37%

    Average decode change: 7.51%

    * makefile changed to only pass -fno-tree-vectorize to gcc
    * Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
    * fix for warning/error with subtraction of void* pointers
    * fix C90 conformance issue: ISO C90 forbids mixed declarations and code
    * Fix assert for negative diff, only when there is no overlap
    * fix overflow revealed in fuzzing tests
    * tweak for small speed increase
Yann Collet	d1327738c2	updated double_fast complementary insertion in a way which is more favorable to compression ratio, though very slightly slower (~-1%). More details in the PR.	2019-07-11 15:25:22 -07:00
Yann Collet	b01c1c679f	Merge pull request #1675 from ephiepark/dev Factor out the logic to build sequences	2019-07-10 13:32:31 -07:00

1270 Commits
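Several of W. Felix Handte's commits above carve objects out of one caller-provided workspace:

- 65057cf009 allocates the CCtx there.
- 7a2416a863 allocates the CDict there.
- 88c2fcd0ee aligns the allocation pointer when switching to aligned allocations.

The log does not show that code. The following is a minimal sketch of the general bump-pointer idea, assuming a single flat buffer with a forward-moving cursor. The names workspace_t, workspace_init and workspace_alloc_aligned are hypothetical; they are not zstd's internal API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-pointer workspace: every object is carved out of one
 * caller-provided buffer. Not zstd's actual implementation. */
typedef struct {
    unsigned char *begin;  /* start of the caller-provided buffer */
    unsigned char *end;    /* one past the last usable byte */
    unsigned char *cursor; /* next free byte */
} workspace_t;

static void workspace_init(workspace_t *ws, void *buf, size_t size)
{
    ws->begin = (unsigned char *)buf;
    ws->end = ws->begin + size;
    ws->cursor = ws->begin;
}

/* Reserves `size` bytes aligned to `align` (a power of two) by first
 * rounding the cursor up. Returns NULL when the workspace is exhausted. */
static void *workspace_alloc_aligned(workspace_t *ws, size_t size, size_t align)
{
    uintptr_t p, aligned;
    size_t padding, remaining;
    assert(align != 0 && (align & (align - 1)) == 0);
    p = (uintptr_t)ws->cursor;
    aligned = (p + (align - 1)) & ~(uintptr_t)(align - 1);
    padding = (size_t)(aligned - p);
    remaining = (size_t)(ws->end - ws->cursor);
    if (padding > remaining || size > remaining - padding)
        return NULL;
    ws->cursor += padding + size;
    return (void *)aligned;
}
```

Every object lives inside the buffer the caller supplied, so a statically initialized context (compare ZSTD_initStaticCCtx in 65057cf009) needs no further heap allocation. The trade-off is that the caller must size the buffer up front.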
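Pull request #1744 (4c90d862e3 and its follow-ups) makes the encoder emit RLE blocks, and ce264ce53b forbids doing so for the first block. An RLE block stores a single byte repeated for the whole block, so it can only represent input in which every byte is the same. Below is a minimal sketch of that test plus the first-block rule. The helper names are hypothetical. The size threshold mentioned in ba46932492 is not modeled, because its value is not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of src[0..size) equals src[0].
 * Loop variable declared up front, in C90 style. */
static int block_is_rle(const unsigned char *src, size_t size)
{
    size_t i;
    assert(size > 0);
    for (i = 1; i < size; i++) {
        if (src[i] != src[0])
            return 0;
    }
    return 1;
}

/* Hypothetical decision helper: never emit RLE for the first block
 * (per ce264ce53b), otherwise only when the whole block is one repeated byte. */
static int should_emit_rle(int is_first_block, const unsigned char *src, size_t size)
{
    if (is_first_block || size == 0)
        return 0;
    return block_is_rle(src, size);
}
```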
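Commit 782bfb858a concerns the Number_of_Sequences field. In the zstd format (RFC 8878), the first byte selects one of three encodings:

- Below 128, the first byte is the count itself.
- From 128 to 254, it starts a 2-byte encoding.
- 255 starts a 3-byte encoding offset by 0x7F00.

The sketch below follows that layout, with hypothetical helper names. It uses the 1-byte form for every value below 128, which is the behavior the commit restores.

```c
#include <assert.h>
#include <stddef.h>

/* Offset used by the 3-byte form, per the zstd format specification. */
#define LONGNBSEQ 0x7F00u

/* Hypothetical encoder for Number_of_Sequences; returns bytes written. */
static size_t write_nb_seq(unsigned char *dst, unsigned nbSeq)
{
    assert(nbSeq < LONGNBSEQ + 0x10000u);
    if (nbSeq < 128) {                 /* 1-byte form: 0..127 */
        dst[0] = (unsigned char)nbSeq;
        return 1;
    }
    if (nbSeq < LONGNBSEQ) {           /* 2-byte form: 128..0x7EFF */
        dst[0] = (unsigned char)((nbSeq >> 8) + 0x80);
        dst[1] = (unsigned char)(nbSeq & 0xFF);
        return 2;
    }
    dst[0] = 0xFF;                     /* 3-byte form: 0x7F00 and above */
    dst[1] = (unsigned char)((nbSeq - LONGNBSEQ) & 0xFF);
    dst[2] = (unsigned char)((nbSeq - LONGNBSEQ) >> 8);
    return 3;
}

/* Hypothetical decoder; stores the number of bytes read in *consumed. */
static unsigned read_nb_seq(const unsigned char *src, size_t *consumed)
{
    unsigned const byte0 = src[0];
    if (byte0 < 128) {
        *consumed = 1;
        return byte0;
    }
    if (byte0 < 255) {
        *consumed = 2;
        return ((byte0 - 128) << 8) + src[1];
    }
    *consumed = 3;
    return src[1] + ((unsigned)src[2] << 8) + LONGNBSEQ;
}
```

Under this layout, a count of exactly 127 written with the 2-byte form comes out as 0x80 0x7F, which still decodes to 127. This is why the old `< 127` cutoff only cost one byte and never produced an error.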
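Commit facbe8b2c2 factors out the logic that selects the lowest match index, but the log does not show the code. This sketch assumes a common formulation: a match may reach back neither more than the window size (1 << windowLog) nor below the oldest still-valid index, whichever limit is closer to the current position. The function name and parameters are hypothetical, and the real helper may handle further cases such as dictionaries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the smallest index a match starting at
 * `current` may reference. `lowestValid` is the oldest index whose data is
 * still available; the window limits matches to 1 << windowLog bytes back. */
static uint32_t lowest_match_index(uint32_t current, uint32_t lowestValid,
                                   unsigned windowLog)
{
    uint32_t maxDistance;
    assert(windowLog < 32);
    assert(current >= lowestValid);
    maxDistance = (uint32_t)1 << windowLog;
    /* If valid data reaches further back than the window, clamp to the
     * window start; otherwise everything down to lowestValid is usable. */
    return (current - lowestValid > maxDistance) ? current - maxDistance
                                                 : lowestValid;
}
```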
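Commit e8a7f5d3ce reorganizes the double-fast strategy's complementary insertions. After a match, position ip-2 goes into the long-hash table and ip-1 into the short-hash table. A sketch of just that step follows. The hash functions, the hashed lengths (8 and 4 bytes) and the table size are assumptions for illustration, not zstd's actual choices.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HLOG 12 /* hypothetical table size: 1 << 12 entries per table */

/* Hypothetical multiplicative hash over 8 bytes (native byte order). */
static uint32_t hash8(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return (uint32_t)((v * 0x9E3779B97F4A7C15ULL) >> (64 - HLOG));
}

/* Hypothetical multiplicative hash over 4 bytes (native byte order). */
static uint32_t hash4(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return (uint32_t)(v * 2654435761U) >> (32 - HLOG);
}

/* After a match ends at position ip (relative to base), record ip-2 in the
 * long-hash table and ip-1 in the short-hash table. The caller must ensure
 * at least 8 readable bytes from base + ip - 2. */
static void insert_complementary(uint32_t *hashLong, uint32_t *hashSmall,
                                 const unsigned char *base, uint32_t ip)
{
    assert(ip >= 2);
    hashLong[hash8(base + ip - 2)] = ip - 2;
    hashSmall[hash4(base + ip - 1)] = ip - 1;
}
```

A compressor that jumps past each match would otherwise never index the positions just before the match end. Recording them lets later searches find matches starting there.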
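Commit 812e8f2a16 tunes wildcopy, the decoder's copy routine. A "wild" copy moves data in fixed-size chunks and may write past the requested length, so the common short copy finishes in one iteration with no tail loop.

The sketch below makes these assumptions:

- a 16-byte chunk;
- non-overlapping buffers with enough slack after both the source and the destination;
- the hypothetical names wild_copy16, COPY_CHUNK and NO_TREE_VECTORIZE.

The macro mirrors the commit's note that the "no-tree-vectorize" attribute must be skipped on clang, which also defines __GNUC__.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COPY_CHUNK 16 /* hypothetical chunk size */

/* Disable GCC's autovectorizer for this function only; clang also defines
 * __GNUC__ but does not support this attribute, so it is excluded. */
#if defined(__GNUC__) && !defined(__clang__)
#  define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define NO_TREE_VECTORIZE
#endif

/* Copies at least `length` bytes in COPY_CHUNK-byte steps. It may read and
 * write up to COPY_CHUNK - 1 bytes past `length` (COPY_CHUNK bytes when
 * length is 0), so both buffers need that much slack and must not overlap. */
NO_TREE_VECTORIZE
static void wild_copy16(void *dst, const void *src, size_t length)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *const end = d + length;
    do {
        memcpy(d, s, COPY_CHUNK);
        d += COPY_CHUNK;
        s += COPY_CHUNK;
    } while (d < end);
}
```

The do/while form always performs the first chunk unconditionally. That matches the commit's observation that the trip count is rarely above 1.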
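The decode-change figures in 812e8f2a16 match the formula new decode speed / original decode speed - 1. For dickens at level 1, 709.6 / 659.5 - 1 ≈ 7.60%. The overall 7.51% equals the arithmetic mean of the 36 per-run figures. A small helper reproducing both calculations follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage change of the new decode speed relative to the original one. */
static double decode_change_pct(double new_mbps, double orig_mbps)
{
    assert(orig_mbps > 0.0);
    return (new_mbps / orig_mbps - 1.0) * 100.0;
}

/* Arithmetic mean of `count` values. */
static double mean(const double *values, size_t count)
{
    double sum = 0.0;
    size_t i;
    assert(count > 0);
    for (i = 0; i < count; i++)
        sum += values[i];
    return sum / (double)count;
}
```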