Nick Terrell
0932de54bc
[dictBuilder] Fix deadlock in *COVER error case
...
The COVER and FASTCOVER dictionary builders can deadlock when
dictionary construction errors, likely because there are too few
samples, or too few distinct dmers. The deadlock only occurs when
there are errors.
Fixes #1746 .
2019-08-26 18:19:29 -07:00
Nick Magerko
493f95c7df
Fix merge conflicts
2019-08-22 11:51:41 -07:00
Nick Terrell
54ad33448c
Merge pull request #1737 from terrelln/legacy-fix
...
[legacy] Fix buffer overflow in v0.2 and v0.4 raw literals decompression
2019-08-21 10:10:24 -07:00
Yann Collet
38b6428fcd
Merge pull request #1725 from emaste/dev
...
remove extraneous doubled ;s
2019-08-21 05:19:30 -07:00
Yann Collet
fe0877c664
Merge pull request #1721 from facebook/seq127
...
fixed very minor inefficiency (nbSeq==127)
2019-08-21 05:19:12 -07:00
Nick Terrell
07f22d465d
[legacy] Fix buffer overflow in v0.2 and v0.4 raw literals decompression
...
Extends the fix in PR#1722 to v0.2 and v0.4. These aren't built into
zstd by default, and v0.5 onward are not affected.
I only add the `srcSize > BLOCKSIZE` check to v0.4 because the comments
say that it must hold, but the equivalent comment isn't present in v0.2.
Credit to OSS-Fuzz.
2019-08-20 17:13:04 -07:00
Nick Magerko
de6a6c7364
Fix ZSTD_SRCSIZEHINT_MIN typo
2019-08-20 13:07:51 -07:00
Nick Magerko
c7a24d7a14
Define ZSTD_SRCSIZEHINT_MIN as 0
2019-08-20 13:06:15 -07:00
Nick Magerko
2d39b43906
Use int for srcSizeHint when sensible
2019-08-19 16:49:25 -07:00
Nick Magerko
09894dc2eb
Add mention of regression with poor size hints
2019-08-19 13:41:36 -07:00
Nick Magerko
fee8fbcddf
Make upper bound INT_MAX
2019-08-19 12:58:54 -07:00
Nick Magerko
edf2abf106
Fix fall-through case
2019-08-19 12:32:43 -07:00
Nick Magerko
dffbac5f89
Add --size-hint=# option
2019-08-19 11:38:49 -07:00
Ed Maste
b81d7cc6a0
remove extraneous doubled ;s
2019-08-15 21:17:06 -04:00
W. Felix Handte
a42bbb4e05
Fix Buffer Overflow in Legacy (v0.3) Raw Literals Decompression
2019-08-15 14:28:30 -04:00
Yann Collet
782bfb858a
fixed very minor inefficiency (nbSeq==127)
...
The nbSeq "short" format (1-byte)
is compatible with any value < 128.
However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.
Spotted by Intel's Smita Kumar.
2019-08-15 16:41:34 +02:00
Yann Collet
01b2331ad1
bumped version number
...
to v1.4.3
2019-08-05 17:17:16 +02:00
Yann Collet
61936ba42a
Merge pull request #1705 from josepho0918/dev
...
Add support for IAR C/C++ Compiler for Arm
2019-08-05 15:57:28 +02:00
Yann Collet
facbe8b2c2
factored the logic selecting lowest match index
...
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
Yann Collet
0b0b83e8f3
fix test 122
...
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet
98e7c344cd
fixed strategies btopt+
2019-08-02 14:42:53 +02:00
Yann Collet
b4257b04e7
fixed strategy btlazy2
2019-08-02 14:26:26 +02:00
Yann Collet
5cf1b24aca
fixed strategies greedy, lazy & lazy2
...
restore dictionary compression ratio
2019-08-02 14:21:39 +02:00
Yann Collet
98692c2838
fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3
2019-08-01 15:58:17 +02:00
Joseph Chen
3855bc4295
Add support for IAR C/C++ Compiler for Arm
2019-07-29 15:25:58 +08:00
W. Felix Handte
8083581f9a
Bump Library Version Number to 1.4.2
2019-07-24 17:35:19 -04:00
Nick Terrell
e6edcfa795
[legacy] Fix bug in zstd-0.5 decoder
...
The match length and literal length extra bytes could either
by 2 bytes or 3 bytes in version 0.5. All earlier verions were
always 3 bytes, and later version didn't have dumps.
The bug, introduced by commit 0fd322f812
,
was triggered when the last dump was a 2-byte dump, because we didn't
separate that case from a 3-byte dump, and thought we were over-reading.
I've tested this fix with every zstd version < 1.0.0 on the buggy file,
and we are now always successfully decompressing with the right
checksum.
Fixes #1693 .
2019-07-22 13:05:09 -07:00
Yann Collet
be3d2e2de8
Merge pull request #1679 from ephiepark/dev
...
Restructure the source files
2019-07-19 15:29:07 -07:00
Vivek Miglani
c7be7d2efb
Fixing compressed block size checks
2019-07-17 12:53:15 -07:00
Ephraim Park
1dc98de279
Restructure the source files
2019-07-15 17:39:18 -07:00
Vivek Miglani
3f108f82fb
Return error if block size exceeds maximum
2019-07-15 12:10:21 -07:00
Yann Collet
8fb08b68cc
Merge pull request #1681 from facebook/level3
...
updated double_fast complementary insertion
2019-07-12 16:16:06 -07:00
Nick Terrell
75cfe1dc69
[ldm] Fix bug in overflow correction with large job size ( #1678 )
...
* [ldm] Fix bug in overflow correction with large job size
* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)
* [test] Add test that exposes the bug
Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
2019-07-12 18:45:18 -04:00
Yann Collet
eaeb7f00b5
updated the _extDict variant of double fast
2019-07-12 14:17:17 -07:00
Yann Collet
e8a7f5d3ce
double-fast: changed the trade-off for a smaller positive change
...
same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).
2019-07-12 11:34:53 -07:00
mgrice
812e8f2a16
perf improvements for zstd decode ( #1668 )
...
* perf improvements for zstd decode
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.
silesia benchmark data - second triad of each file is with the original code:
file orig compressedratio encode decode change
1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s
2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s
3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s
1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60%
2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01%
3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99%
1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s
2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s
3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s
1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30%
2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20%
3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50%
1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s
2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s
3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s
1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76%
2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54%
3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89%
1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s
2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s
3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s
1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47%
2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72%
3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29%
1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s
2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s
3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s
1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15%
2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98%
3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45%
1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s
2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s
3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s
1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41%
2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42%
3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68%
1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s
2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s
3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s
1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26%
2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36%
3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38%
1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s
2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s
3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s
1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45%
2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93%
3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04%
1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s
2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s
3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s
1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70%
2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85%
3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31%
1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s
2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s
3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s
1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89%
2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43%
3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33%
1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s
2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s
3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s
1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24%
2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99%
3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05%
1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s
2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s
3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s
1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52%
2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00%
3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37%
7.51%
* makefile changed to only pass -fno-tree-vectorize to gcc
* <Replace this line with a title. Use 1 line only, 67 chars or less>
Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix c90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Yann Collet
d1327738c2
updated double_fast complementary insertion
...
in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).
More details in the PR.
2019-07-11 15:25:22 -07:00
Yann Collet
b01c1c679f
Merge pull request #1675 from ephiepark/dev
...
Factor out the logic to build sequences
2019-07-10 13:32:31 -07:00
Yann Collet
b8ec4b0fd6
updated version number (to v1.4.1)
...
also : added doc on context re-use, as suggested by @scherepanov at #1676
2019-07-09 11:43:59 -07:00
Yann Collet
096714d1b8
Merge pull request #1671 from ephiepark/dev
...
Adding targetCBlockSize param
2019-07-03 17:47:44 -07:00
Ephraim Park
f57ac7b09e
Factor out the logic to build sequences
2019-07-03 15:42:38 -07:00
Ephraim Park
9007701670
Adding targetCBlockSize param
2019-07-03 15:41:52 -07:00
Nick Terrell
6c92ba774e
ZSTD_compressSequences_internal assert op <= oend ( #1667 )
...
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Lets assert that it doesn't happen.
2019-07-02 15:45:47 -07:00
Yann Collet
857e608b51
Merge pull request #1658 from facebook/memset
...
memset() rather than reduceIndex()
2019-07-01 15:01:43 -07:00
Yann Collet
4d611ca405
Merge pull request #1664 from ephiepark/dev
...
decodecorpus
2019-07-01 14:13:49 -07:00
Tyler-Tran
c55d2e7ba3
Adding shrinking flag for cover and fastcover ( #1656 )
...
* Changed ERROR(GENERIC) excluding inits
* editing git ignore
* Edited init functions to size_t returns
* moved declarations earlier
* resolved issues with changes to init functions
* fixed style and an error check
* attempting to add tests that might trigger changes
* added && die to cases expecting to fail
* resolved no die on expected failed command
* fixed accel to be incorrect value
* Adding an automated shrinking option
* Fixing build
* finalizing fixes
* fix?
* Removing added comment in cover.h
* Styling fixes
* Merging with fb dev
* removing megic number for default regression
* Requested revisions
* fixing support for fast cover
* fixing casting errors
* parenthesis fix
* fixing some build nits
* resolving travis ci syntax
* might resolve all compilation issues
* removed unused variable
* remodeling the selectDict function
* fixing bad memory access
* fixing error checks
* fixed erroring check in selectDict
* fixing mixed declarations
* modify mixed declaration
* fixing nits and adding test cases
* Adding requested changes + fixed bug for error checking
* switched double comparison from != to <
* fixed declaration typing
* refactoring COVER_best_finish() and changing shrinkDict
* removing the const's
* modifying ZDICT_optimizeTrainFromBuffer_cover functions
* fixing potential bad memcpy
* fixing the error function for dict size
2019-06-27 16:26:57 -07:00
Ephraim Park
c7c1ba3a19
Fix a constraint stricter than the spec
2019-06-26 16:43:37 -07:00
Yann Collet
621adde3b2
changed naming to ZSTD_indexTooCloseToMax()
...
Also : minor speed optimization :
shortcut to ZSTD_reset_matchState() rather than the full reset process.
It still needs to be completed with ZSTD_continueCCtx() for proper initialization.
Also : changed position of LDM hash tables in the context,
so that the "regular" hash tables can be at a predictable position,
hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.
2019-06-24 14:39:29 -07:00
Yann Collet
45c9fbd6d9
prefer memset() rather than reduceIndex() when close to index range limit
...
by disabling continue mode when index is close to limit.
2019-06-21 16:19:21 -07:00
Yann Collet
944e2e9e12
benchfn : added macro macro CONTROL()
...
like assert() but cannot be disabled.
proper separation of user contract errors (CONTROL())
and invariant verification (assert()).
2019-06-21 15:58:55 -07:00