Commit Graph

1261 Commits

Author SHA1 Message Date
W. Felix Handte
786f2266bb TMP 2019-09-09 13:34:08 -04:00
W. Felix Handte
c25283cf00 Disambiguate 'workspace' and 'entropyWorkspace' 2019-09-09 13:34:08 -04:00
W. Felix Handte
ccaac852e8 Normalize Case 'workSpace' -> 'workspace' 2019-09-09 13:27:18 -04:00
Varun S Nair
9816560649 Fixing assert and DEBUGLOG due to ZSTD_CCtx_params parameter change to const pointer 2019-09-05 15:47:17 +05:30
Varun S Nair
771645471f Passing ZSTD_CCtx_params by const pointer 2019-09-05 15:28:30 +05:30
Yann Collet
5198347382
Merge pull request #1744 from bimbashrestha/dev
Generate RLE blocks in the encoder
2019-08-29 15:19:10 -07:00
Bimba Shrestha
c3e3c8bf32 Undoing the last commit (that was an accident) 2019-08-29 12:05:47 -07:00
bimbashrestha
4a1ca5e0a8 Adding method for extracting sequences. 2019-08-29 11:55:12 -07:00
bimbashrestha
e5704bbfdf Added test for multiple blocks of zeros and fixed nit about comments 2019-08-28 08:32:34 -07:00
bimbashrestha
96201d9774 Added bool to cctx and fixed some comment nits 2019-08-26 15:30:41 -07:00
bimbashrestha
991cbc9024 Fixing mixed declaration compiler complaint 2019-08-26 15:00:50 -07:00
bimbashrestha
ce264ce53b Forbiding emission of RLE when its the first block 2019-08-26 14:54:29 -07:00
bimbashrestha
33b6446ca7 Removing accidental method call 2019-08-26 14:34:43 -07:00
bimbashrestha
7b041b552e Removing assert for rle that doesn't always hold 2019-08-26 12:26:53 -07:00
bimbashrestha
1f2bf77f2a Using typedef U32 instead of int 2019-08-26 09:00:22 -07:00
bimbashrestha
ba46932492 Removing implicit conversion from const void* to const BYTE* and added constant for threshold 2019-08-26 08:51:34 -07:00
bimbashrestha
0e3ba02cf1 Fixing more test falure errors 2019-08-22 13:54:41 -07:00
bimbashrestha
4faf3a5911 Fixing ci-circle test failure issues 2019-08-22 13:46:15 -07:00
bimbashrestha
cba5350f88 Moving RLE logic to inside ZSTD_compressBlock_internal and adding assert 2019-08-22 12:12:44 -07:00
Nick Magerko
493f95c7df Fix merge conflicts 2019-08-22 11:51:41 -07:00
bimbashrestha
4c90d862e3 Generate RLE blocks in the encoder 2019-08-22 11:27:20 -07:00
Nick Magerko
c7a24d7a14 Define ZSTD_SRCSIZEHINT_MIN as 0 2019-08-20 13:06:15 -07:00
Nick Magerko
2d39b43906 Use int for srcSizeHint when sensible 2019-08-19 16:49:25 -07:00
Nick Magerko
edf2abf106 Fix fall-through case 2019-08-19 12:32:43 -07:00
Nick Magerko
dffbac5f89 Add --size-hint=# option 2019-08-19 11:38:49 -07:00
Yann Collet
782bfb858a fixed very minor inefficiency (nbSeq==127)
The nbSeq "short" format (1-byte)
is compatible with any value < 128.

However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.

Spotted by Intel's Smita Kumar.
2019-08-15 16:41:34 +02:00
Yann Collet
facbe8b2c2 factored the logic selecting lowest match index
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
Yann Collet
0b0b83e8f3 fix test 122
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet
98e7c344cd fixed strategies btopt+ 2019-08-02 14:42:53 +02:00
Yann Collet
b4257b04e7 fixed strategy btlazy2 2019-08-02 14:26:26 +02:00
Yann Collet
5cf1b24aca fixed strategies greedy, lazy & lazy2
restore dictionary compression ratio
2019-08-02 14:21:39 +02:00
Yann Collet
98692c2838 fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3 2019-08-01 15:58:17 +02:00
Yann Collet
be3d2e2de8
Merge pull request #1679 from ephiepark/dev
Restructure the source files
2019-07-19 15:29:07 -07:00
Ephraim Park
1dc98de279 Restructure the source files 2019-07-15 17:39:18 -07:00
Yann Collet
8fb08b68cc
Merge pull request #1681 from facebook/level3
updated double_fast complementary insertion
2019-07-12 16:16:06 -07:00
Nick Terrell
75cfe1dc69
[ldm] Fix bug in overflow correction with large job size (#1678)
* [ldm] Fix bug in overflow correction with large job size

* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)

* [test] Add test that exposes the bug

Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
2019-07-12 18:45:18 -04:00
Yann Collet
eaeb7f00b5 updated the _extDict variant of double fast 2019-07-12 14:17:17 -07:00
Yann Collet
e8a7f5d3ce double-fast: changed the trade-off for a smaller positive change
same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).
2019-07-12 11:34:53 -07:00
mgrice
812e8f2a16 perf improvements for zstd decode (#1668)
* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand.  The sites where wildcopy is invoked have an interesting distribution of lengths to be copied.  The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
 via attribute and flag.  I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.

    silesia benchmark data - second triad of each file is with the original code:

    file      orig        compressedratio     encode              decode           change
    1#dickens   10192446->   4268865(2.388),       198.9MB/s           709.6MB/s
    2#dickens   10192446->   3876126(2.630),       128.7MB/s           552.5MB/s
    3#dickens   10192446->   3682956(2.767),       104.6MB/s             537MB/s
    1#dickens   10192446->   4268865(2.388),       195.4MB/s           659.5MB/s     7.60%
    2#dickens   10192446->   3876126(2.630),         127MB/s           516.3MB/s     7.01%
    3#dickens   10192446->   3682956(2.767),         105MB/s           479.5MB/s    11.99%
    1#mozilla   51220480->  20117517(2.546),       285.4MB/s           734.9MB/s
    2#mozilla   51220480->  19067018(2.686),       220.8MB/s           686.3MB/s
    3#mozilla   51220480->  18508283(2.767),       152.2MB/s           669.4MB/s
    1#mozilla   51220480->  20117517(2.546),       283.4MB/s           697.9MB/s     5.30%
    2#mozilla   51220480->  19067018(2.686),       225.9MB/s             665MB/s     3.20%
    3#mozilla   51220480->  18508283(2.767),       154.5MB/s           640.6MB/s     4.50%
    1#mr         9970564->   3840242(2.596),       262.4MB/s           899.8MB/s
    2#mr         9970564->   3600976(2.769),       181.2MB/s           717.9MB/s
    3#mr         9970564->   3563987(2.798),       116.3MB/s             620MB/s
    1#mr         9970564->   3840242(2.596),       253.2MB/s           827.3MB/s     8.76%
    2#mr         9970564->   3600976(2.769),       177.4MB/s           655.4MB/s     9.54%
    3#mr         9970564->   3563987(2.798),       111.2MB/s           564.2MB/s     9.89%
    1#nci       33553445->   2849306(11.78),       575.2MB/s ,        1335.8MB/s
    2#nci       33553445->   2890166(11.61),       509.3MB/s ,        1238.1MB/s
    3#nci       33553445->   2857408(11.74),         431MB/s ,        1210.7MB/s
    1#nci       33553445->   2849306(11.78),       565.4MB/s ,        1220.2MB/s     9.47%
    2#nci       33553445->   2890166(11.61),       508.2MB/s ,        1128.4MB/s     9.72%
    3#nci       33553445->   2857408(11.74),       429.1MB/s ,        1097.7MB/s    10.29%
    1#ooffice    6152192->   3590954(1.713),       231.4MB/s ,         662.6MB/s
    2#ooffice    6152192->   3323931(1.851),       162.8MB/s ,         592.6MB/s
    3#ooffice    6152192->   3145625(1.956),        99.9MB/s ,         549.6MB/s
    1#ooffice    6152192->   3590954(1.713),       224.7MB/s ,         624.2MB/s     6.15%
    2#ooffice    6152192->   3323931 (1.851),        155MB/s ,         564.5MB/s     4.98%
    3#ooffice    6152192->   3145625(1.956),       101.1MB/s ,         521.2MB/s     5.45%
    1#osdb      10085684->   3739042(2.697),       271.9MB/s           876.4MB/s
    2#osdb      10085684->   3493875(2.887),       208.2MB/s             857MB/s
    3#osdb      10085684->   3515831(2.869),       135.3MB/s           805.4MB/s
    1#osdb      10085684->   3739042(2.697),       257.4MB/s           793.8MB/s    10.41%
    2#osdb      10085684->   3493875(2.887),       209.7MB/s           776.1MB/s    10.42%
    3#osdb      10085684->   3515831(2.869),       130.6MB/s           727.7MB/s    10.68%
    1#reymont    6627202->   2152771(3.078),       198.9MB/s           696.2MB/s
    2#reymont    6627202->   2071140(3.200),         170MB/s           595.2MB/s
    3#reymont    6627202->   1953597(3.392),       128.5MB/s           609.7MB/s
    1#reymont    6627202->   2152771(3.078),       199.6MB/s           655.2MB/s     6.26%
    2#reymont    6627202->   2071140(3.200),       168.2MB/s           554.4MB/s     7.36%
    3#reymont    6627202->   1953597(3.392),       128.7MB/s           557.4MB/s     9.38%
    1#samba     21606400->   5510994(3.921),       338.1MB/s            1066MB/s
    2#samba     21606400->   5240208(4.123),       258.7MB/s           992.3MB/s
    3#samba     21606400->   5003358(4.318),       200.2MB/s           991.1MB/s
    1#samba     21606400->   5510994(3.921),       330.8MB/s             974MB/s     9.45%
    2#samba     21606400->   5240208(4.123),       257.9MB/s           919.4MB/s     7.93%
    3#samba     21606400->   5003358(4.318),       198.5MB/s           908.9MB/s     9.04%
    1#sao        7251944->   6256401(1.159),       194.6MB/s           602.2MB/s
    2#sao        7251944->   5808761(1.248),       128.2MB/s           532.1MB/s
    3#sao        7251944->   5556318(1.305),          73MB/s           509.4MB/s
    1#sao        7251944->   6256401(1.159),       198.7MB/s           580.7MB/s     3.70%
    2#sao        7251944->   5808761(1.248),       129.1MB/s           502.7MB/s     5.85%
    3#sao        7251944->   5556318(1.305),        74.6MB/s           493.1MB/s     3.31%
    1#webster   41458703->  13692222(3.028),       222.3MB/s             752MB/s
    2#webster   41458703->  12842646(3.228),       157.6MB/s           532.2MB/s
    3#webster   41458703->  12191964(3.400),         124MB/s           468.5MB/s
    1#webster   41458703->  13692222(3.028),       219.7MB/s             697MB/s     7.89%
    2#webster   41458703->  12842646(3.228),       153.9MB/s           495.4MB/s     7.43%
    3#webster   41458703->  12191964(3.400),       124.8MB/s           444.8MB/s     5.33%
    1#xml        5345280->    696652(7.673),         485MB/s ,        1333.9MB/s
    2#xml        5345280->    681492(7.843),       405.2MB/s ,        1237.5MB/s
    3#xml        5345280->    639057(8.364),       328.5MB/s ,        1281.3MB/s
    1#xml        5345280->    696652(7.673),       473.1MB/s ,        1232.4MB/s     8.24%
    2#xml        5345280->    681492(7.843),       398.6MB/s ,        1145.9MB/s     7.99%
    3#xml        5345280->    639057(8.364),       327.1MB/s ,          1175MB/s     9.05%
    1#x-ray      8474240->   6772557(1.251),       521.3MB/s           762.6MB/s
    2#x-ray      8474240->   6684531(1.268),       230.5MB/s           688.5MB/s
    3#x-ray      8474240->   6166679(1.374),        68.7MB/s           478.8MB/s
    1#x-ray      8474240->   6772557(1.251),       502.8MB/s           736.7MB/s     3.52%
    2#x-ray      8474240->   6684531(1.268),       224.4MB/s             662MB/s     4.00%
    3#x-ray      8474240->   6166679(1.374),        67.3MB/s           437.8MB/s     9.37%

                                                                                     7.51%

* makefile changed to only pass -fno-tree-vectorize to gcc

* <Replace this line with a title. Use 1 line only, 67 chars or less>

Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)

* fix for warning/error with subtraction of void* pointers

* fix c90 conformance issue - ISO C90 forbids mixed declarations and code

* Fix assert for negative diff, only when there is no overlap

* fix overflow revealed in fuzzing tests

* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Yann Collet
d1327738c2 updated double_fast complementary insertion
in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).

More details in the PR.
2019-07-11 15:25:22 -07:00
Yann Collet
b01c1c679f
Merge pull request #1675 from ephiepark/dev
Factor out the logic to build sequences
2019-07-10 13:32:31 -07:00
Yann Collet
096714d1b8
Merge pull request #1671 from ephiepark/dev
Adding targetCBlockSize param
2019-07-03 17:47:44 -07:00
Ephraim Park
f57ac7b09e Factor out the logic to build sequences 2019-07-03 15:42:38 -07:00
Ephraim Park
9007701670 Adding targetCBlockSize param 2019-07-03 15:41:52 -07:00
Nick Terrell
6c92ba774e
ZSTD_compressSequences_internal assert op <= oend (#1667)
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Lets assert that it doesn't happen.
2019-07-02 15:45:47 -07:00
Yann Collet
857e608b51
Merge pull request #1658 from facebook/memset
memset() rather than reduceIndex()
2019-07-01 15:01:43 -07:00
Yann Collet
621adde3b2 changed naming to ZSTD_indexTooCloseToMax()
Also : minor speed optimization :
shortcut to ZSTD_reset_matchState() rather than the full reset process.
It still needs to be completed with ZSTD_continueCCtx() for proper initialization.

Also : changed position of LDM hash tables in the context,
so that the "regular" hash tables can be at a predictable position,
hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.
2019-06-24 14:39:29 -07:00
Yann Collet
45c9fbd6d9 prefer memset() rather than reduceIndex() when close to index range limit
by disabling continue mode when index is close to limit.
2019-06-21 16:19:21 -07:00
Yann Collet
944e2e9e12 benchfn : added macro macro CONTROL()
like assert() but cannot be disabled.
proper separation of user contract errors (CONTROL())
and invariant verification (assert()).
2019-06-21 15:58:55 -07:00
Nick Terrell
674534a700 [zstd] Fix data corruption in niche use case
* Extract the overflow correction into a helper function.
* Load the dictionary `ZSTD_CHUNKSIZE_MAX = 512 MB` bytes at a time
  and overflow correct between each chunk.

Data corruption could happen when all these conditions are true:

* You are using multithreading mode
* Your overlap size is >= 512 MB (implies window size >= 512 MB)
* You are using a strategy >= ZSTD_btlazy
* You are compressing more than 4 GB

The problem is that when loading a large dictionary we don't do
overflow correction. We can only load 512 MB at a time, and may
need to do overflow correction before each chunk.
2019-06-21 15:47:31 -07:00