W. Felix Handte
da88c35d41
Stop Assuming Tables are Adjacent
2019-10-10 13:40:16 -04:00
W. Felix Handte
35c30d6ca7
Poison Unused Workspace Memory
2019-10-10 13:40:16 -04:00
Bimba Shrestha
36528b96c4
Manually moving instead of memcpy on decoder and using genBuffer()
2019-10-03 09:26:51 -07:00
Bimba Shrestha
61ec4c2e7f
Cleaning sequence parsing logic
2019-10-03 06:42:40 -07:00
Bimba Shrestha
c04245b257
Replacing assert with memory_allocation error code throw
2019-09-23 15:42:16 -07:00
Bimba Shrestha
be0bebd24e
Adding test and null check for malloc
2019-09-23 15:08:18 -07:00
Nick Terrell
5cb7615f1f
Add UNUSED_ATTR to ZSTD_storeSeq()
2019-09-20 21:37:13 -07:00
Nick Terrell
5dc0a1d659
HINT_INLINE ZSTD_storeSeq()
...
Clang on Mac wasn't inlining `ZSTD_storeSeq()` in level 1, which was
causing a 5% performance regression. This fixes it.
2019-09-20 16:39:27 -07:00
Bimba Shrestha
f3c4fd17e3
Passing in dummy dst buffer of compressbound(srcSize)
2019-09-20 15:50:58 -07:00
Nick Terrell
44c65da97e
Remove literals overread in ZSTD_storeSeq() for ~neutral perf
2019-09-20 12:23:25 -07:00
Nick Terrell
fde217df04
Fix bounds check in ZSTD_storeSeq()
2019-09-20 08:25:12 -07:00
Nick Terrell
67b1f5fc72
Fix too strict assert
2019-09-20 01:23:35 -07:00
Nick Terrell
ddab2a94e8
Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy()
2019-09-20 00:56:20 -07:00
Nick Terrell
efd37a64ea
Optimize decompression and fix wildcopy overread
...
* Bump `WILDCOPY_OVERLENGTH` to 16 to fix the wildcopy overread.
* Optimize `ZSTD_wildcopy()` by removing unnecessary branches and
unrolling the loop.
* Extract `ZSTD_overlapCopy8()` into its own function.
* Add `ZSTD_safecopy()` for `ZSTD_execSequenceEnd()`. It is
optimized for single long sequences, since that is the important
case that can end up in `ZSTD_execSequenceEnd()`. Without this
optimization, decompressing a block with 1 long match goes
from 5.7 GB/s to 800 MB/s.
* Refactor `ZSTD_execSequenceEnd()`.
* Increase the literal copy shortcut to 16.
* Add a shortcut for offset >= 16.
* Simplify `ZSTD_execSequence()` by pushing more cases into
`ZSTD_execSequenceEnd()`.
* Delete `ZSTD_execSequenceLong()` since it is exactly the
same as `ZSTD_execSequence()`.
clang-8 sees +17.5% on silesia and +21.8% on enwik8.
gcc-9 sees +12% on silesia and +15.5% on enwik8.
TODO: More detailed measurements, and on more datasets.
Credit to OSS-Fuzz for finding the wildcopy overread.
2019-09-19 21:07:14 -07:00
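Commit efd37a64ea above adds a careful copy path for `ZSTD_execSequenceEnd()`. One reason match copies need care is that a match may overlap the bytes it is producing. The following is a minimal sketch of that overlap behaviour only, assuming a hypothetical helper `match_copy_bytewise()`; it is not the actual `ZSTD_safecopy()` and makes no attempt at its speed optimizations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not zstd code): copy `length` bytes from
 * `op - offset` to `op`, one byte at a time.
 * When offset < length the source and destination overlap, and the
 * bytes just written are read again, which repeats the pattern. */
static unsigned char* match_copy_bytewise(unsigned char* op,
                                          size_t offset, size_t length,
                                          const unsigned char* base,
                                          const unsigned char* oend)
{
    const unsigned char* match;
    assert(offset >= 1);
    assert((size_t)(op - base) >= offset);   /* match starts inside the buffer */
    assert((size_t)(oend - op) >= length);   /* no write past the end */
    match = op - offset;
    while (length--) *op++ = *match++;
    return op;
}
```

Starting from a buffer that holds `ab`, a match with offset 2 and length 6 extends it to `abababab`. A plain `memcpy()` gives no such guarantee for overlapping ranges.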
Bimba Shrestha
ae6d0e64ae
Addressing comments
2019-09-19 15:25:20 -07:00
Yann Collet
bfff5b30a4
Merge pull request #1756 from mgrice/dev
...
Improvements in zstd decode performance
2019-09-18 11:35:50 -07:00
Yann Collet
243200e5bf
minor refactor of ZSTD_fast
...
- reduced variables lifetime
- more accurate code comments
2019-09-17 14:02:57 -07:00
Bimba Shrestha
76fea3fb99
Resolving appveyor test failure implicit conversion
2019-09-16 14:02:23 -07:00
Bimba Shrestha
a874435478
Merge branch 'dev' into extract_sequences_api
2019-09-16 13:29:59 -07:00
Bimba Shrestha
bff6072e3a
Bailing early when collecting sequences and documentation
2019-09-16 08:26:21 -07:00
W. Felix Handte
20c69077d1
Shrink Table Valid End During Alloc Alignment / Phase Change
2019-09-11 17:14:59 -04:00
W. Felix Handte
51d90668ba
Add Assertions to Confirm that Workspace Pointers are Correctly Ordered
2019-09-11 17:14:59 -04:00
W. Felix Handte
a10c191613
`__msan_poison()` Workspace When Preparing for Re-Use
2019-09-11 17:14:45 -04:00
W. Felix Handte
7c57e2b9ca
Zero `h3size` When `h3log` is 0
...
This led to a nasty edgecase, where index reduction for modes that don't use
the h3 table would have a degenerate table (size 4) allocated and marked clean,
but which would not be re-indexed.
2019-09-11 13:14:26 -04:00
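The edge case described in commit 7c57e2b9ca above comes down to `1 << 0` being 1 rather than 0. A minimal sketch, assuming a hypothetical helper name `h3_table_entries()` (the real computation lives inside zstd's context reset code and may differ in detail):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of entries in the 3-byte hash table.
 * A log of 0 means "table not used", so the entry count must be 0.
 * Computing 1 << 0 unconditionally would instead yield a one-entry
 * (4-byte) table that is allocated but never re-indexed. */
static size_t h3_table_entries(unsigned hashLog3)
{
    assert(hashLog3 < sizeof(size_t) * 8);
    return hashLog3 ? ((size_t)1 << hashLog3) : 0;
}
```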
W. Felix Handte
bc020eec92
Also Shrink Clean Table Area When Reducing Indices
2019-09-11 11:40:57 -04:00
W. Felix Handte
1999b2ed9b
Update DEBUGLOG Statements
2019-09-11 11:21:00 -04:00
W. Felix Handte
13e29a56de
Shrink Clean Table Area When Copying Table Contents into Context
...
The source matchState is potentially at a lower current index, which means
that any extra table space not overwritten by the copy may now contain
invalid indices. The simple solution is to unconditionally shrink the valid
table area to just the area overwritten.
2019-09-11 11:18:45 -04:00
W. Felix Handte
edb3ad053e
Comments
2019-09-10 18:25:45 -04:00
W. Felix Handte
f31ef28ff8
Only Reset Indexing in `ZSTD_resetCCtx_internal()` When Necessary
2019-09-10 18:25:45 -04:00
W. Felix Handte
9968a53e91
Remove No-Longer-Used Continuation Functions
2019-09-10 18:25:45 -04:00
W. Felix Handte
1b28e80416
Remove Fast Continue Path in ZSTD_resetCCtx_internal()
2019-09-10 18:25:45 -04:00
W. Felix Handte
ad16eda5e4
`ZSTD_reset_matchState` Optionally Doesn't Restart Indexing
2019-09-10 18:25:45 -04:00
W. Felix Handte
5b10bb5ec3
Rename `ZSTD_compResetPolicy_e` Values and Add Comment
2019-09-10 18:25:45 -04:00
W. Felix Handte
0492b9a9ec
Accept `ZSTD_indexResetPolicy_e` Param in `ZSTD_reset_matchState()`
2019-09-10 18:25:45 -04:00
W. Felix Handte
14c5471d5e
Introduce `ZSTD_indexResetPolicy_e` Enum
2019-09-10 18:25:45 -04:00
W. Felix Handte
17b6da2e0f
Track Usable Table Space in Compression Workspace
2019-09-10 18:25:37 -04:00
Yann Collet
22bd158e0f
Merge pull request #1712 from felixhandte/workspace-efficiency-2
...
Allocate Internal Buffers via Workspace Abstraction
2019-09-10 15:20:29 -07:00
Bimba Shrestha
1407919d13
Addressing comments on parsing
2019-09-10 15:10:50 -07:00
Bimba Shrestha
47199480da
Cleaning up parsing per suggestion
2019-09-10 13:18:59 -07:00
W. Felix Handte
a9d373f093
Remove Empty lib/compress/zstd_cwksp.c
2019-09-10 16:03:13 -04:00
Yann Collet
41416f0927
Merge pull request #1773 from bimbashrestha/rle_first_block_decompression_fix
...
Removing redundant condition in decompression, making first block rle…
2019-09-10 11:17:29 -07:00
Bimba Shrestha
e3c5825918
Fixing litLength == 0 case
2019-09-10 10:38:13 -07:00
Bimba Shrestha
9e7bb55e14
Addressing comments
2019-09-09 20:04:46 -07:00
W. Felix Handte
81208fd7c2
Forward Declare `ZSTD_cwksp_available_space` to Fix Build
2019-09-09 19:10:09 -04:00
W. Felix Handte
91bf1babd1
Inline Workspace Functions
2019-09-09 18:53:53 -04:00
W. Felix Handte
0db3ffe7ee
Forward resetCCtx Errors when Using CDict
2019-09-09 16:47:19 -04:00
W. Felix Handte
eb6f69d978
Fix sizeof_CCtx and sizeof_CDict Calculations for Statically Init'ed Objects
2019-09-09 16:45:17 -04:00
W. Felix Handte
e3703825a8
Fix workspaceTooSmall Calculation
2019-09-09 15:12:14 -04:00
W. Felix Handte
0a65a67901
Shorten `&zc->workspace` -> `ws` in `ZSTD_resetCCtx_internal()`
2019-09-09 14:59:09 -04:00
W. Felix Handte
1120e4d962
Clean Up TODOs and Comments pt. II
2019-09-09 14:04:39 -04:00
W. Felix Handte
c60e1c3be5
Nit
2019-09-09 13:34:08 -04:00
W. Felix Handte
7d7b665c90
Pull Phase Advance Logic Out into Internal Function
2019-09-09 13:34:08 -04:00
W. Felix Handte
8549ae9f1d
Hide Workspace Movement Behind Helper Function
2019-09-09 13:34:08 -04:00
W. Felix Handte
2405c03bcd
Fix DEBUGLOG Statement Levels
2019-09-09 13:34:08 -04:00
W. Felix Handte
7100d24221
Fix Rescale Continue Special Case
2019-09-09 13:34:08 -04:00
W. Felix Handte
7321e4c9f3
Remove Unused noRealloc CRP Value
2019-09-09 13:34:08 -04:00
W. Felix Handte
901bba4ca6
Re-Implement Workspace Shrinking when Oversized
2019-09-09 13:34:08 -04:00
W. Felix Handte
881bcd80ca
Cleanup from Move
2019-09-09 13:34:08 -04:00
W. Felix Handte
b511a84adc
Move Workspace Functions to Their Own File
2019-09-09 13:34:08 -04:00
W. Felix Handte
077a2d7dc9
Rename
2019-09-09 13:34:08 -04:00
W. Felix Handte
ebd162194f
Clean Up TODOs and Comments
2019-09-09 13:34:08 -04:00
W. Felix Handte
2abe0145b1
Improve Comments a Bit
2019-09-09 13:34:08 -04:00
W. Felix Handte
7a2416a863
Allocate CDict in Workspace (Rather than in Separate Allocation)
2019-09-09 13:34:08 -04:00
W. Felix Handte
65057cf009
Rewrite ZSTD_initStaticCCtx to Alloc CCtx in Workspace
2019-09-09 13:34:08 -04:00
W. Felix Handte
58b69ab15c
Only the CCtx Itself Needs to be Cleared during Static CCtx Init
2019-09-09 13:34:08 -04:00
W. Felix Handte
88c2fcd0ee
Align Alloc Pointer When Transitioning from Buffers to Aligned Allocs
2019-09-09 13:34:08 -04:00
W. Felix Handte
e936b73889
Remove Overly-Restrictive Assert
2019-09-09 13:34:08 -04:00
W. Felix Handte
75d574368b
When Loading Dict By Copy, Always Put it in the Workspace
2019-09-09 13:34:08 -04:00
W. Felix Handte
e69b67e33a
Alloc Tables Separately
2019-09-09 13:34:08 -04:00
W. Felix Handte
6177354b36
Begin Introducing Phases
2019-09-09 13:34:08 -04:00
W. Felix Handte
786f2266bb
TMP
2019-09-09 13:34:08 -04:00
W. Felix Handte
c25283cf00
Disambiguate 'workspace' and 'entropyWorkspace'
2019-09-09 13:34:08 -04:00
W. Felix Handte
ccaac852e8
Normalize Case 'workSpace' -> 'workspace'
2019-09-09 13:27:18 -04:00
Bimba Shrestha
44e122053b
Mentioning cli only in the comment as suggested
2019-09-06 14:48:41 -07:00
Bimba Shrestha
a917cd597d
Put back omission for first rle block and updated comment as suggested
2019-09-06 13:44:25 -07:00
Bimba Shrestha
d687d603e4
Removing redundant condition in decompression, making first block rles valid to decompress
2019-09-06 10:46:19 -07:00
Varun S Nair
9816560649
Fixing assert and DEBUGLOG due to ZSTD_CCtx_params parameter change to const pointer
2019-09-05 15:47:17 +05:30
Varun S Nair
771645471f
Passing ZSTD_CCtx_params by const pointer
2019-09-05 15:28:30 +05:30
Bimba Shrestha
5f8b0f6890
Changing api to get sequences across all blocks
2019-08-30 09:18:44 -07:00
Yann Collet
5198347382
Merge pull request #1744 from bimbashrestha/dev
...
Generate RLE blocks in the encoder
2019-08-29 15:19:10 -07:00
Bimba Shrestha
623b90f85d
Fixing ci-circle test complaints
2019-08-29 13:09:42 -07:00
Bimba Shrestha
ece465644b
Adding api for extracting sequences from seqstore
2019-08-29 12:29:39 -07:00
mgrice
b830599582
Improvements in zstd decode performance
...
Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say, 8) than it is to copy fewer (say, 3). This change takes that further by exploiting some properties:
1. It's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions and fewer branches.
2. A 16 byte chunk size means that ~90% of wildcopy invocations will have a trip count of 1, so branch prediction will be improved.
Speedup on Xeon E5-2680v4 is in the range of 3-5%.
Measured wildcopy length distributions on silesia.tar:
level  <=8     <=16    <=24   >24
1      78.05%  11.49%  3.52%  6.94%
3      82.14%  8.99%   2.44%  6.43%
6      85.81%  6.51%   2.92%  4.76%
8      83.02%  7.31%   3.64%  6.03%
10     84.13%  6.67%   3.29%  5.91%
15     77.58%  7.55%   5.21%  9.66%
16     80.07%  7.20%   3.98%  8.75%
Test Plan: benchmark silesia, make check
2019-08-29 12:25:56 -07:00
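The wildcopy idea from commit b830599582 above can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `wildcopy16()` and `wildcopy16_trips()` are hypothetical names, and the caller is assumed to provide up to 15 bytes of slack after both buffers and to pass non-overlapping ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16  /* assumed chunk size, following the commit text */

/* Copies `length` bytes rounded up to a multiple of CHUNK.
 * The caller must guarantee CHUNK - 1 bytes of slack after both
 * buffers, and that src and dst do not overlap. */
static void wildcopy16(void* dst, const void* src, size_t length)
{
    unsigned char* d = (unsigned char*)dst;
    const unsigned char* s = (const unsigned char*)src;
    unsigned char* const dend = d + length;
    assert(length > 0);
    do {
        memcpy(d, s, CHUNK);  /* fixed-size copy, no per-byte branching */
        d += CHUNK;
        s += CHUNK;
    } while (d < dend);
}

/* Number of loop iterations wildcopy16() performs for a given length. */
static size_t wildcopy16_trips(size_t length)
{
    return (length + CHUNK - 1) / CHUNK;
}
```

With a 16-byte chunk, every length from 1 to 16 takes exactly one trip through the loop, which is the case the distribution table above shows to be the most common.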
Bimba Shrestha
c3e3c8bf32
Undoing the last commit (that was an accident)
2019-08-29 12:05:47 -07:00
bimbashrestha
4a1ca5e0a8
Adding method for extracting sequences.
2019-08-29 11:55:12 -07:00
bimbashrestha
e5704bbfdf
Added test for multiple blocks of zeros and fixed nit about comments
2019-08-28 08:32:34 -07:00
bimbashrestha
96201d9774
Added bool to cctx and fixed some comment nits
2019-08-26 15:30:41 -07:00
bimbashrestha
991cbc9024
Fixing mixed declaration compiler complaint
2019-08-26 15:00:50 -07:00
bimbashrestha
ce264ce53b
Forbidding emission of RLE when it's the first block
2019-08-26 14:54:29 -07:00
bimbashrestha
33b6446ca7
Removing accidental method call
2019-08-26 14:34:43 -07:00
bimbashrestha
7b041b552e
Removing assert for rle that doesn't always hold
2019-08-26 12:26:53 -07:00
bimbashrestha
1f2bf77f2a
Using typedef U32 instead of int
2019-08-26 09:00:22 -07:00
bimbashrestha
ba46932492
Removing implicit conversion from const void* to const BYTE* and added constant for threshold
2019-08-26 08:51:34 -07:00
bimbashrestha
0e3ba02cf1
Fixing more test failure errors
2019-08-22 13:54:41 -07:00
bimbashrestha
4faf3a5911
Fixing ci-circle test failure issues
2019-08-22 13:46:15 -07:00
bimbashrestha
cba5350f88
Moving RLE logic to inside ZSTD_compressBlock_internal and adding assert
2019-08-22 12:12:44 -07:00
Nick Magerko
493f95c7df
Fix merge conflicts
2019-08-22 11:51:41 -07:00
bimbashrestha
4c90d862e3
Generate RLE blocks in the encoder
2019-08-22 11:27:20 -07:00
Nick Magerko
c7a24d7a14
Define ZSTD_SRCSIZEHINT_MIN as 0
2019-08-20 13:06:15 -07:00
Nick Magerko
2d39b43906
Use int for srcSizeHint when sensible
2019-08-19 16:49:25 -07:00
Nick Magerko
edf2abf106
Fix fall-through case
2019-08-19 12:32:43 -07:00
Nick Magerko
dffbac5f89
Add --size-hint=# option
2019-08-19 11:38:49 -07:00
Yann Collet
782bfb858a
fixed very minor inefficiency (nbSeq==127)
...
The nbSeq "short" format (1-byte)
is compatible with any value < 128.
However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.
Spotted by Intel's Smita Kumar.
2019-08-15 16:41:34 +02:00
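The short and general formats mentioned in commit 782bfb858a above can be illustrated with a small encoder and decoder for the sequence count. This is a sketch based on my reading of the Zstandard format description; the function names are hypothetical and the byte layout, including the `0x7F00` constant, should be treated as an assumption to check against the specification.

```c
#include <assert.h>
#include <stddef.h>

#define LONGNBSEQ 0x7F00u  /* assumed threshold for the 3-byte format */

/* Writes the sequence-count header; returns bytes written (1, 2 or 3). */
static size_t write_nb_seq(unsigned char* op, unsigned nbSeq)
{
    assert(nbSeq <= 0xFFFFu + LONGNBSEQ);
    if (nbSeq < 128) {          /* short format: any value < 128 fits */
        op[0] = (unsigned char)nbSeq;
        return 1;
    }
    if (nbSeq < LONGNBSEQ) {    /* general 2-byte format */
        op[0] = (unsigned char)((nbSeq >> 8) + 0x80);
        op[1] = (unsigned char)nbSeq;
        return 2;
    }
    op[0] = 0xFF;               /* 3-byte format */
    op[1] = (unsigned char)(nbSeq - LONGNBSEQ);
    op[2] = (unsigned char)((nbSeq - LONGNBSEQ) >> 8);
    return 3;
}

/* Reads the header back; returns bytes consumed. */
static size_t read_nb_seq(const unsigned char* ip, unsigned* nbSeq)
{
    if (ip[0] < 128) { *nbSeq = ip[0]; return 1; }
    if (ip[0] < 255) { *nbSeq = ((unsigned)(ip[0] - 128) << 8) + ip[1]; return 2; }
    *nbSeq = ip[1] + ((unsigned)ip[2] << 8) + LONGNBSEQ;
    return 3;
}
```

A writer that tested `nbSeq < 127` would still produce a decodable header for 127, only one byte longer, which is why the inefficiency went unnoticed.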
Yann Collet
facbe8b2c2
factored the logic selecting lowest match index
...
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
Yann Collet
0b0b83e8f3
fix test 122
...
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet
98e7c344cd
fixed strategies btopt+
2019-08-02 14:42:53 +02:00
Yann Collet
b4257b04e7
fixed strategy btlazy2
2019-08-02 14:26:26 +02:00
Yann Collet
5cf1b24aca
fixed strategies greedy, lazy & lazy2
...
restore dictionary compression ratio
2019-08-02 14:21:39 +02:00
Yann Collet
98692c2838
fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3
2019-08-01 15:58:17 +02:00
Yann Collet
be3d2e2de8
Merge pull request #1679 from ephiepark/dev
...
Restructure the source files
2019-07-19 15:29:07 -07:00
Ephraim Park
1dc98de279
Restructure the source files
2019-07-15 17:39:18 -07:00
Yann Collet
8fb08b68cc
Merge pull request #1681 from facebook/level3
...
updated double_fast complementary insertion
2019-07-12 16:16:06 -07:00
Nick Terrell
75cfe1dc69
[ldm] Fix bug in overflow correction with large job size ( #1678 )
...
* [ldm] Fix bug in overflow correction with large job size
* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)
* [test] Add test that exposes the bug
Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
2019-07-12 18:45:18 -04:00
Yann Collet
eaeb7f00b5
updated the _extDict variant of double fast
2019-07-12 14:17:17 -07:00
Yann Collet
e8a7f5d3ce
double-fast: changed the trade-off for a smaller positive change
...
same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).
2019-07-12 11:34:53 -07:00
mgrice
812e8f2a16
perf improvements for zstd decode ( #1668 )
...
* perf improvements for zstd decode
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it's turned off via attribute and flag. I found that neither attribute nor command-line flag was entirely successful in turning off vectorization, which is why there were both.
silesia benchmark data - second triad of each file is with the original code:
file orig -> compressed (ratio), encode, decode, change
1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s
2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s
3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s
1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60%
2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01%
3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99%
1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s
2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s
3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s
1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30%
2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20%
3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50%
1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s
2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s
3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s
1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76%
2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54%
3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89%
1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s
2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s
3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s
1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47%
2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72%
3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29%
1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s
2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s
3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s
1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15%
2#ooffice 6152192-> 3323931(1.851), 155MB/s , 564.5MB/s 4.98%
3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45%
1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s
2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s
3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s
1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41%
2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42%
3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68%
1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s
2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s
3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s
1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26%
2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36%
3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38%
1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s
2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s
3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s
1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45%
2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93%
3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04%
1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s
2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s
3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s
1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70%
2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85%
3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31%
1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s
2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s
3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s
1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89%
2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43%
3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33%
1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s
2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s
3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s
1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24%
2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99%
3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05%
1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s
2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s
3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s
1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52%
2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00%
3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37%
average decode change: 7.51%
* makefile changed to only pass -fno-tree-vectorize to gcc
* Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix c90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Yann Collet
d1327738c2
updated double_fast complementary insertion
...
in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).
More details in the PR.
2019-07-11 15:25:22 -07:00
Yann Collet
b01c1c679f
Merge pull request #1675 from ephiepark/dev
...
Factor out the logic to build sequences
2019-07-10 13:32:31 -07:00
Yann Collet
096714d1b8
Merge pull request #1671 from ephiepark/dev
...
Adding targetCBlockSize param
2019-07-03 17:47:44 -07:00
Ephraim Park
f57ac7b09e
Factor out the logic to build sequences
2019-07-03 15:42:38 -07:00
Ephraim Park
9007701670
Adding targetCBlockSize param
2019-07-03 15:41:52 -07:00
Nick Terrell
6c92ba774e
ZSTD_compressSequences_internal assert op <= oend ( #1667 )
...
When we wrote one byte beyond the end of the buffer for RLE
blocks back in 1.3.7, we would then have `op > oend`. That is
a problem when we use `oend - op` for the size of the destination
buffer, and allows further writes beyond the end of the buffer for
the rest of the function. Let's assert that it doesn't happen.
2019-07-02 15:45:47 -07:00
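The hazard described in commit 6c92ba774e above follows from converting a negative pointer difference to an unsigned size. A minimal sketch, using a hypothetical helper `remaining_capacity()` rather than the actual zstd code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: remaining capacity of an output buffer [op, oend).
 * If op had already moved past oend, `oend - op` would be negative, and
 * converting it to size_t would yield a huge value, so every later
 * "is there room?" check would pass. The assert catches that early. */
static size_t remaining_capacity(const unsigned char* op,
                                 const unsigned char* oend)
{
    assert(op <= oend);
    return (size_t)(oend - op);
}
```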
Yann Collet
857e608b51
Merge pull request #1658 from facebook/memset
...
memset() rather than reduceIndex()
2019-07-01 15:01:43 -07:00
Yann Collet
621adde3b2
changed naming to ZSTD_indexTooCloseToMax()
...
Also : minor speed optimization :
shortcut to ZSTD_reset_matchState() rather than the full reset process.
It still needs to be completed with ZSTD_continueCCtx() for proper initialization.
Also : changed position of LDM hash tables in the context,
so that the "regular" hash tables can be at a predictable position,
hence allowing the shortcut to ZSTD_reset_matchState() without complex conditions.
2019-06-24 14:39:29 -07:00
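The check named in commit 621adde3b2 above can be pictured as a simple threshold test on the current index. The sketch below is an assumption about its shape only: the function name, both constants, and the use of a plain index instead of window pointers are illustrative and are not taken from zstd.

```c
#include <assert.h>
#include <stddef.h>

/* Both constants are illustrative assumptions, not zstd's values. */
#define INDEX_MAX             ((size_t)3 << 29)
#define INDEX_OVERFLOW_MARGIN ((size_t)16 << 20)

/* Returns non-zero when the current index is within the margin of the
 * maximum, in which case a caller would restart indexing and clear the
 * tables (memset) instead of continuing and rescaling every index. */
static int index_too_close_to_max(size_t currentIndex)
{
    assert(currentIndex <= INDEX_MAX);
    return currentIndex > (INDEX_MAX - INDEX_OVERFLOW_MARGIN);
}
```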
Yann Collet
45c9fbd6d9
prefer memset() rather than reduceIndex() when close to index range limit
...
by disabling continue mode when index is close to limit.
2019-06-21 16:19:21 -07:00
Yann Collet
944e2e9e12
benchfn : added macro CONTROL()
...
like assert() but cannot be disabled.
proper separation of user contract errors (CONTROL())
and invariant verification (assert()).
2019-06-21 15:58:55 -07:00
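The split between user contract errors and invariant checks described in commit 944e2e9e12 above might look like the following. This is a sketch, not the macro from `benchfn.c`: the message format and the helper `ns_per_run()` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Like assert(), but not compiled out when NDEBUG is defined. */
#define CONTROL(c) do {                                           \
        if (!(c)) {                                               \
            fprintf(stderr, "CONTROL failed: %s (%s:%d)\n",       \
                    #c, __FILE__, __LINE__);                      \
            abort();                                              \
        }                                                         \
    } while (0)

/* Hypothetical benchmark helper: average time per run. */
static double ns_per_run(double totalNs, unsigned nbRuns)
{
    double result;
    CONTROL(nbRuns > 0);        /* user contract: always enforced */
    CONTROL(totalNs >= 0.0);    /* user contract: always enforced */
    result = totalNs / nbRuns;
    assert(result <= totalNs);  /* internal invariant: debug builds only */
    return result;
}
```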
Nick Terrell
674534a700
[zstd] Fix data corruption in niche use case
...
* Extract the overflow correction into a helper function.
* Load the dictionary `ZSTD_CHUNKSIZE_MAX = 512 MB` bytes at a time
and overflow correct between each chunk.
Data corruption could happen when all these conditions are true:
* You are using multithreading mode
* Your overlap size is >= 512 MB (implies window size >= 512 MB)
* You are using a strategy >= ZSTD_btlazy
* You are compressing more than 4 GB
The problem is that when loading a large dictionary we don't do
overflow correction. We can only load 512 MB at a time, and may
need to do overflow correction before each chunk.
2019-06-21 15:47:31 -07:00
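The fix in commit 674534a700 above loads the dictionary in bounded chunks and gives overflow correction a chance to run before each one. The loop structure can be sketched as below; every name here is hypothetical, and the two callbacks are placeholders that only count what happened.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t nbChecks;     /* times the overflow check ran */
    size_t bytesLoaded;  /* total bytes handed to the loader */
} load_stats;

/* Placeholder: a real implementation would rebase indices if needed. */
static void overflow_correct_if_needed(load_stats* s) { s->nbChecks++; }

/* Placeholder: a real implementation would insert the chunk into tables. */
static void load_chunk(load_stats* s, const unsigned char* chunk, size_t size)
{
    (void)chunk;
    s->bytesLoaded += size;
}

/* Loads [dict, dict + dictSize) in pieces of at most chunkMax bytes,
 * checking for index overflow before each piece. */
static load_stats load_dictionary_chunked(const unsigned char* dict,
                                          size_t dictSize, size_t chunkMax)
{
    load_stats stats = { 0, 0 };
    const unsigned char* ip = dict;
    size_t remaining = dictSize;
    assert(chunkMax > 0);
    while (remaining > 0) {
        size_t const chunk = remaining < chunkMax ? remaining : chunkMax;
        overflow_correct_if_needed(&stats);
        load_chunk(&stats, ip, chunk);
        ip += chunk;
        remaining -= chunk;
    }
    return stats;
}
```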
Nick Terrell
4156060ca4
[zstdmt] Update assert to use ZSTD_WINDOWLOG_MAX
2019-06-21 15:39:33 -07:00
Nick Terrell
95e2b430ea
[opt] Add asserts for corruption in ZSTD_updateTree()
2019-06-21 15:22:29 -07:00
Yann Collet
9af909bf35
Merge pull request #1624 from facebook/smallwlog
...
Improves compression ratio for small windowLog
2019-06-14 17:28:21 -07:00
Nick Terrell
cdb9481e38
[libzstd] Optimize ZSTD_insertBt1() for repetitive data
...
We would only skip at most 192 bytes at a time before this diff.
This was added to optimize long matches and skip the middle of the
match. However, it doesn't handle the case of repetitive data.
This patch keeps the optimization, but also handles repetitive data
by taking the max of the two return values.
```
> for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 | command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done
strategy=1
0.27
strategy=2
0.23
strategy=3
0.27
strategy=4
0.43
strategy=5
0.56
strategy=6
0.43
strategy=7
0.34
strategy=8
0.34
strategy=9
0.35
```
At level 19 with multithreading the compressed size of `silesia.tar` regresses 300 bytes, and `enwik8` regresses 100 bytes.
In single threaded mode `enwik8` is also within 100 bytes, and I didn't test `silesia.tar`.
Fixes Issue #1634 .
2019-06-05 20:34:00 -07:00
Yann Collet
80d6ccea79
removed UINT32_MAX
...
apparently not guaranteed on all platforms,
replaced by UINT_MAX.
2019-05-31 17:27:07 -07:00
Yann Collet
fce4df3ab7
fixed wrong assert in double_fast
2019-05-31 17:06:28 -07:00
Yann Collet
a968099038
minor code cleaning for new index invalidation strategy
2019-05-31 16:52:37 -07:00
Yann Collet
d605f482c7
make double_fast compatible with new index invalidation strategy
2019-05-31 16:50:04 -07:00
Yann Collet
a30febaeeb
Made fast strategy compatible with new offset validation strategy
...
fast mode does the same thing as before :
it pre-emptively invalidates any index that could lead to offset > maxDistance.
It's supposed to help speed.
But this logic is performed inside zstd_fast,
so that other strategies can select a different behavior.
2019-05-31 16:34:55 -07:00
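The pre-emptive invalidation described in commit a30febaeeb above amounts to raising the lowest acceptable match index so that no candidate can produce an offset larger than the window. A simplified sketch, assuming a hypothetical helper `lowest_valid_index()` and ignoring dictionary handling:

```c
#include <assert.h>

typedef unsigned int u32;

/* Hypothetical helper: lowest index a match may start at, so that
 * (current - matchIndex) never exceeds maxDistance = 1 << windowLog. */
static u32 lowest_valid_index(u32 current, u32 lowLimit, u32 windowLog)
{
    u32 maxDistance;
    assert(windowLog < 32);
    assert(current >= lowLimit);
    maxDistance = (u32)1 << windowLog;
    return (current - lowLimit > maxDistance) ? current - maxDistance
                                              : lowLimit;
}
```

Any table entry below the returned index is treated as invalid by the match finder, without touching the table itself.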
Yann Collet
58adb1059f
extended exact window size to greedy/lazy modes
2019-05-31 16:08:48 -07:00
Yann Collet
bc601bdc6d
first implementation of small window size for btopt
...
noticeably improves compression ratio
when window size is small (< 18).
enwik7 level 19
windowLog  `dev`  `smallwlog`  improvement
23         3.577  3.577        0.02%
22         3.536  3.538        0.06%
21         3.462  3.467        0.14%
20         3.364  3.377        0.39%
19         3.244  3.272        0.86%
18         3.110  3.166        1.80%
17         2.843  3.057        7.53%
16         2.724  2.943        8.04%
15         2.594  2.822        8.79%
14         2.456  2.686        9.36%
13         2.312  2.523        9.13%
12         2.162  2.361        9.20%
11         2.003  2.182        8.94%
2019-05-31 15:55:12 -07:00
Yann Collet
b13a9207f9
Merge pull request #1623 from facebook/fullbench
...
fullbench minor improvements
2019-05-31 14:40:19 -07:00
Yann Collet
ed38b645db
fullbench: pass proper parameters in scenario 43
2019-05-29 15:26:06 -07:00
Yann Collet
9719fd616c
removed nextToUpdate3 from ZSTD_window
...
it's now a local variable of ZSTD_compressBlock_opt()
2019-05-28 16:18:12 -07:00
Yann Collet
33dabc8c80
get bt matches : made it a bit clearer which parameters are input and output
2019-05-28 16:11:32 -07:00
Yann Collet
327cf6fac1
nextToUpdate3 does not need to be maintained outside of zstd_opt.c
...
It's re-synchronized with nextToUpdate at beginning of each block.
It only needs to be tracked from within zstd_opt block parser.
Made the logic clear, so that no code tried to maintain this variable.
An even better solution would be to make nextToUpdate3
an internal variable of ZSTD_compressBlock_opt_generic().
That would make it possible to remove it from ZSTD_matchState_t,
thus restricting its visibility to only where it's actually useful.
This would require deeper changes though,
since the matchState is the natural structure to transport parameters into and inside the parser.
2019-05-28 15:26:52 -07:00
Yann Collet
6453f8158f
complementary code comments
...
on variables used / impacted during maxDist check
2019-05-28 14:12:16 -07:00
Yann Collet
4baecdf72a
added comments to better understand enforceMaxDist()
2019-05-28 13:15:48 -07:00
Nick Terrell
a17fe4c9e5
[visual] Fix unreachable code warning
2019-04-16 11:32:35 -07:00
Nick Terrell
de0499f7fa
[libzstd] Require ZSTD_MULTITHREAD to create a ZSTDMT_CCtx
...
ZSTDMT was broken when compiled without ZSTD_MULTITHREAD defined,
because `ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nbWorkers)`
failed. It was detected by the MSVC test which runs the fuzzer with
multithreading disabled.
This is a very niche use case of a deprecated API, because the API is
inefficient and synchronous, since `threading.h` will be synchronous.
Users almost certainly don't want this, and anyone who tested their code
should realize that it is broken. Therefore, I think it is safe to
require `ZSTD_MULTITHREAD` to be defined to use ZSTDMT.
2019-04-15 23:04:46 -07:00
Josh Soref
a880ca239b
Spelling ( #1582 )
...
* spelling: accidentally
* spelling: across
* spelling: additionally
* spelling: addresses
* spelling: appropriate
* spelling: assumed
* spelling: available
* spelling: builder
* spelling: capacity
* spelling: compiler
* spelling: compressibility
* spelling: compressor
* spelling: compression
* spelling: contract
* spelling: convenience
* spelling: decompress
* spelling: description
* spelling: deflate
* spelling: deterministically
* spelling: dictionary
* spelling: display
* spelling: eliminate
* spelling: preemptively
* spelling: exclude
* spelling: failure
* spelling: independence
* spelling: independent
* spelling: intentionally
* spelling: matching
* spelling: maximum
* spelling: meaning
* spelling: mishandled
* spelling: memory
* spelling: occasionally
* spelling: occurrence
* spelling: official
* spelling: offsets
* spelling: original
* spelling: output
* spelling: overflow
* spelling: overridden
* spelling: parameter
* spelling: performance
* spelling: probability
* spelling: receives
* spelling: redundant
* spelling: recompression
* spelling: resources
* spelling: sanity
* spelling: segment
* spelling: series
* spelling: specified
* spelling: specify
* spelling: subtracted
* spelling: successful
* spelling: return
* spelling: translation
* spelling: update
* spelling: unrelated
* spelling: useless
* spelling: variables
* spelling: variety
* spelling: verbatim
* spelling: verification
* spelling: visited
* spelling: warming
* spelling: workers
* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell
48a6427d22
[libzstd] Fix ZSTD_compress2() for multithreaded compression
...
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.
Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()`.
* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
`ZSTD_compressBound()` output space.
2019-04-09 16:24:17 -07:00
Nick Terrell
641e594309
[libzstd] Remove ZSTDMT from the shared object
...
* Remove ZSTDMT from the shared object by default.
* Provide a macro `ZSTD_LEGACY_MULTITHREADED_API` to override it.
* Document it in `lib/README.md`.
2019-04-07 18:47:52 -07:00