Bimba Shrestha
a874435478
Merge branch 'dev' into extract_sequences_api
2019-09-16 13:29:59 -07:00
Bimba Shrestha
bff6072e3a
Bailing early when collecting sequences and documentation
2019-09-16 08:26:21 -07:00
W. Felix Handte
20c69077d1
Shrink Table Valid End During Alloc Alignment / Phase Change
2019-09-11 17:14:59 -04:00
W. Felix Handte
51d90668ba
Add Assertions to Confirm that Workspace Pointers are Correctly Ordered
2019-09-11 17:14:59 -04:00
W. Felix Handte
a10c191613
__msan_poison()
Workspace When Preparing for Re-Use
2019-09-11 17:14:45 -04:00
W. Felix Handte
7c57e2b9ca
Zero h3size
When h3log
is 0
...
This led to a nasty edgecase, where index reduction for modes that don't use
the h3 table would have a degenerate table (size 4) allocated and marked clean,
but which would not be re-indexed.
2019-09-11 13:14:26 -04:00
W. Felix Handte
bc020eec92
Also Shrink Clean Table Area When Reducing Indices
2019-09-11 11:40:57 -04:00
W. Felix Handte
1999b2ed9b
Update DEBUGLOG Statements
2019-09-11 11:21:00 -04:00
W. Felix Handte
13e29a56de
Shrink Clean Table Area When Copying Table Contents into Context
...
The source matchState is potentially at a lower current index, which means
that any extra table space not overwritten by the copy may now contain
invalid indices. The simple solution is to unconditionally shrink the valid
table area to just the area overwritten.
2019-09-11 11:18:45 -04:00
W. Felix Handte
edb3ad053e
Comments
2019-09-10 18:25:45 -04:00
W. Felix Handte
f31ef28ff8
Only Reset Indexing in ZSTD_resetCCtx_internal()
When Necessary
2019-09-10 18:25:45 -04:00
W. Felix Handte
9968a53e91
Remove No-Longer-Used Continuation Functions
2019-09-10 18:25:45 -04:00
W. Felix Handte
1b28e80416
Remove Fast Continue Path in ZSTD_resetCCtx_internal()
2019-09-10 18:25:45 -04:00
W. Felix Handte
ad16eda5e4
ZSTD_reset_matchState
Optionally Doesn't Restart Indexing
2019-09-10 18:25:45 -04:00
W. Felix Handte
5b10bb5ec3
Rename ZSTD_compResetPolicy_e
Values and Add Comment
2019-09-10 18:25:45 -04:00
W. Felix Handte
0492b9a9ec
Accept ZSTD_indexResetPolicy_e
Param in ZSTD_reset_matchState()
2019-09-10 18:25:45 -04:00
W. Felix Handte
14c5471d5e
Introduce ZSTD_indexResetPolicy_e
Enum
2019-09-10 18:25:45 -04:00
W. Felix Handte
17b6da2e0f
Track Usable Table Space in Compression Workspace
2019-09-10 18:25:37 -04:00
Yann Collet
22bd158e0f
Merge pull request #1712 from felixhandte/workspace-efficiency-2
...
Allocate Internal Buffers via Workspace Abstraction
2019-09-10 15:20:29 -07:00
Bimba Shrestha
1407919d13
Addressing comments on parsing
2019-09-10 15:10:50 -07:00
Bimba Shrestha
47199480da
Cleaning up parsing per suggestion
2019-09-10 13:18:59 -07:00
W. Felix Handte
a9d373f093
Remove Empty lib/compress/zstd_cwksp.c
2019-09-10 16:03:13 -04:00
Yann Collet
41416f0927
Merge pull request #1773 from bimbashrestha/rle_first_block_decompression_fix
...
Removing redundant condition in decompression, making first block rle…
2019-09-10 11:17:29 -07:00
Bimba Shrestha
e3c5825918
Fizing litLength == 0 case
2019-09-10 10:38:13 -07:00
Bimba Shrestha
9e7bb55e14
Addressing comments
2019-09-09 20:04:46 -07:00
W. Felix Handte
81208fd7c2
Forward Declare ZSTD_cwksp_available_space
to Fix Build
2019-09-09 19:10:09 -04:00
W. Felix Handte
91bf1babd1
Inline Workspace Functions
2019-09-09 18:53:53 -04:00
W. Felix Handte
0db3ffe7ee
Forward resetCCtx Errors when Using CDict
2019-09-09 16:47:19 -04:00
W. Felix Handte
eb6f69d978
Fix sizeof_CCtx and sizeof_CDict Calculations for Statically Init'ed Objects
2019-09-09 16:45:17 -04:00
W. Felix Handte
e3703825a8
Fix workspaceTooSmall Calculation
2019-09-09 15:12:14 -04:00
W. Felix Handte
0a65a67901
Shorten &zc->workspace
-> ws
in ZSTD_resetCCtx_internal()
2019-09-09 14:59:09 -04:00
W. Felix Handte
1120e4d962
Clean Up TODOs and Comments pt. II
2019-09-09 14:04:39 -04:00
W. Felix Handte
c60e1c3be5
Nit
2019-09-09 13:34:08 -04:00
W. Felix Handte
7d7b665c90
Pull Phase Advance Logic Out into Internal Function
2019-09-09 13:34:08 -04:00
W. Felix Handte
8549ae9f1d
Hide Workspace Movement Behind Helper Function
2019-09-09 13:34:08 -04:00
W. Felix Handte
2405c03bcd
Fix DEBUGLOG Statement Levels
2019-09-09 13:34:08 -04:00
W. Felix Handte
7100d24221
Fix Rescale Continue Special Case
2019-09-09 13:34:08 -04:00
W. Felix Handte
7321e4c9f3
Remove Unused noRealloc CRP Value
2019-09-09 13:34:08 -04:00
W. Felix Handte
901bba4ca6
Re-Implement Workspace Shrinking when Oversized
2019-09-09 13:34:08 -04:00
W. Felix Handte
881bcd80ca
Cleanup from Move
2019-09-09 13:34:08 -04:00
W. Felix Handte
b511a84adc
Move Workspace Functions to Their Own File
2019-09-09 13:34:08 -04:00
W. Felix Handte
077a2d7dc9
Rename
2019-09-09 13:34:08 -04:00
W. Felix Handte
ebd162194f
Clean Up TODOs and Comments
2019-09-09 13:34:08 -04:00
W. Felix Handte
2abe0145b1
Improve Comments a Bit
2019-09-09 13:34:08 -04:00
W. Felix Handte
7a2416a863
Allocate CDict in Workspace (Rather than in Separate Allocation)
2019-09-09 13:34:08 -04:00
W. Felix Handte
65057cf009
Rewrite ZSTD_initStaticCCtx to Alloc CCtx in Workspace
2019-09-09 13:34:08 -04:00
W. Felix Handte
58b69ab15c
Only the CCtx Itself Needs to be Cleared during Static CCtx Init
2019-09-09 13:34:08 -04:00
W. Felix Handte
88c2fcd0ee
Align Alloc Pointer When Transitioning from Buffers to Aligned Allocs
2019-09-09 13:34:08 -04:00
W. Felix Handte
e936b73889
Remove Overly-Restrictive Assert
2019-09-09 13:34:08 -04:00
W. Felix Handte
75d574368b
When Loading Dict By Copy, Always Put it in the Workspace
2019-09-09 13:34:08 -04:00
W. Felix Handte
e69b67e33a
Alloc Tables Separately
2019-09-09 13:34:08 -04:00
W. Felix Handte
6177354b36
Begin Introducing Phases
2019-09-09 13:34:08 -04:00
W. Felix Handte
786f2266bb
TMP
2019-09-09 13:34:08 -04:00
W. Felix Handte
c25283cf00
Disambiguate 'workspace' and 'entropyWorkspace'
2019-09-09 13:34:08 -04:00
W. Felix Handte
ccaac852e8
Normalize Case 'workSpace' -> 'workspace'
2019-09-09 13:27:18 -04:00
Bimba Shrestha
44e122053b
Mentioning cli only in the comment as suggested
2019-09-06 14:48:41 -07:00
Bimba Shrestha
a917cd597d
Put back omission for first rle block and updated comment as suggested
2019-09-06 13:44:25 -07:00
Bimba Shrestha
d687d603e4
Removing redundant condition in decompression, making first block rles valid to deocmpress
2019-09-06 10:46:19 -07:00
Varun S Nair
9816560649
Fixing assert and DEBUGLOG due to ZSTD_CCtx_params parameter change to const pointer
2019-09-05 15:47:17 +05:30
Varun S Nair
771645471f
Passing ZSTD_CCtx_params by const pointer
2019-09-05 15:28:30 +05:30
Bimba Shrestha
5f8b0f6890
Changing api to get sequences across all blocks
2019-08-30 09:18:44 -07:00
Yann Collet
5198347382
Merge pull request #1744 from bimbashrestha/dev
...
Generate RLE blocks in the encoder
2019-08-29 15:19:10 -07:00
Bimba Shrestha
623b90f85d
Fixing ci-circle test complaints
2019-08-29 13:09:42 -07:00
Bimba Shrestha
ece465644b
Adding api for extracting sequences from seqstore
2019-08-29 12:29:39 -07:00
mgrice
b830599582
Improvements in zstd decode performance
...
Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say 8) than it is to copy less (say, 3). This change takes that further by exploiting some properties:
1. it's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions, and fewer branches
2. A 16 byte chunk size means that ~90% of wildcopy invocations will have a trip count of 1, so branch prediction will be improved.
Speedup on Xeon E5-2680v4 is in the range of 3-5%.
Measured wildcopy length distributions on silesia.tar:
level <=8 <=16 <=24 >24
1 78.05% 11.49% 3.52% 6.94%
3 82.14% 8.99% 2.44% 6.43%
6 85.81% 6.51% 2.92% 4.76%
8 83.02% 7.31% 3.64% 6.03%
10 84.13% 6.67% 3.29% 5.91%
15 77.58% 7.55% 5.21% 9.66%
16 80.07% 7.20% 3.98% 8.75%
Test Plan: benchmark silesia, make check
2019-08-29 12:25:56 -07:00
Bimba Shrestha
c3e3c8bf32
Undoing the last commit (that was an accident)
2019-08-29 12:05:47 -07:00
bimbashrestha
4a1ca5e0a8
Adding method for extracting sequences.
2019-08-29 11:55:12 -07:00
bimbashrestha
e5704bbfdf
Added test for multiple blocks of zeros and fixed nit about comments
2019-08-28 08:32:34 -07:00
bimbashrestha
96201d9774
Added bool to cctx and fixed some comment nits
2019-08-26 15:30:41 -07:00
bimbashrestha
991cbc9024
Fixing mixed declaration compiler complaint
2019-08-26 15:00:50 -07:00
bimbashrestha
ce264ce53b
Forbiding emission of RLE when its the first block
2019-08-26 14:54:29 -07:00
bimbashrestha
33b6446ca7
Removing accidental method call
2019-08-26 14:34:43 -07:00
bimbashrestha
7b041b552e
Removing assert for rle that doesn't always hold
2019-08-26 12:26:53 -07:00
bimbashrestha
1f2bf77f2a
Using typedef U32 instead of int
2019-08-26 09:00:22 -07:00
bimbashrestha
ba46932492
Removing implicit conversion from const void* to const BYTE* and added constant for threshold
2019-08-26 08:51:34 -07:00
bimbashrestha
0e3ba02cf1
Fixing more test falure errors
2019-08-22 13:54:41 -07:00
bimbashrestha
4faf3a5911
Fixing ci-circle test failure issues
2019-08-22 13:46:15 -07:00
bimbashrestha
cba5350f88
Moving RLE logic to inside ZSTD_compressBlock_internal and adding assert
2019-08-22 12:12:44 -07:00
Nick Magerko
493f95c7df
Fix merge conflicts
2019-08-22 11:51:41 -07:00
bimbashrestha
4c90d862e3
Generate RLE blocks in the encoder
2019-08-22 11:27:20 -07:00
Nick Magerko
c7a24d7a14
Define ZSTD_SRCSIZEHINT_MIN as 0
2019-08-20 13:06:15 -07:00
Nick Magerko
2d39b43906
Use int for srcSizeHint when sensible
2019-08-19 16:49:25 -07:00
Nick Magerko
edf2abf106
Fix fall-through case
2019-08-19 12:32:43 -07:00
Nick Magerko
dffbac5f89
Add --size-hint=# option
2019-08-19 11:38:49 -07:00
Yann Collet
782bfb858a
fixed very minor inefficiency (nbSeq==127)
...
The nbSeq "short" format (1-byte)
is compatible with any value < 128.
However, the code would cautiously only accept values < 127.
This is not an error, because the general 2-bytes format
is compatible with small values < 128.
Hence the inefficiency never triggered any warning.
Spotted by Intel's Smita Kumar.
2019-08-15 16:41:34 +02:00
Yann Collet
facbe8b2c2
factored the logic selecting lowest match index
...
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
Yann Collet
0b0b83e8f3
fix test 122
...
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet
98e7c344cd
fixed strategies btopt+
2019-08-02 14:42:53 +02:00
Yann Collet
b4257b04e7
fixed strategy btlazy2
2019-08-02 14:26:26 +02:00
Yann Collet
5cf1b24aca
fixed strategies greedy, lazy & lazy2
...
restore dictionary compression ratio
2019-08-02 14:21:39 +02:00
Yann Collet
98692c2838
fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3
2019-08-01 15:58:17 +02:00
Yann Collet
be3d2e2de8
Merge pull request #1679 from ephiepark/dev
...
Restructure the source files
2019-07-19 15:29:07 -07:00
Ephraim Park
1dc98de279
Restructure the source files
2019-07-15 17:39:18 -07:00
Yann Collet
8fb08b68cc
Merge pull request #1681 from facebook/level3
...
updated double_fast complementary insertion
2019-07-12 16:16:06 -07:00
Nick Terrell
75cfe1dc69
[ldm] Fix bug in overflow correction with large job size ( #1678 )
...
* [ldm] Fix bug in overflow correction with large job size
* [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode)
* [test] Add test that exposes the bug
Sadly the test fails on our CI because it uses too much memory, so
I had to comment it out.
2019-07-12 18:45:18 -04:00
Yann Collet
eaeb7f00b5
updated the _extDict variant of double fast
2019-07-12 14:17:17 -07:00
Yann Collet
e8a7f5d3ce
double-fast: changed the trade-off for a smaller positive change
...
same number of complementary insertions, just organized differently
(long at `ip-2`, short at `ip-1`).
2019-07-12 11:34:53 -07:00
mgrice
812e8f2a16
perf improvements for zstd decode ( #1668 )
...
* perf improvements for zstd decode
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.
silesia benchmark data - second triad of each file is with the original code:
file orig compressedratio encode decode change
1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s
2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s
3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s
1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60%
2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01%
3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99%
1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s
2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s
3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s
1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30%
2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20%
3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50%
1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s
2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s
3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s
1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76%
2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54%
3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89%
1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s
2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s
3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s
1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47%
2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72%
3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29%
1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s
2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s
3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s
1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15%
2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98%
3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45%
1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s
2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s
3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s
1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41%
2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42%
3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68%
1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s
2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s
3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s
1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26%
2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36%
3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38%
1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s
2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s
3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s
1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45%
2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93%
3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04%
1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s
2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s
3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s
1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70%
2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85%
3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31%
1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s
2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s
3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s
1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89%
2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43%
3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33%
1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s
2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s
3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s
1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24%
2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99%
3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05%
1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s
2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s
3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s
1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52%
2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00%
3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37%
7.51%
* makefile changed to only pass -fno-tree-vectorize to gcc
* <Replace this line with a title. Use 1 line only, 67 chars or less>
Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix c90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Yann Collet
d1327738c2
updated double_fast complementary insertion
...
in a way which is more favorable to compression ratio,
though very slightly slower (~-1%).
More details in the PR.
2019-07-11 15:25:22 -07:00
Yann Collet
b01c1c679f
Merge pull request #1675 from ephiepark/dev
...
Factor out the logic to build sequences
2019-07-10 13:32:31 -07:00