Commit Graph

4101 Commits

Author SHA1 Message Date
Yann Collet
2aba13b770 Merge pull request #769 from terrelln/real-block-split
[libzstd] Fix bug in Huffman encoding
2017-07-18 14:58:26 -07:00
Nick Terrell
d0b27483ae [zstdcli] Fix -t in streaming mode 2017-07-18 14:45:49 -07:00
Stella Lau
19258f51c1 Make the meaning of LDM_MEMORY_USAGE consistent across tables 2017-07-18 14:25:39 -07:00
Paul Cruz
a34bc30237 setting up basic readme 2017-07-18 13:31:02 -07:00
Paul Cruz
29c36cf051 rename completion variable, split up fwrite operations in order to track progress 2017-07-18 13:30:29 -07:00
Nick Terrell
cc1522351f [libzstd] Fix bug in Huffman encoding
Summary:
Huffman encoding with a bad dictionary can produce output larger than
HUF_BLOCKBOUND(srcSize), since we don't filter out incompressible
input, and even if we did, the dictionary's Huffman table could be
ill-suited to compressing the actual data.

The fast optimization doesn't seem to improve compression speed:
even with fast = 1 hard-coded, the speed didn't improve over hard-coding
it to 0.

Benchmarks:
$ ./zstd.dev -b1e5
Benchmarking levels from 1 to 5
 1#Synthetic 50%     :  10000000 ->   3139163 (3.186), 524.8 MB/s ,1890.0 MB/s
 2#Synthetic 50%     :  10000000 ->   3115138 (3.210), 372.6 MB/s ,1830.2 MB/s
 3#Synthetic 50%     :  10000000 ->   3222672 (3.103), 223.3 MB/s ,1400.2 MB/s
 4#Synthetic 50%     :  10000000 ->   3276678 (3.052), 198.0 MB/s ,1280.1 MB/s
 5#Synthetic 50%     :  10000000 ->   3271570 (3.057), 107.8 MB/s ,1200.0 MB/s
$ ./zstd -b1e5
Benchmarking levels from 1 to 5
 1#Synthetic 50%     :  10000000 ->   3139163 (3.186), 524.8 MB/s ,1870.2 MB/s
 2#Synthetic 50%     :  10000000 ->   3115138 (3.210), 370.0 MB/s ,1810.3 MB/s
 3#Synthetic 50%     :  10000000 ->   3222672 (3.103), 223.3 MB/s ,1380.1 MB/s
 4#Synthetic 50%     :  10000000 ->   3276678 (3.052), 196.1 MB/s ,1270.0 MB/s
 5#Synthetic 50%     :  10000000 ->   3271570 (3.057), 106.8 MB/s ,1180.1 MB/s
$ ./zstd.dev -b1e5 ../silesia.tar
Benchmarking levels from 1 to 5
 1#silesia.tar       : 211988480 ->  73651685 (2.878), 429.7 MB/s ,1096.5 MB/s
 2#silesia.tar       : 211988480 ->  70158785 (3.022), 321.2 MB/s ,1029.1 MB/s
 3#silesia.tar       : 211988480 ->  66993813 (3.164), 243.7 MB/s , 981.4 MB/s
 4#silesia.tar       : 211988480 ->  66306481 (3.197), 226.7 MB/s , 972.4 MB/s
 5#silesia.tar       : 211988480 ->  64757852 (3.274), 150.3 MB/s , 963.6 MB/s
$ ./zstd -b1e5 ../silesia.tar
Benchmarking levels from 1 to 5
 1#silesia.tar       : 211988480 ->  73651685 (2.878), 429.7 MB/s ,1087.1 MB/s
 2#silesia.tar       : 211988480 ->  70158785 (3.022), 318.8 MB/s ,1029.1 MB/s
 3#silesia.tar       : 211988480 ->  66993813 (3.164), 246.5 MB/s , 981.4 MB/s
 4#silesia.tar       : 211988480 ->  66306481 (3.197), 229.2 MB/s , 972.4 MB/s
 5#silesia.tar       : 211988480 ->  64757852 (3.274), 149.3 MB/s , 963.6 MB/s

Test Plan:
I added a test case to the fuzzer that crashed under ASAN before the patch
and passes after it.
2017-07-18 13:20:40 -07:00
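The failure mode described in the commit above (an ill-suited dictionary Huffman table expanding the data past the assumed bound) can be illustrated with a minimal C sketch. It does not claim to be zstd's actual fix: huf_encoded_size() and should_use_table() are hypothetical helpers that compute the exact encoded size from per-symbol code lengths and tell the caller to store the block raw when the table gives no gain or would overflow the destination buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: exact output size, in bytes, of src encoded with
 * per-symbol Huffman code lengths (in bits). Returns 0 if a symbol in src
 * has no code (length 0), i.e. the table cannot represent this input. */
static size_t huf_encoded_size(const uint8_t *src, size_t src_size,
                               const uint8_t code_len[256])
{
    uint64_t bits = 0;
    for (size_t i = 0; i < src_size; i++) {
        uint8_t len = code_len[src[i]];
        if (len == 0)
            return 0;                  /* symbol missing from the table */
        bits += len;
    }
    return (size_t)((bits + 7) / 8);   /* round up to whole bytes */
}

/* Hypothetical helper: returns 1 if encoding with this table is worthwhile
 * and fits in dst_capacity; returns 0 if the block should be stored raw. */
static int should_use_table(const uint8_t *src, size_t src_size,
                            const uint8_t code_len[256], size_t dst_capacity)
{
    size_t est = huf_encoded_size(src, src_size, code_len);
    if (est == 0)
        return 0;                      /* table cannot encode this input */
    if (est >= src_size)
        return 0;                      /* no gain over raw storage */
    if (est > dst_capacity)
        return 0;                      /* would overflow the destination */
    return 1;
}
```

For example, 100 copies of one byte coded with a 12-bit code need 150 bytes, so a table borrowed from a mismatched dictionary makes the block larger than its input, and any buffer sized on the assumption that this cannot happen is overrun.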
Paul Cruz
ae47eab2fd changed test cases to use -s setting on the diffs 2017-07-18 12:58:50 -07:00
Yann Collet
77d67fb167 Merge pull request #766 from terrelln/real-block-split
[libzstd] Pull optimal parser state out of seqStore_t
2017-07-18 08:26:24 -07:00
Yann Collet
14c83b05c7 Merge pull request #765 from terrelln/real-block-split
[libzstd] Remove ZSTD_CCtx* argument of ZSTD_compressSequences()
2017-07-17 19:25:55 -07:00
Stella Lau
fc41a87964 Experiment with using a lag when hashing 2017-07-17 18:13:09 -07:00
Paul Cruz
5af04c57b0 change parameters for compression level adapt 2017-07-17 17:59:50 -07:00
Paul Cruz
b3c9e02bb6 added signal to other threads whenever an error occurs 2017-07-17 15:34:58 -07:00
Nick Terrell
7a28b9e4a3 [libzstd] Pull optimal parser state out of seqStore_t 2017-07-17 15:29:11 -07:00
Stella Lau
a00e406231 Remove version archive 2017-07-17 15:17:32 -07:00
Stella Lau
15a041adbf Add function to get only valid entries from the table 2017-07-17 15:16:58 -07:00
Yann Collet
3381bf4b84 Merge pull request #764 from terrelln/real-block-split
[libzstd] Refactor ZSTD_compressSequences()
2017-07-17 14:46:01 -07:00
Paul Cruz
6be22f1f84 swap buffers instead of copying memory over 2017-07-17 14:39:10 -07:00
Paul Cruz
708238e07e open file outside of adaptCCtx, pass to the output thread 2017-07-17 14:01:13 -07:00
Nick Terrell
e198230645 [libzstd] Remove ZSTD_CCtx* argument of ZSTD_compressSequences() 2017-07-17 12:27:24 -07:00
Stella Lau
4bb42b02c1 Add basic chaining table 2017-07-17 11:53:54 -07:00
Nick Terrell
634f012420 [libzstd] Refactor ZSTD_compressSequences() 2017-07-17 11:36:11 -07:00
Paul Cruz
044e40db5a removed freeCCtx() calls from createCCtx() so that it is not called twice on error 2017-07-17 11:19:23 -07:00
Paul Cruz
50ce4eaeb6 added error detection for pthread initialization, added compression completion measurement, fixed const values 2017-07-17 10:12:44 -07:00
Stella Lau
ca300ce6e0 Decouple hash table from compression function 2017-07-14 17:17:00 -07:00
Paul Cruz
1ab3f06f00 updated tests to use different seeds when executing different tests 2017-07-14 16:29:29 -07:00
Stella Lau
6e443b4960 Move hash table access into its own functions 2017-07-14 14:27:55 -07:00
Stella Lau
2d8e6c6608 Add more statistics 2017-07-14 12:31:01 -07:00
Stella Lau
55f960e8db Add percentages to offset histogram 2017-07-14 11:00:20 -07:00
Stella Lau
4db7f12ef3 Add offset histogram 2017-07-14 10:52:03 -07:00
Yann Collet
fa3aa04ccd Merge pull request #761 from paulcruz74/file-rename
renamed pool.c to poolTests.c
2017-07-14 09:09:45 -07:00
Yann Collet
3841ee6fb3 Merge pull request #762 from facebook/errorCodes
pinned down error code enum values
2017-07-14 09:09:22 -07:00
Yann Collet
3b0cff3c33 fixed clang -Wdocumentation warnings 2017-07-13 18:58:30 -07:00
Yann Collet
2bd6440be0 pinned down error code enum values
Note: all error codes change in this version,
but this is expected to be the last change to existing codes.

Codes are now grouped by category, and each receives a manually assigned value.
The objective is to guarantee that
error code values will not change in the future
when new codes are introduced.
Intentional gaps and ranges are left
to keep room for potential new codes.
2017-07-13 17:12:16 -07:00
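The numbering scheme described in the commit above (categories with manually assigned values and deliberately unused slots) can be sketched as a C enum with explicit initializers. The names and values below are hypothetical and only show the pattern; they are not zstd's actual error codes. A compile-time assertion pins one value so that an accidental renumbering fails the build.

```c
/* Hypothetical error codes illustrating the pattern only; not zstd's
 * actual names or values. Each category owns a range of ten values,
 * and unused values inside a range are left free on purpose. */
typedef enum {
    EX_error_no_error              = 0,
    EX_error_GENERIC               = 1,
    /* 10-19: frame header errors */
    EX_error_prefix_unknown        = 10,
    EX_error_version_unsupported   = 12,
    /* 20-29: corrupted input */
    EX_error_corruption_detected   = 20,
    EX_error_checksum_wrong        = 22,
    /* 30-39: dictionary errors */
    EX_error_dictionary_corrupted  = 30,
    /* 40-49: parameter errors */
    EX_error_parameter_unsupported = 40,
    /* never returned; marks the end of the reserved space */
    EX_error_maxCode               = 120
} EX_ErrorCode;

/* Pin a value at compile time so an accidental reorder breaks the build. */
_Static_assert(EX_error_checksum_wrong == 22, "error code values must not change");

/* Hypothetical helper: the category is implied by the value's range. */
static int ex_error_category(EX_ErrorCode code)
{
    return (int)code / 10;
}
```

Because every value is written out explicitly, inserting a new code into a gap (say 14 in the frame-header range) leaves all existing values untouched, which is exactly the stability the commit aims for.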
Paul Cruz
0c8b9436b7 removed goto statements for the most part 2017-07-13 16:38:20 -07:00
Stella Lau
175a6c6029 [ldm] Minor refactoring 2017-07-13 16:16:31 -07:00
Yann Collet
3502426fd4 Merge branch 'dev' of github.com:facebook/zstd into dev 2017-07-13 15:49:19 -07:00
Yann Collet
6733c0777c updated NEWS regarding #760 2017-07-13 15:34:44 -07:00
Stella Lau
361c06df75 Add min/max offset to stats 2017-07-13 15:29:41 -07:00
Paul Cruz
65a4ce2635 added tests for forced compression level 2017-07-13 14:57:24 -07:00
Paul Cruz
0d9665cef5 added additional tests for performance, allowed force compression level for testing purposes 2017-07-13 14:46:54 -07:00
Stella Lau
2b3c7e4199 [ldm] Make some functions shared 2017-07-13 14:39:35 -07:00
Paul Cruz
9165e97fc6 added some tests for correctness, time, and compression ratio 2017-07-13 13:50:23 -07:00
Stella Lau
9306feb8fa [ldm] Switch to using lib/common/mem.h and move typedefs to ldm.h
2017-07-13 13:44:48 -07:00
Nick Terrell
830ef4152a [libzstd] Increase granularity of FSECTable repeat mode 2017-07-13 12:45:39 -07:00
Stella Lau
50421d9474 [ldm] Remove old main files 2017-07-13 11:45:00 -07:00
Stella Lau
68c4560701 [ldm] Add TODO and comment for segfaulting in compress function 2017-07-13 10:38:19 -07:00
Yann Collet
d985319337 Merge pull request #759 from terrelln/real-block-split
[libzstd] Pull CTables into sub-structure
2017-07-13 10:24:19 -07:00
Yann Collet
3a60efd3a9 policy change : ZSTDMT automatically caps nbThreads to ZSTDMT_NBTHREADS_MAX (#760)
Previously, ZSTDMT would refuse to create the compressor when nbThreads exceeded ZSTDMT_NBTHREADS_MAX.
Also: increased ZSTDMT_NBTHREADS_MAX to 256,
updated the documentation,
and added a relevant test.
2017-07-13 10:17:23 -07:00
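The policy change above (capping the thread count instead of refusing) can be contrasted in a short C sketch. EXAMPLE_NBTHREADS_MAX and both functions are hypothetical stand-ins, not ZSTDMT's code; a return value of 0 models "no compressor created", and treating a zero-thread request as still invalid is an assumption.

```c
/* Hypothetical stand-in for ZSTDMT_NBTHREADS_MAX (raised to 256 above). */
#define EXAMPLE_NBTHREADS_MAX 256u

/* Old policy (sketch): an out-of-range request is rejected,
 * modeled here as returning 0 ("no compressor created"). */
static unsigned nb_threads_old_policy(unsigned requested)
{
    if (requested < 1 || requested > EXAMPLE_NBTHREADS_MAX)
        return 0;
    return requested;
}

/* New policy (sketch): too many threads is capped instead of rejected.
 * Assumption: a request for zero threads is still treated as invalid. */
static unsigned nb_threads_new_policy(unsigned requested)
{
    if (requested < 1)
        return 0;
    return requested > EXAMPLE_NBTHREADS_MAX ? EXAMPLE_NBTHREADS_MAX : requested;
}
```

With this design, a caller asking for 300 threads gets a working compressor with 256 workers rather than a NULL it has to handle.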
Paul Cruz
766663f1f1 added altering dictionary size depending on compression level 2017-07-13 10:15:27 -07:00
Yann Collet
132e6efd76 switched ZSTDMT_compress_advanced() last argument to overlapLog
overlapRLog (== 9 - overlapLog) was a bit "strange"
as all other public entry points use overlapLog
2017-07-13 02:22:58 -07:00
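The relation stated above, overlapRLog == 9 - overlapLog, can be written as a small C conversion. The overlap_size() helper is a hypothetical interpretation added only to show why overlapLog reads more naturally: it assumes the overlap is the window size divided by 2^overlapRLog, with overlapRLog 9 meaning no overlap. That interpretation and the clamping to 0..9 are assumptions, not taken from the commit.

```c
#include <stddef.h>

/* Convert the public overlapLog to the older overlapRLog, using the
 * relation from the commit: overlapRLog == 9 - overlapLog.
 * Assumption: overlapLog is clamped to 0..9 for illustration. */
static unsigned overlap_rlog_from_log(unsigned overlapLog)
{
    if (overlapLog > 9)
        overlapLog = 9;
    return 9 - overlapLog;
}

/* Hypothetical interpretation: overlap = window size >> overlapRLog,
 * with overlapRLog 9 meaning "no overlap". Assumes windowLog >= 9.
 * A larger overlapLog therefore means a larger overlap. */
static size_t overlap_size(unsigned windowLog, unsigned overlapLog)
{
    unsigned rlog = overlap_rlog_from_log(overlapLog);
    if (rlog >= 9)
        return 0;
    return (size_t)1 << (windowLog - rlog);
}
```

Under this reading, overlapLog 9 overlaps the full window and overlapLog 0 disables overlap, so the parameter grows in the same direction as its effect, matching the other log-scaled parameters.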