Commit Graph

516 Commits

Author SHA1 Message Date
Yann Collet
50b216146f Merge pull request #1304 from facebook/largeNbDicts
contrib/largeNbDicts
2018-09-06 09:50:56 -07:00
Yann Collet
c57a856d64 fixed minor static analyzer warning 2018-09-05 14:33:51 -07:00
Yann Collet
1d487d587f updated documentation 2018-09-04 14:57:45 -07:00
Yann Collet
11b8b8c100 silenced false-positive scan-build warning 2018-08-31 10:01:06 -07:00
Yann Collet
0ff67511e6 fixed link order for old compilers 2018-08-30 16:43:28 -07:00
Yann Collet
f76253bb70 minor: createDictionaryBuffer() can create dictionaries of different sizes 2018-08-30 16:24:44 -07:00
Yann Collet
39c55a118f fixed minor compatibility issues with older compilers 2018-08-30 16:00:57 -07:00
Yann Collet
39ef91a599 -std=c99 for largeNbDicts 2018-08-30 14:59:23 -07:00
Yann Collet
4086b2871b largeNbDicts compatible with multiple source files
splitting is disabled by default, but can be re-enabled using usual command -B#
update commands to look like zstd ones
2018-08-30 14:38:49 -07:00
Yann Collet
a5a77965d3 make all includes contrib/largeNbDicts 2018-08-29 16:17:22 -07:00
Yann Collet
d89fa814c1 added a README
for documentation
2018-08-28 18:19:19 -07:00
Yann Collet
6444c50035 increases randomness of ddict ptrs 2018-08-28 18:13:46 -07:00
Yann Collet
6c398df241 level, block size and nb dicts can be set on command line 2018-08-28 18:05:31 -07:00
Yann Collet
0c66a44d1b first working test program
measures:
- compression ratio with / without dictionary
- create one dictionary per block
- memory budget for dictionaries
- decompression speed, using one different dictionary per block

current limitations:
- only one file
- 4K blocks only
- automatic dictionary built with 4K size

dictionary can be selected on command line, with -D
2018-08-28 15:47:07 -07:00
Yann Collet
274b60e6e6 largeNbDicts can compress and compare dict vs noDict 2018-08-27 17:08:44 -07:00
Yann Collet
6782725155 first sketch for largeNbDicts test program 2018-08-26 19:29:12 -07:00
Jennifer Liu
9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo splitting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part (which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCover

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
Yann Collet
36d6165a2d Makefile: added variable SCANBUILD
so that a different version of scan-build can be selected
2018-08-16 16:44:13 -07:00
Yann Collet
42a02ab745 fixed minor warnings issued by scan-build 2018-08-15 14:36:02 -07:00
Jennifer Liu
0acb0abd1e Add non-optimize FASTCOVER (#1260)
* Add non-optimize FASTCOVER

* Minor fix

* Pass param as value instead of pointer
2018-08-01 11:06:16 -07:00
Jennifer Liu
4e29bc2469 Use CDict instead of CCtx in analyzeEntropy 2018-07-31 10:36:45 -07:00
Jennifer Liu
31229e527b Increment frequency for every dmer occurrence within same sample instead of at most once per sample 2018-07-30 12:54:22 -07:00
Jennifer Liu
51b109c1b5 Delete old benchmarking result 2018-07-27 17:31:33 -07:00
Jennifer Liu
53ef22a4bc Undo deleting clean in make 2018-07-27 16:56:50 -07:00
Jennifer Liu
96d84ee235 Revert test.sh 2018-07-27 16:54:05 -07:00
Jennifer Liu
61262f6c0d Save segmentFreqs in ctx instead of malloc and memset in SelectSegment 2018-07-27 16:51:38 -07:00
Jennifer Liu
49b398e93f Use same param after optimizing cover and fastCover and record k and d for benchmarking 2018-07-27 13:39:19 -07:00
Jennifer Liu
759c543312 Rerun cover and fastCover with optimized values 2018-07-26 19:03:01 -07:00
Jennifer Liu
3d7941ce41 Benchmark different f values 2018-07-26 16:24:13 -07:00
Jennifer Liu
3b163e0b5b Add array to keep track of frequency within active segment, fix malloc bug, update benchmarking result 2018-07-26 13:53:13 -07:00
Jennifer Liu
2333ecb173 Allow d=6 2018-07-25 18:10:09 -07:00
Jennifer Liu
1e85f314d8 Benchmark fast cover optimize vs k=200 2018-07-25 17:53:38 -07:00
Jennifer Liu
d1fc507ef9 Initial benchmarking result for fastCover 2018-07-25 17:05:54 -07:00
Jennifer Liu
f5407e398a Make hash value const 2018-07-25 16:54:08 -07:00
Jennifer Liu
7f3f70f766 Add Fast Cover Dictionary Builder 2018-07-25 16:34:07 -07:00
Nick Terrell
77068a8447 Merge pull request #1246 from jennifermliu/benchmark
Benchmark dictionary builders
2018-07-20 18:09:31 -07:00
Jennifer Liu
b6c5d4982c Minor fix 2018-07-20 17:41:22 -07:00
Jennifer Liu
71e767ac09 Refactoring and benchmark without dictionary 2018-07-20 17:03:47 -07:00
Jennifer Liu
470c8d42f4 Benchmark dictionary builders 2018-07-20 11:32:39 -07:00
Nick Terrell
4d1ad5cdb2 Merge pull request #1238 from jennifermliu/random
Add random dictionary builder
2018-07-19 13:52:15 -07:00
Jennifer Liu
0c5eaef248 Update Makefile 2018-07-19 13:44:27 -07:00
Jennifer Liu
5bb46a898e Rename cleanup 2018-07-18 12:15:49 -07:00
Jennifer Liu
52e7cf0e40 Add cleanup to trainfromFiles and move RANDOM_segment_t declaration 2018-07-18 10:40:13 -07:00
Jennifer Liu
ce09fb723d Update freeSampleInfo 2018-07-17 16:13:40 -07:00
Jennifer Liu
896ff0644a Fix deallocation problem and add documentation 2018-07-17 16:01:44 -07:00
Jennifer Liu
e6fe405838 Make test PHONY target 2018-07-17 12:42:53 -07:00
Jennifer Liu
49acfaeaec Move file loading functions to new file for access by benchmarking tool 2018-07-17 12:35:09 -07:00
Jennifer Liu
4d32339b75 Remove CLevel cli option which was accidentally added back in the last commit 2018-07-16 18:59:18 -07:00
Jennifer Liu
1f7fa5cdd6 Fix spacing and Edit Makefile (now run with make instead of make run) 2018-07-16 16:31:59 -07:00
Jennifer Liu
b5806d33db Refactor RANDOM 2018-07-16 16:03:04 -07:00