measures :
- compression ratio with / without dictionary
- create one dictionary per block
- memory budget for dictionaries
- decompression speed, using one different dictionary per block
current limitations :
- only one file
- 4K blocks only
- automatic dictionary built with 4K size
dictionary can be selected on command line, with -D
We could undersize the literals buffer by up to 11 bytes,
due to a combination of 2 bugs:
* The literals buffer didn't have `WILDCOPY_OVERLENGTH` extra
space, like it is supposed to.
* We didn't check the literals buffer size in `ZSTD_sufficientBuff()`.
they were pretty easy to trigger by the way,
just start an extended paramgrill session
to find a compression table based on any sample,
it would necessarily happen at some point.
assert() in paramgrill are not in the benchmark path.
They should remain active, as they don't impact measurements, and their runtime is insignificant.
* Minor fix
* Run non-optimize FASTCOVER 5 times in benchmark
* Merge fastCover into dictBuilder
* Fix mixed declaration issue
* Add fastcover to symbol.c
* Add fastCover.c and cover.h to build
* Change fastCover.c to fastcover.c
* Update benchmark to run FASTCOVER in dictBuilder
* Undo spliting fastcover_param into cover_param and f
* Remove convert param functions
* Assign f to parameter
* Add zdict.h to Makefile in lib
* Add cover.h to BUCK
* Cast 1 to U64 before shifting
* Remove trimming of zero freq head and tail in selectSegment and rebenchmark
* Remove f as a separate parameter of tryParam
* Read 8 bytes when d is 6
* Add trimming off zero frequency head and tail
* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)
* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary
* Change nbDmer to always read 8 bytes even when d=6
* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER
* Update comments and benchmarking result
* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover
* Add dictType enum and fix bug about passing zParam when converting to coverParam
* Combine finalize and skip into a single parameter
* Update acceleration parameters and benchmark on 3 sample sets
* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets
* Initialize variables outside of for loop in benchmark.c
* Update benchmark result for hg-manifest
* Remove cover.h from install-includes
* Add explanation of f
* Set default compression level for trainFromBuffer to 3
* Add assertion of fastCoverParams in DiB_trainFromFiles
* Add checkTotalCompressedSize function + some minor fixes
* Add test for multithreading fastCovr
* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish
* Free segmentFreqs
* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment
* Add FASTCOVER_MEMMULT
* Minor fix
* Update benchmarking result
Per warnings from flawfinder: "Does not check for buffer overflows when
copying to destination [MS-banned] (CWE-120). Consider using snprintf,
strcpy_s, or strlcpy (warning: strncpy easily misused).".
Replaced called to strcpy and strcat in `fileio.c` to calls with a
specified size (`strncpy` and `strncat`).
Tested the changes on OSX, Linux, Windows.
On OSX + Linux, changes were tested with ASAN. The following flags were
used: 'check_initialization_order=1:strict_init_order=1:detect_odr_violation=1:detect_stack_use_after_return=1'
To reproduce warning:
./flawfinder.py ./programs/fileio.c
tells in a non-blocking way if there is something ready to flush right now.
only works with multi-threading for the time being.
Useful to know if flush speed will be limited by lack of production.