AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Jennifer Liu	9d6ed9def3	Merge fastCover into DictBuilder (#1274) * Minor fix * Run non-optimize FASTCOVER 5 times in benchmark * Merge fastCover into dictBuilder * Fix mixed declaration issue * Add fastcover to symbol.c * Add fastCover.c and cover.h to build * Change fastCover.c to fastcover.c * Update benchmark to run FASTCOVER in dictBuilder * Undo spliting fastcover_param into cover_param and f * Remove convert param functions * Assign f to parameter * Add zdict.h to Makefile in lib * Add cover.h to BUCK * Cast 1 to U64 before shifting * Remove trimming of zero freq head and tail in selectSegment and rebenchmark * Remove f as a separate parameter of tryParam * Read 8 bytes when d is 6 * Add trimming off zero frequency head and tail * Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed) * Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary * Change nbDmer to always read 8 bytes even when d=6 * Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER * Update comments and benchmarking result * Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover * Add dictType enum and fix bug about passing zParam when converting to coverParam * Combine finalize and skip into a single parameter * Update acceleration parameters and benchmark on 3 sample sets * Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets * Initialize variables outside of for loop in benchmark.c * Update benchmark result for hg-manifest * Remove cover.h from install-includes * Add explanation of f * Set default compression level for trainFromBuffer to 3 * Add assertion of fastCoverParams in DiB_trainFromFiles * Add checkTotalCompressedSize function + some minor fixes * Add test for multithreading fastCovr * Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish * Free segmentFreqs * Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment * Add FASTCOVER_MEMMULT * Minor fix * Update benchmarking result	2018-08-23 12:06:20 -07:00
Yann Collet	1515f0bb0d	fixed more issues detected by recent version of scan-build test run on Linux	2018-08-16 15:20:25 -07:00
Yann Collet	6e66bbf5dd	fixed several minor issues detected by scan-build only notable one : writeNCount() resists better vs invalid distributions (though it should never happen within zstd anyway)	2018-08-14 16:55:35 -07:00
Jennifer Liu	f5228f2c44	Refactoring	2018-07-31 13:58:54 -07:00
Jennifer Liu	4e29bc2469	Use CDict instead of CCtx in analyzeEntropy	2018-07-31 10:36:45 -07:00
Jennifer Liu	612b346ed5	Add explanation for split=100	2018-07-11 15:50:28 -07:00
Jennifer Liu	5021441d86	Change default splitPoint to 100	2018-07-10 11:19:33 -07:00
Jennifer Liu	456f290e31	Change back to splitPoint<=0	2018-07-09 13:53:25 -07:00
Jennifer Liu	7efabb2cf6	Only make 0.0 default splitPoint	2018-07-09 12:26:53 -07:00
Jennifer Liu	015a00af0f	Change cover_sum back to 2 parameters and fix splitPoint issues	2018-07-06 14:24:18 -07:00
Jennifer Liu	0bbff01211	Fix testing parameter	2018-07-05 22:40:32 -07:00
Jennifer Liu	a085d1aae1	Allow splitPoint==1.0 (using all samples for both training and testing)	2018-07-05 10:38:45 -07:00
Jennifer Liu	0881184c89	Some edits based on pull request comments	2018-07-03 17:53:27 -07:00
Jennifer Liu	16e75e8804	Update minimal training sample size	2018-07-03 12:07:06 -07:00
Jennifer Liu	348e5f77a9	Add split=# to cli	2018-06-29 17:54:41 -07:00
Jennifer Liu	52fbbbcb6b	Explicitly cast double to unsigned	2018-06-29 16:17:20 -07:00
Jennifer Liu	f9d19b83fb	Fix variable declaration problem	2018-06-29 15:46:56 -07:00
Jennifer Liu	e061d84016	Another fix to comparator	2018-06-29 15:38:08 -07:00
Jennifer Liu	59797d3328	Fix splitPoint floating point comparison problem	2018-06-29 12:47:03 -07:00
Jennifer Liu	0ef06f2e8a	Split samples into train and test sets	2018-06-29 12:33:34 -07:00
Yann Collet	fa41bcc2c2	grouped debug functions into debug.h There were 2 competing set of debug functions within zstd_internal.h and bitstream.h. They were mostly duplicate, and required care to avoid messing with each other. There is now a single implementation, shared by both. Significant change : The macro variable ZSTD_DEBUG does no longer exist, it has been replaced by DEBUGLEVEL, which required modifying several source files.	2018-06-13 15:43:09 -04:00
Nick Terrell	7cbb8bbbbf	[cover] Small compression ratio improvement The cover algorithm selects one segment per epoch, and it selects the epoch size such that `epochs * segmentSize ~= dictSize`. Selecting less epochs gives the algorithm more candidates to choose from for each segment it selects, and then it will loop back to the first epoch when it hits the last one. The trade off is that now it takes longer to select each segment, since it has to look at more data before making a choice. I benchmarked on the following data sets using this command: ```sh $ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR \| wc -c ``` \| Data set \| k (approx) \| Before \| After \| % difference \| \|--------------\|------------\|----------\|----------\|--------------\| \| GitHub \| ~1000 \| 738138 \| 746610 \| +1.14% \| \| hg-changelog \| ~90 \| 4295156 \| 4285336 \| -0.23% \| \| hg-commands \| ~500 \| 1095580 \| 1079814 \| -1.44% \| \| hg-manifest \| ~400 \| 16559892 \| 16504346 \| -0.34% \| There is some noise in the measurements, since small changes to `k` can have large differences, which is why I'm using `steps=256`, to try to minimize the noise. However, the GitHub data set still has some noise. If I run the GitHub data set on my Mac, which presumably lists directory entries in a different order, so the dictionary builder sees the files in a different order, or I use `steps=1024` I see these results. \| Run \| Before \| After \| % difference \| \|------------\|--------\|--------\|--------------\| \| steps=1024 \| 738138 \| 734470 \| -0.50% \| \| MacBook \| 738451 \| 737132 \| -0.18% \| Question: Should we expose this as a parameter? I don't think it is necessary. Someone might want to turn it up to exchange a much longer dictionary building time in exchange for a slightly better dictionary. I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16` with a faster running time.	2018-05-18 16:15:27 -07:00
Yann Collet	1da629f2ad	Merge pull request #1104 from terrelln/fast-train Allow negative compression levels in training	2018-04-09 14:16:20 -07:00
Nick Terrell	569e2abccd	Allow negative compression levels in training * Set `dictCLevel` in `zstdcli.c`. * Only set to default level if the compression level `== 0`, not `<= 0`.	2018-04-09 12:12:03 -07:00
Björn Ketelaars	462aed6811	zstd requires a stable sort. On OpenBSD qsort() is not guaranteed to be stable, their mergesort() is. This fixes issue #1088. All the hard work has been done by @terrelln.	2018-04-05 07:59:16 +02:00
Yann Collet	9f8ed23b5b	bumped version number to v1.3.4 also added a paragraph on using compression level with training mode as this is a recurrent question (see for example #1004)	2018-01-27 22:23:26 -08:00
Yann Collet	752bae4a48	added warning message when pathological dataset is detected (note : cover_optimize needs -v to display the warning)	2018-01-11 11:29:28 -08:00
Yann Collet	e8093dde09	fixed #304 Pathological samples may result in literal section being incompressible. This case is now detected, and literal distribution is replaced by one that can be written into the dictionary.	2018-01-11 11:16:32 -08:00
Yann Collet	218e9fe0fc	added a test case for dictBuilder failure cyclic data set makes the entropy stage fails now, onto a fix for #304 ...	2018-01-11 09:42:38 -08:00
Yann Collet	c173dbd6e7	no longer supported starting C++17	2017-12-04 18:00:53 -08:00
Nick Terrell	6c41adfb28	[libzstd] pthread function prefixed with ZSTD_ * `sed -i 's/pthread_/ZSTD_pthread_/g' lib/{,common,compress,decompress,dictBuilder}/.[hc]` Fix up `lib/common/threading.[hc]` * `sed -i s/PTHREAD_MUTEX_LOCK/ZSTD_PTHREAD_MUTEX_LOCK/g lib/compress/zstdmt_compress.c`	2017-09-27 11:48:48 -07:00
Yann Collet	77c137b3ae	minor comment refactor	2017-09-14 15:12:57 -07:00
Yann Collet	3128e03be6	updated license header to clarify dual-license meaning as "or"	2017-09-08 00:09:23 -07:00
Nick Terrell	376f435914	[dictBuilder] Set default compression level to 3	2017-08-24 16:21:05 -07:00
Dmitriy Titarenko	20f715d709	Fix displayLevel overflow	2017-08-23 15:56:15 +05:00
Yann Collet	bd9c8ca146	Merge pull request #811 from terrelln/segmentSize [cover] Fix end condition for small dictionary	2017-08-22 14:36:30 -07:00
Nick Terrell	29c2d9a4d0	[cover] Turn down notification for ZDICT subroutines	2017-08-21 14:28:31 -07:00
Nick Terrell	98de3f6847	[cover] Add dictionary size to compressed size	2017-08-21 14:23:17 -07:00
Nick Terrell	9a54a315aa	[cover] Convert score to U32 and check for zero	2017-08-21 13:30:07 -07:00
Nick Terrell	d49eb40c03	[cover] Stop when segmentSize is less than d	2017-08-21 13:10:03 -07:00
Nick Terrell	f306d400c0	[cover] Fix divide by zero	2017-08-21 11:12:11 -07:00
Yann Collet	32fb407c9d	updated a bunch of headers for the new license	2017-08-18 16:52:05 -07:00
Yann Collet	b71363b967	check pthread_*_init() success condition	2017-07-19 01:05:40 -07:00
Yann Collet	2bd6440be0	pinned down error code enum values Note : all error codes are changed by this new version, but it's expected to be the last change for existing codes. Codes are now grouped by category, and receive a manually attributed value. The objective is to guarantee that error code values will not change in the future when introducing new codes. Intentionnal empty spaces and ranges are defined in order to keep room for potential new codes.	2017-07-13 17:12:16 -07:00
Yann Collet	590937df20	Merge pull request #739 from facebook/refPrefix ZSTD_refPrefix	2017-06-29 04:36:03 -07:00
Yann Collet	7d3816183f	exposed ZSTD_MAGIC_DICTIONARY in zstd.h makes it easier to explain ZSTD_dictMode	2017-06-27 13:50:34 -07:00
Nick Terrell	5b7fd7c422	[zdict] Make COVER the default algorithm	2017-06-26 21:09:22 -07:00
Yann Collet	ee970398b2	Merge branch 'dev' into advancedAPI2	2017-05-22 12:33:56 -07:00
Nick Terrell	a1280406b0	[libzstd] Allow users to define custom visibility	2017-05-19 18:01:59 -07:00
Yann Collet	fa3671eac7	changed ZSTD_BLOCKSIZE_ABSOLUTEMAX into ZSTD_BLOCKSIZE_MAX Also : change ZSTD_getBlockSizeMax() into ZSTD_getBlockSize() created ZSTD_BLOCKSIZELOG_MAX	2017-05-19 10:51:30 -07:00
Nick Terrell	f376d47c11	[CLI] Switch dictionary builder on CLI to cover	2017-05-02 11:18:27 -07:00
Nick Terrell	020b960e13	[cover] Make optimization faster	2017-05-02 11:02:48 -07:00
Nick Terrell	f2d9ef1dc0	[cover] Optimize case where d <= 8	2017-05-02 11:02:43 -07:00
Nick Terrell	865918dd04	Fix typo in zdict.h	2017-05-02 11:02:37 -07:00
Nick Terrell	5152fb2cb2	Convert all tabs to spaces	2017-03-29 18:51:58 -07:00
Yann Collet	4cf0093571	restored bonus rule	2017-03-26 14:51:00 -07:00
Yann Collet	69017bf253	Merge branch 'dev' into LegacyDictBuilder	2017-03-26 14:39:13 -07:00
Yann Collet	582760818f	minor refactor add const changed if for easier to add new conditions	2017-03-26 03:04:56 -07:00
Yann Collet	858f72eeb8	fixed dictBuilder issue dictionary loading would fail during entropy analysis	2017-03-26 02:50:00 -07:00
Yann Collet	ecee9f2ef8	fixed conversion warnings	2017-03-26 00:59:14 -07:00
Yann Collet	4c41d37fcc	changed test for new syntax --dictID= and --maxdict=	2017-03-24 18:36:56 -07:00
Yann Collet	d41f707e88	minor improvement : remove duplicates with 1 char prefix difference	2017-03-24 17:56:45 -07:00
Yann Collet	96aa3019b2	changed advanced commands --maxdict= and --dictID= now works with the `=` variant, which is the recommended one. Old variant `--dictID #` still works, for compatibility with existing scripts. Long term objective is to remove the old variant..	2017-03-24 16:04:29 -07:00
Yann Collet	9da3b215ec	Ensure all limits derived from same constants Now uses ZDICT_DICTSIZE_MIN and ZDICT_CONTENTSIZE_MIN from zdict.h. Also : reduced values to 256 and 128 respectively	2017-03-24 15:02:09 -07:00
Yann Collet	f332ece468	dictBuilder fails to create dictionary on certain input Properly expressed with an error code (see zstd_errors.h) and a cli return code != 0	2017-03-23 16:24:02 -07:00
Sean Purcell	042ba122ae	Change g_displayLevel to int and fix DISPLAYUPDATE flush	2017-03-23 11:21:59 -07:00
Nick Terrell	976e325b2e	Fix COVER_optimizeTrainFromBuffer() resource leaks Thanks to @nemequ for reporting the resource leaks.	2017-03-02 15:54:39 -08:00
Nick Terrell	545987996a	Fix deprecation warnings for clang with C++14	2017-02-08 17:38:17 -08:00
Nick Terrell	71c5263c00	Attribute cover dictionary code	2017-02-07 11:35:07 -08:00
Nick Terrell	43474313f8	Fix documentation about memory usage	2017-01-27 18:43:05 -08:00
Nick Terrell	2fe9126591	Add multithread support to COVER	2017-01-27 11:56:02 -08:00
Nick Terrell	8d984699db	Document memory requirements for COVER algorithm	2017-01-09 18:20:10 -08:00
Nick Terrell	555e281637	Handle large input size in 32-bit mode correctly	2017-01-09 18:20:06 -08:00
Nick Terrell	3a1fefcf00	Simplify COVER parameters	2017-01-02 17:51:38 -08:00
Nick Terrell	96b39f65fa	Add COVER dictionary builder	2017-01-02 13:22:51 -08:00
Yann Collet	aca113f4f5	fixed ZSTD_sizeof_?Dict()	2016-12-23 22:25:03 +01:00
Nick Terrell	1b5d4a7d53	ZDICT_finalizeDictionary() flipped comparison	2016-12-22 18:14:57 -08:00
Nick Terrell	bcbe77e994	ZDICT_finalizeDictionary() flipped comparison `ZDICT_finalizeDictionary()` had a flipped comparison. I also allowed `dictBufferCapacity == dictContentSize`. It might be the case that the user wants to fill the dictionary completely up, and then let zstd take exactly the space it needs for the entropy tables.	2016-12-22 18:01:14 -08:00
Nick Terrell	78a0072d5a	Fix failing test due to deprecation warning	2016-12-22 17:36:16 -08:00
Yann Collet	d76d1a9ef0	added ZDICT_finalizeDictionary()	2016-12-22 20:18:43 +01:00
Yann Collet	0819abe3c1	added ZSTD_createDDict_byReference() body	2016-12-21 19:25:15 +01:00
Yann Collet	1496c3dc47	Fix : size estimation when some samples are very large	2016-12-18 11:58:23 +01:00
Yann Collet	d46ecb58a5	added dll compilation tests	2016-12-17 16:28:12 +01:00
Nick Terrell	8de46ab51a	Export all API functions	2016-12-16 13:27:30 -08:00
Yann Collet	0a5a5fb7fd	Fix #418 : printing selected segments in zdict debug mode can segfault with certain pathological patterns	2016-11-02 13:57:55 -07:00
Yann Collet	52c1bf93fe	improved dicitonary segment merge	2016-10-18 16:34:58 -07:00
Yann Collet	2b361cf2f1	minor opt	2016-10-14 16:09:07 -07:00
Yann Collet	df6797447f	update dictionary builder warning comments	2016-09-27 15:14:32 +02:00
Yann Collet	47094ea66b	added comment on filePos	2016-09-26 18:03:33 +02:00
Yann Collet	97b378a6f8	Streaming : dictionary compression on multiple files / segments can correctly provide srcSize into header (when provided) using pledgedSrcSize.	2016-09-21 17:20:19 +02:00
Yann Collet	d56dbc02d3	removed g_displayLevel	2016-09-02 17:28:41 -07:00
Yann Collet	855766d73d	clarified dictionary in format description	2016-09-02 17:04:49 -07:00
Yann Collet	d725427a3c	g_time => local displayTime	2016-09-02 15:32:39 -07:00
Yann Collet	4ded9e591c	added boilerplate	2016-08-30 11:06:28 -07:00
Yann Collet	3b15f1f10f	minor refactor	2016-08-30 09:58:50 -07:00
Yann Collet	87c18b2ebd	fixed multiple minor warnings for XCode	2016-08-26 01:43:47 +02:00
Yann Collet	da3fbcb302	Added ZDICT_getDictID()	2016-08-19 14:23:58 +02:00
Yann Collet	a5dbf9f629	Merge pull request #297 from borzunov/dev Export functions related to dictionary compression from DLL	2016-08-18 15:05:01 +02:00
Yann Collet	49d105cfcf	better warning and error messages in case of dictionary training failure (#292)	2016-08-18 15:02:11 +02:00
Alexander Borzunov	0f6f17a14f	Rename ZSTDLIB_API to ZDICTLIB_API in zdict.h	2016-08-18 16:47:06 +05:00
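
Note on commit 7cbb8bbbbf: its message describes the cover builder picking one segment per epoch, sizing epochs so that `epochs * segmentSize ~= dictSize`, then using fewer epochs and looping back to the first epoch after the last. The arithmetic can be sketched roughly as below. This is an illustrative sketch only, not the code from that commit: the function names and the `divisor` parameter are hypothetical, and the message only reports that divisors of `2`, `4`, and `16` were tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of epochs for a target dictionary size.
 * divisor == 1 gives epochs * segmentSize ~= dictSize; a larger divisor
 * yields fewer epochs, so each epoch spans more candidate data. */
static unsigned cover_sketch_epochs(size_t dictSize, unsigned segmentSize,
                                    unsigned divisor)
{
    size_t epochs;
    if (segmentSize == 0 || divisor == 0) return 1;
    epochs = dictSize / segmentSize / divisor;
    return epochs == 0 ? 1u : (unsigned)epochs;   /* always at least one epoch */
}

/* Hypothetical helper: how many input positions each epoch covers. */
static size_t cover_sketch_epoch_size(size_t nbPositions, unsigned epochs)
{
    assert(epochs > 0);
    return nbPositions / epochs;
}

/* Hypothetical helper: advance to the next epoch, wrapping back to the
 * first epoch after the last one. */
static unsigned cover_sketch_next_epoch(unsigned epoch, unsigned epochs)
{
    assert(epochs > 0);
    return (epoch + 1) % epochs;
}
```

With fewer epochs, each selection looks at a larger slice of the input, which matches the trade-off the commit message names: more candidates per segment at the cost of longer selection time.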

194 Commits