AuroraMiddleware/zstd - zstd

Author	SHA1	Message	Date
Nick Terrell	7cbb8bbbbf	[cover] Small compression ratio improvement	2018-05-18 16:15:27 -07:00

The cover algorithm selects one segment per epoch, and it selects the epoch size such that `epochs * segmentSize ~= dictSize`. Selecting fewer epochs gives the algorithm more candidates to choose from for each segment it selects, and it loops back to the first epoch when it hits the last one. The trade-off is that each segment now takes longer to select, since the algorithm has to look at more data before making a choice.

I benchmarked on the following data sets using this command:

```sh
$ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR | wc -c
```

| Data set     | k (approx) | Before (bytes) | After (bytes) | % difference |
|--------------|------------|----------------|---------------|--------------|
| GitHub       | ~1000      | 738138         | 746610        | +1.14%       |
| hg-changelog | ~90        | 4295156        | 4285336       | -0.23%       |
| hg-commands  | ~500       | 1095580        | 1079814       | -1.44%       |
| hg-manifest  | ~400       | 16559892       | 16504346      | -0.34%       |

There is some noise in the measurements, since small changes to `k` can cause large differences. That is why I'm using `steps=256`: to try to minimize the noise. However, the GitHub data set is still noisy. If I use `steps=1024`, or run the GitHub data set on my Mac (which presumably lists directory entries in a different order, so the dictionary builder sees the files in a different order), I see these results:

| Run        | Before (bytes) | After (bytes) | % difference |
|------------|----------------|---------------|--------------|
| steps=1024 | 738138         | 734470        | -0.50%       |
| MacBook    | 738451         | 737132        | -0.18%       |

Question: Should we expose this as a parameter? I don't think it is necessary. Someone might want to turn it up, trading a much longer dictionary-building time for a slightly better dictionary. I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16` with a faster running time.

Yann Collet	1da629f2ad	Merge pull request #1104 from terrelln/fast-train Allow negative compression levels in training	2018-04-09 14:16:20 -07:00
Nick Terrell	569e2abccd	Allow negative compression levels in training	2018-04-09 12:12:03 -07:00
* Set `dictCLevel` in `zstdcli.c`.
* Only fall back to the default level if the compression level is `== 0`, not `<= 0`.
Björn Ketelaars	462aed6811	zstd requires a stable sort. On OpenBSD, qsort() is not guaranteed to be stable, but its mergesort() is. This fixes issue #1088. All the hard work has been done by @terrelln.	2018-04-05 07:59:16 +02:00
Yann Collet	9f8ed23b5b	bumped version number to v1.3.4. Also added a paragraph on using compression levels with training mode, as this is a recurring question (see for example #1004)	2018-01-27 22:23:26 -08:00
Yann Collet	752bae4a48	added warning message when pathological dataset is detected (note : cover_optimize needs -v to display the warning)	2018-01-11 11:29:28 -08:00
Yann Collet	e8093dde09	fixed #304: pathological samples may result in the literal section being incompressible. This case is now detected, and the literal distribution is replaced by one that can be written into the dictionary.	2018-01-11 11:16:32 -08:00
Yann Collet	218e9fe0fc	added a test case for dictBuilder failure: a cyclic data set makes the entropy stage fail. Now, onto a fix for #304 ...	2018-01-11 09:42:38 -08:00
Yann Collet	c173dbd6e7	no longer supported starting C++17	2017-12-04 18:00:53 -08:00
Nick Terrell	6c41adfb28	[libzstd] pthread functions prefixed with ZSTD_	2017-09-27 11:48:48 -07:00
* `sed -i 's/pthread_/ZSTD_pthread_/g' lib/{,common,compress,decompress,dictBuilder}/.[hc]`
* Fix up `lib/common/threading.[hc]`
* `sed -i s/PTHREAD_MUTEX_LOCK/ZSTD_PTHREAD_MUTEX_LOCK/g lib/compress/zstdmt_compress.c`
Yann Collet	77c137b3ae	minor comment refactor	2017-09-14 15:12:57 -07:00
Yann Collet	3128e03be6	updated license header to clarify dual-license meaning as "or"	2017-09-08 00:09:23 -07:00
Nick Terrell	376f435914	[dictBuilder] Set default compression level to 3	2017-08-24 16:21:05 -07:00
Dmitriy Titarenko	20f715d709	Fix displayLevel overflow	2017-08-23 15:56:15 +05:00
Yann Collet	bd9c8ca146	Merge pull request #811 from terrelln/segmentSize [cover] Fix end condition for small dictionary	2017-08-22 14:36:30 -07:00
Nick Terrell	29c2d9a4d0	[cover] Turn down notification for ZDICT subroutines	2017-08-21 14:28:31 -07:00
Nick Terrell	98de3f6847	[cover] Add dictionary size to compressed size	2017-08-21 14:23:17 -07:00
Nick Terrell	9a54a315aa	[cover] Convert score to U32 and check for zero	2017-08-21 13:30:07 -07:00
Nick Terrell	d49eb40c03	[cover] Stop when segmentSize is less than d	2017-08-21 13:10:03 -07:00
Nick Terrell	f306d400c0	[cover] Fix divide by zero	2017-08-21 11:12:11 -07:00
Yann Collet	32fb407c9d	updated a bunch of headers for the new license	2017-08-18 16:52:05 -07:00
Yann Collet	b71363b967	check pthread_*_init() success condition	2017-07-19 01:05:40 -07:00
Yann Collet	2bd6440be0	pinned down error code enum values. Note: all error codes change in this version, but this is expected to be the last change for existing codes. Codes are now grouped by category and receive a manually attributed value. The objective is to guarantee that error code values will not change in the future when new codes are introduced. Intentional empty spaces and ranges are defined to keep room for potential new codes.	2017-07-13 17:12:16 -07:00
Yann Collet	590937df20	Merge pull request #739 from facebook/refPrefix ZSTD_refPrefix	2017-06-29 04:36:03 -07:00
Yann Collet	7d3816183f	exposed ZSTD_MAGIC_DICTIONARY in zstd.h, which makes it easier to explain ZSTD_dictMode	2017-06-27 13:50:34 -07:00
Nick Terrell	5b7fd7c422	[zdict] Make COVER the default algorithm	2017-06-26 21:09:22 -07:00
Yann Collet	ee970398b2	Merge branch 'dev' into advancedAPI2	2017-05-22 12:33:56 -07:00
Nick Terrell	a1280406b0	[libzstd] Allow users to define custom visibility	2017-05-19 18:01:59 -07:00
Yann Collet	fa3671eac7	changed ZSTD_BLOCKSIZE_ABSOLUTEMAX into ZSTD_BLOCKSIZE_MAX. Also: changed ZSTD_getBlockSizeMax() into ZSTD_getBlockSize(); created ZSTD_BLOCKSIZELOG_MAX	2017-05-19 10:51:30 -07:00
Nick Terrell	f376d47c11	[CLI] Switch dictionary builder on CLI to cover	2017-05-02 11:18:27 -07:00
Nick Terrell	020b960e13	[cover] Make optimization faster	2017-05-02 11:02:48 -07:00
Nick Terrell	f2d9ef1dc0	[cover] Optimize case where d <= 8	2017-05-02 11:02:43 -07:00
Nick Terrell	865918dd04	Fix typo in zdict.h	2017-05-02 11:02:37 -07:00
Nick Terrell	5152fb2cb2	Convert all tabs to spaces	2017-03-29 18:51:58 -07:00
Yann Collet	4cf0093571	restored bonus rule	2017-03-26 14:51:00 -07:00
Yann Collet	69017bf253	Merge branch 'dev' into LegacyDictBuilder	2017-03-26 14:39:13 -07:00
Yann Collet	582760818f	minor refactor: added const; changed `if` to make it easier to add new conditions	2017-03-26 03:04:56 -07:00
Yann Collet	858f72eeb8	fixed dictBuilder issue: dictionary loading would fail during entropy analysis	2017-03-26 02:50:00 -07:00
Yann Collet	ecee9f2ef8	fixed conversion warnings	2017-03-26 00:59:14 -07:00
Yann Collet	4c41d37fcc	changed test for new syntax --dictID= and --maxdict=	2017-03-24 18:36:56 -07:00
Yann Collet	d41f707e88	minor improvement : remove duplicates with 1 char prefix difference	2017-03-24 17:56:45 -07:00
Yann Collet	96aa3019b2	changed advanced commands: --maxdict= and --dictID= now work with the `=` variant, which is the recommended one. The old variant `--dictID #` still works, for compatibility with existing scripts. The long-term objective is to remove the old variant.	2017-03-24 16:04:29 -07:00
Yann Collet	9da3b215ec	Ensure all limits are derived from the same constants. Now uses ZDICT_DICTSIZE_MIN and ZDICT_CONTENTSIZE_MIN from zdict.h. Also: reduced values to 256 and 128 respectively	2017-03-24 15:02:09 -07:00
Yann Collet	f332ece468	dictBuilder fails to create dictionary on certain input. The failure is properly expressed with an error code (see zstd_errors.h) and a CLI return code != 0	2017-03-23 16:24:02 -07:00
Sean Purcell	042ba122ae	Change g_displayLevel to int and fix DISPLAYUPDATE flush	2017-03-23 11:21:59 -07:00
Nick Terrell	976e325b2e	Fix COVER_optimizeTrainFromBuffer() resource leaks Thanks to @nemequ for reporting the resource leaks.	2017-03-02 15:54:39 -08:00
Nick Terrell	545987996a	Fix deprecation warnings for clang with C++14	2017-02-08 17:38:17 -08:00
Nick Terrell	71c5263c00	Attribute cover dictionary code	2017-02-07 11:35:07 -08:00
Nick Terrell	43474313f8	Fix documentation about memory usage	2017-01-27 18:43:05 -08:00
Nick Terrell	2fe9126591	Add multithread support to COVER	2017-01-27 11:56:02 -08:00


123 Commits