AuroraMiddleware/zstd - zstd

Author	SHA1	Message	Date
Jennifer Liu	5021441d86	Change default splitPoint to 100	2018-07-10 11:19:33 -07:00
Jennifer Liu	456f290e31	Change back to splitPoint<=0	2018-07-09 13:53:25 -07:00
Jennifer Liu	7efabb2cf6	Only make 0.0 default splitPoint	2018-07-09 12:26:53 -07:00
Jennifer Liu	015a00af0f	Change cover_sum back to 2 parameters and fix splitPoint issues	2018-07-06 14:24:18 -07:00
Jennifer Liu	0bbff01211	Fix testing parameter	2018-07-05 22:40:32 -07:00
Jennifer Liu	a085d1aae1	Allow splitPoint==1.0 (using all samples for both training and testing)	2018-07-05 10:38:45 -07:00
Jennifer Liu	0881184c89	Some edits based on pull request comments	2018-07-03 17:53:27 -07:00
Jennifer Liu	16e75e8804	Update minimal training sample size	2018-07-03 12:07:06 -07:00
Jennifer Liu	348e5f77a9	Add split=# to cli	2018-06-29 17:54:41 -07:00
Jennifer Liu	52fbbbcb6b	Explicitly cast double to unsigned	2018-06-29 16:17:20 -07:00
Jennifer Liu	f9d19b83fb	Fix variable declaration problem	2018-06-29 15:46:56 -07:00
Jennifer Liu	e061d84016	Another fix to comparator	2018-06-29 15:38:08 -07:00
Jennifer Liu	59797d3328	Fix splitPoint floating point comparison problem	2018-06-29 12:47:03 -07:00
Jennifer Liu	0ef06f2e8a	Split samples into train and test sets	2018-06-29 12:33:34 -07:00
Yann Collet	fa41bcc2c2	grouped debug functions into debug.h There were 2 competing set of debug functions within zstd_internal.h and bitstream.h. They were mostly duplicate, and required care to avoid messing with each other. There is now a single implementation, shared by both. Significant change : The macro variable ZSTD_DEBUG does no longer exist, it has been replaced by DEBUGLEVEL, which required modifying several source files.	2018-06-13 15:43:09 -04:00
Nick Terrell	7cbb8bbbbf	[cover] Small compression ratio improvement The cover algorithm selects one segment per epoch, and it selects the epoch size such that `epochs * segmentSize ~= dictSize`. Selecting less epochs gives the algorithm more candidates to choose from for each segment it selects, and then it will loop back to the first epoch when it hits the last one. The trade off is that now it takes longer to select each segment, since it has to look at more data before making a choice. I benchmarked on the following data sets using this command: ```sh $ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR \| wc -c ``` \| Data set \| k (approx) \| Before \| After \| % difference \| \|--------------\|------------\|----------\|----------\|--------------\| \| GitHub \| ~1000 \| 738138 \| 746610 \| +1.14% \| \| hg-changelog \| ~90 \| 4295156 \| 4285336 \| -0.23% \| \| hg-commands \| ~500 \| 1095580 \| 1079814 \| -1.44% \| \| hg-manifest \| ~400 \| 16559892 \| 16504346 \| -0.34% \| There is some noise in the measurements, since small changes to `k` can have large differences, which is why I'm using `steps=256`, to try to minimize the noise. However, the GitHub data set still has some noise. If I run the GitHub data set on my Mac, which presumably lists directory entries in a different order, so the dictionary builder sees the files in a different order, or I use `steps=1024` I see these results. \| Run \| Before \| After \| % difference \| \|------------\|--------\|--------\|--------------\| \| steps=1024 \| 738138 \| 734470 \| -0.50% \| \| MacBook \| 738451 \| 737132 \| -0.18% \| Question: Should we expose this as a parameter? I don't think it is necessary. Someone might want to turn it up to exchange a much longer dictionary building time in exchange for a slightly better dictionary. I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16` with a faster running time.	2018-05-18 16:15:27 -07:00
Yann Collet	1da629f2ad	Merge pull request #1104 from terrelln/fast-train Allow negative compression levels in training	2018-04-09 14:16:20 -07:00
Nick Terrell	569e2abccd	Allow negative compression levels in training * Set `dictCLevel` in `zstdcli.c`. * Only set to default level if the compression level `== 0`, not `<= 0`.	2018-04-09 12:12:03 -07:00
Björn Ketelaars	462aed6811	zstd requires a stable sort. On OpenBSD qsort() is not guaranteed to be stable, their mergesort() is. This fixes issue #1088. All the hard work has been done by @terrelln.	2018-04-05 07:59:16 +02:00
Yann Collet	9f8ed23b5b	bumped version number to v1.3.4 also added a paragraph on using compression level with training mode as this is a recurrent question (see for example #1004)	2018-01-27 22:23:26 -08:00
Yann Collet	752bae4a48	added warning message when pathological dataset is detected (note : cover_optimize needs -v to display the warning)	2018-01-11 11:29:28 -08:00
Yann Collet	e8093dde09	fixed #304 Pathological samples may result in literal section being incompressible. This case is now detected, and literal distribution is replaced by one that can be written into the dictionary.	2018-01-11 11:16:32 -08:00
Yann Collet	218e9fe0fc	added a test case for dictBuilder failure cyclic data set makes the entropy stage fails now, onto a fix for #304 ...	2018-01-11 09:42:38 -08:00
Yann Collet	c173dbd6e7	no longer supported starting C++17	2017-12-04 18:00:53 -08:00
Nick Terrell	6c41adfb28	[libzstd] pthread function prefixed with ZSTD_ * `sed -i 's/pthread_/ZSTD_pthread_/g' lib/{,common,compress,decompress,dictBuilder}/.[hc]` Fix up `lib/common/threading.[hc]` * `sed -i s/PTHREAD_MUTEX_LOCK/ZSTD_PTHREAD_MUTEX_LOCK/g lib/compress/zstdmt_compress.c`	2017-09-27 11:48:48 -07:00
Yann Collet	77c137b3ae	minor comment refactor	2017-09-14 15:12:57 -07:00
Yann Collet	3128e03be6	updated license header to clarify dual-license meaning as "or"	2017-09-08 00:09:23 -07:00
Nick Terrell	376f435914	[dictBuilder] Set default compression level to 3	2017-08-24 16:21:05 -07:00
Dmitriy Titarenko	20f715d709	Fix displayLevel overflow	2017-08-23 15:56:15 +05:00
Yann Collet	bd9c8ca146	Merge pull request #811 from terrelln/segmentSize [cover] Fix end condition for small dictionary	2017-08-22 14:36:30 -07:00
Nick Terrell	29c2d9a4d0	[cover] Turn down notification for ZDICT subroutines	2017-08-21 14:28:31 -07:00
Nick Terrell	98de3f6847	[cover] Add dictionary size to compressed size	2017-08-21 14:23:17 -07:00
Nick Terrell	9a54a315aa	[cover] Convert score to U32 and check for zero	2017-08-21 13:30:07 -07:00
Nick Terrell	d49eb40c03	[cover] Stop when segmentSize is less than d	2017-08-21 13:10:03 -07:00
Nick Terrell	f306d400c0	[cover] Fix divide by zero	2017-08-21 11:12:11 -07:00
Yann Collet	32fb407c9d	updated a bunch of headers for the new license	2017-08-18 16:52:05 -07:00
Yann Collet	b71363b967	check pthread_*_init() success condition	2017-07-19 01:05:40 -07:00
Yann Collet	2bd6440be0	pinned down error code enum values Note : all error codes are changed by this new version, but it's expected to be the last change for existing codes. Codes are now grouped by category, and receive a manually attributed value. The objective is to guarantee that error code values will not change in the future when introducing new codes. Intentionnal empty spaces and ranges are defined in order to keep room for potential new codes.	2017-07-13 17:12:16 -07:00
Yann Collet	590937df20	Merge pull request #739 from facebook/refPrefix ZSTD_refPrefix	2017-06-29 04:36:03 -07:00
Yann Collet	7d3816183f	exposed ZSTD_MAGIC_DICTIONARY in zstd.h makes it easier to explain ZSTD_dictMode	2017-06-27 13:50:34 -07:00
Nick Terrell	5b7fd7c422	[zdict] Make COVER the default algorithm	2017-06-26 21:09:22 -07:00
Yann Collet	ee970398b2	Merge branch 'dev' into advancedAPI2	2017-05-22 12:33:56 -07:00
Nick Terrell	a1280406b0	[libzstd] Allow users to define custom visibility	2017-05-19 18:01:59 -07:00
Yann Collet	fa3671eac7	changed ZSTD_BLOCKSIZE_ABSOLUTEMAX into ZSTD_BLOCKSIZE_MAX Also : change ZSTD_getBlockSizeMax() into ZSTD_getBlockSize() created ZSTD_BLOCKSIZELOG_MAX	2017-05-19 10:51:30 -07:00
Nick Terrell	f376d47c11	[CLI] Switch dictionary builder on CLI to cover	2017-05-02 11:18:27 -07:00
Nick Terrell	020b960e13	[cover] Make optimization faster	2017-05-02 11:02:48 -07:00
Nick Terrell	f2d9ef1dc0	[cover] Optimize case where d <= 8	2017-05-02 11:02:43 -07:00
Nick Terrell	865918dd04	Fix typo in zdict.h	2017-05-02 11:02:37 -07:00
Nick Terrell	5152fb2cb2	Convert all tabs to spaces	2017-03-29 18:51:58 -07:00
Yann Collet	4cf0093571	restored bonus rule	2017-03-26 14:51:00 -07:00

138 Commits