AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Yann Collet	27af35c110	Merge pull request #1143 from facebook/tableLevels Update table of compression levels	2018-05-19 14:40:37 -07:00
Yann Collet	ade583948d	Merge branch 'tableLevels' of github.com:facebook/zstd into tableLevels	2018-05-18 18:23:40 -07:00
Yann Collet	5381369cb1	Merge branch 'dev' into tableLevels	2018-05-18 18:23:27 -07:00
Yann Collet	ca06a1d82f	Merge pull request #1142 from terrelln/better-dict [cover] Small compression ratio improvement	2018-05-18 17:19:13 -07:00
Yann Collet	38c2c46823	Merge branch 'dev' into tableLevels	2018-05-18 17:17:45 -07:00
Yann Collet	b0b3fb517d	updated compression levels for blocks of 256KB	2018-05-18 17:17:12 -07:00
Nick Terrell	7cbb8bbbbf	[cover] Small compression ratio improvement	2018-05-18 16:15:27 -07:00
    The cover algorithm selects one segment per epoch, and it selects the epoch size such that `epochs * segmentSize ~= dictSize`. Selecting fewer epochs gives the algorithm more candidates to choose from for each segment it selects, and it then loops back to the first epoch when it reaches the last one. The trade-off is that each segment now takes longer to select, since the algorithm has to look at more data before making a choice.

    I benchmarked on the following data sets using this command:

    ```sh
    $ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR | wc -c
    ```

    | Data set     | k (approx) | Before (bytes) | After (bytes) | % difference |
    |--------------|------------|----------------|---------------|--------------|
    | GitHub       | ~1000      | 738138         | 746610        | +1.14%       |
    | hg-changelog | ~90        | 4295156        | 4285336       | -0.23%       |
    | hg-commands  | ~500       | 1095580        | 1079814       | -1.44%       |
    | hg-manifest  | ~400       | 16559892       | 16504346      | -0.34%       |

    There is some noise in the measurements, since small changes to `k` can cause large differences, which is why I'm using `steps=256` to try to minimize the noise. However, the GitHub data set still has some noise. If I run the GitHub data set on my Mac, which presumably lists directory entries in a different order (so the dictionary builder sees the files in a different order), or if I use `steps=1024`, I see these results:

    | Run        | Before (bytes) | After (bytes) | % difference |
    |------------|----------------|---------------|--------------|
    | steps=1024 | 738138         | 734470        | -0.50%       |
    | MacBook    | 738451         | 737132        | -0.18%       |

    Question: Should we expose this as a parameter? I don't think it is necessary. Someone might want to turn it up, trading a much longer dictionary-building time for a slightly better dictionary. I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16` with a faster running time.
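The epoch scheme described in the [cover] commit can be illustrated with a toy C sketch. It is not zstd's COVER implementation: segment scores are passed in as a precomputed array (real COVER derives them from d-mer frequencies), and `cover_epochs()`, `cover_select()` and the `divisor` knob are hypothetical names standing in for the "fewer epochs" idea.

```c
#include <assert.h>
#include <stddef.h>

/* Number of epochs for a target dictionary size. With divisor == 1 this is
 * the original rule, epochs * segmentSize ~= dictSize; a larger (hypothetical)
 * divisor gives fewer, larger epochs, i.e. more candidates per selection. */
static size_t cover_epochs(size_t dictSize, size_t segmentSize, size_t divisor)
{
    assert(segmentSize > 0 && divisor > 0);
    size_t epochs = dictSize / segmentSize / divisor;
    return epochs > 0 ? epochs : 1;
}

/* Toy selection loop. scores[i] is a precomputed score for the segment
 * starting at position i. The best position of each epoch is taken in turn,
 * looping back to the first epoch after the last one, until nbSegments
 * positions are chosen or an epoch has nothing useful left. Chosen scores
 * are zeroed so nothing is picked twice. Positions past epochs * epochSize
 * are ignored. Returns the number of positions written to out. */
static size_t cover_select(unsigned *scores, size_t n, size_t epochs,
                           size_t nbSegments, size_t *out)
{
    assert(epochs > 0);
    size_t epochSize = n / epochs;
    size_t chosen = 0;
    size_t epoch = 0;
    if (epochSize == 0) return 0;
    while (chosen < nbSegments) {
        size_t begin = epoch * epochSize;
        size_t best = begin;
        for (size_t i = begin; i < begin + epochSize; i++)
            if (scores[i] > scores[best]) best = i;
        if (scores[best] == 0) break;          /* epoch exhausted */
        out[chosen++] = best;
        scores[best] = 0;
        epoch = (epoch + 1) % epochs;          /* wrap around to epoch 0 */
    }
    return chosen;
}
```

With `cover_epochs(1000, 100, 1)` the data is split into 10 epochs; a divisor of 4 leaves 2, so each pick scans several times more candidates. That is the trade-off the commit describes: better choices per segment, longer training time.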
Yann Collet	44303428c6	Merge pull request #1139 from fbrosson/prefetch __builtin_prefetch probably did not exist before gcc 3.1.	2018-05-18 13:23:35 -07:00
fbrosson	291824f49d	__builtin_prefetch probably did not exist before gcc 3.1.	2018-05-18 18:40:11 +00:00
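A compiler-version note like the gcc 3.1 one above usually ends up as a preprocessor guard. A minimal sketch, assuming a hypothetical `PREFETCH_L1` macro (not the macro zstd actually defines): on GCC 3.1 or newer, and on compilers that report themselves as such, it expands to `__builtin_prefetch`; everywhere else it becomes a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* __builtin_prefetch(addr, rw, locality) is a GCC builtin; older or
 * non-GCC compilers get a no-op that still evaluates the pointer. */
#if defined(__GNUC__) && ((__GNUC__ > 3) || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))
#  define PREFETCH_L1(ptr) __builtin_prefetch((ptr), 0 /* read */, 3 /* keep in all cache levels */)
#else
#  define PREFETCH_L1(ptr) ((void)(ptr))
#endif

/* Example use: sum an array while hinting an element 16 positions ahead
 * (an arbitrary prefetch distance chosen for illustration). */
static unsigned long sum_with_prefetch(const unsigned *a, size_t n)
{
    unsigned long s = 0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n) PREFETCH_L1(&a[i + 16]);
        s += a[i];
    }
    return s;
}
```

Because the fallback is a no-op, the result is identical with or without prefetch support; only performance can differ.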
Yann Collet	bd6417de7f	Merge pull request #1140 from fbrosson/cpu-asm Drop colon in asm snippet to make old versions of gcc happy.	2018-05-18 10:32:16 -07:00
fbrosson	16bb8f1f9e	Drop colon in asm snippet to make old versions of gcc happy.	2018-05-18 17:05:36 +00:00
Yann Collet	63eeeaa1dd	update table levels for blocks <= 16K. Also: allow hlog to be slightly larger than windowlog, as it's apparently good for both speed and compression ratio.	2018-05-16 16:13:37 -07:00
Yann Collet	9938b17d4c	Merge pull request #1135 from facebook/frameCSize decompress: changed error code when input is too large	2018-05-15 11:02:53 -07:00
Yann Collet	b14c4bff96	Merge pull request #1136 from terrelln/fix Fix failing Travis tests	2018-05-15 11:02:01 -07:00
Nick Terrell	30d9c84b1a	Fix failing Travis tests	2018-05-15 09:46:20 -07:00
Yann Collet	f372ffc64d	Merge pull request #1127 from facebook/staticDictCost Improved optimal parser with dictionary	2018-05-14 17:45:50 -07:00
Yann Collet	d59cf02df0	decompress: changed error code when input is too large. ZSTD_decompress() can decompress multiple frames sent as a single input, but the input size must be the exact sum of all compressed frames, no more. If srcSize is mistakenly larger than required, ZSTD_decompress() will try to decompress a new frame after the current one, and fail. As a consequence, it issues the error code ERROR(prefix_unknown). While the error is technically correct (the decoder could not recognise the header of the _next_ frame), it's confusing, as users will believe that the header of the first frame is wrong, which is not the case (it's correct). This makes it more difficult to understand that the error is in the source size, which is too large. This patch changes the error code provided in such a scenario. If (at least) a first frame was successfully decoded, and the following bytes are garbage values, the decoder assumes the provided input size is wrong (too large) and issues the error code ERROR(srcSize_wrong).	2018-05-14 15:32:28 -07:00
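The error-code rule from d59cf02df0 can be sketched on a toy multi-frame format. The format (two magic bytes `'T'`, `'F'`, one length byte, then the payload) and the `dec_status` names are invented for this illustration and are not zstd's frame format or error enum; only the decision mirrors the commit: an unrecognised header is reported as an unknown prefix when no frame has been decoded yet, and as a wrong source size once at least one frame has been decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    DEC_OK = 0,
    DEC_PREFIX_UNKNOWN,   /* first header unrecognised: input is not our format */
    DEC_SRCSIZE_WRONG,    /* valid frame(s) followed by garbage: srcSize too large */
    DEC_CORRUPTED         /* length field runs past the input or output buffer */
} dec_status;

/* Decodes back-to-back toy frames: 'T', 'F', length byte, payload. */
static dec_status decode_frames(const unsigned char *src, size_t srcSize,
                                unsigned char *dst, size_t dstCap,
                                size_t *dstSize)
{
    size_t pos = 0, out = 0, frames = 0;
    assert(dstSize != NULL);
    while (pos < srcSize) {
        if (srcSize - pos < 3 || src[pos] != 'T' || src[pos + 1] != 'F') {
            /* Same failure, different diagnosis: after at least one good
             * frame, blame the input size rather than the header. */
            return frames > 0 ? DEC_SRCSIZE_WRONG : DEC_PREFIX_UNKNOWN;
        }
        size_t len = src[pos + 2];
        if (len > srcSize - pos - 3 || len > dstCap - out) return DEC_CORRUPTED;
        memcpy(dst + out, src + pos + 3, len);
        out += len;
        pos += 3 + len;
        frames++;
    }
    *dstSize = out;
    return DEC_OK;
}
```

Both error paths detect exactly the same condition; the frame counter only changes which diagnosis the caller sees, which is the point of the patch.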
Yann Collet	c8c67f7c84	Merge branch 'dev' into tableLevels	2018-05-14 11:55:52 -07:00
Yann Collet	174bd3d4a7	Merge pull request #1131 from facebook/zstdcli minor: control numeric argument overflow	2018-05-14 11:53:58 -07:00
Yann Collet	5d76201fee	Merge pull request #1130 from facebook/man fix #1115	2018-05-14 11:52:53 -07:00
Yann Collet	902db38798	Merge pull request #1129 from facebook/paramgrill Paramgrill refactoring	2018-05-14 11:52:41 -07:00
Yann Collet	3870db1ba5	Merge branch 'dev' into tableLevels	2018-05-14 11:52:05 -07:00
Yann Collet	4da0216db0	Merge pull request #1133 from felixhandte/travis-fix Make Travis CI Run `apt-get update`	2018-05-14 09:59:43 -07:00
W. Felix Handte	e26be5a7b3	Travis CI Runs apt-get Update	2018-05-14 11:55:21 -04:00
Yann Collet	2c392952f9	paramgrill: use NB_LEVELS_TRACKED in loop; makes it easier to generate/track more levels than ZSTD_maxClevel()	2018-05-13 17:25:53 -07:00
Yann Collet	c9227ee16b	update table for 128 KB blocks	2018-05-13 17:15:07 -07:00
Yann Collet	b4250489cf	update compression levels for large inputs	2018-05-13 01:53:38 -07:00
Yann Collet	9cd5c63771	cli: control numeric argument overflow. Exit on overflow (backported from paramgrill); added associated test case.	2018-05-12 14:29:33 -07:00
Yann Collet	3f89cd1081	minor: factor out errorOut()	2018-05-12 14:09:32 -07:00
Yann Collet	b824d213cb	fix #1115	2018-05-12 10:21:30 -07:00
Yann Collet	50993901b2	paramgrill: subtle change in level spacing. The distance between levels is slightly increased to compensate for level 1 speed improvements and the wish for a stronger level 19, which extends the range of speeds to cover.	2018-05-12 09:40:04 -07:00
Yann Collet	a3f2e84a37	added programmable constraints	2018-05-11 19:43:08 -07:00
Yann Collet	17c19fbbb5	generalized use of readU32FromChar() and checked input overflow	2018-05-11 17:32:26 -07:00
Yann Collet	761758982e	replaced FSE_count with FSE_count_simple to reduce stack memory usage. Also: tweaked a few comments, as suggested by @terrelln	2018-05-11 16:03:37 -07:00
Yann Collet	66b81817b5	Merge pull request #1128 from facebook/libdir minor Makefile patch	2018-05-11 11:47:59 -07:00
Yann Collet	3193d692c2	minor patch, ensuring LIBDIR is created before installation; follow-up from #1123	2018-05-11 11:31:48 -07:00
Yann Collet	99ddca43a6	fixed wrong assertion: base can actually overflow	2018-05-10 19:48:09 -07:00
Yann Collet	0d7626672d	fixed C++ conversion warning	2018-05-10 18:17:21 -07:00
Yann Collet	09d0fa29ee	minor adjusting of weights	2018-05-10 18:13:48 -07:00
Yann Collet	1a26ec6e8d	opt: init statistics from dictionary instead of starting from fake "default" statistics.	2018-05-10 17:59:12 -07:00
Yann Collet	74b1c75d64	btopt: minor adjustment of update frequencies	2018-05-10 16:32:36 -07:00
Yann Collet	498ab7bb12	Merge pull request #1123 from baruchsiach/fix-install-pc lib/Makefile: create include directory before headers installation	2018-05-10 10:38:19 -07:00
Yann Collet	ac6105463a	opt: minor improvements to log traces. Slight improvement when using fractional-bit evaluation (opt:dictionary)	2018-05-09 15:46:11 -07:00
Yann Collet	c39061cb7b	fixed declaration-after-statement warning	2018-05-09 12:07:25 -07:00
Yann Collet	4d5bd32a00	added traces to look at symbol costs; evaluation looks correct.	2018-05-09 12:00:12 -07:00
Yann Collet	c0da0f5e9e	switchable bit-approximation / fractional-bit accuracy modes. Also: makes it possible to select the number of fractional bits.	2018-05-09 10:48:09 -07:00
Yann Collet	33dfc54069	Merge pull request #1124 from terrelln/playtests-gnu Write to /dev/random for test	2018-05-09 09:39:02 -07:00
Yann Collet	ba2ad9b6b9	implemented fractional bit cost evaluation for FSE symbols. While it seems to work, the gains are negligible compared to the rough maxNbBits evaluation. There are even occasional losses, which still need to be explained. Furthermore, there are still cases where btlazy2 does a better job than btopt, which seems rather strange too.	2018-05-08 17:43:13 -07:00
Yann Collet	1aff63b114	opt: shift all costs by 8 bits (* 256), making it possible to represent fractional bit costs.	2018-05-08 16:19:04 -07:00
Yann Collet	6a3c34aa58	opt: estimate cost of both Huffman and FSE symbols. For FSE symbols: provide an upper bound, in number of bits, since the cost function is not able to store fractional bit costs.	2018-05-08 16:11:21 -07:00
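Commits 1aff63b114, c0da0f5e9e and ba2ad9b6b9 describe scaling costs by 256 (8 fractional bits) and switching between whole-bit and fractional-bit estimates. The sketch below shows one way such fixed-point costs can be computed; the names (`COST_ACCURACY`, `weight_bits()`, `weight_frac()`, `symbol_cost()`) and the linear interpolation of the mantissa are assumptions for illustration, not zstd's code.

```c
#include <assert.h>
#include <stdint.h>

#define COST_ACCURACY 8                     /* fractional bits kept */
#define COST_ONE (1u << COST_ACCURACY)      /* 1 bit == 256 cost units */

/* Position of the highest set bit (v must be non-zero). */
static unsigned highbit32(uint32_t v)
{
    unsigned r = 0;
    assert(v != 0);
    while (v >>= 1) r++;
    return r;
}

/* Whole-bit mode: log2(stat + 1), rounded down to an integer number of bits. */
static uint32_t weight_bits(uint32_t stat)
{
    return highbit32(stat + 1) * COST_ONE;
}

/* Fractional mode: same integer part, plus the mantissa interpolated
 * linearly to give COST_ACCURACY fractional bits. */
static uint32_t weight_frac(uint32_t stat)
{
    uint32_t s = stat + 1;
    unsigned hb = highbit32(s);
    assert(s < (1u << (32 - COST_ACCURACY)));   /* keep s << 8 in range */
    return hb * COST_ONE + (((s << COST_ACCURACY) >> hb) - COST_ONE);
}

/* Cost, in 1/256-bit units, of a symbol seen freq times out of total
 * (freq <= total): roughly log2(total + 1) - log2(freq + 1). */
static uint32_t symbol_cost(uint32_t freq, uint32_t total, int fractional)
{
    assert(freq <= total);
    return fractional ? weight_frac(total) - weight_frac(freq)
                      : weight_bits(total) - weight_bits(freq);
}
```

For a symbol seen 2 times out of 7, whole-bit mode charges 512 units (2 bits) while fractional mode charges 384 units (1.5 bits); the exact value, log2(8/3), is about 1.42 bits. Keeping every cost in the same 1/256-bit unit lets both modes share one comparison path.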

5068 Commits