AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
W. Felix Handte	265c2869d1	Split Wrapper Functions to Cause Inlining	2018-05-23 17:53:03 -04:00
W. Felix Handte	6929964d65	Add bounds check in repcode tests	2018-05-23 17:53:03 -04:00
W. Felix Handte	70a537d1d7	Initial Repcode Check Support for Ext Dict Ctx	2018-05-23 17:53:03 -04:00
W. Felix Handte	8d24ff0353	Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx	2018-05-23 17:53:03 -04:00
W. Felix Handte	d18a405779	Refer to the Dictionary Match State In-Place (Sometimes)	2018-05-23 17:53:03 -04:00
Nick Terrell	c92dd11940	Error if reported size is too large in edge case	2018-05-23 14:47:20 -07:00
Nick Terrell	a97e9a627a	[zstd] Fix decompression edge case This edge case is only possible with the new optimal encoding selector, since before zstd would always choose `set_basic` for small numbers of sequences. Fix `FSE_readNCount()` to support buffers < 4 bytes. Credit to OSS-Fuzz	2018-05-23 12:16:00 -07:00
Yann Collet	27dc078aa6	Merge pull request #1144 from terrelln/fse-entropy Approximate FSE encoding costs for selection	2018-05-22 19:25:37 -07:00
Yann Collet	4a498f03dc	Merge pull request #1145 from terrelln/spec Clarify what happens when Number_of_Sequences == 0	2018-05-22 16:21:40 -07:00
Nick Terrell	73f4c890cd	Clarify what happens when Number_of_Sequences == 0	2018-05-22 16:12:33 -07:00
Nick Terrell	e3959d5eba	Fixes	2018-05-22 16:06:33 -07:00
Yann Collet	7a8b3496b4	Merge branch 'dev' into staticDictCost	2018-05-22 15:10:05 -07:00
Yann Collet	a8ddf1d370	disable 2-passes strategy	2018-05-22 15:06:36 -07:00
Nick Terrell	49cf880513	Approximate FSE encoding costs for selection Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and `set_repeat`, and select the one with the lowest cost. * The cost of `set_basic` is computed using the cross-entropy cost function `ZSTD_crossEntropyCost()`, using the normalized default count and the count. * The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the previous table to see if it is able to represent the distribution. * The cost of `set_compressed` is computed with the entropy cost function `ZSTD_entropyCost()`, together with the cost of writing the normalized count `ZSTD_NCountCost()`.	2018-05-22 14:33:22 -07:00
Yann Collet	27af35c110	Merge pull request #1143 from facebook/tableLevels Update table of compression levels	2018-05-19 14:40:37 -07:00
Yann Collet	ade583948d	Merge branch 'tableLevels' of github.com:facebook/zstd into tableLevels	2018-05-18 18:23:40 -07:00
Yann Collet	5381369cb1	Merge branch 'dev' into tableLevels	2018-05-18 18:23:27 -07:00
Yann Collet	ca06a1d82f	Merge pull request #1142 from terrelln/better-dict [cover] Small compression ratio improvement	2018-05-18 17:19:13 -07:00
Yann Collet	38c2c46823	Merge branch 'dev' into tableLevels	2018-05-18 17:17:45 -07:00
Yann Collet	b0b3fb517d	updated compression levels for blocks of 256KB	2018-05-18 17:17:12 -07:00
Nick Terrell	7cbb8bbbbf	[cover] Small compression ratio improvement The cover algorithm selects one segment per epoch, and it selects the epoch size such that `epochs * segmentSize ~= dictSize`. Selecting less epochs gives the algorithm more candidates to choose from for each segment it selects, and then it will loop back to the first epoch when it hits the last one. The trade off is that now it takes longer to select each segment, since it has to look at more data before making a choice. I benchmarked on the following data sets using this command: ```sh $ZSTD -T0 -3 --train-cover=d=8,steps=256 $DIR -r -o dict && $ZSTD -3 -D dict -rc $DIR \| wc -c ``` \| Data set \| k (approx) \| Before \| After \| % difference \| \|--------------\|------------\|----------\|----------\|--------------\| \| GitHub \| ~1000 \| 738138 \| 746610 \| +1.14% \| \| hg-changelog \| ~90 \| 4295156 \| 4285336 \| -0.23% \| \| hg-commands \| ~500 \| 1095580 \| 1079814 \| -1.44% \| \| hg-manifest \| ~400 \| 16559892 \| 16504346 \| -0.34% \| There is some noise in the measurements, since small changes to `k` can have large differences, which is why I'm using `steps=256`, to try to minimize the noise. However, the GitHub data set still has some noise. If I run the GitHub data set on my Mac, which presumably lists directory entries in a different order, so the dictionary builder sees the files in a different order, or I use `steps=1024` I see these results. \| Run \| Before \| After \| % difference \| \|------------\|--------\|--------\|--------------\| \| steps=1024 \| 738138 \| 734470 \| -0.50% \| \| MacBook \| 738451 \| 737132 \| -0.18% \| Question: Should we expose this as a parameter? I don't think it is necessary. Someone might want to turn it up to exchange a much longer dictionary building time in exchange for a slightly better dictionary. I tested `2`, `4`, and `16`, and `4` got most of the benefit of `16` with a faster running time.	2018-05-18 16:15:27 -07:00
Yann Collet	5cbef6e094	Merge branch 'dev' into staticDictCost	2018-05-18 16:03:06 -07:00
Yann Collet	a95e9e80d1	adding some debug functions to observe statistics	2018-05-18 14:09:42 -07:00
Yann Collet	44303428c6	Merge pull request #1139 from fbrosson/prefetch __builtin_prefetch did probably not exist before gcc 3.1.	2018-05-18 13:23:35 -07:00
fbrosson	291824f49d	__builtin_prefetch did probably not exist before gcc 3.1.	2018-05-18 18:40:11 +00:00
Yann Collet	bd6417de7f	Merge pull request #1140 from fbrosson/cpu-asm Drop colon in asm snippet to make old versions of gcc happy.	2018-05-18 10:32:16 -07:00
fbrosson	16bb8f1f9e	Drop colon in asm snippet to make old versions of gcc happy.	2018-05-18 17:05:36 +00:00
Yann Collet	af3da079d1	fixed minor conversion warning	2018-05-17 17:27:27 -07:00
Yann Collet	8572b4d09f	fixed a pretty complex bug when combining ldm + btultra	2018-05-17 16:13:53 -07:00
Yann Collet	134388ba6b	collect statistics for first block in ultra mode this patch makes btultra do 2 passes on the first block, the first one being dedicated to collecting statistics so that the 2nd pass is more accurate. It translates into a very small compression ratio gain : enwik7, level 20: blocks 4K : 2.142 -> 2.153 blocks 16K : 2.447 -> 2.457 blocks 64K : 2.716 -> 2.726 On the other hand, the cpu cost is doubled. The trade off looks bad. Though, that's ultimately a price to pay to reach better compression ratio. So it's only enabled when setting btultra.	2018-05-17 12:24:30 -07:00
Yann Collet	a243020d37	slightly improved weight calculation translating into a tiny compression ratio improvement	2018-05-17 11:19:44 -07:00
Yann Collet	63eeeaa1dd	update table levels for blocks <= 16K also : allow hlog to be slighly larger than windowlog, as it's apparently good for both speed and compression ratio.	2018-05-16 16:13:37 -07:00
Yann Collet	18fc3d3cd5	introduced bit-fractional cost evaluation this improves compression ratio by a tiny amount. It also reduces speed by a small amount. Consequently, bit-fractional evaluation is only turned on for btultra.	2018-05-16 14:53:35 -07:00
Yann Collet	9938b17d4c	Merge pull request #1135 from facebook/frameCSize decompress: changed error code when input is too large	2018-05-15 11:02:53 -07:00
Yann Collet	b14c4bff96	Merge pull request #1136 from terrelln/fix Fix failing Travis tests	2018-05-15 11:02:01 -07:00
Nick Terrell	30d9c84b1a	Fix failing Travis tests	2018-05-15 09:46:20 -07:00
Yann Collet	0b31304c8d	Merge branch 'dev' into staticDictCost	2018-05-14 18:09:26 -07:00
Yann Collet	2c26df0e13	opt: removed static prices after testing, it's actually always better to use dynamic prices albeit initialised from dictionary.	2018-05-14 18:04:08 -07:00
Yann Collet	f372ffc64d	Merge pull request #1127 from facebook/staticDictCost Improved optimal parser with dictionary	2018-05-14 17:45:50 -07:00
Yann Collet	d59cf02df0	decompress: changed error code when input is too large ZSTD_decompress() can decompress multiple frames sent as a single input. But the input size must be the exact sum of all compressed frames, no more. In the case of a mistake on srcSize, being larger than required, ZSTD_decompress() will try to decompress a new frame after current one, and fail. As a consequence, it will issue an error code, ERROR(prefix_unknown). While the error is technically correct (the decoder could not recognise the header of _next_ frame), it's confusing, as users will believe that the first header of the first frame is wrong, which is not the case (it's correct). It makes it more difficult to understand that the error is in the source size, which is too large. This patch changes the error code provided in such a scenario. If (at least) a first frame was successfully decoded, and then following bytes are garbage values, the decoder assumes the provided input size is wrong (too large), and issue the error code ERROR(srcSize_wrong).	2018-05-14 15:32:28 -07:00
Yann Collet	c8c67f7c84	Merge branch 'dev' into tableLevels	2018-05-14 11:55:52 -07:00
Yann Collet	174bd3d4a7	Merge pull request #1131 from facebook/zstdcli minor: control numeric argument overflow	2018-05-14 11:53:58 -07:00
Yann Collet	5d76201fee	Merge pull request #1130 from facebook/man fix #1115	2018-05-14 11:52:53 -07:00
Yann Collet	902db38798	Merge pull request #1129 from facebook/paramgrill Paramgrill refactoring	2018-05-14 11:52:41 -07:00
Yann Collet	3870db1ba5	Merge branch 'dev' into tableLevels	2018-05-14 11:52:05 -07:00
Yann Collet	4da0216db0	Merge pull request #1133 from felixhandte/travis-fix Make Travis CI Run `apt-get update`	2018-05-14 09:59:43 -07:00
W. Felix Handte	e26be5a7b3	Travis CI Runs apt-get Update	2018-05-14 11:55:21 -04:00
Yann Collet	2c392952f9	paramgrill: use NB_LEVELS_TRACKED in loop make it easier to generate/track more levels than ZSTD_maxClevel()	2018-05-13 17:25:53 -07:00
Yann Collet	c9227ee16b	update table for 128 KB blocks	2018-05-13 17:15:07 -07:00
Yann Collet	b4250489cf	update compression levels for large inputs	2018-05-13 01:53:38 -07:00

1 2 3 4 5 ...

5141 Commits