AuroraMiddleware/zstd - zstd

Author	SHA1	Message	Date
Yann Collet	9698d2fb72	Merge pull request #1189 from facebook/hist histogram module	2018-06-14 20:39:52 -04:00
Yann Collet	1adf84ccb7	renamed all HUF_decompressX4() functions to X2 to underline that they generate up to 2 symbols per decoding operation, in preparation for a future X3 variant.	2018-06-14 15:17:03 -04:00
Yann Collet	a09af5eb6b	renamed all HUF_decompressX2() functions to X1 to underline that they generate one symbol per decoding operation. The new naming scheme will make it easier to introduce an X3 variant.	2018-06-14 15:08:43 -04:00
Yann Collet	fc682263d0	fixed g_debuglevel variable name in debug.h	2018-06-13 20:02:33 -04:00
Yann Collet	2d76defbfe	grouped all histogram functions into hist.c. Renamed functions with HIST_* prefix	2018-06-13 19:49:31 -04:00
Yann Collet	fa41bcc2c2	grouped debug functions into debug.h. There were 2 competing sets of debug functions within zstd_internal.h and bitstream.h. They were mostly duplicates, and required care to avoid messing with each other. There is now a single implementation, shared by both. Significant change: the macro variable ZSTD_DEBUG no longer exists; it has been replaced by DEBUGLEVEL, which required modifying several source files.	2018-06-13 15:43:09 -04:00
Yann Collet	463a0fe38b	simplified optimal parser: removed "cached" structure. Prices are now saved in the optimal table. Primarily done for simplification. Might improve speed a little. But, surprisingly, it also improves ratio in some circumstances.	2018-05-29 14:07:25 -07:00
Yann Collet	b5ef32fea7	Merge branch 'dev' into fracFse	2018-05-24 14:09:49 -07:00
Yann Collet	776128d16f	fix corner case when requesting the cost of an FSE symbol: ensure that, when frequency[symbol]==0, the result is (tableLog + 1) bits with both upper-bit and fractional-bit estimates. Also: enable BIT_DEBUG in /tests	2018-05-24 13:59:11 -07:00
Nick Terrell	f2d0924b87	Variable declarations	2018-05-23 14:58:58 -07:00
Nick Terrell	c92dd11940	Error if reported size is too large in edge case	2018-05-23 14:47:20 -07:00
Nick Terrell	a97e9a627a	[zstd] Fix decompression edge case. This edge case is only possible with the new optimal encoding selector, since before, zstd would always choose `set_basic` for small numbers of sequences. Fix `FSE_readNCount()` to support buffers < 4 bytes. Credit to OSS-Fuzz	2018-05-23 12:16:00 -07:00
Nick Terrell	e3959d5eba	Fixes	2018-05-22 16:06:33 -07:00
Nick Terrell	49cf880513	Approximate FSE encoding costs for selection. Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and `set_repeat`, and select the one with the lowest cost. * The cost of `set_basic` is computed using the cross-entropy cost function `ZSTD_crossEntropyCost()`, using the normalized default count and the count. * The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the previous table to see if it is able to represent the distribution. * The cost of `set_compressed` is computed with the entropy cost function `ZSTD_entropyCost()`, together with the cost of writing the normalized count `ZSTD_NCountCost()`.	2018-05-22 14:33:22 -07:00
fbrosson	291824f49d	__builtin_prefetch probably did not exist before gcc 3.1.	2018-05-18 18:40:11 +00:00
fbrosson	16bb8f1f9e	Drop colon in asm snippet to make old versions of gcc happy.	2018-05-18 17:05:36 +00:00
Yann Collet	0d7626672d	fixed c++ conversion warning	2018-05-10 18:17:21 -07:00
Yann Collet	1a26ec6e8d	opt: init statistics from dictionary instead of starting from fake "default" statistics.	2018-05-10 17:59:12 -07:00
Yann Collet	c39061cb7b	fixed declaration-after-statement warning	2018-05-09 12:07:25 -07:00
Yann Collet	4d5bd32a00	added traces to look at symbol costs. Evaluation looks correct.	2018-05-09 12:00:12 -07:00
Yann Collet	c0da0f5e9e	switchable bit-approximation / fractional-bit accuracy modes. Also: makes it possible to select the number of fractional bits.	2018-05-09 10:48:09 -07:00
Yann Collet	ba2ad9b6b9	implemented fractional bit cost evaluation for FSE symbols. While it seems to work, the gains are negligible compared to rough maxNbBits evaluation. There are even a few losses sometimes, which still need to be explained. Furthermore, there are still cases where btlazy2 does a better job than btopt, which seems rather strange too.	2018-05-08 17:43:13 -07:00
Yann Collet	6a3c34aa58	opt: estimate cost of both Huffman and FSE symbols. For FSE symbols: provide an upper bound, in number of bits, since the cost function is not able to store fractional bit costs.	2018-05-08 16:11:21 -07:00
Yann Collet	338f738c24	pass entropy tables to optimal parser for proper estimation of symbol's weights when using dictionary compression. Note : using only huffman costs is not good enough, presumably because sequence symbol costs are incorrect.	2018-05-08 15:37:06 -07:00
taigacon	2c3ad05812	Fix the problem that mistakenly enables the DYNAMIC_BMI2 macro on ARM architecture with Clang (#1110)	2018-04-23 15:41:50 -07:00
Yann Collet	ad15c1b724	added __has_attribute() define for non-clang compilers	2018-03-23 19:04:48 -07:00
Yann Collet	52ca7c6c56	make DYNAMIC_BMI2 support of clang conditional to __has_attribute() to support older clang versions such as 3.4	2018-03-23 18:45:42 -07:00
Yann Collet	192542b63c	Merge pull request #1047 from facebook/hufCompress removed huf_compress_impl.h	2018-03-15 14:14:03 -07:00
Yann Collet	a909c293c6	Merge branch 'dev' into hufCompress	2018-03-14 16:11:25 -07:00
Nick Terrell	a9a6dcba63	Expose reference external sequence API * Expose the reference external sequences API for zstdmt. Allows external sequences of any length, which get split when necessary. * Reset the LDM window when the context is reset. * Store the maximum number of LDM sequences. * Sequence generation now returns the number of last literals. * Fix sequence generation to not throw out the last literals when blocks of more than 1 MB are encountered.	2018-03-14 12:29:31 -07:00
Yann Collet	a95a88af57	removed huf_compress_impl.h. Re-imported all functions inside huf_compress.c for easier source editing. Also updated a bunch of code comments for clarification.	2018-03-13 14:14:05 -07:00
Yann Collet	bd7bb94361	Merge pull request #1044 from baldurk/remove-utf8-characters Remove non-ASCII characters in header file comments	2018-03-13 13:22:07 -07:00
Baldur Karlsson	430a2fec19	Remove non-ASCII characters in header file comments * Replaced a non-breaking space and an en dash with a plain space and a hyphen. * This means the files are simple ASCII and less likely to run into codepage issues.	2018-03-13 20:05:53 +00:00
Yann Collet	51169575a8	Merge pull request #1036 from terrelln/thread-void [threading] Cast unused arguments to void	2018-03-07 12:14:05 -08:00
Nick Terrell	7e103cdaf5	[threading] Cast unused arguments to void	2018-03-06 18:36:40 -08:00
Yann Collet	d02b44cf55	DYNAMIC_BMI2 enabled for clang. clang only claims compatibility with gcc 4.2. Consequently, the recent patch which reserved DYNAMIC_BMI2 for gcc >= 4.8 also disabled it for clang. Fix: __clang__ is now enough to enable DYNAMIC_BMI2 (associated with other existing conditions: x64/x64, !bmi2)	2018-03-04 16:05:59 -08:00
Yann Collet	45b09e7625	limit DYNAMIC_BMI2 to gcc >= 4.8: attribute bmi2 is not supported by gcc 4.4	2018-03-01 15:02:18 -08:00
Yann Collet	89741653ab	added error code workSpace_tooSmall	2018-02-26 15:11:50 -08:00
Yann Collet	6cdf690441	minor cleaning of huff0. Updated code documentation, and properly named a few "magic constants". Also, HUF_compress_internal() gets a cleaner way to determine the size of tables inside the workspace.	2018-02-26 14:52:23 -08:00
Nick Terrell	af866b3a58	Split block compressor out of long range matcher. * `ZSTD_ldm_generateSequences()` generates the LDM sequences and stores them in a table. It should work with any chunk size, but is currently only called one block at a time. * `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and instead of encoding the literals directly, it passes them to a secondary block compressor. The code to handle chunk sizes greater than the block size is currently commented out, since it is unused. The next PR will uncomment and exercise this code. * During optimal parsing, ensure LDM `minMatchLength` is at least `targetLength`. Also don't emit repcode matches in the LDM block compressor. Enabling the LDM with the optimal parser now actually improves the compression ratio. * The compression ratio is very similar to before. It is very slightly different, because the repcode handling is slightly different. If I remove immediate repcode checking in both branches, the compressed size is exactly the same. * The speed looks to be the same or better than before. Up Next (in a separate PR): Allow sequence generation to happen prior to compression, and produce more than a block's worth of sequences. Expose some API for zstdmt to consume. This will test out some currently untested code in `ZSTD_ldm_blockCompress()`.	2018-02-22 15:18:41 -08:00
Yann Collet	010ba5f71f	Merge pull request #1017 from terrelln/c-bmi2 [compress] Support BMI2	2018-02-20 15:34:59 -08:00
Yann Collet	70163bf0d3	added clarification comments in zstd_errors.h answering some points in #1018	2018-02-20 12:54:49 -08:00
Nick Terrell	b58f01537e	[compress] Support BMI2	2018-02-14 19:20:32 -08:00
Nick Terrell	4319132312	[decompress] Support BMI2	2018-02-13 17:00:15 -08:00
Yann Collet	95424409ea	addBits and baseline into FSE decoding table. Note: unfinished - need new default tables - need to modify long mode	2018-02-09 04:25:15 -08:00
Yann Collet	0170cf9a7a	minor : modified ZSTD_preserveUnsortedMark() to be more vectorization friendly	2018-02-05 11:46:02 -08:00
Yann Collet	997e4d0ccd	added POOL_tryAdd()	2018-01-18 14:39:51 -08:00
Nick Terrell	887cd4e35e	Split ZSTD_CCtx into smaller sub-structures	2018-01-16 11:17:50 -08:00
Yann Collet	e8093dde09	fixed #304. Pathological samples may result in the literal section being incompressible. This case is now detected, and the literal distribution is replaced by one that can be written into the dictionary.	2018-01-11 11:16:32 -08:00
Yann Collet	f299fa39ac	fix a subtle issue in continue mode. The deep fuzzer tests caught a subtle bug that was probably there for a long time. The impact of the bug is not a crash, or any other clear error signal; rather, it reduces performance by cutting data into smaller blocks. Eventually, the following test would fail because it produces too many 1-byte blocks, requiring more space than the buffer can provide: `./zstreamtest_asan --mt -s3514 -t1678312 -i1678314` The root scenario is as follows: - Create a context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1` - The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`. `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`. The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes. - Complete compression - Start a new compression with the same context, using the same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN`, which triggers "continue mode" - Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only, but in fact, it also depends on `pledgedSrcSize`. - The "old" blocksize (1) is still there; the next compression will use this value to cut input into blocks, resulting in more blocks and worse performance than necessary. Given the scenario, and its possible variants, I'm surprised it did not show up before. But I suspect it did show up; it just never triggered an error, because "worse performance" is not a trigger. The above test is a special corner case, where performance is so impacted that it reaches an error case. The fix works, but I'm not completely pleased. I think the current code relies too much on implied relations between variables. This will likely break again in the future when some related part of the code changes. Unfortunately, there is no time to make larger changes if we want to keep the release target for zstd v1.3.3, so a longer-term fix will have to be considered after the release. To do: create a reliable test case which triggers this scenario for CI tests.	2017-12-19 09:43:03 +01:00
Yann Collet	c173dbd6e7	no longer supported starting C++17	2017-12-04 18:00:53 -08:00
Yann Collet	0a0a212934	zstd_opt: changed cost formula. There was a flaw in the formula which compared literal cost with match cost: at a given position, a non-empty run of literals is going to be part of the next sequence, while if the position ends a previous match and another match starts immediately, the next sequence will have a litlength of zero. A litlength of zero has a non-null cost. It follows that the cost of literals should be compared to the match cost + the cost of litlength==0. Not doing so gave a structural advantage to matches, which would be selected more often. I believe that's what led to the creation of the strange heuristic which added a complex cost to matches. The heuristic was actually compensating. It was probably created through multiple trials, settling for the best outcome on a given scenario (I suspect silesia.tar). The problem with this heuristic is that it's hard to understand, and unfortunately, any future change in the parser would impact the way it should be calculated and its effects. The "proper" formula makes it possible to remove this heuristic. Now, the problem is: in a head-to-head comparison, it's sometimes better, sometimes worse. Note that all differences are small (< 0.01 ratio). In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7). I suspect that's because starting statistics are pretty poor (another area of improvement). However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...). It's a pity that zstd -22 gets worse on silesia.tar. That being said, I like that the new code gets rid of strange variables, which were introducing complexity for any future evolution (with faster variants in mind). Therefore, in spite of this detrimental side effect, I tend to be in favor of it.	2017-11-28 14:07:03 -08:00
Yann Collet	cdade555ee	fixed one UB pointer arithmetic	2017-11-17 11:40:08 -08:00
Yann Collet	05dffe43a7	Fixed Btree update. ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate. With the new optimal match finder, it's not necessarily the case: a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree. Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched. Now, when the current position is not inserted, zc->nextToUpdate is not updated, expecting ZSTD_updateTree() to fill the tree later on. The solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate, so that it no longer depends on a future function to do this job. It took time to get there, as the issue started with a memory sanitizer error. The problem would have been easier to spot with a proper `assert()`, so this patch adds a few of them. Additionally, I discovered that `make test` does not enable `assert()` during CLI tests. This patch enables them. Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt. So this patch also fixes them. - Changed packed structure for gcc memory access: memory sanitizer would complain that a read "might" reach an out-of-bounds position on the grounds that the `union` is larger than the type accessed. Now, to avoid this issue, each type is independent. - ZSTD_CCtxParams_setParameter(): @return provides the value of the parameter, clamped/fixed appropriately. - ZSTDMT: changed constant name to ZSTDMT_JOBSIZE_MIN - ZSTDMT: multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime). - ZSTDMT: nbThreads is automatically clamped on setting the value.	2017-11-16 12:18:56 -08:00
Yann Collet	4202b2e8a6	merged rep search into btMatchSearch, but there is a tree corruption somewhere ... bug hunt ongoing	2017-11-14 20:38:52 -08:00
Yann Collet	9a11f70dc3	merged repcode search into BT match search. This version has the same speed as branch `opt`, which is itself 5-10% slower than branch `dev` (no identified reason). It does not compress exactly the same as `opt` or `dev`, maybe because it doesn't stop the search after repcodes, leading to sometimes better compression, sometimes worse (by a small margin). Warning: the _extDict path does not work for the time being. This means that the benchmark module works, but the file module will fail with large files (and high compression level). The objective is to fuse the _extDict path into the current one, in order to have a single parser to maintain.	2017-11-13 02:23:48 -08:00
Yann Collet	4191efa993	zstd_opt: ensure sufficient_len < ZSTD_OPT_NUM to simplify some tests	2017-11-08 11:24:00 -08:00
Yann Collet	8b6aecf2cb	moved a few structures from `zstd_internal.h` to `zstd_compress.h` which is a more precise scope	2017-11-07 16:03:14 -08:00
Yann Collet	61e5a1adfc	removed direct call to malloc() from pool.c	2017-10-31 17:43:24 -07:00
Nick Terrell	a86a7097ec	Ensure dictionary Huff table can encode any symbol * Ensure that the dictionary Huffman CTable has maxSymbolValue 255. * Fix a stack buffer overflow during compression dictionary loading.	2017-10-03 13:22:13 -07:00
Yann Collet	ee1ed78fcb	fix proper naming on FSE_createCTable() arguments in fse.h	2017-09-30 11:08:50 -07:00
Yann Collet	86b4fe5b45	adjustCParams: restored previous behavior. An unknown srcSize is presumed small if there is a dictionary (dictSize>0) and presumed large otherwise.	2017-09-28 18:14:28 -07:00
Yann Collet	54a827fff0	Merge branch 'dev' into newFormats. Fixed conflicts in zstdmt_compress.c	2017-09-27 16:39:40 -07:00
Nick Terrell	6c41adfb28	[libzstd] pthread function prefixed with ZSTD_ * `sed -i 's/pthread_/ZSTD_pthread_/g' lib/{,common,compress,decompress,dictBuilder}/*.[hc]` Fix up `lib/common/threading.[hc]` * `sed -i s/PTHREAD_MUTEX_LOCK/ZSTD_PTHREAD_MUTEX_LOCK/g lib/compress/zstdmt_compress.c`	2017-09-27 11:48:48 -07:00
Yann Collet	9416195221	changed error code when pos<=size condition is not respected. Now pointing towards src_size or dst_size, instead of error_GENERIC.	2017-09-27 10:35:56 -07:00
Yann Collet	df4e9bba25	fixed constant errors for gcc in c99 mode. The C standard does not consider a `static const int` to be a constant expression. This is a problem for initializers, and for ZSTD_STATIC_ASSERT(). Replaced by macro values.	2017-09-26 14:31:06 -07:00
Yann Collet	9f0b8dfbe9	Merge branch 'dev' into newFormats	2017-09-26 14:22:39 -07:00
Nick Terrell	c233bdbaee	Increase maximum window size * Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail on my Mac. * Maximum window size in 64-bit mode is 2GB, since that is the largest power of 2 that works with the overflow prevention. * Allow `--long=windowLog` to set the window log, along with `--zstd=wlog=#`. These options also set the window size during decompression, but don't override `--memory=#` if it is set. * Present a helpful error message when the window size is too large during decompression. * The long range matcher defaults to a hash log 7 less than the window log, which keeps it at 20 for window log 27. * Keep the default long range matcher window size and the default maximum window size at 27 for the API and CLI. * Add tests that use the maximum window size and hash size for compression and decompression.	2017-09-26 14:00:01 -07:00
Yann Collet	5d8fdd1641	Merge pull request #855 from terrelln/maxoff [libzstd] Increase MaxOff	2017-09-25 16:34:29 -07:00
Yann Collet	b8d4a3887f	introduced constant ZSTD_frameIdSize within zstd_internal.h. This is the size of the magic number. It avoids using `4` directly in source code, which is less meaningful.	2017-09-25 15:26:18 -07:00
Nick Terrell	bbe77212ef	[libzstd] Increase MaxOff	2017-09-25 13:36:18 -07:00
Yann Collet	7c3dea42ce	added prototypes for advanced parameters for decompression API required to decode custom formats	2017-09-24 15:57:29 -07:00
Nick Terrell	74718d7e43	[bitstream] Allow adding 31 bits at a time	2017-09-19 13:57:33 -07:00
Stella Lau	eb3327c10a	Merge branch 'dev' of https://github.com/facebook/zstd into ldm-mergeDev	2017-09-11 15:00:01 -07:00
Yann Collet	3128e03be6	updated license header to clarify dual-license meaning as "or"	2017-09-08 00:09:23 -07:00
Stella Lau	eeff55dfa8	Merge remote-tracking branch 'upstream/dev' into ldm-mergeDev	2017-09-06 15:56:32 -07:00
Nick Terrell	423b133568	[POOL] Allow free on NULL when multithreading is disabled	2017-09-05 11:18:13 -07:00
Stella Lau	67d4a6161c	Add ldmBucketSizeLog param	2017-09-02 21:55:29 -07:00
Stella Lau	a1f04d518d	Move hashEveryLog to cctxParams and update cli	2017-09-01 15:05:47 -07:00
Stella Lau	767a0b3be1	Move ldm hashLog, bucketLog, and mml to cctxParams	2017-09-01 12:24:59 -07:00
Stella Lau	17d8e0bdcc	Merge remote-tracking branch 'upstream/longRangeMatcher' into ldm-integrate	2017-09-01 10:19:38 -07:00
Stella Lau	8081becadc	Add long distance matching as a CCtxParam	2017-09-01 09:18:58 -07:00
Yann Collet	d963daa6a9	fixed minor warning (empty translation unit)	2017-09-01 00:12:07 -07:00
Yann Collet	d7ad99b2ab	Merge branch 'longRangeMatcher' into dev	2017-08-31 18:08:37 -07:00
Stella Lau	6a546efb8c	Add long distance matcher. Move last literals section to ZSTD_block_internal	2017-08-31 12:53:19 -07:00
Yann Collet	e21384fffb	fixed more file headers after license change (#825)	2017-08-31 12:11:57 -07:00
Yann Collet	e9dc204f42	fixed a bunch of headers after license change (#825)	2017-08-31 11:24:54 -07:00
Stella Lau	ee65701720	Minor fixes; remove formatting only changes	2017-08-29 20:27:35 -07:00
Stella Lau	c7a18b7c21	Localize 'dictMode' from cctx to function param	2017-08-29 15:52:24 -07:00
Nick Terrell	9822f97721	[error] Don't guard undef X with ifdef X	2017-08-29 11:54:38 -07:00
Nick Terrell	02033be08c	[pool] Visual Studio disallows empty structs	2017-08-28 17:19:01 -07:00
Nick Terrell	7c365eb02c	[threading] Fix ERROR macro after including windows.h	2017-08-28 16:25:02 -07:00
Stella Lau	024098a47d	Fix parameter retrieval from cdict	2017-08-25 17:58:28 -07:00
Stella Lau	2adde898c8	Fix typo with ZSTDMT_parameter	2017-08-25 16:13:40 -07:00
Stella Lau	eb7bbab36a	Remove ZSTD_p_refDictContent and dictContentByRef	2017-08-25 11:11:45 -07:00
Nick Terrell	de6c6bce85	Fix zstd_internal.h for C++ mode	2017-08-24 18:09:50 -07:00
Nick Terrell	26dc040a7b	[pool] Accept custom allocators	2017-08-24 17:01:41 -07:00
Nick Terrell	89dc856cae	[pool] Fix formatting	2017-08-24 16:48:32 -07:00
Stella Lau	5bc2c1e982	Add prototype support for customMem with cctxParams	2017-08-23 12:03:30 -07:00
Stella Lau	6f1a21c7e9	Remove formatting-only changes	2017-08-23 10:24:19 -07:00

456 Commits
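The encoding-mode selection described in commit 49cf880513 can be pictured with a minimal C sketch. It assumes the three cost estimates have already been computed (the real code uses `ZSTD_crossEntropyCost()`, `FSE_bitCost()`, `ZSTD_entropyCost()` and `ZSTD_NCountCost()`, none of which are reproduced here). The enum and both function names are hypothetical, and the rule that a previous table "can represent" a distribution only if every occurring symbol has a non-zero normalized count is an assumption that echoes the zero-frequency corner case from commit 776128d16f.

```c
#include <assert.h>
#include <stddef.h>

/* Mode names borrowed from the commit message; this enum is a sketch,
 * not zstd's own declaration. */
typedef enum { set_basic, set_compressed, set_repeat } fse_mode_sketch_e;

/* Returns 1 if every symbol present in `count` has a non-zero normalized
 * count in the previous table. FSE uses -1 for "low probability", which is
 * still representable; 0 means the symbol is absent from the table. */
static int prev_table_can_represent(const unsigned* count,
                                    const short* prevNorm,
                                    unsigned maxSymbolValue)
{
    unsigned s;
    for (s = 0; s <= maxSymbolValue; s++) {
        if (count[s] != 0 && prevNorm[s] == 0) return 0;
    }
    return 1;
}

/* Picks the mode with the lowest estimated cost (in bits).
 * set_repeat is only a candidate when the previous table is usable.
 * Ties keep the candidate checked first (set_basic, then set_repeat). */
static fse_mode_sketch_e select_mode(size_t basicCost,
                                     size_t compressedCost,
                                     size_t repeatCost,
                                     int repeatAllowed)
{
    fse_mode_sketch_e best = set_basic;
    size_t bestCost = basicCost;
    if (repeatAllowed && repeatCost < bestCost) {
        best = set_repeat;
        bestCost = repeatCost;
    }
    if (compressedCost < bestCost) {
        best = set_compressed;
    }
    return best;
}
```

In use, `prev_table_can_represent()` would feed the `repeatAllowed` flag, so a previous table that lacks a newly occurring symbol can never be reused.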
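Commit 74718d7e43 allows 31 bits to be added to the bitstream in a single call. Why a 31-bit limit is safe can be sketched with a hypothetical little-endian bit accumulator, assuming a 64-bit container that holds fewer than 8 pending bits before each add: at most 7 + 31 = 38 bits are ever pending, well below 64. This is not zstd's `BIT_CStream_t` (which has its own layout and flush rules); every name below is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer: bits accumulate from the low end of `container`. */
typedef struct {
    uint64_t container; /* pending bits, least significant first */
    unsigned pos;       /* number of pending bits */
    uint8_t* out;       /* destination buffer (caller ensures capacity) */
    size_t outPos;      /* bytes written so far */
} bitw_sketch;

/* Appends the low `nbBits` bits of `value`. Callers keep fewer than 8 bits
 * pending before each call (e.g. flush after each add), so that
 * 7 + 31 = 38 bits never overflow the 64-bit container. */
static void bw_add(bitw_sketch* w, uint32_t value, unsigned nbBits)
{
    uint64_t mask;
    assert(nbBits <= 31);
    assert(w->pos < 8);
    mask = ((uint64_t)1 << nbBits) - 1;
    w->container |= ((uint64_t)value & mask) << w->pos;
    w->pos += nbBits;
}

/* Emits every complete byte, keeping the 0..7 leftover bits pending. */
static void bw_flush(bitw_sketch* w)
{
    while (w->pos >= 8) {
        w->out[w->outPos++] = (uint8_t)w->container;
        w->container >>= 8;
        w->pos -= 8;
    }
}
```

For example, adding `0x5` on 3 bits and then `0x1F` on 5 bits, followed by a flush, writes the single byte `0xFD`.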
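The `DYNAMIC_BMI2` commits (45b09e7625, d02b44cf55, 52ca7c6c56, ad15c1b724, 2c3ad05812) stack several preprocessor conditions. A rough sketch of how those conditions might fit together, assuming only the rules stated in the messages: clang with `__has_attribute(target)`, gcc >= 4.8, an x86-64 target, and BMI2 not already enabled at compile time. The macro `DYNAMIC_BMI2_SKETCH` and the function are hypothetical, and this is not the exact expression used in zstd.

```c
#include <assert.h>

/* Compatibility shim for compilers without __has_attribute,
 * in the spirit of commit ad15c1b724. */
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

/* Hypothetical macro name. The compiler test and the architecture test are
 * parenthesized separately so that non-x86-64 targets (e.g. ARM) are
 * excluded whatever the compiler. */
#if ((defined(__clang__) && __has_attribute(__target__)) \
     || (defined(__GNUC__) && !defined(__clang__) \
         && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8)))) \
    && (defined(__x86_64__) || defined(_M_X64)) \
    && !defined(__BMI2__)
#  define DYNAMIC_BMI2_SKETCH 1
#else
#  define DYNAMIC_BMI2_SKETCH 0
#endif

/* Reports whether this build would compile BMI2 variants to be selected at
 * runtime (1) or rely on the compile-time target only (0). */
int dynamic_bmi2_sketch_enabled(void)
{
    assert(DYNAMIC_BMI2_SKETCH == 0 || DYNAMIC_BMI2_SKETCH == 1);
    return DYNAMIC_BMI2_SKETCH;
}
```

Building with `-mbmi2` defines `__BMI2__`, which turns the sketch off: the instructions are then always available, so no runtime dispatch is needed.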
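Commit df4e9bba25 notes that C does not treat a `static const int` as an integer constant expression, so it cannot size a file-scope array or feed an enum-based static assertion, whereas a macro can. A minimal sketch, using a hypothetical `FRAME_ID_SIZE_SKETCH` in the spirit of `ZSTD_frameIdSize` (commit b8d4a3887f) and an enum/division static-assert idiom. The macro and function names are illustrative, not zstd's. The four bytes are the zstd frame magic number `0xFD2FB528` stored little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A macro is an integer constant expression in C; a `static const int` is not. */
#define FRAME_ID_SIZE_SKETCH 4

/* Compile-time check: a false condition makes the enum initializer divide
 * by zero, which is not a valid constant expression, so compilation fails. */
#define STATIC_ASSERT_SKETCH(c) { enum { static_assert_sketch = 1 / (int)(!!(c)) }; }

/* Legal at file scope only because the size is a constant expression.
 * `static const int n = 4;` followed by `unsigned char a[n];` here would be
 * rejected by a C compiler (file-scope variable-length array). */
static const unsigned char magic_sketch[FRAME_ID_SIZE_SKETCH] = { 0x28, 0xB5, 0x2F, 0xFD };

static size_t magic_size_sketch(void)
{
    STATIC_ASSERT_SKETCH(sizeof(magic_sketch) == FRAME_ID_SIZE_SKETCH);
    return sizeof(magic_sketch);
}

/* Returns 1 if `src` starts with the magic bytes, 0 otherwise. */
static int has_magic_sketch(const unsigned char* src, size_t srcSize)
{
    assert(src != NULL);
    if (srcSize < FRAME_ID_SIZE_SKETCH) return 0;
    return memcmp(src, magic_sketch, FRAME_ID_SIZE_SKETCH) == 0;
}
```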
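The corrected comparison from commit 0a0a212934 can be reduced to a tiny C sketch. The price units and function names are hypothetical, and a real optimal parser compares prices over a whole table of positions rather than in one call. The point is only that ending a match and immediately starting another one makes the next sequence carry a litLength of 0, whose encoding is not free, so that price belongs on the match side of the comparison.

```c
#include <assert.h>

/* Prices are abstract units (for example, bits); all names are hypothetical. */

/* Old comparison: literals versus match only. */
static int old_prefers_match(unsigned literalsPrice, unsigned matchPrice)
{
    return matchPrice < literalsPrice;
}

/* Corrected comparison: starting a new match right after a previous one
 * makes the next sequence encode litLength == 0, so its price is added. */
static int new_prefers_match(unsigned literalsPrice, unsigned matchPrice,
                             unsigned litLength0Price)
{
    assert(litLength0Price > 0); /* the commit notes litlength==0 is not free */
    return matchPrice + litLength0Price < literalsPrice;
}
```

With a literals price of 20, a match price of 18 and a litLength-0 price of 4, the old rule picks the match while the corrected rule keeps the literals, which illustrates the "structural advantage to matches" described in the message.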