AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Yann Collet	d100670045	btopt0 : a bit faster and weaker	2017-11-19 10:38:02 -08:00
Yann Collet	e6da37c430	created (hidden) new strategy btopt0 about ~+10% faster but losing ~0.01 compression ratio (note : amplitude varies a lot depending on files, but direction remains the same)	2017-11-19 10:21:21 -08:00
Yann Collet	e717a5b0dd	zstd_opt: minor speed optimization Calculate reference log2sums only once per series of sequences (as opposed to once per sequence) Also: improved code comments	2017-11-18 16:24:02 -08:00
Yann Collet	a4a20a4b2f	fix un-initialized memory warning harmless, but cleaner	2017-11-17 15:51:52 -08:00
Yann Collet	23767e950a	fix one UB pointer arithmetic in encoder Instead of calculating distance between 2 memory objects, which is UB, we extract the offset from object 1, and transfer it into object 2.	2017-11-17 13:24:51 -08:00
Yann Collet	11e58d9ba4	fixed minor warning warning: void function returning a value (even if the return value is void)	2017-11-16 15:21:30 -08:00
Yann Collet	15768cabb5	fixed some complex scenarios Fixed : multithreading to compress some small data with dictionary Fixed : ZSTD_initCStream_usingCDict() Improved streaming memory usage when pledgedSrcSize is known.	2017-11-16 15:18:18 -08:00
Yann Collet	05dffe43a7	Fixed Btree update ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate. With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree. Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched. Now, when current position is not inserted, zc->nextToUpdate is not updated, expecting ZSTD_updateTree() to fill the tree later on. Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate, so that it no longer depends on a future function to do this job. It took time to get there, as the issue started with a memory sanitizer error. The problem would have been easier to spot with a proper `assert()`. So this patch adds a few of them. Additionally, I discovered that `make test` does not enable `assert()` during CLI tests. This patch enables them. Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt. So this patch also fixes them. - Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed. Now, to avoid this issue, each type is independent. - ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately. - ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN - ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime). - ZSTDMT : nbThreads is automatically clamped on setting the value.	2017-11-16 12:18:56 -08:00
Yann Collet	dfc14579f5	removed wrong assertion	2017-11-15 15:35:56 -08:00
Yann Collet	c55e35b2fc	removed a few specialized traces	2017-11-15 15:04:53 -08:00
Yann Collet	61c2d70c86	shortened repcode match finder implementation	2017-11-15 14:37:40 -08:00
Yann Collet	d7e9805028	fixed corruption issue	2017-11-15 13:44:24 -08:00
Yann Collet	046ea53bef	still fighting data corruption due to messed up tree. Seems to happen when reaching end of buffer.	2017-11-15 11:29:24 -08:00
Yann Collet	4202b2e8a6	merged rep search into btMatchSearch but there is a tree corruption somewhere ... bug hunt ongoing	2017-11-14 20:38:52 -08:00
Yann Collet	9a11f70dc3	merged repcode search into BT match search this version has same speed as branch `opt` which is itself 5-10% slower than branch `dev` (no identified reason) It does not compress exactly the same as `opt` or `dev`, maybe because it doesn't stop search after repcodes, leading to sometimes better compression, sometimes worse (by a small margin). warning : _extDict path does not work for the time being This means that benchmark module works, but file module will fail with large files (and high compression level). Objective is to fuse _extDict path into current one, in order to have a single parser to maintain.	2017-11-13 02:23:48 -08:00
Yann Collet	eb47705b18	reduced scope of multiple variables renamed some variables for better understanding	2017-11-10 08:31:12 -08:00
Yann Collet	100d8ad6be	lib/compress: created ZSTD_LLcode() and ZSTD_MLcode() transform length into code. Since transformation is needed in several places throughout the code, better write the logic in one place.	2017-11-08 12:43:05 -08:00
Yann Collet	5aa0352742	zstd_opt: simplified ZSTD_getPrice() and ZSTD_updatePrice() interface ZSTD_getPrice() and ZSTD_updatePrice() accept normal matchlength as argument instead of matchlength-MINMATCH, which makes them easier / more logical to use and read. Conversion is simply done internally.	2017-11-08 12:23:27 -08:00
Yann Collet	bf730e2044	zstd_opt: refactor code for improved readability renamed variables to be more meaningful reduced scope of multiple variables removed some useless var attribution	2017-11-08 12:07:39 -08:00
Yann Collet	4191efa993	zstd_opt: ensure sufficient_len < ZSTD_OPT_NUM to simplify some tests	2017-11-08 11:24:00 -08:00
Yann Collet	ee441d5d2b	renamed zstd_compress.h into zstd_compress_internal.h to emphasize the fact that all definitions it contains must remain private, across lib/compress modules.	2017-11-07 16:15:23 -08:00
Yann Collet	8b6aecf2cb	moved a few structures from `zstd_internal.h` to `zstd_compress.h` which is a more precise scope	2017-11-07 16:03:14 -08:00
Yann Collet	150354c5fe	minor refactor added some traces and assert related to hunting a potential ubsan error in 32-bits mode (it ends up being a compiler-side issue : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82802). Modified one pointer arithmetic expression for a more conformant way.	2017-11-01 16:57:48 -07:00
Yann Collet	428e8b3bf4	fix : ZSTD_compress_generic(,,,ZSTD_e_end) automatically sets pledgedSrcSize as per documentation, on ZSTD_setPledgedSrcSize() : > If all data is provided and consumed in a single round, > this value (pledgedSrcSize) is overridden by srcSize instead. This wasn't applied before compression level is transformed into compression parameters. As a consequence, small input missed compression parameters adaptation. It seems to work fine now : compression was compared with ZSTD_compress_advanced(), results were the same.	2017-11-01 13:15:23 -07:00
Nick Terrell	86b8134cad	[libzstd] Fix parameter selection for empty input ZSTD_compress() and friends would treat an empty input as an unknown size when selecting parameters. Thus, they would drastically overallocate the context. Tell ZSTD_getParams() that the source size is 1 when it is empty.	2017-10-25 17:24:15 -07:00
Yann Collet	1ff8a8c109	Merge pull request #891 from facebook/contentSize Content size	2017-10-17 17:24:51 -07:00
Yann Collet	32c9f715ae	fixed : Visual build compressing stdin with multi-threading enabled fails There were multiple reasons stacked : - Visual uses a different code path, because ZSTD_NEWAPI is not defined - fileio.c sends `0` as `pledgedSrcSize` to mean `ZSTD_CONTENTSIZE_UNKNOWN` (fixed) - ZSTDMT_resetCCtx() interpreted `0` as "empty" instead of "unknown" (fixed)	2017-10-17 14:07:43 -07:00
Yann Collet	13bfe885aa	edited ZSTD_initCStream_advanced() comment	2017-10-16 14:06:22 -07:00
Nick Terrell	7f961ba6cd	Don't allow default tables to repeat It isn't useful in any case to repeat default tables. Saves a few bytes on Silesia, since we don't trigger the dictionary heuristic. Before: 211988480 => 73651998 bytes After: 211988480 => 73651721 bytes	2017-10-16 11:37:56 -07:00
Yann Collet	fc8d293460	dictionary compression use correct file size estimation when determining compression parameters to compress one file only. For multiple files, it still "bets" that files are going to be small. There was also a bug recently added in ZSTD_CCtx_loadDictionary_advanced() making it incapable of using pledgedSrcSize to determine compression parameters.	2017-10-14 01:21:43 -07:00
Yann Collet	beb9b4b398	fixed ZSTDMT_initCStream() when contentSizeFlag==1 by default and a wrong test in zstreamtest --mt	2017-10-13 19:09:30 -07:00
Yann Collet	213ef3b510	fixed ZSTD_initCStream_advanced() behavior, which depends on contentSizeFlag, and a stream fuzzer test, which was incorrect (relied on 0 being unconditionally transformed into `ZSTD_CONTENTSIZE_UNKNOWN`)	2017-10-13 19:01:58 -07:00
Yann Collet	3c1e3f8ec9	contentSizeFlag enabled by default would also fail for streaming and MT operations fixed	2017-10-13 18:32:06 -07:00
Yann Collet	fb44516641	ensure fParams.contentSizeFlag starts at 1 such default was failing for ZSTD_compressBegin/ZSTD_compressContinue fixed too	2017-10-13 17:39:13 -07:00
Yann Collet	dd18d73e7e	fileio: content size is enabled by default	2017-10-13 16:32:18 -07:00
Nick Terrell	ced6e6189c	Add DEBUGLOG() that prints FSE encoding types	2017-10-13 14:55:23 -07:00
Nick Terrell	24ac2dbd2a	Fix invalid use of dictionary offcode table Fixes #888.	2017-10-13 12:47:03 -07:00
Yann Collet	a9e5705077	minor code formatting added a trace during sequence encoding	2017-10-13 02:36:16 -07:00
Nick Terrell	a86a7097ec	Ensure dictionary Huff table can encode any symbol * Ensure that the dictionary Huffman CTable has maxSymbolValue 255. * Fix a stack buffer overflow during compression dictionary loading.	2017-10-03 13:22:13 -07:00
Yann Collet	67478f4cb0	fixed minor conversion warnings for printf in debug mode	2017-10-02 17:28:57 -07:00
Yann Collet	004fd34fd9	Merge pull request #876 from facebook/srcSize CLI Fix : srcSize written in frame headers when compressing multiple files	2017-10-02 15:02:05 -07:00
Nick Terrell	86e83e926f	[libzstd] Set CLEVEL_CUSTOM correctly In `ZSTD_compressBegin_advanced()`, `ZSTD_parameters` are used to set the compression parameters, but the level didn't get set to `CLEVEL_CUSTOM`, so `ZSTD_compressBlock()` used the wrong parameters when checking the source size.	2017-10-02 13:43:30 -07:00
Yann Collet	6e930c13d1	Merge branch 'dev' into compressBound	2017-10-01 11:24:02 -07:00
Yann Collet	dc404119e5	ZSTD_adjustCParams_internal : minor optimization	2017-09-30 15:02:40 -07:00
Nick Terrell	c5d6dde502	Don't `size -= 1` in ZSTD_adjustCParams() The window size could end up too small if the source size is 2^n + 1. Credit to OSS-Fuzz	2017-09-30 14:20:06 -07:00
Yann Collet	5b10345b26	added ZSTD_COMPRESSBOUND() as a macro ZSTD_compressBound() works fine, but is only useful for dynamic allocation. For static allocation, only a macro can provide the amount during compilation time.	2017-09-29 23:17:41 -07:00
Yann Collet	8afb151c9b	cli: fixed wrong initialization in MT mode It's not good to mix old and new API ZSTD_resetCStream() doesn't just set pledgedSrcSize : it also sets the CCtx for a single thread compression. Problem is, when 2+ threads are defined in cctx->requestedParams, ZSTD_compress_generic() will want to start MT compression, since initialization is supposed to have already happened (thanks to ZSTD_resetCStream()) except that the underlying ZSTDMT_CCtx* object is not created, resulting in a segfault. This is an invalid construction (correct one is to use ZSTD_CCtx_setPledgedSrcSize()). I haven't found a nice way to mitigate this impact if someone makes the same mistake. At some point, removing the old API to keep only the new API within fileio.c will limit these risks.	2017-09-29 22:14:37 -07:00
Yann Collet	fbd5ab7027	minor fix : no longer use fake srcSize during resource creation srcSize is read and provided at each file, not at resource creation. This used to be useful with older API, because it could not re-adapt parameters between sessions. At some point, it will be better to remove the old code, and only keep the new_api. It works fine by now.	2017-09-29 19:40:27 -07:00
Yann Collet	db1668a43b	fix : srcSize written in frame header when multiple files compressed This information used to be disabled when nbFiles>1. It was badly initialized later in the code, resulting in an error.	2017-09-29 18:05:18 -07:00
Yann Collet	7c9669f272	Merge pull request #873 from facebook/shorterTests Leaner tests	2017-09-29 17:26:46 -07:00
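
Commit 23767e950a above describes replacing a distance computed between two memory objects with an offset extracted from the first object and applied to the second. A minimal sketch of that idea follows. The helper name `translate_position` and its parameters are hypothetical and are not taken from the zstd sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from zstd): map a position inside buffer A
 * onto the same relative position inside buffer B. */
static const unsigned char* translate_position(const unsigned char* posInA,
                                               const unsigned char* baseA,
                                               const unsigned char* baseB)
{
    size_t offset;

    /* Caller contract: posInA and baseA point into the same object. */
    assert(posInA >= baseA);

    /* Well-defined: both pointers belong to object A. */
    offset = (size_t)(posInA - baseA);

    /* Well-defined as long as offset stays within (or one past) object B.
     * Computing (baseB - baseA) instead would subtract pointers into two
     * different objects, which the C standard leaves undefined. */
    return baseB + offset;
}
```

The result is the same address that the cross-object subtraction would usually produce in practice, but every intermediate expression stays inside a single object.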


666 Commits
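
Commit 5b10345b26 in the list above notes that a function such as ZSTD_compressBound() only helps with dynamic allocation, while a macro can size a static buffer at compile time. The sketch below illustrates that distinction. `EXAMPLE_COMPRESSBOUND`, its formula, `INPUT_SIZE`, and the helper functions are all placeholders invented for this example, not the library's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bound, for illustration only; the real library defines
 * its own formula in its public header. */
#define EXAMPLE_COMPRESSBOUND(srcSize) ((srcSize) + ((srcSize) >> 8) + 64)

#define INPUT_SIZE 4096

/* A macro expands to a constant expression, so it can size a static array. */
static unsigned char g_output[EXAMPLE_COMPRESSBOUND(INPUT_SIZE)];

/* Function form of the same bound: its result is only known at run time,
 * so it suits malloc()-style allocation but not a static array size. */
static size_t example_compressBound(size_t srcSize)
{
    return EXAMPLE_COMPRESSBOUND(srcSize);
}

/* Capacity of the statically allocated output buffer. */
static size_t output_capacity(void)
{
    return sizeof(g_output);
}

/* Sanity check: both forms agree for the size used above. */
static void check_bound_consistency(void)
{
    assert(example_compressBound(INPUT_SIZE) == output_capacity());
}
```

With the real library, the macro's argument would be the maximum input size known at compile time.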