AuroraMiddleware/zstd - zstd

Author	SHA1	Message	Date
Nick Terrell	f91ed5c766	[lib] s/current/curr because it collides with Linux Kernel macro	2020-09-09 14:35:39 -07:00
Yann Collet	fdc56baa42	fix 22294 (#2151)	2020-05-18 21:05:10 -07:00
Nick Terrell	b2092c6dc4	[ldm] Reset loadedDictEnd when the context is reset	2020-05-18 12:35:44 -07:00
Nick Terrell	add7ed2d4a	[lib] Fix bug in loading LDM dictionary in MT mode Exposed when loading a dictionary < LDM minMatch bytes in MT mode. Test Plan: `CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"` then `./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297` TODO: Add an explicit test that loads a small dictionary in MT mode	2020-05-14 11:52:28 -07:00
W. Felix Handte	6028827fee	Rewrite Include Paths to be Relative Addresses #1998.	2020-05-04 15:20:26 -04:00
Bimba Shrestha	5b0a452cac	Adding --long support for --patch-from (#1959) * adding long support for patch-from * adding refPrefix to dictionary_decompress * adding refPrefix to dictionary_loader * conversion nit * triggering log mode on chainLog < fileLog and removing old threshold * adding refPrefix to dictionary_round_trip * adding docs * adding enableldm + forceWindow test for dict * separate patch-from logic into FIO_adjustParamsForPatchFromMode * moving memLimit adjustment to outside ifdefs (need for decomp) * removing refPrefix gate on dictionary_round_trip * rebase on top of dev refPrefix change * making sure refPrefix + ldm is < 1% of srcSize * combining notes for patch-from * moving memlimit logic inside fileio.c * adding display for optimal parser and long mode trigger * conversion nit * fuzzer found heap-overflow fix * another conversion nit * moving FIO_adjustMemLimitForPatchFromMode outside ifndef * making params immutable * moving memLimit update before createDictBuffer call * making maxSrcSize unsigned long long * making dictSize and maxSrcSize params unsigned long long * error on files larger than 4gb * extend refPrefix test to include round trip * conversion to size_t * making sure ldm is at least 10x better * removing break * including zstd_compress_internal and removing redundant macros * exposing ZSTD_cycleLog() * using cycleLog instead of chainLog * add some more docs about user optimizations * formatting	2020-04-17 15:58:53 -05:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
W. Felix Handte	19a0955ec9	Add `ZSTD_cwksp_alloc_size()` to Help Calculate Needed Workspace Size	2019-10-10 13:40:16 -04:00
Nick Terrell	ddab2a94e8	Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy()	2019-09-20 00:56:20 -07:00
Nick Terrell	75cfe1dc69	[ldm] Fix bug in overflow correction with large job size (#1678) * [ldm] Fix bug in overflow correction with large job size * [zstdmt] Respect ZSTDMT_JOBSIZE_MAX (1G in 64-bit mode) * [test] Add test that exposes the bug Sadly the test fails on our CI because it uses too much memory, so I had to comment it out.	2019-07-12 18:45:18 -04:00
Josh Soref	a880ca239b	Spelling (#1582) * spelling: accidentally * spelling: across * spelling: additionally * spelling: addresses * spelling: appropriate * spelling: assumed * spelling: available * spelling: builder * spelling: capacity * spelling: compiler * spelling: compressibility * spelling: compressor * spelling: compression * spelling: contract * spelling: convenience * spelling: decompress * spelling: description * spelling: deflate * spelling: deterministically * spelling: dictionary * spelling: display * spelling: eliminate * spelling: preemptively * spelling: exclude * spelling: failure * spelling: independence * spelling: independent * spelling: intentionally * spelling: matching * spelling: maximum * spelling: meaning * spelling: mishandled * spelling: memory * spelling: occasionally * spelling: occurrence * spelling: official * spelling: offsets * spelling: original * spelling: output * spelling: overflow * spelling: overridden * spelling: parameter * spelling: performance * spelling: probability * spelling: receives * spelling: redundant * spelling: recompression * spelling: resources * spelling: sanity * spelling: segment * spelling: series * spelling: specified * spelling: specify * spelling: subtracted * spelling: successful * spelling: return * spelling: translation * spelling: update * spelling: unrelated * spelling: useless * spelling: variables * spelling: variety * spelling: verbatim * spelling: verification * spelling: visited * spelling: warming * spelling: workers * spelling: with	2019-04-12 11:18:11 -07:00
Yann Collet	be9e561da4	changed ZSTD_c_compressionStrategy into ZSTD_c_strategy also : fixed paramgrill, and limit conditions	2018-12-06 15:00:52 -08:00
Yann Collet	41c7d0b1e1	changed hashEveryLog into hashRateLog	2018-11-21 14:36:57 -08:00
Yann Collet	e874dacc08	changed searchLength into minMatch refactored all relevant API and calls for consistency.	2018-11-20 14:56:07 -08:00
Nick Terrell	b9693d3a49	[lib] Add rsyncable mode - Add rsyncable mode to multithreaded mode - Factor out LDM's hash function for reuse	2018-11-14 16:59:57 -08:00
W. Felix Handte	50cc1cf4d5	Remove CParams Arg from ZSTD_ldm_blockCompress	2018-09-28 17:12:53 -07:00
W. Felix Handte	dcdf437fed	Also Remove CParams from Table Filling Functions' Args	2018-09-28 17:10:42 -07:00
W. Felix Handte	6cb2454646	Remove CParams from Block Compressor Functions' Args	2018-09-28 17:10:42 -07:00
Nick Terrell	3841dbac84	Adjust advanced parameters to source size In the new advanced API, adjust the parameters even if they are explicitly set. This mainly applies to the `windowLog`, and accordingly the `hashLog` and `chainLog`, when the source size is known.	2018-06-18 15:49:31 -07:00
Yann Collet	f6ad59ab5c	Merge branch 'dev' into staticDictCost	2018-05-24 16:21:02 -07:00
W. Felix Handte	3ba70cc759	Clear the Dictionary When Sliding the Window	2018-05-23 17:53:03 -04:00
W. Felix Handte	b67196f30d	Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff	2018-05-23 17:53:03 -04:00
W. Felix Handte	265c2869d1	Split Wrapper Functions to Cause Inlining	2018-05-23 17:53:03 -04:00
Yann Collet	a8ddf1d370	disable 2-passes strategy	2018-05-22 15:06:36 -07:00
Yann Collet	8572b4d09f	fixed a pretty complex bug when combining ldm + btultra	2018-05-17 16:13:53 -07:00
Nick Terrell	295ab0dbfa	Only load extra table positions for CDicts Zstdmt uses prefixes to load the overlap between segments. Loading extra positions makes compression non-deterministic, depending on the previous job the context was used for. Since loading extra positions takes extra time as well, only do it when creating a `ZSTD_CDict`. Fixes #1077.	2018-04-02 14:41:30 -07:00
Yann Collet	87b0cf05bd	Merge pull request #1057 from facebook/lrmSettings LRM parameters	2018-03-21 05:59:39 -07:00
Nick Terrell	a3b76a77ef	Quiet appveyor warnings	2018-03-20 15:34:40 -07:00
Nick Terrell	136b9e2392	Fix external sequence corner cases * Clear external sequences when we reset the `ZSTD_CCtx`. * Skip external sequences when a block is too small to compress.	2018-03-20 14:50:28 -07:00
Yann Collet	6f4d0778a5	make it possible to express compression parameters in any order	2018-03-19 14:41:23 -07:00
Yann Collet	9618c0c804	make it possible to specify LDM parameters in any order	2018-03-19 11:07:04 -07:00
Nick Terrell	4af1fafeb8	Restore setting loadedDictEnd Setting `loadedDictEnd` was accidentally removed from `ZSTD_loadDictionaryContent()`, which means that dictionary compression will only be able to reference the parts of the dictionary within the window. The spec allows us to reference the entire dictionary so long as even one byte is in the window. `ZSTD_enforceMaxDist()` incorrectly always allowed offsets up to `loadedDictEnd` beyond the window, even once the dictionary was out of range. When overflow protection kicked in, the check `current > loadedDictEnd + maxDist` is incorrect if `loadedDictEnd` isn't reset back to zero. `current` could be reset below the value, which would incorrectly allow references beyond the window. This bug is present in `master`, but is very hard to trigger, since it requires both dictionaries and data which triggers overflow correction.	2018-03-16 14:54:06 -07:00
Nick Terrell	1908c92c46	Merge remote-tracking branch 'upstream/dev' into extern-seq * upstream/dev: Fix overflow protection with wlog=31	2018-03-14 17:26:31 -07:00
Nick Terrell	a9a6dcba63	Expose reference external sequence API * Expose the reference external sequences API for zstdmt. Allows external sequences of any length, which get split when necessary. * Reset the LDM window when the context is reset. * Store the maximum number of LDM sequences. * Sequence generation now returns the number of last literals. * Fix sequence generation to not throw out the last literals when blocks of more than 1 MB are encountered.	2018-03-14 12:29:31 -07:00
Nick Terrell	33fb966e56	Fix overflow protection with wlog=31 The overflow protection is broken when the window log is `> (3U << 29)`, so 31. It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`, and the assertion `current > newCurrent` fails. This happens when the same context is used many times over, but with a large window log, like in zstdmt. Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`. The added test fails before the patch, and passes after.	2018-03-14 11:45:44 -07:00
Nick Terrell	0a0e64c641	LDM manages its own window round buffer	2018-02-27 12:13:23 -08:00
Nick Terrell	7e5e226cbf	Split the window state into substructure	2018-02-26 13:29:57 -08:00
Nick Terrell	7e2bf4ebad	Remove long range matcher immediate repcode check The compression ratio gets about 0.01% worse on the files I tested, but the code is much simpler.	2018-02-22 15:18:47 -08:00
Nick Terrell	af866b3a58	Split block compressor out of long range matcher * `ZSTD_ldm_generateSequences()` generates the LDM sequences and stores them in a table. It should work with any chunk size, but is currently only called one block at a time. * `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and instead of encoding the literals directly, it passes them to a secondary block compressor. The code to handle chunk sizes greater than the block size is currently commented out, since it is unused. The next PR will uncomment and exercise this code. * During optimal parsing, ensure LDM `minMatchLength` is at least `targetLength`. Also don't emit repcode matches in the LDM block compressor. Enabling the LDM with the optimal parser now actually improves the compression ratio. * The compression ratio is very similar to before. It is very slightly different, because the repcode handling is slightly different. If I remove immediate repcode checking in both branches the compressed size is exactly the same. * The speed looks to be the same or better than before. Up Next (in a separate PR): Allow sequence generation to happen prior to compression, and produce more than a block worth of sequences. Expose some API for zstdmt to consume. This will test out some currently untested code in `ZSTD_ldm_blockCompress()`.	2018-02-22 15:18:41 -08:00
Nick Terrell	887cd4e35e	Split ZSTD_CCtx into smaller sub-structures	2018-01-16 11:17:50 -08:00
Nick Terrell	c233bdbaee	Increase maximum window size * Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail on my Mac. * Maximum window size in 64-bit mode is 2GB, since that is the largest power of 2 that works with the overflow prevention. * Allow `--long=windowLog` to set the window log, along with `--zstd=wlog=#`. These options also set the window size during decompression, but don't override `--memory=#` if it is set. * Present a helpful error message when the window size is too large during decompression. * The long range matcher defaults to a hash log 7 less than the window log, which keeps it at 20 for window log 27. * Keep the default long range matcher window size and the default maximum window size at 27 for the API and CLI. * Add tests that use the maximum window size and hash size for compression and decompression.	2017-09-26 14:00:01 -07:00
Nick Terrell	6c9ed76676	[ldm] Fix corner case where minMatch < 8 There is a potential read buffer overflow when minMatch < 8. fix-fuzz-failure	2017-09-19 13:49:37 -07:00
Stella Lau	f902bf9676	Merge branch 'ldm-integrate' into ldm-mergeDev	2017-09-11 14:55:29 -07:00
Stella Lau	360428c5d9	Move ldm functions to their own file	2017-09-06 18:09:26 -07:00

44 Commits
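
Commit c233bdbaee above describes two sizing rules: the maximum window size is 1GB in 32-bit mode and 2GB in 64-bit mode, and the long range matcher defaults to a hash log 7 less than the window log. The following is a minimal sketch of that arithmetic only. The function and macro names are hypothetical and are not zstd identifiers, and the sketch assumes a 4-byte pointer means 32-bit mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: the gap between window log and LDM hash log
 * ("a hash log 7 less than the window log" in commit c233bdbaee). */
#define SKETCH_LDM_HASH_RLOG 7u

/* Largest window log per the commit message:
 * 1GB = 2^30 in 32-bit mode, 2GB = 2^31 in 64-bit mode.
 * Assumption: a 4-byte pointer identifies 32-bit mode. */
static unsigned sketch_max_window_log(size_t pointerBytes)
{
    return (pointerBytes == 4) ? 30u : 31u;
}

/* Default LDM hash log for a given window log.
 * Assumption: no clamping beyond the precondition below. */
static unsigned sketch_ldm_default_hash_log(unsigned windowLog)
{
    assert(windowLog > SKETCH_LDM_HASH_RLOG);
    return windowLog - SKETCH_LDM_HASH_RLOG;
}
```

Under these assumptions, `sketch_ldm_default_hash_log(27)` gives 20, the value the commit message quotes for window log 27, and `sketch_max_window_log(sizeof(void*))` gives the limit for the current build.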