AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Nick Terrell	e103d7b4a6	Fix superblock mode (#2100 ) Fixes: Enable RLE blocks for superblock mode Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table. Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table. Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes. Respect disableLiteralsCompression in superblock mode Fix superblock mode to handle a block consisting of only compressed literals Fix a off by 1 error in superblock mode that disabled it whenever there were last literals Fix superblock mode with long literals/matches (> 0xFFFF) Allow superblock mode to repeat Huffman tables Respect ZSTD_minGain(). Tests: Simple check for the condition in #2096. When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much. Remaining limitations: O(targetCBlockSize^2) because we recompute statistics every sequence Unable to split literals of length > targetCBlockSize into multiple sequences Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead. ... Fixes #2096	2020-05-01 16:11:47 -07:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
Nick Terrell	81fda0419e	[opt] Only update repcodes upon arrival	2020-03-04 17:57:15 -08:00
Nick Terrell	0f9882deb9	[opt] Don't recompute repcodes while emitting sequences	2020-03-04 17:23:00 -08:00
Nick Terrell	c6caa2d04e	[opt] Delete ZSTD_litLengthContribution	2020-03-04 16:35:26 -08:00
Nick Terrell	610171ed86	[opt] Explain why we don't include literals price	2020-03-04 16:29:19 -08:00
Nick Terrell	5f49578be7	[opt] Don't recompute initial literals price	2020-03-04 16:27:17 -08:00
Nick Terrell	ddab2a94e8	Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy()	2019-09-20 00:56:20 -07:00
Yann Collet	facbe8b2c2	factored the logic selecting lowest match index as suggested by @terrelln	2019-08-05 15:18:43 +02:00
Yann Collet	98e7c344cd	fixed strategies btopt+	2019-08-02 14:42:53 +02:00
Nick Terrell	95e2b430ea	[opt] Add asserts for corruption in ZSTD_updateTree()	2019-06-21 15:22:29 -07:00
Yann Collet	9af909bf35	Merge pull request #1624 from facebook/smallwlog Improves compression ratio for small windowLog	2019-06-14 17:28:21 -07:00
Nick Terrell	cdb9481e38	[libzstd] Optimize ZSTD_insertBt1() for repetitive data We would only skip at most 192 bytes at a time before this diff. This was added to optimize long matches and skip the middle of the match. However, it doesn't handle the case of repetitive data. This patch keeps the optimization, but also handles repetitive data by taking the max of the two return values. ``` > for n in $(seq 9); do echo strategy=$n; dd status=none if=/dev/zero bs=1024k count=1000 \| command time -f %U ./zstd --zstd=strategy=$n >/dev/null; done strategy=1 0.27 strategy=2 0.23 strategy=3 0.27 strategy=4 0.43 strategy=5 0.56 strategy=6 0.43 strategy=7 0.34 strategy=8 0.34 strategy=9 0.35 ``` At level 19 with multithreading the compressed size of `silesia.tar` regresses 300 bytes, and `enwik8` regresses 100 bytes. In single threaded mode `enwik8` is also within 100 bytes, and I didn't test `silesia.tar`. Fixes Issue #1634.	2019-06-05 20:34:00 -07:00
Yann Collet	bc601bdc6d	first implementation of small window size for btopt noticeably improves compression ratio when window size is small (< 18). enwik7 level 19 windowLog `dev` `smallwlog` improvement 23 3.577 3.577 0.02% 22 3.536 3.538 0.06% 21 3.462 3.467 0.14% 20 3.364 3.377 0.39% 19 3.244 3.272 0.86% 18 3.110 3.166 1.80% 17 2.843 3.057 7.53% 16 2.724 2.943 8.04% 15 2.594 2.822 8.79% 14 2.456 2.686 9.36% 13 2.312 2.523 9.13% 12 2.162 2.361 9.20% 11 2.003 2.182 8.94%	2019-05-31 15:55:12 -07:00
Yann Collet	9719fd616c	removed nextToUpdate3 from ZSTD_window it's now a local variable of ZSTD_compressBlock_opt()	2019-05-28 16:18:12 -07:00
Yann Collet	33dabc8c80	get bt matches : made it a bit clearer which parameters are input and output	2019-05-28 16:11:32 -07:00
Yann Collet	327cf6fac1	nextToUpdate3 does not need to be maintained outside of zstd_opt.c It's re-synchronized with nextToUpdate at beginning of each block. It only needs to be tracked from within zstd_opt block parser. Made the logic clear, so that no code tried to maintain this variable. An even better solution would be to make nextToUpdate3 an internal variable of ZSTD_compressBlock_opt_generic(). That would make it possible to remove it from ZSTD_matchState_t, thus restricting its visibility to only where it's actually useful. This would require deeper changes though, since the matchState is the natural structure to transport parameters into and inside the parser.	2019-05-28 15:26:52 -07:00
Yann Collet	6453f8158f	complementary code comments on variables used / impacted during maxDist check	2019-05-28 14:12:16 -07:00
Josh Soref	a880ca239b	Spelling (#1582 ) * spelling: accidentally * spelling: across * spelling: additionally * spelling: addresses * spelling: appropriate * spelling: assumed * spelling: available * spelling: builder * spelling: capacity * spelling: compiler * spelling: compressibility * spelling: compressor * spelling: compression * spelling: contract * spelling: convenience * spelling: decompress * spelling: description * spelling: deflate * spelling: deterministically * spelling: dictionary * spelling: display * spelling: eliminate * spelling: preemptively * spelling: exclude * spelling: failure * spelling: independence * spelling: independent * spelling: intentionally * spelling: matching * spelling: maximum * spelling: meaning * spelling: mishandled * spelling: memory * spelling: occasionally * spelling: occurrence * spelling: official * spelling: offsets * spelling: original * spelling: output * spelling: overflow * spelling: overridden * spelling: parameter * spelling: performance * spelling: probability * spelling: receives * spelling: redundant * spelling: recompression * spelling: resources * spelling: sanity * spelling: segment * spelling: series * spelling: specified * spelling: specify * spelling: subtracted * spelling: successful * spelling: return * spelling: translation * spelling: update * spelling: unrelated * spelling: useless * spelling: variables * spelling: variety * spelling: verbatim * spelling: verification * spelling: visited * spelling: warming * spelling: workers * spelling: with	2019-04-12 11:18:11 -07:00
Nick Terrell	3d7377b874	[libzstd] Handle uncompressed literals	2019-02-15 14:58:11 -08:00
Yann Collet	e980ba212f	Merge pull request #1471 from facebook/nofloat guard functions using floating point for debug mode only	2018-12-23 12:35:51 -08:00
Yann Collet	c9dfb7e445	guard functions using floating point for debug mode only they are only used to print debug messages. Requested in #1386,	2018-12-22 09:09:40 -08:00
Yann Collet	ededcfca57	fix confusion between unsigned <-> U32 as suggested in #1441. generally U32 and unsigned are the same thing, except when they are not ... case : 32-bit compilation for MIPS (uint32_t == unsigned long) A vast majority of transformation consists in transforming U32 into unsigned. In rare cases, it's the other way around (typically for internal code, such as seeds). Among a few issues this patches solves : - some parameters were declared with type `unsigned` in .h, but with type `U32` in their implementation .c . - some parameters have type unsigned*, but the caller user a pointer to U32 instead. These fixes are useful. However, the bulk of changes is about %u formating, which requires unsigned type, but generally receives U32 values instead, often just for brevity (U32 is shorter than unsigned). These changes are generally minor, or even annoying. As a consequence, the amount of code changed is larger than I would expect for such a patch. Testing is also a pain : it requires manually modifying `mem.h`, in order to lie about `U32` and force it to be an `unsigned long` typically. On a 64-bit system, this will break the equivalence unsigned == U32. Unfortunately, it will also break a few static_assert(), controlling structure sizes. So it also requires modifying `debug.h` to make `static_assert()` a noop. And then reverting these changes. So it's inconvenient, and as a consequence, this property is currently not checked during CI tests. Therefore, these problems can emerge again in the future. I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests. It's another restriction for coding, adding more frustration during merge tests, since most platforms don't need this distinction (hence contributor will not see it), and while this can matter in theory, the number of platforms impacted seems minimal. Thoughts ?	2018-12-21 18:09:41 -08:00
Yann Collet	2898afab52	fixed OSSfuzz 11849 The problem was already masked, due to no longer accepting tiny blocks for statistics. But in case it could still happen with not-so-tiny blocks, there is a stricter control which ensures that nothing was already loaded prior to statistics collection.	2018-12-19 16:54:15 -08:00
Yann Collet	8e0e495ce8	fixed: compression ratio discrepancy depending on initialization, the first byte of a new frame was invalidated or not. As a consequence, one match opportunity was available or not, resulting in slightly different compressed sizes (on average, 1 or 2 bytes once every 20 frames). It impacted ratio comparison between one-shot and streaming modes. This fix makes the first byte of a new frame always a valid match. Now compressed size is always the same. It also improves compressed size by a negligible amount.	2018-12-19 10:11:06 -08:00
Yann Collet	ef984e7307	fix debug levels as reported by @terrelln. 2 is reserved for temporary usage only.	2018-12-18 13:40:07 -08:00
Yann Collet	635783da12	btultra2 and very small srcSize When srcSize is small, the nb of symbols produced is likely too small to warrant dedicated probability tables. In which case, predefined distribution tables will be used instead. There is a cheap algorithm in btultra initialization : it presumes default distribution will be used if srcSize <= 1024. btultra2 now uses the same threshold to shut down probability estimation, since measured frequencies won't be used at entropy stage, and therefore relying on them to determine sequence cost is misleading, resulting in worse compression ratios. This fixes btultra2 performance issue on very small input. Note that, a proper way should be to determine which symbol is going to use predefined probaility and which symbol is going to use dynamic ones. But the current algorithm is unable to make a "per-symbol" decision. So this will require significant modifications.	2018-12-18 12:32:58 -08:00
Yann Collet	373ff8b983	play around with rescale weights	2018-12-17 15:48:34 -08:00
Yann Collet	5e6aaa3abb	fixed btultra2 usage with prefix notably while using multi-threading	2018-12-10 18:45:03 -08:00
Yann Collet	be9e561da4	changed ZSTD_c_compressionStrategy into ZSTD_c_strategy also : fixed paramgrill, and limit conditions	2018-12-06 15:00:52 -08:00
Yann Collet	e9448cdf4c	introduced strategy btultra2 note : not yet applied on any compression level	2018-12-06 13:38:09 -08:00
Yann Collet	e874dacc08	changed searchLength into minMatch refactored all relevant API and calls for consistency.	2018-11-20 14:56:07 -08:00
Yann Collet	2caa995558	just add an assert() in ZSTD_insertBtAndGetAllMatches() to express a condition on ll0 . May help static analyzer as in #1397	2018-11-05 17:13:32 -08:00
W. Felix Handte	bad74c4781	Use Working Ctx Logs when not in DMS Mode We pre-hash the ptr for the dict match state sometimes. When that actually happens, a hashlog of 0 can produce undefined behavior (right shift a long long by 64). Only applies to unoptimized compilations, since when optimizations are applied, those hash operations are dropped when we're not actually in dms mode.	2018-09-28 17:12:54 -07:00
W. Felix Handte	00c088b32d	Support Split Logs in ZSTD_btopt..ZSTD_btultra	2018-09-28 17:12:54 -07:00
W. Felix Handte	97149f22c3	Stop Separately Passing CParams in ZSTD_opt Internal Functions	2018-09-28 17:10:42 -07:00
W. Felix Handte	dcdf437fed	Also Remove CParams from Table Filling Functions' Args	2018-09-28 17:10:42 -07:00
W. Felix Handte	6cb2454646	Remove CParams from Block Compressor Functions' Args	2018-09-28 17:10:42 -07:00
Nick Terrell	f2d6db45cd	[zstd] Add -Wmissing-prototypes	2018-09-27 15:24:48 -07:00
Yann Collet	6e66bbf5dd	fixed several minor issues detected by scan-build only notable one : writeNCount() resists better vs invalid distributions (though it should never happen within zstd anyway)	2018-08-14 16:55:35 -07:00
W. Felix Handte	3e91dc4d6a	Add Repcode Bounds Check	2018-06-21 15:54:41 -04:00
W. Felix Handte	5bd3d4b7d2	Add Debug Log Statement	2018-06-21 15:54:07 -04:00
W. Felix Handte	3caba150c6	Fix `dmsBtLow` Test	2018-06-21 15:53:40 -04:00
W. Felix Handte	5da9bbc38e	Conceivably Dedup ZSTD_noDict and ZSTD_dictMatchState _insertBt1 Impls By reverting to the bool extDict flag, we call ZSTD_insertBt1 with the same const args in both non-extDict dictModes.	2018-06-21 11:20:01 -04:00
W. Felix Handte	5d81f71e83	Consistency in Guarding DMS-Only Variable Initializations	2018-06-20 16:54:53 -04:00
W. Felix Handte	9c14eafe3d	Also Use `matchLow` for HC3 Match	2018-06-20 15:51:14 -04:00
W. Felix Handte	0a6cf7cd1d	Minor Changes	2018-06-20 15:27:23 -04:00
W. Felix Handte	ae1f3898a2	Remove Dead(!) HC3 DMS Lookup	2018-06-20 15:27:12 -04:00
W. Felix Handte	03c39c540b	Fix Incorrect Param	2018-06-19 15:36:33 -04:00
W. Felix Handte	f0a13bcd68	Make Sure Position 0 Gets Into the Tree	2018-06-19 15:10:06 -04:00

1 2 3

131 Commits