1491 Commits

Author SHA1 Message Date
Nick Terrell
b2092c6dc4 [ldm] Reset loadedDictEnd when the context is reset 2020-05-18 12:35:44 -07:00
Nick Terrell
add7ed2d4a [lib] Fix bug in loading LDM dictionary in MT mode
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.

Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```

TODO: Add an explicit test that loads a small dictionary in MT mode
2020-05-14 11:52:28 -07:00
Nick Terrell
70c80e19e6 [greedy] Fix performance instability 2020-05-12 17:51:16 -07:00
Nick Terrell
c3e921c639
Merge pull request #2131 from terrelln/raw-dict-fuzzer
Fix rare scenario with lazy parser, dictionary, and repcodes
2020-05-12 17:44:31 -07:00
Nick Terrell
3c1eba4d99 [lib] Fix lazy repcode validity checks 2020-05-12 12:25:06 -07:00
Nick Terrell
4e0515916d [lib] Fix repcode validation in no dict mode 2020-05-12 11:57:15 -07:00
Nick Terrell
6d687a8816 [lib] Fix dictionary + repcodes + optimal parser 2020-05-12 10:36:53 -07:00
Nick Terrell
4b88bd3ee0 [lib][fuzz] Assert sequences are valid in round trip tests 2020-05-11 20:38:49 -07:00
Nick Terrell
80d3585e31 [lib] Fix lazy parser with dictionary + repcodes 2020-05-11 19:04:30 -07:00
Yann Collet
608f1bfc4c fixed context downsize with initStatic
When context is created using initStatic,
no resize is possible.

fix : only bump oversizeDuration when !initStatic
2020-05-11 18:16:38 -07:00
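The rule in 608f1bfc4c (only bump the oversize counter when the context was not created with initStatic) can be sketched as below. This is a minimal illustration, not the library's code: `sketch_ctx`, its fields, and both constants are hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning constants, chosen only for this sketch. */
#define SKETCH_OVERSIZE_FACTOR 3   /* "too large" means capacity > 3x what is needed */
#define SKETCH_MAX_DURATION    4   /* tolerate this many oversized uses in a row */

typedef struct {
    size_t   capacity;            /* bytes currently held by the context */
    unsigned oversized_duration;  /* consecutive uses where capacity was too large */
    int      is_static;           /* nonzero if the context lives in a caller-owned buffer */
} sketch_ctx;

/* Record one use of the context that needed `needed` bytes. */
static void sketch_note_usage(sketch_ctx* ctx, size_t needed)
{
    assert(ctx != NULL);
    if (ctx->is_static) return;   /* cannot resize, so never build up a downsize request */
    if (ctx->capacity > needed * SKETCH_OVERSIZE_FACTOR)
        ctx->oversized_duration++;
    else
        ctx->oversized_duration = 0;
}

/* A downsize is requested only after the context stayed oversized long enough. */
static int sketch_should_downsize(const sketch_ctx* ctx)
{
    assert(ctx != NULL);
    return ctx->oversized_duration > SKETCH_MAX_DURATION;
}
```

With this guard, a static context that is repeatedly used for small inputs never reaches the downsize threshold, while a dynamically allocated one does.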
W. Felix Handte
c6636afbbb Fix ZSTD_estimateCCtxSize() Under ASAN
`ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which
is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the
buffers, assuming they'll be zero. However, the actual workspace allocation
logic always allocates those buffers, and when running under ASAN, the
workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized
buffers end up consuming 512 bytes of space, which is accounted for in the
actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't
in the estimation path, since it ignores the buffers entirely.

This commit fixes this.
2020-05-11 18:58:19 -04:00
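The accounting gap described in c6636afbbb can be illustrated with a small model. All names here are hypothetical, and the 256-byte figure is taken from the commit message. The message does not say how the fix is implemented, so routing the zero-sized buffers through the same sizing helper is only one plausible approach.

```c
#include <assert.h>
#include <stddef.h>

/* Redzone bytes added around each allocation under ASAN (figure from the commit message). */
#define SKETCH_REDZONE_BYTES 256

/* Bytes an allocation of `size` really consumes in the workspace. */
static size_t sketch_alloc_size(size_t size, int asan_enabled)
{
    assert(asan_enabled == 0 || asan_enabled == 1);
    return asan_enabled ? size + SKETCH_REDZONE_BYTES : size;
}

/* Estimate that skips the zero-sized input and output buffers entirely. */
static size_t sketch_estimate_ignoring_buffers(size_t tables, int asan_enabled)
{
    return sketch_alloc_size(tables, asan_enabled);
}

/* Estimate that sizes the two buffers the same way the allocator does. */
static size_t sketch_estimate_counting_buffers(size_t tables, int asan_enabled)
{
    return sketch_alloc_size(tables, asan_enabled)
         + sketch_alloc_size(0, asan_enabled)   /* input buffer  */
         + sketch_alloc_size(0, asan_enabled);  /* output buffer */
}
```

In this model the two estimates agree without ASAN and differ by exactly 512 bytes with it, which matches the shortfall the commit message describes.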
Yann Collet
54144285fd small speed improvement for strategy fast
gcc 9.3.0 :
kennedy : 459 -> 466
silesia : 360 -> 365
enwik8  : 267 -> 269

clang 10.0.0 :
kennedy : 436 -> 441
silesia : 364 -> 366
enwik8  : 271 -> 272
2020-05-07 06:15:58 -07:00
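To put the before and after figures in 54144285fd on a common scale, the relative change can be computed as below. The helper name is hypothetical, and the sketch assumes the figures are speeds where higher is better, since the commit message gives no unit.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double sketch_percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) * 100.0 / before;
}
```

Under that assumption, kennedy with gcc (459 -> 466) improves by about 1.5%, and enwik8 with clang (271 -> 272) by about 0.4%. The other four measurements fall between those two.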
Felix Handte
ad8dbae1b7
Merge pull request #2103 from felixhandte/relative-includes
Migrate Includes to Relative Paths
2020-05-06 09:42:23 -07:00
Yann Collet
c29fd7cd8b some more conversion warnings
hunting down some static analyzer warnings
2020-05-05 10:16:59 -07:00
Yann Collet
c1b836f4c3 fix minor conversion warnings 2020-05-04 14:43:09 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
Felix Handte
7e9aabd652
Merge pull request #2099 from felixhandte/compile-under-pedantic
Compile Under `-pedantic -Werror` and `-std=c90`
2020-05-04 10:07:13 -07:00
Felix Handte
816ed80774
Merge pull request #1984 from MeghnaM/1636-Reduce-stack-usage-of-HUF_sort
Reduce stack usage of HUF_sort()
2020-05-04 08:15:31 -07:00
W. Felix Handte
c7da66c9cf Purge C++-Style Comments (// ...), Make Compilation Succeed Under C90 2020-05-04 10:59:15 -04:00
W. Felix Handte
6696933b32 Make All Invocations Start With Literal Format String 2020-05-04 10:59:15 -04:00
W. Felix Handte
5e5f262612 Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations 2020-05-04 10:58:55 -04:00
Nick Terrell
e103d7b4a6
Fix superblock mode (#2100)
Fixes:

* Enable RLE blocks for superblock mode
* Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
* Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
* Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
* Respect disableLiteralsCompression in superblock mode
* Fix superblock mode to handle a block consisting of only compressed literals
* Fix an off-by-one error in superblock mode that disabled it whenever there were last literals
* Fix superblock mode with long literals/matches (> 0xFFFF)
* Allow superblock mode to repeat Huffman tables
* Respect ZSTD_minGain().

Tests:

* Simple check for the condition in #2096.
* When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.

Remaining limitations:

* O(targetCBlockSize^2) because we recompute statistics every sequence
* Unable to split literals of length > targetCBlockSize into multiple sequences
* Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.

...
Fixes #2096
2020-05-01 16:11:47 -07:00
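The "within 200 bytes of the next header byte size" rule from e103d7b4a6 can be sketched as follows. The function and macro names are hypothetical. The 3/4/5-byte header sizes and the 1 KB and 16 KB thresholds reflect the compressed literals header formats as I understand the Zstandard format, and the 200-byte margin comes from the commit message. Treat this as an illustration of the idea, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_KB(n) ((size_t)(n) << 10)

/* Header size in bytes for a compressed literals section of `lit_size` bytes.
 * When the sub-block also has to carry a Huffman table, step up to the next
 * header size 200 bytes early so the table is likely to fit. */
static size_t sketch_literals_header_size(size_t lit_size, int writes_table)
{
    size_t const margin = writes_table ? 200 : 0;
    assert(lit_size < SKETCH_KB(256));  /* larger sizes are outside this sketch */
    return 3
         + (lit_size >= SKETCH_KB(1)  - margin)
         + (lit_size >= SKETCH_KB(16) - margin);
}
```

For example, 900 bytes of literals get a 3-byte header when no table is written, but a 4-byte header when a table is written, because 900 is within 200 bytes of the 1 KB boundary.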
Meghna Malhotra
0adfc8dfce Fix broken CI; make changes in response to the comments 2020-05-01 13:45:48 -07:00
Meghna Malhotra
53d76dc20f Remove magic constant and made other changes addressing the comments 2020-05-01 13:45:48 -07:00
Meghna Malhotra
fe8402b522 WIP: Still getting an error 2020-05-01 13:45:48 -07:00
Meghna Malhotra
a084d959bd WIP: Increased wksp size, but it's segfaulting 2020-05-01 13:45:48 -07:00
Meghna Malhotra
fdb2780c47 Move rank table into HUF_buildCTable_wksp() 2020-05-01 13:45:48 -07:00
Bimba Shrestha
1875f616ce passing dictContentType instead of rawContent every time 2020-04-21 22:29:35 -07:00
Bimba Shrestha
5b0a452cac
Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering long mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefix + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
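The long mode trigger mentioned in 5b0a452cac (first compared against chainLog, later against cycleLog) might look roughly like the sketch below. Every name is hypothetical. The assumptions are that the cycle log is the chain log reduced by one for binary-tree strategies, and that the file log is the bit length of the largest source file size. Neither is spelled out in the commit message.

```c
#include <assert.h>

/* Index of the highest set bit of v (v must be nonzero). */
static unsigned sketch_highbit64(unsigned long long v)
{
    unsigned n = 0;
    assert(v != 0);
    while (v >>= 1) n++;
    return n;
}

/* Assumed cycle log: chain log, minus one when a binary-tree strategy is used. */
static unsigned sketch_cycle_log(unsigned chain_log, int uses_binary_tree)
{
    return chain_log - (uses_binary_tree ? 1u : 0u);
}

/* Returns 1 when the match finder's reach is smaller than the file,
 * which is when long mode is assumed to be switched on. */
static int sketch_should_enable_long_mode(unsigned long long max_src_size,
                                          unsigned chain_log,
                                          int uses_binary_tree)
{
    unsigned const file_log = sketch_highbit64(max_src_size) + 1;
    return sketch_cycle_log(chain_log, uses_binary_tree) < file_log;
}
```

Under these assumptions, a 1 GiB source with a chain log of 24 enables long mode, while a 1 KiB source with a chain log of 16 does not.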
Nick Terrell
5fcbc484c8
Merge pull request #2040 from caoyzh/dev-2
Optimize by prefetching on aarch64
2020-04-08 13:14:47 -07:00
Bimba Shrestha
c0d4b2b5a3
Merge pull request #2075 from bimbashrestha/dict_fuzzer_ref
[bug] handling case where prefix is NULL or 0 sized in refPrefix_advanced
2020-04-07 17:37:19 -05:00
Bimba Shrestha
1658ae75cd handling nil case for refprefix 2020-04-07 14:41:53 -07:00
Carl Woffenden
a93fadfcd9 Further replication removed
`CHECK_F` is now in `error_private.h`. Minor tidy.
2020-04-07 11:25:16 +02:00
Carl Woffenden
7af7735fa3 Merge remote-tracking branch 'upstream/dev' into single-file-lib 2020-04-07 11:13:02 +02:00
Carl Woffenden
edd9a07322 Code replicated in compression and decompression moved to shared headers
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
2020-04-07 11:02:06 +02:00
Bimba Shrestha
0154866749 moving consts to zstd_internal and reusing them 2020-04-03 14:26:15 -07:00
Carl Woffenden
7c420344d2 Single-file decoder script can now (optionally) create an encoder
To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
2020-04-03 19:07:46 +02:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell
d34204a7b7
Merge pull request #2029 from terrelln/minor-opt
[opt] Update repcodes less often
2020-03-23 18:12:32 -07:00
caoyzh
7201980650 Optimize by prefetching on aarch64 2020-03-14 15:25:59 +08:00
Bimba Shrestha
66607d0eac
Merge pull request #2033 from bimbashrestha/icc
[opt] Small icc level 1 compression speed gain using #pragma vector
2020-03-10 20:42:19 -05:00
Bimba Shrestha
a89c45bdbd Typo 2020-03-10 15:19:48 -05:00
Bimba Shrestha
43fc88f443 Adding comment and removing ivdep 2020-03-10 14:57:27 -05:00
Bimba Shrestha
dba3abc95a Missed returns 2020-03-05 12:20:59 -08:00
Bimba Shrestha
a75e5f2ffc bitscan add undef check 2020-03-05 11:52:15 -08:00
Bimba Shrestha
4c72a1a9c2 adding vector to main loop 2020-03-05 09:55:38 -08:00
Nick Terrell
81fda0419e [opt] Only update repcodes upon arrival 2020-03-04 17:57:15 -08:00
Nick Terrell
04744e52dc
Merge pull request #2028 from terrelln/minor-opt
[opt] Don't recompute initial literals price
2020-03-04 17:40:59 -08:00
Nick Terrell
0f9882deb9 [opt] Don't recompute repcodes while emitting sequences 2020-03-04 17:23:00 -08:00
Nick Terrell
c6caa2d04e [opt] Delete ZSTD_litLengthContribution 2020-03-04 16:35:26 -08:00