Commit Graph

2327 Commits

Author SHA1 Message Date
René Rebe
21eb26d664 fixed legacy/zstd_v* with older gcc version, by guarding builtin_*
like in other files
2018-03-25 20:35:15 +02:00
Yann Collet
ad15c1b724 added __has_attribute() define for non-clang compilers 2018-03-23 19:04:48 -07:00
Yann Collet
52ca7c6c56 make DYNAMIC_BMI2 support of clang conditional to __has_attribute()
to support older clang versions such as 3.4
2018-03-23 18:45:42 -07:00
Yann Collet
29b021f9a0
Merge pull request #1067 from facebook/targetLength
removed limit ZSTD_TARGETLENGTH_MAX
2018-03-22 10:38:33 -07:00
Nick Terrell
ad344033df Fix broken assertion
The `avgJobSize` must not be lower than 256 KB for single-pass mode.
In `zstd.h` we say the minimum value for `ZSTD_p_jobSize` is 1 MB,
so ensure that we always pick a size >= 1 MB.

Found by libFuzzer fuzzer tests with large input limits.
2018-03-21 16:20:30 -07:00
Yann Collet
153bc1c004 removed limit ZSTD_TARGETLENGTH_MAX
this makes it possible to specify extremely large negative compression levels,
achieving the side effect as "no compression".

It will also be possible to define larger targetlength for ultra compression mode.

There is no adverse side effect due to removing this limit.
2018-03-21 15:50:05 -07:00
Yann Collet
a99c4a3621 Merge branch 'dev' into advancedDecompress 2018-03-21 06:08:28 -07:00
Yann Collet
87b0cf05bd
Merge pull request #1057 from facebook/lrmSettings
LRM parameters
2018-03-21 05:59:39 -07:00
Yann Collet
d1bf609abf
Merge pull request #1059 from terrelln/mt-ldm
Integrate ldm with zstdmt
2018-03-20 17:50:20 -07:00
Yann Collet
e0cb8d19c6 fixed legacy test case 2018-03-20 17:48:22 -07:00
Yann Collet
878728dc26 fixed several comments by @terrelln 2018-03-20 16:35:14 -07:00
Yann Collet
e1c52faace
Merge pull request #1060 from facebook/compressImpl
merge bmi2 implementation of encodeSequence into zstd_compress.c
2018-03-20 16:19:42 -07:00
Yann Collet
6cda8c932c added test with ZSTD_decompress_generic() + ZSTD_DCtx_refPrefix()
also :
clarified stage condition to accept new parameters,
fixed initializers correspondingly.
2018-03-20 16:16:13 -07:00
Yann Collet
0dadb6b70d implemented ZSTD_DCtx_refPrefix*() 2018-03-20 15:45:56 -07:00
Yann Collet
569b8ba4d9 implemented ZSTD_DCtx_refDDict() 2018-03-20 15:43:49 -07:00
Nick Terrell
a3b76a77ef Quiet appveyor warnings 2018-03-20 15:34:40 -07:00
Yann Collet
6873fec658 changed dictMore for dictContentType
which seems clearer to describe what the variable/argument is about.
2018-03-20 15:13:14 -07:00
Yann Collet
31b54b6eea updated ZSTD_initStaticDDict() prototype
can also specify dictContentType.
2018-03-20 14:52:02 -07:00
Nick Terrell
136b9e2392 Fix external sequence corner cases
* Clear external sequences when we reset the `ZSTD_CCtx`.
* Skip external sequences when a block is too small to compress.
2018-03-20 14:50:28 -07:00
Yann Collet
353117c5d7 implemented ZSTD_DCtx_loadDictionary*()
this required updating ZSTD_createDDict_advanced()
to accept a dictContentType parameter (raw, full, auto).
2018-03-20 13:40:29 -07:00
Yann Collet
451357f37f
Merge pull request #1058 from facebook/cctxParams
updated CCtxParams API
2018-03-20 12:36:12 -07:00
Yann Collet
2ed5af0766 merge bmi2 implementation of encodeSequence into zstd_compress.c 2018-03-19 19:10:31 -07:00
Nick Terrell
d19f803a3b Fix window size for 1 worker + flushing 2018-03-19 18:56:39 -07:00
Nick Terrell
24d9edbdd8 Set ldmParams to 0 when disabled 2018-03-19 18:23:54 -07:00
Nick Terrell
4b92574feb Fix corner cases exposed by zstreamtest 2018-03-19 17:54:04 -07:00
Nick Terrell
94c77710a9 Integrate ldm with zstdmt
Integrate ldm into zstdmt by running it in serial and in order in the first
step of each job, in the same place as the hash gets updated. The input
buffer is sized to fit the whole LDM window and 2 full buffers of slack.
Input buffers cannot be reused until the LDM step is done with them.
After the LDM step is finished, the jobs don't actually have access to the
full window, only the overlap.

Tested on a few different multi-GB files with and without sanitizers,
and with different numbers of threads.
2018-03-19 16:29:03 -07:00
Nick Terrell
aa4dbd09a1
Pull job/overlap log logic into common function (#1055)
Prepares for LDM integration by separating the job size and overlap logic
into helper functions.
2018-03-19 15:56:36 -07:00
Yann Collet
c8b3d389fd updated CCtxParams API
to respect naming convention :
ZSTD_CCtxParams_*()
2018-03-19 15:07:26 -07:00
Yann Collet
6f4d0778a5 make it possible to express compression parameters in any order 2018-03-19 14:41:23 -07:00
Nick Terrell
2253d01b27 Move XXH64_update() into worker threads
* Computes the XXH hash in the worker threads.
* Workers get a sequence number and wait until ther number shows up. On
  error, ensures that its sequence is finished, so future threads don't
  get blocked.
* Sets up for ldm integration, which will go in the same spot.
2018-03-19 11:08:27 -07:00
Yann Collet
9618c0c804 make it possible to specify LDM parameters in any order 2018-03-19 11:07:04 -07:00
Yann Collet
ec0959e701
Merge branch 'dev' into mt-single 2018-03-18 01:06:31 -07:00
Nick Terrell
4af1fafeb8 Restore setting loadedDictEnd
Setting `loadedDictEnd` was accidently removed from `ZSTD_loadDictionaryContent()`,
which means that dictionary compression will only be able to reference the parts of
the dictionary within the window. The spec allows us to reference the entire
dictionary so long as even one byte is in the window.

`ZSTD_enforceMaxDist()` incorrectly always allowed offsets up to `loadedDictEnd`
beyond the window, even once the dictionary was out of range.

When overflow protection kicked in, the check `current > loadedDictEnd + maxDist`
is incorrect if `loadedDictEnd` isn't reset back to zero. `current` could be reset
below the value, which would incorrectly allow references beyond the window. This
bug is present in `master`, but is very hard to trigger, since it requires both
dictionaries and data which triggers overflow correction.
2018-03-16 14:54:06 -07:00
Yann Collet
cbc71e40f6 moving LRM parameters out of experimental section
into "normal" range,
start pinned at 160.
2018-03-15 17:22:40 -07:00
Nick Terrell
f15a17e19f Use a single buffer in zstdmt
Summary:
Allocate a single input buffer large enough to house each job, as well as
enough space for the IO thread to write 2 extra buffers. One goes in the
`POOL` queue, and one to fill, and then block on a full `POOL` queue.
Since we can't overlap with the prefix, we allocate space for 3 extra
input buffers.

Test Plan:
* CI
* With and without ASAN/UBSAN run zstdmt with different number of threads
  on two large binaries, and verify that their checksums match.
* Test on the tip of the zstdmt ldm integration.

Reviewers: cyan

Differential Revision: https://phabricator.intern.facebook.com/D7284007

Tasks: T25664120
2018-03-15 16:21:33 -07:00
Yann Collet
192542b63c
Merge pull request #1047 from facebook/hufCompress
removed huf_compress_impl.h
2018-03-15 14:14:03 -07:00
Nick Terrell
a271399c97 Expose reference external sequence API
Summary:
* Expose the reference external sequences API for zstdmt.
  Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.

Expose reference external sequence API

* Expose the reference external sequences API for zstdmt.
* Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.

Test Plan:
* CI
* Test the zstdmt ldm integration stacked on top of this diff

Reviewers: cyan

Differential Revision: https://phabricator.intern.facebook.com/D7283968

Tasks: T25664120
2018-03-14 18:07:53 -07:00
Nick Terrell
1908c92c46 Merge remote-tracking branch 'upstream/dev' into extern-seq
* upstream/dev:
  Fix overflow protection with wlog=31
2018-03-14 17:26:31 -07:00
Yann Collet
a909c293c6 Merge branch 'dev' into hufCompress 2018-03-14 16:11:25 -07:00
Nick Terrell
a9a6dcba63 Expose reference external sequence API
* Expose the reference external sequences API for zstdmt.
  Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.
2018-03-14 12:29:31 -07:00
Nick Terrell
33fb966e56 Fix overflow protection with wlog=31
The overflow protection is broken when the window log is `> (3U << 29)`, so 31.
It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`,
and the the assertion `current > newCurrent` fails. This happens when the same
context is used many times over, but with a large window log, like in zstdmt.

Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`.

The added test fails before the patch, and passes after.
2018-03-14 11:45:44 -07:00
Yann Collet
4c5cbac179
Merge pull request #1041 from facebook/fasterFast
Negative compression levels
2018-03-13 21:32:46 -07:00
Yann Collet
50f763ec44 fixed several comments are underlined by @terrelln 2018-03-13 14:23:14 -07:00
Yann Collet
a95a88af57 removed huf_compress_impl.h
re-imported all functions inside huf_compress.c
for easier source editing.

Also updated a bunch of code comments
for clarification.
2018-03-13 14:14:05 -07:00
Yann Collet
bd7bb94361
Merge pull request #1044 from baldurk/remove-utf8-characters
Remove non-ASCII characters in header file comments
2018-03-13 13:22:07 -07:00
Baldur Karlsson
430a2fec19 Remove non-ASCII characters in header file comments
* Replaced a non-breaking space and an en dash with a plain space and
  a hyphen.
* This means the files are simple ASCII and less likely to run into
  codepage issues.
2018-03-13 20:05:53 +00:00
Yann Collet
530eeb41a7
Merge pull request #1039 from facebook/zstd_decompress
Removed zstd_decompress_impl.h
2018-03-12 18:21:46 -07:00
Yann Collet
2291b85a1e changed ZSTD_p_literalCompression into ZSTD_p_compressLiterals
prefer verb+object construction
2018-03-12 11:44:10 -07:00
Yann Collet
a57d43d4d4 updated documentation of targetLength 2018-03-12 11:35:01 -07:00
Yann Collet
6a9b41b731 create command --fast[=#]
access negative compression levels from command line
for both compression and benchmark modes.

also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.

added relevant cli tests.
2018-03-11 20:01:23 -07:00
Yann Collet
a146ee04ae added negative compression levels
negative compression level trade compression ratio for more compression speed.
They turn off huffman compression of literals,
and use row 0 as baseline with a stepSize = -cLevel.

added associated test in fuzzer

also added : new advanced parameter ZSTD_p_literalCompression
2018-03-11 05:21:53 -07:00
Yann Collet
facc09aa03 minor compression level adaptation
level 12 compresses slightly more and faster
due to better btlazy2 mode
2018-03-11 03:06:52 -07:00
Yann Collet
fe321f9e2a re-integrate ZSTD_decompressSequencesLong() into zstd_decompress.c
removed zstd_decompress_impl.h
2018-03-09 19:48:06 -08:00
Yann Collet
89a2ebb971 incorporated ZSTD_decompressSequences() into zstd_decompress() 2018-03-09 19:35:57 -08:00
Yann Collet
cdb1f1433e incorporated ZSTD_initFseState() inside zstd_decompress.c 2018-03-09 18:16:10 -08:00
Yann Collet
a166eae1ba incorporate ZSTD_decodeSequenceLong() within zstd_decompress.c 2018-03-09 18:11:14 -08:00
Yann Collet
17626ba56e restored ZSTD_decodeSequence() into zstd_decompress.c 2018-03-09 18:03:25 -08:00
Yann Collet
51169575a8
Merge pull request #1036 from terrelln/thread-void
[threading] Cast unused arguments to void
2018-03-07 12:14:05 -08:00
Nick Terrell
7e103cdaf5 [threading] Cast unused arguments to void 2018-03-06 18:36:40 -08:00
Yann Collet
db147ea620 improved comments
following @terrelln suggestions
2018-03-06 18:15:26 -08:00
Yann Collet
06ca9c7d7c fixed 0-seq blocks in block-decompression mode 2018-03-06 01:50:19 -08:00
Yann Collet
9a91afe6ef long offset mode : new default threshold for 32-bit 2018-03-05 16:41:08 -08:00
Yann Collet
7bd7a3ad43 long offset mode : new default threshold for 64-bits mode 2018-03-05 16:16:49 -08:00
Yann Collet
c0393a538f fixed counting long distance weights 2018-03-05 15:12:10 -08:00
Yann Collet
41bd10446e Merge branch 'dev' into longOffsetMode 2018-03-05 13:10:10 -08:00
Yann Collet
cb789d2df8 re-inserted offset evaluation 2018-03-05 13:08:59 -08:00
Yann Collet
b91ddf0ae6 Merge branch 'dev' into longOffsetMode 2018-03-05 11:59:54 -08:00
Yann Collet
d02b44cf55 DYNAMIC_BMI2 enabled for clang
clang only claims compatibility with gcc 4.2.
Consequently, recent patch which reserved DYNAMIC_BMI2 for gcc >= 4.8
also disabled it for clang.

fix : __clang__ is now enough to enable DYNAMIC_BMI2
(associated with other existing conditions : x64/x64, !bmi2)
2018-03-04 16:05:59 -08:00
Yann Collet
45b09e7625 limit DYNAMIC_BMI2 to gcc >= 4.8
attribute bmi2 not supported by gcc 4.4
2018-03-01 15:02:18 -08:00
Yann Collet
b01552a07a force inlining of HUF_decodeSymbol*() functions
which was not done properly by gcc 4.8
resulting in major performance difference.

ex :
zstd -b1 silesia.tar
before : dec 680 MB/s
after  : dec 710 MB/s  (without bmi2)
after  : dec 770 MB/s  (with DYNAMIC_BMI2)
2018-03-01 11:31:45 -08:00
Yann Collet
ccb7184a76
Merge pull request #1026 from terrelln/lrm-window
LDM manages its own window round buffer
2018-02-27 17:09:10 -08:00
Nick Terrell
0a0e64c641 LDM manages its own window round buffer 2018-02-27 12:13:23 -08:00
Yann Collet
2c4d3f339a
Merge pull request #1025 from facebook/huf
Huf
2018-02-27 09:57:01 -08:00
Yann Collet
33a3f18848 fixed wrong size test 2018-02-26 18:27:51 -08:00
Yann Collet
89741653ab added error code workSpace_tooSmall 2018-02-26 15:11:50 -08:00
Yann Collet
6cdf690441 minor cleaning of huff0
Update code documentation, and properly names a few "magic constants".
Also, HUF_compress_internal() gets a cleaner way
to determine size of tables inside workspace.
2018-02-26 14:52:23 -08:00
Nick Terrell
6b88d592fd Reduce ZSTD_CHAINLOG_MAX to 29 in 32-bit mode 2018-02-26 13:30:24 -08:00
Nick Terrell
7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
Yann Collet
50bc2ce95e
Merge pull request #1021 from terrelln/lrm-split
Split block compresser out of long range matcher
2018-02-23 17:36:51 -08:00
Yann Collet
653383f74a minor nit from Mac XCode 2018-02-22 15:44:26 -08:00
Nick Terrell
7e2bf4ebad Remove long range matcher immediate repcode check
The compression ratio gets about 0.01% worse on the files I tested, but the
code is much simpler.
2018-02-22 15:18:47 -08:00
Nick Terrell
af866b3a58 Split block compresser out of long range matcher
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
  stores them in a table. It should work with any chunk size, but
  is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
  instead of encoding the literals directly, it passes them to a
  secondary block compressor. The code to handle chunk sizes greater
  than the block size is currently commented out, since it is unused.
  The next PR will uncomment exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
  `targetLength`. Also don't emit repcode matches in the LDM block
  compressor. Enabling the LDM with the optimal parser now actually improves
  the compression ratio.
* The compression ratio is very similar to before. It is very slightly
  different, because the repcode handling is slightly different. If I remove
  immediate repcode checking in both branches the compressed size is exactly
  the same.
* The speed looks to be the same or better than before.

Up Next (in a separate PR)
--------------------------

Allow sequence generation to happen prior to compression, and produce more
than a block worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
2018-02-22 15:18:41 -08:00
Yann Collet
0fd4df6ed3 Implemented BMI2 functions directly within huf_decompress.c
This makes it easier to edit for maintenance and evolutions
(I plan to experiment modifications in huffman decompression functions).

The methology followed seems broadly applicable to other BMI2 modules.

Performance was tracked rigorously at each step,
there is no noticeable loss (nor win) of performance compared to `#include` version.

Note however that 4X decoder variants tend to be extremely sensitive to code alignment.
This source code resulted in pretty good performance for gcc 7.2 and 7.3,
but future changes (even in other parts of the code) might trigger the issue again.
2018-02-22 10:51:47 -08:00
Yann Collet
9c5a8040a9 fixed huf_compress workspace size 2018-02-21 11:34:49 -08:00
Yann Collet
010ba5f71f
Merge pull request #1017 from terrelln/c-bmi2
[compress] Support BMI2
2018-02-20 15:34:59 -08:00
Nick Terrell
6e128d3534 [BMI2] Add comments to the bmi2 variable in the contexts 2018-02-20 14:12:11 -08:00
Yann Collet
70163bf0d3 added clarification comments in zstd_errors.h
answering some points in #1018
2018-02-20 12:54:49 -08:00
Yann Collet
7117ea8bec
Merge pull request #1011 from terrelln/bmi2
[decompress] Support BMI2
2018-02-15 11:40:34 -08:00
Nick Terrell
b58f01537e [compress] Support BMI2 2018-02-14 19:20:32 -08:00
Nick Terrell
4319132312 [decompress] Support BMI2 2018-02-13 17:00:15 -08:00
Yann Collet
5cb1144872 fixed --single-thread
was incorrectly set to -T0 (use as many cores as possible) previously
2018-02-13 14:56:35 -08:00
Yann Collet
2524cbd847 added code comment on how to generate default tables
as suggested by @terrelln
2018-02-13 10:02:25 -08:00
Yann Collet
71c07966bb added SEQSYMBOL_TABLE_SIZE()
as suggested by @terrelln's comment
2018-02-12 16:52:15 -08:00
Yann Collet
5f7495371e Merge branch 'dev' into fasterDec 2018-02-10 14:24:44 -08:00
Yann Collet
9945e60ac4 Merge branch 'dev' into flexibleLevel 2018-02-10 11:54:49 -08:00
Yann Collet
04a3f85ce7 fixed gcc warning on a switch code path 2018-02-09 16:16:27 -08:00
Yann Collet
af48f0b62b fix : offset table pointer when using default table 2018-02-09 15:15:46 -08:00
Yann Collet
426944c3e3 fixed strict aliasing issue
tuned threshold
2018-02-09 13:24:11 -08:00
Yann Collet
64ee732694 decide long-offset mode based on offcode statistics
threshold vaguely estimated
2018-02-09 12:33:28 -08:00
Yann Collet
c72091556b fixed minor nit as per @terrelln's comments 2018-02-09 09:46:08 -08:00
Yann Collet
4beaeaace5 Merge branch 'dev' into flexibleLevel 2018-02-09 09:15:05 -08:00
Yann Collet
6bfe50ad48 re-enabled ZSTD_decompressSequencesLong() 2018-02-09 09:14:25 -08:00
Yann Collet
1850597eaa pre-calculated default decoding tables 2018-02-09 06:01:02 -08:00
Yann Collet
ab75df21ed fixed mono-symbol distribution 2018-02-09 05:12:13 -08:00
Yann Collet
421a2716d8 fixed default fse distributions
but would be better to pre-calculate tables, for speed
2018-02-09 04:50:58 -08:00
Yann Collet
95424409ea addBits and baseline into FSE decoding table
note : unfinished
- need new default tables
- need modify long mode
2018-02-09 04:25:15 -08:00
Yann Collet
de68c2ff10 Merged ZSTD_preserveUnsortedMark() into ZSTD_reduceIndex()
as it's faster, due to one memory scan instead of two
(confirmed by microbenchmark).

Note : as ZSTD_reduceIndex() is rarely invoked,
it does not translate into a visible gain.
Consider it an exercise in auto-vectorization and micro-benchmarking.
2018-02-07 14:22:35 -08:00
Yann Collet
0170cf9a7a minor : modified ZSTD_preserveUnsortedMark() to be more vectorization friendly 2018-02-05 11:46:02 -08:00
Yann Collet
94efb1749d faster decoding in 32-bits mode for long offsets (tentative)
On my laptop:
Before:
./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S
 3#silesia.tar       : 211984896 ->  66683478 (3.179),  97.6 MB/s , 400.7 MB/s
 3#enwik8            : 100000000 ->  35643153 (2.806),  76.5 MB/s , 303.2 MB/s

After:
./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S
 3#silesia.tar       : 211984896 ->  66683478 (3.179),  97.4 MB/s , 435.0 MB/s
 3#enwik8            : 100000000 ->  35643153 (2.806),  76.2 MB/s , 338.1 MB/s

Mileage vary, depending on file, and cpu type.
But a generic rule is : x86 benefits less from "long-offset mode" than x64,
maybe due to register pressure.
On "entropy", long-mode is _never_ a win for x86.
On my laptop though, it may, depending on file and compression level
(enwik8 benefits more from "long-mode" than silesia).
2018-02-04 01:49:31 -08:00
Yann Collet
5188749e1c ensure compression parameters are updated when only compression level is changed 2018-02-02 16:31:20 -08:00
Yann Collet
4b525af53a zstdmt: applies new parameters on the fly
when invoked from ZSTD_compress_generic()
2018-02-02 15:58:13 -08:00
Yann Collet
90eca318a7 fileio: create dedicated function to generate zstd frames
like other formats
2018-02-02 14:24:56 -08:00
Yann Collet
1291d9d7cf
Merge pull request #1006 from systemcrash/patch-2
Update README.md
2018-02-02 10:04:55 -08:00
Yann Collet
209df52ba2 Changed nbThreads for nbWorkers
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one mode thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
2018-02-01 19:29:30 -08:00
Yann Collet
4b6a94f0cc clarified comments on LDM parameters 2018-02-01 17:07:27 -08:00
Yann Collet
60fa90b6c0 zstdmt: added ability to change compression parameters during compression 2018-02-01 16:13:31 -08:00
Nick Terrell
48acaddff9 Test for incorrect pledgeSrcSize earlier 2018-02-01 12:04:05 -08:00
Yann Collet
727bb7f090
Merge pull request #1008 from terrelln/hlog3
Fix hashLog3 size when copying cdict tables
2018-01-31 12:49:07 -08:00
Nick Terrell
ab3346af07 Fix hashLog3 size when copying cdict tables 2018-01-31 11:12:17 -08:00
Yann Collet
823a28a1f4
Merge pull request #1000 from facebook/progressiveFlush
Progressive flush
2018-01-30 22:49:47 -08:00
Yann Collet
a2ba629971 fixed function declaration ZSTD_getBlockSize() 2018-01-30 15:03:39 -08:00
Yann Collet
2cb0740b6b zstdmt: changed naming convention
to avoid confusion with blocks.

also:
- jobs are cut into chunks of 512KB now, to reduce nb of mutex calls.
- fix function declaration ZSTD_getBlockSizeMax()
- fix outdated comment
2018-01-30 14:43:36 -08:00
systemcrash
6b57387728
Update README.md
spelling
2018-01-29 18:42:20 +01:00
Yann Collet
9f8ed23b5b bumped version number to v1.3.4
also added a paragraph on using compression level with training mode
as this is a recurrent question (see for example #1004)
2018-01-27 22:23:26 -08:00
Yann Collet
ba0cd8cf78 fixed minor conversion warning for C++ compilation mode 2018-01-26 18:18:42 -08:00
Yann Collet
caf9e96dc3 job mutex creation is checked 2018-01-26 18:09:25 -08:00
Yann Collet
9c40ae7ff1 zstdmt: there is now one mutex/cond per job 2018-01-26 17:55:08 -08:00
Yann Collet
77e36273de zstdmt: minor code refactor for clarity 2018-01-26 17:08:58 -08:00
Yann Collet
27c5853c42 zstdmt: job table correctly cleaned after synchronous ZSTDMT_compress() 2018-01-26 14:35:54 -08:00
Yann Collet
0d426f6b83 zstdmt : refactor a few member names
for clarity
2018-01-26 13:00:14 -08:00
Yann Collet
79b6e28b0a zstdmt : flush() only lock to read shared job members
Other job members are accessed directly.
This avoids a full job copy, which would access everything,
including a few members that are supposed to be used by worker only,
uselessly requiring additional locks to avoid race conditions.
2018-01-26 12:15:43 -08:00
Yann Collet
d2b62b6fa5 minor : ZSTDMT_writeLastEmptyBlock() is a void function
because it cannot fail
2018-01-26 11:06:34 -08:00
Yann Collet
fca13c6855 zstdmt : fixed memory leak
writeLastEmptyBlock() must release srcBuffer
as mtctx assumes it's done by job worker.

minor : changed 2 job member names (src->srcBuffer, srcStart->prefixStart) for clarity
2018-01-26 10:44:09 -08:00
Yann Collet
8e128eaf05 zstdmt : refactor job members
grouped by sharing properties
2018-01-26 10:20:38 -08:00
Yann Collet
777d3c1559 fixed minor declaration-after-statement warning 2018-01-25 17:45:18 -08:00
Yann Collet
a1d4041e69 zstdmt: removed job->jobCompleted
replaced by equivalent signal job->consumer == job->srcSize.

created additional functions
ZSTD_writeLastEmptyBlock()
and
ZSTDMT_writeLastEmptyBlock()
required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker.

It avoids creating a job with srcSize==0.
2018-01-25 17:35:49 -08:00
Yann Collet
1272d8e760 zstdmt:: renamed mutex and cond to underline they are context-global 2018-01-25 14:52:34 -08:00
Yann Collet
5f349b129c zstdmt : correctly set end of frame 2018-01-23 15:52:40 -08:00
Yann Collet
c1cc57f270 zstdmt : fix end condition (ZSTD_e_end)
When ZSTD_e_end directive is provided,
the question is not only "are internal buffers completely flushed",
it is also "is current frame completed".

In some rare cases,
it was possible for internal buffers to be completely flushed,
triggering a @return == 0,
but frame was not completed as it needed a last null-size block to mark the end,
resulting in an unfinished frame.
2018-01-23 15:19:11 -08:00
Yann Collet
de5e38a7a6 zstdmt: fixed minor race condition
no real consequence, but pollute tsan tests :

job->dstBuff is being modified inside worker,
while main thread might read it accidentally
because it copies whole job.
But since it doesn't used dstBuff, there is no real consequence.

Other potential solution : only copy useful data, instead of whole job
2018-01-23 14:03:07 -08:00
Yann Collet
ebd955e26a zstdmt : fixed ending frame with 0-size block 2018-01-23 13:12:40 -08:00
Yann Collet
6711396d97 zstreamtest : fixed test 32 : multi-thread compression
using ZSTD_compress_generic(,,ZSTD_e_end)
Since it already provides ZSTD_e_end as directive,
it should not be followed by ZSTDMT_endStream().
2018-01-19 22:20:53 -08:00
Yann Collet
a7ef3a219c zstdmt : fixed last job size 2018-01-19 18:19:09 -08:00
Yann Collet
3ad7d4951c zstdmt : finally vanquished an elusive and rare race condition 2018-01-19 17:35:08 -08:00
Yann Collet
940634a610 zstdmt : simplify job creation
job will not be created when not enough room within job Table
2018-01-19 13:25:06 -08:00
Yann Collet
dc69623453 zstdmt: fixed corruption issue in ZSTDMT_endStream()
when invoked directly.
2018-01-19 12:41:56 -08:00
Yann Collet
70f81d6030 zstdmt uses POOL_tryAdd() to call a new worker
so that it's no longer a blocking call.
This makes it possible to stream out data gradually,
while waiting for a worker to become available.
2018-01-19 10:01:40 -08:00
Yann Collet
d19dc1903c
Merge pull request #995 from facebook/progressiveMT
Progressive mt
2018-01-18 17:59:49 -08:00
Yann Collet
6f7280fb33 fixed frame checksum issue
and race conditions
2018-01-18 16:20:26 -08:00
Yann Collet
997e4d0ccd added POOL_tryAdd() 2018-01-18 14:39:51 -08:00
Yann Collet
4f43ef731d Merge branch 'dev' into constCDict 2018-01-18 13:36:43 -08:00
Yann Collet
ef97d5a287 Merge branch 'progressiveMT' into progressiveFlush 2018-01-18 13:35:24 -08:00
Yann Collet
b6ab232f2d Merge branch 'dev' into progressiveMT 2018-01-18 13:34:56 -08:00
Nick Terrell
9d96761520 Set repcodes for empty ZSTD_CDict
When the dictionary is <= 8 bytes, no data is loaded from the dictionary.
In this case the repcodes weren't set, because they were inserted after the
size check. Fix this problem in general by first setting the cdict state to
a clean state of an empty dictionary, then filling the state from there.
2018-01-18 13:28:30 -08:00
Yann Collet
c7190c69cc fixes for @terrelln comments 2018-01-18 11:15:23 -08:00
Yann Collet
1b5d80d633 zstdmt: added ability to flush current job before it's completed
however, zstdmt may still wait on next available worker,
so it's not smooth yet.
2018-01-18 11:03:27 -08:00
Yann Collet
aa79c18e3f fixed a few access contention
passes thread sanitizer test
2018-01-17 17:18:19 -08:00
Yann Collet
394eec697b Introduce ZSTD_getFrameProgression()
Produces 3 statistics for ongoing frame compression :
- ingested
- consumed (effectively compressed)
- produced

Ingested can be larger than consumed due to buffering effect.

For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.

That being said, update is not "smooth",
because on a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.

This could be improved thanks to more granular flushing
i.e. start flushing before ongoing job is fully completed.
2018-01-17 16:39:02 -08:00
Yann Collet
f3b8f90b6d changed initStatic?Dict() return type to const ZSTD_?Dict*
ZSTD_create?Dict() is required to produce a ?Dict* return type
because `free()` does not accept a `const type*` argument.
If it wasn't for this restriction, I would have preferred to create a `const ?Dict*` object
to emphasize the fact that, once created, a dictionary never changes
(hence can be shared concurrently until the end of its lifetime).

There is no such limitation with initStatic?Dict() :
as stated in the doc, there is no corresponding free() function,
since `workspace` is provided, hence allocated, externally,
it can only be free() externally.

Which means, ZSTD_initStatic?Dict() can return a `const ZSTD_?Dict*` pointer.

Tested with `make all`, to catch initStatic's users,
which, incidentally, also updated zstd.h documentation.
2018-01-17 14:08:48 -08:00
Yann Collet
b86865323a Merge branch 'dev' into progressiveMT
fixed minor conflict on cdict
2018-01-17 13:51:03 -08:00
Yann Collet
d14cc881b0 zstdmt : fixed very large window sizes
would create too large buffers,
since default job size == window size * 4.

This would crash on 32-bit systems.

Also : jobSize being a 32-bit unsigned, it cannot be >= 4 GB,
so the formula was failing for large window sizes >= 1 GB.
Fixed now : max job Size is 2 GB, whatever the window size.
2018-01-17 12:39:58 -08:00
Yann Collet
58dd7de640 zstdmt: fixed an endless loop on allocation failure
this happened on 32-bits build when requiring a too large input buffer,
typically on wlog=29, creating jobs of 2 GB size.

also : zstd32 now compiles with multithread support enabled by default
(can be disabled with HAVE_THREAD=0)
2018-01-17 12:10:15 -08:00
Nick Terrell
16bd0fd4df Reduce size of ZSTD_CDict
Shaves 492,076 B off of the `ZSTD_CDict`.
The size of a `ZSTD_CDict` created from a 112,640 B dictionary is:

| Level | Before (B) | After (B) |
|-------|------------|-----------|
| 1     | 648,448    | 156,412   |
| 3     | 1,140,008  | 647,932   |
2018-01-17 11:50:49 -08:00
Yann Collet
cb57c107ff zstdmt: minor variable renaming, for clarity 2018-01-17 11:39:07 -08:00
Yann Collet
1dba98d563 introduced parameter ZSTD_p_nonBlockingMode
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set
which is non blocking.

It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.

Applied within fileio.c, this patch provides non-negligible gains during compression.

Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9

With traditional single-thread blocking mode :
real    0m9.557s
user    0m8.861s
sys     0m0.538s

With new single-worker non blocking mode :
real    0m7.938s
user    0m8.049s
sys     0m0.514s

=> 20% faster
2018-01-16 16:15:47 -08:00
Yann Collet
6025465e42 ZSTDMT : minor CCtx memory optimization
can be useful when a compression job only has small amount of data to compress.
2018-01-16 15:34:41 -08:00
Yann Collet
2e23333094 ZSTDMT can now work in non-blocking mode with 1 thread
it still fallbacks to single-thread blocking invocation
when input is small (<1job)
or when invoking ZSTDMT_compress(), which is blocking.

Also : fixed a bug in new block-granular compression routine.
2018-01-16 15:28:43 -08:00
Yann Collet
8e83c5c910 Merge branch 'dev' into progressiveMT 2018-01-16 12:54:33 -08:00
Nick Terrell
aae267a2e1 Reorganize block state 2018-01-16 11:17:50 -08:00
Nick Terrell
887cd4e35e Split ZSTD_CCtx into smaller sub-structures 2018-01-16 11:17:50 -08:00
Yann Collet
9477f6529d
Merge pull request #984 from terrelln/dict-load
Load more dictionary positions into table if empty
2018-01-13 13:20:42 -08:00
Yann Collet
58ecf13e02 zstdmt : can compress at block granularity
offering perspective of more accurate progression report.
2018-01-13 13:18:57 -08:00
Nick Terrell
9a211d1f05 Load more dictionary positions into table if empty
If the hash table is empty load positions into the hash table
that we would otherwise skip.

| Level | Data Set     | Improvement |
|-------|--------------|-------------|
| 1     | github       | 0.44%       |
| 1     | hg-changelog | 0.13%       |
| 1     | hg-commands  | 1.28%       |
| 1     | hg-manifest  | 0.70%       |
| 3     | github       | 0.74%       |
| 3     | hg-changelog | 0.87%       |
| 3     | hg-commands  | 1.74%       |
| 3     | hg-manifest  | 0.23%       |
2018-01-12 16:17:22 -08:00
Yann Collet
863b2f8db4
Merge pull request #983 from terrelln/dict-wlog
Increase windowLog from CDict based on the srcSize when known
2018-01-12 07:47:43 -08:00
Nick Terrell
b610b777d3 Increase windowLog from CDict based on the srcSize when known 2018-01-11 16:23:21 -08:00
Yann Collet
cacf47cbee Merge branch 'dev' into dubtlazy
and fixed conflicts
2018-01-11 13:25:08 -08:00
Yann Collet
04c00f9388
Merge pull request #982 from facebook/fix304
Fix for #304 and #977 : error during dictionary creation
2018-01-11 13:20:59 -08:00
Yann Collet
b9a14900ff changed function name to ZSTD_DUBT_findBestMatch() 2018-01-11 12:38:31 -08:00
Yann Collet
752bae4a48 added warning message
when pathological dataset is detected
(note : cover_optimize needs -v to display the warning)
2018-01-11 11:29:28 -08:00
Yann Collet
e8093dde09 fixed #304
Pathological samples may result in literal section being incompressible.
This case is now detected,
and literal distribution is replaced by one that can be written into the dictionary.
2018-01-11 11:16:32 -08:00
Yann Collet
218e9fe0fc added a test case for dictBuilder failure
cyclic data set makes the entropy stage fails
now, onto a fix for #304 ...
2018-01-11 09:42:38 -08:00
Yann Collet
ff795580f2 fixed bug #976, reported by @indygreg
constants in zstd.h should not depend on MIN() macro which existence is not guaranteed.

Added a test to check the specific constants.
The test is a bit too specific.
But I have found no way to control a more generic "are all macro already defined" condition,
especially as this is a valid construction (the missing macro might be defined later, intentionnally).
2018-01-10 20:33:45 -08:00
Yann Collet
292eeb672f api doc : grouped all ZSTD_create*_advanced() functions together
in a new "custom memory allocator" paragraph
which is itself part of "memory management" category.

This makes it simpler to see the relation between the type and its usages.
2018-01-10 09:07:47 -08:00
Yann Collet
3ea156368c API doc : grouped ZSTD_initStatic*() together
within "memory management" category.
2018-01-10 08:49:50 -08:00
Yann Collet
b17fb488b0 fixed msan test
a pointer calculation was wrong in a corner case
2018-01-06 20:50:36 +01:00
Yann Collet
658d6b8588 Merge branch 'dev' into dubtlazy 2018-01-06 12:40:58 +01:00
Yann Collet
a927fae2a1 fixed ZSTD_reduceIndex()
following suggestions from @terrelln.
Also added some comments to present logic behind ZSTD_preserveUnsortedMark().
2018-01-06 12:31:26 +01:00
Yann Collet
2eff217136 updated /lib documentation 2017-12-31 15:50:00 +01:00
Yann Collet
00db4dbbb3 fixed minor argument property for Visual 2017-12-30 15:42:28 +01:00
Yann Collet
f597f55675 improved btlazy2 : list of unsorted candidates can reach extDict
It used to stop on reaching extDict, for simplification.
As a consequence, there was a small loss of performance each time the round buffer would restart from beginning.
It's not a large difference though, just several hundreds of bytes on silesia.
This patch fixes it.
2017-12-30 15:12:59 +01:00
Yann Collet
a68b76afef updated compression level table for btlazy2
now selected for levels 13, 14 and 15.

Also : dropped the requirement for monotonic memory budget increase of compression levels,,
which was required for ZSTD_estimateCCtxSize()
in order to ensure that a memory budget for level L is large enough for any level <= L.
This condition is now ensured at run time inside ZSTD_estimateCCtxSize().
2017-12-30 11:40:35 +01:00
Yann Collet
eb52e2f45e simplify ZSTD_preserveUnsortedMark() implementation
since no compiler attempts to auto-vectorize it.
2017-12-30 11:13:52 +01:00
Yann Collet
d228b6b0d0 btlazy2 : optimization for dictionary compression
we want the dictionary table to be fully sorted,
not just lazily filled.
Dictionary loading is a bit more intensive,
but it saves cpu cycles for match search during compression.
2017-12-29 19:14:18 +01:00
Yann Collet
02f64ef955 btlazy2: fixed interaction between unsortedMark and reduceTable 2017-12-29 19:08:51 +01:00
Yann Collet
64482c2c97 fixed bug in dubt
the chain of unsorted candidates could grow beyond lowLimit.
2017-12-29 17:04:37 +01:00
Yann Collet
f36da5b4d9 minor speed optimization : index overflow prevention
new code supposed to be easier to auto-vectorize
2017-12-29 14:40:33 +01:00
Yann Collet
5235d8d6ba first implementation of delayed update for btlazy2
This is a pretty nice speed win.

The new strategy consists in stacking new candidates as if it was a hash chain.
Then, only if there is a need to actually consult the chain, they are batch-updated,
before starting the match search itself.
This is supposed to be beneficial when skipping positions,
which happens a lot when using lazy strategy.

The baseline performance for btlazy2 on my laptop is :
15#calgary.tar       :   3265536 ->    955985 (3.416),  7.06 MB/s , 618.0 MB/s
15#enwik7            :  10000000 ->   3067341 (3.260),  4.65 MB/s , 521.2 MB/s
15#silesia.tar       : 211984896 ->  58095131 (3.649),  6.20 MB/s , 682.4 MB/s
(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)

After this patch, and keeping all parameters identical,
speed is increased by a pretty good margin (+30-50%),
but compression ratio suffers a bit :
15#calgary.tar       :   3265536 ->    958060 (3.408),  9.12 MB/s , 621.1 MB/s
15#enwik7            :  10000000 ->   3078318 (3.249),  6.37 MB/s , 525.1 MB/s
15#silesia.tar       : 211984896 ->  58444111 (3.627),  9.89 MB/s , 680.4 MB/s

That's because I kept `1<<searchLog` as a maximum number of candidates to update.
But for a hash chain, this represents the total number of candidates in the chain,
while for the binary, it represents the maximum depth of searches.
Keep in mind that a lot of candidates won't even be visited in the btree,
since they are filtered out by the binary sort.

As a consequence, in the new implementation,
the effective depth of the binary tree is substantially shorter.

To compensate, it's enough to increase `searchLog` value.
Here is the result after adding just +1 to searchLog (level 15 setting in this patch):
15#calgary.tar       :   3265536 ->    956311 (3.415),  8.32 MB/s , 611.4 MB/s
15#enwik7            :  10000000 ->   3067655 (3.260),  5.43 MB/s , 535.5 MB/s
15#silesia.tar       : 211984896 ->  58113144 (3.648),  8.35 MB/s , 679.3 MB/s

aka, almost the same compression ratio as before,
but with a noticeable speed increase (+20-30%).

This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
2017-12-28 16:58:57 +01:00
Yann Collet
473362e922
Merge pull request #958 from facebook/continueCCtx
fix a subtle issue in continue mode
2017-12-20 00:12:50 +01:00
Yann Collet
cafedcbbe4 ZSTD_resetCCtx_internal: fixed order of arguments
params1 was swapped with params2.
This used to be a non-issue when testing for strict equality,
but now that some tests look for "sufficient size" `<=`, order matters.
2017-12-19 21:49:04 +01:00
Yann Collet
9096088f45 changed variable name for clarity, suggested by @terrelln 2017-12-19 21:20:46 +01:00