Commit Graph

631 Commits

Author SHA1 Message Date
Nick Terrell
280a236e9e Add ZSTD_CCtx(Param)?_getParameter() function
Closes #1096.
2018-04-12 11:50:12 -07:00
Nick Terrell
295ab0dbfa Only load extra table positions for CDicts
Zstdmt uses prefixes to load the overlap between segments. Loading extra
positions makes compression non-deterministic, depending on the previous
job the context was used for. Since loading extra position takes extra
time as well, only do it when creating a `ZSTD_CDict`.

Fixes #1077.
2018-04-02 14:41:30 -07:00
Yann Collet
153bc1c004 removed limit ZSTD_TARGETLENGTH_MAX
this makes it possible to specify extremely large negative compression levels,
achieving the side effect as "no compression".

It will also be possible to define larger targetlength for ultra compression mode.

There is no adverse side effect due to removing this limit.
2018-03-21 15:50:05 -07:00
Yann Collet
a99c4a3621 Merge branch 'dev' into advancedDecompress 2018-03-21 06:08:28 -07:00
Yann Collet
87b0cf05bd
Merge pull request #1057 from facebook/lrmSettings
LRM parameters
2018-03-21 05:59:39 -07:00
Yann Collet
d1bf609abf
Merge pull request #1059 from terrelln/mt-ldm
Integrate ldm with zstdmt
2018-03-20 17:50:20 -07:00
Yann Collet
878728dc26 fixed several comments by @terrelln 2018-03-20 16:35:14 -07:00
Yann Collet
e1c52faace
Merge pull request #1060 from facebook/compressImpl
merge bmi2 implementation of encodeSequence into zstd_compress.c
2018-03-20 16:19:42 -07:00
Yann Collet
6873fec658 changed dictMore for dictContentType
which seems clearer to describe what the variable/argument is about.
2018-03-20 15:13:14 -07:00
Nick Terrell
136b9e2392 Fix external sequence corner cases
* Clear external sequences when we reset the `ZSTD_CCtx`.
* Skip external sequences when a block is too small to compress.
2018-03-20 14:50:28 -07:00
Yann Collet
2ed5af0766 merge bmi2 implementation of encodeSequence into zstd_compress.c 2018-03-19 19:10:31 -07:00
Yann Collet
c8b3d389fd updated CCtxParams API
to respect naming convention :
ZSTD_CCtxParams_*()
2018-03-19 15:07:26 -07:00
Yann Collet
6f4d0778a5 make it possible to express compression parameters in any order 2018-03-19 14:41:23 -07:00
Yann Collet
9618c0c804 make it possible to specify LDM parameters in any order 2018-03-19 11:07:04 -07:00
Nick Terrell
4af1fafeb8 Restore setting loadedDictEnd
Setting `loadedDictEnd` was accidently removed from `ZSTD_loadDictionaryContent()`,
which means that dictionary compression will only be able to reference the parts of
the dictionary within the window. The spec allows us to reference the entire
dictionary so long as even one byte is in the window.

`ZSTD_enforceMaxDist()` incorrectly always allowed offsets up to `loadedDictEnd`
beyond the window, even once the dictionary was out of range.

When overflow protection kicked in, the check `current > loadedDictEnd + maxDist`
is incorrect if `loadedDictEnd` isn't reset back to zero. `current` could be reset
below the value, which would incorrectly allow references beyond the window. This
bug is present in `master`, but is very hard to trigger, since it requires both
dictionaries and data which triggers overflow correction.
2018-03-16 14:54:06 -07:00
Yann Collet
192542b63c
Merge pull request #1047 from facebook/hufCompress
removed huf_compress_impl.h
2018-03-15 14:14:03 -07:00
Nick Terrell
1908c92c46 Merge remote-tracking branch 'upstream/dev' into extern-seq
* upstream/dev:
  Fix overflow protection with wlog=31
2018-03-14 17:26:31 -07:00
Yann Collet
a909c293c6 Merge branch 'dev' into hufCompress 2018-03-14 16:11:25 -07:00
Nick Terrell
a9a6dcba63 Expose reference external sequence API
* Expose the reference external sequences API for zstdmt.
  Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.
2018-03-14 12:29:31 -07:00
Nick Terrell
33fb966e56 Fix overflow protection with wlog=31
The overflow protection is broken when the window log is `> (3U << 29)`, so 31.
It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`,
and the the assertion `current > newCurrent` fails. This happens when the same
context is used many times over, but with a large window log, like in zstdmt.

Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`.

The added test fails before the patch, and passes after.
2018-03-14 11:45:44 -07:00
Yann Collet
50f763ec44 fixed several comments are underlined by @terrelln 2018-03-13 14:23:14 -07:00
Yann Collet
a95a88af57 removed huf_compress_impl.h
re-imported all functions inside huf_compress.c
for easier source editing.

Also updated a bunch of code comments
for clarification.
2018-03-13 14:14:05 -07:00
Yann Collet
2291b85a1e changed ZSTD_p_literalCompression into ZSTD_p_compressLiterals
prefer verb+object construction
2018-03-12 11:44:10 -07:00
Yann Collet
6a9b41b731 create command --fast[=#]
access negative compression levels from command line
for both compression and benchmark modes.

also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.

added relevant cli tests.
2018-03-11 20:01:23 -07:00
Yann Collet
a146ee04ae added negative compression levels
negative compression level trade compression ratio for more compression speed.
They turn off huffman compression of literals,
and use row 0 as baseline with a stepSize = -cLevel.

added associated test in fuzzer

also added : new advanced parameter ZSTD_p_literalCompression
2018-03-11 05:21:53 -07:00
Yann Collet
facc09aa03 minor compression level adaptation
level 12 compresses slightly more and faster
due to better btlazy2 mode
2018-03-11 03:06:52 -07:00
Nick Terrell
0a0e64c641 LDM manages its own window round buffer 2018-02-27 12:13:23 -08:00
Nick Terrell
7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
Nick Terrell
af866b3a58 Split block compresser out of long range matcher
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
  stores them in a table. It should work with any chunk size, but
  is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
  instead of encoding the literals directly, it passes them to a
  secondary block compressor. The code to handle chunk sizes greater
  than the block size is currently commented out, since it is unused.
  The next PR will uncomment exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
  `targetLength`. Also don't emit repcode matches in the LDM block
  compressor. Enabling the LDM with the optimal parser now actually improves
  the compression ratio.
* The compression ratio is very similar to before. It is very slightly
  different, because the repcode handling is slightly different. If I remove
  immediate repcode checking in both branches the compressed size is exactly
  the same.
* The speed looks to be the same or better than before.

Up Next (in a separate PR)
--------------------------

Allow sequence generation to happen prior to compression, and produce more
than a block worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
2018-02-22 15:18:41 -08:00
Nick Terrell
b58f01537e [compress] Support BMI2 2018-02-14 19:20:32 -08:00
Yann Collet
5f7495371e Merge branch 'dev' into fasterDec 2018-02-10 14:24:44 -08:00
Yann Collet
9945e60ac4 Merge branch 'dev' into flexibleLevel 2018-02-10 11:54:49 -08:00
Yann Collet
c72091556b fixed minor nit as per @terrelln's comments 2018-02-09 09:46:08 -08:00
Yann Collet
95424409ea addBits and baseline into FSE decoding table
note : unfinished
- need new default tables
- need modify long mode
2018-02-09 04:25:15 -08:00
Yann Collet
de68c2ff10 Merged ZSTD_preserveUnsortedMark() into ZSTD_reduceIndex()
as it's faster, due to one memory scan instead of two
(confirmed by microbenchmark).

Note : as ZSTD_reduceIndex() is rarely invoked,
it does not translate into a visible gain.
Consider it an exercise in auto-vectorization and micro-benchmarking.
2018-02-07 14:22:35 -08:00
Yann Collet
5188749e1c ensure compression parameters are updated when only compression level is changed 2018-02-02 16:31:20 -08:00
Yann Collet
4b525af53a zstdmt: applies new parameters on the fly
when invoked from ZSTD_compress_generic()
2018-02-02 15:58:13 -08:00
Yann Collet
90eca318a7 fileio: create dedicated function to generate zstd frames
like other formats
2018-02-02 14:24:56 -08:00
Yann Collet
209df52ba2 Changed nbThreads for nbWorkers
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one mode thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
2018-02-01 19:29:30 -08:00
Yann Collet
60fa90b6c0 zstdmt: added ability to change compression parameters during compression 2018-02-01 16:13:31 -08:00
Nick Terrell
48acaddff9 Test for incorrect pledgeSrcSize earlier 2018-02-01 12:04:05 -08:00
Yann Collet
727bb7f090
Merge pull request #1008 from terrelln/hlog3
Fix hashLog3 size when copying cdict tables
2018-01-31 12:49:07 -08:00
Nick Terrell
ab3346af07 Fix hashLog3 size when copying cdict tables 2018-01-31 11:12:17 -08:00
Yann Collet
823a28a1f4
Merge pull request #1000 from facebook/progressiveFlush
Progressive flush
2018-01-30 22:49:47 -08:00
Yann Collet
777d3c1559 fixed minor declaration-after-statement warning 2018-01-25 17:45:18 -08:00
Yann Collet
a1d4041e69 zstdmt: removed job->jobCompleted
replaced by equivalent signal job->consumer == job->srcSize.

created additional functions
ZSTD_writeLastEmptyBlock()
and
ZSTDMT_writeLastEmptyBlock()
required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker.

It avoids creating a job with srcSize==0.
2018-01-25 17:35:49 -08:00
Yann Collet
d19dc1903c
Merge pull request #995 from facebook/progressiveMT
Progressive mt
2018-01-18 17:59:49 -08:00
Yann Collet
6f7280fb33 fixed frame checksum issue
and race conditions
2018-01-18 16:20:26 -08:00
Yann Collet
4f43ef731d Merge branch 'dev' into constCDict 2018-01-18 13:36:43 -08:00
Yann Collet
b6ab232f2d Merge branch 'dev' into progressiveMT 2018-01-18 13:34:56 -08:00
Nick Terrell
9d96761520 Set repcodes for empty ZSTD_CDict
When the dictionary is <= 8 bytes, no data is loaded from the dictionary.
In this case the repcodes weren't set, because they were inserted after the
size check. Fix this problem in general by first setting the cdict state to
a clean state of an empty dictionary, then filling the state from there.
2018-01-18 13:28:30 -08:00
Yann Collet
394eec697b Introduce ZSTD_getFrameProgression()
Produces 3 statistics for ongoing frame compression :
- ingested
- consumed (effectively compressed)
- produced

Ingested can be larger than consumed due to buffering effect.

For the time being, this patch mostly fixes the % ratio issue,
since it computes consumed / produced,
instead of ingested / produced.

That being said, update is not "smooth",
because on a slow enough setting,
fileio spends most of its time waiting for a worker to complete its job.

This could be improved thanks to more granular flushing
i.e. start flushing before ongoing job is fully completed.
2018-01-17 16:39:02 -08:00
Yann Collet
f3b8f90b6d changed initStatic?Dict() return type to const ZSTD_?Dict*
ZSTD_create?Dict() is required to produce a ?Dict* return type
because `free()` does not accept a `const type*` argument.
If it wasn't for this restriction, I would have preferred to create a `const ?Dict*` object
to emphasize the fact that, once created, a dictionary never changes
(hence can be shared concurrently until the end of its lifetime).

There is no such limitation with initStatic?Dict() :
as stated in the doc, there is no corresponding free() function,
since `workspace` is provided, hence allocated, externally,
it can only be free() externally.

Which means, ZSTD_initStatic?Dict() can return a `const ZSTD_?Dict*` pointer.

Tested with `make all`, to catch initStatic's users,
which, incidentally, also updated zstd.h documentation.
2018-01-17 14:08:48 -08:00
Yann Collet
b86865323a Merge branch 'dev' into progressiveMT
fixed minor conflict on cdict
2018-01-17 13:51:03 -08:00
Nick Terrell
16bd0fd4df Reduce size of ZSTD_CDict
Shaves 492,076 B off of the `ZSTD_CDict`.
The size of a `ZSTD_CDict` created from a 112,640 B dictionary is:

| Level | Before (B) | After (B) |
|-------|------------|-----------|
| 1     | 648,448    | 156,412   |
| 3     | 1,140,008  | 647,932   |
2018-01-17 11:50:49 -08:00
Yann Collet
cb57c107ff zstdmt: minor variable renaming, for clarity 2018-01-17 11:39:07 -08:00
Yann Collet
1dba98d563 introduced parameter ZSTD_p_nonBlockingMode
This new parameter makes it possible to call
streaming ZSTDMT with a single thread set
which is non blocking.

It makes it possible for the main thread to do other tasks in parallel
while the worker thread does compression.
Typically, for zstd cli, it means it can do I/O stuff.

Applied within fileio.c, this patch provides non-negligible gains during compression.

Tested on my laptop, with enwik9 (1000000000 bytes) : time zstd -f enwik9

With traditional single-thread blocking mode :
real    0m9.557s
user    0m8.861s
sys     0m0.538s

With new single-worker non blocking mode :
real    0m7.938s
user    0m8.049s
sys     0m0.514s

=> 20% faster
2018-01-16 16:15:47 -08:00
Yann Collet
2e23333094 ZSTDMT can now work in non-blocking mode with 1 thread
it still fallbacks to single-thread blocking invocation
when input is small (<1job)
or when invoking ZSTDMT_compress(), which is blocking.

Also : fixed a bug in new block-granular compression routine.
2018-01-16 15:28:43 -08:00
Nick Terrell
aae267a2e1 Reorganize block state 2018-01-16 11:17:50 -08:00
Nick Terrell
887cd4e35e Split ZSTD_CCtx into smaller sub-structures 2018-01-16 11:17:50 -08:00
Yann Collet
863b2f8db4
Merge pull request #983 from terrelln/dict-wlog
Increase windowLog from CDict based on the srcSize when known
2018-01-12 07:47:43 -08:00
Nick Terrell
b610b777d3 Increase windowLog from CDict based on the srcSize when known 2018-01-11 16:23:21 -08:00
Yann Collet
cacf47cbee Merge branch 'dev' into dubtlazy
and fixed conflicts
2018-01-11 13:25:08 -08:00
Yann Collet
218e9fe0fc added a test case for dictBuilder failure
cyclic data set makes the entropy stage fails
now, onto a fix for #304 ...
2018-01-11 09:42:38 -08:00
Yann Collet
3ea156368c API doc : grouped ZSTD_initStatic*() together
within "memory management" category.
2018-01-10 08:49:50 -08:00
Yann Collet
a927fae2a1 fixed ZSTD_reduceIndex()
following suggestions from @terrelln.
Also added some comments to present logic behind ZSTD_preserveUnsortedMark().
2018-01-06 12:31:26 +01:00
Yann Collet
a68b76afef updated compression level table for btlazy2
now selected for levels 13, 14 and 15.

Also : dropped the requirement for monotonic memory budget increase of compression levels,,
which was required for ZSTD_estimateCCtxSize()
in order to ensure that a memory budget for level L is large enough for any level <= L.
This condition is now ensured at run time inside ZSTD_estimateCCtxSize().
2017-12-30 11:40:35 +01:00
Yann Collet
d228b6b0d0 btlazy2 : optimization for dictionary compression
we want the dictionary table to be fully sorted,
not just lazily filled.
Dictionary loading is a bit more intensive,
but it saves cpu cycles for match search during compression.
2017-12-29 19:14:18 +01:00
Yann Collet
02f64ef955 btlazy2: fixed interaction between unsortedMark and reduceTable 2017-12-29 19:08:51 +01:00
Yann Collet
64482c2c97 fixed bug in dubt
the chain of unsorted candidates could grow beyond lowLimit.
2017-12-29 17:04:37 +01:00
Yann Collet
f36da5b4d9 minor speed optimization : index overflow prevention
new code supposed to be easier to auto-vectorize
2017-12-29 14:40:33 +01:00
Yann Collet
5235d8d6ba first implementation of delayed update for btlazy2
This is a pretty nice speed win.

The new strategy consists in stacking new candidates as if it was a hash chain.
Then, only if there is a need to actually consult the chain, they are batch-updated,
before starting the match search itself.
This is supposed to be beneficial when skipping positions,
which happens a lot when using lazy strategy.

The baseline performance for btlazy2 on my laptop is :
15#calgary.tar       :   3265536 ->    955985 (3.416),  7.06 MB/s , 618.0 MB/s
15#enwik7            :  10000000 ->   3067341 (3.260),  4.65 MB/s , 521.2 MB/s
15#silesia.tar       : 211984896 ->  58095131 (3.649),  6.20 MB/s , 682.4 MB/s
(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)

After this patch, and keeping all parameters identical,
speed is increased by a pretty good margin (+30-50%),
but compression ratio suffers a bit :
15#calgary.tar       :   3265536 ->    958060 (3.408),  9.12 MB/s , 621.1 MB/s
15#enwik7            :  10000000 ->   3078318 (3.249),  6.37 MB/s , 525.1 MB/s
15#silesia.tar       : 211984896 ->  58444111 (3.627),  9.89 MB/s , 680.4 MB/s

That's because I kept `1<<searchLog` as a maximum number of candidates to update.
But for a hash chain, this represents the total number of candidates in the chain,
while for the binary, it represents the maximum depth of searches.
Keep in mind that a lot of candidates won't even be visited in the btree,
since they are filtered out by the binary sort.

As a consequence, in the new implementation,
the effective depth of the binary tree is substantially shorter.

To compensate, it's enough to increase `searchLog` value.
Here is the result after adding just +1 to searchLog (level 15 setting in this patch):
15#calgary.tar       :   3265536 ->    956311 (3.415),  8.32 MB/s , 611.4 MB/s
15#enwik7            :  10000000 ->   3067655 (3.260),  5.43 MB/s , 535.5 MB/s
15#silesia.tar       : 211984896 ->  58113144 (3.648),  8.35 MB/s , 679.3 MB/s

aka, almost the same compression ratio as before,
but with a noticeable speed increase (+20-30%).

This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
2017-12-28 16:58:57 +01:00
Yann Collet
473362e922
Merge pull request #958 from facebook/continueCCtx
fix a subtle issue in continue mode
2017-12-20 00:12:50 +01:00
Yann Collet
cafedcbbe4 ZSTD_resetCCtx_internal: fixed order of arguments
params1 was swapped with params2.
This used to be a non-issue when testing for strict equality,
but now that some tests look for "sufficient size" `<=`, order matters.
2017-12-19 21:49:04 +01:00
Yann Collet
9096088f45 changed variable name for clarity, suggested by @terrelln 2017-12-19 21:20:46 +01:00
Yann Collet
f299fa39ac fix a subtle issue in continue mode
The deep fuzzer tests caught a subtle bug that was probably there for a long time.
The impact of the bug is not a crash, or any other clear error signal,
rather, it reduces performance, by cutting data into smaller blocks.
Eventually, the following test would fail because it produces too many 1-byte blocks,
requiring more space than buffer can provide :
`./zstreamtest_asan --mt -s3514 -t1678312 -i1678314`

The root scenario is as follows :
- Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1`
- The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`.
  `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`.
  The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes.
- Complete compression
- New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN`
  trigger "continue mode"
- Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only,
  but in fact, it also depends on `pledgedSrcSize`.
- The "old" blocksize (1) is still there,
  next compression will use this value to cut input into blocks,
  resulting in more blocks and worse performance than necessary performance.

Given the scenario, and its possible variants, I'm surprised it did not show up before.
But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger.
The above test is a special corner case, where performance is so impacted that it reaches an error case.

The fix works, but I'm not completely pleased.
I think the current code relies too much on implied relations between variables.
This will likely break again in the future when some related part of the code change.
Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3.
So a longer term fix will have to be considered after the release.

To do : create a reliable test case which triggers this scenario for CI tests.
2017-12-19 09:43:03 +01:00
Yann Collet
5c2f2ebfdb zstdmt via compress_generic: reduce opportunity to free/create mtctx
`zstreamtest --newapi` (and `--opaqueapi`) create and destroy way too many threads
resulting in failure of tsan tests,
and potentially connected to the qemu flaky tests.

This is because, at each test, the nb of threads can be changed (random).

The `--no-big-tests` directive reduce this choice to 1/2 threads,
in order to limit memory usage, especially for qemu and 32-bits builds.
Unfortunately, swapping between 1 and 2 threads is enough to constantly create/destroy new mtctx.

This patch takes advantage of the following property :
via compress_generic, no internal mtctx is needed for nbThreads < 2.
As a consequence, when nbThreads == 2, the currently active mtctx is necessarily good.

This dramatically reduces the nb of thread creations when invoking `zstreamtest --newapi --no-big-tests`
(only when parent cctx itself is created, which is randomized to 1/256 tests).

Expected outcome :
- at a minimum : tsan tests shall now work continuously without exploding the thread counter
- at best : flaky qemu tests on `zstreamtest --newapi --no-big-tests` may stop being flaky, due to less stress from constant thread creation/destruction

Real world impact :
minimal, I don't expect users to constantly change `nbThreads` between each invocation.
If `nbThreads` remains stable, existing implementation re-uses existing mtctx.

Also : `zstreamtest --newapi` but without `--no-big-tests` doesn't benefit as much,
since this test can select a random `nbThreads` value between 1 and 4.
The current patch only reduces opportunity to free/create mtctx (for example : 2->1->2 doesn't need a new mtctx)
but doesn't completely eliminate it, since `nbThreads` can still change between 2/3/4.
A more complete solution could be to only use 2 out of 4 allocated threads, thus keeping the pool at a constant size.
This would require a larger change to `POOL_*` api though.
2017-12-16 12:48:13 -08:00
Yann Collet
3cbfac1cdb updated levels 15-20
taking advantage of `btopt` improved speed to tune parameters.
Levels 16-19 are stronger than previous release, making the graph more favorable.

In theory, I should also update small-size tables,
but I got lazy on that one ...
2017-12-14 23:29:00 -08:00
Yann Collet
8c41a9cb1e
Merge pull request #951 from facebook/lastBlock
saves 3-bytes on small input with streaming API
2017-12-14 15:39:50 -08:00
Yann Collet
a0ac8c895c
Merge pull request #950 from facebook/srcSizeAdaptation
fix adaptation on srcSize
2017-12-14 14:48:31 -08:00
Yann Collet
281f06e01f saves 3-bytes on small input with streaming API
zstd streaming API was adding a null-block at end of frame for small input.

Reason is : on small input, a single block is enough.
ZSTD_CStream would size its input buffer to expect a single block of this size,
automatically triggering a flush on reaching this size.

Unfortunately, that last byte was generally received before the "end" directive (at least in `fileio`).
The later "end" directive would force the creation of a 3-bytes last block to indicate end of frame.

The solution is to not flush automatically, which is btw the expected behavior.
It happens in this case because blocksize is defined with exactly the same size as input.
Just adding one-byte is enough to stop triggering the automatic flush.

I initially looked at another solution, solving the problem directly in the compression context.
But it felt awkward.
Now, the underlying compression API `ZSTD_compressContinue()` would take the decision the close a frame
on reaching its expected end (`pledgedSrcSize`).
This feels awkward, a responsability over-reach, beyond the definition of this API.
ZSTD_compressContinue() is clearly documented as a guaranteed flush,
with ZSTD_compressEnd() generating a guaranteed end.

I faced similar issue when trying to port a similar mechanism at the higher streaming layer.
Having ZSTD_CStream end a frame automatically on reaching `pledgedSrcSize` can surprise the caller,
since it did not explicitly requested an end of frame.
The only sensible action remaining after that is to end the frame with no additional input.
This adds additional logic in the ZSTD_CStream state to check this condition.
Plus some potential confusion on the meaning of ZSTD_endStream() with no additional input (ending confirmation ? new 0-size frame ?)

In the end, just enlarging input buffer by 1 byte feels the least intrusive change.
It's also a contract remaining inside the streaming layer, so the logic is contained in this part of the code.

The patch also introduces a new test checking that size of small frame is as expected, without additional 3-bytes null block.
2017-12-14 11:47:02 -08:00
Yann Collet
c005df136f
Merge pull request #947 from facebook/fix944
Fix #944
2017-12-14 10:01:52 -08:00
Yann Collet
2e97a6d464 fixed minor declaration-after-statement warning 2017-12-13 18:50:05 -08:00
Yann Collet
5432ef6921 fixes adaptation on srcSize
This patch restores capability for each file to receive adapted compression parameters depending on its size.

The bug breaking this feature was relatively silly :
setting a parameter with a value "0" is supposed to be a no-op.
Unfortunately, it would pin down compression parameters as if they were manually set,
preventing later automatic adaptation.

Unfortunately, I'm currently short of a test case that could check this situation and trigger an error.
Compression parameters selection between tableID 0,1,2,3 is largely internal,
leaving no trace to outside world, not even in frame header.
2017-12-13 17:45:26 -08:00
Yann Collet
d23eb9a098 zstreamtest : added missing CHECK_Z() 2017-12-13 15:35:49 -08:00
Nick Terrell
22727a7467 Fix cdict compressor repcodes 2017-12-13 11:31:20 -08:00
Yann Collet
e28305fcca fix #944 : ZSTDMT with large files and dictionary now works correctly
windowLog is now enforced from provided compression parameters,
instead of being copied blindly from `cdict`
where it could be smaller.

also :
- fix a minor bug in zstreamtest --mt : advanced parameters must be set before init
- changed advanced parameter name to ZSTDMT_jobSize
2017-12-12 18:04:58 -08:00
Yann Collet
03832b7aa5 re-added test case
messing with revert ... :(
2017-12-12 14:01:54 -08:00
Yann Collet
8a104fda05 Revert "Created a test case which reliably reproduces bug #944"
This reverts commit 5098d1fbe2.
2017-12-12 12:51:49 -08:00
Yann Collet
5098d1fbe2 Created a test case which reliably reproduces bug #944
in zstreamtest.
2017-12-12 12:48:31 -08:00
Yann Collet
dfc697e967 comment clarification 2017-12-08 12:16:49 -05:00
Yann Collet
c029ee1f0b ZSTD_initCStream_srcSize() considers "0" to mean "unknown"
to not break existing programs relying on this behavior.
Might be changed to mean "empty" in the future.
2017-12-07 17:13:10 -05:00
Yann Collet
3aa2b27a89 fix #942 : streaming interface does not compress after ZSTD_initCStream()
While the final result is still, technically, a frame,
the resulting frame expands initial data instead of compressing it.
This is because the streaming API creates a tiny 1-byte buffer for input,
because it believes input is empty (0-bytes),
because in the past, 0 used to mean "unknown" instead.

This patch fixes the issue.
Todo : add a test which traps the issue.
2017-12-07 02:52:50 -05:00
Yann Collet
5e1f34b7e4 setParameter : no side-effect on setting a compression parameter
last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow.
It is no a useless operation, so it's removed.
No side-effect left when setting a compression parameter.
2017-12-01 21:17:09 -08:00
Yann Collet
78290874a5 fixed Visual warning on minor interface discrepancy 2017-11-29 17:01:14 -08:00
Yann Collet
d3c59edac9 removed long-range-mode tests from zstreamtest --no-big-tests 2017-11-29 16:42:20 -08:00
Yann Collet
998a93b784 simplified ZSTD_CCtx_setParametersUsingCCtxParams()
Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action.
Any action shall be taken at parameter application only (during init).
It makes it possible to just copy CCtxParams from external container to internal state,
and get rid of the more complex code which was trying to compensate for missing actions.
2017-11-29 16:13:05 -08:00
Yann Collet
23767e950a fix one UB pointer arithmetic in encoder
Instead of calculating distance between 2 memory objects, which is UB,
we extract the offset from object 1, and transfer it into object 2.
2017-11-17 13:24:51 -08:00
Yann Collet
15768cabb5 fixed some complex scenarios
Fixed : multithreading to compress some small data with dictionary
Fixed : ZSTD_initCStream_usingCDict()
Improved streaming memory usage when pledgedSrcSize is known.
2017-11-16 15:18:18 -08:00
Yann Collet
05dffe43a7 Fixed Btree update
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not update, expecting ZSTD_updateTree() to fill the tree later on.

Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.

It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch add a few of them.

Additionnally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.

Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.

- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
  Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.
2017-11-16 12:18:56 -08:00