Commit Graph

498 Commits

Author SHA1 Message Date
Yann Collet
c005df136f
Merge pull request #947 from facebook/fix944
Fix #944
2017-12-14 10:01:52 -08:00
Yann Collet
d23eb9a098 zstreamtest : added missing CHECK_Z() 2017-12-13 15:35:49 -08:00
Nick Terrell
22727a7467 Fix cdict compressor repcodes 2017-12-13 11:31:20 -08:00
Yann Collet
e28305fcca fix #944 : ZSTDMT with large files and dictionary now works correctly
windowLog is now enforced from provided compression parameters,
instead of being copied blindly from `cdict`
where it could be smaller.

also :
- fix a minor bug in zstreamtest --mt : advanced parameters must be set before init
- changed advanced parameter name to ZSTDMT_jobSize
2017-12-12 18:04:58 -08:00
Yann Collet
03832b7aa5 re-added test case
messing with revert ... :(
2017-12-12 14:01:54 -08:00
Yann Collet
8a104fda05 Revert "Created a test case which reliably reproduces bug #944"
This reverts commit 5098d1fbe2.
2017-12-12 12:51:49 -08:00
Yann Collet
5098d1fbe2 Created a test case which reliably reproduces bug #944
in zstreamtest.
2017-12-12 12:48:31 -08:00
Yann Collet
dfc697e967 comment clarification 2017-12-08 12:16:49 -05:00
Yann Collet
c029ee1f0b ZSTD_initCStream_srcSize() considers "0" to mean "unknown"
to not break existing programs relying on this behavior.
Might be changed to mean "empty" in the future.
2017-12-07 17:13:10 -05:00
Yann Collet
3aa2b27a89 fix #942 : streaming interface does not compress after ZSTD_initCStream()
While the final result is still, technically, a frame,
the resulting frame expands initial data instead of compressing it.
This is because the streaming API creates a tiny 1-byte buffer for input,
because it believes input is empty (0-bytes),
because in the past, 0 used to mean "unknown" instead.

This patch fixes the issue.
Todo : add a test which traps the issue.
2017-12-07 02:52:50 -05:00
Yann Collet
5e1f34b7e4 setParameter : no side-effect on setting a compression parameter
last such side-effect was modifying cctx->loadedDictEnd on setting forceWindow.
It is no a useless operation, so it's removed.
No side-effect left when setting a compression parameter.
2017-12-01 21:17:09 -08:00
Yann Collet
78290874a5 fixed Visual warning on minor interface discrepancy 2017-11-29 17:01:14 -08:00
Yann Collet
d3c59edac9 removed long-range-mode tests from zstreamtest --no-big-tests 2017-11-29 16:42:20 -08:00
Yann Collet
998a93b784 simplified ZSTD_CCtx_setParametersUsingCCtxParams()
Any ZSTD_CCtx_setParameter() shall just write the requested parameter, without further action.
Any action shall be taken at parameter application only (during init).
It makes it possible to just copy CCtxParams from external container to internal state,
and get rid of the more complex code which was trying to compensate for missing actions.
2017-11-29 16:13:05 -08:00
Yann Collet
23767e950a fix one UB pointer arithmetic in encoder
Instead of calculating distance between 2 memory objects, which is UB,
we extract the offset from object 1, and transfer it into object 2.
2017-11-17 13:24:51 -08:00
Yann Collet
15768cabb5 fixed some complex scenarios
Fixed : multithreading to compress some small data with dictionary
Fixed : ZSTD_initCStream_usingCDict()
Improved streaming memory usage when pledgedSrcSize is known.
2017-11-16 15:18:18 -08:00
Yann Collet
05dffe43a7 Fixed Btree update
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not update, expecting ZSTD_updateTree() to fill the tree later on.

Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.

It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch add a few of them.

Additionnally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.

Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.

- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
  Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.
2017-11-16 12:18:56 -08:00
Yann Collet
4202b2e8a6 merged rep search into btMatchSearch
but there is a tree corruption somewhere ...
bug hunt ongoing
2017-11-14 20:38:52 -08:00
Yann Collet
9a11f70dc3 merged repcode search into BT match search
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)

It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).

warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
2017-11-13 02:23:48 -08:00
Yann Collet
100d8ad6be lib/compress: created ZSTD_LLcode() and ZSTD_MLcode()
transform length into code.
Since transformation is needed in several places throughout the code,
better write the logic in one place.
2017-11-08 12:43:05 -08:00
Yann Collet
ee441d5d2b renamed zstd_compress.h into zstd_compress_internal.h
to emphasize the fact that all definitions it contains
must remain private, accross lib/compress modules.
2017-11-07 16:15:23 -08:00
Yann Collet
150354c5fe minor refactor
added some traces and assert
related to hunting a potential ubsan error in 32-bits more
(it ends up being a compiler-side issue : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82802).

Modified one pointer arithmetic expression for a more conformant way.
2017-11-01 16:57:48 -07:00
Yann Collet
428e8b3bf4 fix : ZSTD_compress_generic(,,,ZSTD_e_end) automatically sets pledgedSrcSize
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overriden by srcSize instead.

This wasn't applied before compression level is transformed into compression parameters.
As a consequence, small input missed compression parameters adaptation.

It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
2017-11-01 13:15:23 -07:00
Nick Terrell
86b8134cad [libzstd] Fix parameter selection for empty input
ZSTD_compress() and friends would treat an empty input as an unknown size
when selecting parameters. Thus, they would drastically overallocate the
context. Tell ZSTD_getParams() that the source size is 1 when it is empty.
2017-10-25 17:24:15 -07:00
Yann Collet
1ff8a8c109 Merge pull request #891 from facebook/contentSize
Content size
2017-10-17 17:24:51 -07:00
Yann Collet
13bfe885aa edited ZSTD_initCStream_advanced() comment 2017-10-16 14:06:22 -07:00
Nick Terrell
7f961ba6cd Don't allow default tables to repeat
It isn't useful in any case to repeat default tables.
Saves a few bytes on Silesia, since we don't trigger the dictionary
heuristic.

Before: 211988480 => 73651998 bytes
After:  211988480 => 73651721 bytes
2017-10-16 11:37:56 -07:00
Yann Collet
fc8d293460 dictionary compression use correct file size estimation
when determining compression parameters
to compress one file only.

For multiple files, it still "bets" that files are going to be small.

There was also a bug recently added in ZSTD_CCtx_loadDictionary_advanced()
making it incapable to use pledgedSrcSize to determine compression parameters.
2017-10-14 01:21:43 -07:00
Yann Collet
213ef3b510 fixed ZSTD_initCStream_advanced() behavior, which depends on contentSizeFlag,
and a stream fuzzer test, which was incorrect
(relied on 0 being unconditionnally transformed into `ZSTD_CONTENTSIZE_UNKNOWN`)
2017-10-13 19:01:58 -07:00
Yann Collet
3c1e3f8ec9 contentSizeFlag enabled by default would also fail for streaming and MT operations
fixed
2017-10-13 18:32:06 -07:00
Yann Collet
fb44516641 ensure fParams.contentSizeFlag starts at 1
such default was failing for ZSTD_compressBegin/ZSTD_compressContinue
fixed too
2017-10-13 17:39:13 -07:00
Yann Collet
dd18d73e7e fileio: content size is enabled by default 2017-10-13 16:32:18 -07:00
Nick Terrell
ced6e6189c Add DEBUGLOG() that prints FSE encoding types 2017-10-13 14:55:23 -07:00
Nick Terrell
24ac2dbd2a Fix invalid use of dictionary offcode table
Fixes #888.
2017-10-13 12:47:03 -07:00
Yann Collet
a9e5705077 minor code formatting
added a trace during sequence encoding
2017-10-13 02:36:16 -07:00
Nick Terrell
a86a7097ec Ensure dictionary Huff table can encode any symbol
* Ensure that the dictionary Huffman CTable has maxSymbolValue 255.
* Fix a stack buffer overflow during compression dictionary loading.
2017-10-03 13:22:13 -07:00
Yann Collet
004fd34fd9 Merge pull request #876 from facebook/srcSize
CLI Fix : srcSize written in frame headers when compressing multiple files
2017-10-02 15:02:05 -07:00
Nick Terrell
86e83e926f [libzstd] Set CLEVEL_CUSTOM correctly
In `ZSTD_compressBegin_advanced()`, `ZSTD_parameters` are used to set the
compression parameters, but the level didn't get set to `CLEVEL_CUSTOM`, so
`ZSTD_compressBlock()` used the wrong parameters when checking the source
size.
2017-10-02 13:43:30 -07:00
Yann Collet
6e930c13d1 Merge branch 'dev' into compressBound 2017-10-01 11:24:02 -07:00
Yann Collet
dc404119e5 ZSTD_adjustCParams_internal : minor optimization 2017-09-30 15:02:40 -07:00
Nick Terrell
c5d6dde502 Don't size -= 1 in ZSTD_adjustCParams()
The window size could end up too small if the source size is 2^n + 1.

Credit to OSS-Fuzz
2017-09-30 14:20:06 -07:00
Yann Collet
5b10345b26 added ZSTD_COMPRESSBOUND() as a macro
ZSTD_compressBound() works fine, but is only useful for dynamic allocation.
For static allocation, only a macro can provide the amount during compilation time.
2017-09-29 23:17:41 -07:00
Yann Collet
8afb151c9b cli: fixed wrong initialization in MT mode
It's not good to mix old and new API
ZSTD_resetCStream() doesn't just set pledgedSrcSize :
it also sets the CCtx for a single thread compression.

Problem is, when 2+ threads are defined in cctx->requestedParams,
ZSTD_compress_generic() will want to start MT compression,
since initialization is supposed to have already happened (thanks to ZSTD_resetCStream())
except that the underlying ZSTDMT_CCtx* object is not created,
resulting in a segfault.

This is an invalid construction
(correct one is to use ZSTD_CCtx_setPledgedSrcSize()).
I haven't found a nice way to mitigate this impact if someone makes the same mistake.

At some point, removing the old API to keep only the new API within fileio.c will limit these risks.
2017-09-29 22:14:37 -07:00
Yann Collet
fbd5ab7027 minor fix : no longer use fake srcSize during resource creation
srcSize is read and provided at each file, not at resource creation.
This used to be useful with older API, because it could not re-adapt parameters between sessions.

At some point, it will be better to remove the old code, and only keep the new_api.
It works fine by now.
2017-09-29 19:40:27 -07:00
Yann Collet
db1668a43b fix : srcSize written in frame header when multiple files compressed
This information used to be disabled when nbFiles>1.
It was badly initialized later in the code, resulting in an error.
2017-09-29 18:05:18 -07:00
Yann Collet
86b4fe5b45 adjustCParams : restored previous behavior
unknowns srcSize presumed small if there is a dictionary (dictSize>0)
and presumed large otherwise.
2017-09-28 18:14:28 -07:00
Yann Collet
e4ec427720 Merge branch 'dev' into shorterTests
fixed conflicts
2017-09-28 12:19:28 -07:00
Yann Collet
8074261d00 zstdmt : move on when not enough memory for a new input buffer
just continue operations without input forward progress,
instead of an error that stops current compression session.
2017-09-28 11:46:19 -07:00
Yann Collet
2cd15dd9a4 fixed minor Visual conversion warning 2017-09-28 02:33:41 -07:00
Yann Collet
9b5b47ac93 ensure adjustCParams adjust hLog and cLog even without srcSize
It would previously exit when srcSize is unknown.
But in the case of custom parameters,
hLog and cLog can still be too large in comparison with windowLog.

Reduces maximum memory allocated during zstreamtest --newapi
2017-09-28 01:25:40 -07:00