272 Commits

Author SHA1 Message Date
Bimba Shrestha
db5124ef6e More void* issues. Just replacing with BYTE* 2019-12-13 16:24:49 -08:00
Bimba Shrestha
49b2bf7106 'void* size issue' fix 2019-12-13 16:06:57 -08:00
Bimba Shrestha
e3cd2785e2 Add test to catch too many noCompress superblocks on streaming 2019-12-13 15:31:29 -08:00
Bimba Shrestha
826b555463
Merge branch 'dev' into oss 2019-11-22 17:29:33 -08:00
Bimba Shrestha
707a12c419 Test enough room for checksum in superblock 2019-11-22 17:25:36 -08:00
Nick Terrell
659e9f05cf Fix null pointer addition 2019-11-20 18:36:04 -08:00
Nick Terrell
a839d6852c
Merge pull request #1888 from senhuang42/superblocks_fixed
RLE test and re-enable RLE in main compression loop
2019-11-18 16:09:33 -08:00
Sen Huang
bc3e21578d No margin on RLE test size check 2019-11-18 16:39:16 -05:00
Sen Huang
db8efbfe7d Updated comment to reflect actual compression behavior 2019-11-15 16:11:14 -05:00
Sen Huang
75c34684c0 Modified existing RLE test to take compressed size into account 2019-11-15 12:26:48 -05:00
Yann Collet
d67742bc5d
Merge pull request #1858 from senhuang42/dictionary_header_size
Method to get dictionary header size
2019-11-14 09:44:07 -08:00
Sen Huang
97b7f712f3 Change to heap allocation, remove implicit type conversion 2019-11-08 13:57:25 -05:00
Sen Huang
e1edc554a3 Added 2 unit tests: one for sanity, one for correctness on fixed dict 2019-11-08 13:57:25 -05:00
Nick Terrell
8c474f9845 Fix parameter selection and adjustment with srcSize == 0 2019-11-07 08:58:43 -08:00
Nick Terrell
b1ec94e63c Fix ZSTD_f_zstd1_magicless for small data
* Fix `ZSTD_FRAMEHEADERSIZE_PREFIX` and `ZSTD_FRAMEHEADERSIZE_MIN` to
  take a `format` parameter, so it is impossible to get the wrong size.
* Fix the places that called `ZSTD_FRAMEHEADERSIZE_PREFIX` without
  taking the format into account, which is now impossible by design.
* Call `ZSTD_frameHeaderSize_internal()` with `dctx->format`.
* The added tests catch both bugs in `ZSTD_decompressFrame()`.

Fixes #1813.
2019-10-21 21:16:17 -07:00
Yann Collet
6323966e53 updated erroneous comments using ZSTD_dm_*
instead of the current ZSTD_dct_*,
reported by @nigeltao (#1822)
2019-10-16 16:14:04 -07:00
Yann Collet
fb77afc626
Merge pull request #1760 from bimbashrestha/extract_sequences_api
Adding api for extracting sequences from seqstore
2019-10-10 13:11:18 -07:00
Bimba Shrestha
36528b96c4 Manually moving instead of memcpy on decoder and using genBuffer() 2019-10-03 09:26:51 -07:00
Bimba Shrestha
b63a1e7ae5 Typo fix 2019-09-27 07:20:20 -07:00
Bimba Shrestha
91daee5c06 Fixing appveyor test 2019-09-26 16:21:57 -07:00
Bimba Shrestha
75b1286354 Fixing shortest failure 2019-09-26 16:07:34 -07:00
Bimba Shrestha
bb27472afc Adding more realistic test for get sequences 2019-09-26 15:38:31 -07:00
Bimba Shrestha
be0bebd24e Adding test and null check for malloc 2019-09-23 15:08:18 -07:00
W. Felix Handte
f7d9b36835 Update Comment on ZSTD_estimateCCtxSize() 2019-09-20 14:11:29 -04:00
Bimba Shrestha
3cacc0a30b Casting void pointer to ZSTD_Sequence pointer 2019-09-17 17:44:08 -07:00
Bimba Shrestha
5b038f128f Merge branch 'extract_sequences_api' of https://github.com/bimbashrestha/zstd into extract_sequences_api 2019-09-16 13:35:49 -07:00
Bimba Shrestha
1f93be0f6d Handling memory leak and potential side effect 2019-09-16 13:35:45 -07:00
Bimba Shrestha
a874435478
Merge branch 'dev' into extract_sequences_api 2019-09-16 13:29:59 -07:00
W. Felix Handte
194c542598 Fix Memory Leak in Test 2019-09-11 14:25:30 -04:00
W. Felix Handte
ff67c62458 Fix Compilation Error (uint32_t -> size_t) 2019-09-11 13:59:09 -04:00
W. Felix Handte
5707c8a9d5 Speed Up Test a Little 2019-09-11 13:23:59 -04:00
W. Felix Handte
ed4c2c60c3 Add Fuzzer Test Case for Index Reduction 2019-09-11 13:17:19 -04:00
Bimba Shrestha
9e7bb55e14 Addressing comments 2019-09-09 20:04:46 -07:00
Bimba Shrestha
5f8b0f6890 Changing api to get sequences across all blocks 2019-08-30 09:18:44 -07:00
Yann Collet
5198347382
Merge pull request #1744 from bimbashrestha/dev
Generate RLE blocks in the encoder
2019-08-29 15:19:10 -07:00
bimbashrestha
e5704bbfdf Added test for multiple blocks of zeros and fixed nit about comments 2019-08-28 08:32:34 -07:00
Ed Maste
b81d7cc6a0 remove extraneous doubled ;s 2019-08-15 21:17:06 -04:00
Yann Collet
0b0b83e8f3 fix test 122
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Yann Collet
efe8496755 minor test refactoring
just for clarity, for the currently failing unit test
2019-08-02 19:31:19 +02:00
Yann Collet
387e20d4f0 fixed minor conversion warning in datagen 2019-08-02 18:02:54 +02:00
Yann Collet
37f47e51a8 fixed datagen
to produce same content on both 32 and 64-bit platforms
by removing floating from literal table determination.

also : added checksum trace in compression control test,
so that it's easier to determine if test fails
as a consequence of compressing a different sample.
2019-08-02 17:34:53 +02:00
Yann Collet
d1927f0b39 regenerate sample to compress
to reduce chances of differences between 32 and 64-bit fuzzer tests
2019-08-02 15:31:00 +02:00
Yann Collet
5cf1b24aca fixed strategies greedy, lazy & lazy2
restore dictionary compression ratio
2019-08-02 14:21:39 +02:00
Yann Collet
2115292616 minor : fixed ptr arithmetic
invalid on void ptr
2019-08-01 17:12:26 +02:00
Yann Collet
810a9cac08 added efficiency test
to detect gross CR variations after a patch.

Tests normal and dictionary compression.
2019-08-01 16:59:22 +02:00
Yann Collet
98692c2838 fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3 2019-08-01 15:58:17 +02:00
Tyler-Tran
c55d2e7ba3 Adding shrinking flag for cover and fastcover (#1656)
* Changed ERROR(GENERIC) excluding inits

* editing git ignore

* Edited init functions to size_t returns

* moved declarations earlier

* resolved issues with changes to init functions

* fixed style and an error check

* attempting to add tests that might trigger changes

* added && die to cases expecting to fail

* resolved no die on expected failed command

* fixed accel to be incorrect value

* Adding an automated shrinking option

* Fixing build

* finalizing fixes

* fix?

* Removing added comment in cover.h

* Styling fixes

* Merging with fb dev

* removing magic number for default regression

* Requested revisions

* fixing support for fast cover

* fixing casting errors

* parenthesis fix

* fixing some build nits

* resolving travis ci syntax

* might resolve all compilation issues

* removed unused variable

* remodeling the selectDict function

* fixing bad memory access

* fixing error checks

* fixed erroring check in selectDict

* fixing mixed declarations

* modify mixed declaration

* fixing nits and adding test cases

* Adding requested changes + fixed bug for error checking

* switched double comparison from != to <

* fixed declaration typing

* refactoring COVER_best_finish() and changing shrinkDict

* removing the const's

* modifying ZDICT_optimizeTrainFromBuffer_cover functions

* fixing potential bad memcpy

* fixing the error function for dict size
2019-06-27 16:26:57 -07:00
Yann Collet
ed38b645db fullbench: pass proper parameters in scenario 43 2019-05-29 15:26:06 -07:00
Yann Collet
4baecdf72a added comments to better understand enforceMaxDist() 2019-05-28 13:15:48 -07:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Yann Collet
8ac2831f3d
Merge pull request #1581 from facebook/benchfn
benchfn's reduced dependencies
2019-04-11 14:23:04 -07:00
Nick Terrell
50b9c41196 [libzstd] Fix decompression dictionary bugs and clean up initialization
Bugs:

* `ZSTD_DCtx_refPrefix()` didn't clear the dictionary after the first
  use. Fix and add a test case.
* `ZSTD_DCtx_reset()` always cleared the dictionary. Fix and add a test
  case.
* After calling `ZSTD_resetDStream()` you could no longer load a
  dictionary, since the stage was set to `zdss_loadHeader`. Fix and add
  a test case.

Cleanup:

* Make `ZSTD_initDStream*()` and `ZSTD_resetDStream()` wrap the new
  advanced API, and add test cases.
* Document the equivalent of these functions in the advanced API and
  document the unstable functions as deprecated.
2019-04-10 12:59:02 -07:00
Yann Collet
59a7116cc2 benchfn dependencies reduced to only timefn
benchfn used to rely on mem.h, and util,
which in turn relied on platform.h.
Using benchfn outside of zstd required bringing in all these dependencies.

Now, dependency is reduced to timefn only.
This required creating a separate timefn from util,
and rewriting benchfn and timefn to no longer need mem.h.

Separating timefn from util has a wide effect across the code base,
as usage of time functions is widespread.
A lot of build scripts had to be updated to also include timefn.
2019-04-10 12:37:03 -07:00
Nick Terrell
824aaa695f [libzstd] Fix ZSTD_decompressDCtx() with a dictionary
* `ZSTD_decompressDCtx()` did not use the dictionary loaded by
  `ZSTD_DCtx_loadDictionary()`.
* Add a unit test.
* A stacked diff uses `ZSTD_decompressDCtx()` in the
  `dictionary_round_trip` and `dictionary_decompress` fuzzers.
2019-04-09 17:59:27 -07:00
Nick Terrell
48a6427d22 [libzstd] Fix ZSTD_compress2() for multithreaded compression
`ZSTD_compress2()` wouldn't wait for multithreaded compression to
finish. We didn't find this because ZSTDMT will block when it can
compress all in one go, but it can't do that if it doesn't have enough
output space, or if `ZSTD_c_rsyncable` is enabled.

Since we will already sometimes block when using `ZSTD_e_end`, I've
changed `ZSTD_e_end` and `ZSTD_e_flush` to guarantee maximum forward
progress. This simplifies the API, and helps users avoid the easy bug
that was made in `ZSTD_compress2()`

* Found by the libfuzzer fuzzers.
* Added a test case that catches the problem.
* I will make the fuzzers sometimes allocate less than
  `ZSTD_compressBound()` output space.
2019-04-09 16:24:17 -07:00
Nick Terrell
6b053b9f60 [lib] Allow ZSTD_CCtx_loadDictionary() to be called before parameters are set
* After loading a dictionary only create the cdict once we've started the
  compression job. This allows the user to pass the dictionary before they
  set other settings, and is in line with the rest of the API.
* Add tests that mix the 3 dictionary loading APIs.
* Add extra tests for `ZSTD_CCtx_loadDictionary()`.
* The first 2 tests added fail before this patch.
* Run the regression test suite.
2019-03-21 16:13:53 -07:00
Nick Terrell
f52a7d8faa
Merge pull request #1547 from shakeelrao/fix-error
Fix incorrect error code in ZSTD_errorFrameSizeInfo
2019-03-15 10:57:49 -07:00
Nick Terrell
787b76904a [libzstd] Allow compression parameters to be set with a cdict
The order you set parameters in the advanced API is not supposed to matter.
However, once you call `ZSTD_CCtx_refCDict()` the compression parameters
cannot be changed. Remove that restriction, and document what parameters
are used when using a CDict.

If the CCtx is in dictionary mode, then the CDict's parameters are used.
If the CCtx is not in dictionary mode, then its requested parameters are
used.
2019-03-13 16:10:05 -07:00
shakeelrao
18d3a97d43 Add unit test to validate the error case 2019-03-13 01:43:40 -07:00
shakeelrao
95dfd48143 update formatting 2019-03-01 23:11:15 -08:00
shakeelrao
3da3dc2f45 add missing size content test 2019-03-01 21:27:30 -08:00
shakeelrao
03026c3b1d change compressedBound to ULL 2019-03-01 00:03:50 -08:00
shakeelrao
8930c3c79b implement API-level changes 2019-02-28 22:55:18 -08:00
shakeelrao
d0a3f25697 change return type to ULL 2019-02-28 01:52:01 -08:00
shakeelrao
820af1e078 Provide an API function to estimate decompressed size.
Introduces a new utility function `ZSTD_findFrameCompressedSize_internal` which
is equivalent to `ZSTD_findFrameCompressedSize`, but accepts an additional output
parameter `bound` that computes an upper bound for the decompressed data in the frame.

The new API function is named `ZSTD_decompressBound` to be consistent with
`ZSTD_compressBound` (the inverse operation). Clients will now be able to compute an upper bound for
the decompressed size of their compressed payloads instead of guessing a large size.

Implements https://github.com/facebook/zstd/issues/1536.
2019-02-28 00:42:49 -08:00
Nick Terrell
7ad7ba3178 [libzstd] Rename ZSTD_CCtxParam_* to ZSTD_CCtxParams_* 2019-02-19 17:44:52 -08:00
Nick Terrell
6efce7c9ca [fuzzer] Add test cases 2019-02-19 13:22:44 -08:00
Yann Collet
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patch solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller uses a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formatting,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
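The `%u` mismatch described in commit ededcfca57 above can be illustrated with a small sketch. It assumes `U32` is a typedef of `uint32_t`, as the commit message implies; `format_seed` and `format_seed_inttypes` are hypothetical helper names, not zstd functions. Both variants stay correct on platforms where `uint32_t` is `unsigned long`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t U32;   /* assumption: mirrors the U32 typedef the commit refers to */

/* Option 1: cast to the type that %u actually expects (unsigned int).
 * Returns the number of characters written, as snprintf does. */
static int format_seed(char *buf, size_t cap, U32 seed)
{
    return snprintf(buf, cap, "seed=%u", (unsigned)seed);
}

/* Option 2: let <inttypes.h> select the conversion that matches uint32_t. */
static int format_seed_inttypes(char *buf, size_t cap, U32 seed)
{
    return snprintf(buf, cap, "seed=%" PRIu32, seed);
}
```

The cast is the shorter change when the format strings already use `%u`; the `PRIu32` form avoids the cast but makes format strings harder to read.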
Yann Collet
2898afab52 fixed OSSfuzz 11849
The problem was already masked,
due to no longer accepting tiny blocks for statistics.

But in case it could still happen with not-so-tiny blocks,
there is a stricter control which ensures that
nothing was already loaded prior to statistics collection.
2018-12-19 16:54:15 -08:00
Yann Collet
78c4ea4930 added test case 2018-12-19 14:10:27 -08:00
Nick Terrell
d7def456d8 [libzstd] Fix estimate with negative levels
* Fix `ZSTD_estimateCCtxSize()` with negative levels.
* Fix `ZSTD_estimateCStreamSize()` with negative levels.
* Add a unit test to test for this error.
2018-12-18 14:24:49 -08:00
Nick Terrell
aaea4ef924 [libzstd] Fix infinite loop in decompression
When we switched `ZSTD_SKIPPABLEHEADERSIZE` to a macro, the places where we do:

    MEM_readLE32(ptr) + ZSTD_SKIPPABLEHEADERSIZE

can now overflow `(unsigned)-8` to `0` and we infinite loop. We now check
the frame size and reject sizes that overflow a U32.

Note that this bug never made it into a release, and was only in the dev branch
for a few days.

Credit to OSS-Fuzz
2018-12-13 15:13:19 -08:00
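The wraparound described in commit aaea4ef924 above can be reproduced with plain 32-bit arithmetic. This is a minimal sketch, not zstd's code: the header size of 8 is an assumption taken from the `(unsigned)-8` in the message, and `SKIPPABLE_HEADER_SIZE`, `skippable_frame_size_naive`, and `skippable_frame_size_checked` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: the commit implies 8, since (unsigned)-8 + 8 wraps to 0. */
#define SKIPPABLE_HEADER_SIZE 8u

/* Naive computation: the sum wraps around in 32-bit arithmetic, so a
 * content size of 0xFFFFFFF8 yields a total frame size of 0. */
static uint32_t skippable_frame_size_naive(uint32_t content_size)
{
    return (uint32_t)(content_size + SKIPPABLE_HEADER_SIZE);
}

/* Checked computation: stores the total size and returns 0 on success,
 * or returns -1 when the sum would not fit in 32 bits. */
static int skippable_frame_size_checked(uint32_t content_size, uint32_t *total)
{
    if (content_size > UINT32_MAX - SKIPPABLE_HEADER_SIZE) return -1;
    *total = content_size + SKIPPABLE_HEADER_SIZE;
    return 0;
}
```

A frame size of 0 means a loop that advances by the frame size never moves forward, which is the infinite loop the commit describes; rejecting the overflowing size removes that case.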
Yann Collet
3583d19c4e changed parameter names from ZSTD_p_* to ZSTD_c_*
for naming consistency
2018-12-05 17:26:02 -08:00
Yann Collet
3e042d5cc0 ZSTD_decompressDCtx() is compatible with sticky parameters 2018-12-04 17:30:58 -08:00
Yann Collet
d7da3fc90a merge dedicated dParam setters 2018-12-04 17:06:48 -08:00
Yann Collet
2fb8d1a392 fixed declaration-after-statement warnings 2018-12-04 15:54:01 -08:00
Yann Collet
aec945f0dc implemented ZSTD_dParam_getBounds()
and ZSTD_DCtx_setParameter()
2018-12-04 15:35:37 -08:00
Yann Collet
34e146f548 advanced decompression function replaced by normal streaming one
advanced parameters compatible with ZSTD_decompressStream().
2018-12-04 10:28:36 -08:00
Yann Collet
6ced8f7c7c joined normal streaming API with advanced one 2018-12-03 14:22:38 -08:00
Yann Collet
d8e215cbee created ZSTD_compress2() and ZSTD_compressStream2()
ZSTD_compress_generic() is renamed ZSTD_compressStream2().

Note that, for the time being,
the "stable" API and advanced one use different parameter planes :
setting parameters using the advanced API does not influence ZSTD_compressStream()
and using ZSTD_initCStream() does not influence parameters for ZSTD_compressStream2().
2018-11-30 11:25:56 -08:00
Yann Collet
d4d4e109e9 getParameter fills an int*
rather than an unsigned*
for consistency
since type of setParameter() changed to int.
2018-11-21 15:37:26 -08:00
Yann Collet
3b838abf97 ZSTD_CCtx_setParameter : value argument is now int
for compatibility with compression level
2018-11-20 11:53:01 -08:00
Yann Collet
5c68639186 updated ZSTD_DCtx_reset()
signature and behavior are now the same as ZSTD_CCtx_reset()
2018-11-15 16:12:39 -08:00
Yann Collet
7b0391e37e finalized retrofit of ZSTD_CCtx_reset()
updated all depending sources
2018-11-14 13:05:35 -08:00
Yann Collet
b83d1e7714 removed some static const variables
and replaced by traditional macro constants.

Unfortunately, C doesn't consider `static const` to mean "constant"
2018-11-13 16:56:32 -08:00
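The remark in commit b83d1e7714 above refers to a C rule (as of standards before C23): a `static const` object is not an integer constant expression, so it cannot size a file-scope array or serve as a `case` label, whereas a macro can. A minimal sketch with hypothetical names (`TABLE_SIZE`, `classify`, `table_capacity`), unrelated to the actual variables touched by the commit:

```c
#include <assert.h>
#include <stddef.h>

/* static const size_t kTableSize = 16;   <- not a constant expression in C:
 *   static int table[kTableSize];          rejected at file scope
 *   case kTableSize:                       rejected as a case label
 * (some compilers accept these as extensions; standard C does not) */

#define TABLE_SIZE 16   /* hypothetical macro constant */

static int table[TABLE_SIZE];            /* valid: the array size is constant */

static const char *classify(size_t n)
{
    switch (n) {
    case TABLE_SIZE: return "full";      /* valid: the case label is constant */
    case 0:          return "empty";
    default:         return "partial";
    }
}

/* Number of elements in the table, derived from its declared size. */
static size_t table_capacity(void)
{
    return sizeof(table) / sizeof(table[0]);
}
```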
Nick Terrell
103d1ee7a4 Add multithreaded dictbuilder tests to fuzzer.c 2018-11-08 10:58:51 -08:00
Yann Collet
3e5cdf1b6a fixed T36302429 2018-11-05 17:50:30 -08:00
Yann Collet
806a5c84e4 support decompressing an empty frame into NULL
fix #1385
decompressing into NULL was an automatic error.
It is now allowed, as long as the content of the frame is empty.

Seems to simplify things for `arrow`.
Maybe some other projects rely on this behavior ?
2018-10-24 16:34:35 -07:00
Nick Terrell
f2d6db45cd [zstd] Add -Wmissing-prototypes 2018-09-27 15:24:48 -07:00
Yann Collet
f98c69d77c fix : huge (>4GB) stream of blocks
experimental function ZSTD_compressBlock() is designed with very small data in mind,
for situation where saving the ~12 bytes of frame header can actually make a difference.

Some systems though may have to deal with small and large data entangled.
If it's larger than a block (> 128KB), compressBlock() cannot compress them in one round.

That's why it's possible to compress in multiple rounds.
This is a chain of compressed blocks.

Some users push this capability to the limit, encoding gigantic chains of blocks.
On crossing the 4GB limit, some internal overflow occurs.

This fix moves the overflow correction mechanism higher in the call chain,
so that it's applied also to gigantic chains of blocks.

Added a test case in fuzzer.c, which crashes before the fix, and passes now.
2018-09-26 14:24:28 -07:00
Yann Collet
a54c86cfc6 defined a minimum negative level
which can be probed using new function ZSTD_minCLevel().

Also : redefined ZSTD_TARGETLENGTH_MIN/MAX for consistency

used the opportunity to bump version number to v1.3.6
2018-09-20 16:52:03 -07:00
Nick Terrell
e984d01912 Small test fixes 2018-08-28 13:42:01 -07:00
Nick Terrell
5a4e6c9f3d [fuzzer] Test growing the seqStore_t 2018-08-28 13:20:37 -07:00
Yann Collet
3692c31598 Merge branch 'dev' into scanbuild 2018-08-15 13:50:49 -07:00
Yann Collet
5808027abf Merge branch 'dev' into fix1241 2018-08-03 16:08:33 -07:00
Nick Terrell
b9faaa1dc3 [FSE] Add division by zero test 2018-07-30 13:24:09 -07:00
cyan4973
7d1bc9cc8c fix minor conversion warning 2018-07-18 16:10:23 +02:00
cyan4973
9597b438e9 fix #1241
Ensure that first input position is valid for a match
even during first usage of context
by starting reference at 1
(avoiding the problematic 0).
2018-07-17 18:52:57 +02:00
Yann Collet
31769ce702 error on no forward progress
streaming decoders, such as ZSTD_decompressStream() or ZSTD_decompress_generic(),
may end up making no forward progress,
(aka no byte read from input __and__ no byte written to output),
due to unusual parameters conditions,
such as providing an output buffer already full.

In such case, the caller may be caught in an infinite loop,
calling the streaming decompression function again and again,
without making any progress.

This version detects such situation, and generates an error instead :
ZSTD_error_dstSize_tooSmall when output buffer is full,
ZSTD_error_srcSize_wrong when input buffer is empty.

The detection tolerates a number of attempts before triggering an error,
controlled by ZSTD_NO_FORWARD_PROGRESS_MAX macro constant,
which is set to 16 by default, and can be re-defined at compilation time.
This behavior tolerates potentially existing implementations
where such cases happen sporadically, like once or twice,
which is not dangerous (only infinite loops are),
without generating an error, hence without breaking these implementations.
2018-06-22 17:58:21 -07:00
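The tolerance mechanism described in commit 31769ce702 above can be sketched as a counter of consecutive stalled calls. This is an illustration under stated assumptions, not zstd's implementation: `NO_FORWARD_PROGRESS_MAX` stands in for the macro named in the message (default 16), while `progress_tracker`, `track_progress`, and the `STEP_*` codes are hypothetical. Resetting the counter whenever a call makes progress is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

#ifndef NO_FORWARD_PROGRESS_MAX
#define NO_FORWARD_PROGRESS_MAX 16   /* default value named in the commit message */
#endif

enum step_result { STEP_OK = 0, STEP_ERR_DST_FULL = -1, STEP_ERR_SRC_EMPTY = -2 };

typedef struct { int stalled_calls; } progress_tracker;

/* Record one streaming call. bytes_in and bytes_out are what the call
 * consumed and produced; dst_full selects which error applies on a stall. */
static enum step_result track_progress(progress_tracker *t,
                                       size_t bytes_in, size_t bytes_out,
                                       int dst_full)
{
    if (bytes_in != 0 || bytes_out != 0) {
        t->stalled_calls = 0;            /* any progress resets the counter */
        return STEP_OK;
    }
    t->stalled_calls++;
    if (t->stalled_calls >= NO_FORWARD_PROGRESS_MAX)
        return dst_full ? STEP_ERR_DST_FULL : STEP_ERR_SRC_EMPTY;
    return STEP_OK;                      /* sporadic stalls are tolerated */
}
```

With the default of 16, the first 15 consecutive stalled calls return `STEP_OK` and the 16th returns an error, so a caller that stalls once or twice keeps working while a caller stuck in a loop is stopped.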
Yann Collet
56961e4ced fixed minor conversion warning 2018-06-07 16:59:33 -07:00