Commit Graph

194 Commits

Author SHA1 Message Date
Yann Collet
314c7df170 minor : change test order
to reduce a warning with `clang` on Windows
2020-10-16 13:26:47 -07:00
Nick Terrell
614e446000
Merge pull request #2271 from terrelln/small-blocks
Small block optimizations
2020-08-24 18:54:33 -07:00
Nick Terrell
575731b6db Use ncount=1 when < 4096 symbols 2020-08-18 16:47:53 -07:00
Carl Woffenden
5d81d44e40 Fixed VS variable shadowing warning (and added test) 2020-07-29 12:33:39 +02:00
yoshihitoh
c6548eac8e Rename static vars to avoid redefinition error. 2020-06-29 10:51:50 +09:00
Nick Terrell
081691a3aa
Merge pull request #2217 from terrelln/cover-redundant
[cover] Remove unnecessary mask and dedup hash functions
2020-06-22 17:41:13 -07:00
Nick Terrell
370933fa20
Merge pull request #2209 from Niadb/dev
Explicitly use __cdecl for qsort, to avoid warning when default calling convention is not __cdecl
2020-06-22 17:41:03 -07:00
Nick Terrell
2312b819af [cover] Remove unnecessary mask and dedup hash functions
* Remove the unnecessary mask, since `ZSTD_hash*()` already ensures
  the output is mod 2^h.
* Dedup the hash functions and use `ZSTD_hash*()` directly.
2020-06-22 12:52:13 -07:00
Carl Woffenden
38cdb6a072 Renamed cover and fast cover hash functions/vars 2020-06-22 11:54:24 +02:00
Carl Woffenden
4a9b7d136f Initial implementation (files added, macros fixed)
Hashing functions still to fix.
2020-06-22 10:31:36 +02:00
Niadb
405586d40a
Add files via upload 2020-06-19 03:32:11 -06:00
Nick Terrell
08981d2638 [lib] Allow compression dictionaries with missing symbols
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.

Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.

Tested by manually creating a dictionary with missing symbols of every
type, and validing that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.

Fixes #2174.
2020-06-12 17:57:19 -07:00
Nick Terrell
45c66dd298 [zdict] Stabilize ZDICT_finalizeDictionary() 2020-05-07 10:37:01 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Sen Huang
c85d10d0ea Remove mixed declarations 2019-11-08 13:57:26 -05:00
Sen Huang
d9c475f3b3 Fix static analyze error, use proper bounds for dictEnd 2019-11-08 13:57:26 -05:00
Sen Huang
d06b90692b Move asserts to loadZstdDictionary() 2019-11-08 13:57:26 -05:00
Sen Huang
b39149e156 Expose ZSTD_reset_compressedBlockState() to shared API 2019-11-08 13:57:26 -05:00
Sen Huang
6ce335371b Add error forwarding to loadCEntropy(), make check for dictSize >= 8 from bad merge 2019-11-08 13:57:26 -05:00
Sen Huang
4a61aaf368 Remove redundant comment 2019-11-08 13:57:26 -05:00
Sen Huang
c787b351ea Use ZSTD Error codes, improve explanation of ZSTD_loadCEntropy() and ZSTD_loadDEntropy() 2019-11-08 13:57:26 -05:00
Sen Huang
04fb42b4f3 Integrated refactor into getDictHeaderSize, now passes tests 2019-11-08 13:57:26 -05:00
Sen Huang
4b141b63e0 Revert "Move decompress symbols into zstd_internal.h, remove dependency"
This reverts commit a152b4c67a5266f611db4a2eac4a79003852a795.
2019-11-08 13:57:26 -05:00
Sen Huang
84404cff6e Move decompress symbols into zstd_internal.h, remove dependency 2019-11-08 13:57:26 -05:00
Sen Huang
341e0641ed Checks malloc() for failure, returns 0 if so 2019-11-08 13:57:26 -05:00
Sen Huang
97b7f712f3 Change to heap allocation, remove implicit type conversion 2019-11-08 13:57:25 -05:00
Sen Huang
3c36a7f13a Add ZDICT_getHeaderSize() 2019-11-08 13:57:08 -05:00
moozzyk
eda7946a36 Take ZSTD_parameters as a const pointer
Fixes: #1637
2019-10-22 23:21:54 -07:00
Yann Collet
2b0a271ed2 fix eductional decoder
fix #1774
also :
- fix minor compilation warnings
- make sure the `test` is run during CI tests
2019-09-06 14:30:13 -07:00
Nick Terrell
0932de54bc [dictBuilder] Fix deadlock in *COVER error case
The COVER and FASTCOVER dictionary builders can deadlock when
dictionary construction errors, likely because there are too few
samples, or too few distinct dmers. The deadlock only occurs when
there are errors.

Fixes #1746.
2019-08-26 18:19:29 -07:00
Ed Maste
b81d7cc6a0 remove extraneous doubled ;s 2019-08-15 21:17:06 -04:00
Tyler-Tran
c55d2e7ba3 Adding shrinking flag for cover and fastcover (#1656)
* Changed ERROR(GENERIC) excluding inits

* editing git ignore

* Edited init functions to size_t returns

* moved declarations earlier

* resolved issues with changes to init functions

* fixed style and an error check

* attempting to add tests that might trigger changes

* added && die to cases expecting to fail

* resolved no die on expected failed command

* fixed accel to be incorrect value

* Adding an automated shrinking option

* Fixing build

* finalizing fixes

* fix?

* Removing added comment in cover.h

* Styling fixes

* Merging with fb dev

* removing megic number for default regression

* Requested revisions

* fixing support for fast cover

* fixing casting errors

* parenthesis fix

* fixing some build nits

* resolving travis ci syntax

* might resolve all compilation issues

* removed unused variable

* remodeling the selectDict function

* fixing bad memory access

* fixing error checks

* fixed erroring check in selectDict

* fixing mixed declarations

* modify mixed declaration

* fixing nits and adding test cases

* Adding requested changes + fixed bug for error checking

* switched double comparison from != to <

* fixed declaration typing

* refactoring COVER_best_finish() and changing shrinkDict

* removing the const's

* modifying ZDICT_optimizeTrainFromBuffer_cover functions

* fixing potential bad memcpy

* fixing the error function for dict size
2019-06-27 16:26:57 -07:00
Tyler-Tran
cb47871a0a [dictBuilder] Be more specific than ERROR(generic) (#1616)
* Specify errors at a finer granularity than `ERROR(generic)`.
* Add tests for bad parameters in the dictionary builder.
2019-05-22 18:57:50 -07:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell
e649fad7aa [dictBuilder] Fix displayLevel for corpus warning
Pass the displaylevel into the corpus warning, because it is used in
fast cover and cover, so it needs to respect the local level.
2019-04-08 20:00:18 -07:00
Nick Terrell
d0f5ba36fb [cover] Improvements for small or homogeneous data
* The algorithm would bail as soon as it found one epoch that
  contained no new segments. Change it so it now has to fail
  >= 10 times in a row (10 for fastcover, 10-100 for cover).
* The algorithm uses the `maxDict` size to decide the epoch size.
  When this size is absurdly large, it causes tiny epochs. Lower
  bound the epoch size at 10x the segment size, and warn the user
  that their training set is too small.

Fixes #1554
2019-03-22 14:14:46 -07:00
Nick Terrell
21616d8a77 [zdict] Improve documentation 2019-02-01 15:19:32 -08:00
Yann Collet
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet
ffba142406 fixed file identity detection in 32-bit mode
also :
some library decided to use `index` as a global variable declared in standard header
shadowing the ones used in fastcover.c  :(
2018-12-20 14:30:30 -08:00
Yann Collet
2e7fd6a2cb fixed remaining searchLength invocations 2018-11-20 15:13:27 -08:00
Bartosz Szreder
5c5c476338 Prevent deadlock on malloc() failure. 2018-11-08 10:29:31 +01:00
Yann Collet
3ca6261223 fixed static analyzer warnings
note : for some reason,
scan-build version on my laptop found problems within fastcover.c
that scan-build on travisCI does not flag.

They are, as usual, false positive :
the analyzer does not understand that a table (`offset`) is correctly filled before usage.
2018-10-02 15:59:11 -07:00
Nick Terrell
f2d6db45cd [zstd] Add -Wmissing-prototypes 2018-09-27 15:24:48 -07:00
Yann Collet
50b216146f
Merge pull request #1304 from facebook/largeNbDicts
contrib/largeNbDicts
2018-09-06 09:50:56 -07:00
Jennifer Liu
21721b75a3 Change default f to 20 2018-09-04 17:15:14 -07:00
Jennifer Liu
944c9986e0 Update comment on default steps of cover and fastcover 2018-08-30 15:37:29 -07:00
Jennifer Liu
16db0337b1 Always use splitPoint=1.0 for non-optimize cover and fastcover 2018-08-30 14:59:22 -07:00
modbw
d14edf259f
Fixed memory leak detected by cppcheck
cppcheck (which is run regularly in our CI environment)  detected a possible memory leak.
2018-08-28 07:25:05 +02:00
Yann Collet
6782725155 first sketch for largeNbDicts test program 2018-08-26 19:29:12 -07:00