Commit Graph

967 Commits

Author SHA1 Message Date
Yann Collet
8883af6a1e
Merge pull request #1327 from facebook/adapt
Adaptive compression
2018-09-26 14:39:08 -07:00
Yann Collet
f98c69d77c fix : huge (>4GB) stream of blocks
experimental function ZSTD_compressBlock() is designed for very small data in mind,
for situation where saving the ~12 bytes of frame header can actually make a difference.

Some systems though may have to deal with small and large data entangled.
If it's larger than a block (> 128KB), compressBlock() cannot compress them in one round.

That's why it's possible to compress in multiple rounds.
This is a chain of compressed blocks.

Some users push this capability to the limit, encoding gigantic chain of blocks.
On crossing the 4GB limit, some internal overflow occurs.

This fix moves the overflow correction mechanism higher in the call chain,
so that it's applied also to gigantic chains of blocks.

Added a test case in fuzzer.c, which crashes before the fix, and pass now.
2018-09-26 14:24:28 -07:00
Yann Collet
65ed6eeefb
Merge pull request #1337 from facebook/test_failure
fixed `!` tests
2018-09-26 13:40:20 -07:00
Yann Collet
8ff17a6a09
Merge pull request #1329 from facebook/v04isout
Changed default legacy support to v0.5+
2018-09-26 13:39:05 -07:00
Nick Terrell
3ff6040848 Publish artifacts with CircleCI
* Updates CircleCI to use workflows.
  We can now specify any number of test jobs to run in parallel.
* Switch the image to `buildpack-deps:trusty` which is only 500 MB
  instead of 7 GB, so that saves 7 minutes to download it if it isn't
  already cached on the host.
* Publish the source tarball and sha256sum as artifacts.
* If the `GITHUB_TOKEN` environment variable is set, we will also
  add the tarball + sha256sum to the tagged release, after manual
  approval.
2018-09-26 13:23:28 -07:00
Yann Collet
6c51bf420c bounds for --adapt mode
can supply min and max compression level through advanced command :
--adapt=min=#,max=#
2018-09-25 16:03:28 -07:00
Yann Collet
63abaf2171 fixed ! tests
Sometimes, it's necessary to test that a certain command fail, as expected.
Such failure is actually a success, and must not stop the flow of tests.

Several tests were prefixed with `!` to invert return code.
This does not work : it effectively makes the tests pass no matter what.

Use instead function die(), which is meant to trap successes, and transform them into errors.
2018-09-25 15:57:28 -07:00
Yann Collet
04f47bbdd2 Merge branch 'dev' into adapt 2018-09-24 16:56:45 -07:00
Yann Collet
0e211bdd18 fixed constant comparison on 32-bits systems 2018-09-21 18:23:32 -07:00
Yann Collet
484f40697b fix constant redeclaration in paramgrill 2018-09-21 17:28:37 -07:00
Yann Collet
a54c86cfc6 defined a minimum negative level
which can be probed using new function ZSTD_minCLevel().

Also : redefined ZSTD_TARGETLENGTH_MIN/MAX for consistency

used the opportunity to bump version number to v1.3.6
2018-09-20 16:52:03 -07:00
Yann Collet
db97310ace fixed versions-test to only test v0.5+
since zstd_devel is no longer compatible with v0.4+
2018-09-20 14:59:11 -07:00
Yann Collet
15519479ba fixed minor gcc warning on a unused variable 2018-09-20 13:00:11 -07:00
Yann Collet
45010da074 updated man page
and added `--adapt` test in `playTests.sh`
2018-09-19 17:37:22 -07:00
Yann Collet
2f78228f65 Merge branch 'dev' into adapt 2018-09-19 12:43:42 -07:00
Björn Ketelaars
06fd1e473d 'head -c BYTES' is non-portable.
tests/playTests.sh uses 'head -c' in a couple of tests to truncate the
last byte of a file. The '-c' option is non-portable (not in POSIX).
Instead use a wrapper around dd (truncateLastByte).
2018-09-17 20:39:35 +02:00
Yann Collet
d195eec97e fixed msan error
cold dictionary is detected through a comparison with dictEnd,
which was not initialized at the beginning of first DCtx usage.
2018-09-13 12:29:52 -07:00
Yann Collet
31ebb26945
Merge pull request #1301 from terrelln/lit-size
[zstd] Fix seqStore growth
2018-08-28 17:10:25 -07:00
Nick Terrell
e3b5286197 Fix decodecorpus 2018-08-28 13:56:47 -07:00
Nick Terrell
e984d01912 Small test fixes 2018-08-28 13:42:01 -07:00
Nick Terrell
5a4e6c9f3d [fuzzer] Test growing the seqStore_t 2018-08-28 13:20:37 -07:00
Yann Collet
b37a0a6bde
Merge pull request #1298 from facebook/bench
Refactored bench.c
2018-08-28 12:25:02 -07:00
Yann Collet
55affc09de timedFn : measurement delay is programmable
instead of hard-coded 1 second per measurement
2018-08-28 11:26:27 -07:00
Yann Collet
0ff9b67552 paramgrill: removed useless tests
designed to compensate iter_mode,
but since only time_mode is available now,
all tests are guaranteed to last a minimum amount of time.
2018-08-27 19:07:17 -07:00
Yann Collet
9e26893e07 paramgrill: fixed a bunch of div-by-zero
they were pretty easy to trigger by the way,
just start an extended paramgrill session
to find a compression table based on any sample,
it would necessarily happen at some point.
2018-08-27 18:47:09 -07:00
Yann Collet
0071e8348f restored assert() in paramgrill
assert() in paramgrill are not in the benchmark path.
They should remain active, as they don't impact measurements, and their runtime is insignificant.
2018-08-27 17:52:04 -07:00
Yann Collet
01dcd0fd17 bench: minor api update, for consistency
BMK_benchTimedFn()
BMK_isCompleted_TimedFn() uses TimedFnState
2018-08-26 21:30:18 -07:00
Yann Collet
c3a4baaf6e fixed minor warnings
valgrind: memory leak of a few bytes in fullbench
static analyzer: uninitialized data passed as result
2018-08-24 23:25:35 -07:00
Yann Collet
af23d39eb8
Merge pull request #1297 from felixhandte/check-offset-table
Fix Missing Offset Table Check
2018-08-24 17:36:44 -07:00
Yann Collet
2279f3d127 bench: reduce nb of return type
runOutcome is enough
removed timedFnOutcome
2018-08-24 17:28:38 -07:00
W. Felix Handte
db4c8d05b3 Add Failing Test 2018-08-24 14:30:21 -07:00
Yann Collet
7b23cc4d1e fixed fullbench behavior
now same as v1.3.5
2018-08-24 12:40:10 -07:00
Yann Collet
4da5bdf482 fixed zstd -b speed result
the benchmark was displaying the speed of last run
instead of the best of all previous runs.
2018-08-23 18:13:49 -07:00
Nick Terrell
3b56bb1e4c Fix decodecorpus 2018-08-23 17:48:06 -07:00
Yann Collet
b0e1f3982d fixed paramgrill
to work with new bench.c
2018-08-23 17:21:38 -07:00
Yann Collet
1f9ec13621 introduced MB_UNIT
so that all benchmarking programs use the same speed scale
2018-08-23 16:03:30 -07:00
Yann Collet
d39a25c5ed update fullbench.c to work with new bench.h 2018-08-23 15:00:09 -07:00
Jennifer Liu
9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo spliting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCovr

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
Yann Collet
77e805e3db bench: changed creation/reset function to timedFnState
for consistency
2018-08-21 18:19:27 -07:00
Yann Collet
801e3bcd97
Merge pull request #1290 from edenzik/ezik/1119-safe-strcpy-in-fileio
Fixed unsafe string copy and concat in `fileio.c`.
2018-08-21 13:18:44 -07:00
Eden Zik
78af534f82 Fixed unsafe string copy and concat in fileio.c.
Per warnings from flawfinder: "Does not check for buffer overflows when
copying to destination [MS-banned] (CWE-120). Consider using snprintf,
strcpy_s, or strlcpy (warning: strncpy easily misused).".

Replaced called to strcpy and strcat in `fileio.c` to calls with a
specified size (`strncpy` and `strncat`).

Tested the changes on OSX, Linux, Windows.
On OSX + Linux, changes were tested with ASAN. The following flags were
used: 'check_initialization_order=1:strict_init_order=1:detect_odr_violation=1:detect_stack_use_after_return=1'

To reproduce warning:
./flawfinder.py ./programs/fileio.c
2018-08-20 22:15:24 -04:00
Yann Collet
ea0b5fc193
Merge pull request #1285 from facebook/scanbuild
static analyzer tests
2018-08-17 16:38:41 -07:00
Yann Collet
b4e7f71055 Merge branch 'dev' into adapt 2018-08-17 15:54:13 -07:00
George Lu
3959ba15e6 Clarify README 2018-08-16 17:22:29 -07:00
George Lu
8175b28f03 Fix negative lvl display value
Also fix synthetic benchmark parameter setting
2018-08-16 16:46:37 -07:00
George Lu
239e114d62 prune comments 2018-08-15 16:04:34 -07:00
George Lu
8a296d3e1f Move Stuff around
Group similar functions together, remove outdated comments
2018-08-15 16:04:34 -07:00
George Lu
3f8b10baa1 consts 2018-08-15 16:04:34 -07:00
George Lu
46be2ef5d8 Remove unused stuff 2018-08-15 16:04:34 -07:00
George Lu
b234870c33 clarify display README 2018-08-15 14:29:49 -07:00
George Lu
ee77ddc28d Fix wraparound 2018-08-15 14:01:32 -07:00
George Lu
1e8d352930 silencing params 2018-08-15 14:01:32 -07:00
George Lu
2c5fdae0ae Clean up repetitive display
Add documentation
2018-08-15 14:01:32 -07:00
George Lu
4d9c6f51b8 -q -v options 2018-08-15 14:01:32 -07:00
George Lu
3dcfe5cc2c begin display changes 2018-08-15 14:01:32 -07:00
Yann Collet
3692c31598 Merge branch 'dev' into scanbuild 2018-08-15 13:50:49 -07:00
George Lu
b1d9ca737a Add memoTable options
-hashing memotable
-no memotable
2018-08-15 10:19:38 -07:00
George Lu
8c918edd3a MAke it easier to add params
Make memoTable size limited
2018-08-14 16:15:46 -07:00
George Lu
96725989ef Temp fix perf regression 2018-08-14 16:14:37 -07:00
George Lu
3f2d024dca forceAttachDict 2018-08-14 14:24:41 -07:00
George Lu
e3c679484a Add Time Checks
Fix double -> U64 display
2018-08-14 14:24:41 -07:00
George Lu
88dda92285 Reduce Duplication
Change Defaults
Asserts actually disabled in paramgrill + fullbench
2018-08-14 14:24:41 -07:00
George Lu
f581ccd267 Doc Updates
Add option to pass in existing parameters in use
2018-08-14 14:24:41 -07:00
George Lu
76acba025d scan-build 2018-08-14 12:13:05 -07:00
George Lu
614aaa3ae1 rebase clevel 2018-08-14 10:53:04 -07:00
George Lu
3b36fe5c68 strategy switching 2018-08-13 16:36:14 -07:00
George Lu
d4730a4f66 Update fulltable to use same interface
Add seperateFiles flag
2018-08-13 16:15:52 -07:00
George Lu
43b4971ca8 Renames, Documentation Updates 2018-08-13 16:15:52 -07:00
George Lu
a884b76bc2 Style Changes
Add single run dictionaries
Change MB to be consistent 1 << 20 rather than 1,000,000
2018-08-13 16:15:52 -07:00
George Lu
b3544217b7 Cleanup 2018-08-13 16:15:52 -07:00
George Lu
8ff0de15e4 Generalize, macro magic numbers 2018-08-13 16:15:52 -07:00
George Lu
3a2e95eba4 Perf improvements
try decay
strategy selection skipping
2018-08-13 16:15:52 -07:00
George Lu
2bdfe6ca71 Better Display 2018-08-13 16:15:52 -07:00
George Lu
f67d040c39 Bugfixes, style changes
Complete euclidean distance climb
2018-08-13 16:15:52 -07:00
George Lu
5f4502fc07 New climb
feas part 2 uses euclidean metric
2018-08-13 16:15:52 -07:00
George Lu
13611249a5 Table
Compiling
+Euclidean Metric
2018-08-13 16:15:52 -07:00
George Lu
0cea754024 Revert "Reorder declaration"
This reverts commit 3ac2c22485.
2018-08-13 16:15:34 -07:00
George Lu
486e586eed Revert "Default lvl 1"
This reverts commit 0cc75d6ee0.
2018-08-13 16:13:46 -07:00
George Lu
0cc75d6ee0 Default lvl 1
MB to 2^20
2018-08-13 14:55:56 -07:00
Yann Collet
09c9cf3f51 simplified rateLimiter
resists better to changing in/out conditions
limits risks of "catching up"
2018-08-13 12:13:47 -07:00
Yann Collet
e11f91b039 remove error message for Ctrl+C 2018-08-13 11:48:25 -07:00
Yann Collet
f3aa510738 rateLimiter does not "catch up" when input speed is slow 2018-08-13 11:38:55 -07:00
Yann Collet
a996b1fd2d fixed rate limited for high speed 2018-08-10 17:39:00 -07:00
Yann Collet
681a382eea added rateLimiter.py, by @felixhandte
this rate limiter avoid the problem of `pv`
which "catch up" after a blocked period
instead of preserving a constant speed cap.
2018-08-10 12:25:52 -07:00
George Lu
3ac2c22485 Reorder declaration 2018-08-09 16:38:32 -07:00
George Lu
bfe8392e23 Remove ctx from benchMem 2018-08-09 12:07:57 -07:00
George Lu
0ece2e5cdc Add consts
+ fix gcc-8 warnings
2018-08-09 11:38:09 -07:00
George Lu
6f480927af argument parsing cleanup
+ clarifying comment
2018-08-09 10:42:58 -07:00
George Lu
ad16a69408 Readability improvements, renaming 2018-08-09 10:42:58 -07:00
George Lu
8278a49cb6 const srcPtrs 2018-08-09 10:42:58 -07:00
George Lu
3d230db853 Change speed representation from floating point to integral 2018-08-09 10:42:58 -07:00
George Lu
8faeb41679 Update Documentation
Change comment // to  /* */
Add more description of what functions do
Remove outdated comments
2018-08-09 10:42:58 -07:00
George Lu
dd270b2f75 Renaming / Style fixes 2018-08-09 10:42:58 -07:00
George Lu
e148db366e Separate capacity vs size
Also:
Make suggested fixes
-varInds_t
-reorder some arguments
-remove code duplication
-update README / -h
-Fix memory leaks
2018-08-09 10:42:58 -07:00
George Lu
df026e159f Fix windows implicit casting bugs 2018-08-09 10:42:58 -07:00
George Lu
0f91b039ff Add Levels 2018-08-09 10:42:58 -07:00
George Lu
7b5b3d7ae3 BenchMem with block compressed sizes passed back up 2018-08-09 10:42:58 -07:00
George Lu
3adc217ea4 Total Changes:
Add different constraint types (decompression speed, compression memory, parameter constraints)
Separate search space by strategy + strategy selection
Memoize results
Real random restarts
Support multiple files
Support Dictionary inputs
Debug Macro for extra printing
2018-08-09 10:42:58 -07:00
George Lu
fab4438801 Dictionary + Multiple file Loading 2018-08-09 10:42:58 -07:00
George Lu
eb21b7f482 Not crashing 2018-08-09 10:42:58 -07:00