zstd/tests
Bimba Shrestha f25a6e9f8f Adding new cli endpoint --patch-from= (#1940)
* Adding new cli endpoint --diff-from=

* Appveyor conversion nit

* Using bool set trick instead of direct set

* Removing --diff-from and only leaving --diff-from=#

* Throwing error when both dictFileName vars are set

* Clean up syntax

* Renaming diff-from to patch-from

* Revering comma separated syntax clean up

* Updating playtests with patch-from

* Uncommenting accidentally commented

* Updating remaining docs and var names to be patch-from instead of diff-from

* Constifying

* Using existing log2 function and removing newly created one

* Argument order (moving prefs to end)

* Using comma separated syntax

* Moving to outside #ifndef
2020-01-10 14:25:24 -08:00
dict-files [fuzz] Only set HUF_repeat_valid if loaded table has all non-zero weights (#1898) 2019-11-26 12:24:19 -08:00
fuzz [fuzz] Allow zero sized buffers for streaming fuzzers (#1945) 2020-01-09 11:38:50 -08:00
golden-compression Changing test file directory names to be more descriptive 2019-09-09 12:08:33 -07:00
golden-decompression Changing test file directory names to be more descriptive 2019-09-09 12:08:33 -07:00
gzip Fix name of macOS 2018-06-09 14:31:17 -05:00
regression updated fuzz tests to use FileNamesTable* abstraction 2019-11-06 14:42:13 -08:00
.gitignore fix gitignore errors 2019-07-09 21:08:13 +08:00
automated_benchmarking.py [bench] Automated benchmarking script (#1906) 2020-01-06 14:19:11 -08:00
bigdict.c [tests] Add tests for big dictionaries 2019-06-21 17:58:24 -07:00
checkTag.c added tests/checkTag 2018-01-14 17:03:45 -08:00
datagencli.c fix confusion between unsigned <-> U32 2018-12-21 18:09:41 -08:00
decodecorpus.c [tests] Fix decodecorpus 2019-09-20 01:09:47 -07:00
DEPRECATED-test-zstd-speed.py [bench] Automated benchmarking script (#1906) 2020-01-06 14:19:11 -08:00
fullbench.c fix initCStream_advanced() for fast strategies 2019-10-22 15:01:38 -07:00
fuzzer.c [fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936) 2020-01-03 16:53:51 -08:00
invalidDictionaries.c updated license header 2017-09-08 00:09:23 -07:00
legacy.c Test new ZSTD_findFrameCompressedSize and update documentation 2019-03-15 18:04:19 -07:00
libzstd_partial_builds.sh fix and refactored libzstd_partial_build.sh 2018-10-24 11:32:09 -07:00
longmatch.c fixed remaining searchLength invocations 2018-11-20 15:13:27 -08:00
Makefile [bench] Automated benchmarking script (#1906) 2020-01-06 14:19:11 -08:00
paramgrill.c [paramgrill] Fix mingw build errors 2019-04-18 15:06:56 -07:00
playTests.sh Adding new cli endpoint --patch-from= (#1940) 2020-01-10 14:25:24 -08:00
poolTests.c [threading] Add debug utilities 2019-10-18 15:05:34 -07:00
rateLimiter.py simplified rateLimiter 2018-08-13 12:13:47 -07:00
README.md [bench] Automated benchmarking script (#1906) 2020-01-06 14:19:11 -08:00
roundTripCrash.c [libzstd] Rename ZSTD_CCtxParam_* to ZSTD_CCtxParams_* 2019-02-19 17:44:52 -08:00
seqgen.c fix confusion between unsigned <-> U32 2018-12-21 18:09:41 -08:00
seqgen.h [test] Exercise all codes in dictionary tables 2017-10-16 18:05:36 -07:00
symbols.c Provide an API function to estimate decompressed size. 2019-02-28 00:42:49 -08:00
test-zstd-versions.py fixed versions-test to only test v0.5+ 2018-09-20 14:59:11 -07:00
zbufftest.c improve deprecation warning macro 2019-10-23 11:59:32 -07:00
zstreamtest.c Fix null pointer addition 2019-11-20 18:36:04 -08:00

Programs and scripts for automated testing of Zstandard

This directory contains the following programs and scripts:

  • datagen : Synthetic and parameterizable data generator, for tests
  • fullbench : Precisely measure the speed of each zstd inner function
  • fuzzer : Test tool, to check zstd integrity on target platform
  • paramgrill : parameter tester for zstd
  • test-zstd-speed.py : script for testing zstd speed difference between commits (deprecated)
  • test-zstd-versions.py : compatibility test between zstd versions stored on GitHub (v0.1+)
  • zbufftest : Test tool to check ZBUFF (a buffered streaming API) integrity
  • zstreamtest : Fuzzer test tool for zstd streaming API
  • legacy : Test tool to test decoding of legacy zstd frames
  • decodecorpus : Tool to generate valid Zstandard frames, for verifying decoder implementations

test-zstd-versions.py - script for testing zstd interoperability between versions

This script creates a versionsTest directory into which the zstd repository is cloned. It then compiles all tagged (released) versions of zstd and checks interoperability between them.

automated_benchmarking.py - script for benchmarking zstd PRs against dev

This script benchmarks changes from pull requests made to zstd and compares them against facebook:dev to detect regressions. It currently runs on a dedicated desktop machine for every pull request made to the zstd repo, but it can also be run on any machine via the command line interface.

The modes of usage for this script include:

  • fastmode : runs a minimal single build comparison (between facebook:dev and facebook:master)
  • onetime : pulls all current pull requests from the zstd repo and compares facebook:dev to each of them once
  • continuous : continuously fetches pull requests from the zstd repo and runs benchmarks against facebook:dev

Example usage:

python automated_benchmarking.py golden-compression 1 current 1 "" 60

usage: automated_benchmarking.py [-h] directory levels mode iterations emails frequency

positional arguments:
  directory   directory with files to benchmark
  levels      levels to test eg ('1,2,3')
  mode        'fastmode', 'onetime', 'current' or 'continuous'
  iterations  number of benchmark iterations to run
  emails      email addresses of people who will be alerted upon regression.
              Only for continuous mode
  frequency   specifies the number of seconds to wait before each successive
              check for new PRs in continuous mode
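
The six positional arguments listed above can be sketched with argparse. This is a hypothetical reconstruction from the help text only, not the script's actual source; the integer types for iterations and frequency and the build_parser name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the positional arguments in the help text.

    Hypothetical reconstruction: argument types are assumptions.
    """
    parser = argparse.ArgumentParser(prog="automated_benchmarking.py")
    parser.add_argument("directory", help="directory with files to benchmark")
    parser.add_argument("levels", help="levels to test eg ('1,2,3')")
    parser.add_argument(
        "mode", help="'fastmode', 'onetime', 'current' or 'continuous'"
    )
    parser.add_argument(
        "iterations", type=int, help="number of benchmark iterations to run"
    )
    parser.add_argument(
        "emails", help="email addresses alerted upon regression (continuous mode)"
    )
    parser.add_argument(
        "frequency", type=int, help="seconds to wait between checks for new PRs"
    )
    return parser


if __name__ == "__main__":
    # Same arguments as the example usage line above.
    args = build_parser().parse_args(
        ["golden-compression", "1", "current", "1", "", "60"]
    )
    print(args.directory, args.levels, args.mode, args.iterations, args.frequency)
```

Parsing the example command this way shows that all six values are required, including the empty string passed for emails.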

test-zstd-speed.py - script for testing zstd speed difference between commits

DEPRECATED

This script creates a speedTest directory into which the zstd repository is cloned. It then compiles all branches of zstd and performs a speed benchmark for a given list of files (the testFileNames parameter). After sleepTime seconds (an optional parameter, default 300), the script checks the repository for new commits. If a new commit is found, it is compiled and a speed benchmark for this commit is performed. The results are compared to the previous results. If the compression or decompression speed for one of the zstd levels is lower than lowerLimit (an optional parameter, default 0.98), the speed benchmark is restarted. If the second results are also lower than lowerLimit, a warning e-mail is sent to the recipients on the list (the emails parameter).

Additional remarks:

  • To ensure accurate speed results, the script should be run on a "stable" target system with no other jobs running in parallel
  • Using the script with virtual machines can lead to large variations of speed results
  • The speed benchmark is not performed until the computer's load average is lower than maxLoadAvg (an optional parameter, default 0.75)
  • The script sends e-mails using mutt; if mutt is not available it sends e-mails without attachments using mail; if neither is available it only prints a warning
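
The retry-then-warn rule and the load-average gate described above can be sketched as follows. This is a minimal illustration, not the script's code: it assumes lowerLimit is applied as a ratio of the new speed to the previous speed, and the function names are hypothetical.

```python
import os
import time


def is_regression(current_speed, previous_speed, lower_limit=0.98):
    """Return True when the new speed is below lower_limit times the old one.

    Assumption: lowerLimit is a ratio against the previous result.
    """
    return current_speed < previous_speed * lower_limit


def should_warn(first_run, second_run, previous_speed, lower_limit=0.98):
    """Warn only if both the first run and the rerun fall below the limit."""
    return is_regression(first_run, previous_speed, lower_limit) and is_regression(
        second_run, previous_speed, lower_limit
    )


def wait_for_idle(max_load_avg=0.75, poll_seconds=1.0, get_load=None):
    """Block until the load average drops below max_load_avg.

    get_load is injectable for testing; the default reads the 1-minute
    load average, which is only available on Unix-like systems.
    """
    if get_load is None:
        get_load = lambda: os.getloadavg()[0]
    while get_load() >= max_load_avg:
        time.sleep(poll_seconds)


if __name__ == "__main__":
    print(should_warn(97.0, 99.0, 100.0))  # rerun recovered, so no warning
    print(should_warn(97.0, 96.0, 100.0))  # both runs slow, so warn
```

Requiring two consecutive slow runs before warning is what keeps a single noisy measurement from triggering an e-mail.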

Example usage with two test files, one e-mail address, and an additional message:

./test-zstd-speed.py "silesia.tar calgary.tar" "email@gmail.com" --message "tested on my laptop" --sleepTime 60

To run the script in the background, use:

nohup ./test-zstd-speed.py testFileNames emails &

The full list of parameters:

positional arguments:
  testFileNames         file names list for speed benchmark
  emails                list of e-mail addresses to send warnings

optional arguments:
  -h, --help            show this help message and exit
  --message MESSAGE     attach an additional message to e-mail
  --lowerLimit LOWERLIMIT
                        send email if speed is lower than given limit
  --maxLoadAvg MAXLOADAVG
                        maximum load average to start testing
  --lastCLevel LASTCLEVEL
                        last compression level for testing
  --sleepTime SLEEPTIME
                        frequency of repository checking in seconds

decodecorpus - tool to generate Zstandard frames for decoder testing

Command line tool to generate test .zst files.

This tool will generate .zst files with checksums, as well as optionally output the corresponding correct uncompressed data for extra verification.

Example:

./decodecorpus -ptestfiles -otestfiles -n10000 -s5

will generate 10,000 sample .zst files using a seed of 5 in the testfiles directory, with the zstd checksum field set, as well as the 10,000 original files for more detailed comparison of decompression results.

./decodecorpus -t -T1mn

will choose a random seed, and for 1 minute, generate random test frames and ensure that the zstd library correctly decompresses them in both simple and streaming modes.

paramgrill - tool for generating compression table parameters and optimizing parameters on a file given constraints

Full list of arguments

 -T#          : set level 1 speed objective
 -B#          : cut input into blocks of size # (default : single block)
 -S           : benchmarks a single run (example command: -Sl3w10h12)
    w# - windowLog
    h# - hashLog
    c# - chainLog
    s# - searchLog
    l# - minMatch
    t# - targetLength
    S# - strategy
    L# - level
 --zstd=      : Single run, parameter selection syntax same as zstdcli with more parameters
                    (Added forceAttachDictionary / fadt)
                    When invoked with --optimize, this represents the sample to exceed.
 --optimize=  : find parameters to maximize compression ratio given parameters
                    Can use all --zstd= commands to constrain the type of solution found in addition to the following constraints
    cSpeed=   : Minimum compression speed
    dSpeed=   : Minimum decompression speed
    cMem=     : Maximum compression memory
    lvl=      : Searches for solutions which are strictly better than that compression lvl in ratio and cSpeed
    stc=      : When invoked with lvl=, represents percentage slack in ratio/cSpeed allowed for a solution to be considered (Default 100%)
              : In normal operation, represents percentage slack in choosing a viable starting strategy when selecting the default parameters
                    (Lower value will begin with stronger strategies) (Default 90%)
    speedRatio=   (accepts decimals)
              : determines value of gains in speed vs gains in ratio
                    when determining overall winner (default 5 (1% ratio = 5% speed)).
    tries=    : Maximum number of random restarts on a single strategy before switching (Default 5)
                    Higher values make the optimizer run longer, with more chances to find a better solution.
    memLog    : Limits the log of the size of each memotable (1 per strategy). Will use hash tables when state space is larger than max size.
                    Setting memLog = 0 turns off memoization
 --display=   : specify which parameters are included in the output
                    can use all --zstd parameter names and 'cParams' as a shorthand for all parameters used in ZSTD_compressionParameters
                    (Default: display all params available)
 -P#          : generated sample compressibility (when no file is provided)
 -t#          : Caps runtime of operation in seconds (default : 99999 seconds (about 27 hours))
 -v           : Prints Benchmarking output
 -D           : Next argument dictionary file
 -s           : Benchmark all files separately
 -q           : Quiet, repeat for more quiet
                  -q Prints parameters + results whenever a new best is found
                  -qq Only prints parameters whenever a new best is found, prints final parameters + results
                  -qqq Only prints final parameters + results
                  -qqqq Only prints final parameter set in the form --zstd=
 -v           : Verbose, cancels quiet, repeat for more volume
                  -v Prints all candidate parameters and results

Any inputs afterwards are treated as files to benchmark.
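
The compact syntax accepted by -S (as in the example -Sl3w10h12) can be illustrated with a small parser. The letter-to-name mapping comes from the list above; the parse_single_run function is a hypothetical sketch, not paramgrill's own parsing code, and it assumes each letter is followed by a non-negative integer.

```python
import re

# Letter-to-parameter mapping taken from the -S help text.
PARAM_NAMES = {
    "w": "windowLog",
    "h": "hashLog",
    "c": "chainLog",
    "s": "searchLog",
    "l": "minMatch",
    "t": "targetLength",
    "S": "strategy",
    "L": "level",
}


def parse_single_run(spec):
    """Parse a spec such as 'l3w10h12' into a dict of named parameters.

    Raises ValueError if any part of the spec is not a known letter
    followed by digits.
    """
    params = {}
    pos = 0
    for match in re.finditer(r"([whcsltSL])(\d+)", spec):
        if match.start() != pos:
            raise ValueError(f"unexpected text at offset {pos} in {spec!r}")
        params[PARAM_NAMES[match.group(1)]] = int(match.group(2))
        pos = match.end()
    if pos != len(spec):
        raise ValueError(f"unexpected text at offset {pos} in {spec!r}")
    return params


if __name__ == "__main__":
    print(parse_single_run("l3w10h12"))
```

For the example from the help text, l3w10h12 reads as minMatch 3, windowLog 10, hashLog 12.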