Commit Graph

242 Commits

Author SHA1 Message Date
Yann Collet
0a0a212934 zstd_opt: changed cost formula
There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-null literal suite is going to be part of next sequence,
while if position ends a previous match, to immediately start another match,
next sequence will have a litlength of zero.
A litlength of zero has a non-null cost.
It follows that literals cost should be compared to match cost + litlength==0.

Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.

The "proper" formula makes it possible to remove this heuristic.

Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).

It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
2017-11-28 14:07:03 -08:00
Yann Collet
daebc7fe26 bench: slightly adjusted display format
adapt accuracy depending on value.
makes it possible to have higher accuracy for small value,
notably small compression speed.
This capability is expected to be useful while modifying optimal parser.
2017-11-18 15:54:32 -08:00
Yann Collet
5b957ba899 minor interface adjustments 2017-11-17 01:21:40 -08:00
Yann Collet
d898fb7ba6 bench: added cli command -S to benchmark multiple files separately
Currently, all files are joined by default,
they are compressed separately but benchmarked together,
providing a single final result.

Benchmarking files separately make it possible to accurately measure difference for each file.
This is expected to be useful while tuning optimal parser.
2017-11-17 00:22:55 -08:00
Yann Collet
8accfa7fcc bench: realTime is a global parameter
like most parameters not directly related to compression
2017-11-17 00:02:37 -08:00
Yann Collet
9a11f70dc3 merged repcode search into BT match search
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)

It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).

warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
2017-11-13 02:23:48 -08:00
Yann Collet
6f1dfa8adf removed line with // comment
this is for a different topic
(better parameter adaptation for small files + dictionary and/or custome parameters)
2017-11-01 17:01:45 -07:00
Yann Collet
428e8b3bf4 fix : ZSTD_compress_generic(,,,ZSTD_e_end) automatically sets pledgedSrcSize
as per documentation, on ZSTD_setPledgedSrcSize() :
> If all data is provided and consumed in a single round,
> this value (pledgedSrcSize) is overriden by srcSize instead.

This wasn't applied before compression level is transformed into compression parameters.
As a consequence, small input missed compression parameters adaptation.

It seems to work fine now : compression was compared with ZSTD_compress_advanced(),
results were the same.
2017-11-01 13:15:23 -07:00
Yann Collet
eac42534fe bench: fixed Visual warning regarding struct initialization
also :
removed dependency on zstdmt_compress.h
removed several unused macros
fileio : small code refactoring to reduce some variable scope
2017-10-19 11:56:14 -07:00
Yann Collet
d3b9547aa4 IO and bench : ZSTD_NEWAPI is the only remaining code path
removed the other 2 code paths (single thread, and ZSTDMT ones)
keeping only the new advanced API, for easier code coverage.

It shall also fix identified issue with Visual Studio
which doesn't have ZSTD_NEWAPI defined.
2017-10-18 17:01:53 -07:00
Yann Collet
18b795374a UTIL_getFileSize() returns UTIL_FILESIZE_UNKNOWN on failure
UTIL_getFileSize() used to return zero on failure.
This made it impossible to distinguish a failure from a genuine empty file.
Both cases where coalesced.

Adding UTIL_FILESIZE_UNKNOWN constant has many consequences on user code,
since in many places, the `0` was assumed to mean "error".
This is no longer the case, and the error code must be actively checked.
2017-10-17 16:14:25 -07:00
Yann Collet
f1571dad8f Merge pull request #838 from stellamplau/ldm-mergeDev
Add long distance matcher
2017-09-13 13:24:08 -07:00
Yann Collet
c95c0c9725 modified util::time API
for easier invocation.
- no longer expose frequency timer :
it's either useless, or stored internally in a static variable (init is only necessary once).
- UTIL_getTime() provides result by function return.
2017-09-12 18:12:46 -07:00
Stella Lau
eb3327c10a Merge branch 'dev' of https://github.com/facebook/zstd into ldm-mergeDev 2017-09-11 15:00:01 -07:00
Yann Collet
3128e03be6 updated license header
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
Stella Lau
eeff55dfa8 Merge remote-tracking branch 'upstream/dev' into ldm-mergeDev 2017-09-06 15:56:32 -07:00
Stella Lau
67d4a6161c Add ldmBucketSizeLog param 2017-09-02 21:55:29 -07:00
Stella Lau
a1f04d518d Move hashEveryLog to cctxParams and update cli 2017-09-01 15:05:47 -07:00
Yann Collet
0558850735 bench stops immediately on decoding error 2017-09-01 11:46:15 -07:00
Stella Lau
17d8e0bdcc Merge remote-tracking branch 'upstream/longRangeMatcher' into ldm-integrate 2017-09-01 10:19:38 -07:00
Stella Lau
8081becadc Add long distance matching as a CCtxParam 2017-09-01 09:18:58 -07:00
Yann Collet
d7ad99b2ab Merge branch 'longRangeMatcher' into dev 2017-08-31 18:08:37 -07:00
Stella Lau
6a546efb8c Add long distance matcher
Move last literals section to ZSTD_block_internal
2017-08-31 12:53:19 -07:00
Stella Lau
c88fb9267f Replace 'byReference' with enum 2017-08-29 11:55:02 -07:00
Yann Collet
32fb407c9d updated a bunch of headers
for the new license
2017-08-18 16:52:05 -07:00
Yann Collet
7bd1a2900e added ZSTD_dictMode_e to control dictionary loading mode 2017-06-21 11:50:33 -07:00
Yann Collet
c08e649e95 first implementation of bench.c with new API ZSTD_compress_generic()
Doesn't speed optimize this buffer-to-buffer scenario yet.
Still internally defers to streaming implementation.

Also : fixed a long standing bug in ZSTDMT streaming API.
2017-06-19 18:25:35 -07:00
Yann Collet
fe234bf48b fix attempts : fullbench for VS2008 2017-06-19 15:23:19 -07:00
Yann Collet
31533bacce Changed ZSTD_createCDict_advanced()
It now only uses compressionParameters as argument.
It produces many changes throughout user code,
though hopefully they tend to be simple :
just provide the cParams part from existing ZSTD_parameters.

Some programs might depend on ZSTD_createCDict_advanced() to pass frame parameters.
This change will force them to revisit this strategy and fix it,
since frame parameters are effectively silently ignored in current version.
2017-04-27 00:29:04 -07:00
Sean Purcell
42bac7fa84 Change ifndef's to undef's 2017-04-13 15:35:05 -07:00
Sean Purcell
f876f1200c Fix compilation on macOS 2017-04-13 12:33:45 -07:00
Yann Collet
c2007388a5 fixed bench.c : optional advanced parameters applied
before creating cdict
2017-04-04 15:35:06 -07:00
Yann Collet
81d6380139 minor bench.c adjustments
shorter debug messages
no need to check decompressedLength==0 twice
2017-04-04 15:21:09 -07:00
Yann Collet
5bde4be5a1 fix : bench automatically adapts parameters to srcSize 2017-03-29 12:10:38 -07:00
Sean Purcell
042ba122ae Change g_displayLevel to int and fix DISPLAYUPDATE flush 2017-03-23 11:21:59 -07:00
Yann Collet
c1c040eae1 added gzip tests
also : made sure zstd --format=gzip -V
would fail if gzip compatibility is not supported
2017-03-01 16:49:20 -08:00
Przemyslaw Skibinski
74dcd8d15f bench.c: use a single ticksPerSecond 2017-02-21 12:22:05 +01:00
Przemyslaw Skibinski
e052c60540 introduce UTIL_freq_t 2017-02-20 11:27:11 +01:00
Sean Purcell
5069b6c2c3 Merge branch 'dev' into multiframe 2017-02-10 10:08:55 -08:00
Sean Purcell
4e709712e1 Decompressed size functions now handle multiframes and distinguish cases
- Add ZSTD_findDecompressedSize
    - Traverses multiple frames to find total output size
- Add ZSTD_getFrameContentSize
    - Gets the decompressed size of a single frame by reading header
- Deprecate ZSTD_getDecompressedSize
2017-02-08 14:50:10 -08:00
Przemyslaw Skibinski
d05014c739 added the "--rt-prio" option 2017-02-07 16:48:01 +01:00
Przemyslaw Skibinski
94abd6a26c SET_REALTIME_PRIORITY 2017-02-07 16:36:19 +01:00
Nick Terrell
83c387eb8e Fix zstdmt_compress.h include 2017-01-26 15:25:32 -08:00
Yann Collet
f8804d1014 convert tabs to space
joys of using multiple editors from multiple environments ...
2017-01-20 17:23:19 -08:00
cyan4973
5fba09fa41 updated util's time for Windows compatibility
Correctly measures time on Posix systems when running with
Multi-threading

Todo : check Windows measurement under multi-threading
2017-01-20 12:57:31 -08:00
Yann Collet
458c8a94b4 minor refactoring : cleaner MT integration within bench 2017-01-19 17:44:15 -08:00
Yann Collet
500014af49 zstd cli can now compress using multi-threading
added : command -T#
added : ZSTD_resetCStream() (zstdmt_compress)
added : FIO_setNbThreads()  (fileio)
2017-01-19 17:04:28 -08:00
Yann Collet
736788f8e8 added streaming fuzzer tests for MT API
Also : fixed corner case, where nb of jobs completed becomes > jobQueueSize
which is possible when many flushes are issued
while there is not enough dst buffer to flush completed ones.
2017-01-19 12:15:29 -08:00
Yann Collet
5eb749e734 ZSTDMT_compress() creates a single frame
The new strategy involves cutting frame at block level.
The result is a single frame, preserving ZSTD_getDecompressedSize()

As a consequence, bench can now make a full round-trip,
since the result is compatible with ZSTD_decompress().

This strategy will not make it possible to decode the frame with multiple threads
since the exact cut between independent blocks is not known.
MT decoding needs further discussions.
2017-01-11 18:21:25 +01:00
Yann Collet
f1cb55192c fixed linux warnings 2017-01-02 01:11:55 +01:00
Yann Collet
0ec6a95ba1 minor fixes 2017-01-02 00:49:42 +01:00
Yann Collet
c6a6417458 bench correctly measures time for multi-threaded compression (posix only) 2016-12-31 03:31:26 +01:00
Yann Collet
e70912c72b Changed : input divided into roughly equal parts.
Debug : can measure time waiting for mutexes to unlock.
2016-12-29 01:24:01 +01:00
Yann Collet
ab7a579180 added -T command , to set nb of threads 2016-12-28 16:11:09 +01:00
Yann Collet
3d93f2fce7 first zstdmt sketch 2016-12-27 07:19:36 +01:00
Yann Collet
8333106b8a Merge branch 'dev' of github.com:facebook/zstd into dev 2016-12-21 16:44:24 +01:00
Yann Collet
1f57c2ed32 added : ZSTD_createCDict_byReference() 2016-12-21 16:20:11 +01:00
Przemyslaw Skibinski
7a8a03c20d util.h: restore BSD license for Facebook Open-Source 2016-12-21 15:08:44 +01:00
Przemyslaw Skibinski
e679741b18 _CRT_SECURE_NO_WARNINGS moved to util.h 2016-12-21 13:47:11 +01:00
Przemyslaw Skibinski
2f6ccee6af platform.h: removed Compiler Options 2016-12-21 13:23:34 +01:00
Przemyslaw Skibinski
16ae6563a2 executables use new util.h and platform.h 2016-12-21 09:06:14 +01:00
Przemyslaw Skibinski
f8046b8e72 Merge remote-tracking branch 'refs/remotes/facebook/dev' into v112
# Conflicts:
#	appveyor.yml
2016-12-19 08:20:26 +01:00
Przemyslaw Skibinski
b866e72826 tools use platform.h 2016-12-16 14:24:01 +01:00
Przemyslaw Skibinski
c71e552b2e fixed "strategy" in advanced compression parameters 2016-12-13 20:04:32 +01:00
Przemyslaw Skibinski
897b8bb5eb bench.c: support advanced compression parameters 2016-12-13 13:03:41 +01:00
Yann Collet
e63c631aaf decode benchmark, multi-files 2016-12-06 17:46:49 -08:00
Yann Collet
d946501d2c decode benchmark - single file (hidden option) 2016-12-06 16:49:23 -08:00
Yann Collet
167c494748 Merge branch 'dev' of github.com:facebook/zstd into dev 2016-11-29 14:05:15 -08:00
Yann Collet
4f5350f610 long matches support overflow 2016-11-29 13:12:24 -08:00
Przemyslaw Skibinski
fd0ac93024 bench.c: use ZSTD_maxCLevel() 2016-11-23 21:45:29 +01:00
Przemyslaw Skibinski
5ddcd9d9ae bench.c: fixed MAX_CLEVEL 2016-11-21 16:37:56 +01:00
Przemyslaw Skibinski
2558b4cdbc bench.c without dict uses ZSTD_compressCCtx 2016-11-18 11:46:30 +01:00
Przemyslaw Skibinski
26306fcacf BMK_SetNbIterations renamed to BMK_SetNbSeconds 2016-11-03 11:38:01 +01:00
Yann Collet
7ae67bb18a small compression speed gains with using_CDict 2016-09-06 06:28:05 +02:00
Yann Collet
4ded9e591c added boilerplate 2016-08-30 11:06:28 -07:00
Yann Collet
87c18b2ebd fixed multiple minor warnings for XCode 2016-08-26 01:43:47 +02:00
Yann Collet
0baa64a763 increased maximum memory size for 64-bits bench to 16 GB 2016-08-25 22:54:13 +02:00
Yann Collet
d1733f7417 fixed crc bug in rare timing conditions within bench.c 2016-08-21 01:04:46 +02:00
inikep
5a54870047 fixed Intel Compiler warnings with Visual Studio
http://encode.ru/threads/2119-Zstandard?p=49504&viewfull=1#post49504
2016-08-18 09:00:25 +02:00
inikep
7132fb15ba bench.c: removed benchResult_t 2016-08-10 14:59:18 +02:00
Yann Collet
6a21971f4a bench : implemented avgSize 2016-08-03 00:06:24 +02:00
Yann Collet
bf2bc112bb bench : controlled display update when loading lot of files 2016-08-02 23:48:13 +02:00
Yann Collet
de4c04f6c2 Fixed : ZSTD_compress* can compress > 4 GB in a single pass, reported by Nick Terrell 2016-08-02 11:27:05 +02:00
Yann Collet
a9febe81ae changed bench behavior for slow compression levels 2016-08-01 13:40:52 +02:00
Yann Collet
235911e13f removed "avg" evaluation from bench -q
removed "sleeping" notification from bench -q
2016-07-31 01:32:48 +02:00
Yann Collet
3c242e79d3 updated compression levels table 2016-07-13 14:56:24 +02:00
Yann Collet
3d2cd7f816 Introduced ZSTD_getParams()
bench now uses ZSTD_createCDict_advanced()
2016-06-27 15:12:26 +02:00
Yann Collet
ec224d256d removed useless context 2016-06-27 13:39:30 +02:00
Yann Collet
4c56f4a3cf fixed error messages 2016-06-27 13:36:54 +02:00
inikep
d7d251ccb5 bench.c: added support for ZSTD_GIT_COMMIT 2016-06-22 16:13:25 +02:00
inikep
f2f59d758e test-zstd-speed.py: added ZSTD_GIT_COMMIT 2016-06-22 15:42:26 +02:00
inikep
9bf5357101 bench.c: use ZSTD_VERSION_STRING 2016-06-21 11:01:29 +02:00
Yann Collet
ee1a084852 Integrated new dictionary API into bench module 2016-06-07 01:40:49 +02:00
Yann Collet
2cc72f1fd3 fixed initialization issue in bench 2016-06-06 17:50:07 +02:00
Yann Collet
d3b7f8d21f Merged zstd_static.h into zstd.h . Now requires ZSTD_STATIC_LINKING_ONLY macro 2016-06-04 19:47:02 +02:00
Yann Collet
70d1301d6e Changed ZSTD_adjustCParams() prototype
`ZSTD_adjustCParams()` is now automatically invoked at the end of `ZSTD_getCParams()`
2016-06-01 18:45:34 +02:00
inikep
957823f56f zstdcli: -r (operate recursively on directories) works with dictBuilder and compression 2016-05-25 15:30:55 +02:00
Yann Collet
e162aceeb6 minor simplification 2016-05-20 11:35:00 +02:00
inikep
0bdb6a8118 changed definition of UTIL_createFileList 2016-05-13 10:52:02 +02:00
inikep
4dbf7f4a3b dynamic memory allocation in UTIL_createFileList 2016-05-11 14:11:00 +02:00