Commit Graph

2458 Commits

Author SHA1 Message Date
Nick Terrell
5ee5e71be3 [zstd] Add note about empty ZSTD_CDict 2018-08-23 17:48:06 -07:00
Nick Terrell
924944e471 [zstd] Reuse the ZSTD_CCtx more often with small data. 2018-08-23 17:48:06 -07:00
Jennifer Liu
9d6ed9def3 Merge fastCover into DictBuilder (#1274)
* Minor fix

* Run non-optimize FASTCOVER 5 times in benchmark

* Merge fastCover into dictBuilder

* Fix mixed declaration issue

* Add fastcover to symbol.c

* Add fastCover.c and cover.h to build

* Change fastCover.c to fastcover.c

* Update benchmark to run FASTCOVER in dictBuilder

* Undo spliting fastcover_param into cover_param and f

* Remove convert param functions

* Assign f to parameter

* Add zdict.h to Makefile in lib

* Add cover.h to BUCK

* Cast 1 to U64 before shifting

* Remove trimming of zero freq head and tail in selectSegment and rebenchmark

* Remove f as a separate parameter of tryParam

* Read 8 bytes when d is 6

* Add trimming off zero frequency head and tail

* Use best functions from COVER and remove trimming part(which leads to worse compression ratio after previous bugs were fixed)

* Add finalize= argument to FASTCOVER to specify percentage of training samples passed to ZDICT_finalizeDictionary

* Change nbDmer to always read 8 bytes even when d=6

* Add skip=# argument to allow skipping dmers in computeFrequency in FASTCOVER

* Update comments and benchmarking result

* Change default method of ZDICT_trainFromBuffer to ZDICT_optimizeTrainFromBuffer_fastCover

* Add dictType enum and fix bug about passing zParam when converting to coverParam

* Combine finalize and skip into a single parameter

* Update acceleration parameters and benchmark on 3 sample sets

* Change default splitPoint of FASTCOVER to 0.75 and benchmark first 3 sample sets

* Initialize variables outside of for loop in benchmark.c

* Update benchmark result for hg-manifest

* Remove cover.h from install-includes

* Add explanation of f

* Set default compression level for trainFromBuffer to 3

* Add assertion of fastCoverParams in DiB_trainFromFiles

* Add checkTotalCompressedSize function + some minor fixes

* Add test for multithreading fastCovr

* Initialize segmentFreqs in every FASTCOVER_selectSegment and move mutex_unnlock to end of COVER_best_finish

* Free segmentFreqs

* Initialize segmentFreqs before calling FASTCOVER_buildDictionary instead of in FASTCOVER_selectSegment

* Add FASTCOVER_MEMMULT

* Minor fix

* Update benchmarking result
2018-08-23 12:06:20 -07:00
Yann Collet
36d6165a2d Makefile: added variable SCANBUILD
so that a different version of scan-build can be selected
2018-08-16 16:44:13 -07:00
Yann Collet
1515f0bb0d fixed more issues detected by recent version of scan-build
test run on Linux
2018-08-16 15:20:25 -07:00
Yann Collet
5291d9ac31 fix scope of scan-build tests
exclude zlib code
2018-08-15 17:41:44 -07:00
Yann Collet
42a02ab745 fixed minor warnings issued by scan-build 2018-08-15 14:36:02 -07:00
Yann Collet
3692c31598 Merge branch 'dev' into scanbuild 2018-08-15 13:50:49 -07:00
Yann Collet
6e66bbf5dd fixed several minor issues detected by scan-build
only notable one :
writeNCount() resists better vs invalid distributions
(though it should never happen within zstd anyway)
2018-08-14 16:55:35 -07:00
Yann Collet
79a35ac20d minor code comments improvements 2018-08-09 15:16:31 -07:00
W. Felix Handte
2ca7c69167 Fix CDict Attachment to Handle CDicts with Non-Zero Starts
CDicts were previously guaranteed to be generated with `lowLimit=dictLimit=0`.
This is no longer true, and so the old length and index calculations are no
longer valid. This diff fixes them to handle non-zero start indices in CDicts.
2018-08-07 18:14:14 -07:00
Yann Collet
5808027abf Merge branch 'dev' into fix1241 2018-08-03 16:08:33 -07:00
Yann Collet
5892dd5da4
Merge pull request #1255 from terrelln/norm-fix
[FSE] Fix division by zero
2018-08-02 11:48:56 -07:00
Nick Terrell
dc5a67cb7b Disallow tableLog == srcLog 2018-08-02 11:12:17 -07:00
Jennifer Liu
f5228f2c44 Refactoring 2018-07-31 13:58:54 -07:00
Jennifer Liu
4e29bc2469 Use CDict instead of CCtx in analyzeEntropy 2018-07-31 10:36:45 -07:00
cyan4973
3f535007e4 fix %zu support under minGW
and relevant test on Appveyor
2018-07-30 16:56:18 +02:00
cyan4973
aade1e5904 Merge branch 'dev' into fix1241 2018-07-30 16:30:35 +02:00
Nick Terrell
9889bca530 [FSE] Fix division by zero
When the primary normalization method fails, and
`(1 << tableLog) == (maxSymbolValue + 1)`, and every symbol gets assigned
normalized weight 1 or -1 in the first loop, then the next division can
raise `SIGFPE`.
2018-07-27 17:30:03 -07:00
Yann Collet
6e490a2f09
Merge pull request #1237 from terrelln/init-cstream-adv
Set requestedParams in ZSTD_initCStream*()
2018-07-18 16:33:30 +02:00
cyan4973
9597b438e9 fix #1241
Ensure that first input position is valid for a match
even during first usage of context
by starting reference at 1
(avoiding the problematic 0).
2018-07-17 18:52:57 +02:00
cyan4973
53e1f0504e zstdmt debug traces compatibles with mingw
since mingw does not have `sys/times.h`,
remove this path when detecting mingw compilation.
2018-07-17 14:39:44 +02:00
Nick Terrell
45821fac0c
Merge pull request #1225 from jennifermliu/dev
Split samples when building dictionary for COVER
2018-07-13 13:26:15 -07:00
Nick Terrell
6d222c437c Set requestedParams in ZSTD_initCStream*()
The correct parameters are used once, but once `ZSTD_resetCStream()` is
called the default parameters (level 3) are used. Fix this by setting
`requestedParams` in the `ZSTD_initCStream*()` functions.

The added tests both fail before this patch and pass after.
2018-07-12 18:35:55 -07:00
Jennifer Liu
612b346ed5 Add explanation for split=100 2018-07-11 15:50:28 -07:00
Jennifer Liu
5021441d86 Change default splitPoint to 100 2018-07-10 11:19:33 -07:00
Jennifer Liu
456f290e31 Change back to splitPoint<=0 2018-07-09 13:53:25 -07:00
Jennifer Liu
7efabb2cf6 Only make 0.0 default splitPoint 2018-07-09 12:26:53 -07:00
Yann Collet
bbd78df59b add build macro NO_PREFETCH
prevent usage of prefetch intrinsic commands
which are not supported by c2rust
(see https://github.com/immunant/c2rust/issues/13)
2018-07-06 17:06:04 -07:00
Jennifer Liu
015a00af0f Change cover_sum back to 2 parameters and fix splitPoint issues 2018-07-06 14:24:18 -07:00
Jennifer Liu
0bbff01211 Fix testing parameter 2018-07-05 22:40:32 -07:00
Jennifer Liu
a085d1aae1 Allow splitPoint==1.0 (using all samples for both training and testing) 2018-07-05 10:38:45 -07:00
Jennifer Liu
0881184c89 Some edits based on pull request comments 2018-07-03 17:53:27 -07:00
Jennifer Liu
16e75e8804 Update minimal training sample size 2018-07-03 12:07:06 -07:00
Jennifer Liu
348e5f77a9 Add split=# to cli 2018-06-29 17:54:41 -07:00
Jennifer Liu
52fbbbcb6b Explicitly cast double to unsigned 2018-06-29 16:17:20 -07:00
Jennifer Liu
f9d19b83fb Fix variable declaration problem 2018-06-29 15:46:56 -07:00
Jennifer Liu
e061d84016 Another fix to comparator 2018-06-29 15:38:08 -07:00
Jennifer Liu
59797d3328 Fix splitPoint floating point comparison problem 2018-06-29 12:47:03 -07:00
Jennifer Liu
0ef06f2e8a Split samples into train and test sets 2018-06-29 12:33:34 -07:00
Yann Collet
121aa2c388
Merge pull request #1211 from facebook/staticAssert
updated DEBUG_STATIC_ASSERT()
2018-06-27 12:19:17 -07:00
Yann Collet
4489daec09 slightly adjusted default-distribution threshold
depending on strategy.
fast favors faster compression and decompression speeds.
2018-06-26 20:10:45 -07:00
Yann Collet
ff773bfcde zeroise freq table with memset()
improves decoding speed by ~5% in github_users sample set
2018-06-26 17:24:41 -07:00
Yann Collet
7b9bbf77c9 switched to a sizeof() version
avoid -Werror=unused-variable issue
2018-06-26 14:08:35 -07:00
Yann Collet
f98ec46979 updated DEBUG_STATIC_ASSERT()
following suggestion from #1209
2018-06-26 12:04:59 -07:00
Nick Terrell
b426bcc097
[zstdmt] Fix jobsize bugs (#1205)
[zstdmt] Fix jobsize bugs

* `ZSTDMT_serialState_reset()` should use `targetSectionSize`, not `jobSize` when sizing the seqstore.
  Add an assert that checks that we sized the seqstore using the right job size.
* `ZSTDMT_compressionJob()` should check if `rawSeqStore.seq == NULL`.
* `ZSTDMT_initCStream_internal()` should not adjust `mtctx->params.jobSize` (clamping to MIN/MAX is okay).
2018-06-25 15:21:08 -07:00
Yann Collet
3b53bfe4f3
Merge pull request #1200 from felixhandte/zstd-attach-dict-pref
Add CCtx Param Controlling Dict Attachment Behavior
2018-06-25 12:42:31 -07:00
Yann Collet
31769ce702 error on no forward progress
streaming decoders, such as ZSTD_decompressStream() or ZSTD_decompress_generic(),
may end up making no forward progress,
(aka no byte read from input __and__ no byte written to output),
due to unusual parameters conditions,
such as providing an output buffer already full.

In such case, the caller may be caught in an infinite loop,
calling the streaming decompression function again and again,
without making any progress.

This version detects such situation, and generates an error instead :
ZSTD_error_dstSize_tooSmall when output buffer is full,
ZSTD_error_srcSize_wrong when input buffer is empty.

The detection tolerates a number of attempts before triggering an error,
controlled by ZSTD_NO_FORWARD_PROGRESS_MAX macro constant,
which is set to 16 by default, and can be re-defined at compilation time.
This behavior tolerates potentially existing implementations
where such cases happen sporadically, like once or twice,
which is not dangerous (only infinite loops are),
without generating an error, hence without breaking these implementations.
2018-06-22 17:58:21 -07:00
Yann Collet
3934e010a2
Merge pull request #1197 from facebook/poolResize
Thread Pool resize
2018-06-22 14:20:07 -07:00
Yann Collet
fbd5dfc1b1 changed POOL_resize() return type to int
return is now just en error code.
This guarantee that `ctx` remains valid after POOL_resize().
Gets rid of internal POOL_free() operation.
2018-06-22 12:14:59 -07:00