Commit Graph

85 Commits

Author SHA1 Message Date
Nick Terrell
e55da9e963 Wrap the new advanced api completely 2019-03-21 10:54:40 -07:00
Nick Terrell
3d7377b874 [libzstd] Handle uncompressed literals 2019-02-15 14:58:11 -08:00
Nick Terrell
f9513115e4 [libzstd] Add ZSTD_c_literalCompressionMode flag
It controls the literals compression. It is either
`auto`, `huffman`, or `uncompressed`. It defaults to
`auto`, which is the current behavior.
2019-02-13 14:59:22 -08:00
Yann Collet
e980ba212f
Merge pull request #1471 from facebook/nofloat
guard functions using floating point for debug mode only
2018-12-23 12:35:51 -08:00
Yann Collet
c9dfb7e445 guard functions using floating point for debug mode only
they are only used to print debug messages.
Requested in #1386,
2018-12-22 09:09:40 -08:00
Yann Collet
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet
95784c654c fixed shadowing of stat variable
some standard lib declares a `stat` variable at global scope
shadowing local declarations ....
2018-12-20 14:56:44 -08:00
Yann Collet
8e0e495ce8 fixed: compression ratio discrepancy
depending on initialization,
the first byte of a new frame was invalidated or not.

As a consequence, one match opportunity was available or not,
resulting in slightly different compressed sizes
(on average, 1 or 2 bytes once every 20 frames).

It impacted ratio comparison between one-shot and streaming modes.

This fix makes the first byte of a new frame always a valid match.
Now compressed size is always the same.
It also improves compressed size by a negligible amount.
2018-12-19 10:11:06 -08:00
Yann Collet
eee789b7ea continued: changed to overlapLog
in deeper code layer.
for consistency.
2018-12-11 17:41:42 -08:00
Yann Collet
41c7d0b1e1 changed hashEveryLog into hashRateLog 2018-11-21 14:36:57 -08:00
Nick Terrell
b9693d3a49 [lib] Add rsyncable mode
- Add rsyncable mode to multithreaded mode
- Factor out LDM's hash function for reuse
2018-11-14 16:59:57 -08:00
W. Felix Handte
4127de5fa6 Switch Enum to Only Non-Negative Values, Update Comments 2018-11-12 12:47:47 -08:00
Yann Collet
8d56f4baee added a few comments for clarifications 2018-10-26 15:21:52 -07:00
W. Felix Handte
6cb2454646 Remove CParams from Block Compressor Functions' Args 2018-09-28 17:10:42 -07:00
W. Felix Handte
76ef87ed9d Add ZSTD_compressionParameters to ZSTD_matchState_t 2018-09-28 17:10:42 -07:00
Nick Terrell
5e580de6da [zstd] Fix seqStore growth
We could undersize the literals buffer by up to 11 bytes,
due to a combination of 2 bugs:
* The literals buffer didn't have `WILDCOPY_OVERLENGTH` extra
  space, like it is supposed to.
* We didn't check the literals buffer size in `ZSTD_sufficientBuff()`.
2018-08-28 13:24:44 -07:00
Nick Terrell
924944e471 [zstd] Reuse the ZSTD_CCtx more often with small data. 2018-08-23 17:48:06 -07:00
W. Felix Handte
01bb1c1016 Add CCtx Param Controlling Dict Attachment Behavior 2018-06-21 17:29:25 -04:00
Yann Collet
fa41bcc2c2 grouped debug functions into debug.h
There were 2 competing set of debug functions
within zstd_internal.h and bitstream.h.
They were mostly duplicate, and required care to avoid messing with each other.

There is now a single implementation, shared by both.

Significant change :
The macro variable ZSTD_DEBUG does no longer exist,
it has been replaced by DEBUGLEVEL,
which required modifying several source files.
2018-06-13 15:43:09 -04:00
Yann Collet
3050733042 Merge branch 'dev' into negLevels 2018-06-07 15:51:35 -07:00
Yann Collet
a57b4df85f removed literalCompression directive
in this version, literal compression is always disabled for ZSTD_fast strategy.

Performance parity between ZSTD_compress_advanced() and ZSTD_compress_generic()
2018-06-07 15:24:12 -07:00
Yann Collet
e5e17d009f changed member name to workSpaceOversizedDuration 2018-06-06 15:00:27 -07:00
Yann Collet
3d523c741b added workSpaceTooLarge and workSpaceWasteful
also :
slightly increased speed of test fuzzer.16
2018-06-05 11:42:48 -07:00
Yann Collet
2108decb41 Fixed a nasty corruption bug
recently introduce into the new dictionary mode.
The bug could be reproduced with this command :
./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639

error was in function ZSTD_count_2segments() :
the beginning of the 2nd segment corresponds to prefixStart
and not the beginning of the current block (istart == src).
This would result in comparing the wrong byte.
2018-06-01 18:54:34 -07:00
Yann Collet
463a0fe38b simplified optimal parser
removed "cached" structure.
prices are now saved in the optimal table.

Primarily done for simplification.
Might improve speed by a little.
But actually, and surprisingly, also improves ratio in some circumstances.
2018-05-29 14:07:25 -07:00
Yann Collet
f6ad59ab5c Merge branch 'dev' into staticDictCost 2018-05-24 16:21:02 -07:00
W. Felix Handte
298d24fa57 Make loadedDictEnd an Index, not the Dict Len 2018-05-23 17:53:03 -04:00
W. Felix Handte
3ba70cc759 Clear the Dictionary When Sliding the Window 2018-05-23 17:53:03 -04:00
W. Felix Handte
191fc74a51 Rename 'hasDict' to 'dictMode' 2018-05-23 17:53:03 -04:00
W. Felix Handte
ae4fcf7816 Respond to PR Comments; Formatting/Style/Lint Fixes 2018-05-23 17:53:03 -04:00
W. Felix Handte
b67196f30d Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff 2018-05-23 17:53:03 -04:00
W. Felix Handte
265c2869d1 Split Wrapper Functions to Cause Inlining 2018-05-23 17:53:03 -04:00
W. Felix Handte
8d24ff0353 Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
W. Felix Handte
d18a405779 Refer to the Dictionary Match State In-Place (Sometimes) 2018-05-23 17:53:03 -04:00
Nick Terrell
e3959d5eba Fixes 2018-05-22 16:06:33 -07:00
Nick Terrell
49cf880513 Approximate FSE encoding costs for selection
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.

* The cost of `set_basic` is computed using the cross-entropy cost
  function `ZSTD_crossEntropyCost()`, using the normalized default count
  and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
  previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
  `ZSTD_entropyCost()`, together with the cost of writing the normalized
  count `ZSTD_NCountCost()`.
2018-05-22 14:33:22 -07:00
Yann Collet
a95e9e80d1 adding some debug functions to observe statistics 2018-05-18 14:09:42 -07:00
Yann Collet
8572b4d09f fixed a pretty complex bug when combining ldm + btultra 2018-05-17 16:13:53 -07:00
Yann Collet
a243020d37 slightly improved weight calculation
translating into a tiny compression ratio improvement
2018-05-17 11:19:44 -07:00
Yann Collet
18fc3d3cd5 introduced bit-fractional cost evaluation
this improves compression ratio by a *tiny* amount.
It also reduces speed by a small amount.

Consequently, bit-fractional evaluation is only turned on for btultra.
2018-05-16 14:53:35 -07:00
Yann Collet
2c26df0e13 opt: removed static prices
after testing, it's actually always better to use dynamic prices
albeit initialised from dictionary.
2018-05-14 18:04:08 -07:00
Yann Collet
761758982e replaced FSE_count by FSE_count_simple
to reduce usage of stack memory.

Also : tweaked a few comments, as suggested by @terrelln
2018-05-11 16:03:37 -07:00
Yann Collet
74b1c75d64 btopt : minor adjustment of update frequencies 2018-05-10 16:32:36 -07:00
Yann Collet
338f738c24 pass entropy tables to optimal parser
for proper estimation of symbol's weights
when using dictionary compression.

Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
2018-05-08 15:37:06 -07:00
Yann Collet
a155061328 minor code refactor for readability
removed some useless operations from optimal parser
(should not change performance, too small a difference)
2018-05-08 12:32:44 -07:00
Nick Terrell
295ab0dbfa Only load extra table positions for CDicts
Zstdmt uses prefixes to load the overlap between segments. Loading extra
positions makes compression non-deterministic, depending on the previous
job the context was used for. Since loading extra position takes extra
time as well, only do it when creating a `ZSTD_CDict`.

Fixes #1077.
2018-04-02 14:41:30 -07:00
Yann Collet
a99c4a3621 Merge branch 'dev' into advancedDecompress 2018-03-21 06:08:28 -07:00
Yann Collet
87b0cf05bd
Merge pull request #1057 from facebook/lrmSettings
LRM parameters
2018-03-21 05:59:39 -07:00
Yann Collet
6873fec658 changed dictMore for dictContentType
which seems clearer to describe what the variable/argument is about.
2018-03-20 15:13:14 -07:00
Yann Collet
6f4d0778a5 make it possible to express compression parameters in any order 2018-03-19 14:41:23 -07:00