Commit Graph

131 Commits

Author SHA1 Message Date
W. Felix Handte
7874cf06b3 Consts and Asserts and Other Minor Nits 2018-04-20 15:30:08 -04:00
W. Felix Handte
d7347f9eea Add API for Attaching Dictionaries 2018-04-20 14:59:34 -04:00
W. Felix Handte
ca833f928f Also Reset the Chain Table 2018-04-20 14:16:27 -04:00
W. Felix Handte
8f118cf6e9 Remove inputBuffer from Context, Work Around its Absence 2018-04-20 14:08:06 -04:00
W. Felix Handte
0064e8ebc7 Remove Commented Out Support for Match Continuation over Segment Boundary 2018-04-20 13:14:37 -04:00
W. Felix Handte
14c577d4c9 Fix Signedness of Comparison 2018-04-19 20:54:35 -04:00
W. Felix Handte
f4b13e17ea Don't Clear the Dictionary Context Until No Longer Useful 2018-04-19 20:54:35 -04:00
W. Felix Handte
0abc23f72e Copy DictCtx into Working Context on Inputs Larger than 4 KB 2018-04-19 20:54:35 -04:00
W. Felix Handte
b67de2a327 Force Inline on HashChain 2018-04-19 20:54:35 -04:00
W. Felix Handte
22e16d5b50 Split DictCtx-using Code Into Separate Inlining Chain 2018-04-19 20:54:35 -04:00
W. Felix Handte
3591fe8ab8 Add Fast Reset Paths 2018-04-19 20:54:35 -04:00
W. Felix Handte
8db291bc1d Remove Match Upper Bounds Check 2018-04-19 20:54:35 -04:00
W. Felix Handte
8f9a2db0e1 Fix Some Cast/Conversion Warnings 2018-04-19 20:54:35 -04:00
W. Felix Handte
221211d7d0 Fix Offset Math 2018-04-19 20:54:35 -04:00
W. Felix Handte
a1beba13f7 Reset Stream in LZ4_compress_HC 2018-04-19 20:54:35 -04:00
W. Felix Handte
bdd7af6f71 Don't Bother Clearing Chain Table for Working Contexts 2018-04-19 20:54:35 -04:00
W. Felix Handte
895e76cc20 Push Previous Compression Offsets into the Past 2018-04-19 20:54:35 -04:00
W. Felix Handte
22db704a73 Shift Dict Limit Checks out of the Loop 2018-04-19 20:54:35 -04:00
W. Felix Handte
4f7b7a8ffa Clear Tables on Dict Load 2018-04-19 20:54:35 -04:00
W. Felix Handte
b88a0b4e88 Only Perform Dict Lookup if Attempts Remain 2018-04-19 20:54:35 -04:00
W. Felix Handte
b6c35ed642 Avoid Resetting Chain Table 2018-04-19 20:54:35 -04:00
W. Felix Handte
595ea58289 Avoid Resetting Hash Table 2018-04-19 20:54:35 -04:00
W. Felix Handte
66d217e240 Perform Lookups into the Dictionary Context 2018-04-19 20:54:35 -04:00
W. Felix Handte
fdeead0b09 Set dictCtx Rather than memcpy'ing Ctx 2018-04-19 20:54:35 -04:00
W. Felix Handte
a992d11fc2 Fully Bounds Check Hash Table Reads 2018-04-19 20:54:35 -04:00
W. Felix Handte
e75153f508 Add Debug Log Statements to HC 2018-04-19 20:54:35 -04:00
test4973
8af32ce6f7 modified a few traces for debug 2018-04-12 13:35:19 -07:00
test4973
43132af808 Merge branch 'dev' into lowAddr 2018-04-04 11:38:55 -07:00
W. Felix Handte
efc419a6d4 Replace calloc() Calls With malloc() Where Possible 2018-03-12 14:58:43 -04:00
Yann Collet
550b40849f merge lz4opt.h into lz4hc.c
Having a dedicated file for the optimal parser
made sense during its creation:
it allowed Przemyslaw to work more freely on lz4opt, with less dependency on lz4hc.
Moreover, the optimal parser was more complex, with its own search functions.

Since the optimal parser was rewritten last year, it's now a lot lighter.
It makes more sense now to integrate it directly inside lz4hc.c,
making it easier to edit (editors are a bit "lost" inside a `*.h` dependent on its #include position),
it also reduces the number of files in the project,
which fits pretty well with lz4 objectives.
(adding lz4hc requires "just" lz4hc.h and lz4hc.c).
2018-02-25 00:32:09 -08:00
Yann Collet
7173a631db edge case : compress up to end-mflimit (12 bytes)
The LZ4 block format specification
states that the last match must start
at a minimum distance of 12 bytes from the end of the block.

However, out of an abundance of caution,
the reference implementation would actually stop searching matches
at 13 bytes from the end of the block.

This patch fixes this small detail.
The new version is now able to properly compress a limit case
such as `aaaaaaaabaaa\n`
as reported by Gao Xiang (@hsiangkao).

Obviously, it doesn't change a lot of things.
This is just one additional match candidate per block, with a maximum match length of 7 (since last 5 bytes must remain literals).

With the default policy, blocks are 4 MB long, so it doesn't happen too often.
Compressing silesia.tar at default level 1 saves 5 bytes (100930101 -> 100930096).
At max level 12, it saves a grand 16 bytes (77389871 -> 77389855).

The impact is a bit more visible when blocks are smaller, hence more numerous.
For example, compressing silesia with blocks of 64 KB (using -12 -B4D) saves 543 bytes (77304583 -> 77304040).
So the smaller the packet size, the more visible the impact.

And it happens that we have a ton of scenarios with small blocks using LZ4 compression ...

And a useless "hooray" sidenote :
the patch improves the LZ4 compression record of silesia (using -12 -B7D --no-frame-crc) by 16 bytes (77270672 -> 77270656)
and the record on enwik9 by 44 bytes (371680396 -> 371680352) (previously claimed by [smallz4](http://create.stephan-brumme.com/smallz4/) ).
2018-02-24 11:47:53 -08:00
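The arithmetic behind commit 7173a631db can be checked with a minimal sketch. It assumes only the two rules quoted in the message (the last match starts at least 12 bytes before the end of the block, and the last 5 bytes stay literals). The `SKETCH_*` constants and both helper functions are hypothetical names chosen for illustration; they are not taken from lz4hc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants, named for this sketch only. */
#define SKETCH_MFLIMIT      12  /* last match starts >= 12 bytes before block end */
#define SKETCH_LASTLITERALS  5  /* last 5 bytes of a block are always literals */

/* Number of positions where a match may start when the search stops
 * `stopDistance` bytes before the end of a block of `blockSize` bytes.
 * Valid start positions are 0 .. blockSize - stopDistance. */
static size_t sketch_match_start_positions(size_t blockSize, size_t stopDistance)
{
    if (blockSize < stopDistance) return 0;   /* block too short for any match */
    return blockSize - stopDistance + 1;
}

/* Longest match that can start at `pos` without touching the last literals. */
static size_t sketch_max_match_length(size_t blockSize, size_t pos)
{
    assert(pos + SKETCH_MFLIMIT <= blockSize); /* pos must be a legal match start */
    return blockSize - SKETCH_LASTLITERALS - pos;
}
```

For the 13-byte input `aaaaaaaabaaa\n`, a stop distance of 13 leaves only position 0, which has nothing before it to match against, while a stop distance of 12 also allows position 1. A match starting there can be at most 13 - 5 - 1 = 7 bytes long, which agrees with the maximum match length of 7 stated in the message.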
Yann Collet
25b16e8a2e added one assert()
suggested by @terrelln
2018-02-20 15:25:45 -08:00
Yann Collet
d3a13397d9 slight hc speed benefit (~+1%)
by optimizing countback
2018-02-12 00:01:58 -08:00
Yann Collet
2b674bf02f slightly improved hc compression speed (+~1-2%)
by removing bad candidates faster.
2018-02-11 02:45:36 -08:00
Yann Collet
20e969e579 fuzzer: added low address compression test
is expected to work on linux+gcc only.
2018-02-05 15:19:00 -08:00
Nick Terrell
30e92f320c [lz4hc] level == 0 means default, not level 1 2018-01-22 12:50:06 -08:00
Yann Collet
0b203b04f6
Merge pull request #434 from lz4/pattern
conditional pattern analysis
2018-01-06 06:58:41 +01:00
Yann Collet
7d2f30c7d1 lz4opt supports _destSize
no longer limited to level 9
2017-12-22 12:47:59 +01:00
Yann Collet
9753ac4c91 conditional pattern analysis
Pattern analysis (currently limited to long ranges of identical bytes)
is actually detrimental to performance
when `nbSearches` is low.

The reason is that `nbSearches` provides a built-in protection for these cases.
The problem with patterns is that they dramatically increase the number of candidates to visit.
But with a low nbSearches, the match finder just aborts early.

In such cases, pattern analysis adds some complexity without reducing the total number of candidates.
It actually increases compression ratio a little bit, by filtering only "good" candidates,
but at a measurable speed cost, so it's not a good trade-off.

This patch makes pattern analysis optional.
It's enabled for levels 8+ only.
2017-12-22 08:07:25 +01:00
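The reasoning in commit 9753ac4c91 can be illustrated with a small model. This is a sketch under stated assumptions, not the lz4hc.c implementation: both function names are hypothetical, the candidate chain is reduced to a simple count, and only the "levels 8+" threshold comes from the message.

```c
#include <assert.h>

/* Model of a match-finder walk: visit candidates until either the chain
 * ends or the search budget (`nbSearches`) runs out. */
static int sketch_candidates_visited(int chainLength, int nbSearches)
{
    int visited = 0;
    assert(chainLength >= 0 && nbSearches >= 0);
    while (visited < chainLength && nbSearches > 0) {
        visited++;      /* examine one candidate */
        nbSearches--;   /* consume one unit of budget */
    }
    return visited;
}

/* Per the commit message, pattern analysis is enabled for levels 8+ only. */
static int sketch_pattern_analysis_enabled(int level)
{
    return level >= 8;
}
```

With a long run of identical bytes the chain can hold a very large number of candidates, but a small budget caps the work no matter how long the chain is: `sketch_candidates_visited(100000, 4)` visits only 4. This is why the extra filtering only pays off when the budget is large.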
Yann Collet
55da545e7a new level 10
lz4opt is only competitive with lz4hc starting at level 10.
Below that level, it doesn't match the speed / compression effectiveness of regular hc parser.

This patch proposes to extend lz4opt to levels 10-12.
The new level 10 tends to compress a bit better and a bit faster than the previous one (mileage varies depending on file).

The only downside is that `limitedDestSize` mode is now limited to max level 9 (vs 10),
since it's only compatible with regular HC parser.
(Note: I suspect it's possible to convert lz4opt to support it too, but haven't spent time on it).
2017-12-20 14:14:01 +01:00
Yann Collet
f93b595718 lz4opt: simplified match finder invocation to LZ4HC_FindLongerMatch() 2017-11-08 17:11:51 -08:00
Yann Collet
fa03a9d3d9 added code comments 2017-11-08 08:42:59 -08:00
Yann Collet
b07d36245a fixed LZ4HC_reverseCountPattern()
for multi-bytes patterns
(which is not useful for the time being)
2017-11-07 17:58:59 -08:00
Yann Collet
897f5e9834 removed the ip++ at the beginning of block
The first byte used to be skipped
to avoid an infinite self-comparison.
This is no longer necessary, since init() ensures that index starts at 64K.

The first byte is also useless to search when each block is independent,
but it's no longer the case when blocks are linked.

Removing the first-byte-skip saves
about 10 bytes / MB on files compressed with -BD4 (linked blocks of 64 KB),
which feels correct as each MB has 16 blocks of 64 KB.
2017-11-07 17:37:31 -08:00
Yann Collet
71fd08c17d removed legacy version of LZ4HC_InsertAndFindBestMatch() 2017-11-07 11:33:40 -08:00
Yann Collet
c49f66f2ad ensure pattern is a 1-byte repetition 2017-11-07 11:29:28 -08:00
Yann Collet
5512a5f1a9 removed useless (1 && ...) condition
as reported by @terrelln
2017-11-07 11:22:57 -08:00
Yann Collet
7130bfe573 improved LZ4HC_reverseCountPattern() :
works for any repetitive pattern of length 1, 2 or 4 (but not 3!)
works for any endianness
2017-11-07 11:05:48 -08:00
Yann Collet
a004c1fbee fixed LZ4HC_countPattern()
- works with byte values other than `0`
- works for any repetitive pattern of length 1, 2 or 4 (but not 3!)
- works for little and big endian systems
- preserves speed of previous implementation
2017-11-07 10:53:29 -08:00
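The "length 1, 2 or 4 (but not 3!)" restriction in commits 7130bfe573 and a004c1fbee can be illustrated with a byte-wise sketch. It assumes the pattern is represented by a 4-byte sample, which repeats with period 4; periods 1 and 2 divide 4, so such patterns are also covered, while period 3 is not. The function `sketch_count_pattern` is a hypothetical, deliberately slow stand-in and not the word-at-a-time `LZ4HC_countPattern()` from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many bytes in [ip, iEnd) continue the repetition of the 4-byte
 * sample `pattern`. Comparing byte by byte keeps the result independent of
 * endianness, at the cost of speed. */
static size_t sketch_count_pattern(const unsigned char* ip,
                                   const unsigned char* iEnd,
                                   const unsigned char* pattern)
{
    size_t n = 0;
    assert(ip <= iEnd);
    while (ip + n < iEnd && ip[n] == pattern[n & 3]) {
        n++;   /* byte n matches position (n mod 4) of the sample */
    }
    return n;
}
```

Taking the first 4 bytes of `"ababababX"` as the sample and counting from byte 4 gives 4 matching bytes before the `X`. Doing the same on `"abcabcabc"` gives 0, because the sample `abca` does not line up with the text after a 4-byte step.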
Yann Collet
aa99163752 fixed minor static analyzer warning
dead assignment
2017-11-03 12:33:55 -07:00