Commit Graph

2060 Commits

Author SHA1 Message Date
Dave Watson
75fb878a90 decompress_generic: Add fastpath for small offsets
For small offsets of size 1, 2, 4 and 8, we can set a single uint64_t,
and then use it to do a memset() variation.  In particular, this makes
the somewhat-common RLE (offset 1) about 2-4x faster than the previous
implementation - we avoid not only the load-blocked-by-store stall, but
also the loads themselves.
2019-02-08 13:57:23 -08:00
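The small-offset idea described in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: `fill_small_offset` is a hypothetical helper, and unlike a wildcopy-style implementation it writes exactly `length` bytes rather than overshooting. The pattern is built once from the bytes already in the output, so the copy loop never has to read back what it has just stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the LZ4 source): expand a match whose
 * offset is 1, 2, 4 or 8 by repeating one precomputed 8-byte pattern.
 * `op` points just past the bytes already written, and at least `offset`
 * valid bytes must precede it. Writes exactly `length` bytes at `op`. */
static void fill_small_offset(uint8_t *op, size_t offset, size_t length)
{
    const uint8_t *src = op - offset;
    uint8_t pattern[8];
    size_t i;

    assert(offset == 1 || offset == 2 || offset == 4 || offset == 8);

    /* Build the pattern once: the last `offset` bytes, repeated to fill
     * 8 bytes. Every allowed offset divides 8, so the pattern stays in
     * phase after each 8-byte store. */
    for (i = 0; i < 8; i++)
        pattern[i] = src[i % offset];

    /* Store the pattern repeatedly; no loads from the output buffer. */
    while (length >= 8) {
        memcpy(op, pattern, 8);
        op += 8;
        length -= 8;
    }
    memcpy(op, pattern, length); /* tail of fewer than 8 bytes */
}
```

With offset 1 this degenerates into a memset-like fill, which is the RLE case the commit message singles out.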
Dave Watson
faac110e20 decompress_generic: Unroll loops a bit more
Generally we want our wildcopy loops to look like the
memcpy loops from our libc, but without the final byte copy checks.
We can unroll a bit to make long copies even faster.

The only catch is that this affects the value of FASTLOOP_SAFE_DISTANCE.
2019-02-08 13:57:23 -08:00
Dave Watson
1fbaf84306 decompress_generic: remove msan write
This store is also causing load-blocked-by-store issues, so remove it.
The msan warning will have to be fixed another way if it is still an issue.
2019-02-08 13:57:23 -08:00
Dave Watson
28b824921d decompress_generic: re-add fastpath
This is the remainder of the original 'shortcut'.  If true, we can avoid
the loop in LZ4_wildCopy, and directly copy instead.
2019-02-08 13:57:23 -08:00
Dave Watson
232f1e261f decompress_generic: drop partial copy check in fast loop
We've already checked that we are more than FASTLOOP_SAFE_DISTANCE
away from the end, so this branch can never be true: we will have
already jumped to the second decode loop.
2019-02-08 13:57:23 -08:00
Dave Watson
59332a3026 decompress_generic: Optimize literal copies
Use LZ4_wildCopy16 for variable-length literals.  For literal counts that
fit in the flag byte, copy directly.  We can also omit oend checks for
roughly the same reason as the previous shortcut:  We check once that both
match length and literal length fit in FASTLOOP_SAFE_DISTANCE, including
wildcopy distance.
2019-02-08 13:57:23 -08:00
Dave Watson
5dfa7d422b decompress_generic: optimize match copy
Add LZ4_wildCopy16, which wildcopies in 16-byte steps, potentially
smashing up to 16 bytes, and use it for match copy.  On x64, this avoids
many blocked loads due to store forwarding, similar to issue #411.
2019-02-08 13:57:23 -08:00
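A 16-byte wildcopy of the kind this commit describes might look like the sketch below. The name `wild_copy16` and its signature are assumptions for illustration and are not taken from the LZ4 source. The point is that the loop copies whole 16-byte chunks and only checks the end pointer after each chunk, so it can write past `dst_end`; the caller must guarantee that slack exists and that source and destination are at least 16 bytes apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a 16-byte wildcopy. Copies from `src` to `dst`
 * in 16-byte chunks until `dst` reaches or passes `dst_end`.
 * It always writes at least 16 bytes and may overwrite up to 16 bytes
 * beyond `dst_end`, so the output buffer needs that much slack. */
static void wild_copy16(uint8_t *dst, const uint8_t *src, uint8_t *dst_end)
{
    assert(dst <= dst_end);
    do {
        memcpy(dst, src, 16); /* fixed size: compilers turn this into wide moves */
        dst += 16;
        src += 16;
    } while (dst < dst_end);
}
```

Because there is no per-byte tail handling, the loop body stays branch-light, which is why the surrounding commits keep a safe distance from the end of the output buffer.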
Dave Watson
28356e02ad decompress_generic: Add a loop fastpath
Copy the main loop, and change checks such that op is always less
than oend-SAFE_DISTANCE.  Currently these are added for the literal
copy length check, and for the match copy length check.

Otherwise the first loop is exactly the same as the second.  Follow on
diffs will optimize the first copy loop based on this new requirement.

I also tried instead making a separate inlineable function for the copy
loop (similar to existing partialDecode flags, etc), but I think the
changes might be significant enough to warrant doubling the code, instead
of pulling out common functionality to separate functions.

This is the basic transformation that will allow several following optimisations.
2019-02-08 13:57:19 -08:00
Dave Watson
4da336062e decompress_generic: Refactor variable length fields
Make a helper function to read variable lengths for literals and
match length.
2019-02-08 13:42:42 -08:00
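For context, a helper for reading LZ4's variable-length fields could be sketched as below. The function name, signature, and return convention are assumptions made for this example, not the helper added by the commit. In the LZ4 block format a length that saturates its 4-bit token field (value 15) continues in the following bytes: each byte is added to the length, and a byte of 255 means another byte follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accumulate the extension bytes of a variable-length
 * field into *length. Returns the number of input bytes consumed, or 0 if
 * the input ends before the field does. */
static size_t read_variable_length(const uint8_t *ip, const uint8_t *iend,
                                   size_t *length)
{
    const uint8_t *start = ip;
    uint8_t b;

    assert(ip != NULL && iend != NULL && length != NULL);

    do {
        if (ip >= iend)
            return 0;      /* truncated input */
        b = *ip++;
        *length += b;      /* each byte adds 0..255 */
    } while (b == 255);    /* 255 means "more bytes follow" */

    return (size_t)(ip - start);
}
```

A caller would start with the 4-bit value from the token and invoke the helper only when that value is 15, for both literal and match lengths.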
Yann Collet
591b662124
Merge pull request #648 from aregm/fix-VS2017-solution
Build fixed by removing unavailable project
2019-02-07 10:38:09 -08:00
Areg Melik-Adamyan
71ec7fde1e Build fixed by removing unavailable project 2019-02-06 22:29:31 -06:00
Yann Collet
976cb5ca31
Merge pull request #646 from jbms/fix-clang-optimize-attribute-ppc64le
Eliminate optimize attribute warning with clang on PPC64LE
2019-02-04 14:50:28 -08:00
Jeremy Maitin-Shepard
26e7635a0e Eliminate optimize attribute warning with clang on PPC64LE 2019-02-04 12:22:56 -08:00
Yann Collet
c3f0753d30
Merge pull request #644 from lzutao/meson-msvc-export
meson: Add -DLZ4_DLL_EXPORT=1 to build dynamic lib on Windows
2019-01-23 09:03:13 -08:00
Lzu Tao
929dbbcddf meson: Add -DLZ4_DLL_EXPORT=1 to build dynamic lib on Windows
Thanks @nacho for pointing it out.
2019-01-23 15:40:26 +07:00
Yann Collet
6305e43dce
Merge pull request #638 from lzutao/travis
Travis: Clean up .travis.yml
2019-01-11 12:11:45 -08:00
Yann Collet
bc85de6ec9
Merge pull request #639 from lzutao/meson
meson: Small improvements
2019-01-11 12:11:35 -08:00
Yann Collet
94afb9a87b
Merge pull request #640 from tzakian/remove_io_globals
Remove a bunch of global variables that tracked settings for the IO module
2019-01-11 12:11:24 -08:00
Tim Zakian
c1610690b1 Add cast around malloc 2019-01-11 09:49:26 -08:00
Tim Zakian
416916146f Add constant pointer annotations 2019-01-10 20:40:00 -08:00
Tim Zakian
5822e667cc Remove a bunch of global variables that tracked settings for the IO module, and move them in to a struct 2019-01-10 15:27:47 -08:00
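The refactoring pattern in this series (globals moved into a struct, a cast around malloc, constant pointer annotations) can be illustrated with a small sketch. All type, field, and function names here are hypothetical and chosen only for the example; they are not the identifiers used in the IO module.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical settings struct replacing file-scope globals. */
typedef struct {
    int overwrite;
    int block_size_id;
    int block_checksum;
} io_prefs_t;

/* Allocate a settings object with illustrative default values. */
static io_prefs_t *io_prefs_create(void)
{
    /* Explicit cast on malloc so the file also compiles as strict C++. */
    io_prefs_t *const prefs = (io_prefs_t *)malloc(sizeof(*prefs));
    if (prefs == NULL)
        return NULL;
    prefs->overwrite = 1;
    prefs->block_size_id = 7;
    prefs->block_checksum = 0;
    return prefs;
}

static void io_prefs_free(io_prefs_t *prefs)
{
    free(prefs);
}

/* Setter takes the settings explicitly instead of touching a global.
 * Returns the value stored after the call; out-of-range ids are ignored. */
static int io_set_block_size_id(io_prefs_t *const prefs, int id)
{
    assert(prefs != NULL);
    if (id >= 4 && id <= 7)
        prefs->block_size_id = id;
    return prefs->block_size_id;
}

/* Readers take a pointer to const: they cannot modify the settings. */
static int io_get_block_size_id(const io_prefs_t *const prefs)
{
    assert(prefs != NULL);
    return prefs->block_size_id;
}
```

One benefit of this shape is that two settings objects can coexist without sharing state, which file-scope globals do not allow.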
Lzu Tao
c99b64af86 travis: Prefer apt-get in install field over addons-apt-sources 2019-01-11 04:17:34 +07:00
Lzu Tao
7fe378fc70 travis: Prefer script field over Cmd env 2019-01-11 04:17:34 +07:00
Lzu Tao
d2288d2cc0 meson: Favor warning if cannot find version string 2019-01-11 02:34:16 +07:00
Lzu Tao
4765ad88bd meson: Use library as required argument in pkgconfig 2019-01-11 02:33:27 +07:00
Lzu Tao
b3b22b9660 meson: Explicitly use meson setup to set up a builddir 2019-01-11 02:32:39 +07:00
Yann Collet
d4a40c6e39
Merge pull request #637 from tzakian/fix_pass-through_mode
Fix pass-through mode
2019-01-10 10:56:41 -08:00
Tim Zakian
9028682e7a Fix pass-through mode 2019-01-10 10:20:17 -08:00
Yann Collet
e30b1f73d4
Merge pull request #635 from tzakian/clean_call_to_LZ4HC_encodeSequence
Make effectfulness of calls to LZ4HC_encodeSequence clearer
2019-01-09 19:58:07 -08:00
Yann Collet
186015a5d2 fixed strict C++ compilation 2019-01-09 13:45:42 -08:00
Tim Zakian
81441e2462 Make clear that certain variables passed into LZ4HC_encodeSequence are changed by the function call 2019-01-09 13:42:12 -08:00
Yann Collet
baed01a9c7 fixed long sequence overflow test 2019-01-09 13:38:33 -08:00
Yann Collet
fbebf0345d minor explicit cast warning 2019-01-09 13:18:43 -08:00
Yann Collet
e953474464
Merge pull request #634 from lz4/longSeqTest
add a test to check long sequences (#631)
2019-01-09 12:22:04 -08:00
Yann Collet
c750cbe5c1
Merge pull request #631 from qiuyangs/dev
lz4hc.c: change (length >> 8) to (length / 255)
2019-01-09 12:21:39 -08:00
Yann Collet
cc34d3ff75
Merge pull request #633 from tzakian/make_block_size_public
Make LZ4F_getBlockSize public and publish in experimental section
2019-01-09 12:13:17 -08:00
Yann Collet
7741c60f98 add a test to check long sequences (#631)
the test fails, as intended,
since #631 is not merged yet in this branch.
2019-01-09 12:09:52 -08:00
Tim Zakian
4ec29b0fab Fix C90 compatibility issue 2019-01-09 11:17:46 -08:00
Tim Zakian
8193742251 Make LZ4F_getBlockSize public and publish in experimental section 2019-01-09 10:49:49 -08:00
Yann Collet
d6eac9c5cf
Merge pull request #632 from rubenochiavone/fix-lz4-extesion-not-decompressing
Fix lz4 extension in input filename not causing decompression
2019-01-09 09:21:54 -08:00
Ruben O. Chiavone
4c953b46ef Add test to cover issue #596 2019-01-09 01:51:40 -03:00
Ruben O. Chiavone
e6905b5812 Fix lz4 extension in input filename not causing decompression 2019-01-08 22:56:04 -03:00
qiuyangs
06e080ace4
Merge pull request #1 from qiuyangs/sunqiuyang-fix-length>>8
lz4hc.c: change (length >> 8) to (length / 255)
2019-01-06 16:33:53 +08:00
qiuyangs
660d21272e
lz4hc.c: change (length >> 8) to (length / 255)
Every 0xff byte in the compressed block corresponds to a length of 255 (not 256) in the input data. For long repeating sequences, using (length >> 8) may generate bad compressed blocks.
2019-01-06 16:29:30 +08:00
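The arithmetic behind this fix can be checked with a short sketch. The function names are hypothetical and the code is not taken from lz4hc.c; it only illustrates why dividing by 255 is the right count. A length remainder `rest` (the part that does not fit in the 4-bit token field) is written as `rest / 255` bytes of 0xFF followed by one final byte `rest % 255`, so an estimate based on `rest >> 8` (division by 256) comes out too small once `rest` is large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for the extension bytes of a variable-length field.
 * Writes 0xFF for every full 255 in `rest`, then one final byte < 255.
 * Returns the number of bytes written. */
static size_t write_variable_length(uint8_t *op, size_t rest)
{
    uint8_t *start = op;

    assert(op != NULL);

    for (; rest >= 255; rest -= 255)
        *op++ = 255;
    *op++ = (uint8_t)rest;

    return (size_t)(op - start);
}

/* Exact number of extension bytes: each 0xFF byte stands for 255, not 256. */
static size_t extension_bytes_exact(size_t rest)
{
    return rest / 255 + 1;
}

/* The shift-based estimate; undercounts for sufficiently long sequences. */
static size_t extension_bytes_shift(size_t rest)
{
    return (rest >> 8) + 1;
}
```

For `rest = 65535` the exact count is 258 bytes while the shift-based estimate gives 256, a shortfall of two bytes, which is the kind of mismatch that can produce a bad block for long repeating sequences.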
Yann Collet
ec735ac53e updated frame format
re-wording non-full blocks,
for clarity.
2019-01-02 15:02:22 -08:00
Yann Collet
7a4e04e6a6 updated LZ4 block format
rewording the end of block conditions
for clarity and answering related questions.
2019-01-02 14:36:12 -08:00
Yann Collet
6e24ef902a
Merge pull request #620 from lzutao/meson_symlink
Update meson symlink and man1 extension
2018-12-17 09:32:01 -08:00
Yann Collet
e5a1911ec2
Merge pull request #621 from lzutao/meson_getversion
meson: Remove unused sys import
2018-12-14 09:24:50 -08:00
Lzu Tao
e23d0fb908 meson: Remove unused sys import 2018-12-14 11:12:22 +07:00
Lzu Tao
34dcc5e16d Simplify logic by setting default value for MESON_INSTALL_DESTDIR_PREFIX 2018-12-13 18:08:01 +07:00