Vivek Miglani
c7be7d2efb
Fixing compressed block size checks
2019-07-17 12:53:15 -07:00
Vivek Miglani
3f108f82fb
Return error if block size exceeds maximum
2019-07-15 12:10:21 -07:00
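The two block-size commits above can be illustrated by a minimal sketch of parsing a Zstandard block header. It assumes the block layout from the Zstandard format specification (RFC 8878). The header is 3 bytes, little-endian. Bit 0 is `Last_Block`, bits 1-2 are `Block_Type`, and bits 3-23 are `Block_Size`. `Block_Maximum_Size` is the smaller of the window size and 128 KiB. The names `parse_block_header` and `block_header_t`, and the `-1` error convention, are hypothetical and are not libzstd's internal API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Absolute ceiling on block size from the format spec: 128 KiB. */
#define BLOCK_SIZE_MAX_ABSOLUTE ((size_t)1 << 17)

typedef enum { BT_RAW = 0, BT_RLE = 1, BT_COMPRESSED = 2, BT_RESERVED = 3 } block_type_e;

typedef struct {
    int last_block;
    block_type_e type;
    size_t size;
} block_header_t;

/* Hypothetical helper: decode a 3-byte block header and reject any block
 * whose size exceeds min(block_size_max, 128 KiB).
 * Returns 0 on success, -1 on error. */
static int parse_block_header(const uint8_t *src, size_t src_size,
                              size_t block_size_max, block_header_t *out)
{
    uint32_t raw;
    if (src_size < 3) return -1;                      /* truncated header */
    raw = (uint32_t)src[0] | ((uint32_t)src[1] << 8) | ((uint32_t)src[2] << 16);
    out->last_block = (int)(raw & 1);
    out->type = (block_type_e)((raw >> 1) & 3);
    out->size = (size_t)(raw >> 3);
    if (out->type == BT_RESERVED) return -1;          /* reserved block type */
    if (block_size_max > BLOCK_SIZE_MAX_ABSOLUTE) block_size_max = BLOCK_SIZE_MAX_ABSOLUTE;
    if (out->size > block_size_max) return -1;        /* block exceeds maximum */
    return 0;
}
```

The size check runs before any payload is touched. A corrupt header is therefore rejected before the decoder sizes buffers or loops based on it.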
mgrice
812e8f2a16
perf improvements for zstd decode (#1668)
* perf improvements for zstd decode
tl;dr: 7.5% average decode speedup on the Silesia corpus at compression levels 1-3 (Sandy Bridge)
Background: while investigating zstd perf differences between clang and gcc, I noticed that even though gcc is vectorizing the loop in wildcopy, it was not doing so as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied: the loop trip count is rarely above 1, yet long copies are common enough to make their performance important. The code in zstd_decompress.c that invokes wildcopy handles the latter well, but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied (the left-hand side is the good one; the right-hand side has the vectorizer on):
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it is turned off via both an attribute and a flag. I found that neither the attribute nor the command-line flag was entirely successful at turning off vectorization on its own, which is why both are used.
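Silesia benchmark data. For each file, the first triad of rows is with this patch and the second triad is with the original code; "change" is the decode speedup of the patched code over the original at the same level:
level#file orig-> compressed(ratio), encode decode change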
1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s
2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s
3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s
1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60%
2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01%
3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99%
1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s
2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s
3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s
1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30%
2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20%
3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50%
1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s
2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s
3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s
1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76%
2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54%
3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89%
1#nci 33553445-> 2849306(11.78), 575.2MB/s 1335.8MB/s
2#nci 33553445-> 2890166(11.61), 509.3MB/s 1238.1MB/s
3#nci 33553445-> 2857408(11.74), 431MB/s 1210.7MB/s
1#nci 33553445-> 2849306(11.78), 565.4MB/s 1220.2MB/s 9.47%
2#nci 33553445-> 2890166(11.61), 508.2MB/s 1128.4MB/s 9.72%
3#nci 33553445-> 2857408(11.74), 429.1MB/s 1097.7MB/s 10.29%
1#ooffice 6152192-> 3590954(1.713), 231.4MB/s 662.6MB/s
2#ooffice 6152192-> 3323931(1.851), 162.8MB/s 592.6MB/s
3#ooffice 6152192-> 3145625(1.956), 99.9MB/s 549.6MB/s
1#ooffice 6152192-> 3590954(1.713), 224.7MB/s 624.2MB/s 6.15%
2#ooffice 6152192-> 3323931(1.851), 155MB/s 564.5MB/s 4.98%
3#ooffice 6152192-> 3145625(1.956), 101.1MB/s 521.2MB/s 5.45%
1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s
2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s
3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s
1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41%
2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42%
3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68%
1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s
2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s
3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s
1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26%
2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36%
3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38%
1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s
2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s
3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s
1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45%
2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93%
3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04%
1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s
2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s
3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s
1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70%
2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85%
3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31%
1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s
2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s
3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s
1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89%
2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43%
3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33%
1#xml 5345280-> 696652(7.673), 485MB/s 1333.9MB/s
2#xml 5345280-> 681492(7.843), 405.2MB/s 1237.5MB/s
3#xml 5345280-> 639057(8.364), 328.5MB/s 1281.3MB/s
1#xml 5345280-> 696652(7.673), 473.1MB/s 1232.4MB/s 8.24%
2#xml 5345280-> 681492(7.843), 398.6MB/s 1145.9MB/s 7.99%
3#xml 5345280-> 639057(8.364), 327.1MB/s 1175MB/s 9.05%
1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s
2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s
3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s
1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52%
2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00%
3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37%
average decode change: 7.51%
* makefile changed to only pass -fno-tree-vectorize to gcc
* Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix C90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
2019-07-11 18:31:07 -04:00
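The wildcopy behavior described in this perf commit can be sketched as follows. This is a simplified, hypothetical stand-in for zstd's internal `ZSTD_wildcopy()`, not its actual code. It copies whole 16-byte chunks (`COPY_CHUNK`, an assumed size), so it may write up to `COPY_CHUNK - 1` bytes past `dst + length`. The caller must provide that slack and keep `src` and `dst` at least one chunk apart. The `NO_TREE_VECTORIZE` macro is also hypothetical. It shows one way to apply GCC's `optimize("no-tree-vectorize")` function attribute on GCC only, since clang also defines `__GNUC__`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COPY_CHUNK 16

/* Apply GCC's per-function attribute only on real GCC; clang defines
 * __GNUC__ as well but does not support this attribute. */
#if defined(__GNUC__) && !defined(__clang__)
#  define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#  define NO_TREE_VECTORIZE
#endif

/* Fixed-size memcpy; compilers typically lower this to a single
 * unaligned 16-byte load/store pair. */
static void copy16(void *dst, const void *src) { memcpy(dst, src, COPY_CHUNK); }

/* Hypothetical wildcopy: writes whole COPY_CHUNK-byte chunks until at least
 * `length` bytes are copied (always at least one chunk). */
NO_TREE_VECTORIZE
static void wildcopy(void *dst, const void *src, ptrdiff_t length)
{
    unsigned char *op = (unsigned char *)dst;
    const unsigned char *ip = (const unsigned char *)src;
    unsigned char *const oend = op + length;
    /* Most calls need a single chunk, so the first copy is unconditional
     * and the loop usually exits after one comparison. */
    do {
        copy16(op, ip);
        op += COPY_CHUNK;
        ip += COPY_CHUNK;
    } while (op < oend);
}
```

The design trades a bounded over-write for having no per-byte tail loop. That suits the length distribution the commit describes, where a trip count of 1 is the common case.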
Ephraim Park
c7c1ba3a19
Fix a constraint stricter than the spec
2019-06-26 16:43:37 -07:00
Nick Terrell
5f228f8db2
[libzstd] Add a ZSTD_STATIC_ASSERT for BIT_DStream_status
2019-04-23 14:22:16 -07:00
Nick Terrell
a892e25374
[libzstd] Error if all sequence bits aren't consumed
2019-04-23 14:07:36 -07:00
Nick Terrell
ee130a9889
[libzstd] Check the size in readSkippableFrameSize()
2019-04-17 11:41:55 -07:00
Nick Terrell
450feb0f95
[libzstd] Fix ZSTD_decompressBound() on bad skippable frames
The function didn't verify that the skippable frame size is correct.
2019-04-17 11:29:42 -07:00
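The skippable-frame checks in the two commits above come down to validating the declared size against the input length. This sketch assumes the skippable frame layout from the Zstandard format spec. A frame starts with a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F. A 4-byte little-endian payload size follows, then the payload. The helper name `read_skippable_frame_size` and its 0-on-error convention are hypothetical. They are not the signature of libzstd's internal `readSkippableFrameSize()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKIPPABLE_MAGIC_START 0x184D2A50u  /* low 4 bits are user-selectable */
#define SKIPPABLE_MAGIC_MASK  0xFFFFFFF0u
#define SKIPPABLE_HEADER_SIZE 8u           /* 4-byte magic + 4-byte size */

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the total size (header + payload) of the
 * skippable frame at src, or 0 if it is malformed or truncated. */
static size_t read_skippable_frame_size(const uint8_t *src, size_t src_size)
{
    uint32_t payload;
    if (src_size < SKIPPABLE_HEADER_SIZE) return 0;
    if ((read_le32(src) & SKIPPABLE_MAGIC_MASK) != SKIPPABLE_MAGIC_START) return 0;
    payload = read_le32(src + 4);
    /* Compare against the remaining bytes instead of computing
     * header + payload first, which could overflow a 32-bit size_t. */
    if (payload > src_size - SKIPPABLE_HEADER_SIZE) return 0;
    return SKIPPABLE_HEADER_SIZE + (size_t)payload;
}
```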
Josh Soref
a880ca239b
Spelling (#1582)
* spelling: accidentally
* spelling: across
* spelling: additionally
* spelling: addresses
* spelling: appropriate
* spelling: assumed
* spelling: available
* spelling: builder
* spelling: capacity
* spelling: compiler
* spelling: compressibility
* spelling: compressor
* spelling: compression
* spelling: contract
* spelling: convenience
* spelling: decompress
* spelling: description
* spelling: deflate
* spelling: deterministically
* spelling: dictionary
* spelling: display
* spelling: eliminate
* spelling: preemptively
* spelling: exclude
* spelling: failure
* spelling: independence
* spelling: independent
* spelling: intentionally
* spelling: matching
* spelling: maximum
* spelling: meaning
* spelling: mishandled
* spelling: memory
* spelling: occasionally
* spelling: occurrence
* spelling: official
* spelling: offsets
* spelling: original
* spelling: output
* spelling: overflow
* spelling: overridden
* spelling: parameter
* spelling: performance
* spelling: probability
* spelling: receives
* spelling: redundant
* spelling: recompression
* spelling: resources
* spelling: sanity
* spelling: segment
* spelling: series
* spelling: specified
* spelling: specify
* spelling: subtracted
* spelling: successful
* spelling: return
* spelling: translation
* spelling: update
* spelling: unrelated
* spelling: useless
* spelling: variables
* spelling: variety
* spelling: verbatim
* spelling: verification
* spelling: visited
* spelling: warming
* spelling: workers
* spelling: with
2019-04-12 11:18:11 -07:00
Nick Terrell
aafe97b67d
[libzstd] Switch dictUses to an enum
2019-04-10 16:50:35 -07:00
Nick Terrell
50b9c41196
[libzstd] Fix decompression dictionary bugs and clean up initialization
Bugs:
* `ZSTD_DCtx_refPrefix()` didn't clear the dictionary after the first
use. Fix and add a test case.
* `ZSTD_DCtx_reset()` always cleared the dictionary. Fix and add a test
case.
* After calling `ZSTD_resetDStream()` you could no longer load a
dictionary, since the stage was set to `zdss_loadHeader`. Fix and add
a test case.
Cleanup:
* Make `ZSTD_initDStream*()` and `ZSTD_resetDStream()` wrap the new
advanced API, and add test cases.
* Document the equivalent of these functions in the advanced API and
document the unstable functions as deprecated.
2019-04-10 12:59:02 -07:00
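The single-use versus persistent dictionary semantics behind these fixes (and the `dictUses` enum from the commit above) can be modeled in a small sketch. All names here are hypothetical and only loosely modeled on libzstd internals: `dict_uses_e`, `toy_dctx`, `toy_ref_prefix`, `toy_load_dictionary`, `toy_reset` and `toy_dict_for_frame`. In this model, a referenced prefix applies to the next frame only, while a loaded dictionary applies to every frame. A session-only reset keeps the dictionary and a parameter reset drops it. That is one reading of the `ZSTD_DCtx_reset()` fix; `zstd.h` documents the actual contract.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of how long a dictionary stays attached. */
typedef enum {
    DICT_USE_INDEFINITELY = -1, /* loaded dictionary: every frame */
    DICT_DONT_USE = 0,          /* no dictionary */
    DICT_USE_ONCE = 1           /* referenced prefix: next frame only */
} dict_uses_e;

typedef enum { RESET_SESSION_ONLY, RESET_PARAMETERS } reset_e;

typedef struct {
    const void *dict;
    dict_uses_e uses;
} toy_dctx;

static void toy_load_dictionary(toy_dctx *d, const void *dict)
{ d->dict = dict; d->uses = DICT_USE_INDEFINITELY; }

static void toy_ref_prefix(toy_dctx *d, const void *prefix)
{ d->dict = prefix; d->uses = DICT_USE_ONCE; }

/* Session-only resets keep the dictionary; parameter resets drop it. */
static void toy_reset(toy_dctx *d, reset_e how)
{
    if (how == RESET_PARAMETERS) { d->dict = NULL; d->uses = DICT_DONT_USE; }
}

/* Called at the start of each frame: returns the dictionary for this
 * frame and consumes a single-use prefix. */
static const void *toy_dict_for_frame(toy_dctx *d)
{
    const void *dict;
    if (d->uses == DICT_DONT_USE) return NULL;
    dict = d->dict;
    if (d->uses == DICT_USE_ONCE) { d->dict = NULL; d->uses = DICT_DONT_USE; }
    return dict;
}
```

Tracking the remaining uses as an enum makes "clear after first use" a property of frame start. Every decompression entry point then behaves the same way.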
Nick Terrell
824aaa695f
[libzstd] Fix ZSTD_decompressDCtx() with a dictionary
* `ZSTD_decompressDCtx()` did not use the dictionary loaded by
`ZSTD_DCtx_loadDictionary()`.
* Add a unit test.
* A stacked diff uses `ZSTD_decompressDCtx()` in the
`dictionary_round_trip` and `dictionary_decompress` fuzzers.
2019-04-09 17:59:27 -07:00
Nick Terrell
bfcd5b81d7
[libzstd] Don't check the dictID in fuzzing mode
When `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` is defined, don't check
the dictID. This check makes the fuzzers' job harder, and because it sits at the
very beginning, a mismatch stops decoding before the rest of the code is reached.
2019-04-08 19:57:41 -07:00
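A minimal sketch of compiling out a validation step in fuzzing builds, as this commit describes. The function `check_dict_id` and its return convention are hypothetical. The sketch assumes, as in the format spec, that a frame dictionary ID of 0 means no ID was recorded, so any dictionary is accepted.

```c
#include <assert.h>

/* Hypothetical check: reject a frame whose dictionary ID does not match
 * the loaded dictionary. Fuzzing builds skip it so that fuzzers can reach
 * the decoding logic behind it. Returns 0 on success, -1 on mismatch. */
static int check_dict_id(unsigned frame_dict_id, unsigned loaded_dict_id)
{
#ifndef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
    /* A frame dictID of 0 means "not specified". */
    if (frame_dict_id != 0 && frame_dict_id != loaded_dict_id) return -1;
#else
    (void)frame_dict_id;
    (void)loaded_dict_id;
#endif
    return 0;
}
```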
Nick Terrell
00679da22b
[libzstd] Setting ZSTD_d_maxWindowLog to 0 means default
2019-04-02 19:20:52 -07:00
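The "0 means default" convention in this commit can be sketched as a parameter setter. The limits mirror values from `zstd.h` of this era: an absolute minimum window log of 10, a 64-bit maximum of 31, and a default decoder limit of 27. Treat them as assumptions. The function `set_max_window_log` is hypothetical.

```c
#include <assert.h>

/* Assumed limits, mirroring zstd.h (ZSTD_WINDOWLOG_ABSOLUTEMIN,
 * ZSTD_WINDOWLOG_MAX_64, ZSTD_WINDOWLOG_LIMIT_DEFAULT). */
#define WINDOWLOG_MIN 10
#define WINDOWLOG_MAX_64 31
#define WINDOWLOG_LIMIT_DEFAULT 27

/* Hypothetical setter: 0 selects the default limit; other values must lie
 * within [WINDOWLOG_MIN, WINDOWLOG_MAX_64]. Returns 0 on success and
 * leaves the stored value unchanged on error. */
static int set_max_window_log(int value, int *max_window_log)
{
    if (value == 0) value = WINDOWLOG_LIMIT_DEFAULT;
    if (value < WINDOWLOG_MIN || value > WINDOWLOG_MAX_64) return -1;
    *max_window_log = value;
    return 0;
}
```

Mapping 0 to the default lets callers restore default behavior without knowing the current default value.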
shakeelrao
0a3fa6f909
Add legacy mode in documentation
2019-03-18 20:33:15 -07:00
shakeelrao
20aa1b455c
Stylistic changes
2019-03-17 19:35:43 -07:00
shakeelrao
60796e76b0
Add legacy support to decompressBound
2019-03-15 16:10:37 -07:00
shakeelrao
79827a179f
Fix incorrectly assigned value in ZSTD_errorFrameSizeInfo
As documented in `zstd.h`, ZSTD_decompressBound returns `ZSTD_CONTENTSIZE_ERROR`
if an error occurs (not `ZSTD_CONTENTSIZE_UNKNOWN`). This is consistent with
the error checking performed in ZSTD_decompressBound, particularly at line 545.
2019-03-13 01:23:07 -07:00
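The error contract described in this commit can be illustrated with a small aggregation sketch. The sentinel values match `ZSTD_CONTENTSIZE_UNKNOWN` and `ZSTD_CONTENTSIZE_ERROR` from `zstd.h`. The function `sum_frame_bounds` is hypothetical. It assumes per-frame bounds have already been computed, with a failed frame marked by the error sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel values as defined in zstd.h. */
#define CONTENTSIZE_UNKNOWN (0ULL - 1)
#define CONTENTSIZE_ERROR   (0ULL - 2)

/* Hypothetical aggregator: sums per-frame decompressed-size bounds.
 * Any failed frame turns the whole result into CONTENTSIZE_ERROR,
 * never CONTENTSIZE_UNKNOWN. Overflow is not checked in this sketch. */
static unsigned long long sum_frame_bounds(const unsigned long long *frame_bounds,
                                           size_t nb_frames)
{
    unsigned long long total = 0;
    size_t i;
    for (i = 0; i < nb_frames; i++) {
        if (frame_bounds[i] == CONTENTSIZE_ERROR) return CONTENTSIZE_ERROR;
        total += frame_bounds[i];
    }
    return total;
}
```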
shakeelrao
95dfd48143
update formatting
2019-03-01 23:11:15 -08:00
shakeelrao
1e08c49f75
add stylistic changes
2019-03-01 18:29:35 -08:00
shakeelrao
2bb5eec711
update missing error case to CONTENTSIZE_ERROR
2019-03-01 00:12:16 -08:00
shakeelrao
44ae395b3e
change nbBlocks to size_t for consistency
2019-03-01 00:05:59 -08:00
shakeelrao
03026c3b1d
change compressedBound to ULL
2019-03-01 00:03:50 -08:00
shakeelrao
8930c3c79b
implement API-level changes
2019-02-28 22:55:18 -08:00
shakeelrao
dce9a09772
initialize local vars in decompressBound
2019-02-28 03:01:21 -08:00
shakeelrao
515c506b4c
switch frameBound type to ULL
2019-02-28 02:10:17 -08:00
shakeelrao
d0a3f25697
change return type to ULL
2019-02-28 01:52:01 -08:00
shakeelrao
c9d674b60d
Remove autogenerated test file
2019-02-28 01:29:04 -08:00
shakeelrao
97d3d28dab
Fix decl-after-stmnt build error
2019-02-28 01:24:54 -08:00
shakeelrao
820af1e078
Provide an API function to estimate decompressed size.
Introduces a new utility function `ZSTD_findFrameCompressedSize_internal`, which
is equivalent to `ZSTD_findFrameCompressedSize` but accepts an additional output
parameter `bound` that receives an upper bound on the decompressed size of the frame.
The new API function is named `ZSTD_decompressBound` to be consistent with
`ZSTD_compressBound` (the inverse operation). Clients will now be able to compute an upper bound
on the decompressed size of their compressed payloads instead of guessing a large size.
Implements https://github.com/facebook/zstd/issues/1536.
2019-02-28 00:42:49 -08:00
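The per-frame bound that `ZSTD_decompressBound()` builds on can be sketched under one assumption. If the frame header records the content size, that size is exact. Otherwise each block decompresses to at most the frame's maximum block size, so the block count times that maximum is an upper bound. The helper `frame_decompressed_bound` is hypothetical and takes already-parsed header fields rather than raw bytes.

```c
#include <assert.h>
#include <stddef.h>

#define CONTENTSIZE_UNKNOWN (0ULL - 1)  /* value of ZSTD_CONTENTSIZE_UNKNOWN */

/* Hypothetical per-frame bound: use the exact content size from the frame
 * header when present; otherwise nb_blocks * block_size_max bounds the
 * decompressed size because no block can decompress to more. */
static unsigned long long frame_decompressed_bound(unsigned long long frame_content_size,
                                                   size_t nb_blocks,
                                                   size_t block_size_max)
{
    if (frame_content_size != CONTENTSIZE_UNKNOWN) return frame_content_size;
    return (unsigned long long)nb_blocks * block_size_max;
}
```

Summing this value over every frame in the input gives a buffer size that is always large enough. That is the guarantee clients need in place of guessing.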
W. Felix Handte
501eb25102
Rename FORWARD_ERROR -> FORWARD_IF_ERROR
2019-01-29 12:56:07 -05:00
W. Felix Handte
03e040a966
Replace Uses of CHECK_E with RETURN_ERROR_IF(*_isError(...
2019-01-28 17:33:01 -05:00
W. Felix Handte
64bb6640f2
Replace CHECK_F Uses in zstdmt_compress.c and zstd_ddict.c
2019-01-28 17:15:57 -05:00
W. Felix Handte
32fed9c7be
Switch CHECK_F Calls to FORWARD_ERROR
2019-01-28 12:45:34 -05:00
W. Felix Handte
800c87fed0
Switch Unconditional RETURN_ERROR_IF Calls to RETURN_ERROR
2019-01-28 12:45:34 -05:00
W. Felix Handte
c823237d7b
Convert Checks in zstd_decompress.c to RETURN_ERROR_IF
2019-01-28 12:23:14 -05:00
W. Felix Handte
ea031f4ea2
Convert Checks in zstd_decompress_block.c to RETURN_ERROR_IF
2019-01-28 11:56:39 -05:00
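The macro conversions in this series of commits follow a common error-handling pattern. The sketch below shows simplified, hypothetical versions of `RETURN_ERROR_IF`, `RETURN_ERROR` and `FORWARD_IF_ERROR`. libzstd's macros also carry a debug message argument, which is omitted here. The error codes and the `is_error` helper are illustrative assumptions in the style of libzstd, where errors are small negative numbers cast to `size_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; errors are encoded as (size_t)-code so they
 * occupy the top of the size_t range and never collide with real sizes. */
typedef enum {
    ERR_none = 0, ERR_srcSize_wrong = 1, ERR_dstSize_tooSmall = 2,
    ERR_corruption_detected = 3, ERR_maxCode = 20
} err_e;
#define MAKE_ERROR(e) ((size_t)-(e))
static int is_error(size_t code) { return code > MAKE_ERROR(ERR_maxCode); }

/* Return an error when cond holds. */
#define RETURN_ERROR_IF(cond, err) do { if (cond) return MAKE_ERROR(ERR_##err); } while (0)
/* Unconditional form, replacing RETURN_ERROR_IF(1, err). */
#define RETURN_ERROR(err) return MAKE_ERROR(ERR_##err)
/* Propagate an error from a sub-call; otherwise continue. */
#define FORWARD_IF_ERROR(expr) do { size_t const e_ = (expr); if (is_error(e_)) return e_; } while (0)

static size_t read_header(const unsigned char *src, size_t src_size)
{
    (void)src;
    RETURN_ERROR_IF(src_size < 4, srcSize_wrong);
    return 4; /* bytes consumed */
}

static size_t decode_frame(const unsigned char *src, size_t src_size, size_t dst_capacity)
{
    size_t const header_size = read_header(src, src_size);
    FORWARD_IF_ERROR(header_size);
    RETURN_ERROR_IF(dst_capacity == 0, dstSize_tooSmall);
    return src_size - header_size;
}

static size_t block_kind(int type)
{
    switch (type) {
    case 0: case 1: case 2: return (size_t)type;
    default: RETURN_ERROR(corruption_detected);
    }
}
```

Compared with ad-hoc `CHECK_F`/`CHECK_E` helpers, naming the condition and the error at the call site keeps each check readable. It also gives one place to hook in logging.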
Yann Collet
ededcfca57
fix confusion between unsigned <-> U32
as suggested in #1441.
Generally, U32 and unsigned are the same thing,
except when they are not...
case: 32-bit compilation for MIPS (uint32_t == unsigned long).
The vast majority of changes consist of transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).
Among the issues this patch solves:
- some parameters were declared with type `unsigned` in *.h,
but with type `U32` in their implementation *.c.
- some parameters have type unsigned*,
but the caller uses a pointer to U32 instead.
These fixes are useful.
However, the bulk of the changes is about %u formatting,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.
As a consequence, the amount of code changed is larger than I would expect for such a patch.
Testing is also a pain:
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.
So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.
I wonder whether it is worth enforcing the U32 != unsigned distinction in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributors will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.
Thoughts?
2018-12-21 18:09:41 -08:00
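The `%u` issue discussed above can be shown without a MIPS toolchain. `%u` requires an argument of exactly `unsigned int`. Passing a typedef that happens to be `unsigned long` is undefined behavior, even when it works on the author's machine. In this sketch, `U32_like` is a hypothetical stand-in for `U32` defined as `unsigned long`, mimicking the affected platform, and `format_window_log` is also hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a platform where the 32-bit typedef is unsigned long
 * (e.g. 32-bit MIPS), so the assumption U32 == unsigned no longer holds. */
typedef unsigned long U32_like;

/* "%u" expects unsigned int: convert explicitly (or declare the variable
 * as unsigned, as this patch mostly does) so the format matches. */
static int format_window_log(char *buf, size_t cap, U32_like window_log)
{
    return snprintf(buf, cap, "windowLog=%u", (unsigned)window_log);
}
```

Compilers with `-Wformat` flag the uncast version only on platforms where the types differ. This matches the patch's point that most contributors would never see the warning.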
W. Felix Handte
91b7309115
Mask Off Unused Functions When ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
2018-12-20 12:20:34 -08:00
W. Felix Handte
038aabde28
Mask Off Unused Functions When ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT
2018-12-20 12:15:07 -08:00
W. Felix Handte
0d606ee3db
Fix Incorrect assert()
2018-12-18 13:36:39 -08:00
W. Felix Handte
c2d51637d9
Add Mutual-Exclusion Error
2018-12-18 13:36:39 -08:00
W. Felix Handte
c560e34c86
Add HUF_FORCE_DECOMPRESS_X2
2018-12-18 13:36:39 -08:00
W. Felix Handte
abd1567d3c
Move HUF_DGEN Up Out of X1 Definitions
2018-12-18 13:36:39 -08:00
W. Felix Handte
4a0572b215
Refactor Huffman Decompression Away From Ternary Tree in ZSTD_decodeLiteralsBlock
2018-12-18 13:36:39 -08:00
W. Felix Handte
432314b58a
Rename HUF_DECOMPRESS_MINIMAL -> HUF_FORCE_DECOMPRESS_X1
2018-12-18 13:36:39 -08:00
W. Felix Handte
4bbb8a48ad
Add ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG
This macro forces behavior in the opposite direction from `ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT`.
2018-12-18 13:36:39 -08:00
W. Felix Handte
64553a0e35
Rename ZSTD_DECOMPRESS_MINIMAL -> ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT
2018-12-18 13:36:39 -08:00
W. Felix Handte
df28e5babd
Add ZSTD_DECOMPRESS_MINIMAL Macro, Which Reduces Branching of Decompress Variants
2018-12-18 13:36:39 -08:00
W. Felix Handte
f45c9df42e
Totally Hide/Disable X2 Variants when HUF_DECOMPRESS_MINIMAL is Defined
2018-12-18 13:36:39 -08:00
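The FORCE macros and the mutual-exclusion error from this series can be sketched as a compile-time switch. The macro names come from the commit titles above. The `#error` messages, `seq_variant_e` and `pick_sequence_decoder` are hypothetical. The sketch also assumes the un-forced build chooses between the short and long sequence decoders at run time, via some heuristic represented here by a flag.

```c
#include <assert.h>

/* Define at most one of each pair to pin a single decoder variant, e.g.:
 *   -DZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT  or  -DHUF_FORCE_DECOMPRESS_X1 */
#if defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT) && defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
#  error "ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT and _LONG are mutually exclusive"
#endif
#if defined(HUF_FORCE_DECOMPRESS_X1) && defined(HUF_FORCE_DECOMPRESS_X2)
#  error "HUF_FORCE_DECOMPRESS_X1 and HUF_FORCE_DECOMPRESS_X2 are mutually exclusive"
#endif

typedef enum { SEQ_SHORT, SEQ_LONG } seq_variant_e;

/* Hypothetical selector: with a FORCE macro the choice is fixed at compile
 * time, so the other variant can be masked off entirely; without one, a
 * run-time heuristic decides. */
static seq_variant_e pick_sequence_decoder(int prefer_long_heuristic)
{
#if defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_SHORT)
    (void)prefer_long_heuristic;
    return SEQ_SHORT;
#elif defined(ZSTD_FORCE_DECOMPRESS_SEQUENCES_LONG)
    (void)prefer_long_heuristic;
    return SEQ_LONG;
#else
    return prefer_long_heuristic ? SEQ_LONG : SEQ_SHORT;
#endif
}
```

Failing the build on conflicting macros is safer than silently picking one. A size-constrained build then always contains exactly the code its author asked for.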