Yann Collet
61936ba42a
Merge pull request #1705 from josepho0918/dev
...
Add support for IAR C/C++ Compiler for Arm
2019-08-05 15:57:28 +02:00
Yann Collet
0b0b83e8f3
fix test 122
...
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Joseph Chen
3855bc4295
Add support for IAR C/C++ Compiler for Arm
2019-07-29 15:25:58 +08:00
mgrice
812e8f2a16
perf improvements for zstd decode ( #1668 )
...
* perf improvements for zstd decode
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.
silesia benchmark data - second triad of each file is with the original code:
file orig compressedratio encode decode change
1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s
2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s
3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s
1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60%
2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01%
3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99%
1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s
2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s
3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s
1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30%
2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20%
3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50%
1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s
2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s
3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s
1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76%
2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54%
3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89%
1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s
2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s
3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s
1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47%
2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72%
3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29%
1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s
2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s
3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s
1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15%
2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98%
3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45%
1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s
2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s
3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s
1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41%
2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42%
3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68%
1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s
2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s
3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s
1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26%
2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36%
3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38%
1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s
2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s
3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s
1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45%
2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93%
3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04%
1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s
2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s
3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s
1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70%
2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85%
3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31%
1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s
2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s
3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s
1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89%
2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43%
3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33%
1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s
2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s
3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s
1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24%
2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99%
3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05%
1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s
2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s
3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s
1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52%
2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00%
3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37%
7.51%
* makefile changed to only pass -fno-tree-vectorize to gcc
* <Replace this line with a title. Use 1 line only, 67 chars or less>
Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix c90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Josh Soref
a880ca239b
Spelling ( #1582 )
...
* spelling: accidentally
* spelling: across
* spelling: additionally
* spelling: addresses
* spelling: appropriate
* spelling: assumed
* spelling: available
* spelling: builder
* spelling: capacity
* spelling: compiler
* spelling: compressibility
* spelling: compressor
* spelling: compression
* spelling: contract
* spelling: convenience
* spelling: decompress
* spelling: description
* spelling: deflate
* spelling: deterministically
* spelling: dictionary
* spelling: display
* spelling: eliminate
* spelling: preemptively
* spelling: exclude
* spelling: failure
* spelling: independence
* spelling: independent
* spelling: intentionally
* spelling: matching
* spelling: maximum
* spelling: meaning
* spelling: mishandled
* spelling: memory
* spelling: occasionally
* spelling: occurrence
* spelling: official
* spelling: offsets
* spelling: original
* spelling: output
* spelling: overflow
* spelling: overridden
* spelling: parameter
* spelling: performance
* spelling: probability
* spelling: receives
* spelling: redundant
* spelling: recompression
* spelling: resources
* spelling: sanity
* spelling: segment
* spelling: series
* spelling: specified
* spelling: specify
* spelling: subtracted
* spelling: successful
* spelling: return
* spelling: translation
* spelling: update
* spelling: unrelated
* spelling: useless
* spelling: variables
* spelling: variety
* spelling: verbatim
* spelling: verification
* spelling: visited
* spelling: warming
* spelling: workers
* spelling: with
2019-04-12 11:18:11 -07:00
shakeelrao
0033bb4785
Update documentation for ZSTD_frameSizeInfo
2019-03-17 17:41:27 -07:00
shakeelrao
19b75b6ecb
Test new ZSTD_findFrameCompressedSize and update documentation
2019-03-15 18:04:19 -07:00
W. Felix Handte
501eb25102
Rename FORWARD_ERROR -> FORWARD_IF_ERROR
2019-01-29 12:56:07 -05:00
W. Felix Handte
429987c9a6
Add Comment
2019-01-28 17:35:31 -05:00
W. Felix Handte
2179ce00e1
Remove CHECK_E Macro
2019-01-28 17:33:13 -05:00
W. Felix Handte
7ebd897157
Remove CHECK_F Macro
2019-01-28 17:16:32 -05:00
W. Felix Handte
324e9654d3
Add grep-able String to Error Macros
2019-01-28 12:50:36 -05:00
W. Felix Handte
a3538bbc6f
Add RETURN_ERROR and FORWARD_ERROR Macros
2019-01-28 12:45:26 -05:00
W. Felix Handte
54fa31f03b
Add RETURN_ERROR_IF Macro That Logs Debug Information When Check Fails
2019-01-28 11:43:33 -05:00
Yann Collet
ededcfca57
fix confusion between unsigned <-> U32
...
as suggested in #1441 .
generally U32 and unsigned are the same thing,
except when they are not ...
case : 32-bit compilation for MIPS (uint32_t == unsigned long)
A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).
Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
but the caller user a pointer to U32 instead.
These fixes are useful.
However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.
As a consequence, the amount of code changed is larger than I would expect for such a patch.
Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.
So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.
I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.
Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet
e4ae24c229
Merge pull request #1420 from felixhandte/zstd-decompress-minimal
...
Various Macros to Allow Building Extremely Minimal Decoder Library
2018-12-20 15:17:37 -08:00
W. Felix Handte
8e61ac8161
Use Unused Variable in ERR_getErrorString()
2018-12-19 12:36:10 -08:00
W. Felix Handte
c560e34c86
Add HUF_FORCE_DECOMPRESS_X2
2018-12-18 13:36:39 -08:00
W. Felix Handte
432314b58a
Rename HUF_DECOMPRESS_MINIMAL -> HUF_FORCE_DECOMPRESS_X1
2018-12-18 13:36:39 -08:00
W. Felix Handte
605dd576ee
Remove Error Strings with ZSTD_STRIP_ERROR_STRINGS
2018-12-18 13:36:39 -08:00
W. Felix Handte
9d5f3963ff
Add Option to Not Request Inlining with ZSTD_NO_INLINE
2018-12-18 13:36:39 -08:00
W. Felix Handte
f45c9df42e
Totally Hide/Disable X2 Variants when HUF_DECOMPRESS_MINIMAL is Defined
2018-12-18 13:36:39 -08:00
Yann Collet
373ff8b983
play around with rescale weights
2018-12-17 15:48:34 -08:00
Yann Collet
9c3265a53f
Merge pull request #1417 from facebook/advancedAPI
...
Advanced API
2018-12-10 18:48:15 -08:00
Ryan Schmidt
ef4df0df4a
Fix i386 build failure "Junk character 13"
2018-11-16 02:16:21 -06:00
Yann Collet
d7e10a774a
added constant ZSTD_WINDOWLOG_LIMIT_DEFAULT
...
answering #1407 .
Also : removed obsolete function ZSTD_setDStreamParameter()
which could only be used with one parameter (DStream_p_maxWindowSize).
Now replaced by ZSTD_DCtx_setWindowSize() (which exists since a few revisions)
2018-11-13 18:12:34 -08:00
Yann Collet
626040ab53
changed PREFETCH() macro into PREFETCH_L2()
...
which is more accurate
2018-11-12 17:05:32 -08:00
Yann Collet
1b4a9c518b
Merge pull request #1410 from facebook/prefetch_dec
...
improve long-range decoder speed
2018-11-08 18:41:58 -08:00
Yann Collet
9126da5b5c
improve long-range decoder speed
...
on enwik9 at level 22 (which is almost a worst case scenario),
speed improves by +7% on my laptop (415 -> 445 MB/s)
2018-11-08 12:47:46 -08:00
Nick Terrell
a8daa2d683
Signal before unlocking in pool.c
2018-11-08 10:45:53 -08:00
Yann Collet
acd75a1448
fixed a second memset() on NULL
...
not sure why it only triggers now,
this code has been around for a while.
Introduced a new error code : dstBuffer_null,
I couldn't express anything even remotely similar with existing error codes set.
2018-10-29 15:03:57 -07:00
Yann Collet
2b4914082e
created zstd_decompress_block module
...
isolate all logic associated with block decompression
into its own module.
zstd_decompress is still in charge
of context creation/destruction,
frames, headers, streaming, special blocks, etc.
Compressed blocks themselves are now handled within zstd_decompress_block .
2018-10-25 16:28:41 -07:00
Yann Collet
ccd2d426fc
separate DDict logic into its own module
...
created zstd_ddict.c within lib/decompress
2018-10-23 17:25:49 -07:00
Ori Livneh
f31715f5e0
Enable use of bswap intrinsics in clang
...
Necessary because clang disguises itself as an older (__GNUC_MINOR__ = 2) GCC.
2018-10-11 15:01:09 -04:00
Yann Collet
6ed3b526e4
restored bitMask for shift values
...
since corrupted bitstreams can generate too large values.
This slightly reduces the benefits from clang on my laptop.
gcc results and code generation are not affected.
2018-10-10 18:29:50 -07:00
Yann Collet
c012e9540a
removed one assert()
...
that can be triggered by a corrupted bitstream.
2018-10-10 17:33:04 -07:00
Yann Collet
7791f192ee
removed one assert()
...
which can be triggered when input is corrupted.
2018-10-10 16:39:15 -07:00
Yann Collet
d3ec23313d
improved decompression speed
...
while reviewing #1364 ,
I found a decompression speed improvement.
On my laptop, the new code decompresses +5-6% faster on clang
and +2-3% faster on gcc.
not bad for an accidental optimization...
2018-10-10 15:48:43 -07:00
Nick Terrell
109bd37474
Include stddef.h for size_t
2018-09-27 15:24:48 -07:00
Yann Collet
292d8e4a83
added some tests based on limits.h
...
in order to ensure proper type mapping
when not using stdint.h
2018-09-23 23:57:30 -07:00
Yann Collet
0e5b447aaa
Merge pull request #1316 from facebook/coldDict
...
Cold dictionary mitigation
2018-09-14 10:37:46 -07:00
Yann Collet
5512400677
updated code comments, based on @terrelln review
2018-09-13 16:44:04 -07:00
Yann Collet
2618253da2
fixed PREFETCH() macro
...
for corner cases and platforms without this instruction
2018-09-12 16:15:37 -07:00
Nick Terrell
f6daddf2db
Also allow x86
2018-09-12 12:05:32 -07:00
Nick Terrell
1e0bac6a9c
[libzstd] Fix cpu for MSFT ARM
...
The `__cpuid()` and `__cpuidex()` intrinsics are only available
on x86 and x86_64.
2018-09-12 10:35:16 -07:00
Yann Collet
4de344d505
added conditional prefetch
...
depending on amount of work to do.
2018-09-12 10:29:47 -07:00
Yann Collet
63a519dbf6
implemented first prefetch
...
based on dictID.
dictContent is prefetched up to 32 KB
(no contentSize adaptation)
2018-09-11 17:23:44 -07:00
Nick Terrell
5e580de6da
[zstd] Fix seqStore growth
...
We could undersize the literals buffer by up to 11 bytes,
due to a combination of 2 bugs:
* The literals buffer didn't have `WILDCOPY_OVERLENGTH` extra
space, like it is supposed to.
* We didn't check the literals buffer size in `ZSTD_sufficientBuff()`.
2018-08-28 13:24:44 -07:00
Nick Terrell
924944e471
[zstd] Reuse the ZSTD_CCtx more often with small data.
2018-08-23 17:48:06 -07:00
Yann Collet
6e66bbf5dd
fixed several minor issues detected by scan-build
...
only notable one :
writeNCount() resists better vs invalid distributions
(though it should never happen within zstd anyway)
2018-08-14 16:55:35 -07:00