Nick Terrell
5fcbc484c8
Merge pull request #2040 from caoyzh/dev-2
...
Optimize by prefetching on aarch64
2020-04-08 13:14:47 -07:00
Carl Woffenden
7af7735fa3
Merge remote-tracking branch 'upstream/dev' into single-file-lib
2020-04-07 11:13:02 +02:00
Carl Woffenden
edd9a07322
Code replicated in compression and decompression moved to shared headers
...
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
2020-04-07 11:02:06 +02:00
Bimba Shrestha
0154866749
moving consts to zstd_internal and reusing them
2020-04-03 14:26:15 -07:00
Nick Terrell
ac58c8d720
Fix copyright and license lines
...
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized
The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.
The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
caoyzh
7201980650
Optimize by prefetching on aarch64
2020-03-14 15:25:59 +08:00
Bimba Shrestha
dba3abc95a
Missed returns
2020-03-05 12:20:59 -08:00
Bimba Shrestha
a75e5f2ffc
bitscan add undef check
2020-03-05 11:52:15 -08:00
Bimba Shrestha
85d0efd619
Removing no-tree-vectorize for intel
2020-03-05 10:02:48 -08:00
Nick Terrell
e32e3e8662
Improve wildcopy performance across the board
2020-01-28 20:37:04 -08:00
Bimba Shrestha
b1f53b1a10
[fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit ( #1936 )
...
* Adding fail logging for superblock flow
* Dividing by targetCBlockSize instead of blockSize
* Adding new const and using more acurate formula for nbBlocks
* Only do dstCapacity check if using superblock
* Remvoing disabling logic
* Updating test to make it catch more extreme case of previou bug
* Also updating comment
* Only taking compressEnd shortcut on non-superblock
2020-01-03 16:53:51 -08:00
Bimba Shrestha
a3a3c62b81
[fuzz] Only set HUF_repeat_valid if loaded table has all non-zero weights ( #1898 )
...
Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table.
* Only setting loaded dict huf table to valid on non-zero
* Adding hasNoZeroWeights test to fse tables
* Forbiding nbBits != 0 when weight == 0
* Reverting the last commit
* Setting table log to 0 when weight == 0
* Small (invalid) zero weight dict test
* Small (valid) zero weight dict test
* Initializing repeatMode vars to check before zero check
* Removing FSE changes to seperate pr
* Reverting accidentally changed file
* Negating bool, using unsigned, optimization nit
2019-11-26 12:24:19 -08:00
Nick Terrell
718f00ff6f
Optimize decompression speed for gcc and clang ( #1892 )
...
* Optimize `ZSTD_decodeSequence()`
* Optimize Huffman decoding
* Optimize `ZSTD_decompressSequences()`
* Delete `ZSTD_decodeSequenceLong()`
2019-11-25 18:26:19 -08:00
Sen Huang
7ce891870c
Fix merge conflicts
2019-11-05 15:51:25 -05:00
Nick Terrell
919d1d8e93
Merge pull request #1831 from terrelln/zstdmt-bad-memset
...
[zstdmt] Don't memset the jobDescription
2019-10-21 15:53:57 -07:00
Felix Handte
cf725630a6
Merge pull request #1795 from felixhandte/workspace-asan
...
Add Poisoned Redzones to the Workspace When Compiling with ASAN
2019-10-21 12:15:17 -04:00
Nick Terrell
243824551f
[threading] Add debug utilities
2019-10-18 15:05:34 -07:00
Yann Collet
19741c7d99
Merge pull request #1815 from facebook/zlibwrap
...
make zlibWrapper strict ISO-C90 compatible
2019-10-16 16:45:15 -07:00
Yann Collet
2d5201b0ab
removed wildcopy8()
...
which is no longer used,
noticed by @davidbolvansky
2019-10-16 14:51:33 -07:00
W. Felix Handte
b6987acbbf
Declare the ASAN Functions We Need, Don't Include the Header
2019-10-10 13:40:16 -04:00
W. Felix Handte
edb6d884a5
Detect Whether We're Being Compiled with ASAN
2019-10-10 13:40:16 -04:00
W. Felix Handte
dc1fb684bf
Remove Unused MEM_SKIP_MSAN Macro
2019-10-10 13:40:16 -04:00
Yann Collet
cb18fffe65
enforce C90 compatibility for zlibWrapper
2019-09-24 17:50:58 -07:00
Dávid Bolvanský
1ab1a40c9c
Fixed one more place
2019-09-23 21:32:56 +02:00
Dávid Bolvanský
1f7228c040
Use clz ^ 31 instead of 31 - clz; better codegen for GCC
2019-09-23 21:23:09 +02:00
Nick Terrell
5cb7615f1f
Add UNUSED_ATTR to ZSTD_storeSeq()
2019-09-20 21:37:13 -07:00
Nick Terrell
44c65da97e
Remove literals overread in ZSTD_storeSeq() for ~neutral perf
2019-09-20 12:23:25 -07:00
Nick Terrell
cdad7fa512
Widen ZSTD_wildcopy to 32 bytes
2019-09-20 00:52:15 -07:00
Nick Terrell
efd37a64ea
Optimize decompression and fix wildcopy overread
...
* Bump `WILDCOPY_OVERLENGTH` to 16 to fix the wildcopy overread.
* Optimize `ZSTD_wildcopy()` by removing unnecessary branches and
unrolling the loop.
* Extract `ZSTD_overlapCopy8()` into its own function.
* Add `ZSTD_safecopy()` for `ZSTD_execSequenceEnd()`. It is
optimized for single long sequences, since that is the important
case that can end up in `ZSTD_execSequenceEnd()`. Without this
optimization, decompressing a block with 1 long match goes
from 5.7 GB/s to 800 MB/s.
* Refactor `ZSTD_execSequenceEnd()`.
* Increase the literal copy shortcut to 16.
* Add a shortcut for offset >= 16.
* Simplify `ZSTD_execSequence()` by pushing more cases into
`ZSTD_execSequenceEnd()`.
* Delete `ZSTD_execSequenceLong()` since it is exactly the
same as `ZSTD_execSequence()`.
clang-8 seeds +17.5% on silesia and +21.8% on enwik8.
gcc-9 sees +12% on silesia and +15.5% on enwik8.
TODO: More detailed measurements, and on more datasets.
Crdit to OSS-Fuzz for finding the wildcopy overread.
2019-09-19 21:07:14 -07:00
Yann Collet
3cac061db5
Merge pull request #1802 from bimbashrestha/rle_block_bound_fix_pt2
...
Adding 4 blocks to FSE_BLOCKBOUND() in lib/common (different from las…
2019-09-18 16:32:37 -07:00
Bimba Shrestha
6e9f6813bb
adding bit container size
2019-09-18 13:49:45 -07:00
Bimba Shrestha
f9b6abb896
Adding 4 blocks to FSE_BLOCKBOUND() in lib/common (different from last week)
2019-09-18 13:29:05 -07:00
Yann Collet
bfff5b30a4
Merge pull request #1756 from mgrice/dev
...
Improvements in zstd decode performance
2019-09-18 11:35:50 -07:00
Felix Handte
2164a130f3
Merge pull request #1780 from felixhandte/workspace-efficiency-3
...
Avoid Clearing Tables Even When Changing CParams
2019-09-16 14:37:05 -04:00
W. Felix Handte
72ea79cacd
Don't Include sanitizer/msan_interface.h
, Since Not All Platforms Provide It
...
Instead, explicitly declare the functions we use.
2019-09-16 12:08:03 -04:00
Yann Collet
09b1844d9b
Merge pull request #1784 from bimbashrestha/fse_block_bound_err
...
Rearranging assert and allowing 4 extra for FSE_BLOCKBOUND()
2019-09-12 19:09:27 -07:00
Bimba Shrestha
fe9af338ed
Added assert to BIT_flushBits()
2019-09-12 15:35:27 -07:00
Bimba Shrestha
43da5bf27e
Rearranging assert and allowing 4 extra for FSE_BLOCKBOUND()
2019-09-12 14:43:50 -07:00
W. Felix Handte
a10c191613
__msan_poison()
Workspace When Preparing for Re-Use
2019-09-11 17:14:45 -04:00
mgrice
5d89771529
fix warning: always_inline function might not be inlinable
2019-08-29 12:32:15 -07:00
mgrice
b830599582
Improvements in zstd decode performance
...
Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say 8) than it is to copy less (say, 3). This change takes that further by exploiting some properties:
1. it's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions, and fewer branches
2. A 16 byte chunk size means that ~90% of wildcopy invocations will have a trip count of 1, so branch prediction will be improved.
Speedup on Xeon E5-2680v4 is in the range of 3-5%.
Measured wildcopy length distributions on silesia.tar:
level <=8 <=16 <=24 >24
1 78.05% 11.49% 3.52% 6.94%
3 82.14% 8.99% 2.44% 6.43%
6 85.81% 6.51% 2.92% 4.76%
8 83.02% 7.31% 3.64% 6.03%
10 84.13% 6.67% 3.29% 5.91%
15 77.58% 7.55% 5.21% 9.66%
16 80.07% 7.20% 3.98% 8.75%
Test Plan: benchmark silesia, make check
2019-08-29 12:25:56 -07:00
Carl Woffenden
901ea61f83
Tweaks to create a single-file decoder
...
The CHECK_F macros differ slightly (but eventually do the same thing). Older GCC needs to fallback on the old-style pragma optimisation flags.
2019-08-21 17:49:17 +02:00
Yann Collet
61936ba42a
Merge pull request #1705 from josepho0918/dev
...
Add support for IAR C/C++ Compiler for Arm
2019-08-05 15:57:28 +02:00
Yann Collet
0b0b83e8f3
fix test 122
...
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Joseph Chen
3855bc4295
Add support for IAR C/C++ Compiler for Arm
2019-07-29 15:25:58 +08:00
mgrice
812e8f2a16
perf improvements for zstd decode ( #1668 )
...
* perf improvements for zstd decode
tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)
Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.
See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0
Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8
Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.
silesia benchmark data - second triad of each file is with the original code:
file orig compressedratio encode decode change
1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s
2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s
3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s
1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60%
2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01%
3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99%
1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s
2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s
3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s
1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30%
2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20%
3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50%
1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s
2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s
3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s
1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76%
2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54%
3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89%
1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s
2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s
3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s
1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47%
2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72%
3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29%
1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s
2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s
3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s
1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15%
2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98%
3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45%
1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s
2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s
3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s
1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41%
2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42%
3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68%
1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s
2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s
3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s
1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26%
2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36%
3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38%
1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s
2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s
3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s
1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45%
2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93%
3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04%
1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s
2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s
3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s
1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70%
2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85%
3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31%
1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s
2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s
3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s
1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89%
2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43%
3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33%
1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s
2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s
3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s
1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24%
2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99%
3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05%
1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s
2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s
3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s
1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52%
2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00%
3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37%
7.51%
* makefile changed to only pass -fno-tree-vectorize to gcc
* <Replace this line with a title. Use 1 line only, 67 chars or less>
Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)
* fix for warning/error with subtraction of void* pointers
* fix c90 conformance issue - ISO C90 forbids mixed declarations and code
* Fix assert for negative diff, only when there is no overlap
* fix overflow revealed in fuzzing tests
* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Josh Soref
a880ca239b
Spelling ( #1582 )
...
* spelling: accidentally
* spelling: across
* spelling: additionally
* spelling: addresses
* spelling: appropriate
* spelling: assumed
* spelling: available
* spelling: builder
* spelling: capacity
* spelling: compiler
* spelling: compressibility
* spelling: compressor
* spelling: compression
* spelling: contract
* spelling: convenience
* spelling: decompress
* spelling: description
* spelling: deflate
* spelling: deterministically
* spelling: dictionary
* spelling: display
* spelling: eliminate
* spelling: preemptively
* spelling: exclude
* spelling: failure
* spelling: independence
* spelling: independent
* spelling: intentionally
* spelling: matching
* spelling: maximum
* spelling: meaning
* spelling: mishandled
* spelling: memory
* spelling: occasionally
* spelling: occurrence
* spelling: official
* spelling: offsets
* spelling: original
* spelling: output
* spelling: overflow
* spelling: overridden
* spelling: parameter
* spelling: performance
* spelling: probability
* spelling: receives
* spelling: redundant
* spelling: recompression
* spelling: resources
* spelling: sanity
* spelling: segment
* spelling: series
* spelling: specified
* spelling: specify
* spelling: subtracted
* spelling: successful
* spelling: return
* spelling: translation
* spelling: update
* spelling: unrelated
* spelling: useless
* spelling: variables
* spelling: variety
* spelling: verbatim
* spelling: verification
* spelling: visited
* spelling: warming
* spelling: workers
* spelling: with
2019-04-12 11:18:11 -07:00
shakeelrao
0033bb4785
Update documentation for ZSTD_frameSizeInfo
2019-03-17 17:41:27 -07:00
shakeelrao
19b75b6ecb
Test new ZSTD_findFrameCompressedSize and update documentation
2019-03-15 18:04:19 -07:00
W. Felix Handte
501eb25102
Rename FORWARD_ERROR -> FORWARD_IF_ERROR
2019-01-29 12:56:07 -05:00