AuroraMiddleware/zstd - zstd - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Nick Terrell	e32e3e8662	Improve wildcopy performance across the board	2020-01-28 20:37:04 -08:00
Bimba Shrestha	b1f53b1a10	[fuzz] Dividing by targetCBlockSize instead of blockSize for nbBlocks fit (#1936 ) * Adding fail logging for superblock flow * Dividing by targetCBlockSize instead of blockSize * Adding new const and using more acurate formula for nbBlocks * Only do dstCapacity check if using superblock * Remvoing disabling logic * Updating test to make it catch more extreme case of previou bug * Also updating comment * Only taking compressEnd shortcut on non-superblock	2020-01-03 16:53:51 -08:00
Bimba Shrestha	a3a3c62b81	[fuzz] Only set HUF_repeat_valid if loaded table has all non-zero weights (#1898 ) Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table. * Only setting loaded dict huf table to valid on non-zero * Adding hasNoZeroWeights test to fse tables * Forbiding nbBits != 0 when weight == 0 * Reverting the last commit * Setting table log to 0 when weight == 0 * Small (invalid) zero weight dict test * Small (valid) zero weight dict test * Initializing repeatMode vars to check before zero check * Removing FSE changes to seperate pr * Reverting accidentally changed file * Negating bool, using unsigned, optimization nit	2019-11-26 12:24:19 -08:00
Nick Terrell	718f00ff6f	Optimize decompression speed for gcc and clang (#1892 ) * Optimize `ZSTD_decodeSequence()` * Optimize Huffman decoding * Optimize `ZSTD_decompressSequences()` * Delete `ZSTD_decodeSequenceLong()`	2019-11-25 18:26:19 -08:00
Sen Huang	7ce891870c	Fix merge conflicts	2019-11-05 15:51:25 -05:00
Nick Terrell	919d1d8e93	Merge pull request #1831 from terrelln/zstdmt-bad-memset [zstdmt] Don't memset the jobDescription	2019-10-21 15:53:57 -07:00
Felix Handte	cf725630a6	Merge pull request #1795 from felixhandte/workspace-asan Add Poisoned Redzones to the Workspace When Compiling with ASAN	2019-10-21 12:15:17 -04:00
Nick Terrell	243824551f	[threading] Add debug utilities	2019-10-18 15:05:34 -07:00
Yann Collet	19741c7d99	Merge pull request #1815 from facebook/zlibwrap make zlibWrapper strict ISO-C90 compatible	2019-10-16 16:45:15 -07:00
Yann Collet	2d5201b0ab	removed wildcopy8() which is no longer used, noticed by @davidbolvansky	2019-10-16 14:51:33 -07:00
W. Felix Handte	b6987acbbf	Declare the ASAN Functions We Need, Don't Include the Header	2019-10-10 13:40:16 -04:00
W. Felix Handte	edb6d884a5	Detect Whether We're Being Compiled with ASAN	2019-10-10 13:40:16 -04:00
W. Felix Handte	dc1fb684bf	Remove Unused MEM_SKIP_MSAN Macro	2019-10-10 13:40:16 -04:00
Yann Collet	cb18fffe65	enforce C90 compatibility for zlibWrapper	2019-09-24 17:50:58 -07:00
Dávid Bolvanský	1ab1a40c9c	Fixed one more place	2019-09-23 21:32:56 +02:00
Dávid Bolvanský	1f7228c040	Use clz ^ 31 instead of 31 - clz; better codegen for GCC	2019-09-23 21:23:09 +02:00
Nick Terrell	5cb7615f1f	Add UNUSED_ATTR to ZSTD_storeSeq()	2019-09-20 21:37:13 -07:00
Nick Terrell	44c65da97e	Remove literals overread in ZSTD_storeSeq() for ~neutral perf	2019-09-20 12:23:25 -07:00
Nick Terrell	cdad7fa512	Widen ZSTD_wildcopy to 32 bytes	2019-09-20 00:52:15 -07:00
Nick Terrell	efd37a64ea	Optimize decompression and fix wildcopy overread * Bump `WILDCOPY_OVERLENGTH` to 16 to fix the wildcopy overread. * Optimize `ZSTD_wildcopy()` by removing unnecessary branches and unrolling the loop. * Extract `ZSTD_overlapCopy8()` into its own function. * Add `ZSTD_safecopy()` for `ZSTD_execSequenceEnd()`. It is optimized for single long sequences, since that is the important case that can end up in `ZSTD_execSequenceEnd()`. Without this optimization, decompressing a block with 1 long match goes from 5.7 GB/s to 800 MB/s. * Refactor `ZSTD_execSequenceEnd()`. * Increase the literal copy shortcut to 16. * Add a shortcut for offset >= 16. * Simplify `ZSTD_execSequence()` by pushing more cases into `ZSTD_execSequenceEnd()`. * Delete `ZSTD_execSequenceLong()` since it is exactly the same as `ZSTD_execSequence()`. clang-8 seeds +17.5% on silesia and +21.8% on enwik8. gcc-9 sees +12% on silesia and +15.5% on enwik8. TODO: More detailed measurements, and on more datasets. Crdit to OSS-Fuzz for finding the wildcopy overread.	2019-09-19 21:07:14 -07:00
Yann Collet	3cac061db5	Merge pull request #1802 from bimbashrestha/rle_block_bound_fix_pt2 Adding 4 blocks to FSE_BLOCKBOUND() in lib/common (different from las…	2019-09-18 16:32:37 -07:00
Bimba Shrestha	6e9f6813bb	adding bit container size	2019-09-18 13:49:45 -07:00
Bimba Shrestha	f9b6abb896	Adding 4 blocks to FSE_BLOCKBOUND() in lib/common (different from last week)	2019-09-18 13:29:05 -07:00
Yann Collet	bfff5b30a4	Merge pull request #1756 from mgrice/dev Improvements in zstd decode performance	2019-09-18 11:35:50 -07:00
Felix Handte	2164a130f3	Merge pull request #1780 from felixhandte/workspace-efficiency-3 Avoid Clearing Tables Even When Changing CParams	2019-09-16 14:37:05 -04:00
W. Felix Handte	72ea79cacd	Don't Include `sanitizer/msan_interface.h`, Since Not All Platforms Provide It Instead, explicitly declare the functions we use.	2019-09-16 12:08:03 -04:00
Yann Collet	09b1844d9b	Merge pull request #1784 from bimbashrestha/fse_block_bound_err Rearranging assert and allowing 4 extra for FSE_BLOCKBOUND()	2019-09-12 19:09:27 -07:00
Bimba Shrestha	fe9af338ed	Added assert to BIT_flushBits()	2019-09-12 15:35:27 -07:00
Bimba Shrestha	43da5bf27e	Rearranging assert and allowing 4 extra for FSE_BLOCKBOUND()	2019-09-12 14:43:50 -07:00
W. Felix Handte	a10c191613	`__msan_poison()` Workspace When Preparing for Re-Use	2019-09-11 17:14:45 -04:00
mgrice	5d89771529	fix warning: always_inline function might not be inlinable	2019-08-29 12:32:15 -07:00
mgrice	b830599582	Improvements in zstd decode performance Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say 8) than it is to copy less (say, 3). This change takes that further by exploiting some properties: 1. it's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions, and fewer branches 2. A 16 byte chunk size means that ~90% of wildcopy invocations will have a trip count of 1, so branch prediction will be improved. Speedup on Xeon E5-2680v4 is in the range of 3-5%. Measured wildcopy length distributions on silesia.tar: level <=8 <=16 <=24 >24 1 78.05% 11.49% 3.52% 6.94% 3 82.14% 8.99% 2.44% 6.43% 6 85.81% 6.51% 2.92% 4.76% 8 83.02% 7.31% 3.64% 6.03% 10 84.13% 6.67% 3.29% 5.91% 15 77.58% 7.55% 5.21% 9.66% 16 80.07% 7.20% 3.98% 8.75% Test Plan: benchmark silesia, make check	2019-08-29 12:25:56 -07:00
Carl Woffenden	901ea61f83	Tweaks to create a single-file decoder The CHECK_F macros differ slightly (but eventually do the same thing). Older GCC needs to fallback on the old-style pragma optimisation flags.	2019-08-21 17:49:17 +02:00
Yann Collet	61936ba42a	Merge pull request #1705 from josepho0918/dev Add support for IAR C/C++ Compiler for Arm	2019-08-05 15:57:28 +02:00
Yann Collet	0b0b83e8f3	fix test 122 it's an unsupported scenario.	2019-08-03 16:51:26 +02:00
Joseph Chen	3855bc4295	Add support for IAR C/C++ Compiler for Arm	2019-07-29 15:25:58 +08:00
mgrice	812e8f2a16	perf improvements for zstd decode (#1668 ) * perf improvements for zstd decode tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge) Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand. The sites where wildcopy is invoked have an interesting distribution of lengths to be copied. The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization. See how GCC autovectorizes the loop here: https://godbolt.org/z/apr0x0 Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on) After: https://godbolt.org/z/OwO4F8 Note that autovectorization still does not do a good job on the optimized version, so it's turned off\ via attribute and flag. I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both. silesia benchmark data - second triad of each file is with the original code: file orig compressedratio encode decode change 1#dickens 10192446-> 4268865(2.388), 198.9MB/s 709.6MB/s 2#dickens 10192446-> 3876126(2.630), 128.7MB/s 552.5MB/s 3#dickens 10192446-> 3682956(2.767), 104.6MB/s 537MB/s 1#dickens 10192446-> 4268865(2.388), 195.4MB/s 659.5MB/s 7.60% 2#dickens 10192446-> 3876126(2.630), 127MB/s 516.3MB/s 7.01% 3#dickens 10192446-> 3682956(2.767), 105MB/s 479.5MB/s 11.99% 1#mozilla 51220480-> 20117517(2.546), 285.4MB/s 734.9MB/s 2#mozilla 51220480-> 19067018(2.686), 220.8MB/s 686.3MB/s 3#mozilla 51220480-> 18508283(2.767), 152.2MB/s 669.4MB/s 1#mozilla 51220480-> 20117517(2.546), 283.4MB/s 697.9MB/s 5.30% 2#mozilla 51220480-> 19067018(2.686), 225.9MB/s 665MB/s 3.20% 3#mozilla 51220480-> 18508283(2.767), 154.5MB/s 640.6MB/s 4.50% 1#mr 9970564-> 3840242(2.596), 262.4MB/s 899.8MB/s 2#mr 9970564-> 3600976(2.769), 181.2MB/s 717.9MB/s 3#mr 9970564-> 3563987(2.798), 116.3MB/s 620MB/s 1#mr 9970564-> 3840242(2.596), 253.2MB/s 827.3MB/s 8.76% 2#mr 9970564-> 3600976(2.769), 177.4MB/s 655.4MB/s 9.54% 3#mr 9970564-> 3563987(2.798), 111.2MB/s 564.2MB/s 9.89% 1#nci 33553445-> 2849306(11.78), 575.2MB/s , 1335.8MB/s 2#nci 33553445-> 2890166(11.61), 509.3MB/s , 1238.1MB/s 3#nci 33553445-> 2857408(11.74), 431MB/s , 1210.7MB/s 1#nci 33553445-> 2849306(11.78), 565.4MB/s , 1220.2MB/s 9.47% 2#nci 33553445-> 2890166(11.61), 508.2MB/s , 1128.4MB/s 9.72% 3#nci 33553445-> 2857408(11.74), 429.1MB/s , 1097.7MB/s 10.29% 1#ooffice 6152192-> 3590954(1.713), 231.4MB/s , 662.6MB/s 2#ooffice 6152192-> 3323931(1.851), 162.8MB/s , 592.6MB/s 3#ooffice 6152192-> 3145625(1.956), 99.9MB/s , 549.6MB/s 1#ooffice 6152192-> 3590954(1.713), 224.7MB/s , 624.2MB/s 6.15% 2#ooffice 6152192-> 3323931 (1.851), 155MB/s , 564.5MB/s 4.98% 3#ooffice 6152192-> 3145625(1.956), 101.1MB/s , 521.2MB/s 5.45% 1#osdb 10085684-> 3739042(2.697), 271.9MB/s 876.4MB/s 2#osdb 10085684-> 3493875(2.887), 208.2MB/s 857MB/s 3#osdb 10085684-> 3515831(2.869), 135.3MB/s 805.4MB/s 1#osdb 10085684-> 3739042(2.697), 257.4MB/s 793.8MB/s 10.41% 2#osdb 10085684-> 3493875(2.887), 209.7MB/s 776.1MB/s 10.42% 3#osdb 10085684-> 3515831(2.869), 130.6MB/s 727.7MB/s 10.68% 1#reymont 6627202-> 2152771(3.078), 198.9MB/s 696.2MB/s 2#reymont 6627202-> 2071140(3.200), 170MB/s 595.2MB/s 3#reymont 6627202-> 1953597(3.392), 128.5MB/s 609.7MB/s 1#reymont 6627202-> 2152771(3.078), 199.6MB/s 655.2MB/s 6.26% 2#reymont 6627202-> 2071140(3.200), 168.2MB/s 554.4MB/s 7.36% 3#reymont 6627202-> 1953597(3.392), 128.7MB/s 557.4MB/s 9.38% 1#samba 21606400-> 5510994(3.921), 338.1MB/s 1066MB/s 2#samba 21606400-> 5240208(4.123), 258.7MB/s 992.3MB/s 3#samba 21606400-> 5003358(4.318), 200.2MB/s 991.1MB/s 1#samba 21606400-> 5510994(3.921), 330.8MB/s 974MB/s 9.45% 2#samba 21606400-> 5240208(4.123), 257.9MB/s 919.4MB/s 7.93% 3#samba 21606400-> 5003358(4.318), 198.5MB/s 908.9MB/s 9.04% 1#sao 7251944-> 6256401(1.159), 194.6MB/s 602.2MB/s 2#sao 7251944-> 5808761(1.248), 128.2MB/s 532.1MB/s 3#sao 7251944-> 5556318(1.305), 73MB/s 509.4MB/s 1#sao 7251944-> 6256401(1.159), 198.7MB/s 580.7MB/s 3.70% 2#sao 7251944-> 5808761(1.248), 129.1MB/s 502.7MB/s 5.85% 3#sao 7251944-> 5556318(1.305), 74.6MB/s 493.1MB/s 3.31% 1#webster 41458703-> 13692222(3.028), 222.3MB/s 752MB/s 2#webster 41458703-> 12842646(3.228), 157.6MB/s 532.2MB/s 3#webster 41458703-> 12191964(3.400), 124MB/s 468.5MB/s 1#webster 41458703-> 13692222(3.028), 219.7MB/s 697MB/s 7.89% 2#webster 41458703-> 12842646(3.228), 153.9MB/s 495.4MB/s 7.43% 3#webster 41458703-> 12191964(3.400), 124.8MB/s 444.8MB/s 5.33% 1#xml 5345280-> 696652(7.673), 485MB/s , 1333.9MB/s 2#xml 5345280-> 681492(7.843), 405.2MB/s , 1237.5MB/s 3#xml 5345280-> 639057(8.364), 328.5MB/s , 1281.3MB/s 1#xml 5345280-> 696652(7.673), 473.1MB/s , 1232.4MB/s 8.24% 2#xml 5345280-> 681492(7.843), 398.6MB/s , 1145.9MB/s 7.99% 3#xml 5345280-> 639057(8.364), 327.1MB/s , 1175MB/s 9.05% 1#x-ray 8474240-> 6772557(1.251), 521.3MB/s 762.6MB/s 2#x-ray 8474240-> 6684531(1.268), 230.5MB/s 688.5MB/s 3#x-ray 8474240-> 6166679(1.374), 68.7MB/s 478.8MB/s 1#x-ray 8474240-> 6772557(1.251), 502.8MB/s 736.7MB/s 3.52% 2#x-ray 8474240-> 6684531(1.268), 224.4MB/s 662MB/s 4.00% 3#x-ray 8474240-> 6166679(1.374), 67.3MB/s 437.8MB/s 9.37% 7.51% * makefile changed to only pass -fno-tree-vectorize to gcc * <Replace this line with a title. Use 1 line only, 67 chars or less> Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__) * fix for warning/error with subtraction of void* pointers * fix c90 conformance issue - ISO C90 forbids mixed declarations and code * Fix assert for negative diff, only when there is no overlap * fix overflow revealed in fuzzing tests * tweak for small speed increase	2019-07-11 18:31:07 -04:00
Josh Soref	a880ca239b	Spelling (#1582 ) * spelling: accidentally * spelling: across * spelling: additionally * spelling: addresses * spelling: appropriate * spelling: assumed * spelling: available * spelling: builder * spelling: capacity * spelling: compiler * spelling: compressibility * spelling: compressor * spelling: compression * spelling: contract * spelling: convenience * spelling: decompress * spelling: description * spelling: deflate * spelling: deterministically * spelling: dictionary * spelling: display * spelling: eliminate * spelling: preemptively * spelling: exclude * spelling: failure * spelling: independence * spelling: independent * spelling: intentionally * spelling: matching * spelling: maximum * spelling: meaning * spelling: mishandled * spelling: memory * spelling: occasionally * spelling: occurrence * spelling: official * spelling: offsets * spelling: original * spelling: output * spelling: overflow * spelling: overridden * spelling: parameter * spelling: performance * spelling: probability * spelling: receives * spelling: redundant * spelling: recompression * spelling: resources * spelling: sanity * spelling: segment * spelling: series * spelling: specified * spelling: specify * spelling: subtracted * spelling: successful * spelling: return * spelling: translation * spelling: update * spelling: unrelated * spelling: useless * spelling: variables * spelling: variety * spelling: verbatim * spelling: verification * spelling: visited * spelling: warming * spelling: workers * spelling: with	2019-04-12 11:18:11 -07:00
shakeelrao	0033bb4785	Update documentation for ZSTD_frameSizeInfo	2019-03-17 17:41:27 -07:00
shakeelrao	19b75b6ecb	Test new ZSTD_findFrameCompressedSize and update documentation	2019-03-15 18:04:19 -07:00
W. Felix Handte	501eb25102	Rename FORWARD_ERROR -> FORWARD_IF_ERROR	2019-01-29 12:56:07 -05:00
W. Felix Handte	429987c9a6	Add Comment	2019-01-28 17:35:31 -05:00
W. Felix Handte	2179ce00e1	Remove CHECK_E Macro	2019-01-28 17:33:13 -05:00
W. Felix Handte	7ebd897157	Remove CHECK_F Macro	2019-01-28 17:16:32 -05:00
W. Felix Handte	324e9654d3	Add grep-able String to Error Macros	2019-01-28 12:50:36 -05:00
W. Felix Handte	a3538bbc6f	Add RETURN_ERROR and FORWARD_ERROR Macros	2019-01-28 12:45:26 -05:00
W. Felix Handte	54fa31f03b	Add RETURN_ERROR_IF Macro That Logs Debug Information When Check Fails	2019-01-28 11:43:33 -05:00
Yann Collet	ededcfca57	fix confusion between unsigned <-> U32 as suggested in #1441. generally U32 and unsigned are the same thing, except when they are not ... case : 32-bit compilation for MIPS (uint32_t == unsigned long) A vast majority of transformation consists in transforming U32 into unsigned. In rare cases, it's the other way around (typically for internal code, such as seeds). Among a few issues this patches solves : - some parameters were declared with type `unsigned` in .h, but with type `U32` in their implementation .c . - some parameters have type unsigned*, but the caller user a pointer to U32 instead. These fixes are useful. However, the bulk of changes is about %u formating, which requires unsigned type, but generally receives U32 values instead, often just for brevity (U32 is shorter than unsigned). These changes are generally minor, or even annoying. As a consequence, the amount of code changed is larger than I would expect for such a patch. Testing is also a pain : it requires manually modifying `mem.h`, in order to lie about `U32` and force it to be an `unsigned long` typically. On a 64-bit system, this will break the equivalence unsigned == U32. Unfortunately, it will also break a few static_assert(), controlling structure sizes. So it also requires modifying `debug.h` to make `static_assert()` a noop. And then reverting these changes. So it's inconvenient, and as a consequence, this property is currently not checked during CI tests. Therefore, these problems can emerge again in the future. I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests. It's another restriction for coding, adding more frustration during merge tests, since most platforms don't need this distinction (hence contributor will not see it), and while this can matter in theory, the number of platforms impacted seems minimal. Thoughts ?	2018-12-21 18:09:41 -08:00
Yann Collet	e4ae24c229	Merge pull request #1420 from felixhandte/zstd-decompress-minimal Various Macros to Allow Building Extremely Minimal Decoder Library	2018-12-20 15:17:37 -08:00
W. Felix Handte	8e61ac8161	Use Unused Variable in ERR_getErrorString()	2018-12-19 12:36:10 -08:00

1 2 3 4 5 ...

504 Commits