zstd/lib/common
mgrice b830599582 Improvements in zstd decode performance
Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say, 8) than to copy fewer (say, 3), because a fixed-size copy needs no length-dependent handling. This change takes that further by exploiting two properties:
1. It is almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions and fewer branches.
2. A 16-byte chunk size means that ~90% of wildcopy invocations have a trip count of 1, which improves branch prediction.
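
The two properties above can be illustrated with a minimal sketch. This is not the code from the commit: the names `sketch_copy16`, `sketch_wildcopy16`, and `sketch_trip_count` are hypothetical, and the sketch assumes the caller guarantees at least 15 bytes of readable and writable slack past the requested length and that the two buffers do not overlap.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_CHUNK 16 /* hypothetical constant: bytes copied per loop trip */

/* Copy exactly one chunk. A memcpy with a constant size is usually
 * lowered by the compiler to a fixed load/store sequence with no branch. */
static void sketch_copy16(void *dst, const void *src)
{
    memcpy(dst, src, SKETCH_CHUNK);
}

/* Copy at least `length` bytes, rounded up to a multiple of 16.
 * Assumption: both buffers have slack past `length`, so the overrun
 * of up to 15 bytes is harmless, and the buffers do not overlap. */
static void sketch_wildcopy16(void *dst, const void *src, size_t length)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *const end = d + length;

    do {
        sketch_copy16(d, s);
        d += SKETCH_CHUNK;
        s += SKETCH_CHUNK;
    } while (d < end); /* the only branch; not taken when length <= 16 */
}

/* Number of loop trips a do-while wildcopy makes for a given chunk size. */
static size_t sketch_trip_count(size_t length, size_t chunk)
{
    return length == 0 ? 1 : (length + chunk - 1) / chunk;
}
```

With this shape, a 16-byte request takes two trips with an 8-byte chunk but a single trip with a 16-byte chunk, so the loop branch resolves the same way for most calls.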

Speedup on Xeon E5-2680v4 is in the range of 3-5%.

Measured wildcopy length distributions on silesia.tar (share of invocations per copy-length bucket, in bytes; each row sums to 100%):

level	<=8	9-16	17-24	>24
1	78.05%	11.49%	3.52%	6.94%
3	82.14%	8.99%	2.44%	6.43%
6	85.81%	6.51%	2.92%	4.76%
8	83.02%	7.31%	3.64%	6.03%
10	84.13%	6.67%	3.29%	5.91%
15	77.58%	7.55%	5.21%	9.66%
16	80.07%	7.20%	3.98%	8.75%
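
The "~90%" figure can be checked against the table with a short sketch. It assumes the columns are disjoint buckets (each row sums to 100%), so the share of calls that finish in one trip with a 16-byte chunk is the sum of the first two columns. The type and function names are hypothetical.

```c
/* One row of the measured distribution; values are percentages. */
typedef struct {
    int level;
    double le8;        /* copy length <= 8 bytes  */
    double from9to16;  /* copy length 9..16 bytes */
    double from17to24; /* copy length 17..24 bytes */
    double gt24;       /* copy length > 24 bytes  */
} sketch_row;

static const sketch_row sketch_rows[] = {
    { 1, 78.05, 11.49, 3.52, 6.94},
    { 3, 82.14,  8.99, 2.44, 6.43},
    { 6, 85.81,  6.51, 2.92, 4.76},
    { 8, 83.02,  7.31, 3.64, 6.03},
    {10, 84.13,  6.67, 3.29, 5.91},
    {15, 77.58,  7.55, 5.21, 9.66},
    {16, 80.07,  7.20, 3.98, 8.75},
};

/* Share of invocations that complete in a single trip.
 * Only meaningful for chunk sizes 8, 16, or 24, which line up
 * with the bucket boundaries of the table. */
static double sketch_single_trip_share(const sketch_row *r, unsigned chunk)
{
    double share = 0.0;
    if (chunk >= 8)  share += r->le8;
    if (chunk >= 16) share += r->from9to16;
    if (chunk >= 24) share += r->from17to24;
    return share;
}
```

For level 1 this gives 78.05% + 11.49% = 89.54% single-trip calls with a 16-byte chunk, against 78.05% with an 8-byte chunk.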

Test Plan: benchmark silesia, make check
2019-08-29 12:25:56 -07:00
bitstream.h	Add support for IAR C/C++ Compiler for Arm	2019-07-29 15:25:58 +08:00
compiler.h	Tweaks to create a single-file decoder	2019-08-21 17:49:17 +02:00
cpu.h	Fix i386 build failure "Junk character 13"	2018-11-16 02:16:21 -06:00
debug.c	grouped debug functions into debug.h	2018-06-13 15:43:09 -04:00
debug.h	play around with rescale weights	2018-12-17 15:48:34 -08:00
entropy_common.c	zeroise freq table with memset()	2018-06-26 17:24:41 -07:00
error_private.c	Use Unused Variable in ERR_getErrorString()	2018-12-19 12:36:10 -08:00
error_private.h	updated license header	2017-09-08 00:09:23 -07:00
fse_decompress.c	Tweaks to create a single-file decoder	2019-08-21 17:49:17 +02:00
fse.h	Spelling (#1582)	2019-04-12 11:18:11 -07:00
huf.h	fix confusion between unsigned <-> U32	2018-12-21 18:09:41 -08:00
mem.h	Add support for IAR C/C++ Compiler for Arm	2019-07-29 15:25:58 +08:00
pool.c	Signal before unlocking in pool.c	2018-11-08 10:45:53 -08:00
pool.h	changed POOL_resize() return type to int	2018-06-22 12:14:59 -07:00
threading.c	Spelling (#1582)	2019-04-12 11:18:11 -07:00
threading.h	[threading] Cast unused arguments to void	2018-03-06 18:36:40 -08:00
xxhash.c	Add support for IAR C/C++ Compiler for Arm	2019-07-29 15:25:58 +08:00
xxhash.h	xxhash can be included twice in any order	2017-03-01 13:29:29 -08:00
zstd_common.c	separate DDict logic into its own module	2018-10-23 17:25:49 -07:00
zstd_errors.h	fixed a second memset() on NULL	2018-10-29 15:03:57 -07:00
zstd_internal.h	Improvements in zstd decode performance	2019-08-29 12:25:56 -07:00