Commit Graph

584 Commits

Author SHA1 Message Date
Yann Collet
3cac061db5
Merge pull request #1802 from bimbashrestha/rle_block_bound_fix_pt2
Adding 4 blocks to FSE_BLOCKBOUND() in lib/common (different from las…
2019-09-18 16:32:37 -07:00
Bimba Shrestha
6e9f6813bb adding bit container size 2019-09-18 13:49:45 -07:00
Bimba Shrestha
f9b6abb896 Adding 4 blocks to FSE_BLOCKBOUND() in lib/common (different from last week) 2019-09-18 13:29:05 -07:00
Yann Collet
bfff5b30a4
Merge pull request #1756 from mgrice/dev
Improvements in zstd decode performance
2019-09-18 11:35:50 -07:00
Felix Handte
2164a130f3
Merge pull request #1780 from felixhandte/workspace-efficiency-3
Avoid Clearing Tables Even When Changing CParams
2019-09-16 14:37:05 -04:00
W. Felix Handte
72ea79cacd Don't Include sanitizer/msan_interface.h, Since Not All Platforms Provide It
Instead, explicitly declare the functions we use.
2019-09-16 12:08:03 -04:00
Yann Collet
09b1844d9b
Merge pull request #1784 from bimbashrestha/fse_block_bound_err
Rearranging assert and allowing 4 extra for FSE_BLOCKBOUND()
2019-09-12 19:09:27 -07:00
Bimba Shrestha
fe9af338ed Added assert to BIT_flushBits() 2019-09-12 15:35:27 -07:00
Bimba Shrestha
43da5bf27e Rearranging assert and allowing 4 extra for FSE_BLOCKBOUND() 2019-09-12 14:43:50 -07:00
W. Felix Handte
a10c191613 __msan_poison() Workspace When Preparing for Re-Use 2019-09-11 17:14:45 -04:00
mgrice
5d89771529 fix warning: always_inline function might not be inlinable 2019-08-29 12:32:15 -07:00
mgrice
b830599582 Improvements in zstd decode performance
Summary: The idea behind wildcopy is that it can be cheaper to copy more bytes (say 8) than it is to copy less (say, 3).  This change takes that further by exploiting some properties:
1. it's almost always OK to copy 16 bytes instead of 8, which means fewer copy instructions, and fewer branches
2. A 16 byte chunk size means that ~90% of wildcopy invocations will have a trip count of 1, so branch prediction will be improved.

Speedup on Xeon E5-2680v4 is in the range of 3-5%.

Measured wildcopy length distributions on silesia.tar:

level	<=8	<=16	<=24	>24
1	78.05%	11.49%	3.52%	6.94%
3	82.14%	8.99%	2.44%	6.43%
6	85.81%	6.51%	2.92%	4.76%
8	83.02%	7.31%	3.64%	6.03%
10	84.13%	6.67%	3.29%	5.91%
15	77.58%	7.55%	5.21%	9.66%
16	80.07%	7.20%	3.98%	8.75%

Test Plan: benchmark silesia, make check
2019-08-29 12:25:56 -07:00
Carl Woffenden
901ea61f83 Tweaks to create a single-file decoder
The CHECK_F macros differ slightly (but eventually do the same thing). Older GCC needs to fallback on the old-style pragma optimisation flags.
2019-08-21 17:49:17 +02:00
Yann Collet
61936ba42a
Merge pull request #1705 from josepho0918/dev
Add support for IAR C/C++ Compiler for Arm
2019-08-05 15:57:28 +02:00
Yann Collet
0b0b83e8f3 fix test 122
it's an unsupported scenario.
2019-08-03 16:51:26 +02:00
Joseph Chen
3855bc4295 Add support for IAR C/C++ Compiler for Arm 2019-07-29 15:25:58 +08:00
mgrice
812e8f2a16 perf improvements for zstd decode (#1668)
* perf improvements for zstd decode

tldr: 7.5% average decode speedup on silesia corpus at compression levels 1-3 (sandy bridge)

Background: while investigating zstd perf differences between clang and gcc I noticed that even though gcc is vectorizing the loop in in wildcopy, it was not being done as well as could be done by hand.  The sites where wildcopy is invoked have an interesting distribution of lengths to be copied.  The loop trip count is rarely above 1, yet long copies are common enough to make their performance important.The code in zstd_decompress.c to invoke wildcopy handles the latter well but the gcc autovectorizer introduces a needlessly expensive startup check for vectorization.

See how GCC autovectorizes the loop here:
https://godbolt.org/z/apr0x0

Here is the code after this diff has been applied: (left hand side is the good one, right is with vectorizer on)
After: https://godbolt.org/z/OwO4F8

Note that autovectorization still does not do a good job on the optimized version, so it's turned off\
 via attribute and flag.  I found that neither attribute nor command-line flag were entirely successful in turning off vectorization, which is why there were both.

    silesia benchmark data - second triad of each file is with the original code:

    file      orig        compressedratio     encode              decode           change
    1#dickens   10192446->   4268865(2.388),       198.9MB/s           709.6MB/s
    2#dickens   10192446->   3876126(2.630),       128.7MB/s           552.5MB/s
    3#dickens   10192446->   3682956(2.767),       104.6MB/s             537MB/s
    1#dickens   10192446->   4268865(2.388),       195.4MB/s           659.5MB/s     7.60%
    2#dickens   10192446->   3876126(2.630),         127MB/s           516.3MB/s     7.01%
    3#dickens   10192446->   3682956(2.767),         105MB/s           479.5MB/s    11.99%
    1#mozilla   51220480->  20117517(2.546),       285.4MB/s           734.9MB/s
    2#mozilla   51220480->  19067018(2.686),       220.8MB/s           686.3MB/s
    3#mozilla   51220480->  18508283(2.767),       152.2MB/s           669.4MB/s
    1#mozilla   51220480->  20117517(2.546),       283.4MB/s           697.9MB/s     5.30%
    2#mozilla   51220480->  19067018(2.686),       225.9MB/s             665MB/s     3.20%
    3#mozilla   51220480->  18508283(2.767),       154.5MB/s           640.6MB/s     4.50%
    1#mr         9970564->   3840242(2.596),       262.4MB/s           899.8MB/s
    2#mr         9970564->   3600976(2.769),       181.2MB/s           717.9MB/s
    3#mr         9970564->   3563987(2.798),       116.3MB/s             620MB/s
    1#mr         9970564->   3840242(2.596),       253.2MB/s           827.3MB/s     8.76%
    2#mr         9970564->   3600976(2.769),       177.4MB/s           655.4MB/s     9.54%
    3#mr         9970564->   3563987(2.798),       111.2MB/s           564.2MB/s     9.89%
    1#nci       33553445->   2849306(11.78),       575.2MB/s ,        1335.8MB/s
    2#nci       33553445->   2890166(11.61),       509.3MB/s ,        1238.1MB/s
    3#nci       33553445->   2857408(11.74),         431MB/s ,        1210.7MB/s
    1#nci       33553445->   2849306(11.78),       565.4MB/s ,        1220.2MB/s     9.47%
    2#nci       33553445->   2890166(11.61),       508.2MB/s ,        1128.4MB/s     9.72%
    3#nci       33553445->   2857408(11.74),       429.1MB/s ,        1097.7MB/s    10.29%
    1#ooffice    6152192->   3590954(1.713),       231.4MB/s ,         662.6MB/s
    2#ooffice    6152192->   3323931(1.851),       162.8MB/s ,         592.6MB/s
    3#ooffice    6152192->   3145625(1.956),        99.9MB/s ,         549.6MB/s
    1#ooffice    6152192->   3590954(1.713),       224.7MB/s ,         624.2MB/s     6.15%
    2#ooffice    6152192->   3323931 (1.851),        155MB/s ,         564.5MB/s     4.98%
    3#ooffice    6152192->   3145625(1.956),       101.1MB/s ,         521.2MB/s     5.45%
    1#osdb      10085684->   3739042(2.697),       271.9MB/s           876.4MB/s
    2#osdb      10085684->   3493875(2.887),       208.2MB/s             857MB/s
    3#osdb      10085684->   3515831(2.869),       135.3MB/s           805.4MB/s
    1#osdb      10085684->   3739042(2.697),       257.4MB/s           793.8MB/s    10.41%
    2#osdb      10085684->   3493875(2.887),       209.7MB/s           776.1MB/s    10.42%
    3#osdb      10085684->   3515831(2.869),       130.6MB/s           727.7MB/s    10.68%
    1#reymont    6627202->   2152771(3.078),       198.9MB/s           696.2MB/s
    2#reymont    6627202->   2071140(3.200),         170MB/s           595.2MB/s
    3#reymont    6627202->   1953597(3.392),       128.5MB/s           609.7MB/s
    1#reymont    6627202->   2152771(3.078),       199.6MB/s           655.2MB/s     6.26%
    2#reymont    6627202->   2071140(3.200),       168.2MB/s           554.4MB/s     7.36%
    3#reymont    6627202->   1953597(3.392),       128.7MB/s           557.4MB/s     9.38%
    1#samba     21606400->   5510994(3.921),       338.1MB/s            1066MB/s
    2#samba     21606400->   5240208(4.123),       258.7MB/s           992.3MB/s
    3#samba     21606400->   5003358(4.318),       200.2MB/s           991.1MB/s
    1#samba     21606400->   5510994(3.921),       330.8MB/s             974MB/s     9.45%
    2#samba     21606400->   5240208(4.123),       257.9MB/s           919.4MB/s     7.93%
    3#samba     21606400->   5003358(4.318),       198.5MB/s           908.9MB/s     9.04%
    1#sao        7251944->   6256401(1.159),       194.6MB/s           602.2MB/s
    2#sao        7251944->   5808761(1.248),       128.2MB/s           532.1MB/s
    3#sao        7251944->   5556318(1.305),          73MB/s           509.4MB/s
    1#sao        7251944->   6256401(1.159),       198.7MB/s           580.7MB/s     3.70%
    2#sao        7251944->   5808761(1.248),       129.1MB/s           502.7MB/s     5.85%
    3#sao        7251944->   5556318(1.305),        74.6MB/s           493.1MB/s     3.31%
    1#webster   41458703->  13692222(3.028),       222.3MB/s             752MB/s
    2#webster   41458703->  12842646(3.228),       157.6MB/s           532.2MB/s
    3#webster   41458703->  12191964(3.400),         124MB/s           468.5MB/s
    1#webster   41458703->  13692222(3.028),       219.7MB/s             697MB/s     7.89%
    2#webster   41458703->  12842646(3.228),       153.9MB/s           495.4MB/s     7.43%
    3#webster   41458703->  12191964(3.400),       124.8MB/s           444.8MB/s     5.33%
    1#xml        5345280->    696652(7.673),         485MB/s ,        1333.9MB/s
    2#xml        5345280->    681492(7.843),       405.2MB/s ,        1237.5MB/s
    3#xml        5345280->    639057(8.364),       328.5MB/s ,        1281.3MB/s
    1#xml        5345280->    696652(7.673),       473.1MB/s ,        1232.4MB/s     8.24%
    2#xml        5345280->    681492(7.843),       398.6MB/s ,        1145.9MB/s     7.99%
    3#xml        5345280->    639057(8.364),       327.1MB/s ,          1175MB/s     9.05%
    1#x-ray      8474240->   6772557(1.251),       521.3MB/s           762.6MB/s
    2#x-ray      8474240->   6684531(1.268),       230.5MB/s           688.5MB/s
    3#x-ray      8474240->   6166679(1.374),        68.7MB/s           478.8MB/s
    1#x-ray      8474240->   6772557(1.251),       502.8MB/s           736.7MB/s     3.52%
    2#x-ray      8474240->   6684531(1.268),       224.4MB/s             662MB/s     4.00%
    3#x-ray      8474240->   6166679(1.374),        67.3MB/s           437.8MB/s     9.37%

                                                                                     7.51%

* makefile changed to only pass -fno-tree-vectorize to gcc

* <Replace this line with a title. Use 1 line only, 67 chars or less>

Don't add "no-tree-vectorize" attribute on clang (which defines __GNUC__)

* fix for warning/error with subtraction of void* pointers

* fix c90 conformance issue - ISO C90 forbids mixed declarations and code

* Fix assert for negative diff, only when there is no overlap

* fix overflow revealed in fuzzing tests

* tweak for small speed increase
2019-07-11 18:31:07 -04:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
shakeelrao
0033bb4785 Update documentation for ZSTD_frameSizeInfo 2019-03-17 17:41:27 -07:00
shakeelrao
19b75b6ecb Test new ZSTD_findFrameCompressedSize and update documentation 2019-03-15 18:04:19 -07:00
W. Felix Handte
501eb25102 Rename FORWARD_ERROR -> FORWARD_IF_ERROR 2019-01-29 12:56:07 -05:00
W. Felix Handte
429987c9a6 Add Comment 2019-01-28 17:35:31 -05:00
W. Felix Handte
2179ce00e1 Remove CHECK_E Macro 2019-01-28 17:33:13 -05:00
W. Felix Handte
7ebd897157 Remove CHECK_F Macro 2019-01-28 17:16:32 -05:00
W. Felix Handte
324e9654d3 Add grep-able String to Error Macros 2019-01-28 12:50:36 -05:00
W. Felix Handte
a3538bbc6f Add RETURN_ERROR and FORWARD_ERROR Macros 2019-01-28 12:45:26 -05:00
W. Felix Handte
54fa31f03b Add RETURN_ERROR_IF Macro That Logs Debug Information When Check Fails 2019-01-28 11:43:33 -05:00
Yann Collet
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet
e4ae24c229
Merge pull request #1420 from felixhandte/zstd-decompress-minimal
Various Macros to Allow Building Extremely Minimal Decoder Library
2018-12-20 15:17:37 -08:00
W. Felix Handte
8e61ac8161 Use Unused Variable in ERR_getErrorString() 2018-12-19 12:36:10 -08:00
W. Felix Handte
c560e34c86 Add HUF_FORCE_DECOMPRESS_X2 2018-12-18 13:36:39 -08:00
W. Felix Handte
432314b58a Rename HUF_DECOMPRESS_MINIMAL -> HUF_FORCE_DECOMPRESS_X1 2018-12-18 13:36:39 -08:00
W. Felix Handte
605dd576ee Remove Error Strings with ZSTD_STRIP_ERROR_STRINGS 2018-12-18 13:36:39 -08:00
W. Felix Handte
9d5f3963ff Add Option to Not Request Inlining with ZSTD_NO_INLINE 2018-12-18 13:36:39 -08:00
W. Felix Handte
f45c9df42e Totally Hide/Disable X2 Variants when HUF_DECOMPRESS_MINIMAL is Defined 2018-12-18 13:36:39 -08:00
Yann Collet
373ff8b983 play around with rescale weights 2018-12-17 15:48:34 -08:00
Yann Collet
9c3265a53f
Merge pull request #1417 from facebook/advancedAPI
Advanced API
2018-12-10 18:48:15 -08:00
Ryan Schmidt
ef4df0df4a Fix i386 build failure "Junk character 13" 2018-11-16 02:16:21 -06:00
Yann Collet
d7e10a774a added constant ZSTD_WINDOWLOG_LIMIT_DEFAULT
answering #1407.

Also : removed obsolete function ZSTD_setDStreamParameter()
which could only be used with one parameter (DStream_p_maxWindowSize).
Now replaced by ZSTD_DCtx_setWindowSize() (which exists since a few revisions)
2018-11-13 18:12:34 -08:00
Yann Collet
626040ab53 changed PREFETCH() macro into PREFETCH_L2()
which is more accurate
2018-11-12 17:05:32 -08:00
Yann Collet
1b4a9c518b
Merge pull request #1410 from facebook/prefetch_dec
improve long-range decoder speed
2018-11-08 18:41:58 -08:00
Yann Collet
9126da5b5c improve long-range decoder speed
on enwik9 at level 22 (which is almost a worst case scenario),
speed improves by +7% on my laptop (415 -> 445 MB/s)
2018-11-08 12:47:46 -08:00
Nick Terrell
a8daa2d683 Signal before unlocking in pool.c 2018-11-08 10:45:53 -08:00
Yann Collet
acd75a1448 fixed a second memset() on NULL
not sure why it only triggers now,
this code has been around for a while.

Introduced a new error code : dstBuffer_null,
I couldn't express anything even remotely similar with existing error codes set.
2018-10-29 15:03:57 -07:00
Yann Collet
2b4914082e created zstd_decompress_block module
isolate all logic associated with block decompression
into its own module.

zstd_decompress is still in charge
of context creation/destruction,
frames, headers, streaming, special blocks, etc.

Compressed blocks themselves are now handled within zstd_decompress_block .
2018-10-25 16:28:41 -07:00
Yann Collet
ccd2d426fc separate DDict logic into its own module
created zstd_ddict.c within lib/decompress
2018-10-23 17:25:49 -07:00
Ori Livneh
f31715f5e0 Enable use of bswap intrinsics in clang
Necessary because clang disguises itself as an older (__GNUC_MINOR__ = 2) GCC.
2018-10-11 15:01:09 -04:00
Yann Collet
6ed3b526e4 restored bitMask for shift values
since corrupted bitstreams can generate too large values.

This slightly reduces the benefits from clang on my laptop.
gcc results and code generation are not affected.
2018-10-10 18:29:50 -07:00
Yann Collet
c012e9540a removed one assert()
that can be triggered by a corrupted bitstream.
2018-10-10 17:33:04 -07:00
Yann Collet
7791f192ee removed one assert()
which can be triggered when input is corrupted.
2018-10-10 16:39:15 -07:00
Yann Collet
d3ec23313d improved decompression speed
while reviewing #1364,
I found a decompression speed improvement.

On my laptop, the new code decompresses +5-6% faster on clang
and +2-3% faster on gcc.

not bad for an accidental optimization...
2018-10-10 15:48:43 -07:00
Nick Terrell
109bd37474 Include stddef.h for size_t 2018-09-27 15:24:48 -07:00
Yann Collet
292d8e4a83 added some tests based on limits.h
in order to ensure proper type mapping
when not using stdint.h
2018-09-23 23:57:30 -07:00
Yann Collet
0e5b447aaa
Merge pull request #1316 from facebook/coldDict
Cold dictionary mitigation
2018-09-14 10:37:46 -07:00
Yann Collet
5512400677 updated code comments, based on @terrelln review 2018-09-13 16:44:04 -07:00
Yann Collet
2618253da2 fixed PREFETCH() macro
for corner cases and platforms without this instruction
2018-09-12 16:15:37 -07:00
Nick Terrell
f6daddf2db Also allow x86 2018-09-12 12:05:32 -07:00
Nick Terrell
1e0bac6a9c [libzstd] Fix cpu for MSFT ARM
The `__cpuid()` and `__cpuidex()` intrinsics are only available
on x86 and x86_64.
2018-09-12 10:35:16 -07:00
Yann Collet
4de344d505 added conditional prefetch
depending on amount of work to do.
2018-09-12 10:29:47 -07:00
Yann Collet
63a519dbf6 implemented first prefetch
based on dictID.
dictContent is prefetched up to 32 KB
(no contentSize adaptation)
2018-09-11 17:23:44 -07:00
Nick Terrell
5e580de6da [zstd] Fix seqStore growth
We could undersize the literals buffer by up to 11 bytes,
due to a combination of 2 bugs:
* The literals buffer didn't have `WILDCOPY_OVERLENGTH` extra
  space, like it is supposed to.
* We didn't check the literals buffer size in `ZSTD_sufficientBuff()`.
2018-08-28 13:24:44 -07:00
Nick Terrell
924944e471 [zstd] Reuse the ZSTD_CCtx more often with small data. 2018-08-23 17:48:06 -07:00
Yann Collet
6e66bbf5dd fixed several minor issues detected by scan-build
only notable one :
writeNCount() resists better vs invalid distributions
(though it should never happen within zstd anyway)
2018-08-14 16:55:35 -07:00
Yann Collet
bbd78df59b add build macro NO_PREFETCH
prevent usage of prefetch intrinsic commands
which are not supported by c2rust
(see https://github.com/immunant/c2rust/issues/13)
2018-07-06 17:06:04 -07:00
Yann Collet
121aa2c388
Merge pull request #1211 from facebook/staticAssert
updated DEBUG_STATIC_ASSERT()
2018-06-27 12:19:17 -07:00
Yann Collet
ff773bfcde zeroise freq table with memset()
improves decoding speed by ~5% in github_users sample set
2018-06-26 17:24:41 -07:00
Yann Collet
7b9bbf77c9 switched to a sizeof() version
avoid -Werror=unused-variable issue
2018-06-26 14:08:35 -07:00
Yann Collet
f98ec46979 updated DEBUG_STATIC_ASSERT()
following suggestion from #1209
2018-06-26 12:04:59 -07:00
Yann Collet
fbd5dfc1b1 changed POOL_resize() return type to int
return is now just en error code.
This guarantee that `ctx` remains valid after POOL_resize().
Gets rid of internal POOL_free() operation.
2018-06-22 12:14:59 -07:00
Yann Collet
243cd9d8bb add a cond_broadcast after resize
to make sure all threads (notably newly available threads)
get awaken to immediately process potential items in the queue.
2018-06-21 18:04:58 -07:00
Yann Collet
818e72b4d5 added extended POOL test
abrupt end + downsizing with running jobs remaining in queue.

also : POOL_resize() requires numThreads >= 1
2018-06-21 14:58:59 -07:00
Yann Collet
6de249c1c6 fixed: bug when counting nb of active threads
when queueSize > 1

also : added a test in testpool.c
       verifying resizing is effective.
2018-06-20 18:28:49 -07:00
Yann Collet
6b48eb12c0 change control of threadLimit
now limits maximum nb of active threads
even when queueSize > 1.
2018-06-20 14:35:39 -07:00
Yann Collet
62469c9f41 fixed wrong size in pthread struct transfer 2018-06-19 20:14:03 -07:00
Yann Collet
166901dc72 reduced POOL_resize() restriction
It's not necessary to ensure that no job is ongoing.
The pool is only expanded, existing threads are preserved.
In case of error, the only option is to return NULL and terminate the thread pool anyway.
2018-06-19 18:07:18 -07:00
Yann Collet
4567c57199 finalized POOL_resize()
POOL_ctx* POOL_resize(POOL_ctx* ctx, size_t numThreads)

The function may fail, and returns a NULL pointer in this case.
2018-06-19 16:03:12 -07:00
Yann Collet
1c714fda3f introduced POOL_resize()
not complete yet :
finalize behavior in case of unfinished expansion
2018-06-18 20:46:39 -07:00
Yann Collet
d8462ecba2 Merge branch 'dev' into huf_rename 2018-06-14 20:42:10 -04:00
Yann Collet
9698d2fb72
Merge pull request #1189 from facebook/hist
histogram module
2018-06-14 20:39:52 -04:00
Yann Collet
1adf84ccb7 renamed all HUF_decompress*X4*() functions into *X2
to underline they generate up to 2 symbols per decoding,
in preparation for a future *X3 variant.
2018-06-14 15:17:03 -04:00
Yann Collet
a09af5eb6b renamed all HUF_decompress*X2*() functions into *X1
to underline they generate one symbol per decoding operation.

The new naming scheme will make it easier to introduce an *X3 variant.
2018-06-14 15:08:43 -04:00
Yann Collet
fc682263d0 fixed g_debuglevel variable name
in debug.h
2018-06-13 20:02:33 -04:00
Yann Collet
2d76defbfe grouped all histogram functions into hist.c
renamed functions with HIST_* prefix
2018-06-13 19:49:31 -04:00
Yann Collet
fa41bcc2c2 grouped debug functions into debug.h
There were 2 competing set of debug functions
within zstd_internal.h and bitstream.h.
They were mostly duplicate, and required care to avoid messing with each other.

There is now a single implementation, shared by both.

Significant change :
The macro variable ZSTD_DEBUG does no longer exist,
it has been replaced by DEBUGLEVEL,
which required modifying several source files.
2018-06-13 15:43:09 -04:00
Yann Collet
463a0fe38b simplified optimal parser
removed "cached" structure.
prices are now saved in the optimal table.

Primarily done for simplification.
Might improve speed by a little.
But actually, and surprisingly, also improves ratio in some circumstances.
2018-05-29 14:07:25 -07:00
Yann Collet
b5ef32fea7 Merge branch 'dev' into fracFse 2018-05-24 14:09:49 -07:00
Yann Collet
776128d16f fix corner case when requiring cost of an FSE symbol
ensure that, when frequency[symbol]==0,
result is (tableLog + 1) bits
with both upper-bit and fractional-bit estimates.

Also : enable BIT_DEBUG in /tests
2018-05-24 13:59:11 -07:00
Nick Terrell
f2d0924b87 Variable declarations 2018-05-23 14:58:58 -07:00
Nick Terrell
c92dd11940 Error if reported size is too large in edge case 2018-05-23 14:47:20 -07:00
Nick Terrell
a97e9a627a [zstd] Fix decompression edge case
This edge case is only possible with the new optimal encoding selector,
since before zstd would always choose `set_basic` for small numbers of
sequences.

Fix `FSE_readNCount()` to support buffers < 4 bytes.

Credit to OSS-Fuzz
2018-05-23 12:16:00 -07:00
Nick Terrell
e3959d5eba Fixes 2018-05-22 16:06:33 -07:00
Nick Terrell
49cf880513 Approximate FSE encoding costs for selection
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.

* The cost of `set_basic` is computed using the cross-entropy cost
  function `ZSTD_crossEntropyCost()`, using the normalized default count
  and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
  previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
  `ZSTD_entropyCost()`, together with the cost of writing the normalized
  count `ZSTD_NCountCost()`.
2018-05-22 14:33:22 -07:00
fbrosson
291824f49d __builtin_prefetch did probably not exist before gcc 3.1. 2018-05-18 18:40:11 +00:00
fbrosson
16bb8f1f9e Drop colon in asm snippet to make old versions of gcc happy. 2018-05-18 17:05:36 +00:00
Yann Collet
0d7626672d fixed c++ conversion warning 2018-05-10 18:17:21 -07:00
Yann Collet
1a26ec6e8d opt: init statistics from dictionary
instead of starting from fake "default" statistics.
2018-05-10 17:59:12 -07:00
Yann Collet
c39061cb7b fixed declaration-after-statement warning 2018-05-09 12:07:25 -07:00
Yann Collet
4d5bd32a00 added traces to look at symbol costs
evaluation looks correct.
2018-05-09 12:00:12 -07:00
Yann Collet
c0da0f5e9e switchable bit-approximation / fractional-bit accuracy modes
also : makes it possible to select nb of fractional bits.
2018-05-09 10:48:09 -07:00
Yann Collet
ba2ad9b6b9 implemented fractional bit cost evaluation
for FSE symbols.

While it seems to work, the gains are negligible compared to rough maxNbBits evaluation.
There are even a few losses sometimes, that still need to be explained.
Furthermode, there are still cases where btlazy2 does a better job than btopt,
which seems rather strange too.
2018-05-08 17:43:13 -07:00
Yann Collet
6a3c34aa58 opt: estimate cost of both Hufman and FSE symbols
For FSE symbols : provide an upper bound,
in nb of bits,
since cost function is not able to store fractional bit costs.
2018-05-08 16:11:21 -07:00
Yann Collet
338f738c24 pass entropy tables to optimal parser
for proper estimation of symbol's weights
when using dictionary compression.

Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
2018-05-08 15:37:06 -07:00
taigacon
2c3ad05812 Fix the problem that enables DYNAMIC_BMI2 macro by mistake on ARM architecture with Clang (#1110) 2018-04-23 15:41:50 -07:00
Yann Collet
ad15c1b724 added __has_attribute() define for non-clang compilers 2018-03-23 19:04:48 -07:00
Yann Collet
52ca7c6c56 make DYNAMIC_BMI2 support of clang conditional to __has_attribute()
to support older clang versions such as 3.4
2018-03-23 18:45:42 -07:00
Yann Collet
192542b63c
Merge pull request #1047 from facebook/hufCompress
removed huf_compress_impl.h
2018-03-15 14:14:03 -07:00
Yann Collet
a909c293c6 Merge branch 'dev' into hufCompress 2018-03-14 16:11:25 -07:00
Nick Terrell
a9a6dcba63 Expose reference external sequence API
* Expose the reference external sequences API for zstdmt.
  Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.
2018-03-14 12:29:31 -07:00
Yann Collet
a95a88af57 removed huf_compress_impl.h
re-imported all functions inside huf_compress.c
for easier source editing.

Also updated a bunch of code comments
for clarification.
2018-03-13 14:14:05 -07:00
Yann Collet
bd7bb94361
Merge pull request #1044 from baldurk/remove-utf8-characters
Remove non-ASCII characters in header file comments
2018-03-13 13:22:07 -07:00
Baldur Karlsson
430a2fec19 Remove non-ASCII characters in header file comments
* Replaced a non-breaking space and an en dash with a plain space and
  a hyphen.
* This means the files are simple ASCII and less likely to run into
  codepage issues.
2018-03-13 20:05:53 +00:00
Yann Collet
51169575a8
Merge pull request #1036 from terrelln/thread-void
[threading] Cast unused arguments to void
2018-03-07 12:14:05 -08:00
Nick Terrell
7e103cdaf5 [threading] Cast unused arguments to void 2018-03-06 18:36:40 -08:00
Yann Collet
d02b44cf55 DYNAMIC_BMI2 enabled for clang
clang only claims compatibility with gcc 4.2.
Consequently, recent patch which reserved DYNAMIC_BMI2 for gcc >= 4.8
also disabled it for clang.

fix : __clang__ is now enough to enable DYNAMIC_BMI2
(associated with other existing conditions : x64/x64, !bmi2)
2018-03-04 16:05:59 -08:00
Yann Collet
45b09e7625 limit DYNAMIC_BMI2 to gcc >= 4.8
attribute bmi2 not supported by gcc 4.4
2018-03-01 15:02:18 -08:00
Yann Collet
89741653ab added error code workSpace_tooSmall 2018-02-26 15:11:50 -08:00
Yann Collet
6cdf690441 minor cleaning of huff0
Update code documentation, and properly names a few "magic constants".
Also, HUF_compress_internal() gets a cleaner way
to determine size of tables inside workspace.
2018-02-26 14:52:23 -08:00
Nick Terrell
af866b3a58 Split block compresser out of long range matcher
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
  stores them in a table. It should work with any chunk size, but
  is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
  instead of encoding the literals directly, it passes them to a
  secondary block compressor. The code to handle chunk sizes greater
  than the block size is currently commented out, since it is unused.
  The next PR will uncomment exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
  `targetLength`. Also don't emit repcode matches in the LDM block
  compressor. Enabling the LDM with the optimal parser now actually improves
  the compression ratio.
* The compression ratio is very similar to before. It is very slightly
  different, because the repcode handling is slightly different. If I remove
  immediate repcode checking in both branches the compressed size is exactly
  the same.
* The speed looks to be the same or better than before.

Up Next (in a separate PR)
--------------------------

Allow sequence generation to happen prior to compression, and produce more
than a block worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
2018-02-22 15:18:41 -08:00
Yann Collet
010ba5f71f
Merge pull request #1017 from terrelln/c-bmi2
[compress] Support BMI2
2018-02-20 15:34:59 -08:00
Yann Collet
70163bf0d3 added clarification comments in zstd_errors.h
answering some points in #1018
2018-02-20 12:54:49 -08:00
Nick Terrell
b58f01537e [compress] Support BMI2 2018-02-14 19:20:32 -08:00
Nick Terrell
4319132312 [decompress] Support BMI2 2018-02-13 17:00:15 -08:00
Yann Collet
95424409ea addBits and baseline into FSE decoding table
note : unfinished
- need new default tables
- need modify long mode
2018-02-09 04:25:15 -08:00
Yann Collet
0170cf9a7a minor : modified ZSTD_preserveUnsortedMark() to be more vectorization friendly 2018-02-05 11:46:02 -08:00
Yann Collet
997e4d0ccd added POOL_tryAdd() 2018-01-18 14:39:51 -08:00
Nick Terrell
887cd4e35e Split ZSTD_CCtx into smaller sub-structures 2018-01-16 11:17:50 -08:00
Yann Collet
e8093dde09 fixed #304
Pathological samples may result in literal section being incompressible.
This case is now detected,
and literal distribution is replaced by one that can be written into the dictionary.
2018-01-11 11:16:32 -08:00
Yann Collet
f299fa39ac fix a subtle issue in continue mode
The deep fuzzer tests caught a subtle bug that was probably there for a long time.
The impact of the bug is not a crash, or any other clear error signal,
rather, it reduces performance, by cutting data into smaller blocks.
Eventually, the following test would fail because it produces too many 1-byte blocks,
requiring more space than buffer can provide :
`./zstreamtest_asan --mt -s3514 -t1678312 -i1678314`

The root scenario is as follows :
- Create context, initialize it using explicit parameters or a `cdict` to pin them down, set `pledgedSrcSize=1`
- The compression parameters will not be adapted, but `windowSize` and `blockSize` will be automatically set to `1`.
  `windowSize` and `blockSize` are dynamic values, set within `ZSTD_resetCCtx_internal()`.
  The automatic adaptation makes it possible to generate smaller contexts for smaller input sizes.
- Complete compression
- New compression with same context, using same parameters, but `pledgedSrcSize=ZSTD_CONTENTSIZE_UNKNOWN`
  trigger "continue mode"
- Continue mode doesn't modify blockSize, because it used to depend on `windowLog` only,
  but in fact, it also depends on `pledgedSrcSize`.
- The "old" blocksize (1) is still there,
  next compression will use this value to cut input into blocks,
  resulting in more blocks and worse performance than necessary performance.

Given the scenario, and its possible variants, I'm surprised it did not show up before.
But I suspect it did show up, it's just that it never triggered an error, because "worse performance" is not a trigger.
The above test is a special corner case, where performance is so impacted that it reaches an error case.

The fix works, but I'm not completely pleased.
I think the current code relies too much on implied relations between variables.
This will likely break again in the future when some related part of the code change.
Unfortunately, no time to make larger changes if we want to keep the release target for zstd v1.3.3.
So a longer term fix will have to be considered after the release.

To do : create a reliable test case which triggers this scenario for CI tests.
2017-12-19 09:43:03 +01:00
Yann Collet
c173dbd6e7 no longer supported starting C++17 2017-12-04 18:00:53 -08:00
Yann Collet
0a0a212934 zstd_opt: changed cost formula
There was a flaw in the formula
which compared literal cost with match cost :
at a given position,
a non-null literal suite is going to be part of next sequence,
while if position ends a previous match, to immediately start another match,
next sequence will have a litlength of zero.
A litlength of zero has a non-null cost.
It follows that literals cost should be compared to match cost + litlength==0.

Not doing so gave a structural advantage to matches, which would be selected more often.
I believe that's what led to the creation of the strange heuristic which added a complex cost to matches.
The heuristic was actually compensating.
It was probably created through multiple trials, settling for best outcome on a given scenario (I suspect silesia.tar).
The problem with this heuristic is that it's hard to understand,
and unfortunately, any future change in the parser would impact the way it should be calculated and its effects.

The "proper" formula makes it possible to remove this heuristic.

Now, the problem is : in a head to head comparison, it's sometimes better, sometimes worse.
Note that all differences are small (< 0.01 ratio).
In general, the newer formula is better for smaller files (for example, calgary.tar and enwik7).
I suspect that's because starting statistics are pretty poor (another area of improvement).
However, for silesia.tar specifically, it's worse at level 22 (while being better at level 17, so even compression level has an impact ...).

It's a pity that zstd -22 gets worse on silesia.tar.
That being said, I like that the new code gets rid of strange variables,
which were introducing complexity for any future evolution (faster variants being in mind).
Therefore, in spite of this detrimental side effect, I tend to be in favor of it.
2017-11-28 14:07:03 -08:00
Yann Collet
cdade555ee fixed one UB pointer arithmetic 2017-11-17 11:40:08 -08:00
Yann Collet
05dffe43a7 Fixed Btree update
ZSTD_updateTree() expected to be followed by a Bt match finder, which would update zc->nextToUpdate.
With the new optimal match finder, it's not necessarily the case : a match might be found during repcode or hash3, and stops there because it reaches sufficient_len, without even entering the binary tree.
Previous policy was to nonetheless update zc->nextToUpdate, but the current position would not be inserted, creating "holes" in the btree, aka positions that will no longer be searched.
Now, when current position is not inserted, zc->nextToUpdate is not update, expecting ZSTD_updateTree() to fill the tree later on.

Solution selected is that ZSTD_updateTree() takes care of properly setting zc->nextToUpdate,
so that it no longer depends on a future function to do this job.

It took time to get there, as the issue started with a memory sanitizer error.
The pb would have been easier to spot with a proper `assert()`.
So this patch add a few of them.

Additionnally, I discovered that `make test` does not enable `assert()` during CLI tests.
This patch enables them.

Unfortunately, these `assert()` triggered other (unrelated) bugs during CLI tests, mostly within zstdmt.
So this patch also fixes them.

- Changed packed structure for gcc memory access : memory sanitizer would complain that a read "might" reach out-of-bound position on the ground that the `union` is larger than the type accessed.
  Now, to avoid this issue, each type is independent.
- ZSTD_CCtxParams_setParameter() : @return provides the value of parameter, clamped/fixed appropriately.
- ZSTDMT : changed constant name to ZSTDMT_JOBSIZE_MIN
- ZSTDMT : multithreading is automatically disabled when srcSize <= ZSTDMT_JOBSIZE_MIN, since only one thread will be used in this case (saves memory and runtime).
- ZSTDMT : nbThreads is automatically clamped on setting the value.
2017-11-16 12:18:56 -08:00
Yann Collet
4202b2e8a6 merged rep search into btMatchSearch
but there is a tree corruption somewhere ...
bug hunt ongoing
2017-11-14 20:38:52 -08:00
Yann Collet
9a11f70dc3 merged repcode search into BT match search
this version has same speed as branch `opt`
which is itself 5-10% slower than branch `dev`
(no identified reason)

It does not compress exactly the same as `opt` or `dev`,
maybe because it doesn't stop search after repcodes,
leading to sometimes better compression, sometimes worse
(by a small margin).

warning : _extDict path does not work for the time being
This means that benchmark module works,
but file module will fail with large files (and high compression level).
Objective is to fuse _extDict path into current one,
in order to have a single parser to maintain.
2017-11-13 02:23:48 -08:00
Yann Collet
4191efa993 zstd_opt: ensure sufficient_len < ZSTD_OPT_NUM to simplify some tests 2017-11-08 11:24:00 -08:00
Yann Collet
8b6aecf2cb moved a few structures from zstd_internal.h to zstd_compress.h
which is a more precise scope
2017-11-07 16:03:14 -08:00
Yann Collet
61e5a1adfc removed direct call to malloc() from pool.c 2017-10-31 17:43:24 -07:00
Nick Terrell
a86a7097ec Ensure dictionary Huff table can encode any symbol
* Ensure that the dictionary Huffman CTable has maxSymbolValue 255.
* Fix a stack buffer overflow during compression dictionary loading.
2017-10-03 13:22:13 -07:00
Yann Collet
ee1ed78fcb fix proper naming on FSE_createCTable() arguments in fse.h 2017-09-30 11:08:50 -07:00
Yann Collet
86b4fe5b45 adjustCParams : restored previous behavior
unknowns srcSize presumed small if there is a dictionary (dictSize>0)
and presumed large otherwise.
2017-09-28 18:14:28 -07:00
Yann Collet
54a827fff0 Merge branch 'dev' into newFormats
Fixed conflicts in zstdmt_compress.c
2017-09-27 16:39:40 -07:00
Nick Terrell
6c41adfb28 [libzstd] pthread function prefixed with ZSTD_
* `sed -i 's/pthread_/ZSTD_pthread_/g' lib/{,common,compress,decompress,dictBuilder}/*.[hc]`
* Fix up `lib/common/threading.[hc]`
* `sed -i s/PTHREAD_MUTEX_LOCK/ZSTD_PTHREAD_MUTEX_LOCK/g lib/compress/zstdmt_compress.c`
2017-09-27 11:48:48 -07:00
Yann Collet
9416195221 changed error code when pos<=size condition is not respected
Now pointing towards src_size or dst_size,
instead of error_GENERIC.
2017-09-27 10:35:56 -07:00
Yann Collet
df4e9bba25 fixed constant errors for gcc in c99 mode
C standard does not consider a `static const int` as a constant.
This is a problem for initializer, and ZSTD_STATIC_ASSERT().
Replaced by macro values
2017-09-26 14:31:06 -07:00
Yann Collet
9f0b8dfbe9 Merge branch 'dev' into newFormats 2017-09-26 14:22:39 -07:00
Nick Terrell
c233bdbaee Increase maximum window size
* Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail
  on my Mac.
* Maximum window size in 64-bit mode is 2GB, since that is the largest
  power of 2 that works with the overflow prevention.
* Allow `--long=windowLog` to set the window log, along with
  `--zstd=wlog=#`. These options also set the window size during
  decompression, but don't override `--memory=#` if it is set.
* Present a helpful error message when the window size is too large during
  decompression.
* The long range matcher defaults to a hash log 7 less than the window log,
  which keeps it at 20 for window log 27.
* Keep the default long range matcher window size and the default maximum
  window size at 27 for the API and CLI.
* Add tests that use the maximum window size and hash size for compression
  and decompression.
2017-09-26 14:00:01 -07:00
Yann Collet
5d8fdd1641 Merge pull request #855 from terrelln/maxoff
[libzstd] Increase MaxOff
2017-09-25 16:34:29 -07:00
Yann Collet
b8d4a3887f introduced constant ZSTD_frameIdSize
within zstd_internal.h
This is the size of magic number.

Avoids using `4` directly in source code, which is a bit less meaningful.
2017-09-25 15:26:18 -07:00
Nick Terrell
bbe77212ef [libzstd] Increase MaxOff 2017-09-25 13:36:18 -07:00
Yann Collet
7c3dea42ce added prototypes for advanced parameters for decompression API
required to decode custom formats
2017-09-24 15:57:29 -07:00