Commit Graph

63 Commits

Author SHA1 Message Date
Like Ma
cc907770bd Fix building on AIX 5.1 2020-10-09 18:34:00 +08:00
Nick Terrell
5717bd39ee [lib] Fix NULL pointer dereference
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.

This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.

* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.

Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.

Credit to OSS-Fuzz.
2020-05-06 12:09:02 -07:00
W. Felix Handte
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
W. Felix Handte
c7da66c9cf Purge C++-Style Comments (// ...), Make Compilation Succeed Under C90 2020-05-04 10:59:15 -04:00
Nick Terrell
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
Nick Terrell
d1cc9d2797
[fuzz] Allow zero sized buffers for streaming fuzzers (#1945)
* Allow zero sized buffers in `stream_decompress`. Ensure that we never have two
  zero sized buffers in a row so we guarantee forwards progress.
* Make case 4 in `stream_round_trip` do a zero sized buffers call followed by
  a full call to guarantee forwards progress.
* Fix `limitCopy()` in legacy decoders.
* Fix memcpy in `zstdmt_compress.c`.

Catches the bug fixed in PR #1939
2020-01-09 11:38:50 -08:00
Dávid Bolvanský
1f7228c040 Use clz ^ 31 instead of 31 - clz; better codegen for GCC 2019-09-23 21:23:09 +02:00
Nick Terrell
e6edcfa795 [legacy] Fix bug in zstd-0.5 decoder
The match length and literal length extra bytes could either
by 2 bytes or 3 bytes in version 0.5. All earlier verions were
always 3 bytes, and later version didn't have dumps.

The bug, introduced by commit 0fd322f812,
was triggered when the last dump was a 2-byte dump, because we didn't
separate that case from a 3-byte dump, and thought we were over-reading.

I've tested this fix with every zstd version < 1.0.0 on the buggy file,
and we are now always successfully decompressing with the right
checksum.

Fixes #1693.
2019-07-22 13:05:09 -07:00
Nick Terrell
0fd322f812 [legacy] Fix ZSTDv0*_decodeSequence()
* Version <= 0.5 could read beyond the end of `dumps`, which points into
  the input buffer.
* Check the validity of `dumps` before using it, if it is out of bounds
  return garbage values. There is no return code for this function.
* Introduce `MEM_readLE24()` for simplicity, since I don't want to trust
  that there is an extra byte after `dumps`.
2019-04-19 11:34:52 -07:00
Nick Terrell
2536771134 [legacy] Fix Huffman jump table reads in v01 and v05 2019-04-18 16:20:42 -07:00
Josh Soref
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
shakeelrao
0a3fa6f909 Add legacy mode in documentation 2019-03-18 20:33:15 -07:00
shakeelrao
20aa1b455c Stylistic changes 2019-03-17 19:35:43 -07:00
shakeelrao
60796e76b0 Add legacy support to decompressBound 2019-03-15 16:10:37 -07:00
Yann Collet
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
Yann Collet
11cd2ea43d finalized minor warnings on Haiku 2018-10-03 16:37:50 -07:00
Nick Terrell
f2d6db45cd [zstd] Add -Wmissing-prototypes 2018-09-27 15:24:48 -07:00
Yann Collet
36d6165a2d Makefile: added variable SCANBUILD
so that a different version of scan-build can be selected
2018-08-16 16:44:13 -07:00
Yann Collet
6e66bbf5dd fixed several minor issues detected by scan-build
only notable one :
writeNCount() resists better vs invalid distributions
(though it should never happen within zstd anyway)
2018-08-14 16:55:35 -07:00
Yann Collet
c173dbd6e7 no longer supported starting C++17 2017-12-04 18:00:53 -08:00
Yann Collet
3128e03be6 updated license header
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
Yann Collet
32fb407c9d updated a bunch of headers
for the new license
2017-08-18 16:52:05 -07:00
Yann Collet
2bd6440be0 pinned down error code enum values
Note : all error codes are changed by this new version,
but it's expected to be the last change for existing codes.

Codes are now grouped by category, and receive a manually attributed value.
The objective is to guarantee that
error code values will not change in the future
when introducing new codes.
Intentionnal empty spaces and ranges are defined
in order to keep room for potential new codes.
2017-07-13 17:12:16 -07:00
Yann Collet
133f0aee54 fixed redundant declarations in legacy v0.5 and v0.7 decoders
triggered by new flag -Wredundant-decls
2017-05-15 17:44:04 -07:00
Jos Collin
280510f2d5 lib/legacy: warning: this statement may fall through
The following warning appears during build at sevaral places.

../lib/legacy/zstd_v04.c:819:40: warning: this statement may fall through [-Wimplicit-fallthrough=]
             case 7: bitD->bitContainer += (size_t)(((const BYTE*)(bitD->start))[6]) << (sizeof(size_t)*8 - 16);

../lib/legacy/zstd_v05.c:821:40: warning: this statement may fall through [-Wimplicit-fallthrough=]
             case 7: bitD->bitContainer += (size_t)(((const BYTE*)(bitD->start))[6]) << (sizeof(size_t)*8 - 16);

../lib/legacy/zstd_v06.c:913:40: warning: this statement may fall through [-Wimplicit-fallthrough=]
             case 7: bitD->bitContainer += (size_t)(((const BYTE*)(srcBuffer))[6]) << (sizeof(bitD->bitContainer)*8 - 16);

../lib/legacy/zstd_v07.c:583:40: warning: this statement may fall through [-Wimplicit-fallthrough=]
             case 7: bitD->bitContainer += (size_t)(((const BYTE*)(srcBuffer))[6]) <<
             (sizeof(bitD->bitContainer)*8 - 16);

Signed-off-by: Jos Collin <jcollin@redhat.com>
2017-05-11 14:27:40 +05:30
Nick Terrell
5152fb2cb2 Convert all tabs to spaces 2017-03-29 18:51:58 -07:00
Sean Purcell
9050e1925e Change name to to findFrameCompressedSize and add skippable support 2017-02-22 12:12:34 -08:00
Sean Purcell
d7bfcac18a Expose frameSrcSize to experimental API 2017-02-10 11:55:44 -08:00
Sean Purcell
4e709712e1 Decompressed size functions now handle multiframes and distinguish cases
- Add ZSTD_findDecompressedSize
    - Traverses multiple frames to find total output size
- Add ZSTD_getFrameContentSize
    - Gets the decompressed size of a single frame by reading header
- Deprecate ZSTD_getDecompressedSize
2017-02-08 14:50:10 -08:00
Yann Collet
b5fd15ccb2 fixed : legacy decoders v04 and v05 2017-01-30 10:45:58 -08:00
Yann Collet
cafdd31a38 fixed MSAN warnings in legacy decoders
In some extraordinary circumstances,
*Length field can be generated from reading a partially uninitialized memory segment.
Data is correctly identified as corrupted later on,
but the read taints some later pointer arithmetic operation.
2017-01-27 10:44:03 -08:00
Yann Collet
35168679bd Merge pull request #478 from terrelln/wildcopy-ub
Fix execSequence wildcopy undefined behavior
2016-12-13 11:33:00 +01:00
Nick Terrell
064a143520 Fix execSequence wildcopy undefined behavior
execSequence relied on pointer overflow to handle cases where
`sequence.matchLength < 8`.  Instead of passing an `size_t` to
wildcopy, pass a `ptrdiff_t`.
2016-12-12 19:01:23 -08:00
Nick Terrell
e474aa55b4 Fix decompression buffer overrun
Allows an adversary to write up to 3 bytes beyond the end of the buffer.
Occurs if the match overlaps the `extDict` and `currentPrefix`, and the
match length in the `currentPrefix` is less than `MINMATCH`, and
`op-(16-MINMATCH) >= oMatchEnd > op-16`.
2016-12-12 18:05:30 -08:00
Nick Terrell
4359d21ad7 Merge two memset() calls into one 2016-11-14 17:52:51 -08:00
Nick Terrell
24701de877 Fix uninitialized memory read 2016-11-14 13:57:05 -08:00
Nick Terrell
dc904ad17b Fix bug in zstd v0.{5, 6} dictionary decompression
Introduced by bb68062c59.
2016-11-04 16:18:59 -07:00
Nick Terrell
f698ad6deb Merge remote-tracking branch 'upstream/dev' into fixes
* upstream/dev:
  added doc\zstd_manual.html
  added contrib\gen_html
  zstd_compression_format.md moved to doc/
  Fix small bug in ZSTD_execSequence()
  improved ZSTD_compressBlock_opt_extDict_generic
  protect ZSTD_decodeFrameHeader() from invalid usage, as suggested by @spaskob
  zstd_opt.h: small improvement in compression ratio
  improved dicitonary segment merge
  use implicit rules to compile zstd_decompress.c
  detect early impossible decompression scenario in legacy decoder v0.5
  no repeat mode in legacy v0.5
  fixed invalid invocation of dictionary in legacy decoder v0.5
  fix edge case
  fix command line interpretation
  fixed minor corner case
  zstd.h: added the Introduction section
  fixed clang 3.5 warnings
  zstd.h: updated comments
2016-10-24 13:10:13 -07:00
Nick Terrell
ae1cb3b3d0 Fix small bug in ZSTD_execSequence()
`memmove(op, match, sequence.matchLength)` is not the desired behavior.
Overlap is allowed, and handled as if we did `*op++ = *match++`, which
is not how `memmove()` handles overlap.

Only triggered if both of the following conditions are met:
* The match spans extDict & currentPrefixSegment
* `oLitEnd <= oend_w < oLitEnd + length1 < oMatchEnd <= oend`.

These two conditions imply that the block is less than 15 bytes long.
This bug isn't triggered by the streaming API, because it allocates
enough space for the window size + the block size, so there cannot be
a match that is within 8 bytes of the end and overlaps with itself.
It cannot be triggered by the block decompression API because all of
the decompressed data is in the currentPrefixSegment.

Introduced by commit 7158584399
2016-10-21 12:13:44 -07:00
Nick Terrell
d760529a05 Fix stack buffer overrun when weightTotal == 0
If `weightTotal == 0`, then `BIT_highbit32(weightTotal)` is
undefined behavior in the case that it calls `__builtin_clz()`.
If `tableLog == HUF_TABLELOG_ABSOLUTEMAX` then we will access one
byte beyond the end of the buffer.
2016-10-19 11:39:11 -07:00
Nick Terrell
bb68062c59 Unitialized memory read in ZSTD_decodeSeqHeaders()
Caused by two things:
1. Not checking that `ip` is in range except for the first byte.
2. `ZSTDv0{5,6}_decodeLiteralsBlock()` could return a value larger than `srcSize`.
2016-10-18 16:41:33 -07:00
Nick Terrell
7b06ad7a05 Backport fix from commit 125d817
This fixes a read of unitialized memory.
Full commit hash: 125d81774f.
2016-10-18 14:52:34 -07:00
Nick Terrell
f45b157d95 Backport fix from commit 9e8b09a
Fixes uninitialized memory reads.
Full commit hash: 9e8b09a7bd
2016-10-18 14:22:49 -07:00
Yann Collet
f7906d5955 detect early impossible decompression scenario in legacy decoder v0.5 2016-10-18 13:48:32 -07:00
Yann Collet
9313c8d953 no repeat mode in legacy v0.5 2016-10-18 13:36:15 -07:00
Yann Collet
83d7bdee4b fixed invalid invocation of dictionary in legacy decoder v0.5 2016-10-18 12:25:43 -07:00
Nick Terrell
4db751668f Fix buffer overrun in ZSTD_loadEntropy()
The table log set by `FSE_readNCount()` was not checked in
`ZSTD_loadEntropy()`.  This caused `FSE_buildDTable(dctx->MLTable, ...)`
to overwrite the beginning of `dctx->hufTable`.

The benchmarks look good, there is no obvious performance regression:

  > ./zstds/zstd.opt.0 -i10 -b1 -e5 ~/bench/silesia.tar
   1#silesia.tar       : 211988480 ->  73656930 (2.878), 268.2 MB/s , 701.0 MB/s
   2#silesia.tar       : 211988480 ->  70162842 (3.021), 199.5 MB/s , 666.9 MB/s
   3#silesia.tar       : 211988480 ->  66997986 (3.164), 154.9 MB/s , 655.6 MB/s
   4#silesia.tar       : 211988480 ->  66002591 (3.212), 128.9 MB/s , 648.4 MB/s
   5#silesia.tar       : 211988480 ->  65008480 (3.261),  98.4 MB/s , 633.4 MB/s

  > ./zstds/zstd.opt.2 -i10 -b1 -e5 ~/bench/silesia.tar
   1#silesia.tar       : 211988480 ->  73656930 (2.878), 266.1 MB/s , 703.7 MB/s
   2#silesia.tar       : 211988480 ->  70162842 (3.021), 199.0 MB/s , 666.6 MB/s
   3#silesia.tar       : 211988480 ->  66997986 (3.164), 156.2 MB/s , 656.2 MB/s
   4#silesia.tar       : 211988480 ->  66002591 (3.212), 133.2 MB/s , 647.4 MB/s
   5#silesia.tar       : 211988480 ->  65008480 (3.261),  96.3 MB/s , 633.3 MB/s
2016-10-17 15:51:15 -07:00
Nick Terrell
ccfcc643da Check if dict is empty before reading first byte 2016-10-17 11:46:03 -07:00
Nick Terrell
7158584399 Fix ZSTD_execSequence() edge case 2016-10-12 10:05:26 -07:00
inikep
8161e7321a unified error codes for legacy decoders 2016-09-05 12:29:51 +02:00