This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams`
into a helper. It then migrates two other callsites to use that helper,
a small optimization for `ZSTD_estimateCStreamSize_usingCCtxParams`, which
folds the buffer sizing into the helper, and then `ZSTD_resetCCtx_internal`,
which is more invasive.
This attempts to guarantee that the estimates returned to users are always
correct.
`ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which
is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the
buffers, assuming they'll be zero. However, the actual workspace allocation
logic always allocates those buffers, and when running under ASAN, the
workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized
buffers end up consuming 512 bytes of space, which is accounted for in the
actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't
in the estimation path, since it ignores the buffers entirely.
This commit fixes this.
Resubmission of #2001. This switches the `sed` invocations to use `-E`,
extended regex syntax, which is better standardized across platforms.
I guess.
Same test plan:
```
make -C lib clean libzstd.pc
cat lib/libzstd.pc
echo # should fail
make -C lib clean libzstd.pc LIBDIR=/foo
make -C lib clean libzstd.pc INCLUDEDIR=/foo
make -C lib clean libzstd.pc LIBDIR=/usr/localfoo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/localfoo
make -C lib clean libzstd.pc LIBDIR=/usr/local/lib prefix=/foo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/include prefix=/foo
echo # should succeed
make -C lib clean libzstd.pc LIBDIR=/usr/local/foo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/foo
make -C lib clean libzstd.pc LIBDIR=/usr/local/
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/
make -C lib clean libzstd.pc LIBDIR=/usr/local
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local
make -C lib clean libzstd.pc LIBDIR=/tmp/foo prefix=/tmp
make -C lib clean libzstd.pc INCLUDEDIR=/tmp/foo prefix=/tmp
make -C lib clean libzstd.pc LIBDIR=/tmp/foo prefix=/tmp/foo
make -C lib clean libzstd.pc INCLUDEDIR=/tmp/foo prefix=/tmp/foo
echo # should also succeed
make -C lib clean libzstd.pc prefix=/foo LIBDIR=/foo/bar INCLUDEDIR=/foo/
cat lib/libzstd.pc
mkdir out
cd out
cmake ../build/cmake
make
cat lib/libzstd.pc
```
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.
This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.
* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.
Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.
Credit to OSS-Fuzz.
This diff reorganizes the `lib/Makefile` to extract various settings that a
user would normally invoke together (supposing that they were aware of them)
if they were trying to build the smallest `libzstd` possible. It collects
these settings under a master setting `ZSTD_LIB_MIN_SIZE`.
Also document this new option.
`-Wall` implies `-Wformat-zero-length`, which will cause compilation to fail
under `-Werror` when an empty string is passed as the format string to a
`printf`-family function. This commit moves us back to prefixing the provided
format string, which successfully avoids that warning.
However, this removes the failure mode where that `RAWLOG` invocation would
fail to compile when no format string was provided at all (which was desirable
to avoid having code that would successfully compile normally but fail under
`-pedantic`, which *does* require that a non-zero number of args are provided).
So this commit also introduces a function which does nothing at all, but will
fail to compile if not provided with at least one argument, which is a string.
This successfully links the compilability of pedantic and non-pedantic builds.
Fixes:
Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:
Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:
O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes#2096
fix#2045
When compiling `libzstd` in multithreading mode,
the `libzstd-mt` recipe would not include `-pthread`,
resulting in an underlinked dynamic library.
Added a test on Travis to check that the library is fully linked.
This makes it possible, in some future release,
to build a multi-threaded `libzstd` dynamic library by default
as it would no longer impact the build script of user programs.
* adding long support for patch-from
* adding refPrefix to dictionary_decompress
* adding refPrefix to dictionary_loader
* conversion nit
* triggering log mode on chainLog < fileLog and removing old threshold
* adding refPrefix to dictionary_round_trip
* adding docs
* adding enableldm + forceWindow test for dict
* separate patch-from logic into FIO_adjustParamsForPatchFromMode
* moving memLimit adjustment to outside ifdefs (need for decomp)
* removing refPrefix gate on dictionary_round_trip
* rebase on top of dev refPrefix change
* making sure refPrefx + ldm is < 1% of srcSize
* combining notes for patch-from
* moving memlimit logic inside fileio.c
* adding display for optimal parser and long mode trigger
* conversion nit
* fuzzer found heap-overflow fix
* another conversion nit
* moving FIO_adjustMemLimitForPatchFromMode outside ifndef
* making params immutable
* moving memLimit update before createDictBuffer call
* making maxSrcSize unsigned long long
* making dictSize and maxSrcSize params unsigned long long
* error on files larger than 4gb
* extend refPrefix test to include round trip
* conversion to size_t
* making sure ldm is at least 10x better
* removing break
* including zstd_compress_internal and removing redundant macros
* exposing ZSTD_cycleLog()
* using cycleLog instead of chainLog
* add some more docs about user optimizations
* formatting
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized
The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.
The copyright and license of `divsufsort.{h,c}` is not changed.
The alignment is added before the loop, so this shouldn't hurt
performance in any case. The only way it hurts is if there is already
performance instability, and we force it to be stable but in the bad
case.
This consistently gets us into the good case with gcc-{7,8,9} on an
Intel i9-9900K and clang-9. gcc-5 is 5% worse than its best case but has
stable performance. We get consistently good behavior on my Macbook Pro
compiled with both clang and gcc-8. It ends up in the 50% from DSB and
50% from MITE case, but the performance is the same as the 85% DSB case,
so thats fine.
When the `PCLIBDIR` or `PCINCDIR` is non-empty (either because we succeeded
in removing the prefix, or because it was manually set), we don't need to
perform the check. This lets us trust users who go to the trouble of setting
a manual override, rather than still blindly failing the make.
They'll still be prefixed with `${prefix}/` / `${exec_prefix}/` in the
pkg-config file though.
Revises #1851. Fixes#1900. Replaces #1930.
Thanks to @orbea, @neheb, @Polynomial-C, and particularly @eli-schwartz for
pointing out the problem and suggesting solutions.
Tested with
```
make -C lib clean libzstd.pc
cat lib/libzstd.pc
# should fail
make -C lib clean libzstd.pc LIBDIR=/foo
make -C lib clean libzstd.pc INCLUDEDIR=/foo
make -C lib clean libzstd.pc LIBDIR=/usr/localfoo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/localfoo
make -C lib clean libzstd.pc LIBDIR=/usr/local/lib prefix=/foo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/include prefix=/foo
# should succeed
make -C lib clean libzstd.pc LIBDIR=/usr/local/foo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/foo
make -C lib clean libzstd.pc LIBDIR=/usr/local/
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/
make -C lib clean libzstd.pc LIBDIR=/usr/local
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local
make -C lib clean libzstd.pc LIBDIR=/tmp/foo prefix=/tmp
make -C lib clean libzstd.pc INCLUDEDIR=/tmp/foo prefix=/tmp
make -C lib clean libzstd.pc LIBDIR=/tmp/foo prefix=/tmp/foo
make -C lib clean libzstd.pc INCLUDEDIR=/tmp/foo prefix=/tmp/foo
# should also succeed
make -C lib clean libzstd.pc prefix=/foo LIBDIR=/foo/bar INCLUDEDIR=/foo/
cat lib/libzstd.pc
mkdir out
cd out
cmake ../build/cmake
make
cat lib/libzstd.pc
```
Super blocks must never violate the zstd block bound of input_size + ZSTD_blockHeaderSize. The individual sub-blocks may, but not the super block. If the superblock violates the block bound we are liable to violate ZSTD_compressBound(), which we must not do. Whenever the super block violates the block bound we instead emit an uncompressed block.
This means we increase the latency because of the single uncompressed block. I fix this by enabling streaming an uncompressed block, so the latency of an uncompressed block is 1 byte. This doesn't reduce the latency of the buffer-less API, but I don't think we really care.
* I added a test case that verifies that the decompression has 1 byte latency.
* I rely on existing zstreamtest / fuzzer / libfuzzer regression tests for correctness. During development I had several correctness bugs, and they easily caught them.
* The added assert that the superblock doesn't violate the block bound will help us discover any missed conditions (though I think I got them all).
Credit to OSS-Fuzz.
* Allow zero sized buffers in `stream_decompress`. Ensure that we never have two
zero sized buffers in a row so we guarantee forwards progress.
* Make case 4 in `stream_round_trip` do a zero sized buffers call followed by
a full call to guarantee forwards progress.
* Fix `limitCopy()` in legacy decoders.
* Fix memcpy in `zstdmt_compress.c`.
Catches the bug fixed in PR #1939
* Adding fail logging for superblock flow
* Dividing by targetCBlockSize instead of blockSize
* Adding new const and using more acurate formula for nbBlocks
* Only do dstCapacity check if using superblock
* Remvoing disabling logic
* Updating test to make it catch more extreme case of previou bug
* Also updating comment
* Only taking compressEnd shortcut on non-superblock
Fixes new fuzz issue
Credit to OSS-Fuzz
* Initializing unsigned value
* Initialilzing to 1 instead of 0 because its more conservative
* Unconditionoally setting to check first and then checking zero
* Moving bool to before block for c90
* Move check set before block
Fixes a fuzz issue where dictionary_round_trip failed because the compressor was generating corrupt files thanks to zero weights in the table.
* Only setting loaded dict huf table to valid on non-zero
* Adding hasNoZeroWeights test to fse tables
* Forbiding nbBits != 0 when weight == 0
* Reverting the last commit
* Setting table log to 0 when weight == 0
* Small (invalid) zero weight dict test
* Small (valid) zero weight dict test
* Initializing repeatMode vars to check before zero check
* Removing FSE changes to seperate pr
* Reverting accidentally changed file
* Negating bool, using unsigned, optimization nit
This has no measurable impact on large files but improves small file
decompression by ~1-2% for 10kB, benchmarked with:
head -c 10000 silesia.tar > /tmp/test
make CC=/usr/local/bin/clang-9 BUILD_STATIC=1 && ./lzbench -ezstd -t1,5 /tmp/test
This parameter is unused in single-threaded compression. We should make it
behave like the other multithread-only parameters, for which we only accept
zero when we are not built with multithreading.
* Silently skip dictionaries less than 8 bytes, unless using `ZSTD_dct_fullDict`.
This changes the compressor, which silently skips dictionaries <= 8 bytes.
* Allow repcodes that are equal to the dictionary content size, since it is in bounds.
In the case that `op >= oend_w` it is possible that `diff < 8` because
the two buffers could be adjacent.
Credit to OSS-Fuzz, which found the bug. It isn't reproducible because
it depends on the memory layout.
Addresses #1794. Instead of deriving the lib dir and include dir at
build-time, let's do it like everyone else does at pkg-config run-time.
This has the disadvantage that we can no longer override LIBDIR and
INCLUDEDIR in the Makefile and have that reflected in the .pc file.
Compression ratio of fast strategies (levels 1 & 2)
was seriously reduced, due to accidental disabling of Literals compression.
Credit to @QrczakMK, which perfectly described the issue, and implementation details,
making the fix straightforward.
Example : initCStream with level 1 on synthetic sample P50 :
Before : 5,273,976 bytes
After : 3,154,678 bytes
ZSTD_compress (for comparison) : 3,154,550
Fix#1787.
To follow : refactor the test which was supposed to catch this issue (and failed)
* Fix `ZSTD_FRAMEHEADERSIZE_PREFIX` and `ZSTD_FRAMEHEADERSIZE_MIN` to
take a `format` parameter, so it is impossible to get the wrong size.
* Fix the places that called `ZSTD_FRAMEHEADERSIZE_PREFIX` without
taking the format into account, which is now impossible by design.
* Call `ZSTD_frameHeaderSize_internal()` with `dctx->format`.
* The added tests catch both bugs in `ZSTD_decompressFrame()`.
Fixes#1813.
* Bump `WILDCOPY_OVERLENGTH` to 16 to fix the wildcopy overread.
* Optimize `ZSTD_wildcopy()` by removing unnecessary branches and
unrolling the loop.
* Extract `ZSTD_overlapCopy8()` into its own function.
* Add `ZSTD_safecopy()` for `ZSTD_execSequenceEnd()`. It is
optimized for single long sequences, since that is the important
case that can end up in `ZSTD_execSequenceEnd()`. Without this
optimization, decompressing a block with 1 long match goes
from 5.7 GB/s to 800 MB/s.
* Refactor `ZSTD_execSequenceEnd()`.
* Increase the literal copy shortcut to 16.
* Add a shortcut for offset >= 16.
* Simplify `ZSTD_execSequence()` by pushing more cases into
`ZSTD_execSequenceEnd()`.
* Delete `ZSTD_execSequenceLong()` since it is exactly the
same as `ZSTD_execSequence()`.
clang-8 seeds +17.5% on silesia and +21.8% on enwik8.
gcc-9 sees +12% on silesia and +15.5% on enwik8.
TODO: More detailed measurements, and on more datasets.
Crdit to OSS-Fuzz for finding the wildcopy overread.
This led to a nasty edgecase, where index reduction for modes that don't use
the h3 table would have a degenerate table (size 4) allocated and marked clean,
but which would not be re-indexed.
The source matchState is potentially at a lower current index, which means
that any extra table space not overwritten by the copy may now contain
invalid indices. The simple solution is to unconditionally shrink the valid
table area to just the area overwritten.