new version easier to vectorize
leads to smaller code and faster execution
notably at the last recombination stage
(basically, fixed cost per block).
Assembly inspected with godbolt
On my laptop, with `clang` and `-mavx2` :
2K block : 1280 MB/s -> 1550 MB/s
8K block : 1750 MB/s -> 1860 MB/s
A fuzzing build mode (FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION) doesn't
necessarily imply that assert() is enabled, according to the manual.
When the current do-nothing is expanded under -Wunused-variable (-Wall),
it results in unused variables in some of the FUZZING_BUILD_MODE...
blocks.
This patch extends the do-nothing to avoid the unused variable.
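As an illustration of the idea above, here is a minimal sketch of a do-nothing assertion macro that still references its condition. `DEMO_ASSERT`, `DEMO_ENABLE_ASSERT` and `demo_sum` are hypothetical names, and the `sizeof` idiom is only one possible way to do this; it is not necessarily what this patch uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro names, for illustration only. */
#ifdef DEMO_ENABLE_ASSERT
#  define DEMO_ASSERT(cond) assert(cond)
#else
/* The operand of sizeof is not evaluated, so this has no side effects,
 * but variables named in cond still count as referenced. */
#  define DEMO_ASSERT(cond) ((void)sizeof(cond))
#endif

int demo_sum(const int* values, size_t count)
{
    size_t const limit = 1000;  /* only referenced by the assertion */
    int sum = 0;
    size_t i;
    DEMO_ASSERT(count <= limit);
    for (i = 0; i < count; ++i) sum += values[i];
    return sum;
}
```

With a plain `((void)0)` expansion, `limit` would be reported as unused whenever the assertion is disabled.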
The bounds check condition should always be met because we selected `set_basic` as
our encoding type. But that code is very far away, so assert that it holds, which lets us
catch it if it is ever false, and add a bounds check as well.
Fixes #2213.
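The "assert, then still check" pattern described above could look roughly like the sketch below. The function, the error enum and the single-byte write are all hypothetical; the real code path is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes, for illustration only. */
typedef enum { demo_ok = 0, demo_dstSize_tooSmall = 1 } demo_error;

demo_error demo_write_byte(unsigned char** op, const unsigned char* oend,
                           unsigned char value)
{
    /* Expected to hold because of a decision made far from here;
     * the assert catches a violation in debug builds. */
    assert(*op < oend);
    /* The explicit check keeps release builds safe if it is ever false. */
    if (*op >= oend) return demo_dstSize_tooSmall;
    *(*op)++ = value;
    return demo_ok;
}
```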
Allow compression to use dictionaries with missing symbols in their
entropy tables. We set the FSE repeat mode to check when there are
missing symbols, and set the FSE repeat mode to valid when all symbols
are present.
Note that when not all symbols are present, the heuristics which favor
dictionary tables for lower compression levels won't activate.
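A minimal sketch of the decision described above, assuming the dictionary's table is available as an array of normalized counts. The enum and function names are hypothetical and are not the library's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
typedef enum { demo_repeat_check, demo_repeat_valid } demo_repeat;

demo_repeat demo_dict_repeat_mode(const short* normalizedCounter,
                                  unsigned dictMaxSymbol, unsigned maxSymbol)
{
    unsigned s;
    /* The table does not even cover every symbol: must check before reuse. */
    if (dictMaxSymbol < maxSymbol) return demo_repeat_check;
    for (s = 0; s <= maxSymbol; ++s) {
        /* A zero count means the symbol is missing from the table. */
        if (normalizedCounter[s] == 0) return demo_repeat_check;
    }
    /* Every symbol is present: the table can always be reused. */
    return demo_repeat_valid;
}
```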
Tested by manually creating a dictionary with missing symbols of every
type, and validating that the compressor rejects it before this change,
and accepts it after this change. Also, I ran the `dictionary_loader`
fuzzer for >1 hour of CPU time without running into cases where
compression succeeds, but decompression fails.
Fixes #2174.
default rule is `lib-release`
`lib-release` wasn't working : it was just skipped.
Removing `lib-release` from the list of .PHONY targets fixes it.
Same for `lib-mt`.
this API has been deprecated for a long time now,
all related symbols will be removed in a future version (likely v1.5.0)
and the header file `zbuff.h` doesn't compile from `include/` anyway,
because it needs to be positioned one directory below `zstd.h`.
Also removed `cover.h` from `cmake` installer,
as it should have never been part of this list to begin with.
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.
Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```
TODO: Add an explicit test that loads a small dictionary in MT mode
This commit pulls out the internals of `ZSTD_estimateCCtxSize_usingCCtxParams`
into a helper. It then migrates two other callsites to use that helper:
`ZSTD_estimateCStreamSize_usingCCtxParams`, a small optimization which
folds the buffer sizing into the helper, and `ZSTD_resetCCtx_internal`,
which is more invasive.
This attempts to guarantee that the estimates returned to users are always
correct.
`ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which
is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the
buffers, assuming they'll be zero. However, the actual workspace allocation
logic always allocates those buffers, and when running under ASAN, the
workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized
buffers end up consuming 512 bytes of space, which is accounted for in the
actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't
in the estimation path, since it ignores the buffers entirely.
This commit fixes this.
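The arithmetic behind the mismatch can be sketched as follows. This reads the description as 256 bytes of redzone per allocation and two buffers (input and output); the function names and the flat size model are hypothetical simplifications, not the real workspace code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 256 bytes of redzone per allocation under ASAN. */
#define DEMO_REDZONE_PER_ALLOC 256

/* Size an allocation really occupies in the workspace. */
size_t demo_alloc_size(size_t size, int asanEnabled)
{
    return asanEnabled ? size + DEMO_REDZONE_PER_ALLOC : size;
}

/* Estimate that skips the buffers entirely (the old estimation path). */
size_t demo_estimate_ignoring_buffers(size_t coreSize, int asanEnabled)
{
    return demo_alloc_size(coreSize, asanEnabled);
}

/* Estimate that accounts for both buffers, even when they are 0-sized. */
size_t demo_estimate_with_buffers(size_t coreSize, size_t inBuffSize,
                                  size_t outBuffSize, int asanEnabled)
{
    return demo_alloc_size(coreSize, asanEnabled)
         + demo_alloc_size(inBuffSize, asanEnabled)
         + demo_alloc_size(outBuffSize, asanEnabled);
}
```

Under this model the two estimates agree without ASAN and differ by 512 bytes with it, which is the gap described above.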
Resubmission of #2001. This switches the `sed` invocations to use `-E`,
extended regex syntax, which is better standardized across platforms.
I guess.
Same test plan:
```
make -C lib clean libzstd.pc
cat lib/libzstd.pc
echo # should fail
make -C lib clean libzstd.pc LIBDIR=/foo
make -C lib clean libzstd.pc INCLUDEDIR=/foo
make -C lib clean libzstd.pc LIBDIR=/usr/localfoo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/localfoo
make -C lib clean libzstd.pc LIBDIR=/usr/local/lib prefix=/foo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/include prefix=/foo
echo # should succeed
make -C lib clean libzstd.pc LIBDIR=/usr/local/foo
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/foo
make -C lib clean libzstd.pc LIBDIR=/usr/local/
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local/
make -C lib clean libzstd.pc LIBDIR=/usr/local
make -C lib clean libzstd.pc INCLUDEDIR=/usr/local
make -C lib clean libzstd.pc LIBDIR=/tmp/foo prefix=/tmp
make -C lib clean libzstd.pc INCLUDEDIR=/tmp/foo prefix=/tmp
make -C lib clean libzstd.pc LIBDIR=/tmp/foo prefix=/tmp/foo
make -C lib clean libzstd.pc INCLUDEDIR=/tmp/foo prefix=/tmp/foo
echo # should also succeed
make -C lib clean libzstd.pc prefix=/foo LIBDIR=/foo/bar INCLUDEDIR=/foo/
cat lib/libzstd.pc
mkdir out
cd out
cmake ../build/cmake
make
cat lib/libzstd.pc
```
When the output buffer is `NULL` with size 0, but the frame content size
is non-zero, we will write to the NULL pointer because our bounds check
underflowed.
This was exposed by a recent PR that allowed an empty frame into the
single-pass shortcut in streaming mode.
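As a generic illustration of this class of bug (not the actual decoder code, which works on pointers), the sketch below shows how an unsigned bounds check wraps when the capacity is smaller than a reserved margin. The `margin` parameter and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy form: capacity - margin wraps around when capacity < margin,
 * so the check passes even for an empty output buffer. */
int demo_fits_buggy(size_t capacity, size_t needed, size_t margin)
{
    return needed <= capacity - margin;
}

/* Safe form: rule out the wrap-around before subtracting. */
int demo_fits_safe(size_t capacity, size_t needed, size_t margin)
{
    return capacity >= margin && needed <= capacity - margin;
}
```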
* Fix the bug.
* Fix another NULL dereference in zstd-v1.
* Overflow checks in 32-bit mode.
* Add a dedicated test.
* Expose the bug in the dedicated simple_decompress fuzzer.
* Switch all mallocs in fuzzers to return NULL for size=0.
* Fix a new timeout in a fuzzer.
Neither clang nor gcc show a decompression speed regression on x86-64.
On x86-32 clang is slightly positive and gcc loses 2.5% of speed.
Credit to OSS-Fuzz.
This diff reorganizes the `lib/Makefile` to extract various settings that a
user would normally invoke together (supposing that they were aware of them)
if they were trying to build the smallest `libzstd` possible. It collects
these settings under a master setting `ZSTD_LIB_MIN_SIZE`.
Also document this new option.
`-Wall` implies `-Wformat-zero-length`, which will cause compilation to fail
under `-Werror` when an empty string is passed as the format string to a
`printf`-family function. This commit moves us back to prefixing the provided
format string, which successfully avoids that warning.
However, this removes the failure mode where that `RAWLOG` invocation would
fail to compile when no format string was provided at all (which was desirable
to avoid having code that would successfully compile normally but fail under
`-pedantic`, which *does* require that a non-zero number of args are provided).
So this commit also introduces a function which does nothing at all, but will
fail to compile if not provided with at least one argument, which is a string.
This successfully links the compilability of pedantic and non-pedantic builds.
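A rough sketch of the two pieces described above, using hypothetical names (`DEMO_RAWLOG`, `demo_force_has_format_string`) and writing into a caller-provided buffer instead of a real log. The prefix `": "` is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Does nothing, but only compiles when given at least one argument,
 * and that first argument must be a string. */
static void demo_force_has_format_string(const char* format, ...)
{
    (void)format;
}

/* Prefixing the format means the final format string is never empty,
 * so -Wformat-zero-length has nothing to complain about. */
#define DEMO_RAWLOG(out, cap, ...)                                   \
    do {                                                             \
        if (0) { demo_force_has_format_string(__VA_ARGS__); }        \
        snprintf((out), (cap), ": " __VA_ARGS__);                    \
    } while (0)

void demo_log_value(char* out, size_t cap, int value)
{
    DEMO_RAWLOG(out, cap, "value=%d", value);
}
```

Calling `DEMO_RAWLOG(buf, sizeof buf, "")` is accepted, while omitting the format string entirely fails to compile because the helper then receives no arguments.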
Fixes:
* Enable RLE blocks for superblock mode
* Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
* Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
* Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
* Respect disableLiteralsCompression in superblock mode
* Fix superblock mode to handle a block consisting of only compressed literals
* Fix an off-by-one error in superblock mode that disabled it whenever there were last literals
* Fix superblock mode with long literals/matches (> 0xFFFF)
* Allow superblock mode to repeat Huffman tables
* Respect ZSTD_minGain().
Tests:
* Simple check for the condition in #2096.
* When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:
* O(targetCBlockSize^2) because we recompute statistics every sequence
* Unable to split literals of length > targetCBlockSize into multiple sequences
* Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
fix #2045
When compiling `libzstd` in multithreading mode,
the `libzstd-mt` recipe would not include `-pthread`,
resulting in an underlinked dynamic library.
Added a test on Travis to check that the library is fully linked.
This makes it possible, in some future release,
to build a multi-threaded `libzstd` dynamic library by default
as it would no longer impact the build script of user programs.
* adding long support for patch-from
* adding refPrefix to dictionary_decompress
* adding refPrefix to dictionary_loader
* conversion nit
* triggering long mode on chainLog < fileLog and removing old threshold
* adding refPrefix to dictionary_round_trip
* adding docs
* adding enableldm + forceWindow test for dict
* separate patch-from logic into FIO_adjustParamsForPatchFromMode
* moving memLimit adjustment to outside ifdefs (need for decomp)
* removing refPrefix gate on dictionary_round_trip
* rebase on top of dev refPrefix change
* making sure refPrefix + ldm is < 1% of srcSize
* combining notes for patch-from
* moving memlimit logic inside fileio.c
* adding display for optimal parser and long mode trigger
* conversion nit
* fuzzer found heap-overflow fix
* another conversion nit
* moving FIO_adjustMemLimitForPatchFromMode outside ifndef
* making params immutable
* moving memLimit update before createDictBuffer call
* making maxSrcSize unsigned long long
* making dictSize and maxSrcSize params unsigned long long
* error on files larger than 4gb
* extend refPrefix test to include round trip
* conversion to size_t
* making sure ldm is at least 10x better
* removing break
* including zstd_compress_internal and removing redundant macros
* exposing ZSTD_cycleLog()
* using cycleLog instead of chainLog
* add some more docs about user optimizations
* formatting
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
To complement the single-file decoder, a new script was added to create an amalgamated single file of all of the Zstd source, along with examples and (simple) tests.
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized
The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.
The copyright and license of `divsufsort.{h,c}` is not changed.
The alignment is added before the loop, so this shouldn't hurt
performance in any case. The only way it hurts is if there is already
performance instability, and we force it to be stable but in the bad
case.
This consistently gets us into the good case with gcc-{7,8,9} on an
Intel i9-9900K and clang-9. gcc-5 is 5% worse than its best case but has
stable performance. We get consistently good behavior on my MacBook Pro
when compiled with both clang and gcc-8. It ends up in the 50% from DSB and
50% from MITE case, but the performance is the same as the 85% DSB case,
so that's fine.
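For readers unfamiliar with the technique, here is a minimal sketch of placing an alignment directive before a loop. The 32-byte alignment, the GNU-style inline asm and the `DEMO_ALIGN_LOOP` name are assumptions made for this example; the message above does not state which alignment or mechanism is used, and the compiler is free to lay out the loop differently.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 32-byte alignment via GNU-style inline asm on x86.
 * On other compilers or targets the macro expands to nothing. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#  define DEMO_ALIGN_LOOP() __asm__(".p2align 5")
#else
#  define DEMO_ALIGN_LOOP() ((void)0)
#endif

unsigned demo_sum_bytes(const unsigned char* src, size_t size)
{
    unsigned sum = 0;
    size_t i;
    DEMO_ALIGN_LOOP();  /* emitted once, before the loop, not per iteration */
    for (i = 0; i < size; ++i) sum += src[i];
    return sum;
}
```

Because the directive sits outside the loop body, any padding it adds is paid once per call rather than once per iteration.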