isolate all logic associated with block decompression
into its own module.
zstd_decompress is still in charge
of context creation/destruction,
frames, headers, streaming, special blocks, etc.
Compressed blocks themselves are now handled within zstd_decompress_block .
fix#1385
decompressing into NULL was an automatic error.
It is now allowed, as long as the content of the frame is empty.
Seems to simplify things for `arrow`.
Maybe some other projects rely on this behavior ?
fix#1379
decodecorpus was generating one extraneous byte when `nbSeq==0`.
This is disallowed by the specification.
The reference decoder was just skipping the extraneous byte.
It is now stricter, and flag such situation as an error.
ensure that the structure layout is as expected.
will trigger an error if it changes in the future.
Another solution would be to use a union,
this would be cleaner and get rid of these static asserts.
However, in order to keep the current code unmodified,
it would be necessary to use an un-named unions.
And apparently, un-named unions are only possible on "recent" compilers (C99+).
corresponding to the removal of workspace
which is needed while building huffman table
and is now either present in DCtx,
or temporarily borrowed from available FSE table space.
streaming decoders, such as ZSTD_decompressStream() or ZSTD_decompress_generic(),
may end up making no forward progress,
(aka no byte read from input __and__ no byte written to output),
due to unusual parameters conditions,
such as providing an output buffer already full.
In such case, the caller may be caught in an infinite loop,
calling the streaming decompression function again and again,
without making any progress.
This version detects such situation, and generates an error instead :
ZSTD_error_dstSize_tooSmall when output buffer is full,
ZSTD_error_srcSize_wrong when input buffer is empty.
The detection tolerates a number of attempts before triggering an error,
controlled by ZSTD_NO_FORWARD_PROGRESS_MAX macro constant,
which is set to 16 by default, and can be re-defined at compilation time.
This behavior tolerates potentially existing implementations
where such cases happen sporadically, like once or twice,
which is not dangerous (only infinite loops are),
without generating an error, hence without breaking these implementations.
It's not necessary to ensure that no job is ongoing.
The pool is only expanded, existing threads are preserved.
In case of error, the only option is to return NULL and terminate the thread pool anyway.
There were 2 competing set of debug functions
within zstd_internal.h and bitstream.h.
They were mostly duplicate, and required care to avoid messing with each other.
There is now a single implementation, shared by both.
Significant change :
The macro variable ZSTD_DEBUG does no longer exist,
it has been replaced by DEBUGLEVEL,
which required modifying several source files.
recently introduce into the new dictionary mode.
The bug could be reproduced with this command :
./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639
error was in function ZSTD_count_2segments() :
the beginning of the 2nd segment corresponds to prefixStart
and not the beginning of the current block (istart == src).
This would result in comparing the wrong byte.
ZSTD_decompress() can decompress multiple frames sent as a single input.
But the input size must be the exact sum of all compressed frames, no more.
In the case of a mistake on srcSize, being larger than required,
ZSTD_decompress() will try to decompress a new frame after current one, and fail.
As a consequence, it will issue an error code, ERROR(prefix_unknown).
While the error is technically correct
(the decoder could not recognise the header of _next_ frame),
it's confusing, as users will believe that the first header of the first frame is wrong,
which is not the case (it's correct).
It makes it more difficult to understand that the error is in the source size, which is too large.
This patch changes the error code provided in such a scenario.
If (at least) a first frame was successfully decoded,
and then following bytes are garbage values,
the decoder assumes the provided input size is wrong (too large),
and issue the error code ERROR(srcSize_wrong).
for FSE symbols.
While it seems to work, the gains are negligible compared to rough maxNbBits evaluation.
There are even a few losses sometimes, that still need to be explained.
Furthermode, there are still cases where btlazy2 does a better job than btopt,
which seems rather strange too.
which was not done properly by gcc 4.8
resulting in major performance difference.
ex :
zstd -b1 silesia.tar
before : dec 680 MB/s
after : dec 710 MB/s (without bmi2)
after : dec 770 MB/s (with DYNAMIC_BMI2)
Update code documentation, and properly names a few "magic constants".
Also, HUF_compress_internal() gets a cleaner way
to determine size of tables inside workspace.
This makes it easier to edit for maintenance and evolutions
(I plan to experiment modifications in huffman decompression functions).
The methology followed seems broadly applicable to other BMI2 modules.
Performance was tracked rigorously at each step,
there is no noticeable loss (nor win) of performance compared to `#include` version.
Note however that 4X decoder variants tend to be extremely sensitive to code alignment.
This source code resulted in pretty good performance for gcc 7.2 and 7.3,
but future changes (even in other parts of the code) might trigger the issue again.
On my laptop:
Before:
./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S
3#silesia.tar : 211984896 -> 66683478 (3.179), 97.6 MB/s , 400.7 MB/s
3#enwik8 : 100000000 -> 35643153 (2.806), 76.5 MB/s , 303.2 MB/s
After:
./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S
3#silesia.tar : 211984896 -> 66683478 (3.179), 97.4 MB/s , 435.0 MB/s
3#enwik8 : 100000000 -> 35643153 (2.806), 76.2 MB/s , 338.1 MB/s
Mileage vary, depending on file, and cpu type.
But a generic rule is : x86 benefits less from "long-offset mode" than x64,
maybe due to register pressure.
On "entropy", long-mode is _never_ a win for x86.
On my laptop though, it may, depending on file and compression level
(enwik8 benefits more from "long-mode" than silesia).
ZSTD_create?Dict() is required to produce a ?Dict* return type
because `free()` does not accept a `const type*` argument.
If it wasn't for this restriction, I would have preferred to create a `const ?Dict*` object
to emphasize the fact that, once created, a dictionary never changes
(hence can be shared concurrently until the end of its lifetime).
There is no such limitation with initStatic?Dict() :
as stated in the doc, there is no corresponding free() function,
since `workspace` is provided, hence allocated, externally,
it can only be free() externally.
Which means, ZSTD_initStatic?Dict() can return a `const ZSTD_?Dict*` pointer.
Tested with `make all`, to catch initStatic's users,
which, incidentally, also updated zstd.h documentation.
it still fallbacks to single-thread blocking invocation
when input is small (<1job)
or when invoking ZSTDMT_compress(), which is blocking.
Also : fixed a bug in new block-granular compression routine.
It used to stop on reaching extDict, for simplification.
As a consequence, there was a small loss of performance each time the round buffer would restart from beginning.
It's not a large difference though, just several hundreds of bytes on silesia.
This patch fixes it.
It does not feel "right" from a dependency perspective.
ZSTD_initDCtx_internal() is triggered once, on DCtx creation,
while ZSTD_decompressBegin() is invoked at the beginning of each new frame,
and is also a user-facing prototype.
Downside : a DCtx must be init before first usage !
This was always the intention by the way, and is documented as such.
This stage is automatically done within ZSTD_decompress() and variants,
and also within ZSTD_decompressStream().
Only ZSTD_decompressContinue() is impacted,
it must be preceded by a ZSTD_decompressBegin(), as detailed in doc.
A test has been fixed, to no longer rely on undocumented assumption that ZSTD_decompressBegin() is invoked during init.
decoder output buffer would receive a wrong size.
In previous version, ZSTD_decompressStream() would blindly trust the caller that pos <= size.
In this version, this condition is actively checked,
and the function returns an error code if this condition is not respected.
This check could also be done with an assert(),
but since this is a user-facing interface, it seems better to keep this check at runtime.
* Maximum window size in 32-bit mode is 1GB, since allocations for 2GB fail
on my Mac.
* Maximum window size in 64-bit mode is 2GB, since that is the largest
power of 2 that works with the overflow prevention.
* Allow `--long=windowLog` to set the window log, along with
`--zstd=wlog=#`. These options also set the window size during
decompression, but don't override `--memory=#` if it is set.
* Present a helpful error message when the window size is too large during
decompression.
* The long range matcher defaults to a hash log 7 less than the window log,
which keeps it at 20 for window log 27.
* Keep the default long range matcher window size and the default maximum
window size at 27 for the API and CLI.
* Add tests that use the maximum window size and hash size for compression
and decompression.
supporting function for bufferless streaming API (ZSTD_decompressContinue())
makes it possible to correctly size a round buffer for decoding using this API.
also : added field blockSizeMax within ZSTD_frameHeader,
as it's a necessary information to know when to restart at beginning of decoding buffer.
The zstd format specification doesn't enforce that Huffman compressed
literals (including the table) have to be smaller than the uncompressed
literals. The compressor will never Huffman compress literals if the
compressed size is larger than the uncompressed size. The decompresser
doesn't accept Huffman compressed literals with 4 streams whose compressed
size is at least as large as the uncompressed size.
* Make the decompresser accept Huffman compressed literals whose size
increases.
* Add a test case that exposes the bug. The compressed file has to be
statically generated, since the compressor won't normally produce files
that expose the bug.
Note : all error codes are changed by this new version,
but it's expected to be the last change for existing codes.
Codes are now grouped by category, and receive a manually attributed value.
The objective is to guarantee that
error code values will not change in the future
when introducing new codes.
Intentionnal empty spaces and ranges are defined
in order to keep room for potential new codes.
Makes frame type (zstd,skippable) detection more straighforward.
ZSTD_getFrameHeader set frameContentSize=ZSTD_CONTENTSIZE_UNKNOWN to mean "field not present"
* `ZSTD_decompressStream_generic()` `ip` may be `NULL` for one of the calls
to `memcpy()`
* Assert the source is not `NULL` for calls to `memcpy()` where I believe
the source should not be `NULL`.
now ZSTD_customCMem is promoted as new default.
Advantages : ZSTD_customCMem = { NULL, NULL, NULL},
so it's natural default after a memset.
ZSTD_customCMem is public constant
(defaultCustomMem was private only).
Also : makes it possible to introduce ZSTD_calloc(),
which can now default to stdlib's calloc()
when it detects system default.
Fixed zlibwrapper which depended on defaultCustomMem.
memset() was a quick fix to initialization problems,
but initialize too much space (tables, buffers)
which show up in decompression speed of ZSTD_decompress()
since it needs to recreate DCtx at each invocation.
Fixed by only initialization relevant pointers and size fields.
They are now the same object.
It's recommended to keep both types in source code
as previous versions of library (<v1.3.0)
still need this differentiation.
ZSTD_frameParams => ZSTD_frameHeader
ZSTD_getFrameParams() -> ZSTD_getFrameHeader()
The new naming is more distinctive from ZSTD_frameParameters,
which is used during compression.
ZSTD_frameHeader is clearer in its intention to described frame header content.
It also implies we are decoding a ZSTD frame, hence we are at decoding stage.
This decoder variant is detrimental to x86 architecture
likely due to register pressure.
Note that the variant is disabled for all 32-bits targets.
It's unclear if it would help for different architectures,
such as ARM, MIPS or PowerPC.