Normally these are set to size_t and ssize_t. But if they do not
exist, then they are set to the smallest integer type that can
contain a pointer. size_t is unsigned and ssize_t is signed.
gzsetparams() was using Z_PARTIAL_FLUSH when it could use Z_BLOCK
instead. This commit uses Z_BLOCK, which avoids emitting an
unnecessary ten bits into the stream.
In some cases the return values did not match the documentation,
or the documentation did not document all of the return values.
gzprintf() now consistently returns negative values on error,
which matches the behavior of the stdio fprintf() function.
The previous code slid the window and the hash table and copied
every input byte three times in order to just write the data as
stored blocks with no compression. This commit minimizes sliding
and copying, especially for large input and output buffers.
Level 0 compression is now more than 20 times faster than before
the commit.
Most of the speedup is due to deferring hash table slides until
deflateParams() is called to change the compression level away
from 0. More speedup is due to copying directly from next_in to
next_out when the amounts of available input data and output space
permit it, avoiding the intermediate pending buffer. Additionally,
only the last 32K of the used input data is copied back to the
sliding window when large input buffers are provided.
This alters the specification in zlib.h, so that deflateParams()
will not change any parameters if there is not enough output space
in the event that a block is emitted in order to allow switching
the compression function.
When debugging the Huffman coding would warn about resulting codes
greater than 15 bits in length. This is handled properly, and is
not uncommon. This increases the verbosity of the warning by one,
so that it is not displayed by default.
This speeds up level 0 by about a factor of three, as compared to
the previous byte-at-a-time loop. We can do much better though. A
later commit avoids this copy for level 0 with large buffers,
instead copying directly from the input to the output. This commit
still speeds up storing incompressible data found when compressing
normally.
Compression level 0 requests no compression, using only stored
blocks. When Z_HUFFMAN or Z_RLE was used with level 0 (granted,
an odd choice, but permitted), the resulting blocks were mostly
fixed or dynamic. The reason is that deflate_stored() was not
being called in that case. The compressed data was valid, but it
was not what the application requested. This commit assures that
only stored blocks are emitted for compression level 0, regardless
of the strategy selected.
This updates the OS_CODE determination at compile time to match as
closely as possible the operating system mappings documented in
the PKWare APPNOTE.TXT version 6.3.4, section 4.4.2.2. That byte
in the gzip header is used by nobody for anything, as far as I can
tell. However we might as well try to set it appropriately.
This verifies that the state has been initialized, that it is the
expected type of state, deflate or inflate, and that at least the
first several bytes of the internal state have not been clobbered.
There is a bug in deflate for windowBits == 8 (256-byte window).
As a result, zlib silently changes a request for 8 to a request
for 9 (512-byte window), and sets the zlib header accordingly so
that the decompressor knows to use a 512-byte window. However if
deflateInit2() is used for raw deflate or gzip streams, then there
is no indication that the request was not honored, and the
application might assume that it can use a 256-byte window when
decompressing. This commit returns an error if the user requests
a 256-byte window when using raw deflate or gzip encoding.
See the comment for more details. This is in response to an issue
raised as a result of a security audit of the zlib code by Trail
of Bits and TrustInSoft, in support of the Mozilla Foundation.
There was a small optimization for PowerPCs to pre-increment a
pointer when accessing a word, instead of post-incrementing. This
required prefacing the loop with a decrement of the pointer,
possibly pointing before the object passed. This is not compliant
with the C standard, for which decrementing a pointer before its
allocated memory is undefined. When tested on a modern PowerPC
with a modern compiler, the optimization no longer has any effect.
Due to all that, and per the recommendation of a security audit of
the zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, this "optimization" was removed, in order to
avoid the possibility of undefined behavior.
inftrees.c was subtracting an offset from a pointer to an array,
in order to provide a pointer that allowed indexing starting at
the offset. This is not compliant with the C standard, for which
the behavior of a pointer decremented before its allocated memory
is undefined. Per the recommendation of a security audit of the
zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, this tiny optimization was removed, in order
to avoid the possibility of undefined behavior.
An old inffast.c optimization turns out to not be optimal anymore
with modern compilers, and furthermore was not compliant with the
C standard, for which decrementing a pointer before its allocated
memory is undefined. Per the recommendation of a security audit of
the zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, this "optimization" was removed, in order to
avoid the possibility of undefined behavior.
While woolly mammoths still roamed the Earth and before Atlantis
sunk into the ocean, there were C compilers that could not handle
forward structure references, e.g. "struct name;". zlib dutifully
provided a work-around for such compilers. That work-around is no
longer needed, and, per the recommendation of a security audit of
the zlib code by Trail of Bits and TrustInSoft, in support of the
Mozilla Foundation, should be removed since what a compiler will
do with this is technically undefined. From the report: "there is
no telling what interactions the bug could have in the future with
link-time optimizations and type-based alias analyses, both
features that are present (but not default) in clang."
The undocumented (except in these commit comments) function
inflateValidate(strm, check) can be called after an inflateInit(),
inflateInit2(), or inflateReset2() with check equal to zero to
turn off the check value (CRC-32 or Adler-32) computation and
comparison. Calling with check not equal to zero turns checking
back on. This should only be called immediately after the init or
reset function. inflateReset() does not change the state, so a
previous inflateValidate() setting will remain in effect.
This also turns off validation of the gzip header CRC when
present.
This should only be used when a zlib or gzip stream has already
been checked, and repeated decompressions of the same stream no
longer need to be validated.