The new modes process the input data in independent blocks,
using backward references only from within an input block.
The new modes can be used by specifying quality 0 or quality 1,
the old quality 1 and quality 2 modes are renamed quality 2 and
quality 3, respectively, and the old quality 3 mode is removed.
In quality 1, use static Huffman codes for distance
and command histograms with <= 128 symbols and dynamic
Huffman codes with static code length codes for the other
histograms.
* Remove default constructors.
* Initialize bit_cost in histogram.Clear().
* Check fseek result in FileSize.
* Replace malloc in BrotliFileIn constructor with "new".
* Catch bad_alloc in bro tool.
These affected only quality 11, and now it does not make sense
to disable block splitting or context modeling because most of
the time is spent in zopfli anyway.
Now all speed vs size compromises are controlled by the quality param.
This is used for quality 11, for qualities <= 9 we already
have a simpler hash table.
The static data size is 252 kB, and this removes the
need to initialize a huge hash map at startup, which was
the reason why transforms had to be disabled by default.
In comparison, the static dictionary itself is 120 kB.
This supports every transform, except the kOmitFirstN.
This commit adopts the backward reference search algorithm
from the zopfli project (see https://github.com/google/zopfli)
to brotli.
This slower backward reference search is run only in quality 11
and it runs two iterations of entropy cost modeling and
shortest path search.
As a result, the original backward reference search function can
be simplified a bit, since we can remove some heuristics that were
replaced with the zopfli-style search.
* Fix an out-of-bounds access to depth_histo in the
bit cost calculation function.
* Change type of distance symbol to uint16_t in block
splitter, because if all postfix bits are used, there
can be 520 distance symbols.
* Save the distance cache between meta-blocks at the
correct place. This fixes a roundtrip failure that
can occur when there is an uncompressed metablock
between two compressed metablocks.
* Fix a bug when setting lgwin to 24 in the encoder parameters
It ended up making metablocks larger than 24 bits in size.
* Fix out-of-bounds memory accesses in parallel encoder.
CreateBackwardReferences can read up to 4 bytes past end of
input if the end of input is before mask.
* Add missing header for memcpy() in port.h
Adds functions to prepend such dictionary to the
encoder and decoder, and twiddles their internal
parameters to do as if that was a previous part of
the input. This dictionary is just a prefilled LZ77
window, it is not related to the built in transformable
brotli dictionary.
* Cluster at most 64 histograms at a time in the first
round of clustering.
* Use a faster histogram cost estimation function.
* Don't compute the log2(total) multiple times in the
block splitter.
With this the users can distinguish between not knowing
what the input is (ddefault) and knowing that it is text,
and thus can be relied on to force some UTF-8 specific settings.
Two different definitions (offset by 1) were used in
command.h and hash.h. Now they have been made the same,
also consistent with the spec (e.g. 0 means use previous dist, etc...)
With this commit, the encoder will skip some
compression optimization steps for quality <= 4,
which results in faster compression but higher
compressed sizes.
Enabled for quality >= 4, and if there are no obvious
UTF8 violations detected.
For each block, we gather two separate histograms, one
for continuation bytes and one for ASCII or lead bytes.
* Change order of members of bit reader state structure.
* Remove unused includes for assert. Add BROTLI_DCHECK
macros and use it instead of assert.
* Do not calculate nbits in common case of ReadSymbol.
* Introduce and use PREDICT_TRUE / PREDICT_FALSE macros.
* Allocate less memory in the brotli decoder if it knows
the result size beforehand. Before this, the decoder
would always allocate 16MB if the encoder annotated the
window size as 22 bit (which is the default), even if the
file is only a few KB uncompressed. Now, it'll only
allocate a ringbuffer as large as needed for the result file.
But only if it can know the filesize, it's not possible
to know that if there are multiple metablocks or too large
uncompressed metablock.
Add a BrotliCompress() method to the public encoder API
that uses the BrotliIn and BrotliOut classes and use
that in the 'bro' command-line tool.
Use the streaming api in BrotliCompressBuffer() and
BrotliCompressor::WriteMetaBlock().
Use the appropiate hashers for quality <= 9.
Disable all slow features for quality <= 9 (literal cost modeling,
dictionary, context modeling, advanced block splitting).
Change vector<Command> arguments of internal functions
to Command* and size_t.
Add a version of the brotli encoder that compresses
each meta-block independently, only using the
original input data from previous meta-blocks
and nothing from the compressor state.
This is a proof-of-concept to show that the
current format is flexible enough to support
parallel multi-threaded compression.
This will enable processing the input in smaller
chunks than the currently default 2MB for the
slow brotli, while still benefiting from the
larger sliding window.
Remove the hard-coded constants for window size
and meta-block size.
Initialize internal storage for each metablock
separately and reserve only as much as needed
for the actual input.
The new mode can be used by setting the greedy_block_split
field of BrotliParams to true.
This commit moves all the meta-block processing code
into its own library and moves the meta-block encoding
code to brotli_bit_stream.cc from encode.cc
As reported by @anthrotype, log2() is missing from MSVS 2010.
This patch uses log() and a multiplication in FastLog2()
for _MSV_VER <= 1600 and uses FastLog2() in literal_cost.cc
instead of log2().
Changes suggested by @r-lyeh and @anthrotype.
- Use a portable simple PRNG instead of rand_r()
- add missing <assert.h> include
- disambiguate log2() argument type
- remove endian.h include from write_bits.h
The new interface of the backward reference search
function makes it possible to use it in a streaming
manner.
Using the advanced cost model and static dictionary
can be turned on/off by template parameters.
The distance short codes are now computed as part of
the backward reference search.
Added a faster version of the Hasher.
This commit contains a batch of changes that were made to the Brotli
compression algorithm in the last month. Most important changes:
* Format change: don't push distances representing static dictionary words to the distance cache.
* Fix decoder invalid memory access bug caused by building a non-complete Huffman tree.
* Add a mode parameter to the encoder interface.
* Use different hashers for text and font mode.
* Add a heuristics to the hasher for skipping non-compressible data.
* Exhaustive search of static dictionary during backward reference search.
This commit contains a batch of changes that were made to the Brotli
compression algorithm in the last month. Most important changes:
* Fixes to the spec.
* Change of code length code order.
* Use a 2-level Huffman lookup table in the decoder.
* Faster uncompressed meta-block decoding.
* Optimized encoding of the Huffman code.
* Detection of UTF-8 input encoding.
* UTF-8 based literal cost modeling for improved
backward reference selection.
This change removes the redundant HCLEN, HLENINC and HLEN
fields from the encoding of the complex Huffman codes and
derives these from an invariant of the code length sequence.
Based on a patch by Robert Obryk.
This commit contains a batch of changes that were made to the Brotli
compression algorithm in the last month. Most important changes:
* Updated spec
* Changed Huffman code length alphabet to use run length codes more
efficiently, based on a suggestion by Robert Obryk
* Changed encoding of the number of Huffman code lengths (HLEN)
* Changed encoding of the number of Huffman trees (NTREES)
* Added support for uncompressed meta-blocks