Commit Graph

124 Commits

Author SHA1 Message Date
acid
08e98d8d89 Support large inputs/outputs in memory adaptors 2015-09-28 16:03:51 +03:00
Marcin Karpinski
21ac39f7c8 Fix typos. 2015-09-21 21:04:07 +02:00
Eugene Klyuchnikov
b58317a652 Fix bug in encoder. 2015-09-01 12:18:41 +02:00
Lode Vandevenne
6511d6b016 update brotli encoder with latest improvements 2015-08-28 16:09:23 +02:00
Zoltan Szabadka
7de70dbcc7 Add missing <stdlib.h> to streams.cc 2015-08-11 11:09:04 +02:00
Lode Vandevenne
17ed258993 msan bugfixes to the brotli encoder 2015-08-10 13:25:45 +02:00
Zoltan Szabadka
29c2679500 Fix encoder bug.
Under some circumstances CopyLiteralsToByteArray tried
to read begind ringbuffer.

In this patch we force it to read completely from range [0..mask]
2015-07-30 17:42:02 +02:00
Zoltan Szabadka
95ddb48a11 Fix some VS compilation errors in the encoder.
- Use std::numeric_limits<double>::infinity() instead of 1.0 / 0.0
  - Use FastLog2() instead of log2() in cost model
2015-06-29 14:20:25 +02:00
Lode Vandevenne
bad0f4edf1 Brotli Bug Fixes 2015-06-26 17:37:00 +02:00
Zoltan Szabadka
618287b373 Deprecate greedy_block_split and enable_context_modeling brotli params.
These affected only quality 11, and now it does not make sense
to disable block splitting or context modeling because most of
the time is spent in zopfli anyway.

Now all speed vs size compromises are controlled by the quality param.
2015-06-12 16:50:49 +02:00
Zoltan Szabadka
66098830a2 Use a static hash table to look up dictionary words and transforms.
This is used for quality 11, for qualities <= 9 we already
have a simpler hash table.

The static data size is 252 kB, and this removes the
need to initialize a huge hash map at startup, which was
the reason why transforms had to be disabled by default.
In comparison, the static dictionary itself is 120 kB.
This supports every transform, except the kOmitFirstN.
2015-06-12 16:45:17 +02:00
Zoltan Szabadka
b3d3723f62 Add "zopfli"-style backward reference search to brotli.
This commit adopts the backward reference search algorithm
from the zopfli project (see https://github.com/google/zopfli)
to brotli.

This slower backward reference search is run only in quality 11
and it runs two iterations of entropy cost modeling and
shortest path search.

As a result, the original backward reference search function can
be simplified a bit, since we can remove some heuristics that were
replaced with the zopfli-style search.
2015-06-12 16:25:41 +02:00
Zoltan Szabadka
835a77469e Change the static dictionary hash table to take into
account word frequency when there are hash collisions.
2015-06-12 16:14:06 +02:00
Zoltan Szabadka
65f3fc55f5 Bug fixes for the brotli encoder.
* Fix an out-of-bounds access to depth_histo in the
    bit cost calculation function.

  * Change type of distance symbol to uint16_t in block
    splitter, because if all postfix bits are used, there
    can be 520 distance symbols.

  * Save the distance cache between meta-blocks at the
    correct place. This fixes a roundtrip failure that
    can occur when there is an uncompressed metablock
    between two compressed metablocks.

  * Fix a bug when setting lgwin to 24 in the encoder parameters
    It ended up making metablocks larger than 24 bits in size.

  * Fix out-of-bounds memory accesses in parallel encoder.
    CreateBackwardReferences can read up to 4 bytes past end of
    input if the end of input is before mask.

  * Add missing header for memcpy() in port.h
2015-06-12 16:11:50 +02:00
Zoltan Szabadka
b43df8f699 Brotli custom LZ77 dictionary support.
Adds functions to prepend such dictionary to the
encoder and decoder, and twiddles their internal
parameters to do as if that was a previous part of
the input. This dictionary is just a prefilled LZ77
window, it is not related to the built in transformable
brotli dictionary.
2015-06-12 15:43:54 +02:00
Zoltan Szabadka
667f70adcb Speedups to brotli quality 11.
* Cluster at most 64 histograms at a time in the first
    round of clustering.

  * Use a faster histogram cost estimation function.

  * Don't compute the log2(total) multiple times in the
    block splitter.
2015-06-12 15:29:06 +02:00
Zoltan Szabadka
6622355a9a Use the same hasher for text and font mode.
We use 4-byte hashing in both and look for length 3
matches separately.
2015-05-11 14:11:07 +02:00
Zoltan Szabadka
cc8d64dfec Fix broken quality 0, make it same as quality 1. 2015-05-11 13:51:47 +02:00
Zoltan Szabadka
aa853f3cbc Add a MODE_GENERIC compression mode to the interface.
With this the users can distinguish between not knowing
what the input is (ddefault) and knowing that it is text,
and thus can be relied on to force some UTF-8 specific settings.
2015-05-11 11:33:19 +02:00
Zoltan Szabadka
54f69c9ef7 Support window bits 10 - 15 in the decoder.
The previous window bit value 17 is used to
extend the range, since it has not been used
in any previous encoders.
2015-05-07 17:44:33 +02:00
Zoltan Szabadka
12eb9bfd70 Align distance code meaning in the brotli encoder.
Two different definitions (offset by 1) were used in
command.h and hash.h. Now they have been made the same,
also consistent with the spec (e.g. 0 means use previous dist, etc...)
2015-05-07 17:40:00 +02:00
Zoltan Szabadka
7cde616c9e Faster encoding for low quality settings.
With this commit, the encoder will skip some
compression optimization steps for quality <= 4,
which results in faster compression but higher
compressed sizes.
2015-05-07 17:30:10 +02:00
Zoltan Szabadka
945b0d025f Use a static context map with two buckets for UTF8 data.
Enabled for quality >= 4, and if there are no obvious
UTF8 violations detected.
For each block, we gather two separate histograms, one
for continuation bytes and one for ASCII or lead bytes.
2015-05-07 17:23:07 +02:00
Zoltan Szabadka
83aa24dc86 Speed and memory usage improvements for the decoder.
* Change order of members of bit reader state structure.

* Remove unused includes for assert. Add BROTLI_DCHECK
  macros and use it instead of assert.

* Do not calculate nbits in common case of ReadSymbol.

* Introduce and use PREDICT_TRUE / PREDICT_FALSE macros.

* Allocate less memory in the brotli decoder if it knows
  the result size beforehand. Before this, the decoder
  would always allocate 16MB if the encoder annotated the
  window size as 22 bit (which is the default), even if the
  file is only a few KB uncompressed. Now, it'll only
  allocate a ringbuffer as large as needed for the result file.
  But only if it can know the filesize, it's not possible
  to know that if there are multiple metablocks or too large
  uncompressed metablock.
2015-05-07 16:53:43 +02:00
Zoltan Szabadka
0f726df1f1 Don't do any block splitting for quality 1. 2015-04-28 10:12:47 +02:00
Zoltan Szabadka
99af4df8ea Remove the 'override' keyword from ~BrotliFileIn().
Apparently MSVS 2010 does not support this.
2015-04-23 16:43:38 +02:00
Zoltan Szabadka
98539223f5 Remove quality parameter from bitstream writing functions.
Fix a few crashes related to some quality and param combinations.
2015-04-23 16:20:29 +02:00
Zoltan Szabadka
fdfb19806b Add a static hash table based dictionary lookup to fast brotli. 2015-04-23 15:52:32 +02:00
Zoltan Szabadka
2fd80cdc9a Encoder support for new empty meta-block format.
Changed the parallel implementation to sync meta-blocks
to byte boundary by emitting empty meta-blocks.
2015-04-23 15:43:37 +02:00
Zoltan Szabadka
6d80610f03 Fix entropy calculation. 2015-04-23 15:35:16 +02:00
Zoltan Szabadka
3dbe2e03e7 Encoder implementation using input/output classes.
Add a BrotliCompress() method to the public encoder API
that uses the BrotliIn and BrotliOut classes and use
that in the 'bro' command-line tool.

Use the streaming api in BrotliCompressBuffer() and
BrotliCompressor::WriteMetaBlock().

Use the appropiate hashers for quality <= 9.
2015-04-23 15:26:08 +02:00
Zoltan Szabadka
09aa9ca410 Add the streams.* files to Makefile and setup.py 2015-04-23 14:35:43 +02:00
Zoltan Szabadka
6a5303304e Add input and output classes for streaming compression. 2015-04-23 14:29:01 +02:00
Zoltan Szabadka
89a6fb85fb Add params to disable static dictionary and context modeling.
Disable all slow features for quality <= 9 (literal cost modeling,
dictionary, context modeling, advanced block splitting).

Change vector<Command> arguments of internal functions
to Command* and size_t.
2015-04-23 13:15:42 +02:00
Zoltan Szabadka
e377e65f11 Limit the max input meta-block size to 16MB. 2015-04-02 11:12:04 +02:00
Zoltan Szabadka
1428d54178 Proof-of-concept encoder for parallel compression.
Add a version of the brotli encoder that compresses
each meta-block independently, only using the
original input data from previous meta-blocks
and nothing from the compressor state.
This is a proof-of-concept to show that the
current format is flexible enough to support
parallel multi-threaded compression.
2015-04-01 16:35:52 +02:00
Zoltan Szabadka
817a3edd52 Add an input block size parameter to brotli.
This will enable processing the input in smaller
chunks than the currently default 2MB for the
slow brotli, while still benefiting from the
larger sliding window.
2015-04-01 16:29:04 +02:00
Zoltan Szabadka
d6d69ec4ac Add quality and lgwin to the BrotliParams.
Remove the hard-coded constants for window size
and meta-block size.

Initialize internal storage for each metablock
separately and reserve only as much as needed
for the actual input.
2015-04-01 16:10:15 +02:00
Zoltan Szabadka
ca3a7a98f2 Use FastLog2() instead of log() in BitsEntropy(). 2015-03-30 13:41:52 +02:00
Zoltan Szabadka
534654def1 Add a faster but less dense compression mode.
The new mode can be used by setting the greedy_block_split
field of BrotliParams to true.

This commit moves all the meta-block processing code
into its own library and moves the meta-block encoding
code to brotli_bit_stream.cc from encode.cc
2015-03-27 14:20:35 +01:00
Zoltan Szabadka
497814eebd Remove the redundant EncodeMetaBlockHeader() function.
Use Store{Compressed,Uncompressed}MetaBlockHeader() instead.
2015-03-24 10:18:06 +01:00
Zoltan Szabadka
28135ea9e8 Fix another use of log2() in literal_cost.cc 2015-02-27 16:53:00 +01:00
Zoltan Szabadka
fab601e81f Fix encoder compilation error on MSVS 2010.
As reported by @anthrotype, log2() is missing from MSVS 2010.
This patch uses log() and a multiplication in FastLog2()
for _MSV_VER <= 1600 and uses FastLog2() in literal_cost.cc
instead of log2().
2015-02-27 16:04:43 +01:00
Zoltan Szabadka
f0b88cbcdb Fixes to the encoder to support visual studio.
Changes suggested by @r-lyeh and @anthrotype.

 - Use a portable simple PRNG instead of rand_r()
 - add missing <assert.h> include
 - disambiguate log2() argument type
 - remove endian.h include from write_bits.h
2015-02-25 18:19:51 +01:00
Zoltan Szabadka
5bc56a17ee Fully qualify std::max_element, std::push_heap and std::pop_heap names. 2015-02-25 10:29:24 +01:00
Zoltan Szabadka
e643328a7a Speed up FindMatchLength for non-x86 64-bit targets.
This CL enables 64-bit optimization for non-x86 target.
2015-02-25 10:24:13 +01:00
Zoltan Szabadka
e1739826c0 Add command-line tool and tests. 2014-10-30 13:59:37 +01:00
Zoltan Szabadka
96d04e53d7 Disable transforms in the encoder by default.
This change reduces the startup-time of the encoder considerably.
2014-10-29 15:39:35 +01:00
Zoltan Szabadka
485ad82e94 Fix potential output buffer overflow in encoder. 2014-10-28 14:05:53 +01:00
Zoltan Szabadka
0428f2d1bd Move the context map encoding function to the brotli_bit_stream library. 2014-10-28 13:47:21 +01:00
Zoltan Szabadka
f321ba1964 Make the histogram clustering function more generic.
Change the template parameter to be the histogram class
instead of the alphabet size of the histogram.
2014-10-28 13:36:21 +01:00
Zoltan Szabadka
b4f39bf540 New version of the backward reference search code.
The new	interface of the backward reference search
function makes it possible to use it in	a streaming
manner.

Using the advanced cost model and static dictionary
can be turned on/off by	template parameters.

The distance short codes are now computed as part of
the backward reference search.

Added a	faster version of the Hasher.
2014-10-28 13:25:22 +01:00
Zoltan Szabadka
03b20347e1 Minor tuning of encoder heuristics. 2014-10-28 12:50:33 +01:00
Zoltan Szabadka
ca8c2890aa Fix storing of the meta-block header for last empty meta-block. 2014-10-28 12:09:18 +01:00
Zoltan Szabadka
467e6eef80 Move the block switch stroing functions to the brotli_bit_stream library. 2014-10-28 11:53:52 +01:00
Zoltan Szabadka
39cde017e6 Fix TODO markups. 2014-10-15 14:14:34 +02:00
Zoltan Szabadka
d6d9fc60e1 Factor out serialization functions into their own file.
Create a brotli_bit_stream library that is responsible for writing
various structures (headers, Huffman codes, etc.) directly into the
bit-stream.
2014-10-15 14:01:36 +02:00
Zoltan Szabadka
12c6d1fbe4 Apply const qualifier to call operator of comparator class. 2014-10-15 13:33:56 +02:00
Zoltan Szabadka
fe6e9b0148 Remove broken Makefiles.
Makefiles will be added together with a command-line interface in a
later commit.
2014-10-14 13:39:48 +02:00
Zoltan Szabadka
e8d668f873 Add top-level README file.
Remove brotlispec.txt and add a link to the latest internet-draft instead.
2014-10-14 13:08:35 +02:00
Zoltan Szabadka
3f655b63de Fix buffer overflow bug in the brotli encoder. 2014-03-27 16:38:07 +01:00
Zoltan Szabadka
347781947a Update the dictionary and the transforms. 2014-03-25 16:48:25 +01:00
Zoltan Szabadka
e7650080a8 Updates to Brotli compression format, decoder and encoder
This commit contains a batch of changes that were made to the Brotli
compression algorithm in the last month. Most important changes:

   * Format change: don't push distances representing static dictionary words to the distance cache.
   * Fix decoder invalid memory access bug caused by building a non-complete Huffman tree.
   * Add a mode parameter to the encoder interface.
   * Use different hashers for text and font mode.
   * Add a heuristics to the hasher for skipping non-compressible data.
   * Exhaustive search of static dictionary during backward reference search.
2014-03-20 14:32:35 +01:00
Zoltan Szabadka
2f268ad158 Add the initial version of the static dictionary and transforms to Brotli. 2014-02-17 14:25:36 +01:00
Zoltan Szabadka
0454ab4ec0 Updates to Brotli compression format, decoder and encoder
This commit contains a batch of changes that were made to the Brotli
compression algorithm in the last month. Most important changes:

   * Fixes to the spec.
   * Change of code length code order.
   * Use a 2-level Huffman lookup table in the decoder.
   * Faster uncompressed meta-block decoding.
   * Optimized encoding of the Huffman code.
   * Detection of UTF-8 input encoding.
   * UTF-8 based literal cost modeling for improved
     backward reference selection.
2014-02-14 15:04:23 +01:00
Zoltan Szabadka
2bcd58bb5a Brotli format change: small improvement to the encoding of Huffman codes
Combine the HSKIP and the simple/complex Huffman code type bits.
2014-01-08 12:28:28 +01:00
Zoltan Szabadka
d762bc6845 Bug fixes for the brotli encoder and decoder. 2014-01-06 16:01:57 +01:00
Zoltan Szabadka
1447345cbb Brotli format change: improved encoding of Huffman codes
This change removes the redundant HCLEN, HLENINC and HLEN
fields from the encoding of the complex Huffman codes and
derives these from an invariant of the code length sequence.
Based on a patch by Robert Obryk.
2013-12-17 17:17:57 +01:00
Roderick Sheeter
c23cb1e83d Support for OSX build; tested using OSX 10.9/clang-500.2.79 2013-12-12 10:43:05 -08:00
Zoltan Szabadka
60c24c0c2d Updates to Brotli compression format, decoder and encoder
This commit contains a batch of changes that were made to the Brotli
compression algorithm in the last month. Most important changes:

   * Updated spec
   * Changed Huffman code length alphabet to use run length codes more
     efficiently, based on a suggestion by Robert Obryk
   * Changed encoding of the number of Huffman code lengths (HLEN)
   * Changed encoding of the number of Huffman trees (NTREES)
   * Added support for uncompressed meta-blocks
2013-12-12 13:18:04 +01:00
Zoltan Szabadka
8d7081f2d0 Add draft specification of the brotli format 2013-11-28 17:37:13 +01:00
Roderick Sheeter
1cdcbd851f Added Brotli compress/decompress utilities and makefiles 2013-11-19 14:32:56 -08:00
Zoltan Szabadka
c6b9c7c5c8 Updates to Brotli compression format, decoder and encoder
This commit contains a batch of changes that were made to the Brotli
compression algorithm in the last three weeks. Most important changes:

  * Added UTF8 context model for good text compression.
  * Simplified context modeling by having only 4 context modes.
  * Per-block context mode selection.
  * Faster backward copying and bit reading functions.
  * More efficient histogram coding.
  * Streaming support for the decoder and encoder.
2013-11-15 19:02:17 +01:00
Zoltan Szabadka
c66e4e3e4f Add brotli compressor
This commit is for the encoder for brotli compression format.
Brotli is a generic byte-level compression algorithm.
2013-10-23 13:06:13 +02:00