Commit Graph

102 Commits

Author SHA1 Message Date
Zoltan Szabadka
417107b3dd Add two more fast modes to the brotli compressor.
The new modes process the input data in independent blocks,
using backward references only from within an input block.

The new modes can be used by specifying quality 0 or quality 1.
The old quality 1 and quality 2 modes are renamed quality 2 and
quality 3, respectively, and the old quality 3 mode is removed.
2016-01-11 11:21:42 +01:00
Zoltan Szabadka
1bf1b0a598 Faster entropy coding phase for quality 1.
In quality 1, use static Huffman codes for distance
and command histograms with <= 128 symbols and dynamic
Huffman codes with static code length codes for the other
histograms.
2016-01-08 10:10:22 +01:00
Zoltan Szabadka
4dd9114c97 Partial Hasher initialization for small input data.
This increases compression speed of very small files (< 1KB) for quality <= 3.
2016-01-07 17:10:34 +01:00
Zoltan Szabadka
8844b7f0d7 Fix more conversion warnings. 2016-01-07 16:27:49 +01:00
Eugene Klyuchnikov
24ffa78414 Fix headers 2015-12-11 11:11:51 +01:00
eustas
bc5da25a43 Merge pull request #272 from eustas/master
Upgrade license to MIT.
2015-12-09 16:25:06 +01:00
Eugene Klyuchnikov
901cd82f4f Fix WriteMetadata (unaligned and out-of-bounds write). 2015-12-04 16:09:40 +01:00
Eugene Klyuchnikov
771eb10798 Update license statement in source files. 2015-11-27 11:27:11 +01:00
Eugene Klyuchnikov
bb26d1919f Fix sign-comparison warnings
+ add more debug runtime checks
+ minor cleanup
2015-11-23 11:05:12 +01:00
Eugene Klyuchnikov
152e33c3a0 Add more explicit type conversions.
Remove dead code.
Fix includes.
2015-11-17 13:45:41 +01:00
Zoltan Szabadka
8d061836ac Fix assertion in 32-bit build. 2015-11-12 20:13:58 +01:00
Zoltan Szabadka
ea48ce5a6f Fix --Wconversion and --pedantic-errors for the encoder. 2015-10-28 17:44:47 +01:00
Zoltan Szabadka
a89b57b90c Use uint32_t positions in the hasher and compute distances modulo 2^32. 2015-10-26 17:08:57 +01:00
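To illustrate what computing distances modulo 2^32 means in the entry above, here is a minimal sketch. The helper name `BackwardDistance` is hypothetical and is not taken from the brotli sources; it only shows that unsigned 32-bit subtraction still yields the right distance after the position counter wraps around.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not brotli's code).
// Both positions are stored as 32-bit values that may have wrapped around.
// Unsigned subtraction is defined modulo 2^32, so the result is the true
// backward distance as long as that distance itself is below 2^32.
inline uint32_t BackwardDistance(uint32_t cur_pos, uint32_t prev_pos) {
  return static_cast<uint32_t>(cur_pos - prev_pos);
}
```

For example, a previous position 5 bytes before the wrap point and a current position 5 bytes after it give a distance of 10, even though the current position is numerically smaller.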
Zoltan Szabadka
d2e857d83e Fix integer overflow and slowness in entropy estimation. 2015-10-23 11:19:04 +02:00
Mayhem
2a1a1f7282 Remove useless BrotliCompressor instantiation in BrotliCompressBuffer 2015-10-21 19:54:35 +02:00
Zoltan Szabadka
512b9b8a3c Remove 'static' from kBitCostThreshold declaration. 2015-10-11 13:03:51 +02:00
Zoltan Szabadka
2726b8a4f6 Encoder fixes.
* Remove default constructors.
* Initialize bit_cost in histogram.Clear().
* Check fseek result in FileSize.
* Replace malloc in BrotliFileIn constructor with "new".
* Catch bad_alloc in bro tool.
2015-10-06 11:23:44 +02:00
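The "Check fseek result in FileSize" item above can be sketched as follows. This is a minimal illustration under assumptions: the function name `FileSizeSketch` and the convention of returning -1 on failure are hypothetical and may differ from what the bro tool actually does.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sketch (not the bro tool's actual FileSize function).
// Returns the size of the file in bytes, or -1 if any step fails.
inline int64_t FileSizeSketch(const char* path) {
  std::FILE* f = std::fopen(path, "rb");
  if (f == NULL) {
    return -1;
  }
  // The fseek result is checked instead of being ignored.
  if (std::fseek(f, 0L, SEEK_END) != 0) {
    std::fclose(f);
    return -1;
  }
  const long size = std::ftell(f);  // ftell reports failure as -1L
  std::fclose(f);
  return size < 0 ? -1 : static_cast<int64_t>(size);
}
```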
Cosimo Lupo
66fa4ff403 [types.h] make std ints types for _MSC_VER compatible with CFFI
As defined in _cffi_include.h: 21fef94ca0/cffi/_cffi_include.h (_cffi_include.h-15)
2015-10-05 11:32:12 +01:00
Zoltan Szabadka
b39eec8810 Remove C++11 vector::data() calls from encoder. 2015-10-05 11:43:49 +02:00
Zoltan Szabadka
99aae450b8 Initialize min_cost_cmd_ in constructor. 2015-10-05 11:43:31 +02:00
Zoltan Szabadka
b6689b1504 Remove unnecessary branch from literal cost calculation. 2015-10-05 11:42:45 +02:00
szabadka
0477473ba2 Merge pull request #184 from IIoTeP9HuY/master
Support large inputs/outputs in memory adaptors
2015-10-02 13:22:15 +02:00
Zoltan Szabadka
754deaed2f Reduce command buffer memory usage. 2015-10-01 17:08:59 +02:00
Zoltan Szabadka
4c37566f4b Move literal cost computation to where it's used.
Move utf8 heuristics functions to their own file.
2015-10-01 15:10:42 +02:00
Zoltan Szabadka
3b8bef70a5 Add extern "C" linkage to the encoder and decoder dictionary definitions. 2015-10-01 14:30:22 +02:00
Zoltan Szabadka
d4cc4f8f6f Define the encoder dictionary in the .cc file and link only once. 2015-10-01 13:08:43 +02:00
Zoltan Szabadka
4a7024dcde Make the brotli encoder C++98 compatible. 2015-10-01 12:08:14 +02:00
Zoltan Szabadka
dfdf2dd4c4 Encoder bug fixes.
* Fix forward declaration mismatch.
* Fix division by zero in 64X test.
* Avoid shadowing of variables in encoder.
2015-10-01 11:40:05 +02:00
acid
08e98d8d89 Support large inputs/outputs in memory adaptors 2015-09-28 16:03:51 +03:00
Marcin Karpinski
21ac39f7c8 Fix typos. 2015-09-21 21:04:07 +02:00
Eugene Klyuchnikov
b58317a652 Fix bug in encoder. 2015-09-01 12:18:41 +02:00
Lode Vandevenne
6511d6b016 update brotli encoder with latest improvements 2015-08-28 16:09:23 +02:00
Zoltan Szabadka
7de70dbcc7 Add missing <stdlib.h> to streams.cc 2015-08-11 11:09:04 +02:00
Lode Vandevenne
17ed258993 msan bugfixes to the brotli encoder 2015-08-10 13:25:45 +02:00
Zoltan Szabadka
29c2679500 Fix encoder bug.
Under some circumstances CopyLiteralsToByteArray tried
to read behind the ring buffer.

In this patch we force it to read completely from range [0..mask].
2015-07-30 17:42:02 +02:00
Zoltan Szabadka
95ddb48a11 Fix some VS compilation errors in the encoder.
- Use std::numeric_limits<double>::infinity() instead of 1.0 / 0.0
- Use FastLog2() instead of log2() in cost model
2015-06-29 14:20:25 +02:00
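The first item of the entry above replaces a constant division by zero, which Visual Studio rejects at compile time, with the standard library's infinity value. A minimal sketch, assuming a hypothetical helper name `InfiniteCost` that does not appear in the encoder:

```cpp
#include <limits>

// Hypothetical helper (name is an assumption). Returns positive infinity
// without writing 1.0 / 0.0, which some compilers refuse to compile.
inline double InfiniteCost() {
  return std::numeric_limits<double>::infinity();
}
```

Such a value is typically used as the initial "no path found yet" cost in a minimization, since every finite cost compares smaller than it.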
Lode Vandevenne
bad0f4edf1 Brotli Bug Fixes 2015-06-26 17:37:00 +02:00
Zoltan Szabadka
618287b373 Deprecate greedy_block_split and enable_context_modeling brotli params.
These affected only quality 11, and now it does not make sense
to disable block splitting or context modeling because most of
the time is spent in zopfli anyway.

Now all speed vs size compromises are controlled by the quality param.
2015-06-12 16:50:49 +02:00
Zoltan Szabadka
66098830a2 Use a static hash table to look up dictionary words and transforms.
This is used for quality 11; for qualities <= 9 we already
have a simpler hash table.

The static data size is 252 kB, and this removes the
need to initialize a huge hash map at startup, which was
the reason why transforms had to be disabled by default.
In comparison, the static dictionary itself is 120 kB.
This supports every transform, except the kOmitFirstN.
2015-06-12 16:45:17 +02:00
Zoltan Szabadka
b3d3723f62 Add "zopfli"-style backward reference search to brotli.
This commit adopts the backward reference search algorithm
from the zopfli project (see https://github.com/google/zopfli)
to brotli.

This slower backward reference search is run only in quality 11
and it runs two iterations of entropy cost modeling and
shortest path search.

As a result, the original backward reference search function can
be simplified a bit, since we can remove some heuristics that were
replaced with the zopfli-style search.
2015-06-12 16:25:41 +02:00
Zoltan Szabadka
835a77469e Change the static dictionary hash table to take into
account word frequency when there are hash collisions.
2015-06-12 16:14:06 +02:00
Zoltan Szabadka
65f3fc55f5 Bug fixes for the brotli encoder.
* Fix an out-of-bounds access to depth_histo in the
  bit cost calculation function.

* Change type of distance symbol to uint16_t in block
  splitter, because if all postfix bits are used, there
  can be 520 distance symbols.

* Save the distance cache between metablocks at the
  correct place. This fixes a roundtrip failure that
  can occur when there is an uncompressed metablock
  between two compressed metablocks.

* Fix a bug when setting lgwin to 24 in the encoder parameters.
  It ended up making metablocks larger than 24 bits in size.

* Fix out-of-bounds memory accesses in parallel encoder.
  CreateBackwardReferences can read up to 4 bytes past end of
  input if the end of input is before mask.

* Add missing header for memcpy() in port.h
2015-06-12 16:11:50 +02:00
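The figure of 520 distance symbols in the entry above can be checked with a short calculation. The formula used here, 16 + NDIRECT + (48 << NPOSTFIX), comes from the Brotli format specification rather than from this log, and the function names are hypothetical; treat this as a sketch of the arithmetic, not as the block splitter's code.

```cpp
#include <cstdint>

// Size of the distance alphabet, assuming the formula from the Brotli
// format specification: 16 + NDIRECT + (48 << NPOSTFIX).
inline uint32_t NumDistanceSymbols(uint32_t npostfix, uint32_t ndirect) {
  return 16u + ndirect + (48u << npostfix);
}

// True if a symbol value can be stored in an 8-bit type without truncation.
inline bool FitsInUint8(uint32_t symbol) {
  return symbol <= 0xFFu;
}
```

With NPOSTFIX = 3 and NDIRECT = 120 the alphabet has 520 symbols, so the largest symbol, 519, does not fit in 8 bits and would be truncated to 7 if stored in a uint8_t.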
Zoltan Szabadka
b43df8f699 Brotli custom LZ77 dictionary support.
Adds functions to prepend such a dictionary to the
encoder and decoder, and adjusts their internal
parameters so that it is treated as a previous part of
the input. This dictionary is just a prefilled LZ77
window; it is not related to the built-in transformable
brotli dictionary.
2015-06-12 15:43:54 +02:00
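The idea of a dictionary acting as a prefilled LZ77 window, as described in the entry above, can be shown with a toy match search. Everything in this sketch is an assumption made for illustration: the names `Match` and `LongestMatchWithPrefix` are hypothetical, and the brute-force search is not how the brotli hasher finds matches.

```cpp
#include <cstddef>
#include <string>

// Toy result type: how far back the match starts and how long it is.
struct Match {
  size_t distance;
  size_t length;
};

// Brute-force longest-match search at input position `pos`, with the
// dictionary treated as if it were data that preceded the input.
inline Match LongestMatchWithPrefix(const std::string& dict,
                                    const std::string& input,
                                    size_t pos) {
  const std::string buf = dict + input;  // dictionary prefills the window
  const size_t cur = dict.size() + pos;  // current position in the window
  Match best = {0, 0};
  for (size_t cand = 0; cand < cur; ++cand) {
    size_t len = 0;
    // cand < cur, so cand + len stays in range whenever cur + len does.
    while (cur + len < buf.size() && buf[cand + len] == buf[cur + len]) {
      ++len;
    }
    if (len > best.length) {
      best.distance = cur - cand;
      best.length = len;
    }
  }
  return best;
}
```

Without a dictionary, the first bytes of an input have nothing to refer back to. With one, the very first position can already produce a backward reference whose distance reaches into the prepended data.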
Zoltan Szabadka
667f70adcb Speedups to brotli quality 11.
* Cluster at most 64 histograms at a time in the first
  round of clustering.

* Use a faster histogram cost estimation function.

* Don't compute the log2(total) multiple times in the
  block splitter.
2015-06-12 15:29:06 +02:00
Zoltan Szabadka
6622355a9a Use the same hasher for text and font mode.
We use 4-byte hashing in both and look for length 3
matches separately.
2015-05-11 14:11:07 +02:00
Zoltan Szabadka
cc8d64dfec Fix broken quality 0, make it same as quality 1. 2015-05-11 13:51:47 +02:00
Zoltan Szabadka
aa853f3cbc Add a MODE_GENERIC compression mode to the interface.
With this, users can distinguish between not knowing
what the input is (the default) and knowing that it is text,
so that text mode can be relied on to force some UTF-8 specific settings.
2015-05-11 11:33:19 +02:00
Zoltan Szabadka
54f69c9ef7 Support window bits 10 - 15 in the decoder.
The previous window bit value 17 is used to
extend the range, since it has not been used
in any previous encoders.
2015-05-07 17:44:33 +02:00
Zoltan Szabadka
12eb9bfd70 Align distance code meaning in the brotli encoder.
Two different definitions (offset by 1) were used in
command.h and hash.h. Now they have been made the same,
also consistent with the spec (e.g. 0 means use previous dist, etc...)
2015-05-07 17:40:00 +02:00
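A minimal sketch of the convention mentioned in the entry above, where distance code 0 means "reuse the previous distance". It assumes the mapping from the Brotli format specification, in which codes 0 to 3 select the last four distances, and it leaves out the remaining short codes. The names and the example cache values are hypothetical.

```cpp
#include <cstdint>

// Example cache contents, made up for illustration. Index 0 holds the most
// recently used distance, index 3 the fourth most recent.
static const uint32_t kExampleCache[4] = {7, 20, 300, 4000};

// Hypothetical helper (not the encoder's function). Resolves distance codes
// 0..3 against the cache; returns 0 for codes this sketch does not handle.
inline uint32_t ResolveShortDistanceCode(int code, const uint32_t last[4]) {
  if (code < 0 || code > 3) {
    return 0;
  }
  return last[code];
}
```

An off-by-one disagreement of the kind the commit describes would make one component read code 0 as the second-to-last distance, which is why both headers need the same definition.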
Zoltan Szabadka
7cde616c9e Faster encoding for low quality settings.
With this commit, the encoder will skip some
compression optimization steps for quality <= 4,
which results in faster compression but higher
compressed sizes.
2015-05-07 17:30:10 +02:00