mirror of
https://github.com/google/brotli.git
synced 2024-11-21 19:20:09 +00:00
Updates to Brotli compression format, decoder and encoder
This commit contains a batch of changes that were made to the Brotli compression algorithm in the last month. Most important changes:
* Updated spec
* Changed Huffman code length alphabet to use run length codes more efficiently, based on a suggestion by Robert Obryk
* Changed encoding of the number of Huffman code lengths (HLEN)
* Changed encoding of the number of Huffman trees (NTREES)
* Added support for uncompressed meta-blocks
parent 8d7081f2d0
commit 60c24c0c2d
225
brotlispec.txt
@@ -52,9 +52,8 @@ Abstract
such as Unix filters;
* Compresses data with efficiency comparable to the best
currently available general-purpose compression methods,
and in particular considerably better than the gzip
program and decompresses much faster than the LZMA
implementations.
and in particular considerably better than the gzip program;
* Decompresses much faster than the LZMA implementations.

The data format defined by this specification does not attempt to:
* Allow random access to compressed data;
@@ -196,23 +195,50 @@ Abstract

The sequence of each type of value in the representation of a command
(insert-and-copy lengths, literals and distances) within a meta-
block is further divided into blocks. In other words, each meta-block
has a series of insert-and-copy length blocks, a series of literal
blocks and a series of distance blocks. These are also called the
three block categories: a meta-block has a series of blocks for each
block category. The subsequent blocks within each block category have
different block types, but blocks further away in the block sequence
can have the same types. The block types are numbered from 0 to the
maximum block type number of 253 and the first block of each block
category has type 0. The block structure of a meta-block is
represented by the sequence of block-switch commands for each block
category, where a block-switch command is a pair <block type, block
length>. The block-switch commands are represented in the compressed
data before the start of each new block using a Huffman code tree for
block is further divided into blocks. In the "brotli" format, blocks
are not contiguous chunks of compressed data, but rather the pieces
of compressed data belonging to a block are interleaved with pieces
of data belonging to other blocks. Each meta-block can be logically
decomposed into a series of insert-and-copy length blocks, a series
of literal blocks and a series of distance blocks. These are also
called the three block categories: a meta-block has a series of
blocks for each block category. Note that the physical structure of
the meta-block is a series of commands, while the three series of
blocks is the logical structure. Consider the following example:

(IaC0, L0, L1, L2, D0)(IaC1, D1)(IaC2, L3, L4, D2)(IaC3, L5, D3)

The meta-block here has 4 commands, and each three types of symbols
within these commands can be rearranged into for example the
following logical block structure:

[IaC0, IaC1][IaC2, IaC3] <-- block types 0 and 1

[L0, L1][L2, L3, L4][L5] <-- block types 0, 1, and 0

[D0][D1, D2, D3] <-- block types 0 and 1

The subsequent blocks within each block category must have different
block types, but blocks further away in the block sequence can have
the same types. The block types are numbered from 0 to the maximum
block type number of 255 and the first block of each block category
must have type 0. The block structure of a meta-block is represented
by the sequence of block-switch commands for each block category,
where a block-switch command is a pair <block type, block length>.
The block-switch commands are represented in the compressed data
before the start of each new block using a Huffman code tree for
block types and a separate Huffman code tree for block lengths for
each block category. The code trees for block types and lengths
(total of six Huffman code trees) appear in a compact form in the
meta-block header.
each block category. In the above example the physical layout of the
meta-block is the following:

IaC0 L0 L1 LBlockSwitch(1, 3) L2 D0 IaC1 DBlockSwitch(1, 1) D1
IaCBlockSwitch(1, 2) IaC2 L3 L4 D2 IaC3 LBlockSwitch(0, 1) D3

Note that the block switch commands for the first blocks are not part
of the meta-block compressed data part, they are encoded in the meta-
block header. The code trees for block types and lengths (total of
six Huffman code trees) appear in a compact form in the meta-block
header.

Each type of value (insert-and-copy lengths, literals and distances)
can be encoded with any Huffman tree from a collection of Huffman
@@ -235,7 +261,7 @@ Abstract
and the context map), the meta-block header contains the number of
input bytes in the meta-block and two additional parameters used in
the representation of copy distances (number of "postfix bits" and
number of direct distance codes, see later).
number of direct distance codes).

3. Compressed representation of Huffman codes
@@ -383,8 +409,7 @@ Abstract
length codes, the alphabet size is 704. For block length codes,
the alphabet size is 26. For distance codes, block type codes and
the Huffman codes used in compressing the context map, the
alphabet size is dynamic and is based on other parameters (see
later).
alphabet size is dynamic and is based on other parameters.

3.4. Simple Huffman codes
@@ -446,13 +471,19 @@ Abstract
If this is the first code length, or all previous
code lengths are zero, a code length of 8 is
repeated 3 - 6 times
Example: Codes 7, 16 (+2 bits 11),
16 (+2 bits 10) will expand to
12 code lengths of 7 (1 + 6 + 5)
A repeated code length code of 16 modifies the
repeat count of the previous one as follows:
repeat count = (4 * (repeat count - 2)) +
(3 - 6 on the next 2 bits)
Example: Codes 7, 16 (+2 bits 11), 16 (+2 bits 10)
will expand to 22 code lengths of 7
(1 + 4 * (6 - 2) + 5)
17: Repeat a code length of 0 for 3 - 10 times.
(3 bits of length)
18: Repeat a code length of 0 for 11 - 138 times
(7 bits of length)
A repeated code length code of 17 modifies the
repeat count of the previous one as follows:
repeat count = (8 * (repeat count - 2)) +
(3 - 10 on the next 3 bits)
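
The updated repeat-count rule for consecutive codes 16 and 17 can be traced with a short sketch. The C fragment below is only an illustration, assuming a hypothetical helper `accumulate_repeat` (it is not a function from the Brotli sources) that is called once per consecutive repeat code with the value of that code's extra bits.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the Brotli sources. Folds one more
   repeat code into the running repeat count:
     repeat = ((repeat - 2) << extra_bits) + (3 + extra_value)
   extra_bits is 2 for code 16 and 3 for code 17. When no repeat is
   pending (repeat == 0) the count simply starts at 3 + extra_value. */
static int accumulate_repeat(int repeat, int extra_bits, uint32_t extra_value) {
  if (repeat > 0) {
    repeat -= 2;
    repeat <<= extra_bits;
  }
  return repeat + (int)extra_value + 3;
}

/* Number of code lengths produced by the sequence
   "literal, 16 (+2 bits a), 16 (+2 bits b)": one literal code length
   plus the accumulated repeat count. */
static int expand_literal_then_two_repeats(uint32_t a, uint32_t b) {
  int repeat = 0;
  repeat = accumulate_repeat(repeat, 2, a);
  repeat = accumulate_repeat(repeat, 2, b);
  return 1 + repeat;
}
```

With a = 3 (bits 11) and b = 2 (bits 10) the first call yields 6, the second yields 4 * (6 - 2) + 5 = 21, and the total is 22 code lengths, which agrees with the worked example in the text.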

A code length of 0 indicates that the corresponding symbol in the
alphabet will not occur in the compressed data, and should not
@@ -475,12 +506,12 @@ Abstract
follows:

1 bit: 0, indicating a complex Huffman code
4 bits: HCLEN, # of code length codes - 4
4 bits: HCLEN, # of code length codes - 3
1 bit : HSKIP, if 1, skip over first two code length codes

(HCLEN + 4 - 2 * HSKIP) code lengths for symbols in the code
(HCLEN + 3 - 2 * HSKIP) code lengths for symbols in the code
length alphabet given just above, in the order: 1, 2, 3,
4, 0, 17, 18, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15
4, 0, 17, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15

If HSKIP is 1, code lengths of code length symbols 1 and
2 are implicit zeros. Code lengths of code length symbols
@@ -495,19 +526,18 @@ Abstract
1 bit: HLENINC, if 1, the number of code length symbols is
encoded next

3 bits: HNBITPAIRS, (# of bit pairs to represent HLEN) - 2,
appears only if HLENINC = 1

2 * HNBITPAIRS + 2 bits: HLEN, # of code length symbols - 2,
appears only if HLENINC = 1
7-8 bits: HLEN, # of code length symbols, with the following
encoding: values 4 - 67 with bit pattern 0xxxxxx,
values 68 - 195 with bit pattern 1xxxxxxx, appears
only if HLENINC = 1

Sequence of code lengths symbols, encoded using the code
length Huffman code. The number of code length symbols
is either HLEN + 2 (in case of HLENINC = 1), or as many
as is needed to assign a code length to each symbol in
the alphabet (i.e. the alphabet size minus the sum of all
the repeat lengths defined by extra bits of code length
symbols 16 - 18). In case of HLENINC = 1, all symbols
is either HLEN (in case of HLENINC = 1), or as many as is
needed to assign a code length to each symbol in the
alphabet (i.e. the alphabet size minus the sum of all the
repeat lengths defined by extra bits of code length
symbols 16 and 17). In case of HLENINC = 1, all symbols
not assigned a code length have implicit code length 0.
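
The new 7-8 bit HLEN field can be sketched as a pack/unpack pair. This is a minimal illustration, not code from the commit: the names `hlen_encode` and `hlen_decode` are hypothetical, and the sketch assumes bits are consumed least-significant first (as the decoder's bit reader appears to do), so the selector bit sits in bit 0 of the packed value.

```c
#include <stdint.h>

/* Hypothetical helper: packs HLEN into the 7-8 bit field.
   Values 4..67   -> selector 0 followed by 6 bits of (HLEN - 4).
   Values 68..195 -> selector 1 followed by 7 bits of (HLEN - 68).
   Returns the number of bits used, or 0 if HLEN is out of range. */
static int hlen_encode(int hlen, uint32_t* packed) {
  if (hlen >= 4 && hlen <= 67) {
    *packed = (uint32_t)(hlen - 4) << 1;
    return 7;
  }
  if (hlen >= 68 && hlen <= 195) {
    *packed = ((uint32_t)(hlen - 68) << 1) | 1u;
    return 8;
  }
  return 0;
}

/* Hypothetical inverse: reads the selector bit first, then the payload. */
static int hlen_decode(uint32_t packed) {
  if (packed & 1u) {
    return 68 + (int)((packed >> 1) & 0x7fu);
  }
  return 4 + (int)((packed >> 1) & 0x3fu);
}
```

The two ranges together cover 64 + 128 = 192 distinct values, which is why the largest representable HLEN is 195.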

3.6. Validity of the Huffman code
@@ -582,7 +612,7 @@ Abstract
the NDIRECT direct distance codes have any extra bits.

Distance codes 16 + NDIRECT and greater all have extra bits, the
number of extra bits for a distance code `dcode' is given by the
number of extra bits for a distance code "dcode" is given by the
following formula:

ndistbits = 1 + ((dcode - NDIRECT - 16) >> (NPOSTFIX + 1))
@@ -590,8 +620,8 @@ Abstract
The maximum number of extra bits is 24, therefore the size of the
distance code alphabet is (16 + NDIRECT + (48 << NPOSTFIX)).

Given a distance code `dcode' (>= 16 + NDIRECT), and extra bits
`dextra', the backward distance is given by the following formula:
Given a distance code "dcode" (>= 16 + NDIRECT), and extra bits
"dextra", the backward distance is given by the following formula:

hcode = (dcode - NDIRECT - 16) >> NPOSTFIX
lcode = (dcode - NDIRECT - 16) & POSTFIX_MASK
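
The relationship between the extra-bit formula and the alphabet size quoted above can be checked with a small sketch. The function names are illustrative only; the arithmetic is taken directly from the two formulas in the text.

```c
/* Number of extra bits for a distance code dcode >= 16 + ndirect:
   ndistbits = 1 + ((dcode - NDIRECT - 16) >> (NPOSTFIX + 1)) */
static int dist_extra_bits(int dcode, int ndirect, int npostfix) {
  return 1 + ((dcode - ndirect - 16) >> (npostfix + 1));
}

/* Size of the distance code alphabet:
   16 + NDIRECT + (48 << NPOSTFIX) */
static int dist_alphabet_size(int ndirect, int npostfix) {
  return 16 + ndirect + (48 << npostfix);
}
```

For any NDIRECT and NPOSTFIX, the first code with extra bits (16 + NDIRECT) gets 1 extra bit and the last code in the alphabet gets 24, which is the stated maximum. For example, with NDIRECT = 12 and NPOSTFIX = 2 the alphabet has 220 codes, and code 219 evaluates to 1 + (191 >> 3) = 24.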
@@ -704,8 +734,8 @@ Abstract
alphabet. A block type code 0 means that the block type is the same
as the type of the second last block from the same block category,
while a block type code 1 means that the block type equals the last
block type plus one. Block type codes 2 - 255 represent block types
0 - 253. The second last and last block types are initialized with 0
block type plus one. Block type codes 2 - 257 represent block types
0 - 255. The second last and last block types are initialized with 0
and 1, respectively, at the beginning of each meta-block.
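
The mapping from block type codes to block types described above can be sketched as a two-entry history. This is a minimal illustration with hypothetical names (`BlockTypeState`, `next_block_type`), following the updated numbering in which codes 2 - 257 stand for types 0 - 255; it does not validate the result against the number of block types in use.

```c
/* Hypothetical state: the last two block types of one block category. */
typedef struct {
  int second_last;
  int last;
} BlockTypeState;

/* At the beginning of each meta-block the second last and last block
   types are initialized with 0 and 1, respectively. */
static void block_type_state_init(BlockTypeState* s) {
  s->second_last = 0;
  s->last = 1;
}

/* Maps a block type code to a block type and updates the history:
   code 0      -> same as the second last block type
   code 1      -> last block type plus one
   code 2..257 -> block type (code - 2) */
static int next_block_type(BlockTypeState* s, int type_code) {
  int type;
  if (type_code == 0) {
    type = s->second_last;
  } else if (type_code == 1) {
    type = s->last + 1;
  } else {
    type = type_code - 2;
  }
  s->second_last = s->last;
  s->last = type;
  return type;
}
```

Starting from the initial state, the code sequence 1, 0, 2 yields block types 2, 1, 0.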

The first block type of each block category must be 0, and the block
@@ -851,10 +881,9 @@ Abstract

7.2. Context id for distances

The context for encoding the next distance code is defined by the
copy length corresponding to the distance. The context ids are
0, 1, 2, and 3 for copy lengths 2, 3, 4, and more than 4,
respectively.
The context for encoding a distance code is defined by the copy
length corresponding to the distance. The context ids are 0, 1, 2,
and 3 for copy lengths 2, 3, 4, and more than 4, respectively.
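
The rule above amounts to a clamped subtraction. The sketch below uses hypothetical function names; the context map index `bltype * 4 + cid` is an assumption made by analogy with the literal formula CMAPL[bltype * 64 + cid] and the stated size of CMAPD, not something spelled out in this excerpt.

```c
/* Context id for a distance code, derived from the copy length:
   copy lengths 2, 3, 4 and more than 4 map to ids 0, 1, 2 and 3.
   Returns -1 for copy lengths below 2, which the rule does not cover. */
static int distance_context_id(int copy_length) {
  if (copy_length < 2) return -1;
  if (copy_length > 4) return 3;
  return copy_length - 2;
}

/* Assumed index into the distance context map (4 context ids per
   distance block type), mirroring the literal lookup. */
static int distance_context_map_index(int bltype, int cid) {
  return bltype * 4 + cid;
}
```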

7.3. Encoding of the context map
@@ -869,7 +898,7 @@ Abstract
CMAPL[0..(64 * NBLTYPESL - 1)] and CMAPD[0..(4 * NBLTYPESD - 1)].

The index of the Huffman tree for encoding a literal or distance
code with context id `cid' and block type `bltype' is
code with context id "cid" and block type "bltype" is

index of literal Huffman tree = CMAPL[bltype * 64 + cid]
@@ -899,9 +928,6 @@ Abstract
now define the format of the context map (the same format is used
for literal and distance context maps):

8 bits: NTREES - 1, if NTREES = 1 all values in the context
map are zeros, and no further bits are needed for
the context map encoding. Otherwise,
1-5 bits: RLEMAX, 0 is encoded with one 0 bit, and values
1 - 16 are encoded with bit pattern 1xxxx
@@ -914,7 +940,9 @@ Abstract
transform on the values in the context map to get
the Huffman code indexes

8. Language-based static dictionaries
For the encoding of NTREES see Section 9.2.

8. Static dictionary

At any given point during decoding the compressed data, a reference
to a duplicated string in the output produced so far has a maximum
@@ -923,24 +951,44 @@ Abstract
from the input stream, as described in section 4, can produce
distances that are greater than this maximum allowed value. The
difference between these distances and the first invalid distance
value is treated as reference to a word in one of the language-based
static dictionaries given in Appendix A. The id of the static
dictionary is determined by the copy length of the command:
value is treated as reference to a word in the static dictionary
given in Appendix A. The maximum valid copy length for a static
dictionary reference is 24. The static dictionary has three parts:

dictionary id = copy length - 4
word id = distance - (max allowed distance + 1)
* DICT[0..DICTSIZE], an array of bytes
* DOFFSET[0..24], an array of byte offset values for each length
* NDBITS[0..24], an array of bit-depth values for each length

If the copy length is less than 4, or the dictionary id is invalid,
the compressed data set is invalid and must be discarded.
The number of static dictionary words for a given length is:

Each of the static dictionaries has 2^N words, and the index of the
referenced word is formed by the N least significant bits of the word
id. The word id right-shifted by N gives the index to one of the word
transformations given in Appendix B. If this transformation index is
greater than the maximum transformation index, the compressed data
set is invalid and must be discarded. The string copied to the output
stream is computed by applying the transformation to the referenced
static dictionary word.
NWORDS[length] = 0 (if length < 3)
NWORDS[length] = (1 << NDBITS[lengths]) (if length >= 3)

DOFFSET and DICTSIZE are defined by the following recursion:

DOFFSET[0] = 0
DOFFSET[length + 1] = DOFFSET[length] + length * NWORDS[length]
DICTSIZE = DOFFSET[24] + 24 * NWORDS[24]

The offset of a word within the DICT array for a given length and
index is:

offset(length, index) = DOFFSET[length] + index * length

Each static dictionary word has 64 different forms, given by applying
a word transformation to a base word in the DICT array. The list of
word transformations is given in Appendix B. The static dictionary
word for a <length, distance> pair can be reconstructed as follows:

word_id = distance - (max allowed distance + 1)
index = word_id % NWORDS[length]
base_word = DICT[offset(length, index)..offset(length, index+1))
transform_id = word_id >> NBITS[length]

The string copied to the output stream is computed by applying the
transformation to the base dictionary word. If transform_id is
greater than 63 or length is greater than 24, the compressed data set
is invalid and must be discarded.
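
The DOFFSET recursion and the word lookup for the new static dictionary layout can be sketched as follows. This is an illustration under stated assumptions: the type and function names are hypothetical, the NDBITS values in `toy_layout` are invented for the example (the real ones come from Appendix A, which is not shown here), "NDBITS[lengths]" and "NBITS[length]" in the text are both read as NDBITS[length], and rejecting lengths below 3 is an added guard because NWORDS is 0 there.

```c
#include <stdint.h>

enum { MAX_WORD_LEN = 24, NUM_TRANSFORMS = 64 };

typedef struct {
  uint8_t  ndbits[MAX_WORD_LEN + 1];   /* NDBITS[0..24] */
  uint32_t doffset[MAX_WORD_LEN + 1];  /* DOFFSET[0..24] */
  uint32_t dictsize;                   /* DICTSIZE */
} DictLayout;

/* NWORDS[length]: 0 below length 3, otherwise 1 << NDBITS[length]. */
static uint32_t nwords(const DictLayout* d, int length) {
  return length < 3 ? 0u : (1u << d->ndbits[length]);
}

/* Fills DOFFSET and DICTSIZE from NDBITS using the recursion. */
static void build_layout(DictLayout* d) {
  int length;
  d->doffset[0] = 0;
  for (length = 0; length < MAX_WORD_LEN; ++length) {
    d->doffset[length + 1] =
        d->doffset[length] + (uint32_t)length * nwords(d, length);
  }
  d->dictsize = d->doffset[MAX_WORD_LEN] +
                (uint32_t)MAX_WORD_LEN * nwords(d, MAX_WORD_LEN);
}

/* Splits word_id into a byte offset in DICT and a transform id.
   Returns 0 when the reference is invalid, 1 otherwise. */
static int resolve_word(const DictLayout* d, int length, uint32_t word_id,
                        uint32_t* offset, uint32_t* transform_id) {
  uint32_t index;
  if (length < 3 || length > MAX_WORD_LEN) return 0;
  index = word_id % nwords(d, length);
  *transform_id = word_id >> d->ndbits[length];
  if (*transform_id >= NUM_TRANSFORMS) return 0;
  *offset = d->doffset[length] + index * (uint32_t)length;
  return 1;
}

/* Invented example: 4 words of length 3, 8 words of length 4, and one
   word for every other length from 5 to 24 (NDBITS = 0). */
static DictLayout toy_layout(void) {
  DictLayout d = {{0}, {0}, 0};
  d.ndbits[3] = 2;
  d.ndbits[4] = 3;
  build_layout(&d);
  return d;
}
```

In the toy layout, the length-4 words start at byte 12 (after 4 words of 3 bytes), so word_id 43 for length 4 resolves to index 3, transform 5 and byte offset 24.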

9. Compressed data format
@@ -979,6 +1027,11 @@ Abstract
(MNIBBLES + 4) x 4 bits: MLEN - 1, where MLEN is the length
of the meta-block in the input data in bytes

1 bit: ISUNCOMPRESSED, if set to 1, any bits of input up to
the next byte boundary are ignored, and the rest of
the meta-block contains MLEN bytes of literal data;
this field is only present if ISLAST bit is not set

1-11 bits: NBLTYPESL, # of literal block types, encoded with
the following variable length code:
@@ -1035,17 +1088,25 @@ Abstract

NBLTYPESL x 2 bits: context mode for each literal block type

1-11 bits: NTREESL, # of literal Huffman trees, encoded with
the same variable length code as NBLTYPESL

Literal context map, encoded as described in Paragraph 7.3,
the number of Huffman tree indexes is denoted by NHTREESL
appears only if NTREESL >= 2, otherwise the context map
has only zero values

1-11 bits: NTREESD, # of distance Huffman trees, encoded with
the same variable length code as NBLTYPESD

Distance context map, encoded as described in Paragraph 7.3,
the number of Huffman tree indexes is denoted by NHTREESD
appears only if NTREESD >= 2, otherwise the context map
has only zero values

NHTREESL Huffman codes for literals
NTREESL Huffman codes for literals

NBLTYPESI Huffman codes for insert-and-copy lengths

NHTREESD Huffman codes for distances
NTREESD Huffman codes for distances

9.3. Format of the meta-block data
@@ -1109,6 +1170,12 @@ Abstract
if ISEMPTY
break from loop
read MLEN
if not ISLAST
read ISUNCOMPRESSED bit
if ISUNCOMPRESSED
skip any bits up to the next byte boundary
copy MLEN bytes of input to the output stream
continue to the next meta-block
loop for each three block categories (i = L, I, D)
read NBLTYPESi
if NBLTYPESi >= 2
@@ -1122,8 +1189,16 @@ Abstract
set block length, BLEN_i to 268435456
read NPOSTFIX and NDIRECT
read array of literal context modes, CMODE[]
read literal context map, CMAPL[]
read distance context map, CMAPD[]
read NTREESL
if NTREESL >= 2
read literal context map, CMAPL[]
else
fill CMAPL[] with zeros
read NTREESD
if NTREESD >= 2
read distance context map, CMAPD[]
else
fill CMAPD[] with zeros
read array of Huffman codes for literals, HTREEL[]
read array of Huffman codes for insert-and-copy, HTREEI[]
read array of Huffman codes for distances, HTREED[]
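
The ISUNCOMPRESSED branch of the decoding pseudocode ("skip any bits up to the next byte boundary", then "copy MLEN bytes of input to the output stream") can be sketched on a plain byte array. The helper names are hypothetical and the sketch assumes the whole input is already in memory and that bit positions are counted from the start of the buffer; the decoder in this commit works on a ring buffer and a bit window instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rounds a bit position up to the next multiple of 8. A position that
   is already byte-aligned is left unchanged. */
static size_t align_to_byte(size_t bit_pos) {
  return (bit_pos + 7) & ~(size_t)7;
}

/* Copies mlen literal bytes that start at the first byte boundary at or
   after bit_pos, and returns the bit position following the copy. */
static size_t copy_uncompressed(const uint8_t* in, size_t bit_pos,
                                size_t mlen, uint8_t* out) {
  size_t byte_pos = align_to_byte(bit_pos) >> 3;
  memcpy(out, in + byte_pos, mlen);
  return (byte_pos + mlen) << 3;
}
```

For instance, if the meta-block header ends 3 bits into the first byte, the remaining 5 bits of that byte are ignored and the literal data starts at byte 1.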
@@ -59,6 +59,12 @@ static BROTLI_INLINE uint32_t BrotliPrefetchBits(BrotliBitReader* const br) {
// For jumping over a number of bits in the bit stream when accessed with
// BrotliPrefetchBits and BrotliFillBitWindow.
static BROTLI_INLINE void BrotliSetBitPos(BrotliBitReader* const br, int val) {
#ifdef BROTLI_DECODE_DEBUG
int n_bits = val - br->bit_pos_;
const uint32_t bval = (uint32_t)(br->val_ >> br->bit_pos_) & kBitMask[n_bits];
printf("[BrotliReadBits] %010ld %2d val: %6x\n",
(br->pos_ << 3) + br->bit_pos_ - 64, n_bits, bval);
#endif
br->bit_pos_ = val;
}
@@ -145,6 +151,10 @@ static BROTLI_INLINE uint32_t BrotliReadBits(
BrotliBitReader* const br, int n_bits) {
BrotliFillBitWindow(br);
const uint32_t val = (uint32_t)(br->val_ >> br->bit_pos_) & kBitMask[n_bits];
#ifdef BROTLI_DECODE_DEBUG
printf("[BrotliReadBits] %010ld %2d val: %6x\n",
(br->pos_ << 3) + br->bit_pos_ - 64, n_bits, val);
#endif
br->bit_pos_ += n_bits;
return val;
}
114
dec/decode.c
@@ -38,20 +38,16 @@ extern "C" {
#endif

static const int kDefaultCodeLength = 8;
static const int kCodeLengthLiterals = 16;
static const int kCodeLengthRepeatCode = 16;
static const int kCodeLengthExtraBits[3] = { 2, 3, 7 };
static const int kCodeLengthRepeatOffsets[3] = { 3, 3, 11 };

static const int kNumLiteralCodes = 256;
static const int kNumInsertAndCopyCodes = 704;
static const int kNumBlockLengthCodes = 26;
static const int kLiteralContextBits = 6;
static const int kDistanceContextBits = 2;

#define CODE_LENGTH_CODES 19
#define CODE_LENGTH_CODES 18
static const uint8_t kCodeLengthCodeOrder[CODE_LENGTH_CODES] = {
1, 2, 3, 4, 0, 17, 18, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15
1, 2, 3, 4, 0, 17, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15,
};

#define NUM_DISTANCE_SHORT_CODES 16
@@ -71,11 +67,26 @@ static BROTLI_INLINE int DecodeWindowBits(BrotliBitReader* br) {
}
}

// Decodes a number in the range [0..255], by reading 1 - 11 bits.
static BROTLI_INLINE int DecodeVarLenUint8(BrotliBitReader* br) {
if (BrotliReadBits(br, 1)) {
int nbits = BrotliReadBits(br, 3);
if (nbits == 0) {
return 1;
} else {
return BrotliReadBits(br, nbits) + (1 << nbits);
}
}
return 0;
}
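
The 1 - 11 bit variable length code read by DecodeVarLenUint8 can be exercised end to end with a toy bit buffer. The sketch below is self-contained and does not use the Brotli bit reader or writer: `TinyBits`, `put_bits`, `get_bits` and `roundtrip_varlen_uint8` are hypothetical helpers, and the least-significant-bit-first packing is an assumption chosen to match how the decoder masks bits out of its window.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory bit buffer, filled and read LSB first. */
typedef struct {
  uint8_t bytes[8];
  int bit_pos;
} TinyBits;

static void put_bits(TinyBits* b, int n, uint32_t v) {
  int i;
  for (i = 0; i < n; ++i, ++b->bit_pos) {
    if ((v >> i) & 1u) {
      b->bytes[b->bit_pos >> 3] |= (uint8_t)(1u << (b->bit_pos & 7));
    }
  }
}

static uint32_t get_bits(TinyBits* b, int n) {
  uint32_t v = 0;
  int i;
  for (i = 0; i < n; ++i, ++b->bit_pos) {
    v |= (uint32_t)((b->bytes[b->bit_pos >> 3] >> (b->bit_pos & 7)) & 1u) << i;
  }
  return v;
}

/* floor(log2(n)) for n >= 1. */
static int log2_floor(int n) {
  int r = 0;
  while (n > 1) { n >>= 1; ++r; }
  return r;
}

/* 0 -> single 0 bit; otherwise a 1 bit, 3 bits of nbits = floor(log2(n)),
   then nbits bits holding n - (1 << nbits). */
static void encode_varlen_uint8(TinyBits* b, int n) {
  int nbits;
  if (n == 0) { put_bits(b, 1, 0); return; }
  nbits = log2_floor(n);
  put_bits(b, 1, 1);
  put_bits(b, 3, (uint32_t)nbits);
  if (nbits > 0) put_bits(b, nbits, (uint32_t)(n - (1 << nbits)));
}

static int decode_varlen_uint8(TinyBits* b) {
  if (get_bits(b, 1)) {
    int nbits = (int)get_bits(b, 3);
    if (nbits == 0) return 1;
    return (int)get_bits(b, nbits) + (1 << nbits);
  }
  return 0;
}

/* Encodes n (0..255), decodes it again, and reports the encoded size. */
static int roundtrip_varlen_uint8(int n, int* bits_used) {
  TinyBits b;
  memset(&b, 0, sizeof b);
  encode_varlen_uint8(&b, n);
  *bits_used = b.bit_pos;
  b.bit_pos = 0;
  return decode_varlen_uint8(&b);
}
```

The value 0 costs a single bit, 1 costs 4 bits, and values 128 - 255 cost the full 11 bits, so small counts such as "one block type" or "one Huffman tree" stay cheap.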

static void DecodeMetaBlockLength(BrotliBitReader* br,
size_t* meta_block_length,
int* input_end) {
int* input_end,
int* is_uncompressed) {
*input_end = BrotliReadBits(br, 1);
*meta_block_length = 0;
*is_uncompressed = 0;
if (*input_end && BrotliReadBits(br, 1)) {
return;
}
@@ -85,6 +96,9 @@ static void DecodeMetaBlockLength(BrotliBitReader* br,
*meta_block_length |= BrotliReadBits(br, 4) << (i * 4);
}
++(*meta_block_length);
if (!*input_end) {
*is_uncompressed = BrotliReadBits(br, 1);
}
}

// Decodes the next Huffman code from bit-stream.
@@ -130,6 +144,8 @@ static int ReadHuffmanCodeLengths(
int max_symbol;
int decode_number_of_code_length_codes;
int prev_code_len = kDefaultCodeLength;
int repeat = 0;
int repeat_length = 0;
HuffmanTree tree;

if (!BrotliHuffmanTreeBuildImplicit(&tree, code_length_code_lengths,
@@ -146,9 +162,11 @@ static int ReadHuffmanCodeLengths(
decode_number_of_code_length_codes = BrotliReadBits(br, 1);
BROTLI_LOG_UINT(decode_number_of_code_length_codes);
if (decode_number_of_code_length_codes) {
const int length_nbits = 2 + 2 * BrotliReadBits(br, 3);
max_symbol = 2 + BrotliReadBits(br, length_nbits);
BROTLI_LOG_UINT(length_nbits);
if (BrotliReadBits(br, 1)) {
max_symbol = 68 + BrotliReadBits(br, 7);
} else {
max_symbol = 4 + BrotliReadBits(br, 6);
}
if (max_symbol > num_symbols) {
printf("[ReadHuffmanCodeLengths] max_symbol > num_symbols (%d vs %d)\n",
max_symbol, num_symbols);
@@ -160,7 +178,7 @@ static int ReadHuffmanCodeLengths(
BROTLI_LOG_UINT(max_symbol);

symbol = 0;
while (symbol < num_symbols) {
while (symbol + repeat < num_symbols) {
int code_len;
if (max_symbol-- == 0) break;
if (!BrotliReadMoreInput(br)) {
@@ -169,30 +187,36 @@ static int ReadHuffmanCodeLengths(
}
code_len = ReadSymbol(&tree, br);
BROTLI_LOG_UINT(symbol);
BROTLI_LOG_UINT(repeat);
BROTLI_LOG_UINT(repeat_length);
BROTLI_LOG_UINT(code_len);
if (code_len < kCodeLengthLiterals) {
if ((code_len < kCodeLengthRepeatCode) ||
(code_len == kCodeLengthRepeatCode && repeat_length == 0) ||
(code_len > kCodeLengthRepeatCode && repeat_length > 0)) {
while (repeat > 0) {
code_lengths[symbol++] = repeat_length;
--repeat;
}
}
if (code_len < kCodeLengthRepeatCode) {
code_lengths[symbol++] = code_len;
if (code_len != 0) prev_code_len = code_len;
} else {
const int use_prev = (code_len == kCodeLengthRepeatCode);
const int slot = code_len - kCodeLengthLiterals;
const int extra_bits = kCodeLengthExtraBits[slot];
const int repeat_offset = kCodeLengthRepeatOffsets[slot];
const int length = use_prev ? prev_code_len : 0;
int repeat = BrotliReadBits(br, extra_bits) + repeat_offset;
BROTLI_LOG_UINT(repeat);
BROTLI_LOG_UINT(length);
if (symbol + repeat > num_symbols) {
printf("[ReadHuffmanCodeLengths] symbol + repeat > num_symbols "
"(%d + %d vs %d)\n", symbol, repeat, num_symbols);
goto End;
} else {
while (repeat-- > 0) {
code_lengths[symbol++] = length;
}
const int extra_bits = code_len - 14;
if (repeat > 0) {
repeat -= 2;
repeat <<= extra_bits;
}
repeat += BrotliReadBits(br, extra_bits) + 3;
repeat_length = (code_len == kCodeLengthRepeatCode ? prev_code_len : 0);
}
}
if (symbol + repeat > num_symbols) {
printf("[ReadHuffmanCodeLengths] symbol + repeat > num_symbols "
"(%d + %d vs %d)\n", symbol, repeat, num_symbols);
goto End;
}
while (repeat-- > 0) code_lengths[symbol++] = repeat_length;
while (symbol < num_symbols) code_lengths[symbol++] = 0;
ok = 1;
@@ -256,7 +280,7 @@ static int ReadHuffmanCode(int alphabet_size,
} else { // Decode Huffman-coded code lengths.
int i;
uint8_t code_length_code_lengths[CODE_LENGTH_CODES] = { 0 };
const int num_codes = BrotliReadBits(br, 4) + 4;
const int num_codes = BrotliReadBits(br, 4) + 3;
BROTLI_LOG_UINT(num_codes);
if (num_codes > CODE_LENGTH_CODES) {
return 0;
@@ -434,7 +458,7 @@ static int DecodeContextMap(int context_map_size,
printf("[DecodeContextMap] Unexpected end of input.\n");
return 0;
}
*num_htrees = BrotliReadBits(br, 8) + 1;
*num_htrees = DecodeVarLenUint8(br) + 1;

BROTLI_LOG_UINT(context_map_size);
BROTLI_LOG_UINT(*num_htrees);
@@ -569,7 +593,8 @@ int BrotliDecompressedSize(size_t encoded_size,
DecodeWindowBits(&br);
size_t meta_block_len;
int input_end;
DecodeMetaBlockLength(&br, &meta_block_len, &input_end);
int is_uncompressed;
DecodeMetaBlockLength(&br, &meta_block_len, &input_end, &is_uncompressed);
if (!input_end) {
return 0;
}
@@ -633,7 +658,8 @@ int BrotliDecompress(BrotliInput input, BrotliOutput output) {
while (!input_end && ok) {
size_t meta_block_len = 0;
size_t meta_block_end_pos;
uint32_t block_length[3] = { UINT32_MAX, UINT32_MAX, UINT32_MAX };
int is_uncompressed;
uint32_t block_length[3] = { 1 << 28, 1 << 28, 1 << 28 };
int block_type[3] = { 0 };
int num_block_types[3] = { 1, 1, 1 };
int block_type_rb[6] = { 0, 1, 0, 1, 0, 1 };
@@ -672,22 +698,30 @@ int BrotliDecompress(BrotliInput input, BrotliOutput output) {
goto End;
}
BROTLI_LOG_UINT(pos);
DecodeMetaBlockLength(&br, &meta_block_len, &input_end);
DecodeMetaBlockLength(&br, &meta_block_len, &input_end, &is_uncompressed);
BROTLI_LOG_UINT(meta_block_len);
if (meta_block_len == 0) {
goto End;
}
meta_block_end_pos = pos + meta_block_len;
if (is_uncompressed) {
BrotliSetBitPos(&br, (br.bit_pos_ + 7) & ~7);
for (; pos < meta_block_end_pos; ++pos) {
ringbuffer[pos & ringbuffer_mask] = BrotliReadBits(&br, 8);
if ((pos & ringbuffer_mask) == ringbuffer_mask) {
if (BrotliWrite(output, ringbuffer, ringbuffer_size) < 0) {
ok = 0;
goto End;
}
}
}
goto End;
}
for (i = 0; i < 3; ++i) {
block_type_trees[i].root_ = NULL;
block_len_trees[i].root_ = NULL;
if (BrotliReadBits(&br, 1)) {
int nbits = BrotliReadBits(&br, 3);
if (nbits == 0) {
num_block_types[i] = 2;
} else {
num_block_types[i] = BrotliReadBits(&br, nbits) + (1 << nbits) + 1;
}
num_block_types[i] = DecodeVarLenUint8(&br) + 1;
if (num_block_types[i] >= 2) {
if (!ReadHuffmanCode(
num_block_types[i] + 2, &block_type_trees[i], &br) ||
!ReadHuffmanCode(kNumBlockLengthCodes, &block_len_trees[i], &br)) {
@@ -25,7 +25,7 @@
namespace brotli {

static const int kHuffmanExtraBits[kCodeLengthCodes] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 7,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3,
};

static inline int HuffmanTreeBitCost(const int* counts, const uint8_t* depth) {
@@ -58,25 +58,29 @@ static inline int HuffmanBitCost(const uint8_t* depth, int length) {
}
i += reps;
if (value == 0) {
while (reps > 10) {
++histogram[18];
reps -= 138;
}
if (reps > 2) {
++histogram[17];
} else if (reps > 0) {
if (reps < 3) {
histogram[0] += reps;
} else {
reps -= 3;
while (reps >= 0) {
++histogram[17];
reps >>= 3;
--reps;
}
}
} else {
tail_start = i;
++histogram[value];
--reps;
while (reps > 2) {
++histogram[16];
reps -= 6;
}
if (reps > 0) {
if (reps < 3) {
histogram[value] += reps;
} else {
reps -= 3;
while (reps >= 0) {
++histogram[16];
reps >>= 2;
--reps;
}
}
}
}
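
The new counting loops in HuffmanBitCost (subtract 3, then repeatedly count, shift and decrement) compute how many consecutive repeat codes are needed for a run. A minimal sketch of that loop in isolation, with the hypothetical name `count_repeat_codes`, makes the thresholds easier to see; `extra_bits` is 2 for code 16 and 3 for code 17, and `reps` is assumed to be at least 3.

```c
/* Number of consecutive repeat codes needed to express a run of `reps`
   repetitions (reps >= 3), mirroring the loop
     reps -= 3; while (reps >= 0) { ++count; reps >>= extra_bits; --reps; }
   extra_bits is 2 for code 16 and 3 for code 17. */
static int count_repeat_codes(int reps, int extra_bits) {
  int count = 0;
  reps -= 3;
  while (reps >= 0) {
    ++count;
    reps >>= extra_bits;
    --reps;
  }
  return count;
}
```

This lines up with the decoder-side rule: a single code 17 covers runs of 3 - 10 zeros and two consecutive codes cover 11 - 74 (8 * (repeat - 2) plus 3 - 10), so a run of 75 zeros needs three codes. For a nonzero length, the first occurrence is coded as a literal, so a run of 22 equal lengths leaves 21 repetitions, which take two code 16 symbols.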
@@ -87,7 +91,6 @@ static inline int HuffmanBitCost(const uint8_t* depth, int length) {
// account for rle extra bits
cost[16] += 2;
cost[17] += 3;
cost[18] += 7;

int tree_size = 0;
int bits = 6 + 3 * max_depth; // huffman tree of huffman tree cost
@@ -95,27 +98,6 @@ static inline int HuffmanBitCost(const uint8_t* depth, int length) {
bits += histogram[i] * cost[i]; // huffman tree bit cost
tree_size += histogram[i];
}
// bit cost adjustment for long trailing zero sequence
int tail_size = length - tail_start;
int tail_bits = 0;
while (tail_size >= 1) {
if (tail_size < 3) {
tail_bits += tail_size * cost[0];
tree_size -= tail_size;
break;
} else if (tail_size < 11) {
tail_bits += cost[17];
--tree_size;
break;
} else {
tail_bits += cost[18];
tail_size -= 138;
--tree_size;
}
}
if (tail_bits > 12) {
bits += ((Log2Ceiling(tree_size - 1) + 1) & ~1) + 3 - tail_bits;
}
return bits;
}
@@ -282,9 +282,8 @@ void ClusterBlocks(const DataType* data, const size_t length,
}
std::vector<HistogramType> clustered_histograms;
std::vector<int> histogram_symbols;
// Block ids need to fit in one byte and there are two ids reserved for
// indicating 'same as last' and 'last plus one'.
static const int kMaxNumberOfBlockTypes = 254;
// Block ids need to fit in one byte.
static const int kMaxNumberOfBlockTypes = 256;
ClusterHistograms(histograms, 1, histograms.size(),
kMaxNumberOfBlockTypes,
&clustered_histograms,
@@ -30,7 +30,7 @@ namespace brotli {
struct BlockSplit {
int num_types_;
std::vector<uint8_t> types_;
std::vector<uint8_t> type_codes_;
std::vector<int> type_codes_;
std::vector<int> lengths_;
};
172
enc/encode.cc
@ -64,21 +64,32 @@ double TotalBitCost(const std::vector<Histogram<kSize> >& histograms) {
|
||||
return retval;
|
||||
}
|
||||
|
||||
- void EncodeSize(size_t len, int* storage_ix, uint8_t* storage) {
- std::vector<uint8_t> len_bytes;
- do {
- len_bytes.push_back(len & 0xff);
- len >>= 8;
- } while (len > 0);
- WriteBits(3, len_bytes.size(), storage_ix, storage);
- for (int i = 0; i < len_bytes.size(); ++i) {
- WriteBits(8, len_bytes[i], storage_ix, storage);
+ void EncodeVarLenUint8(int n, int* storage_ix, uint8_t* storage) {
+ if (n == 0) {
+ WriteBits(1, 0, storage_ix, storage);
+ } else {
+ WriteBits(1, 1, storage_ix, storage);
+ int nbits = Log2Floor(n);
+ WriteBits(3, nbits, storage_ix, storage);
+ if (nbits > 0) {
+ WriteBits(nbits, n - (1 << nbits), storage_ix, storage);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void EncodeMetaBlockLength(size_t meta_block_size,
|
||||
+ bool is_last,
+ bool is_uncompressed,
|
||||
int* storage_ix, uint8_t* storage) {
|
||||
- WriteBits(1, 0, storage_ix, storage);
+ WriteBits(1, is_last, storage_ix, storage);
+ if (is_last) {
+ if (meta_block_size == 0) {
+ WriteBits(1, 1, storage_ix, storage);
+ return;
+ }
+ WriteBits(1, 0, storage_ix, storage);
+ }
+ --meta_block_size;
|
||||
int num_bits = Log2Floor(meta_block_size) + 1;
|
||||
if (num_bits < 16) {
|
||||
num_bits = 16;
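The variable-length code introduced by EncodeVarLenUint8 in the hunk above can be checked with a small round trip. The following is a minimal sketch, not the encoder's actual code: BitSink, FloorLog2, PutVarLenUint8 and GetVarLenUint8 are hypothetical names, the LSB-first bit order is an assumption about WriteBits, and the reader is simply the inverse implied by the writer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for WriteBits: collects single bits, LSB first.
struct BitSink {
  std::vector<int> bits;
  void Write(int nbits, uint32_t value) {
    for (int i = 0; i < nbits; ++i) bits.push_back((value >> i) & 1);
  }
};

// Index of the highest set bit for n > 0 (assumed behaviour of Log2Floor).
inline int FloorLog2(uint32_t n) {
  int r = 0;
  while (n >>= 1) ++r;
  return r;
}

// Same layout as EncodeVarLenUint8: a single 0 bit for n == 0, otherwise
// a 1 bit, 3 bits holding nbits, then nbits bits holding n - 2^nbits.
inline void PutVarLenUint8(int n, BitSink* out) {
  assert(n >= 0 && n < 256);
  if (n == 0) {
    out->Write(1, 0);
    return;
  }
  out->Write(1, 1);
  const int nbits = FloorLog2(static_cast<uint32_t>(n));
  out->Write(3, static_cast<uint32_t>(nbits));
  if (nbits > 0) out->Write(nbits, static_cast<uint32_t>(n - (1 << nbits)));
}

// Assumed inverse of the writer above; advances *pos past the value.
inline int GetVarLenUint8(const std::vector<int>& bits, size_t* pos) {
  auto read = [&](int nbits) {
    uint32_t v = 0;
    for (int i = 0; i < nbits; ++i) {
      v |= static_cast<uint32_t>(bits[(*pos)++]) << i;
    }
    return v;
  };
  if (read(1) == 0) return 0;
  const int nbits = static_cast<int>(read(3));
  return (1 << nbits) + static_cast<int>(read(nbits));
}
```

Under these assumptions a value of 0 costs 1 bit and 255 costs 11 bits, so small counts (for example a single block type or a single cluster) are cheaper than the fixed 8-bit field they replace.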
|
||||
@ -89,6 +100,9 @@ void EncodeMetaBlockLength(size_t meta_block_size,
|
||||
meta_block_size >>= 4;
|
||||
num_bits -= 4;
|
||||
}
|
||||
+ if (!is_last) {
+ WriteBits(1, is_uncompressed, storage_ix, storage);
+ }
|
||||
}
|
||||
|
||||
template<int kSize>
|
||||
@ -104,16 +118,16 @@ void StoreHuffmanTreeOfHuffmanTreeToBitMask(
|
||||
const uint8_t* code_length_bitdepth,
|
||||
int* storage_ix, uint8_t* storage) {
|
||||
static const uint8_t kStorageOrder[kCodeLengthCodes] = {
|
||||
- 1, 2, 3, 4, 0, 17, 18, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15
+ 1, 2, 3, 4, 0, 17, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15,
|
||||
};
|
||||
// Throw away trailing zeros:
|
||||
int codes_to_store = kCodeLengthCodes;
|
||||
- for (; codes_to_store > 4; --codes_to_store) {
+ for (; codes_to_store > 3; --codes_to_store) {
|
||||
if (code_length_bitdepth[kStorageOrder[codes_to_store - 1]] != 0) {
|
||||
break;
|
||||
}
|
||||
}
|
||||
- WriteBits(4, codes_to_store - 4, storage_ix, storage);
+ WriteBits(4, codes_to_store - 3, storage_ix, storage);
|
||||
const int skip_two_first =
|
||||
code_length_bitdepth[kStorageOrder[0]] == 0 &&
|
||||
code_length_bitdepth[kStorageOrder[1]] == 0;
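With the alphabet reduced to 18 symbols, the trimmed count ranges from 3 to 18, so `codes_to_store - 3` still fits the 4-bit field. A minimal sketch of the trimming step from the hunk above follows; kNumCodeLengthCodes, kOrder and CodesToStore are illustrative names, and the table is copied from the new kStorageOrder line.

```cpp
#include <cassert>
#include <cstdint>

// Alphabet size and storage order as they appear on the new side of the diff.
static const int kNumCodeLengthCodes = 18;
static const uint8_t kOrder[kNumCodeLengthCodes] = {
    1, 2, 3, 4, 0, 17, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15};

// Drops trailing zero depths (in storage order) but always keeps at least 3
// entries, so the result minus 3 lies in 0..15.
inline int CodesToStore(const uint8_t* depth) {
  int n = kNumCodeLengthCodes;
  for (; n > 3; --n) {
    if (depth[kOrder[n - 1]] != 0) break;
  }
  return n;
}
```

For example, if only symbol 0 has a nonzero depth, five entries are kept, because symbol 0 sits at the fifth position of the storage order.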
|
||||
@ -144,9 +158,6 @@ void StoreHuffmanTreeToBitMask(
|
||||
case 17:
|
||||
WriteBits(3, extra_bits, storage_ix, storage);
|
||||
break;
|
||||
- case 18:
- WriteBits(7, extra_bits, storage_ix, storage);
- break;
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -225,16 +236,16 @@ void StoreHuffmanCode(const EntropyCode<kSize>& code, int alphabet_size,
|
||||
}
|
||||
int trimmed_size = trimmed_histogram.total_count_;
|
||||
bool write_length = false;
|
||||
- if (trimmed_size > 1 && trimmed_size < huffman_tree_size) {
+ if (trimmed_size >= 4 && trimmed_size <= 195 &&
+ trimmed_size < huffman_tree_size) {
|
||||
EntropyCode<kCodeLengthCodes> trimmed_entropy;
|
||||
BuildEntropyCode(trimmed_histogram, 5, kCodeLengthCodes, &trimmed_entropy);
|
||||
int huffman_bit_cost = HuffmanTreeBitCost(huffman_tree_histogram,
|
||||
huffman_tree_entropy);
|
||||
int trimmed_bit_cost = HuffmanTreeBitCost(trimmed_histogram,
|
||||
trimmed_entropy);;
|
||||
- const int nbits = Log2Ceiling(trimmed_size - 1);
- const int nbitpairs = (nbits == 0) ? 1 : (nbits + 1) / 2;
- if (trimmed_bit_cost + 3 + 2 * nbitpairs < huffman_bit_cost) {
+ trimmed_bit_cost += (trimmed_size < 68 ? 7 : 8);
+ if (trimmed_bit_cost < huffman_bit_cost) {
|
||||
write_length = true;
|
||||
huffman_tree_size = trimmed_size;
|
||||
huffman_tree_entropy = trimmed_entropy;
|
||||
@ -245,10 +256,12 @@ void StoreHuffmanCode(const EntropyCode<kSize>& code, int alphabet_size,
|
||||
&huffman_tree_entropy.depth_[0], storage_ix, storage);
|
||||
WriteBits(1, write_length, storage_ix, storage);
|
||||
if (write_length) {
|
||||
- const int nbits = Log2Ceiling(huffman_tree_size - 1);
- const int nbitpairs = (nbits == 0) ? 1 : (nbits + 1) / 2;
- WriteBits(3, nbitpairs - 1, storage_ix, storage);
- WriteBits(nbitpairs * 2, huffman_tree_size - 2, storage_ix, storage);
+ WriteBits(1, huffman_tree_size >= 68, storage_ix, storage);
+ if (huffman_tree_size < 68) {
+ WriteBits(6, huffman_tree_size - 4, storage_ix, storage);
+ } else {
+ WriteBits(7, huffman_tree_size - 68, storage_ix, storage);
+ }
|
||||
}
|
||||
StoreHuffmanTreeToBitMask(&huffman_tree[0], &huffman_tree_extra_bits[0],
|
||||
huffman_tree_size, huffman_tree_entropy,
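The new HLEN field in the two hunks above uses one selector bit followed by 6 or 7 payload bits, which is where the `trimmed_size < 68 ? 7 : 8` cost comes from. The sketch below packs and unpacks that field under the assumption that bits are emitted LSB-first; HlenCode, EncodeHlen and DecodeHlen are hypothetical helpers, not functions from the encoder.

```cpp
#include <cassert>
#include <cstdint>

// Bit count and value of the HLEN field, with the selector bit in bit 0.
struct HlenCode {
  int nbits;
  uint32_t value;
};

// Sizes 4..67 use selector 0 plus 6 bits; sizes 68..195 use selector 1
// plus 7 bits. Other sizes are not representable and are rejected.
inline HlenCode EncodeHlen(int size) {
  assert(size >= 4 && size <= 195);
  if (size < 68) {
    return {7, static_cast<uint32_t>(size - 4) << 1};
  }
  return {8, 1u | (static_cast<uint32_t>(size - 68) << 1)};
}

// Assumed inverse of EncodeHlen.
inline int DecodeHlen(HlenCode c) {
  const bool is_long = (c.value & 1u) != 0;
  const int payload = static_cast<int>(c.value >> 1);
  return is_long ? 68 + payload : 4 + payload;
}
```

The bounds 4 and 195 match the `trimmed_size >= 4 && trimmed_size <= 195` guard: 67 - 4 = 63 fills six bits and 195 - 68 = 127 fills seven.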
|
||||
@ -464,7 +477,7 @@ int BestMaxZeroRunLengthPrefix(const std::vector<int>& v) {
|
||||
void EncodeContextMap(const std::vector<int>& context_map,
|
||||
int num_clusters,
|
||||
int* storage_ix, uint8_t* storage) {
|
||||
- WriteBits(8, num_clusters - 1, storage_ix, storage);
+ EncodeVarLenUint8(num_clusters - 1, storage_ix, storage);
|
||||
|
||||
if (num_clusters == 1) {
|
||||
return;
|
||||
@ -476,11 +489,11 @@ void EncodeContextMap(const std::vector<int>& context_map,
|
||||
int max_run_length_prefix = BestMaxZeroRunLengthPrefix(transformed_symbols);
|
||||
RunLengthCodeZeros(transformed_symbols, &max_run_length_prefix,
|
||||
&rle_symbols, &extra_bits);
|
||||
- HistogramLiteral symbol_histogram;
+ HistogramContextMap symbol_histogram;
|
||||
for (int i = 0; i < rle_symbols.size(); ++i) {
|
||||
symbol_histogram.Add(rle_symbols[i]);
|
||||
}
|
||||
- EntropyCodeLiteral symbol_code;
+ EntropyCodeContextMap symbol_code;
|
||||
BuildEntropyCode(symbol_histogram, 15, num_clusters + max_run_length_prefix,
|
||||
&symbol_code);
|
||||
bool use_rle = max_run_length_prefix > 0;
|
||||
@ -510,7 +523,7 @@ void BuildEntropyCodes(const std::vector<Histogram<kSize> >& histograms,
|
||||
}
|
||||
|
||||
struct BlockSplitCode {
|
||||
- EntropyCodeLiteral block_type_code;
+ EntropyCodeBlockType block_type_code;
|
||||
EntropyCodeBlockLength block_len_code;
|
||||
};
|
||||
|
||||
@ -553,18 +566,12 @@ void ComputeBlockTypeShortCodes(BlockSplit* split) {
|
||||
void BuildAndEncodeBlockSplitCode(const BlockSplit& split,
|
||||
BlockSplitCode* code,
|
||||
int* storage_ix, uint8_t* storage) {
|
||||
- if (split.num_types_ <= 1) {
- WriteBits(1, 0, storage_ix, storage);
+ EncodeVarLenUint8(split.num_types_ - 1, storage_ix, storage);
+ if (split.num_types_ == 1) {
return;
}
- WriteBits(1, 1, storage_ix, storage);
- int nbits = Log2Floor(split.num_types_ - 1);
- WriteBits(3, nbits, storage_ix, storage);
- if (nbits > 0) {
- WriteBits(nbits, split.num_types_ - 1 - (1 << nbits), storage_ix, storage);
- }

- HistogramLiteral type_histo;
+ HistogramBlockType type_histo;
|
||||
for (int i = 0; i < split.type_codes_.size(); ++i) {
|
||||
type_histo.Add(split.type_codes_[i]);
|
||||
}
|
||||
@ -591,7 +598,7 @@ void MoveAndEncode(const BlockSplitCode& code,
|
||||
++it->idx_;
|
||||
it->type_ = it->split_.types_[it->idx_];
|
||||
it->length_ = it->split_.lengths_[it->idx_];
|
||||
- uint8_t type_code = it->split_.type_codes_[it->idx_];
+ int type_code = it->split_.type_codes_[it->idx_];
|
||||
EntropyEncode(type_code, code.block_type_code, storage_ix, storage);
|
||||
EncodeBlockLength(code.block_len_code, it->length_, storage_ix, storage);
|
||||
}
|
||||
@ -626,6 +633,9 @@ void BuildMetaBlock(const EncodingParams& params,
|
||||
MetaBlock* mb) {
|
||||
mb->cmds = cmds;
|
||||
mb->params = params;
|
||||
+ if (cmds.empty()) {
+ return;
+ }
|
||||
ComputeCommandPrefixes(&mb->cmds,
|
||||
mb->params.num_direct_distance_codes,
|
||||
mb->params.distance_postfix_bits);
|
||||
@ -661,9 +671,8 @@ void BuildMetaBlock(const EncodingParams& params,
|
||||
&mb->command_histograms,
|
||||
&distance_histograms);
|
||||
|
||||
- // Histogram ids need to fit in one byte and there are 16 ids reserved for
- // run length codes, which leaves a maximum number of 240 histograms.
- static const int kMaxNumberOfHistograms = 240;
+ // Histogram ids need to fit in one byte.
+ static const int kMaxNumberOfHistograms = 256;
|
||||
|
||||
mb->literal_histograms = literal_histograms;
|
||||
ClusterHistograms(literal_histograms,
|
||||
@ -692,14 +701,20 @@ size_t MetaBlockLength(const std::vector<Command>& cmds) {
|
||||
}
|
||||
|
||||
void StoreMetaBlock(const MetaBlock& mb,
|
||||
+ const bool is_last,
|
||||
const uint8_t* ringbuffer,
|
||||
const size_t mask,
|
||||
size_t* pos,
|
||||
int* storage_ix, uint8_t* storage) {
|
||||
size_t length = MetaBlockLength(mb.cmds);
|
||||
const size_t end_pos = *pos + length;
|
||||
- EncodeMetaBlockLength(length - 1,
+ EncodeMetaBlockLength(length,
+ is_last,
+ false,
storage_ix, storage);
+ if (length == 0) {
+ return;
+ }
|
||||
BlockSplitCode literal_split_code;
|
||||
BlockSplitCode command_split_code;
|
||||
BlockSplitCode distance_split_code;
|
||||
@ -798,42 +813,65 @@ void BrotliCompressor::WriteStreamHeader() {
|
||||
|
||||
void BrotliCompressor::WriteMetaBlock(const size_t input_size,
|
||||
const uint8_t* input_buffer,
|
||||
+ const bool is_last,
|
||||
size_t* encoded_size,
|
||||
uint8_t* encoded_buffer) {
|
||||
- ringbuffer_.Write(input_buffer, input_size);
- EstimateBitCostsForLiterals(input_pos_, input_size,
- kRingBufferMask, ringbuffer_.start(),
- &literal_cost_[0]);
std::vector<Command> commands;
- CreateBackwardReferences(input_size, input_pos_,
- ringbuffer_.start(),
- &literal_cost_[0],
- kRingBufferMask, kMaxBackwardDistance,
- hasher_,
- &commands);
- ComputeDistanceShortCodes(&commands, dist_ringbuffer_,
- &dist_ringbuffer_idx_);
+ if (input_size > 0) {
+ ringbuffer_.Write(input_buffer, input_size);
+ EstimateBitCostsForLiterals(input_pos_, input_size,
+ kRingBufferMask, ringbuffer_.start(),
+ &literal_cost_[0]);
+ CreateBackwardReferences(input_size, input_pos_,
+ ringbuffer_.start(),
+ &literal_cost_[0],
+ kRingBufferMask, kMaxBackwardDistance,
+ hasher_,
+ &commands);
+ ComputeDistanceShortCodes(&commands, dist_ringbuffer_,
+ &dist_ringbuffer_idx_);
+ }
|
||||
EncodingParams params;
|
||||
params.num_direct_distance_codes = 12;
|
||||
params.distance_postfix_bits = 1;
|
||||
params.literal_context_mode = CONTEXT_SIGNED;
|
||||
+ const int storage_ix0 = storage_ix_;
|
||||
MetaBlock mb;
|
||||
BuildMetaBlock(params, commands, ringbuffer_.start(), input_pos_,
|
||||
kRingBufferMask, &mb);
|
||||
- StoreMetaBlock(mb, ringbuffer_.start(), kRingBufferMask,
+ StoreMetaBlock(mb, is_last, ringbuffer_.start(), kRingBufferMask,
|
||||
&input_pos_, &storage_ix_, storage_);
|
||||
- size_t output_size = storage_ix_ >> 3;
- memcpy(encoded_buffer, storage_, output_size);
- *encoded_size = output_size;
- storage_ix_ -= output_size << 3;
- storage_[storage_ix_ >> 3] = storage_[output_size];
+ size_t output_size = is_last ? ((storage_ix_ + 7) >> 3) : (storage_ix_ >> 3);
+ if (input_size + 4 < output_size) {
+ storage_ix_ = storage_ix0;
+ storage_[storage_ix_ >> 3] &= (1 << (storage_ix_ & 7)) - 1;
+ EncodeMetaBlockLength(input_size, false, true, &storage_ix_, storage_);
+ size_t hdr_size = (storage_ix_ + 7) >> 3;
+ memcpy(encoded_buffer, storage_, hdr_size);
+ memcpy(encoded_buffer + hdr_size, input_buffer, input_size);
+ *encoded_size = hdr_size + input_size;
+ if (is_last) {
+ encoded_buffer[*encoded_size] = 0x3; // ISLAST, ISEMPTY
+ ++(*encoded_size);
+ }
+ storage_ix_ = 0;
+ storage_[0] = 0;
+ } else {
+ memcpy(encoded_buffer, storage_, output_size);
+ *encoded_size = output_size;
+ if (is_last) {
+ storage_ix_ = 0;
+ storage_[0] = 0;
+ } else {
+ storage_ix_ -= output_size << 3;
+ storage_[storage_ix_ >> 3] = storage_[output_size];
+ }
+ }
|
||||
}
|
||||
|
||||
void BrotliCompressor::FinishStream(
|
||||
size_t* encoded_size, uint8_t* encoded_buffer) {
|
||||
- WriteBits(2, 0x3, &storage_ix_, storage_);
- *encoded_size = (storage_ix_ + 7) >> 3;
- memcpy(encoded_buffer, storage_, *encoded_size);
+ WriteMetaBlock(0, NULL, true, encoded_size, encoded_buffer);
|
||||
}
|
||||
|
||||
|
||||
@ -857,21 +895,19 @@ int BrotliCompressBuffer(size_t input_size,
|
||||
|
||||
while (input_buffer < input_end) {
|
||||
int block_size = max_block_size;
|
||||
+ bool is_last = false;
if (block_size >= input_end - input_buffer) {
block_size = input_end - input_buffer;
+ is_last = true;
}
size_t output_size = max_output_size;
- compressor.WriteMetaBlock(block_size, input_buffer,
+ compressor.WriteMetaBlock(block_size, input_buffer, is_last,
|
||||
&output_size, &encoded_buffer[*encoded_size]);
|
||||
input_buffer += block_size;
|
||||
*encoded_size += output_size;
|
||||
max_output_size -= output_size;
|
||||
}
|
||||
|
||||
- size_t output_size = max_output_size;
- compressor.FinishStream(&output_size, &encoded_buffer[*encoded_size]);
- *encoded_size += output_size;
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
|
@ -39,6 +39,7 @@ class BrotliCompressor {
|
||||
// written.
|
||||
void WriteMetaBlock(const size_t input_size,
|
||||
const uint8_t* input_buffer,
|
||||
+ const bool is_last,
|
||||
size_t* encoded_size,
|
||||
uint8_t* encoded_buffer);
|
||||
|
||||
|
@ -157,6 +157,17 @@ void CreateHuffmanTree(const int *data,
|
||||
}
|
||||
}
|
||||
|
||||
+ void Reverse(uint8_t* v, int start, int end) {
+ --end;
+ while (start < end) {
+ int tmp = v[start];
+ v[start] = v[end];
+ v[end] = tmp;
+ ++start;
+ --end;
+ }
+ }
|
||||
|
||||
void WriteHuffmanTreeRepetitions(
|
||||
const int previous_value,
|
||||
const int value,
|
||||
@ -170,26 +181,24 @@ void WriteHuffmanTreeRepetitions(
|
||||
++(*tree_size);
|
||||
--repetitions;
|
||||
}
|
||||
- while (repetitions >= 1) {
- if (repetitions < 3) {
- for (int i = 0; i < repetitions; ++i) {
- tree[*tree_size] = value;
- extra_bits[*tree_size] = 0;
- ++(*tree_size);
- }
- return;
- } else if (repetitions < 7) {
- // 3 to 6 left.
- tree[*tree_size] = 16;
- extra_bits[*tree_size] = repetitions - 3;
+ if (repetitions < 3) {
+ for (int i = 0; i < repetitions; ++i) {
+ tree[*tree_size] = value;
+ extra_bits[*tree_size] = 0;
|
||||
++(*tree_size);
|
||||
- return;
- } else {
- tree[*tree_size] = 16;
- extra_bits[*tree_size] = 3;
- ++(*tree_size);
- repetitions -= 6;
}
+ } else {
+ repetitions -= 3;
+ int start = *tree_size;
+ while (repetitions >= 0) {
+ tree[*tree_size] = 16;
+ extra_bits[*tree_size] = repetitions & 0x3;
+ ++(*tree_size);
+ repetitions >>= 2;
+ --repetitions;
}
+ Reverse(tree, start, *tree_size);
+ Reverse(extra_bits, start, *tree_size);
|
||||
}
|
||||
}
|
||||
|
||||
@ -198,30 +207,24 @@ void WriteHuffmanTreeRepetitionsZeros(
|
||||
uint8_t* tree,
|
||||
uint8_t* extra_bits,
|
||||
int* tree_size) {
|
||||
- while (repetitions >= 1) {
- if (repetitions < 3) {
- for (int i = 0; i < repetitions; ++i) {
- tree[*tree_size] = 0;
- extra_bits[*tree_size] = 0;
- ++(*tree_size);
- }
- return;
- } else if (repetitions < 11) {
- tree[*tree_size] = 17;
- extra_bits[*tree_size] = repetitions - 3;
+ if (repetitions < 3) {
+ for (int i = 0; i < repetitions; ++i) {
+ tree[*tree_size] = 0;
+ extra_bits[*tree_size] = 0;
|
||||
++(*tree_size);
|
||||
- return;
- } else if (repetitions < 139) {
- tree[*tree_size] = 18;
- extra_bits[*tree_size] = repetitions - 11;
- ++(*tree_size);
- return;
- } else {
- tree[*tree_size] = 18;
- extra_bits[*tree_size] = 0x7f; // 138 repeated 0s
- ++(*tree_size);
- repetitions -= 138;
}
+ } else {
+ repetitions -= 3;
+ int start = *tree_size;
+ while (repetitions >= 0) {
+ tree[*tree_size] = 17;
+ extra_bits[*tree_size] = repetitions & 0x7;
+ ++(*tree_size);
+ repetitions >>= 3;
+ --repetitions;
}
+ Reverse(tree, start, *tree_size);
+ Reverse(extra_bits, start, *tree_size);
|
||||
}
|
||||
}
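The two rewritten functions above replace the fixed-range repeat codes with a chain of repeat symbols (16 with 2 extra bits, 17 with 3 extra bits) whose extra-bit values act like digits of the run length. The sketch below reproduces the splitting loop and pairs it with a decoder obtained by inverting that loop; EncodeRun and DecodeRun are hypothetical names, and the decoder rule is an assumption derived from the encoder, not code taken from this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits a run of `repetitions` (>= 3) into the extra-bit values of
// consecutive repeat symbols, most significant first. extra_bits is 2 for
// symbol 16 and 3 for symbol 17, as in the functions above.
inline std::vector<uint8_t> EncodeRun(int repetitions, int extra_bits) {
  assert(repetitions >= 3);
  const int mask = (1 << extra_bits) - 1;
  std::vector<uint8_t> digits;
  repetitions -= 3;
  while (repetitions >= 0) {
    digits.push_back(static_cast<uint8_t>(repetitions & mask));
    repetitions >>= extra_bits;
    --repetitions;
  }
  // The encoder emits the symbols in reverse order of generation.
  return std::vector<uint8_t>(digits.rbegin(), digits.rend());
}

// Assumed inverse: each further repeat symbol rescales the running count
// as (count - 2) << extra_bits before adding its own value plus 3.
inline int DecodeRun(const std::vector<uint8_t>& digits, int extra_bits) {
  int count = 0;
  for (uint8_t d : digits) {
    if (count > 0) count = (count - 2) << extra_bits;
    count += d + 3;
  }
  return count;
}
```

With 2 extra bits, a run of 6 needs one symbol and a run of 7 needs two, so long runs grow logarithmically in symbol count rather than needing a separate long-run symbol such as the removed code 18.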
|
||||
|
||||
|
@ -98,7 +98,7 @@ void BuildEntropyCode(const Histogram<kSize>& histogram,
|
||||
ConvertBitDepthsToSymbols(&code->depth_[0], alphabet_size, &code->bits_[0]);
|
||||
}
|
||||
|
||||
- static const int kCodeLengthCodes = 19;
+ static const int kCodeLengthCodes = 18;
|
||||
|
||||
// Literal entropy code.
|
||||
typedef EntropyCode<256> EntropyCodeLiteral;
|
||||
@ -106,6 +106,10 @@ typedef EntropyCode<256> EntropyCodeLiteral;
|
||||
typedef EntropyCode<kNumCommandPrefixes> EntropyCodeCommand;
|
||||
typedef EntropyCode<kNumDistancePrefixes> EntropyCodeDistance;
|
||||
typedef EntropyCode<kNumBlockLenPrefixes> EntropyCodeBlockLength;
|
||||
+ // Context map entropy code, 256 Huffman tree indexes + 16 run length codes.
+ typedef EntropyCode<272> EntropyCodeContextMap;
+ // Block type entropy code, 256 block types + 2 special symbols.
+ typedef EntropyCode<258> EntropyCodeBlockType;
|
||||
|
||||
} // namespace brotli
|
||||
|
||||
|
@ -78,6 +78,10 @@ typedef Histogram<256> HistogramLiteral;
|
||||
typedef Histogram<kNumCommandPrefixes> HistogramCommand;
|
||||
typedef Histogram<kNumDistancePrefixes> HistogramDistance;
|
||||
typedef Histogram<kNumBlockLenPrefixes> HistogramBlockLength;
|
||||
+ // Context map histogram, 256 Huffman tree indexes + 16 run length codes.
+ typedef Histogram<272> HistogramContextMap;
+ // Block type histogram, 256 block types + 2 special symbols.
+ typedef Histogram<258> HistogramBlockType;
|
||||
|
||||
static const int kLiteralContextBits = 6;
|
||||
static const int kDistanceContextBits = 2;
|
||||
|