mirror of
https://github.com/google/brotli.git
synced 2024-11-24 20:40:13 +00:00
Address review comments in the specification.
This commit updates the draft to the IETF -09 version:
https://www.ietf.org/id/draft-alakuijala-brotli-09.txt
This version addresses review comments from Jean-loup Gailly and the IETF secdir review.
parent 26cf47f3f0
commit e96d5b29e7
@ -7,9 +7,9 @@
|
|||||||
.ds LF Alakuijala & Szabadka
|
.ds LF Alakuijala & Szabadka
|
||||||
.ds RF FORMFEED[Page %]
|
.ds RF FORMFEED[Page %]
|
||||||
.ds LH Internet-Draft
|
.ds LH Internet-Draft
|
||||||
.ds RH December 2015
|
.ds RH April 2016
|
||||||
.ds CH Brotli
|
.ds CH Brotli
|
||||||
.ds CF Expires June 10, 2016
|
.ds CF Expires October 19, 2016
|
||||||
.hy 0
|
.hy 0
|
||||||
.nh
|
.nh
|
||||||
.ad l
|
.ad l
|
||||||
@ -18,13 +18,13 @@
|
|||||||
.tl 'Network Working Group''J. Alakuijala'
|
.tl 'Network Working Group''J. Alakuijala'
|
||||||
.tl 'Internet-Draft''Z. Szabadka'
|
.tl 'Internet-Draft''Z. Szabadka'
|
||||||
.tl 'Intended Status: Informational''Google, Inc'
|
.tl 'Intended Status: Informational''Google, Inc'
|
||||||
.tl 'Expires: June 10, 2016''December 2015'
|
.tl 'Expires: October 19, 2016''April 2016'
|
||||||
.fi
|
.fi
|
||||||
|
|
||||||
|
|
||||||
.ce 2
|
.ce 2
|
||||||
Brotli Compressed Data Format
|
Brotli Compressed Data Format
|
||||||
draft-alakuijala-brotli-08
|
draft-alakuijala-brotli-09
|
||||||
.fi
|
.fi
|
||||||
.in 3
|
.in 3
|
||||||
|
|
||||||
@ -52,12 +52,12 @@ and may be updated, replaced, or obsoleted by other documents at any
|
|||||||
time. It is inappropriate to use Internet-Drafts as reference
|
time. It is inappropriate to use Internet-Drafts as reference
|
||||||
material or to cite them other than as "work in progress."
|
material or to cite them other than as "work in progress."
|
||||||
|
|
||||||
This Internet-Draft will expire on June 10, 2016.
|
This Internet-Draft will expire on October 19, 2016.
|
||||||
|
|
||||||
.ti 0
|
.ti 0
|
||||||
Copyright Notice
|
Copyright Notice
|
||||||
|
|
||||||
Copyright (c) 2015 IETF Trust and the persons identified as the document
|
Copyright (c) 2016 IETF Trust and the persons identified as the document
|
||||||
authors. All rights reserved.
|
authors. All rights reserved.
|
||||||
|
|
||||||
This document is subject to BCP 78 and the IETF Trust's Legal
|
This document is subject to BCP 78 and the IETF Trust's Legal
|
||||||
@ -220,9 +220,27 @@ a sequence of bytes, starting with the first byte at the
|
|||||||
*right* margin and proceeding to the *left*, with the
|
*right* margin and proceeding to the *left*, with the
|
||||||
most-significant bit of each byte on the left as usual, one would
|
most-significant bit of each byte on the left as usual, one would
|
||||||
be able to parse the result from right to left, with fixed-width
|
be able to parse the result from right to left, with fixed-width
|
||||||
elements in the correct MSB-to-LSB order and prefix codes in
|
elements in the correct msb-to-lsb order and prefix codes in
|
||||||
bit-reversed order (i.e., with the first bit of the code in the
|
bit-reversed order (i.e., with the first bit of the code in the
|
||||||
relative LSB position).
|
relative lsb position).
|
||||||
|
|
||||||
|
As an example, consider packing the following data elements into
|
||||||
|
a sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2,
|
||||||
|
prefix code 110, prefix code 10, 12-bit integer value 3628.
|
||||||
|
|
||||||
|
.nf
|
||||||
|
   byte 2   byte 1   byte 0
|
||||||
|
+--------+--------+--------+
|
||||||
|
|11100010|11000101|10010110|
|
||||||
|
+--------+--------+--------+
|
||||||
|
 ^            ^ ^   ^   ^
|
||||||
|
 |            | |   |   |
|
||||||
|
 |            | |   |   +------ integer value 6
|
||||||
|
 |            | |   +---------- integer value 2
|
||||||
|
 |            | +-------------- prefix code 110
|
||||||
|
 |            +---------------- prefix code 10
|
||||||
|
 +----------------------------- integer value 3628
|
||||||
|
.fi
|
||||||
|
|
||||||
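The packing example added in this hunk can be checked with a short C++ sketch. The class name BitPacker and its methods are hypothetical helpers introduced only for illustration; they are not part of the specification or of the brotli library. The sketch assumes only what the text states: elements are packed starting at the least-significant bit of each byte, fixed-width integers go in lsb first, and prefix codes go in with their first bit in the relative lsb position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of the specification): packs data elements
// into bytes, filling each byte from its least-significant bit upwards.
class BitPacker {
 public:
  // Fixed-width integers are written least-significant bit first.
  void WriteInteger(uint32_t value, int nbits) {
    assert(nbits >= 0 && nbits <= 32);
    for (int i = 0; i < nbits; ++i) {
      WriteBit((value >> i) & 1u);
    }
  }

  // Prefix codes are written in code order, first bit of the code first.
  // The code is given as a string of '0' and '1' characters, e.g. "110".
  void WritePrefixCode(const std::string& code) {
    for (char ch : code) {
      assert(ch == '0' || ch == '1');
      WriteBit(ch == '1' ? 1u : 0u);
    }
  }

  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  void WriteBit(uint32_t bit) {
    if (bit_pos_ % 8 == 0) bytes_.push_back(0);  // start a new byte
    bytes_.back() |= static_cast<uint8_t>(bit << (bit_pos_ % 8));
    ++bit_pos_;
  }

  std::vector<uint8_t> bytes_;
  std::size_t bit_pos_ = 0;
};

// Packs the five data elements of the example, in the order listed there.
std::vector<uint8_t> PackExample() {
  BitPacker p;
  p.WriteInteger(6, 3);      // 3-bit integer value 6
  p.WriteInteger(2, 4);      // 4-bit integer value 2
  p.WritePrefixCode("110");  // prefix code 110
  p.WritePrefixCode("10");   // prefix code 10
  p.WriteInteger(3628, 12);  // 12-bit integer value 3628
  return p.bytes();
}
```

PackExample() returns the three bytes 0x96, 0xC5, 0xE2, which are byte 0 (10010110), byte 1 (11000101) and byte 2 (11100010) of the diagram.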
.ti 0
|
.ti 0
|
||||||
2. Compressed representation overview
|
2. Compressed representation overview
|
||||||
@ -693,26 +711,26 @@ are compressed using a prefix code. The alphabet for code lengths
|
|||||||
is as follows:
|
is as follows:
|
||||||
|
|
||||||
.nf
|
.nf
|
||||||
0 - 15: Represent code lengths of 0 - 15
|
0..15: Represent code lengths of 0..15
|
||||||
16: Copy the previous non-zero code length 3 - 6 times
|
16: Copy the previous non-zero code length 3..6 times
|
||||||
The next 2 bits indicate repeat length
|
The next 2 bits indicate repeat length
|
||||||
(0 = 3, ... , 3 = 6)
|
(0 = 3, ... , 3 = 6)
|
||||||
If this is the first code length, or all previous
|
If this is the first code length, or all previous
|
||||||
code lengths are zero, a code length of 8 is
|
code lengths are zero, a code length of 8 is
|
||||||
repeated 3 - 6 times
|
repeated 3..6 times
|
||||||
A repeated code length code of 16 modifies the
|
A repeated code length code of 16 modifies the
|
||||||
repeat count of the previous one as follows:
|
repeat count of the previous one as follows:
|
||||||
repeat count = (4 * (repeat count - 2)) +
|
repeat count = (4 * (repeat count - 2)) +
|
||||||
(3 - 6 on the next 2 bits)
|
(3..6 on the next 2 bits)
|
||||||
Example: Codes 7, 16 (+2 bits 11), 16 (+2 bits 10)
|
Example: Codes 7, 16 (+2 bits 11), 16 (+2 bits 10)
|
||||||
will expand to 22 code lengths of 7
|
will expand to 22 code lengths of 7
|
||||||
(1 + 4 * (6 - 2) + 5)
|
(1 + 4 * (6 - 2) + 5)
|
||||||
17: Repeat a code length of 0 for 3 - 10 times.
|
17: Repeat a code length of 0 for 3..10 times.
|
||||||
(3 bits of length)
|
(3 bits of length)
|
||||||
A repeated code length code of 17 modifies the
|
A repeated code length code of 17 modifies the
|
||||||
repeat count of the previous one as follows:
|
repeat count of the previous one as follows:
|
||||||
repeat count = (8 * (repeat count - 2)) +
|
repeat count = (8 * (repeat count - 2)) +
|
||||||
(3 - 10 on the next 3 bits)
|
(3..10 on the next 3 bits)
|
||||||
.fi
|
.fi
|
||||||
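The repeat-count rules for codes 16 and 17 can be traced with the following C++ sketch. The struct and function names are hypothetical, and the sketch takes already-decoded symbols with their extra-bits values as input rather than reading a bit stream. It assumes that a run is continued only by an immediately following code of the same kind, as the text describes, and it does not check the result against the alphabet size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input record: a code length symbol plus its extra-bits value.
struct CodeLengthSymbol {
  int code;   // 0..17
  int extra;  // value of the 2 extra bits (code 16) or 3 extra bits (code 17)
};

// Expands code length symbols into the list of code lengths they describe.
std::vector<int> ExpandCodeLengths(
    const std::vector<CodeLengthSymbol>& symbols) {
  std::vector<int> lengths;
  int prev_nonzero = 8;  // length repeated by 16 if no non-zero length precedes
  int repeat = 0;        // repeat count accumulated by the current run
  int run_code = 0;      // 16 or 17 while inside a run, 0 otherwise
  for (const CodeLengthSymbol& s : symbols) {
    assert(s.code >= 0 && s.code <= 17);
    if (s.code < 16) {
      lengths.push_back(s.code);
      if (s.code != 0) prev_nonzero = s.code;
      repeat = 0;
      run_code = 0;
      continue;
    }
    const int extra_bits = (s.code == 16) ? 2 : 3;
    const int value = (s.code == 16) ? prev_nonzero : 0;
    assert(s.extra >= 0 && s.extra < (1 << extra_bits));
    if (s.code != run_code) {  // a new run starts
      repeat = 0;
      run_code = s.code;
    }
    const int old_repeat = repeat;
    // A repeated 16 (or 17) rescales the previous count by 4 (or 8).
    if (repeat > 0) repeat = (repeat - 2) << extra_bits;
    repeat += s.extra + 3;
    // Only the difference to the previous count is newly emitted.
    lengths.insert(lengths.end(),
                   static_cast<std::size_t>(repeat - old_repeat), value);
  }
  return lengths;
}
```

For the example in the table, the symbols 7, 16 (extra value 3) and 16 (extra value 2) give 1 + 6 + 15 = 22 code lengths of 7.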
|
|
||||||
Note that a code of 16 that follows an immediately preceding 16 modifies the
|
Note that a code of 16 that follows an immediately preceding 16 modifies the
|
||||||
@ -763,7 +781,7 @@ We can now define the format of the complex prefix code as follows:
|
|||||||
is for symbol 4.
|
is for symbol 4.
|
||||||
|
|
||||||
The code lengths of code length symbols are between 0 and
|
The code lengths of code length symbols are between 0 and
|
||||||
5, and they are represented with 2 - 4 bits according to
|
5, and they are represented with 2..4 bits according to
|
||||||
the variable length code above. A code length of 0 means
|
the variable length code above. A code length of 0 means
|
||||||
the corresponding code length symbol is not used.
|
the corresponding code length symbol is not used.
|
||||||
|
|
||||||
@ -816,7 +834,7 @@ represented with a pair <distance code, extra bits>. The distance
|
|||||||
code and the extra bits are encoded back-to-back, the distance code
|
code and the extra bits are encoded back-to-back, the distance code
|
||||||
is encoded using a prefix code over the distance alphabet,
|
is encoded using a prefix code over the distance alphabet,
|
||||||
while the extra bits value is encoded as a fixed-width integer
|
while the extra bits value is encoded as a fixed-width integer
|
||||||
value. The number of extra bits can be 0 - 24, and it is dependent
|
value. The number of extra bits can be 0..24, and it is dependent
|
||||||
on the distance code.
|
on the distance code.
|
||||||
|
|
||||||
To convert a distance code and associated extra bits to a backward
|
To convert a distance code and associated extra bits to a backward
|
||||||
@ -913,7 +931,7 @@ extra bits are encoded back-to-back, the insert-and-copy length code
|
|||||||
is encoded using a prefix code over the insert-and-copy length code
|
is encoded using a prefix code over the insert-and-copy length code
|
||||||
alphabet, while the extra bits values are encoded as fixed-width
|
alphabet, while the extra bits values are encoded as fixed-width
|
||||||
integer values. The number of insert and copy extra bits can be
|
integer values. The number of insert and copy extra bits can be
|
||||||
0 - 24, and they are dependent on the insert-and-copy length code.
|
0..24, and they are dependent on the insert-and-copy length code.
|
||||||
|
|
||||||
Some of the insert-and-copy length codes also express the fact that
|
Some of the insert-and-copy length codes also express the fact that
|
||||||
the distance symbol of the distance in the same command is 0, i.e. the
|
the distance symbol of the distance in the same command is 0, i.e. the
|
||||||
@ -929,17 +947,17 @@ are as follows:
|
|||||||
|
|
||||||
.nf
|
.nf
|
||||||
.KS
|
.KS
|
||||||
Extra Extra Extra
|
Extra Extra Extra
|
||||||
Code Bits Lengths Code Bits Lengths Code Bits Lengths
|
Code Bits Lengths Code Bits Lengths Code Bits Lengths
|
||||||
---- ---- ------ ---- ---- ------- ---- ---- -------
|
---- ---- ------- ---- ---- ------- ---- ---- -------
|
||||||
0 0 0 8 2 10-13 16 6 130-193
|
0 0 0 8 2 10..13 16 6 130..193
|
||||||
1 0 1 9 2 14-17 17 7 194-321
|
1 0 1 9 2 14..17 17 7 194..321
|
||||||
2 0 2 10 3 18-25 18 8 322-577
|
2 0 2 10 3 18..25 18 8 322..577
|
||||||
3 0 3 11 3 26-33 19 9 578-1089
|
3 0 3 11 3 26..33 19 9 578..1089
|
||||||
4 0 4 12 4 34-49 20 10 1090-2113
|
4 0 4 12 4 34..49 20 10 1090..2113
|
||||||
5 0 5 13 4 50-65 21 12 2114-6209
|
5 0 5 13 4 50..65 21 12 2114..6209
|
||||||
6 1 6,7 14 5 66-97 22 14 6210-22593
|
6 1 6,7 14 5 66..97 22 14 6210..22593
|
||||||
7 1 8,9 15 5 98-129 23 24 22594-16799809
|
7 1 8,9 15 5 98..129 23 24 22594..16799809
|
||||||
.KE
|
.KE
|
||||||
.fi
|
.fi
|
||||||
|
|
||||||
@ -948,17 +966,17 @@ of copy extra bits, and the range of copy lengths are as follows:
|
|||||||
|
|
||||||
.nf
|
.nf
|
||||||
.KS
|
.KS
|
||||||
Extra Extra Extra
|
Extra Extra Extra
|
||||||
Code Bits Lengths Code Bits Lengths Code Bits Lengths
|
Code Bits Lengths Code Bits Lengths Code Bits Lengths
|
||||||
---- ---- ------ ---- ---- ------- ---- ---- -------
|
---- ---- ------- ---- ---- ------- ---- ---- -------
|
||||||
0 0 2 8 1 10,11 16 5 70-101
|
0 0 2 8 1 10,11 16 5 70..101
|
||||||
1 0 3 9 1 12,13 17 5 102-133
|
1 0 3 9 1 12,13 17 5 102..133
|
||||||
2 0 4 10 2 14-17 18 6 134-197
|
2 0 4 10 2 14..17 18 6 134..197
|
||||||
3 0 5 11 2 18-21 19 7 198-325
|
3 0 5 11 2 18..21 19 7 198..325
|
||||||
4 0 6 12 3 22-29 20 8 326-581
|
4 0 6 12 3 22..29 20 8 326..581
|
||||||
5 0 7 13 3 30-37 21 9 582-1093
|
5 0 7 13 3 30..37 21 9 582..1093
|
||||||
6 0 8 14 4 38-53 22 10 1094-2117
|
6 0 8 14 4 38..53 22 10 1094..2117
|
||||||
7 0 9 15 4 54-69 23 24 2118-16779333
|
7 0 9 15 4 54..69 23 24 2118..16779333
|
||||||
.KE
|
.KE
|
||||||
.fi
|
.fi
|
||||||
|
|
||||||
@ -969,34 +987,34 @@ and a copy length code, the following table can be used:
|
|||||||
.KS
|
.KS
|
||||||
Insert
|
Insert
|
||||||
length Copy length code
|
length Copy length code
|
||||||
code 0-7 8-15 16-23
|
code 0..7 8..15 16..23
|
||||||
+---------+---------+
|
+----------+----------+
|
||||||
| | |
|
| | |
|
||||||
0-7 | 0-63 | 64-127 | <--- distance symbol 0
|
0..7 | 0..63 | 64..127 | <--- distance symbol 0
|
||||||
| | |
|
| | |
|
||||||
+---------+---------+---------+
|
+----------+----------+----------+
|
||||||
| | | |
|
| | | |
|
||||||
0-7 | 128-191 | 192-255 | 384-447 |
|
0..7 | 128..191 | 192..255 | 384..447 |
|
||||||
| | | |
|
| | | |
|
||||||
+---------+---------+---------+
|
+----------+----------+----------+
|
||||||
| | | |
|
| | | |
|
||||||
8-15 | 256-319 | 320-383 | 512-575 |
|
8..15 | 256..319 | 320..383 | 512..575 |
|
||||||
| | | |
|
| | | |
|
||||||
+---------+---------+---------+
|
+----------+----------+----------+
|
||||||
| | | |
|
| | | |
|
||||||
16-23 | 448-511 | 576-639 | 640-703 |
|
16..23 | 448..511 | 576..639 | 640..703 |
|
||||||
| | | |
|
| | | |
|
||||||
+---------+---------+---------+
|
+----------+----------+----------+
|
||||||
.KE
|
.KE
|
||||||
.fi
|
.fi
|
||||||
|
|
||||||
First, look up the cell with the 64 value range containing the
|
First, look up the cell with the 64 value range containing the
|
||||||
insert-and-copy length code, this gives the insert length code and
|
insert-and-copy length code, this gives the insert length code and
|
||||||
the copy length code ranges, both 8 values long.
|
the copy length code ranges, both 8 values long.
|
||||||
The copy length code within its range is determined by bits 0-2
|
The copy length code within its range is determined by bits 0..2
|
||||||
(counted from the LSB) of the insert-and-copy length code.
|
(counted from the lsb) of the insert-and-copy length code.
|
||||||
The insert length code within its range is determined by bits 3-5
|
The insert length code within its range is determined by bits 3..5
|
||||||
(counted from the LSB) of the insert-and-copy length code.
|
(counted from the lsb) of the insert-and-copy length code.
|
||||||
Given the insert length and copy length codes, the actual insert
|
Given the insert length and copy length codes, the actual insert
|
||||||
and copy lengths can be obtained by reading the number of extra
|
and copy lengths can be obtained by reading the number of extra
|
||||||
bits given by the tables above.
|
bits given by the tables above.
|
||||||
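The table lookup described above can be sketched in C++ as follows. The names are hypothetical; the two base arrays are simply the table cells listed in increasing order of their 64-value ranges, and codes 0..127 are the ones marked "distance symbol 0" in the table.

```cpp
#include <cassert>

// Hypothetical result type for splitting an insert-and-copy length code.
struct InsertAndCopyCodes {
  int insert_code;              // insert length code, 0..23
  int copy_code;                // copy length code, 0..23
  bool implicit_distance_zero;  // true for the "distance symbol 0" row
};

InsertAndCopyCodes SplitInsertAndCopyCode(int code) {
  assert(code >= 0 && code <= 703);
  // First code of the insert and copy ranges for each 64-value cell,
  // indexed by code >> 6 (cells 0..63, 64..127, ..., 640..703).
  static const int kInsertBase[11] = {0, 0, 0, 0, 8, 8, 0, 16, 8, 16, 16};
  static const int kCopyBase[11] = {0, 8, 0, 8, 0, 8, 16, 0, 16, 8, 16};
  const int cell = code >> 6;
  InsertAndCopyCodes r;
  r.insert_code = kInsertBase[cell] + ((code >> 3) & 7);  // bits 3..5
  r.copy_code = kCopyBase[cell] + (code & 7);             // bits 0..2
  r.implicit_distance_zero = code < 128;
  return r;
}
```

For instance, code 457 lies in the cell 448..511 (insert codes 16..23, copy codes 0..7); its bits 3..5 and bits 0..2 are both 1, so it splits into insert length code 17 and copy length code 1.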
@ -1020,8 +1038,8 @@ the block type that preceded the current type,
|
|||||||
while a block type symbol 1 means that the new block type equals the current
|
while a block type symbol 1 means that the new block type equals the current
|
||||||
block type plus one. If the current block type is the maximal possible,
|
block type plus one. If the current block type is the maximal possible,
|
||||||
then a block type symbol of 1 results in wrapping to a new block type of 0.
|
then a block type symbol of 1 results in wrapping to a new block type of 0.
|
||||||
Block type symbols 2 - 257
|
Block type symbols 2..257
|
||||||
represent block types 0 - 255 respectively. The previous and current block types
|
represent block types 0..255 respectively. The previous and current block types
|
||||||
are initialized to 1 and 0, respectively, at the end of the
|
are initialized to 1 and 0, respectively, at the end of the
|
||||||
meta-block header.
|
meta-block header.
|
||||||
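The block type symbol rules above can be sketched as a small state machine in C++. The class name is hypothetical, and the sketch assumes that the maximal possible block type is the number of block types minus one.

```cpp
#include <cassert>

// Hypothetical tracker for the previous and current block type of one
// category (literal, insert-and-copy, or distance).
class BlockTypeTracker {
 public:
  explicit BlockTypeTracker(int num_block_types)
      : num_types_(num_block_types) {
    assert(num_block_types >= 1 && num_block_types <= 256);
  }

  // Applies one block type symbol and returns the new current block type.
  int Apply(int symbol) {
    assert(symbol >= 0 && symbol < num_types_ + 2);
    int next;
    if (symbol == 0) {
      next = previous_;  // the type that preceded the current one
    } else if (symbol == 1) {
      next = (current_ + 1 == num_types_) ? 0 : current_ + 1;  // wraps to 0
    } else {
      next = symbol - 2;  // symbols 2..257 are block types 0..255
    }
    previous_ = current_;
    current_ = next;
    return current_;
  }

 private:
  int num_types_;
  int previous_ = 1;  // initial values at the end of the meta-block header
  int current_ = 0;
};
```

With three block types, applying symbol 1 three times yields types 1, 2 and then 0 (wrapped), after which symbol 0 returns to type 2.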
|
|
||||||
@ -1051,24 +1069,24 @@ Each block count in the compressed data is represented with a pair
|
|||||||
bits are encoded back-to-back, the block count code is encoded using
|
bits are encoded back-to-back, the block count code is encoded using
|
||||||
a prefix code over the block count code alphabet, while the extra
|
a prefix code over the block count code alphabet, while the extra
|
||||||
bits value is encoded as a fixed-width integer value. The number of
|
bits value is encoded as a fixed-width integer value. The number of
|
||||||
extra bits can be 0 - 24, and it is dependent on the block count
|
extra bits can be 0..24, and it is dependent on the block count
|
||||||
code. The symbols of the block count code alphabet, along with the
|
code. The symbols of the block count code alphabet, along with the
|
||||||
number of extra bits, and the range of block counts are as follows:
|
number of extra bits, and the range of block counts are as follows:
|
||||||
|
|
||||||
.nf
|
.nf
|
||||||
.KS
|
.KS
|
||||||
Extra Extra Extra
|
Extra Extra Extra
|
||||||
Code Bits Lengths Code Bits Lengths Code Bits Lengths
|
Code Bits Lengths Code Bits Lengths Code Bits Lengths
|
||||||
---- ---- ------ ---- ---- ------- ---- ---- -------
|
---- ---- ------- ---- ---- ------- ---- ---- -------
|
||||||
0 2 1-4 9 4 65-80 18 7 369-496
|
0 2 1..4 9 4 65..80 18 7 369..496
|
||||||
1 2 5-8 10 4 81-96 19 8 497-752
|
1 2 5..8 10 4 81..96 19 8 497..752
|
||||||
2 2 9-12 11 4 97-112 20 9 753-1264
|
2 2 9..12 11 4 97..112 20 9 753..1264
|
||||||
3 2 13-16 12 5 113-144 21 10 1265-2288
|
3 2 13..16 12 5 113..144 21 10 1265..2288
|
||||||
4 3 17-24 13 5 145-176 22 11 2289-4336
|
4 3 17..24 13 5 145..176 22 11 2289..4336
|
||||||
5 3 25-32 14 5 177-208 23 12 4337-8432
|
5 3 25..32 14 5 177..208 23 12 4337..8432
|
||||||
6 3 33-40 15 5 209-240 24 13 8433-16624
|
6 3 33..40 15 5 209..240 24 13 8433..16624
|
||||||
7 3 41-48 16 6 241-304 25 24 16625-16793840
|
7 3 41..48 16 6 241..304 25 24 16625..16793840
|
||||||
8 4 49-64 17 6 305-368
|
8 4 49..64 17 6 305..368
|
||||||
.KE
|
.KE
|
||||||
.fi
|
.fi
|
||||||
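The ranges in the block count table follow from the extra-bits column alone: each code covers 2^(extra bits) consecutive values, starting at 1. A minimal C++ sketch of that relationship is given below; the array and function names are hypothetical, and the array is copied from the "Extra Bits" column of the table.

```cpp
#include <cassert>
#include <cstdint>

// Number of extra bits for block count codes 0..25 (from the table above).
static const int kBlockCountExtraBits[26] = {
    2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5,
    5, 5, 5, 6, 6, 7, 8, 9, 10, 11, 12, 13, 24};

// Smallest block count represented by the given block count code.
uint32_t BlockCountBase(int code) {
  assert(code >= 0 && code < 26);
  uint32_t base = 1;
  for (int i = 0; i < code; ++i) {
    base += 1u << kBlockCountExtraBits[i];
  }
  return base;
}

// Block count for a code and the value read from its extra bits.
uint32_t BlockCount(int code, uint32_t extra) {
  assert(code >= 0 && code < 26);
  assert(extra < (1u << kBlockCountExtraBits[code]));
  return BlockCountBase(code) + extra;
}
```

For example, BlockCountBase(25) is 16625, and adding the largest 24-bit extra value gives 16793840, the upper end of the last table row.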
|
|
||||||
@ -1262,9 +1280,9 @@ now define the format of the context map (the same format is used
|
|||||||
for literal and distance context maps):
|
for literal and distance context maps):
|
||||||
|
|
||||||
.nf
|
.nf
|
||||||
1-5 bits: RLEMAX, 0 is encoded with one 0 bit, and values
|
1..5 bits: RLEMAX, 0 is encoded with one 0 bit, and values
|
||||||
1 - 16 are encoded with bit pattern xxxx1 (so 01001
|
1..16 are encoded with bit pattern xxxx1 (so 01001
|
||||||
is 5)
|
is 5)
|
||||||
|
|
||||||
Prefix code with alphabet size NTREES + RLEMAX
|
Prefix code with alphabet size NTREES + RLEMAX
|
||||||
|
|
||||||
@ -1398,7 +1416,7 @@ The form of these elementary transforms is as follows:
|
|||||||
.fi
|
.fi
|
||||||
|
|
||||||
For the purposes of UppercaseAll, word is parsed into UTF-8
|
For the purposes of UppercaseAll, word is parsed into UTF-8
|
||||||
characters and converted to upper-case by taking 1 - 3 bytes at a time,
|
characters and converted to upper-case by taking 1..3 bytes at a time,
|
||||||
using the algorithm below:
|
using the algorithm below:
|
||||||
|
|
||||||
.nf
|
.nf
|
||||||
@ -1447,10 +1465,10 @@ previous sections.
|
|||||||
The stream header has only the following one field:
|
The stream header has only the following one field:
|
||||||
|
|
||||||
.nf
|
.nf
|
||||||
1-7 bits: WBITS, a value in the range 10 - 24, encoded with
|
1..7 bits: WBITS, a value in the range 10..24, encoded with
|
||||||
the following variable length code (as it appears in
|
the following variable length code (as it appears in
|
||||||
the compressed data, where the bits are parsed from
|
the compressed data, where the bits are parsed from
|
||||||
right to left):
|
right to left):
|
||||||
|
|
||||||
Value Bit Pattern
|
Value Bit Pattern
|
||||||
----- -----------
|
----- -----------
|
||||||
@ -1527,7 +1545,7 @@ the following:
|
|||||||
zeros, then the stream should be rejected
|
zeros, then the stream should be rejected
|
||||||
as invalid)
|
as invalid)
|
||||||
|
|
||||||
0 - 7 bits: fill bits until the next byte boundary,
|
0..7 bits: fill bits until the next byte boundary,
|
||||||
must be all zeros
|
must be all zeros
|
||||||
|
|
||||||
MSKIPLEN bytes of metadata, not part of the
|
MSKIPLEN bytes of metadata, not part of the
|
||||||
@ -1546,7 +1564,7 @@ the following:
|
|||||||
ISLAST bit is not set (if the ignored bits are not
|
ISLAST bit is not set (if the ignored bits are not
|
||||||
all zeros, the stream should be rejected as invalid)
|
all zeros, the stream should be rejected as invalid)
|
||||||
|
|
||||||
1-11 bits: NBLTYPESL, # of literal block types, encoded with
|
1..11 bits: NBLTYPESL, # of literal block types, encoded with
|
||||||
the following variable length code (as it appears in
|
the following variable length code (as it appears in
|
||||||
the compressed data, where the bits are parsed from
|
the compressed data, where the bits are parsed from
|
||||||
right to left, so 0110111 has the value 12):
|
right to left, so 0110111 has the value 12):
|
||||||
@ -1555,13 +1573,13 @@ the following:
|
|||||||
----- -----------
|
----- -----------
|
||||||
1 0
|
1 0
|
||||||
2 0001
|
2 0001
|
||||||
3-4 x0011
|
3..4 x0011
|
||||||
5-8 xx0101
|
5..8 xx0101
|
||||||
9-16 xxx0111
|
9..16 xxx0111
|
||||||
17-32 xxxx1001
|
17..32 xxxx1001
|
||||||
33-64 xxxxx1011
|
33..64 xxxxx1011
|
||||||
65-128 xxxxxx1101
|
65..128 xxxxxx1101
|
||||||
129-256 xxxxxxx1111
|
129..256 xxxxxxx1111
|
||||||
|
|
||||||
Prefix code over the block type code alphabet for
|
Prefix code over the block type code alphabet for
|
||||||
literal block types, appears only if NBLTYPESL >= 2
|
literal block types, appears only if NBLTYPESL >= 2
|
||||||
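One way to read the NBLTYPESL table above is: a first bit of 0 means the value 1; otherwise 3 bits give a number n, followed by n extra bits, and the value is 2^n + 1 plus the extra bits. The C++ sketch below implements this reading. It is an illustration only: the function name is hypothetical, and it takes the bit pattern as a string written as in the table (parsed from right to left) instead of reading from a compressed stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes a block type count from a bit pattern written as in the table,
// i.e. the rightmost character is the first bit that is parsed.
int DecodeBlockTypeCount(const std::string& pattern) {
  std::size_t pos = pattern.size();  // one past the next bit to read
  // Reads n bits; the first bit read is the least-significant one.
  auto read_bits = [&](int n) {
    int value = 0;
    for (int i = 0; i < n; ++i) {
      assert(pos > 0);
      --pos;
      assert(pattern[pos] == '0' || pattern[pos] == '1');
      value |= (pattern[pos] - '0') << i;
    }
    return value;
  };
  if (read_bits(1) == 0) return 1;  // bit pattern "0"
  const int n = read_bits(3);       // number of extra bits, 0..7
  const int extra = read_bits(n);
  return (1 << n) + 1 + extra;
}
```

With this reading, the pattern 0110111 from the text decodes to 8 + 1 + 3 = 12.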
@ -1572,8 +1590,8 @@ the following:
|
|||||||
Block count code + extra bits for first literal
|
Block count code + extra bits for first literal
|
||||||
block count, appears only if NBLTYPESL >= 2
|
block count, appears only if NBLTYPESL >= 2
|
||||||
|
|
||||||
1-11 bits: NBLTYPESI, # of insert-and-copy block types, encoded
|
1..11 bits: NBLTYPESI, # of insert-and-copy block types, encoded
|
||||||
with the same variable length code as above
|
with the same variable length code as above
|
||||||
|
|
||||||
Prefix code over the block type code alphabet for
|
Prefix code over the block type code alphabet for
|
||||||
insert-and-copy block types, appears only if NBLTYPESI >= 2
|
insert-and-copy block types, appears only if NBLTYPESI >= 2
|
||||||
@ -1584,8 +1602,8 @@ the following:
|
|||||||
Block count code + extra bits for first insert-and-copy
|
Block count code + extra bits for first insert-and-copy
|
||||||
block count, appears only if NBLTYPESI >= 2
|
block count, appears only if NBLTYPESI >= 2
|
||||||
|
|
||||||
1-11 bits: NBLTYPESD, # of distance block types, encoded
|
1..11 bits: NBLTYPESD, # of distance block types, encoded
|
||||||
with the same variable length code as above
|
with the same variable length code as above
|
||||||
|
|
||||||
Prefix code over the block type code alphabet for
|
Prefix code over the block type code alphabet for
|
||||||
distance block types, appears only if NBLTYPESD >= 2
|
distance block types, appears only if NBLTYPESD >= 2
|
||||||
@ -1604,15 +1622,15 @@ the following:
|
|||||||
|
|
||||||
NBLTYPESL x 2 bits: context mode for each literal block type
|
NBLTYPESL x 2 bits: context mode for each literal block type
|
||||||
|
|
||||||
1-11 bits: NTREESL, # of literal prefix trees, encoded
|
1..11 bits: NTREESL, # of literal prefix trees, encoded
|
||||||
with the same variable length code as NBLTYPESL
|
with the same variable length code as NBLTYPESL
|
||||||
|
|
||||||
Literal context map, encoded as described in Section 7.3.,
|
Literal context map, encoded as described in Section 7.3.,
|
||||||
appears only if NTREESL >= 2, otherwise the context map
|
appears only if NTREESL >= 2, otherwise the context map
|
||||||
has only zero values
|
has only zero values
|
||||||
|
|
||||||
1-11 bits: NTREESD, # of distance prefix trees, encoded
|
1..11 bits: NTREESD, # of distance prefix trees, encoded
|
||||||
with the same variable length code as NBLTYPESD
|
with the same variable length code as NBLTYPESD
|
||||||
|
|
||||||
Distance context map, encoded as described in Section 7.3.,
|
Distance context map, encoded as described in Section 7.3.,
|
||||||
appears only if NTREESD >= 2, otherwise the context map
|
appears only if NTREESD >= 2, otherwise the context map
|
||||||
@ -1806,19 +1824,183 @@ reference with <length = 5, distance = 2> adds X,Y,X,Y,X to the
|
|||||||
uncompressed stream.
|
uncompressed stream.
|
||||||
|
|
||||||
.ti 0
|
.ti 0
|
||||||
11. Security Considerations
|
11. Considerations for compressor implementations
|
||||||
|
|
||||||
|
Since the intent of this document is to define the brotli compressed data format
|
||||||
|
without reference to any particular compression algorithm, the material in this
|
||||||
|
section is not part of the definition of the format, and a compressor need not
|
||||||
|
follow it in order to be compliant.
|
||||||
|
|
||||||
|
.ti 0
|
||||||
|
11.1. Trivial compressor
|
||||||
|
|
||||||
|
In this section we present a very simple algorithm that produces a valid brotli
|
||||||
|
stream representing an arbitrary sequence of uncompressed bytes in the form of
|
||||||
|
the following C++ language function.
|
||||||
|
|
||||||
|
.nf
|
||||||
|
string BrotliCompressTrivial(const string& u) {
|
||||||
|
  if (u.empty()) {
|
||||||
|
    return string(1, 6);
|
||||||
|
  }
|
||||||
|
  int i;
|
||||||
|
  string c;
|
||||||
|
  c.append(1, 12);
|
||||||
|
  for (i = 0; i + 65535 < u.size(); i += 65536) {
|
||||||
|
    c.append(1, 248);
|
||||||
|
    c.append(1, 255);
|
||||||
|
    c.append(1, 15);
|
||||||
|
    c.append(&u[i], 65536);
|
||||||
|
  }
|
||||||
|
  if (i < u.size()) {
|
||||||
|
    int r = u.size() - i - 1;
|
||||||
|
    c.append(1, (r & 31) << 3);
|
||||||
|
    c.append(1, r >> 5);
|
||||||
|
    c.append(1, 8 + (r >> 13));
|
||||||
|
    c.append(&u[i], r + 1);
|
||||||
|
  }
|
||||||
|
  c.append(1, 3);
|
||||||
|
  return c;
|
||||||
|
}
|
||||||
|
.fi
|
||||||
|
|
||||||
|
Note that this simple algorithm does not actually compress data, that is, the
|
||||||
|
brotli representation will always be bigger than the original, but it
|
||||||
|
shows that every sequence of N uncompressed bytes can be represented with a
|
||||||
|
valid brotli stream that is not longer than N + (3 * (N >> 16) + 5) bytes.
|
||||||
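The size bound stated above can be checked against a self-contained variant of the function. The sketch below differs from the listing in the draft in three ways that are assumptions of this example and not part of the specification: it adds the includes and std:: qualifiers, uses std::size_t instead of int for the index, and makes the narrowing to char explicit. TrivialSizeBound is a hypothetical helper name for the bound N + (3 * (N >> 16) + 5).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Self-contained variant of the trivial compressor from Section 11.1.
std::string BrotliCompressTrivial(const std::string& u) {
  if (u.empty()) {
    // One byte: window size bit, ISLAST = 1, ISLASTEMPTY = 1.
    return std::string(1, static_cast<char>(6));
  }
  std::string c;
  // Window size bit followed by an empty metadata meta-block, so that the
  // next meta-block header starts on a byte boundary.
  c.append(1, static_cast<char>(12));
  std::size_t i = 0;
  for (; i + 65535 < u.size(); i += 65536) {
    // Header of an uncompressed meta-block holding 65536 bytes.
    c.append(1, static_cast<char>(248));
    c.append(1, static_cast<char>(255));
    c.append(1, static_cast<char>(15));
    c.append(&u[i], 65536);
  }
  if (i < u.size()) {
    // Header of an uncompressed meta-block holding the remaining r + 1 bytes.
    const std::size_t r = u.size() - i - 1;
    c.append(1, static_cast<char>((r & 31) << 3));
    c.append(1, static_cast<char>((r >> 5) & 255));
    c.append(1, static_cast<char>(8 + (r >> 13)));
    c.append(&u[i], r + 1);
  }
  // Final empty meta-block: ISLAST = 1, ISLASTEMPTY = 1.
  c.append(1, static_cast<char>(3));
  return c;
}

// Upper bound on the output size for N input bytes, as stated in the text.
std::size_t TrivialSizeBound(std::size_t n) {
  return n + (3 * (n >> 16) + 5);
}
```

A one-byte input "a" produces the six bytes 12, 0, 0, 8, 'a', 3, which is exactly N + 5; an input of 200000 bytes produces 200014 bytes, again equal to the bound.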
|
|
||||||
|
.ti 0
|
||||||
|
11.2. Aligning compressed meta-blocks to byte boundaries
|
||||||
|
|
||||||
|
As described in Section 9., only those meta-blocks that immediately follow an
|
||||||
|
uncompressed meta-block or a metadata meta-block are guaranteed to start on a
|
||||||
|
byte boundary. In some applications, it might be required that every
|
||||||
|
non-metadata meta-block starts on a byte boundary. This can be achieved by
|
||||||
|
appending an empty metadata meta-block after every non-metadata meta-block that
|
||||||
|
does not end on a byte boundary.
|
||||||
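The alignment technique above can be sketched in C++ as follows. BitSink and AlignWithEmptyMetadataBlock are hypothetical names. The sketch assumes the metadata meta-block header layout of Section 9, which this diff does not reproduce in full: ISLAST (1 bit, zero), MNIBBLES (2 bits, the value 0 encoded as bit pattern 11), a reserved zero bit, MSKIPBYTES (2 bits, zero for empty metadata), then zero fill bits up to the next byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal bit sink; bits are packed lsb first within each byte.
struct BitSink {
  std::vector<uint8_t> bytes;
  std::size_t bit_pos = 0;  // total number of bits written so far

  void WriteBits(uint32_t value, int nbits) {
    assert(nbits >= 0 && nbits <= 32);
    for (int i = 0; i < nbits; ++i) {
      if (bit_pos % 8 == 0) bytes.push_back(0);  // start a new byte
      bytes.back() |= static_cast<uint8_t>(((value >> i) & 1u) << (bit_pos % 8));
      ++bit_pos;
    }
  }
};

// Appends an empty metadata meta-block if the stream is not byte-aligned.
void AlignWithEmptyMetadataBlock(BitSink* out) {
  if (out->bit_pos % 8 == 0) return;  // already on a byte boundary
  out->WriteBits(0, 1);  // ISLAST = 0
  out->WriteBits(3, 2);  // MNIBBLES = 0, encoded as bit pattern 11
  out->WriteBits(0, 1);  // reserved bit, must be zero
  out->WriteBits(0, 2);  // MSKIPBYTES = 0, i.e. no metadata bytes follow
  while (out->bit_pos % 8 != 0) {
    out->WriteBits(0, 1);  // fill bits, must be all zeros
  }
}
```

Writing the single window size bit 0 and then calling AlignWithEmptyMetadataBlock yields the byte 12, the same first byte that the trivial compressor in Section 11.1 emits for non-empty input.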
|
|
||||||
|
.ti 0
|
||||||
|
11.3. Creating self-contained parts within the compressed data
|
||||||
|
|
||||||
|
In some encoder implementations it might be required to make a sequence of
|
||||||
|
bytes within a brotli stream self-contained, that is, such that they
|
||||||
|
can be decompressed independently from previous parts of the compressed data.
|
||||||
|
This is a useful feature for three reasons. First, if a large compressed file
|
||||||
|
is damaged, it is possible to recover some of the file after the damage.
|
||||||
|
Second, it is useful when doing differential transfer of compressed data. If
|
||||||
|
a sequence of uncompressed bytes is unchanged and compressed independently
|
||||||
|
from previous data, then the compressed representation may also be
|
||||||
|
unchanged and can therefore be transferred very cheaply. Third, if sequences of
|
||||||
|
uncompressed bytes are compressed independently, it allows for parallel
|
||||||
|
compression of these byte sequences within the same file, in addition
|
||||||
|
to parallel compression of multiple files.
|
||||||
|
|
||||||
|
Given two sequences of uncompressed bytes, U0 and U1, we will now describe how
|
||||||
|
to create two sequences of compressed bytes, C0 and C1, such that the
|
||||||
|
concatenation of C0 and C1 is a valid brotli stream, and that C0 and C1
|
||||||
|
(together with the first byte of C0 that contains the window size)
|
||||||
|
can be decompressed independently from each other to U0 and U1.
|
||||||
|
|
||||||
|
When compressing the byte sequence U0 to produce C0, we can use any compressor
|
||||||
|
that works on the complete set of uncompressed bytes U0, with the following two
|
||||||
|
changes. First, the ISLAST bit of the last meta-block of C0 must not be set.
|
||||||
|
Second, C0 must end at a byte-boundary, which can be ensured by appending an
|
||||||
|
empty metadata meta-block to it, as in Section 11.2.
|
||||||
|
|
||||||
|
When compressing the byte sequence U1 to produce C1, we can use any compressor
|
||||||
|
that starts a new meta-block at the beginning of U1 within the U0+U1 input
|
||||||
|
stream, with the following two changes. First, backward distances in C1 must
|
||||||
|
not refer to static dictionary words or uncompressed bytes in U0.
|
||||||
|
Even if a sequence of bytes in U1 would match a static dictionary word, or a
|
||||||
|
sequence of bytes that overlaps U0, the compressor must represent this
|
||||||
|
sequence of bytes with a combination of literal insertions and backward
|
||||||
|
references to bytes in U1 instead. Second, the ring
|
||||||
|
buffer of last four distances must be replenished first with distances in C1
|
||||||
|
before using it to encode other distances in C1. Note that both compressors
|
||||||
|
producing C0 and C1 have to use the same window size, but the stream header is
|
||||||
|
emitted only by the compressor that produces C0.
|
||||||
|
|
||||||
|
Note that this method can be easily generalized to more than two sequences
|
||||||
|
of uncompressed bytes.
|
||||||
|
|
||||||
|
.ti 0
|
||||||
|
12. Security Considerations
|
||||||
|
|
||||||
As with any compressed file formats, decompressor implementations should
|
As with any compressed file formats, decompressor implementations should
|
||||||
handle all compressed data byte sequences, not only those that conform to this
|
handle all compressed data byte sequences, not only those that conform to this
|
||||||
specification, where non-conformant compressed data sequences should be discarded.
|
specification, where non-conformant compressed data sequences should be
|
||||||
|
discarded.
|
||||||
|
|
||||||
A possible attack against a system containing a decompressor
|
A possible attack against a system containing a decompressor
|
||||||
implementation (e.g. a web browser) is to exploit a buffer
|
implementation (e.g. a web browser) is to exploit a buffer overflow
|
||||||
overflow caused by an invalid compressed data. Therefore decompressor
|
triggered by invalid compressed data. Therefore decompressor
|
||||||
implementations should perform bounds-checking for each memory access
|
implementations should perform bounds-checking for each memory access
|
||||||
that result from values decoded from the compressed stream.
|
that result from values decoded from the compressed stream and derivatives
|
||||||
|
thereof.
|
||||||
|
|
||||||
|
Another possible attack against a system containing a decompressor
|
||||||
|
implementation is to provide it (either valid or invalid) compressed data
|
||||||
|
that can make the decompressor system's resource consumption (cpu, memory, or
|
||||||
|
storage) disproportionately large compared to the size of the
|
||||||
|
compressed data. In addition to the size of the compressed data, the amount of
|
||||||
|
cpu, memory and storage required to decompress a single compressed meta-block
|
||||||
|
within a brotli stream is controlled by the following two parameters: the size of
|
||||||
|
the uncompressed meta-block, which is encoded at the start of the compressed
|
||||||
|
meta-block, and the size of the sliding window, which is encoded at the start
|
||||||
|
of the brotli stream. Decompressor implementations in systems where
|
||||||
|
memory or storage is constrained should perform a sanity-check on these two
|
||||||
|
parameters. The uncompressed meta-block size that was decoded from the
|
||||||
|
compressed stream should be compared against either a hard limit, given by the
|
||||||
|
system's constraints or some expectation about the uncompressed data, or against
|
||||||
|
a certain multiple of the size of the compressed data. If the uncompressed
|
||||||
|
meta-block size is determined to be too high, the compressed data should be
|
||||||
|
rejected. Likewise, when the complete uncompressed stream is kept in the
|
||||||
|
system containing the decompressor implementation, the total uncompressed
|
||||||
|
size of the stream should be checked before decompressing each additional
|
||||||
|
meta-block. If the size of the sliding window that was decoded from the start
|
||||||
|
of the compressed stream is greater than a certain soft limit, then the
|
||||||
|
decompressor implementation should, at first, allocate a smaller sliding
|
||||||
|
window that fits the first uncompressed meta-block, and afterwards, before
|
||||||
|
decompressing each additional meta-block, it should increase the size of the
|
||||||
|
sliding window until the sliding window size specified in the compressed data
|
||||||
|
is reached.
|
||||||
|
|
||||||
|
Correspondingly, possible attacks against a system containing a compressor
|
||||||
|
implementation (e.g. a web server) are to exploit a buffer overflow or cause
|
||||||
|
disproportionately large resource consumption by providing e.g. uncompressible
|
||||||
|
data.
|
||||||
|
As described in Section 11.1., an output buffer of
|
||||||
|
|
||||||
|
.nf
|
||||||
|
S(N) = N + (3 * (N >> 16) + 5)
|
||||||
|
.fi
|
||||||
|
|
||||||
|
bytes is sufficient to hold a valid compressed brotli
|
||||||
|
stream representing an arbitrary sequence of N uncompressed bytes.
|
||||||
|
Therefore compressor implementations should allocate at least S(N) bytes of
|
||||||
|
output buffer before compressing N bytes of data with unknown compressibility
|
||||||
|
and should perform bounds-checking for each write into this output buffer.
|
||||||
|
If their output buffer is full, compressor implementations should
|
||||||
|
revert to the trivial compression algorithm described in Section 11.1.
|
||||||
|
The resource consumption of a compressor implementation for a particular input
|
||||||
|
data depends mostly on the algorithm used to find backward matches and on the
|
||||||
|
algorithm used to construct context maps and prefix codes and only to a lesser
|
||||||
|
extent on the input data itself. If the system containing a compressor
|
||||||
|
implementation is overloaded, a possible way to reduce resource usage is to
|
||||||
|
switch to simpler algorithms for backward reference search and prefix code
|
||||||
|
construction, or to fall back to the trivial compression algorithm described in
|
||||||
|
Section 11.1.
|
||||||
|
|
||||||
|
A possible attack against a system that sends compressed data over an encrypted
|
||||||
|
channel is the following. An attacker who can repeatedly mix arbitrary
|
||||||
|
(attacker-supplied) data with secret data (passwords, cookies) and observe the
|
||||||
|
length of the ciphertext can potentially reconstruct the secret data. To
|
||||||
|
protect against this kind of attack, applications should not mix sensitive data
|
||||||
|
with non-sensitive, potentially attacker-supplied data in the same compressed
|
||||||
|
stream.
|
||||||
|
|
||||||
.ti 0
|
.ti 0
|
||||||
12. IANA Considerations
|
13. IANA Considerations
|
||||||
|
|
||||||
The "HTTP Content Coding Registry" has been updated with the
|
The "HTTP Content Coding Registry" has been updated with the
|
||||||
registration below:
|
registration below:
|
||||||
@ -1834,7 +2016,7 @@ registration below:
|
|||||||
.fi
|
.fi
|
||||||
|
|
||||||
.ti 0
|
.ti 0
|
||||||
13. Informative References
|
14. Informative References
|
||||||
.in 14
|
.in 14
|
||||||
|
|
||||||
.ti 3
|
.ti 3
|
||||||
@ -1858,7 +2040,7 @@ http://www.ietf.org/rfc/rfc1951.txt
|
|||||||
.in 3
|
.in 3
|
||||||
|
|
||||||
.ti 0
|
.ti 0
|
||||||
14. Source code
|
15. Source code
|
||||||
|
|
||||||
Source code for a C language implementation of a brotli compliant
|
Source code for a C language implementation of a brotli compliant
|
||||||
decompressor and a C++ language implementation of a compressor is
|
decompressor and a C++ language implementation of a compressor is
|
||||||
@ -1866,7 +2048,7 @@ available in the brotli open-source project:
|
|||||||
https://github.com/google/brotli
|
https://github.com/google/brotli
|
||||||
|
|
||||||
.ti 0
|
.ti 0
|
||||||
15. Acknowledgments
|
16. Acknowledgments
|
||||||
|
|
||||||
The authors would like to thank Mark Adler, Robert Obryk, Thomas
|
The authors would like to thank Mark Adler, Robert Obryk, Thomas
|
||||||
Pickert, and Joe Tsai for providing helpful review comments,
|
Pickert, and Joe Tsai for providing helpful review comments,
|
||||||
|
File diff suppressed because it is too large