Minor formatting changes

* In the description about "three categories", explicitly number them instead of using a giant paragraph that is harder to follow. * Switch lists of items to consistently use American style commas. The American style lists is better for clarity purposes. Consider the following: -Each category of value (insert and copy lengths, literals and distances) +Each category of value (insert and copy lengths, literals, and distances) * Make sure not to break a hyphenated phrase with a newline. When the nroff file is processed, "insert-\nand-copy" becomes "insert- and-copy", making it inconsistent with other uses of the hyphenated phrase. * Consistently use the same hyphenated phrase if referred to as a single unit. "insert and copy" -> "insert-and-copy" "least significant" -> "least-significant" "most significant" -> "most-significant" "fixed length" -> "fixed-length" "block switch" -> "block-switch". * Consistently use "indexes" instead of "indices"
2024-11-24 04:20:16 +00:00 · 2015-11-01 16:50:13 -08:00 · 2015-11-01 16:50:13 -08:00 · c5b6b5c7c1
commit c5b6b5c7c1
parent 166edb0287
1 changed files with 97 additions and 93 deletions
--- a/docs/draft-alakuijala-brotli-07.nroff
+++ b/docs/draft-alakuijala-brotli-07.nroff
@ -160,11 +160,11 @@ String: a sequence of arbitrary bytes.

 Bytes stored within a computer do not have a "bit order", since
 they are always treated as a unit. However, a byte considered as
-an integer between 0 and 255 does have a most- and least-
-significant bit, and since we write numbers with the most-
-significant digit on the left, we also write bytes with the most-
-significant bit on the left. In the diagrams below, we number the
-bits of a byte so that bit 0 is the least-significant bit, i.e.,
+an integer between 0 and 255 does have a most- and
+least-significant bit, and since we write numbers with the
+most-significant digit on the left, we also write bytes with the
+most-significant bit on the left. In the diagrams below, we number
+the bits of a byte so that bit 0 is the least-significant bit, i.e.,
 the bits are numbered:

 .nf
@ -209,15 +209,15 @@ compressed byte sequence:
     starting with the least-significant bit of the data
     element. These are referred to here as integer values
     and are considered unsigned.
-   * Prefix codes are packed starting with the most-
-     significant bit of the code.
+   * Prefix codes are packed starting with the most-significant
+     bit of the code.
 .fi

 In other words, if one were to print out the compressed data as
 a sequence of bytes, starting with the first byte at the
-*right* margin and proceeding to the *left*, with the most-
-significant bit of each byte on the left as usual, one would be
-able to parse the result from right to left, with fixed-width
+*right* margin and proceeding to the *left*, with the
+most-significant bit of each byte on the left as usual, one would
+be able to parse the result from right to left, with fixed-width
 elements in the correct MSB-to-LSB order and prefix codes in
 bit-reversed order (i.e., with the first bit of the code in the
 relative LSB position).
@ -225,8 +225,8 @@ relative LSB position).
 .ti 0
 2. Compressed representation overview

-A compressed data set consists of a header and a series of meta-
-blocks. Each meta-block decompresses to a sequence of 0
+A compressed data set consists of a header and a series of
+meta-blocks. Each meta-block decompresses to a sequence of 0
 to 16,777,216 (16 MiB) uncompressed bytes. The final uncompressed data is
 the concatenation of the uncompressed sequences from each meta-block.

@ -259,30 +259,37 @@ duplicated is two, but the last command in the meta-block is permitted to have
 only literals and no pointer to a string to duplicate.

 Each command in the compressed data is represented using three categories
-of prefix codes: one set of prefix codes are for the literal sequence
-lengths (also referred to as literal insertion lengths) and backward
-copy lengths (that is, a single code word represents two lengths,
-one of the literal sequence and one of the backward copy), a separate
-set of prefix codes are for literals, and a third set of prefix codes are for
-distances. The prefix code descriptions for each meta-block appear in a compact
-form just before the compressed data in the meta-block header. The insert and
-copy length and distance prefix codes may be followed by extra bits that are
-added to the base values determined by the codes. The number of extra bits is
-determined by the code.
+of prefix codes:
+
+   1) One set of prefix codes are for the literal sequence lengths
+      (also referred to as literal insertion lengths) and backward
+      copy lengths (that is, a single code word represents two lengths,
+      one of the literal sequence and one of the backward copy).
+
+   2) One set of prefix codes are for literals.
+
+   3) One set of prefix codes are for distances.
+
+The prefix code descriptions for each meta-block appear in a compact
+form just before the compressed data in the meta-block header.
+The insert-and-copy length and distance prefix codes may be followed by
+extra bits that are added to the base values determined by the codes.
+The number of extra bits is determined by the code.

 One meta-block command then appears as a sequence of prefix codes:

-   Insert and copy length, literal, literal, ..., literal, distance
+   Insert-and-copy length, literal, literal, ..., literal, distance

-where the insert and copy defines the number of literals that immediately
-follow and the copy length, and the distance defines how far back to go
-for the copy, used in combination with the copy length. The resulting
+where the insert-and-copy defines an insertion length and a copy length.
+The insertion length determines the number of literals that immediately
+follow. The distance defines how far back to go for the copy and the
+copy length determine the number of bytes to copy. The resulting
 uncompressed data is the sequence of bytes:

   literal, literal, ..., literal, copy, copy, ..., copy

 where the number of literal bytes and copy bytes are determined by the
-insert and copy length code. (The number of bytes copied for a static
+insert-and-copy length code. (The number of bytes copied for a static
 dictionary entry can vary from the copy length.)

 The last command in the meta-block may end with the last literal if the
@ -301,14 +308,13 @@ a new block type and block count is read from the stream immediately
 preceding the next element of that category, which will use the new
 block type.

-The insert and copy block type directly determines which prefix code to
-use for the next insert and copy element. For the literal and distance
+The insert-and-copy block type directly determines which prefix code to
+use for the next insert-and-copy element. For the literal and distance
 elements, the respective block type is used in combination with other
 context information to determine which prefix code to use for the next
 element.

-Consider the following
-example:
+Consider the following example:

   (IaC0, L0, L1, L2, D0)(IaC1, D1)(IaC2, L3, L4, D2)(IaC3, L5, D3)

@ -341,7 +347,7 @@ meta-block is then:
   IaC0 L0 L1 LBlockSwitch(1, 3) L2 D0 IaC1 DBlockSwitch(1, 3) D1
   IaCBlockSwitch(1, 2) IaC2 L3 L4 D2 IaC3 LBlockSwitch(0, 1) L5 D3

-where *BlockSwitch(t, n) switches to block type t for a count of n elements.
+where xBlockSwitch(t, n) switches to block type t for a count of n elements.
 Note that in this example DBlockSwitch(1, 3) immediately precedes the
 next required distance D1. It does not follow the last distance of
 the previous block, D0. Whenever an element of a category is needed,
@ -349,14 +355,14 @@ and the block count for that category has reached zero, then a new
 block type and count is read from the stream just before reading that next
 element.

-The block switch commands for the first blocks of each category are not part
+The block-switch commands for the first blocks of each category are not part
 of the meta-block compressed data. Instead the first block type
 is defined to be 0, and the first block count for each category is
 encoded in the meta-block header. The prefix codes for the block types and counts, a total of
 six prefix codes over the three categories, are defined in a compact form in the meta-block
 header.

-Each category of value (insert-and-copy lengths, literals and distances)
+Each category of value (insert-and-copy lengths, literals, and distances)
 can be encoded with any prefix code from a collection of prefix
 codes belonging to the same category appearing in the meta-block header. The
 particular prefix code used can depend on two factors: the block
@ -366,10 +372,9 @@ the uncompressed data, and in the case of distances, the context is the copy
 length from the same command. For insert-and-copy lengths, no context
 is used and the prefix code depends only on the block type. In the case
 of literals and distances, the context is mapped to a context ID in
-the range 0..63 for literals and 0..3 for distances and the matrix
-of the prefix code indices for each block type and context ID,
-called the context map, is encoded in a compact form in the meta-
-block header.
+the range 0..63 for literals and 0..3 for distances. The matrix
+of the prefix code indexes for each block type and context ID,
+called the context map, is encoded in a compact form in the meta-block header.

 For example, the prefix code to use to decode L2 depends on the
 block type (1), and the literal context ID determined by the two uncompressed
@ -379,8 +384,8 @@ type (0), and the distance context ID determined by the copy length decoded
 from IaC0. The prefix code to use to decode IaC3 depends only on the block
 type (1).

-In addition to the parts listed above (prefix code for insert-
-and-copy lengths, literals, distances, block types and block counts
+In addition to the parts listed above (prefix code for insert-and-copy
+lengths, literals, distances, block types and block counts,
 and the context map), the meta-block header contains the number of
 uncompressed bytes coded in the meta-block and two additional parameters used in
 the representation of match distances: the number of postfix bits and
@ -515,7 +520,7 @@ initially in tree[I].Len; the codes are produced in tree[I].Code.
      value.

 .KS
-         for (n = 0;  n <= max_code; n++) {
+         for (n = 0; n <= max_code; n++) {
            len = tree[n].Len;
            if (len != 0) {
               tree[n].Code = next_code[len];
@ -577,7 +582,7 @@ Prefix codes are used for different purposes in the brotli
 format, and each purpose has a different alphabet size. For
 literal codes the alphabet size is 256. For insert-and-copy
 length codes the alphabet size is 704. For block count codes,
-the alphabet size is 26. For distance codes, block type codes and
+the alphabet size is 26. For distance codes, block type codes, and
 the prefix codes used in compressing the context map, the
 alphabet size is dynamic and is based on parameters defined in
 later sections. The following table summarizes the alphabet sizes
@ -585,25 +590,25 @@ for the various prefix codes and the sections where they are defined.

 .nf
 .KS
-   +-----------------+------------------------+-------------+
-   | Prefix code     | Alphabet size          | Definition  |
-   +-----------------+------------------------+-------------+
-   | literal         | 256                    |             |
-   +-----------------+------------------------+-------------+
-   | distance        | 16 + NDIRECT +         | Section 4.  |
-   |                 | (48 << NPOSTFIX)       |             |
-   +-----------------+------------------------+-------------+
-   | insert-and-copy | 704                    | Section 5.  |
-   | length          |                        |             |
-   +-----------------+------------------------+-------------+
-   | block count     | 26                     | Section 6.  |
-   +-----------------+------------------------+-------------+
-   | block type      | NBLTYPESx + 2,         | Section 6.  |
-   |                 | (where x is I, L or D) |             |
-   +-----------------+------------------------+-------------+
-   | context map     | NTREESx + RLEMAXx      | Section 7.  |
-   |                 | (where x is L or D)    |             |
-   +-----------------+------------------------+-------------+
+   +-----------------+-------------------------+------------+
+   | Prefix code     | Alphabet size           | Definition |
+   +-----------------+-------------------------+------------+
+   | literal         | 256                     |            |
+   +-----------------+-------------------------+------------+
+   | distance        | 16 + NDIRECT +          | Section 4. |
+   |                 | (48 << NPOSTFIX)        |            |
+   +-----------------+-------------------------+------------+
+   | insert-and-copy | 704                     | Section 5. |
+   | length          |                         |            |
+   +-----------------+-------------------------+------------+
+   | block count     | 26                      | Section 6. |
+   +-----------------+-------------------------+------------+
+   | block type      | NBLTYPESx + 2,          | Section 6. |
+   |                 | (where x is I, L, or D) |            |
+   +-----------------+-------------------------+------------+
+   | context map     | NTREESx + RLEMAXx       | Section 7. |
+   |                 | (where x is L or D)     |            |
+   +-----------------+-------------------------+------------+
 .KE
 .fi

@ -615,8 +620,8 @@ prefix code distinguishes between simple and complex prefix
 codes. If this value is 1, then a simple prefix code follows.
 Otherwise the value indicates the number of leading zeros.

-A simple prefix code can have only up to four symbols with non-
-zero code length. The format of the simple prefix code is as
+A simple prefix code can have only up to four symbols with
+non-zero code length. The format of the simple prefix code is as
 follows:

 .nf
@ -657,7 +662,7 @@ follows:
     prefix code.

   * if NSYM = 4, the code lengths (in order of symbols decoded)
-     depend on the tree-select bit: 2, 2, 2, 2, (tree-select bit 0)
+     depend on the tree-select bit: 2, 2, 2, 2 (tree-select bit 0),
     or 1, 2, 3, 3 (tree-select bit 1).
 .fi

@ -724,11 +729,10 @@ in the compressed data, where the bits are parsed from right to left):
 .KE
 .fi

-We can now define the format of the complex prefix code as
-follows:
+We can now define the format of the complex prefix code as follows:

 .nf
-   2 bits: HSKIP, values of 0, 2 or 3 represent the respective
+   2 bits: HSKIP, values of 0, 2, or 3 represent the respective
           number of skipped code lengths. The skipped lengths
           are taken to be zero. (An HSKIP of 1 indicates a
           Simple prefix code.)
@ -742,7 +746,7 @@ follows:
      is for symbol 4.

      The code lengths of code length symbols are between 0 and
-      5 and they are represented with 2 - 4 bits according to
+      5, and they are represented with 2 - 4 bits according to
      the variable length code above. A code length of 0 means
      the corresponding code length symbol is not used.

@ -805,7 +809,7 @@ the number of direct distance codes, denoted by NDIRECT (0..120). Both of
 these parameters are encoded in the meta-block header. We will also
 use the following derived parameter:

-   POSTFIX_MASK = ((1 << NPOSTFIX) - 1)
+   POSTFIX_MASK = (1 << NPOSTFIX) - 1

 The first 16 distance symbols are special symbols that reference
 past distances as follows:
@ -830,10 +834,10 @@ past distances as follows:
 .fi

 The ring buffer of four last distances is initialized by the values
-16, 15, 11 and 4 (i.e. the fourth-to-last is set to 16, the third-to-last
-to 15, the second-to-last to 11 and the last distance to 4) at the
-beginning of the *stream* (as opposed to the beginning of the meta-
-block) and it is not reset at meta-block boundaries. When a distance
+16, 15, 11, and 4 (i.e. the fourth-to-last is set to 16, the third-to-last
+to 15, the second-to-last to 11, and the last distance to 4) at the
+beginning of the *stream* (as opposed to the beginning of the meta-block)
+and it is not reset at meta-block boundaries. When a distance
 symbol 0 appears, the distance it represents (i.e. the last distance
 in the sequence of distances) is not pushed to the ring buffer of
 last distances, in other words, the expression "(second, third,
@ -865,7 +869,7 @@ Given a distance symbol "dcode" (>= 16 + NDIRECT), and extra bits
 .nf
   hcode = (dcode - NDIRECT - 16) >> NPOSTFIX
   lcode = (dcode - NDIRECT - 16) & POSTFIX_MASK
-   offset = ((2 + (hcode & 1)) << ndistbits) - 4;
+   offset = ((2 + (hcode & 1)) << ndistbits) - 4
   distance = ((offset + dextra) << NPOSTFIX) + lcode + NDIRECT + 1
 .fi

@ -881,7 +885,7 @@ of a meta-block is represented with the following triplet:

   <insert-and-copy length code, insert extra bits, copy extra bits>

-The insert-and-copy length code, the insert extra bits and the copy
+The insert-and-copy length code, the insert extra bits, and the copy
 extra bits are encoded back-to-back, the insert-and-copy length code
 is encoded using a prefix code over the insert-and-copy length code
 alphabet, while the extra bits values are encoded as fixed-width
@ -999,7 +1003,7 @@ are initialized to 1 and 0, respectively, at the end of the
 meta-block header.

 Since the first block type of each block category is 0, the block
-type of the first block switch command is not encoded in
+type of the first block-switch command is not encoded in
 the compressed data. Instead the block count for each category
 that has more than one type is encoded in the meta-block header.

@ -1009,13 +1013,13 @@ count down to exactly zero at the end of the meta-block.

 The number of different block types in each block category, denoted
 by NBLTYPESL, NBLTYPESI, and NBLTYPESD for literals, insert-and-copy
-lengths and distances, respectively, is encoded in the meta-block
+lengths, and distances, respectively, is encoded in the meta-block
 header, and it must equal to the largest block type plus one in that
 block category. In other words, the set of literal, insert-and-copy
-length and distance block types must be [0..NBLTYPESL-1],
+length, and distance block types must be [0..NBLTYPESL-1],
 [0..NBLTYPESI-1], and [0..NBLTYPESD-1], respectively. From this it
-follows that the alphabet size of literal, insert-and-copy length and
-distance block type codes is NBLTYPESL + 2, NBLTYPESI + 2 and
+follows that the alphabet size of literal, insert-and-copy length, and
+distance block type codes is NBLTYPESL + 2, NBLTYPESI + 2, and
 NBLTYPESD + 2, respectively.

 Each block count in the compressed data is represented with a pair
@ -1044,7 +1048,7 @@ number of extra bits and the range of block counts are as follows:
 .KE
 .fi

-The first block switch command of each block category is special in
+The first block-switch command of each block category is special in
 the sense that it is encoded in the meta-block header, and as
 described earlier the block type code is omitted, since it is an
 implicit zero.
@ -1072,10 +1076,10 @@ p1 and p2 are initialized to zero.
 There are four methods, called context modes, to compute the
 Context ID:
 .nf
-   * LSB6, where the Context ID is the value of six least
-     significant bits of p1,
-   * MSB6, where the Context ID is the value of six most
-     significant bits of p1,
+   * LSB6, where the Context ID is the value of six
+     least-significant bits of p1,
+   * MSB6, where the Context ID is the value of six
+     most-significant bits of p1,
   * UTF8, where the Context ID is a complex function of p1, p2,
     optimized for text compression, and
   * Signed, where Context ID is a complex function of p1, p2,
@ -1156,9 +1160,9 @@ Given p1 is the last uncompressed byte and p2 is the second-to-last
 uncompressed byte the context IDs can be computed as follows:

 .nf
-   For LSB6  :  Context ID = p1 & 0x3f
-   For MSB6  :  Context ID = p1 >> 2
-   For UTF8  :  Context ID = Lut0[p1] | Lut1[p2]
+   For LSB6:    Context ID = p1 & 0x3f
+   For MSB6:    Context ID = p1 >> 2
+   For UTF8:    Context ID = Lut0[p1] | Lut1[p2]
   For Signed:  Context ID = (Lut2[p1] << 3) | Lut2[p2]
 .fi

@ -1466,7 +1470,7 @@ the following:
           fill bits are not zero, then the stream should be
           rejected as invalid)
   2 bits: MNIBBLES, # of nibbles to represent the uncompressed
-           length, encoded with the following fixed length code:
+           length, encoded with the following fixed-length code:

                 Value    Bit Pattern
                 -----    -----------
@ -1563,7 +1567,7 @@ the following:

   2 bits: NPOSTFIX, parameter used in the distance coding

-   4 bits: four most significant bits of NDIRECT, to get the
+   4 bits: four most-significant bits of NDIRECT, to get the
           actual value of the parameter NDIRECT, left-shift
           this four bit number by NPOSTFIX bits

@ -1773,7 +1777,7 @@ specification, where non-conformant compressed data sequences should be discarde
 A possible attack against a system containing a decompressor
 implementation (e.g. a web browser) is to exploit a buffer
 overflow caused by an invalid compressed data. Therefore decompressor
-implementations should perform bound-checking for each memory access
+implementations should perform bounds-checking for each memory access
 that result from values decoded from the compressed stream.

 .ti 0
@ -5678,8 +5682,8 @@ length is 122,784 bytes and the zlib CRC-32 of the byte sequence is
      9ce0a58de0a49ee0a4bee0a4aae0a4a8e0a495e0a4bee0a4b0e0a58de0a4b0e0
      a4b5e0a4bee0a488e0a4b8e0a495e0a58de0a4b0e0a4bfe0a4afe0a4a4e0a4be

-   The number of words for each length is given by the following bit-
-   depth array:
+   The number of words for each length is given by the following
+   bit-depth array:

      NDBITS :=  0,  0,  0,  0, 10, 10, 11, 11, 10, 10,
                10, 10, 10,  9,  9,  8,  7,  7,  8,  7,