From 6d2575eab37e14865130fc676c6130a784b5adae Mon Sep 17 00:00:00 2001 From: Joe Tsai Date: Thu, 29 Oct 2015 09:39:06 -0700 Subject: [PATCH 1/5] Use consistent bit convention in Section 5. * Rather than say "lower 3 bits" in one sentence and "bits 3-5" in the sentence right after, just consistently use the same convention and say "0-2" and "3-5". --- docs/draft-alakuijala-brotli-07.nroff | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/draft-alakuijala-brotli-07.nroff b/docs/draft-alakuijala-brotli-07.nroff index 496cce8..0e2b288 100644 --- a/docs/draft-alakuijala-brotli-07.nroff +++ b/docs/draft-alakuijala-brotli-07.nroff @@ -964,13 +964,14 @@ and a copy length code, the following table can be used: First, look up the cell with the 64 value range containing the insert-and-copy length code, this gives the insert length code and -the copy length code ranges, both 8 values long. The copy length -code within its range is determined by the lowest 3 bits of the -insert-and-copy length code, and the insert length code within its -range is determined by bits 3-5 (counted from the LSB) of the insert- -and-copy length code. Given the insert length and copy length codes, -the actual insert and copy lengths can be obtained by reading the -number of extra bits given by the tables above. +the copy length code ranges, both 8 values long. +The copy length code within its range is determined by bits 0-2 +(counted from the LSB) of the insert-and-copy length code. +The insert length code within its range is determined by bits 3-5 +(counted from the LSB) of the insert-and-copy length code. +Given the insert length and copy length codes, the actual insert +and copy lengths can be obtained by reading the number of extra +bits given by the tables above. If the insert-and-copy length code is between 0 and 127, the distance code of the command is set to zero (the last distance reused). 
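For reviewers checking the reworded paragraph in PATCH 1/5, the bit positions it names can be sketched in C as follows. This is a minimal illustration and not code from the draft: the function names cell_index, copy_code_offset, and insert_code_offset are hypothetical, and the sketch assumes cells are numbered in order of their 64-value ranges. It only extracts positions within the ranges; the base insert and copy length codes still come from the lookup table in the draft, which is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper: which 64-value cell of the lookup table the
 * insert-and-copy length code falls into (assumes cells are numbered
 * 0, 1, 2, ... in order of their value ranges). */
static inline unsigned cell_index(uint16_t code) {
    return (unsigned)code >> 6;
}

/* Bits 0-2 (counted from the LSB): position of the copy length code
 * within its 8-value range. */
static inline unsigned copy_code_offset(uint16_t code) {
    return (unsigned)code & 7u;
}

/* Bits 3-5 (counted from the LSB): position of the insert length code
 * within its 8-value range. */
static inline unsigned insert_code_offset(uint16_t code) {
    return ((unsigned)code >> 3) & 7u;
}
```

For example, code 171 (binary 10 101 011) falls in the cell covering 128..191, at offset 5 in the insert length code range and offset 3 in the copy length code range.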
From 185cb9eadaafa5098b2c3ef9e4195350e2dc05fa Mon Sep 17 00:00:00 2001 From: Joe Tsai Date: Thu, 29 Oct 2015 09:40:41 -0700 Subject: [PATCH 2/5] Define the maximum number of bytes transforms may add to a word * This value is useful in implementing the decoder since we can know ahead-of-time what size buffer is needed to contain the output of a transformed word. --- docs/draft-alakuijala-brotli-07.nroff | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/draft-alakuijala-brotli-07.nroff b/docs/draft-alakuijala-brotli-07.nroff index 0e2b288..493d59d 100644 --- a/docs/draft-alakuijala-brotli-07.nroff +++ b/docs/draft-alakuijala-brotli-07.nroff @@ -1387,6 +1387,11 @@ Note that the OmitFirst8 elementary transform is not used in the list of transformations. The strings in Appendix B. are in C string format with respect to escape (backslash) characters. +The maximum number of additional bytes that a transform may add to a +base word is 13. Since the largest base word is 24 bytes long, a buffer +of 38 bytes is sufficient to store any transformed words +(counting a terminating zero byte). + .ti 0 9. Compressed data format From 2ffe45bd670611e4524ac34a0b88c6036754c129 Mon Sep 17 00:00:00 2001 From: Joe Tsai Date: Thu, 29 Oct 2015 09:42:00 -0700 Subject: [PATCH 3/5] Clarify Section 4. * If NDIRECT is zero, then the paragraph reads "from 16 to 15", which doesn't make much sense. Thus, add a conditional to avoid this minor oddity. --- docs/draft-alakuijala-brotli-07.nroff | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/draft-alakuijala-brotli-07.nroff b/docs/draft-alakuijala-brotli-07.nroff index 493d59d..3fcc608 100644 --- a/docs/draft-alakuijala-brotli-07.nroff +++ b/docs/draft-alakuijala-brotli-07.nroff @@ -845,9 +845,10 @@ not pushed to the ring buffer of last distances. If a special distance symbol resolves to a zero or negative value, the stream should be rejected as invalid. 
-The next NDIRECT distance symbols, from 16 to 15 + NDIRECT, represent -distances from 1 to NDIRECT. Neither the distance special symbols, nor -the NDIRECT direct distance symbols are followed by any extra bits. +If NDIRECT is greater than zero, then the next NDIRECT distance symbols, +from 16 to 15 + NDIRECT, represent distances from 1 to NDIRECT. +Neither the special distance symbols, nor the NDIRECT direct distance +symbols are followed by any extra bits. Distance symbols 16 + NDIRECT and greater all have extra bits, where the number of extra bits for a distance symbol "dcode" is given by the From ff3897df2d994bbb2c5a20297f71a1a1e76ecef9 Mon Sep 17 00:00:00 2001 From: Joe Tsai Date: Thu, 29 Oct 2015 09:44:23 -0700 Subject: [PATCH 4/5] Clarify Section 8. * The phrase "difference between these distances" can either refer to the conceptual difference (i.e. they have different semantic meaning) or to the mathematical difference (i.e. use subtraction for the two). Instead, just remove the sentence since the equations below make it clear what we're supposed to do here. --- docs/draft-alakuijala-brotli-07.nroff | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/draft-alakuijala-brotli-07.nroff b/docs/draft-alakuijala-brotli-07.nroff index 3fcc608..91ab2f5 100644 --- a/docs/draft-alakuijala-brotli-07.nroff +++ b/docs/draft-alakuijala-brotli-07.nroff @@ -1277,10 +1277,9 @@ At any given point during decoding the compressed data, a reference to a duplicated string in the uncompressed data produced so far has a maximum backward distance value, which is the minimum of the window size and the number of uncompressed bytes produced. However, decoding a distance -from the compressed stream, as described in section 4, can produce -distances that are greater than this maximum allowed value. 
The -difference between these distances and the first invalid distance -value is treated as reference to a word in the static dictionary +from the compressed stream, as described in Section 4., can produce +distances that are greater than this maximum allowed value. In this case, +the distance is treated as a reference to a word in the static dictionary given in Appendix A. The copy length for a static dictionary reference must be between 4 and 24. The static dictionary has three parts: @@ -1325,7 +1324,7 @@ follows: The string copied to the uncompressed stream is computed by applying the transformation to the base dictionary word. If transform_id is -greater than 120 or length is smaller than 4 or greater than 24, then +greater than 120, or the length is smaller than 4 or greater than 24, then the compressed stream should be rejected as invalid. Each word transformation has the following form: From 542a8b776e4e0e84372bc8bba3e2809685d1c858 Mon Sep 17 00:00:00 2001 From: Joe Tsai Date: Thu, 29 Oct 2015 09:50:19 -0700 Subject: [PATCH 5/5] Clarify Section 7.3 * Acknowledge the fact that the context map is conceptually really a two-dimensional matrix with 2 different keys, but in reality stored as a one-dimensional array. * Mention that InverseMoveToFrontTransform will not cause the context map to have invalid indexes. This gives someone implementing a decoder sanity that they do not have to go through the context map again and check that all values are less than NTREES. --- docs/draft-alakuijala-brotli-07.nroff | 29 ++++++++++++++++++--------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/docs/draft-alakuijala-brotli-07.nroff b/docs/draft-alakuijala-brotli-07.nroff index 91ab2f5..4310dbc 100644 --- a/docs/draft-alakuijala-brotli-07.nroff +++ b/docs/draft-alakuijala-brotli-07.nroff @@ -1053,10 +1053,10 @@ implicit zero. 7. 
Context modeling As described in Section 2, the prefix tree used to encode a literal -byte or a distance code depends on the context ID and the block type. +byte or a distance code depends on the block type and the context ID. This section specifies how to compute the context ID for a particular literal and distance code, and how to encode the context map that -maps a <context ID, block type> pair to the index of a prefix +maps a <block type, context ID> pair to the index of a prefix code in the array of literal and distance prefix codes. .ti 0 @@ -1190,15 +1190,21 @@ context map is an integer between 0 and 255, indicating the index of the prefix code to be used when encoding the next literal or distance. -The context map is encoded as a one-dimensional array, -CMAPL[0..(64 * NBLTYPESL - 1)] and CMAPD[0..(4 * NBLTYPESD - 1)]. +The context maps are two-dimensional matrices, encoded as +one-dimensional arrays: + +.nf + CMAPL[0..(64 * NBLTYPESL - 1)] + CMAPD[0..(4 * NBLTYPESD - 1)] +.fi The index of the prefix code for encoding a literal or distance -code with context ID, CIDx, and block type, BTYPE_x, is: +code with block type, BTYPE_x, and context ID, CIDx, is: +.nf index of literal prefix code = CMAPL[64 * BTYPE_L + CIDL] - index of distance prefix code = CMAPD[4 * BTYPE_D + CIDD] +.fi The values of the context map are encoded with the combination of run length encoding for zero values and prefix coding. Let @@ -1245,11 +1251,11 @@ for literal and distance context maps): .fi Note that RLEMAX may be larger than the value necessary to represent -the longest sequence of zero values. +the longest sequence of zero values. Also, the NTREES value is encoded +right before the context map as described in Section 9.2. -For the encoding of NTREES see Section 9.2. 
We define the -inverse move-to-front transform used in this specification by the -following C language function: +We define the inverse move-to-front transform used in this specification +by the following C language function: .nf void InverseMoveToFrontTransform(uint8_t* v, int v_len) { @@ -1270,6 +1276,9 @@ following C language function: } .fi +Note that the inverse move-to-front transform will not produce values +outside the [0..NTREES-1] interval. + .ti 0 8. Static dictionary