* At the beginning of the simple prefix code section, telling us that "a value
of 1 indicates the number of leading zeros" is not very helpful. Instead, it
should indicate that it means a complex prefix code and point the reader to
the relevant section (which repeats this information in more detail)
* Clearly indicate that reusing a value is an error! This seems to be the
behavior of the reference implementation.
* Clarify what the termination conditions are while reading the prefix codes.
Also, indicate that it is an error if the prefix tree is over-subscribed or
under-subscribed.
* Clearly state the maximum number of individual symbols that may be
read. This forbids a stream that continually says that
the symbols have zero length.
* In the description about "three categories", explicitly number them instead
of using a giant paragraph that is harder to follow.
* Switch lists of items to consistently use American style commas. The American
style of list is clearer. Consider the following:
-Each category of value (insert and copy lengths, literals and distances)
+Each category of value (insert and copy lengths, literals, and distances)
* Make sure not to break a hyphenated phrase with a newline. When the nroff
file is processed, "insert-\nand-copy" becomes "insert- and-copy", making it
inconsistent with other uses of the hyphenated phrase.
* Consistently use the same hyphenated phrase if referred to as a single unit.
"insert and copy" -> "insert-and-copy"
"least significant" -> "least-significant"
"most significant" -> "most-significant"
"fixed length" -> "fixed-length"
"block switch" -> "block-switch".
* Consistently use "indexes" instead of "indices"
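The over-subscribed and under-subscribed conditions mentioned in the list above can be illustrated with a Kraft-sum check over the code lengths. This is a minimal sketch, not the reference implementation: the function name is hypothetical, the maximum code length of 15 is an assumption, and the single-symbol special case is not handled.

```python
def classify_code_lengths(lengths, max_len=15):
    """Classify prefix code lengths; a length of 0 marks an unused symbol.

    max_len=15 is an assumed upper bound on code length.
    """
    if any(n < 0 or n > max_len for n in lengths):
        raise ValueError("code length out of range")
    space = 1 << max_len  # total code space at the deepest level
    # A symbol of length n occupies space >> n slots of that space.
    used = sum(space >> n for n in lengths if n > 0)
    if used > space:
        return "over-subscribed"
    if used < space:
        return "under-subscribed"
    return "complete"
```

A decoder that keeps this running sum while reading lengths can stop as soon as the sum reaches the full code space, which is one possible way to express a termination condition.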
Many of the fields are copy-pastes of each other, but differ slightly
in placement of words, capitalization, or other random
oddities. This commit makes it so that if you simply do a
search-and-replace on the following passages, you get the same thing:
s/NBLTYPESX/(NBLTYPESI|NBLTYPESL|NBLTYPESD)/g
s/CATEGORY/(insert-and-copy|literal|distance)/g
>>>
1-11 bits: NBLTYPESX, # of CATEGORY block types, encoded
with the same variable length code as above
Prefix code over the block type code alphabet for
CATEGORY block types, appears only if NBLTYPESX >= 2
Prefix code over the block count code alphabet for
CATEGORY block counts, appears only if NBLTYPESX >= 2
Block count code + Extra bits for first CATEGORY
block count, appears only if NBLTYPESX >= 2
<<<
>>>
Block type code for next CATEGORY block type, appears
only if NBLTYPESX >= 2 and the previous CATEGORY
block count is zero
Block count code + extra bits for next CATEGORY
block count, appears only if NBLTYPESX >= 2 and the
previous CATEGORY block count is zero
<<<
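The consistency property described above can be checked mechanically by running the substitutions in reverse: map each concrete field description back to the generic placeholders and compare it with the template. The sketch below is only an illustration; the `normalize` helper is hypothetical and not part of any tooling in this repository.

```python
import re

def normalize(text):
    """Map a concrete field description back to the generic template form."""
    # NBLTYPESI / NBLTYPESL / NBLTYPESD -> NBLTYPESX
    text = re.sub(r"NBLTYPES[ILD]\b", "NBLTYPESX", text)
    # insert-and-copy / literal / distance -> CATEGORY (case-sensitive on purpose)
    text = re.sub(r"\b(insert-and-copy|literal|distance)\b", "CATEGORY", text)
    # Collapse line breaks and runs of spaces so wrapping does not matter.
    return " ".join(text.split())
```

Two passages are consistent when their normalized forms are equal. A stray capitalization such as "Distance" is left unreplaced and therefore shows up as a mismatch.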
* Acknowledge the fact that the context map is conceptually really a
two-dimensional matrix with 2 different keys, but in reality stored
as a one-dimensional array.
* Mention that InverseMoveToFrontTransform will not cause the
context map to have invalid indexes. This gives someone implementing
a decoder sanity that they do not have to go through the context
map again and check that all values are less than NTREES.
* The phrase "difference between these distances" can either refer to
the conceptual difference (i.e. they have different semantic meaning)
or to the mathematical difference (i.e. use subtraction of the two).
Instead, just remove the sentence since the equations below make it
clear what we're supposed to do here.
* This value is useful in implementing the decoder since we can know
ahead-of-time what size buffer is needed to contain the output of a
transformed word.
* Rather than say "lower 3 bits" in one sentence and "bits 3-5" in
the sentence right after, just consistently use the same convention
and say "0-2" and "3-5".
* Provide an exhaustive list of all the ways the last two bytes can be
sourced.
* Also make a clear connection in this section that there are only 64
context IDs for literals. This is important for the indexing math
in context maps to make sense.
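The relationship between the two-dimensional view of the literal context map and its one-dimensional storage can be sketched as follows. The function names are hypothetical, and the row-major layout (block type as the row, context ID as the column) is an assumption made for illustration; only the figure of 64 literal context IDs comes from the text above.

```python
LITERAL_CONTEXT_IDS = 64  # context IDs per literal block type

def context_map_index(block_type, context_id,
                      ids_per_block_type=LITERAL_CONTEXT_IDS):
    """Flat index for the (block type, context ID) key pair."""
    if block_type < 0:
        raise ValueError("block type must be non-negative")
    if not 0 <= context_id < ids_per_block_type:
        raise ValueError("context ID out of range")
    # Assumed row-major layout: one row of context IDs per block type.
    return block_type * ids_per_block_type + context_id

def flatten(matrix):
    """Store a block-type-by-context-ID matrix as a one-dimensional list."""
    return [tree for row in matrix for tree in row]
```

With this layout, `flatten(matrix)[context_map_index(b, c)]` and `matrix[b][c]` name the same prefix tree index.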
If bit-orderings are to be parsed from left-to-right,
then make the bit-strings left-justified.
If bit-orderings are to be parsed from right-to-left,
then make the bit-strings right-justified.
Section 3.1, which describes how prefix codes work,
shows prefix codes that are "left-to-right", which
is better for demonstrating how they work. However,
most of the rest of the document uses a "right-to-left"
convention. We should distinctly say at the end of
section 3.1 that we are switching conventions.
Thus, change the prefix code in section 3.5 to be
"right-to-left" to be consistent with sections 9.1
and 9.2.
Also, change the variable names in section 7.3 to
be consistent with those used in section 10.
Also, change the description of MNIBBLES to be
"MNIBBLES - 4", similar to the convention of saying
"MLEN - 1". Beforehand, the phrase
"If MNIBBLES is 0, then ..." was unclear whether it
meant MNIBBLES before the "plus 4" or after.
Fixed minor whitespacing issues that caused print-out to be slightly
confusing. Biggest change is in section 9.2, where an indent seemed
to indicate that some fields were part of the previous field, when
they were not related.
Also, changed the order that transforms are described in section 8
to match the enumeration values that are explicitly defined in
Appendix B.
This is a partially backward incompatible format change,
that makes previously valid brotli streams that contain
larger than 16MB meta-blocks invalid.
The impact of this should be minimal, since the 'bro'
command-line tool does not create larger than 2MB
meta-blocks, so the only streams this change could
break are those created by a custom brotli encoder.
This commit contains only the specification update;
implementation in the decoder and encoder will
follow in later commits.
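The 16MB figure can be related to the length field with a little arithmetic. The sketch below assumes that the length is stored as MLEN - 1 in at most 6 four-bit nibbles; that nibble count is an assumption for illustration and is not stated in this message, and the function name is hypothetical.

```python
def max_mlen(nibbles):
    """Largest MLEN when MLEN - 1 is stored in `nibbles` 4-bit nibbles."""
    max_stored = (1 << (4 * nibbles)) - 1  # all bits of the field set
    return max_stored + 1                  # undo the "minus 1" encoding

# Under the 6-nibble assumption the limit is 2**24 bytes.
assert max_mlen(6) == 16 * 1024 * 1024
```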
In the following three cases we allow more choices
for the compressor, which can potentially lead to
fewer compressed bits.
(1) Allow brotli streams where the block counts
do not count down to exactly zero at the end
of the meta-block. This makes it possible
for compressors to sometimes choose a block
count which can be represented with fewer bits
than the exact block count.
(2) Remove the restriction that prefix code
descriptions with exactly one non-zero
length symbol in the code length alphabet
must have 1 bit depth. This is because
bit depth 1 requires the most bits to encode.
(3) Allow any copy length value in the last
command where the copy part is ignored.
This makes it possible for a compressor
to choose a copy length which can be
represented with the fewest bits.
In addition to the changes above, this commit also
has a wording clarification in the overview section
where the use of the 'context ID' expression is
changed to be consistent with the rest of the
specification, i.e. that it is a function of the
last two literals or the copy length.