Use consistent sentence spacing in the specification.

All sentence spacing was changed to one space, except
in the boilerplate, which must be preserved verbatim.
Zoltan Szabadka 2015-04-22 11:55:29 +02:00
parent 0c1a222159
commit 206d067c4a
2 changed files with 89 additions and 89 deletions
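The commit message describes a purely mechanical edit: each two-space gap after a sentence-ending period becomes one space, except in boilerplate. A minimal Python sketch of such a normalization follows. It is a hypothetical illustration, not the tool used for this commit. The regular expression (a period, two or more spaces, then a non-space character) and the `protected` set of boilerplate line indices are assumptions.

```python
import re
from typing import AbstractSet, List

# Assumed shape of a double sentence space: a period, two or more spaces,
# then a non-space character (so trailing whitespace is left alone).
DOUBLE_SPACE_AFTER_PERIOD = re.compile(r"\. {2,}(?=\S)")


def normalize_sentence_spacing(lines: List[str],
                               protected: AbstractSet[int] = frozenset()) -> List[str]:
    """Collapse double sentence spacing to one space.

    Lines whose index is in `protected` (hypothetical boilerplate markers)
    are copied verbatim.
    """
    return [line if i in protected else DOUBLE_SPACE_AFTER_PERIOD.sub(". ", line)
            for i, line in enumerate(lines)]


if __name__ == "__main__":
    before = [
        "1.5.  Definitions of terms and conventions used",
        "are taken to be zero.  (An HSKIP of 1 indicates a",
        "Boilerplate sentence.  Kept verbatim.",
    ]
    for line in normalize_sentence_spacing(before, protected={2}):
        print(line)
```

A real conversion would also have to leave no-fill regions such as diagrams alone; the sketch leaves that decision to the caller through `protected`.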

@@ -149,7 +149,7 @@ produce data sets that conform to all the specifications presented
here.
.ti 0
-1.5.  Definitions of terms and conventions used
+1.5. Definitions of terms and conventions used
Byte: 8 bits stored or transmitted as a unit (same as an octet).
For this specification, a byte is exactly 8 bits, even on machines
@@ -159,11 +159,11 @@ See below for the numbering of bits within a byte.
String: a sequence of arbitrary bytes.
Bytes stored within a computer do not have a "bit order", since
-they are always treated as a unit.  However, a byte considered as
+they are always treated as a unit. However, a byte considered as
an integer between 0 and 255 does have a most- and least-
significant bit, and since we write numbers with the most-
significant digit on the left, we also write bytes with the most-
-significant bit on the left.  In the diagrams below, we number the
+significant bit on the left. In the diagrams below, we number the
bits of a byte so that bit 0 is the least-significant bit, i.e.,
the bits are numbered:
@@ -173,7 +173,7 @@ the bits are numbered:
+--------+
.fi
-Within a computer, a number may occupy multiple bytes.  All
+Within a computer, a number may occupy multiple bytes. All
multi-byte numbers in the format described here are stored with
the least-significant byte first (at the lower memory address).
For example, the decimal number 520 is stored as:
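The context here says multi-byte numbers are stored least-significant byte first and uses 520 as the example. Since 520 = 2 * 256 + 8 (0x0208), the byte at the lower address is 0x08 and the next one is 0x02. A short Python check of that layout, using the built-in `int.to_bytes` and `int.from_bytes` with `byteorder="little"` (the helper names are illustrative):

```python
def store_le(value: int, width: int) -> bytes:
    """Encode `value` as `width` bytes, least-significant byte first."""
    return value.to_bytes(width, byteorder="little")


def load_le(data: bytes) -> int:
    """Decode bytes stored least-significant byte first."""
    return int.from_bytes(data, byteorder="little")


if __name__ == "__main__":
    encoded = store_le(520, 2)
    print([hex(b) for b in encoded])  # ['0x8', '0x2']
    print(load_le(encoded))           # 520
```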
@@ -195,9 +195,9 @@ For example, the decimal number 520 is stored as:
This document does not address the issue of the order in which
bits of a byte are transmitted on a bit-sequential medium,
since the final data format described here is byte- rather than
-bit-oriented.  However, we describe the compressed block format
+bit-oriented. However, we describe the compressed block format
below as a sequence of data elements of various bit
-lengths, not a sequence of bytes.  We must therefore specify
+lengths, not a sequence of bytes. We must therefore specify
how to pack these data elements into bytes to form the final
compressed byte sequence:
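The packing rules that follow this sentence are not part of the diff. As a hedged sketch, assuming the conventions Brotli shares with DEFLATE (RFC 1951), elements fill each byte starting at its least-significant bit, and integer-valued elements are written starting with their own least-significant bit. The Python classes and helpers below are hypothetical; prefix codes, which are packed starting with the most-significant bit of the code, are deliberately not handled.

```python
from typing import Iterable, List, Tuple


class BitWriter:
    """Packs unsigned integer elements into bytes, least-significant bit first."""

    def __init__(self) -> None:
        self.out = bytearray()
        self.bit_pos = 0  # bits already used in the last byte (0..7)

    def write_bits(self, value: int, nbits: int) -> None:
        if not 0 <= value < (1 << nbits):
            raise ValueError("value does not fit in nbits")
        for i in range(nbits):
            if self.bit_pos == 0:
                self.out.append(0)  # start a new byte
            # Take the element's bits from its LSB upward and place them at
            # increasing bit numbers within the current byte.
            self.out[-1] |= ((value >> i) & 1) << self.bit_pos
            self.bit_pos = (self.bit_pos + 1) % 8


class BitReader:
    """Reads back elements written by BitWriter."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read_bits(self, nbits: int) -> int:
        value = 0
        for i in range(nbits):
            bit = (self.data[self.pos // 8] >> (self.pos % 8)) & 1
            value |= bit << i
            self.pos += 1
        return value


def pack(elements: Iterable[Tuple[int, int]]) -> bytes:
    """Pack (value, nbits) pairs in order."""
    writer = BitWriter()
    for value, nbits in elements:
        writer.write_bits(value, nbits)
    return bytes(writer.out)


def unpack(data: bytes, widths: Iterable[int]) -> List[int]:
    """Read back elements of the given bit widths, in order."""
    reader = BitReader(data)
    return [reader.read_bits(n) for n in widths]


if __name__ == "__main__":
    packed = pack([(5, 3), (3, 5), (2, 2)])
    print(packed.hex())               # 1d02
    print(unpack(packed, [3, 5, 2]))  # [5, 3, 2]
```

Writing 5 in 3 bits and then 3 in 5 bits fills the first byte as 0b00011101 (0x1D): the first element occupies bits 0 to 2 and the second occupies bits 3 to 7.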
@@ -227,18 +227,18 @@ relative LSB position).
A compressed data set consists of a header and a series of meta-
blocks. Each meta-block decompresses to a sequence of 1
-to 268,435,456 (256 MiB) uncompressed bytes.  The final uncompressed data is
+to 268,435,456 (256 MiB) uncompressed bytes. The final uncompressed data is
the concatenation of the uncompressed sequences from each meta-block.
The header contains the size of the sliding window that was used during compression.
The decompressor must retain at least that amount of uncompressed data prior to the
current position in the stream, in order to be able to decompress
-what follows.  The sliding window size is a power of two, minus 16, where
-the power is in the range of 16 to 24.  The possible sliding window
+what follows. The sliding window size is a power of two, minus 16, where
+the power is in the range of 16 to 24. The possible sliding window
sizes range from 64 KiB - 16 B to 16 MiB - 16 B.
Each meta-block is compressed using a combination of the LZ77
-algorithm (Lempel-Ziv 1977, [LZ77]) and Huffman coding.  The
+algorithm (Lempel-Ziv 1977, [LZ77]) and Huffman coding. The
result of Huffman coding is referred to here as a prefix code.
The prefix codes for each meta-block are independent of
those for previous or subsequent meta-blocks; the LZ77 algorithm may
@@ -253,8 +253,8 @@ compressed data part. The compressed data consists of a series of
commands. Each command consists of two parts: a sequence of literal
bytes (of strings that have not been detected as duplicated within
the sliding window), and a pointer to a duplicated string,
-represented as a pair <length, backward distance>.  There can be
-zero literal bytes in the command.  The minimum length of the string to be
+represented as a pair <length, backward distance>. There can be
+zero literal bytes in the command. The minimum length of the string to be
duplicated is two, but the last command in the meta-block is permitted to have
only literals and no pointer to a string to duplicate.
@@ -265,9 +265,9 @@ copy lengths (that is, a single code word represents two lengths,
one of the literal sequence and one of the backward copy), a separate
set of prefix codes are for literals, and a third set of prefix codes are for
distances. The prefix code descriptions for each meta-block appear in a compact
-form just before the compressed data in the meta-block header.  The insert and
+form just before the compressed data in the meta-block header. The insert and
copy length and distance prefix codes may be followed by extra bits that are
-added to the base values determined by the codes.  The number of extra bits is
+added to the base values determined by the codes. The number of extra bits is
determined by the code.
One meta-block command then appears as a sequence of prefix codes:
@@ -276,17 +276,17 @@ One meta-block command then appears as a sequence of prefix codes:
where the insert and copy defines the number of literals that immediately
follow and the copy length, and the distance defines how far back to go
-for the copy, used in combination with the copy length.  The resulting
+for the copy, used in combination with the copy length. The resulting
uncompressed data is the sequence of bytes:
literal, literal, ..., literal, copy, copy, ..., copy
where the number of literal bytes and copy bytes are determined by the
-insert and copy length code.  (The number of bytes copied for a static
+insert and copy length code. (The number of bytes copied for a static
dictionary entry can vary from the copy length.)
The last command in the meta-block may end with the last literal if the
-total uncompressed length of the meta-block has been satisfied.  In
+total uncompressed length of the meta-block has been satisfied. In
that case there is no distance in the last command, and the copy length is
ignored.
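The hunk above describes a command as a run of literals followed by a backward copy of copy-length bytes starting distance bytes back, and notes that the last command may end after its literals. A minimal Python sketch of applying such commands to an output buffer follows. The function names are hypothetical, and the sketch assumes every distance points into bytes already produced: it ignores the static dictionary and all prefix and extra-bit decoding.

```python
from typing import Iterable, Optional, Tuple


def apply_command(out: bytearray, literals: bytes, copy_length: int,
                  distance: Optional[int]) -> None:
    """Append one command's output: literals, then a backward copy.

    distance=None models a final command that ends after its literals;
    its copy length is then ignored.
    """
    out += literals
    if distance is None:
        return
    if not 1 <= distance <= len(out):
        # Assumption: references beyond the produced data (such as static
        # dictionary entries) are out of scope for this sketch.
        raise ValueError("distance outside the produced data")
    start = len(out) - distance
    # Byte-by-byte copy, so a copy longer than its distance repeats the
    # bytes it has just produced.
    for i in range(copy_length):
        out.append(out[start + i])


def run_commands(commands: Iterable[Tuple[bytes, int, Optional[int]]]) -> bytes:
    """Apply (literals, copy_length, distance) commands in order."""
    out = bytearray()
    for literals, copy_length, distance in commands:
        apply_command(out, literals, copy_length, distance)
    return bytes(out)


if __name__ == "__main__":
    print(run_commands([(b"abc", 4, 3), (b"xyz", 7, None)]))  # b'abcabcaxyz'
```

The byte-by-byte loop is the usual LZ77 choice: a slice copy would not reproduce overlapping matches such as a copy of length 4 from distance 3.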
@@ -294,9 +294,9 @@ There can be more than one prefix code for each category, where the
prefix code to use for the next element of that category is determined
by the context of the compressed stream that precedes that element.
Part of that context is three current block types, one for each
-category.  A block type is in the range of 0..255.  For each category
+category. A block type is in the range of 0..255. For each category
there is a count of how many elements of that category remain to be
-decoded using the current block type.  Once that count is expended,
+decoded using the current block type. Once that count is expended,
a new block type and block count is read from the stream immediately
preceding the next element of that category, which will use the new
block type.
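The hunk above explains that each category keeps a current block type and a count of elements left in it, and that a new block type and count are read just before the next element once the count is expended. A Python sketch for a single category follows. It replaces real stream parsing with a hypothetical, already-decoded list of (block_type, count) switches. The first block has type 0 and its count is passed separately, mirroring the later statement in this diff that the first block count comes from the meta-block header; the final check reflects the rule, also later in the diff, that block counts must reach exactly zero at the end of the meta-block.

```python
from typing import List, Sequence, Tuple


def assign_block_types(num_elements: int, first_count: int,
                       switches: Sequence[Tuple[int, int]]) -> List[int]:
    """Return the block type used for each element of one category.

    The first block has type 0 and length `first_count`; `switches` is a
    hypothetical pre-decoded list of (block_type, count) pairs in stream order.
    """
    types = []
    pending = iter(switches)
    current_type, remaining = 0, first_count
    for _ in range(num_elements):
        if remaining == 0:
            # Count expended: the next block switch precedes this element.
            try:
                current_type, remaining = next(pending)
            except StopIteration:
                raise ValueError("no block switch available: invalid stream") from None
            if not 0 <= current_type <= 255:
                raise ValueError("block type outside 0..255")
            if remaining < 1:
                # Assumption: a block covers at least one element.
                raise ValueError("block count must be positive")
        types.append(current_type)
        remaining -= 1
    if remaining != 0:
        raise ValueError("block count did not reach zero: invalid stream")
    return types
```

For the insert-and-copy example in the text, [IaC0, IaC1][IaC2, IaC3], `assign_block_types(4, 2, [(1, 2)])` returns `[0, 0, 1, 1]`.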
@@ -316,7 +316,7 @@ The meta-block here has four commands, contained in parentheses for clarity,
where each of the three categories of
symbols within these commands can be interpreted using different block types.
Here we separate out each category as its own sequence to show an example of block
-types assigned to those elements.  Each square-bracketed group is a block that
+types assigned to those elements. Each square-bracketed group is a block that
uses the same block type:
[IaC0, IaC1][IaC2, IaC3] <-- insert-and-copy: block types 0 and 1
@@ -343,8 +343,8 @@ meta-block is then:
where *BlockSwitch(t, n) switches to block type t for a count of n elements.
Note that in this example DBlockSwitch(1, 3) immediately precedes the
-next required distance D1.  It does not follow the last distance of
-the previous block, D0.  Whenever an element of a category is needed,
+next required distance D1. It does not follow the last distance of
+the previous block, D0. Whenever an element of a category is needed,
and the block count for that category has reached zero, then a new
block type and count is read from the stream just before reading that next
element.
@@ -377,7 +377,7 @@ in the meta-block header,
and the two uncompressed bytes that were decoded from L0 and L1.
Similarly, the prefix code to use to decode D0 depends on the block
type (0), the distance context ID for block type 0, and the copy
-length decoded from IaC0.  The prefix code to use to decode IaC3
+length decoded from IaC0. The prefix code to use to decode IaC3
depends only on the block type (1).
In addition to the parts listed above (prefix code for insert-
@@ -391,7 +391,7 @@ A compressed meta-block may be marked in the header as the last meta-block,
which terminates the compressed stream.
A meta-block may instead simply store the uncompressed data directly as
-bytes on byte boundaries with no coding or matching strings.  In this
+bytes on byte boundaries with no coding or matching strings. In this
case the meta-block header information only contains the number of
uncompressed bytes and the indication that the meta-block is uncompressed.
An uncompressed meta-block cannot be the last meta-block.
@@ -417,7 +417,7 @@ edges descending from each non-leaf node are labeled 0 and 1 and
in which the leaf nodes correspond one-for-one with (are labeled
with) the symbols of the alphabet; then the code for a symbol is
the sequence of 0's and 1's on the edges leading from the root to
-the leaf labeled with that symbol.  For example:
+the leaf labeled with that symbol. For example:
.nf
.KS
@@ -492,7 +492,7 @@ from most- to least-significant bit. The code lengths are
initially in tree[I].Len; the codes are produced in tree[I].Code.
.nf
-1) Count the number of codes for each code length.  Let
+1) Count the number of codes for each code length. Let
bl_count[N] be the number of codes of length N, N >= 1.
2) Find the numerical value of the smallest code for each
@@ -527,7 +527,7 @@ initially in tree[I].Len; the codes are produced in tree[I].Code.
Example:
Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3, 3,
-2, 4, 4).  After step 1, we have:
+2, 4, 4). After step 1, we have:
.nf
.KS
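Steps 1 and 2 quoted in these hunks (count the codes of each length in bl_count[N], then find the smallest code value for each length), together with the assignment step that follows them in the full text, build a canonical prefix code from the code lengths alone. The Python sketch below assumes the step 3 rule carried over from DEFLATE (RFC 1951): codes of equal length take consecutive values in symbol order. The function name is hypothetical.

```python
from typing import Dict, List


def canonical_codes(lengths: List[int]) -> Dict[int, str]:
    """Assign canonical prefix codes from code lengths.

    Returns a map from symbol index to its code as a bit string, most
    significant bit first. Symbols with length 0 get no code.
    """
    max_len = max(lengths, default=0)
    # Step 1: bl_count[N] = number of codes of length N (N >= 1).
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n > 0:
            bl_count[n] += 1
    # Step 2: smallest code value for each code length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: consecutive values for codes of the same length, in symbol order.
    codes = {}
    for symbol, n in enumerate(lengths):
        if n > 0:
            codes[symbol] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes


if __name__ == "__main__":
    alphabet = "ABCDEFGH"
    codes = canonical_codes([3, 3, 3, 3, 3, 2, 4, 4])
    for symbol in sorted(codes):
        print(alphabet[symbol], codes[symbol])
```

For the ABCDEFGH example with bit lengths (3, 3, 3, 3, 3, 2, 4, 4), the sketch assigns F = 00, A = 010, B = 011, C = 100, D = 101, E = 110, G = 1110 and H = 1111.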
@@ -605,7 +605,7 @@ The value of ALPHABET_BITS depends on the alphabet of the prefix
code: it is the smallest number of bits that can represent all
symbols in the alphabet. E.g. for the alphabet of literal bytes,
ALPHABET_BITS is 8. The value of each of the NSYM symbols above is
-the value of the ALPHABETS_BITS width integer value.  (If the integer
+the value of the ALPHABETS_BITS width integer value. (If the integer
value is greater than or equal to the alphabet size, then the stream
should be rejected as invalid.)
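This hunk defines ALPHABET_BITS as the smallest number of bits that can represent every symbol of the alphabet (8 for literal bytes) and requires rejecting a symbol value greater than or equal to the alphabet size. A Python sketch of both rules follows; the function names are hypothetical, and `read_bits` is an assumed callable returning the next n bits as an integer, standing in for a real bit reader.

```python
from typing import Callable, List


def alphabet_bits(alphabet_size: int) -> int:
    """Smallest number of bits that can represent symbols 0..alphabet_size-1."""
    if alphabet_size < 1:
        raise ValueError("alphabet must not be empty")
    return (alphabet_size - 1).bit_length()


def read_symbols(read_bits: Callable[[int], int], nsym: int,
                 alphabet_size: int) -> List[int]:
    """Read NSYM symbol values, each ALPHABET_BITS wide, rejecting invalid ones."""
    width = alphabet_bits(alphabet_size)
    symbols = []
    for _ in range(nsym):
        value = read_bits(width)
        if value >= alphabet_size:
            raise ValueError("symbol value >= alphabet size: invalid stream")
        symbols.append(value)
    return symbols


if __name__ == "__main__":
    print(alphabet_bits(256))  # 8
    values = iter([65, 66])
    print(read_symbols(lambda n: next(values), 2, 256))  # [65, 66]
```

The rejection matters whenever the alphabet size is not a power of two: an alphabet of 257 symbols needs 9 bits, so the field can hold values 257 to 511 that name no symbol.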
@@ -670,8 +670,8 @@ Note that a code of 16 that follows an immediately preceding 16 modifies the
previous repeat count, which becomes the new repeat count. The same is true for
a 17 following a 17. A sequence of three or more 16 codes in a row or three of
more 17 codes in a row is possible, modifying the count each time. Only the
-final repeat count is used.  The modification only applies if the same code
-follows.  A 16 repeat does not modify an immediately preceding 17 count, nor
+final repeat count is used. The modification only applies if the same code
+follows. A 16 repeat does not modify an immediately preceding 17 count, nor
vice versa.
A code length of 0 indicates that the corresponding symbol in the
@@ -702,15 +702,15 @@ follows:
.nf
2 bits: HSKIP, values of 0, 2 or 3 represent the respective
-number of skipped code lengths.  The skipped lengths
-are taken to be zero.  (An HSKIP of 1 indicates a
+number of skipped code lengths. The skipped lengths
+are taken to be zero. (An HSKIP of 1 indicates a
Simple prefix code.)
Code lengths for symbols in the code length alphabet given
just above, in the order: 1, 2, 3, 4, 0, 5, 17, 6, 16, 7,
-8, 9, 10, 11, 12, 13, 14, 15.  If HSKIP is 2, then the
+8, 9, 10, 11, 12, 13, 14, 15. If HSKIP is 2, then the
code lengths for symbols 1 and 2 are zero, and the first
-code length is for symbol 3.  If HSKIP is 3, then the code
+code length is for symbol 3. If HSKIP is 3, then the code
length for symbol 3 is also zero, and the first code length
is for symbol 4.
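The complex prefix code header quoted here stores the code length code lengths in a fixed symbol order and lets HSKIP skip the first zero, two or three of them. A Python sketch of which code length symbols are read for a given HSKIP follows; the function name is hypothetical, and any rule elsewhere in the specification that stops reading early is not modeled.

```python
from typing import List

# Storage order of the code length code lengths, as given in the text.
CODE_LENGTH_ORDER = [1, 2, 3, 4, 0, 5, 17, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15]


def symbols_read_for_hskip(hskip: int) -> List[int]:
    """Code length alphabet symbols whose lengths are read, in stream order.

    The first `hskip` entries of the order are skipped and their code
    lengths are taken to be zero.
    """
    if hskip == 1:
        raise ValueError("HSKIP of 1 indicates a simple prefix code")
    if hskip not in (0, 2, 3):
        raise ValueError("HSKIP is a 2-bit field")
    return CODE_LENGTH_ORDER[hskip:]


if __name__ == "__main__":
    for hskip in (0, 2, 3):
        print(hskip, symbols_read_for_hskip(hskip)[:3])
```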
@@ -732,16 +732,16 @@ follows:
If the lengths have been read for the entire code length
alphabet and there was only one non-zero code length,
then the prefix code has one symbol whose code has zero
-length.  In this case, that symbol results in no bits
+length. In this case, that symbol results in no bits
being emitted by the compressor, and no bits consumed by
-the decompressor.  That single symbol is immediately
-returned when this code is decoded.  (If the ignored non-
+the decompressor. That single symbol is immediately
+returned when this code is decoded. (If the ignored non-
zero length is not 1, then the stream should be rejected
as invalid.) An example of where this occurs is if the
entire code to be represented has symbols of length 8.
E.g. a literal code that represents all literal values
-with equal probability.  In this case the single symbol
-is 16, which repeats the previous length.  The previous
+with equal probability. In this case the single symbol
+is 16, which repeats the previous length. The previous
length is taken to be 8 before any code length code
lengths are read.
@@ -966,11 +966,11 @@ meta-block header.
Since the first block type of each block category is 0, the block
type of the first block switch command is not encoded in
-the compressed data.  Instead the block count for each category
+the compressed data. Instead the block count for each category
that has more than one type is encoded in the meta-block header.
The block counts for all three categories should count down to exactly
-zero at the end of the meta-block.  If any do not, then the stream
+zero at the end of the meta-block. If any do not, then the stream
should be rejected as invalid.
The number of different block types in each block category, denoted

@@ -181,7 +181,7 @@ Internet-Draft Brotli April 2015
specifications presented here. A compliant compressor must produce
data sets that conform to all the specifications presented here.
-1.5.  Definitions of terms and conventions used
+1.5. Definitions of terms and conventions used
Byte: 8 bits stored or transmitted as a unit (same as an octet). For
this specification, a byte is exactly 8 bits, even on machines which
@@ -191,19 +191,19 @@ Internet-Draft Brotli April 2015
String: a sequence of arbitrary bytes.
Bytes stored within a computer do not have a "bit order", since they
-are always treated as a unit.  However, a byte considered as an
+are always treated as a unit. However, a byte considered as an
integer between 0 and 255 does have a most- and least- significant
bit, and since we write numbers with the most- significant digit on
the left, we also write bytes with the most- significant bit on the
-left.  In the diagrams below, we number the bits of a byte so that
-bit 0 is the least-significant bit, i.e., the bits are numbered:
+left. In the diagrams below, we number the bits of a byte so that bit
+0 is the least-significant bit, i.e., the bits are numbered:
+--------+
|76543210|
+--------+
-Within a computer, a number may occupy multiple bytes.  All multi-
-byte numbers in the format described here are stored with the least-
+Within a computer, a number may occupy multiple bytes. All multi-byte
+numbers in the format described here are stored with the least-
significant byte first (at the lower memory address). For example,
the decimal number 520 is stored as:
@@ -230,7 +230,7 @@ Internet-Draft Brotli April 2015
data format described here is byte- rather than bit-oriented.
However, we describe the compressed block format below as a sequence
-of data elements of various bit lengths, not a sequence of bytes.  We
+of data elements of various bit lengths, not a sequence of bytes. We
must therefore specify how to pack these data elements into bytes to
form the final compressed byte sequence:
@@ -256,21 +256,21 @@ Internet-Draft Brotli April 2015
A compressed data set consists of a header and a series of meta-
blocks. Each meta-block decompresses to a sequence of 1 to
-268,435,456 (256 MiB) uncompressed bytes.  The final uncompressed
-data is the concatenation of the uncompressed sequences from each
-meta-block.
+268,435,456 (256 MiB) uncompressed bytes. The final uncompressed data
+is the concatenation of the uncompressed sequences from each meta-
+block.
The header contains the size of the sliding window that was used
during compression. The decompressor must retain at least that
amount of uncompressed data prior to the current position in the
-stream, in order to be able to decompress what follows.  The sliding
+stream, in order to be able to decompress what follows. The sliding
window size is a power of two, minus 16, where the power is in the
-range of 16 to 24.  The possible sliding window sizes range from 64
+range of 16 to 24. The possible sliding window sizes range from 64
KiB - 16 B to 16 MiB - 16 B.
Each meta-block is compressed using a combination of the LZ77
-algorithm (Lempel-Ziv 1977, [LZ77]) and Huffman coding.  The result
-of Huffman coding is referred to here as a prefix code.  The prefix
+algorithm (Lempel-Ziv 1977, [LZ77]) and Huffman coding. The result of
+Huffman coding is referred to here as a prefix code. The prefix
codes for each meta-block are independent of those for previous or
subsequent meta-blocks; the LZ77 algorithm may use a reference to a
duplicated string occurring in a previous meta-block, up to the
@@ -292,8 +292,8 @@ Internet-Draft Brotli April 2015
commands. Each command consists of two parts: a sequence of literal
bytes (of strings that have not been detected as duplicated within
the sliding window), and a pointer to a duplicated string,
-represented as a pair <length, backward distance>.  There can be zero
-literal bytes in the command.  The minimum length of the string to be
+represented as a pair <length, backward distance>. There can be zero
+literal bytes in the command. The minimum length of the string to be
duplicated is two, but the last command in the meta-block is
permitted to have only literals and no pointer to a string to
duplicate.
@@ -306,9 +306,9 @@ Internet-Draft Brotli April 2015
backward copy), a separate set of prefix codes are for literals, and
a third set of prefix codes are for distances. The prefix code
descriptions for each meta-block appear in a compact form just before
-the compressed data in the meta-block header.  The insert and copy
+the compressed data in the meta-block header. The insert and copy
length and distance prefix codes may be followed by extra bits that
-are added to the base values determined by the codes.  The number of
+are added to the base values determined by the codes. The number of
extra bits is determined by the code.
One meta-block command then appears as a sequence of prefix codes:
@@ -318,12 +318,12 @@ Internet-Draft Brotli April 2015
where the insert and copy defines the number of literals that
immediately follow and the copy length, and the distance defines how
far back to go for the copy, used in combination with the copy
-length.  The resulting uncompressed data is the sequence of bytes:
+length. The resulting uncompressed data is the sequence of bytes:
literal, literal, ..., literal, copy, copy, ..., copy
where the number of literal bytes and copy bytes are determined by
-the insert and copy length code.  (The number of bytes copied for a
+the insert and copy length code. (The number of bytes copied for a
static dictionary entry can vary from the copy length.)
The last command in the meta-block may end with the last literal if
@@ -343,10 +343,10 @@ Internet-Draft Brotli April 2015
prefix code to use for the next element of that category is
determined by the context of the compressed stream that precedes that
element. Part of that context is three current block types, one for
-each category.  A block type is in the range of 0..255.  For each
+each category. A block type is in the range of 0..255. For each
category there is a count of how many elements of that category
-remain to be decoded using the current block type.  Once that count
-is expended, a new block type and block count is read from the stream
+remain to be decoded using the current block type. Once that count is
+expended, a new block type and block count is read from the stream
immediately preceding the next element of that category, which will
use the new block type.
@@ -364,7 +364,7 @@ Internet-Draft Brotli April 2015
clarity, where each of the three categories of symbols within these
commands can be interpreted using different block types. Here we
separate out each category as its own sequence to show an example of
-block types assigned to those elements.  Each square-bracketed group
+block types assigned to those elements. Each square-bracketed group
is a block that uses the same block type:
[IaC0, IaC1][IaC2, IaC3] <-- insert-and-copy: block types 0 and 1
@@ -398,11 +398,11 @@ Internet-Draft Brotli April 2015
where *BlockSwitch(t, n) switches to block type t for a count of n
elements. Note that in this example DBlockSwitch(1, 3) immediately
-precedes the next required distance D1.  It does not follow the last
-distance of the previous block, D0.  Whenever an element of a
-category is needed, and the block count for that category has reached
-zero, then a new block type and count is read from the stream just
-before reading that next element.
+precedes the next required distance D1. It does not follow the last
+distance of the previous block, D0. Whenever an element of a category
+is needed, and the block count for that category has reached zero,
+then a new block type and count is read from the stream just before
+reading that next element.
The block switch commands for the first blocks of each category are
not part of the meta-block compressed data. Instead the first block
@@ -431,7 +431,7 @@ Internet-Draft Brotli April 2015
meta-block header, and the two uncompressed bytes that were decoded
from L0 and L1. Similarly, the prefix code to use to decode D0
depends on the block type (0), the distance context ID for block type
-0, and the copy length decoded from IaC0.  The prefix code to use to
+0, and the copy length decoded from IaC0. The prefix code to use to
decode IaC3 depends only on the block type (1).
In addition to the parts listed above (prefix code for insert- and-
@@ -453,7 +453,7 @@ Internet-Draft Brotli April 2015
A meta-block may instead simply store the uncompressed data directly
-as bytes on byte boundaries with no coding or matching strings.  In
+as bytes on byte boundaries with no coding or matching strings. In
this case the meta-block header information only contains the number
of uncompressed bytes and the indication that the meta-block is
uncompressed. An uncompressed meta-block cannot be the last meta-
@@ -479,7 +479,7 @@ Internet-Draft Brotli April 2015
which the leaf nodes correspond one-for-one with (are labeled with)
the symbols of the alphabet; then the code for a symbol is the
sequence of 0's and 1's on the edges leading from the root to the
-leaf labeled with that symbol.  For example:
+leaf labeled with that symbol. For example:
/\ Symbol Code
0 1 ------ ----
@@ -549,7 +549,7 @@ Internet-Draft Brotli April 2015
significant bit. The code lengths are initially in tree[I].Len; the
codes are produced in tree[I].Code.
-1) Count the number of codes for each code length.  Let
+1) Count the number of codes for each code length. Let
bl_count[N] be the number of codes of length N, N >= 1.
2) Find the numerical value of the smallest code for each
@@ -588,7 +588,7 @@ Internet-Draft Brotli April 2015
Example:
Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3, 3, 2,
-4, 4).  After step 1, we have:
+4, 4). After step 1, we have:
N bl_count[N]
- -----------
@@ -662,7 +662,7 @@ Internet-Draft Brotli April 2015
code: it is the smallest number of bits that can represent all
symbols in the alphabet. E.g. for the alphabet of literal bytes,
ALPHABET_BITS is 8. The value of each of the NSYM symbols above is
-the value of the ALPHABETS_BITS width integer value.  (If the integer
+the value of the ALPHABETS_BITS width integer value. (If the integer
value is greater than or equal to the alphabet size, then the stream
should be rejected as invalid.)
@@ -740,8 +740,8 @@ Internet-Draft Brotli April 2015
count. The same is true for a 17 following a 17. A sequence of three
or more 16 codes in a row or three of more 17 codes in a row is
possible, modifying the count each time. Only the final repeat count
-is used.  The modification only applies if the same code follows.  A
-16 repeat does not modify an immediately preceding 17 count, nor vice
+is used. The modification only applies if the same code follows. A 16
+repeat does not modify an immediately preceding 17 count, nor vice
versa.
A code length of 0 indicates that the corresponding symbol in the
@@ -765,15 +765,15 @@ Internet-Draft Brotli April 2015
We can now define the format of the complex prefix code as follows:
2 bits: HSKIP, values of 0, 2 or 3 represent the respective
-number of skipped code lengths.  The skipped lengths
-are taken to be zero.  (An HSKIP of 1 indicates a
+number of skipped code lengths. The skipped lengths
+are taken to be zero. (An HSKIP of 1 indicates a
Simple prefix code.)
Code lengths for symbols in the code length alphabet given
just above, in the order: 1, 2, 3, 4, 0, 5, 17, 6, 16, 7,
-8, 9, 10, 11, 12, 13, 14, 15.  If HSKIP is 2, then the
+8, 9, 10, 11, 12, 13, 14, 15. If HSKIP is 2, then the
code lengths for symbols 1 and 2 are zero, and the first
-code length is for symbol 3.  If HSKIP is 3, then the code
+code length is for symbol 3. If HSKIP is 3, then the code
length for symbol 3 is also zero, and the first code length
is for symbol 4.
@@ -803,16 +803,16 @@ Internet-Draft Brotli April 2015
If the lengths have been read for the entire code length
alphabet and there was only one non-zero code length,
then the prefix code has one symbol whose code has zero
-length.  In this case, that symbol results in no bits
+length. In this case, that symbol results in no bits
being emitted by the compressor, and no bits consumed by
-the decompressor.  That single symbol is immediately
-returned when this code is decoded.  (If the ignored non-
+the decompressor. That single symbol is immediately
+returned when this code is decoded. (If the ignored non-
zero length is not 1, then the stream should be rejected
as invalid.) An example of where this occurs is if the
entire code to be represented has symbols of length 8.
E.g. a literal code that represents all literal values
-with equal probability.  In this case the single symbol
-is 16, which repeats the previous length.  The previous
+with equal probability. In this case the single symbol
+is 16, which repeats the previous length. The previous
length is taken to be 8 before any code length code
lengths are read.
@@ -1075,11 +1075,11 @@ Internet-Draft Brotli April 2015
Since the first block type of each block category is 0, the block
type of the first block switch command is not encoded in the
-compressed data.  Instead the block count for each category that has
+compressed data. Instead the block count for each category that has
more than one type is encoded in the meta-block header.
The block counts for all three categories should count down to
-exactly zero at the end of the meta-block.  If any do not, then the
+exactly zero at the end of the meta-block. If any do not, then the
stream should be rejected as invalid.
The number of different block types in each block category, denoted