@@ -97,6 +97,42 @@ to decode all concatenated frames in their sequential order,
delivering the final decompressed result as if it were a single content.
Skippable Frames
----------------
| `Magic_Number` | `Frame_Size` | `User_Data` |
|:--------------:|:------------:|:-----------:|
| 4 bytes | 4 bytes | n bytes |
Skippable frames allow the insertion of user-defined data
into a flow of concatenated frames.
Their design is pretty straightforward,
with the sole objective to allow the decoder to quickly skip
over user-defined data and continue decoding.
Skippable frames defined in this specification are compatible with [LZ4] ones.
[LZ4]:http://www.lz4.org
__`Magic_Number`__
4 Bytes, little-endian format.
Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F.
All 16 values are valid to identify a skippable frame.
__`Frame_Size`__
This is the size, in bytes, of the following `User_Data`
(without including the magic number nor the size field itself).
This field is represented using 4 Bytes, little-endian format, unsigned 32-bits.
This means `User_Data` can't be bigger than (2^32-1) bytes.
__`User_Data`__
The `User_Data` can be anything. Data will just be skipped by the decoder.
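As a minimal sketch (not part of the specification), skipping such a frame only requires reading the 8-byte header; `read_le32` is a hypothetical little-endian helper :

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper : read a 4-byte little-endian value from a buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* If the buffer starts with a skippable frame, return the total number of
   bytes to skip (8-byte header + User_Data); otherwise return 0. */
static size_t skippable_frame_size(const uint8_t *buf, size_t len)
{
    if (len < 8) return 0;
    uint32_t const magic = read_le32(buf);
    if ((magic & 0xFFFFFFF0u) != 0x184D2A50u) return 0;  /* not skippable */
    uint32_t const frameSize = read_le32(buf + 4);       /* size of User_Data */
    return 8 + (size_t)frameSize;
}
```

The mask `0xFFFFFFF0` accepts all 16 magic values `0x184D2A50` to `0x184D2A5F` at once.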
General Structure of Zstandard Frame format
-------------------------------------------
The structure of a single Zstandard frame is as follows:
@@ -163,9 +199,9 @@ The `Flag_Value` can be converted into `Field_Size`,
which is the number of bytes used by `Frame_Content_Size`
according to the following table:
|`Flag_Value`| 0 | 1 | 2 | 3 |
| ---------- | ------ | --- | --- | --- |
|`Field_Size`| 0 or 1 | 2 | 4 | 8 |
When `Flag_Value` is `0` , `Field_Size` depends on `Single_Segment_flag` :
if `Single_Segment_flag` is set, `Field_Size` is 1.
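The table and the `Single_Segment_flag` rule can be captured in a small sketch (a `Field_Size` of 0 is taken here to mean the field is simply not provided) :

```c
#include <assert.h>

/* Field_Size of Frame_Content_Size from Flag_Value, a sketch of the table
   above. When Flag_Value is 0, the result depends on Single_Segment_flag;
   a result of 0 is assumed to mean the field is absent. */
static int fcs_field_size(int flagValue, int singleSegmentFlag)
{
    static const int fieldSize[4] = { 0, 2, 4, 8 };
    if (flagValue == 0) return singleSegmentFlag ? 1 : 0;
    return fieldSize[flagValue];
}
```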
@@ -235,7 +271,7 @@ which can be any value from 1 to 2^64-1 bytes (16 EB).
| ----------- | ---------- | ---------- |
| Field name | `Exponent` | `Mantissa` |
Maximum distance is given by the following formulas :
```
windowLog = 10 + Exponent;
windowBase = 1 << windowLog;
@@ -361,40 +397,6 @@ up to `Block_Maximum_Decompressed_Size`, which is the smallest of :
- 128 KB
The format of `Compressed_Block`
--------------------------------
@@ -413,7 +415,7 @@ To decode a compressed block, the following elements are necessary :
or all previous blocks when `Single_Segment_flag` is set.
- List of "recent offsets" from previous compressed block.
- Decoding tables of previous compressed block for each symbol type
(literals, literals lengths, match lengths, offsets).
### `Literals_Section`
@@ -447,9 +449,12 @@ __`Literals_Block_Type`__
This field uses 2 lowest bits of first byte, describing 4 different block types :
| `Literals_Block_Type` | Value |
| ----------------------------- | ----- |
| `Raw_Literals_Block` | 0 |
| `RLE_Literals_Block` | 1 |
| `Compressed_Literals_Block` | 2 |
| `Repeat_Stats_Literals_Block` | 3 |
- `Raw_Literals_Block` - Literals are stored uncompressed.
- `RLE_Literals_Block` - Literals consist of a single byte value repeated N times.
@@ -466,37 +471,37 @@ __`Size_Format`__
- For `Compressed_Literals_Block` , it is required to decode both `Compressed_Size`
and `Regenerated_Size` (the decompressed size). It will also decode the number of streams.
- For `Raw_Literals_Block` and `RLE_Literals_Block` it's enough to decode `Regenerated_Size` .
For values spanning several bytes, convention is little-endian.
__`Size_Format` for `Raw_Literals_Block` and `RLE_Literals_Block`__ :
- Value x0 : `Regenerated_Size` uses 5 bits (0-31).
`Literals_Section_Header` has 1 byte.
`Regenerated_Size = Header[0]>>3`
- Value 01 : `Regenerated_Size` uses 12 bits (0-4095).
`Literals_Section_Header` has 2 bytes.
`Regenerated_Size = (Header[0]>>4) + (Header[1]<<4)`
- Value 11 : `Regenerated_Size` uses 20 bits (0-1048575).
`Literals_Section_Header` has 3 bytes.
`Regenerated_Size = (Header[0]>>4) + (Header[1]<<4) + (Header[2]<<12)`
Note : it's allowed to represent a short value (for example `13`)
using a long format, accepting the increased compressed data size.
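As a sketch, the three layouts above can be decoded as follows (`header` points to the first byte of `Literals_Section_Header`) :

```c
#include <assert.h>
#include <stdint.h>

/* Sketch : Regenerated_Size for Raw_Literals_Block / RLE_Literals_Block,
   following the three Size_Format layouts above.
   Size_Format occupies bits 2-3 of the first byte (bits 0-1 hold the
   Literals_Block_Type). */
static uint32_t regenerated_size(const uint8_t *header)
{
    int const sizeFormat = (header[0] >> 2) & 3;
    if ((sizeFormat & 1) == 0)                    /* x0 : 1 byte, 5 bits   */
        return header[0] >> 3;
    if (sizeFormat == 1)                          /* 01 : 2 bytes, 12 bits */
        return (header[0] >> 4) + ((uint32_t)header[1] << 4);
    /* 11 : 3 bytes, 20 bits */
    return (header[0] >> 4) + ((uint32_t)header[1] << 4)
         + ((uint32_t)header[2] << 12);
}
```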
__`Size_Format` for `Compressed_Literals_Block` and `Repeat_Stats_Literals_Block`__ :
- Value 00 : _A single stream_.
Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
`Literals_Section_Header` has 3 bytes.
- Value 01 : 4 streams.
Both `Compressed_Size` and `Regenerated_Size` use 10 bits (0-1023).
`Literals_Section_Header` has 3 bytes.
- Value 10 : 4 streams.
Both `Compressed_Size` and `Regenerated_Size` use 14 bits (0-16383).
`Literals_Section_Header` has 4 bytes.
- Value 11 : 4 streams.
Both `Compressed_Size` and `Regenerated_Size` use 18 bits (0-262143).
`Literals_Section_Header` has 5 bytes.
@@ -505,7 +510,7 @@ Both `Compressed_Size` and `Regenerated_Size` fields follow little-endian convention.
#### `Huffman_Tree_Description`
This section is only present when `Literals_Block_Type` is `Compressed_Literals_Block` (`2`).
Prefix coding represents symbols from an a priori known alphabet
by bit sequences (codewords), one codeword for each symbol,
@@ -527,9 +532,11 @@ This specification limits maximum code length to 11 bits.
##### Representation
All literal values from zero (included) to last present one (excluded)
are represented by `Weight` with values from `0` to `Max_Number_of_Bits` .
Transformation from `Weight` to `Number_of_Bits` follows this formula :
```
Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
```
The last symbol's `Weight` is deduced from previously decoded ones,
by completing to the nearest power of 2.
This power of 2 gives `Max_Number_of_Bits` , the depth of the current tree.
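The completion rule can be sketched as follows; it checks out against the example below, where decoded weights 4, 3, 2, 0, 1 give a last weight of 1 and a `Max_Number_of_Bits` of 4 :

```c
#include <assert.h>

/* Number_of_Bits from a Weight, per the formula above. */
static int weight_to_nbits(int weight, int maxNbBits)
{
    return weight ? maxNbBits + 1 - weight : 0;
}

/* Sketch : deduce the last symbol's Weight and Max_Number_of_Bits from the
   previously decoded weights, by completing the sum of 2^(Weight-1)
   to the nearest power of 2. */
static void complete_last_weight(const int *weights, int count,
                                 int *lastWeight, int *maxNbBits)
{
    int total = 0;
    for (int i = 0; i < count; i++)
        if (weights[i] > 0) total += 1 << (weights[i] - 1);
    int pow2 = 1, bits = 0;
    while (pow2 <= total) { pow2 <<= 1; bits++; }  /* next power of 2 above */
    *maxNbBits = bits;
    int const missing = pow2 - total;  /* a power of 2 in any valid tree */
    int w = 0;
    while ((1 << w) < missing) w++;
    *lastWeight = w + 1;
}
```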
@@ -544,7 +551,10 @@ Let's presume the following Huffman tree must be described :
The tree depth is 4, since its smallest element uses 4 bits.
Value `5` will not be listed, nor will values above `5` .
Values from `0` to `4` will be listed using `Weight` instead of `Number_of_Bits` .
Weight formula is :
```
Weight = Number_of_Bits ? (Max_Number_of_Bits + 1 - Number_of_Bits) : 0
```
It gives the following series of weights :
| `Weight` | 4 | 3 | 2 | 0 | 1 |
@@ -575,9 +585,9 @@ which tells how to decode the list of weights.
- if `headerByte` < 128 :
the series of weights is compressed by FSE.
The length of the FSE-compressed series is equal to `headerByte` (0-127).
##### Finite State Entropy (FSE) compression of Huffman weights
The series of weights is compressed using FSE compression.
It's a single bitstream with 2 interleaved states,
@@ -607,9 +617,10 @@ When both states have overflowed the bitstream, end is reached.
##### Conversion from weights to Huffman prefix codes
All present symbols shall now have a `Weight` value.
It is possible to transform weights into `Number_of_Bits` , using this formula:
```
Number_of_Bits = Weight ? (Max_Number_of_Bits + 1 - Weight) : 0
```
Symbols are sorted by `Weight` . Within same `Weight` , symbols keep natural order.
Symbols with a `Weight` of zero are removed.
Then, starting from lowest weight, prefix codes are distributed in order.
@@ -631,21 +642,21 @@ it gives the following distribution :
| prefix codes | N/A | 0000| 0001| 001 | 01 | 1 |
#### The content of Huffman-compressed literal stream
##### Bitstreams sizes
As seen in a previous paragraph,
there are 2 type s of Huffman-compressed literals :
a single stream and 4 streams.
Encoding using 4 streams is useful for CPUs with multiple execution units and out-of-order operations.
Since each stream can be decoded independently,
it's possible to decode them up to 4x faster than a single stream,
presuming the CPU has enough parallelism available.
For single stream, header provides both the compressed and regenerated size.
For 4 streams though,
header only provides compressed and regenerated size of all 4 streams combined.
In order to properly decode the 4 streams,
it's necessary to know the compressed and regenerated size of each stream.
@@ -658,8 +669,10 @@ bitstreams are preceded by 3 unsigned little-endian 16-bit values.
Each value represents the compressed size of one stream, in order.
The last stream size is deduced from the total compressed size
and from the previously decoded stream sizes :
`stream4CSize = totalCSize - 6 - stream1CSize - stream2CSize - stream3CSize` .
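The jump-table arithmetic above can be sketched as :

```c
#include <assert.h>
#include <stdint.h>

/* Sketch : compressed sizes of the 4 literal streams.
   jumpTable points to the 6-byte table of 3 little-endian 16-bit sizes;
   totalCSize is the compressed size of all 4 streams plus the jump table. */
static void literal_stream_sizes(const uint8_t *jumpTable,
                                 uint32_t totalCSize, uint32_t sizes[4])
{
    sizes[0] = jumpTable[0] | ((uint32_t)jumpTable[1] << 8);
    sizes[1] = jumpTable[2] | ((uint32_t)jumpTable[3] << 8);
    sizes[2] = jumpTable[4] | ((uint32_t)jumpTable[5] << 8);
    /* The 4th size is deduced; 6 accounts for the jump table itself. */
    sizes[3] = totalCSize - 6 - sizes[0] - sizes[1] - sizes[2];
}
```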
##### Bitstreams read and decode
Each bitstream must be read _backward_ ,
@@ -701,23 +714,18 @@ When all _sequences_ are decoded,
if there is any literal left in the _literal section_ ,
these bytes are added at the end of the block.
The `Sequences_Section` regroups all symbols required to decode commands.
There are 3 symbol types : literals lengths, offsets and match lengths.
They are encoded together, interleaved, in a single _bitstream_ .
The `Sequences_Section` starts by a header,
followed by optional probability tables for each symbol type,
followed by the bitstream.
| `Sequences_Section_Header` | [`Literals_Length_Table`] | [`Offset_Table`] | [`Match_Length_Table`] | bitStream |
| -------------------------- | ------------------------- | ---------------- | ---------------------- | --------- |
To decode the `Sequences_Section` , it's required to know its size.
This size is deduced from `blockSize - literalSectionSize` .
@@ -748,8 +756,8 @@ This is a single byte, defining the compression mode of each symbol type.
The last field, `Reserved` , must be all-zeroes.
`Literals_Lengths_Mode` , `Offsets_Mode` and `Match_Lengths_Mode` define the `Compression_Mode` of
literals lengths, offsets, and match lengths respectively.
They follow the same enumeration :
@@ -764,9 +772,14 @@ They follow the same enumeration :
A distribution table will be present.
It will be described in [next part](#distribution-tables).
#### The codes for literals lengths, match lengths, and offsets
Each symbol is a _code_ in its own context,
which specifies `Baseline` and `Number_of_Bits` to add.
_Codes_ are FSE compressed,
and interleaved with raw additional bits in the same bitstream.
##### Literals length codes
Literals length codes are values ranging from `0` to `35` included.
They define lengths from 0 to 131071 bytes.
@@ -778,20 +791,20 @@ They define lengths from 0 to 131071 bytes.
| `Literals_Length_Code` | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| `Baseline` | 16 | 18 | 20 | 22 | 24 | 28 | 32 | 40 |
| `Number_of_Bits` | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
| `Literals_Length_Code` | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
| ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| `Baseline` | 48 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 |
| `Number_of_Bits` | 4 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| `Literals_Length_Code` | 32 | 33 | 34 | 35 |
| ---------------------- | ---- | ---- | ---- | ---- |
| `Baseline` | 8192 |16384 |32768 |65536 |
| `Number_of_Bits` | 13 | 14 | 15 | 16 |
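A sketch of literals-length decoding using the tables above. The rows for codes 0-15, where the code is the length itself with no additional bits, are assumed from the full specification table, which is elided here :

```c
#include <assert.h>
#include <stdint.h>

/* Baseline and Number_of_Bits for Literals_Length_Code 16-35,
   transcribed from the tables above. */
static const uint32_t LL_baseline[20] = {
    16, 18, 20, 22, 24, 28, 32, 40, 48, 64, 128, 256,
    512, 1024, 2048, 4096, 8192, 16384, 32768, 65536 };
static const int LL_bits[20] = {
    1, 1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 8,
    9, 10, 11, 12, 13, 14, 15, 16 };

/* Sketch : decoded literals length = Baseline + extra bits read from the
   bitstream. Codes 0-15 are assumed to be the length itself. */
static uint32_t literals_length(int code, uint32_t extraBits)
{
    if (code < 16) return (uint32_t)code;   /* no additional bits */
    return LL_baseline[code - 16] + extraBits;
}
```

The largest code (35, with 16 additional bits) reaches the stated maximum of 131071.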
##### Default distribution for literals length codes
When `Compression_Mode` is `Predefined_Mode` ,
a predefined distribution is used for FSE compression.
@@ -804,7 +817,7 @@ short literalsLength_defaultDistribution[36] =
-1,-1,-1,-1 };
```
##### Match l ength codes
Match length codes are values ranging from `0` to `52` included.
They define lengths from 3 to 131074 bytes.
@@ -816,25 +829,25 @@ They define lengths from 3 to 131074 bytes.
| `Match_Length_Code` | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
| ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| `Baseline` | 35 | 37 | 39 | 41 | 43 | 47 | 51 | 59 |
| `Number_of_Bits` | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 |
| `Match_Length_Code` | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
| ------------------- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| `Baseline` | 67 | 83 | 99 | 131 | 258 | 514 | 1026 | 2050 |
| `Number_of_Bits` | 4 | 4 | 5 | 7 | 8 | 9 | 10 | 11 |
| `Match_Length_Code` | 48 | 49 | 50 | 51 | 52 |
| ------------------- | ---- | ---- | ---- | ---- | ---- |
| `Baseline` | 4098 | 8194 |16386 |32770 |65538 |
| `Number_of_Bits` | 12 | 13 | 14 | 15 | 16 |
##### Default distribution for match length codes
When `Compression_Mode` is defined as `Predefined_Mode` ,
a predefined distribution is used for FSE compression.
Below is its definition. It uses an accuracy of 6 bits (64 states).
```
short matchLengths_defaultDistribution[53] =
{ 1, 4, 3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
@@ -853,26 +866,27 @@ For information, at the time of this writing,
the reference decoder supports a maximum `N` value of `28` in 64-bits mode.
An offset code is also the number of additional bits to read,
and can be translated into an `Offset_Value` using the following formulas :
```
Offset_Value = (1 << offsetCode) + readNBits(offsetCode);
if (Offset_Value > 3) offset = Offset_Value - 3;
```
It means that maximum `Offset_Value` is `2^(N+1)-1` and it supports back-reference distance up to `2^(N+1)-4`
but is limited by [maximum back-reference distance](#window_descriptor).
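A sketch of the conversion, with the additional bits already read from the stream passed in instead of the `readNBits` call :

```c
#include <assert.h>
#include <stdint.h>

/* Sketch : Offset_Value from an offset code plus its additional bits. */
static uint64_t offset_value(int offsetCode, uint64_t additionalBits)
{
    return ((uint64_t)1 << offsetCode) + additionalBits;
}

/* Offset_Value 1-3 are repeat codes; larger values carry a real offset. */
static uint64_t actual_offset(uint64_t offsetValue)
{
    return offsetValue > 3 ? offsetValue - 3 : 0;  /* 0 : use a repeat code */
}
```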
`Offset_Value` from 1 to 3 are special : they define "repeat codes",
which means one of the previous offsets will be repeated.
They are sorted in recency order, with 1 meaning the most recent one.
See [Repeat offsets](#repeat-offsets) paragraph.
##### Default distribution for offset codes
When `Compression_Mode` is defined as `Predefined_Mode` ,
a predefined distribution is used for FSE compression.
Below is its definition. It uses an accuracy of 5 bits (32 states),
and supports a maximum `N` of 28, allowing offset values up to 536,870,908.
If any sequence in the compressed block requires an offset larger than this,
@@ -913,7 +927,7 @@ The bitstream starts by reporting on which scale it operates.
Note that maximum `Accuracy_Log` for literal and match lengths is `9` ,
and for offsets is `8` . Higher values are considered errors.
Then follows each symbol value, from `0` to the last present one.
The number of bits used by each field is variable.
It depends on :
@@ -942,11 +956,11 @@ It depends on :
Symbol probabilities are read one by one, in order.
Probability is obtained from the decoded value using the following formula :
`Proba = value - 1`
It means value `0` becomes negative probability `-1` .
`-1` is a special probability, which means "less than 1" .
Its effect on distribution table is described in [next paragraph].
For the purpose of calculating the cumulative distribution, it counts as one.
@@ -979,7 +993,7 @@ The table has a size of `tableSize = 1 << Accuracy_Log`.
Each cell describes the symbol decoded,
and instructions to get the next state.
Symbols are scanned in their natural order for "less than 1" probabilities.
Symbols with this probability are being attributed a single cell,
starting from the end of the table.
These symbols define a full state reset, reading `Accuracy_Log` bits.
@@ -1001,7 +1015,7 @@ typically by a "less than 1" probability symbol.
The result is a list of state values.
Each state will decode the current symbol.
To get the `Number_of_Bits` and `Baseline` required for next state,
it's first necessary to sort all states in their natural order.
The lower states will need 1 more bit than higher ones.
@@ -1025,11 +1039,11 @@ Numbering starts from higher states using fewer bits.
| width | 32 | 32 | 32 | 16 | 16 |
| `Number_of_Bits` | 5 | 5 | 5 | 4 | 4 |
| range number | 2 | 4 | 6 | 0 | 1 |
| `Baseline` | 32 | 64 | 96 | 0 | 16 |
| range | 32-63 | 64-95 | 96-127 | 0-15 | 16-31 |
Next state is determined from current state
by reading the required `Number_of_Bits` , and adding the specified `Baseline` .
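The per-symbol transform can be sketched as below. It reproduces the example table above : 5 cells in what is assumed to be a 128-state table (i.e. `Accuracy_Log` of 7, since the widths sum to 128) :

```c
#include <assert.h>
#include <stdint.h>

/* Sketch : Number_of_Bits and Baseline for the n cells attributed to one
   symbol, in a table of size 1 << accuracyLog. Cells are emitted in their
   natural state order; the lower states read 1 more bit. */
static void fse_symbol_transform(int n, int accuracyLog,
                                 int nbBits[], uint32_t baseline[])
{
    int hi = 0;
    while ((1 << hi) < n) hi++;                /* ceil(log2(n)) */
    int const bigBits = accuracyLog - hi + 1;  /* bits read by lower states */
    int const nBig = (1 << hi) - n;            /* cells reading bigBits */
    /* Baselines of the wider cells start after the narrow cells' ranges. */
    uint32_t const base = (uint32_t)(n - nBig) << (bigBits - 1);
    for (int i = 0; i < nBig; i++) {           /* lower states : more bits */
        nbBits[i]   = bigBits;
        baseline[i] = base + ((uint32_t)i << bigBits);
    }
    for (int i = 0; i < n - nBig; i++) {       /* higher states : fewer bits */
        nbBits[nBig + i]   = bigBits - 1;
        baseline[nBig + i] = (uint32_t)i << (bigBits - 1);
    }
}
```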
#### Bitstream
@@ -1059,16 +1073,16 @@ Reminder : always keep in mind that all values are read _backward_.
##### Decoding a sequence
A state gives a code.
A code provides `Baseline` and `Number_of_Bits` to add.
See [Symbol Decoding] section for details on each symbol.
Decoding starts by reading the `Number_of_Bits` required to decode `Offset` .
It then does the same for `Match_Length` ,
and then for `Literals_Length` .
`Offset` , `Match_Length` , and `Literals_Length` define a sequence.
It starts by inserting the number of literals defined by `Literals_Length`,
then continues by copying `Match_Length` bytes from `currentPos - Offset`.
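A sketch of executing one sequence. Note the byte-by-byte match copy : it is required because the match source may overlap the bytes being produced (when `Offset` is smaller than `Match_Length`) :

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch : execute one sequence. Append litLength bytes from the literals
   section, then copy matchLength bytes starting offset bytes back.
   Returns the new output position. */
static size_t execute_sequence(uint8_t *out, size_t outPos,
                               const uint8_t *literals, size_t litLength,
                               size_t matchLength, size_t offset)
{
    memcpy(out + outPos, literals, litLength);
    outPos += litLength;
    for (size_t i = 0; i < matchLength; i++) {  /* overlap-safe byte copy */
        out[outPos] = out[outPos - offset];
        outPos++;
    }
    return outPos;
}
```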
The next operation is to update states.
Using rules pre-calculated in the decoding tables,
@@ -1080,17 +1094,17 @@ This operation will be repeated `Number_of_Sequences` times.
At the end, the bitstream shall be entirely consumed,
otherwise bitstream is considered corrupted.
[Symbol Decoding]:#the-codes-for-literals-lengths-match-lengths-and-offsets
##### Repeat offsets
As seen in [Offset Codes], the first 3 values define a repeated offset and we will call them `Repeated_Offset1` , `Repeated_Offset2` , and `Repeated_Offset3` .
They are sorted in recency order, with `Repeated_Offset1` meaning "most recent one".
There is an exception though, when current sequence's literals length is `0` .
In which case, repeated offsets are "pushed by one",
so `Repeated_Offset1` becomes `Repeated_Offset2` , `Repeated_Offset2` becomes `Repeated_Offset3` ,
and `Repeated_Offset3` becomes `Repeated_Offset1 - 1_byte` .
On first block, offset history is populated by the following values : 1, 4 and 8 (in order).
@@ -1105,8 +1119,8 @@ they do not contribute to offset history.
A new offset takes the lead in offset history,
up to its previous place if it was already present.
It means that when `Repeated_Offset1` (most recent) is used, history is unmodified.
When `Repeated_Offset2` is used, it's swapped with `Repeated_Offset1` .
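A sketch of the history update rules above. The `repIndex` parameter (which repeat slot was used, or 2 for a brand-new offset) is an illustrative convention, not part of the format; using `Repeated_Offset3` behaves like a new offset, since it moves to the front and the others shift down :

```c
#include <assert.h>
#include <stdint.h>

/* Sketch : update the 3-entry repeat-offset history after a sequence.
   rep[0] is Repeated_Offset1 (most recent).
   repIndex : 0 = Repeated_Offset1 used (history unchanged),
              1 = Repeated_Offset2 used (swap with rep[0]),
              2 = Repeated_Offset3 or a new offset (takes the lead). */
static void update_rep_offsets(uint32_t rep[3], uint32_t offset, int repIndex)
{
    if (repIndex == 0) return;
    if (repIndex == 1) { rep[1] = rep[0]; rep[0] = offset; return; }
    rep[2] = rep[1]; rep[1] = rep[0]; rep[0] = offset;
}
```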
Dictionary format
@@ -1138,8 +1152,8 @@ _Reserved ranges :_
__`Entropy_Tables`__ : following the same format as [compressed blocks].
They are stored in following order :
Huffman tables for literals, FSE table for offsets,
FSE table for match lengths, and FSE table for literals lengths.
It's finally followed by 3 offset values, populating recent offsets,
stored in order, 4-bytes little-endian each, for a total of 12 bytes.