diff --git a/brotlispec.txt b/brotlispec.txt
index 55232aa..5a32a2d 100644
--- a/brotlispec.txt
+++ b/brotlispec.txt
@@ -52,9 +52,8 @@ Abstract
            such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
-           and in particular considerably better than the gzip
-           program and decompresses much faster than the LZMA
-           implementations.
+           and in particular considerably better than the gzip program;
+         * Decompresses much faster than current LZMA implementations.
 
       The data format defined by this specification does not attempt to:
          * Allow random access to compressed data;
@@ -196,23 +195,50 @@ Abstract
 
    The sequence of each type of value in the representation of a command
    (insert-and-copy lengths, literals and distances) within a meta-
-   block is further divided into blocks. In other words, each meta-block
-   has a series of insert-and-copy length blocks, a series of literal
-   blocks and a series of distance blocks. These are also called the
-   three block categories: a meta-block has a series of blocks for each 
-   block category. The subsequent blocks within each block category have
-   different block types, but blocks further away in the block sequence 
-   can have the same types. The block types are numbered from 0 to the
-   maximum block type number of 253 and the first block of each block
-   category has type 0. The block structure of a meta-block is
-   represented by the sequence of block-switch commands for each block
-   category, where a block-switch command is a pair <block type, block
-   length>. The block-switch commands are represented in the compressed
-   data before the start of each new block using a Huffman code tree for
+   block is further divided into blocks. In the "brotli" format, blocks
+   are not contiguous chunks of compressed data, but rather the pieces
+   of compressed data belonging to a block are interleaved with pieces
+   of data belonging to other blocks. Each meta-block can be logically
+   decomposed into a series of insert-and-copy length blocks, a series
+   of literal blocks and a series of distance blocks. These are also
+   called the three block categories: a meta-block has a series of
+   blocks for each block category. Note that the physical structure of
+   the meta-block is a series of commands, while the three series of
+   blocks are the logical structure. Consider the following example:
+
+      (IaC0, L0, L1, L2, D0)(IaC1, D1)(IaC2, L3, L4, D2)(IaC3, L5, D3)
+
+   The meta-block here has 4 commands, and each of the three types of
+   symbols within these commands can be rearranged into, for example,
+   the following logical block structure:
+
+      [IaC0, IaC1][IaC2, IaC3]  <-- block types 0 and 1
+
+      [L0, L1][L2, L3, L4][L5]  <-- block types 0, 1, and 0
+
+      [D0][D1, D2, D3]          <-- block types 0 and 1
+
+   The subsequent blocks within each block category must have different
+   block types, but blocks further away in the block sequence can have
+   the same types. The block types are numbered from 0 to the maximum
+   block type number of 255 and the first block of each block category
+   must have type 0. The block structure of a meta-block is represented
+   by the sequence of block-switch commands for each block category,
+   where a block-switch command is a pair <block type, block length>.
+   The block-switch commands are represented in the compressed data
+   before the start of each new block using a Huffman code tree for
    block types and a separate Huffman code tree for block lengths for
-   each block category. The code trees for block types and lengths
-   (total of six Huffman code trees) appear in a compact form in the
-   meta-block header.
+   each block category. In the above example the physical layout of the
+   meta-block is the following:
+
+      IaC0 L0 L1 LBlockSwitch(1, 3) L2 D0 IaC1 DBlockSwitch(1, 3) D1
+      IaCBlockSwitch(1, 2) IaC2 L3 L4 D2 IaC3 LBlockSwitch(0, 1) L5 D3
+
+   Note that the block-switch commands for the first blocks are not part
+   of the meta-block compressed data part; they are encoded in the meta-
+   block header. The code trees for block types and lengths (total of
+   six Huffman code trees) appear in a compact form in the meta-block
+   header.
 
    Each type of value (insert-and-copy lengths, literals and distances) 
    can be encoded with any Huffman tree from a collection of Huffman
@@ -235,7 +261,7 @@ Abstract
    and the context map), the meta-block header contains the number of
    input bytes in the meta-block and two additional parameters used in
    the representation of copy distances (number of "postfix bits" and
-   number of direct distance codes, see later).
+   number of direct distance codes).
 
 3. Compressed representation of Huffman codes
 
@@ -383,8 +409,7 @@ Abstract
       length codes, the alphabet size is 704. For block length codes,
       the alphabet size is 26. For distance codes, block type codes and
       the Huffman codes used in compressing the context map, the
-      alphabet size is dynamic and is based on other parameters (see
-      later).
+      alphabet size is dynamic and is based on other parameters.
 
    3.4. Simple Huffman codes
 
@@ -446,13 +471,19 @@ Abstract
                     If this is the first code length, or all previous
                     code lengths are zero, a code length of 8 is
                     repeated 3 - 6 times
-                       Example:  Codes 7, 16 (+2 bits 11),
-                                 16 (+2 bits 10) will expand to
-                                 12 code lengths of 7 (1 + 6 + 5)
+                    A repeated code length code of 16 modifies the
+                    repeat count of the previous one as follows:
+                       repeat count = (4 * (repeat count - 2)) +
+                                      (3 - 6 on the next 2 bits)
+                    Example:  Codes 7, 16 (+2 bits 11), 16 (+2 bits 10)
+                              will expand to 22 code lengths of 7
+                              (1 + 4 * (6 - 2) + 5)
                 17: Repeat a code length of 0 for 3 - 10 times.
                     (3 bits of length)
-                18: Repeat a code length of 0 for 11 - 138 times
-                    (7 bits of length)
+                    A repeated code length code of 17 modifies the
+                    repeat count of the previous one as follows:
+                       repeat count = (8 * (repeat count - 2)) +
+                                      (3 - 10 on the next 3 bits)
 
       A code length of 0 indicates that the corresponding symbol in the
       alphabet will not occur in the compressed data, and should not
@@ -475,12 +506,12 @@ Abstract
       follows:
 
             1 bit:  0, indicating a complex Huffman code
-            4 bits: HCLEN, # of code length codes - 4
+            4 bits: HCLEN, # of code length codes - 3
             1 bit : HSKIP, if 1, skip over first two code length codes
 
-            (HCLEN + 4 - 2 * HSKIP) code lengths for symbols in the code
+            (HCLEN + 3 - 2 * HSKIP) code lengths for symbols in the code
                length alphabet given just above, in the order: 1, 2, 3,
-               4, 0, 17, 18, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15
+               4, 0, 17, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15
 
                If HSKIP is 1, code lengths of code length symbols 1 and
                2 are implicit zeros. Code lengths of code length symbols
@@ -495,19 +526,18 @@ Abstract
             1 bit:  HLENINC, if 1, the number of code length symbols is
                     encoded next
 
-            3 bits: HNBITPAIRS, (# of bit pairs to represent HLEN) - 2,
-                    appears only if HLENINC = 1
-
-            2 * HNBITPAIRS + 2 bits: HLEN, # of code length symbols - 2,
-                                     appears only if HLENINC = 1
+          7-8 bits: HLEN, # of code length symbols, with the following
+                    encoding: values 4 - 67 with bit pattern 0xxxxxx,
+                    values 68 - 195 with bit pattern 1xxxxxxx, appears
+                    only if HLENINC = 1
 
             Sequence of code lengths symbols, encoded using the code
                length Huffman code. The number of code length symbols
-               is either HLEN + 2 (in case of HLENINC = 1), or as many
-               as is needed to assign a code length to each symbol in
-               the alphabet (i.e. the alphabet size minus the sum of all
-               the repeat lengths defined by extra bits of code length
-               symbols 16 - 18). In case of HLENINC = 1, all symbols
+               is either HLEN (in case of HLENINC = 1), or as many as
+               are needed to assign a code length to each symbol in the
+               alphabet (i.e. the alphabet size minus the sum of all the
+               repeat lengths defined by extra bits of code length
+               symbols 16 and 17). In case of HLENINC = 1, all symbols
                not assigned a code length have implicit code length 0.
 
    3.6. Validity of the Huffman code
@@ -582,7 +612,7 @@ Abstract
    the NDIRECT direct distance codes have any extra bits.
 
    Distance codes 16 + NDIRECT and greater all have extra bits, the
-   number of extra bits for a distance code `dcode' is given by the
+   number of extra bits for a distance code "dcode" is given by the
    following formula:
 
       ndistbits = 1 + ((dcode - NDIRECT - 16) >> (NPOSTFIX + 1))
@@ -590,8 +620,8 @@ Abstract
    The maximum number of extra bits is 24, therefore the size of the
    distance code alphabet is (16 + NDIRECT + (48 << NPOSTFIX)).
 
-   Given a distance code `dcode' (>= 16 + NDIRECT), and extra bits
-   `dextra', the backward distance is given by the following formula:
+   Given a distance code "dcode" (>= 16 + NDIRECT), and extra bits
+   "dextra", the backward distance is given by the following formula:
 
       hcode = (dcode - NDIRECT - 16) >> NPOSTFIX
       lcode = (dcode - NDIRECT - 16) & POSTFIX_MASK
@@ -704,8 +734,8 @@ Abstract
    alphabet. A block type code 0 means that the block type is the same
    as the type of the second last block from the same block category,
    while a block type code 1 means that the block type equals the last
-   block type plus one. Block type codes 2 - 255 represent block types
-   0 - 253. The second last and last block types are initialized with 0
+   block type plus one. Block type codes 2 - 257 represent block types
+   0 - 255. The second last and last block types are initialized with 0
    and 1, respectively, at the beginning of each meta-block.
 
    The first block type of each block category must be 0, and the block
@@ -851,10 +881,9 @@ Abstract
 
    7.2. Context id for distances
 
-      The context for encoding the next distance code is defined by the
-      copy length corresponding to the distance. The context ids are
-      0, 1, 2, and 3 for copy lengths 2, 3, 4, and more than 4,
-      respectively.
+      The context for encoding a distance code is defined by the copy
+      length corresponding to the distance. The context ids are 0, 1, 2,
+      and 3 for copy lengths 2, 3, 4, and more than 4, respectively.
 
    7.3. Encoding of the context map
 
@@ -869,7 +898,7 @@ Abstract
       CMAPL[0..(64 * NBLTYPESL - 1)] and CMAPD[0..(4 * NBLTYPESD - 1)].
 
       The index of the Huffman tree for encoding a literal or distance
-      code with context id `cid' and block type `bltype' is
+      code with context id "cid" and block type "bltype" is
 
          index of literal Huffman tree = CMAPL[bltype * 64 + cid]
 
@@ -899,9 +928,6 @@ Abstract
       now define the format of the context map (the same format is used 
       for literal and distance context maps):
 
-            8 bits: NTREES - 1, if NTREES = 1 all values in the context
-                    map are zeros, and no further bits are needed for
-                    the context map encoding. Otherwise,
           1-5 bits: RLEMAX, 0 is encoded with one 0 bit, and values
                     1 - 16 are encoded with bit pattern 1xxxx
 
@@ -914,7 +940,9 @@ Abstract
                     transform on the values in the context map to get
                     the Huffman code indexes
 
-8. Language-based static dictionaries
+      For the encoding of NTREES see Section 9.2.
+
+8. Static dictionary
 
    At any given point during decoding the compressed data, a reference
    to a duplicated string in the output produced so far has a maximum
@@ -923,24 +951,44 @@ Abstract
    from the input stream, as described in section 4, can produce
    distances that are greater than this maximum allowed value. The
    difference between these distances and the first invalid distance
-   value is treated as reference to a word in one of the language-based
-   static dictionaries given in Appendix A. The id of the static
-   dictionary is determined by the copy length of the command:
+   value is treated as reference to a word in the static dictionary
+   given in Appendix A. The maximum valid copy length for a static
+   dictionary reference is 24. The static dictionary has three parts:
 
-         dictionary id = copy length - 4
-         word id = distance - (max allowed distance + 1)
+      * DICT[0..DICTSIZE], an array of bytes
+      * DOFFSET[0..24], an array of byte offset values for each length
+      * NDBITS[0..24], an array of bit-depth values for each length
 
-   If the copy length is less than 4, or the dictionary id is invalid,
-   the compressed data set is invalid and must be discarded.
+   The number of static dictionary words for a given length is:
 
-   Each of the static dictionaries has 2^N words, and the index of the
-   referenced word is formed by the N least significant bits of the word
-   id. The word id right-shifted by N gives the index to one of the word
-   transformations given in Appendix B. If this transformation index is
-   greater than the maximum transformation index, the compressed data
-   set is invalid and must be discarded. The string copied to the output
-   stream is computed by applying the transformation to the referenced
-   static dictionary word.
+      NWORDS[length] = 0                       (if length < 3)
+      NWORDS[length] = (1 << NDBITS[length])   (if length >= 3)
+
+   DOFFSET and DICTSIZE are defined by the following recursion:
+
+      DOFFSET[0] = 0
+      DOFFSET[length + 1] = DOFFSET[length] + length * NWORDS[length]
+      DICTSIZE = DOFFSET[24] + 24 * NWORDS[24]
+
+   The offset of a word within the DICT array for a given length and
+   index is:
+
+      offset(length, index) = DOFFSET[length] + index * length
+
+   Each static dictionary word has 64 different forms, given by applying
+   a word transformation to a base word in the DICT array. The list of
+   word transformations is given in Appendix B. The static dictionary
+   word for a <length, distance> pair can be reconstructed as follows:
+
+      word_id = distance - (max allowed distance + 1)
+      index = word_id % NWORDS[length]
+      base_word = DICT[offset(length, index)..offset(length, index+1))
+      transform_id = word_id >> NDBITS[length]
+
+   The string copied to the output stream is computed by applying the
+   transformation to the base dictionary word. If transform_id is
+   greater than 63 or length is greater than 24, the compressed data set
+   is invalid and must be discarded.
 
 9. Compressed data format
 
@@ -979,6 +1027,11 @@ Abstract
             (MNIBBLES + 4) x 4 bits: MLEN - 1, where MLEN is the length
                of the meta-block in the input data in bytes
 
+            1 bit:  ISUNCOMPRESSED, if set to 1, any bits of input up to
+                    the next byte boundary are ignored, and the rest of
+                    the meta-block contains MLEN bytes of literal data;
+                    this field is only present if ISLAST bit is not set
+
          1-11 bits: NBLTYPESL, # of literal block types, encoded with
                     the following variable length code:
 
@@ -1035,17 +1088,25 @@ Abstract
 
             NBLTYPESL x 2 bits: context mode for each literal block type
 
+         1-11 bits: NTREESL, # of literal Huffman trees, encoded with
+                    the same variable length code as NBLTYPESL
+
             Literal context map, encoded as described in Paragraph 7.3,
-               the number of Huffman tree indexes is denoted by NHTREESL
+               appears only if NTREESL >= 2, otherwise the context map
+               has only zero values
+
+         1-11 bits: NTREESD, # of distance Huffman trees, encoded with
+                    the same variable length code as NBLTYPESD
 
             Distance context map, encoded as described in Paragraph 7.3,
-               the number of Huffman tree indexes is denoted by NHTREESD
+               appears only if NTREESD >= 2, otherwise the context map
+               has only zero values
 
-            NHTREESL Huffman codes for literals
+            NTREESL Huffman codes for literals
 
             NBLTYPESI Huffman codes for insert-and-copy lengths
 
-            NHTREESD Huffman codes for distances
+            NTREESD Huffman codes for distances
 
    9.3. Format of the meta-block data
 
@@ -1109,6 +1170,12 @@ Abstract
             if ISEMPTY
                break from loop
          read MLEN
+         if not ISLAST
+            read ISUNCOMPRESSED bit
+            if ISUNCOMPRESSED
+               skip any bits up to the next byte boundary
+               copy MLEN bytes of input to the output stream
+               continue to the next meta-block
          loop for each three block categories (i = L, I, D)
             read NBLTYPESi
             if NBLTYPESi >= 2
@@ -1122,8 +1189,16 @@ Abstract
                set block length, BLEN_i to 268435456
          read NPOSTFIX and NDIRECT
          read array of literal context modes, CMODE[]
-         read literal context map, CMAPL[]
-         read distance context map, CMAPD[]
+         read NTREESL
+         if NTREESL >= 2
+            read literal context map, CMAPL[]
+         else
+            fill CMAPL[] with zeros
+         read NTREESD
+         if NTREESD >= 2
+            read distance context map, CMAPD[]
+         else
+            fill CMAPD[] with zeros
          read array of Huffman codes for literals, HTREEL[]
          read array of Huffman codes for insert-and-copy, HTREEI[]
          read array of Huffman codes for distances, HTREED[]
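
A minimal sketch of the repeat-count rule that this change adds to
section 3.5 of brotlispec.txt, for illustration only. The helper name
AccumulateRepeat is hypothetical and not part of the patch; it follows
the way the updated ReadHuffmanCodeLengths folds consecutive repeat
codes (16 or 17) into one running count.

```c
/* Hypothetical helper, not part of the patch: folds one repeat code into
   a running repeat count, following the rule in section 3.5.
     repeat: count accumulated so far by the immediately preceding repeat
             codes of the same kind (0 if there were none)
     code:   16 (repeat previous non-zero length) or 17 (repeat zero)
     extra:  value of the extra bits, 0..3 for code 16, 0..7 for code 17 */
static int AccumulateRepeat(int repeat, int code, int extra) {
  const int extra_bits = code - 14;  /* 2 bits for 16, 3 bits for 17 */
  if (repeat > 0) {
    /* 4 * (repeat - 2) for code 16, 8 * (repeat - 2) for code 17 */
    repeat = (repeat - 2) << extra_bits;
  }
  return repeat + extra + 3;
}
```

With the example from the spec (codes 7, 16 with extra bits 11, 16 with
extra bits 10), AccumulateRepeat(0, 16, 3) gives 6 and
AccumulateRepeat(6, 16, 2) gives 21, so together with the literal 7 the
sequence expands to 22 code lengths of 7.
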
diff --git a/dec/bit_reader.h b/dec/bit_reader.h
index c3f84f4..a7ad460 100644
--- a/dec/bit_reader.h
+++ b/dec/bit_reader.h
@@ -59,6 +59,12 @@ static BROTLI_INLINE uint32_t BrotliPrefetchBits(BrotliBitReader* const br) {
 // For jumping over a number of bits in the bit stream when accessed with
 // BrotliPrefetchBits and BrotliFillBitWindow.
 static BROTLI_INLINE void BrotliSetBitPos(BrotliBitReader* const br, int val) {
+#ifdef BROTLI_DECODE_DEBUG
+  int n_bits = val - br->bit_pos_;
+  const uint32_t bval = (uint32_t)(br->val_ >> br->bit_pos_) & kBitMask[n_bits];
+  printf("[BrotliReadBits]  %010ld %2d  val: %6x\n",
+         (br->pos_ << 3) + br->bit_pos_ - 64, n_bits, bval);
+#endif
   br->bit_pos_ = val;
 }
 
@@ -145,6 +151,10 @@ static BROTLI_INLINE uint32_t BrotliReadBits(
     BrotliBitReader* const br, int n_bits) {
   BrotliFillBitWindow(br);
   const uint32_t val = (uint32_t)(br->val_ >> br->bit_pos_) & kBitMask[n_bits];
+#ifdef BROTLI_DECODE_DEBUG
+  printf("[BrotliReadBits]  %010ld %2d  val: %6x\n",
+         (br->pos_ << 3) + br->bit_pos_ - 64, n_bits, val);
+#endif
   br->bit_pos_ += n_bits;
   return val;
 }
diff --git a/dec/decode.c b/dec/decode.c
index 3bc37cf..df59463 100644
--- a/dec/decode.c
+++ b/dec/decode.c
@@ -38,20 +38,16 @@ extern "C" {
 #endif
 
 static const int kDefaultCodeLength = 8;
-static const int kCodeLengthLiterals = 16;
 static const int kCodeLengthRepeatCode = 16;
-static const int kCodeLengthExtraBits[3] = { 2, 3, 7 };
-static const int kCodeLengthRepeatOffsets[3] = { 3, 3, 11 };
-
 static const int kNumLiteralCodes = 256;
 static const int kNumInsertAndCopyCodes = 704;
 static const int kNumBlockLengthCodes = 26;
 static const int kLiteralContextBits = 6;
 static const int kDistanceContextBits = 2;
 
-#define CODE_LENGTH_CODES 19
+#define CODE_LENGTH_CODES 18
 static const uint8_t kCodeLengthCodeOrder[CODE_LENGTH_CODES] = {
-  1, 2, 3, 4, 0, 17, 18, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15
+  1, 2, 3, 4, 0, 17, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15,
 };
 
 #define NUM_DISTANCE_SHORT_CODES 16
@@ -71,11 +67,26 @@ static BROTLI_INLINE int DecodeWindowBits(BrotliBitReader* br) {
   }
 }
 
+// Decodes a number in the range [0..255], by reading 1 - 11 bits.
+static BROTLI_INLINE int DecodeVarLenUint8(BrotliBitReader* br) {
+  if (BrotliReadBits(br, 1)) {
+    int nbits = BrotliReadBits(br, 3);
+    if (nbits == 0) {
+      return 1;
+    } else {
+      return BrotliReadBits(br, nbits) + (1 << nbits);
+    }
+  }
+  return 0;
+}
+
 static void DecodeMetaBlockLength(BrotliBitReader* br,
                                   size_t* meta_block_length,
-                                  int* input_end) {
+                                  int* input_end,
+                                  int* is_uncompressed) {
   *input_end = BrotliReadBits(br, 1);
   *meta_block_length = 0;
+  *is_uncompressed = 0;
   if (*input_end && BrotliReadBits(br, 1)) {
     return;
   }
@@ -85,6 +96,9 @@ static void DecodeMetaBlockLength(BrotliBitReader* br,
     *meta_block_length |= BrotliReadBits(br, 4) << (i * 4);
   }
   ++(*meta_block_length);
+  if (!*input_end) {
+    *is_uncompressed = BrotliReadBits(br, 1);
+  }
 }
 
 // Decodes the next Huffman code from bit-stream.
@@ -130,6 +144,8 @@ static int ReadHuffmanCodeLengths(
   int max_symbol;
   int decode_number_of_code_length_codes;
   int prev_code_len = kDefaultCodeLength;
+  int repeat = 0;
+  int repeat_length = 0;
   HuffmanTree tree;
 
   if (!BrotliHuffmanTreeBuildImplicit(&tree, code_length_code_lengths,
@@ -146,9 +162,11 @@ static int ReadHuffmanCodeLengths(
   decode_number_of_code_length_codes = BrotliReadBits(br, 1);
   BROTLI_LOG_UINT(decode_number_of_code_length_codes);
   if (decode_number_of_code_length_codes) {
-    const int length_nbits = 2 + 2 * BrotliReadBits(br, 3);
-    max_symbol = 2 + BrotliReadBits(br, length_nbits);
-    BROTLI_LOG_UINT(length_nbits);
+    if (BrotliReadBits(br, 1)) {
+      max_symbol = 68 + BrotliReadBits(br, 7);
+    } else {
+      max_symbol = 4 + BrotliReadBits(br, 6);
+    }
     if (max_symbol > num_symbols) {
       printf("[ReadHuffmanCodeLengths] max_symbol > num_symbols (%d vs %d)\n",
              max_symbol, num_symbols);
@@ -160,7 +178,7 @@ static int ReadHuffmanCodeLengths(
   BROTLI_LOG_UINT(max_symbol);
 
   symbol = 0;
-  while (symbol < num_symbols) {
+  while (symbol + repeat < num_symbols) {
     int code_len;
     if (max_symbol-- == 0) break;
     if (!BrotliReadMoreInput(br)) {
@@ -169,30 +187,36 @@ static int ReadHuffmanCodeLengths(
     }
     code_len = ReadSymbol(&tree, br);
     BROTLI_LOG_UINT(symbol);
+    BROTLI_LOG_UINT(repeat);
+    BROTLI_LOG_UINT(repeat_length);
     BROTLI_LOG_UINT(code_len);
-    if (code_len < kCodeLengthLiterals) {
+    if ((code_len < kCodeLengthRepeatCode) ||
+        (code_len == kCodeLengthRepeatCode && repeat_length == 0) ||
+        (code_len > kCodeLengthRepeatCode && repeat_length > 0)) {
+      while (repeat > 0) {
+        code_lengths[symbol++] = repeat_length;
+        --repeat;
+      }
+    }
+    if (code_len < kCodeLengthRepeatCode) {
       code_lengths[symbol++] = code_len;
       if (code_len != 0) prev_code_len = code_len;
     } else {
-      const int use_prev = (code_len == kCodeLengthRepeatCode);
-      const int slot = code_len - kCodeLengthLiterals;
-      const int extra_bits = kCodeLengthExtraBits[slot];
-      const int repeat_offset = kCodeLengthRepeatOffsets[slot];
-      const int length = use_prev ? prev_code_len : 0;
-      int repeat = BrotliReadBits(br, extra_bits) + repeat_offset;
-      BROTLI_LOG_UINT(repeat);
-      BROTLI_LOG_UINT(length);
-      if (symbol + repeat > num_symbols) {
-        printf("[ReadHuffmanCodeLengths] symbol + repeat > num_symbols "
-               "(%d + %d vs %d)\n", symbol, repeat, num_symbols);
-        goto End;
-      } else {
-        while (repeat-- > 0) {
-          code_lengths[symbol++] = length;
-        }
+      const int extra_bits = code_len - 14;
+      if (repeat > 0) {
+        repeat -= 2;
+        repeat <<= extra_bits;
       }
+      repeat += BrotliReadBits(br, extra_bits) + 3;
+      repeat_length = (code_len == kCodeLengthRepeatCode ? prev_code_len : 0);
     }
   }
+  if (symbol + repeat > num_symbols) {
+    printf("[ReadHuffmanCodeLengths] symbol + repeat > num_symbols "
+           "(%d + %d vs %d)\n", symbol, repeat, num_symbols);
+    goto End;
+  }
+  while (repeat-- > 0) code_lengths[symbol++] = repeat_length;
   while (symbol < num_symbols) code_lengths[symbol++] = 0;
   ok = 1;
 
@@ -256,7 +280,7 @@ static int ReadHuffmanCode(int alphabet_size,
   } else {  // Decode Huffman-coded code lengths.
     int i;
     uint8_t code_length_code_lengths[CODE_LENGTH_CODES] = { 0 };
-    const int num_codes = BrotliReadBits(br, 4) + 4;
+    const int num_codes = BrotliReadBits(br, 4) + 3;
     BROTLI_LOG_UINT(num_codes);
     if (num_codes > CODE_LENGTH_CODES) {
       return 0;
@@ -434,7 +458,7 @@ static int DecodeContextMap(int context_map_size,
     printf("[DecodeContextMap] Unexpected end of input.\n");
     return 0;
   }
-  *num_htrees = BrotliReadBits(br, 8) + 1;
+  *num_htrees = DecodeVarLenUint8(br) + 1;
 
   BROTLI_LOG_UINT(context_map_size);
   BROTLI_LOG_UINT(*num_htrees);
@@ -569,7 +593,8 @@ int BrotliDecompressedSize(size_t encoded_size,
   DecodeWindowBits(&br);
   size_t meta_block_len;
   int input_end;
-  DecodeMetaBlockLength(&br, &meta_block_len, &input_end);
+  int is_uncompressed;
+  DecodeMetaBlockLength(&br, &meta_block_len, &input_end, &is_uncompressed);
   if (!input_end) {
     return 0;
   }
@@ -633,7 +658,8 @@ int BrotliDecompress(BrotliInput input, BrotliOutput output) {
   while (!input_end && ok) {
     size_t meta_block_len = 0;
     size_t meta_block_end_pos;
-    uint32_t block_length[3] = { UINT32_MAX, UINT32_MAX, UINT32_MAX };
+    int is_uncompressed;
+    uint32_t block_length[3] = { 1 << 28, 1 << 28, 1 << 28 };
     int block_type[3] = { 0 };
     int num_block_types[3] = { 1, 1, 1 };
     int block_type_rb[6] = { 0, 1, 0, 1, 0, 1 };
@@ -672,22 +698,30 @@ int BrotliDecompress(BrotliInput input, BrotliOutput output) {
       goto End;
     }
     BROTLI_LOG_UINT(pos);
-    DecodeMetaBlockLength(&br, &meta_block_len, &input_end);
+    DecodeMetaBlockLength(&br, &meta_block_len, &input_end, &is_uncompressed);
     BROTLI_LOG_UINT(meta_block_len);
     if (meta_block_len == 0) {
       goto End;
     }
     meta_block_end_pos = pos + meta_block_len;
+    if (is_uncompressed) {
+      BrotliSetBitPos(&br, (br.bit_pos_ + 7) & ~7);
+      for (; pos < meta_block_end_pos; ++pos) {
+        ringbuffer[pos & ringbuffer_mask] = BrotliReadBits(&br, 8);
+        if ((pos & ringbuffer_mask) == ringbuffer_mask) {
+          if (BrotliWrite(output, ringbuffer, ringbuffer_size) < 0) {
+            ok = 0;
+            goto End;
+          }
+        }
+      }
+      goto End;
+    }
     for (i = 0; i < 3; ++i) {
       block_type_trees[i].root_ = NULL;
       block_len_trees[i].root_ = NULL;
-      if (BrotliReadBits(&br, 1)) {
-        int nbits = BrotliReadBits(&br, 3);
-        if (nbits == 0) {
-          num_block_types[i] = 2;
-        } else {
-          num_block_types[i] = BrotliReadBits(&br, nbits) + (1 << nbits) + 1;
-        }
+      num_block_types[i] = DecodeVarLenUint8(&br) + 1;
+      if (num_block_types[i] >= 2) {
         if (!ReadHuffmanCode(
                 num_block_types[i] + 2, &block_type_trees[i], &br) ||
             !ReadHuffmanCode(kNumBlockLengthCodes, &block_len_trees[i], &br)) {
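
The 1 - 11 bit code read by DecodeVarLenUint8 can be checked against its
encoder counterpart (EncodeVarLenUint8 in enc/encode.cc) with a small
round trip. This is a sketch under assumptions: the BitBuf type and the
PutBits/GetBits helpers are hypothetical stand-ins for the real bit
reader and writer (LSB-first packing is assumed), and only the two
VarLenUint8 functions follow the logic in this change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal bit buffer (LSB-first), for illustration only. */
typedef struct {
  uint8_t bytes[8];
  size_t pos;  /* current bit position */
} BitBuf;

static void PutBits(BitBuf* b, int nbits, uint32_t value) {
  for (int i = 0; i < nbits; ++i, ++b->pos) {
    if ((value >> i) & 1u) {
      b->bytes[b->pos >> 3] |= (uint8_t)(1u << (b->pos & 7));
    }
  }
}

static uint32_t GetBits(BitBuf* b, int nbits) {
  uint32_t v = 0;
  for (int i = 0; i < nbits; ++i, ++b->pos) {
    v |= (uint32_t)((b->bytes[b->pos >> 3] >> (b->pos & 7)) & 1u) << i;
  }
  return v;
}

static int FloorLog2(int n) {
  int r = 0;
  while (n >>= 1) ++r;
  return r;
}

/* Writes n in [0..255]: one 0 bit for zero, otherwise a 1 bit, a 3-bit
   magnitude nbits, and the nbits low bits of n - (1 << nbits). */
static void EncodeVarLenUint8Sketch(BitBuf* b, int n) {
  if (n == 0) {
    PutBits(b, 1, 0);
    return;
  }
  const int nbits = FloorLog2(n);
  PutBits(b, 1, 1);
  PutBits(b, 3, (uint32_t)nbits);
  if (nbits > 0) PutBits(b, nbits, (uint32_t)(n - (1 << nbits)));
}

/* Same steps as DecodeVarLenUint8 in the patch, on the sketch buffer. */
static int DecodeVarLenUint8Sketch(BitBuf* b) {
  if (!GetBits(b, 1)) return 0;
  const int nbits = (int)GetBits(b, 3);
  if (nbits == 0) return 1;
  return (int)GetBits(b, nbits) + (1 << nbits);
}

/* Returns the number of bits used to store n, or -1 if the value does
   not survive the encode/decode round trip. */
static int VarLenRoundTripBits(int n) {
  BitBuf b = { {0}, 0 };
  EncodeVarLenUint8Sketch(&b, n);
  const size_t used = b.pos;
  b.pos = 0;
  return DecodeVarLenUint8Sketch(&b) == n ? (int)used : -1;
}
```

Under these assumptions the value 0 takes 1 bit, 1 takes 4 bits, and 255
takes the full 11 bits. The decoder adds 1 to the decoded value to obtain
NBLTYPES and NTREES, which is why those fields cover 1 - 256.
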
diff --git a/enc/bit_cost.h b/enc/bit_cost.h
index c2155aa..5d6ef0f 100644
--- a/enc/bit_cost.h
+++ b/enc/bit_cost.h
@@ -25,7 +25,7 @@
 namespace brotli {
 
 static const int kHuffmanExtraBits[kCodeLengthCodes] = {
-  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 7,
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3,
 };
 
 static inline int HuffmanTreeBitCost(const int* counts, const uint8_t* depth) {
@@ -58,25 +58,29 @@ static inline int HuffmanBitCost(const uint8_t* depth, int length) {
     }
     i += reps;
     if (value == 0) {
-      while (reps > 10) {
-        ++histogram[18];
-        reps -= 138;
-      }
-      if (reps > 2) {
-        ++histogram[17];
-      } else if (reps > 0) {
+      if (reps < 3) {
         histogram[0] += reps;
+      } else {
+        reps -= 3;
+        while (reps >= 0) {
+          ++histogram[17];
+          reps >>= 3;
+          --reps;
+        }
       }
     } else {
       tail_start = i;
       ++histogram[value];
       --reps;
-      while (reps > 2) {
-        ++histogram[16];
-        reps -= 6;
-      }
-      if (reps > 0) {
+      if (reps < 3) {
         histogram[value] += reps;
+      } else {
+        reps -= 3;
+        while (reps >= 0) {
+          ++histogram[16];
+          reps >>= 2;
+          --reps;
+        }
       }
     }
   }
@@ -87,7 +91,6 @@ static inline int HuffmanBitCost(const uint8_t* depth, int length) {
   // account for rle extra bits
   cost[16] += 2;
   cost[17] += 3;
-  cost[18] += 7;
 
   int tree_size = 0;
   int bits = 6 + 3 * max_depth;  // huffman tree of huffman tree cost
@@ -95,27 +98,6 @@ static inline int HuffmanBitCost(const uint8_t* depth, int length) {
     bits += histogram[i] * cost[i];  // huffman tree bit cost
     tree_size += histogram[i];
   }
-  // bit cost adjustment for long trailing zero sequence
-  int tail_size = length - tail_start;
-  int tail_bits = 0;
-  while (tail_size >= 1) {
-    if (tail_size < 3) {
-      tail_bits += tail_size * cost[0];
-      tree_size -= tail_size;
-      break;
-    } else if (tail_size < 11) {
-      tail_bits += cost[17];
-      --tree_size;
-      break;
-    } else {
-      tail_bits += cost[18];
-      tail_size -= 138;
-      --tree_size;
-    }
-  }
-  if (tail_bits > 12) {
-    bits += ((Log2Ceiling(tree_size - 1) + 1) & ~1) + 3 - tail_bits;
-  }
   return bits;
 }
 
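
The new histogram loops in HuffmanBitCost count how many chained repeat
codes a run needs under the accumulating scheme. A hedged sketch of that
count in isolation follows; CountRepeatCodes is a hypothetical name, and
extra_bits is assumed to be 2 for code 16 and 3 for code 17.

```c
/* Hypothetical helper, not part of the patch: number of consecutive
   repeat codes needed to express a run of `reps` equal code lengths.
   Requires reps >= 3; shorter runs are stored as plain code lengths. */
static int CountRepeatCodes(int reps, int extra_bits) {
  int count = 0;
  reps -= 3;  /* every repeat code adds 3 on top of its extra bits */
  while (reps >= 0) {
    ++count;
    reps >>= extra_bits;  /* drop the extra bits of the last code */
    --reps;               /* now (previous count - 3); negative means
                             no earlier repeat code is needed */
  }
  return count;
}
```

For code 16 a single code covers runs of 3 - 6 and two chained codes
cover 7 - 22; for code 17 a single code covers 3 - 10.
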
diff --git a/enc/block_splitter.cc b/enc/block_splitter.cc
index 5552541..e3d7363 100644
--- a/enc/block_splitter.cc
+++ b/enc/block_splitter.cc
@@ -282,9 +282,8 @@ void ClusterBlocks(const DataType* data, const size_t length,
   }
   std::vector<HistogramType> clustered_histograms;
   std::vector<int> histogram_symbols;
-  // Block ids need to fit in one byte and there are two ids reserved for
-  // indicating 'same as last' and 'last plus one'.
-  static const int kMaxNumberOfBlockTypes = 254;
+  // Block ids need to fit in one byte.
+  static const int kMaxNumberOfBlockTypes = 256;
   ClusterHistograms(histograms, 1, histograms.size(),
                     kMaxNumberOfBlockTypes,
                     &clustered_histograms,
diff --git a/enc/block_splitter.h b/enc/block_splitter.h
index 272904c..2a491e3 100644
--- a/enc/block_splitter.h
+++ b/enc/block_splitter.h
@@ -30,7 +30,7 @@ namespace brotli {
 struct BlockSplit {
   int num_types_;
   std::vector<uint8_t> types_;
-  std::vector<uint8_t> type_codes_;
+  std::vector<int> type_codes_;
   std::vector<int> lengths_;
 };
 
diff --git a/enc/encode.cc b/enc/encode.cc
index 4223ca4..7d54dbe 100644
--- a/enc/encode.cc
+++ b/enc/encode.cc
@@ -64,21 +64,32 @@ double TotalBitCost(const std::vector<Histogram<kSize> >& histograms) {
   return retval;
 }
 
-void EncodeSize(size_t len, int* storage_ix, uint8_t* storage) {
-  std::vector<uint8_t> len_bytes;
-  do {
-    len_bytes.push_back(len & 0xff);
-    len >>= 8;
-  } while (len > 0);
-  WriteBits(3, len_bytes.size(), storage_ix, storage);
-  for (int i = 0; i < len_bytes.size(); ++i) {
-    WriteBits(8, len_bytes[i], storage_ix, storage);
+void EncodeVarLenUint8(int n, int* storage_ix, uint8_t* storage) {
+  if (n == 0) {
+    WriteBits(1, 0, storage_ix, storage);
+  } else {
+    WriteBits(1, 1, storage_ix, storage);
+    int nbits = Log2Floor(n);
+    WriteBits(3, nbits, storage_ix, storage);
+    if (nbits > 0) {
+      WriteBits(nbits, n - (1 << nbits), storage_ix, storage);
+    }
   }
 }
 
 void EncodeMetaBlockLength(size_t meta_block_size,
+                           bool is_last,
+                           bool is_uncompressed,
                            int* storage_ix, uint8_t* storage) {
-  WriteBits(1, 0, storage_ix, storage);
+  WriteBits(1, is_last, storage_ix, storage);
+  if (is_last) {
+    if (meta_block_size == 0) {
+      WriteBits(1, 1, storage_ix, storage);
+      return;
+    }
+    WriteBits(1, 0, storage_ix, storage);
+  }
+  --meta_block_size;
   int num_bits = Log2Floor(meta_block_size) + 1;
   if (num_bits < 16) {
     num_bits = 16;
@@ -89,6 +100,9 @@ void EncodeMetaBlockLength(size_t meta_block_size,
     meta_block_size >>= 4;
     num_bits -= 4;
   }
+  if (!is_last) {
+    WriteBits(1, is_uncompressed, storage_ix, storage);
+  }
 }
 
 template<int kSize>
@@ -104,16 +118,16 @@ void StoreHuffmanTreeOfHuffmanTreeToBitMask(
     const uint8_t* code_length_bitdepth,
     int* storage_ix, uint8_t* storage) {
   static const uint8_t kStorageOrder[kCodeLengthCodes] = {
-    1, 2, 3, 4, 0, 17, 18, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15
+    1, 2, 3, 4, 0, 17, 5, 6, 16, 7, 8, 9, 10, 11, 12, 13, 14, 15,
   };
   // Throw away trailing zeros:
   int codes_to_store = kCodeLengthCodes;
-  for (; codes_to_store > 4; --codes_to_store) {
+  for (; codes_to_store > 3; --codes_to_store) {
     if (code_length_bitdepth[kStorageOrder[codes_to_store - 1]] != 0) {
       break;
     }
   }
-  WriteBits(4, codes_to_store - 4, storage_ix, storage);
+  WriteBits(4, codes_to_store - 3, storage_ix, storage);
   const int skip_two_first =
       code_length_bitdepth[kStorageOrder[0]] == 0 &&
       code_length_bitdepth[kStorageOrder[1]] == 0;
@@ -144,9 +158,6 @@ void StoreHuffmanTreeToBitMask(
       case 17:
         WriteBits(3, extra_bits, storage_ix, storage);
         break;
-      case 18:
-        WriteBits(7, extra_bits, storage_ix, storage);
-        break;
     }
   }
 }
@@ -225,16 +236,16 @@ void StoreHuffmanCode(const EntropyCode<kSize>& code, int alphabet_size,
   }
   int trimmed_size = trimmed_histogram.total_count_;
   bool write_length = false;
-  if (trimmed_size > 1 && trimmed_size < huffman_tree_size) {
+  if (trimmed_size >= 4 && trimmed_size <= 195 &&
+      trimmed_size < huffman_tree_size) {
     EntropyCode<kCodeLengthCodes> trimmed_entropy;
     BuildEntropyCode(trimmed_histogram, 5, kCodeLengthCodes, &trimmed_entropy);
     int huffman_bit_cost = HuffmanTreeBitCost(huffman_tree_histogram,
                                               huffman_tree_entropy);
     int trimmed_bit_cost = HuffmanTreeBitCost(trimmed_histogram,
                                               trimmed_entropy);;
-    const int nbits = Log2Ceiling(trimmed_size - 1);
-    const int nbitpairs = (nbits == 0) ? 1 : (nbits + 1) / 2;
-    if (trimmed_bit_cost + 3 + 2 * nbitpairs < huffman_bit_cost) {
+    trimmed_bit_cost += (trimmed_size < 68 ? 7 : 8);
+    if (trimmed_bit_cost < huffman_bit_cost) {
       write_length = true;
       huffman_tree_size = trimmed_size;
       huffman_tree_entropy = trimmed_entropy;
@@ -245,10 +256,12 @@ void StoreHuffmanCode(const EntropyCode<kSize>& code, int alphabet_size,
       &huffman_tree_entropy.depth_[0], storage_ix, storage);
   WriteBits(1, write_length, storage_ix, storage);
   if (write_length) {
-    const int nbits = Log2Ceiling(huffman_tree_size - 1);
-    const int nbitpairs = (nbits == 0) ? 1 : (nbits + 1) / 2;
-    WriteBits(3, nbitpairs - 1, storage_ix, storage);
-    WriteBits(nbitpairs * 2, huffman_tree_size - 2, storage_ix, storage);
+    WriteBits(1, huffman_tree_size >= 68, storage_ix, storage);
+    if (huffman_tree_size < 68) {
+      WriteBits(6, huffman_tree_size - 4, storage_ix, storage);
+    } else {
+      WriteBits(7, huffman_tree_size - 68, storage_ix, storage);
+    }
   }
   StoreHuffmanTreeToBitMask(&huffman_tree[0], &huffman_tree_extra_bits[0],
                             huffman_tree_size, huffman_tree_entropy,
@@ -464,7 +477,7 @@ int BestMaxZeroRunLengthPrefix(const std::vector<int>& v) {
 void EncodeContextMap(const std::vector<int>& context_map,
                       int num_clusters,
                       int* storage_ix, uint8_t* storage) {
-  WriteBits(8, num_clusters - 1, storage_ix, storage);
+  EncodeVarLenUint8(num_clusters - 1, storage_ix, storage);
 
   if (num_clusters == 1) {
     return;
@@ -476,11 +489,11 @@ void EncodeContextMap(const std::vector<int>& context_map,
   int max_run_length_prefix = BestMaxZeroRunLengthPrefix(transformed_symbols);
   RunLengthCodeZeros(transformed_symbols, &max_run_length_prefix,
                      &rle_symbols, &extra_bits);
-  HistogramLiteral symbol_histogram;
+  HistogramContextMap symbol_histogram;
   for (int i = 0; i < rle_symbols.size(); ++i) {
     symbol_histogram.Add(rle_symbols[i]);
   }
-  EntropyCodeLiteral symbol_code;
+  EntropyCodeContextMap symbol_code;
   BuildEntropyCode(symbol_histogram, 15, num_clusters + max_run_length_prefix,
                    &symbol_code);
   bool use_rle = max_run_length_prefix > 0;
@@ -510,7 +523,7 @@ void BuildEntropyCodes(const std::vector<Histogram<kSize> >& histograms,
 }
 
 struct BlockSplitCode {
-  EntropyCodeLiteral block_type_code;
+  EntropyCodeBlockType block_type_code;
   EntropyCodeBlockLength block_len_code;
 };
 
@@ -553,18 +566,12 @@ void ComputeBlockTypeShortCodes(BlockSplit* split) {
 void BuildAndEncodeBlockSplitCode(const BlockSplit& split,
                                   BlockSplitCode* code,
                                   int* storage_ix, uint8_t* storage) {
-  if (split.num_types_ <= 1) {
-    WriteBits(1, 0, storage_ix, storage);
+  EncodeVarLenUint8(split.num_types_ - 1, storage_ix, storage);
+  if (split.num_types_ == 1) {
     return;
   }
-  WriteBits(1, 1, storage_ix, storage);
-  int nbits = Log2Floor(split.num_types_ - 1);
-  WriteBits(3, nbits, storage_ix, storage);
-  if (nbits > 0) {
-    WriteBits(nbits, split.num_types_ - 1 - (1 << nbits), storage_ix, storage);
-  }
 
-  HistogramLiteral type_histo;
+  HistogramBlockType type_histo;
   for (int i = 0; i < split.type_codes_.size(); ++i) {
     type_histo.Add(split.type_codes_[i]);
   }
@@ -591,7 +598,7 @@ void MoveAndEncode(const BlockSplitCode& code,
     ++it->idx_;
     it->type_ = it->split_.types_[it->idx_];
     it->length_ = it->split_.lengths_[it->idx_];
-    uint8_t type_code = it->split_.type_codes_[it->idx_];
+    int type_code = it->split_.type_codes_[it->idx_];
     EntropyEncode(type_code, code.block_type_code, storage_ix, storage);
     EncodeBlockLength(code.block_len_code, it->length_, storage_ix, storage);
   }
@@ -626,6 +633,9 @@ void BuildMetaBlock(const EncodingParams& params,
                     MetaBlock* mb) {
   mb->cmds = cmds;
   mb->params = params;
+  if (cmds.empty()) {
+    return;
+  }
   ComputeCommandPrefixes(&mb->cmds,
                          mb->params.num_direct_distance_codes,
                          mb->params.distance_postfix_bits);
@@ -661,9 +671,8 @@ void BuildMetaBlock(const EncodingParams& params,
                   &mb->command_histograms,
                   &distance_histograms);
 
-  // Histogram ids need to fit in one byte and there are 16 ids reserved for
-  // run length codes, which leaves a maximum number of 240 histograms.
-  static const int kMaxNumberOfHistograms = 240;
+  // Histogram ids need to fit in one byte.
+  static const int kMaxNumberOfHistograms = 256;
 
   mb->literal_histograms = literal_histograms;
   ClusterHistograms(literal_histograms,
@@ -692,14 +701,20 @@ size_t MetaBlockLength(const std::vector<Command>& cmds) {
 }
 
 void StoreMetaBlock(const MetaBlock& mb,
+                    const bool is_last,
                     const uint8_t* ringbuffer,
                     const size_t mask,
                     size_t* pos,
                     int* storage_ix, uint8_t* storage) {
   size_t length = MetaBlockLength(mb.cmds);
   const size_t end_pos = *pos + length;
-  EncodeMetaBlockLength(length - 1,
+  EncodeMetaBlockLength(length,
+                        is_last,
+                        false,
                         storage_ix, storage);
+  if (length == 0) {
+    return;
+  }
   BlockSplitCode literal_split_code;
   BlockSplitCode command_split_code;
   BlockSplitCode distance_split_code;
@@ -798,42 +813,65 @@ void BrotliCompressor::WriteStreamHeader() {
 
 void BrotliCompressor::WriteMetaBlock(const size_t input_size,
                                       const uint8_t* input_buffer,
+                                      const bool is_last,
                                       size_t* encoded_size,
                                       uint8_t* encoded_buffer) {
-  ringbuffer_.Write(input_buffer, input_size);
-  EstimateBitCostsForLiterals(input_pos_, input_size,
-                              kRingBufferMask, ringbuffer_.start(),
-                              &literal_cost_[0]);
   std::vector<Command> commands;
-  CreateBackwardReferences(input_size, input_pos_,
-                           ringbuffer_.start(),
-                           &literal_cost_[0],
-                           kRingBufferMask, kMaxBackwardDistance,
-                           hasher_,
-                           &commands);
-  ComputeDistanceShortCodes(&commands, dist_ringbuffer_,
-                            &dist_ringbuffer_idx_);
+  if (input_size > 0) {
+    ringbuffer_.Write(input_buffer, input_size);
+    EstimateBitCostsForLiterals(input_pos_, input_size,
+                                kRingBufferMask, ringbuffer_.start(),
+                                &literal_cost_[0]);
+    CreateBackwardReferences(input_size, input_pos_,
+                             ringbuffer_.start(),
+                             &literal_cost_[0],
+                             kRingBufferMask, kMaxBackwardDistance,
+                             hasher_,
+                             &commands);
+    ComputeDistanceShortCodes(&commands, dist_ringbuffer_,
+                              &dist_ringbuffer_idx_);
+  }
   EncodingParams params;
   params.num_direct_distance_codes = 12;
   params.distance_postfix_bits = 1;
   params.literal_context_mode = CONTEXT_SIGNED;
+  const int storage_ix0 = storage_ix_;
   MetaBlock mb;
   BuildMetaBlock(params, commands, ringbuffer_.start(), input_pos_,
                  kRingBufferMask, &mb);
-  StoreMetaBlock(mb, ringbuffer_.start(), kRingBufferMask,
+  StoreMetaBlock(mb, is_last, ringbuffer_.start(), kRingBufferMask,
                  &input_pos_, &storage_ix_, storage_);
-  size_t output_size = storage_ix_ >> 3;
-  memcpy(encoded_buffer, storage_, output_size);
-  *encoded_size = output_size;
-  storage_ix_ -= output_size << 3;
-  storage_[storage_ix_ >> 3] = storage_[output_size];
+  size_t output_size = is_last ? ((storage_ix_ + 7) >> 3) : (storage_ix_ >> 3);
+  if (input_size + 4 < output_size) {
+    storage_ix_ = storage_ix0;
+    storage_[storage_ix_ >> 3] &= (1 << (storage_ix_ & 7)) - 1;
+    EncodeMetaBlockLength(input_size, false, true, &storage_ix_, storage_);
+    size_t hdr_size = (storage_ix_ + 7) >> 3;
+    memcpy(encoded_buffer, storage_, hdr_size);
+    memcpy(encoded_buffer + hdr_size, input_buffer, input_size);
+    *encoded_size = hdr_size + input_size;
+    if (is_last) {
+      encoded_buffer[*encoded_size] = 0x3;  // ISLAST, ISEMPTY
+      ++(*encoded_size);
+    }
+    storage_ix_ = 0;
+    storage_[0] = 0;
+  } else {
+    memcpy(encoded_buffer, storage_, output_size);
+    *encoded_size = output_size;
+    if (is_last) {
+      storage_ix_ = 0;
+      storage_[0] = 0;
+    } else {
+      storage_ix_ -= output_size << 3;
+      storage_[storage_ix_ >> 3] = storage_[output_size];
+    }
+  }
 }
 
 void BrotliCompressor::FinishStream(
     size_t* encoded_size, uint8_t* encoded_buffer) {
-  WriteBits(2, 0x3, &storage_ix_, storage_);
-  *encoded_size = (storage_ix_ + 7) >> 3;
-  memcpy(encoded_buffer, storage_, *encoded_size);
+  WriteMetaBlock(0, NULL, true, encoded_size, encoded_buffer);
 }
 
 
@@ -857,21 +895,19 @@ int BrotliCompressBuffer(size_t input_size,
 
   while (input_buffer < input_end) {
     int block_size = max_block_size;
+    bool is_last = false;
     if (block_size >= input_end - input_buffer) {
       block_size = input_end - input_buffer;
+      is_last = true;
     }
     size_t output_size = max_output_size;
-    compressor.WriteMetaBlock(block_size, input_buffer,
+    compressor.WriteMetaBlock(block_size, input_buffer, is_last,
                               &output_size, &encoded_buffer[*encoded_size]);
     input_buffer += block_size;
     *encoded_size += output_size;
     max_output_size -= output_size;
   }
 
-  size_t output_size = max_output_size;
-  compressor.FinishStream(&output_size, &encoded_buffer[*encoded_size]);
-  *encoded_size += output_size;
-
   return 1;
 }
 
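Note on EncodeVarLenUint8 (enc/encode.cc): the new helper replaces the fixed 8-bit count in EncodeContextMap and the open-coded block-type count in BuildAndEncodeBlockSplitCode. The sketch below reproduces its field layout and pairs it with a matching reader to check that the layout round-trips. BitSink, BitSource and DecodeVarLenUint8 are hypothetical stand-ins, not part of the patch, and the least-significant-bit-first order is an assumption about WriteBits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the encoder's bit writer. Assumption: the bits
// of each value are appended least-significant bit first.
struct BitSink {
  std::vector<int> bits;
  void Write(int nbits, uint32_t value) {
    for (int i = 0; i < nbits; ++i) bits.push_back((value >> i) & 1);
  }
};

// Hypothetical reader that consumes bits in the same order.
struct BitSource {
  const std::vector<int>& bits;
  size_t pos;
  uint32_t Read(int nbits) {
    uint32_t value = 0;
    for (int i = 0; i < nbits; ++i) value |= uint32_t(bits[pos++]) << i;
    return value;
  }
};

// Position of the highest set bit; n must be non-zero.
int Log2Floor(uint32_t n) {
  int result = 0;
  while (n >>= 1) ++result;
  return result;
}

// Same field layout as EncodeVarLenUint8 in the patch, for n in [0, 255]:
// a 1-bit "non-zero" flag, a 3-bit width, then `width` low bits of n.
void EncodeVarLenUint8(int n, BitSink* out) {
  assert(n >= 0 && n < 256);
  if (n == 0) {
    out->Write(1, 0);
    return;
  }
  out->Write(1, 1);
  const int nbits = Log2Floor(n);
  out->Write(3, nbits);
  if (nbits > 0) out->Write(nbits, n - (1 << nbits));
}

// Inverse of the layout above (illustrative; not taken from the decoder).
int DecodeVarLenUint8(BitSource* in) {
  if (in->Read(1) == 0) return 0;
  const int nbits = static_cast<int>(in->Read(3));
  if (nbits == 0) return 1;
  return (1 << nbits) + static_cast<int>(in->Read(nbits));
}
```

Under this layout a value of 0 costs 1 bit, a value of 1 costs 4 bits, and values 128..255 cost 11 bits, so the common case of a single cluster or block type is cheaper than the previous fixed 8-bit field.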
diff --git a/enc/encode.h b/enc/encode.h
index 60d150b..0494b83 100644
--- a/enc/encode.h
+++ b/enc/encode.h
@@ -39,6 +39,7 @@ class BrotliCompressor {
   // written.
   void WriteMetaBlock(const size_t input_size,
                       const uint8_t* input_buffer,
+                      const bool is_last,
                       size_t* encoded_size,
                       uint8_t* encoded_buffer);
 
diff --git a/enc/entropy_encode.cc b/enc/entropy_encode.cc
index 37d0d9e..5d53da7 100644
--- a/enc/entropy_encode.cc
+++ b/enc/entropy_encode.cc
@@ -157,6 +157,17 @@ void CreateHuffmanTree(const int *data,
   }
 }
 
+void Reverse(uint8_t* v, int start, int end) {
+  --end;
+  while (start < end) {
+    int tmp = v[start];
+    v[start] = v[end];
+    v[end] = tmp;
+    ++start;
+    --end;
+  }
+}
+
 void WriteHuffmanTreeRepetitions(
     const int previous_value,
     const int value,
@@ -170,26 +181,24 @@ void WriteHuffmanTreeRepetitions(
     ++(*tree_size);
     --repetitions;
   }
-  while (repetitions >= 1) {
-    if (repetitions < 3) {
-      for (int i = 0; i < repetitions; ++i) {
-        tree[*tree_size] = value;
-        extra_bits[*tree_size] = 0;
-        ++(*tree_size);
-      }
-      return;
-    } else if (repetitions < 7) {
-      // 3 to 6 left.
-      tree[*tree_size] = 16;
-      extra_bits[*tree_size] = repetitions - 3;
+  if (repetitions < 3) {
+    for (int i = 0; i < repetitions; ++i) {
+      tree[*tree_size] = value;
+      extra_bits[*tree_size] = 0;
       ++(*tree_size);
-      return;
-    } else {
-      tree[*tree_size] = 16;
-      extra_bits[*tree_size] = 3;
-      ++(*tree_size);
-      repetitions -= 6;
     }
+  } else {
+    repetitions -= 3;
+    int start = *tree_size;
+    while (repetitions >= 0) {
+      tree[*tree_size] = 16;
+      extra_bits[*tree_size] = repetitions & 0x3;
+      ++(*tree_size);
+      repetitions >>= 2;
+      --repetitions;
+    }
+    Reverse(tree, start, *tree_size);
+    Reverse(extra_bits, start, *tree_size);
   }
 }
 
@@ -198,30 +207,24 @@ void WriteHuffmanTreeRepetitionsZeros(
     uint8_t* tree,
     uint8_t* extra_bits,
     int* tree_size) {
-  while (repetitions >= 1) {
-    if (repetitions < 3) {
-      for (int i = 0; i < repetitions; ++i) {
-        tree[*tree_size] = 0;
-        extra_bits[*tree_size] = 0;
-        ++(*tree_size);
-      }
-      return;
-    } else if (repetitions < 11) {
-      tree[*tree_size] = 17;
-      extra_bits[*tree_size] = repetitions - 3;
+  if (repetitions < 3) {
+    for (int i = 0; i < repetitions; ++i) {
+      tree[*tree_size] = 0;
+      extra_bits[*tree_size] = 0;
       ++(*tree_size);
-      return;
-    } else if (repetitions < 139) {
-      tree[*tree_size] = 18;
-      extra_bits[*tree_size] = repetitions - 11;
-      ++(*tree_size);
-      return;
-    } else {
-      tree[*tree_size] = 18;
-      extra_bits[*tree_size] = 0x7f;  // 138 repeated 0s
-      ++(*tree_size);
-      repetitions -= 138;
     }
+  } else {
+    repetitions -= 3;
+    int start = *tree_size;
+    while (repetitions >= 0) {
+      tree[*tree_size] = 17;
+      extra_bits[*tree_size] = repetitions & 0x7;
+      ++(*tree_size);
+      repetitions >>= 3;
+      --repetitions;
+    }
+    Reverse(tree, start, *tree_size);
+    Reverse(extra_bits, start, *tree_size);
   }
 }
 
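Note on the new repeat coding (enc/entropy_encode.cc): WriteHuffmanTreeRepetitions and WriteHuffmanTreeRepetitionsZeros now emit a chain of code 16 (2 extra bits) or code 17 (3 extra bits) symbols instead of splitting long runs into fixed-size pieces, which is why code 18 is dropped. The sketch below isolates the digit loop from the patch and pairs it with a decoder rule inferred from that loop. EncodeRunExtraBits and DecodeRunLength are hypothetical names, and the decoder rule is an assumption, since the decoder is not part of this section.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Extra-bits values for a run of `repetitions` (>= 3) equal code lengths,
// with `bits_per_code` extra bits per repeat symbol (2 for code 16, 3 for
// code 17). Mirrors the loop in the patch: digits are produced
// least-significant first and then reversed.
std::vector<int> EncodeRunExtraBits(int repetitions, int bits_per_code) {
  assert(repetitions >= 3);
  std::vector<int> extra;
  const int mask = (1 << bits_per_code) - 1;
  repetitions -= 3;
  while (repetitions >= 0) {
    extra.push_back(repetitions & mask);
    repetitions >>= bits_per_code;
    --repetitions;
  }
  std::reverse(extra.begin(), extra.end());
  return extra;
}

// Assumed decoder rule, derived by inverting the loop above: the first
// repeat symbol yields 3 + extra; each following one rescales the running
// count as ((count - 2) << bits_per_code) + 3 + extra.
int DecodeRunLength(const std::vector<int>& extra, int bits_per_code) {
  int count = 0;
  for (int e : extra) {
    if (count > 0) count = (count - 2) << bits_per_code;
    count += 3 + e;
  }
  return count;
}
```

For example, a run of 7 with 2 extra bits yields the extra-bits sequence {0, 0}: the first symbol stands for 3 repetitions and the second rescales that to 4 * (3 - 2) + 3 + 0 = 7. The number of repeat symbols therefore grows logarithmically with the run length, rather than linearly as in the removed loops.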
diff --git a/enc/entropy_encode.h b/enc/entropy_encode.h
index 6f08e9a..89c3e1a 100644
--- a/enc/entropy_encode.h
+++ b/enc/entropy_encode.h
@@ -98,7 +98,7 @@ void BuildEntropyCode(const Histogram<kSize>& histogram,
   ConvertBitDepthsToSymbols(&code->depth_[0], alphabet_size, &code->bits_[0]);
 }
 
-static const int kCodeLengthCodes = 19;
+static const int kCodeLengthCodes = 18;
 
 // Literal entropy code.
 typedef EntropyCode<256> EntropyCodeLiteral;
@@ -106,6 +106,10 @@ typedef EntropyCode<256> EntropyCodeLiteral;
 typedef EntropyCode<kNumCommandPrefixes> EntropyCodeCommand;
 typedef EntropyCode<kNumDistancePrefixes> EntropyCodeDistance;
 typedef EntropyCode<kNumBlockLenPrefixes> EntropyCodeBlockLength;
+// Context map entropy code, 256 Huffman tree indexes + 16 run length codes.
+typedef EntropyCode<272> EntropyCodeContextMap;
+// Block type entropy code, 256 block types + 2 special symbols.
+typedef EntropyCode<258> EntropyCodeBlockType;
 
 }  // namespace brotli
 
diff --git a/enc/histogram.h b/enc/histogram.h
index d8012f9..45726f5 100644
--- a/enc/histogram.h
+++ b/enc/histogram.h
@@ -78,6 +78,10 @@ typedef Histogram<256> HistogramLiteral;
 typedef Histogram<kNumCommandPrefixes> HistogramCommand;
 typedef Histogram<kNumDistancePrefixes> HistogramDistance;
 typedef Histogram<kNumBlockLenPrefixes> HistogramBlockLength;
+// Context map histogram, 256 Huffman tree indexes + 16 run length codes.
+typedef Histogram<272> HistogramContextMap;
+// Block type histogram, 256 block types + 2 special symbols.
+typedef Histogram<258> HistogramBlockType;
 
 static const int kLiteralContextBits = 6;
 static const int kDistanceContextBits = 2;