Commit Graph

93 Commits

Author SHA1 Message Date
Evgenii Kliuchnikov
a8f5813b84 Update
Documentation:
  - add note that brotli is a "stream" format, not an archive-like one
  - regenerate .1 with Pandoc
Build:
  - drop legacy "BROTLI_BUILD_PORTABLE" option
  - drop "BROTLI_SANITIZED" definition
Code:
  - c: comb includes
  - c/enc: extract encoder state into separate header
  - c/enc: drop designated q10 codepath
  - c/enc: handle flushing of an empty stream better
  - fix MSVC compilation
API:
  - py: use library version instead of one in version.h
  - c: add pluggable API to report consumed input / produced output
  - c/java: support "lean" prepared dictionaries (without copy of source)
2022-11-17 13:03:09 +00:00
Eugene Kliuchnikov
62662f87cd
Strip "./" in includes (#925)
Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
2021-09-08 09:18:45 +02:00
Eugene Kliuchnikov
0e42caf359
Migrate to github actions (#920)
Not all combinations are migrated to the initial configuration; corresponding TODOs added.

Drive-by: additional combinations uncovered minor portability problems -> fixed
Drive-by: remove no-longer used "script" files.

Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
2021-08-31 14:07:17 +02:00
Evgenii Kliuchnikov
fcda9db7fd Shorten docs/brotli.svg
Kudos to @alrra
2020-10-08 14:50:33 +02:00
Eugene Kliuchnikov
f6b3aa6d0f
Add brotli logo (#845)
Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
2020-09-24 13:43:44 +02:00
Eugene Kliuchnikov
7f740f1308
Update (#807)
- fix formatting
 - fix type conversion
 - fix no-op arithmetic with null-pointer
 - improve performance of hash_longest_match64
 - go: detect read after close
 - java decoder: support compound dictionary
 - remove executable flag on non-scripts
2020-05-15 11:06:21 +02:00
Eugene Kliuchnikov
4b2b2d4f83
Update (#749)
Update:

 * Bazel: fix MSVC configuration
 * C: common: extended documentation and helpers around distance codes
 * C: common: enable BROTLI_DCHECK in "debug" builds
 * C: common: fix implicit trailing zero in `kPrefixSuffix`
 * C: dec: fix possible bit reader discharge for "large-window" mode
 * C: dec: simplify distance decoding via lookup table
 * C: dec: reuse decoder state members memory via union with lookup table
 * C: dec: add decoder state diagram
 * C: enc: clarify access to static dictionary
 * C: enc: improve static dictionary hash
 * C: enc: add "stream offset" parameter for parallel encoding
 * C: enc: reorganize hasher; now Q2-Q3 require exactly 256KiB
           to avoid global TCMalloc lock
 * C: enc: fix rare access to uninitialized data in ring-buffer
 * C: enc: reorganize logging / checks in `write_bits.h`
 * Java: dec: add "large-window" support
 * Java: dec: improve speed
 * Java: dec: debug and 32-bit mode are now activated via system properties
 * Java: dec: demystify some state variables (use better names)
 * Dictionary generator: add single input mode
 * Java: dec: modernize tests
 * Bazel: js: pick working commit for closure rules
2019-04-12 13:57:42 +02:00
Eugene Kliuchnikov
631fe194a1
Update (#651)
* fix `bazel` build (ignore switch case fall-through)
* add `NPOSTFIX` / `NDIRECT` encoder parameters
* fix source file lists (add `params.h`)
* fix bug in `durchschlag`
* print clarifying messages when CLI argument parsing fails
2018-03-20 17:37:41 +06:00
Eugene Kliuchnikov
35e69fc7cf
New feature: "Large Window Brotli" (#640)
* New feature: "Large Window Brotli"

By setting a special encoder/decoder flag, it is now possible to extend
the LZ window up to 30 bits, though the produced stream will not be
RFC 7932 compliant.

Added a new dictionary generator, "DSH". It combines the speed of "Sieve"
and the quality of "DM". Also added utilities to prepare training corpora
(remove unique strings).

Improved compression ratio: now two sub-blocks could be stitched:
the last copy command could be extended to span the next sub-block.

Fixed compression inefficiency caused by floating-point rounding and
a wrong cost heuristic.

Other C changes:
 - combined / moved `context.h` to `common`
 - moved transforms to `common`
 - unified some aspects of code formatting
 - added an abstraction for encoder (static) dictionary
 - moved default allocator/deallocator functions to `common`

brotli CLI:
 - window size is auto-adjusted if not specified explicitly

Java:
 - added "eager" decoding both to JNI wrapper and pure decoder
 - huge speed-up of `DictionaryData` initialization

* Add dictionaryless compressed dictionary

* Fix `sources.lst`

* Fix `sources.lst` and add a note that `libtool` is also required.

* Update setup.py

* Fix `EagerStreamTest`

* Fix BUILD file

* Add missing `libdivsufsort` dependency

* Fix "unused parameter" warning.
2018-02-26 09:04:36 -05:00
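
Note on "Large Window Brotli" (#640): the commit above says the LZ window can be extended up to 30 bits at the cost of RFC 7932 compliance. A minimal sketch of what that means for the window size follows. It uses the RFC 7932 formula, window size = (1 << WBITS) - 16, with WBITS limited to 10..24 by the RFC. Applying the same formula to exponents up to 30 is an assumption here, and the function names are hypothetical, not part of the brotli API.

```python
# RFC 7932 allows window exponents (WBITS) from 10 to 24.
RFC_MIN_WBITS = 10
RFC_MAX_WBITS = 24
# Upper bound quoted in the commit message for the large-window mode.
LARGE_WINDOW_MAX_WBITS = 30


def window_size(wbits: int) -> int:
    """Sliding-window size in bytes: (1 << WBITS) - 16, as in RFC 7932.

    Assumption: the same formula is used for large-window exponents.
    """
    if not RFC_MIN_WBITS <= wbits <= LARGE_WINDOW_MAX_WBITS:
        raise ValueError(f"window exponent out of range: {wbits}")
    return (1 << wbits) - 16


def is_rfc_compliant(wbits: int) -> bool:
    """True if a stream using this window exponent can be RFC 7932 compliant."""
    return RFC_MIN_WBITS <= wbits <= RFC_MAX_WBITS


if __name__ == "__main__":
    for wbits in (RFC_MAX_WBITS, LARGE_WINDOW_MAX_WBITS):
        print(wbits, window_size(wbits), is_rfc_compliant(wbits))
```
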
Eugene Kliuchnikov
da254cffdb
Update (#630)
* merge {dec|enc}/port.h into common/platform.h
 * fix one-shot q=10 1-byte input compression
 * fix some unprefixed definitions
 * make hashers host-endianness-independent
 * extract enc/params.h from enc/quality.h
 * fix API documentation / typos
 * improve `BrotliEncoderMaxCompressedSize`
2017-12-12 14:33:12 +01:00
Eugene Kliuchnikov
3e58ea5f90 Update (#617)
* remove `const` on `BrotliDictionary` members
 * extend `ZofliNode` distance range to 128MiB
 * add missing `port.h` include to `quality.h`
 * fix typo in encoder API-doc
 * regenerate `decode.min.js`
2017-10-13 14:50:51 +02:00
Tomáš Popela
a0c7dafe28 Fix permissions of various files in project (#613)
Move from 755 to 644.
2017-10-10 11:24:13 +02:00
Eugene Kliuchnikov
c60563591a Fix API documentation + theoretical NPEs (#602) 2017-09-20 15:02:01 +02:00
Eugene Kliuchnikov
d63e8f75f5 Update API, and more (#581)
Update API, and more:
 * remove "custom dictionary" support
 * c/encoder: fix #580: big-endian build
 * Java: reduce jar size
 * Java: speedup decoding
 * Java: add 32-bit CPU support
 * Java: make source code JS transpiler-ready
2017-08-04 10:02:56 +02:00
Eugene Kliuchnikov
05d5f3d77a Update (#560)
Update:
 * add decoder API to avoid ringbuffer reallocation
 * fix MSVC warnings
 * remove dead code
2017-06-13 12:52:56 +02:00
Eugene Kliuchnikov
03739d2b11 Update (#555)
Update:
 * new CLI; bro -> brotli; + man page
 * JNI wrappers preparation (for bazel build)
 * add raw binary dictionary representation `dictionary.bin`
 * add ability to side-load brotli RFC dictionary
 * decoder persists last error now
 * fix `BrotliDecoderDecompress` documentation
 * Go reader doesn't block until necessary
 * more consistent bazel target names
 * Java dictionary data compiled footprint reduced
 * Java tests refactoring
2017-05-29 17:55:14 +02:00
Eugene Kliuchnikov
c931e576d2 Move java/ to java/org/brotli/ to fix sources.jar structure (#517)
Also added man pages to `docs/`
2017-02-28 16:59:52 +01:00
Zoltan Szabadka
af1768478a Update the spec reference to RFC 7932, remove the old internet draft. 2016-08-02 13:27:29 +02:00
Zoltan Szabadka
5f02d612e1 Update the spec to latest published version. 2016-05-26 12:20:42 +02:00
Zoltan Szabadka
1841f7cc26 Create -11 version of the spec. 2016-05-26 12:19:31 +02:00
Zoltan Szabadka
3a9032ba87 Address the DISCUSS ballot position from the IESG review of the spec. 2016-05-04 21:27:56 +02:00
Zoltan Szabadka
136d39bd70 Create -10 version of the specification. 2016-05-04 21:25:37 +02:00
Zoltan Szabadka
e96d5b29e7 Address review comments in the specification.
This commit updates the draft to the ietf -09 version:
https://www.ietf.org/id/draft-alakuijala-brotli-09.txt

In this version review comments from Jean-loup Gailly and
the ietf secdir review were addressed.
2016-04-20 10:48:14 +02:00
Zoltan Szabadka
26cf47f3f0 Create -09 version of the draft. 2016-04-20 10:45:40 +02:00
Zoltan Szabadka
6a92849c93 Change the title and the expiration date of the -08 draft. 2016-01-11 13:36:41 +01:00
Zoltan Szabadka
8ef0a2023d Create -08 version of the draft. 2016-01-11 13:35:30 +01:00
Zoltan Szabadka
3178f4bcf0 Add Robert Obryk to the Acknowledgements section of the spec
for his work on the first version of the spec in designing
the format of the compressed prefix codes.
2015-12-10 11:03:22 +01:00
Joe Tsai
3fe5c24738 Fix 72-char line length violator 2015-11-10 15:09:40 -08:00
szabadka
77db683f0f Merge pull request #255 from ende76/master
FIX: Typo in reference to NBLTYPESL, Minor: added missing word _lengths_ to insert-and-copy lengths
2015-11-10 16:06:47 +01:00
Thomas Pickert
0e3329d513 Fixed accidental plural plural wording 2015-11-10 09:52:10 -05:00
Ende
e33ff0a679 Rearranged wording to stay under 72 character limit 2015-11-10 05:32:50 -05:00
Ende
1b8b801078 Fixed two references to wrong NBLTYPESx 2015-11-07 17:55:22 -05:00
Ende
9bb41938d3 Minor: added missing word _lengths_ to insert-and-copy lengths 2015-11-06 14:14:18 -05:00
Zoltan Szabadka
652ca06b77 Update the Acknowledgments section of the spec. 2015-11-04 14:54:59 +01:00
Zoltan Szabadka
af61a51365 Update .txt version of the spec. 2015-11-03 17:22:53 +01:00
Joe Tsai
ce2bb01f33 Revert accidental deletion in Section 10. 2015-11-02 09:39:11 -08:00
Joe Tsai
2421ed928f Clarify pseudo-code in Section 10. 2015-11-02 03:35:39 -08:00
Joe Tsai
1a50dc9b0f fix formatting of Section 12. 2015-11-02 03:30:21 -08:00
Joe Tsai
902e81591c Fix formatting, section references, and grammar
* Add .nf and .fi tags everywhere they were missing

* Consistently use Section X.X. instead of the following:
	Paragraph X.X.
	section X

* Fix minor grammar issues
2015-11-02 03:27:27 -08:00
Joe Tsai
3ab9853648 Fix grammar in Section 2.
s/copy length determine /copy length determines /g
2015-11-01 23:00:07 -08:00
Joe Tsai
e57dbc0f5d Minor capitalization fix 2015-11-01 18:23:20 -08:00
Joe Tsai
5c869c9de2 Clarify simple and complex prefix codes
* At the beginning of the simple prefix code section, telling us that "a value
of 1 indicates the number of leading zeros" is not very helpful. Instead, it
should indicate that it means a complex prefix code and point the reader to
the relevant section (which repeats this information in more detail)

* Clearly indicate that reusing a value is an error! This seems to be the
behavior of the reference implementation.

* Clarify what the termination conditions are while reading the prefix codes.
Also, indicate that it is an error if the prefix tree is over-subscribed or
under-subscribed.

* Clearly state the maximum number of individual symbols that may be
read. This forbids a stream that continually says that the symbols have
zero length.
2015-11-01 17:01:38 -08:00
Joe Tsai
c5b6b5c7c1 Minor formatting changes
* In the description about "three categories", explicitly number them instead
of using a giant paragraph that is harder to follow.

* Switch lists of items to consistently use American-style commas. American-style
lists are better for clarity. Consider the following:
	-Each category of value (insert and copy lengths, literals and distances)
	+Each category of value (insert and copy lengths, literals, and distances)

* Make sure not to break a hyphenated phrase with a newline. When the nroff
file is processed, "insert-\nand-copy" becomes "insert- and-copy", making it
inconsistent with other uses of the hyphenated phrase.

* Consistently use the same hyphenated phrase if referred to as a single unit.
	"insert and copy"   -> "insert-and-copy"
	"least significant" -> "least-significant"
	"most significant"  -> "most-significant"
	"fixed length"      -> "fixed-length"
	"block switch"      -> "block-switch".

* Consistently use "indexes" instead of "indices"
2015-11-01 16:50:13 -08:00
Joe Tsai
166edb0287 Minor formatting of Section 9.2. and Section 9.3.
Many of the fields are copy-pastes of each other, but differ slightly
in placement of words, capitalization, or other random
oddities. This commit makes it so that if you simply do a search
replace on these following passages, you get the same thing:

s/NBLTYPESX/(NBLTYPESI|NBLTYPESL|NBLTYPESD)/g
s/CATEGORY/(insert-and-copy|literal|distance)/g

>>>
   1-11 bits: NBLTYPESX, # of CATEGORY block types, encoded
              with the same variable length code as above

      Prefix code over the block type code alphabet for
         CATEGORY block types, appears only if NBLTYPESX >= 2

      Prefix code over the block count code alphabet for
         CATEGORY block counts, appears only if NBLTYPESX >= 2

      Block count code + Extra bits for first CATEGORY
         block count, appears only if NBLTYPESX >= 2
<<<

>>>
      Block type code for next CATEGORY block type, appears
         only if NBLTYPESX >= 2 and the previous CATEGORY
         block count is zero

      Block count code + extra bits for next CATEGORY
         block count, appears only if NBLTYPESX >= 2 and the
         previous CATEGORY block count is zero
<<<
2015-11-01 16:28:25 -08:00
Joe Tsai
542a8b776e Clarify Section 7.3
* Acknowledge the fact that the context map is conceptually really a
two-dimensional matrix with 2 different keys, but in reality stored
as a one-dimensional array.

* Mention that InverseMoveToFrontTransform will not cause the
context map to have invalid indexes. This gives someone implementing
a decoder sanity that they do not have to go through the context
map again and check that all values are less than NTREES.
2015-10-29 09:50:19 -07:00
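
Note on the Section 7.3 clarification: the commit above describes the context map as conceptually a two-dimensional matrix that is stored as a one-dimensional array. A minimal sketch of that indexing follows, assuming the RFC 7932 layout of 64 context IDs per literal block type and 4 per distance block type. The function names are hypothetical and the map contents are made-up illustration data.

```python
LITERAL_CONTEXTS = 64   # context IDs per literal block type
DISTANCE_CONTEXTS = 4   # context IDs per distance block type


def context_map_index(block_type: int, context_id: int, contexts_per_type: int) -> int:
    """Flatten the (block type, context ID) pair into a 1-D array index."""
    if not 0 <= context_id < contexts_per_type:
        raise ValueError(f"context ID out of range: {context_id}")
    return block_type * contexts_per_type + context_id


def lookup_tree(context_map, block_type, context_id, contexts_per_type):
    """Return the prefix tree index stored for this block type and context."""
    return context_map[context_map_index(block_type, context_id, contexts_per_type)]


if __name__ == "__main__":
    # Hypothetical literal context map for 2 block types: all entries use
    # tree 0 except (block type 1, context 5), which uses tree 3.
    cmap = [0] * (2 * LITERAL_CONTEXTS)
    cmap[context_map_index(1, 5, LITERAL_CONTEXTS)] = 3
    print(lookup_tree(cmap, 1, 5, LITERAL_CONTEXTS))
```
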
Joe Tsai
ff3897df2d Clarify Section 8.
* The phrase "difference between these distances" can either refer to
the conceptual difference (i.e. they have different semantic meaning)
or to the mathematical difference (i.e. use subtraction for the two).
Instead, just remove the sentence since the equations below make it
clear what we're supposed to do here.
2015-10-29 09:44:23 -07:00
Joe Tsai
2ffe45bd67 Clarify Section 4.
* If NDIRECT is zero, then the paragraph reads "from 16 to 15", which
doesn't make much sense. Thus, add a conditional to avoid this minor
oddity.
2015-10-29 09:42:00 -07:00
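
Note on the Section 4 clarification: the oddity mentioned in the commit above comes from the range of direct distance codes, which in RFC 7932 runs from 16 to 15 + NDIRECT and represents distances 1 to NDIRECT. A minimal sketch, with hypothetical function names, shows why the range is empty when NDIRECT is zero.

```python
NUM_SPECIAL_DISTANCE_CODES = 16  # codes 0..15 refer to recent distances


def direct_distance_codes(ndirect: int) -> range:
    """Codes 16 .. 15 + NDIRECT; an empty range when NDIRECT == 0."""
    return range(NUM_SPECIAL_DISTANCE_CODES, NUM_SPECIAL_DISTANCE_CODES + ndirect)


def direct_distance(code: int, ndirect: int) -> int:
    """Distance represented by a direct distance code (1 .. NDIRECT)."""
    if code not in direct_distance_codes(ndirect):
        raise ValueError(f"{code} is not a direct distance code")
    return code - (NUM_SPECIAL_DISTANCE_CODES - 1)


if __name__ == "__main__":
    print(list(direct_distance_codes(0)))   # no direct codes at all
    print(list(direct_distance_codes(3)))
    print(direct_distance(16, 3))
```
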
Joe Tsai
185cb9eada Define the maximum number of bytes transforms may add to a word
* This value is useful in implementing the decoder since we can know
ahead-of-time what size buffer is needed to contain the output of a
transformed word.
2015-10-29 09:40:41 -07:00
Joe Tsai
6d2575eab3 Use consistent bit convention in Section 5.
* Rather than say "lower 3 bits" in one sentence and "bits 3-5" in
the sentence right after, just consistently use the same convention
and say "0-2" and "3-5".
2015-10-29 09:39:06 -07:00
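
Note on the Section 5 bit convention: the commit above settles on naming bit ranges as "0-2" and "3-5", counted from the least-significant bit. A minimal sketch of extracting those two fields follows. It assumes the sentences in question concern the insert-and-copy length code, where bits 0-2 select the copy length code within its range and bits 3-5 select the insert length code within its range; the function name is hypothetical.

```python
def split_low_six_bits(code: int) -> tuple[int, int]:
    """Return (bits 3-5, bits 0-2) of the code, counted from the LSB."""
    bits_0_2 = code & 0b111          # assumed: offset of the copy length code
    bits_3_5 = (code >> 3) & 0b111   # assumed: offset of the insert length code
    return bits_3_5, bits_0_2


if __name__ == "__main__":
    insert_offset, copy_offset = split_low_six_bits(0b101011)
    print(insert_offset, copy_offset)
```
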
Joe Tsai
0e4cb52a8b Clarify Section 7.1.
* Provide exhaustive list of all the ways the last two bytes can be
sourced from.

* Also make a clear connection in this section that there are only 64
context IDs for literals. This is important for the indexing math
in context maps to make sense.
2015-10-29 08:32:11 -07:00
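
Note on the Section 7.1 clarification: the commit above stresses that there are only 64 context IDs for literals. A minimal sketch of the two simplest context modes in RFC 7932, LSB6 and MSB6, shows how the last uncompressed byte (p1) maps into that range. The UTF8 and Signed modes rely on lookup tables and also use the second-to-last byte; they are not reproduced here. The function names are hypothetical.

```python
NUM_LITERAL_CONTEXTS = 64


def context_id_lsb6(p1: int) -> int:
    """LSB6 mode: the six least-significant bits of the last byte."""
    return p1 & 0x3F


def context_id_msb6(p1: int) -> int:
    """MSB6 mode: the six most-significant bits of the last byte."""
    return p1 >> 2


if __name__ == "__main__":
    last_byte = ord("A")
    print(context_id_lsb6(last_byte), context_id_msb6(last_byte))
```
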