Commit Graph

30 Commits

Author SHA1 Message Date
Eugene Kliuchnikov
ce92c95601
brotlidump: fix dictionary file discovery (#997) 2023-01-03 20:44:14 +01:00
Evgenii Kliuchnikov
a8f5813b84 Update
Documentation:
  - add note that brotli is a "stream" format, not an archive-like one
  - regenerate .1 with Pandoc
Build:
  - drop legacy "BROTLI_BUILD_PORTABLE" option
  - drop "BROTLI_SANITIZED" definition
Code:
  - c: comb includes
  - c/enc: extract encoder state into separate header
  - c/enc: drop designated q10 codepath
  - c/enc: deal better with flushing of an empty stream
  - fix MSVC compilation
API:
  - py: use library version instead of one in version.h
  - c: add pluggable API to report consumed input / produced output
  - c/java: support "lean" prepared dictionaries (without copy of source)
2022-11-17 13:03:09 +00:00
Eugene Kliuchnikov
8376f72ed6
Prepare for copybara (#939)
Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
2021-11-10 10:34:39 +01:00
Eugene Kliuchnikov
62662f87cd
Strip "./" in includes (#925)
Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
2021-09-08 09:18:45 +02:00
Eugene Kliuchnikov
0e42caf359
Migrate to github actions (#920)
Not all combinations are migrated to the initial configuration; corresponding TODOs added.

Drive-by: additional combinations uncovered minor portability problems -> fixed
Drive-by: remove no-longer used "script" files.

Co-authored-by: Eugene Kliuchnikov <eustas@chromium.org>
2021-08-31 14:07:17 +02:00
Eugene Kliuchnikov
68f1b90ad0
Update (#918)
Prepare to use copybara workflow.
2021-08-18 19:15:07 +02:00
Eugene Kliuchnikov
f8c6717745
Update (#908)
* re-enable Js build/test
* improve decoder performance
* rewrite dictionary data in Java/Js to a shorter uncompressed form
* improve dictionary generation tool
2021-06-23 09:40:57 +02:00
Tim Gates
685d7baea9
docs: Fix small typo: rougly -> roughly (#849) 2020-09-27 11:00:29 +02:00
Eugene Kliuchnikov
223d80cfbe
Update (#826)
* IMPORTANT: decoder: fix potential overflow when input chunk is >2GiB
* simplify max Huffman table size calculation
* eliminate symbol duplicates (static arrays in .h files)
* minor combing in research/ code
2020-08-26 12:32:27 +02:00
Eugene Kliuchnikov
7f740f1308
Update (#807)
- fix formatting
- fix type conversion
- fix no-op arithmetic with null-pointer
- improve performance of hash_longest_match64
- go: detect read after close
- java decoder: support compound dictionary
- remove executable flag on non-scripts
2020-05-15 11:06:21 +02:00
Eugene Kliuchnikov
4b2b2d4f83
Update (#749)
Update:

 * Bazel: fix MSVC configuration
 * C: common: extended documentation and helpers around distance codes
 * C: common: enable BROTLI_DCHECK in "debug" builds
 * C: common: fix implicit trailing zero in `kPrefixSuffix`
 * C: dec: fix possible bit reader discharge for "large-window" mode
 * C: dec: simplify distance decoding via lookup table
 * C: dec: reuse decoder state members memory via union with lookup table
 * C: dec: add decoder state diagram
 * C: enc: clarify access to static dictionary
 * C: enc: improve static dictionary hash
 * C: enc: add "stream offset" parameter for parallel encoding
 * C: enc: reorganize hasher; now Q2-Q3 require exactly 256KiB
           to avoid global TCMalloc lock
 * C: enc: fix rare access to uninitialized data in ring-buffer
 * C: enc: reorganize logging / checks in `write_bits.h`
 * Java: dec: add "large-window" support
 * Java: dec: improve speed
 * Java: dec: debug and 32-bit mode are now activated via system properties
 * Java: dec: demystify some state variables (use better names)
 * Dictionary generator: add single input mode
 * Java: dec: modernize tests
 * Bazel: js: pick working commit for closure rules
2019-04-12 13:57:42 +02:00
Eugene Kliuchnikov
8544ae858d
Update (#680)
* fix MSVC warnings
* cleanups
2018-06-09 11:17:13 +02:00
Eugene Kliuchnikov
1e7ea1d8e6
Invert bazel project/workspace tree (#677)
* Invert bazel workspace tree.

Now each subproject directly depends on root (c) project.

This helps to mitigate Bazel bug bazelbuild/bazel#2391; short summary:
Bazel does not work if referenced subproject `WORKSPACE` uses any
repositories that embedding project does not.

Bright side: building C project is much faster;
no need to download closure, go and JDK...
2018-06-04 17:53:16 +02:00
Eugene Kliuchnikov
631fe194a1
Update (#651)
* fix `bazel` build (ignore switch case fall-through)
* add `NPOSTFIX` / `NDIRECT` encoder parameters
* fix source file lists (add `params.h`)
* fix bug in `durchschlag`
* print clarifying messages when CLI argument parsing fails
2018-03-20 17:37:41 +06:00
Eugene Kliuchnikov
533843e354
Update (#643)
Update
 * make the zopflification aware of `NDIRECT`, `NPOSTFIX`
   (better compression in `font` mode)
 * add small and simple decoder tool
 * fix typo
 * Java: wrapper: make decoder channel more async-friendly

Ramp up version to 1.0.3 / 1.0.3
2018-03-02 15:49:58 +01:00
Eugene Kliuchnikov
35e69fc7cf
New feature: "Large Window Brotli" (#640)
* New feature: "Large Window Brotli"

By setting a special encoder/decoder flag, it is now possible to extend
the LZ window up to 30 bits, though the produced stream will not be
RFC 7932 compliant.

Added a new dictionary generator, "DSH". It combines the speed of "Sieve"
and the quality of "DM". Also added utilities to prepare training corpora
(remove unique strings).

Improved compression ratio: two sub-blocks can now be stitched together,
so the last copy command can be extended to span the next sub-block.

Fixed compression inefficiency caused by floating-point rounding and
a wrong cost heuristic.

Other C changes:
 - combined / moved `context.h` to `common`
 - moved transforms to `common`
 - unified some aspects of code formatting
 - added an abstraction for encoder (static) dictionary
 - moved default allocator/deallocator functions to `common`

brotli CLI:
 - window size is auto-adjusted if not specified explicitly

Java:
 - added "eager" decoding both to JNI wrapper and pure decoder
 - huge speed-up of `DictionaryData` initialization

* Add dictionaryless compressed dictionary

* Fix `sources.lst`

* Fix `sources.lst` and add a note that `libtool` is also required.

* Update setup.py

* Fix `EagerStreamTest`

* Fix BUILD file

* Add missing `libdivsufsort` dependency

* Fix "unused parameter" warning.
2018-02-26 09:04:36 -05:00
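
To put the "up to 30 bits" figure from the commit above into numbers, here is a minimal sketch. It assumes the RFC 7932 window formula, `(1 << WBITS) - 16` with `WBITS` in the range 10..24, also holds for the extended range. The constant and function names are hypothetical and do not come from the brotli sources.

```python
# Hypothetical names for illustration; not the brotli C API.
MIN_WINDOW_BITS = 10            # smallest WBITS in RFC 7932
RFC7932_MAX_WINDOW_BITS = 24    # largest WBITS in RFC 7932
LARGE_MAX_WINDOW_BITS = 30      # upper bound named in the commit message


def window_size(wbits):
    """Sliding window size in bytes, assuming (1 << WBITS) - 16."""
    if not MIN_WINDOW_BITS <= wbits <= LARGE_MAX_WINDOW_BITS:
        raise ValueError("window bits out of range: %d" % wbits)
    return (1 << wbits) - 16


def is_rfc7932_compliant(wbits):
    """True when the window fits the range allowed by RFC 7932."""
    return MIN_WINDOW_BITS <= wbits <= RFC7932_MAX_WINDOW_BITS


if __name__ == "__main__":
    for bits in (24, 30):
        print(bits, window_size(bits), is_rfc7932_compliant(bits))
```

Under that assumption a 24-bit window is just under 16 MiB and a 30-bit window is just under 1 GiB, which is why a stream produced in this mode needs a decoder with the matching flag set.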
Daniel Chýlek
b5033d0e1e Fix brotlidump.py crashing when complex prefix code has exactly 1 non-zero code length (#635)
According to the format specification regarding complex prefix codes:

> If there are at least two non-zero code lengths, any trailing zero
> code lengths are omitted, i.e., the last code length in the
> sequence must be non-zero.  In this case, the sum of (32 >> code
> length) over all the non-zero code lengths must equal to 32.

> If the lengths have been read for the entire code length alphabet
> and there was only one non-zero code length, then the prefix code
> has one symbol whose code has zero length.

The script does not handle the case where there is exactly one non-zero
code length. The sum rule does not apply in that case, so the script
raises a StopIteration exception when it attempts to read past the list
boundaries.

An example of such a file is tests/testdata/mapsdatazrh.compressed. I made
sure this change doesn't break anything by processing all *.compressed
files from the testdata folder with no thrown exceptions.
2018-02-08 12:48:24 +01:00
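
The rule quoted in the commit above can be sketched as a small check. This is an illustration of the two quoted cases only, not the code from brotlidump.py; the function name and return values are hypothetical.

```python
def classify_code_lengths(lengths):
    """Classify code lengths for the code length alphabet.

    Returns "single" when exactly one length is non-zero (the sum rule
    does not apply and the symbol gets a zero-length code), and
    "complete" when two or more lengths are non-zero and the sum of
    (32 >> length) equals 32. Raises ValueError in every other case.
    """
    nonzero = [n for n in lengths if n != 0]
    if len(nonzero) == 1:
        # Special case from the specification: no sum check here.
        return "single"
    if len(nonzero) >= 2 and sum(32 >> n for n in nonzero) == 32:
        return "complete"
    raise ValueError("invalid code lengths: %r" % (lengths,))


if __name__ == "__main__":
    print(classify_code_lengths([0, 0, 3, 0]))  # one non-zero length
    print(classify_code_lengths([1, 2, 2]))     # 16 + 8 + 8 == 32
```

Handling the single-length case before the sum check is the point of the sketch: a reader that keeps consuming lengths until the sum reaches 32 never stops when only one length is non-zero.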
Eugene Kliuchnikov
0ad94eed00
Update (#620)
* add autotools build
* separate semantic and ABI version
* extract sources.lst (used by CMake and Automake)
* share pkgconfig templates (used by CMake and Automake)
* decoder: always set `total_out`
* encoder: fix `BROTLI_ENSURE_CAPACITY` macro (no-op after preprocessor)
* decoder/encoder: refine `free_func` contract
2017-11-28 15:37:28 +01:00
Eugene Kliuchnikov
39ef4bbdcf Add new (fast) dictionary generator engine. (#616)
Add CLI for dictionary generation.
Add BUILD file for research folder
2017-10-13 11:25:03 +02:00
Tomáš Popela
a0c7dafe28 Fix permissions of various files in project (#613)
Move from 755 to 644.
2017-10-10 11:24:13 +02:00
Eugene Kliuchnikov
a629289e32 Update (#590)
* add transpiled JS decoder
* make PY wrapper accept memview
* fix dictionary generator
* speedup compression of RLEish data
2017-08-28 11:31:29 +02:00
Eugene Kliuchnikov
52441069ef Update (#574)
* Update
 * decoder: better behavior after failure
 * encoder: replace "len_x_code" with delta
 * research: add experimental dictionary generator
 * python: test combing
2017-07-21 10:07:24 +02:00
Eugene Kliuchnikov
27d94590a2 Research (#491)
* add advanced mode for optimal references generator
* fix #489

Thanks to Ivan Nikulin for working on it.
2016-12-22 13:03:28 +01:00
Eugene Kliuchnikov
fd96151b2a Move brotlidump.py to research/ (#487) 2016-12-20 18:00:51 +01:00
Eugene Kliuchnikov
dd8fa3e8dd Update research
* don't use `assert` when side-effect is desired
* use `gflags` to pick options from args

Other changes:
 * teach stub `Makefile` to do partial rebuild
 * remove obsolete `tools/version.h`
2016-09-22 11:32:23 +02:00
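
The first bullet of the commit above refers to a general pitfall: in C and C++ the argument of `assert` is not evaluated when `NDEBUG` is defined, so any side effect inside it disappears in release builds. Python strips `assert` statements the same way under `-O`. The sketch below shows the effect in Python; it is an analogy, not the code the commit changed, and all names in it are hypothetical.

```python
# Side effect hidden inside an assert: lost when asserts are stripped.
BROKEN = """
calls = []

def do_work():
    calls.append("ran")
    return True

assert do_work()
"""

# Same logic with the call hoisted out of the assert.
FIXED = """
calls = []

def do_work():
    calls.append("ran")
    return True

ok = do_work()
assert ok
"""


def run(source, optimize):
    """Compile and run source; optimize=1 strips asserts like python -O."""
    namespace = {}
    exec(compile(source, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["calls"]


if __name__ == "__main__":
    print(run(BROKEN, 0))  # asserts kept
    print(run(BROKEN, 1))  # asserts stripped, do_work() never runs
    print(run(FIXED, 1))   # do_work() runs regardless
```

Hoisting the call into its own statement and asserting only on the result keeps the work in every build mode.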
Ivan Nikulin
9294022929 Replace sais.hxx by submodule hillbig/esaxx. 2016-09-19 19:12:30 +02:00
Ivan Nikulin
4291932022 Update research tools description. 2016-09-15 17:19:26 +02:00
Ivan Nikulin
0e52c59a07 Update variable naming. 2016-09-15 16:59:52 +02:00
Ivan Nikulin
9589396e5d Add description of research tools. 2016-09-15 11:34:19 +02:00
Ivan Nikulin
58cecf1783 Add distance encoding research tools. 2016-09-15 10:44:19 +02:00