Currently, the output buffer is a std::vector<uint8_t>.
When the buffer grows, resizing causes unnecessary memcpy() calls.
This change represents the output buffer as a list of bytes objects, which
avoids the extra overhead of resizing.
In addition, the C++ code can be removed, making this a pure C extension.
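A minimal Python sketch of the block-list idea described above, for
illustration only. The class name `BlockListBuffer`, its methods, and the
block size are hypothetical; the actual extension is written in C and works
on Python bytes objects, whereas this sketch uses bytearray blocks.

```python
class BlockListBuffer:
    """Hypothetical output buffer that grows by appending blocks."""

    def __init__(self, block_size=32 * 1024):
        self.block_size = block_size
        self.blocks = [bytearray(block_size)]  # list of fixed-size blocks
        self.used_in_last = 0                  # bytes filled in the last block

    def write(self, data):
        view = memoryview(data)
        while len(view) > 0:
            free = self.block_size - self.used_in_last
            if free == 0:
                # Growing means appending a new block; blocks that are
                # already full are never reallocated or copied.
                self.blocks.append(bytearray(self.block_size))
                self.used_in_last = 0
                free = self.block_size
            n = min(free, len(view))
            last = self.blocks[-1]
            last[self.used_in_last:self.used_in_last + n] = view[:n]
            self.used_in_last += n
            view = view[n:]

    def finish(self):
        # A single final copy joins the full blocks and the used part
        # of the last block.
        parts = self.blocks[:-1] + [self.blocks[-1][:self.used_in_last]]
        return b"".join(parts)
```

Each output byte is copied once into a block and once more in `finish()`,
instead of being moved again every time a contiguous buffer is resized.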
* IMPORTANT: decoder: fix potential overflow when input chunk is >2GiB
* simplify max Huffman table size calculation
* eliminate symbol duplicates (static arrays in .h files)
* minor combing in research/ code
Our dist tarball is missing hash_rolling_inc.h and
hash_composite_inc.h, which causes subsequent autotools
builds to fail. Fix this by adding them to the sources list.
Signed-off-by: William A. Kennington III <william@wkennington.com>
* New feature: "Large Window Brotli"
By setting a special encoder/decoder flag it is now possible to extend the
LZ window up to 30 bits; however, the produced stream will not be RFC7932
compliant.
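For a sense of scale, a short Python sketch of the window arithmetic. It
uses the RFC 7932 formula (window size = 2^WBITS - 16, with WBITS between
10 and 24) and assumes the large-window mode keeps the same formula for
larger bit counts. The function and constant names are hypothetical.

```python
RFC7932_MIN_WINDOW_BITS = 10
RFC7932_MAX_WINDOW_BITS = 24
LARGE_MAX_WINDOW_BITS = 30  # upper bound described in this change


def window_size(window_bits):
    """Sliding window size in bytes, per the RFC 7932 formula."""
    return (1 << window_bits) - 16


def within_rfc7932_range(window_bits):
    """True if the window bit count is one RFC 7932 can express."""
    return RFC7932_MIN_WINDOW_BITS <= window_bits <= RFC7932_MAX_WINDOW_BITS


for bits in (RFC7932_MAX_WINDOW_BITS, LARGE_MAX_WINDOW_BITS):
    print(bits, window_size(bits), within_rfc7932_range(bits))
```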
Added a new dictionary generator, "DSH". It combines the speed of "Sieve"
with the quality of "DM". Also added utilities to prepare training corpora
(remove unique strings).
Improved compression ratio: two sub-blocks can now be stitched together:
the last copy command can be extended to span the next sub-block.
Fixed compression inefficiency caused by floating-point rounding and
a wrong cost heuristic.
Other C changes:
- combined / moved `context.h` to `common`
- moved transforms to `common`
- unified some aspects of code formatting
- added an abstraction for encoder (static) dictionary
- moved default allocator/deallocator functions to `common`
brotli CLI:
- window size is auto-adjusted if not specified explicitly
Java:
- added "eager" decoding both to JNI wrapper and pure decoder
- huge speed-up of `DictionaryData` initialization
* Add dictionaryless compressed dictionary
* Fix `sources.lst`
* Fix `sources.lst` and add a note that `libtool` is also required.
* Update setup.py
* Fix `EagerStreamTest`
* Fix BUILD file
* Add missing `libdivsufsort` dependency
* Fix "unused parameter" warning.
* [appveyor] remove 'deploy' stage; only test python 2.7 and 3.6
all the other python versions are being built and tested on
https://github.com/google/brotli-wheels/blob/d571d63/appveyor.yml
* remove terrify submodule as not needed any more
* [travis] just test py2.7 and 3.6 on linux; remove extra osx python builds
All the other python versions for OSX are being built/tested on:
https://github.com/google/brotli-wheels/blob/d571d63/.travis.yml
Also, there's no need to build and deploy wheels here, as that's done
in the separate repository.
* [setup.py] only rebuild if dependencies are newer; fix typo in list of 'depends'
https://github.com/python/cpython/blob/v3.6.2/Lib/distutils/command/build_ext.py#L485-L500
* [ci] only run 'python setup.py test'
if we run 'python setup.py build test', the setuptools 'test' command will
forcibly re-run the build_ext subcommand because it wants to pass the --inplace
option (it ignores whether it's up to date, just re-runs it all the time).
with this we go from running build_ext twice to running it only once per build
* [Makefile] run 'build_ext --inplace' instead of 'develop' as default target
The 'develop' command is like 'install' in the sense that it
modifies the user's python environment.
The default make target should be less intrusive, i.e. just building
the extension module in-place without modifying anything in the user's
environment.
We don't need to tell make about the dependency between 'test' and
'build' target as that is baked into the `python setup.py test` command.
* [Makefile] add 'develop' target; remove unnecessary 'tests' target
`make test` is good enough
* [Makefile] `setup.py test` requires setuptools; run `python -m unittest`
This will work even if setuptools is not installed. That is unlikely
nowadays, but our `setup.py` still works with plain distutils, so
the tests might as well work without setuptools too.
* [python/README.md] add ref to 'develop' target; remove 'tests', just 'make test'
* [setup.py] import modules as per nicksay's comment
https://github.com/google/brotli/pull/583#discussion_r131981049
* [Makefile] add 'develop' to .PHONY targets
remove 'tests' from .PHONY
* [appveyor] remove unused setup scripts
We don't need to install custom python versions because we are
using the pre-installed ones on Appveyor.
* [appveyor] remove unneeded setup code
Common:
* wrap dictionary data into `BrotliDictionary` structure
* replace public constant with getter `BrotliGetDictionary`
* reformat dictionary data
Decoder:
* adopt common changes
* clarify acceptable instance usage patterns
* hold reference to dictionary in state
Encoder:
* adopt common changes
* eliminate PIC spots in `CreateBackwardReferences`
* add per-chunk ratio guards for q0 and q1
* precompute relative distances to avoid repeated calculations
* postpone hasher allocation/initialization
* refactor Hashers to be class-like structure
* further improvements for 1MiB+ inputs
* added new hasher type; made hashers more configurable
Java:
* Pull byte->int magic to `IntReader` from `BitReader`
* pull `BROTLI_MAX_BACKWARD_LIMIT` to constants
* split generic and Zopfli backward references code
* pull hashers init and stitch invocation to encoder
* make `dictionary_hash` a compilation unit
* add `size hint` parameter
* add new hasher
* use `size hint` to pick new hasher for q4
* modernize clz guard (fix #495)
* move `hash to binary tree` to separate file
* add `Initialize` and `Cleanup` to all hashers
* do not raise OOM if malloc(0) == NULL (fix #500)
Python 2.7 for Windows is compiled using MS Visual C++ 9.0 (Visual Studio 2008).
However, the latter does not support many modern C++ features which are
required to compile the Brotli encoder. So we monkey-patch distutils
to always look for MSVC version 10.0 instead of 9.0.
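The monkey-patching technique can be sketched in Python as follows. This is
an illustration under stated assumptions, not the code from `setup.py`:
`fake_compiler_module` and `find_toolchain` are hypothetical stand-ins and
are NOT the real distutils API.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the module that locates the compiler.
fake_compiler_module = SimpleNamespace(
    find_toolchain=lambda version: "toolchain-%.1f" % version
)


def force_newer_msvc(module, old=9.0, new=10.0):
    """Wrap module.find_toolchain so requests for `old` resolve to `new`."""
    original = module.find_toolchain

    def patched(version):
        # Redirect only the unsupported version; pass others through.
        return original(new if version == old else version)

    module.find_toolchain = patched


force_newer_msvc(fake_compiler_module)
print(fake_compiler_module.find_toolchain(9.0))
```

Wrapping the original function rather than replacing it keeps every other
version lookup unchanged.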
This is used for quality 11; for qualities <= 9 we already
have a simpler hash table.
The static data size is 252 kB, and this removes the
need to initialize a huge hash map at startup, which was
the reason why transforms had to be disabled by default.
In comparison, the static dictionary itself is 120 kB.
This supports every transform, except the kOmitFirstN.
* Change order of members of bit reader state structure.
* Remove unused includes for assert. Add a BROTLI_DCHECK
macro and use it instead of assert.
* Do not calculate nbits in common case of ReadSymbol.
* Introduce and use PREDICT_TRUE / PREDICT_FALSE macros.
* Allocate less memory in the brotli decoder if it knows
the result size beforehand. Before this, the decoder
would always allocate 16MB if the encoder annotated the
window size as 22 bit (which is the default), even if the
file is only a few KB uncompressed. Now, it'll only
allocate a ringbuffer as large as needed for the result file.
This only works if the decoder can know the file size; that is not
possible when there are multiple metablocks or the uncompressed
metablock is too large.
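A rough Python sketch of the sizing decision, assuming the ring buffer
starts at a power of two derived from the window bits and is halved while
it still holds the whole result. The function name, the `min_size` floor,
and the halving loop are assumptions for illustration; the decoder's exact
allocation sizes are not reproduced here.

```python
def ring_buffer_size(window_bits, known_output_size=None, min_size=1024):
    """Pick a ring buffer size in bytes (hypothetical helper).

    known_output_size is None when the result size cannot be determined,
    for example when the stream has more than one metablock.
    """
    size = 1 << window_bits
    if known_output_size is None:
        return size  # size unknown: fall back to the full window
    # Halve while the smaller buffer would still hold the whole result.
    while size // 2 >= known_output_size and size > min_size:
        size >>= 1
    return size
```

With a known result of a few kilobytes the buffer shrinks to the smallest
power of two that still fits it, and it stays window-sized when the result
is larger than the window or its size is unknown.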