Merge remote-tracking branch 'upstream/longRangeMatcher' into ldm-integrate
commit 17d8e0bdcc
@@ -39,4 +39,4 @@ outlined on that page and do not file a public issue.
 
 ## License
 By contributing to Zstandard, you agree that your contributions will be licensed
-under the [LICENSE](LICENSE) file in the root directory of this source tree.
+under both the [LICENSE](LICENSE) file and the [COPYING](COPYING) file in the root directory of this source tree.
339
COPYING
Normal file
@@ -0,0 +1,339 @@
+GNU GENERAL PUBLIC LICENSE
+Version 2, June 1991
+
+Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+Everyone is permitted to copy and distribute verbatim copies
+of this license document, but changing it is not allowed.
+
+Preamble
+
+The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+The precise terms and conditions for copying, distribution and
+modification follow.
+
+GNU GENERAL PUBLIC LICENSE
+TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+a) You must cause the modified files to carry prominent notices
+stating that you changed the files and the date of any change.
+
+b) You must cause any work that you distribute or publish, that in
+whole or in part contains or is derived from the Program or any
+part thereof, to be licensed as a whole at no charge to all third
+parties under the terms of this License.
+
+c) If the modified program normally reads commands interactively
+when run, you must cause it, when started running for such
+interactive use in the most ordinary way, to print or display an
+announcement including an appropriate copyright notice and a
+notice that there is no warranty (or else, saying that you provide
+a warranty) and that users may redistribute the program under
+these conditions, and telling the user how to view a copy of this
+License. (Exception: if the Program itself is interactive but
+does not normally print such an announcement, your work based on
+the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+a) Accompany it with the complete corresponding machine-readable
+source code, which must be distributed under the terms of Sections
+1 and 2 above on a medium customarily used for software interchange; or,
+
+b) Accompany it with a written offer, valid for at least three
+years, to give any third party, for a charge no more than your
+cost of physically performing source distribution, a complete
+machine-readable copy of the corresponding source code, to be
+distributed under the terms of Sections 1 and 2 above on a medium
+customarily used for software interchange; or,
+
+c) Accompany it with the information you received as to the offer
+to distribute corresponding source code. (This alternative is
+allowed only for noncommercial distribution and only if you
+received the program in object code or executable form with such
+an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+NO WARRANTY
+
+11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+END OF TERMS AND CONDITIONS
+
+How to Apply These Terms to Your New Programs
+
+If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+<one line to give the program's name and a brief idea of what it does.>
+Copyright (C) <year> <name of author>
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License along
+with this program; if not, write to the Free Software Foundation, Inc.,
+51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+Gnomovision version 69, Copyright (C) year name of author
+Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+This is free software, and you are welcome to redistribute it
+under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+`Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+<signature of Ty Coon>, 1 April 1989
+Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
@@ -1,11 +0,0 @@
-Copyright (c) 2016-present, Facebook, Inc. All rights reserved.
-
-The examples provided by Facebook are for non-commercial testing and evaluation
-purposes only. Facebook reserves all rights not expressly granted.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
-FACEBOOK BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
-ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
39
Makefile
@@ -1,10 +1,10 @@
 # ################################################################
-# Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+# Copyright (c) 2015-present, Yann Collet, Facebook, Inc.
 # All rights reserved.
 #
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree. An additional grant
-# of patent rights can be found in the PATENTS file in the same directory.
+# This source code is licensed under both the BSD-style license (found in the
+# LICENSE file in the root directory of this source tree) and the GPLv2 (found
+# in the COPYING file in the root directory of this source tree).
 # ################################################################
 
 PRGDIR = programs
@@ -29,15 +29,12 @@ default: lib-release zstd-release
 all: | allmost examples manual
 
 .PHONY: allmost
-allmost:
-	$(MAKE) -C $(ZSTDDIR) all
-	$(MAKE) -C $(PRGDIR) all
-	$(MAKE) -C $(TESTDIR) all
+allmost: allzstd
 	$(MAKE) -C $(ZWRAPDIR) all
 
 #skip zwrapper, can't build that on alternate architectures without the proper zlib installed
-.PHONY: allarch
-allarch:
+.PHONY: allzstd
+allzstd:
 	$(MAKE) -C $(ZSTDDIR) all
 	$(MAKE) -C $(PRGDIR) all
 	$(MAKE) -C $(TESTDIR) all
@@ -74,12 +71,9 @@ zstdmt:
 zlibwrapper:
 	$(MAKE) -C $(ZWRAPDIR) test
 
-.PHONY: shortest
-shortest:
-	$(MAKE) -C $(TESTDIR) $@
-
-.PHONY: test
-test:
+.PHONY: test shortest
+test shortest:
+	$(MAKE) -C $(PRGDIR) allVariants
 	$(MAKE) -C $(TESTDIR) $@
 
 .PHONY: examples
@@ -146,6 +140,11 @@ gcc6build: clean
 	gcc-6 -v
 	CC=gcc-6 $(MAKE) all MOREFLAGS="-Werror"
 
+.PHONY: gcc7build
+gcc7build: clean
+	gcc-7 -v
+	CC=gcc-7 $(MAKE) all MOREFLAGS="-Werror"
+
 .PHONY: clangbuild
 clangbuild: clean
 	clang -v
@@ -156,16 +155,16 @@ m32build: clean
 	$(MAKE) all32
 
 armbuild: clean
-	CC=arm-linux-gnueabi-gcc CFLAGS="-Werror" $(MAKE) allarch
+	CC=arm-linux-gnueabi-gcc CFLAGS="-Werror" $(MAKE) allzstd
 
 aarch64build: clean
-	CC=aarch64-linux-gnu-gcc CFLAGS="-Werror" $(MAKE) allarch
+	CC=aarch64-linux-gnu-gcc CFLAGS="-Werror" $(MAKE) allzstd
 
 ppcbuild: clean
-	CC=powerpc-linux-gnu-gcc CLAGS="-m32 -Wno-attributes -Werror" $(MAKE) allarch
+	CC=powerpc-linux-gnu-gcc CLAGS="-m32 -Wno-attributes -Werror" $(MAKE) allzstd
 
 ppc64build: clean
-	CC=powerpc-linux-gnu-gcc CFLAGS="-m64 -Werror" $(MAKE) allarch
+	CC=powerpc-linux-gnu-gcc CFLAGS="-m64 -Werror" $(MAKE) allzstd
 
 armfuzz: clean
 	CC=arm-linux-gnueabi-gcc QEMU_SYS=qemu-arm-static MOREFLAGS="-static" FUZZER_FLAGS=--no-big-tests $(MAKE) -C $(TESTDIR) fuzztest
18
NEWS
@@ -1,9 +1,23 @@
+v1.3.2
+license : changed /examples license to BSD + GPLv2
+license : fix a few header files to reflect new license (#825)
+fix : a rare compression bug when compression generates very large distances (only possible at --ultra -22)
+build: better compatibility with reproducible builds, by Bernhard M. Wiedemann (@bmwiedemann) (#818)
+
 v1.3.1
-perf: substantially decreased memory usage in Multi-threading mode, thanks to reports by Tino Reichardt
+New license : BSD + GPLv2
+perf: substantially decreased memory usage in Multi-threading mode, thanks to reports by Tino Reichardt (@mcmilk)
 perf: Multi-threading supports up to 256 threads. Cap at 256 when more are requested (#760)
-build: fix Visual compilation for non x86/x64 targets, reported by Greg Slazinski (#718)
+cli : improved and fixed --list command, by @ib (#772)
+cli : command -vV to list supported formats, by @ib (#771)
+build : fixed binary variants, reported by @svenha (#788)
+build : fix Visual compilation for non x86/x64 targets, reported by Greg Slazinski (@GregSlazinski) (#718)
 API exp : breaking change : ZSTD_getframeHeader() provides more information
 API exp : breaking change : pinned down values of error codes
+doc : fixed huffman example, by Ulrich Kunitz (@ulikunitz)
+new : contrib/adaptive-compression, I/O driven compression strength, by Paul Cruz (@paulcruz74)
+new : contrib/long_distance_matching, statistics by Stella Lau (@stellamplau)
+updated : contrib/linux-kernel, by Nick Terrell (@terrelln)
 
 v1.3.0
 cli : new : `--list` command, by Paul Cruz
33
PATENTS
@@ -1,33 +0,0 @@
-Additional Grant of Patent Rights Version 2
-
-"Software" means the Zstandard software distributed by Facebook, Inc.
-
-Facebook, Inc. ("Facebook") hereby grants to each recipient of the Software
-("you") a perpetual, worldwide, royalty-free, non-exclusive, irrevocable
-(subject to the termination provision below) license under any Necessary
-Claims, to make, have made, use, sell, offer to sell, import, and otherwise
-transfer the Software. For avoidance of doubt, no license is granted under
-Facebook’s rights in any patent claims that are infringed by (i) modifications
-to the Software made by you or any third party or (ii) the Software in
-combination with any software or other technology.
-
-The license granted hereunder will terminate, automatically and without notice,
-if you (or any of your subsidiaries, corporate affiliates or agents) initiate
-directly or indirectly, or take a direct financial interest in, any Patent
-Assertion: (i) against Facebook or any of its subsidiaries or corporate
-affiliates, (ii) against any party if such Patent Assertion arises in whole or
-in part from any software, technology, product or service of Facebook or any of
-its subsidiaries or corporate affiliates, or (iii) against any party relating
-to the Software. Notwithstanding the foregoing, if Facebook or any of its
-subsidiaries or corporate affiliates files a lawsuit alleging patent
-infringement against you in the first instance, and you respond by filing a
-patent infringement counterclaim in that lawsuit against that party that is
-unrelated to the Software, the license granted hereunder will not terminate
-under section (i) of this paragraph due to such counterclaim.
-
-A "Necessary Claim" is a claim of a patent owned by Facebook that is
-necessarily infringed by the Software standing alone.
-
-A "Patent Assertion" is any lawsuit or other action alleging direct, indirect,
-or contributory infringement or inducement to infringe any patent, including a
-cross-claim or counterclaim.
@@ -134,12 +134,12 @@ Going into `build` directory, you will find additional possibilities :
 
 ### Status
 
-Zstandard is currently deployed within Facebook. It is used daily to compress and decompress very large amounts of data in multiple formats and use cases.
+Zstandard is currently deployed within Facebook. It is used continuously to compress large amounts of data in multiple formats and use cases.
 Zstandard is considered safe for production environments.
 
 ### License
 
-Zstandard is [BSD-licensed](LICENSE). We also provide an [additional patent grant](PATENTS).
+Zstandard is dual-licensed under [BSD](LICENSE) and [GPLv2](COPYING).
 
 ### Contributing
 
12
appveyor.yml
@@ -9,19 +9,19 @@
 - COMPILER: "gcc"
   HOST: "mingw"
   PLATFORM: "x64"
-  SCRIPT: "make allarch && make -C tests test-symbols fullbench-dll fullbench-lib"
+  SCRIPT: "make allzstd MOREFLAGS=-static && make -C tests test-symbols fullbench-dll fullbench-lib"
   ARTIFACT: "true"
   BUILD: "true"
 - COMPILER: "gcc"
   HOST: "mingw"
   PLATFORM: "x86"
-  SCRIPT: "make allarch"
+  SCRIPT: "make allzstd MOREFLAGS=-static"
   ARTIFACT: "true"
   BUILD: "true"
 - COMPILER: "clang"
   HOST: "mingw"
   PLATFORM: "x64"
-  SCRIPT: "MOREFLAGS='--target=x86_64-w64-mingw32 -Werror -Wconversion -Wno-sign-conversion' make allarch"
+  SCRIPT: "MOREFLAGS='--target=x86_64-w64-mingw32 -Werror -Wconversion -Wno-sign-conversion' make allzstd"
   BUILD: "true"
 
 - COMPILER: "gcc"
@@ -172,15 +172,15 @@
 - COMPILER: "gcc"
   HOST: "mingw"
   PLATFORM: "x64"
-  SCRIPT: "make allarch"
+  SCRIPT: "make allzstd"
 - COMPILER: "gcc"
   HOST: "mingw"
   PLATFORM: "x86"
-  SCRIPT: "make allarch"
+  SCRIPT: "make allzstd"
 - COMPILER: "clang"
   HOST: "mingw"
   PLATFORM: "x64"
-  SCRIPT: "MOREFLAGS='--target=x86_64-w64-mingw32 -Werror -Wconversion -Wno-sign-conversion' make allarch"
+  SCRIPT: "MOREFLAGS='--target=x86_64-w64-mingw32 -Werror -Wconversion -Wno-sign-conversion' make allzstd"
 
 - COMPILER: "visual"
   HOST: "visual"
@@ -2,9 +2,9 @@
 # Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
 # All rights reserved.
 #
-# This source code is licensed under the BSD-style license found in the
-# LICENSE file in the root directory of this source tree. An additional grant
-# of patent rights can be found in the PATENTS file in the same directory.
+# This source code is licensed under both the BSD-style license (found in the
+# LICENSE file in the root directory of this source tree) and the GPLv2 (found
+# in the COPYING file in the root directory of this source tree).
 # ################################################################
 
 PROJECT(zstd)
@@ -1,17 +1,13 @@
 # ################################################################
-# * Copyright (c) 2015-present, Yann Collet, Facebook, Inc.
-# * All rights reserved.
-# *
-# * This source code is licensed under the BSD-style license found in the
-# * LICENSE file in the root directory of this source tree. An additional grant
-# * of patent rights can be found in the PATENTS file in the same directory.
+# Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
+# All rights reserved.
 #
-# You can contact the author at :
-# - zstd homepage : http://www.zstd.net/
+# This source code is licensed under both the BSD-style license (found in the
+# LICENSE file in the root directory of this source tree) and the GPLv2 (found
+# in the COPYING file in the root directory of this source tree).
 # ################################################################
 
 PROJECT(contrib)
 
 ADD_SUBDIRECTORY(pzstd)
 ADD_SUBDIRECTORY(gen_html)
-
@ -1,13 +1,10 @@
|
||||
# ################################################################
|
||||
# * Copyright (c) 2015-present, Yann Collet, Facebook, Inc.
|
||||
# * All rights reserved.
|
||||
# *
|
||||
# * This source code is licensed under the BSD-style license found in the
|
||||
# * LICENSE file in the root directory of this source tree. An additional grant
|
||||
# * of patent rights can be found in the PATENTS file in the same directory.
|
||||
# Copyright (c) 2015-present, Yann Collet, Facebook, Inc.
|
||||
# All rights reserved.
|
||||
#
|
||||
# You can contact the author at :
|
||||
# - zstd homepage : http://www.zstd.net/
|
||||
# This source code is licensed under both the BSD-style license (found in the
|
||||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
# in the COPYING file in the root directory of this source tree).
|
||||
# ################################################################
|
||||
|
||||
PROJECT(gen_html)
|
||||
|
@ -1,13 +1,10 @@
|
||||
# ################################################################
|
||||
# * Copyright (c) 2015-present, Yann Collet, Facebook, Inc.
|
||||
# * All rights reserved.
|
||||
# *
|
||||
# * This source code is licensed under the BSD-style license found in the
|
||||
# * LICENSE file in the root directory of this source tree. An additional grant
|
||||
# * of patent rights can be found in the PATENTS file in the same directory.
|
||||
# Copyright (c) 2016-present, Facebook, Inc.
|
||||
# All rights reserved.
|
||||
#
|
||||
# You can contact the author at :
|
||||
# - zstd homepage : http://www.zstd.net/
|
||||
# This source code is licensed under both the BSD-style license (found in the
|
||||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
# in the COPYING file in the root directory of this source tree).
|
||||
# ################################################################
|
||||
|
||||
PROJECT(pzstd)
|
||||
|
@ -1,13 +1,10 @@
|
||||
# ################################################################
|
||||
# * Copyright (c) 2014-present, Yann Collet, Facebook, Inc.
|
||||
# * All rights reserved.
|
||||
# *
|
||||
# * This source code is licensed under the BSD-style license found in the
|
||||
# * LICENSE file in the root directory of this source tree. An additional grant
|
||||
# * of patent rights can be found in the PATENTS file in the same directory.
|
||||
# Copyright (c) 2015-present, Yann Collet, Facebook, Inc.
|
||||
# All rights reserved.
|
||||
#
|
||||
# You can contact the author at :
|
||||
# - zstd homepage : http://www.zstd.net/
|
||||
# This source code is licensed under both the BSD-style license (found in the
|
||||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
# in the COPYING file in the root directory of this source tree).
|
||||
# ################################################################
|
||||
|
||||
PROJECT(libzstd)
|
||||
@ -151,10 +148,10 @@ IF (UNIX)
|
||||
ENDIF (UNIX)
|
||||
|
||||
# install target
|
||||
INSTALL(FILES
|
||||
${LIBRARY_DIR}/zstd.h
|
||||
${LIBRARY_DIR}/deprecated/zbuff.h
|
||||
${LIBRARY_DIR}/dictBuilder/zdict.h
|
||||
INSTALL(FILES
|
||||
${LIBRARY_DIR}/zstd.h
|
||||
${LIBRARY_DIR}/deprecated/zbuff.h
|
||||
${LIBRARY_DIR}/dictBuilder/zdict.h
|
||||
${LIBRARY_DIR}/common/zstd_errors.h
|
||||
DESTINATION "include")
|
||||
|
||||
|
@ -1,13 +1,10 @@
|
||||
# ################################################################
|
||||
# * Copyright (c) 2015-present, Yann Collet, Facebook, Inc.
|
||||
# * All rights reserved.
|
||||
# *
|
||||
# * This source code is licensed under the BSD-style license found in the
|
||||
# * LICENSE file in the root directory of this source tree. An additional grant
|
||||
# * of patent rights can be found in the PATENTS file in the same directory.
|
||||
# Copyright (c) 2015-present, Yann Collet, Facebook, Inc.
|
||||
# All rights reserved.
|
||||
#
|
||||
# You can contact the author at :
|
||||
# - zstd homepage : http://www.zstd.net/
|
||||
# This source code is licensed under both the BSD-style license (found in the
|
||||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
# in the COPYING file in the root directory of this source tree).
|
||||
# ################################################################
|
||||
|
||||
PROJECT(programs)
|
||||
|
@ -3,7 +3,7 @@ dependencies:
|
||||
- sudo dpkg --add-architecture i386
|
||||
- sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test; sudo apt-get -y -qq update
|
||||
- sudo apt-get -y install gcc-powerpc-linux-gnu gcc-arm-linux-gnueabi libc6-dev-armel-cross gcc-aarch64-linux-gnu libc6-dev-arm64-cross
|
||||
- sudo apt-get -y install libstdc++-6-dev clang gcc g++ gcc-5 gcc-6 zlib1g-dev liblzma-dev
|
||||
- sudo apt-get -y install libstdc++-7-dev clang gcc g++ gcc-5 gcc-6 gcc-7 zlib1g-dev liblzma-dev
|
||||
- sudo apt-get -y install linux-libc-dev:i386 libc6-dev-i386
|
||||
|
||||
test:
|
||||
@ -45,7 +45,7 @@ test:
|
||||
parallel: true
|
||||
- ? |
|
||||
if [[ "$CIRCLE_NODE_INDEX" == "0" ]] ; then make ppc64build && make clean; fi &&
|
||||
if [[ "$CIRCLE_NODE_TOTAL" < "2" ]] || [[ "$CIRCLE_NODE_INDEX" == "1" ]]; then true && make clean; fi #could add another test here
|
||||
if [[ "$CIRCLE_NODE_TOTAL" < "2" ]] || [[ "$CIRCLE_NODE_INDEX" == "1" ]]; then make gcc7build && make clean; fi #could add another test here
|
||||
:
|
||||
parallel: true
|
||||
- ? |
|
||||
@ -64,7 +64,7 @@ test:
|
||||
#- gcc -v; make -C tests test32 MOREFLAGS="-I/usr/include/x86_64-linux-gnu" && make clean
|
||||
#- make uasan && make clean
|
||||
#- make asan32 && make clean
|
||||
#- make -C tests test32 CC=clang MOREFLAGS="-g -fsanitize=address -I/usr/include/x86_64-linux-gnu"
|
||||
#- make -C tests test32 CC=clang MOREFLAGS="-g -fsanitize=address -I/usr/include/x86_64-linux-gnu"
|
||||
# Valgrind tests
|
||||
#- CFLAGS="-O1 -g" make -C zlibWrapper valgrindTest && make clean
|
||||
#- make -C tests valgrindTest && make clean
|
||||
|
76
contrib/adaptive-compression/Makefile
Normal file
@ -0,0 +1,76 @@
|
||||
|
||||
ZSTDDIR = ../../lib
|
||||
PRGDIR = ../../programs
|
||||
ZSTDCOMMON_FILES := $(ZSTDDIR)/common/*.c
|
||||
ZSTDCOMP_FILES := $(ZSTDDIR)/compress/*.c
|
||||
ZSTDDECOMP_FILES := $(ZSTDDIR)/decompress/*.c
|
||||
ZSTD_FILES := $(ZSTDDECOMP_FILES) $(ZSTDCOMMON_FILES) $(ZSTDCOMP_FILES)
|
||||
|
||||
MULTITHREAD_LDFLAGS = -pthread
|
||||
DEBUGFLAGS= -g -DZSTD_DEBUG=1
|
||||
CPPFLAGS += -I$(ZSTDDIR) -I$(ZSTDDIR)/common -I$(ZSTDDIR)/compress \
|
||||
-I$(ZSTDDIR)/dictBuilder -I$(ZSTDDIR)/deprecated -I$(PRGDIR)
|
||||
CFLAGS ?= -O3
|
||||
CFLAGS += -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow \
|
||||
-Wstrict-aliasing=1 -Wswitch-enum -Wdeclaration-after-statement \
|
||||
-Wstrict-prototypes -Wundef -Wformat-security \
|
||||
-Wvla -Wformat=2 -Winit-self -Wfloat-equal -Wwrite-strings \
|
||||
-Wredundant-decls
|
||||
CFLAGS += $(DEBUGFLAGS)
|
||||
CFLAGS += $(MOREFLAGS)
|
||||
FLAGS = $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) $(MULTITHREAD_LDFLAGS)
|
||||
|
||||
all: adapt datagen
|
||||
|
||||
adapt: $(ZSTD_FILES) adapt.c
|
||||
$(CC) $(FLAGS) $^ -o $@
|
||||
|
||||
adapt-debug: $(ZSTD_FILES) adapt.c
|
||||
$(CC) $(FLAGS) -DDEBUG_MODE=2 $^ -o adapt
|
||||
|
||||
datagen : $(PRGDIR)/datagen.c datagencli.c
|
||||
$(CC) $(FLAGS) $^ -o $@
|
||||
|
||||
test-adapt-correctness: datagen adapt
|
||||
@./test-correctness.sh
|
||||
@echo "test correctness complete"
|
||||
|
||||
test-adapt-performance: datagen adapt
|
||||
@./test-performance.sh
|
||||
@echo "test performance complete"
|
||||
|
||||
clean:
|
||||
@$(RM) -f adapt datagen
|
||||
@$(RM) -rf *.dSYM
|
||||
@$(RM) -f tmp*
|
||||
@$(RM) -f tests/*.zst
|
||||
@$(RM) -f tests/tmp*
|
||||
@echo "finished cleaning"
|
||||
|
||||
#-----------------------------------------------------------------------------
|
||||
# make install is validated only for Linux, OSX, BSD, Hurd and Solaris targets
|
||||
#-----------------------------------------------------------------------------
|
||||
ifneq (,$(filter $(shell uname),Linux Darwin GNU/kFreeBSD GNU OpenBSD FreeBSD NetBSD DragonFly SunOS))
|
||||
|
||||
ifneq (,$(filter $(shell uname),SunOS))
|
||||
INSTALL ?= ginstall
|
||||
else
|
||||
INSTALL ?= install
|
||||
endif
|
||||
|
||||
PREFIX ?= /usr/local
|
||||
DESTDIR ?=
|
||||
BINDIR ?= $(PREFIX)/bin
|
||||
|
||||
INSTALL_PROGRAM ?= $(INSTALL) -m 755
|
||||
|
||||
install: adapt
|
||||
@echo Installing binaries
|
||||
@$(INSTALL) -d -m 755 $(DESTDIR)$(BINDIR)/
|
||||
@$(INSTALL_PROGRAM) adapt $(DESTDIR)$(BINDIR)/zstd-adaptive
|
||||
@echo zstd-adaptive installation completed
|
||||
|
||||
uninstall:
|
||||
@$(RM) $(DESTDIR)$(BINDIR)/zstd-adaptive
|
||||
@echo zstd-adaptive programs successfully uninstalled
|
||||
endif
|
91
contrib/adaptive-compression/README.md
Normal file
@ -0,0 +1,91 @@
|
||||
### Summary
|
||||
|
||||
`adapt` is a new compression tool targeted at optimizing performance across network connections and pipelines. It senses the network or pipe speed and adapts the compression level to match it.
|
||||
In situations where the compression level does not appropriately match the network/pipe speed, compression may bottleneck the entire pipeline, or the files may not be compressed as much as they could be, losing efficiency either way. It also becomes quite impractical to manually measure and set an optimal compression level (which could potentially change over time).
|
||||
|
||||
### Using `adapt`
|
||||
|
||||
In order to build and use the tool, you can simply run `make adapt` in the `adaptive-compression` directory under `contrib`. This will generate an executable available for use. Another possible method of installation is running `make install`, which will create and install the binary as the command `zstd-adaptive`.
|
||||
|
||||
Similar to many other compression utilities, `zstd-adaptive` can be invoked by using the following format:
|
||||
|
||||
`zstd-adaptive [options] [file(s)]`
|
||||
|
||||
Supported options for the above format are described below.
|
||||
|
||||
`zstd-adaptive` also supports reading from `stdin` and writing to `stdout`, which is potentially more useful. By default, if no files are given, `zstd-adaptive` reads from and writes to standard I/O. Therefore, you can simply insert it within a pipeline like so:
|
||||
|
||||
`cat FILE | zstd-adaptive | ssh "cat - > tmp.zst"`
|
||||
|
||||
If a file is provided, it is also possible to force writing to stdout using the `-c` flag like so:
|
||||
|
||||
`zstd-adaptive -c FILE | ssh "cat - > tmp.zst"`
|
||||
|
||||
Several options described below can be used to control the behavior of `zstd-adaptive`. More specifically, the `-l#` and `-u#` flags set lower and upper bounds so that the compression level always stays within that range. The `-i#` flag can also be used to change the initial compression level. If an initial compression level is not provided, it is chosen so that it falls within the appropriate range (it becomes equal to the lower bound).
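
The level-selection rule described above can be sketched in shell. This is only an illustration, not code taken from `adapt.c`: the helper name `pick_initial_level` is hypothetical, the defaults of 1 and 22 come from the options list, and rejecting an out-of-range `-i#` value is an assumption about how the tool behaves.

```shell
#!/bin/sh
# Hypothetical helper, not part of zstd-adaptive itself.
# Usage: pick_initial_level INITIAL [LOWER] [UPPER]
# Pass an empty string as INITIAL to mean "no -i# given".
pick_initial_level() {
    initial="$1"
    lower="${2:-1}"     # default lower bound, as documented for -l#
    upper="${3:-22}"    # default upper bound, as documented for -u#

    if [ -z "$initial" ]; then
        # No initial level provided: start at the lower bound.
        echo "$lower"
    elif [ "$initial" -lt "$lower" ] || [ "$initial" -gt "$upper" ]; then
        # Assumption: an out-of-range initial level is treated as an error.
        echo "initial level $initial is outside [$lower, $upper]" >&2
        return 1
    else
        echo "$initial"
    fi
}
```

For instance, `pick_initial_level "" 4 22` prints the lower bound, while `pick_initial_level 11` keeps the requested level because it lies within the default range.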
|
||||
|
||||
### Options
|
||||
`-oFILE` : write output to `FILE`
|
||||
|
||||
`-i#` : provide initial compression level (must be within the appropriate bounds)
|
||||
|
||||
`-h` : display help/information
|
||||
|
||||
`-f` : force the compression level to stay constant
|
||||
|
||||
`-c` : force write to `stdout`
|
||||
|
||||
`-p` : hide progress bar
|
||||
|
||||
`-q` : quiet mode -- do not show progress bar or other information
|
||||
|
||||
`-l#` : set a lower bound on the compression level (default is 1)
|
||||
|
||||
`-u#` : set an upper bound on the compression level (default is 22)
|
||||
### Benchmarking / Test results
|
||||
#### Artificial Tests
|
||||
These artificial tests were run by using the `pv` command line utility in order to limit pipe speeds (25 MB/s read and 5 MB/s write limits were chosen to mimic severe throughput constraints). A 40 GB backup file was sent through a pipeline, compressed, and written out to a file. Compression time, size, and ratio were computed. Data for `zstd -15` was excluded from these tests because the test runs quite long.
|
||||
|
||||
<table>
|
||||
<tr><th> 25 MB/s read limit </th></tr>
|
||||
<tr><td>
|
||||
|
||||
| Compressor Name | Ratio | Compressed Size | Compression Time |
|:----------------|------:|----------------:|-----------------:|
| zstd -3         | 2.108 |       20.718 GB |      29m 48.530s |
| zstd-adaptive   | 2.230 |       19.581 GB |      29m 48.798s |
|
||||
|
||||
</td></tr>
|
||||
</table>
|
||||
|
||||
<table>
|
||||
<tr><th> 5 MB/s write limit </th></tr>
|
||||
<tr><td>
|
||||
|
||||
| Compressor Name | Ratio | Compressed Size | Compression Time |
|:----------------|------:|----------------:|-----------------:|
| zstd -3         | 2.108 |       20.718 GB |   1h 10m 43.076s |
| zstd-adaptive   | 2.249 |       19.412 GB |   1h 06m 15.577s |
|
||||
|
||||
</td></tr>
|
||||
</table>
|
||||
|
||||
The commands used for this test generally followed the form:
|
||||
|
||||
`cat FILE | pv -L 25m -q | COMPRESSION | pv -q > tmp.zst # impose 25 MB/s read limit`
|
||||
|
||||
`cat FILE | pv -q | COMPRESSION | pv -L 5m -q > tmp.zst # impose 5 MB/s write limit`
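
The "Ratio" column can be recomputed from two byte counts. The sketch below assumes the ratio is the original size divided by the compressed size. The helper name `compression_ratio` is hypothetical and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper: prints original/compressed to three decimals.
# Usage: compression_ratio ORIGINAL_BYTES COMPRESSED_BYTES
compression_ratio() {
    awk -v orig="$1" -v comp="$2" 'BEGIN {
        # Refuse to divide by zero or by a negative size.
        if (comp <= 0) { exit 1 }
        printf "%.3f\n", orig / comp
    }'
}

# With files on disk (paths are placeholders), the sizes can come from wc:
#   compression_ratio "$(wc -c < FILE)" "$(wc -c < tmp.zst)"
```

The helper takes sizes rather than file names so that it also works when the compressed stream is never stored locally, as in the SSH tests.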
|
||||
|
||||
#### SSH Tests
|
||||
|
||||
The following tests were performed by piping a relatively large backup file (approximately 80 GB) through compression and over SSH to be stored on a server. The test data includes statistics for time and compressed size on `zstd` at several compression levels, as well as `zstd-adaptive`. The data highlights the potential advantages that `zstd-adaptive` has over using a low static compression level and the negative impacts that using an excessively high static compression level can have on
|
||||
pipe throughput.
|
||||
|
||||
| Compressor Name | Ratio | Compressed Size | Compression Time |
|:----------------|------:|----------------:|-----------------:|
| zstd -3         | 2.212 |       32.426 GB |   1h 17m 59.756s |
| zstd -15        | 2.374 |       30.213 GB |   2h 56m 59.441s |
| zstd-adaptive   | 2.315 |       30.993 GB |   1h 18m 52.860s |
|
||||
|
||||
The commands used for this test generally followed the form:
|
||||
|
||||
`cat FILE | COMPRESSION | ssh dev "cat - > tmp.zst"`
|
1137
contrib/adaptive-compression/adapt.c
Normal file
File diff suppressed because it is too large
129
contrib/adaptive-compression/datagencli.c
Normal file
@ -0,0 +1,129 @@
|
||||
/*
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
|
||||
/*-************************************
|
||||
* Dependencies
|
||||
**************************************/
|
||||
#include "util.h" /* Compiler options */
|
||||
#include <stdio.h> /* fprintf, stderr */
|
||||
#include "datagen.h" /* RDG_generate */
|
||||
|
||||
|
||||
/*-************************************
|
||||
* Constants
|
||||
**************************************/
|
||||
#define KB *(1 <<10)
|
||||
#define MB *(1 <<20)
|
||||
#define GB *(1U<<30)
|
||||
|
||||
#define SIZE_DEFAULT ((64 KB) + 1)
|
||||
#define SEED_DEFAULT 0
|
||||
#define COMPRESSIBILITY_DEFAULT 50
|
||||
|
||||
|
||||
/*-************************************
|
||||
* Macros
|
||||
**************************************/
|
||||
#define DISPLAY(...) fprintf(stderr, __VA_ARGS__)
|
||||
#define DISPLAYLEVEL(l, ...) if (displayLevel>=l) { DISPLAY(__VA_ARGS__); }
|
||||
static unsigned displayLevel = 2;
|
||||
|
||||
|
||||
/*-*******************************************************
|
||||
* Command line
|
||||
*********************************************************/
|
||||
static int usage(const char* programName)
|
||||
{
|
||||
DISPLAY( "Compressible data generator\n");
|
||||
DISPLAY( "Usage :\n");
|
||||
DISPLAY( " %s [args]\n", programName);
|
||||
DISPLAY( "\n");
|
||||
DISPLAY( "Arguments :\n");
|
||||
DISPLAY( " -g# : generate # data (default:%i)\n", SIZE_DEFAULT);
|
||||
DISPLAY( " -s# : Select seed (default:%i)\n", SEED_DEFAULT);
|
||||
DISPLAY( " -P# : Select compressibility in %% (default:%i%%)\n",
|
||||
COMPRESSIBILITY_DEFAULT);
|
||||
DISPLAY( " -h : display help and exit\n");
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
int main(int argc, const char** argv)
|
||||
{
|
||||
unsigned probaU32 = COMPRESSIBILITY_DEFAULT;
|
||||
double litProba = 0.0;
|
||||
U64 size = SIZE_DEFAULT;
|
||||
U32 seed = SEED_DEFAULT;
|
||||
const char* const programName = argv[0];
|
||||
|
||||
int argNb;
|
||||
for(argNb=1; argNb<argc; argNb++) {
|
||||
const char* argument = argv[argNb];
|
||||
|
||||
if(!argument) continue; /* Protection if argument empty */
|
||||
|
||||
/* Handle commands. Aggregated commands are allowed */
|
||||
if (*argument=='-') {
|
||||
argument++;
|
||||
while (*argument!=0) {
|
||||
switch(*argument)
|
||||
{
|
||||
case 'h':
|
||||
return usage(programName);
|
||||
case 'g':
|
||||
argument++;
|
||||
size=0;
|
||||
while ((*argument>='0') && (*argument<='9'))
|
||||
size *= 10, size += *argument++ - '0';
|
||||
if (*argument=='K') { size <<= 10; argument++; }
|
||||
if (*argument=='M') { size <<= 20; argument++; }
|
||||
if (*argument=='G') { size <<= 30; argument++; }
|
||||
if (*argument=='B') { argument++; }
|
||||
break;
|
||||
case 's':
|
||||
argument++;
|
||||
seed=0;
|
||||
while ((*argument>='0') && (*argument<='9'))
|
||||
seed *= 10, seed += *argument++ - '0';
|
||||
break;
|
||||
case 'P':
|
||||
argument++;
|
||||
probaU32 = 0;
|
||||
while ((*argument>='0') && (*argument<='9'))
|
||||
probaU32 *= 10, probaU32 += *argument++ - '0';
|
||||
if (probaU32>100) probaU32 = 100;
|
||||
break;
|
||||
case 'L': /* hidden argument : Literal distribution probability */
|
||||
argument++;
|
||||
litProba=0.;
|
||||
while ((*argument>='0') && (*argument<='9'))
|
||||
litProba *= 10, litProba += *argument++ - '0';
|
||||
if (litProba>100.) litProba=100.;
|
||||
litProba /= 100.;
|
||||
break;
|
||||
case 'v':
|
||||
displayLevel = 4;
|
||||
argument++;
|
||||
break;
|
||||
default:
|
||||
return usage(programName);
|
||||
}
|
||||
} } } /* for(argNb=1; argNb<argc; argNb++) */
|
||||
|
||||
DISPLAYLEVEL(4, "Compressible data Generator \n");
|
||||
if (probaU32!=COMPRESSIBILITY_DEFAULT)
|
||||
DISPLAYLEVEL(3, "Compressibility : %u%%\n", probaU32);
|
||||
DISPLAYLEVEL(3, "Seed = %u \n", seed);
|
||||
|
||||
RDG_genStdout(size, (double)probaU32/100, litProba, seed);
|
||||
DISPLAYLEVEL(1, "\n");
|
||||
|
||||
return 0;
|
||||
}
|
252
contrib/adaptive-compression/test-correctness.sh
Executable file
@ -0,0 +1,252 @@
|
||||
echo "correctness tests -- general"
|
||||
./datagen -s1 -g1GB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s2 -g500MB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s3 -g250MB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s4 -g125MB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s5 -g50MB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s6 -g25MB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s7 -g10MB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s8 -g5MB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s9 -g500KB > tmp
|
||||
./adapt -otmp.zst tmp
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ncorrectness tests -- streaming"
|
||||
./datagen -s10 -g1GB > tmp
|
||||
cat tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s11 -g100MB > tmp
|
||||
cat tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s12 -g10MB > tmp
|
||||
cat tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s13 -g1MB > tmp
|
||||
cat tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s14 -g100KB > tmp
|
||||
cat tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s15 -g10KB > tmp
|
||||
cat tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ncorrectness tests -- read limit"
|
||||
./datagen -s16 -g1GB > tmp
|
||||
pv -L 50m -q tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s17 -g100MB > tmp
|
||||
pv -L 50m -q tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s18 -g10MB > tmp
|
||||
pv -L 50m -q tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s19 -g1MB > tmp
|
||||
pv -L 50m -q tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s20 -g100KB > tmp
|
||||
pv -L 50m -q tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s21 -g10KB > tmp
|
||||
pv -L 50m -q tmp | ./adapt > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ncorrectness tests -- write limit"
|
||||
./datagen -s22 -g1GB > tmp
|
||||
pv -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s23 -g100MB > tmp
|
||||
pv -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s24 -g10MB > tmp
|
||||
pv -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s25 -g1MB > tmp
|
||||
pv -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s26 -g100KB > tmp
|
||||
pv -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s27 -g10KB > tmp
|
||||
pv -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ncorrectness tests -- read and write limits"
|
||||
./datagen -s28 -g1GB > tmp
|
||||
pv -L 50m -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s29 -g100MB > tmp
|
||||
pv -L 50m -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s30 -g10MB > tmp
|
||||
pv -L 50m -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s31 -g1MB > tmp
|
||||
pv -L 50m -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s32 -g100KB > tmp
|
||||
pv -L 50m -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s33 -g10KB > tmp
|
||||
pv -L 50m -q tmp | ./adapt | pv -L 5m -q > tmp.zst
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ncorrectness tests -- forced compression level"
|
||||
./datagen -s34 -g1GB > tmp
|
||||
./adapt tmp -otmp.zst -i11 -f
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s35 -g100MB > tmp
|
||||
./adapt tmp -otmp.zst -i11 -f
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s36 -g10MB > tmp
|
||||
./adapt tmp -otmp.zst -i11 -f
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s37 -g1MB > tmp
|
||||
./adapt tmp -otmp.zst -i11 -f
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s38 -g100KB > tmp
|
||||
./adapt tmp -otmp.zst -i11 -f
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
./datagen -s39 -g10KB > tmp
|
||||
./adapt tmp -otmp.zst -i11 -f
|
||||
zstd -d tmp.zst -o tmp2
|
||||
diff -s -q tmp tmp2
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ncorrectness tests -- window size test"
|
||||
./datagen -s39 -g1GB | pv -L 25m -q | ./adapt -i1 | pv -q > tmp.zst
|
||||
zstd -d tmp.zst
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ncorrectness tests -- testing bounds"
|
||||
./datagen -s40 -g1GB | pv -L 25m -q | ./adapt -i1 -u4 | pv -q > tmp.zst
|
||||
rm tmp*
|
||||
|
||||
./datagen -s41 -g1GB | ./adapt -i14 -l4 > tmp.zst
|
||||
rm tmp*
|
||||
make clean
|
59
contrib/adaptive-compression/test-performance.sh
Executable file
@ -0,0 +1,59 @@
|
||||
echo "testing time -- no limits set"
|
||||
./datagen -s1 -g1GB > tmp
|
||||
time ./adapt -otmp1.zst tmp
|
||||
time zstd -1 -o tmp2.zst tmp
|
||||
rm tmp*
|
||||
|
||||
./datagen -s2 -g2GB > tmp
|
||||
time ./adapt -otmp1.zst tmp
|
||||
time zstd -1 -o tmp2.zst tmp
|
||||
rm tmp*
|
||||
|
||||
./datagen -s3 -g4GB > tmp
|
||||
time ./adapt -otmp1.zst tmp
|
||||
time zstd -1 -o tmp2.zst tmp
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ntesting compression ratio -- no limits set"
|
||||
./datagen -s4 -g1GB > tmp
|
||||
time ./adapt -otmp1.zst tmp
|
||||
time zstd -1 -o tmp2.zst tmp
|
||||
ls -l tmp1.zst tmp2.zst
|
||||
rm tmp*
|
||||
|
||||
./datagen -s5 -g2GB > tmp
|
||||
time ./adapt -otmp1.zst tmp
|
||||
time zstd -1 -o tmp2.zst tmp
|
||||
ls -l tmp1.zst tmp2.zst
|
||||
rm tmp*
|
||||
|
||||
./datagen -s6 -g4GB > tmp
|
||||
time ./adapt -otmp1.zst tmp
|
||||
time zstd -1 -o tmp2.zst tmp
|
||||
ls -l tmp1.zst tmp2.zst
|
||||
rm tmp*
|
||||
|
||||
echo -e "\ntesting performance at various compression levels -- no limits set"
|
||||
./datagen -s7 -g1GB > tmp
|
||||
echo "adapt"
|
||||
time ./adapt -i5 -f tmp -otmp1.zst
|
||||
echo "zstdcli"
|
||||
time zstd -5 tmp -o tmp2.zst
|
||||
ls -l tmp1.zst tmp2.zst
|
||||
rm tmp*
|
||||
|
||||
./datagen -s8 -g1GB > tmp
|
||||
echo "adapt"
|
||||
time ./adapt -i10 -f tmp -otmp1.zst
|
||||
echo "zstdcli"
|
||||
time zstd -10 tmp -o tmp2.zst
|
||||
ls -l tmp1.zst tmp2.zst
|
||||
rm tmp*
|
||||
|
||||
./datagen -s9 -g1GB > tmp
|
||||
echo "adapt"
|
||||
time ./adapt -i15 -f tmp -otmp1.zst
|
||||
echo "zstdcli"
|
||||
time zstd -15 tmp -o tmp2.zst
|
||||
ls -l tmp1.zst tmp2.zst
|
||||
rm tmp*
|
@ -1,11 +1,11 @@
|
||||
# ##########################################################################
|
||||
# ################################################################
|
||||
# Copyright (c) 2016-present, Facebook, Inc.
|
||||
# All rights reserved.
|
||||
#
|
||||
# This source code is licensed under the BSD-style license found in the
|
||||
# LICENSE file in the root directory of this source tree. An additional grant
|
||||
# of patent rights can be found in the PATENTS file in the same directory.
|
||||
# ##########################################################################
|
||||
# This source code is licensed under both the BSD-style license (found in the
|
||||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
# in the COPYING file in the root directory of this source tree).
|
||||
# ################################################################
|
||||
|
||||
CFLAGS ?= -O3
|
||||
CFLAGS += -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow -Wstrict-aliasing=1 -Wswitch-enum -Wno-comment
|
||||
|
@ -2,9 +2,9 @@
|
||||
* Copyright (c) 2016-present, Przemyslaw Skibinski, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
#include <iostream>
|
||||
|
@ -1,7 +1,7 @@
|
||||
From 0cd63464d182bb9708f8b25f7da3dc8e5ec6b4fa Mon Sep 17 00:00:00 2001
|
||||
From 308795a7713ca6fcd468b60fba9a2fca99cee6a0 Mon Sep 17 00:00:00 2001
|
||||
From: Nick Terrell <terrelln@fb.com>
|
||||
Date: Thu, 20 Jul 2017 13:18:30 -0700
|
||||
Subject: [PATCH v3 0/4] Add xxhash and zstd modules
|
||||
Date: Tue, 8 Aug 2017 19:20:25 -0700
|
||||
Subject: [PATCH v5 0/5] Add xxhash and zstd modules
|
||||
|
||||
Hi all,
|
||||
|
||||
@ -16,27 +16,45 @@ Nick Terrell
|
||||
Changelog:
|
||||
|
||||
v1 -> v2:
|
||||
- Make pointer in lib/xxhash.c:394 non-const (1/4)
|
||||
- Use div_u64() for division of u64s (2/4)
|
||||
- Make pointer in lib/xxhash.c:394 non-const (1/5)
|
||||
- Use div_u64() for division of u64s (2/5)
|
||||
- Reduce stack usage of ZSTD_compressSequences(), ZSTD_buildSeqTable(),
|
||||
ZSTD_decompressSequencesLong(), FSE_buildDTable(), FSE_decompress_wksp(),
|
||||
HUF_writeCTable(), HUF_readStats(), HUF_readCTable(),
|
||||
HUF_compressWeights(), HUF_readDTableX2(), and HUF_readDTableX4() (2/4)
|
||||
- No zstd function uses more than 400 B of stack space (2/4)
|
||||
HUF_compressWeights(), HUF_readDTableX2(), and HUF_readDTableX4() (2/5)
|
||||
- No zstd function uses more than 400 B of stack space (2/5)
|
||||
|
||||
v2 -> v3:
|
||||
- Work around gcc-7 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388
|
||||
(2/4)
|
||||
- Fix bug in dictionary compression from upstream commit cc1522351f (2/4)
|
||||
- Port upstream BtrFS commits e1ddce71d6, 389a6cfc2a, and 6acafd1eff (3/4)
|
||||
- Change default compression level for BtrFS to 3 (3/4)
|
||||
(2/5)
|
||||
- Fix bug in dictionary compression from upstream commit cc1522351f (2/5)
|
||||
- Port upstream BtrFS commits e1ddce71d6, 389a6cfc2a, and 6acafd1eff (3/5)
|
||||
- Change default compression level for BtrFS to 3 (3/5)
|
||||
|
||||
Nick Terrell (4):
|
||||
v3 -> v4:
|
||||
- Fix compiler warnings (2/5)
|
||||
- Add missing includes (3/5)
|
||||
- Fix minor linter warnings (3/5, 4/5)
|
||||
- Add crypto patch (5/5)
|
||||
|
||||
v4 -> v5:
|
||||
- Fix rare compression bug from upstream commit 308047eb5d (2/5)
|
||||
- Fix bug introduced in v3 when working around the gcc-7 bug (2/5)
|
||||
- Fix ZSTD_DStream initialization code in squashfs (4/5)
|
||||
- Fix patch documentation for patches written by Sean Purcell (4/5)
|
||||
|
||||
Nick Terrell (5):
|
||||
lib: Add xxhash module
|
||||
lib: Add zstd modules
|
||||
btrfs: Add zstd support
|
||||
squashfs: Add zstd support
|
||||
crypto: Add zstd support
|
||||
|
||||
crypto/Kconfig | 9 +
|
||||
crypto/Makefile | 1 +
|
||||
crypto/testmgr.c | 10 +
|
||||
crypto/testmgr.h | 71 +
|
||||
crypto/zstd.c | 265 ++++
|
||||
fs/btrfs/Kconfig | 2 +
|
||||
fs/btrfs/Makefile | 2 +-
|
||||
fs/btrfs/compression.c | 1 +
|
||||
@ -47,13 +65,13 @@ Nick Terrell (4):
|
||||
fs/btrfs/props.c | 6 +
|
||||
fs/btrfs/super.c | 12 +-
|
||||
fs/btrfs/sysfs.c | 2 +
|
||||
fs/btrfs/zstd.c | 435 ++++++
|
||||
fs/btrfs/zstd.c | 432 ++++++
|
||||
fs/squashfs/Kconfig | 14 +
|
||||
fs/squashfs/Makefile | 1 +
|
||||
fs/squashfs/decompressor.c | 7 +
|
||||
fs/squashfs/decompressor.h | 4 +
|
||||
fs/squashfs/squashfs_fs.h | 1 +
|
||||
fs/squashfs/zstd_wrapper.c | 150 ++
|
||||
fs/squashfs/zstd_wrapper.c | 151 ++
|
||||
include/linux/xxhash.h | 236 +++
|
||||
include/linux/zstd.h | 1157 +++++++++++++++
|
||||
include/uapi/linux/btrfs.h | 8 +-
|
||||
@ -62,9 +80,9 @@ Nick Terrell (4):
|
||||
lib/xxhash.c | 500 +++++++
|
||||
lib/zstd/Makefile | 18 +
|
||||
lib/zstd/bitstream.h | 374 +++++
|
||||
lib/zstd/compress.c | 3479 ++++++++++++++++++++++++++++++++++++++++++++
|
||||
lib/zstd/decompress.c | 2526 ++++++++++++++++++++++++++++++++
|
||||
lib/zstd/entropy_common.c | 243 ++++
|
||||
lib/zstd/compress.c | 3484 ++++++++++++++++++++++++++++++++++++++++++++
|
||||
lib/zstd/decompress.c | 2528 ++++++++++++++++++++++++++++++++
|
||||
lib/zstd/entropy_common.c | 243 +++
|
||||
lib/zstd/error_private.h | 53 +
|
||||
lib/zstd/fse.h | 575 ++++++++
|
||||
lib/zstd/fse_compress.c | 795 ++++++++++
|
||||
@ -74,9 +92,10 @@ Nick Terrell (4):
|
||||
lib/zstd/huf_decompress.c | 960 ++++++++++++
|
||||
lib/zstd/mem.h | 151 ++
|
||||
lib/zstd/zstd_common.c | 75 +
|
||||
lib/zstd/zstd_internal.h | 250 ++++
|
||||
lib/zstd/zstd_internal.h | 263 ++++
|
||||
lib/zstd/zstd_opt.h | 1014 +++++++++++++
|
||||
39 files changed, 14382 insertions(+), 12 deletions(-)
|
||||
44 files changed, 14756 insertions(+), 12 deletions(-)
|
||||
create mode 100644 crypto/zstd.c
|
||||
create mode 100644 fs/btrfs/zstd.c
|
||||
create mode 100644 fs/squashfs/zstd_wrapper.c
|
||||
create mode 100644 include/linux/xxhash.h
|
||||
|
@ -1,7 +1,7 @@
|
||||
From fc7f26acbabda35f1c61dfc357dbb207dc8ed23d Mon Sep 17 00:00:00 2001
|
||||
From a4b1ffb6e89bbccd519f9afa0910635668436105 Mon Sep 17 00:00:00 2001
|
||||
From: Nick Terrell <terrelln@fb.com>
|
||||
Date: Mon, 17 Jul 2017 17:07:18 -0700
|
||||
Subject: [PATCH v3 1/4] lib: Add xxhash module
|
||||
Subject: [PATCH v5 1/5] lib: Add xxhash module
|
||||
|
||||
Adds xxhash kernel module with xxh32 and xxh64 hashes. xxhash is an
|
||||
extremely fast non-cryptographic hash algorithm for checksumming.
|
||||
|
@ -1,7 +1,7 @@
|
||||
From 686a6149b98250d66b5951e3ae05e79063e9de98 Mon Sep 17 00:00:00 2001
|
||||
From 2b29ec569f8438a0307debd29873859ca6d407fc Mon Sep 17 00:00:00 2001
|
||||
From: Nick Terrell <terrelln@fb.com>
|
||||
Date: Mon, 17 Jul 2017 17:08:19 -0700
|
||||
Subject: [PATCH v3 2/4] lib: Add zstd modules
|
||||
Subject: [PATCH v5 2/5] lib: Add zstd modules
|
||||
|
||||
Add zstd compression and decompression kernel modules.
|
||||
zstd offers a wide variety of compression speed and quality trade-offs.
|
||||
@ -114,26 +114,33 @@ v2 -> v3:
|
||||
- Work around gcc-7 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388
|
||||
- Fix bug in dictionary compression from upstream commit cc1522351f
|
||||
|
||||
include/linux/zstd.h | 1157 +++++++++++++++
|
||||
v3 -> v4:
|
||||
- Fix minor compiler warnings
|
||||
|
||||
v4 -> v5:
|
||||
- Fix rare compression bug from upstream commit 308047eb5d
|
||||
- Fix bug introduced in v3 when working around the gcc-7 bug
|
||||
|
||||
include/linux/zstd.h | 1155 +++++++++++++++
|
||||
lib/Kconfig | 8 +
|
||||
lib/Makefile | 2 +
|
||||
lib/zstd/Makefile | 18 +
|
||||
lib/zstd/bitstream.h | 374 +++++
|
||||
lib/zstd/compress.c | 3479 +++++++++++++++++++++++++++++++++++++++++++++
|
||||
lib/zstd/compress.c | 3482 +++++++++++++++++++++++++++++++++++++++++++++
|
||||
lib/zstd/decompress.c | 2526 ++++++++++++++++++++++++++++++++
|
||||
lib/zstd/entropy_common.c | 243 ++++
|
||||
lib/zstd/error_private.h | 53 +
|
||||
lib/zstd/error_private.h | 51 +
|
||||
lib/zstd/fse.h | 575 ++++++++
|
||||
lib/zstd/fse_compress.c | 795 +++++++++++
|
||||
lib/zstd/fse_decompress.c | 332 +++++
|
||||
lib/zstd/huf.h | 212 +++
|
||||
lib/zstd/huf_compress.c | 770 ++++++++++
|
||||
lib/zstd/huf_decompress.c | 960 +++++++++++++
|
||||
lib/zstd/mem.h | 151 ++
|
||||
lib/zstd/zstd_common.c | 75 +
|
||||
lib/zstd/zstd_internal.h | 250 ++++
|
||||
lib/zstd/zstd_opt.h | 1014 +++++++++++++
|
||||
19 files changed, 12994 insertions(+)
|
||||
lib/zstd/mem.h | 149 ++
|
||||
lib/zstd/zstd_common.c | 73 +
|
||||
lib/zstd/zstd_internal.h | 261 ++++
|
||||
lib/zstd/zstd_opt.h | 1012 +++++++++++++
|
||||
19 files changed, 12998 insertions(+)
|
||||
create mode 100644 include/linux/zstd.h
|
||||
create mode 100644 lib/zstd/Makefile
|
||||
create mode 100644 lib/zstd/bitstream.h
|
||||
@ -154,18 +161,16 @@ v2 -> v3:
|
||||
|
||||
diff --git a/include/linux/zstd.h b/include/linux/zstd.h
|
||||
new file mode 100644
|
||||
index 0000000..249575e
|
||||
index 0000000..305efd0
|
||||
--- /dev/null
|
||||
+++ b/include/linux/zstd.h
|
||||
@@ -0,0 +1,1157 @@
|
||||
@@ -0,0 +1,1155 @@
|
||||
+/*
|
||||
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
+ * All rights reserved.
|
||||
+ *
|
||||
+ * This source code is licensed under the BSD-style license found in the
|
||||
+ * LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
+ * An additional grant of patent rights can be found in the PATENTS file in the
|
||||
+ * same directory.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it under
|
||||
+ * the terms of the GNU General Public License version 2 as published by the
|
||||
@ -1753,18 +1758,16 @@ index 0000000..a826b99
|
||||
+#endif /* BITSTREAM_H_MODULE */
|
||||
diff --git a/lib/zstd/compress.c b/lib/zstd/compress.c
|
||||
new file mode 100644
|
||||
index 0000000..d60ab7d
|
||||
index 0000000..ff18ae6
|
||||
--- /dev/null
|
||||
+++ b/lib/zstd/compress.c
|
||||
@@ -0,0 +1,3479 @@
|
||||
@@ -0,0 +1,3482 @@
|
||||
+/**
|
||||
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
+ * All rights reserved.
|
||||
+ *
|
||||
+ * This source code is licensed under the BSD-style license found in the
|
||||
+ * LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
+ * An additional grant of patent rights can be found in the PATENTS file in the
|
||||
+ * same directory.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it under
|
||||
+ * the terms of the GNU General Public License version 2 as published by the
|
||||
@ -2342,7 +2345,7 @@ index 0000000..d60ab7d
|
||||
+ mlCodeTable[seqStorePtr->longLengthPos] = MaxML;
|
||||
+}
|
||||
+
|
||||
+ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCapacity, size_t srcSize)
|
||||
+ZSTD_STATIC size_t ZSTD_compressSequences_internal(ZSTD_CCtx *zc, void *dst, size_t dstCapacity)
|
||||
+{
|
||||
+ const int longOffsets = zc->params.cParams.windowLog > STREAM_ACCUMULATOR_MIN;
|
||||
+ const seqStore_t *seqStorePtr = &(zc->seqStore);
|
||||
@ -2395,7 +2398,7 @@ index 0000000..d60ab7d
|
||||
+ else
|
||||
+ op[0] = 0xFF, ZSTD_writeLE16(op + 1, (U16)(nbSeq - LONGNBSEQ)), op += 3;
|
||||
+ if (nbSeq == 0)
|
||||
+ goto _check_compressibility;
|
||||
+ return op - ostart;
|
||||
+
|
||||
+ /* seqHead : flags for FSE encoding type */
|
||||
+ seqHead = op++;
|
||||
@ -2585,28 +2588,33 @@ index 0000000..d60ab7d
|
||||
+ op += streamSize;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+/* check compressibility */
|
||||
+_check_compressibility:
|
||||
+ {
|
||||
+ size_t const minGain = ZSTD_minGain(srcSize);
|
||||
+ size_t const maxCSize = srcSize - minGain;
|
||||
+ if ((size_t)(op - ostart) >= maxCSize) {
|
||||
+ zc->flagStaticHufTable = HUF_repeat_none;
|
||||
+ return 0;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ /* confirm repcodes */
|
||||
+ {
|
||||
+ int i;
|
||||
+ for (i = 0; i < ZSTD_REP_NUM; i++)
|
||||
+ zc->rep[i] = zc->repToConfirm[i];
|
||||
+ }
|
||||
+
|
||||
+ return op - ostart;
|
||||
+}
|
||||
+
|
||||
+ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCapacity, size_t srcSize)
|
||||
+{
|
||||
+ size_t const cSize = ZSTD_compressSequences_internal(zc, dst, dstCapacity);
|
||||
+ size_t const minGain = ZSTD_minGain(srcSize);
|
||||
+ size_t const maxCSize = srcSize - minGain;
|
||||
+ /* If the srcSize <= dstCapacity, then there is enough space to write a
|
||||
+ * raw uncompressed block. Since we ran out of space, the block must not
|
||||
+ * be compressible, so fall back to a raw uncompressed block.
|
||||
+ */
|
||||
+ int const uncompressibleError = cSize == ERROR(dstSize_tooSmall) && srcSize <= dstCapacity;
|
||||
+ int i;
|
||||
+
|
||||
+ if (ZSTD_isError(cSize) && !uncompressibleError)
|
||||
+ return cSize;
|
||||
+ if (cSize >= maxCSize || uncompressibleError) {
|
||||
+ zc->flagStaticHufTable = HUF_repeat_none;
|
||||
+ return 0;
|
||||
+ }
|
||||
+ /* confirm repcodes */
|
||||
+ for (i = 0; i < ZSTD_REP_NUM; i++)
|
||||
+ zc->rep[i] = zc->repToConfirm[i];
|
||||
+ return cSize;
|
||||
+}
|
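The fallback described in the comment inside `ZSTD_compressSequences()` above can be sketched in isolation. This is a minimal illustration, not the kernel code: `sketch_finish_block`, `sketch_min_gain`, and `SKETCH_ERR_DST_TOO_SMALL` are hypothetical names, the error encoding is reduced to a single sentinel value, and the `(src_size >> 6) + 2` threshold is an assumption about what `ZSTD_minGain()` computes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for ERROR(dstSize_tooSmall). */
#define SKETCH_ERR_DST_TOO_SMALL ((size_t)-1)

/* Simplified error check: only the one sentinel counts as an error here. */
static int sketch_is_error(size_t code)
{
	return code == SKETCH_ERR_DST_TOO_SMALL;
}

/* Assumed heuristic: compression must save at least src_size/64 + 2 bytes. */
static size_t sketch_min_gain(size_t src_size)
{
	return (src_size >> 6) + 2;
}

/*
 * Decide what to do with the result of the inner compression step.
 * Returns 0 to request a raw (uncompressed) block, the error code if the
 * error cannot be recovered from, and the compressed size otherwise.
 */
static size_t sketch_finish_block(size_t c_size, size_t src_size,
				  size_t dst_capacity)
{
	size_t max_c_size;
	int uncompressible_error;

	assert(src_size >= 2); /* keeps the subtraction below from wrapping */
	max_c_size = src_size - sketch_min_gain(src_size);

	/* Out of space although a raw block would fit: treat as incompressible. */
	uncompressible_error = c_size == SKETCH_ERR_DST_TOO_SMALL &&
			       src_size <= dst_capacity;

	if (sketch_is_error(c_size) && !uncompressible_error)
		return c_size;
	if (c_size >= max_c_size || uncompressible_error)
		return 0;
	return c_size;
}
```

With a 1000-byte source the assumed threshold is 983 bytes, so a 100-byte result is kept while a 990-byte result falls back to a raw block. An out-of-space error falls back to a raw block only when the destination could hold the 1000 source bytes uncompressed.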
||||
+
|
||||
+/*! ZSTD_storeSeq() :
|
||||
+ Store a sequence (literal length, literals, offset code and match length code) into seqStore_t.
|
||||
+ `offsetCode` : distance to match, or 0 == repCode.
|
||||
@ -5238,7 +5246,7 @@ index 0000000..d60ab7d
|
||||
+MODULE_DESCRIPTION("Zstd Compressor");
|
||||
diff --git a/lib/zstd/decompress.c b/lib/zstd/decompress.c
|
||||
new file mode 100644
|
||||
index 0000000..62449ae
|
||||
index 0000000..72df4828
|
||||
--- /dev/null
|
||||
+++ b/lib/zstd/decompress.c
|
||||
@@ -0,0 +1,2526 @@
|
||||
@ -5248,8 +5256,6 @@ index 0000000..62449ae
|
||||
+ *
|
||||
+ * This source code is licensed under the BSD-style license found in the
|
||||
+ * LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
+ * An additional grant of patent rights can be found in the PATENTS file in the
|
||||
+ * same directory.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it under
|
||||
+ * the terms of the GNU General Public License version 2 as published by the
|
||||
@ -6242,6 +6248,8 @@ index 0000000..62449ae
|
||||
+ BIT_reloadDStream(&seqState->DStream); /* <= 18 bits */
|
||||
+ FSE_updateState(&seqState->stateOffb, &seqState->DStream); /* <= 8 bits */
|
||||
+
|
||||
+ seq.match = NULL;
|
||||
+
|
||||
+ return seq;
|
||||
+}
|
||||
+
|
||||
@ -8019,18 +8027,16 @@ index 0000000..2b0a643
|
||||
+}
|
||||
diff --git a/lib/zstd/error_private.h b/lib/zstd/error_private.h
|
||||
new file mode 100644
|
||||
index 0000000..1a60b31
|
||||
index 0000000..2062ff0
|
||||
--- /dev/null
|
||||
+++ b/lib/zstd/error_private.h
|
||||
@@ -0,0 +1,53 @@
|
||||
@@ -0,0 +1,51 @@
|
||||
+/**
|
||||
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
+ * All rights reserved.
|
||||
+ *
|
||||
+ * This source code is licensed under the BSD-style license found in the
|
||||
+ * LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
+ * An additional grant of patent rights can be found in the PATENTS file in the
|
||||
+ * same directory.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it under
|
||||
+ * the terms of the GNU General Public License version 2 as published by the
|
||||
@ -11758,18 +11764,16 @@ index 0000000..6526482
|
||||
+}
|
||||
diff --git a/lib/zstd/mem.h b/lib/zstd/mem.h
|
||||
new file mode 100644
|
||||
index 0000000..3a0f34c
|
||||
index 0000000..42a697b
|
||||
--- /dev/null
|
||||
+++ b/lib/zstd/mem.h
|
||||
@@ -0,0 +1,151 @@
|
||||
@@ -0,0 +1,149 @@
|
||||
+/**
|
||||
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
+ * All rights reserved.
|
||||
+ *
|
||||
+ * This source code is licensed under the BSD-style license found in the
|
||||
+ * LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
+ * An additional grant of patent rights can be found in the PATENTS file in the
|
||||
+ * same directory.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it under
|
||||
+ * the terms of the GNU General Public License version 2 as published by the
|
||||
@ -11915,18 +11919,16 @@ index 0000000..3a0f34c
|
||||
+#endif /* MEM_H_MODULE */
|
||||
diff --git a/lib/zstd/zstd_common.c b/lib/zstd/zstd_common.c
|
||||
new file mode 100644
|
||||
index 0000000..a282624
|
||||
index 0000000..e5f06d7
|
||||
--- /dev/null
|
||||
+++ b/lib/zstd/zstd_common.c
|
||||
@@ -0,0 +1,75 @@
|
||||
@@ -0,0 +1,73 @@
|
||||
+/**
|
||||
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
+ * All rights reserved.
|
||||
+ *
|
||||
+ * This source code is licensed under the BSD-style license found in the
|
||||
+ * LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
+ * An additional grant of patent rights can be found in the PATENTS file in the
|
||||
+ * same directory.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it under
|
||||
+ * the terms of the GNU General Public License version 2 as published by the
|
||||
@ -11996,18 +11998,16 @@ index 0000000..a282624
|
||||
+}
|
||||
diff --git a/lib/zstd/zstd_internal.h b/lib/zstd/zstd_internal.h
|
||||
new file mode 100644
|
||||
index 0000000..f0ba474
|
||||
index 0000000..a0fb83e
|
||||
--- /dev/null
|
||||
+++ b/lib/zstd/zstd_internal.h
|
||||
@@ -0,0 +1,250 @@
|
||||
@@ -0,0 +1,261 @@
|
||||
+/**
|
||||
+ * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
+ * All rights reserved.
|
||||
+ *
|
||||
+ * This source code is licensed under the BSD-style license found in the
|
||||
+ * LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
+ * An additional grant of patent rights can be found in the PATENTS file in the
|
||||
+ * same directory.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it under
|
||||
+ * the terms of the GNU General Public License version 2 as published by the
|
||||
@ -12128,7 +12128,7 @@ index 0000000..f0ba474
|
||||
+/*-*******************************************
|
||||
+* Shared functions to include for inlining
|
||||
+*********************************************/
|
||||
+static void ZSTD_copy8(void *dst, const void *src) {
|
||||
+ZSTD_STATIC void ZSTD_copy8(void *dst, const void *src) {
|
||||
+ memcpy(dst, src, 8);
|
||||
+}
|
||||
+/*! ZSTD_wildcopy() :
|
||||
@ -12136,8 +12136,21 @@ index 0000000..f0ba474
|
||||
+#define WILDCOPY_OVERLENGTH 8
|
||||
+ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length)
|
||||
+{
|
||||
+ if (length > 0)
|
||||
+ memcpy(dst, src, length);
|
||||
+ const BYTE* ip = (const BYTE*)src;
|
||||
+ BYTE* op = (BYTE*)dst;
|
||||
+ BYTE* const oend = op + length;
|
||||
+ /* Work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388.
|
||||
+ * Avoid the bad case where the loop only runs once by handling the
|
||||
+ * special case separately. This doesn't trigger the bug because it
|
||||
+ * doesn't involve pointer/integer overflow.
|
||||
+ */
|
||||
+ if (length <= 8)
|
||||
+ return ZSTD_copy8(dst, src);
|
||||
+ do {
|
||||
+ ZSTD_copy8(op, ip);
|
||||
+ op += 8;
|
||||
+ ip += 8;
|
||||
+ } while (op < oend);
|
||||
+}
|
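The loop form of `ZSTD_wildcopy()` shown above copies in fixed 8-byte strides, so it can write past `length`. The following standalone sketch illustrates that behaviour under stated assumptions: the `sketch_` names are hypothetical, `unsigned char` stands in for `BYTE`, and both buffers are assumed to carry at least 8 bytes of slack (which is what `WILDCOPY_OVERLENGTH` appears to reserve).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_WILDCOPY_OVERLENGTH 8

/* Copy exactly 8 bytes, regardless of how many the caller needs. */
static void sketch_copy8(void *dst, const void *src)
{
	memcpy(dst, src, 8);
}

/*
 * Copy at least `length` bytes in 8-byte strides.
 * A length of 8 or less is handled by a single 8-byte copy, mirroring the
 * special case in the patch; longer copies round up to a multiple of 8.
 */
static void sketch_wildcopy(void *dst, const void *src, ptrdiff_t length)
{
	const unsigned char *ip = (const unsigned char *)src;
	unsigned char *op = (unsigned char *)dst;

	assert(dst != NULL && src != NULL);

	if (length <= 8) {
		sketch_copy8(dst, src);
		return;
	}

	unsigned char *const oend = op + length;
	do {
		sketch_copy8(op, ip);
		op += 8;
		ip += 8;
	} while (op < oend);
}

/* Number of bytes sketch_wildcopy() actually touches for a given length. */
static size_t sketch_wildcopy_span(ptrdiff_t length)
{
	if (length <= 8)
		return 8;
	return ((size_t)length + 7) / 8 * 8;
}
```

For a 13-byte request the sketch touches 16 bytes, so callers need to size both buffers with the extra slack in mind rather than the exact length.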
||||
+
|
||||
+/*-*******************************************
|
||||
@ -12252,18 +12265,16 @@ index 0000000..f0ba474
|
||||
+#endif /* ZSTD_CCOMMON_H_MODULE */
|
||||
diff --git a/lib/zstd/zstd_opt.h b/lib/zstd/zstd_opt.h
|
||||
new file mode 100644
|
||||
index 0000000..55e1b4c
|
||||
index 0000000..ecdd725
|
||||
--- /dev/null
|
||||
+++ b/lib/zstd/zstd_opt.h
|
||||
@@ -0,0 +1,1014 @@
|
||||
@@ -0,0 +1,1012 @@
|
||||
+/**
|
||||
+ * Copyright (c) 2016-present, Przemyslaw Skibinski, Yann Collet, Facebook, Inc.
|
||||
+ * All rights reserved.
|
||||
+ *
|
||||
+ * This source code is licensed under the BSD-style license found in the
|
||||
+ * LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
+ * An additional grant of patent rights can be found in the PATENTS file in the
|
||||
+ * same directory.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it under
|
||||
+ * the terms of the GNU General Public License version 2 as published by the
|
||||
@ -13271,4 +13282,4 @@ index 0000000..55e1b4c
|
||||
+
|
||||
+#endif /* ZSTD_OPT_H_91842398743 */
|
||||
--
|
||||
2.9.3
|
||||
2.9.5
|
||||
|
@ -1,7 +1,7 @@
|
||||
From b0ef8fc63c9ca251ceca632f53aa1de8f1f17772 Mon Sep 17 00:00:00 2001
|
||||
From 8a9dddfbf6551afea73911e367dd4be64d62b9fd Mon Sep 17 00:00:00 2001
|
||||
From: Nick Terrell <terrelln@fb.com>
|
||||
Date: Mon, 17 Jul 2017 17:08:39 -0700
|
||||
Subject: [PATCH v3 3/4] btrfs: Add zstd support
|
||||
Subject: [PATCH v5 3/5] btrfs: Add zstd support
|
||||
|
||||
Add zstd compression and decompression support to BtrFS. zstd at its
|
||||
fastest level compresses almost as well as zlib, while offering much
|
||||
@ -67,6 +67,10 @@ v2 -> v3:
|
||||
- Port upstream BtrFS commits e1ddce71d6, 389a6cfc2a, and 6acafd1eff
|
||||
- Change default compression level for BtrFS to 3
|
||||
|
||||
v3 -> v4:
|
||||
- Add missing includes, which fixes the aarch64 build
|
||||
- Fix minor linter warnings
|
||||
|
||||
fs/btrfs/Kconfig | 2 +
|
||||
fs/btrfs/Makefile | 2 +-
|
||||
fs/btrfs/compression.c | 1 +
|
||||
@ -77,9 +81,9 @@ v2 -> v3:
|
||||
fs/btrfs/props.c | 6 +
|
||||
fs/btrfs/super.c | 12 +-
|
||||
fs/btrfs/sysfs.c | 2 +
|
||||
fs/btrfs/zstd.c | 435 +++++++++++++++++++++++++++++++++++++++++++++
|
||||
fs/btrfs/zstd.c | 432 +++++++++++++++++++++++++++++++++++++++++++++
|
||||
include/uapi/linux/btrfs.h | 8 +-
|
||||
12 files changed, 471 insertions(+), 12 deletions(-)
|
||||
12 files changed, 468 insertions(+), 12 deletions(-)
|
||||
create mode 100644 fs/btrfs/zstd.c
|
||||
|
||||
diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
|
||||
@ -277,10 +281,10 @@ index c2d5f35..2b6d37c 100644
|
||||
BTRFS_FEAT_ATTR_PTR(raid56),
|
||||
diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
|
||||
new file mode 100644
|
||||
index 0000000..1822068
|
||||
index 0000000..607ce47
|
||||
--- /dev/null
|
||||
+++ b/fs/btrfs/zstd.c
|
||||
@@ -0,0 +1,435 @@
|
||||
@@ -0,0 +1,432 @@
|
||||
+/*
|
||||
+ * Copyright (c) 2016-present, Facebook, Inc.
|
||||
+ * All rights reserved.
|
||||
@ -293,20 +297,16 @@ index 0000000..1822068
|
||||
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
+ * General Public License for more details.
|
||||
+ *
|
||||
+ * You should have received a copy of the GNU General Public
|
||||
+ * License along with this program; if not, write to the
|
||||
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
|
||||
+ * Boston, MA 021110-1307, USA.
|
||||
+ */
|
||||
+#include <linux/kernel.h>
|
||||
+#include <linux/slab.h>
|
||||
+#include <linux/vmalloc.h>
|
||||
+#include <linux/init.h>
|
||||
+#include <linux/err.h>
|
||||
+#include <linux/sched.h>
|
||||
+#include <linux/pagemap.h>
|
||||
+#include <linux/bio.h>
|
||||
+#include <linux/err.h>
|
||||
+#include <linux/init.h>
|
||||
+#include <linux/kernel.h>
|
||||
+#include <linux/mm.h>
|
||||
+#include <linux/pagemap.h>
|
||||
+#include <linux/refcount.h>
|
||||
+#include <linux/sched.h>
|
||||
+#include <linux/slab.h>
|
||||
+#include <linux/zstd.h>
|
||||
+#include "compression.h"
|
||||
+
|
||||
@ -316,7 +316,8 @@ index 0000000..1822068
|
||||
+
|
||||
+static ZSTD_parameters zstd_get_btrfs_parameters(size_t src_len)
|
||||
+{
|
||||
+ ZSTD_parameters params = ZSTD_getParams(ZSTD_BTRFS_DEFAULT_LEVEL, src_len, 0);
|
||||
+ ZSTD_parameters params = ZSTD_getParams(ZSTD_BTRFS_DEFAULT_LEVEL,
|
||||
+ src_len, 0);
|
||||
+
|
||||
+ if (params.cParams.windowLog > ZSTD_BTRFS_MAX_WINDOWLOG)
|
||||
+ params.cParams.windowLog = ZSTD_BTRFS_MAX_WINDOWLOG;
|
||||
|
@ -1,7 +1,7 @@
|
||||
From 0cd63464d182bb9708f8b25f7da3dc8e5ec6b4fa Mon Sep 17 00:00:00 2001
|
||||
From: Nick Terrell <terrelln@fb.com>
|
||||
From 46bf8f6d30d6ddf2446c110f122482b5e5e16933 Mon Sep 17 00:00:00 2001
|
||||
From: Sean Purcell <me@seanp.xyz>
|
||||
Date: Mon, 17 Jul 2017 17:08:59 -0700
|
||||
Subject: [PATCH v3 4/4] squashfs: Add zstd support
|
||||
Subject: [PATCH v5 4/5] squashfs: Add zstd support
|
||||
|
||||
Add zstd compression and decompression support to SquashFS. zstd is a
|
||||
great fit for SquashFS because it can compress at ratios approaching xz,
|
||||
@ -42,16 +42,23 @@ taking over the submission process.
|
||||
|
||||
zstd source repository: https://github.com/facebook/zstd
|
||||
|
||||
Cc: Sean Purcell <me@seanp.xyz>
|
||||
Signed-off-by: Sean Purcell <me@seanp.xyz>
|
||||
Signed-off-by: Nick Terrell <terrelln@fb.com>
|
||||
---
|
||||
v3 -> v4:
|
||||
- Fix minor linter warnings
|
||||
|
||||
v4 -> v5:
|
||||
- Fix ZSTD_DStream initialization code in squashfs
|
||||
- Fix patch documentation to reflect that Sean Purcell is the author
|
||||
|
||||
fs/squashfs/Kconfig | 14 +++++
|
||||
fs/squashfs/Makefile | 1 +
|
||||
fs/squashfs/decompressor.c | 7 +++
|
||||
fs/squashfs/decompressor.h | 4 ++
|
||||
fs/squashfs/squashfs_fs.h | 1 +
|
||||
fs/squashfs/zstd_wrapper.c | 150 +++++++++++++++++++++++++++++++++++++++++++++
|
||||
6 files changed, 177 insertions(+)
|
||||
fs/squashfs/zstd_wrapper.c | 151 +++++++++++++++++++++++++++++++++++++++++++++
|
||||
6 files changed, 178 insertions(+)
|
||||
create mode 100644 fs/squashfs/zstd_wrapper.c
|
||||
|
||||
diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
|
||||
@ -140,10 +147,10 @@ index 506f4ba..24d12fd 100644
|
||||
__le32 s_magic;
|
||||
diff --git a/fs/squashfs/zstd_wrapper.c b/fs/squashfs/zstd_wrapper.c
|
||||
new file mode 100644
|
||||
index 0000000..8cb7c76
|
||||
index 0000000..eeaabf8
|
||||
--- /dev/null
|
||||
+++ b/fs/squashfs/zstd_wrapper.c
|
||||
@@ -0,0 +1,150 @@
|
||||
@@ -0,0 +1,151 @@
|
||||
+/*
|
||||
+ * Squashfs - a compressed read only filesystem for Linux
|
||||
+ *
|
||||
@ -160,10 +167,6 @@ index 0000000..8cb7c76
|
||||
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
+ * GNU General Public License for more details.
|
||||
+ *
|
||||
+ * You should have received a copy of the GNU General Public License
|
||||
+ * along with this program; if not, write to the Free Software
|
||||
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
|
||||
+ *
|
||||
+ * zstd_wrapper.c
|
||||
+ */
|
||||
+
|
||||
@ -182,15 +185,18 @@ index 0000000..8cb7c76
|
||||
+struct workspace {
|
||||
+ void *mem;
|
||||
+ size_t mem_size;
|
||||
+ size_t window_size;
|
||||
+};
|
||||
+
|
||||
+static void *zstd_init(struct squashfs_sb_info *msblk, void *buff)
|
||||
+{
|
||||
+ struct workspace *wksp = kmalloc(sizeof(*wksp), GFP_KERNEL);
|
||||
+
|
||||
+ if (wksp == NULL)
|
||||
+ goto failed;
|
||||
+ wksp->mem_size = ZSTD_DStreamWorkspaceBound(max_t(size_t,
|
||||
+ msblk->block_size, SQUASHFS_METADATA_SIZE));
|
||||
+ wksp->window_size = max_t(size_t,
|
||||
+ msblk->block_size, SQUASHFS_METADATA_SIZE);
|
||||
+ wksp->mem_size = ZSTD_DStreamWorkspaceBound(wksp->window_size);
|
||||
+ wksp->mem = vmalloc(wksp->mem_size);
|
||||
+ if (wksp->mem == NULL)
|
||||
+ goto failed;
|
||||
@ -226,7 +232,7 @@ index 0000000..8cb7c76
|
||||
+ ZSTD_inBuffer in_buf = { NULL, 0, 0 };
|
||||
+ ZSTD_outBuffer out_buf = { NULL, 0, 0 };
|
||||
+
|
||||
+ stream = ZSTD_initDStream(wksp->mem_size, wksp->mem, wksp->mem_size);
|
||||
+ stream = ZSTD_initDStream(wksp->window_size, wksp->mem, wksp->mem_size);
|
||||
+
|
||||
+ if (!stream) {
|
||||
+ ERROR("Failed to initialize zstd decompressor\n");
|
||||
@ -239,6 +245,7 @@ index 0000000..8cb7c76
|
||||
+ do {
|
||||
+ if (in_buf.pos == in_buf.size && k < b) {
|
||||
+ int avail = min(length, msblk->devblksize - offset);
|
||||
+
|
||||
+ length -= avail;
|
||||
+ in_buf.src = bh[k]->b_data + offset;
|
||||
+ in_buf.size = avail;
|
||||
@ -249,8 +256,9 @@ index 0000000..8cb7c76
|
||||
+ if (out_buf.pos == out_buf.size) {
|
||||
+ out_buf.dst = squashfs_next_page(output);
|
||||
+ if (out_buf.dst == NULL) {
|
||||
+ /* shouldn't run out of pages before stream is
|
||||
+ * done */
|
||||
+ /* Shouldn't run out of pages
|
||||
+ * before stream is done.
|
||||
+ */
|
||||
+ squashfs_finish_page(output);
|
||||
+ goto out;
|
||||
+ }
|
||||
|
424
contrib/linux-kernel/0005-crypto-Add-zstd-support.patch
Normal file
@ -0,0 +1,424 @@
|
||||
From 308795a7713ca6fcd468b60fba9a2fca99cee6a0 Mon Sep 17 00:00:00 2001
|
||||
From: Nick Terrell <terrelln@fb.com>
|
||||
Date: Wed, 2 Aug 2017 18:02:13 -0700
|
||||
Subject: [PATCH v5 5/5] crypto: Add zstd support
|
||||
|
||||
Adds zstd support to crypto and scompress. Only supports the default
|
||||
level.
|
||||
|
||||
Signed-off-by: Nick Terrell <terrelln@fb.com>
|
||||
---
|
||||
crypto/Kconfig | 9 ++
|
||||
crypto/Makefile | 1 +
|
||||
crypto/testmgr.c | 10 +++
|
||||
crypto/testmgr.h | 71 +++++++++++++++
|
||||
crypto/zstd.c | 265 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
5 files changed, 356 insertions(+)
|
||||
create mode 100644 crypto/zstd.c
|
||||
|
||||
diff --git a/crypto/Kconfig b/crypto/Kconfig
|
||||
index caa770e..4fc3936 100644
|
||||
--- a/crypto/Kconfig
|
||||
+++ b/crypto/Kconfig
|
||||
@@ -1662,6 +1662,15 @@ config CRYPTO_LZ4HC
|
||||
help
|
||||
This is the LZ4 high compression mode algorithm.
|
||||
|
||||
+config CRYPTO_ZSTD
|
||||
+ tristate "Zstd compression algorithm"
|
||||
+ select CRYPTO_ALGAPI
|
||||
+ select CRYPTO_ACOMP2
|
||||
+ select ZSTD_COMPRESS
|
||||
+ select ZSTD_DECOMPRESS
|
||||
+ help
|
||||
+ This is the zstd algorithm.
|
||||
+
|
||||
comment "Random Number Generation"
|
||||
|
||||
config CRYPTO_ANSI_CPRNG
|
||||
diff --git a/crypto/Makefile b/crypto/Makefile
|
||||
index d41f033..b22e1e8 100644
|
||||
--- a/crypto/Makefile
|
||||
+++ b/crypto/Makefile
|
||||
@@ -133,6 +133,7 @@ obj-$(CONFIG_CRYPTO_USER_API_HASH) += algif_hash.o
|
||||
obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
|
||||
obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
|
||||
obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
|
||||
+obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o
|
||||
|
||||
ecdh_generic-y := ecc.o
|
||||
ecdh_generic-y += ecdh.o
|
||||
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
|
||||
index 7125ba3..8a124d3 100644
|
||||
--- a/crypto/testmgr.c
|
||||
+++ b/crypto/testmgr.c
|
||||
@@ -3603,6 +3603,16 @@ static const struct alg_test_desc alg_test_descs[] = {
|
||||
.decomp = __VECS(zlib_deflate_decomp_tv_template)
|
||||
}
|
||||
}
|
||||
+ }, {
|
||||
+ .alg = "zstd",
|
||||
+ .test = alg_test_comp,
|
||||
+ .fips_allowed = 1,
|
||||
+ .suite = {
|
||||
+ .comp = {
|
||||
+ .comp = __VECS(zstd_comp_tv_template),
|
||||
+ .decomp = __VECS(zstd_decomp_tv_template)
|
||||
+ }
|
||||
+ }
|
||||
}
|
||||
};
|
||||
|
||||
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
|
||||
index 6ceb0e2..e6b5920 100644
|
||||
--- a/crypto/testmgr.h
|
||||
+++ b/crypto/testmgr.h
|
||||
@@ -34631,4 +34631,75 @@ static const struct comp_testvec lz4hc_decomp_tv_template[] = {
|
||||
},
|
||||
};
|
||||
|
||||
+static const struct comp_testvec zstd_comp_tv_template[] = {
|
||||
+ {
|
||||
+ .inlen = 68,
|
||||
+ .outlen = 39,
|
||||
+ .input = "The algorithm is zstd. "
|
||||
+ "The algorithm is zstd. "
|
||||
+ "The algorithm is zstd.",
|
||||
+ .output = "\x28\xb5\x2f\xfd\x00\x50\xf5\x00\x00\xb8\x54\x68\x65"
|
||||
+ "\x20\x61\x6c\x67\x6f\x72\x69\x74\x68\x6d\x20\x69\x73"
|
||||
+ "\x20\x7a\x73\x74\x64\x2e\x20\x01\x00\x55\x73\x36\x01"
|
||||
+ ,
|
||||
+ },
|
||||
+ {
|
||||
+ .inlen = 244,
|
||||
+ .outlen = 151,
|
||||
+ .input = "zstd, short for Zstandard, is a fast lossless "
|
||||
+ "compression algorithm, targeting real-time "
|
||||
+ "compression scenarios at zlib-level and better "
|
||||
+ "compression ratios. The zstd compression library "
|
||||
+ "provides in-memory compression and decompression "
|
||||
+ "functions.",
|
||||
+ .output = "\x28\xb5\x2f\xfd\x00\x50\x75\x04\x00\x42\x4b\x1e\x17"
|
||||
+ "\x90\x81\x31\x00\xf2\x2f\xe4\x36\xc9\xef\x92\x88\x32"
|
||||
+ "\xc9\xf2\x24\x94\xd8\x68\x9a\x0f\x00\x0c\xc4\x31\x6f"
|
||||
+ "\x0d\x0c\x38\xac\x5c\x48\x03\xcd\x63\x67\xc0\xf3\xad"
|
||||
+ "\x4e\x90\xaa\x78\xa0\xa4\xc5\x99\xda\x2f\xb6\x24\x60"
|
||||
+ "\xe2\x79\x4b\xaa\xb6\x6b\x85\x0b\xc9\xc6\x04\x66\x86"
|
||||
+ "\xe2\xcc\xe2\x25\x3f\x4f\x09\xcd\xb8\x9d\xdb\xc1\x90"
|
||||
+ "\xa9\x11\xbc\x35\x44\x69\x2d\x9c\x64\x4f\x13\x31\x64"
|
||||
+ "\xcc\xfb\x4d\x95\x93\x86\x7f\x33\x7f\x1a\xef\xe9\x30"
|
||||
+ "\xf9\x67\xa1\x94\x0a\x69\x0f\x60\xcd\xc3\xab\x99\xdc"
|
||||
+ "\x42\xed\x97\x05\x00\x33\xc3\x15\x95\x3a\x06\xa0\x0e"
|
||||
+ "\x20\xa9\x0e\x82\xb9\x43\x45\x01",
|
||||
+ },
|
||||
+};
|
||||
+
|
||||
+static const struct comp_testvec zstd_decomp_tv_template[] = {
|
||||
+ {
|
||||
+ .inlen = 43,
|
||||
+ .outlen = 68,
|
||||
+ .input = "\x28\xb5\x2f\xfd\x04\x50\xf5\x00\x00\xb8\x54\x68\x65"
|
||||
+ "\x20\x61\x6c\x67\x6f\x72\x69\x74\x68\x6d\x20\x69\x73"
|
||||
+ "\x20\x7a\x73\x74\x64\x2e\x20\x01\x00\x55\x73\x36\x01"
|
||||
+ "\x6b\xf4\x13\x35",
|
||||
+ .output = "The algorithm is zstd. "
|
||||
+ "The algorithm is zstd. "
|
||||
+ "The algorithm is zstd.",
|
||||
+ },
|
||||
+ {
|
||||
+ .inlen = 155,
|
||||
+ .outlen = 244,
|
||||
+ .input = "\x28\xb5\x2f\xfd\x04\x50\x75\x04\x00\x42\x4b\x1e\x17"
|
||||
+ "\x90\x81\x31\x00\xf2\x2f\xe4\x36\xc9\xef\x92\x88\x32"
|
||||
+ "\xc9\xf2\x24\x94\xd8\x68\x9a\x0f\x00\x0c\xc4\x31\x6f"
|
||||
+ "\x0d\x0c\x38\xac\x5c\x48\x03\xcd\x63\x67\xc0\xf3\xad"
|
||||
+ "\x4e\x90\xaa\x78\xa0\xa4\xc5\x99\xda\x2f\xb6\x24\x60"
|
||||
+ "\xe2\x79\x4b\xaa\xb6\x6b\x85\x0b\xc9\xc6\x04\x66\x86"
|
||||
+ "\xe2\xcc\xe2\x25\x3f\x4f\x09\xcd\xb8\x9d\xdb\xc1\x90"
|
||||
+ "\xa9\x11\xbc\x35\x44\x69\x2d\x9c\x64\x4f\x13\x31\x64"
|
||||
+ "\xcc\xfb\x4d\x95\x93\x86\x7f\x33\x7f\x1a\xef\xe9\x30"
|
||||
+ "\xf9\x67\xa1\x94\x0a\x69\x0f\x60\xcd\xc3\xab\x99\xdc"
|
||||
+ "\x42\xed\x97\x05\x00\x33\xc3\x15\x95\x3a\x06\xa0\x0e"
|
||||
+ "\x20\xa9\x0e\x82\xb9\x43\x45\x01\xaa\x6d\xda\x0d",
|
||||
+ .output = "zstd, short for Zstandard, is a fast lossless "
|
||||
+ "compression algorithm, targeting real-time "
|
||||
+ "compression scenarios at zlib-level and better "
|
||||
+ "compression ratios. The zstd compression library "
|
||||
+ "provides in-memory compression and decompression "
|
||||
+ "functions.",
|
||||
+ },
|
||||
+};
|
||||
#endif /* _CRYPTO_TESTMGR_H */
|
||||
diff --git a/crypto/zstd.c b/crypto/zstd.c
|
||||
new file mode 100644
|
||||
index 0000000..9a76b3e
|
||||
--- /dev/null
|
||||
+++ b/crypto/zstd.c
|
||||
@@ -0,0 +1,265 @@
|
||||
+/*
|
||||
+ * Cryptographic API.
|
||||
+ *
|
||||
+ * Copyright (c) 2017-present, Facebook, Inc.
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or modify it
|
||||
+ * under the terms of the GNU General Public License version 2 as published by
|
||||
+ * the Free Software Foundation.
|
||||
+ *
|
||||
+ * This program is distributed in the hope that it will be useful, but WITHOUT
|
||||
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
|
||||
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
|
||||
+ * more details.
|
||||
+ */
|
||||
+#include <linux/crypto.h>
|
||||
+#include <linux/init.h>
|
||||
+#include <linux/interrupt.h>
|
||||
+#include <linux/mm.h>
|
||||
+#include <linux/module.h>
|
||||
+#include <linux/net.h>
|
||||
+#include <linux/vmalloc.h>
|
||||
+#include <linux/zstd.h>
|
||||
+#include <crypto/internal/scompress.h>
|
||||
+
|
||||
+
|
||||
+#define ZSTD_DEF_LEVEL 3
|
||||
+
|
||||
+struct zstd_ctx {
|
||||
+ ZSTD_CCtx *cctx;
|
||||
+ ZSTD_DCtx *dctx;
|
||||
+ void *cwksp;
|
||||
+ void *dwksp;
|
||||
+};
|
||||
+
|
||||
+static ZSTD_parameters zstd_params(void)
|
||||
+{
|
||||
+ return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0);
|
||||
+}
|
||||
+
|
||||
+static int zstd_comp_init(struct zstd_ctx *ctx)
|
||||
+{
|
||||
+ int ret = 0;
|
||||
+ const ZSTD_parameters params = zstd_params();
|
||||
+ const size_t wksp_size = ZSTD_CCtxWorkspaceBound(params.cParams);
|
||||
+
|
||||
+ ctx->cwksp = vzalloc(wksp_size);
|
||||
+ if (!ctx->cwksp) {
|
||||
+ ret = -ENOMEM;
|
||||
+ goto out;
|
||||
+ }
|
||||
+
|
||||
+ ctx->cctx = ZSTD_initCCtx(ctx->cwksp, wksp_size);
|
||||
+ if (!ctx->cctx) {
|
||||
+ ret = -EINVAL;
|
||||
+ goto out_free;
|
||||
+ }
|
||||
+out:
|
||||
+ return ret;
|
||||
+out_free:
|
||||
+ vfree(ctx->cwksp);
|
||||
+ goto out;
|
||||
+}
|
||||
+
|
||||
+static int zstd_decomp_init(struct zstd_ctx *ctx)
|
||||
+{
|
||||
+ int ret = 0;
|
||||
+ const size_t wksp_size = ZSTD_DCtxWorkspaceBound();
|
||||
+
|
||||
+ ctx->dwksp = vzalloc(wksp_size);
|
||||
+ if (!ctx->dwksp) {
|
||||
+ ret = -ENOMEM;
|
||||
+ goto out;
|
||||
+ }
|
||||
+
|
||||
+ ctx->dctx = ZSTD_initDCtx(ctx->dwksp, wksp_size);
|
||||
+ if (!ctx->dctx) {
|
||||
+ ret = -EINVAL;
|
||||
+ goto out_free;
|
||||
+ }
|
||||
+out:
|
||||
+ return ret;
|
||||
+out_free:
|
||||
+ vfree(ctx->dwksp);
|
||||
+ goto out;
|
||||
+}
|
||||
+
|
||||
+static void zstd_comp_exit(struct zstd_ctx *ctx)
|
||||
+{
|
||||
+ vfree(ctx->cwksp);
|
||||
+ ctx->cwksp = NULL;
|
||||
+ ctx->cctx = NULL;
|
||||
+}
|
||||
+
|
||||
+static void zstd_decomp_exit(struct zstd_ctx *ctx)
|
||||
+{
|
||||
+ vfree(ctx->dwksp);
|
||||
+ ctx->dwksp = NULL;
|
||||
+ ctx->dctx = NULL;
|
||||
+}
|
||||
+
|
||||
+static int __zstd_init(void *ctx)
|
||||
+{
|
||||
+ int ret;
|
||||
+
|
||||
+ ret = zstd_comp_init(ctx);
|
||||
+ if (ret)
|
||||
+ return ret;
|
||||
+ ret = zstd_decomp_init(ctx);
|
||||
+ if (ret)
|
||||
+ zstd_comp_exit(ctx);
|
||||
+ return ret;
|
||||
+}
|
||||
+
|
||||
+static void *zstd_alloc_ctx(struct crypto_scomp *tfm)
|
||||
+{
|
||||
+ int ret;
|
||||
+ struct zstd_ctx *ctx;
|
||||
+
|
||||
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
|
||||
+ if (!ctx)
|
||||
+ return ERR_PTR(-ENOMEM);
|
||||
+
|
||||
+ ret = __zstd_init(ctx);
|
||||
+ if (ret) {
|
||||
+ kfree(ctx);
|
||||
+ return ERR_PTR(ret);
|
||||
+ }
|
||||
+
|
||||
+ return ctx;
|
||||
+}
|
||||
+
|
||||
+static int zstd_init(struct crypto_tfm *tfm)
|
||||
+{
|
||||
+ struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
|
||||
+
|
||||
+ return __zstd_init(ctx);
|
||||
+}
|
||||
+
|
||||
+static void __zstd_exit(void *ctx)
|
||||
+{
|
||||
+ zstd_comp_exit(ctx);
|
||||
+ zstd_decomp_exit(ctx);
|
||||
+}
|
||||
+
|
||||
+static void zstd_free_ctx(struct crypto_scomp *tfm, void *ctx)
|
||||
+{
|
||||
+ __zstd_exit(ctx);
|
||||
+ kzfree(ctx);
|
||||
+}
|
||||
+
|
||||
+static void zstd_exit(struct crypto_tfm *tfm)
|
||||
+{
|
||||
+ struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
|
||||
+
|
||||
+ __zstd_exit(ctx);
|
||||
+}
|
||||
+
|
||||
+static int __zstd_compress(const u8 *src, unsigned int slen,
|
||||
+ u8 *dst, unsigned int *dlen, void *ctx)
|
||||
+{
|
||||
+ size_t out_len;
|
||||
+ struct zstd_ctx *zctx = ctx;
|
||||
+ const ZSTD_parameters params = zstd_params();
|
||||
+
|
||||
+ out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, params);
|
||||
+ if (ZSTD_isError(out_len))
|
||||
+ return -EINVAL;
|
||||
+ *dlen = out_len;
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+static int zstd_compress(struct crypto_tfm *tfm, const u8 *src,
|
||||
+ unsigned int slen, u8 *dst, unsigned int *dlen)
|
||||
+{
|
||||
+ struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
|
||||
+
|
||||
+ return __zstd_compress(src, slen, dst, dlen, ctx);
|
||||
+}
|
||||
+
|
||||
+static int zstd_scompress(struct crypto_scomp *tfm, const u8 *src,
|
||||
+ unsigned int slen, u8 *dst, unsigned int *dlen,
|
||||
+ void *ctx)
|
||||
+{
|
||||
+ return __zstd_compress(src, slen, dst, dlen, ctx);
|
||||
+}
|
||||
+
|
||||
+static int __zstd_decompress(const u8 *src, unsigned int slen,
|
||||
+ u8 *dst, unsigned int *dlen, void *ctx)
|
||||
+{
|
||||
+ size_t out_len;
|
||||
+ struct zstd_ctx *zctx = ctx;
|
||||
+
|
||||
+ out_len = ZSTD_decompressDCtx(zctx->dctx, dst, *dlen, src, slen);
|
||||
+ if (ZSTD_isError(out_len))
|
||||
+ return -EINVAL;
|
||||
+ *dlen = out_len;
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+static int zstd_decompress(struct crypto_tfm *tfm, const u8 *src,
|
||||
+ unsigned int slen, u8 *dst, unsigned int *dlen)
|
||||
+{
|
||||
+ struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
|
||||
+
|
||||
+ return __zstd_decompress(src, slen, dst, dlen, ctx);
|
||||
+}
|
||||
+
|
||||
+static int zstd_sdecompress(struct crypto_scomp *tfm, const u8 *src,
|
||||
+ unsigned int slen, u8 *dst, unsigned int *dlen,
|
||||
+ void *ctx)
|
||||
+{
|
||||
+ return __zstd_decompress(src, slen, dst, dlen, ctx);
|
||||
+}
|
||||
+
|
||||
+static struct crypto_alg alg = {
|
||||
+ .cra_name = "zstd",
|
||||
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
|
||||
+ .cra_ctxsize = sizeof(struct zstd_ctx),
|
||||
+ .cra_module = THIS_MODULE,
|
||||
+ .cra_init = zstd_init,
|
||||
+ .cra_exit = zstd_exit,
|
||||
+ .cra_u = { .compress = {
|
||||
+ .coa_compress = zstd_compress,
|
||||
+ .coa_decompress = zstd_decompress } }
|
||||
+};
|
||||
+
|
||||
+static struct scomp_alg scomp = {
|
||||
+ .alloc_ctx = zstd_alloc_ctx,
|
||||
+ .free_ctx = zstd_free_ctx,
|
||||
+ .compress = zstd_scompress,
|
||||
+ .decompress = zstd_sdecompress,
|
||||
+ .base = {
|
||||
+ .cra_name = "zstd",
|
||||
+ .cra_driver_name = "zstd-scomp",
|
||||
+ .cra_module = THIS_MODULE,
|
||||
+ }
|
||||
+};
|
||||
+
|
||||
+static int __init zstd_mod_init(void)
|
||||
+{
|
||||
+ int ret;
|
||||
+
|
||||
+ ret = crypto_register_alg(&alg);
|
||||
+ if (ret)
|
||||
+ return ret;
|
||||
+
|
||||
+ ret = crypto_register_scomp(&scomp);
|
||||
+ if (ret)
|
||||
+ crypto_unregister_alg(&alg);
|
||||
+
|
||||
+ return ret;
|
||||
+}
|
||||
+
|
||||
+static void __exit zstd_mod_fini(void)
|
||||
+{
|
||||
+ crypto_unregister_alg(&alg);
|
||||
+ crypto_unregister_scomp(&scomp);
|
||||
+}
|
||||
+
|
||||
+module_init(zstd_mod_init);
|
||||
+module_exit(zstd_mod_fini);
|
||||
+
|
||||
+MODULE_LICENSE("GPL");
|
||||
+MODULE_DESCRIPTION("Zstd Compression Algorithm");
|
||||
+MODULE_ALIAS_CRYPTO("zstd");
|
||||
--
|
||||
2.9.3
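
Note on the test vectors added to `testmgr.h` in this patch: both `.input` strings begin with the bytes `28 b5 2f fd`, which is the Zstandard frame magic number `0xFD2FB528` stored little-endian. A minimal sketch of checking that prefix follows; `has_zstd_magic` is a hypothetical helper for illustration and is not part of the kernel patch or of the zstd API.

```c
#include <stddef.h>
#include <stdint.h>

/* Zstandard frame magic number; it is stored little-endian at frame start. */
#define ZSTD_FRAME_MAGIC 0xFD2FB528u

/* Hypothetical helper: returns 1 if buf starts with the zstd frame magic. */
static int has_zstd_magic(const unsigned char *buf, size_t len)
{
	uint32_t magic;

	if (len < 4)
		return 0;
	/* Assemble the first four bytes as a little-endian 32-bit value. */
	magic = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
		((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
	return magic == ZSTD_FRAME_MAGIC;
}
```

A check like this only identifies the container; it says nothing about whether the rest of the frame decodes correctly, which is what the `.output` strings in the test vectors verify.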
|
420
contrib/linux-kernel/0006-squashfs-tools-Add-zstd-support.patch
Normal file
@ -0,0 +1,420 @@
|
||||
From 57a3cf95b276946559f9e044c7352c11303bb9c1 Mon Sep 17 00:00:00 2001
|
||||
From: Sean Purcell <me@seanp.xyz>
|
||||
Date: Thu, 3 Aug 2017 17:47:03 -0700
|
||||
Subject: [PATCH v6] squashfs-tools: Add zstd support
|
||||
|
||||
This patch adds zstd support to squashfs-tools. It works with zstd
|
||||
versions >= 1.0.0. It was originally written by Sean Purcell.
|
||||
|
||||
Signed-off-by: Sean Purcell <me@seanp.xyz>
|
||||
Signed-off-by: Nick Terrell <terrelln@fb.com>
|
||||
---
|
||||
v4 -> v5:
|
||||
- Fix patch documentation to reflect that Sean Purcell is the author
|
||||
- Don't strip trailing whitespace of unrelated code
|
||||
- Make zstd_display_options() static
|
||||
|
||||
v5 -> v6:
|
||||
- Fix build instructions in Makefile
|
||||
|
||||
squashfs-tools/Makefile | 20 ++++
|
||||
squashfs-tools/compressor.c | 8 ++
|
||||
squashfs-tools/squashfs_fs.h | 1 +
|
||||
squashfs-tools/zstd_wrapper.c | 254 ++++++++++++++++++++++++++++++++++++++++++
|
||||
squashfs-tools/zstd_wrapper.h | 48 ++++++++
|
||||
5 files changed, 331 insertions(+)
|
||||
create mode 100644 squashfs-tools/zstd_wrapper.c
|
||||
create mode 100644 squashfs-tools/zstd_wrapper.h
|
||||
|
||||
diff --git a/squashfs-tools/Makefile b/squashfs-tools/Makefile
|
||||
index 52d2582..22fc559 100644
|
||||
--- a/squashfs-tools/Makefile
|
||||
+++ b/squashfs-tools/Makefile
|
||||
@@ -75,6 +75,18 @@ GZIP_SUPPORT = 1
|
||||
#LZMA_SUPPORT = 1
|
||||
#LZMA_DIR = ../../../../LZMA/lzma465
|
||||
|
||||
+
|
||||
+########### Building ZSTD support ############
|
||||
+#
|
||||
+# The ZSTD library is supported
|
||||
+# ZSTD homepage: http://zstd.net
|
||||
+# ZSTD source repository: https://github.com/facebook/zstd
|
||||
+#
|
||||
+# To build using the ZSTD library - install the library and uncomment the
|
||||
+# ZSTD_SUPPORT line below.
|
||||
+#
|
||||
+#ZSTD_SUPPORT = 1
|
||||
+
|
||||
######## Specifying default compression ########
|
||||
#
|
||||
# The next line specifies which compression algorithm is used by default
|
||||
@@ -177,6 +189,14 @@ LIBS += -llz4
|
||||
COMPRESSORS += lz4
|
||||
endif
|
||||
|
||||
+ifeq ($(ZSTD_SUPPORT),1)
|
||||
+CFLAGS += -DZSTD_SUPPORT
|
||||
+MKSQUASHFS_OBJS += zstd_wrapper.o
|
||||
+UNSQUASHFS_OBJS += zstd_wrapper.o
|
||||
+LIBS += -lzstd
|
||||
+COMPRESSORS += zstd
|
||||
+endif
|
||||
+
|
||||
ifeq ($(XATTR_SUPPORT),1)
|
||||
ifeq ($(XATTR_DEFAULT),1)
|
||||
CFLAGS += -DXATTR_SUPPORT -DXATTR_DEFAULT
|
||||
diff --git a/squashfs-tools/compressor.c b/squashfs-tools/compressor.c
|
||||
index 525e316..02b5e90 100644
|
||||
--- a/squashfs-tools/compressor.c
|
||||
+++ b/squashfs-tools/compressor.c
|
||||
@@ -65,6 +65,13 @@ static struct compressor xz_comp_ops = {
|
||||
extern struct compressor xz_comp_ops;
|
||||
#endif
|
||||
|
||||
+#ifndef ZSTD_SUPPORT
|
||||
+static struct compressor zstd_comp_ops = {
|
||||
+ ZSTD_COMPRESSION, "zstd"
|
||||
+};
|
||||
+#else
|
||||
+extern struct compressor zstd_comp_ops;
|
||||
+#endif
|
||||
|
||||
static struct compressor unknown_comp_ops = {
|
||||
0, "unknown"
|
||||
@@ -77,6 +84,7 @@ struct compressor *compressor[] = {
|
||||
&lzo_comp_ops,
|
||||
&lz4_comp_ops,
|
||||
&xz_comp_ops,
|
||||
+ &zstd_comp_ops,
|
||||
&unknown_comp_ops
|
||||
};
|
||||
|
||||
diff --git a/squashfs-tools/squashfs_fs.h b/squashfs-tools/squashfs_fs.h
|
||||
index 791fe12..afca918 100644
|
||||
--- a/squashfs-tools/squashfs_fs.h
|
||||
+++ b/squashfs-tools/squashfs_fs.h
|
||||
@@ -277,6 +277,7 @@ typedef long long squashfs_inode;
|
||||
#define LZO_COMPRESSION 3
|
||||
#define XZ_COMPRESSION 4
|
||||
#define LZ4_COMPRESSION 5
|
||||
+#define ZSTD_COMPRESSION 6
|
||||
|
||||
struct squashfs_super_block {
|
||||
unsigned int s_magic;
|
||||
diff --git a/squashfs-tools/zstd_wrapper.c b/squashfs-tools/zstd_wrapper.c
|
||||
new file mode 100644
|
||||
index 0000000..dcab75a
|
||||
--- /dev/null
|
||||
+++ b/squashfs-tools/zstd_wrapper.c
|
||||
@@ -0,0 +1,254 @@
|
||||
+/*
|
||||
+ * Copyright (c) 2017
|
||||
+ * Phillip Lougher <phillip@squashfs.org.uk>
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or
|
||||
+ * modify it under the terms of the GNU General Public License
|
||||
+ * as published by the Free Software Foundation; either version 2,
|
||||
+ * or (at your option) any later version.
|
||||
+ *
|
||||
+ * This program is distributed in the hope that it will be useful,
|
||||
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
+ * GNU General Public License for more details.
|
||||
+ *
|
||||
+ * zstd_wrapper.c
|
||||
+ *
|
||||
+ * Support for ZSTD compression http://zstd.net
|
||||
+ */
|
||||
+
|
||||
+#include <stdio.h>
|
||||
+#include <string.h>
|
||||
+#include <stdlib.h>
|
||||
+#include <zstd.h>
|
||||
+#include <zstd_errors.h>
|
||||
+
|
||||
+#include "squashfs_fs.h"
|
||||
+#include "zstd_wrapper.h"
|
||||
+#include "compressor.h"
|
||||
+
|
||||
+static int compression_level = ZSTD_DEFAULT_COMPRESSION_LEVEL;
|
||||
+
|
||||
+/*
|
||||
+ * This function is called by the options parsing code in mksquashfs.c
|
||||
+ * to parse any -X compressor option.
|
||||
+ *
|
||||
+ * This function returns:
|
||||
+ * >=0 (number of additional args parsed) on success
|
||||
+ * -1 if the option was unrecognised, or
|
||||
+ * -2 if the option was recognised, but otherwise bad in
|
||||
+ * some way (e.g. invalid parameter)
|
||||
+ *
|
||||
+ * Note: this function sets internal compressor state, but does not
|
||||
+ * pass back the results of the parsing other than success/failure.
|
||||
+ * The zstd_dump_options() function is called later to get the options in
|
||||
+ * a format suitable for writing to the filesystem.
|
||||
+ */
|
||||
+static int zstd_options(char *argv[], int argc)
|
||||
+{
|
||||
+ if (strcmp(argv[0], "-Xcompression-level") == 0) {
|
||||
+ if (argc < 2) {
|
||||
+ fprintf(stderr, "zstd: -Xcompression-level missing "
|
||||
+ "compression level\n");
|
||||
+ fprintf(stderr, "zstd: -Xcompression-level it should "
|
||||
+ "be 1 <= n <= %d\n", ZSTD_maxCLevel());
|
||||
+ goto failed;
|
||||
+ }
|
||||
+
|
||||
+ compression_level = atoi(argv[1]);
|
||||
+ if (compression_level < 1 ||
|
||||
+ compression_level > ZSTD_maxCLevel()) {
|
||||
+ fprintf(stderr, "zstd: -Xcompression-level invalid, it "
|
||||
+ "should be 1 <= n <= %d\n", ZSTD_maxCLevel());
|
||||
+ goto failed;
|
||||
+ }
|
||||
+
|
||||
+ return 1;
|
||||
+ }
|
||||
+
|
||||
+ return -1;
|
||||
+failed:
|
||||
+ return -2;
|
||||
+}
|
||||
+
|
||||
+/*
|
||||
+ * This function is called by mksquashfs to dump the parsed
|
||||
+ * compressor options in a format suitable for writing to the
|
||||
+ * compressor options field in the filesystem (stored immediately
|
||||
+ * after the superblock).
|
||||
+ *
|
||||
+ * This function returns a pointer to the compression options structure
|
||||
+ * to be stored (and the size), or NULL if there are no compression
|
||||
+ * options.
|
||||
+ */
|
||||
+static void *zstd_dump_options(int block_size, int *size)
|
||||
+{
|
||||
+ static struct zstd_comp_opts comp_opts;
|
||||
+
|
||||
+ /* don't return anything if the options are all default */
|
||||
+ if (compression_level == ZSTD_DEFAULT_COMPRESSION_LEVEL)
|
||||
+ return NULL;
|
||||
+
|
||||
+ comp_opts.compression_level = compression_level;
|
||||
+
|
||||
+ SQUASHFS_INSWAP_COMP_OPTS(&comp_opts);
|
||||
+
|
||||
+ *size = sizeof(comp_opts);
|
||||
+ return &comp_opts;
|
||||
+}
|
||||
+
|
||||
+/*
|
||||
+ * This function is a helper specifically for the append mode of
|
||||
+ * mksquashfs. Its purpose is to set the internal compressor state
|
||||
+ * to the stored compressor options in the passed compressor options
|
||||
+ * structure.
|
||||
+ *
|
||||
+ * In effect this function sets up the compressor options
|
||||
+ * to the same state they were when the filesystem was originally
|
||||
+ * generated, this is to ensure on appending, the compressor uses
|
||||
+ * the same compression options that were used to generate the
|
||||
+ * original filesystem.
|
||||
+ *
|
||||
+ * Note, even if there are no compressor options, this function is still
|
||||
+ * called with an empty compressor structure (size == 0), to explicitly
|
||||
+ * set the default options, this is to ensure any user supplied
|
||||
+ * -X options on the appending mksquashfs command line are over-ridden.
|
||||
+ *
|
||||
+ * This function returns 0 on successful extraction of options, and -1 on error.
|
||||
+ */
|
||||
+static int zstd_extract_options(int block_size, void *buffer, int size)
|
||||
+{
|
||||
+ struct zstd_comp_opts *comp_opts = buffer;
|
||||
+
|
||||
+ if (size == 0) {
|
||||
+ /* Set default values */
|
||||
+ compression_level = ZSTD_DEFAULT_COMPRESSION_LEVEL;
|
||||
+ return 0;
|
||||
+ }
|
||||
+
|
||||
+ /* we expect a comp_opts structure of sufficient size to be present */
|
||||
+ if (size < sizeof(*comp_opts))
|
||||
+ goto failed;
|
||||
+
|
||||
+ SQUASHFS_INSWAP_COMP_OPTS(comp_opts);
|
||||
+
|
||||
+ if (comp_opts->compression_level < 1 ||
|
||||
+ comp_opts->compression_level > ZSTD_maxCLevel()) {
|
||||
+ fprintf(stderr, "zstd: bad compression level in compression "
|
||||
+ "options structure\n");
|
||||
+ goto failed;
|
||||
+ }
|
||||
+
|
||||
+ compression_level = comp_opts->compression_level;
|
||||
+
|
||||
+ return 0;
|
||||
+
|
||||
+failed:
|
||||
+ fprintf(stderr, "zstd: error reading stored compressor options from "
|
||||
+ "filesystem!\n");
|
||||
+
|
||||
+ return -1;
|
||||
+}
|
||||
+
|
||||
+static void zstd_display_options(void *buffer, int size)
|
||||
+{
|
||||
+ struct zstd_comp_opts *comp_opts = buffer;
|
||||
+
|
||||
+ /* we expect a comp_opts structure of sufficient size to be present */
|
||||
+ if (size < sizeof(*comp_opts))
|
||||
+ goto failed;
|
||||
+
|
||||
+ SQUASHFS_INSWAP_COMP_OPTS(comp_opts);
|
||||
+
|
||||
+ if (comp_opts->compression_level < 1 ||
|
||||
+ comp_opts->compression_level > ZSTD_maxCLevel()) {
|
||||
+ fprintf(stderr, "zstd: bad compression level in compression "
|
||||
+ "options structure\n");
|
||||
+ goto failed;
|
||||
+ }
|
||||
+
|
||||
+ printf("\tcompression-level %d\n", comp_opts->compression_level);
|
||||
+
|
||||
+ return;
|
||||
+
|
||||
+failed:
|
||||
+ fprintf(stderr, "zstd: error reading stored compressor options from "
|
||||
+ "filesystem!\n");
|
||||
+}
|
||||
+
|
||||
+/*
|
||||
+ * This function is called by mksquashfs to initialise the
|
||||
+ * compressor, before compress() is called.
|
||||
+ *
|
||||
+ * This function returns 0 on success, and -1 on error.
|
||||
+ */
|
||||
+static int zstd_init(void **strm, int block_size, int datablock)
|
||||
+{
|
||||
+ ZSTD_CCtx *cctx = ZSTD_createCCtx();
|
||||
+
|
||||
+ if (!cctx) {
|
||||
+ fprintf(stderr, "zstd: failed to allocate compression "
|
||||
+ "context!\n");
|
||||
+ return -1;
|
||||
+ }
|
||||
+
|
||||
+ *strm = cctx;
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
+static int zstd_compress(void *strm, void *dest, void *src, int size,
|
||||
+ int block_size, int *error)
|
||||
+{
|
||||
+ const size_t res = ZSTD_compressCCtx((ZSTD_CCtx*)strm, dest, block_size,
|
||||
+ src, size, compression_level);
|
||||
+
|
||||
+ if (ZSTD_isError(res)) {
|
||||
+ /* FIXME:
|
||||
+ * zstd does not expose stable error codes. The error enum may
|
||||
+ * change between versions. Until upstream zstd stabilizes the
|
||||
+ * error codes, we have no way of knowing why the error occurs.
|
||||
+ * zstd shouldn't fail to compress any input unless there isn't
|
||||
+ * enough output space. We assume that is the cause and return
|
||||
+ * the special error code for not enough output space.
|
||||
+ */
|
||||
+ return 0;
|
||||
+ }
|
||||
+
|
||||
+ return (int)res;
|
||||
+}
|
||||
+
|
||||
+static int zstd_uncompress(void *dest, void *src, int size, int outsize,
|
||||
+ int *error)
|
||||
+{
|
||||
+ const size_t res = ZSTD_decompress(dest, outsize, src, size);
|
||||
+
|
||||
+ if (ZSTD_isError(res)) {
|
||||
+ fprintf(stderr, "\t%d %d\n", outsize, size);
|
||||
+
|
||||
+ *error = (int)ZSTD_getErrorCode(res);
|
||||
+ return -1;
|
||||
+ }
|
||||
+
|
||||
+ return (int)res;
|
||||
+}
|
||||
+
|
||||
+static void zstd_usage(void)
|
||||
+{
|
||||
+ fprintf(stderr, "\t -Xcompression-level <compression-level>\n");
|
||||
+ fprintf(stderr, "\t\t<compression-level> should be 1 .. %d (default "
|
||||
+ "%d)\n", ZSTD_maxCLevel(), ZSTD_DEFAULT_COMPRESSION_LEVEL);
|
||||
+}
|
||||
+
|
||||
+struct compressor zstd_comp_ops = {
|
||||
+ .init = zstd_init,
|
||||
+ .compress = zstd_compress,
|
||||
+ .uncompress = zstd_uncompress,
|
||||
+ .options = zstd_options,
|
||||
+ .dump_options = zstd_dump_options,
|
||||
+ .extract_options = zstd_extract_options,
|
||||
+ .display_options = zstd_display_options,
|
||||
+ .usage = zstd_usage,
|
||||
+ .id = ZSTD_COMPRESSION,
|
||||
+ .name = "zstd",
|
||||
+ .supported = 1
|
||||
+};
|
||||
diff --git a/squashfs-tools/zstd_wrapper.h b/squashfs-tools/zstd_wrapper.h
|
||||
new file mode 100644
|
||||
index 0000000..4fbef0a
|
||||
--- /dev/null
|
||||
+++ b/squashfs-tools/zstd_wrapper.h
|
||||
@@ -0,0 +1,48 @@
|
||||
+#ifndef ZSTD_WRAPPER_H
|
||||
+#define ZSTD_WRAPPER_H
|
||||
+/*
|
||||
+ * Squashfs
|
||||
+ *
|
||||
+ * Copyright (c) 2017
|
||||
+ * Phillip Lougher <phillip@squashfs.org.uk>
|
||||
+ *
|
||||
+ * This program is free software; you can redistribute it and/or
|
||||
+ * modify it under the terms of the GNU General Public License
|
||||
+ * as published by the Free Software Foundation; either version 2,
|
||||
+ * or (at your option) any later version.
|
||||
+ *
|
||||
+ * This program is distributed in the hope that it will be useful,
|
||||
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
+ * GNU General Public License for more details.
|
||||
+ *
|
||||
+ * zstd_wrapper.h
|
||||
+ *
|
||||
+ */
|
||||
+
|
||||
+#ifndef linux
|
||||
+#define __BYTE_ORDER BYTE_ORDER
|
||||
+#define __BIG_ENDIAN BIG_ENDIAN
|
||||
+#define __LITTLE_ENDIAN LITTLE_ENDIAN
|
||||
+#else
|
||||
+#include <endian.h>
|
||||
+#endif
|
||||
+
|
||||
+#if __BYTE_ORDER == __BIG_ENDIAN
|
||||
+extern unsigned int inswap_le16(unsigned short);
|
||||
+extern unsigned int inswap_le32(unsigned int);
|
||||
+
|
||||
+#define SQUASHFS_INSWAP_COMP_OPTS(s) { \
|
||||
+ (s)->compression_level = inswap_le32((s)->compression_level); \
|
||||
+}
|
||||
+#else
|
||||
+#define SQUASHFS_INSWAP_COMP_OPTS(s)
|
||||
+#endif
|
||||
+
|
||||
+/* Default compression */
|
||||
+#define ZSTD_DEFAULT_COMPRESSION_LEVEL 15
|
||||
+
|
||||
+struct zstd_comp_opts {
|
||||
+ int compression_level;
|
||||
+};
|
||||
+#endif
|
||||
--
|
||||
2.9.5
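
Note on `SQUASHFS_INSWAP_COMP_OPTS` in `zstd_wrapper.h` above: it byte-swaps `compression_level` only on big-endian hosts, so the field written after the superblock is little-endian regardless of the build machine. The same idea can be sketched byte by byte, which avoids depending on host endianness at all. The helpers `put_le32` and `get_le32` are hypothetical names for illustration; squashfs-tools itself uses `inswap_le32()`.

```c
#include <stdint.h>

/* Hypothetical helper: store v into out[] in little-endian byte order. */
static void put_le32(unsigned char out[4], uint32_t v)
{
	out[0] = (unsigned char)(v & 0xff);
	out[1] = (unsigned char)((v >> 8) & 0xff);
	out[2] = (unsigned char)((v >> 16) & 0xff);
	out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: read a little-endian 32-bit value from in[]. */
static uint32_t get_le32(const unsigned char in[4])
{
	return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

With these helpers, the default level 15 from `ZSTD_DEFAULT_COMPRESSION_LEVEL` would be stored as the bytes `0f 00 00 00` on any host.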
|
@ -1,7 +1,7 @@
|
||||
# Linux Kernel Patch
|
||||
|
||||
There are four pieces, the `xxhash` kernel module, the `zstd_compress` and `zstd_decompress` kernel modules, the BtrFS patch, and the SquashFS patch.
|
||||
The patches are based off of the linux kernel master branch (version 4.10).
|
||||
The patches are based off of the linux kernel master branch.
|
||||
|
||||
## xxHash kernel module
|
||||
|
||||
@ -42,7 +42,7 @@ The patches are based off of the linux kernel master branch (version 4.10).
|
||||
Benchmarks run on a Ubuntu 14.04 with 2 cores and 4 GiB of RAM.
|
||||
The VM is running on a Macbook Pro with a 3.1 GHz Intel Core i7 processor,
|
||||
16 GB of ram, and a SSD.
|
||||
The kernel running was built from the master branch with the patch (version 4.10).
|
||||
The kernel running was built from the master branch with the patch.
|
||||
|
||||
The compression benchmark is copying 10 copies of the
|
||||
unzipped [silesia corpus](http://mattmahoney.net/dc/silesia.html) into a BtrFS
|
||||
@ -69,14 +69,14 @@ See `btrfs-benchmark.sh` for details.
|
||||
|
||||
* The patch is located in `squashfs.diff`
|
||||
* Additionally `fs/squashfs/zstd_wrapper.c` is provided as a source for convenience.
|
||||
* The patch has been tested on a 4.10 kernel.
|
||||
* The patch has been tested on the master branch of the kernel.
|
||||
|
||||
### Benchmarks
|
||||
|
||||
Benchmarks run on a Ubuntu 14.04 with 2 cores and 4 GiB of RAM.
|
||||
The VM is running on a Macbook Pro with a 3.1 GHz Intel Core i7 processor,
|
||||
16 GB of ram, and a SSD.
|
||||
The kernel running was built from the master branch with the patch (version 4.10).
|
||||
The kernel running was built from the master branch with the patch.
|
||||
|
||||
The compression benchmark is the file tree from the SquashFS archive found in the
|
||||
Ubuntu 16.10 desktop image (ubuntu-16.10-desktop-amd64.iso).
|
||||
|
@ -10,20 +10,16 @@
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
* General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU General Public
|
||||
* License along with this program; if not, write to the
|
||||
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
|
||||
* Boston, MA 021110-1307, USA.
|
||||
*/
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/vmalloc.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/err.h>
|
||||
#include <linux/sched.h>
|
||||
#include <linux/pagemap.h>
|
||||
#include <linux/bio.h>
|
||||
#include <linux/err.h>
|
||||
#include <linux/init.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/mm.h>
|
||||
#include <linux/pagemap.h>
|
||||
#include <linux/refcount.h>
|
||||
#include <linux/sched.h>
|
||||
#include <linux/slab.h>
|
||||
#include <linux/zstd.h>
|
||||
#include "compression.h"
|
||||
|
||||
@ -33,7 +29,8 @@
|
||||
|
||||
static ZSTD_parameters zstd_get_btrfs_parameters(size_t src_len)
|
||||
{
|
||||
ZSTD_parameters params = ZSTD_getParams(ZSTD_BTRFS_DEFAULT_LEVEL, src_len, 0);
|
||||
ZSTD_parameters params = ZSTD_getParams(ZSTD_BTRFS_DEFAULT_LEVEL,
|
||||
src_len, 0);
|
||||
|
||||
if (params.cParams.windowLog > ZSTD_BTRFS_MAX_WINDOWLOG)
|
||||
params.cParams.windowLog = ZSTD_BTRFS_MAX_WINDOWLOG;
|
||||
|
@ -14,10 +14,6 @@
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
* GNU General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU General Public License
|
||||
* along with this program; if not, write to the Free Software
|
||||
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
|
||||
*
|
||||
* zstd_wrapper.c
|
||||
*/
|
||||
|
||||
@ -36,15 +32,18 @@
|
||||
struct workspace {
|
||||
void *mem;
|
||||
size_t mem_size;
|
||||
size_t window_size;
|
||||
};
|
||||
|
||||
static void *zstd_init(struct squashfs_sb_info *msblk, void *buff)
|
||||
{
|
||||
struct workspace *wksp = kmalloc(sizeof(*wksp), GFP_KERNEL);
|
||||
|
||||
if (wksp == NULL)
|
||||
goto failed;
|
||||
wksp->mem_size = ZSTD_DStreamWorkspaceBound(max_t(size_t,
|
||||
msblk->block_size, SQUASHFS_METADATA_SIZE));
|
||||
wksp->window_size = max_t(size_t,
|
||||
msblk->block_size, SQUASHFS_METADATA_SIZE);
|
||||
wksp->mem_size = ZSTD_DStreamWorkspaceBound(wksp->window_size);
|
||||
wksp->mem = vmalloc(wksp->mem_size);
|
||||
if (wksp->mem == NULL)
|
||||
goto failed;
|
||||
@ -80,7 +79,7 @@ static int zstd_uncompress(struct squashfs_sb_info *msblk, void *strm,
|
||||
ZSTD_inBuffer in_buf = { NULL, 0, 0 };
|
||||
ZSTD_outBuffer out_buf = { NULL, 0, 0 };
|
||||
|
||||
stream = ZSTD_initDStream(wksp->mem_size, wksp->mem, wksp->mem_size);
|
||||
stream = ZSTD_initDStream(wksp->window_size, wksp->mem, wksp->mem_size);
|
||||
|
||||
if (!stream) {
|
||||
ERROR("Failed to initialize zstd decompressor\n");
|
||||
@ -93,6 +92,7 @@ static int zstd_uncompress(struct squashfs_sb_info *msblk, void *strm,
|
||||
do {
|
||||
if (in_buf.pos == in_buf.size && k < b) {
|
||||
int avail = min(length, msblk->devblksize - offset);
|
||||
|
||||
length -= avail;
|
||||
in_buf.src = bh[k]->b_data + offset;
|
||||
in_buf.size = avail;
|
||||
@ -103,8 +103,9 @@ static int zstd_uncompress(struct squashfs_sb_info *msblk, void *strm,
|
||||
if (out_buf.pos == out_buf.size) {
|
||||
out_buf.dst = squashfs_next_page(output);
|
||||
if (out_buf.dst == NULL) {
|
||||
/* shouldn't run out of pages before stream is
|
||||
* done */
|
||||
/* Shouldn't run out of pages
|
||||
* before stream is done.
|
||||
*/
|
||||
squashfs_finish_page(output);
|
||||
goto out;
|
||||
}
|
||||
|
@ -4,8 +4,6 @@
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
* An additional grant of patent rights can be found in the PATENTS file in the
|
||||
* same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
|
@ -4,8 +4,6 @@
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
* An additional grant of patent rights can be found in the PATENTS file in the
|
||||
* same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
@ -583,7 +581,7 @@ void ZSTD_seqToCodes(const seqStore_t *seqStorePtr)
|
||||
mlCodeTable[seqStorePtr->longLengthPos] = MaxML;
|
||||
}
|
||||
|
||||
ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCapacity, size_t srcSize)
|
||||
ZSTD_STATIC size_t ZSTD_compressSequences_internal(ZSTD_CCtx *zc, void *dst, size_t dstCapacity)
|
||||
{
|
||||
const int longOffsets = zc->params.cParams.windowLog > STREAM_ACCUMULATOR_MIN;
|
||||
const seqStore_t *seqStorePtr = &(zc->seqStore);
|
||||
@ -636,7 +634,7 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa
|
||||
else
|
||||
op[0] = 0xFF, ZSTD_writeLE16(op + 1, (U16)(nbSeq - LONGNBSEQ)), op += 3;
|
||||
if (nbSeq == 0)
|
||||
goto _check_compressibility;
|
||||
return op - ostart;
|
||||
|
||||
/* seqHead : flags for FSE encoding type */
|
||||
seqHead = op++;
|
||||
@ -826,28 +824,33 @@ ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCa
|
||||
op += streamSize;
|
||||
}
|
||||
}
|
||||
|
||||
/* check compressibility */
|
||||
_check_compressibility:
|
||||
{
|
||||
size_t const minGain = ZSTD_minGain(srcSize);
|
||||
size_t const maxCSize = srcSize - minGain;
|
||||
if ((size_t)(op - ostart) >= maxCSize) {
|
||||
zc->flagStaticHufTable = HUF_repeat_none;
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
/* confirm repcodes */
|
||||
{
|
||||
int i;
|
||||
for (i = 0; i < ZSTD_REP_NUM; i++)
|
||||
zc->rep[i] = zc->repToConfirm[i];
|
||||
}
|
||||
|
||||
return op - ostart;
|
||||
}
|
||||
|
||||
ZSTD_STATIC size_t ZSTD_compressSequences(ZSTD_CCtx *zc, void *dst, size_t dstCapacity, size_t srcSize)
|
||||
{
|
||||
size_t const cSize = ZSTD_compressSequences_internal(zc, dst, dstCapacity);
|
||||
size_t const minGain = ZSTD_minGain(srcSize);
|
||||
size_t const maxCSize = srcSize - minGain;
|
||||
/* If the srcSize <= dstCapacity, then there is enough space to write a
|
||||
* raw uncompressed block. Since we ran out of space, the block must not
|
||||
* be compressible, so fall back to a raw uncompressed block.
|
||||
*/
|
||||
int const uncompressibleError = cSize == ERROR(dstSize_tooSmall) && srcSize <= dstCapacity;
|
||||
int i;
|
||||
|
||||
if (ZSTD_isError(cSize) && !uncompressibleError)
|
||||
return cSize;
|
||||
if (cSize >= maxCSize || uncompressibleError) {
|
||||
zc->flagStaticHufTable = HUF_repeat_none;
|
||||
return 0;
|
||||
}
|
||||
/* confirm repcodes */
|
||||
for (i = 0; i < ZSTD_REP_NUM; i++)
|
||||
zc->rep[i] = zc->repToConfirm[i];
|
||||
return cSize;
|
||||
}
|
||||
|
||||
/*! ZSTD_storeSeq() :
|
||||
Store a sequence (literal length, literals, offset code and match length code) into seqStore_t.
|
||||
`offsetCode` : distance to match, or 0 == repCode.
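
Note on the hunk above: the rewritten `ZSTD_compressSequences()` wraps `ZSTD_compressSequences_internal()` and decides afterwards whether to keep the compressed block, fall back to a raw block (return 0), or propagate an error. The decision can be sketched in isolation as below. Everything prefixed `sketch_` or `SKETCH_` is a hypothetical stand-in for illustration: the error value, the error test, and passing `min_gain` as a parameter are assumptions rather than the real zstd definitions. The sketch also omits the `flagStaticHufTable` reset and the repcode confirmation that the real function performs.

```c
#include <stddef.h>

/* Hypothetical stand-in for ERROR(dstSize_tooSmall). */
#define SKETCH_ERR_DST_TOO_SMALL ((size_t)-70)

/* Hypothetical stand-in for ZSTD_isError(): top of the size_t range. */
static int sketch_is_error(size_t code)
{
	return code > (size_t)-120;
}

/*
 * Returns 0 to request a raw (uncompressed) block, an error code to
 * propagate, or the compressed size when the block is worth keeping.
 */
static size_t sketch_post_check(size_t c_size, size_t src_size,
				size_t dst_capacity, size_t min_gain)
{
	size_t const max_c_size = src_size - min_gain;
	/* Running out of space while a raw block would still fit means
	 * the input is treated as incompressible, not as a hard error. */
	int const uncompressible = c_size == SKETCH_ERR_DST_TOO_SMALL &&
				   src_size <= dst_capacity;

	if (sketch_is_error(c_size) && !uncompressible)
		return c_size;	/* genuine error: propagate */
	if (c_size >= max_c_size || uncompressible)
		return 0;	/* caller emits a raw block instead */
	return c_size;		/* compressed block accepted */
}
```

The practical effect is that a destination buffer too small for the compressed form, but large enough for the source, no longer fails the whole call.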
|
||||
|
@ -4,8 +4,6 @@
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
* An additional grant of patent rights can be found in the PATENTS file in the
|
||||
* same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
@ -998,6 +996,8 @@ static seq_t ZSTD_decodeSequence(seqState_t *seqState)
|
||||
BIT_reloadDStream(&seqState->DStream); /* <= 18 bits */
|
||||
FSE_updateState(&seqState->stateOffb, &seqState->DStream); /* <= 8 bits */
|
||||
|
||||
seq.match = NULL;
|
||||
|
||||
return seq;
|
||||
}
|
||||
|
||||
|
@ -4,8 +4,6 @@
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
* An additional grant of patent rights can be found in the PATENTS file in the
|
||||
* same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
|
@ -4,8 +4,6 @@
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
* An additional grant of patent rights can be found in the PATENTS file in the
|
||||
* same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
|
@ -4,8 +4,6 @@
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
* An additional grant of patent rights can be found in the PATENTS file in the
|
||||
* same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
|
@ -4,8 +4,6 @@
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
* An additional grant of patent rights can be found in the PATENTS file in the
|
||||
* same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
@ -126,7 +124,7 @@ static const U32 OF_defaultNormLog = OF_DEFAULTNORMLOG;
|
||||
/*-*******************************************
|
||||
* Shared functions to include for inlining
|
||||
*********************************************/
|
||||
static void ZSTD_copy8(void *dst, const void *src) {
|
||||
ZSTD_STATIC void ZSTD_copy8(void *dst, const void *src) {
|
||||
memcpy(dst, src, 8);
|
||||
}
|
||||
/*! ZSTD_wildcopy() :
|
||||
@ -134,8 +132,21 @@ static void ZSTD_copy8(void *dst, const void *src) {
|
||||
#define WILDCOPY_OVERLENGTH 8
|
||||
ZSTD_STATIC void ZSTD_wildcopy(void *dst, const void *src, ptrdiff_t length)
|
||||
{
|
||||
if (length > 0)
|
||||
memcpy(dst, src, length);
|
||||
const BYTE* ip = (const BYTE*)src;
|
||||
BYTE* op = (BYTE*)dst;
|
||||
BYTE* const oend = op + length;
|
||||
/* Work around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388.
|
||||
* Avoid the bad case where the loop only runs once by handling the
|
||||
* special case separately. This doesn't trigger the bug because it
|
||||
* doesn't involve pointer/integer overflow.
|
||||
*/
|
||||
if (length <= 8)
|
||||
return ZSTD_copy8(dst, src);
|
||||
do {
|
||||
ZSTD_copy8(op, ip);
|
||||
op += 8;
|
||||
ip += 8;
|
||||
} while (op < oend);
|
||||
}
|
||||
|
||||
/*-*******************************************
|
||||
|
@ -4,8 +4,6 @@
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of https://github.com/facebook/zstd.
|
||||
* An additional grant of patent rights can be found in the PATENTS file in the
|
||||
* same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
|
@ -2,9 +2,9 @@
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/*
|
||||
|
@ -2,9 +2,9 @@
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/*
|
||||
|
@ -2,19 +2,13 @@
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
* Free Software Foundation. This program is dual-licensed; you may select
|
||||
* either version 2 of the GNU General Public License ("GPL") or BSD license
|
||||
* ("BSD").
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/* DO_XXH should be 32 or 64 for xxh32 and xxh64 respectively */
|
||||
#define DO_XXH 0
|
||||
#define DO_XXH 0
|
||||
/* DO_CRC should be 0 or 1 */
|
||||
#define DO_CRC 0
|
||||
/* Buffer size */
|
||||
|
@ -2,15 +2,9 @@
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
* Free Software Foundation. This program is dual-licensed; you may select
|
||||
* either version 2 of the GNU General Public License ("GPL") or BSD license
|
||||
* ("BSD").
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/* Compression level or 0 to disable */
|
||||
|
@ -2,15 +2,9 @@
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it under
|
||||
* the terms of the GNU General Public License version 2 as published by the
|
||||
* Free Software Foundation. This program is dual-licensed; you may select
|
||||
* either version 2 of the GNU General Public License ("GPL") or BSD license
|
||||
* ("BSD").
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/* Compression level or 0 to disable */
|
||||
|
36
contrib/long_distance_matching/Makefile
Normal file
@ -0,0 +1,36 @@
# ################################################################
# Copyright (c) 2017-present, Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under both the BSD-style license (found in the
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
# in the COPYING file in the root directory of this source tree).
# ################################################################

# This Makefile presumes libzstd is installed, using `sudo make install`

CPPFLAGS+= -I../../lib/common
CFLAGS ?= -O3
DEBUGFLAGS = -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow \
             -Wstrict-aliasing=1 -Wswitch-enum -Wdeclaration-after-statement \
             -Wstrict-prototypes -Wundef -Wpointer-arith -Wformat-security \
             -Wvla -Wformat=2 -Winit-self -Wfloat-equal -Wwrite-strings \
             -Wredundant-decls
CFLAGS += $(DEBUGFLAGS) $(MOREFLAGS)
FLAGS = $(CPPFLAGS) $(CFLAGS)

LDFLAGS += -lzstd

.PHONY: default all clean

default: all

all: ldm

ldm: ldm_common.c ldm.c main.c
	$(CC) $(CPPFLAGS) $(CFLAGS) $^ $(LDFLAGS) -o $@

clean:
	@rm -f core *.o tmp* result* *.ldm *.ldm.dec \
	ldm
	@echo Cleaning completed
102
contrib/long_distance_matching/README.md
Normal file
@ -0,0 +1,102 @@
This is a compression algorithm focused on finding long distance matches.

It is based upon lz4 and uses nearly the same block format (github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md). The offset is encoded in four bytes rather than the two used by lz4, to accommodate longer match distances. The block format is described in `ldm.h`.
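
As a rough illustration of the wider offset field, the sketch below writes a 2-byte offset the way the lz4 block format does (little-endian) and a 4-byte offset next to it. The helper names are hypothetical, and the little-endian byte order of the 4-byte field is an assumption; the authoritative layout is the one described in `ldm.h`.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 2-byte little-endian offset, as in the lz4 block format.
 * Returns the number of bytes written. */
static size_t write_offset16(uint8_t *dst, uint16_t offset) {
    dst[0] = (uint8_t)(offset & 0xFF);
    dst[1] = (uint8_t)(offset >> 8);
    return 2;
}

/* Write a 4-byte offset. Little-endian order is an assumption made for
 * this sketch, not something the README states. */
static size_t write_offset32(uint8_t *dst, uint32_t offset) {
    dst[0] = (uint8_t)(offset & 0xFF);
    dst[1] = (uint8_t)((offset >> 8) & 0xFF);
    dst[2] = (uint8_t)((offset >> 16) & 0xFF);
    dst[3] = (uint8_t)(offset >> 24);
    return 4;
}

/* Read back a 4-byte offset written by write_offset32(). */
static uint32_t read_offset32(const uint8_t *src) {
    return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}
```

A 2-byte field cannot express a distance above 65535 bytes, whereas a 4-byte field covers any distance inside a window of up to `1 << 28` bytes, the window size used in the benchmark in this README.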

### Build

Run `make`.

### Compressing a file

`ldm <filename>`

Decompression and verification can be enabled by defining `DECOMPRESS_AND_VERIFY` in `main.c`.
The output file names are as follows:
- `<filename>.ldm` : compressed file
- `<filename>.ldm.dec` : decompressed file

### Parameters

Several parameters can be tuned, either directly in `ldm.h` or, if `ldm_params.h` is included, in `ldm_params.h` (for easier configuration).

The parameters are as follows and must all be defined:
- `LDM_MEMORY_USAGE` : the log of the memory usage of the underlying hash table in bytes (the table occupies `1 << LDM_MEMORY_USAGE` bytes).
- `HASH_BUCKET_SIZE_LOG` : the log size of each bucket in the hash table (used in collision resolution).
- `LDM_LAG` : the lag (in bytes) in inserting entries into the hash table.
- `LDM_WINDOW_SIZE_LOG` : the log maximum window size when searching for matches.
- `LDM_MIN_MATCH_LENGTH` : the minimum match length.
- `INSERT_BY_TAG` : insert entries into the hash table as a function of the hash. This increases speed by reducing the number of hash table lookups and match comparisons. Certain hashes will never be inserted.
- `USE_CHECKSUM` : store a checksum with the hash table entries for faster comparison. This halves the number of entries the hash table can contain.

The optional parameter `HASH_ONLY_EVERY_LOG` is the log inverse frequency of insertion into the hash table. That is, an entry is inserted approximately once every `1 << HASH_ONLY_EVERY_LOG` positions. If this parameter is not defined, the value is computed as a function of the window size and memory usage to approximate an even coverage of the window.
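
The derived quantities can be sketched as plain functions. The formulas follow the macros at the top of `ldm.c` (`LDM_HASH_ENTRY_SIZE_LOG`, `NUM_HASH_BUCKETS_LOG` and the default `HASH_ONLY_EVERY_LOG`); the function names are hypothetical and exist only for this illustration.

```c
/* Log2 of the size of one hash table entry in bytes:
 * 8 bytes (offset + checksum) with USE_CHECKSUM, 4 bytes (offset) without. */
static unsigned hash_entry_size_log(int use_checksum) {
    return use_checksum ? 3u : 2u;
}

/* Log2 of the number of entries that fit in 1 << memory_usage bytes. */
static unsigned num_entries_log(unsigned memory_usage, int use_checksum) {
    return memory_usage - hash_entry_size_log(use_checksum);
}

/* Log2 of the number of buckets, each holding 1 << bucket_size_log entries. */
static unsigned num_buckets_log(unsigned memory_usage, int use_checksum,
                                unsigned bucket_size_log) {
    return num_entries_log(memory_usage, use_checksum) - bucket_size_log;
}

/* Default HASH_ONLY_EVERY_LOG: spread the available entries evenly over the
 * window. Assumes window_size_log >= num_entries_log(...). */
static unsigned default_hash_only_every_log(unsigned window_size_log,
                                            unsigned memory_usage,
                                            int use_checksum) {
    return window_size_log - num_entries_log(memory_usage, use_checksum);
}
```

For example, with `LDM_MEMORY_USAGE` 23, `USE_CHECKSUM` 1, `HASH_BUCKET_SIZE_LOG` 3 and `LDM_WINDOW_SIZE_LOG` 28, the table holds `1 << 20` entries in `1 << 17` buckets, and the default `HASH_ONLY_EVERY_LOG` is 8, that is, roughly one insertion every 256 positions.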

### Benchmark

Below is a comparison of various compression methods on a tar of four versions of llvm (versions `3.9.0`, `3.9.1`, `4.0.0`, `4.0.1`) with a total size of `727900160` B.

| Method | Size (B) | Ratio |
|:---|---:|---:|
| lrzip -p 32 -n -w 1 | `369968714` | `1.97` |
| ldm | `209391361` | `3.48` |
| lz4 | `189954338` | `3.83` |
| lrzip -p 32 -l -w 1 | `163940343` | `4.44` |
| zstd -1 | `126080293` | `5.77` |
| lrzip -p 32 -n | `124821009` | `5.83` |
| lrzip -p 32 -n -w 1 & zstd -1 | `120317909` | `6.05` |
| zstd -3 -o | `115290952` | `6.31` |
| lrzip -p 32 -g -L 9 -w 1 | `107168979` | `6.79` |
| zstd -6 -o | `102772098` | `7.08` |
| zstd -T16 -9 | `98040470` | `7.42` |
| lrzip -p 32 -n -w 1 & zstd -T32 -19 | `88050289` | `8.27` |
| zstd -T32 -19 | `83626098` | `8.70` |
| lrzip -p 32 -n & zstd -1 | `36335117` | `20.03` |
| ldm & zstd -6 | `32856232` | `22.15` |
| lrzip -p 32 -g -L 9 | `32243594` | `22.58` |
| lrzip -p 32 -n & zstd -6 | `30954572` | `23.52` |
| lrzip -p 32 -n & zstd -T32 -19 | `26472064` | `27.50` |
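
The Ratio column appears to be the original size divided by the compressed size. A minimal sketch of that arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/* Compression ratio as original size over compressed size. */
static double compression_ratio(uint64_t original_size,
                                uint64_t compressed_size) {
    return (double)original_size / (double)compressed_size;
}
```

For the `ldm` row, `compression_ratio(727900160, 209391361)` is approximately 3.476, which rounds to the `3.48` shown in the table.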

The method marked `ldm` was run with the following parameters:

| Parameter | Value |
|:---|---:|
| `LDM_MEMORY_USAGE` | `23` |
| `HASH_BUCKET_SIZE_LOG` | `3` |
| `LDM_LAG` | `0` |
| `LDM_WINDOW_SIZE_LOG` | `28` |
| `LDM_MIN_MATCH_LENGTH` | `64` |
| `INSERT_BY_TAG` | `1` |
| `USE_CHECKSUM` | `1` |

The compression speed was `220.5 MB/s`.

### Parameter selection

Below is a brief discussion of the effects of the parameters on the speed and compression ratio.

#### Speed

A major speed bottleneck is finding candidate matches and checking whether they reach the minimum match length. Generally:
- The fewer matches found (or the lower the percentage of the literals matched), the slower the algorithm runs.
- Increasing `HASH_ONLY_EVERY_LOG` results in fewer inserts and, if `INSERT_BY_TAG` is set, fewer lookups in the table. This has a large effect on speed, as well as compression ratio.
- If `HASH_ONLY_EVERY_LOG` is not set, its value is calculated based on `LDM_WINDOW_SIZE_LOG` and `LDM_MEMORY_USAGE`. Increasing `LDM_WINDOW_SIZE_LOG` has the effect of increasing `HASH_ONLY_EVERY_LOG` and increasing `LDM_MEMORY_USAGE` decreases `HASH_ONLY_EVERY_LOG`.
- `USE_CHECKSUM` generally improves speed with hash table lookups.
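
The effect of `INSERT_BY_TAG` can be sketched as follows. `tag_bits` mirrors the bit selection in `lowerBitsFromHfHash()` from `ldm.c`, with the macros turned into parameters. The rule in `should_insert` (insert only when every tag bit is set) is an assumption made for this sketch, and both function names are hypothetical.

```c
#include <stdint.h>

/* Select every_log "tag" bits from the hash. Bits below the bucket index and
 * checksum are used when enough remain; otherwise the lowest bits are reused. */
static uint32_t tag_bits(uint64_t hash, int num_buckets_log, int every_log) {
    uint32_t const mask = ((uint32_t)1 << every_log) - 1;
    int const spare_bits = 32 - num_buckets_log;
    if (spare_bits < every_log) {
        return (uint32_t)(hash & mask);
    }
    return (uint32_t)((hash >> (spare_bits - every_log)) & mask);
}

/* Assumed rule: a position is inserted (and looked up) only when all tag
 * bits are set, which happens for about 1 in (1 << every_log) hashes. */
static int should_insert(uint64_t hash, int num_buckets_log, int every_log) {
    uint32_t const mask = ((uint32_t)1 << every_log) - 1;
    return tag_bits(hash, num_buckets_log, every_log) == mask;
}
```

Because the decision depends only on the hash, identical content at two positions is either inserted at both or skipped at both, which is why the parameter list notes that certain hashes will never be inserted.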

#### Compression ratio

The compression ratio is highly correlated with the coverage of matches. As a long distance matcher, the algorithm was designed to "optimize" for long distance matches outside the zstd compression window. The compression ratio after recompressing the output of the long-distance matcher with zstd was a more important signal in development than the raw compression ratio itself.

Generally, increasing `LDM_MEMORY_USAGE` will improve the compression ratio. However, when using the default computed value of `HASH_ONLY_EVERY_LOG`, this increases the frequency of insertion and lookup in the table and thus may result in a decrease in speed.

Below is a table showing the speed and compression ratio when compressing the llvm tar (as described above) using different settings for `LDM_MEMORY_USAGE`. The other parameters were the same as used in the benchmark above.

| `LDM_MEMORY_USAGE` | Ratio | Speed (MB/s) | Ratio after zstd -6 |
|---:|---:|---:|---:|
| `18` | `1.85` | `232.4` | `10.92` |
| `21` | `2.79` | `233.9` | `15.92` |
| `23` | `3.48` | `220.5` | `18.29` |
| `25` | `4.56` | `140.8` | `19.21` |

### Compression statistics

Compression statistics (and the configuration) can be enabled/disabled via `COMPUTE_STATS` and `OUTPUT_CONFIGURATION` in `ldm.h`.
857
contrib/long_distance_matching/ldm.c
Normal file
@ -0,0 +1,857 @@
|
||||
#include <limits.h>
|
||||
#include <stdint.h>
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
#include "ldm.h"
|
||||
|
||||
#define LDM_HASHTABLESIZE (1 << (LDM_MEMORY_USAGE))
|
||||
#define LDM_HASHTABLESIZE_U32 ((LDM_HASHTABLESIZE) >> 2)
|
||||
#define LDM_HASHTABLESIZE_U64 ((LDM_HASHTABLESIZE) >> 3)
|
||||
|
||||
#if USE_CHECKSUM
|
||||
#define LDM_HASH_ENTRY_SIZE_LOG 3
|
||||
#else
|
||||
#define LDM_HASH_ENTRY_SIZE_LOG 2
|
||||
#endif
|
||||
|
||||
// Entries are inserted into the table once every HASH_ONLY_EVERY + 1 positions "on average".
|
||||
#ifndef HASH_ONLY_EVERY_LOG
|
||||
#define HASH_ONLY_EVERY_LOG (LDM_WINDOW_SIZE_LOG-((LDM_MEMORY_USAGE)-(LDM_HASH_ENTRY_SIZE_LOG)))
|
||||
#endif
|
||||
|
||||
#define HASH_ONLY_EVERY ((1 << (HASH_ONLY_EVERY_LOG)) - 1)
|
||||
|
||||
#define HASH_BUCKET_SIZE (1 << (HASH_BUCKET_SIZE_LOG))
|
||||
#define NUM_HASH_BUCKETS_LOG ((LDM_MEMORY_USAGE)-(LDM_HASH_ENTRY_SIZE_LOG)-(HASH_BUCKET_SIZE_LOG))
|
||||
|
||||
#define HASH_CHAR_OFFSET 10
|
||||
|
||||
// Take the first match in the hash bucket only.
|
||||
//#define ZSTD_SKIP
|
||||
|
||||
static const U64 prime8bytes = 11400714785074694791ULL;
|
||||
|
||||
// Type of the small hash used to index into the hash table.
|
||||
typedef U32 hash_t;
|
||||
|
||||
#if USE_CHECKSUM
|
||||
typedef struct LDM_hashEntry {
|
||||
U32 offset;
|
||||
U32 checksum;
|
||||
} LDM_hashEntry;
|
||||
#else
|
||||
typedef struct LDM_hashEntry {
|
||||
U32 offset;
|
||||
} LDM_hashEntry;
|
||||
#endif
|
||||
|
||||
struct LDM_compressStats {
|
||||
U32 windowSizeLog, hashTableSizeLog;
|
||||
U32 numMatches;
|
||||
U64 totalMatchLength;
|
||||
U64 totalLiteralLength;
|
||||
U64 totalOffset;
|
||||
|
||||
U32 matchLengthHistogram[32];
|
||||
|
||||
U32 minOffset, maxOffset;
|
||||
U32 offsetHistogram[32];
|
||||
};
|
||||
|
||||
typedef struct LDM_hashTable LDM_hashTable;
|
||||
|
||||
struct LDM_CCtx {
|
||||
size_t isize; /* Input size */
|
||||
size_t maxOSize; /* Maximum output size */
|
||||
|
||||
const BYTE *ibase; /* Base of input */
|
||||
const BYTE *ip; /* Current input position */
|
||||
const BYTE *iend; /* End of input */
|
||||
|
||||
// Maximum input position such that hashing at the position does not exceed
|
||||
// end of input.
|
||||
const BYTE *ihashLimit;
|
||||
|
||||
// Maximum input position such that finding a match of at least the minimum
|
||||
// match length does not exceed end of input.
|
||||
const BYTE *imatchLimit;
|
||||
|
||||
const BYTE *obase; /* Base of output */
|
||||
BYTE *op; /* Output */
|
||||
|
||||
const BYTE *anchor; /* Anchor to start of current (match) block */
|
||||
|
||||
LDM_compressStats stats; /* Compression statistics */
|
||||
|
||||
LDM_hashTable *hashTable;
|
||||
|
||||
const BYTE *lastPosHashed; /* Last position hashed */
|
||||
U64 lastHash;
|
||||
|
||||
const BYTE *nextIp; // TODO: this is redundant (ip + step)
|
||||
const BYTE *nextPosHashed;
|
||||
U64 nextHash;
|
||||
|
||||
unsigned step; // ip step, should be 1.
|
||||
|
||||
const BYTE *lagIp;
|
||||
U64 lagHash;
|
||||
};
|
||||
|
||||
struct LDM_hashTable {
|
||||
U32 numBuckets; // The number of buckets.
|
||||
U32 numEntries; // numBuckets * HASH_BUCKET_SIZE.
|
||||
|
||||
LDM_hashEntry *entries;
|
||||
BYTE *bucketOffsets; // Per-bucket offset (within the bucket) of the next insert position.
|
||||
};
|
||||
|
||||
static void HASH_destroyTable(LDM_hashTable *table) {
|
||||
free(table->entries);
|
||||
free(table->bucketOffsets);
|
||||
free(table);
|
||||
}
|
||||
|
||||
/**
|
||||
* Create a hash table that can contain size elements.
|
||||
* The number of buckets is determined by size >> HASH_BUCKET_SIZE_LOG.
|
||||
*
|
||||
* Returns NULL if table creation failed.
|
||||
*/
|
||||
static LDM_hashTable *HASH_createTable(U32 size) {
|
||||
LDM_hashTable *table = malloc(sizeof(LDM_hashTable));
|
||||
if (!table) return NULL;
|
||||
|
||||
table->numBuckets = size >> HASH_BUCKET_SIZE_LOG;
|
||||
table->numEntries = size;
|
||||
table->entries = calloc(size, sizeof(LDM_hashEntry));
|
||||
table->bucketOffsets = calloc(size >> HASH_BUCKET_SIZE_LOG, sizeof(BYTE));
|
||||
|
||||
if (!table->entries || !table->bucketOffsets) {
|
||||
HASH_destroyTable(table);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
return table;
|
||||
}
|
||||
|
||||
static LDM_hashEntry *getBucket(const LDM_hashTable *table, const hash_t hash) {
|
||||
return table->entries + (hash << HASH_BUCKET_SIZE_LOG);
|
||||
}
|
||||
|
||||
static unsigned ZSTD_NbCommonBytes (register size_t val) {
|
||||
if (MEM_isLittleEndian()) {
|
||||
if (MEM_64bits()) {
|
||||
# if defined(_MSC_VER) && defined(_WIN64)
|
||||
unsigned long r = 0;
|
||||
_BitScanForward64( &r, (U64)val );
|
||||
return (unsigned)(r>>3);
|
||||
# elif defined(__GNUC__) && (__GNUC__ >= 3)
|
||||
return (__builtin_ctzll((U64)val) >> 3);
|
||||
# else
|
||||
static const int DeBruijnBytePos[64] = { 0, 0, 0, 0, 0, 1, 1, 2,
|
||||
0, 3, 1, 3, 1, 4, 2, 7,
|
||||
0, 2, 3, 6, 1, 5, 3, 5,
|
||||
1, 3, 4, 4, 2, 5, 6, 7,
|
||||
7, 0, 1, 2, 3, 3, 4, 6,
|
||||
2, 6, 5, 5, 3, 4, 5, 6,
|
||||
7, 1, 2, 4, 6, 4, 4, 5,
|
||||
7, 2, 6, 5, 7, 6, 7, 7 };
|
||||
return DeBruijnBytePos[
|
||||
((U64)((val & -(long long)val) * 0x0218A392CDABBD3FULL)) >> 58];
|
||||
# endif
|
||||
} else { /* 32 bits */
|
||||
# if defined(_MSC_VER)
|
||||
unsigned long r=0;
|
||||
_BitScanForward( &r, (U32)val );
|
||||
return (unsigned)(r>>3);
|
||||
# elif defined(__GNUC__) && (__GNUC__ >= 3)
|
||||
return (__builtin_ctz((U32)val) >> 3);
|
||||
# else
|
||||
static const int DeBruijnBytePos[32] = { 0, 0, 3, 0, 3, 1, 3, 0,
|
||||
3, 2, 2, 1, 3, 2, 0, 1,
|
||||
3, 3, 1, 2, 2, 2, 2, 0,
|
||||
3, 1, 2, 0, 1, 0, 1, 1 };
|
||||
return DeBruijnBytePos[
|
||||
((U32)((val & -(S32)val) * 0x077CB531U)) >> 27];
|
||||
# endif
|
||||
}
|
||||
} else { /* Big Endian CPU */
|
||||
if (MEM_64bits()) {
|
||||
# if defined(_MSC_VER) && defined(_WIN64)
|
||||
unsigned long r = 0;
|
||||
_BitScanReverse64( &r, val );
|
||||
return (unsigned)(r>>3);
|
||||
# elif defined(__GNUC__) && (__GNUC__ >= 3)
|
||||
return (__builtin_clzll(val) >> 3);
|
||||
# else
|
||||
unsigned r;
|
||||
/* calculate this way due to compiler complaining in 32-bits mode */
|
||||
const unsigned n32 = sizeof(size_t)*4;
|
||||
if (!(val>>n32)) { r=4; } else { r=0; val>>=n32; }
|
||||
if (!(val>>16)) { r+=2; val>>=8; } else { val>>=24; }
|
||||
r += (!val);
|
||||
return r;
|
||||
# endif
|
||||
} else { /* 32 bits */
|
||||
# if defined(_MSC_VER)
|
||||
unsigned long r = 0;
|
||||
_BitScanReverse( &r, (unsigned long)val );
|
||||
return (unsigned)(r>>3);
|
||||
# elif defined(__GNUC__) && (__GNUC__ >= 3)
|
||||
return (__builtin_clz((U32)val) >> 3);
|
||||
# else
|
||||
unsigned r;
|
||||
if (!(val>>16)) { r=2; val>>=8; } else { r=0; val>>=24; }
|
||||
r += (!val);
|
||||
return r;
|
||||
# endif
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// From lib/compress/zstd_compress.c
|
||||
static size_t ZSTD_count(const BYTE *pIn, const BYTE *pMatch,
|
||||
const BYTE *const pInLimit) {
|
||||
const BYTE * const pStart = pIn;
|
||||
const BYTE * const pInLoopLimit = pInLimit - (sizeof(size_t)-1);
|
||||
|
||||
while (pIn < pInLoopLimit) {
|
||||
size_t const diff = MEM_readST(pMatch) ^ MEM_readST(pIn);
|
||||
if (!diff) {
|
||||
pIn += sizeof(size_t);
|
||||
pMatch += sizeof(size_t);
|
||||
continue;
|
||||
}
|
||||
pIn += ZSTD_NbCommonBytes(diff);
|
||||
return (size_t)(pIn - pStart);
|
||||
}
|
||||
|
||||
if (MEM_64bits()) {
|
||||
if ((pIn < (pInLimit - 3)) && (MEM_read32(pMatch) == MEM_read32(pIn))) {
|
||||
pIn += 4;
|
||||
pMatch += 4;
|
||||
}
|
||||
}
|
||||
if ((pIn < (pInLimit - 1)) && (MEM_read16(pMatch) == MEM_read16(pIn))) {
|
||||
pIn += 2;
|
||||
pMatch += 2;
|
||||
}
|
||||
if ((pIn < pInLimit) && (*pMatch == *pIn)) {
|
||||
pIn++;
|
||||
}
|
||||
return (size_t)(pIn - pStart);
|
||||
}
|
||||
|
||||
/**
|
||||
* Count number of bytes that match backwards before pIn and pMatch.
|
||||
*
|
||||
* We count only bytes where pMatch > pBase and pIn > pAnchor.
|
||||
*/
|
||||
static size_t countBackwardsMatch(const BYTE *pIn, const BYTE *pAnchor,
|
||||
const BYTE *pMatch, const BYTE *pBase) {
|
||||
size_t matchLength = 0;
|
||||
while (pIn > pAnchor && pMatch > pBase && pIn[-1] == pMatch[-1]) {
|
||||
pIn--;
|
||||
pMatch--;
|
||||
matchLength++;
|
||||
}
|
||||
return matchLength;
|
||||
}
|
||||
|
||||
/**
|
||||
* Returns a pointer to the entry in the hash table matching the hash and
|
||||
* checksum with the "longest match length" as defined below. The forward and
|
||||
* backward match lengths are written to *pForwardMatchLength and
|
||||
* *pBackwardMatchLength.
|
||||
*
|
||||
* The match length is defined based on cctx->ip and the entry's offset.
|
||||
* The forward match is computed from cctx->ip and entry->offset + cctx->ibase.
|
||||
* The backward match is computed backwards from cctx->ip and
|
||||
* cctx->ibase only if the forward match is at least LDM_MIN_MATCH_LENGTH.
|
||||
*/
|
||||
static LDM_hashEntry *HASH_getBestEntry(const LDM_CCtx *cctx,
|
||||
const hash_t hash,
|
||||
const U32 checksum,
|
||||
U64 *pForwardMatchLength,
|
||||
U64 *pBackwardMatchLength) {
|
||||
LDM_hashTable *table = cctx->hashTable;
|
||||
LDM_hashEntry *bucket = getBucket(table, hash);
|
||||
LDM_hashEntry *cur;
|
||||
LDM_hashEntry *bestEntry = NULL;
|
||||
U64 bestMatchLength = 0;
|
||||
#if !(USE_CHECKSUM)
|
||||
(void)checksum;
|
||||
#endif
|
||||
for (cur = bucket; cur < bucket + HASH_BUCKET_SIZE; ++cur) {
|
||||
const BYTE *pMatch = cur->offset + cctx->ibase;
|
||||
|
||||
// Check checksum for faster check.
|
||||
#if USE_CHECKSUM
|
||||
if (cur->checksum == checksum &&
|
||||
cctx->ip - pMatch <= LDM_WINDOW_SIZE) {
|
||||
#else
|
||||
if (cctx->ip - pMatch <= LDM_WINDOW_SIZE) {
|
||||
#endif
|
||||
U64 forwardMatchLength = ZSTD_count(cctx->ip, pMatch, cctx->iend);
|
||||
U64 backwardMatchLength, totalMatchLength;
|
||||
|
||||
// Only take matches where the forward match length is large enough
|
||||
// for speed.
|
||||
if (forwardMatchLength < LDM_MIN_MATCH_LENGTH) {
|
||||
continue;
|
||||
}
|
||||
|
||||
backwardMatchLength =
|
||||
countBackwardsMatch(cctx->ip, cctx->anchor,
|
||||
cur->offset + cctx->ibase,
|
||||
cctx->ibase);
|
||||
|
||||
totalMatchLength = forwardMatchLength + backwardMatchLength;
|
||||
|
||||
if (totalMatchLength >= bestMatchLength) {
|
||||
bestMatchLength = totalMatchLength;
|
||||
*pForwardMatchLength = forwardMatchLength;
|
||||
*pBackwardMatchLength = backwardMatchLength;
|
||||
|
||||
bestEntry = cur;
|
||||
#ifdef ZSTD_SKIP
|
||||
return cur;
|
||||
#endif
|
||||
}
|
||||
}
|
||||
}
|
||||
if (bestEntry != NULL) {
|
||||
return bestEntry;
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/**
|
||||
* Insert an entry into the hash table. The table uses a "circular buffer",
|
||||
* with the oldest entry overwritten.
|
||||
*/
|
||||
static void HASH_insert(LDM_hashTable *table,
|
||||
const hash_t hash, const LDM_hashEntry entry) {
|
||||
*(getBucket(table, hash) + table->bucketOffsets[hash]) = entry;
|
||||
table->bucketOffsets[hash]++;
|
||||
table->bucketOffsets[hash] &= HASH_BUCKET_SIZE - 1;
|
||||
}
|
||||
|
||||
static void HASH_outputTableOccupancy(const LDM_hashTable *table) {
|
||||
U32 ctr = 0;
|
||||
LDM_hashEntry *cur = table->entries;
|
||||
LDM_hashEntry *end = table->entries + (table->numBuckets * HASH_BUCKET_SIZE);
|
||||
for (; cur < end; ++cur) {
|
||||
if (cur->offset == 0) {
|
||||
ctr++;
|
||||
}
|
||||
}
|
||||
|
||||
// The number of buckets is repeated as a check for now.
|
||||
printf("Num buckets, bucket size: %u (2^%d), %d\n",
|
||||
table->numBuckets, NUM_HASH_BUCKETS_LOG, HASH_BUCKET_SIZE);
|
||||
printf("Hash table size, empty slots, %% empty: %u, %u, %.3f\n",
|
||||
table->numEntries, ctr,
|
||||
100.0 * (double)(ctr) / table->numEntries);
|
||||
}
|
||||
|
||||
// TODO: This can be done more efficiently, for example by using builtin
|
||||
// functions (but it is not that important as it is only used for computing
|
||||
// stats).
|
||||
static int intLog2(U64 x) {
|
||||
int ret = 0;
|
||||
while (x >>= 1) {
|
||||
ret++;
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
||||
void LDM_printCompressStats(const LDM_compressStats *stats) {
|
||||
printf("=====================\n");
|
||||
printf("Compression statistics\n");
|
||||
printf("Window size, hash table size (bytes): 2^%u, 2^%u\n",
|
||||
stats->windowSizeLog, stats->hashTableSizeLog);
|
||||
printf("num matches, total match length, %% matched: %u, %llu, %.3f\n",
|
||||
stats->numMatches,
|
||||
stats->totalMatchLength,
|
||||
100.0 * (double)stats->totalMatchLength /
|
||||
(double)(stats->totalMatchLength + stats->totalLiteralLength));
|
||||
printf("avg match length: %.1f\n", ((double)stats->totalMatchLength) /
|
||||
(double)stats->numMatches);
|
||||
printf("avg literal length, total literalLength: %.1f, %llu\n",
|
||||
((double)stats->totalLiteralLength) / (double)stats->numMatches,
|
||||
stats->totalLiteralLength);
|
||||
printf("avg offset length: %.1f\n",
|
||||
((double)stats->totalOffset) / (double)stats->numMatches);
|
||||
printf("min offset, max offset: %u, %u\n",
|
||||
stats->minOffset, stats->maxOffset);
|
||||
|
||||
printf("\n");
|
||||
printf("offset histogram | match length histogram\n");
|
||||
printf("offset/ML, num matches, %% of matches | num matches, %% of matches\n");
|
||||
|
||||
{
|
||||
int i;
|
||||
int logMaxOffset = intLog2(stats->maxOffset);
|
||||
for (i = 0; i <= logMaxOffset; i++) {
|
||||
printf("2^%*d: %10u %6.3f%% |2^%*d: %10u %6.3f \n",
|
||||
2, i,
|
||||
stats->offsetHistogram[i],
|
||||
100.0 * (double) stats->offsetHistogram[i] /
|
||||
(double) stats->numMatches,
|
||||
2, i,
|
||||
stats->matchLengthHistogram[i],
|
||||
100.0 * (double) stats->matchLengthHistogram[i] /
|
||||
(double) stats->numMatches);
|
||||
}
|
||||
}
|
||||
printf("\n");
|
||||
printf("=====================\n");
|
||||
}
|
||||
|
||||
/**
|
||||
* Return the upper (most significant) NUM_HASH_BUCKETS_LOG bits.
|
||||
*/
|
||||
static hash_t getSmallHash(U64 hash) {
|
||||
return hash >> (64 - NUM_HASH_BUCKETS_LOG);
|
||||
}
|
||||
|
||||
/**
|
||||
* Return the 32 bits after the upper NUM_HASH_BUCKETS_LOG bits.
|
||||
*/
|
||||
static U32 getChecksum(U64 hash) {
|
||||
return (hash >> (64 - 32 - NUM_HASH_BUCKETS_LOG)) & 0xFFFFFFFF;
|
||||
}
|
||||
|
||||
#if INSERT_BY_TAG
|
||||
static U32 lowerBitsFromHfHash(U64 hash) {
|
||||
// The number of bits used so far is NUM_HASH_BUCKETS_LOG + 32.
|
||||
// So there are 32 - NUM_HASH_BUCKETS_LOG bits left.
|
||||
// Occasional hashing requires HASH_ONLY_EVERY_LOG bits.
|
||||
// So if 32 - LDMHASHLOG < HASH_ONLY_EVERY_LOG, just return lower bits
|
||||
// allowing for reuse of bits.
|
||||
if (32 - NUM_HASH_BUCKETS_LOG < HASH_ONLY_EVERY_LOG) {
|
||||
return hash & HASH_ONLY_EVERY;
|
||||
} else {
|
||||
// Otherwise shift by
|
||||
// (32 - NUM_HASH_BUCKETS_LOG - HASH_ONLY_EVERY_LOG) bits first.
|
||||
return (hash >> (32 - NUM_HASH_BUCKETS_LOG - HASH_ONLY_EVERY_LOG)) &
|
||||
HASH_ONLY_EVERY;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
/**
|
||||
* Get a 64-bit hash using the first len bytes from buf.
|
||||
*
|
||||
* Given bytes s = s_1, s_2, ... s_k, the hash is defined to be
|
||||
* H(s) = s_1*(a^(k-1)) + s_2*(a^(k-2)) + ... + s_k*(a^0)
|
||||
*
|
||||
* where the constant a is defined to be prime8bytes.
|
||||
*
|
||||
* The implementation adds an offset to each byte, so
|
||||
* H(s) = (s_1 + HASH_CHAR_OFFSET)*(a^(k-1)) + ...
|
||||
*/
|
||||
static U64 getHash(const BYTE *buf, U32 len) {
|
||||
U64 ret = 0;
|
||||
U32 i;
|
||||
for (i = 0; i < len; i++) {
|
||||
ret *= prime8bytes;
|
||||
ret += buf[i] + HASH_CHAR_OFFSET;
|
||||
}
|
||||
return ret;
|
||||
|
||||
}
|
||||
|
||||
static U64 ipow(U64 base, U64 exp) {
|
||||
U64 ret = 1;
|
||||
while (exp) {
|
||||
if (exp & 1) {
|
||||
ret *= base;
|
||||
}
|
||||
exp >>= 1;
|
||||
base *= base;
|
||||
}
|
||||
return ret;
|
||||
}
|
||||
|
||||
static U64 updateHash(U64 hash, U32 len,
|
||||
BYTE toRemove, BYTE toAdd) {
|
||||
// TODO: this relies on compiler optimization.
|
||||
// The exponential can be calculated explicitly as len is constant.
|
||||
hash -= ((toRemove + HASH_CHAR_OFFSET) *
|
||||
ipow(prime8bytes, len - 1));
|
||||
hash *= prime8bytes;
|
||||
hash += toAdd + HASH_CHAR_OFFSET;
|
||||
return hash;
|
||||
}
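The rolling update above should give the same value as rehashing the window shifted by one byte, since removing the leading term, multiplying by the base, and adding the new byte reproduces the polynomial for the next window (all arithmetic is modulo 2^64). The sketch below checks that property. The constant values are assumptions for illustration: `SKETCH_PRIME` is believed to match zstd's `prime8bytes` but any odd 64-bit constant would do here, and `SKETCH_CHAR_OFFSET` is a hypothetical stand-in for `HASH_CHAR_OFFSET`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PRIME 0xCF1BBCDCB7A56463ULL /* assumed value of prime8bytes */
#define SKETCH_CHAR_OFFSET 10              /* hypothetical HASH_CHAR_OFFSET */

/* Polynomial hash of the first len bytes of buf. */
static uint64_t sketch_hash(const unsigned char *buf, size_t len) {
    uint64_t h = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        h = h * SKETCH_PRIME + (uint64_t)(buf[i] + SKETCH_CHAR_OFFSET);
    }
    return h;
}

/* Exponentiation by squaring, wrapping modulo 2^64. */
static uint64_t sketch_ipow(uint64_t base, uint64_t exp) {
    uint64_t ret = 1;
    while (exp) {
        if (exp & 1) ret *= base;
        exp >>= 1;
        base *= base;
    }
    return ret;
}

/* Slide the window one byte: drop `out` from the front, append `in`. */
static uint64_t sketch_roll(uint64_t h, size_t len,
                            unsigned char out, unsigned char in) {
    h -= (uint64_t)(out + SKETCH_CHAR_OFFSET) * sketch_ipow(SKETCH_PRIME, len - 1);
    h *= SKETCH_PRIME;
    h += (uint64_t)(in + SKETCH_CHAR_OFFSET);
    return h;
}

/* Returns 1 if rolling across data agrees with direct hashing at every
 * window position, 0 otherwise. */
static int sketch_rolling_matches_direct(const unsigned char *data,
                                         size_t n, size_t len) {
    uint64_t h;
    size_t i;
    assert(len > 0 && n >= len);
    h = sketch_hash(data, len);
    for (i = 0; i + len < n; i++) {
        h = sketch_roll(h, len, data[i], data[i + len]);
        if (h != sketch_hash(data + i + 1, len)) return 0;
    }
    return 1;
}
```

Because the window length is fixed, the power computed by `sketch_ipow` is the same on every call and could be precomputed once, which is what the TODO in `updateHash` alludes to.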
|
||||
|
||||
/**
|
||||
* Update cctx->nextHash and cctx->nextPosHashed
|
||||
* based on cctx->lastHash and cctx->lastPosHashed.
|
||||
*
|
||||
* This uses a rolling hash and requires that the last position hashed
|
||||
* corresponds to cctx->nextIp - step.
|
||||
*/
|
||||
static void setNextHash(LDM_CCtx *cctx) {
|
||||
cctx->nextHash = updateHash(
|
||||
cctx->lastHash, LDM_HASH_LENGTH,
|
||||
cctx->lastPosHashed[0],
|
||||
cctx->lastPosHashed[LDM_HASH_LENGTH]);
|
||||
cctx->nextPosHashed = cctx->nextIp;
|
||||
|
||||
#if LDM_LAG
|
||||
if (cctx->ip - cctx->ibase > LDM_LAG) {
|
||||
cctx->lagHash = updateHash(
|
||||
cctx->lagHash, LDM_HASH_LENGTH,
|
||||
cctx->lagIp[0], cctx->lagIp[LDM_HASH_LENGTH]);
|
||||
cctx->lagIp++;
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
||||
static void putHashOfCurrentPositionFromHash(LDM_CCtx *cctx, U64 hash) {
|
||||
// Hash only every HASH_ONLY_EVERY times, based on cctx->ip.
|
||||
// Note: this works only when cctx->step is 1.
|
||||
#if LDM_LAG
|
||||
if (cctx->lagIp - cctx->ibase > 0) {
|
||||
#if INSERT_BY_TAG
|
||||
U32 hashEveryMask = lowerBitsFromHfHash(cctx->lagHash);
|
||||
if (hashEveryMask == HASH_ONLY_EVERY) {
|
||||
#else
|
||||
if (((cctx->ip - cctx->ibase) & HASH_ONLY_EVERY) == HASH_ONLY_EVERY) {
|
||||
#endif
|
||||
U32 smallHash = getSmallHash(cctx->lagHash);
|
||||
|
||||
# if USE_CHECKSUM
|
||||
U32 checksum = getChecksum(cctx->lagHash);
|
||||
const LDM_hashEntry entry = { cctx->lagIp - cctx->ibase, checksum };
|
||||
# else
|
||||
const LDM_hashEntry entry = { cctx->lagIp - cctx->ibase };
|
||||
# endif
|
||||
|
||||
HASH_insert(cctx->hashTable, smallHash, entry);
|
||||
}
|
||||
} else {
|
||||
#endif // LDM_LAG
|
||||
#if INSERT_BY_TAG
|
||||
U32 hashEveryMask = lowerBitsFromHfHash(hash);
|
||||
if (hashEveryMask == HASH_ONLY_EVERY) {
|
||||
#else
|
||||
if (((cctx->ip - cctx->ibase) & HASH_ONLY_EVERY) == HASH_ONLY_EVERY) {
|
||||
#endif
|
||||
U32 smallHash = getSmallHash(hash);
|
||||
|
||||
#if USE_CHECKSUM
|
||||
U32 checksum = getChecksum(hash);
|
||||
const LDM_hashEntry entry = { cctx->ip - cctx->ibase, checksum };
|
||||
#else
|
||||
const LDM_hashEntry entry = { cctx->ip - cctx->ibase };
|
||||
#endif
|
||||
HASH_insert(cctx->hashTable, smallHash, entry);
|
||||
}
|
||||
#if LDM_LAG
|
||||
}
|
||||
#endif
|
||||
|
||||
cctx->lastPosHashed = cctx->ip;
|
||||
cctx->lastHash = hash;
|
||||
}
|
||||
|
||||
/**
|
||||
* Copy over the cctx->lastHash and cctx->lastPosHashed
|
||||
* fields from the "next" fields.
|
||||
*
|
||||
* This requires that cctx->ip == cctx->nextPosHashed.
|
||||
*/
|
||||
static void LDM_updateLastHashFromNextHash(LDM_CCtx *cctx) {
|
||||
putHashOfCurrentPositionFromHash(cctx, cctx->nextHash);
|
||||
}
|
||||
|
||||
/**
|
||||
* Insert hash of the current position into the hash table.
|
||||
*/
|
||||
static void LDM_putHashOfCurrentPosition(LDM_CCtx *cctx) {
|
||||
U64 hash = getHash(cctx->ip, LDM_HASH_LENGTH);
|
||||
|
||||
putHashOfCurrentPositionFromHash(cctx, hash);
|
||||
}
|
||||
|
||||
size_t LDM_initializeCCtx(LDM_CCtx *cctx,
|
||||
const void *src, size_t srcSize,
|
||||
void *dst, size_t maxDstSize) {
|
||||
cctx->isize = srcSize;
|
||||
cctx->maxOSize = maxDstSize;
|
||||
|
||||
cctx->ibase = (const BYTE *)src;
|
||||
cctx->ip = cctx->ibase;
|
||||
cctx->iend = cctx->ibase + srcSize;
|
||||
|
||||
cctx->ihashLimit = cctx->iend - LDM_HASH_LENGTH;
|
||||
cctx->imatchLimit = cctx->iend - LDM_MIN_MATCH_LENGTH;
|
||||
|
||||
cctx->obase = (BYTE *)dst;
|
||||
cctx->op = (BYTE *)dst;
|
||||
|
||||
cctx->anchor = cctx->ibase;
|
||||
|
||||
memset(&(cctx->stats), 0, sizeof(cctx->stats));
|
||||
#if USE_CHECKSUM
|
||||
cctx->hashTable = HASH_createTable(LDM_HASHTABLESIZE_U64);
|
||||
#else
|
||||
cctx->hashTable = HASH_createTable(LDM_HASHTABLESIZE_U32);
|
||||
#endif
|
||||
|
||||
if (!cctx->hashTable) return 1;
|
||||
|
||||
cctx->stats.minOffset = UINT_MAX;
|
||||
cctx->stats.windowSizeLog = LDM_WINDOW_SIZE_LOG;
|
||||
cctx->stats.hashTableSizeLog = LDM_MEMORY_USAGE;
|
||||
|
||||
cctx->lastPosHashed = NULL;
|
||||
|
||||
cctx->step = 1; // Fixed to be 1 for now. Changing may break things.
|
||||
cctx->nextIp = cctx->ip + cctx->step;
|
||||
cctx->nextPosHashed = 0;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
void LDM_destroyCCtx(LDM_CCtx *cctx) {
|
||||
HASH_destroyTable(cctx->hashTable);
|
||||
}
|
||||
|
||||
/**
|
||||
* Finds the "best" match.
|
||||
*
|
||||
* Returns 0 if successful and 1 otherwise (i.e. no match can be found
|
||||
* in the remaining input that is long enough).
|
||||
*
|
||||
* forwardMatchLength contains the forward length of the match.
|
||||
*/
|
||||
static int LDM_findBestMatch(LDM_CCtx *cctx, const BYTE **match,
|
||||
U64 *forwardMatchLength, U64 *backwardMatchLength) {
|
||||
|
||||
LDM_hashEntry *entry = NULL;
|
||||
cctx->nextIp = cctx->ip + cctx->step;
|
||||
|
||||
while (entry == NULL) {
|
||||
U64 hash;
|
||||
hash_t smallHash;
|
||||
U32 checksum;
|
||||
#if INSERT_BY_TAG
|
||||
U32 hashEveryMask;
|
||||
#endif
|
||||
setNextHash(cctx);
|
||||
|
||||
hash = cctx->nextHash;
|
||||
smallHash = getSmallHash(hash);
|
||||
checksum = getChecksum(hash);
|
||||
#if INSERT_BY_TAG
|
||||
hashEveryMask = lowerBitsFromHfHash(hash);
|
||||
#endif
|
||||
|
||||
cctx->ip = cctx->nextIp;
|
||||
cctx->nextIp += cctx->step;
|
||||
|
||||
if (cctx->ip > cctx->imatchLimit) {
|
||||
return 1;
|
||||
}
|
||||
#if INSERT_BY_TAG
|
||||
if (hashEveryMask == HASH_ONLY_EVERY) {
|
||||
|
||||
entry = HASH_getBestEntry(cctx, smallHash, checksum,
|
||||
forwardMatchLength, backwardMatchLength);
|
||||
}
|
||||
#else
|
||||
entry = HASH_getBestEntry(cctx, smallHash, checksum,
|
||||
forwardMatchLength, backwardMatchLength);
|
||||
#endif
|
||||
|
||||
if (entry != NULL) {
|
||||
*match = entry->offset + cctx->ibase;
|
||||
}
|
||||
|
||||
putHashOfCurrentPositionFromHash(cctx, hash);
|
||||
|
||||
}
|
||||
setNextHash(cctx);
|
||||
return 0;
|
||||
}
|
||||
|
||||
void LDM_encodeLiteralLengthAndLiterals(
|
||||
LDM_CCtx *cctx, BYTE *pToken, const U64 literalLength) {
|
||||
/* Encode the literal length. */
|
||||
if (literalLength >= RUN_MASK) {
|
||||
U64 len = (U64)literalLength - RUN_MASK;
|
||||
*pToken = (RUN_MASK << ML_BITS);
|
||||
for (; len >= 255; len -= 255) {
|
||||
*(cctx->op)++ = 255;
|
||||
}
|
||||
*(cctx->op)++ = (BYTE)len;
|
||||
} else {
|
||||
*pToken = (BYTE)(literalLength << ML_BITS);
|
||||
}
|
||||
|
||||
/* Encode the literals. */
|
||||
memcpy(cctx->op, cctx->anchor, literalLength);
|
||||
cctx->op += literalLength;
|
||||
}
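The literal-length scheme above follows the lz4 convention: lengths below 15 fit in the token's high nibble, and longer lengths store 15 in the nibble followed by extension bytes that are summed until one is smaller than 255. A minimal round-trip sketch, independent of `LDM_CCtx` (the function names and the `SK_` constants are illustrative, mirroring `ML_BITS` and `RUN_MASK`):

```c
#include <stddef.h>
#include <stdint.h>

#define SK_ML_BITS 4
#define SK_RUN_MASK 15u

/* Sets the high nibble of *token and writes any extension bytes to out.
 * Returns the number of extension bytes written. */
static size_t sketch_encode_literal_length(uint64_t literalLength,
                                           unsigned char *token,
                                           unsigned char *out) {
    size_t n = 0;
    if (literalLength >= SK_RUN_MASK) {
        uint64_t len = literalLength - SK_RUN_MASK;
        *token = (unsigned char)(SK_RUN_MASK << SK_ML_BITS);
        for (; len >= 255; len -= 255) out[n++] = 255;
        out[n++] = (unsigned char)len; /* final byte is always < 255 */
    } else {
        *token = (unsigned char)(literalLength << SK_ML_BITS);
    }
    return n;
}

/* Inverse of the encoder; *consumed receives the extension byte count. */
static uint64_t sketch_decode_literal_length(unsigned char token,
                                             const unsigned char *in,
                                             size_t *consumed) {
    uint64_t length = (uint64_t)(token >> SK_ML_BITS);
    size_t n = 0;
    if (length == SK_RUN_MASK) {
        unsigned s;
        do {
            s = in[n++];
            length += s;
        } while (s == 255);
    }
    *consumed = n;
    return length;
}
```

Note that a length of exactly 15 still needs one extension byte (with value 0), because the nibble value 15 always signals that more bytes follow.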
|
||||
|
||||
void LDM_outputBlock(LDM_CCtx *cctx,
|
||||
const U64 literalLength,
|
||||
const U32 offset,
|
||||
const U64 matchLength) {
|
||||
BYTE *pToken = cctx->op++;
|
||||
|
||||
/* Encode the literal length and literals. */
|
||||
LDM_encodeLiteralLengthAndLiterals(cctx, pToken, literalLength);
|
||||
|
||||
/* Encode the offset. */
|
||||
MEM_write32(cctx->op, offset);
|
||||
cctx->op += LDM_OFFSET_SIZE;
|
||||
|
||||
/* Encode the match length. */
|
||||
if (matchLength >= ML_MASK) {
|
||||
U64 matchLengthRemaining = matchLength;
|
||||
*pToken += ML_MASK;
|
||||
matchLengthRemaining -= ML_MASK;
|
||||
MEM_write32(cctx->op, 0xFFFFFFFF);
|
||||
while (matchLengthRemaining >= 4*0xFF) {
|
||||
cctx->op += 4;
|
||||
MEM_write32(cctx->op, 0xffffffff);
|
||||
matchLengthRemaining -= 4*0xFF;
|
||||
}
|
||||
cctx->op += matchLengthRemaining / 255;
|
||||
*(cctx->op)++ = (BYTE)(matchLengthRemaining % 255);
|
||||
} else {
|
||||
*pToken += (BYTE)(matchLength);
|
||||
}
|
||||
}
|
||||
|
||||
// TODO: maxDstSize is unused. This function may seg fault when writing
|
||||
// beyond the size of dst, as it does not check maxDstSize. Writing to
|
||||
// a buffer and performing checks is a possible solution.
|
||||
//
|
||||
// This is based upon lz4.
|
||||
size_t LDM_compress(const void *src, size_t srcSize,
|
||||
void *dst, size_t maxDstSize) {
|
||||
LDM_CCtx cctx;
|
||||
const BYTE *match = NULL;
|
||||
U64 forwardMatchLength = 0;
|
||||
U64 backwardsMatchLength = 0;
|
||||
|
||||
if (LDM_initializeCCtx(&cctx, src, srcSize, dst, maxDstSize)) {
|
||||
// Initialization failed.
|
||||
return 0;
|
||||
}
|
||||
|
||||
#ifdef OUTPUT_CONFIGURATION
|
||||
LDM_outputConfiguration();
|
||||
#endif
|
||||
|
||||
/* Hash the first position and put it into the hash table. */
|
||||
LDM_putHashOfCurrentPosition(&cctx);
|
||||
|
||||
cctx.lagIp = cctx.ip;
|
||||
cctx.lagHash = cctx.lastHash;
|
||||
|
||||
/**
|
||||
* Find a match.
|
||||
* If no more matches can be found (i.e. the length of the remaining input
|
||||
* is less than the minimum match length), then stop searching for matches
|
||||
* and encode the final literals.
|
||||
*/
|
||||
while (!LDM_findBestMatch(&cctx, &match, &forwardMatchLength,
|
||||
&backwardsMatchLength)) {
|
||||
|
||||
#ifdef COMPUTE_STATS
|
||||
cctx.stats.numMatches++;
|
||||
#endif
|
||||
|
||||
cctx.ip -= backwardsMatchLength;
|
||||
match -= backwardsMatchLength;
|
||||
|
||||
/**
|
||||
* Write current block (literals, literal length, match offset, match
|
||||
* length) and update pointers and hashes.
|
||||
*/
|
||||
{
|
||||
const U64 literalLength = cctx.ip - cctx.anchor;
|
||||
const U32 offset = cctx.ip - match;
|
||||
const U64 matchLength = forwardMatchLength +
|
||||
backwardsMatchLength -
|
||||
LDM_MIN_MATCH_LENGTH;
|
||||
|
||||
LDM_outputBlock(&cctx, literalLength, offset, matchLength);
|
||||
|
||||
#ifdef COMPUTE_STATS
|
||||
cctx.stats.totalLiteralLength += literalLength;
|
||||
cctx.stats.totalOffset += offset;
|
||||
cctx.stats.totalMatchLength += matchLength + LDM_MIN_MATCH_LENGTH;
|
||||
cctx.stats.minOffset =
|
||||
offset < cctx.stats.minOffset ? offset : cctx.stats.minOffset;
|
||||
cctx.stats.maxOffset =
|
||||
offset > cctx.stats.maxOffset ? offset : cctx.stats.maxOffset;
|
||||
cctx.stats.offsetHistogram[(U32)intLog2(offset)]++;
|
||||
cctx.stats.matchLengthHistogram[
|
||||
(U32)intLog2(matchLength + LDM_MIN_MATCH_LENGTH)]++;
|
||||
#endif
|
||||
|
||||
// Move ip to end of block, inserting hashes at each position.
|
||||
cctx.nextIp = cctx.ip + cctx.step;
|
||||
while (cctx.ip < cctx.anchor + LDM_MIN_MATCH_LENGTH +
|
||||
matchLength + literalLength) {
|
||||
if (cctx.ip > cctx.lastPosHashed) {
|
||||
// TODO: Simplify.
|
||||
LDM_updateLastHashFromNextHash(&cctx);
|
||||
setNextHash(&cctx);
|
||||
}
|
||||
cctx.ip++;
|
||||
cctx.nextIp++;
|
||||
}
|
||||
}
|
||||
|
||||
// Set start of next block to current input pointer.
|
||||
cctx.anchor = cctx.ip;
|
||||
LDM_updateLastHashFromNextHash(&cctx);
|
||||
}
|
||||
|
||||
/* Encode the last literals (no more matches). */
|
||||
{
|
||||
const U64 lastRun = cctx.iend - cctx.anchor;
|
||||
BYTE *pToken = cctx.op++;
|
||||
LDM_encodeLiteralLengthAndLiterals(&cctx, pToken, lastRun);
|
||||
}
|
||||
|
||||
#ifdef COMPUTE_STATS
|
||||
LDM_printCompressStats(&cctx.stats);
|
||||
HASH_outputTableOccupancy(cctx.hashTable);
|
||||
#endif
|
||||
|
||||
{
|
||||
const size_t ret = cctx.op - cctx.obase;
|
||||
LDM_destroyCCtx(&cctx);
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
|
||||
void LDM_outputConfiguration(void) {
|
||||
printf("=====================\n");
|
||||
printf("Configuration\n");
|
||||
printf("LDM_WINDOW_SIZE_LOG: %d\n", LDM_WINDOW_SIZE_LOG);
|
||||
printf("LDM_MIN_MATCH_LENGTH, LDM_HASH_LENGTH: %d, %d\n",
|
||||
LDM_MIN_MATCH_LENGTH, LDM_HASH_LENGTH);
|
||||
printf("LDM_MEMORY_USAGE: %d\n", LDM_MEMORY_USAGE);
|
||||
printf("HASH_ONLY_EVERY_LOG: %d\n", HASH_ONLY_EVERY_LOG);
|
||||
printf("HASH_BUCKET_SIZE_LOG: %d\n", HASH_BUCKET_SIZE_LOG);
|
||||
printf("LDM_LAG: %d\n", LDM_LAG);
|
||||
printf("USE_CHECKSUM: %d\n", USE_CHECKSUM);
|
||||
printf("INSERT_BY_TAG: %d\n", INSERT_BY_TAG);
|
||||
printf("HASH_CHAR_OFFSET: %d\n", HASH_CHAR_OFFSET);
|
||||
printf("=====================\n");
|
||||
}
|
||||
|
197
contrib/long_distance_matching/ldm.h
Normal file
@ -0,0 +1,197 @@
|
||||
#ifndef LDM_H
|
||||
#define LDM_H
|
||||
|
||||
#include "mem.h" // from /lib/common/mem.h
|
||||
|
||||
//#include "ldm_params.h"
|
||||
|
||||
// =============================================================================
|
||||
// Modify the parameters in ldm_params.h if "ldm_params.h" is included.
|
||||
// Otherwise, modify the parameters here.
|
||||
// =============================================================================
|
||||
|
||||
#ifndef LDM_PARAMS_H
|
||||
// Defines the size of the hash table.
|
||||
// Note that this is not the number of buckets.
|
||||
// Currently this should be less than WINDOW_SIZE_LOG + 4.
|
||||
#define LDM_MEMORY_USAGE 23
|
||||
|
||||
// The number of entries in a hash bucket.
|
||||
#define HASH_BUCKET_SIZE_LOG 3 // The maximum is 4 for now.
|
||||
|
||||
// Defines the lag in inserting elements into the hash table.
|
||||
#define LDM_LAG 0
|
||||
|
||||
// The maximum window size when searching for matches.
|
||||
// The maximum value is 30
|
||||
#define LDM_WINDOW_SIZE_LOG 28
|
||||
|
||||
// The minimum match length.
|
||||
// This should be a multiple of four.
|
||||
#define LDM_MIN_MATCH_LENGTH 64
|
||||
|
||||
// If INSERT_BY_TAG, insert entries into the hash table as a function of the
|
||||
// hash. Certain hashes will not be inserted.
|
||||
//
|
||||
// Otherwise, insert as a function of the position.
|
||||
#define INSERT_BY_TAG 1
|
||||
|
||||
// Store a checksum with the hash table entries for faster comparison.
|
||||
// This halves the number of entries the hash table can contain.
|
||||
#define USE_CHECKSUM 1
|
||||
#endif
|
||||
|
||||
// Output compression statistics.
|
||||
#define COMPUTE_STATS
|
||||
|
||||
// Output the configuration.
|
||||
#define OUTPUT_CONFIGURATION
|
||||
|
||||
// If defined, forces the probability of insertion to be approximately
|
||||
// one per (1 << HASH_ONLY_EVERY_LOG). If not defined, the probability will be
|
||||
// calculated based on the memory usage and window size for "even" insertion
|
||||
// throughout the window.
|
||||
|
||||
// #define HASH_ONLY_EVERY_LOG 8
|
||||
|
||||
// =============================================================================
|
||||
|
||||
// The number of bytes storing the compressed and decompressed size
|
||||
// in the header.
|
||||
#define LDM_COMPRESSED_SIZE 8
|
||||
#define LDM_DECOMPRESSED_SIZE 8
|
||||
#define LDM_HEADER_SIZE ((LDM_COMPRESSED_SIZE)+(LDM_DECOMPRESSED_SIZE))
|
||||
|
||||
#define ML_BITS 4
|
||||
#define ML_MASK ((1U<<ML_BITS)-1)
|
||||
#define RUN_BITS (8-ML_BITS)
|
||||
#define RUN_MASK ((1U<<RUN_BITS)-1)
|
||||
|
||||
// The number of bytes storing the offset.
|
||||
#define LDM_OFFSET_SIZE 4
|
||||
|
||||
#define LDM_WINDOW_SIZE (1 << (LDM_WINDOW_SIZE_LOG))
|
||||
|
||||
// TODO: Match lengths that are too small do not use the hash table efficiently.
|
||||
// There should be a minimum hash length given the hash table size.
|
||||
#define LDM_HASH_LENGTH LDM_MIN_MATCH_LENGTH
|
||||
|
||||
typedef struct LDM_compressStats LDM_compressStats;
|
||||
typedef struct LDM_CCtx LDM_CCtx;
|
||||
typedef struct LDM_DCtx LDM_DCtx;
|
||||
|
||||
/**
|
||||
* Compresses src into dst.
|
||||
* Returns the compressed size if successful, 0 otherwise.
|
||||
*
|
||||
* NB: This currently ignores maxDstSize and assumes enough space is available.
|
||||
*
|
||||
* Block format (see lz4 documentation for more information):
|
||||
* github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md
|
||||
*
|
||||
* A block is composed of sequences. Each sequence begins with a token, which
|
||||
* is a one-byte value separated into two 4-bit fields.
|
||||
*
|
||||
* The first field uses the four high bits of the token and encodes the literal
|
||||
* length. If the field value is 0, there is no literal. If it is 15,
|
||||
* additional bytes are added (each ranging from 0 to 255) to the previous
|
||||
* value to produce a total length.
|
||||
*
|
||||
* Following the token and optional length bytes are the literals.
|
||||
*
|
||||
* Next are the 4 bytes representing the offset of the match (2 in lz4),
* i.e. the distance back from the current output position at which the
* match to copy begins.
*
* The lower four bits of the token encode the match length, with additional
* bytes following the offset, added in the same way as the additional
* literal length bytes.
|
||||
*
|
||||
* The last sequence is incomplete and stops right after the literals.
|
||||
*/
|
||||
size_t LDM_compress(const void *src, size_t srcSize,
|
||||
void *dst, size_t maxDstSize);
|
||||
|
||||
/**
|
||||
* Initialize the compression context.
|
||||
*
|
||||
* Allocates memory for the hash table.
|
||||
*
|
||||
* Returns 0 if successful, 1 otherwise.
|
||||
*/
|
||||
size_t LDM_initializeCCtx(LDM_CCtx *cctx,
|
||||
const void *src, size_t srcSize,
|
||||
void *dst, size_t maxDstSize);
|
||||
|
||||
/**
|
||||
* Frees up memory allocated in LDM_initializeCCtx().
|
||||
*/
|
||||
void LDM_destroyCCtx(LDM_CCtx *cctx);
|
||||
|
||||
/**
|
||||
* Prints the distribution of offsets in the hash table.
|
||||
*
|
||||
* The offsets are defined as the distance of the hash table entry from the
|
||||
* current input position of the cctx.
|
||||
*/
|
||||
void LDM_outputHashTableOffsetHistogram(const LDM_CCtx *cctx);
|
||||
|
||||
/**
|
||||
* Outputs compression statistics to stdout.
|
||||
*/
|
||||
void LDM_printCompressStats(const LDM_compressStats *stats);
|
||||
|
||||
/**
|
||||
* Encode the literal length followed by the literals.
|
||||
*
|
||||
* The literal length is written to the upper four bits of pToken, with
|
||||
* additional bytes written to the output as needed (see lz4).
|
||||
*
|
||||
* This is followed by literalLength bytes corresponding to the literals.
|
||||
*/
|
||||
void LDM_encodeLiteralLengthAndLiterals(LDM_CCtx *cctx, BYTE *pToken,
|
||||
const U64 literalLength);
|
||||
|
||||
/**
|
||||
* Write current block (literals, literal length, match offset,
|
||||
* match length).
|
||||
*/
|
||||
void LDM_outputBlock(LDM_CCtx *cctx,
|
||||
const U64 literalLength,
|
||||
const U32 offset,
|
||||
const U64 matchLength);
|
||||
|
||||
/**
|
||||
* Decompresses src into dst.
|
||||
*
|
||||
* Note: assumes src does not have a header.
|
||||
*/
|
||||
size_t LDM_decompress(const void *src, size_t srcSize,
|
||||
void *dst, size_t maxDstSize);
|
||||
|
||||
/**
|
||||
* Initialize the decompression context.
|
||||
*/
|
||||
void LDM_initializeDCtx(LDM_DCtx *dctx,
|
||||
const void *src, size_t compressedSize,
|
||||
void *dst, size_t maxDecompressedSize);
|
||||
|
||||
/**
|
||||
* Reads the header from src and writes the compressed size and
|
||||
* decompressed size into compressedSize and decompressedSize respectively.
|
||||
*
|
||||
* NB: LDM_compress and LDM_decompress currently do not add/read headers.
|
||||
*/
|
||||
void LDM_readHeader(const void *src, U64 *compressedSize,
|
||||
U64 *decompressedSize);
|
||||
|
||||
/**
|
||||
* Write the compressed and decompressed size.
|
||||
*/
|
||||
void LDM_writeHeader(void *memPtr, U64 compressedSize,
|
||||
U64 decompressedSize);
|
||||
|
||||
/**
|
||||
* Output the configuration used.
|
||||
*/
|
||||
void LDM_outputConfiguration(void);
|
||||
|
||||
#endif /* LDM_H */
|
109
contrib/long_distance_matching/ldm_common.c
Normal file
@ -0,0 +1,109 @@
|
||||
#include <stdio.h>
|
||||
|
||||
#include "ldm.h"
|
||||
|
||||
/**
|
||||
* This function reads the header at the beginning of src and writes
|
||||
* the compressed and decompressed size to compressedSize and
|
||||
* decompressedSize.
|
||||
*
|
||||
* The header consists of 16 bytes: 8 bytes each in little-endian format
|
||||
* of the compressed size and the decompressed size.
|
||||
*/
|
||||
void LDM_readHeader(const void *src, U64 *compressedSize,
|
||||
U64 *decompressedSize) {
|
||||
const BYTE *ip = (const BYTE *)src;
|
||||
*compressedSize = MEM_readLE64(ip);
|
||||
*decompressedSize = MEM_readLE64(ip + 8);
|
||||
}
|
||||
|
||||
/**
|
||||
* Writes the 16-byte header (8 bytes each of the compressedSize and
|
||||
* decompressedSize in little-endian format) to memPtr.
|
||||
*/
|
||||
void LDM_writeHeader(void *memPtr, U64 compressedSize,
|
||||
U64 decompressedSize) {
|
||||
MEM_writeLE64(memPtr, compressedSize);
|
||||
MEM_writeLE64((BYTE *)memPtr + 8, decompressedSize);
|
||||
}
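The header layout described above is two 64-bit little-endian integers back to back. The sketch below reproduces that layout with portable byte-wise helpers rather than `MEM_writeLE64`/`MEM_readLE64`; all `sketch_` names are illustrative and not part of the LDM or zstd API.

```c
#include <stdint.h>

/* Store v at p, least significant byte first. */
static void sketch_write_le64(unsigned char *p, uint64_t v) {
    int i;
    for (i = 0; i < 8; i++) p[i] = (unsigned char)(v >> (8 * i));
}

/* Load a 64-bit little-endian value from p. */
static uint64_t sketch_read_le64(const unsigned char *p) {
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* 16-byte header: compressed size, then decompressed size. */
static void sketch_write_header(unsigned char *hdr, uint64_t compressedSize,
                                uint64_t decompressedSize) {
    sketch_write_le64(hdr, compressedSize);
    sketch_write_le64(hdr + 8, decompressedSize);
}

static void sketch_read_header(const unsigned char *hdr,
                               uint64_t *compressedSize,
                               uint64_t *decompressedSize) {
    *compressedSize = sketch_read_le64(hdr);
    *decompressedSize = sketch_read_le64(hdr + 8);
}
```

Writing the bytes explicitly keeps the on-disk format identical on big-endian and little-endian hosts, which is the same guarantee the `LE64` helpers are meant to provide.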
|
||||
|
||||
struct LDM_DCtx {
|
||||
size_t compressedSize;
|
||||
size_t maxDecompressedSize;
|
||||
|
||||
const BYTE *ibase; /* Base of input */
|
||||
const BYTE *ip; /* Current input position */
|
||||
const BYTE *iend; /* End of source */
|
||||
|
||||
const BYTE *obase; /* Base of output */
|
||||
BYTE *op; /* Current output position */
|
||||
const BYTE *oend; /* End of output */
|
||||
};
|
||||
|
||||
void LDM_initializeDCtx(LDM_DCtx *dctx,
|
||||
const void *src, size_t compressedSize,
|
||||
void *dst, size_t maxDecompressedSize) {
|
||||
dctx->compressedSize = compressedSize;
|
||||
dctx->maxDecompressedSize = maxDecompressedSize;
|
||||
|
||||
dctx->ibase = src;
|
||||
dctx->ip = (const BYTE *)src;
|
||||
dctx->iend = dctx->ip + dctx->compressedSize;
|
||||
dctx->obase = (const BYTE *)dst;
dctx->op = (BYTE *)dst;
|
||||
dctx->oend = dctx->op + dctx->maxDecompressedSize;
|
||||
}
|
||||
|
||||
size_t LDM_decompress(const void *src, size_t compressedSize,
|
||||
void *dst, size_t maxDecompressedSize) {
|
||||
|
||||
LDM_DCtx dctx;
|
||||
LDM_initializeDCtx(&dctx, src, compressedSize, dst, maxDecompressedSize);
|
||||
|
||||
while (dctx.ip < dctx.iend) {
|
||||
BYTE *cpy;
|
||||
const BYTE *match;
|
||||
size_t length, offset;
|
||||
|
||||
/* Get the literal length. */
|
||||
const unsigned token = *(dctx.ip)++;
|
||||
if ((length = (token >> ML_BITS)) == RUN_MASK) {
|
||||
unsigned s;
|
||||
do {
|
||||
s = *(dctx.ip)++;
|
||||
length += s;
|
||||
} while (s == 255);
|
||||
}
|
||||
|
||||
/* Copy the literals. */
|
||||
cpy = dctx.op + length;
|
||||
memcpy(dctx.op, dctx.ip, length);
|
||||
dctx.ip += length;
|
||||
dctx.op = cpy;
|
||||
|
||||
//TODO: dynamic offset size?
|
||||
/* Read the offset. */
|
||||
offset = MEM_read32(dctx.ip);
|
||||
dctx.ip += LDM_OFFSET_SIZE;
|
||||
match = dctx.op - offset;
|
||||
|
||||
/* Get the match length. */
|
||||
length = token & ML_MASK;
|
||||
if (length == ML_MASK) {
|
||||
unsigned s;
|
||||
do {
|
||||
s = *(dctx.ip)++;
|
||||
length += s;
|
||||
} while (s == 255);
|
||||
}
|
||||
length += LDM_MIN_MATCH_LENGTH;
|
||||
|
||||
/* Copy match. */
|
||||
cpy = dctx.op + length;
|
||||
|
||||
// TODO: this can be made more efficient.
|
||||
while (match < cpy - offset && dctx.op < dctx.oend) {
|
||||
*(dctx.op)++ = *match++;
|
||||
}
|
||||
}
|
||||
return dctx.op - (BYTE *)dst;
|
||||
}
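The byte-by-byte match copy in the decompressor matters when the offset is smaller than the match length: the source and destination ranges overlap, and each copied byte may itself be read again later, which replicates a short pattern. A minimal sketch of that behavior on a plain buffer (`sketch_copy_match` is a hypothetical helper, not part of ldm.h):

```c
#include <stddef.h>

/* Copies `length` bytes starting `offset` bytes behind position `pos`
 * in buf, one byte at a time, never writing at or past `cap`.
 * Returns the new write position. An invalid offset copies nothing. */
static size_t sketch_copy_match(unsigned char *buf, size_t pos,
                                size_t offset, size_t length, size_t cap) {
    if (offset == 0 || offset > pos) return pos;
    while (length > 0 && pos < cap) {
        buf[pos] = buf[pos - offset]; /* may read a byte written just above */
        pos++;
        length--;
    }
    return pos;
}
```

A single `memcpy` would not be safe here because its behavior is undefined for overlapping ranges; faster variants typically copy in chunks no larger than the offset.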
|
12
contrib/long_distance_matching/ldm_params.h
Normal file
@ -0,0 +1,12 @@
|
||||
#ifndef LDM_PARAMS_H
|
||||
#define LDM_PARAMS_H
|
||||
|
||||
#define LDM_MEMORY_USAGE 23
|
||||
#define HASH_BUCKET_SIZE_LOG 3
|
||||
#define LDM_LAG 0
|
||||
#define LDM_WINDOW_SIZE_LOG 28
|
||||
#define LDM_MIN_MATCH_LENGTH 64
|
||||
#define INSERT_BY_TAG 1
|
||||
#define USE_CHECKSUM 1
|
||||
|
||||
#endif // LDM_PARAMS_H
|
269
contrib/long_distance_matching/main.c
Normal file
@ -0,0 +1,269 @@
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
#include <sys/time.h>
|
||||
#include <sys/types.h>
|
||||
#include <sys/mman.h>
|
||||
#include <sys/stat.h>
|
||||
#include <unistd.h>
|
||||
#include <zstd.h>
|
||||
|
||||
#include <fcntl.h>
|
||||
#include "ldm.h"
|
||||
#include "zstd.h"
|
||||
|
||||
// #define DECOMPRESS_AND_VERIFY
|
||||
|
||||
/* Compress file given by fname and output to oname.
|
||||
* Returns 0 if successful, error code otherwise.
|
||||
*
|
||||
* This adds a header from LDM_writeHeader to the beginning of the output.
|
||||
*
|
||||
* This might seg fault if the compressed size exceeds the size allocated for
* the mmapped output file (the input size plus the header and a small margin).
* The compress function should check before writing or buffer writes.
|
||||
*/
|
||||
static int compress(const char *fname, const char *oname) {
|
||||
int fdin, fdout;
|
||||
struct stat statbuf;
|
||||
char *src, *dst;
|
||||
size_t maxCompressedSize, compressedSize;
|
||||
|
||||
struct timeval tv1, tv2;
|
||||
double timeTaken;
|
||||
|
||||
|
||||
/* Open the input file. */
|
||||
if ((fdin = open(fname, O_RDONLY)) < 0) {
|
||||
perror("Error in file opening");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Open the output file. */
|
||||
if ((fdout = open(oname, O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600)) < 0) {
|
||||
perror("Can't create output file");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Find the size of the input file. */
|
||||
if (fstat (fdin, &statbuf) < 0) {
|
||||
perror("Fstat error");
|
||||
return 1;
|
||||
}
|
||||
|
||||
maxCompressedSize = (statbuf.st_size + LDM_HEADER_SIZE);
|
||||
|
||||
// Handle case where compressed size is > decompressed size.
|
||||
// TODO: The compress function should check before writing or buffer writes.
|
||||
maxCompressedSize += statbuf.st_size / 255;
|
||||
|
||||
ftruncate(fdout, maxCompressedSize);
|
||||
|
||||
/* mmap the input file. */
|
||||
if ((src = mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, fdin, 0))
|
||||
== (caddr_t) - 1) {
|
||||
perror("mmap error for input");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* mmap the output file. */
|
||||
if ((dst = mmap(0, maxCompressedSize, PROT_READ | PROT_WRITE,
|
||||
MAP_SHARED, fdout, 0)) == (caddr_t) - 1) {
|
||||
perror("mmap error for output");
|
||||
return 1;
|
||||
}
|
||||
|
||||
gettimeofday(&tv1, NULL);
|
||||
|
||||
compressedSize = LDM_HEADER_SIZE +
|
||||
LDM_compress(src, statbuf.st_size,
|
||||
dst + LDM_HEADER_SIZE, maxCompressedSize);
|
||||
|
||||
gettimeofday(&tv2, NULL);
|
||||
|
||||
// Write the header.
|
||||
LDM_writeHeader(dst, compressedSize, statbuf.st_size);
|
||||
|
||||
// Truncate file to compressedSize.
|
||||
ftruncate(fdout, compressedSize);
|
||||
|
||||
printf("%25s : %10zu -> %10zu - %s \n", fname,
|
||||
(size_t)statbuf.st_size, (size_t)compressedSize, oname);
|
||||
printf("Compression ratio: %.2fx --- %.1f%%\n",
|
||||
(double)statbuf.st_size / (double)compressedSize,
|
||||
(double)compressedSize / (double)(statbuf.st_size) * 100.0);
|
||||
|
||||
timeTaken = (double) (tv2.tv_usec - tv1.tv_usec) / 1000000 +
|
||||
(double) (tv2.tv_sec - tv1.tv_sec);
|
||||
|
||||
printf("Total compress time = %.3f seconds, Average scanning speed: %.3f MB/s\n",
|
||||
timeTaken,
|
||||
((double)statbuf.st_size / (double) (1 << 20)) / timeTaken);
|
||||
|
||||
// Close files.
|
||||
close(fdin);
|
||||
close(fdout);
|
||||
return 0;
|
||||
}
|
||||
|
||||
#ifdef DECOMPRESS_AND_VERIFY
|
||||
/* Decompress file compressed using LDM_compress.
|
||||
* The input file should have the LDM_HEADER followed by payload.
|
||||
* Returns 0 if successful, and an error code otherwise.
|
||||
*/
|
||||
static int decompress(const char *fname, const char *oname) {
|
||||
int fdin, fdout;
|
||||
struct stat statbuf;
|
||||
char *src, *dst;
|
||||
U64 compressedSize, decompressedSize;
|
||||
size_t outSize;
|
||||
|
||||
/* Open the input file. */
|
||||
if ((fdin = open(fname, O_RDONLY)) < 0) {
|
||||
perror("Error in file opening");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Open the output file. */
|
||||
if ((fdout = open(oname, O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600)) < 0) {
|
||||
perror("Can't create output file");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Find the size of the input file. */
|
||||
if (fstat (fdin, &statbuf) < 0) {
|
||||
perror("Fstat error");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* mmap the input file. */
|
||||
if ((src = mmap(0, statbuf.st_size, PROT_READ, MAP_SHARED, fdin, 0))
|
||||
== (caddr_t) - 1) {
|
||||
perror("mmap error for input");
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* Read the header. */
|
||||
LDM_readHeader(src, &compressedSize, &decompressedSize);
|
||||
|
||||
ftruncate(fdout, decompressedSize);
|
||||
|
||||
/* mmap the output file */
|
||||
if ((dst = mmap(0, decompressedSize, PROT_READ | PROT_WRITE,
|
||||
MAP_SHARED, fdout, 0)) == (caddr_t) - 1) {
|
||||
perror("mmap error for output");
|
||||
return 1;
|
||||
}
|
||||
|
||||
outSize = LDM_decompress(
|
||||
src + LDM_HEADER_SIZE, statbuf.st_size - LDM_HEADER_SIZE,
|
||||
dst, decompressedSize);
|
||||
printf("Ret size out: %zu\n", outSize);
|
||||
|
||||
close(fdin);
|
||||
close(fdout);
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* Compare two files.
|
||||
* Returns 0 iff they are the same.
|
||||
*/
|
||||
static int compare(FILE *fp0, FILE *fp1) {
|
||||
int result = 0;
|
||||
while (result == 0) {
|
||||
char b0[1024];
|
||||
char b1[1024];
|
||||
const size_t r0 = fread(b0, 1, sizeof(b0), fp0);
|
||||
const size_t r1 = fread(b1, 1, sizeof(b1), fp1);
|
||||
|
||||
result = (int)r0 - (int)r1;
|
||||
|
||||
if (0 == r0 || 0 == r1) break;
|
||||
|
||||
if (0 == result) result = memcmp(b0, b1, r0);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
/* Verify the input file is the same as the decompressed file. */
|
||||
static int verify(const char *inpFilename, const char *decFilename) {
|
||||
FILE *inpFp, *decFp;
|
||||
|
||||
if ((inpFp = fopen(inpFilename, "rb")) == NULL) {
|
||||
perror("Could not open input file");
|
||||
return 1;
|
||||
}
|
||||
|
||||
if ((decFp = fopen(decFilename, "rb")) == NULL) {
|
||||
perror("Could not open decompressed file");
|
||||
return 1;
|
||||
}
|
||||
|
||||
printf("verify : %s <-> %s\n", inpFilename, decFilename);
|
||||
{
|
||||
const int cmp = compare(inpFp, decFp);
|
||||
if(0 == cmp) {
|
||||
printf("verify : OK\n");
|
||||
} else {
|
||||
printf("verify : NG\n");
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
fclose(decFp);
|
||||
fclose(inpFp);
|
||||
return 0;
|
||||
}
|
||||
#endif
|
||||
|
||||
int main(int argc, const char *argv[]) {
|
||||
const char * const exeName = argv[0];
|
||||
char inpFilename[256] = { 0 };
|
||||
char ldmFilename[256] = { 0 };
|
||||
char decFilename[256] = { 0 };
|
||||
|
||||
if (argc < 2) {
|
||||
printf("Wrong arguments\n");
|
||||
printf("Usage:\n");
|
||||
printf("%s FILE\n", exeName);
|
||||
return 1;
|
||||
}
|
||||
|
||||
snprintf(inpFilename, 256, "%s", argv[1]);
|
||||
snprintf(ldmFilename, 256, "%s.ldm", argv[1]);
|
||||
snprintf(decFilename, 256, "%s.ldm.dec", argv[1]);
|
||||
|
||||
printf("inp = [%s]\n", inpFilename);
|
||||
printf("ldm = [%s]\n", ldmFilename);
|
||||
printf("dec = [%s]\n", decFilename);
|
||||
|
||||
/* Compress */
|
||||
{
|
||||
if (compress(inpFilename, ldmFilename)) {
|
||||
printf("Compress error\n");
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
|
||||
#ifdef DECOMPRESS_AND_VERIFY
|
||||
/* Decompress */
|
||||
{
|
||||
struct timeval tv1, tv2;
|
||||
gettimeofday(&tv1, NULL);
|
||||
if (decompress(ldmFilename, decFilename)) {
|
||||
printf("Decompress error\n");
|
||||
return 1;
|
||||
}
|
||||
gettimeofday(&tv2, NULL);
|
||||
printf("Total decompress time = %f seconds\n",
|
||||
(double) (tv2.tv_usec - tv1.tv_usec) / 1000000 +
|
||||
(double) (tv2.tv_sec - tv1.tv_sec));
|
||||
}
|
||||
/* verify */
|
||||
if (verify(inpFilename, decFilename)) {
|
||||
printf("Verification error\n");
|
||||
return 1;
|
||||
}
|
||||
#endif
|
||||
return 0;
|
||||
}
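The decompress timing in `main` subtracts two `struct timeval` samples field by field. A minimal sketch of that arithmetic as a standalone helper is given here; the name `elapsed_seconds` is hypothetical and not part of this commit. It illustrates that a negative microsecond difference is harmless, because the whole-second term compensates for it.

```c
#include <sys/time.h>  /* struct timeval (POSIX) */

/* Hypothetical helper, not part of the commit: seconds elapsed between two
 * samples, computed the same way as the printf argument in main(). */
static double elapsed_seconds(struct timeval start, struct timeval end) {
    /* The microsecond difference may be negative; adding the whole-second
     * difference still yields the correct total. */
    return (double)(end.tv_usec - start.tv_usec) / 1000000.0 +
           (double)(end.tv_sec - start.tv_sec);
}
```

For example, a start of 1 s + 900000 us and an end of 3 s + 100000 us gives -0.8 + 2.0, i.e. about 1.2 seconds.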
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,11 +1,11 @@
|
||||
# ##########################################################################
|
||||
# ################################################################
|
||||
# Copyright (c) 2016-present, Facebook, Inc.
|
||||
# All rights reserved.
|
||||
#
|
||||
# This source code is licensed under the BSD-style license found in the
|
||||
# LICENSE file in the root directory of this source tree. An additional grant
|
||||
# of patent rights can be found in the PATENTS file in the same directory.
|
||||
# ##########################################################################
|
||||
# This source code is licensed under both the BSD-style license (found in the
|
||||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
# in the COPYING file in the root directory of this source tree).
|
||||
# ################################################################
|
||||
|
||||
# Standard variables for installation
|
||||
DESTDIR ?=
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "Options.h"
|
||||
#include "util.h"
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "Pzstd.h"
|
||||
#include "SkippableFrame.h"
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "SkippableFrame.h"
|
||||
#include "mem.h"
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "ErrorHolder.h"
|
||||
#include "Options.h"
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "Options.h"
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "Pzstd.h"
|
||||
extern "C" {
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
extern "C" {
|
||||
#include "datagen.h"
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/**
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/**
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#pragma once
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "utils/Buffer.h"
|
||||
#include "utils/Range.h"
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "utils/Range.h"
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "utils/ResourcePool.h"
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "utils/ScopeGuard.h"
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "utils/ThreadPool.h"
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
#include "utils/Buffer.h"
|
||||
#include "utils/WorkQueue.h"
|
||||
|
@ -2,9 +2,9 @@
|
||||
# Copyright (c) 2017-present, Facebook, Inc.
|
||||
# All rights reserved.
|
||||
#
|
||||
# This source code is licensed under the BSD-style license found in the
|
||||
# LICENSE file in the root directory of this source tree. An additional grant
|
||||
# of patent rights can be found in the PATENTS file in the same directory.
|
||||
# This source code is licensed under both the BSD-style license (found in the
|
||||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
# in the COPYING file in the root directory of this source tree).
|
||||
# ################################################################
|
||||
|
||||
# This Makefile presumes libzstd is built, using `make` in / or /lib/
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2017-present, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2017-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
#include <stdlib.h> // malloc, free, exit, atoi
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2017-present, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2017-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/*
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2017-present, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2017-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
#include <stdlib.h> // malloc, free, exit, atoi
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2016-present, Yann Collet, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2017-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
|
||||
|
@ -1,10 +1,10 @@
|
||||
/**
|
||||
/*
|
||||
* Copyright (c) 2017-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
#include <stdlib.h> /* malloc, free */
|
||||
|
34
doc/educational_decoder/Makefile
Normal file
@ -0,0 +1,34 @@
|
||||
HARNESS_FILES=*.c
|
||||
|
||||
MULTITHREAD_LDFLAGS = -pthread
|
||||
DEBUGFLAGS= -g -DZSTD_DEBUG=1
|
||||
CPPFLAGS += -I$(ZSTDDIR) -I$(ZSTDDIR)/common -I$(ZSTDDIR)/compress \
|
||||
-I$(ZSTDDIR)/dictBuilder -I$(ZSTDDIR)/deprecated -I$(PRGDIR)
|
||||
CFLAGS ?= -O3
|
||||
CFLAGS += -Wall -Wextra -Wcast-qual -Wcast-align -Wshadow \
|
||||
-Wstrict-aliasing=1 -Wswitch-enum -Wdeclaration-after-statement \
|
||||
-Wstrict-prototypes -Wundef -Wformat-security \
|
||||
-Wvla -Wformat=2 -Winit-self -Wfloat-equal -Wwrite-strings \
|
||||
-Wredundant-decls
|
||||
CFLAGS += $(DEBUGFLAGS)
|
||||
CFLAGS += $(MOREFLAGS)
|
||||
FLAGS = $(CPPFLAGS) $(CFLAGS) $(LDFLAGS) $(MULTITHREAD_LDFLAGS)
|
||||
|
||||
harness: $(HARNESS_FILES)
|
||||
$(CC) $(FLAGS) $^ -o $@
|
||||
|
||||
clean:
|
||||
@$(RM) -f harness
|
||||
@$(RM) -rf harness.dSYM
|
||||
|
||||
test: harness
|
||||
@zstd README.md -o tmp.zst
|
||||
@./harness tmp.zst tmp
|
||||
@diff -s tmp README.md
|
||||
@$(RM) -f tmp*
|
||||
@zstd --train harness.c zstd_decompress.c zstd_decompress.h README.md
|
||||
@zstd -D dictionary README.md -o tmp.zst
|
||||
@./harness tmp.zst tmp dictionary
|
||||
@diff -s tmp README.md
|
||||
@$(RM) -f tmp* dictionary
|
||||
@make clean
|
@ -2,9 +2,9 @@
|
||||
* Copyright (c) 2017-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
#include <stdio.h>
|
||||
@ -87,7 +87,7 @@ int main(int argc, char **argv) {
|
||||
}
|
||||
|
||||
size_t decompressed_size = ZSTD_get_decompressed_size(input, input_size);
|
||||
if (decompressed_size == -1) {
|
||||
if (decompressed_size == (size_t)-1) {
|
||||
decompressed_size = MAX_COMPRESSION_RATIO * input_size;
|
||||
fprintf(stderr, "WARNING: Compressed data does not contain "
|
||||
"decompressed size, going to assume the compression "
|
||||
@ -106,9 +106,15 @@ int main(int argc, char **argv) {
|
||||
return 1;
|
||||
}
|
||||
|
||||
dictionary_t* const parsed_dict = create_dictionary();
|
||||
if (dict) {
|
||||
parse_dictionary(parsed_dict, dict, dict_size);
|
||||
}
|
||||
size_t decompressed =
|
||||
ZSTD_decompress_with_dict(output, decompressed_size,
|
||||
input, input_size, dict, dict_size);
|
||||
input, input_size, parsed_dict);
|
||||
|
||||
free_dictionary(parsed_dict);
|
||||
|
||||
write_file(argv[2], output, decompressed);
|
||||
|
||||
@ -117,4 +123,3 @@ int main(int argc, char **argv) {
|
||||
free(dict);
|
||||
input = output = dict = NULL;
|
||||
}
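The harness hunk above makes the "size unknown" check explicit by comparing against `(size_t)-1`. A minimal sketch of that sentinel pattern follows, assuming a size query that reports `(size_t)-1` when the frame carries no decompressed size. `SIZE_UNKNOWN` and `output_capacity` are hypothetical names and are not part of the educational decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the sentinel; the harness spells it (size_t)-1. */
#define SIZE_UNKNOWN ((size_t)-1)

/* Hypothetical helper: choose an output buffer capacity. If the reported
 * size is the sentinel, fall back to max_ratio * input_size, as the harness
 * does with MAX_COMPRESSION_RATIO. */
static size_t output_capacity(size_t reported, size_t input_size,
                              size_t max_ratio) {
    if (reported == SIZE_UNKNOWN) {
        /* Assumed precondition: the fallback product must not overflow. */
        assert(max_ratio == 0 || input_size <= SIZE_MAX / max_ratio);
        return max_ratio * input_size;
    }
    return reported;
}
```

On common platforms, comparing a `size_t` against a bare `-1` gives the same result, because the `-1` is converted to `size_t` first, but compilers typically warn about the signed/unsigned comparison. The cast states the intent directly.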
|
||||
|
||||
|
@ -2,9 +2,9 @@
|
||||
* Copyright (c) 2017-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
/// Zstandard educational decoder implementation
|
||||
@ -14,21 +14,7 @@
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
/// Zstandard decompression functions.
|
||||
/// `dst` must point to a space at least as large as the reconstructed output.
|
||||
size_t ZSTD_decompress(void *const dst, const size_t dst_len,
|
||||
const void *const src, const size_t src_len);
|
||||
/// If `dict != NULL` and `dict_len >= 8`, does the same thing as
|
||||
/// `ZSTD_decompress` but uses the provided dict
|
||||
size_t ZSTD_decompress_with_dict(void *const dst, const size_t dst_len,
|
||||
const void *const src, const size_t src_len,
|
||||
const void *const dict, const size_t dict_len);
|
||||
|
||||
/// Get the decompressed size of an input stream so memory can be allocated in
|
||||
/// advance
|
||||
/// Returns -1 if the size can't be determined
|
||||
size_t ZSTD_get_decompressed_size(const void *const src, const size_t src_len);
|
||||
#include "zstd_decompress.h"
|
||||
|
||||
/******* UTILITY MACROS AND TYPES *********************************************/
|
||||
// Max block size decompressed size is 128 KB and literal blocks can't be
|
||||
@ -108,10 +94,10 @@ static inline size_t IO_istream_len(const istream_t *const in);
|
||||
|
||||
/// Advances the stream by `len` bytes, and returns a pointer to the chunk that
|
||||
/// was skipped. The stream must be byte aligned.
|
||||
static inline const u8 *IO_read_bytes(istream_t *const in, size_t len);
|
||||
static inline const u8 *IO_get_read_ptr(istream_t *const in, size_t len);
|
||||
/// Advances the stream by `len` bytes, and returns a pointer to the chunk that
|
||||
/// was skipped so it can be written to.
|
||||
static inline u8 *IO_write_bytes(ostream_t *const out, size_t len);
|
||||
static inline u8 *IO_get_write_ptr(ostream_t *const out, size_t len);
|
||||
|
||||
/// Advance the inner state by `len` bytes. The stream must be byte aligned.
|
||||
static inline void IO_advance_input(istream_t *const in, size_t len);
|
||||
@ -307,7 +293,7 @@ typedef struct {
|
||||
|
||||
/// The decoded contents of a dictionary so that it doesn't have to be repeated
|
||||
/// for each frame that uses it
|
||||
typedef struct {
|
||||
struct dictionary_s {
|
||||
// Entropy tables
|
||||
HUF_dtable literals_dtable;
|
||||
FSE_dtable ll_dtable;
|
||||
@ -322,7 +308,7 @@ typedef struct {
|
||||
u64 previous_offsets[3];
|
||||
|
||||
u32 dictionary_id;
|
||||
} dictionary_t;
|
||||
};
|
||||
|
||||
/// A tuple containing the parts necessary to decode and execute a ZSTD sequence
|
||||
/// command
|
||||
@ -367,27 +353,36 @@ static void execute_sequences(frame_context_t *const ctx, ostream_t *const out,
|
||||
const sequence_command_t *const sequences,
|
||||
const size_t num_sequences);
|
||||
|
||||
// Parse a provided dictionary blob for use in decompression
|
||||
static void parse_dictionary(dictionary_t *const dict, const u8 *src,
|
||||
size_t src_len);
|
||||
static void free_dictionary(dictionary_t *const dict);
|
||||
// Copies literals and returns the total literal length that was copied
|
||||
static u32 copy_literals(const size_t seq, istream_t *litstream,
|
||||
ostream_t *const out);
|
||||
|
||||
// Given an offset code from a sequence command (either an actual offset value
|
||||
// or an index for previous offset), computes the correct offset and updates
|
||||
// the offset history
|
||||
static size_t compute_offset(sequence_command_t seq, u64 *const offset_hist);
|
||||
|
||||
// Given an offset, match length, and total output, as well as the frame
|
||||
// context for the dictionary, determines if the dictionary is used and
|
||||
// executes the copy operation
|
||||
static void execute_match_copy(frame_context_t *const ctx, size_t offset,
|
||||
size_t match_length, size_t total_output,
|
||||
ostream_t *const out);
|
||||
|
||||
/******* END ZSTD HELPER STRUCTS AND PROTOTYPES *******************************/
|
||||
|
||||
size_t ZSTD_decompress(void *const dst, const size_t dst_len,
|
||||
const void *const src, const size_t src_len) {
|
||||
return ZSTD_decompress_with_dict(dst, dst_len, src, src_len, NULL, 0);
|
||||
dictionary_t* uninit_dict = create_dictionary();
|
||||
size_t const decomp_size = ZSTD_decompress_with_dict(dst, dst_len, src,
|
||||
src_len, uninit_dict);
|
||||
free_dictionary(uninit_dict);
|
||||
return decomp_size;
|
||||
}
|
||||
|
||||
size_t ZSTD_decompress_with_dict(void *const dst, const size_t dst_len,
|
||||
const void *const src, const size_t src_len,
|
||||
const void *const dict,
|
||||
const size_t dict_len) {
|
||||
dictionary_t parsed_dict;
|
||||
memset(&parsed_dict, 0, sizeof(dictionary_t));
|
||||
// dict_len < 8 is not a valid dictionary
|
||||
if (dict && dict_len > 8) {
|
||||
parse_dictionary(&parsed_dict, (const u8 *)dict, dict_len);
|
||||
}
|
||||
dictionary_t* parsed_dict) {
|
||||
|
||||
istream_t in = IO_make_istream(src, src_len);
|
||||
ostream_t out = IO_make_ostream(dst, dst_len);
|
||||
@ -396,11 +391,9 @@ size_t ZSTD_decompress_with_dict(void *const dst, const size_t dst_len,
|
||||
// Multiple frames can be appended into a single file or stream. A frame is
|
||||
// totally independent, has a defined beginning and end, and a set of
|
||||
// parameters which tells the decoder how to decompress it."
|
||||
while (IO_istream_len(&in) > 0) {
|
||||
decode_frame(&out, &in, &parsed_dict);
|
||||
}
|
||||
|
||||
free_dictionary(&parsed_dict);
|
||||
/* this decoder assumes decompression of a single frame */
|
||||
decode_frame(&out, &in, parsed_dict);
|
||||
|
||||
return out.ptr - (u8 *)dst;
|
||||
}
|
||||
@ -424,30 +417,6 @@ static void decompress_data(frame_context_t *const ctx, ostream_t *const out,
|
||||
static void decode_frame(ostream_t *const out, istream_t *const in,
|
||||
const dictionary_t *const dict) {
|
||||
const u32 magic_number = IO_read_bits(in, 32);
|
||||
|
||||
// Skippable frame
|
||||
//
|
||||
// "Magic_Number
|
||||
//
|
||||
// 4 Bytes, little-endian format. Value : 0x184D2A5?, which means any value
|
||||
// from 0x184D2A50 to 0x184D2A5F. All 16 values are valid to identify a
|
||||
// skippable frame."
|
||||
if ((magic_number & ~0xFU) == 0x184D2A50U) {
|
||||
// "Skippable frames allow the insertion of user-defined data into a
|
||||
// flow of concatenated frames. Its design is pretty straightforward,
|
||||
// with the sole objective to allow the decoder to quickly skip over
|
||||
// user-defined data and continue decoding.
|
||||
//
|
||||
// Skippable frames defined in this specification are compatible with
|
||||
// LZ4 ones."
|
||||
const size_t frame_size = IO_read_bits(in, 32);
|
||||
|
||||
// skip over frame
|
||||
IO_advance_input(in, frame_size);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
// Zstandard frame
|
||||
//
|
||||
// "Magic_Number
|
||||
@ -460,8 +429,8 @@ static void decode_frame(ostream_t *const out, istream_t *const in,
|
||||
return;
|
||||
}
|
||||
|
||||
// not a real frame
|
||||
ERROR("Invalid magic number");
|
||||
// not a real frame or a skippable frame
|
||||
ERROR("Tried to decode non-ZSTD frame");
|
||||
}
|
||||
|
||||
/// Decode a frame that contains compressed data. Not all frames do as there
|
||||
@ -672,8 +641,8 @@ static void decompress_data(frame_context_t *const ctx, ostream_t *const out,
|
||||
case 0: {
|
||||
// "Raw_Block - this is an uncompressed block. Block_Size is the
|
||||
// number of bytes to read and copy."
|
||||
const u8 *const read_ptr = IO_read_bytes(in, block_len);
|
||||
u8 *const write_ptr = IO_write_bytes(out, block_len);
|
||||
const u8 *const read_ptr = IO_get_read_ptr(in, block_len);
|
||||
u8 *const write_ptr = IO_get_write_ptr(out, block_len);
|
||||
|
||||
// Copy the raw data into the output
|
||||
memcpy(write_ptr, read_ptr, block_len);
|
||||
@ -685,8 +654,8 @@ static void decompress_data(frame_context_t *const ctx, ostream_t *const out,
|
||||
// "RLE_Block - this is a single byte, repeated N times. In which
|
||||
// case, Block_Size is the size to regenerate, while the
|
||||
// "compressed" block is just 1 byte (the byte to repeat)."
|
||||
const u8 *const read_ptr = IO_read_bytes(in, 1);
|
||||
u8 *const write_ptr = IO_write_bytes(out, block_len);
|
||||
const u8 *const read_ptr = IO_get_read_ptr(in, 1);
|
||||
u8 *const write_ptr = IO_get_write_ptr(out, block_len);
|
||||
|
||||
// Copy `block_len` copies of `read_ptr[0]` to the output
|
||||
memset(write_ptr, read_ptr[0], block_len);
|
||||
@ -832,13 +801,13 @@ static size_t decode_literals_simple(istream_t *const in, u8 **const literals,
|
||||
switch (block_type) {
|
||||
case 0: {
|
||||
// "Raw_Literals_Block - Literals are stored uncompressed."
|
||||
const u8 *const read_ptr = IO_read_bytes(in, size);
|
||||
const u8 *const read_ptr = IO_get_read_ptr(in, size);
|
||||
memcpy(*literals, read_ptr, size);
|
||||
break;
|
||||
}
|
||||
case 1: {
|
||||
// "RLE_Literals_Block - Literals consist of a single byte value repeated N times."
|
||||
const u8 *const read_ptr = IO_read_bytes(in, 1);
|
||||
const u8 *const read_ptr = IO_get_read_ptr(in, 1);
|
||||
memset(*literals, read_ptr[0], size);
|
||||
break;
|
||||
}
|
||||
@ -949,7 +918,7 @@ static void decode_huf_table(HUF_dtable *const dtable, istream_t *const in) {
|
||||
num_symbs = header - 127;
|
||||
const size_t bytes = (num_symbs + 1) / 2;
|
||||
|
||||
const u8 *const weight_src = IO_read_bytes(in, bytes);
|
||||
const u8 *const weight_src = IO_get_read_ptr(in, bytes);
|
||||
|
||||
for (int i = 0; i < num_symbs; i++) {
|
||||
// "They are encoded forward, 2
|
||||
@ -1157,7 +1126,7 @@ static void decompress_sequences(frame_context_t *const ctx, istream_t *in,
|
||||
}
|
||||
|
||||
const size_t len = IO_istream_len(in);
|
||||
const u8 *const src = IO_read_bytes(in, len);
|
||||
const u8 *const src = IO_get_read_ptr(in, len);
|
||||
|
||||
// "After writing the last bit containing information, the compressor writes
|
||||
// a single 1-bit and then fills the byte with 0-7 0 bits of padding."
|
||||
@ -1262,7 +1231,7 @@ static void decode_seq_table(FSE_dtable *const table, istream_t *const in,
|
||||
}
|
||||
case seq_rle: {
|
||||
// "RLE_Mode : it's a single code, repeated Number_of_Sequences times."
|
||||
const u8 symb = IO_read_bytes(in, 1)[0];
|
||||
const u8 symb = IO_get_read_ptr(in, 1)[0];
|
||||
FSE_init_dtable_rle(table, symb);
|
||||
break;
|
||||
}
|
||||
@ -1303,145 +1272,146 @@ static void execute_sequences(frame_context_t *const ctx, ostream_t *const out,
|
||||
|
||||
for (size_t i = 0; i < num_sequences; i++) {
|
||||
const sequence_command_t seq = sequences[i];
|
||||
|
||||
{
|
||||
// If the sequence asks for more literals than are left, the
|
||||
// sequence must be corrupted
|
||||
if (seq.literal_length > IO_istream_len(&litstream)) {
|
||||
CORRUPTION();
|
||||
}
|
||||
|
||||
u8 *const write_ptr = IO_write_bytes(out, seq.literal_length);
|
||||
const u8 *const read_ptr =
|
||||
IO_read_bytes(&litstream, seq.literal_length);
|
||||
// Copy literals to output
|
||||
memcpy(write_ptr, read_ptr, seq.literal_length);
|
||||
|
||||
total_output += seq.literal_length;
|
||||
const u32 literals_size = copy_literals(seq.literal_length, &litstream, out);
|
||||
total_output += literals_size;
|
||||
}
|
||||
|
||||
size_t offset;
|
||||
size_t const offset = compute_offset(seq, offset_hist);
|
||||
|
||||
// Offsets are special, we need to handle the repeat offsets
|
||||
if (seq.offset <= 3) {
|
||||
// "The first 3 values define a repeated offset and we will call
|
||||
// them Repeated_Offset1, Repeated_Offset2, and Repeated_Offset3.
|
||||
// They are sorted in recency order, with Repeated_Offset1 meaning
|
||||
// 'most recent one'".
|
||||
size_t const match_length = seq.match_length;
|
||||
|
||||
// Use 0 indexing for the array
|
||||
u32 idx = seq.offset - 1;
|
||||
if (seq.literal_length == 0) {
|
||||
// "There is an exception though, when current sequence's
|
||||
// literals length is 0. In this case, repeated offsets are
|
||||
// shifted by one, so Repeated_Offset1 becomes Repeated_Offset2,
|
||||
// Repeated_Offset2 becomes Repeated_Offset3, and
|
||||
// Repeated_Offset3 becomes Repeated_Offset1 - 1_byte."
|
||||
idx++;
|
||||
}
|
||||
execute_match_copy(ctx, offset, match_length, total_output, out);
|
||||
|
||||
if (idx == 0) {
|
||||
offset = offset_hist[0];
|
||||
} else {
|
||||
// If idx == 3 then literal length was 0 and the offset was 3,
|
||||
// as per the exception listed above
|
||||
offset = idx < 3 ? offset_hist[idx] : offset_hist[0] - 1;
|
||||
|
||||
// If idx == 1 we don't need to modify offset_hist[2], since
|
||||
// we're using the second-most recent code
|
||||
if (idx > 1) {
|
||||
offset_hist[2] = offset_hist[1];
|
||||
}
|
||||
offset_hist[1] = offset_hist[0];
|
||||
offset_hist[0] = offset;
|
||||
}
|
||||
} else {
|
||||
// When it's not a repeat offset:
|
||||
// "if (Offset_Value > 3) offset = Offset_Value - 3;"
|
||||
offset = seq.offset - 3;
|
||||
|
||||
// Shift back history
|
||||
offset_hist[2] = offset_hist[1];
|
||||
offset_hist[1] = offset_hist[0];
|
||||
offset_hist[0] = offset;
|
||||
}
|
||||
|
||||
size_t match_length = seq.match_length;
|
||||
|
||||
u8 *write_ptr = IO_write_bytes(out, match_length);
|
||||
if (total_output <= ctx->header.window_size) {
|
||||
// In this case offset might go back into the dictionary
|
||||
if (offset > total_output + ctx->dict_content_len) {
|
||||
// The offset goes beyond even the dictionary
|
||||
CORRUPTION();
|
||||
}
|
||||
|
||||
if (offset > total_output) {
|
||||
// "The rest of the dictionary is its content. The content act
|
||||
// as a "past" in front of data to compress or decompress, so it
|
||||
// can be referenced in sequence commands."
|
||||
const size_t dict_copy =
|
||||
MIN(offset - total_output, match_length);
|
||||
const size_t dict_offset =
|
||||
ctx->dict_content_len - (offset - total_output);
|
||||
|
||||
memcpy(write_ptr, ctx->dict_content + dict_offset, dict_copy);
|
||||
write_ptr += dict_copy;
|
||||
match_length -= dict_copy;
|
||||
}
|
||||
} else if (offset > ctx->header.window_size) {
|
||||
CORRUPTION();
|
||||
}
|
||||
|
||||
// We must copy byte by byte because the match length might be larger
|
||||
// than the offset
|
||||
// ex: if the output so far was "abc", a command with offset=3 and
|
||||
// match_length=6 would produce "abcabcabc" as the new output
|
||||
for (size_t i = 0; i < match_length; i++) {
|
||||
*write_ptr = *(write_ptr - offset);
|
||||
write_ptr++;
|
||||
}
|
||||
|
||||
total_output += seq.match_length;
|
||||
total_output += match_length;
|
||||
}
|
||||
|
||||
// Copy any leftover literals
|
||||
{
|
||||
size_t len = IO_istream_len(&litstream);
|
||||
u8 *const write_ptr = IO_write_bytes(out, len);
|
||||
const u8 *const read_ptr = IO_read_bytes(&litstream, len);
|
||||
memcpy(write_ptr, read_ptr, len);
|
||||
|
||||
copy_literals(len, &litstream, out);
|
||||
total_output += len;
|
||||
}
|
||||
|
||||
ctx->current_total_output = total_output;
|
||||
}
|
||||
|
||||
static u32 copy_literals(const size_t literal_length, istream_t *litstream,
|
||||
ostream_t *const out) {
|
||||
// If the sequence asks for more literals than are left, the
|
||||
// sequence must be corrupted
|
||||
if (literal_length > IO_istream_len(litstream)) {
|
||||
CORRUPTION();
|
||||
}
|
||||
|
||||
u8 *const write_ptr = IO_get_write_ptr(out, literal_length);
|
||||
const u8 *const read_ptr =
|
||||
IO_get_read_ptr(litstream, literal_length);
|
||||
// Copy literals to output
|
||||
memcpy(write_ptr, read_ptr, literal_length);
|
||||
|
||||
return literal_length;
|
||||
}
|
||||
|
||||
static size_t compute_offset(sequence_command_t seq, u64 *const offset_hist) {
|
||||
size_t offset;
|
||||
// Offsets are special, we need to handle the repeat offsets
|
||||
if (seq.offset <= 3) {
|
||||
// "The first 3 values define a repeated offset and we will call
|
||||
// them Repeated_Offset1, Repeated_Offset2, and Repeated_Offset3.
|
||||
// They are sorted in recency order, with Repeated_Offset1 meaning
|
||||
// 'most recent one'".
|
||||
|
||||
// Use 0 indexing for the array
|
||||
u32 idx = seq.offset - 1;
|
||||
if (seq.literal_length == 0) {
|
||||
// "There is an exception though, when current sequence's
|
||||
// literals length is 0. In this case, repeated offsets are
|
||||
// shifted by one, so Repeated_Offset1 becomes Repeated_Offset2,
|
||||
// Repeated_Offset2 becomes Repeated_Offset3, and
|
||||
// Repeated_Offset3 becomes Repeated_Offset1 - 1_byte."
|
||||
idx++;
|
||||
}
|
||||
|
||||
if (idx == 0) {
|
||||
offset = offset_hist[0];
|
||||
} else {
|
||||
// If idx == 3 then literal length was 0 and the offset was 3,
|
||||
// as per the exception listed above
|
||||
offset = idx < 3 ? offset_hist[idx] : offset_hist[0] - 1;
|
||||
|
||||
// If idx == 1 we don't need to modify offset_hist[2], since
|
||||
// we're using the second-most recent code
|
||||
if (idx > 1) {
|
||||
offset_hist[2] = offset_hist[1];
|
||||
}
|
||||
offset_hist[1] = offset_hist[0];
|
||||
offset_hist[0] = offset;
|
||||
}
|
||||
} else {
|
||||
// When it's not a repeat offset:
|
||||
// "if (Offset_Value > 3) offset = Offset_Value - 3;"
|
||||
offset = seq.offset - 3;
|
||||
|
||||
// Shift back history
|
||||
offset_hist[2] = offset_hist[1];
|
||||
offset_hist[1] = offset_hist[0];
|
||||
offset_hist[0] = offset;
|
||||
}
|
||||
return offset;
|
||||
}
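The repeat-offset rule in `compute_offset` is easier to check against a few concrete values. The following is a minimal standalone sketch, not the decoder's own code: `resolve_offset` is a hypothetical name, and it takes plain integers instead of `sequence_command_t` so it can be compiled on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical standalone restatement of the repeat-offset rule.
// `offset_value` is the raw value carried by the sequence, `literal_length`
// is that sequence's literal count, and `hist` holds the three most recent
// offsets with hist[0] being the newest.
size_t resolve_offset(uint64_t offset_value, uint32_t literal_length,
                      uint64_t hist[3]) {
    assert(offset_value >= 1);
    uint64_t offset;

    if (offset_value > 3) {
        // Not a repeat code: the real offset is the value minus 3, and it
        // becomes the newest history entry.
        offset = offset_value - 3;
        hist[2] = hist[1];
        hist[1] = hist[0];
        hist[0] = offset;
        return (size_t)offset;
    }

    // Repeat code: 0-based index into the history, shifted by one when the
    // sequence has no literals.
    uint32_t idx = (uint32_t)offset_value - 1;
    if (literal_length == 0) {
        idx++;
    }
    if (idx == 0) {
        // Most recent offset reused; history is unchanged.
        return (size_t)hist[0];
    }

    // idx == 3 only happens for value 3 with no literals: newest offset - 1.
    offset = idx < 3 ? hist[idx] : hist[0] - 1;
    if (idx > 1) {
        hist[2] = hist[1];
    }
    hist[1] = hist[0];
    hist[0] = offset;
    return (size_t)offset;
}
```

Starting from a history of `{1, 4, 8}`, a value of 2 with a non-zero literal length resolves to 4 and leaves the history as `{4, 1, 8}`. A following value of 3 with zero literals resolves to `4 - 1 = 3` and leaves `{3, 4, 1}`.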
|
||||
|
||||
static void execute_match_copy(frame_context_t *const ctx, size_t offset,
|
||||
size_t match_length, size_t total_output,
|
||||
ostream_t *const out) {
|
||||
u8 *write_ptr = IO_get_write_ptr(out, match_length);
|
||||
if (total_output <= ctx->header.window_size) {
|
||||
// In this case offset might go back into the dictionary
|
||||
if (offset > total_output + ctx->dict_content_len) {
|
||||
// The offset goes beyond even the dictionary
|
||||
CORRUPTION();
|
||||
}
|
||||
|
||||
if (offset > total_output) {
|
||||
// "The rest of the dictionary is its content. The content act
|
||||
// as a "past" in front of data to compress or decompress, so it
|
||||
// can be referenced in sequence commands."
|
||||
const size_t dict_copy =
|
||||
MIN(offset - total_output, match_length);
|
||||
const size_t dict_offset =
|
||||
ctx->dict_content_len - (offset - total_output);
|
||||
|
||||
memcpy(write_ptr, ctx->dict_content + dict_offset, dict_copy);
|
||||
write_ptr += dict_copy;
|
||||
match_length -= dict_copy;
|
||||
}
|
||||
} else if (offset > ctx->header.window_size) {
|
||||
CORRUPTION();
|
||||
}
|
||||
|
||||
// We must copy byte by byte because the match length might be larger
|
||||
// than the offset
|
||||
// ex: if the output so far was "abc", a command with offset=3 and
|
||||
// match_length=6 would produce "abcabcabc" as the new output
|
||||
for (size_t j = 0; j < match_length; j++) {
|
||||
*write_ptr = *(write_ptr - offset);
|
||||
write_ptr++;
|
||||
}
|
||||
}
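The byte-by-byte loop at the end of `execute_match_copy` matters whenever the match overlaps the bytes being written. Below is a minimal sketch of that case on a flat buffer. The helper name `overlap_copy` and its parameters are assumptions made for illustration, and it leaves out the dictionary and window checks done above.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: append `match_length` bytes to `buf`, each one copied
// from `offset` bytes behind the current write position. `pos` is the number
// of bytes already in `buf`; the caller must guarantee room for
// pos + match_length bytes. Returns the new number of bytes in `buf`.
size_t overlap_copy(unsigned char *buf, size_t pos, size_t offset,
                    size_t match_length) {
    assert(offset >= 1 && offset <= pos);
    for (size_t i = 0; i < match_length; i++) {
        // Copying forward one byte at a time lets later iterations read
        // bytes written by earlier ones, so offset < match_length repeats
        // the last `offset` bytes as a pattern.
        buf[pos] = buf[pos - offset];
        pos++;
    }
    return pos;
}
```

With `buf` holding "abc" (`pos == 3`), calling `overlap_copy(buf, 3, 3, 6)` yields "abcabcabc", the case described in the comment. A single `memcpy` is not a safe substitute here because its behavior is undefined when source and destination overlap.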
|
||||
/******* END SEQUENCE EXECUTION ***********************************************/
|
||||
|
||||
/******* OUTPUT SIZE COUNTING *************************************************/
|
||||
static void traverse_frame(const frame_header_t *const header, istream_t *const in);
|
||||
|
||||
/// Get the decompressed size of an input stream so memory can be allocated in
|
||||
/// advance.
|
||||
/// This is more complex than the implementation in the reference
|
||||
/// implementation, as this API allows for the decompression of multiple
|
||||
/// concatenated frames.
|
||||
/// This implementation assumes `src` points to a single ZSTD-compressed frame
|
||||
size_t ZSTD_get_decompressed_size(const void *src, const size_t src_len) {
|
||||
istream_t in = IO_make_istream(src, src_len);
|
||||
size_t dst_size = 0;
|
||||
|
||||
// Each frame header only gives us the size of its frame, so iterate over
|
||||
// all
|
||||
// frames
|
||||
while (IO_istream_len(&in) > 0) {
|
||||
// get decompressed size from ZSTD frame header
|
||||
{
|
||||
const u32 magic_number = IO_read_bits(&in, 32);
|
||||
|
||||
if ((magic_number & ~0xFU) == 0x184D2A50U) {
|
||||
// skippable frame, this has no impact on output size
|
||||
const size_t frame_size = IO_read_bits(&in, 32);
|
||||
IO_advance_input(&in, frame_size);
|
||||
} else if (magic_number == 0xFD2FB528U) {
|
||||
if (magic_number == 0xFD2FB528U) {
|
||||
// ZSTD frame
|
||||
frame_header_t header;
|
||||
parse_frame_header(&header, &in);
|
||||
@ -1451,68 +1421,42 @@ size_t ZSTD_get_decompressed_size(const void *src, const size_t src_len) {
|
||||
return -1;
|
||||
}
|
||||
|
||||
dst_size += header.frame_content_size;
|
||||
|
||||
// Consume the input from the frame to reach the start of the next
|
||||
traverse_frame(&header, &in);
|
||||
return header.frame_content_size;
|
||||
} else {
|
||||
// not a real frame
|
||||
ERROR("Invalid magic number");
|
||||
// not a real frame or skippable frame
|
||||
ERROR("ZSTD frame magic number did not match");
|
||||
}
|
||||
}
|
||||
|
||||
return dst_size;
|
||||
}
|
||||
|
||||
/// Iterate over each block in a frame to find the end of it, to get to the
|
||||
/// start of the next frame
|
||||
static void traverse_frame(const frame_header_t *const header, istream_t *const in) {
|
||||
int last_block = 0;
|
||||
|
||||
do {
|
||||
// Parse the block header
|
||||
last_block = IO_read_bits(in, 1);
|
||||
const int block_type = IO_read_bits(in, 2);
|
||||
const size_t block_len = IO_read_bits(in, 21);
|
||||
|
||||
switch (block_type) {
|
||||
case 0: // Raw block, block_len bytes
|
||||
IO_advance_input(in, block_len);
|
||||
break;
|
||||
case 1: // RLE block, 1 byte
|
||||
IO_advance_input(in, 1);
|
||||
break;
|
||||
case 2: // Compressed block, compressed size is block_len
|
||||
IO_advance_input(in, block_len);
|
||||
break;
|
||||
case 3:
|
||||
// Reserved block type
|
||||
CORRUPTION();
|
||||
break;
|
||||
default:
|
||||
IMPOSSIBLE();
|
||||
}
|
||||
} while (!last_block);
|
||||
|
||||
if (header->content_checksum_flag) {
|
||||
IO_advance_input(in, 4);
|
||||
}
|
||||
}
|
||||
|
||||
/******* END OUTPUT SIZE COUNTING *********************************************/
|
||||
|
||||
/******* DICTIONARY PARSING ***************************************************/
|
||||
#define DICT_SIZE_ERROR() ERROR("Dictionary size cannot be less than 8 bytes")
|
||||
#define NULL_SRC() ERROR("Tried to create dictionary with pointer to null src");
|
||||
|
||||
dictionary_t* create_dictionary() {
|
||||
dictionary_t* dict = calloc(1, sizeof(dictionary_t));
|
||||
if (!dict) {
|
||||
BAD_ALLOC();
|
||||
}
|
||||
return dict;
|
||||
}
|
||||
|
||||
static void init_dictionary_content(dictionary_t *const dict,
|
||||
istream_t *const in);
|
||||
|
||||
static void parse_dictionary(dictionary_t *const dict, const u8 *src,
|
||||
void parse_dictionary(dictionary_t *const dict, const void *src,
|
||||
size_t src_len) {
|
||||
const u8 *byte_src = (const u8 *)src;
|
||||
memset(dict, 0, sizeof(dictionary_t));
|
||||
if (src == NULL) { /* cannot initialize dictionary with null src */
|
||||
NULL_SRC();
|
||||
}
|
||||
if (src_len < 8) {
|
||||
INP_SIZE();
|
||||
DICT_SIZE_ERROR();
|
||||
}
|
||||
|
||||
istream_t in = IO_make_istream(src, src_len);
|
||||
istream_t in = IO_make_istream(byte_src, src_len);
|
||||
|
||||
const u32 magic_number = IO_read_bits(&in, 32);
|
||||
if (magic_number != 0xEC30A437) {
|
||||
@ -1564,13 +1508,13 @@ static void init_dictionary_content(dictionary_t *const dict,
|
||||
BAD_ALLOC();
|
||||
}
|
||||
|
||||
const u8 *const content = IO_read_bytes(in, dict->content_size);
|
||||
const u8 *const content = IO_get_read_ptr(in, dict->content_size);
|
||||
|
||||
memcpy(dict->content, content, dict->content_size);
|
||||
}
|
||||
|
||||
/// Free an allocated dictionary
|
||||
static void free_dictionary(dictionary_t *const dict) {
|
||||
void free_dictionary(dictionary_t *const dict) {
|
||||
HUF_free_dtable(&dict->literals_dtable);
|
||||
FSE_free_dtable(&dict->ll_dtable);
|
||||
FSE_free_dtable(&dict->of_dtable);
|
||||
@ -1579,6 +1523,8 @@ static void free_dictionary(dictionary_t *const dict) {
|
||||
free(dict->content);
|
||||
|
||||
memset(dict, 0, sizeof(dictionary_t));
|
||||
|
||||
free(dict);
|
||||
}
|
||||
/******* END DICTIONARY PARSING ***********************************************/
|
||||
|
||||
@ -1657,7 +1603,7 @@ static inline size_t IO_istream_len(const istream_t *const in) {
|
||||
|
||||
/// Returns a pointer where `len` bytes can be read, and advances the internal
|
||||
/// state. The stream must be byte aligned.
|
||||
static inline const u8 *IO_read_bytes(istream_t *const in, size_t len) {
|
||||
static inline const u8 *IO_get_read_ptr(istream_t *const in, size_t len) {
|
||||
if (len > in->len) {
|
||||
INP_SIZE();
|
||||
}
|
||||
@ -1671,7 +1617,7 @@ static inline const u8 *IO_read_bytes(istream_t *const in, size_t len) {
|
||||
return ptr;
|
||||
}
|
||||
/// Returns a pointer to write `len` bytes to, and advances the internal state
|
||||
static inline u8 *IO_write_bytes(ostream_t *const out, size_t len) {
|
||||
static inline u8 *IO_get_write_ptr(ostream_t *const out, size_t len) {
|
||||
if (len > out->len) {
|
||||
OUT_SIZE();
|
||||
}
|
||||
@ -1710,7 +1656,7 @@ static inline istream_t IO_make_istream(const u8 *in, size_t len) {
|
||||
/// `in` must be byte aligned
|
||||
static inline istream_t IO_make_sub_istream(istream_t *const in, size_t len) {
|
||||
// Consume `len` bytes of the parent stream
|
||||
const u8 *const ptr = IO_read_bytes(in, len);
|
||||
const u8 *const ptr = IO_get_read_ptr(in, len);
|
||||
|
||||
// Make a substream using the pointer to those `len` bytes
|
||||
return IO_make_istream(ptr, len);
|
||||
@ -1814,7 +1760,7 @@ static size_t HUF_decompress_1stream(const HUF_dtable *const dtable,
|
||||
if (len == 0) {
|
||||
INP_SIZE();
|
||||
}
|
||||
const u8 *const src = IO_read_bytes(in, len);
|
||||
const u8 *const src = IO_get_read_ptr(in, len);
|
||||
|
||||
// "Each bitstream must be read backward, that is starting from the end down
|
||||
// to the beginning. Therefore it's necessary to know the size of each
|
||||
@ -2065,7 +2011,7 @@ static size_t FSE_decompress_interleaved2(const FSE_dtable *const dtable,
|
||||
if (len == 0) {
|
||||
INP_SIZE();
|
||||
}
|
||||
const u8 *const src = IO_read_bytes(in, len);
|
||||
const u8 *const src = IO_get_read_ptr(in, len);
|
||||
|
||||
// "Each bitstream must be read backward, that is starting from the end down
|
||||
// to the beginning. Therefore it's necessary to know the size of each
|
||||
@ -2192,7 +2138,7 @@ static void FSE_init_dtable(FSE_dtable *const dtable,
|
||||
}
|
||||
|
||||
// Now we can fill baseline and num bits
|
||||
for (int i = 0; i < size; i++) {
|
||||
for (size_t i = 0; i < size; i++) {
|
||||
u8 symbol = dtable->symbols[i];
|
||||
u16 next_state_desc = state_desc[symbol]++;
|
||||
// Fills in the table appropriately, next_state_desc increases by symbol
|
||||
@ -2355,4 +2301,3 @@ static void FSE_copy_dtable(FSE_dtable *const dst, const FSE_dtable *const src)
|
||||
memcpy(dst->new_state_base, src->new_state_base, size * sizeof(u16));
|
||||
}
|
||||
/******* END FSE PRIMITIVES ***************************************************/
|
||||
|
||||
|
@ -1,16 +1,58 @@
|
||||
/*
|
||||
* Copyright (c) 2017-present, Facebook, Inc.
|
||||
* Copyright (c) 2016-present, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the BSD-style license found in the
|
||||
* LICENSE file in the root directory of this source tree. An additional grant
|
||||
* of patent rights can be found in the PATENTS file in the same directory.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
size_t ZSTD_decompress(void *const dst, const size_t dst_len,
|
||||
const void *const src, const size_t src_len);
|
||||
size_t ZSTD_decompress_with_dict(void *const dst, const size_t dst_len,
|
||||
const void *const src, const size_t src_len,
|
||||
const void *const dict, const size_t dict_len);
|
||||
size_t ZSTD_get_decompressed_size(const void *const src, const size_t src_len);
|
||||
/******* EXPOSED TYPES ********************************************************/
|
||||
/*
|
||||
* Contains the parsed contents of a dictionary
|
||||
* This includes Huffman and FSE tables used for decoding and data on offsets
|
||||
*/
|
||||
typedef struct dictionary_s dictionary_t;
|
||||
/******* END EXPOSED TYPES ****************************************************/
|
||||
|
||||
/******* DECOMPRESSION FUNCTIONS **********************************************/
|
||||
/// Zstandard decompression functions.
|
||||
/// `dst` must point to a space at least as large as the reconstructed output.
|
||||
size_t ZSTD_decompress(void *const dst, const size_t dst_len,
|
||||
const void *const src, const size_t src_len);
|
||||
|
||||
/// If `dict != NULL` and `dict_len >= 8`, does the same thing as
|
||||
/// `ZSTD_decompress` but uses the provided dict
|
||||
size_t ZSTD_decompress_with_dict(void *const dst, const size_t dst_len,
|
||||
const void *const src, const size_t src_len,
|
||||
dictionary_t* parsed_dict);
|
||||
|
||||
/// Get the decompressed size of an input stream so memory can be allocated in
|
||||
/// advance
|
||||
/// Returns -1 if the size can't be determined
|
||||
/// Assumes decompression of a single frame
|
||||
size_t ZSTD_get_decompressed_size(const void *const src, const size_t src_len);
|
||||
/******* END DECOMPRESSION FUNCTIONS ******************************************/
|
||||
|
||||
/******* DICTIONARY MANAGEMENT ***********************************************/
|
||||
/*
|
||||
* Return a valid dictionary_t pointer for use with dictionary initialization
|
||||
* or decompression
|
||||
*/
|
||||
dictionary_t* create_dictionary();
|
||||
|
||||
/*
|
||||
* Parse a provided dictionary blob for use in decompression
|
||||
* `src` -- must point to memory space representing the dictionary
|
||||
* `src_len` -- must provide the dictionary size
|
||||
* `dict` -- will contain the parsed contents of the dictionary and
|
||||
* can be used for decompression
|
||||
*/
|
||||
void parse_dictionary(dictionary_t *const dict, const void *src,
|
||||
size_t src_len);
|
||||
|
||||
/*
|
||||
* Free internal Huffman tables, FSE tables, and dictionary content
|
||||
*/
|
||||
void free_dictionary(dictionary_t *const dict);
|
||||
/******* END DICTIONARY MANAGEMENT *******************************************/
|
||||
|
@ -16,7 +16,7 @@ Distribution of this document is unlimited.
|
||||
|
||||
### Version
|
||||
|
||||
0.2.5 (31/03/17)
|
||||
0.2.6 (19/08/17)
|
||||
|
||||
|
||||
Introduction
|
||||
@ -106,7 +106,7 @@ The structure of a single Zstandard frame is following:
|
||||
|
||||
| `Magic_Number` | `Frame_Header` |`Data_Block`| [More data blocks] | [`Content_Checksum`] |
|
||||
|:--------------:|:--------------:|:----------:| ------------------ |:--------------------:|
|
||||
| 4 bytes | 2-14 bytes | n bytes | | 0-4 bytes |
|
||||
| 4 bytes | 2-14 bytes | n bytes | | 0-4 bytes |
|
||||
|
||||
__`Magic_Number`__
|
||||
|
||||
@ -1249,23 +1249,30 @@ Consequently, a last byte of `0` is not possible.
|
||||
And the final-bit-flag itself is not part of the useful bitstream.
|
||||
Hence, the last byte contains between 0 and 7 useful bits.
|
||||
|
||||
For example, if the literal sequence "0145" was encoded using the prefix codes above,
|
||||
it would be encoded as:
|
||||
```
|
||||
00000001 01110000
|
||||
```
|
||||
Starting from the end,
|
||||
it's possible to read the bitstream in a __little-endian__ fashion,
|
||||
keeping track of already used bits. Since the bitstream is encoded in reverse
|
||||
order, starting from the end read symbols in forward order.
|
||||
|
||||
For example, if the literal sequence "0145" was encoded using above prefix code,
|
||||
it would be encoded (in reverse order) as:
|
||||
|
||||
|Symbol | 5 | 4 | 1 | 0 | Padding |
|
||||
|--------|------|------|----|---|---------|
|
||||
|Encoding|`0000`|`0001`|`01`|`1`| `10000` |
|
||||
|Encoding|`0000`|`0001`|`01`|`1`| `00001` |
|
||||
|
||||
Starting from the end,
|
||||
it's possible to read the bitstream in a __little-endian__ fashion,
|
||||
keeping track of already used bits. Since the bitstream is encoded in reverse
|
||||
order, by starting at the end the symbols can be read in forward order.
|
||||
This results in the following 2-byte bitstream:
|
||||
```
|
||||
00010000 00001101
|
||||
```
|
||||
|
||||
Reading the last `Max_Number_of_Bits` bits,
|
||||
it's then possible to compare extracted value to decoding table,
|
||||
Here is an alternative representation with the symbol codes separated by underscores:
|
||||
```
|
||||
0001_0000 00001_1_01
|
||||
```
|
||||
|
||||
Reading the highest `Max_Number_of_Bits` bits,
|
||||
it's possible to compare the extracted value to the decoding table,
|
||||
determining the symbol to decode and the number of bits to discard.
|
||||
|
||||
The process continues up to reading the required number of symbols per stream.
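
As an illustration, the 2-byte example can be decoded by hand-rolled code. This is a minimal sketch and not the reference decoder. It matches code strings one bit at a time instead of using a `Max_Number_of_Bits` lookup table. The names `decode_backward`, `EXAMPLE_CODES` and `MAX_CODE_BITS` are assumptions made for the example, and the code table contains only the four symbols listed in the encoding table of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CODE_BITS 4

typedef struct {
    const char *bits; // code bits in the order they are read
    char symbol;
} example_code_t;

// Codes taken from the example's encoding table (symbols 0, 1, 4 and 5 only).
static const example_code_t EXAMPLE_CODES[] = {
    {"1", '0'}, {"01", '1'}, {"0001", '4'}, {"0000", '5'},
};

// Bit `bit_index` of the stream, counting from the lowest bit of byte 0
// (little-endian bit numbering).
static int read_bit(const uint8_t *src, size_t bit_index) {
    return (src[bit_index / 8] >> (bit_index % 8)) & 1;
}

// Decodes `src` backward, starting at the highest bit of the last byte.
// Writes at most `out_cap` symbols to `out` and returns how many were written.
size_t decode_backward(const uint8_t *src, size_t len, char *out,
                       size_t out_cap) {
    // A last byte of 0 is not possible: it must hold the final-bit-flag.
    assert(len > 0 && src[len - 1] != 0);

    // `remaining` counts unread bits; the next bit to read is remaining - 1.
    size_t remaining = len * 8;
    while (!read_bit(src, remaining - 1)) {
        remaining--; // skip the zero padding bits
    }
    remaining--; // skip the final-bit-flag itself

    size_t count = 0;
    char code[MAX_CODE_BITS + 1];
    size_t code_len = 0;
    while (remaining > 0 && count < out_cap) {
        code[code_len++] = read_bit(src, --remaining) ? '1' : '0';
        code[code_len] = '\0';

        int matched = 0;
        for (size_t i = 0; i < sizeof EXAMPLE_CODES / sizeof EXAMPLE_CODES[0]; i++) {
            if (strcmp(code, EXAMPLE_CODES[i].bits) == 0) {
                out[count++] = EXAMPLE_CODES[i].symbol;
                code_len = 0;
                matched = 1;
                break;
            }
        }
        if (!matched && code_len == MAX_CODE_BITS) {
            break; // no code in this small table matches; stop decoding
        }
    }
    return count;
}
```

Feeding it the bytes `0x10 0x0D` (that is, `00010000 00001101`) returns the four symbols `0`, `1`, `4`, `5`, in that order.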
|
||||
@ -1516,12 +1523,13 @@ to crosscheck that an implementation build its decoding tables correctly.
|
||||
|
||||
Version changes
|
||||
---------------
|
||||
- 0.2.6 : fixed an error in Huffman example, by Ulrich Kunitz
|
||||
- 0.2.5 : minor typos and clarifications
|
||||
- 0.2.4 : section restructuring, by Sean Purcell
|
||||
- 0.2.3 : clarified several details, by Sean Purcell
|
||||
- 0.2.2 : added predefined codes, by Johannes Rudolph
|
||||
- 0.2.1 : clarify field names, by Przemyslaw Skibinski
|
||||
- 0.2.0 : numerous format adjustments for zstd v0.8
|
||||
- 0.2.0 : numerous format adjustments for zstd v0.8+
|
||||
- 0.1.2 : limit Huffman tree depth to 11 bits
|
||||
- 0.1.1 : reserved dictID ranges
|
||||
- 0.1.0 : initial release
|
||||
|
@ -2,9 +2,9 @@
|
||||
# Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
# All rights reserved.
|
||||
#
|
||||
# This source code is licensed under the BSD-style license found in the
|
||||
# LICENSE file in the root directory of this source tree. An additional grant
|
||||
# of patent rights can be found in the PATENTS file in the same directory.
|
||||
# This source code is licensed under both the BSD-style license (found in the
|
||||
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
# in the COPYING file in the root directory of this source tree).
|
||||
# ################################################################
|
||||
|
||||
# This Makefile presumes libzstd is installed, using `sudo make install`
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2016-present, Yann Collet, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2016-present, Yann Collet, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2016-present, Yann Collet, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2016-present, Yann Collet, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2016-present, Yann Collet, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
#include <stdlib.h> // malloc, exit
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2016-present, Yann Collet, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
|
||||
|
@ -1,9 +1,10 @@
|
||||
/**
|
||||
* Copyright 2016-present, Yann Collet, Facebook, Inc.
|
||||
/*
|
||||
* Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* This source code is licensed under the license found in the
|
||||
* LICENSE-examples file in the root directory of this source tree.
|
||||
* This source code is licensed under both the BSD-style license (found in the
|
||||
* LICENSE file in the root directory of this source tree) and the GPLv2 (found
|
||||
* in the COPYING file in the root directory of this source tree).
|
||||
*/
|
||||
|
||||
|
||||
|
Some files were not shown because too many files have changed in this diff.